
Quick Start
Installation
To use evolearn, first install it using pip:
(.venv) $ pip install evolearn
Genetic Hyperparameter Tuning CV
To perform hyperparameter tuning using a genetic algorithm, you first need to import the following modules:
evolearn.hyperparameter_tuning.initialization
evolearn.hyperparameter_tuning.evaluation
evolearn.hyperparameter_tuning.selection
evolearn.hyperparameter_tuning.mating
evolearn.hyperparameter_tuning.reproduction
evolearn.hyperparameter_tuning.mutation
evolearn.hyperparameter_tuning.environment (optional)
evolearn.hyperparameter_tuning.genetic_hyperparameter_tuning
Although the modules from environment
are optional, leaving them out may cause the search to stop early or miss the best results.
These modules help to prevent premature convergence
and also control other hyperparameters of the genetic algorithm.
For example:
>>> from evolearn.hyperparameter_tuning.initialization import Genes
>>> from evolearn.hyperparameter_tuning.evaluation import FitnessFunction
>>> from evolearn.hyperparameter_tuning.selection import (
...     RankSelection,
...     RouletteWheelSelection,
...     SteadyStateSelection,
...     TournamentSelection,
...     StochasticUniversalSampling,
...     BoltzmannSelection,
... )
>>> from evolearn.hyperparameter_tuning.mating import MatingFunction
>>> from evolearn.hyperparameter_tuning.reproduction import (
...     KPointCrossover,
...     LinearCombinationCrossover,
...     FitnessProportionateAverage,
... )
>>> from evolearn.hyperparameter_tuning.mutation import (
...     Boundary,
...     Shrink,
... )
>>> from evolearn.hyperparameter_tuning.environment import (
...     AdaptiveReproduction,
...     AdaptiveMutation,
...     Elitism,
... )
>>> from evolearn.hyperparameter_tuning.genetic_hyperparameter_tuning import GenesSearchCV
>>> from sklearn.ensemble import RandomForestRegressor
>>> search_space_rf = {
...     'max_depth': (1, 16, 'uniform'),
...     'n_estimators': (100, 1000, 'uniform'),
...     'criterion': ('squared_error', 'absolute_error', 'poisson')
... }
>>> opt = GenesSearchCV(
...     n_gen=10,
...     initialization_fn=Genes(search_space=search_space_rf, pop_size=30),
...     fitness_fn=FitnessFunction(
...         estimator=RandomForestRegressor(n_jobs=-1),
...         cv=3,
...         scoring='neg_mean_absolute_error',
...     ),
...     selection_fn=StochasticUniversalSampling(.7),
...     mating_fn=MatingFunction(increst_prevention=False),
...     reproduction_fn=KPointCrossover(1),
...     mutation_fn=Shrink(),
...     adaptive_population=AdaptiveReproduction(10),
...     elitism=Elitism(),
...     adaptive_mutation=AdaptiveMutation()
... )
>>> opt.fit(X_train, y_train)
Max Fitness: -2023.200579609583
{'max_depth': 5, 'n_estimators': 561, 'criterion': 'absolute_error'}
The choices of selection_fn, reproduction_fn, and mutation_fn are
up to your personal preference. Pick the ones you believe
benefit your search process the most.
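For instance, the same search space could be run with a different operator set. The sketch below is only an illustration of swapping components; it reuses the classes imported above and assumes the keyword names listed in the parameter reference further down (e.g. k for TournamentSelection, epsilon for Boundary).
>>> # Illustrative alternative: tournament selection, 2-point crossover,
>>> # boundary mutation (keyword names assumed from the reference below).
>>> opt_alt = GenesSearchCV(
...     n_gen=10,
...     initialization_fn=Genes(search_space=search_space_rf, pop_size=30),
...     fitness_fn=FitnessFunction(
...         estimator=RandomForestRegressor(n_jobs=-1),
...         cv=3,
...         scoring='neg_mean_absolute_error',
...     ),
...     selection_fn=TournamentSelection(k=3),
...     mating_fn=MatingFunction(),
...     reproduction_fn=KPointCrossover(k=2),
...     mutation_fn=Boundary(epsilon=.2),
...     adaptive_population=AdaptiveReproduction(10),
...     elitism=Elitism(),
...     adaptive_mutation=AdaptiveMutation()
... )
>>> # Then fit as before: opt_alt.fit(X_train, y_train)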
Genetic Feature Selection
To perform feature selection using a genetic algorithm, you first need to import the following modules:
evolearn.feature_selection.initialization
evolearn.feature_selection.evaluation
evolearn.feature_selection.selection
evolearn.feature_selection.mating
evolearn.feature_selection.reproduction
evolearn.feature_selection.mutation
evolearn.feature_selection.environment (optional)
evolearn.feature_selection.genetic_feature_selection
These modules look similar to those from the
GenesSearchCV
section, but their internal mechanisms
work slightly differently. Be careful not to import the
wrong modules when using genetic feature selection.
For example:
>>> from evolearn.feature_selection.initialization import Genes
>>> from evolearn.feature_selection.evaluation import FitnessFunction
>>> from evolearn.feature_selection.selection import (
...     RankSelection,
...     RouletteWheelSelection,
...     SteadyStateSelection,
...     TournamentSelection,
...     StochasticUniversalSampling,
...     BoltzmannSelection,
... )
>>> from evolearn.feature_selection.mating import MatingFunction
>>> from evolearn.feature_selection.reproduction import KPointCrossover
>>> from evolearn.feature_selection.mutation import (
...     BitStringMutation,
...     ExchangeMutation,
...     ShiftMutation,
... )
>>> from evolearn.feature_selection.environment import (
...     AdaptiveReproduction,
...     AdaptiveMutation,
...     Elitism,
... )
>>> from evolearn.feature_selection.genetic_feature_selection import GeneticFeatureSelection
>>> from sklearn.ensemble import RandomForestRegressor
>>> opt = GeneticFeatureSelection(
...     n_gen=10,
...     initialization_fn=Genes(pop_size=50),
...     fitness_fn=FitnessFunction(
...         estimator=RandomForestRegressor(n_jobs=-1),
...         cv=3,
...         scoring='neg_mean_absolute_error'
...     ),
...     selection_fn=RouletteWheelSelection(.7),
...     mating_fn=MatingFunction(),
...     reproduction_fn=KPointCrossover(k=4),
...     mutation_fn=BitStringMutation(),
...     adaptive_population=None,
...     elitism=None,
...     adaptive_mutation=None
... )
>>> opt.fit(X_train, y_train)
>>> print(opt.best_fitness_)
-2797.7245589631652
>>> print(opt.best_params_)
{'age': True, 'sex': False, 'bmi': True, 'children': True, 'smoker': True, 'region': False}
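Since best_params_ maps each feature name to a boolean, the selected subset can be pulled out directly. A minimal sketch, assuming X_train is a pandas DataFrame whose columns match those names:
>>> # Collect the feature names the search marked as True.
>>> selected = [name for name, keep in opt.best_params_.items() if keep]
>>> selected
['age', 'bmi', 'children', 'smoker']
>>> # Assuming X_train is a DataFrame, restrict it to the selected columns.
>>> X_train_selected = X_train[selected]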
Hyperparameter Tuning
GenesSearchCV
n_gen : int
Maximum number of generations (loops) GenesSearchCV will run.
initialization_fn
Class object to generate solution candidates.
fitness_fn
Class object to evaluate the fitness of solution candidates.
selection_fn
Class object to select surviving solution candidates. Can either be:
hyperparameter_tuning.selection.RankSelection,
hyperparameter_tuning.selection.RouletteWheelSelection,
hyperparameter_tuning.selection.SteadyStateSelection,
hyperparameter_tuning.selection.TournamentSelection,
hyperparameter_tuning.selection.StochasticUniversalSampling,
hyperparameter_tuning.selection.BoltzmannSelection
mating_fn
Class object to pair the solution candidates for reproduction.
reproduction_fn
Class object to reproduce the child population. Can either be:
hyperparameter_tuning.reproduction.KPointCrossover,
hyperparameter_tuning.reproduction.LinearCombinationCrossover,
hyperparameter_tuning.reproduction.FitnessProportionateAverage
mutation_fn
Class object to mutate the child population. Can either be:
hyperparameter_tuning.mutation.Boundary,
hyperparameter_tuning.mutation.Shrink
adaptive_population=None
Class object to adaptively change the mating rate of the mating_fn.
elitism=None
Class object to perform elite selection, ace comparison and induction of the elites’ traits.
adaptive_mutation=None
Class object to adaptively change the mutation probability of the mutation_fn.
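Because adaptive_population, elitism, and adaptive_mutation default to None, a bare-bones search needs only the required components. A minimal sketch built from the Quick Start pieces, assuming pct_survivors can be passed to RankSelection as a keyword:
>>> # Minimal configuration: the environment helpers are left at their None defaults.
>>> opt = GenesSearchCV(
...     n_gen=5,
...     initialization_fn=Genes(search_space=search_space_rf, pop_size=20),
...     fitness_fn=FitnessFunction(
...         estimator=RandomForestRegressor(n_jobs=-1),
...         cv=3,
...         scoring='neg_mean_absolute_error',
...     ),
...     selection_fn=RankSelection(pct_survivors=.7),
...     mating_fn=MatingFunction(),
...     reproduction_fn=KPointCrossover(k=1),
...     mutation_fn=Shrink(),
...     adaptive_population=None,
...     elitism=None,
...     adaptive_mutation=None
... )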
Initialization
Genes
search_space : dict
Defines the search range of the algorithm. Keys are parameter names (strings) and values are int, float or str. Represents the search space over the parameters of the provided estimator.
pop_size : int
Size of the initial population.
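Following the Quick Start example, numeric hyperparameters appear to be given as (low, high, prior) tuples and categorical ones as a plain tuple of choices; that reading of the tuple format is an assumption based on that example:
>>> # Hypothetical search space mixing a numeric range and a categorical choice.
>>> space = {
...     'max_depth': (2, 10, 'uniform'),           # assumed (low, high, prior)
...     'criterion': ('squared_error', 'poisson')  # categorical choices
... }
>>> init = Genes(search_space=space, pop_size=25)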
Evaluation
FitnessFunction
estimator : BaseEstimator
An object of that type is instantiated for each search point. This object is assumed to implement the scikit-learn estimator API. Either the estimator needs to provide a score function, or scoring must be passed.
cv : int, cross-validation generator, or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 3-fold cross-validation,
an integer, to specify the number of folds in a (Stratified)KFold,
an object to be used as a cross-validation generator,
an iterable yielding train/test splits.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
scoring : str, callable, or None, default=None
A string (see the model evaluation documentation) or a scorer callable object/function with signature scorer(estimator, X, y). If None, the score method of the estimator is used.
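Since cv also accepts a cross-validation generator, a scikit-learn splitter can be passed in place of an integer. A sketch, assuming FitnessFunction forwards it unchanged to the cross-validation routine:
>>> from sklearn.model_selection import KFold
>>> from sklearn.ensemble import RandomForestRegressor
>>> # Explicit splitter instead of an integer fold count.
>>> fitness = FitnessFunction(
...     estimator=RandomForestRegressor(n_jobs=-1),
...     cv=KFold(n_splits=5, shuffle=True, random_state=0),
...     scoring='neg_mean_absolute_error'
... )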
Selection
RankSelection
pct_survivors : int, float
Argument that controls the number of survivors.
RouletteWheelSelection
pct_survivors : int, float
Argument that controls the number of survivors.
SteadyStateSelection
elimination_ratio : float [default=.3]
Determines how many candidates are eliminated.
TournamentSelection
k : int [default=2]
Argument that controls the number of participants in each tournament.
preserve_remainders : bool [default=True]
If True, the remaining individuals not selected for a tournament will survive the selection process.
StochasticUniversalSampling
pct_survivors : int, float
Argument that controls the number of survivors.
BoltzmannSelection
pct_survivors : float
Argument that controls the number of survivors.
T0 : int, float
Initial temperature used to calculate the Boltzmann probability. A number between [5, 100].
a : int, float
Alpha, a constant between [0, 1].
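For illustration, the selection strategies can be instantiated with the parameters listed above; the keyword names are taken from this reference and are assumed to match the constructors:
>>> # Keep roughly 60% of the population each generation.
>>> sel_sus = StochasticUniversalSampling(.6)
>>> # Tournaments of 3; individuals not drawn into a tournament still survive.
>>> sel_tour = TournamentSelection(k=3, preserve_remainders=True)
>>> # Boltzmann selection with an initial temperature and cooling constant.
>>> sel_boltz = BoltzmannSelection(pct_survivors=.5, T0=50, a=.9)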
Mating
MatingFunction
cr_proba : int, float [default=1]
Percentage of the survived population. Determines how many couples are paired during mating.
increst_prevention : bool [default=True]
If True, solution candidates sharing the same parents will not be paired together.
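A short sketch of the two options, with keyword names assumed from this reference:
>>> # Pair every surviving candidate and block pairings between siblings.
>>> mating = MatingFunction(cr_proba=1, increst_prevention=True)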
Reproduction
KPointCrossover
k : int
Number of points at which the chromosomes are split.
c_pt : int, str [default='random']
If int, c_pt is the position index of the splitting points. If 'random', the splitting points are picked at random.
LinearCombinationCrossover
a : float
Alpha, a constant to determine the scale of the combinations.
FitnessProportionateAverage
No parameters are required to instantiate.
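A sketch of instantiating each crossover operator, with keyword names assumed from this reference:
>>> # Split chromosomes at two crossover points chosen at random.
>>> rep_kpoint = KPointCrossover(k=2, c_pt='random')
>>> # Linear combination crossover with a mixing constant of 0.5.
>>> rep_linear = LinearCombinationCrossover(a=.5)
>>> # Fitness-proportionate averaging takes no arguments.
>>> rep_fpa = FitnessProportionateAverage()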
Mutation
Boundary
epsilon : float [default=.15]
Mutation rate that determines if genes will mutate or not.
Shrink
epsilon : float [default=.15]
Mutation rate that determines if genes will mutate or not.
prior : str [default='normal']
Determines the probability distribution of sampling.
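A sketch of the two mutation operators, with keyword names assumed from this reference:
>>> # Boundary mutation with a 10% mutation rate.
>>> mut_boundary = Boundary(epsilon=.1)
>>> # Shrink mutation with a 20% rate, sampling from the 'normal' prior.
>>> mut_shrink = Shrink(epsilon=.2, prior='normal')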
Environment
AdaptiveReproduction
pop_cap : int [default=None]
Maximum population size.
AdaptiveMutation
a : int, float [default=.2]
Alpha, a constant to adjust the self-adaptive mutation rate.
Elitism
pct : int, float [default=.05]
Percentage of population being selected as elites.
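A sketch of the environment helpers, with keyword names assumed from this reference (the Quick Start passes AdaptiveReproduction's cap positionally):
>>> # Cap the population at 60 individuals.
>>> env_pop = AdaptiveReproduction(pop_cap=60)
>>> # Self-adaptive mutation rate adjusted by a constant of 0.2.
>>> env_mut = AdaptiveMutation(a=.2)
>>> # Treat the top 5% of the population as elites.
>>> env_elite = Elitism(pct=.05)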
Feature Selection
GeneticFeatureSelection
n_gen : int
Maximum number of generations (loops) GeneticFeatureSelection will run.
initialization_fn
Class object to generate solution candidates.
fitness_fn
Class object to evaluate the fitness of solution candidates.
selection_fn
Class object to select surviving solution candidates. Can either be:
feature_selection.selection.RankSelection,
feature_selection.selection.RouletteWheelSelection,
feature_selection.selection.SteadyStateSelection,
feature_selection.selection.TournamentSelection,
feature_selection.selection.StochasticUniversalSampling,
feature_selection.selection.BoltzmannSelection
mating_fn
Class object to pair the solution candidates for reproduction.
reproduction_fn
Class object to reproduce the child population. Can be:
feature_selection.reproduction.KPointCrossover
mutation_fn
Class object to mutate the child population. Can either be:
feature_selection.mutation.BitStringMutation,
feature_selection.mutation.ExchangeMutation,
feature_selection.mutation.ShiftMutation
adaptive_population=None
Class object to adaptively change the mating rate of the mating_fn.
elitism=None
Class object to perform elite selection, ace comparison and induction of the elites’ traits.
adaptive_mutation=None
Class object to adaptively change the mutation probability of the mutation_fn.
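Given the cv behaviour described under FitnessFunction, classification problems should work as well; a minimal sketch, assuming a classifier estimator and a matching scoring string:
>>> from sklearn.ensemble import RandomForestClassifier
>>> # Feature selection around a classifier; integer cv then uses StratifiedKFold.
>>> opt_clf = GeneticFeatureSelection(
...     n_gen=10,
...     initialization_fn=Genes(pop_size=40),
...     fitness_fn=FitnessFunction(
...         estimator=RandomForestClassifier(n_jobs=-1),
...         cv=5,
...         scoring='accuracy'
...     ),
...     selection_fn=TournamentSelection(k=2),
...     mating_fn=MatingFunction(),
...     reproduction_fn=KPointCrossover(k=2),
...     mutation_fn=BitStringMutation(),
...     adaptive_population=None,
...     elitism=None,
...     adaptive_mutation=None
... )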
Initialization
Genes
search_space : dict
Defines the search range of the algorithm. Keys are parameter names (strings) and values are int, float or str. Represents the search space over the parameters of the provided estimator.
pop_size : int
Size of the initial population.
Evaluation
FitnessFunction
estimator : BaseEstimator
An object of that type is instantiated for each search point. This object is assumed to implement the scikit-learn estimator API. Either the estimator needs to provide a score function, or scoring must be passed.
cv : int, cross-validation generator, or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 3-fold cross-validation,
an integer, to specify the number of folds in a (Stratified)KFold,
an object to be used as a cross-validation generator,
an iterable yielding train/test splits.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
scoring : str, callable, or None, default=None
A string (see the model evaluation documentation) or a scorer callable object/function with signature scorer(estimator, X, y). If None, the score method of the estimator is used.
Selection
RankSelection
pct_survivors : int, float
Argument that controls the number of survivors.
RouletteWheelSelection
pct_survivors : int, float
Argument that controls the number of survivors.
SteadyStateSelection
elimination_ratio : float [default=.3]
Determines how many candidates are eliminated.
TournamentSelection
k : int [default=2]
Argument that controls the number of participants in each tournament.
preserve_remainders : bool [default=True]
If True, the remaining individuals not selected for a tournament will survive the selection process.
StochasticUniversalSampling
pct_survivors : int, float
Argument that controls the number of survivors.
BoltzmannSelection
pct_survivors : float
Argument that controls the number of survivors.
T0 : int, float
Initial temperature used to calculate the Boltzmann probability. A number between [5, 100].
a : int, float
Alpha, a constant between [0, 1].
Mating
MatingFunction
cr_proba : int, float [default=1]
Percentage of the survived population. Determines how many couples are paired during mating.
increst_prevention : bool [default=True]
If True, solution candidates sharing the same parents will not be paired together.
Reproduction
KPointCrossover
k : int
Number of points at which the chromosomes are split.
c_pt : int, str [default='random']
If int, c_pt is the position index of the splitting points. If 'random', the splitting points are picked at random.
Mutation
BitStringMutation
epsilon : float [default=.15]
Mutation rate that determines if genes will mutate or not.
ExchangeMutation
epsilon : float [default=.15]
Mutation rate that determines if genes will mutate or not.
ShiftMutation
epsilon : float [default=.15]
Mutation rate that determines if genes will mutate or not.
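A sketch of instantiating the bit-string operators with the epsilon parameter documented above (keyword name assumed to match the constructors):
>>> # Bit-string mutation with the default 15% rate made explicit.
>>> mut_bits = BitStringMutation(epsilon=.15)
>>> # Exchange mutation with a 10% rate.
>>> mut_swap = ExchangeMutation(epsilon=.1)
>>> # Shift mutation with a 10% rate.
>>> mut_shift = ShiftMutation(epsilon=.1)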
Environment
AdaptiveReproduction
pop_cap : int [default=None]
Maximum population size.
AdaptiveMutation
a : int, float [default=.2]
Alpha, a constant to adjust the self-adaptive mutation rate.
Elitism
pct : int, float [default=.05]
Percentage of population being selected as elites.
Warnings
Population Decline Warning
The Population Decline Warning indicates that the current population size has become smaller than that of the previous generation, which might lead to premature convergence.
Elitism Failed Warning
The Elitism Failed Warning indicates that the number of elites selected was zero due to a rounding issue.
Low Population Warning
The Low Population Warning indicates that the initial population size might be too small, which might lead to premature convergence.
Low Population Diversity Warning
The Low Population Diversity Warning indicates that most of the candidates in the current generation were reproduced by the same parents, which might lead to premature convergence.
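Assuming these warnings are raised through Python's standard warnings machinery, they can be escalated or silenced with the warnings module; matching on the message text below is an assumption, since the exact warning classes are not listed here:
>>> import warnings
>>> # Escalate population-decline hints to errors while experimenting.
>>> warnings.filterwarnings('error', message='.*Population Decline.*')
>>> # ...or silence the low-population notices entirely.
>>> warnings.filterwarnings('ignore', message='.*Low Population.*')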