Evolutionary optimization algorithms

Empedocles says that the greater part of the members of animals were generated by chance...
What, then, hinders but that the parts in Nature may also thus arise [from necessity]? For instance, that the teeth should arise from necessity, the front teeth sharp and adapted to divide the food, the molars broad and adapted to breaking the food into pieces.
It may be said that they were not made for this purpose, but that this purposive arrangement came about by chance; and the same reasoning is applied to other parts of the body in which subsistence for some purpose is apparent. And it is argued that where all things happened as if they were made for some purpose, being aptly united by chance, these were preserved, but such as were not aptly made, these were lost and still perish, according to what Empedocles says concerning the bull species with human heads. This, therefore, and similar reasoning, may lead some to doubt on this subject.
Aristotle, Physics, Fourth Century B.C.

Single objective evolutionary algorithms

Evolutionary algorithm (EA) is a collective term for all variants of probabilistic optimisation algorithms that are inspired by Darwinian evolution. Optimal states are approximated by successive improvements based on the variation selection paradigm. Thereby, the variation operators produce genetic diversity and the selection directs the evolutionary search.

A genetic algorithm (GA) is a variant of EA, which, in analogy to the biological DNA alphabet, operates on strings which are usually bit strings of constant length. The string corresponds to the genotype of the individual. The phenotype of the individual is obtained by a mapping onto the object parameters (genotype-phenotype mapping).

A GA is according to Koza: “a highly parallel mathematical algorithm that transforms a set (population) of individual mathematical objects (typically fixed-length character strings patterned after chromosome strings), each with an associated fitness value, into a new population (i.e., the next generation) using operations patterned after the Darwinian principle of reproduction and survival of the fittest and after naturally occurring genetic operations (notably sexual recombination)”.

The GA is usually characterized by fitness proportionate selection and tournament selection, respectively. Crossover is the main variation operator; mutation (bit mutation) is usually regarded as a background operator. Usually a GA has the following components:

* A representation of potential solutions to the problem

* A method to create an initial population of potential solutions

* An evaluation function that plays the role of the environment, rating solutions in term of their fitness

* Genetic operators that alter the composition of the population members of the next generation.

* Values of various parameters that the genetic algorithm uses (population size, probabilities of applying genetic operators, etc.)

Fitness is an evaluation of an individual with respect to its reproduction capability and is usually a function of the objective function (to be optimised), which depends on the object parameters. The term fitness function is often used as a synonym for objective function.

Selection in EA is based on the fitness. Generally, it is determined on the basis of the objective value(s) of the individual in comparison with all other individuals in the selection pool. The fitness function may additionally depend on different side conditions/constraints and stochastic influences (fitness noise/noisy fitness). The hope is that there is a correlation between the fitness of the parents and offspring otherwise the search would be essentially a random walk search. Mutation searches usually in the local neighbourhood of the parent solution whereas crossover results in a more global oriented exploration of the search space.

GAs have a significant advantage over other search systems because instead of using a single searcher, as in calculus based search methods, a population is used making GA less probable to be trapped by local extremes.

Optimisations with GA are not limited to non-continuous or unimodal functions and therefore a larger class of problems can be solved than with traditional, gradient-based search methods.

An implementation of a GA begins with a population of (typical random) chromosomes. One then evaluates in such a way that those chromosomes which represent a better solution to a problem are given more chances to reproduce than those chromosomes which are poorer solutions. The goodness of a solution is typically defined with respect to the current population defined by a fitness function. Selection focuses the search to more promising points in the search space while mutation and crossover try to generate new and better points from these solutions. Figure 1 shows the principal steps for GAs.

: Population size, maximum number of generations, the crossover and mutation probability. I = {1,…,N}. Pt is the population at generation t are temporary populations.

1. Initialization. Set and t = 0.

For i =1,…,N

Choose according to some probability distribution.

Set P0 = P0+ {i}..

2. Fitness assignment. For each individual determine the encoded decision vector as well as the objective vector y = f(x) and calculate the scalar fitness value F(i).

3. Selection. Set .

For i=1,…,N do

Select one individual according to a given scheme and based on its fitness value F(i). Set P’ = P’+ {i}.

4. Crossover. Set .

For i = 1,…,N/2 do

Choose two individuals and remove them from P’.

Recombine i and j. The results are .

Add k.l to P’’ with probability pC. Otherwise add to P’’.

5. Mutation. Set .

For each individual do

Mutate i with probability The resulting individual is .

Set P’’ = P’’+ {i}.

6. Stop. Set Pt+1=P’’’ and t = t+1. If or any other stopping criterion is satisfied then set else go to 2.

Figure 1. Principal steps of a genetic algorithm.

Tournament selection

In tournament selection a number T of individuals is chosen randomly from the population and the best individual from this group is selected as parent. This process is repeated as often as individuals to choose. These selected parents randomly produce uniform offspring. The parameter for tournament selection is the tournament size T. T takes values ranging from 2 to the number of individuals in the population. Increasing T increases the selection pressure, i.e. the probability of the best individual being selected compared to the average probability of selection of all individuals. Also the selection intensity increases, i.e. the expected average fitness value of the population after applying the selection method.

However, a high selection intensity could lead to premature convergence and thus to a poor quality of the solutions.

The dependence of the algorithm depends on the representation, the type of crossover and mutation and a large number of experiments are used to obtain a set of parameters and operators with optimal performance for the genetic algorithm


In contrast to the canonical GA with bit encoded parameters, the genome of real-coded GA consists of real-valued object parameters, i.e. evolution operates on the natural representation. Usually for each chromosome an upper and lower limit of possible values is given which reduces the size of the search space.

Special crossover operators such as the simulated binary crossover (SBX) (Deb and Agrawal) allow the parameter space to be searched initially sufficiently in large steps. During the evolution the search is limited around the current parameter values with increasing accuracy. For two parents with variables xi1 and xi2 two new variables yi1 and yi2 are produced using a probability distribution .

A large value gives a high probability generating solutions close to the parents. The parameter determines the distance of the children from their parents. The probability distribution produces near-parent solutions with a larger probability than solutions far from the parents.


Various mutation operators have been proposed for real-coded GAs.

* Polynomial mutation by Deb and Goyal. If is the value of the ith parameter selected for mutation with a probability pm and the result of the mutation is the new value obtained by a polynomial probability distribution . and are the lower and upper bound of respectively and is a random number in [0,1]

The distribution is controlled by the parameter .

* Non-uniform mutation by Michalewicz given by

is the maximum number of generations, b a user specified control parameter and a random bit, 0 or 1. The exponent of approaches 0 as t approaches tmax and is closer to . This allows reducing the search closer to the optimum value found as evolution proceeds.


Elitism is a method that guarantees that the best ever found solution would always survive the evolutionary process. Elitism is important for the convergence properties of evolutionary algorithms. For multiobjective optimisation elitism has to consider that a single optimum solution does not exist but a set of so-called non-dominated solutions.

Some methods save the individuals with the best value for each objective found. Other methods additionally use an external population where a fraction of non-dominated solutions is archived. The size of this archive can be either free or can have a maximum specified number of archived solutions. Non-dominated solutions are added to the archive and dominated solutions are removed. If the number of solutions in the archive reaches the maximum allowed number then various methods have been proposed if a solution will be added and if so which solution of the archive will be replaced.

Multiobjective Evolutionary methods

Multiobjective evolutionary algorithms (MOEAS) provide a representative set of the Pareto front. The following problems have to be considered

* How to maintain a diverse Pareto front approximation (Density estimation, Diversity).

* How to prevent non-dominated solutions from being lost? (Environmental selection).

* How to guide the population towards the Pareto front? (Mating selection).

MOEAS are mainly either aggregation or dominance based

* Aggregation basedNon-dominated solutions are obtained by a weighted sum of the individual objective functions. The importance parameters can be varied during the evolution of the population and non-dominated solutions are acquired.

* Dominance basedMOEAs use in this case the dominance relation as a measure of the fitness of each individual. No parameters are required such as importance factors for aggregate-based algorithms.

MOEAs use the dominance relation with different methods

* Dominance rank (NPGA algorithm by Horn and Nafpliotis).

* Dominance count (NSGA-II algorithm by Deb et al).

* Dominance depth (SPEA2 algorithm by Zitzler et al)

A representative set of a finite number of solutions requires that the non-dominated solutions are as uniformly as possible distributed on the Pareto front of interest. In multiobjective optimisation although populations are able to search many local optima, a finite population tends to settle on a single good optimum, even if other equivalent optima exist. Finite populations tend to converge to a small part of the Pareto front. This phenomenon is called genetic drift and requires special mechanisms to prevent this occurring. Niche induction methods promote the simultaneous sampling of several different optima by favouring diversity in the population.

In order to simulate individual competition for finite resources in a geographical environment, fitness sharing models can be used. Individuals close to one another mutually decrease each other's fitness. Even if initially considered less fit, isolated individuals are thus given a greater chance of reproducing, favouring diversification. In order to prevent the population forming niches (crowds) and since a uniform distribution over the Pareto frontier is desired special mechanisms can be applied using information obtained by density estimation techniques.

* Nearest neighbour. Using a sphere hat defines the nearest neighbours.

* Histogram. Counting the Number of solutions in the same box. (e-dominance, Laumanns et al).

Environmental selection is used to obtain a representative efficient set.

* Dominance. Only non-dominated solutions are kept.

* Crowding. Density less crowded regions are preferred to crowded regions.

MO optimisation requires constraints to be satisfied for acceptable clinical solution. This can be achieved by MOEAs by

* Penalty. A penalty is added to the fitness of the individuals.

* New objectives. The constraints are considered as additional objectives.

* Modification of dominance relations. Not only the dominance relation is used but the degree of constraint violation is also considered in the selection process.

A special case are density estimation algorithms (EDA) Reference which generate new offspring by sampling from a probability distribution induced every generation from a fraction of the best performing individuals. One way is by finding a factorial probability distribution, i.e. a product of probability density functions (pdf). Factorisation is composed either of multivariate joint pdfs or of multivariate conditional pdfs in which a single variable is conditioned on a multiple of others.

MOEAs can be more effective than multistart single objective optimisation algorithms. The search space is explored in a single optimisation run. It is easier to select a solution if alternatives are known. Aggregation of several objectives into a single one requires setting of parameters. True multiobjective evolutionary algorithms do not require importance factors.

In its basic form a population evolves into the next generation with proportional replication, which reproduces a member with a probability proportional to its fitness. Crossover and mutation are then performed on the members of this population. The proportional replication in multiobjective optimisation has to be modified, since for every member of the population a vector of fitness values is assigned with components for the value of each objective function.

Exampled of MOEAs: NSGA-II

A representative MOEA algorithm, the ranking-based evolutionary algorithm NSGA-II, Deb et al, combines elitism and a mechanism to distribute the solutions as much as possible over the entire local Pareto set. For multiobjective optimisation elitism requires that some portion of the non-dominated solutions will survive.

NSGA-II classifies a population P in mutually exclusive equivalent classes Pi i.e. and These classes are called fronts. The number of classes varies from generation to generation and the members in each class are equivalent. That is, it cannot be stated which individual is better. This classification which is called non-dominated sorting is implemented as follows.

All non-dominated individuals are classified into one category and assigned a dummy fitness value or rank. These classified individuals are therefore ignored and from the remaining members of the population the non-dominated individuals are selected for forming the next layer. This process continues until all members are classified. Individuals of the first layer have the highest fitness while members of the last layer are assigned the smallest fitness. All Individuals from the first layer produce more copies in the next generation and therefore the population evolves towards the Pareto front.

For each member of the population an estimation of the density of solutions surrounding it is calculated using the so-called crowding distance measure: the population is sorted for each objective function value in their ascending order of magnitude. The solutions with the smallest and largest value are assigned a very large distance estimate to guarantee that they will be selected in the next generation. This helps to significantly increase the extent of the population on the Pareto front. All other solutions are assigned a distance value equal to the absolute difference in the function values of two adjacent solutions. The overall crowding distance is the sum of the individual distance values to each objective.

The elitism is used by combining together the population of children Qt and the parent population Pt at generation t together. A non-dominated sorting is applied and a new population is formed. A population of children Qt+1 from Pt+1 is formed using a binary crowded tournament selection, crossover and mutation.