Multiobjective optimization

Michael Lahanas

Plan: To bother about the best method of accomplishing an accidental result.

Ambrose Bierce, The Enlarged Devil’s Dictionary.

Optimization problems for real applications have often to consider many objectives and we thus have a multiobjective (MO) problem. A trade-off between the objectives exist and we never have a situation in which all the objectives can be in a best possible way be satisfied simultaneously.

MO optimization provides the information of all possibilities of alternative solutions we can have for a given set of objectives. By analyzing the spectrum of solutions we have to decide which of these solutions is the most appropriate. The two steps are to solve the MO problem and decide what the optimal solution is.

The MO problem (also called multicriteria optimization or vector optimization), can be defined as the problem of determining the following. “A vector of decision variables which satisfies constraints and optimizes a vector function whose elements represent M objective functions. These functions form a mathematical description of performance criteria that are usually in conflict with each other”. Hence optimizes means finding a solution which would give acceptable results as possible for the values of all objective functions.

We call decision variables xj, j=1,...,N for which values are to be chosen in an optimization problem.

In order to know how good is a certain solution, we need to have some criteria for evaluation. These criteria are expressed as computable functions f1(x),...,fM(x) of the decision variables, which are called objective functions. These form a vector function f = (f1(x),...,fM(x). In general, some of these will be in conflict with others, and some will have to be minimized while others are maximized. The multiobjective optimization problem can be now defined as the problem to find the vector x=(x1,x2,...,xN), i.e. solution which optimize the vector function f

The constraints define the feasible region X and any point x in X defines a feasible solution. The vector function f(x) is a function that maps the set X in the set F that represents all possible values of the objective functions. Normally we never have a situation in which all the fi(x) values have an optimum in X at a common point x. We therefore have to establish certain criteria to determine what would be considered an optimal solution. One interpretation of the term optimum in multiobjective optimization is the Pareto optimum, see Fig 1.

A solution x1 dominates a solution x2 if and only if the two following conditions are true:

* x1 is no worse than x2 in all objectives, i.e. fj(x1) £ fj(x2) " j=1,…,M.

* x1 is strictly better than x2 in at least one objective, i.e. fj(x1) < fj(x2) for at least one j Î {1,…,M}.

Figure 1 Example of a bi-objective space (f1, f2). We assume a minimization problem. (a) The Pareto front is the boundary between the points P1 and P2 of the feasible set F. Solutions 1 and 3 are non-dominated Pareto optimal solutions. Solution 2 is not Pareto optimal as solution 1 has simultaneously smaller values for both objectives. There is no reason why solution 2 should be accepted rather than solution 1. Therefore the aim of MO optimization is to obtain a representative set of non-dominated solutions. (b) The ideal and anti-ideal or nadir point I and A respectively. The point A+ defined sometimes as anti-ideal is defined over the whole F range, whereas the points I and A are defined by the set of efficient points.

We assume, without loss of generality, that this is a minimization problem. x1 is said to be non-dominated by x2 or x1 is non-inferior to x2 and x2 is dominated by x1.

Among a set of solutions P, the non-dominated set of solutions P’ are those that are not dominated by any other member of the set P, see figure 1(a). When the set P is the entire feasible search space then the set P’ is called the global Pareto optimal set. If for every member x of a set P there exists no solution in the neighbourhood of x then the solutions of P form a local Pareto optimal set. The image f(x) of the Pareto optimal set is called the Pareto front, see figure 1(a). The Pareto optimal set is defined in the parameter space, while the Pareto front is defined in the objective space.

The ideal point and the nadir or anti-ideal point are characterized by the components of the best and worse objective values of efficient points respectively.

Single objective optimization

In single objective optimization only one objective function is considered. Although many problems are multiobjective problems the majority of optimization algorithms used for their solution are single objective optimization algorithms. The M objective functions fm(x) are combined into a single objective function f(x), e.g. by using a weighted sum of all objectives.


The weights wm are also known as importance factors and are considered as a measure of the significance of each objective in the optimization process. A representative convex part of the Pareto set can be sampled by running a single objective optimization algorithm each time with a different vector of importance factors.

We call a set of importance factors vectors normalized and uniformly distributed if each importance factor of each objective takes one of the following values: , where k is the sampling parameter and for each vector .

The vector sum where are unit vectors of the objective space. For two objectives the weighted sum is given by

, i.e (2)

The minimization of the weighted sum can be interpreted as finding the value of y for which the line with slope just touches the boundary of F as it proceeds outwards from the origin. If x* is a Pareto optimal solution then there exists a weight vector , such that x* is a solution of the multiobjective convex optimization problem.

Figure 2 shows an example for a weighted sum optimization for a bi-objective problem. The solution obtained by is the non-dominated solution P. If the value f1 is not satisfactory then we can increase the importance factor to . The new set of importance factors specify a new direction and the solution is P*. The value of f1 of P* is smaller but the value f2(P*) is larger than f2(P). If we are not satisfied by the value of f2(P) we can try another importance factor with .

In a trial and error method the optimization is repeated with different importance factors until the optimization result is acceptable. The designer by a trial and error method determines the structure of the Pareto front.

While for two objectives a solution very close to the optimal can be found it is more difficult as more objectives are considered. The complexity of the Pareto front increases rapidly and also the combinatorial complexity. To determine the dependence of the results on the importance factors with a sufficient accuracy requires repeating the optimization a large number of times with different importance factor combinations. The solution which is obtained in the conventional weighted sum approach depends on the shape of the Pareto front and the importance factors used.

Figure 2 Single objective weighted sum optimization w1f1 + w2f2 for a bi-objective problem. For a given set of importance factors the vector sum of w1, w2, if we consider these as vectors, specifies a direction S shown by the dashed line. The optimization provides a solution which is the point P of a line perpendicular to the direction S that will touch the Pareto front as the line is moved away from the origin along S.  Solution P* will be obtained if we replace w1 by a larger importance factor w1*.

Decision making

Multiobjective optimization requires a decision making process as there is not a single solutions but a set of non-dominated solutions out of which the best must be chosen. Three strategies can be used.

* An a priori method. The decision making (DCM) is specified in terms of a scalar function and an optimization engine is used to obtain the corresponding solution.

* An a posteriori method. An optimization engine exists which finds all solutions. Decision making is applied at the end of the optimization manually, or using a decision engine. This method decouples the solutions from the decision making process. A new decision is possible without having to repeat the optimization.

* Mixture of a priori and a posteriori methods. During the optimization periodically information can be used which may be used to reformulate the goals as some of these can physically not be achieved.

The non-dominated solutions provide information on the trade-off between the objectives. This trade-off is described by the form of the Pareto front. In figure 3 two different forms of trade-offs are shown.

In figure 3(a) there is a strong trade-off between the objective f1 and f2. The smaller the f1 value is that we want the larger is the corresponding f2 value. We can see how much this depends on the choice of f1. There is a weak trade-off between f1 and f2 in figure 3(b). It is possible to minimize f1 significant and close to the ideal point (f1,min, f2,min). Only very close to f1,min we see a rapid increase of f2. This is a case for a set of parameters for which we can obtain simultaneously almost the individual optimal values for f1 and f2. The Pareto front provides also ranges of objective values.

Figure 3 Trade-off between two objectives of a bi-objective problem: (a) Strong and (b) weak trade-off between f1 and f2.

For more than two objectives the trade-off can be difficult to analyse graphically. A sensitivity analysis can be performed numerically such that it considers the local slope of the Pareto front as a measure of the trade-off.

The main two tasks of MO optimization are

* Obtaining a representative set of non-dominated solutions.

* Selecting a solution from this set, i.e. the decision making process.


Re-optimization is understood literally as starting from the optimal solution of one problem to find an optimal solution for the next problem in the sequence. The multiobjective optimization using deterministic gradient based algorithms can be seen as an application of the algorithm many times based on a sequence of importance factors sets. One method to obtain a representative sample of solutions faster, could be the use of the result of a previous optimization as the starting point for the optimization with another set of importance factors w* which is close to w, see Figure 1. With this approach it should be possible to reduce the number of the iterations required if the solutions are also close in objective space.

Figure 4 Optimization methods to obtain a consecutive number of solutions ordered in objective space: (a) Cold Start (b) Warm Start. The numbers show the order in which each of the five solutions is processed. This can be seen schematically in (a) and (b) if the lengths of all the arrows are summed. The total length is smaller in (b) than in (a).

The method which produces sets of uniformly or randomly distributed importance factors has been modified using a rearrangement of the order of the importance factors such that a spatial proximity of consecutive sets in importance factors space is obtained. This method has been used to test whether there are trapping regions. The rearrangement is shown in Figure 5(a) for randomly distributed importance factors and in Figure 5(b) for uniformly distributed importance factors for 3 objectives. Even for the initial uniformly distributed importance factors we have a reduction of the Euclidean distance in importance space from 0.35±0.26 to 0.24± 0.037 units.

Figure 5 Rearrangement of solutions in importance space. (a) Solutions using randomly distributed importance factors before and after sorting. (b) Uniformly distributed importance factors before and after sorting.