Evaluation of The Scatter Code Encodings in GAs.
CS591 Fall 97 [Dr.Forrest]
Special Topics in Genetic Algrotithms
S. Madhu. CS Dept., UNM
<madhu@cs>
Wed Aug 19 13:11:37 1998

Motivation

The way in which candidate solutions are encoded is a central factor in the success of a genetic algorithm [Mitch97] . Using fixed length fixed order bitstrings in binary or gray encodings is common, the drawback being the existance of hamming cliffs, which affect the fitness landscape in the solution search spaces. Scatter codes [Derek94] provide us with an an encoding from linear space that has exponential capacity and no hamming cliffs. In this report we study the issues in using scatter code encodings in genetic search.

Scatter Codes

Scatter codes are fixed length binary bitsrings. To generate a sequence of N n-bit scatter codes, the first code is chosen randomly. The remaining codes are obtained by applying the scatter operation on the previous code (code i+1 is obtained by scattering code i.)
The scatter operation selects at random a few bits of the previous code and flips them. Each bit is flipped with probability p, (0<p<1) the flip-bit probability for the code.

The scatter code was invented to generate a mapping from multi-dimensional sense data into hamming space for use in Neural Networks, where independent scatter codes encoding different variables (on different sense axes) can be composed naturally with the XOR operator. The motivation to use scatter codes in GAs comes from the expectation that abscence of hamming cliffs in the encoding lead to fitness landscapes that are "better".

Approach

To assess the usefulness of scatter code encodings in genetic search three methods were considered. The first two methods involve actual experiments on GAs while the third analyses fitness distance correlations. [The Third approach is the focus of this interim report]

The first approach is to choose a set of well known functions, and optimize them on both binary and scatter codes, on the same domain and see which attains the optimum first. The second approach is to use the same functions in a fixed number of fitness evaluations and see which encoding method finds the optimum more often. These two approaches were implemented using the [GECO] package under Allegro Common Lisp, and raised some issues:

Because of these issues, it wasnt clear if scatter codes were going to be a win in function optimization. Implementations in Common Lisp were extremely slow. The above tests need to be reimplemented in C taking advantage of 32bit integers before satisfactory conclusions can be made. There was still the lurking suspicion that there might be a class of problems where scatter codes are a win.

FDC. The third approach to assessing scatter encodings was to use fitness distance correlation (FDC). In asking what makes a GA "difficult" [Terry95, p172] proposes that the relationship between fitness and distance to the goal as a measure of GA difficulty.

The methodology here is to examine problems with known optima, take a sample of individuals and compute the correlation coefficient r, given the set of (fitness, distance) pairs. Given a set F = {f_1,f_2 ... f_n} of n individual fitnesses and a corresponding set D = {d_1,d_2 ... d_n} of the n distances to the nearest global maximum, r is computed the usual way: r = Cov(F,D)/(sd(F).sd(D)) where Cov(F,D) = 1/nSUM(i=1 to n{ (f_i - f_avg)(d_i - d_avg) } is the covariance of F, and D, and sd(F), sd(D), f_avg and i_avg are the standard deviations and means of F and D. [Terry95, p194] uses FDC to make predictions about the relative worth of binary and gray encodings. We repeat these experiments with scatter codes to estimate the relative worth of scatter codes. These give us insights into the nature of the fitness landscape relative to other encodings.

Results

FDC Plots were generated for the 1max and riolo problems on all three types of encodings. The results in [Terry95] were verified on gray and binary encodings on sizes 8, 16, and 24, on the dejong functions 1,2. in order to compare these with scatter codes, we need to ensure that the accuracy of encodings is comparable. In all cases we restrict the dejong functions to the 2 dimensional case and convert it as a maximization problem. If we choose a scatter encoding with 1024 codes, and we choose a size of 20 bits in the binary and gray encodings, we get 10 bits per argument = 1024 codes. If we choose a scatter code size of 200, there are 100 bits for each argument. Each scatter code translation has 1024 codes. The scatter encoding had a flip-bit-probablity of 0.01.


The fdc-experiments were run on the first dejong function with these parameters:
      :func #'(lambda (x1 x2) (+ (square x1) (square x2)))
      :nargs 2
      :range '((-5.12 5.12) (-5.12 5.12))
      :optima '((-5.12 -5.12) (-5.12 5.12) (5.12 -5.12) (5.12 5.12))

We repeat generate fitness-distance pairs for each encoding. The process is repeated 10 times, to make sure results are within acceptable limits of standard error. The average correlation, the standard error, and best correlations are compared for the three encodings: g
statistic avg-correlation square(stderr) best-correlation
binary -0.30917525 1.3242558e-4 -0.3282121
gray -0.37372816 1.2376348e-4 -0.38945815
scatter-0.38841963 0.01179954 -0.5296851

The scatter-plots of fitness versus distance are plotted for the best distributions.

The scatter encodings yielded better correlations than the binary or gray encodings. Analysing the scatter-plots the nature of the fitness landscapes becomes apparent. Both in binary and gray codes, because of the hamming cliffs, if x and y are close to optima, and are seperated by say 0.02, the encoding of x E(x) and E(y) are far apart in hamming distance. This explains the bands. The scatter encoding however is curved up - there is almost a tracable continous path which the ga can take in reaching the optimum, by mutations.


The 2nd dejong function was tested with these parameters:
  :func #'(lambda (x1 x2)
  (+ (* 100 (square (- (square x1) x2))) (square (- 1 x1))))
  :nargs 2
  :optima '((2.048 -2.048))
  :range '((-2.048 2.048) (-2.048 2.048))
statistic avg-correlation square(stderr) best-correlation
binary -0.1860648 3.0494604e-4 -0.20727673
gray -0.34222966 1.7104487e-4 -0.37592116
scatter -0.40781552 0.004300935 -0.5232019

The dejong function 3 was tested as: This function had better correlations under binary than gray encoding but was not significantly found to do better in binary as in the experiments of Caruaana and Schaffer quoted and referenced in [Terry95]. Functions like these made fdc difficult to compare with experimental results in gray and binary encodings. An actual simulation is required to verify that scatter does not indeed perform any worse.
    :func #'(lambda (x1 x2) (+ (floor x1) (floor x2)))
    :nargs 2
    :range '((-5.12 5.12) (-5.12 5.12))
    :optima '((5.12 5.12))
statistic avg-correlation square(stderr) best-correlation
binary -0.53967196 1.7185335e-4 -0.55281687
gray -0.2716285 6.7105925e-5 -0.28346756
scatter -0.36458302 0.013899967 -0.5504384
It seems that the best correlation was not a good example of the scatter-plot. We plot the scatter plot for the result of the fdc experiment where:
((MEANY 97543/1000) (SUMY 390172) (VARY 120.163925)
 (PAIRS
  ((-4 102) (4 62) (-7 101) (0 94) (-6 112) (0 107) (0 112) (-3 93) (-2 111) (-5 104) ...))
 (POP-SIZE 4000) (SUMX -4197) (VARX 17.39881) (COV -11.516021) (MEANX -4197/4000)
 (CORR -0.25185794))

Conclusions and Future Work

Scatter encodings lead to better fitness landscapes and therefore make the GA easier to search in the sense of [Terry95] This was established by analysing the fitness-distance correlations and scatterplots on function optimization experiments primarily dejong functions 1 and 2, 3, and other functions.

Actual simulations of scatter-encodings on the simple ga indicate that

There were interesting (software engineering) results in the actual code that was written. Using CLOS and mixin-patterns was an experience in the ease of method-based Object oriented programming. The code uses GECO and is available at http://www.cs.unm.edu/~madhu/spoilers/ga.tgz

This was only an interim report. There is a body of work that still needs to be evaluated and presented in a complete report,


References


[hOMe] Touched: Fri Jul 31 19:33:08 1998