Author: Jack Penick, jpenick

1

Background

As genetic algorithms are similar to Darwinian evolution, so the algorithm created herein is similar to Lamarckian evolution. By reinforcing behavior throughout the lifetime of an individual and passing this learning on to its descendants, it forms a new variation on genetic algorithms that can converge more quickly on local optima while retaining the broad search space and parallelizability that a genetic algorithm provides.
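The difference between the two inheritance schemes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `fitness` function, the hill-climbing stand-in for lifetime learning, and all names and parameter values. It is not the NEAT and Q-learning code used in this project.

```python
import random


def fitness(genome):
    # Toy objective (illustrative assumption): higher is better, peak at all ones.
    return -sum((w - 1.0) ** 2 for w in genome)


def lifetime_learning(genome, steps=20, step_size=0.1, rng=random):
    # Stand-in for reinforcement during an individual's lifetime:
    # hill climbing that keeps a random perturbation only if it helps.
    learned = list(genome)
    for _ in range(steps):
        candidate = [w + rng.uniform(-step_size, step_size) for w in learned]
        if fitness(candidate) > fitness(learned):
            learned = candidate
    return learned


def mutate(genome, sigma=0.3, rng=random):
    # Gaussian mutation applied to every weight.
    return [w + rng.gauss(0.0, sigma) for w in genome]


def evolve(generations=15, pop_size=20, genome_len=3, lamarckian=True, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-2, 2) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best = max(fitness(g) for g in population)
    for _ in range(generations):
        learned = [lifetime_learning(g, rng=rng) for g in population]
        # Selection always uses the fitness reached after lifetime learning.
        scored = sorted(zip(learned, population),
                        key=lambda pair: fitness(pair[0]), reverse=True)
        parents = scored[: pop_size // 4]
        best = max(best, fitness(scored[0][0]))
        # Lamarckian: children inherit the learned weights.
        # Darwinian: children inherit the weights the parent was born with.
        pool = [after if lamarckian else before for after, before in parents]
        population = [mutate(rng.choice(pool), rng=rng) for _ in range(pop_size)]
    return best
```

Calling `evolve(lamarckian=True)` and `evolve(lamarckian=False)` with the same seed runs both schemes from the same initial population, so their best fitness values can be compared directly on this toy problem.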

2

Goals

To combine an implementation of NeuroEvolution of Augmenting Topologies (NEAT) with an implementation of Deep Q-Learning with backpropagation to create a novel reinforcement learning method, and to test this new method on Flappy Bird, comparing it with each of the original algorithms on its own.

3

Approach

I forked an existing application of NEAT to Flappy Bird, available at https://github.com/markopuza/FlappyBird-Evolution, manually implemented Q-learning backpropagation, and called it at every point when the score was incremented. I built a Q-learning-only implementation using the same code as the combined algorithm, with an alternate configuration file that maintains each bird from one generation to the next.
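As a rough illustration of what a single Q-learning backpropagation step involves, the following is a minimal sketch for a small feed-forward network with one hidden layer. The class name `TinyQNet`, the tanh hidden layer, and the values of `gamma` and `lr` are assumptions chosen for the example; the project's actual update lives inside the modified NEAT feed-forward code and may differ.

```python
import math
import random


class TinyQNet:
    """Hypothetical one-hidden-layer network with one Q-value output per action."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        # tanh hidden layer followed by a linear output layer.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(self.w2, self.b2)]
        return hidden, q

    def q_update(self, state, action, reward, next_state, done,
                 gamma=0.9, lr=0.05):
        hidden, q = self.forward(state)
        # TD target: r for terminal transitions, else r + gamma * max_a' Q(s', a').
        target = reward
        if not done:
            target += gamma * max(self.forward(next_state)[1])
        error = q[action] - target  # dLoss/dQ for loss = 0.5 * error ** 2
        # Backpropagate through the chosen action's output only; compute the
        # hidden-layer gradients before the output weights are changed.
        grad_hidden = [error * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                       for j in range(len(hidden))]
        for j, h in enumerate(hidden):
            self.w2[action][j] -= lr * error * h
        self.b2[action] -= lr * error
        for j, g in enumerate(grad_hidden):
            for i, x in enumerate(state):
                self.w1[j][i] -= lr * g * x
            self.b1[j] -= lr * g
        return error
```

In this sketch, a call such as `net.q_update(state, action, reward, next_state, done)` would be made at each event that triggers learning, for example when the score is incremented.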

4

Challenges

The existing NEAT implementation was not very conducive to backpropagation, and the code for the feed-forward network required major modifications at every level in order to accommodate it. The existing application of NEAT to Flappy Bird was not compatible with the modern version of python-neat and required updating. Further, the NEAT implementation was poorly documented and rather opaque, making modifications difficult.

5

Implementation overview

flappy original.py: The original, verbatim implementation as found in the publicly available GitHub repository, https://github.com/markopuza/Flappy-Bird-Evolution

flappy neat only.py: A modified version of the original implementation, made compatible with modern libraries (python 3.5 and python-neat 0.91). Can be executed with no parameters.

flappy combined.py: A modified version of the original implementation that executes backpropagation at every point that the score is increased and at every point that a bird dies. Can be executed with no parameters.

flappy q only.py: Identical to flappy combined.py, except that it uses a configuration file which effectively disables NEAT. Not the most efficient implementation, but it was fast to create. Can be executed with no parameters.

flappy config: The hyperparameters for NEAT.

flappy config q only: Sets NEAT hyperparameters with pop size = elitism and with no selection, so that every bird survives from one generation to the next. 30 birds are used in parallel, with 20 hidden neurons each.

All other files: Auxiliary files included in the original implementation, necessary for a functioning user interface.

6

Results

An execution of the combined implementation yielded the following:

Highscore after 10 generations: 7
Highscore after 20 generations: 19
Highscore after 30 generations: 40
Highscore after 40 generations: 77

An execution of the NEAT-only implementation yielded the following:

Highscore after 10 generations: 3
Highscore after 20 generations: 20
Highscore after 30 generations: 20
Highscore after 40 generations: 25

An execution of the Q-learning-only implementation yielded the following:

Highscore after 10 iterations: 0
Highscore after 20 iterations: 0
Highscore after 30 iterations: 0
Highscore after 40 iterations: 0

7

Analysis

While NEAT alone appeared to reach a steady state at a score of around 25, the combined implementation's score appeared to grow without bound as experience accumulated, and the Q-learning-only implementation never learned to pass a single pipe.

Q-learning alone seems to fail because the reward function has two easily reached local optima: flapping at the top to maximize survival duration, and not flapping at all to minimize flapping. (The reward function takes both of these into account in order to provide some idea of what an improvement looks like among birds that cannot score.) Because the generated experiences never include a successful score, it is difficult for a Q-learning bird to identify strategies that pass through the first pipe.

NEAT alone seems to reach a steady state because, past a certain point, the innovations produced by positive mutations are balanced out by negative mutations, making further progress difficult.

The combined algorithm largely compensates for the shortfalls of both methods. The genetic algorithm searches broadly and escapes local optima but fails to converge on them, whereas the combined algorithm does converge. Similarly, Q-learning alone converges too quickly on a local optimum without searching broadly for alternatives, whereas the combined algorithm searches broadly and still improves on its performance.
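The two-optima problem described above can be made concrete with a small sketch of a shaped reward. The function name, its inputs, and every weight are hypothetical, since this report does not give the actual formula; the sketch only shows how a reward that combines survival time and flap count leaves two separate ways to improve when no bird ever scores.

```python
def shaped_reward(frames_survived, flaps, pipes_passed,
                  survive_weight=0.01, flap_penalty=0.05, pipe_bonus=10.0):
    """Hypothetical shaped reward; every weight is an illustrative assumption."""
    return (survive_weight * frames_survived   # rewards staying alive longer
            - flap_penalty * flaps             # rewards flapping less
            + pipe_bonus * pipes_passed)       # rewards actually scoring


# With no pipes passed, only the first two terms can change, so a learner can
# raise its reward either by surviving longer or by flapping less.
baseline = shaped_reward(frames_survived=60, flaps=5, pipes_passed=0)
survives_longer = shaped_reward(frames_survived=90, flaps=5, pipes_passed=0)
flaps_less = shaped_reward(frames_survived=60, flaps=0, pipes_passed=0)
```

Both `survives_longer` and `flaps_less` exceed `baseline` without the bird getting any closer to a pipe, which is consistent with the behavior reported for the Q-learning-only birds.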

8

References

Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary Computation 10.2 (2002): 99-127.

Watkins, Christopher John Cornish Hellaby. Learning from delayed rewards. Diss. University of Cambridge, 1989.
