Author: Jack Penick, jpenick

1 Background

As genetic algorithms are analogous to Darwinian evolution, the algorithm created here is analogous to Lamarckian evolution. By reinforcing behavior throughout the lifetime of an individual and passing this learning on to its descendants, it forms a new variation on genetic algorithms that can converge on local optima more quickly while maintaining the broad search space and parallelizability that a genetic algorithm provides.
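The Lamarckian idea can be illustrated with a toy sketch. It is not the project's code: the one-dimensional objective, the gradient step standing in for backpropagation, and every name and parameter value below are assumptions chosen for illustration. The point it shows is that offspring are bred from the values individuals hold after lifetime learning, not from the values they were born with.

```python
import random

def fitness(x):
    # Toy objective with its maximum at x = 3; stands in for a game score.
    return -(x - 3.0) ** 2

def lifetime_learning(x, steps=5, lr=0.1):
    # A few gradient-ascent steps on the toy objective;
    # stands in for reinforcement during an individual's lifetime.
    for _ in range(steps):
        grad = -2.0 * (x - 3.0)
        x += lr * grad
    return x

def lamarckian_search(generations=20, pop_size=10, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    best = population[0]
    for _ in range(generations):
        # Every individual learns during its lifetime.
        learned = [lifetime_learning(x) for x in population]
        learned.sort(key=fitness, reverse=True)
        best = learned[0]
        # Lamarckian step: the fitter half passes on its learned values,
        # and each survivor produces two mutated children.
        survivors = learned[: pop_size // 2]
        population = [p + rng.gauss(0.0, 0.5) for p in survivors for _ in range(2)]
    return best

if __name__ == "__main__":
    print(lamarckian_search())
```

In a purely Darwinian variant, the children would instead be mutated copies of the unlearned values in `population`.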

2 Goals

To combine an implementation of NeuroEvolution of Augmenting Topologies (NEAT) with an implementation of Deep Q-Learning with backpropagation to create a novel reinforcement learning method, and to test this new method on Flappy Bird, comparing it against each of the original algorithms on its own.

3 Approach

I forked an existing application of NEAT to Flappy Bird, available at https://github.com/markopuza/FlappyBird-Evolution, manually implemented Q-Learning backpropagation, and called it every time the score was incremented. I built the Q-learning-only implementation from the same code as the combined algorithm, with an alternate configuration file that keeps each bird from one generation to the next.
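The kind of update described above can be sketched as follows. This is a minimal illustration, not the code from the repository: the class `TinyQNet`, its layer sizes, the tanh hidden layer, the absence of bias terms, and the values of `gamma` and `lr` are all assumptions. It performs one Q-learning step by backpropagating the temporal-difference error through a one-hidden-layer feed-forward network.

```python
import math
import random

class TinyQNet:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]

    def forward(self, state):
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)))
                  for row in self.w1]
        q = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return hidden, q

    def q_update(self, state, action, reward, next_state, done,
                 gamma=0.9, lr=0.05):
        hidden, q = self.forward(state)
        _, q_next = self.forward(next_state)
        # Q-learning target: immediate reward plus discounted best next value.
        target = reward if done else reward + gamma * max(q_next)
        # Derivative of 0.5 * (q[action] - target)**2 with respect to q[action].
        error = q[action] - target
        # Gradient at the hidden layer, computed before the output weights change.
        grad_hidden = [error * self.w2[action][j] * (1.0 - hidden[j] ** 2)
                       for j in range(len(hidden))]
        # Gradient-descent step on the output weights of the chosen action.
        for j, h in enumerate(hidden):
            self.w2[action][j] -= lr * error * h
        # Gradient-descent step on the input-to-hidden weights.
        for j, row in enumerate(self.w1):
            for i, s in enumerate(state):
                row[i] -= lr * grad_hidden[j] * s
        return error
```

In a game loop, a hook like `net.q_update(state, action, reward, next_state, done)` would be called whenever the score changes; the reward value passed in is up to the caller.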

4 Challenges

The existing NEAT implementation was not very conducive to backpropagation, and the code for the feed-forward network required major modifications at every level to accommodate it. The existing application of NEAT to Flappy Bird was not compatible with the modern version of python-neat and required updating. Further, the NEAT implementation was poorly documented and rather opaque, making modifications difficult.

5 Implementation overview

flappy_original.py: The original, verbatim implementation as found in the publicly available GitHub repository, https://github.com/markopuza/Flappy-Bird-Evolution

flappy_neat_only.py: A modified version of the original implementation, made compatible with modern libraries (Python 3.5 and python-neat 0.91). Can be executed with no parameters.

flappy_combined.py: A modified version of the original implementation that executes backpropagation every time the score increases and every time a bird dies. Can be executed with no parameters.

flappy_q_only.py: Identical to flappy_combined.py, except that it uses a configuration file which effectively disables NEAT. Not the most efficient implementation, but it was fast to create. Can be executed with no parameters.

flappy_config: The hyperparameters for NEAT.

flappy_config_q_only: Sets the NEAT hyperparameters with pop_size = elitism and with no selection, so that every bird survives from one generation to the next. 30 birds are used in parallel, with 20 hidden neurons each.

All other files: Auxiliary files included in the original implementation, necessary for a functioning user interface.
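To illustrate how setting pop_size equal to elitism disables selection, here is a hedged sketch that parses a configuration fragment with Python's standard configparser. The fragment is not the project's actual configuration file: the section layout follows the usual neat-python convention as an assumption, it omits the many other keys a full NEAT configuration needs, and only the values 30 and 20 come from the description above.

```python
import configparser

# Hypothetical fragment; a real NEAT configuration file needs many more keys.
CONFIG_TEXT = """
[NEAT]
pop_size = 30

[DefaultGenome]
num_hidden = 20

[DefaultReproduction]
elitism = 30
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

pop_size = parser.getint("NEAT", "pop_size")
elitism = parser.getint("DefaultReproduction", "elitism")
num_hidden = parser.getint("DefaultGenome", "num_hidden")

# When the number of elites equals the population size, every genome is
# carried over unchanged, so no evolutionary selection takes place.
every_bird_survives = elitism >= pop_size

if __name__ == "__main__":
    print(pop_size, elitism, num_hidden, every_bird_survives)
```

With every genome preserved as an elite, any improvement between generations has to come from the Q-learning updates alone.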

6 Results

An execution of the combined implementation yielded the following:

Highscore after 10 generations: 7
Highscore after 20 generations: 19
Highscore after 30 generations: 40
Highscore after 40 generations: 77

An execution of the NEAT-only implementation yielded the following:

Highscore after 10 generations: 3
Highscore after 20 generations: 20
Highscore after 30 generations: 20
Highscore after 40 generations: 25

An execution of the Q-learning-only implementation yielded the following:

Highscore after 10 iterations: 0
Highscore after 20 iterations: 0
Highscore after 30 iterations: 0
Highscore after 40 iterations: 0

7 Analysis

While NEAT alone appeared to reach a steady state at a score of around 25, the combined implementation's score appeared to grow without bound as experience grew, and the Q-learning-only implementation never learned to pass a single pipe.

Q-learning alone seems to fail because the reward function has two easily reached local optima: flapping at the top of the screen to maximize survival duration, and not flapping at all to minimize flapping. (The reward function takes both into account to give some idea of what an improvement looks like among birds that cannot score.) Because the experiences generated never include a successful score, it is difficult for a Q-learning bird to identify strategies that pass through the first pipe.

NEAT alone seems to reach a steady state because, past a certain point, the innovations produced by positive mutations are balanced out by negative mutations, making further progress difficult.

The combined algorithm largely makes up for the shortfalls of both methods. The genetic algorithm searches broadly and escapes local optima but fails to converge on them, whereas the combined algorithm does converge. Similarly, Q-learning alone converges too quickly on a local optimum without searching broadly for alternatives, whereas the combined algorithm searches broadly while still improving its performance.

8 References

Stanley, Kenneth O., and Risto Miikkulainen. "Evolving neural networks through augmenting topologies." Evolutionary Computation 10.2 (2002): 99-127.

Watkins, Christopher John Cornish Hellaby. Learning from delayed rewards. Diss. University of Cambridge, 1989.
