Solving Interleaved and Blended Sequential Decision-Making Problems through Modular Neuroevolution Jacob Schrum (schrum2@southwestern.edu) Risto Miikkulainen (risto@cs.utexas.edu)
Introduction Challenge: Discover behavior automatically - Simulations
- Robotics
- Video games
Why challenging? - Complex domains
- Multiple agents
- Multiple objectives
- Multimodal behavior required
Agent must exhibit multiple behaviors Agent must exhibit multiple behaviors Domain consists of multiple tasks How to identify/distinguish tasks? - Interleaved
- Each task is separate
- Sharp borders between tasks
- Blended
Previous results show great success in Ms. Pac-Man (Schrum and Miikkulainen, GECCO 2014) Previous results show great success in Ms. Pac-Man (Schrum and Miikkulainen, GECCO 2014) - Ms. Pac-Man has blended tasks
- Mixture of threat/edible ghosts
- Neuroevolution discovers own task divisions
What about interleaved tasks? - Introduce a version of Ms. Pac-Man with interleaved tasks
- Ghosts are all threats or all edible
What about human-specified task divisions? - Evolve multitask networks
Understand problem better with formalism (in paper)
Ms. Pac-Man Multimodal behavior needed Predator/prey variant - Pac-Man takes on both roles
Goals: Maximize score by - Eating all pills in each level
- Avoiding threatening ghosts
- Eating ghosts (after power pill)
Non-deterministic Four mazes
Blended Blended - Original game
- Edible and threat ghosts present simultaneously
- Distinct threat/edible regions
- Four ghosts can be mix of threat/edible/imprisoned
Modular Policy One policy consisting of several policies/modules Means of arbitration also needed - Human-specified, or learned via preference neurons
Constructive Neuroevolution Genetic Algorithms + Neural Networks Three basic mutations + Crossover Other structural mutations possible (NEAT by Stanley and Miikkulainen 2004)
Module Mutation MM(Duplicate) copies previous module New module can diverge as weights change Can happen more than once
Evolved using Modular Multiobjective NEAT (http://nn.cs.utexas.edu/?mm-neat) Evolved using Modular Multiobjective NEAT (http://nn.cs.utexas.edu/?mm-neat) Each network type evaluated in 30 runs - Standard networks with one module (1M)
- Nets with two (2M) or three (3M) modules + preference neurons
- Nets starting with one module, subject to MM(D)
- Multitask networks with human-specified task divisions
- Interleaved:
- Blended:
- One module if any ghost is edible, one otherwise (MT2)
- One module for edible, one for threats, one for mixed (MT3)
Discussion Obvious division is between edible and threat - Works fine in interleaved domain
- Causes problems in blended domain
- Strict Multitask divisions do not perform well
Better division: one module when surrounded - Very asymmetrical: surprising
- Useful in interleaved domain, vital in blended domain
- Module activates when Pac-Man almost surrounded
- Often leads to eating power pill: luring
- Helps Pac-Man escape in other risky situations
Interleaved tasks Interleaved tasks - Many approaches work well
- Multitask network with one module per task sufficient
- Preference neurons can still learn better task division
Blended tasks - Harder scenario dealt with in more real domains
- Human-specified task divisions (Multitask) break down
- Evolution needs freedom to discover own task division
- Results in novel, successful task division (escaping/luring)
Contact me: schrum2@southwestern.edu Movies: http://nn.cs.utexas.edu/?interleaved-pm http://nn.cs.utexas.edu/?blended-pm Code: http://nn.cs.utexas.edu/?mm-neat
Animals can perform many different tasks Imagine learning a monolithic policy as complex as a cardinal’s behavior: HOW? Problem more tractable if broken into component behaviors
Previous Work in Pac-Man Custom Simulators - Genetic Programming: Koza 1992
- Neuroevolution: Gallagher & Ledwich 2007, Burrow & Lucas 2009, Tan et al. 2011
- Reinforcement Learning: Burrow & Lucas 2009, Subramanian et al. 2011, Bom 2013
- Alpha-Beta Tree Search: Robles & Lucas 2009
Screen Capture Competition: Requires Image Processing - Evolution & Fuzzy Logic: Handa & Isozaki 2008
- Influence Map: Wirth & Gallagher 2008
- Ant Colony Optimization: Emilio et al. 2010
- Monte-Carlo Tree Search: Ikehata & Ito 2011
- Decision Trees: Foderaro et al. 2012
Pac-Man vs. Ghosts Competition: Pac-Man - Genetic Programming: Alhejali & Lucas 2010, 2011, 2013, Brandstetter & Ahmadi 2012
- Monte-Carlo Tree Search: Samothrakis et al. 2010, Alhejali & Lucas 2013
- Influence Map: Svensson & Johansson 2012
- Ant Colony Optimization: Recio et al. 2012
Pac-Man vs. Ghosts Competition: Ghosts - Neuroevolution: Wittkamp et al. 2008
- Evolved Rule Set: Gagne & Congdon 2012
- Monte-Carlo Tree Search: Nguyen & Thawonmos 2013
Evolved Direction Evaluator Inspired by Brandstetter and Ahmadi (CIG 2012) Net with single output and direction-relative sensors Pick direction with highest net output
Preference Neuron Arbitration How can network decide which module to use? - Find preference neuron (grey) with maximum output
- Corresponding policy neurons (white) define behavior
Pareto-based Multiobjective Optimization (Pareto 1890)
Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000) Population P with size N; Evaluate P Use mutation (& crossover) to get P´ size N; Evaluate P´ Calculate non-dominated fronts of P P´ size 2N New population size N from highest fronts of P P´
Direction Evaluator + Modules Network is evaluated in each direction - Human-specified (Multitask) or Preference Neurons
- Chosen module policy neuron sets direction preference
Dostları ilə paylaş: |