Solving Interleaved and Blended Sequential Decision-Making Problems through Modular Neuroevolution



Yüklə 537 b.
tarix14.10.2017
ölçüsü537 b.
#4796


Solving Interleaved and Blended Sequential Decision-Making Problems through Modular Neuroevolution

  • Jacob Schrum (schrum2@southwestern.edu) Risto Miikkulainen (risto@cs.utexas.edu)


Introduction

  • Challenge: Discover behavior automatically

    • Simulations
    • Robotics
    • Video games
  • Why challenging?

    • Complex domains
    • Multiple agents
    • Multiple objectives
    • Multimodal behavior required


Agent must exhibit multiple behaviors

  • Agent must exhibit multiple behaviors

  • Domain consists of multiple tasks

  • How to identify/distinguish tasks?

    • Interleaved
      • Each task is separate
      • Sharp borders between tasks
    • Blended


Previous results show great success in Ms. Pac-Man (Schrum and Miikkulainen, GECCO 2014)

  • Previous results show great success in Ms. Pac-Man (Schrum and Miikkulainen, GECCO 2014)

    • Ms. Pac-Man has blended tasks
      • Mixture of threat/edible ghosts
    • Neuroevolution discovers own task divisions
  • What about interleaved tasks?

    • Introduce a version of Ms. Pac-Man with interleaved tasks
      • Ghosts are all threats or all edible
  • What about human-specified task divisions?

  • Understand problem better with formalism (in paper)



Ms. Pac-Man

  • Multimodal behavior needed

  • Predator/prey variant

    • Pac-Man takes on both roles
  • Goals: Maximize score by

    • Eating all pills in each level
    • Avoiding threatening ghosts
    • Eating ghosts (after power pill)
  • Non-deterministic

    • Very noisy evaluations
  • Four mazes

    • Behavior must generalize


Blended

  • Blended

    • Original game
    • Edible and threat ghosts present simultaneously
    • Distinct threat/edible regions
    • Four ghosts can be mix of threat/edible/imprisoned


Modular Policy

  • One policy consisting of several policies/modules

  • Means of arbitration also needed

    • Human-specified, or learned via preference neurons


Constructive Neuroevolution

  • Genetic Algorithms + Neural Networks

  • Three basic mutations + Crossover

  • Other structural mutations possible

  • (NEAT by Stanley and

  • Miikkulainen 2004)



Module Mutation



Evolved using Modular Multiobjective NEAT (http://nn.cs.utexas.edu/?mm-neat)

  • Evolved using Modular Multiobjective NEAT (http://nn.cs.utexas.edu/?mm-neat)

  • Each network type evaluated in 30 runs

    • Standard networks with one module (1M)
    • Nets with two (2M) or three (3M) modules + preference neurons
    • Nets starting with one module, subject to MM(D)
    • Multitask networks with human-specified task divisions
      • Interleaved:
      • Blended:
        • One module if any ghost is edible, one otherwise (MT2)
        • One module for edible, one for threats, one for mixed (MT3)




















Discussion

  • Obvious division is between edible and threat

    • Works fine in interleaved domain
    • Causes problems in blended domain
      • Strict Multitask divisions do not perform well
  • Better division: one module when surrounded

    • Very asymmetrical: surprising
    • Useful in interleaved domain, vital in blended domain
    • Module activates when Pac-Man almost surrounded
      • Often leads to eating power pill: luring
      • Helps Pac-Man escape in other risky situations


Interleaved tasks

  • Interleaved tasks

    • Many approaches work well
    • Multitask network with one module per task sufficient
    • Preference neurons can still learn better task division
  • Blended tasks

    • Harder scenario dealt with in more real domains
    • Human-specified task divisions (Multitask) break down
    • Evolution needs freedom to discover own task division
    • Results in novel, successful task division (escaping/luring)


  • Contact me:

  • schrum2@southwestern.edu

  • Movies:

  • http://nn.cs.utexas.edu/?interleaved-pm

  • http://nn.cs.utexas.edu/?blended-pm

  • Code:

  • http://nn.cs.utexas.edu/?mm-neat





Multimodal Behavior

  • Animals can perform many different tasks

  • Imagine learning a monolithic policy as complex as a cardinal’s behavior: HOW?

  • Problem more tractable if broken into component behaviors







Previous Work in Pac-Man

  • Custom Simulators

    • Genetic Programming: Koza 1992
    • Neuroevolution: Gallagher & Ledwich 2007, Burrow & Lucas 2009, Tan et al. 2011
    • Reinforcement Learning: Burrow & Lucas 2009, Subramanian et al. 2011, Bom 2013
    • Alpha-Beta Tree Search: Robles & Lucas 2009
  • Screen Capture Competition: Requires Image Processing

    • Evolution & Fuzzy Logic: Handa & Isozaki 2008
    • Influence Map: Wirth & Gallagher 2008
    • Ant Colony Optimization: Emilio et al. 2010
    • Monte-Carlo Tree Search: Ikehata & Ito 2011
    • Decision Trees: Foderaro et al. 2012
  • Pac-Man vs. Ghosts Competition: Pac-Man

    • Genetic Programming: Alhejali & Lucas 2010, 2011, 2013, Brandstetter & Ahmadi 2012
    • Monte-Carlo Tree Search: Samothrakis et al. 2010, Alhejali & Lucas 2013
    • Influence Map: Svensson & Johansson 2012
    • Ant Colony Optimization: Recio et al. 2012
  • Pac-Man vs. Ghosts Competition: Ghosts

    • Neuroevolution: Wittkamp et al. 2008
    • Evolved Rule Set: Gagne & Congdon 2012
    • Monte-Carlo Tree Search: Nguyen & Thawonmos 2013


Evolved Direction Evaluator

  • Inspired by Brandstetter and Ahmadi (CIG 2012)

  • Net with single output and direction-relative sensors

  • Each time step, run net for each available direction

  • Pick direction with highest net output



Preference Neuron Arbitration

  • How can network decide which module to use?

    • Find preference neuron (grey) with maximum output
    • Corresponding policy neurons (white) define behavior


Pareto-based Multiobjective Optimization (Pareto 1890)



Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000)

  • Population P with size N; Evaluate P

  • Use mutation (& crossover) to get P´ size N; Evaluate P´

  • Calculate non-dominated fronts of P P´ size 2N

  • New population size N from highest fronts of P  P´



Direction Evaluator + Modules

  • Network is evaluated in each direction

  • For each direction, a module is chosen

    • Human-specified (Multitask) or Preference Neurons
    • Chosen module policy neuron sets direction preference


Yüklə 537 b.

Dostları ilə paylaş:




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə