Learning tabula rasa can be unnecessarily slow







  • Humans can reuse past experience

    • e.g., soccer with different numbers of players
    • even across tasks with different state variables and actions
  • Agents should likewise leverage learned knowledge in novel or modified tasks



Model-Free vs. Model-Based

  • Model-Free

    • Q-Learning, Sarsa, etc.
    • Learn the values of actions
    • In the example: ~256 actions
  • Model-Based

    • Dyna-Q, R-Max, etc.
    • Learn the effects of actions (“what is the next state?” → planning)
    • In the example: ~36 actions
  • The two update styles are contrasted in the sketch below
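A minimal tabular sketch of the contrast, not the paper's code: all names here (ACTIONS, Q, model) are illustrative assumptions, and the model-based update is simplified to a deterministic world.

```python
from collections import defaultdict

ACTIONS = ["Left", "Neutral", "Right"]   # e.g., 2D Mountain Car actions
Q = defaultdict(float)                   # model-free: one value per (state, action)
model = {}                               # model-based: observed effect per (state, action)

def q_learning_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Model-free: adjust the value of the action just taken."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def model_update(s, a, r, s_next):
    """Model-based: record what the action did; values are obtained later by
    planning over `model` (e.g., value iteration or Dyna-Q style backups)."""
    model[(s, a)] = (r, s_next)          # deterministic-world simplification
```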


Transferring Instances for Model-Based REinforcement Learning (TIMBREL)

  • Transfer between

    • Model-learning RL algorithms
    • Tasks with different state variables and actions
    • Continuous state spaces
  • In this paper, we use: Fitted R-MAX (the instance-transfer step itself is sketched below)
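Since the mappings χ_X and χ_A (defined on the next slide) run from target to source, translating saved source instances into the target task uses their inverses. In this rough sketch, `chi_X_inv` and `chi_A_inv` are hypothetical helpers standing in for those inverses, and `chi_X_inv` is simplified to map a whole state at once rather than one variable at a time.

```python
def transfer_instances(source_instances, chi_X_inv, chi_A_inv):
    """Translate recorded source transitions (s, a, r, s') into synthetic
    target-task instances that can seed the target model before any
    target-task data has been gathered."""
    target_instances = []
    for s, a, r, s_next in source_instances:
        # chi_A is many-to-one, so one source action can correspond to
        # several target actions; emit one synthetic instance for each.
        for a_target in chi_A_inv(a):
            target_instances.append(
                (chi_X_inv(s), a_target, r, chi_X_inv(s_next)))
    return target_instances
```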





Inter-Task Mappings

  • χ_X: s_target → s_source

    • Given a state variable in the target task (some x from s = x_1, x_2, …, x_n)
    • Returns the corresponding state variable in the source task
  • χ_A: a_target → a_source

    • Similar, but for actions
  • Intuitive mappings exist in some domains (provided by an oracle)

  • Mappings can also be learned (e.g., Taylor, Kuhlmann, and Stone, 2008)



Mountain Car

  • 2D Mountain Car (source)

    • State variables: x, ẋ
    • Actions: Left, Neutral, Right
  • 3D Mountain Car (target)

    • State variables: x, y, ẋ, ẏ
    • Actions: Neutral, West, East, South, North
  • χ_X

    • x, y → x
    • ẋ, ẏ → ẋ
  • χ_A

    • Neutral → Neutral
    • West, South → Left
    • East, North → Right
  • These mappings are written out in the sketch below
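Written out as code, the mappings above are plain lookup tables; the variable spellings (`x_dot` for ẋ) and the sample numbers are illustrative only.

```python
chi_X = {"x": "x", "y": "x", "x_dot": "x_dot", "y_dot": "x_dot"}
chi_A = {"Neutral": "Neutral",
         "West": "Left", "South": "Left",
         "East": "Right", "North": "Right"}

# chi_X is many-to-one, so a single 3D (target) state projects onto a
# 2D (source) state once per axis:
s3d = {"x": -0.5, "y": -0.3, "x_dot": 0.01, "y_dot": -0.02}
s2d_via_x = {"x": s3d["x"], "x_dot": s3d["x_dot"]}   # x-axis view
s2d_via_y = {"x": s3d["y"], "x_dot": s3d["y_dot"]}   # y-axis view
a2d = chi_A["West"]                                  # -> "Left"
```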










Fitted R-MAX balances:

  • Sample complexity
  • Computational complexity
  • Asymptotic performance
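A rough sketch of the idea at the core of Fitted R-MAX, assuming an instance-based nearest-neighbour model; the thresholds and names below are illustrative, not the algorithm's exact formulation. State-actions with too few nearby instances are treated as "unknown" and given the optimistic value R_MAX, which is what drives systematic exploration.

```python
import math

R_MAX = 1.0      # assumed upper bound on one-step reward
K = 5            # instances required before a state-action is "known"
BANDWIDTH = 0.1  # radius within which stored instances count as neighbours

def predict(instances, s, a):
    """Average the outcomes of stored instances near (s, a); where data is
    too sparse, fall back to optimism so planning seeks out unknown regions."""
    near = [(r, s_next) for (s_i, a_i, r, s_next) in instances
            if a_i == a and math.dist(s_i, s) < BANDWIDTH]
    if len(near) < K:                     # "unknown" region: be optimistic
        return R_MAX, []
    mean_r = sum(r for r, _ in near) / len(near)
    return mean_r, [s2 for _, s2 in near]
```

Transferred source instances (previous sketch) simply enter `instances` alongside target-task data, shrinking the "unknown" region from the start.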










Related Work

  • Instance Transfer in Fitted Q Iteration

    • Lazaric et al., 2008
  • Transferring a Regression Model of the Transition Function

    • Atkeson and Santamaria, 1997
  • Ordering Prioritized Sweeping via Transfer

    • Sunmola and Wyatt, 2006
  • Bayesian Model Transfer

    • Tanaka and Yamamura, 2003
    • Wilson et al., 2007


Future Work

  • Implement with other model-learning methods

    • Dyna-Q
    • R-MAX
    • Fitted Q Iteration
  • Guard against the U-shaped curve in Fitted R-MAX?

  • Examine more complex tasks

    • Can TIMBREL improve performance on real-world problems?


TIMBREL significantly increases the speed of learning

  • Results suggest less data is needed to learn with transfer than without

  • Transfer performance depends on:

    • Similarity of the source and target tasks
    • Amount of source-task data collected


Model-Free:

  • Value Function [Taylor, Liu, & Stone, JMLR-07]
  • Policy [Taylor, Whiteson, & Stone, AAMAS-07]
  • Rules [Taylor & Stone, ICML-07]

Full Model?


