Notation for Stochastic Dynamic Programming (Markov Decision Processes, Approximate Dynamic Programming, Reinforcement Learning)
Notation, by source (stages, states, actions, policies, transitions, costs, discounting, Q-values, value functions, and the Bellman operator):

• Bertsekas [2007]
  Stages: k, from N down to 0
  State: i, i_k
  Action space: U(i); action: u
  Policy: µ_k(i), π
  Transitions: p_ij(µ_k(i))
  Cost: g(i, u, j); terminal cost: G(i_N)
  Discount: α
  Value (policy): J_k^π(i); value (optimal): J_k^*(i)
  Bellman operator: T

• Sutton and Barto [1998]
  Stages: t, from 1 to T
  State: s; action: a
  Policy: π(s, a), π
  Transitions: P_ss'^a
  Cost: R_ss'^a; terminal cost: r_T
  Discount: γ
  Q-value (policy): Q^π(s, a); Q-value (optimal): Q^*(s, a)
  Value (policy): V^π(s); value (optimal): V^*(s)

• Puterman [1994]
  Stages: t, from 1 to N
  State space: S; state: s
  Action space: A = ∪_{s ∈ S} A_s; action: a
  Policy: π, d_t^{MD}(s)
  Transitions: p_t(· | s, a)
  Cost: r_t(s, a); terminal cost: r_N(s)
  Discount: λ
  Value (policy): u_t^π; value (optimal): u_t^*
  Bellman operator: L, ℒ

• Powell [2011]
  Stages: t, from 1 to T
  State space: S; state: s, S_t
  Action space: A; action: a
  Policy: π
  Transitions: P(s' | S_t, a_t)
  Cost: C_t(S_t, a_t); terminal cost: V_T(S_T)
  Discount: γ
  Q-value: Q(S^n, a)
  Value (policy): V_t^π(S_t); value (optimal): V_t(S_t)
  Bellman operator: M
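The four notations all describe the same underlying object. As a book-neutral reference point, here is a minimal sketch of those shared ingredients as a Python container; every identifier is hypothetical and taken from none of the four books, and the comments map each field back to the authors' symbols:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Book-neutral container for the ingredients listed above.
# All field names are hypothetical illustrations, not any author's API.
@dataclass
class FiniteHorizonMDP:
    states: Sequence          # i, i_k (Bertsekas); s (Sutton & Barto, Puterman); s, S_t (Powell)
    actions: Callable         # admissible actions per state: U(i); A_s; A
    transition: Callable      # p_ij(u); P_ss'^a; p_t(. | s, a); P(s' | S_t, a_t)
    cost: Callable            # g(i, u, j); R_ss'^a; r_t(s, a); C_t(S_t, a_t)
    terminal_cost: Callable   # G(i_N); r_T; r_N(s); V_T(S_T)
    horizon: int              # final stage: N or T (Bertsekas counts k down to 0)
    discount: float           # alpha; gamma; lambda
```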
Optimal Value Function

Each line below is the same Bellman optimality recursion written in that book's notation; a runnable sketch follows the list.

• Bertsekas [2007]:
  \[ J_k^*(i) = \min_{u \in U(i)} \sum_{j=1}^{n} p_{ij}(u) \left[ g(i, u, j) + \alpha J_{k-1}^*(j) \right] \]

• Sutton and Barto [1998]:
  \[ V^*(s) = \max_{a} \sum_{s'} P_{ss'}^{a} \left[ R_{ss'}^{a} + \gamma V^*(s') \right] \]

• Puterman [1994]:
  \[ u_t^*(s_t) = \max_{a \in A_{s_t}} \left\{ r_t(s_t, a) + \sum_{j \in S} p_t(j \mid s_t, a) \, u_{t+1}^*(j) \right\} \]

• Powell [2011]:
  \[ V_t(S_t) = \max_{a_t} \left\{ C_t(S_t, a_t) + \gamma \sum_{s' \in S} P(s' \mid S_t, a_t) \, V_{t+1}(s') \right\} \]
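Because the four recursions are the same backup, one implementation covers them all. Below is a minimal value-iteration sketch in Sutton-and-Barto-style notation; the two-state, two-action MDP is hypothetical, invented purely for illustration and drawn from none of the books:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a, s, s2] plays the role of P_ss'^a and R[a, s, s2] of R_ss'^a.
gamma = 0.9  # discount: Sutton & Barto's gamma, Bertsekas's alpha, Puterman's lambda
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],  # transition probabilities under action 0
    [[0.5, 0.5], [0.3, 0.7]],  # transition probabilities under action 1
])
R = np.array([
    [[1.0, 0.0], [0.0, 2.0]],  # rewards under action 0
    [[0.5, 0.5], [1.0, 0.0]],  # rewards under action 1
])

V = np.zeros(2)  # V(s), initialized arbitrarily
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a sum_s' P_ss'^a [ R_ss'^a + gamma * V(s') ]
    Q = (P * (R + gamma * V)).sum(axis=2)  # Q[a, s]
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-10:  # sup-norm stopping test
        V = V_new
        break
    V = V_new

print("V* ≈", V)                           # optimal state values
print("greedy policy:", Q.argmax(axis=0))  # a*(s) = argmax_a Q(s, a)
```

Run backward for t = T−1, …, 1 from a given terminal value instead of iterating to a fixed point, the same backup yields the finite-horizon recursions of Bertsekas, Puterman, and Powell.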
References

D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol. 2. Athena Scientific Optimization and Computation Series. Athena Scientific, 2007. ISBN 9781886529304. URL http://books.google.com/books?id=eL01YAAACAAJ.

W.B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. John Wiley & Sons, 2011. ISBN 9781118029152. URL http://books.google.com/books?id=VBuZhne7pmwC.

M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley-Interscience, 1994. ISBN 9780471727828. URL http://books.google.com/books?id=Y-gmAQAAIAAJ.

R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 1998. ISBN 9780262193986. URL http://books.google.com/books?id=CAFR6IBF4xYC.
Tim Hopper – [email protected] – StiglerDiet.com