Notation for Stochastic Dynamic Programming (Markov Decision Processes, Approximate Dynamic Programming, Reinforcement Learning)

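| Concept | Bertsekas | Sutton and Barto | Puterman | Powell |
|---|---|---|---|---|
| Stages | $k$ | $t$ | $t$ | $t$ |
| First Stage | $N$ | $1$ | $1$ | $1$ |
| Final Stage | $0$ | $T$ | $N$ | $T$ |
| State Space | | | $S$ | $S$ |
| State | $i$, $i_k$ | $s$ | $s$ | $s$, $S_t$ |
| Action Space | $U(i)$ | | $A = \cup_{s \in S} A_s$ | $A$ |
| Action | | $a$ | $a$ | $a$ |
| Policy | $\mu_k(i)$, $\pi$ | $\pi(s, a)$, $\pi$ | $\pi$, $d_t^{MD}(s)$ | $\pi$ |
| Transitions | $p_{ij}(\mu_k(i))$ | $P^a_{ss'}$ | $p_t(\cdot \mid s, a)$ | $P(s' \mid S_t, a_t)$ |
| Cost | $g(i, u, j)$ | $R^a_{ss'}$ | $r_t(s, a)$ | $C_t(S_t, a_t)$ |
| Terminal Cost | $G(i_N)$ | $r_T$ | $r_N(s)$ | $V_T(S_T)$ |
| Discount | $\alpha$ | $\gamma$ | $\lambda$ | $\gamma$ |
| Q-Value (Policy) | | $Q^\pi(s, a)$ | | $Q(S^n, a)$ |
| Q-Value (Optimal) | | | | |
| Value (Policy) | $J_k^\pi(i)$ | $V^\pi(s)$ | $u_t^\pi$ | $V_t^\pi(S_t)$ |
| Value (Optimal) | $J_k^*(i)$ | $V^*(s)$ | $u_t^*$ | $V_t(S_t)$ |
| Bellman Operator | $T$ | | $L$, $\mathcal{L}$ | $M$ |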

Optimal Value Function

• Bertsekas [2007]

$$J_k^*(i) = \min_{u \in U(i)} \sum_{j=1}^{n} p_{ij}(u) \left[ g(i, u, j) + \alpha J_{k-1}^*(j) \right]$$

• Sutton and Barto [1998]

$$V^*(s) = \max_{a} \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^*(s') \right]$$

• Puterman [1994]

$$u_t^*(s_t) = \max_{a \in A_{s_t}} \left\{ r_t(s_t, a) + \sum_{j \in S} p_t(j \mid s_t, a) \, u_{t+1}^*(j) \right\}$$

• Powell [2011]

$$V_t(S_t) = \max_{a_t} \left( C_t(S_t, a_t) + \gamma \sum_{s' \in S} P(s' \mid S_t, a_t) \, V_{t+1}(s') \right)$$
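The finite-horizon recursions above can be evaluated by stepping backward from the terminal stage. The following is a minimal sketch in Powell's notation, assuming finite state and action sets stored as NumPy arrays. The function name `backward_induction`, the array layouts, and the 0-based stage index (decision stages 0 to T-1, terminal values at stage T) are assumptions made for illustration and do not come from any of the four texts.

```python
import numpy as np

def backward_induction(P, C, V_T, gamma=1.0):
    """Evaluate V_t(S_t) = max_a ( C_t(S_t, a) + gamma * sum_s' P(s'|S_t, a) V_{t+1}(s') ).

    Hypothetical array layout (an assumption of this sketch):
      P[a, s, s2] : probability of moving from state s to state s2 under action a
      C[t, s, a]  : contribution C_t(s, a) at decision stage t
      V_T[s]      : terminal value V_T(s)
    Returns V with shape (T + 1, S) and a greedy policy with shape (T, S).
    """
    T = C.shape[0]
    n_states = V_T.shape[0]
    V = np.zeros((T + 1, n_states))
    policy = np.zeros((T, n_states), dtype=int)
    V[T] = V_T
    for t in range(T - 1, -1, -1):
        # expected[a, s] = sum over s2 of P[a, s, s2] * V[t + 1, s2]
        expected = P @ V[t + 1]
        # q[s, a] is the bracketed term of the recursion for each action
        q = C[t] + gamma * expected.T
        V[t] = q.max(axis=1)
        policy[t] = q.argmax(axis=1)
    return V, policy

# Toy problem: two states, two actions (0 = stay, 1 = switch), two decision stages.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
C = np.array([[[1.0, 0.0], [0.0, 2.0]]] * 2)
V, policy = backward_induction(P, C, V_T=np.zeros(2), gamma=0.5)
print(V[0])       # values at the first decision stage: [1.5 2.5]
print(policy[0])  # greedy actions at the first decision stage: [0 1]
```

Replacing `max` and `argmax` with `min` and `argmin` gives the cost-minimizing form used by Bertsekas. Puterman's recursion as written above corresponds to `gamma=1.0`.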

References

D.P. Bertsekas. Dynamic Programming and Optimal Control. Number v. 2 in Athena Scientific Optimization and Computation Series. Athena Scientific, 2007. ISBN 9781886529304. URL http://books.google.com/books?id=eL01YAAACAAJ.

W.B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. John Wiley & Sons, 2011. ISBN 9781118029152. URL http://books.google.com/books?id=VBuZhne7pmwC.

M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley-Interscience, 1994. ISBN 9780471727828. URL http://books.google.com/books?id=Y-gmAQAAIAAJ.

R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 1998. ISBN 9780262193986. URL http://books.google.com/books?id=CAFR6IBF4xYC.

Tim Hopper – [email protected] – StiglerDiet.com
