Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux
University of Liège

Mini-workshop on Reinforcement Learning
Department of Electrical Engineering and Computer Science, University of Liège
September 29th, 2011

Formalization

The batch mode setting

Lipschitz continuity
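
The formulas for this slide did not survive extraction. As a reminder only, the standard definition can be written as below. The symbols are assumptions borrowed from the usual deterministic setting, not recovered from the slides: f is the system dynamics, ρ the reward function, X the state space, U the action space, and L_f, L_ρ the two Lipschitz constants.

```latex
\begin{align}
  \| f(x,u) - f(x',u) \| &\le L_f \, \| x - x' \| , \\
  | \rho(x,u) - \rho(x',u) | &\le L_\rho \, \| x - x' \| ,
  \qquad \forall x, x' \in X, \ \forall u \in U .
\end{align}
```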

The worst that can happen

[Image: Liège and the UARS satellite]

Given:

● The batch collection of trajectories

● The Lipschitz continuity assumptions + two constants
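
To make the role of these two ingredients concrete, here is a minimal sketch of how a batch and a Lipschitz constant together give a guaranteed lower bound on an unseen reward. It is not the authors' min max algorithm or either relaxation scheme, only the one-step bound that follows directly from Lipschitz continuity. The tuple layout (x, u, r, y), the name `reward_lower_bound`, the Euclidean norm and the numeric values are all illustrative assumptions.

```python
import math

def reward_lower_bound(batch, x, u, L_rho):
    """Tightest Lipschitz lower bound on the reward at (x, u).

    Assumed layout (hypothetical): each batch element is (x_l, u_l, r_l, y_l),
    i.e. state, action, observed reward, successor state.
    If the reward function is L_rho-Lipschitz in the state, then for every
    sample taken with the same action u:
        reward(x, u) >= r_l - L_rho * ||x - x_l||
    """
    candidates = [
        r_l - L_rho * math.dist(x, x_l)
        for (x_l, u_l, r_l, _y_l) in batch
        if u_l == u
    ]
    # No sample with this action: nothing can be guaranteed.
    return max(candidates) if candidates else -math.inf

# Illustrative batch of one-step transitions (made-up numbers).
batch = [
    ((0.0, 0.0), 0, 1.0, (1.0, 0.0)),
    ((3.0, 4.0), 0, 5.0, (3.0, 5.0)),
    ((1.0, 1.0), 1, 2.0, (1.0, 2.0)),
]

bound = reward_lower_bound(batch, (0.0, 0.0), 0, L_rho=0.5)
print(bound)
```

The second sample lies farther from the query state but carries a larger reward, so it gives the tighter guarantee here. The same reasoning, chained over several stages and combined with the bound on the dynamics, is what makes the worst case well defined.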

The T-stage problem

Any suggestions?

So let us start with the 2-stage case...

The 2-stage problem

First results

Relaxation scheme: trust region

Relaxation scheme: Lagrangian dual

Relaxation schemes: synthesis

Illustration

[Figure: uniformly drawn state-action couples]

[Figure labels: Grid; Average (uniform sampling)]

Tons of future work

T-stage problem

Stochastic frameworks

Exact solution?

Infinite horizon
