Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux
University of Liège

Mini-workshop on Reinforcement Learning
Department of Electrical Engineering and Computer Science, University of Liège
September 29th, 2011

Formalization
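Assuming the usual deterministic formalization for this line of work (dynamics x_{t+1} = f(x_t, u_t), reward r_t = ρ(x_t, u_t), fixed initial state x_0, finite action space), the return of a sequence of actions is the sum of its T rewards, and the optimal return is the best such sum over all action sequences. A minimal sketch; `toy_f` and `toy_rho` are hypothetical toy functions, not the system from the talk, and the brute-force search is only meant for tiny T.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]

def sequence_return(f: Callable[[State, int], State],
                    rho: Callable[[State, int], float],
                    x0: State, actions: Sequence[int]) -> float:
    """J = sum_{t=0}^{T-1} rho(x_t, u_t), with x_{t+1} = f(x_t, u_t)."""
    x, total = x0, 0.0
    for u in actions:
        total += rho(x, u)   # instantaneous reward r_t
        x = f(x, u)          # deterministic transition
    return total

def optimal_return(f, rho, x0: State, actions: Sequence[int], T: int) -> float:
    """Best return over all |U|^T open-loop sequences (brute force, small T only)."""
    return max(sequence_return(f, rho, x0, seq) for seq in product(actions, repeat=T))

# Hypothetical toy system: 1-D state, action 1 moves right, action 2 moves left.
def toy_f(x: State, u: int) -> State:
    return (x[0] + (0.1 if u == 1 else -0.1),)

def toy_rho(x: State, u: int) -> float:
    return -abs(x[0])  # reward is highest at the origin

print(sequence_return(toy_f, toy_rho, (0.2,), [2, 2, 1]))
print(optimal_return(toy_f, toy_rho, (0.2,), [1, 2], T=3))
```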

The batch mode setting

Lipschitz continuity
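Assuming f and ρ are Lipschitz continuous in the state for each fixed action, with constants L_f and L_ρ, a batch of one-step transitions (x^l, u^l, r^l, y^l) can only come from such a system if every pair of transitions sharing an action satisfies |r^l - r^k| ≤ L_ρ ||x^l - x^k|| and ||y^l - y^k|| ≤ L_f ||x^l - x^k||. The sketch below checks this necessary condition with the Euclidean norm; the tuple layout and the function name `lipschitz_compatible` are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]   # (x^l, u^l, r^l, y^l), assumed layout

def lipschitz_compatible(batch: List[Transition], L_f: float, L_rho: float,
                         tol: float = 1e-12) -> bool:
    """Necessary condition for the batch to be produced by an L_f-Lipschitz f
    and an L_rho-Lipschitz rho (Lipschitz in x for each fixed action)."""
    for (x1, u1, r1, y1), (x2, u2, r2, y2) in combinations(batch, 2):
        if u1 != u2:
            continue  # the assumptions only compare states under the same action
        dx = math.dist(x1, x2)
        if abs(r1 - r2) > L_rho * dx + tol:
            return False
        if math.dist(y1, y2) > L_f * dx + tol:
            return False
    return True

batch = [((0.0,), 1, 0.0, (0.1,)),
         ((0.5,), 1, 0.4, (0.6,)),
         ((0.5,), 2, 1.0, (0.4,))]
print(lipschitz_compatible(batch, L_f=1.0, L_rho=1.0))   # True
print(lipschitz_compatible(batch, L_f=1.0, L_rho=0.5))   # False
```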

The worst that can happen

Liège? UARS Satellite

Given:

● The batch collection of trajectories
● The Lipschitz continuity assumptions and the two constants L_f and L_ρ

The T-stage problem
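One way to write the T-stage problem, assuming the batch is F = {(x^l, u^l, r^l, y^l)}, l = 1, ..., n, and that "generalization" means the worst case over every system consistent with the data and with the constants L_f and L_ρ (the set names below are illustrative notation):

```latex
\mathcal{L}^{f}_{\mathcal{F}} = \bigl\{ f' : \mathcal{X}\times\mathcal{U}\to\mathcal{X} \;\big|\;
  \|f'(x,u)-f'(x',u)\| \le L_f\,\|x-x'\|\ \forall x,x',u;\;
  f'(x^l,u^l) = y^l\ \forall l \bigr\}

\mathcal{L}^{\rho}_{\mathcal{F}} = \bigl\{ \rho' : \mathcal{X}\times\mathcal{U}\to\mathbb{R} \;\big|\;
  |\rho'(x,u)-\rho'(x',u)| \le L_\rho\,\|x-x'\|\ \forall x,x',u;\;
  \rho'(x^l,u^l) = r^l\ \forall l \bigr\}

\inf_{f' \in \mathcal{L}^{f}_{\mathcal{F}},\ \rho' \in \mathcal{L}^{\rho}_{\mathcal{F}}}
  \ \sum_{t=0}^{T-1} \rho'(x'_t, u_t)
\quad \text{subject to} \quad x'_0 = x_0,\quad x'_{t+1} = f'(x'_t, u_t),\ t = 0,\dots,T-2
```

Setting T = 2 gives the two-stage case for a pair of actions (u_0, u_1).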

Any suggestions?

So let us start with the 2-stage case...

The 2-stage problem
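For T = 2, two consequences of the Lipschitz assumptions follow directly from the definitions (a sketch, not the derivation on the slides). First, any consistent reward satisfies ρ'(x_0, u_0) ≥ r^l - L_ρ ||x_0 - x^l|| for every transition l with u^l = u_0, so, for a consistent batch, the smallest value the first-stage reward can take is the largest of these quantities. Second, the unknown state x_1 = f'(x_0, u_0) must lie in every ball of center y^l and radius L_f ||x_0 - x^l|| for those same transitions. The function names are hypothetical.

```python
import math
from typing import List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]   # (x^l, u^l, r^l, y^l)

def worst_first_reward(batch: List[Transition], x0: State, u0: int,
                       L_rho: float) -> float:
    """Smallest value rho'(x0, u0) can take over L_rho-Lipschitz rewards
    that reproduce the batch (assumes the batch itself is consistent)."""
    values = [r - L_rho * math.dist(x0, x) for (x, u, r, _) in batch if u == u0]
    if not values:
        raise ValueError("no transition uses u0, so there is no finite lower bound")
    return max(values)

def in_second_state_region(batch: List[Transition], x0: State, u0: int,
                           L_f: float, x1: State, tol: float = 1e-12) -> bool:
    """Necessary condition for x1 to be a possible value of f'(x0, u0):
    x1 lies in every ball B(y^l, L_f * ||x0 - x^l||) with u^l == u0."""
    return all(math.dist(x1, y) <= L_f * math.dist(x0, x) + tol
               for (x, u, _, y) in batch if u == u0)

batch = [((0.0,), 1, 0.0, (0.1,)), ((0.5,), 1, 0.4, (0.6,)), ((0.5,), 2, 1.0, (0.4,))]
print(worst_first_reward(batch, (0.2,), 1, L_rho=1.0))          # about 0.1
print(in_second_state_region(batch, (0.2,), 1, 1.0, (0.3,)))    # True
print(in_second_state_region(batch, (0.2,), 1, 1.0, (1.0,)))    # False
```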

First results

Relaxation scheme: trust region
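Relaxing a hard minimization by dropping constraints can only lower its minimum, so the relaxed optimum is still a valid lower bound on the worst-case return. As a simplified illustration in the spirit of a trust-region relaxation (an assumption; the exact scheme on the slides may differ), the sketch keeps, for the second stage, a single ball constraint ||x_1 - y^{l0}|| ≤ L_f ||x_0 - x^{l0}|| and a single reward constraint ρ'(x_1, u_1) ≥ r^{l1} - L_ρ ||x_1 - x^{l1}||. Minimizing over one Euclidean ball has a closed form: the farthest point of a ball of center c and radius R from a point p is at distance ||c - p|| + R. Taking the best pair (l0, l1) and adding the exact first-stage worst case gives a lower bound for the 2-stage problem. The function name is hypothetical.

```python
import math
from typing import List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]   # (x^l, u^l, r^l, y^l)

def single_ball_lower_bound(batch: List[Transition], x0: State, u0: int, u1: int,
                            L_f: float, L_rho: float) -> float:
    """Lower bound on the worst-case 2-stage return: first-stage worst case plus
    a second stage relaxed to one ball constraint and one reward constraint."""
    A0 = [t for t in batch if t[1] == u0]
    A1 = [t for t in batch if t[1] == u1]
    if not A0 or not A1:
        raise ValueError("need at least one transition for u0 and for u1")
    # Stage 0: exact worst case of rho'(x0, u0) on its own.
    stage0 = max(r - L_rho * math.dist(x0, x) for (x, _, r, _) in A0)
    # Stage 1: x1 lies in B(y^l0, L_f*||x0 - x^l0||) and
    # rho'(x1, u1) >= r^l1 - L_rho*||x1 - x^l1||; the farthest point of that
    # ball from x^l1 is at distance ||y^l0 - x^l1|| + L_f*||x0 - x^l0||.
    stage1 = max(r1 - L_rho * (math.dist(y0, x1) + L_f * math.dist(x0, xa))
                 for (xa, _, _, y0) in A0
                 for (x1, _, r1, _) in A1)
    return stage0 + stage1

batch = [((0.0,), 1, 0.0, (0.1,)), ((0.5,), 1, 0.4, (0.6,)), ((0.5,), 2, 1.0, (0.4,))]
print(single_ball_lower_bound(batch, (0.2,), 1, 2, L_f=1.0, L_rho=1.0))  # about 0.7
```

As a sanity check, one system consistent with this batch (f(x, 1) = x + 0.1, ρ(x, 1) = 0.8x, f(x, 2) = x - 0.1, ρ(x, 2) = 1) earns 0.16 + 1.0 = 1.16 from x_0 = 0.2 with actions (1, 2), which is above the bound, as any lower bound must be.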


Relaxation scheme: Lagrangian dual
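Lagrangian duality produces lower bounds in a related way: for any nonnegative multiplier, the dual function is no larger than the primal minimum (weak duality). The toy below is a hypothetical illustration on the single-ball problem min_x -||x - p||² subject to ||x - c||² ≤ R² (a trust-region subproblem), not the relaxation used on the slides. With D = ||c - p||, the Lagrangian is convex in x for λ > 1 and its minimum works out to q(λ) = -D²λ/(λ - 1) - λR²; maximizing over λ recovers the primal value -(D + R)², since this particular problem has no duality gap.

```python
def dual_function(lam: float, D: float, R: float) -> float:
    """q(lam) = min over x of -||x - p||^2 + lam * (||x - c||^2 - R^2), D = ||c - p||.
    Closed form valid for lam > 1 (for lam <= 1 and D > 0 it is unbounded below)."""
    if lam <= 1.0:
        raise ValueError("closed form requires lam > 1")
    return -D * D * lam / (lam - 1.0) - lam * R * R

D, R = 2.0, 0.5                          # hypothetical distance ||c - p|| and ball radius
primal = -(D + R) ** 2                   # the farthest point of the ball is at distance D + R
lams = [1.0 + k / 100 for k in range(1, 2001)]   # multipliers in (1, 21]
duals = [dual_function(lam, D, R) for lam in lams]

assert all(q <= primal + 1e-12 for q in duals)   # weak duality: each q(lam) is a lower bound
print(primal, max(duals))                         # -6.25 -6.25 (best lam = 1 + D/R = 5)
```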


Relaxation schemes: synthesis

Illustration





Uniformly drawn state-action couples

Illustration

Grid

Average (uniform sampling)
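The illustration slides compare batches built from uniformly drawn state-action couples, a grid, and an average over uniform samples. A sketch of how such batches could be generated, assuming a state space [0, 1]^2 and actions {1, ..., m}; the system (`identity_f`, `x_reward`), the averaged quantity (`first_stage_bound`, a simple Lipschitz lower bound on ρ(x_0, 1)), the grid size, and the number of draws are all hypothetical placeholders, since the talk's benchmark is not given here.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple

State = Tuple[float, float]
Transition = Tuple[State, int, float, State]   # (x^l, u^l, r^l, y^l)

def grid_batch(k: int, m: int, f, rho) -> List[Transition]:
    """k x k grid of cell centres over [0, 1]^2, crossed with every action 1..m."""
    pts = [(i + 0.5) / k for i in range(k)]
    return [((a, b), u, rho((a, b), u), f((a, b), u))
            for a in pts for b in pts for u in range(1, m + 1)]

def uniform_batch(n: int, m: int, f, rho, rng: random.Random) -> List[Transition]:
    """n state-action couples drawn uniformly from [0, 1)^2 x {1, ..., m}."""
    batch = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        u = rng.randint(1, m)
        batch.append((x, u, rho(x, u), f(x, u)))
    return batch

def average_over_draws(bound: Callable[[List[Transition]], float],
                       n: int, m: int, f, rho, draws: int = 20, seed: int = 0) -> float:
    """Mean of `bound` over several independent uniform batches of size n."""
    rng = random.Random(seed)
    return statistics.mean(bound(uniform_batch(n, m, f, rho, rng)) for _ in range(draws))

# Hypothetical system, for illustration only.
def identity_f(x: State, u: int) -> State:
    return x

def x_reward(x: State, u: int) -> float:
    return x[0]   # 1-Lipschitz in x

x0, L_rho = (0.3, 0.7), 1.0

def first_stage_bound(batch: List[Transition]) -> float:
    # Lipschitz lower bound on rho(x0, 1) from the transitions that used action 1.
    return max((r - L_rho * math.dist(x0, x) for (x, u, r, _) in batch if u == 1),
               default=float("-inf"))

print(first_stage_bound(grid_batch(10, 2, identity_f, x_reward)))
print(average_over_draws(first_stage_bound, n=100, m=2, f=identity_f, rho=x_reward))
```

Both printed values stay below the true value ρ(x_0, 1) = 0.3, since each is a Lipschitz lower bound; only how closely they approach it depends on the sampling scheme.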

Tons of future work:

● T-stage problem
● Stochastic frameworks
● Exact solution?
● Infinite horizon
