A Hybrid Learning Strategy for Discovery of Policies of Action
Pontifical Catholic University of Paraná – PUCPR
Curitiba – PR, Brazil
Graduate Program in Computer Science - PPGIA
R. Ribeiro, A. L. Koerich and F. Enembreck
XVIII Brazilian Artificial Intelligence Symposium (SBIA 2006),
Ribeirão Preto, SP, Brazil, October 2006
Motivation & Challenge;
– Adaptive Autonomous Agents;
– Reinforcement Learning;
– Q-Learning Algorithm;
– Policy Estimation Techniques based on Instance-Based Learning;
– Evaluation Methodology;
– Hybrid Learning Method;
Conclusion & Future Work.
Motivation & Challenge
Discovery and Evaluation of Policies of Action;
Generic Evaluation Methodology;
Hybrid Learning Method.
ADAPTIVE AUTONOMOUS AGENTS:
– Finding action policies autonomously;
– Incremental learning based on rewards/punishments;
– Learning through trial-and-error interactions with an environment;
– Convergence to an optimal policy by visiting all states of the environment.
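
The points above (reward-driven incremental learning, trial and error, repeated visits to every state) can be illustrated with tabular Q-learning, the algorithm named in the outline. This is a minimal sketch, not the authors' implementation: the corridor environment, the reward values, and all parameter values (`alpha`, `gamma`, `epsilon`, `episodes`) are assumptions chosen only for illustration.

```python
import random

def q_learning(n_states, n_actions, step, starts, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; step(s, a) must return (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.choice(starts)  # vary the start so every state is visited
        for _ in range(max_steps):
            # Epsilon-greedy: mostly exploit, sometimes try a random action.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            # Move Q(s, a) toward the reward plus the discounted best next value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]

# Hypothetical environment: a corridor of states 0..4 with the goal at state 4.
# Actions: 0 = left, 1 = right. Rewards (+1 at the goal, -0.1 otherwise) are assumed.
GOAL = 4

def corridor_step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    done = s2 == GOAL
    return s2, (1.0 if done else -0.1), done
```

Calling `q_learning(5, 2, corridor_step, starts=[0, 1, 2, 3])` and passing the result to `greedy_policy` yields one action per state; in this corridor the learned policy for the non-goal states should point toward the goal.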
Foundations of Reinforcement Learning:
– Environment, action policies and reward.
[Figure: agent-environment interaction loop; surviving labels: Sensing (s), Rewards/]
Example of learning
EXAMPLE (proposed problem):
(a) Setup of states; (b) Without learning; (c) Intermediate policies;
(d) 1000 steps; (e) 1500 steps; (f) Optimal policy.
– Different domains;
– Quality measures are often domain-specific (kilometers, money, force, energy, etc.);
– Different ways of evaluating the same problem (number of steps, number of action changes, processing time).
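
To make two of the measures listed above concrete, the sketch below counts the number of steps and the number of action changes along one path. The function name `path_measures` and the action labels are hypothetical; processing time is left out, although it could be measured with `time.perf_counter`.

```python
def path_measures(actions):
    """Return (number of steps, number of action changes) for a list of actions."""
    steps = len(actions)
    # An action change occurs whenever two consecutive actions differ.
    changes = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    return steps, changes
```

Two paths of equal length can differ in the number of action changes, which is one reason the same policy may rank differently under different measures.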
Generic Evaluation Methodology of Policies of Action;
Hybrid Learning Method;
1 Initialize Correct = 0, Wrong = 0, CostP = 0, CostA* = 0;
2 For each s ∈ S:
    CostP = cost(s, s_goal, P);
    CostA* = cost(s, s_goal, PA*);
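
The listing above stops after the two cost computations. The sketch below shows one way the comparison could be completed, under a loudly stated assumption: a state counts as Correct when the cost of following policy P equals the A* cost, and as Wrong otherwise. That rule, the grid representation, the unit step cost, and all function names are hypothetical and are not taken from the slides.

```python
import heapq

MOVES = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}

def cost_policy(s, goal, policy, free):
    """Steps needed to reach goal from s by following policy; inf if it fails."""
    steps, seen = 0, set()
    while s != goal:
        if s in seen or s not in policy:
            return float("inf")  # loop, or no action defined for this state
        seen.add(s)
        dr, dc = MOVES[policy[s]]
        nxt = (s[0] + dr, s[1] + dc)
        if nxt not in free:
            return float("inf")  # the policy walks off the grid or into a wall
        s, steps = nxt, steps + 1
    return steps

def cost_astar(s, goal, free):
    """Shortest path length from s to goal via A* with a Manhattan heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    best = {s: 0}
    frontier = [(h(s), 0, s)]
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            return g
        if g > best[cur]:
            continue  # stale queue entry
        for dr, dc in MOVES.values():
            nxt = (cur[0] + dr, cur[1] + dc)
            if nxt in free and g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return float("inf")

def evaluate(policy, free, goal):
    """Assumed rule: Correct if the policy cost equals the A* cost, else Wrong."""
    correct = wrong = 0
    for s in free:
        if s == goal:
            continue
        if cost_policy(s, goal, policy, free) == cost_astar(s, goal, free):
            correct += 1
        else:
            wrong += 1
    return correct, wrong
```

Here `free` is the set of traversable `(row, column)` cells and `policy` maps each cell to one of the four moves. Note that a state from which the goal is unreachable has infinite cost under both measures and is therefore counted as Correct by this sketch.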