A Developmental Approach to Learning Affordances

Priyatosh Mishra
Department of Computer Science & Engineering
IIT Madras, Chennai, India
[email protected]

Balaraman Ravindran
Department of Computer Science & Engineering
IIT Madras, Chennai, India
[email protected]

Abstract

Originating in the psychology literature, the concept of affordances has found wide use in autonomous robotics. Viewed as relations between an agent's action choices and their effects on the state of the environment, affordances allow agents to represent and reason about their interactions with the environment. By designing procedures that allow agents to learn these affordances through direct interaction with objects in the environment, researchers aim to tackle the uncertainty posed by novel objects and situations encountered in complex, real-world domains. In this paper, we present a new approach to learning affordances. The proposed approach distinguishes itself from existing methods by emphasising an incremental, developmental process of learning affordances tied directly to the task for which they are learned: problem solving in the agent's operating environment. By tightly coupling affordance learning with the planning component, and by generally restricting learning to those affordances which are immediately relevant to achieving current goals, the proposed approach aims to allow agents to efficiently tackle problems which may be beyond their current capabilities, while at the same time allowing for the continual accumulation of newer and more advanced skills.

1 Introduction

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Impressive advances in different areas of artificial intelligence such as machine learning, computer vision, and automated planning have not led to the kind of advances in robotics that might reasonably have been expected. This is especially true of robots designed for autonomous operation in real-world environments. Among the many issues that make problems in this area especially challenging is coping with the uncertainty presented by the requirement to interact with completely novel objects, or to operate in unfamiliar situations. For autonomous robots designed to operate in real-world environments such as homes or offices, the ability to handle novel objects and situations is an indicator of robustness, and an important factor in their suitability to operate in such environments. Given that it is infeasible to train for all possible scenarios that may be encountered, some thought needs to be given to the manner in which robots should respond in such situations. Techniques such as imitation learning [Schaal, 1999] or learning from human instructions can be considered as potential approaches, assuming appropriate communication capabilities, as well as the availability of humans in the operating environment willing to provide the necessary supervision. An alternate approach would be for the robot to directly interact with the environment in an unsupervised manner and use its existing knowledge to learn how to behave in new situations. Such autonomous capabilities will allow robots to tackle problems which are within their capabilities, but for which they have not been specifically trained. For example, one can imagine a humanoid robot being able to open a door whose handle it has never encountered before, or learning to use a bowl to carry water instead of the cup with which it had been trained, but which is currently not present in the environment. The desirability of autonomous learning capabilities is also evident when we view the problem from the developmental learning perspective, where robots to be trained to operate in similarly complex environments start with knowledge of only a primitive set of behaviours. Arguments have been made [Weng et al., 2001] that a developmental approach which is not task-specific, but rather an open-ended and cumulative process, is required for this task. An important aspect of such an approach is for the agents to have the capability to generate representations for the knowledge and skills that are acquired during the development process. This latter aspect has been addressed in the area of autonomous robotics by utilising the concept of affordances [Gibson, 1979]. Originally introduced in the context of ecological psychology to indicate the action possibilities that objects in the environment offer to the organism, the term has been used in autonomous robotics to capture relations between the agent's action choices and their effects on the state of the environment [Şahin et al., 2007].
Making use of this idea, recent studies (for example, [Stoytchev, 2005, Uğur et al., 2007, Montesano et al., 2008]) have focussed on designing procedures which allow agents to learn affordances by directly interacting with objects in the environment. Such procedures are significant in that they allow robots to effectively handle novel situations by autonomously learning appropriate new skills. In the next section, we briefly discuss some of these studies and propose some desirable extensions. In Section 3, we propose a new approach to learning affordances which attempts to incorporate these desired extensions.

2 Related work

The concept of affordances has been influential in the study of autonomous robotics. In particular, many studies have focussed on the problem of affordance learning for the purpose of autonomous robot control. These include tasks such as traversal and obstacle avoidance (for example, [Uğur et al., 2007]), grasping (for example, [Montesano et al., 2008]), and various forms of object manipulation (for example, [Stoytchev, 2005, Uğur et al., 2011]). In this context, a variety of approaches have been used in the representation and learning of affordances. For example, in [Uğur et al., 2007] an initial period of data collection takes place in which samples in the form of (effect, (entity, behaviour)) tuples are recorded, where entity encapsulates the state, behaviour denotes the robot behaviour applied, and effect captures the resultant change in state. Using data of this form, predictive models are learned which allow the agent to predict the effect of a behaviour carried out in a particular state. Extensions to this work [Uğur et al., 2011] also describe the use of forward-chaining for planning with similar predictive models. Similarly, in [Stoytchev, 2005] an initial data collection period allows for the population of an affordance table which is subsequently made use of for control using a simple rule-based approach. Without going into the details of other approaches, we present three extensions to affordance learning schemes which aim to emphasise the problem solving aspect of these schemes. First is the use of a developmental approach to learning which will allow agents to incrementally acquire skills of increasing complexity. Second is the use of sophisticated planners in conjunction with the learning schemes to fully utilise learned skills in solving non-trivial problems. The final extension deals with efficiency in learning, both in terms of the interaction costs of learning individual affordances, and in the choice of which affordances to learn when the need arises during problem solving.
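To make the sample scheme above concrete, the following is a minimal sketch of an affordance table in the spirit of the (effect, (entity, behaviour)) tuples described. The class name, the entity descriptors, and the behaviour and effect labels are our own illustrative inventions, not taken from the cited papers, and a real system would learn a generalising predictive model rather than a frequency table.

```python
from collections import defaultdict

class AffordanceTable:
    """Toy table mapping (entity, behaviour) pairs to observed effects."""

    def __init__(self):
        # (entity, behaviour) -> list of observed effects
        self.samples = defaultdict(list)

    def record(self, entity, behaviour, effect):
        """Store one interaction sample."""
        self.samples[(entity, behaviour)].append(effect)

    def predict(self, entity, behaviour):
        """Predict the most frequently observed effect, or None if unseen."""
        effects = self.samples.get((entity, behaviour))
        if not effects:
            return None
        return max(set(effects), key=effects.count)

table = AffordanceTable()
table.record("cylinder", "push", "rolled")
table.record("cylinder", "push", "rolled")
table.record("box", "push", "slid")
print(table.predict("cylinder", "push"))  # rolled
```

A rule-based controller in the style of [Stoytchev, 2005] can then query such a table to select a behaviour whose predicted effect matches the current goal.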

3 Proposed approach

3.1 Experimental framework

With the above extensions in mind, we propose a new approach for learning affordances. Our approach to the affordance learning problem is closely tied to the experimental framework illustrated in Figure 1.

Figure 1: The experimental framework within which problem solving and affordance learning occur.

Given an initial state (perhaps sensed by the agent) and a goal, the planning component attempts to generate a plan to achieve that goal. Successful generation of a plan leads to an attempt to execute it (which may require several iterations of replanning and execution). This is the ideal use case, in which the agent is given a task to perform and is able to do so using existing capabilities. The more interesting use case is the one where planning fails. This is interpreted as indicating the absence of suitable planning operators, and it results in the initiation of an exploratory phase in which the agent tries to learn affordances from which operators can be built, such that, when added to the planning system, they will allow planning to proceed beyond the point of previous failure. In case no affordance can be learned in the environment, execution terminates in failure. The above experimental framework illustrates our emphasis on the extensions discussed in the previous section. To start with, the integration of the planning component with the affordance learning scheme illustrates the focus on problem solving. Next, by appropriately sequencing tasks, this framework allows us to present challenges to the agent in a manner which leads to an incremental accumulation of skills. Finally, and as will be elaborated on further, when affordance learning is initiated, the aim is to learn only the specific affordances that may allow planning to progress. Thus, even in an affordance-rich environment, the agent tries to learn only relevant affordances, ensuring efficiency in problem solving. We now briefly describe each of the three individual components: planning, affordance learning, and action model learning.
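The plan–execute–learn cycle described above can be sketched as a simple control loop. The function names (plan, execute, learn_affordance) and the toy integer-state domain below are our own illustrative stand-ins, not components specified in the paper; they merely show how learning is triggered by planning failure and followed by replanning.

```python
from collections import deque

def run(initial_state, goal, plan, execute, learn_affordance, max_attempts=5):
    """Interleave planning, execution, and affordance learning."""
    state = initial_state
    for _ in range(max_attempts):
        p = plan(state, goal)
        if p is None:
            # Planning failed: try to learn an affordance from which a new
            # operator can be built, then replan past the failure point.
            if not learn_affordance(state, goal):
                return "failure"    # no affordance learnable here
            continue
        state, done = execute(p, state)
        if done:
            return "success"
    return "failure"

# Toy domain: states are integers, operators are one-step (a, b) moves.
# Initially the planner only knows 1 -> 2, so reaching 2 from 0 first
# requires "learning" a 0 -> 1 operator.
operators = {(1, 2)}

def plan(state, goal):
    """Breadth-first search over the known one-step operators."""
    frontier, seen = deque([[state]]), {state}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path[1:]
        for a, b in operators:
            if a == path[-1] and b not in seen:
                seen.add(b)
                frontier.append(path + [b])
    return None

def execute(p, state):
    """Follow the plan to its last state; report whether the goal 2 is met."""
    return p[-1], p[-1] == 2

def learn_affordance(state, goal):
    """Stand-in learner: add a one-step operator out of the current state."""
    operators.add((state, state + 1))
    return True

print(run(0, 2, plan, execute, learn_affordance))  # success
```

In the actual framework the learning step is, of course, an extended interaction phase rather than a single call, but the overall flow of control is the same.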

3.2 Planning

To be able to solve non-trivial problems, we would like to make use of state-of-the-art planning algorithms. From our discussion of the experimental framework, it should be clear that any domain-independent planner can potentially be used, since our setup does not require any modification to the normal execution flow of typical algorithms. The only additional desirable requirement is for the planning algorithm to be able to indicate the reasons for plan failure. This capability allows the affordance learning component to focus its efforts on learning relevant affordances. To identify the reasons for plan failure, we aim to make use of heuristics specific to the planning algorithm used. As an illustrative example, consider the use of backward state space planning and a task where a robot is required to move an object from one room to another. Assuming the planner does not have an operator to open doors, it would be useful for the planner to be able to indicate that the literal whose value the affordance learning scheme should learn to affect is open(D1), where D1 denotes the door, and the value of open(D1) indicates whether the door is open or not. In the context of backward state space planning, basic information regarding plan failure can be provided by passing on the set of literals that were present in the final (unfulfilled) goals along all the different paths of the search tree. Clearly, this set will contain the literal of interest, open(D1). However, to be more informative, we can make use of heuristics which try to identify important literals. For example, a simple heuristic is to compare each literal with the effects of the operators currently available to the planner. In our example, no operator can be grounded to have open(D1) as an effect, and hence this literal can be considered an important factor responsible for plan failure.
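The simple heuristic just described can be sketched as follows. The representation of literals as (predicate, argument) tuples and of operators as dictionaries with an effects list is an assumption made for illustration; a real planner would work over grounded operator schemas.

```python
def important_literals(unfulfilled, operators):
    """Flag unfulfilled goal literals that no available operator can
    produce as an effect; these are likely causes of plan failure."""
    achievable = {pred for op in operators for pred in op["effects"]}
    return [lit for lit in unfulfilled if lit[0] not in achievable]

# Hypothetical operator set: neither operator can assert open(...).
ops = [
    {"name": "pick-up", "effects": ["holding"]},
    {"name": "move",    "effects": ["at"]},
]
goals = [("open", "D1"), ("at", "room2")]
print(important_literals(goals, ops))  # [('open', 'D1')]
```

Here open(D1) is singled out, and it is this literal that would be handed to the affordance learning component as context.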

3.3 Affordance learning

Our affordance learning scheme comprises two distinct components. The first component is used simply to identify the object on which the agent should focus. We propose to model this task as a contextual bandit problem [Sutton and Barto, 1998], where the context corresponds to the literal (in our example) that the planner indicates is most crucial. The arms of the bandit task are the different objects which are currently accessible and with which the robot can interact. The reward for the bandit depends upon the subsequent success or failure in learning a relevant affordance. One of the benefits of modelling object selection using a contextual bandit is that we can consider or ignore arms depending upon the availability of the corresponding objects in the environment. Additionally, this approach allows us to intelligently initialise arms corresponding to new objects by considering their similarity to existing objects. The second component of the affordance learning scheme involves learning a behaviour policy which, when applied to the selected object, results in a change in the value of the indicated literal. We propose to model this as a full Reinforcement Learning (RL) problem [Sutton and Barto, 1998], with a positive reward being received when the value of the indicated literal changes. Specifically, we intend to use Intrinsically Motivated Reinforcement Learning (IMRL) [Barto et al., 2004, Chentanez et al., 2004] for this task. In this approach we are essentially learning options [Sutton et al., 1999], which are temporally extended actions and can be considered as representing the learned affordances. The utility of using IMRL is that it allows us to overcome the issue of scarcity of rewards, which arises in our setup due to the specification of a single positive reward during each learning instance. In IMRL, learning can be sped up by the presence of intrinsic rewards. This is especially the case when option discovery methods (for example, McGovern and Barto [2001], Mannor et al. [2004]) are used during IMRL execution to identify other useful options to learn.
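The object-selection component above can be illustrated with a minimal epsilon-greedy contextual bandit. The class and its value-table representation are a sketch of our own, not the paper's implementation: the context is the literal flagged by the planner, the arms are the currently accessible objects, and the reward is 1 if a relevant affordance was subsequently learned and 0 otherwise.

```python
import random

class ObjectSelector:
    """Epsilon-greedy contextual bandit over currently available objects."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.values = {}   # (context, object) -> estimated reward
        self.counts = {}   # (context, object) -> number of attempts

    def select(self, context, objects):
        """Pick an object to interact with, given the planner's context.
        Only objects present in the environment are passed in, so absent
        arms are ignored automatically."""
        if random.random() < self.epsilon:
            return random.choice(objects)
        return max(objects, key=lambda o: self.values.get((context, o), 0.0))

    def update(self, context, obj, reward):
        """Incremental mean update after an affordance learning attempt."""
        key = (context, obj)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        v = self.values.get(key, 0.0)
        self.values[key] = v + (reward - v) / n

# Hypothetical episode: learning on the handle succeeded, on the mat it failed.
sel = ObjectSelector(epsilon=0.0)
sel.update("open(D1)", "handle", 1.0)
sel.update("open(D1)", "mat", 0.0)
print(sel.select("open(D1)", ["handle", "mat"]))  # handle
```

Initialising the value of a new object's arm from the values of similar known objects, as suggested above, would simply amount to seeding the values table before the first selection.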

3.4 Action model learning

The final component makes use of the information obtained in the process of learning an affordance and formalises it as an operator which the planning algorithm can use. This is known as operator or action model learning in the literature (for example, [Wang, 1995, Walsh and Littman, 2008]), where the traditional motivation has been to resolve issues of inconsistent or incomplete specification of planning operators. The utility of this component depends upon the planner being used. If we consider our example backward state space algorithm, then it is required to build classical planning operators from the information generated in the previous component. There exist online action model learning algorithms (for example, [Čertický, 2014]) which can learn the preconditions and effects of such classical planning operators. In contrast, if MDP planning techniques are used, then the options learned in the affordance learning component may be used directly for planning as well.
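To illustrate what building a classical operator from interaction data involves, the following is a deliberately simple batch sketch (far simpler than online algorithms such as 3SG): given (state-before, state-after) observations of one behaviour, it takes the intersection of the before-states as candidate preconditions and the consistent differences as add and delete effects. The representation of states as sets of ground-literal strings is our own assumption.

```python
def learn_operator(transitions):
    """transitions: list of (before, after) pairs of sets of ground literals.
    Returns a STRIPS-style operator with preconditions, add, and delete
    lists, keeping only what is consistent across all observations."""
    pre = set.intersection(*(b for b, _ in transitions))
    add = set.intersection(*(a - b for b, a in transitions))
    dele = set.intersection(*(b - a for b, a in transitions))
    return {"pre": pre, "add": add, "del": dele}

# Two hypothetical observations of a learned door-opening option: "dark"
# appears in only one before-state, so it is dropped from the preconditions.
obs = [
    ({"closed(D1)", "at(door)"}, {"open(D1)", "at(door)"}),
    ({"closed(D1)", "at(door)", "dark"}, {"open(D1)", "at(door)", "dark"}),
]
print(learn_operator(obs))
```

An operator produced this way is exactly what the backward state space planner in our example needs in order to plan through the previously failing open(D1) subgoal.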

4 Conclusion

In this paper we have proposed a new approach to the problem of affordance learning. This approach makes use of RL techniques, allowing it to learn affordances in an efficient and incremental fashion. By tightly coupling affordance learning and planning, we emphasise the problem solving aspect of the system. The ability to use the latest domain-independent planning algorithms allows for tackling non-trivial problems, whereas communication with the planner allows the affordance learning scheme to focus on learning the affordances relevant to dealing with novel situations.

Acknowledgments

This work was supported by the Centre for Artificial Intelligence and Robotics, Defence Research and Development Organization, Government of India.

References

Andrew G. Barto, Satinder P. Singh, and Nuttapong Chentanez. Intrinsically motivated learning of hierarchical collections of skills. In Proc. 3rd International Conference on Development and Learning, pages 112–119, 2004.

Michal Čertický. Real-time action model learning with online algorithm 3SG. Applied Artificial Intelligence, 28(7):690–711, 2014.

Nuttapong Chentanez, Andrew G. Barto, and Satinder P. Singh. Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems, pages 1281–1288, 2004.

James J. Gibson. The Ecological Approach to Visual Perception. Houghton Mifflin, Boston, 1979.

Shie Mannor, Ishai Menache, Amit Hoze, and Uri Klein. Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the 21st International Conference on Machine Learning. ACM, 2004.

Amy McGovern and Andrew G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the 18th International Conference on Machine Learning, pages 361–369, 2001.

Luis Montesano, Manuel Lopes, Alexandre Bernardino, and José Santos-Victor. Learning object affordances: From sensory–motor coordination to imitation. IEEE Transactions on Robotics, 24(1):15–26, 2008.

Erol Şahin, Maya Çakmak, Mehmet R. Doğar, Emre Uğur, and Göktürk Üçoluk. To afford or not to afford: A new formalization of affordances toward affordance-based robot control. Adaptive Behavior, 15(4):447–472, 2007.

Stefan Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242, 1999.

Alexander Stoytchev. Behavior-grounded representation of tool affordances. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 3060–3065, 2005.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.

Richard S. Sutton, Doina Precup, and Satinder P. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181–211, 1999.

Emre Uğur, Mehmet R. Doğar, Maya Çakmak, and Erol Şahin. The learning and use of traversability affordance using range images on a mobile robot. In IEEE International Conference on Robotics and Automation, pages 1721–1726. IEEE, 2007.

Emre Uğur, Erol Şahin, and Erhan Oztop. Unsupervised learning of object affordances for planning in a mobile manipulation platform. In IEEE International Conference on Robotics and Automation (ICRA), pages 4312–4317. IEEE, 2011.

Thomas J. Walsh and Michael L. Littman. Efficient learning of action schemas and web-service descriptions. In AAAI, pages 714–719, 2008.

Xuemei Wang. Learning by observation and practice: An incremental approach for planning operator acquisition. In Proceedings of the 12th International Conference on Machine Learning, pages 549–557, 1995.

Juyang Weng, James McClelland, Alex Pentland, Olaf Sporns, Ida Stockman, Mriganka Sur, and Esther Thelen. Autonomous mental development by robots and animals. Science, 291(5504):599–600, 2001.
