Goal Region Inverse Optimal Control To Predict Human Reaching Motion in Shared Workspaces
Jim Mainprice1, Rafi Hayne2, Dmitry Berenson2
{[email protected], [email protected], [email protected]}
1 Max-Planck-Institute for Intelligent Systems, Autonomous Motion Department, Paul-Ehrlich-Str. 15, 72076 Tübingen, Germany
2 Robotics Engineering Program, Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609.
A great deal of work in neuroscience and biomechanics has sought to model the principles underlying human motion. However, human motion in environments with obstacles has been difficult to characterize. Furthermore, human motion in collaborative tasks where two humans share a workspace is difficult to model because the social, interference, and comfort criteria involved are unclear. Recent work has proposed using Inverse Optimal Control (IOC) to uncover the optimality principles guiding human motion [1], [2], [3]. Here, we present an algorithm that performs IOC for reaching motions in cluttered and dynamic environments. We also present experimental results suggesting that the learned behavior can be used to predict human motion when collaborating with a human partner or with a robot.

I. GOAL SET ALGORITHM

To learn cost functions that allow planning towards a task-space goal set, i.e., where the end configuration $q_N$ is not specified, we introduced an algorithm in [4], based on PIIRL [5], that we call Goalset-PIIRL. The algorithm samples trajectories with high smoothness around each demonstration in order to estimate the partition function. The sampling distribution is defined using multivariate Gaussians $\mathcal{N}(\xi_d, \Sigma = \sigma R^{-1})$ centered at each demonstration $\xi_d$, where $R = K^T K$ and $K$ is a matrix of finite differences that computes time derivatives of configurations along the trajectory (i.e., for accelerations, $[\ddot{q}_1 \dots \ddot{q}_N]^T = K \xi_d$). Goalset-PIIRL adds a step that projects the trajectory samples onto the constraint manifold. The projection is performed with respect to the metric $R$ that defines the smoothness of the trajectory samples.

II. RESULTS

We applied our algorithm to data gathered from two humans performing pick-and-place tasks in close proximity (see Fig. 1). We performed similarity analyses between the demonstrations and the trajectories obtained by planning with our learned model, using Dynamic Time Warping (DTW) and spectral measures.
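The sampling and projection steps of Section I can be illustrated with a minimal sketch. It is not the implementation from [4]: the zero-padded acceleration matrix, the independent sampling per degree of freedom, and the function names are all assumptions made for illustration. The goal set is also simplified to a single linear constraint that pins the final waypoint to a given goal, whereas the task-space goal set of the abstract would generally be nonlinear in the configuration.

```python
import numpy as np


def acceleration_matrix(n):
    """Finite-difference matrix K such that K @ xi stacks accelerations.

    Assumption: the ends are zero-padded (K has n + 2 rows) so that
    R = K.T @ K is invertible; the abstract does not state this choice.
    """
    K = np.zeros((n + 2, n))
    for j in range(n):
        K[j:j + 3, j] = [1.0, -2.0, 1.0]
    return K


def sample_around_demo(demo, sigma, n_samples, rng):
    """Draw trajectories from N(demo, sigma * R^-1), independently per DoF.

    demo has shape (n_waypoints, n_dof); returns the samples with shape
    (n_samples, n_waypoints, n_dof) together with the metric R.
    """
    n, d = demo.shape
    K = acceleration_matrix(n)
    R = K.T @ K
    cov = sigma * np.linalg.inv(R)
    cov = 0.5 * (cov + cov.T)  # remove numerical asymmetry
    noise = rng.multivariate_normal(np.zeros(n), cov, size=(n_samples, d))
    return demo[None] + np.transpose(noise, (0, 2, 1)), R


def project_end_to_goal(xi, R, goal):
    """Project xi onto {xi : xi[-1] == goal}, minimizing the R-weighted
    distance (xi' - xi)^T R (xi' - xi) for each DoF (simplified constraint).
    """
    n = R.shape[0]
    r_inv_last = np.linalg.solve(R, np.eye(n)[:, -1])  # R^-1 e_N
    gain = r_inv_last / r_inv_last[-1]
    return xi - np.outer(gain, xi[-1] - goal)
```

Because the correction is spread along the trajectory by $R^{-1}$ rather than applied to the last waypoint only, the projected sample keeps the smoothness induced by the sampling distribution, which is the motivation the abstract gives for projecting with respect to $R$.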
We found that the learned cost function outperforms baseline methods in generalizing to unseen reaching examples. Fig. 2 shows DTW results obtained when using our learned cost function (green) to predict human motion when collaborating with a robot, compared to hand-tuned cost functions (yellow and blue).

This work is supported in part by the Office of Naval Research under Grant N00014-13-1-0735 and by the National Science Foundation under Grant IIS-1317462.
Fig. 1. Shared workspace assembly experiment (left), goal region sampled in the Inverse Optimal Control phase (right).
Fig. 2. Distributions of task-space DTW scores for the human-robot experiment, for two hand-tuned cost functions (yellow and blue) and the learned one (green). Each distribution corresponds to a successive run of the manipulation task comprising 18 elementary motions. Scores are averaged over 15 pairs of participants.
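For readers unfamiliar with the measure, the following is a minimal sketch of a DTW score between two task-space trajectories. The abstract does not specify the DTW variant, local cost, or normalization used, so the classic dynamic-programming formulation with a Euclidean local cost and no normalization is assumed here, and the function name is hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW cost between trajectories a (n, d) and b (m, d).

    Assumptions: Euclidean local cost, unit step pattern, no normalization
    by path length.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all waypoints of a and b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]
```

A lower score means the predicted trajectory follows the recorded one more closely in shape, regardless of differences in timing along the path.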
REFERENCES

[1] S. Albrecht et al., “Imitating human reaching motions using physically inspired optimization principles,” in Humanoid Robots (Humanoids), 11th IEEE-RAS International Conference on, 2011.
[2] B. Berret, E. Chiovetto, F. Nori, and T. Pozzo, “Evidence for composite cost functions in arm movement planning: an inverse optimal control approach,” PLoS Comput Biol, vol. 7, no. 10, 2011.
[3] N. Sylla, V. Bonnet, G. Venture, N. Armande, and P. Fraisse, “Assessing neuromuscular mechanisms in human-exoskeleton interaction,” in IEEE-EMBC, 2014.
[4] J. Mainprice, R. Hayne, and D. Berenson, “Goal set inverse optimal control and iterative re-planning for predicting human reaching motions in shared workspaces,” IEEE Transactions on Robotics, 2016.
[5] M. Kalakrishnan, P. Pastor, L. Righetti, and S. Schaal, “Learning objective functions for manipulation,” in ICRA, 2013.