Abstract— We develop a simultaneous learning and planning capability for a robot arm to enable the arm to plan and execute planar motions of sliding objects. The friction characteristics and the presence of kinematic constraints, if any, is assumed to be unknown. Using simple physics-based, parametrized models of motion, we enable parameter updates during the course of plan execution, rather than rely on priori pre-processing.

I. I NTRODUCTION We deal with the problem of motion planning for objects on a planar workspace. The motion of the objects is controlled by a robot arm end-effector which can push the objects on the workspace. Moreover, we assume that the planar friction characteristics and kinematic constraints among different objects, if any, are not known a priori and are to be learnt by the planner during the course of execution of the plan. Planar motion planning using pushing control has been studied before for objects with known friction characteristics [1], [2]. Similarly, identification of kinematic constraints, such as the presence of joints, has also been studied previously in the literature [3]. Our work attempts to extend and combine these results. The motivation for our work is providing a planar workspace handling capability to a robot arm, such that the arm can plan and execute planar sliding motions for almost any object without any preprocessing. We use simplified physics-based models of objects for planning, and continuously update (initially unknown) model parameters according to the observations. Because this update is carried out during the execution of the plan itself, it lends a simultaneous learning and planning capability to the robot arm. We use vision-based object identification and tracking. The motion planning is performed using the wellknown RRT algorithm [4]. The hardware setup used for implementing the ideas discussed in this paper is described below.

Fig. 1. The hardware setup: Schunk arm, overhead camera, and illuminating lamp

in Section IV, we discuss motion planning and execution of the motion plan by the robot arm. II. V ISION - BASED O BJECT I DENTIFICATION A. Vision Calibration Vision calibration is necessary to uniquely map each point in the image coordinate axis system to a point in the workspace coordinate axis system. This mapping is linear [5], and hence it can be represented by a matrix A ∈ R3×3 . The matrix A is obtained as follows. Let xk and yk , k = 1, 2, 3, be, respectively, the coordinates of a point Pk in the workspace axes and image axes. Then, by the definition of the transformation matrix A, xk

Ayk ,

k = 1, 2, 3.

def

def

We define X = (x1 , x2 , x3 ) and Y = (y1 , y2 , y3 ) rewrite the preceding set of equations as X = AY . Then it follows that

A. Hardware Setup Figure 1 shows the hardware setup used for all the experiments. The 7-DOF Schunk light weight Arm 3 attached with a spherical end-effector was used to push objects on a flat, uniform table. An overhead camera was used for visionbased object identification (see Section II), and an additional lamp was used for appropriate illumination of the setup. The rest of the paper is organized as follows: in Section II, we describe the object identification and tracking algorithms using images from the overhead camera. In Section IV-B, we describe the theoretical and practical aspects of modeling planar sliding motion. We discuss parametrized models and we discuss algorithms to update the model parameters. Finally,

=

A = XY + , def

T −1

(1)

is the Moore-Penrose pseudowhere Y + = Y T YY inverse of Y . Since we deal with a planar workspace, we may assume that each point Pk has the same vertical coordinate, and consequently we define w p1,x pw pw 2,x 3,x , pw pw X = pw 1,y 2,y 3,y 1 1 1 i p1,x pi2,x pi3,x Y = pi1,y pi2,y pi3,y . 1 1 1

B. Object Identification

(a) Sample image after thresholding (b) Sample image after Canny edge detection

(c) Sample image after fitting bounding boxes: orange box is an error, which is corrected by filtering. Fig. 2.

Steps in vision-based object identification.

After calibrating the image, object identification is performed using the following steps. 1) Thresholding the image: A sample thresholded image of the workspace is shown in Fig. 2(a). 2) Canny edge detection: Figure 2(b) shows the result after edge detection of the image shown in Fig. 2(a). Further filtering (Step 4) of the image removes the non-object contours (shown in red color). These contours do not satisfy the minimum length and breadth requirements. 3) Contour identification: Figure 2(c) shows the result of contour identification on the image shown in Figs 2(a) and 2(b). When bounding boxes are fit onto the contours, the box shown in orange appears to fit. However, this incorrect identification is filtered out by finding the ratio of the area enclosed by the blue contour and to the area enclosed by the orange contour. The orange contour is eliminated if this ratio is lower than a certain threshold value. 4) Filtering: Contours which satisfy the following two conditions are accepted. a) The number of points on the contour is greater than certain threshold value. b) The length and breadth of the contour are greater than a certain threshold value. 5) Fitting bounding boxes: As mentioned before, the bounding boxes that are fitted onto the contours are accepted only if the ratio of the area of the contours to the area of the bounding box is greater than a certain threshold value. We used cuboids for our experiments, and they are projected as rectangles in the images. Figure 3 shows the final output of the object identification algorithm, i.e. the position and orientation of each object. C. Object Tracking

Fig. 3.

Result of vision-based object identification.

The notation used above is the following: for each k = 1, 2, 3, w pw k,x (resp. pk,y ) denotes the x-coordinate (resp. y-coordinate) of the point Pk in the workspace axes, while pik,x (resp. pik,y ) denotes the x-coordinate (resp. y-coordinate) of the point Pk in the image axes. In light of the preceding discussion, the vision calibration procedure is as follows. The robot arm is moved to three suitably chosen distinct points in the workspace. At each of the points, the image coordinates and workspace coordinates are recorded to form the X and Y matrices as defined above. The transformation A between the image coordinate axes and the workspace coordinate axes is then calculated using (1). In our experiments, we obtained a calibration accuracy of 0.5cm. We identified the error resulted due to perspective projection which, in turn, was due to the close proximity of the camera to the workspace.

The object identification algorithm described above returns only a set of the observed objects. Consequently, if the identification is repeated after one or more of the objects is moved, there is no correspondence between the new set of objects with the previous set of objects. While this limitation does not pose a problem if objects of different shapes and/or sizes are used, we did not impose any such restriction on the shapes and/or sizes of the objects. Instead, we devised a tracking algorithm for matching the objects identified in one image with those identified in the next image. Let p1,1 and p1,2 be the original positions of two objects (as computed by the algorithm in Section II-B). Let r1 and r2 be the positions identified in the new image. The matching, i.e., identification of whether r1 ≡ p2,1 and r2 ≡ p2,2 or whether r1 ≡ p2,2 and r2 ≡ p2,1 is performed by choosing the minimum from the set of permutations {kp1,1 − r1 k + kp1,2 − r2 k, kp1,2 − r1 k + kp1,1 − r2 k}. The advantage of minimizing the sum of differences instead of matching each box individually can be seen in Fig. 4. The blue objects are the objects expected by the

Fig. 5. Fig. 4.

Comparison of Coulomb friction and viscous friction.

Illustrative example for vision-based object tracking.

algorithm. The red objects are the objects that the vision actually identifies. If each object were to be matched individually, then both objects would be matched with the object marked “New Box 1”, which is a meaningless result. On the other hand, minimizing the sum of errors as above and restricting the matching algorithm to match each object to exactly one target object, the resultant matching is: “Box 1” → “New Box 1” and “Box 2” → “New Box 2”, which is the desired result. If the number of expected objects differs from the number of expected objects, then we try to match the nearest object and ignore the object that is farthest away. III. M OTION M ODELS One of the most significant improvements of our work over [6] is that we employed physics-based models of object motions for planning. These models depend on certain parameters which the planner updates during the execution of the plan. A. Coulomb Friction In order to calculate the force or moment associated with planar friction, an area integral must be computed. Given a known rotation and translation of the body, each differential element on the body moves with a velocity of (vx , vy ), where vx = −(yi j ω − vGx ), where xi j and yi j are the distances of the differential element from the center of mass in the x and y directions. In a Coulomb friction model, the force due to friction experienced by each differential mass element has a magnitude of µ p, where µ is the coefficient of friction and p is the pressure, in the opposite direction of the velocity of the differential element. The pressure p generally equals gdm/dA where dm is mass of the differential element, g is the acceleration due to gravity, and dA is the contact area of the differential element with the surface. For the purposes of this project, µ and p are constant. The Coulomb friction force components acting on each differential element are thus v µ p y − G,x ω q Ff ,x = , vG,x 2 v 2 y− ω + x + G,y ω v −µ p x + G,y ω Ff ,y = q . vG,x 2 v 2 y− ω + x + G,y ω

The force experienced by the center of mass due to this Coulomb friction force is the same as the force experienced

by the differential element. The moment the Coulomb friction force exerts on the center of mass of the body is the force in the x-direction multiplied by -yi j plus the force in the y-direction multiplied by xi j . The forces and moments experienced by the body are thus v ZZ µ p y − G,x ω q Ff ,x = dxdy, v 2 vG,x 2 y− ω + x + G,y ω v ZZ −µ p x + G,y ω q Ff ,y = dxdy, v 2 vG,x 2 y− ω + x + G,y ω v v ZZ µ p − x2 + G,y x − y2 − G,x y ω ω q dxdy. Mf = v 2 vG,y 2 y − G,x + x + ω ω B. Simplified Friction Models and Learning The Coulomb friction model described in the preceding section provides an accurate description of the forces and moment due to planar friction. However, there are two problems associated with the model which makes it impractical for use in our motion planning scheme: 1) The model is computationally intensive because it involves the calculation of three double integrals. Moreover, the result of these computations is only the first step in the overall motion model: the friction forces and moment are “inputs” to the equations of motion, which, for a single object, are six first order differential equations. Due to the unwieldy formulae of the friction forces and moment, an analytic solution to the equations of motion becomes impossible. 2) The model depends on the knowledge of the pressure distribution p(x, y) along the contact surface. Since we assume no knowledge of the friction characteristics, p is unknown, and in general, it may not be a constant over the entire contact surface. Treating the function p as a characteristic of the object to be learnt is also impractical, as it would necessarily involve learning (in principle) an infinite number of parameters to describe the function. In light of these difficulties with the practical implementation of the Coulomb friction model, we experimented with a simple viscous friction approximation. In particular, we assumed the friction force as a damping force (proportional to the translational velocity, opposite in direction), and the moment due to friction as an independent damping moment (proportional to the angular velocity, opposite in direction).

The equations of motion of a single object [7] can then be written as follows: mx¨ = Fx − α1 x, ˙

(2)

my¨ = Fy − α2 y, ˙ I3 θ¨ = (rP − rC ) × F − α2 θ˙ .

(3) (4)

The notation in the preceding equations was as follows: rC = (x, y) θ m I3

= = = =

F rP

= =

α1 , α2

=

position of the C.M. of the object, orientation of the object, mass of the object, moment of inertia about the vertical axis passing through the C.M., external force applied, position vector of the point of application of the force F, damping coefficients.

The equations of motion (2)–(4) are linear, and if the damping coefficients are high enough, the system stabilizes quickly. In other words, we may ignore the transients caused by the accelerations, since we are interested mainly in the gross pushing action. This gross motion is captured by the quasi-static equations 0 0

= Fx − α1 x, ˙ = Fy − α2 y, ˙

0

∗ = RrPC × F − α2 θ˙ ,

∗ represents the body axis coordinates of (r − r ), where rPC P C and R is the rotation matrix: cos θ − sin θ 0 cos θ 0 . R = sin θ (5) 0 0 1

The preceding equations are simple enough to have analytic solutions for certain cases. In particular, if we assume that the external force is always normal to the edge at which it is applied, and if we assume that the point of application of ∗ is constant), the force is constant (in the body axes, i.e., rPC then the equations of motion may be solved as α2 x(t) = x0 + (cos θ (t) − cos θ0 ) , (6) α1 u α2 y(t) = y0 + (sin θ (t) − sin θ0 ) , (7) α1 u |F|ut θ (t) = θ0 + , (8) α2 when F is parallel to the body y-axis. For simple objects such as rectangular boxes, it is easy to write similar solutions to the equations of motion that are valid for every point (of force application) on the edge of the object. Admittedly, the viscous friction model is not an accurate description of planar friction. However, over short enough time intervals, we found the predictions of the viscous friction model accurate enough for our purposes. In addition to the fact that analytic solutions to the equations of motion are available, the viscous friction assumption is convenient

because the object is characterized by the two damping coefficients α1 and α2 . Since we assume no knowledge of the friction characteristics, we assume that α1 and α2 are unknown initially, and we assume some arbitrarily chosen initial guesses αˆ 1 and αˆ 2 for the same. The parameters may then be updated as follows. We apply a force to the object at a particular point for a particular time period T , whilst ensuring that the force is normal to the edge at that point. Let x(T ˆ ), y(T ˆ ), and θˆ (T ) represent the final position and orientation of the object as calculated from the preceding viscous friction model with parameters αˆ 1 and αˆ 2 . Let xT , yT , and θT be the observed final position and orientation. Then the parameters αˆ 1 and αˆ 2 are updated by calculating M3 kˆ 2 = θT − θˆ (T ) + , (9) αˆ2 αˆ 2 u kˆ 1 = xT − x(T ˆ )+ cos θˆ (T ) − cos θ0 × (10) αˆ 1 kˆ 2 , (11) (cos θT − cos θ0 ) |F| |F|u , αˆ 2 = . ˆk1 kˆ 2 If the applied force were perfectly normal to the edge, then the preceding update law will converge to the “correct” values of α1 and α2 in one iteration. However, to ensure that the applied force is normal to the edge, the robot arm end-effector trajectory must be pre-specified. This trajectory, in turn, is obtained from the (uncertain) motion model itself, and hence the applied force will not be perfectly normal to the edge. Consequently, the preceding update laws are applied over several iterations. In our experiments, we found the values of α1 and α2 to converge in about five to six iterations. and then calculating αˆ 1 =

C. Identifying Joints Any relative movement of two objects will have a potential joint, or a location which both of the objects seen to be rotating about. If a joint exists between two objects this location will always be in the same location relative to both objects, however if the objects are unconnected this point will be highly inconsistent. Finding this point of rotation between two frames takes some kinematic analysis. The algorithm presented here assumes that the visionbased identification system can return the initial and final centers of each object and the initial and final rotations of each object. The initial and final positions of object A are G and rG respectively, and the initial and final denoted as rai af G and rG . Similarly, positions of object B are denoted as rbi bf the initial and final orientations of object A are denoted as θaiG and θaGf respectively, and the initial and final rotations of object B are denoted θbiG and θbGf respectively. The algorithm also assumes all of these values are measured from a constant arbitrary reference frame G. Vector notation follows standard G screw theory notation. An example is rx,bi/ai which is the x-component of a vector from ai to bi with respect to the frame G. The location the center of object B from the center

of object A in the camera reference frame can be calculated using G rx,bi/ai G rx,b f /a f G ry,bi/ai G ry,b f /a f

G G = rx,bi/G − rx,ai/G G G = rx,b f /G − rx,a f /G G G − ry,ai/G = ry,bi/G

=

G G ry,b f /G − ry,a f /G

These relative locations can then be transformed into the coordinate system of object a using G G G G G rx,bi/a = cos θa,i · rx,bi/ai + sin θa,i · ry,bi/ai G G G G G ry,bi/a = − sin θa,i · rx,bi/ai + cos θa,i · ry,bi/ai G G G G G rx,b f /a = cos θa, f · rx,b f /a f + sin θa, f · ry,bi/a f G G G G G ry,b f /a = − sin θa, f · rx,b f /a f + cos θa, f · ry,b f /a f The location of the potential joint relative to object can then be calculated using

a rx,Cb/a

a ry,Cb/a

=

a a rx,b f /a + rx,bi/a

=

a a ry,b f /a + ry,bi/a

2

2

−

−

G ry,b f /bi G G θb,i −θb, f 2 · tan 2 G rx,b f /bi G G θb,i −θb, f 2 · tan 2

The potential joint locations between each combination of objects was then stored in a matrix of the form NaN Rax,Cb/a Rax,Cc/a Rax,Cd/a Rb Rbx,Cc/b Rbx,Cd/b x,Ca/b NaN c · NaN Rcx,Cd/c Rx,Ca/c Rcx,Cb/c Rdx,Ca/d Rdx,Cb/d Rdx,Cc/d NaN NaN Ray,Cb/a Ray,Cc/a Ray,Cd/a Rb b b y,Ca/b NaN Ry,Cc/b Ry,Cd/b c c c Ry,Ca/c Ry,Cb/c NaN Ry,Cd/c Rdy,Ca/d Rdy,Cb/d Rdy,Cc/d NaN for each frame change. The first step when replanning is to analyze these potential joint locations to determine whether joints actually exist or not. First, the variance of each location in the matrix was calculated from the values obtained from all of the matrices. If the squared sum of the variances of the x and y values of that location fell below a preset threshold, which could be fairly large, the mean of that location among all of the matrices was calculated. For example, when analyzing the possibility of a joint between body a and body b, the variance of (1,2,1), (1,2,2), (2,1,1), and (2,1,2) would be found across all of the frame changes. The squares of the variances of (1,2,1) and (1,2,2) would be added together and compared to the threshold value and the squares of the variances of (2,1,1) and (2,1,2) would be added together and compared to the threshold value. If the squared sum were lower than

Fig. 6. The green line is desired end-effector trajectory that the trajectory controller enforces. The red curve is a possible path corresponding to a straight line in the joint space.

the threshold, the mean of (1,2,1) would be recorded in the corresponding location in a new matrix. Likewise for (1,2,2), (2,1,1), and (2,1,2). If the threshold were exceeded, NaN is stored in the corresponding location to indicate that no joint exists between the two objects. This new matrix is used as the model of the system and referenced by the planner when creating a new plan to get the objects from the current positions to the goal positions. IV. M OTION P LANNING AND E XECUTION A. RRT-based Planning To develop motion plans for the robot arm, we use the well known RRT algorithm [4]. Each object is represented by its position (of any one point on the object) and its orientation. Thus, the dimension of the overall configuration space is 3N, where N is the number of objects. For each object, the initial configuration is identified from the vision-based identification system, and the goal configuration is explicitly pre-specified. The execution of the RRT algorithm for this case is briefly summarized as follows. At each iteration the RRT algorithm either expands towards the goal (greedy step), or it expands towards a randomly chosen point in the configuration space. At each iteration, we choose one object to move, while all other objects remain stationary. For expanding the tree, we choose a point for pushing this object using the motion model known at that instant. The duration of pushing is kept constant. We store the final estimated position and orientation as the new node in the RRT. The primary challenge during the execution of the plan is that all the planning was performed using an uncertain model of the object’s motion. At each iteration during the execution, we compare the observed and planned configurations and update the parameters for the objects accordingly (as described in Section III-B. At each iteration, the execution algorithm attempts to reach the next configuration in the plan. If the next configuration in the plan is farther away from the current configuration than a certain threshold value, then we replan from the current configuration (using the RRT algorithm). The overall execution is described below. B. End-effector Trajectory Control The movements of the end-effector are produced by a trajectory controller. Using a trajectory controller has two main advantages: 1) It ensures that the arm moves through the expected workspace positions and will not do incalculable curves during its movement.

Fig. 8.

Fig. 7.

Illustrative example of execution of motion plan.

2) It makes it possible to execute pushing over longer durations. The first advantage ensures the security of the program. Without interpolating intermediate workspace positions, the arm could hit the table while trying to move from one position to its target position. This is particularly a serious issue in the problem of pushing objects, since the pushes happen only a few millimeters over the table surface (see Fig. IV-B). The second advantage makes more sophisticated planning possible. The planned trajectories (along which the endeffector pushes objects) do not have to be straight lines, but be curves (since objects rotate, and we wish to push normal to the edges). The trajectory controller ensures that the planner is not restricted to short pushes, i.e., the end effector can follow the actual movement of the box it is pushing. Our trajectory controller expects as input all the points in the workspace that it should visit. These points are specified as the x,y and z coordinates, the rotation of the end effector and the time at which it this point should be visited. Furthermore it expects as input an inverse kinematics function that converts workspace positions to possible joint space positions. The joint space tracking can then be performed using a simple PD controller. C. Overall Plan Execution The execution of the plans was done in a program loop that executed one push in each iteration. Each iteration is described as follows: 1) Move robot arm away from the workspace to allow the overhead camera to identify objects. 2) Choose an object to push at, and calculate best push trajectory. In particular, we choose the object that is farthest away from its desired position. 3) Move the arm to a secure position over the initial point of the push. 4) Execute the push trajectory. 5) Move robot arm away from the workspace to allow the overhead camera to identify objects.

Sample initial and final configurations.

6) Update model parameters. As mentioned before, our plans are not lists of pushes, rather they are lists of configurations that should be visited. Figure 7 depicts a sample execution of the planner. The blue boxes are the boxes the planner wanted to visit. As it did not know the right parameters of the motion model, the result of the first push results in a configuration quite different from the expected result. In the next push it knows the boxes better. At the same time it compensates the error of the last push by calculating the new push trajectory using the actual position and not the planned position. Therefore the position of the box after the second push is much closer to the intended goal than after the first push. V. R ESULTS Using the planner described above, we were successfully able to move multiple unconnected rectangular objects from any initial configuration to a specified goal configuration. No prior knowledge of the dimensions of the objects or their friction characteristics was required. Figure V shows a sample initial and final configuration with three objects that the arm was able to plan for. VI. C ONCLUSIONS AND F UTURE W ORK We developed a simultaneous learning and planning capability for a Schunk robot arm and enabled the arm to plan and execute planar motions of sliding objects. The friction characteristics and the presence of kinematic constraints, if any, were assumed to be unknown. Future work includes extensive testing of the motion models, update laws, and joint detection algorithm for the case of multiple connected objects. R EFERENCES [1] M. T. Mason, “Mechanics and planning of manipulator pushing operations,” International Journal of Robotics Research, vol. 5, no. 3, pp. 53–71, 1986. [2] M. T. Mason and J. K. J. Salisbury, Robot Hands and the Mechanics of Manipulation. MIT Press, Cambridge, 1985. [3] D. Katz and O. Brock, “Manipulating articulated objects with interactive perception,” in Proceedings of the IEEE International Conference on Robotics and Automation, Pasadena, USA, May 2008. [4] S. M. LaValle and J. J. J. Kuffner, “Randomized kinodynamic planning,” International Journal of Robotics Research, vol. 20, no. 4, pp. 378–400, 2001. [5] D. A. Forsyth, Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002. [6] J. Scholz, V. Subramanian, and A. K. Elangovan, “A framework for vision-guided trajectory generation with a 7 dof robot arm,” Dec 2008, documentation for general BibTeX users. [7] J. H. Ginsberg, Advanced Engineering Dynamics, 2nd ed. Cambridge University Press, New York, 1995.