Surgeon pose detection with a depth camera during laparoscopic surgeries

K. Buys (1), D. Van Deun (2), B. Van Cleynenbreugel (3), T. Tuytelaars (4), and J. De Schutter (1)

(1) Dep. of Mechanical Engineering, Robotics research group, KU Leuven, Belgium
(2) Dep. of Mechanical Engineering, Biomechanics section, KU Leuven, Belgium
(3) University Hospital Leuven, KU Leuven, Belgium
(4) Dep. of Electrical Engineering, KU Leuven, Belgium
Abstract

This paper presents a method to measure the surgeon's body pose during a laparoscopic operation. The method handles the extreme situation of a darkened operation room in which people are all similarly clothed. This is achieved by enhancing the existing random decision forest approach with a new energy function in the region growing step, together with a switching behavior between the old energy function and the new one. A proof-of-concept experiment provided successful early results.

Keywords: Posture and motion, Ergonomics, RGB-D data
1 Introduction

During laparoscopic surgeries the surgeon enters the patient's abdomen through a number of small incisions. The surgeon can only see inside the patient by means of a video camera that enters the abdomen, while its images are shown on a large screen above the patient and in front of the surgeon. In order to reduce fatigue and to keep focus on the screen for multiple hours, surgeries often take place in complete darkness. In addition, darkness is needed to have a correct color balance on the screen so that correct color-based diagnoses can be made. Darkness also reduces other stimuli around the screen that could cause distraction.

We present a method to measure the surgeon's (and other staff's) body pose during a laparoscopic operation. This can be useful for different applications. One application is to evaluate the physical and cognitive ergonomics of all present personnel during long operations. As
the instruments are at a fixed position around the trocar, it is mainly the surgeon who has to maintain a fixed pose for multiple hours, which can be a burdensome task. Another application is to recognize the surgeon's gestures and to ensure safety in a dark environment where multiple people are walking around in a small room with sharp instruments such as scalpels and needles. Recognizing the surgeon's gestures can be used to detect the phases of the surgery and, in the future, to aid assistive technologies, as in .

The first challenge is the complete darkness, which means that typical RGB cameras provide no valuable information and we can only rely on depth information (taken from a Primesense RGB-D camera). Even when the backlight of the computer screen provides sufficient illumination, the operation room is occupied with people and instruments all surrounded by the same blue or green cloth, making segmentation a difficult problem.
Corresponding author: Koen Buys Email: buys dot koen (at) gmail dot com
A second challenge is related to the interaction between the nurses and the surgeon. Given the space constraints, this results in multiple people who are continuously visible in the images.

Figure 1: The operation room lighted during the operation preparations, illustrating the close proximity in which people work and the similar clothing. The camera is positioned under the screen on which the surgeon sees the laparoscopic camera images.

2 Related work

The presented method is based on prior work by Shotton et al.  and Buys et al. . Shotton et al. proposed using a random decision forest (RDF) [5, 6] to assign body part labels to pixels. Their application uses a background subtraction step as the initial step in the pipeline, so that after the RDF labeling a simple mean shift  suffices to get estimates of the joint locations. Their approach is based on simple depth comparisons as features. These depth comparisons are easy to calculate, which allows online computation. Buys et al. extended the work of Shotton et al. by integrating a more complex skeleton model, which makes the background subtraction superfluous and allows the algorithm to be used with a moving camera in highly cluttered environments.

Even though research dates back to 1700-1713, evaluating the ergonomics during surgery is still a very active research field . Often a virtual reality simulation is used to optimize the architectural design of the operation room [9, 10]. A number of simulation environments are available, such as the LapSim, MIST-VR and Xitact 500LS. To evaluate the surgeon's body pose during an operation, a visual evaluation [11, 12] is often used, in which the evaluator takes notes on a document that is preformatted according to evaluation guidelines (such as OCRA (ISO 11228-3 and EN 1005-5) and the KIM tool ). Alternatively, the evaluator can rely on newer technologies such as optical motion capture systems [14, 8], electromagnetic tracking [15, 16] or orientation sensors [17, 18]. However, these systems are often intrusive and expensive; due to their intrusive nature they are more suitable for training and testing purposes than for measuring during a real operation. Moreover, most methods focus on the surgeon and do not allow for easy evaluation of the ergonomics of the surgical assistants  with the same dataset. In contrast, our presented implementation can evaluate the body pose of the assistants as well using post processing, with minimal configuration adjustments.

3 System overview
The system is based on a random decision forest as described in Buys et al.  and explained in subsection 3.1. As input to the algorithm an RGB-D image is acquired by a Primesense-based camera . The output contains the 3D locations of the human body parts according to a predefined kinematic model.

Initially, all pixels are labeled by the RDF as being either a specific body part or background. This labeling step produces very noisy labels. Therefore, all pixels are smoothed and then clustered into body part proposals. From these part proposals a candidate set is filtered out using the part statistics: the eigenvectors and eigenvalues of the corresponding point cloud, the number of pixels, and the real-life size. In the resulting set of candidate parts a search for feasible kinematic trees is conducted. This step results in one or more people detections. These detections are still noisy and often have missing body parts or poorly located ones.

In the implementation proposed in  an appearance model is estimated and a segmentation refinement step is executed. For this implementation each initial noisy estimate is used as a seed for color and depth-based region growing to retrieve a better segmentation, to recover missing body parts, and to improve the locations of the existing parts.

Figure 2: Color coded depth image; red means close to the camera, green further away. Illustrating the close proximity in which four people work.

Figure 3: In a random decision forest each tree takes binary decisions until it reaches a leaf node where a vote is cast; the votes from multiple trees are combined into a final vote.

Due to the extreme conditions of the operation room, such as the complete darkness and the identically colored clothing of patient, surgeon and nurses, the color and depth-based energy function of the region growing algorithm fails to provide adequate results. For this reason a new energy function was defined that bases itself only on the depth data (as seen in figure 2) and uses the RGB information only in regions where adequate color information is present. This energy function is further discussed in subsection 3.2.

For each person-specific segmentation that is output by the region growing algorithm, the RDF-based pixel labeling and the kinematic tree search are executed again. The resulting output is a more robust and accurate human body pose estimate, which is discussed in section 4.
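The part-statistics filter described above can be sketched as follows. This is a minimal illustration: the function names and the thresholds (minimum pixel count, maximum part extent) are assumptions for the sketch, not the values used in the paper.

```python
import numpy as np

def part_statistics(points):
    """Statistics of one body part proposal.

    points: (N, 3) array of 3D points belonging to the proposal.
    Returns the eigenvalues/eigenvectors of the point cloud's covariance,
    the number of supporting pixels, and the real-life extent per axis.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    extent = points.max(axis=0) - points.min(axis=0)  # bounding-box size (m)
    return eigvals, eigvecs, points.shape[0], extent

def is_plausible_part(points, min_pixels=50, max_extent=1.0):
    """Illustrative plausibility check: enough pixel support and a
    human-scale physical extent (thresholds are assumptions)."""
    _, _, n_pixels, extent = part_statistics(points)
    return n_pixels >= min_pixels and float(extent.max()) <= max_extent
```

In the actual pipeline a check of this kind would be applied to every clustered part proposal before the kinematic tree search.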
3.1 Random decision forest
A random decision forest is a machine learning approach used for classification. It combines the output of multiple decision trees into a final classification, as illustrated in figure 3. A decision tree is a classifier that follows a predefined number of binary evaluations (at nodes on different levels) in order to arrive at a leaf node which contains the classification. Training is data driven: a large amount of annotated data is put through a random set of binary evaluations, and for each candidate evaluation the information gain [20, 21] is computed in order to select the evaluation with the highest information gain. In our case the RDF consists of three decision trees, each 20 levels deep. More details can be found in , and the generation of the training data is described in .
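The depth-comparison feature of Shotton et al. and the combination of per-tree votes can be sketched as below. This is a minimal illustration, not the trained forest from the paper; the boundary handling and the majority-vote combination are assumptions of the sketch.

```python
import numpy as np

def depth_feature(depth, x, u, v, background=np.inf):
    """Depth-comparison feature: the difference between two probe pixels
    whose offsets u and v are normalized by the depth at x, which makes
    the feature approximately depth invariant."""
    d = depth[x]
    def probe(offset):
        px = (x[0] + int(offset[0] / d), x[1] + int(offset[1] / d))
        if 0 <= px[0] < depth.shape[0] and 0 <= px[1] < depth.shape[1]:
            return depth[px]
        return background  # probes outside the image read as background
    return probe(u) - probe(v)

def forest_vote(trees, depth, x):
    """Combine per-tree leaf votes into a final body-part label by majority.
    Each 'tree' here is a callable returning a label -- an illustrative
    stand-in for a trained 20-level decision tree."""
    votes = [tree(depth, x) for tree in trees]
    return max(set(votes), key=votes.count)
```

At every internal tree node one such feature is compared against a learned threshold to decide which child branch to follow.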
3.2 Region growing

The applied region growing algorithm is based on graph theory. In this algorithm the nodes are the pixels and the edges are the interconnections in an 8-neighborhood around a center pixel. An energy function evaluates each edge and assigns it a weight factor; all edges are then evaluated against a threshold in order to make a background/foreground segmentation.

In the original implementation  this energy function used the offset of the hue value in HSV color space and the Euclidean L2 distance between the nodes in the graph in order to accept or decline an edge. However, when the color information is not available (due to similar clothing or poor lighting conditions), this reduces to region growing based only on Euclidean L2 distances, and the result would be a plain Euclidean clustering. Due to the close proximity of people, this does not result in the desired behavior.

The newly proposed approach is to apply color-based region growing only in regions where color information is available. This is evaluated based on the saturation and value fields in HSV color space. When the color information is not adequate, an energy function based on point normals  (Figure 4) and Euclidean L2 distances is used. This approach can be interpreted as evaluating the local smoothness of the surface and results in a 3D edge detector behavior. In this fashion people can be separated even when standing in close proximity. On graph edges where one node has insufficient color information, the algorithm falls back on the normal information; from the next step in the breadth-first search it uses the color information again.
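The switching behavior between the color-based and the normal-based energy function can be sketched as an edge-acceptance test. All thresholds and data layouts below are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

SAT_MIN, VAL_MIN = 0.2, 0.2        # assumed thresholds for "adequate color"
HUE_MAX, DIST_MAX, NORMAL_MIN = 15.0, 0.05, 0.95  # assumed edge thresholds

def color_adequate(hsv):
    """Color counts as usable only if saturation and value are high enough;
    in a dark operation room both collapse toward zero."""
    return hsv[1] > SAT_MIN and hsv[2] > VAL_MIN

def accept_edge(p, q):
    """Decide whether the graph edge between neighboring pixels p and q is
    kept.  p, q: dicts with 'xyz' (3D point), 'normal' (unit normal), 'hsv'.
    Falls back from the hue cue to local surface smoothness when either
    endpoint lacks usable color, mimicking the switching behavior in the
    text."""
    dist = np.linalg.norm(p['xyz'] - q['xyz'])
    if dist > DIST_MAX:                       # Euclidean L2 gate in both modes
        return False
    if color_adequate(p['hsv']) and color_adequate(q['hsv']):
        hue_diff = abs(p['hsv'][0] - q['hsv'][0])
        return min(hue_diff, 360.0 - hue_diff) < HUE_MAX
    # Normal-based energy: accept only a locally smooth surface, which
    # behaves like a 3D edge detector between bodies standing close together.
    return float(np.dot(p['normal'], q['normal'])) > NORMAL_MIN
```

During the breadth-first region growing this test would be evaluated for every edge leaving the current frontier.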
As multiple people are continuously working in close proximity around the surgeon during an operation, we added a heuristic rule to facilitate the identification of the correct person. This was achieved by defining a region of interest in 3D that reflects the closest point to the trocar, next to the patient.

Figure 5: The lighting conditions during the surgery; only some illumination from an adjacent room is visible.
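The heuristic region-of-interest rule can be sketched as follows; the function name, the ROI radius and the centroid criterion are illustrative assumptions.

```python
import numpy as np

def select_surgeon(detections, trocar_xyz, roi_radius=0.8):
    """Pick, among all detected people, the one whose centroid lies closest
    to a 3D region of interest around the trocar (radius in meters is an
    assumed value).  Returns None if nobody enters the region."""
    best, best_dist = None, roi_radius
    for det in detections:                    # det: (N, 3) array of 3D points
        dist = float(np.linalg.norm(det.mean(axis=0) - trocar_xyz))
        if dist < best_dist:
            best, best_dist = det, dist
    return best
```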
4 Results

As a proof-of-concept experiment a surgeon was recorded during a 4-hour laparoscopic surgery. During the surgery around 12 people were continuously present in the room, of which at least four were continuously visible in the camera image (Fig. 2). The preparations were also recorded (as seen in Figure 1) but were not used for the evaluation of the newly proposed algorithm. The lighting conditions were extremely poor: only in the distant background was light from a nearby room visible, as illustrated in figure 5.

The original implementation  was used as a reference (shown in figure 6). The energy function continuously relied on the point normal information to do the region growing, as explained in section 3.2. This was the desired test behavior, as the color-based energy function was already evaluated in prior work. When the energy function was changed to use both point normals and depth information, we obtained the result shown in figure 7.

Figure 4: The point normals used in the energy function in the X (fig. 4(a)), Y (fig. 4(b)) and Z (fig. 4(c)) directions. Mainly in the Z direction the edges and smooth regions become directly apparent.

Figure 6: The original segmentation, based on a color and depth-based energy function. Although it is able to segment a (wrong) person correctly, the results are still noisy and body parts remain missing.

Figure 7: The new segmentation, with a point normal and depth-based energy function. The results remain noisy; a big part of the surgeon is already found correctly, but the algorithm still segments out two people.

Figure 8: The final surgeon segmentation; the different colors indicate different body segments.

When we combine the new normal-based energy function with the color-based energy function in regions where the color is adequately available, together with prior knowledge of the surgeon's place at the operation table, the result can be seen in figure 8. Finally, a correct segmentation rate of 79% was achieved.

The code presented in this paper is part of the Point Cloud Library  and can be accessed freely as open source under the BSD license at http://www.pointclouds.org. More information on how to use the algorithm will become available on http://people.mech.kuleuven.be/~kbuys/. Due to patient confidentiality the acquired test data can not be made public.

5 Conclusion

This paper presents early results of human pose detection in extreme situations. The presented algorithm was successful in detecting a surgeon's body pose during a laparoscopic operation in complete darkness. The presented implementation can be used for automated ergonomic evaluation of the surgeon's body pose, augmented feedback, automated user interfaces and control, or other applications in the operating room. The proposed extension to the existing algorithm allows for future applications in other fields. Future work includes a more quantitative evaluation as well as a complete GUI application in the surgical field. The authors are aware of the drawbacks of using heuristic rules in this early implementation; a probabilistic version of the algorithm using person tracking is being developed.
Acknowledgements

The authors would like to acknowledge Nvidia for their financial contributions and technical support for this project. Koen Buys is funded by KU Leuven's Concerted Research Action GOA/2010/011 "Global real-time optimal control of autonomous robots and mechatronic systems", a PCL-Nvidia Code Sprint grant, and an Amazon Web Services education and research grant. This work was partially performed during an intern stay at Willow Garage.

References

G. Lee, T. Lee, D. Dexter, R. Klein, and A. Park, "Methodological infrastructure in surgical ergonomics: a review of tasks, models, and measurement systems," Surgical Innovation, vol. 14, no. 3, pp. 153-167, 2007.

K. H. Goodell, C. G. Cao, and S. D. Schwaitzberg, "Effects of cognitive distraction on performance of laparoscopic surgical tasks," Journal of Laparoendoscopic & Advanced Surgical Techniques, vol. 16, no. 2, pp. 94-98, 2006.

W. D. Smith and R. Berguer, "A simple virtual instrument to monitor surgeons' workload while they perform minimally invasive surgery tasks," Studies in Health Technology and Informatics, vol. 98, p. 363, 2004.

I. Famaey, K. Buys, D. V. Deun, T. D. Laet, J. V. Sloten, and J. D. Schutter, "Hand gesture recognition for surgical control based on a depth camera," in Proceedings of the International Digital Human Modeling Conference, 2013.

P. Joice, G. B. Hanna, and A. Cuschieri, "Ergonomic evaluation of laparoscopic bowel suturing," American Journal of Surgery, vol. 176, no. 4, p. 373, 1998.

Primesense. primesense.com, 2010.

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from a single depth image," in CVPR, 2011.

P. Joice, G. Hanna, and A. Cuschieri, "Errors enacted during endoscopic surgery - a human reliability analysis," Applied Ergonomics, vol. 29, no. 6, pp. 409-414, 1998.

K. Buys, C. Cagniart, A. Baksheev, T. D. Laet, J. D. Schutter, and C. Pantofaru, "An adaptable system for RGB-D based human body detection and pose estimation," Journal of Visual Communication and Image Representation, 2013.

A. Klussmann, U. Steinberg, F. Liebers, H. Gebhardt, and M. Rieger, "The key indicator method for manual handling operations (KIM-MHO): evaluation of a new method for the assessment of working conditions within a cross-sectional study," BMC Musculoskeletal Disorders, vol. 11, no. 1, p. 272, 2010.

L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

G. Lee, S. M. Kavic, I. M. George, and A. E. Park, "Postural instability does not necessarily correlate to poor performance: case in point," Surgical Endoscopy, vol. 21, no. 3, pp. 471-474, 2007.

G. Rogez, J. Rihan, S. Ramalingam, C. Orrite, and P. H. Torr, "Randomized trees for human pose detection," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pp. 1-8, IEEE, 2008.

S. Smith, J. Torkington, T. Brown, N. Taffinder, and A. Darzi, "Motion analysis," Surgical Endoscopy, vol. 16, no. 4, pp. 640-645, 2002.

D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 603-619, 2002.

R. Aggarwal, A. Dosis, F. Bello, and A. Darzi, "Motion tracking systems for assessment of surgical skill," Surgical Endoscopy, vol. 21, no. 2, pp. 339-339, 2007.

R. Berquer, W. Smith, and S. Davis, "An ergonomic study of the optimum operating table height for laparoscopic surgery," Surgical Endoscopy and Other Interventional Techniques, vol. 16, no. 3, pp. 416-421, 2002.

G. Kondraske, E. Hamilton, D. Scott, C. Fischer, S. Tesfay, R. Taneja, R. Brown, and D. Jones, "Surgeon workload and motion efficiency with robot and human laparoscopic camera control," Surgical Endoscopy and Other Interventional Techniques, vol. 16, no. 11, pp. 1523-1527, 2002.

G. Lee, T. Lee, D. Dexter, C. Godinez, N. Meenaghan, R. Catania, and A. Park, "Ergonomic risk associated with assisting in minimally invasive surgery," Surgical Endoscopy, vol. 23, no. 1, pp. 182-188, 2009.

S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.

S. Kullback, Information Theory and Statistics. Courier Dover Publications, 1968.

K. Buys, J. Hauquier, C. Cagniart, T. Tuytelaars, and J. D. Schutter, "Virtual data generation based on a human model for machine learning applications," in Proceedings of the International Digital Human Modeling Conference, 2013.

N. J. Mitra and A. Nguyen, "Estimating surface normals in noisy point cloud data," in Proceedings of the Nineteenth Annual Symposium on Computational Geometry, pp. 322-328, ACM, 2003.

R. B. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 1-4, IEEE, 2011.