REALISM: Real-Time Hand Gesture Interface for Surgeons and Medical Experts

David Louis M. Achacon Jr., Denise M. Carlos, Maryann Kaye Puyaoan, Christine T. Clarin, Prospero C. Naval, Jr.

Computer Vision & Machine Intelligence Group
Department of Computer Science
College of Engineering
University of the Philippines-Diliman

[email protected]

ABSTRACT

Computer usage in the operating room by surgeons has been limited due to sterility issues that might put patients' safety at risk. When surgeons browse medical images in the operating room, the task is often delegated to an assistant who controls the mouse or keyboard while the surgeon provides instructions. Such interaction, however, is slow, awkward and error-prone. REALISM (Real-Time Hand Gesture Interface for Surgeons and Medical Experts) aims to provide a non-contact human-computer interface through hand gestures. We envision a medical expert in front of a camera performing hand gestures that the computer interprets as commands for navigating images and objects; image navigation is the primary function REALISM intends to provide. The implementation of the REALISM system is divided into three parts: Hand Detection, Hand Gesture Recognition and Hand Tracking. The gesture vocabulary was designed with factors such as ease of use, simplicity and intuitiveness in mind, and it avoids poses that are offensive to certain cultures. The project was developed using OpenCV, a computer vision library originally developed by Intel. Experimental results show that REALISM achieved higher precision and recall in a well-lighted environment than in an unlighted one. The system detected 95.61% - 100% of the hands present within the camera's vision in a well-lighted environment, and 15.98% - 96.88% in a poorly illuminated environment. Although the system detects most of the hands present in the camera's vision, it still misclassifies some objects as hands: it achieved 64.47% - 75.13% precision in a well-lighted environment and 35.24% - 53.34% in a poorly illuminated environment.

Keywords: Hand Gesture Recognition, Hand Gesture Tracking, Human-Computer Interface

1. INTRODUCTION

Nowadays, applications of computer information technology are integrated into hospital and medical procedures. Hospital records are computerized, and imaging tools such as MRI, PET and CT, to name a few, are totally dependent on computer technology. Medical tools that accurately retrieve and analyze data have saved lives. However, the potential of computer information technology in operating rooms and intensive care units is not fully exploited, because computer technology presents some problems that may put patients' safety at risk.

One of the major problems that come with the use of computer devices in hospitals is sanitation, or sterility. It has been found that a common way of spreading infections in sterile places involves computer peripherals such as keyboards and mice. Cross-transmission of disease occurs when health care personnel acquire transient hand carriage during contact with a contaminated keyboard surface. Staphylococci, diphtheroids, micrococcus species and bacillus species are just some of the possible pathogens dwelling on hospital mice and keyboards [1]. With the issue of sterility at hand, the manner in which medical experts interact with computers becomes problematic: medical experts need to remain sterile while operating on their patients, and thus they cannot hold devices such as a mouse or keyboard.

Ease and speed of use also affect the accuracy of the medical expert's analysis. A common practice when surgeons browse medical images in the operating room is to delegate an assistant to control the mouse or keyboard while the surgeon instructs him what to do. Such interaction, however, is slow and awkward. The chances of error for both the surgeon and the assistant are also high, since the surgeon's attention is divided between attending to his patient and giving instructions; miscommunication can also be a problem [2].

Section 2 of this paper discusses existing camera-based human-computer interface research that uses hands and gaze. Section 3 describes REALISM. Section 4 presents the algorithms used in the system's modules, while Section 5 shows the preliminary results of experiments on hand detection and hand recognition.

2. RELATED RESEARCH

By the early 1990s, computer scientists and medical experts had started to find ways to develop a surgeon-computer interface. In FAceMOUSe [1], a surgeon can control the motion of a laparoscope using face gestures. This real-time, image-based system involves neither voice input nor body contact with the surgeon; it simply tracks the surgeon's face gestures, which makes it more convenient for the surgeon to perform the surgery. The effectiveness of FAceMOUSe was tested in a laparoscopic cholecystectomy on a pig.

Hand gestures used as mouse functions also appear in Graetzel et al. [2]. This non-contact mouse for surgeon-computer interaction is specifically designed for minimally invasive surgeries in which the operations performed require computer usage by the surgeon. The system is designed to work with the surgeon's bare hands or with surgical gloves on.

A system that uses gaze for selecting diagnostic images is presented in [9]. The system records a digital portrait of the user's eyes, which is then used to estimate the location of the user's eye gaze on the computer screen. Like FAceMOUSe, this system provides non-contact human-computer interaction: the user simply looks at the appropriate sequence of menu options displayed on the screen.

Figure 1: System Overview

A medical expert initially performs hand gestures in front of a camera. The computer analyzes these gestures by first detecting the hand present within the camera's vision; REALISM requires a clean background to detect hands, and both bare hands and gloved hands can be detected. The detected hand is then mapped to the gesture set defined in the system. Each gesture signifies a command that can be used for navigating medical images; once a gesture has been classified, the computer performs the command associated with it. A recognized hand is tracked so that its motion can be analyzed.

In Gestix [3], surgeons can browse medical images in a dynamic medical environment using a sterile gesture interface. The hand gestures performed by the surgeon act as commands for the navigation and manipulation of images and other data on the computer screen. Factors considered in Gestix are the relative positions of the hand gestures on the screen and the color-motion cues used for tracking the surgeon's hand.

The proponents of the present project aim to provide a means of human-computer interaction in a hospital setting using hand gestures. The group has chosen hand gestures over face gestures and voice because hand movements are more defined and flexible; speech recognition may also be problematic, since operating rooms tend to be noisy with the sound of machines, fans and spoken dialogue. REALISM is a real-time hand gesture interface for surgeons, in which the surgeon performs hand gestures in front of a camera. These gestures are then interpreted as computer commands for controlling and navigating images from MRIs and CTs.

3. THE REALISM PROJECT

REALISM aims to provide a non-contact human-computer interface that medical experts can use for browsing MRI and CT scan images. The project will be useful during medical procedures in which surgeons are restricted from any contact with non-sterile devices.

Figure 2: REALISM Gesture Vocabulary

The gesture vocabulary of REALISM consists of the close fist, open palm, Y posture and L posture. Once the application is opened, the images in a predefined directory are shown in a filmstrip view. To scan the previous and next images in that directory, the surgeon uses the Y posture. Once an image is selected, it can be zoomed in or out using the Open Palm and Close Fist postures respectively. To scroll, the surgeon uses the L posture; the scroll functionality moves the chosen image upwards, downwards, to the left or to the right. The system can also recognize the U posture, which has no corresponding functionality at this moment; the system can therefore be further developed by assigning a new functionality to it.

Although the project is intended for medical use, the group sees its potential to extend to other applications by mapping these hand gestures to the control-key shortcuts of the computer or of a specific application (e.g., open palm to Ctrl+O). It can be used as an API for other applications as well, and the gesture vocabulary can be extended or customized according to the user's needs.
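As a rough illustration of that extension (our sketch, not the REALISM implementation), the gesture-to-command binding could be expressed as a simple lookup table; every name and binding below is hypothetical:

```python
# Illustrative sketch only: a hypothetical mapping from recognized
# gesture labels to commands, as suggested in the text above.
GESTURE_COMMANDS = {
    "open_palm":  "zoom_in",    # Open Palm posture zooms in
    "close_fist": "zoom_out",   # Close Fist posture zooms out
    "y_posture":  "browse",     # Y posture scans previous/next images
    "l_posture":  "scroll",     # L posture moves the image up/down/left/right
    "u_posture":  None,         # recognized but currently unassigned
}

def dispatch(gesture):
    """Return the command bound to a recognized gesture, if any."""
    command = GESTURE_COMMANDS.get(gesture)
    return command if command is not None else "ignored"

print(dispatch("open_palm"))  # -> zoom_in
```

Customizing the vocabulary then amounts to editing this table rather than changing the recognition code.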

4. IMPLEMENTATION

REALISM is divided into three parts: Hand Detection, Hand Gesture Recognition and Hand Tracking.

Figure 3: System Software Architecture

4.1 Hand Detection

The Hand Detection module locates the hand present within the camera's vision. It requires the user to place his hand in front of the camera; only significant hand gestures are detected by the system. The detected hands are then classified into the different hand gestures by the second module, Hand Gesture Recognition.

4.1.1 Haar-like Features and AdaBoost

The hand detection module was implemented using Haar-like features and AdaBoost (see Figure 4 for a detection result). Haar-like features consist of adjacent "black" and "white" rectangles [4] (see Figure 5). The value of a Haar-like feature is the difference between the sums of the pixel gray-level values within the black and white regions:

f(x) = Σ_black rect. (pixel gray) − Σ_white rect. (pixel gray)

A particular feature is said to be present if this difference is greater than the threshold established during the training phase.

Setting the threshold levels in the training phase and selecting the specific Haar features are done using AdaBoost. The AdaBoost (Adaptive Boosting) learning algorithm is a variation of the regular boosting algorithm: it adaptively selects the best feature at each step and combines a series of weak classifiers into a strong classifier. The algorithm has primarily been used in face detection systems, but recent work has applied it to view-specific hand posture detection. AdaBoost starts with a uniform distribution of weights over the training examples; the weights tell the learning algorithm the importance of each example. It then repeatedly obtains a weak classifier from the weak learning algorithm and increases the weights on the training examples that were misclassified [5].

The group used OpenCV for Haar training, with 5000 positive hand samples and 5000 negative images.
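To make the feature value concrete, here is a minimal sketch (our illustration, not the project's code) that evaluates a two-rectangle Haar-like feature using an integral image, so that each rectangle sum costs only four lookups; the rectangle layout at the end is an arbitrary example:

```python
import numpy as np

def integral_image(gray):
    """Cumulative sums so any rectangle sum costs four lookups."""
    return gray.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixel values in the rectangle at (x, y) with size (w, h)."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

def haar_feature(gray, black, white):
    """f(x) = sum over black rectangle - sum over white rectangle."""
    ii = integral_image(gray.astype(np.int64))
    return rect_sum(ii, *black) - rect_sum(ii, *white)

# The feature "fires" when its value exceeds the threshold AdaBoost learned.
img = np.random.randint(0, 256, (24, 24))
value = haar_feature(img, black=(0, 0, 12, 24), white=(12, 0, 12, 24))
```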

4.1.2 Cascade of Classifiers

A cascade of classifiers is a degenerate decision tree in which, at each stage, a classifier is trained to detect almost all objects of interest while rejecting a certain fraction of the non-object patterns [6] (see Figure 6). Each stage was trained using Gentle AdaBoost.

Figure 4: Detected Hand

Figure 6: Cascade of classifiers with N stages. At each stage a classifier is trained to achieve a hit rate of h and a false alarm rate of f.

If at least one stage classifier rejects an image region, that region is immediately discarded as a non-hand. If a stage classifier passes an image region, the region proceeds to the next stage, and so on until the Nth stage, which makes the final decision on whether the region contains a hand.
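In practice, a cascade trained this way can be applied with OpenCV's CascadeClassifier. The sketch below is a minimal, assumed usage example: the cascade file name is a placeholder for whatever XML the training phase produces, and the detectMultiScale parameters are typical values rather than the project's actual settings:

```python
import cv2

# "hand_cascade.xml" is a placeholder for the XML produced by Haar training.
cascade = cv2.CascadeClassifier("hand_cascade.xml")

cap = cv2.VideoCapture(0)  # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Every returned region has survived all N stages of the cascade.
    hands = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    for (x, y, w, h) in hands:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```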

Figure 5: Haar-like Features

4.2 Hand Gesture Recognition

The second module maps the detected hands to the gesture set defined in the system. Open palm, close fist, L, Y and U postures are the hand poses this module aims to classify. The Hand Gesture Recognition module was implemented using Principal Components Analysis (PCA) and the distance-based matching used in Eigenface [7]. The module consists of a training phase and a recognition phase.

Given example hand images for each gesture and an unknown hand image to recognize, recognition works by: (1) computing the "distance" between the new image and each of the example hand images; (2) selecting the example image closest to the unknown; and (3) if that distance is below a threshold, recognizing the unknown hand image as belonging to the corresponding hand gesture, and otherwise classifying it as "unknown".

Figure 7: Recognized Open Palm

4.2.1 Principal Components Analysis

PCA is a dimensionality reduction method used to simplify the analysis of data. It reduces the computation needed to analyze images from the order of the number of pixels in an image (N²) to the order of the number of images in the training set (M). The main idea of principal components analysis is to find the vectors that best account for the distribution of the feature characteristics of images within the entire image space [8]. These vectors define the subspace of the hand images. Images are projected into the PCA subspace before they are recognized; projection assigns each image the subspace location closest to its location in the higher-dimensional space.
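A minimal eigenface-style PCA sketch follows, using NumPy's SVD. It is our illustration of the technique rather than the project's implementation; the image size and component count are arbitrary assumptions:

```python
import numpy as np

def train_pca(images, num_components):
    """Eigenface-style PCA. `images` is an (M, N*N) array of flattened
    grayscale hand images, one row per training image."""
    mean = images.mean(axis=0)
    centered = images - mean
    # SVD of the centered data yields the principal axes directly;
    # the rows of vt are the eigen-hands, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]            # (k, N*N)
    return mean, basis

def project(image, mean, basis):
    """Map one flattened image to its k-dimensional PCA coordinates."""
    return basis @ (image - mean)

# Hypothetical shapes: 100 training images of 64x64 pixels, 20 components.
images = np.random.rand(100, 64 * 64)
mean, basis = train_pca(images, num_components=20)
omega = project(images[0], mean, basis)    # Omega in the paper's notation
```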

4.2.2 Distance Matching

After an unknown image has been projected into the PCA subspace, the closest training example is found by computing the Euclidean distance between the projected test image and each of the hand classes:

ε_k = ||Ω − Ω_k||

where Ω is the projection of the test image and Ω_k is the stored projection of class k. A hand image is classified as belonging to class k when the minimum distance ε_k is lower than a chosen threshold θ; otherwise the hand is classified as "unknown".
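This matching rule amounts to a thresholded nearest-neighbour search, sketched below under assumed names (class_projections, theta) and an arbitrary 20-dimensional subspace:

```python
import numpy as np

def classify(omega, class_projections, theta):
    """Nearest-neighbour matching in PCA space, as described above.
    `class_projections` maps each gesture name to its stored projection
    Omega_k; `theta` is the rejection threshold (an assumed tuning value)."""
    best_class, best_dist = None, np.inf
    for name, omega_k in class_projections.items():
        dist = np.linalg.norm(omega - omega_k)  # epsilon_k = ||Omega - Omega_k||
        if dist < best_dist:
            best_class, best_dist = name, dist
    return best_class if best_dist < theta else "unknown"

# Hypothetical stored projections for the five gestures.
classes = {g: np.random.rand(20) for g in
           ["open_palm", "close_fist", "l_pose", "y_pose", "u_pose"]}
print(classify(np.random.rand(20), classes, theta=2.5))
```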

4.3 Hand Tracking

The hand tracking module tracks and analyzes the motion of the detected hand. It was implemented using the Kalman filter, an efficient recursive data-processing algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements. It can estimate hand velocity and predict future hand positions. The Kalman filter assumes Gaussian distributions for the states and the noise.
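OpenCV provides a KalmanFilter class suited to this task. The sketch below sets up a constant-velocity model for the hand's image position; the state layout and noise covariances are illustrative assumptions, not REALISM's actual tuning:

```python
import cv2
import numpy as np

# Constant-velocity model: state = (x, y, vx, vy), measurement = (x, y).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3     # assumed
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed

def track(detected_center):
    """Predict the next hand position, then fold in the new detection."""
    prediction = kf.predict()                       # prior estimate
    measurement = np.array([[detected_center[0]],
                            [detected_center[1]]], np.float32)
    kf.correct(measurement)                         # posterior update
    return prediction[0, 0], prediction[1, 0]       # predicted (x, y)
```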

5. PRELIMINARY EXPERIMENTS AND RESULTS

5.1 Hand Detection using Cascade Classifiers

To measure the performance of the hand detection and hand recognition modules, the group used precision and recall. Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure of completeness. They are computed with the following formulas:

Recall = TP / (TP + FN)

Precision = TP / (TP + FP)

where TP is the number of true positives, FN the number of false negatives, and FP the number of false positives. In hand detection, detected regions of images that actually contain hands are counted as true positives. A false negative (Type II error) is the failure to observe a difference when in truth there is one; it can be interpreted as an oversight or inadequate sensitivity, and in the hand detection module these are images containing hands that the system fails to find. A false positive (Type I error) is the observation of a difference when in fact there is none; it can be interpreted as a false alarm, and occurs when the system detects a hand that does not actually exist.

Recall, in this context, is the number of hands correctly detected by the system divided by the total number of hands tested (i.e., detected and undetected hands). Precision is the ratio of correctly detected hands to the total number of hands reported by the system (i.e., correctly and wrongly detected hands). Recall measures how sensitive the hand detection module is to the hands present in the scene, while precision measures the accuracy of the module.

The group conducted two experiments using two Haar cascades trained against different backgrounds, namely light and dark. Table 1 shows the precision and recall measured in a well-lighted and an unlighted place, with the user wearing gloves and without, against a light background. Three persons participated in this experiment.
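As a worked example of these formulas, the sketch below computes both measures; the raw counts are hypothetical (the paper reports only percentages) but are chosen to reproduce the lighted-with-gloves row of Table 1:

```python
def precision(tp, fp):
    """Fraction of reported detections that are real hands."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual hands that the detector found."""
    return tp / (tp + fn)

# Hypothetical counts for illustration only: 98 correct detections,
# 54 false alarms, 0 missed hands.
tp, fp, fn = 98, 54, 0
print(f"precision = {precision(tp, fp):.2%}")  # ~64.47%
print(f"recall    = {recall(tp, fn):.2%}")     # 100.00%
```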

Figure 8: Different Environments with a Light Background: lighted w/o gloves, lighted w/ gloves, unlighted w/o gloves and unlighted w/ gloves

Table 1: Precision and Recall in a Light Background

Environment             Precision   Recall
lighted w/ gloves       64.47%      100%
lighted w/o gloves      68.82%      100%
unlighted w/ gloves     35.24%      15.98%
unlighted w/o gloves    47.83%      56.15%

The hand detection module achieved higher precision and recall in the well-lighted environment than in the dark one. In a lighted place with the user wearing gloves, a recall of 100% was achieved but only 64.47% precision was obtained. The high recall means that the system was able to detect most of the hands present in the area; the precision tells us that the system still misclassifies some objects as hands (false positives). When the user is not wearing gloves, the precision is 68.82% and the recall 100%. A relatively higher precision was achieved in the lighted, gloveless scenario because more bare hands than gloved hands were used in training.

In the dark environment, the system achieved lower precision and recall. Though a hospital setting usually has good lighting conditions, the group still considers this an area for improvement. Further training of the system is needed to achieve better results: the system should be trained under different lighting conditions and backgrounds, and more negative samples should be added so that the system can discriminate hands more aptly.

Table 2 shows the precision and recall measured in a well-lighted and an unlighted place, with the user wearing gloves and without, against a dark (black) background. Three persons participated in this experiment.

Table 2: Precision and Recall in a Dark (Black) Background

Environment             Precision   Recall
lighted w/ gloves       75.13%      95.61%
lighted w/o gloves      64.52%      96.25%
unlighted w/ gloves     53.34%      93.44%
unlighted w/o gloves    52.02%      96.88%

Figure 9: Different Environments with a Dark (Black) Background: lighted w/o gloves, lighted w/ gloves, unlighted w/o gloves and unlighted w/ gloves

In this second experiment, a precision of 75.13% and a recall of 95.61% were achieved in a well-lighted room while the users were wearing gloves, and a precision of 64.52% and a recall of 96.25% in the same environment with bare hands. Higher precision and recall were obtained in the unlighted environment in this experiment (black background) than in the first (light background). This is because the Haar training was done against a black background, making it easier to detect hands in an unlighted environment than with the Haar cascade trained against a light background.

5.2 Hand Recognition using PCA and Nearest Distance Matching

The performance of hand recognition was measured by precision, which determines how accurately the system can classify the detected hand images into the five gestures (open, close, Y, U and L). 100 images were used to test each hand gesture; all of these images had also been used to train the hand recognition module. The module was tested against a light background, in a well-lighted environment, with the user's bare hands. Table 3 shows the result of the experiment:

Table 3: Precision of Hand Recognition

Hand Gesture   Precision
open palm      100%
close fist     100%
l pose         100%
y pose         100%
u pose         100%


A precision of 100% was obtained for all hand gestures, mainly because the computed eigenvector projections were identical to the stored projections trained for classifying hand images into their respective classes. This experiment shows the consistency of the hand recognition module on its training data. Another set of images was then tested under the same conditions (light background, well-lighted environment, user's bare hands): 50 images were used to test the close fist and 37 images the open palm.

Table 4: Precision of Hand Recognition (Second Test Set)

Hand Gesture   Precision
open palm      94.59%
close fist     78%
l pose         70%
y pose         72%

A precision of 94.59% was obtained for the open palm. Lower precision was achieved for the close fist, L and Y poses because of the resemblance among their shapes.

6. CONCLUSION

In this paper we have described our project, REALISM. We identified the modules of the system and the gesture vocabulary it uses. Performance measurements were also conducted for the hand detection module: preliminary results show that the system detected 95.61% - 100% of the hands present within the camera's vision in a well-lighted environment, and 15.98% - 96.88% in an unlighted environment. Although the system detects most of the hands present in the camera's vision, it still misclassifies some objects as hands; it achieved 64.47% - 75.13% precision in a well-lighted environment and 35.24% - 53.34% in a poorly illuminated one.

Hand recognition experiments yielded 70% - 94.59% precision for four hand gestures (open, close, Y and L postures). The open palm achieved the highest precision, 94.59%; the close, Y and L postures obtained 78%, 72% and 70% respectively. These postures have similarities in shape and structure, giving them lower precision values within a close interval of each other.

The group also notes that the concepts of this project can be applied to other software. The system can be made less application-specific by associating the gesture set with the shortcut keys of a particular application. In this manner, the concepts of REALISM can serve many applications rather than being specific to this medical software alone.

7. REFERENCES

[1] W.A. Rutala, M.S. White, M.F. Gergen and D.J. Weber. Bacterial Contamination of Keyboards: Efficacy and Functional Impact of Disinfectants. Infection Control and Hospital Epidemiology, vol. 27, no. 4, April 2006.
[2] C. Graetzel, T.W. Fong, S. Grange and C. Baur. A Non-Contact Mouse for Surgeon-Computer Interaction. Technology and Health Care, vol. 12, no. 3, 2004, pp. 245-257.
[3] J. Wachs, H. Stern, Y. Edan, M. Gillam, C. Feied, M. Smith and J. Handler. Gestix: A Doctor-Computer Sterile Gesture Interface for Dynamic Environments. Soft Computing in Industrial Applications, vol. 39, Springer Berlin/Heidelberg, 2007, pp. 30-39.
[4] Q. Chen. Hand Detection with a Cascade of Boosted Classifiers Using Haar-like Features. Discover Lab, SITE, University of Ottawa.
[5] P. Viola and M.J. Jones. Robust Real-Time Object Detection. Cambridge Research Laboratory Technical Report Series, CRL 2001/01, 2001, pp. 1-24.
[6] P. Viola and M.J. Jones. Rapid Object Detection Using a Boosted Cascade of Simple Features. IEEE CVPR, 2001.
[7] M. Turk and A. Pentland. Face Recognition Using Eigenfaces. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1991.
[8] R. Hewitt. Seeing With OpenCV: Face Recognition Using Eigenface. Servo Magazine, April 2007, pp. 36-39.
[9] T.E. Hutchinson, K.P. White Jr., W.N. Martin, K.C. Reichert and L.A. Frey. Human-Computer Interaction Using Eye-Gaze Input. Vol. 19, no. 6, November-December 1989.
