
Video Surveillance for Biometrics: Long-Range Multi-Biometric System

Faisal Bashir, David Usher, Pablo Casaverde, Marc Friedman
Retica Systems, Inc., 201 Jones Road, Waltham, MA 02451
{fbashir, dusher, pcasaverde, mfriedman}@retica.com

Abstract

The human iris is hypothesized to be the best biometric characteristic in terms of uniqueness and robustness. Iris recognition algorithms developed over the last decade have matured significantly to address population-level cross comparisons. Yet iris acquisition systems remain borderline intrusive and unfriendly for subjects and operators. This paper addresses the issue of strictly constrained iris acquisition in traditional systems. We highlight the observation that all traditional iris recognition systems impose substantial constraints on subject position and motion during iris acquisition. We further observe that efforts to relax these constraints for iris acquisition of distant and/or moving subjects fall short of scalable system design. We present a novel iris recognition system for long-range human identification. The system is capable of acquiring face and iris images from multiple humans present anywhere in the capture volume. The iris acquisition system uses multiple cameras with hierarchically-ordered fields of view, a highly precise pan-tilt unit (PTU) and a long focal length zoom lens. The system is driven by innovative algorithms that perform wide-area video surveillance, object detection and tracking, and precision pointing. Experimental results are reported in an indoor environment for multiple-subject iris recognition at a distance. Eagle-Eyes is a long-range multi-biometric system that improves on existing iris acquisition approaches in terms of stand-off distance and capture volume through the use of collaborative scene and face tracking.

Figure 1: This figure compares our system in terms of stand-off distance and capture volume with others in recent literature.

1. Introduction

Recognition of humans using unique biometric characteristics is being applied to numerous applications including homeland security, access control, and many user-specific services. Humans possess multiple biometric characteristics which have been used for recognition. A criticism of biometric technologies has been that biometric acquisition procedures are often cumbersome and have been perceived as intrusive. As a result, less constrained acquisition has become an active area of research and development. In the context of intelligent surveillance, the most relevant questions are: 'who are the people in the space' (identity tracking) and 'where are the people in the scene' (location tracking) [11]. However, Hampapur et al. [11] observe that, at present, the technologies for identity tracking (through biometrics) and location tracking (through intelligent video surveillance) are evolving in isolation. If a biometric system that successfully couples high matching performance with a less constrained acquisition process can be married with video surveillance technologies, the applications would be powerful and numerous. Moreover, we believe that the design of a biometric system that is scalable in terms of acquisition constraints cannot be realized without large-area situational awareness through video surveillance. Iris biometrics has attracted a great deal of research and development effort in recent years and has been shown to be one of the most accurate biometrics currently available [1],[3]. The iris biometric, however, has not yet become as ubiquitous as face and fingerprints.



The face biometric trait has the advantage of being generally in plain view and therefore lends itself to less constrained acquisition. Fingerprints have offered a higher-accuracy solution but require contact (or near contact) with fingerprint sensors. The fingerprint trait is therefore not scalable in terms of acquisition parameters such as stand-off distance and capture volume. The iris biometric potentially offers matching accuracy exceeding that of fingerprints while sharing some of the potential advantages of face. As with the face, the iris is generally in plain view and therefore, theoretically at least, can be acquired given line-of-sight with a capture system. However, the dimensions of the iris are such that iris capture at a distance imposes more significant design challenges than face acquisition at a distance. Challenges associated with iris acquisition systems stem largely from two requirements: (1) active near-infrared (NIR) illumination and (2) resolution (pixel and spatial). Firstly, standards dictate that NIR illumination be used, as it has been found to emphasize iris texture while providing contrast between the iris and both the sclera and the pupil. Secondly, the dimensions of the iris are such that iris recognition technologies require significant pixel and spatial resolution in order to encapsulate enough biometric data. The iris image data interchange standard [18] sets a lower limit of 100 pixels across the iris region and a lower spatial resolution limit of two line pairs per mm at 60% contrast or higher.
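To make the resolution requirement concrete, the following back-of-the-envelope calculation (assuming a nominal iris diameter of roughly 12 mm, a figure not stated in the excerpt above) relates the 100-pixel floor to object-space sampling:

$$
\frac{100\ \text{px}}{12\ \text{mm}} \approx 8.3\ \text{px/mm}
\quad\Rightarrow\quad
f_{\mathrm{Nyquist}} \approx \frac{8.3}{2} \approx 4.2\ \text{lp/mm}.
$$

Under this assumption the 100-pixel count is the stricter sampling constraint, and the 2 lp/mm at 60% contrast figure acts primarily as a requirement on optical quality (focus and MTF) rather than on pixel density.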


Traditional iris acquisition systems have met these requirements by imposing significant constraints on subjects. Limitations imposed on system parameters such as stand-off distance, capture volume, and subject motion account for some of the deficiencies of existing iris recognition systems [6]. Relaxing these constraints comes at the price of more demanding system design. Some new approaches in the literature have addressed these constraints in specific scenarios, e.g. by providing a portal-type gate to pass through, or a glance-and-go type system design [8][10]. These types of systems provide larger stand-off distances (approximately 1-3 meters) and capture volumes (approximately 0.2 m x 0.1 m x 0.2 m), but are not scalable in terms of distance. This paper presents a novel prototype system to address the issues of stand-off distance, capture volume and subject motion in a scalable design. Eagle-Eyes is the first known multi-biometric acquisition system that demonstrates scalable dual-eye iris recognition and face acquisition at a large stand-off distance (3-6 meters) and a large capture volume (3 x 2 x 3 m³) (Figure 1). This is achieved by a multi-level collaborative tracking framework that processes image data at human, face, and iris levels. This paper is organized as follows: Section 2 presents relevant work in iris recognition and video surveillance. The architecture of Eagle-Eyes is presented in Section 3. Sections 4, 5 and 6 present the processing in the three hierarchical fields of view, namely scene, face and iris processing. System performance analysis is presented in Section 7. Conclusions are presented in Section 8.

2. Related Work

The uniqueness of iris patterns has fueled the development of robust pattern recognition algorithms for biometric identification. Details of the preeminent iris algorithm can be found in [3]. In the recent literature, multi-biometric approaches have proven to outperform individual biometrics. This approach is especially helpful when one biometric fails to be acquired or matched. Using multiple biometric traits can alleviate several practical problems including noisy data, non-universality and spoof attacks [4]. Fusion of iris biometrics with retina [5] and face has been reported in the literature to improve recognition performance. The feasibility of iris recognition at larger distances was studied in [7], which demonstrated iris recognition at a fixed stand-off distance of 10 meters using a 20 cm diameter telescope. A collimated LED illuminator was used at the fixed point where the subject's head was stably placed on a chin rest and a forehead stabilizer. Although the recognition rates from this work suggested that iris recognition at larger distances is feasible, the need to relax constraints on the subject during acquisition became ever more obvious. Recognizing this need, Matey et al. [8] presented an iris recognition system for moving subjects passing through a portal-type structure. The system is designed around an image acquisition subsystem using high-resolution cameras, video-synchronized strobed illumination, and specularity-based iris segmentation. The system captures both iris images of subjects moving through a narrow portal with a 20x20x10 cm³ capture volume at a 3 m stand-off distance. Multiple fixed NIR LED illuminators are used to project a fixed pattern of specular reflections. These reflection spots help in the coarse localization of eyes in the image, but introduce extra noise inside the encoded iris region. Although the system reported in [8] relaxes the constraints on the positioning of the subject's eyes, it does so by adding multiple high-resolution NIR cameras (4 megapixels per camera). The major disadvantage of this approach is its lack of scalability: increasing the capture volume laterally requires the addition of more high-resolution NIR iris cameras to tile the field of view, which rapidly becomes cost prohibitive. Another approach to relaxing the constraints on the subject's location and movement during iris image capture is reported in [9].


Their two-camera system combines a high-resolution still-image camera (6 megapixels) with a wide field-of-view video camera, using a pan-tilt unit (PTU). The wide field-of-view camera scans the capture volume, detects the subject's face and coarsely estimates the distance between the subject's face and the camera. The PTU is moved to the subject's estimated location, where the high-resolution iris camera takes a still image of the subject's face. The two eye boxes are detected for further iris segmentation and processing. Their approach represents one major step towards multi-scale analysis for less-constrained iris image acquisition. The approach is, however, still not scalable for high-volume environments; their facial-points-based distance estimation is highly inaccurate and only deals with a single stationary subject. The small capture volume of [8] and the inaccurate depth estimation of [9] are addressed by Yoon et al. in [10]. Their system uses a wide-FOV camera, a 4 megapixel pan-tilt-zoom (PTZ) camera and a light stripe projector to handle a single stationary subject in the capture volume. The subject's x- and z-location is estimated through the wide-FOV camera and the light stripe projector, while the tilt angle is obtained from face detection results in the PTZ camera. Their system configuration processes a 1 x 1 x 1 m³ capture volume at a stand-off distance of 1.5 m. A major drawback of this approach is that its performance may be questionable in heavily cluttered or outdoor environments. Additionally, the approach is only applicable to a single stationary subject in the capture volume. A graphical comparison of some major approaches in the recent literature is presented in Figure 1, which shows our proposed system along with [8] and [10] in terms of stand-off distances and capture volumes.


Situational awareness through intelligent video surveillance is an active area of research in the recent literature. Hampapur et al. [11] suggest that comprehensive situational awareness requires a multi-scale approach to human tracking in video surveillance. They present a system with wide-area video surveillance and PTZ functionality that captures facial biometrics of humans at a distance. A two-camera master-slave system was presented in [2] that tracked subjects at distances of up to 50 meters with a 60° field of view. Hierarchical tracking with wide-area surveillance and high-resolution tracking is also addressed by Bashir and Porikli [13]; a high-definition video camera with electronic PTZ functionality replaces the mechanical pan-tilt assembly and is used for collaborative tracking. Shah et al. [12] present another aspect of intelligent video surveillance: distributed video surveillance across multiple cameras for wide-area object tracking and behavioral analysis. Their system detects, categorizes and tracks moving objects in a scene observed by multiple cameras.

Figure 2: System schematic for Eagle-Eyes.

3. Eagle-Eyes Architecture

Eagle-Eyes is a multi-biometric system developed as an inter-disciplinary research and development effort at the cross-roads of traditional biometrics and video surveillance. Eagle-Eyes integrates video surveillance techniques with a multi-biometric capture device (Figure 2). Multiple cameras with hierarchically-ordered fields of view, a highly precise pan-tilt unit (PTU) and a long focal length zoom lens are combined to acquire face and iris biometrics at a large stand-off distance. The fixed scene camera is used for wide-area scene surveillance to detect and track humans. A scheduler ranks faces in the scene FOV and directs the PTU in sequence to all active subjects. A list of previous acquisition events prevents repeat acquisitions of the same subject. The face camera, which has a narrower FOV, is used to acquire a higher resolution image of the subject's face. This camera is mounted on the PTU assembly along with a rangefinder, an iris laser illuminator and a dual-iris camera. Images generated from the face camera are also used to locate and track the subject's eyes. A target point is located on the subject's face mid-way between the two eyes for iris targeting. The hierarchical framework of Eagle-Eyes facilitates collaborative tracking between the scene and face cameras for accurate iris pointing. The problem of long-distance iris illumination is solved using an innovative laser illuminator design. The laser illuminator propagates a collimated beam that maintains a uniform illumination profile over large distances. This is in stark contrast with existing longer-range iris acquisition approaches, such as [8] and [10], that illuminate a fixed region in the capture volume. Iris image resolution requirements are addressed using a scalable optical design: a long focal length zoom lens is used in conjunction with a custom-designed dual-sensor iris camera.


The dual-sensor iris camera is made up of two standard VGA-resolution image sensors. Our system design allows for the capture of "high-resolution" iris images according to iris image quality standards [18]. The motion of the subject is accounted for by a subject-servo-loop that tracks both the motion of the PTU assembly and the subject. Eye tracking in the face camera is combined with range information from an optical rangefinder. The zoom and focus of the iris lens are controlled to match the subject's range trajectory. The system has a distributed architecture and performs real-time processing on multiple CPUs.

4. Scene Camera Processing

The static wide-FOV scene camera is used to monitor the capture volume for newly entering human subjects. Towards this end, a background model is first generated to detect the pixels that exhibit significant intensity changes. This is accomplished by generating and maintaining a per-pixel statistical model of the scene background as estimated from the incoming video frames I^n, where n is the time index or frame number. We have implemented a two-layered background model in which pixels marked as foreground by a slow-learning layer l_s are used as a filter for background updates in a fast-learning layer l_f. Our background model for each layer B_i, i ∈ {l_s, l_f}, maintains not only per-pixel image statistics but also per-pixel background learning rates, updated as:

$$
B_i^{n} \;=\;
\begin{cases}
I^{n} & n = 0\\[2pt]
I^{n}\,\omega_i \;+\; B_i^{n-1}\,(1-\omega_i) & 0 < n \le N_t\\[2pt]
I^{n}\,\varpi_i^{\,n-1} \;+\; B_i^{n-1}\,(1-\varpi_i^{\,n-1}) & n > N_t
\end{cases}
\tag{1}
$$

where N_t represents the number of frames in the background training phase, ω_i denotes the constant learning rate for each layer during the training phase, and ϖ_i^n represents the per-pixel learning-rate image for the i-th layer during the detection phase. Binary thresholding is applied on the difference between the incoming image and the background images to generate a mask image for each layer. Foreground results from each layer are then combined to generate a final foreground image. Detection of humans in the foreground image is accomplished through horizontal and vertical projection histograms. Combined modes of these histograms are then used to locate human regions in the scene. For each human region, frontal face detection is performed using a cascade of classifiers trained on Haar-like features using Ada-Boost [14]; the upper body region within the detected human defines the search region. This face detection method can result in false alarms. We have implemented a face scoring mechanism to probabilistically handle false alarm cases. This post-processing step filters out face detection results which have a lower probability of being true human faces, based on human body constraints, the results of foreground segmentation, and the set of tracked faces. Our face tracking algorithm implements automatic possible-object initialization, matured-object tracking, and disappeared-object removal from the tracking list, similar in spirit to [15]. We have devised a face tracking engine based on frontal face detection and probabilistic data association for multiple-subject face tracking. For each detected face, temporal data association is performed by computing its similarity score with all the faces in the set of possible and tracked faces. The similarity between two face regions F_i and F_j is computed as a weighted sum of two factors:

$$
m(F_i, F_j) \;=\; \alpha_m\, s_r(F_i, F_j) \;+\; (1-\alpha_m)\, s_o(F_i, F_j)
\tag{2}
$$

where α_m controls the weights of the two factors, s_r denotes the normalized cross correlation, and s_o denotes the normalized overlap area. The set of tracked faces is passed to a multi-person scheduling engine to prioritize multiple subjects for iris acquisition. The set of tracked faces, detected faces and foreground regions is also used to update the per-pixel learning-rate images for each layer. Tracked faces are targeted by applying a mapping function from scene camera coordinates to pan-tilt angle pairs in the PTU coordinate system. This mapping function is learnt offline through a multi-step calibration process. Once a face has been targeted in the scene camera, face camera processing is activated for multi-camera collaborative tracking.
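To make the scene-camera pipeline above concrete, the following minimal NumPy sketch illustrates the two-layer background update of Eq. (1) and a projection-histogram localization of foreground human regions. Function and parameter names (e.g. `BackgroundLayer`, `N_TRAIN`) and all thresholds are illustrative assumptions, not the system's implementation; the update of the per-pixel learning rates from tracked faces, and the horizontal-histogram step, are omitted.

```python
import numpy as np

N_TRAIN = 100                                  # N_t: training frames (placeholder)
OMEGA = {"slow": 0.01, "fast": 0.10}           # constant training-phase learning rates
DIFF_THRESH = 25                               # |frame - background| threshold (placeholder)

class BackgroundLayer:
    """One layer (slow- or fast-learning) of the two-layer model of Eq. (1)."""
    def __init__(self, name, shape):
        self.name = name
        self.bg = None                                          # B_i^n
        self.rate = np.full(shape, OMEGA[name], np.float32)     # per-pixel rates (static here)
    def update(self, frame, n, update_mask=None):
        frame = frame.astype(np.float32)
        if n == 0:
            self.bg = frame.copy()
            return
        rate = OMEGA[self.name] if n <= N_TRAIN else self.rate  # scalar, then per-pixel
        blended = frame * rate + self.bg * (1.0 - rate)
        self.bg = blended if update_mask is None else np.where(update_mask, blended, self.bg)
    def foreground(self, frame):
        return np.abs(frame.astype(np.float32) - self.bg) > DIFF_THRESH

def process_frame(frame, slow, fast, n):
    """Slow-layer foreground gates the fast-layer update; layers are OR-combined."""
    slow.update(frame, n)
    fg_slow = slow.foreground(frame)
    fast.update(frame, n, update_mask=~fg_slow)
    return fg_slow | fast.foreground(frame)

def human_regions(fg, min_width=20):
    """Locate candidate human regions from modes of the vertical projection histogram."""
    col_hist = fg.sum(axis=0)
    cols = np.where(col_hist > 0.1 * col_hist.max())[0]
    if cols.size == 0:
        return []
    runs = np.split(cols, np.where(np.diff(cols) > 1)[0] + 1)
    return [(r[0], r[-1]) for r in runs if r[-1] - r[0] >= min_width]

# Usage: slow, fast = BackgroundLayer("slow", frame.shape), BackgroundLayer("fast", frame.shape)
#        fg = process_frame(frame, slow, fast, n); regions = human_regions(fg)
```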

5. Face Camera Processing

The face and iris cameras are mounted on the PTU assembly, which is initially moved towards the target subject by the scene camera processing. When a face is detected within the face camera's FOV, control of the PTU assembly passes from the scene processing module to the face processing module. The face processing engine continuously tracks the mid-point between the subject's eyes from the face camera image stream. This track center is used in the subject servo loop for continuous human tracking, and involves continuous frontal face detection together with detection and tracking of both eyes.

5.1. Face Detection and Targeting

As in the scene processing, a face classifier trained on Haar-like features using Ada-Boost for feature selection [14] is used. The face detector returns a set of faces from the face camera image stream, which includes occasional false alarms. False alarms are rejected through a face scoring mechanism.


Detected faces are scored as a weighted combination of three factors:

$$
S_i \;=\; \alpha_E\, S_i^{E} \;+\; \alpha_\theta\, S_i^{\theta} \;+\; \alpha_C\, S_i^{C}
\tag{3}
$$

where the first score factor S^E is based on the number of detected eyes N_E and uses an exponential function with width σ_E:

$$
S^{E} \;=\; \exp\!\left(-\,\frac{(N_E - 2)^2}{2\sigma_E^2}\right)
\tag{4}
$$

The second factor, S^θ, is based on the head angle, computed as the angle subtended by the line connecting the centers of the two eye regions:

$$
S^{\theta} \;=\; 1 - \frac{\hat{\theta}}{\theta_{\max}},
\qquad
\hat{\theta} \;=\;
\begin{cases}
\theta^{E} & \theta^{E} \le \theta_{\max}\\
\theta_{\max} & \theta^{E} > \theta_{\max}
\end{cases}
\tag{5}
$$

where θ^E is the computed head angle and θ_max is limited to ±10°. The final face-scoring factor S^C favors the best-centered face. After each face has been scored, the face with the highest score above a threshold is processed for target-point tracking towards iris acquisition.
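A compact sketch of this scoring rule is given below. The weights, σ_E and the exact form of the centering term S^C are not specified in the paper, so the values and the simple linear centering score used here are illustrative assumptions only.

```python
import math

ALPHA_E, ALPHA_THETA, ALPHA_C = 0.5, 0.3, 0.2   # illustrative weights, not from the paper
SIGMA_E = 0.5                                    # width of the eye-count term (assumed)
THETA_MAX = 10.0                                 # degrees, per Eq. (5)

def face_score(n_eyes, head_angle_deg, face_center_x, image_width):
    # Eq. (4): favor faces with exactly two detected eyes
    s_eye = math.exp(-((n_eyes - 2) ** 2) / (2 * SIGMA_E ** 2))
    # Eq. (5): penalize head rotation, clamped at theta_max
    theta_hat = min(abs(head_angle_deg), THETA_MAX)
    s_theta = 1.0 - theta_hat / THETA_MAX
    # S^C: favor the best-centered face (assumed form: 1 at center, 0 at image edge)
    s_center = 1.0 - abs(face_center_x - image_width / 2) / (image_width / 2)
    # Eq. (3): weighted combination
    return ALPHA_E * s_eye + ALPHA_THETA * s_theta + ALPHA_C * s_center

if __name__ == "__main__":
    # (n_eyes, head_angle_deg, face_center_x, image_width) for two hypothetical faces
    candidates = [(2, 3.0, 310, 640), (1, 8.0, 500, 640)]
    best = max(candidates, key=lambda f: face_score(*f))
    print("selected face:", best)
```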

5.2. Eyes Detection and Tracking

The target point is located as the point mid-way between the two eye regions. Instability in the position of this point during acquisition directly affects the subject-servo-loop's ability to smoothly track the trajectory of the subject. Accurate target point detection and stable target point tracking are therefore of utmost importance for successful iris acquisition, and we have implemented multiple approaches for detection and tracking of both eyes under challenging imaging conditions. The importance of accurate eye detection and its effect on face recognition was highlighted in [16]; their study shows that eye location errors significantly degrade face recognition accuracy. In our system, the maximum eye location discrepancy that will not jeopardize iris acquisition is less than 10 pixels. We address the problem of eye detection using Haar-like features in a cascade of classifiers with feature selection through Ada-Boost. The training is done separately for each eye. We use a labeled set of 1,710 positive training images per eye. For the negative training set, we use image strips cropped from the face and background areas of images that do not include the eye regions; a total of 8,318 negative training images are used. The separate cascaded classifiers learnt for each eye are used for eye detection inside each face region, with each eye detected in its respective upper quadrant of the face region. The detected set of eyes is used for face scoring as in Eq. (4). If one left and one right eye are detected, the face rotation angle is also computed for scoring as in Eq. (5). Temporal gaps left by failed eye detection are filled by eye tracking using the Mean-shift analysis shown in Figure 3. The notation in the algorithm is borrowed from [17] and is not repeated here for the sake of brevity.

MeanShift_EyeTrack
Input:
  y0: initial eye location from the previous frame.
  F^{n-d}, ED^{n-d}: face and detected eye region from the frame with the most recent successful eye detection.
  F^{n-1}, ES^{n-1}, EX^{n-1}: face, eye search and detected/tracked eye regions from the previous frame.
  F^{n}, ES^{n}: face and eye search region from the current frame.
Output:
  ET^{n}: tracked eye region for the current frame.

1. Estimate the inter-frame eye translation and scaling using the eye search regions. Refine the initial eye location using the estimated scale S and translation T: $\hat{y}_0 = S\,y_0 + T$.
2. Warp the most recently detected eye region ED^{n-d} to the face region F^{n} in the current frame. Call this location $\hat{y}_w$.
3. Initialize the object data for the eye using the initial coordinates $\hat{y}_0$. Compute the kernel-weighted histogram:
$$
\hat{q}_u \;=\; \frac{\sum_{i=1}^{N} k\bigl(\lVert x_i^{*}\rVert^{2}\bigr)\,\delta\bigl[b(x_i^{*})-u\bigr]}{\sum_{i=1}^{N} k\bigl(\lVert x_i^{*}\rVert^{2}\bigr)},
\qquad
k(x) \;=\; \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x}{2}\right)
$$
4. Derive the Mean-shift weight image with prior-probability weighting using the warped eye location $\hat{y}_w$:
$$
w_i \;=\; \sum_{u=1}^{m} \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}\;\delta\bigl[b(x_i)-u\bigr]\;\times\; g\!\left(\left\lVert \frac{\hat{y}_w - x_i}{h}\right\rVert^{2}\right)
$$
5. Find the next location of the target candidate:
$$
\hat{y}_1 \;=\; \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\bigl\lVert \frac{\hat{y}_0 - x_i}{h}\bigr\rVert^{2}\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\bigl\lVert \frac{\hat{y}_0 - x_i}{h}\bigr\rVert^{2}\right)}
$$
6. If $\lVert \hat{y}_1 - \hat{y}_0 \rVert < \varepsilon$, stop; else set $\hat{y}_0 \leftarrow \hat{y}_1$ and go to Step 4.

Figure 3: The algorithm for tracking the eye region using Mean-shift density gradient ascent. The eye scale is estimated from the inter-frame mapping of eye search regions.
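The following Python sketch illustrates the core Mean-shift iteration of Figure 3 on a grayscale eye search region. It is a simplified, assumption-laden rendering (fixed Gaussian kernel, one-dimensional intensity histogram bins, illustrative bandwidth and stopping parameters), not the system's implementation; the scale/translation pre-alignment of Steps 1-2 is reduced to the prior location argument `y_warp`.

```python
import numpy as np

NBINS, H, EPS, MAX_ITER = 16, 15.0, 0.5, 20   # bins, bandwidth, stop criteria (assumed)

def kernel(d2):
    """Gaussian profile k(x) applied to squared, bandwidth-normalized distances."""
    return np.exp(-0.5 * d2) / np.sqrt(2.0 * np.pi)

def intensity_bins(img):
    """Quantize grayscale values into NBINS histogram bins (the b(x) function)."""
    return (img.ravel().astype(np.int64) * NBINS // 256).clip(0, NBINS - 1)

def weighted_hist(img, center, coords):
    """Kernel-weighted intensity histogram (q_hat / p_hat of Fig. 3, Step 3)."""
    k = kernel(np.sum(((coords - center) / H) ** 2, axis=1))
    hist = np.bincount(intensity_bins(img), weights=k, minlength=NBINS)
    return hist / max(hist.sum(), 1e-9)

def meanshift_eye_track(search_gray, y0, y_warp, model_hist):
    """Track the eye center in `search_gray`, starting from y0, with prior y_warp."""
    h_, w_ = search_gray.shape
    ys, xs = np.mgrid[0:h_, 0:w_]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    bins = intensity_bins(search_gray)
    y = np.asarray(y0, dtype=float)
    for _ in range(MAX_ITER):
        p_hat = weighted_hist(search_gray, y, coords)
        w = np.sqrt(model_hist / np.maximum(p_hat, 1e-9))[bins]       # sqrt(q_u / p_u)
        w = w * kernel(np.sum(((coords - y_warp) / H) ** 2, axis=1))  # prior weighting (Step 4)
        g = kernel(np.sum(((coords - y) / H) ** 2, axis=1))           # g = -k' (Gaussian case)
        denom = np.sum(w * g)
        if denom < 1e-9:
            break
        y_new = np.sum(coords * (w * g)[:, None], axis=0) / denom     # Step 5
        if np.linalg.norm(y_new - y) < EPS:                           # Step 6
            return y_new
        y = y_new
    return y

# Usage (illustrative): the model histogram is built once from the detected eye patch,
# then the tracker refines y0 in each new search region:
#   model_hist = weighted_hist(eye_patch, eye_center, eye_coords)
#   eye_center_now = meanshift_eye_track(search_patch, y0, y_warp, model_hist)
```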


We address two basic problems inherent in generic Mean-shift tracking [17]. The first problem is object scale: the face is initially detected at a larger distance, where the face and correspondingly the eyes are small, and as the subject moves closer to the camera the eyes grow larger. We address this issue by estimating the true object scale change through the inter-frame scale variation of the subject's face. This solution also accounts for sudden large shifts in object location from the previous frame to the current frame. The second problem is object drift: under excessive image variations, the tracker result tends to drift away from the original object location. This is addressed by introducing a prior probability on the object location, estimated by warping the eye location of the most recent successful detection to the current frame's face region. This warped eye region gives the best estimate of our prior belief about the object's location in the current frame, which is then refined by the Mean-shift iterations. By integrating the scale change and the prior probability distribution of the object location into the Mean-shift framework, we have developed a robust eye tracker that accounts for sudden and gradual changes in object scale and translation while keeping object drift in check. The Mean-shift process has been proven to converge to a local mode of the posterior distribution of the object location within a few iterations [17]; our experiments show that the modified Mean-shift eye tracker converges in fewer than 4 iterations under most circumstances. Uneven ambient illumination presents a further challenge for stable target point tracking. This situation is of practical importance because, in regions of space where the face is unevenly lit, eye detection might fail for one side of the face. We address this situation with adaptive eye template matching based on normalized cross correlation. This approach is automatically invoked when one eye is detected and/or tracked successfully but detection and tracking fail for the other eye. The adaptive eye template is generated from the eye that has been found and is then geometrically mirrored to represent the template to be searched for the other eye. Normalized cross-correlation trials are performed at all locations in the eye search region, and the location that produces the maximum normalized cross correlation score above a minimum threshold is taken as the new eye location for that side of the face. The above three approaches for accurate eye detection and stable eye tracking are woven into a single eye processing framework using a binary decision tree. The leaves of the decision tree are the states of the eye processing framework; the intermediate nodes are the individual processing algorithms applied to the current frame. The framework first attempts Haar-based eye detection at every frame. In case of failure, the efficient Mean-shift tracking is performed. If that too fails, and one eye has been detected, normalized cross-correlation-based eye localization is used.
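A minimal OpenCV-style sketch of the mirrored-template fallback described above is given below; the acceptance threshold and box conventions are illustrative assumptions rather than the system's actual values.

```python
import cv2

NCC_MIN = 0.6   # illustrative acceptance threshold, not the system's value

def locate_other_eye(face_gray, found_eye_box, search_box):
    """Find the missing eye by mirroring the found eye's template and running NCC."""
    x, y, w, h = found_eye_box
    template = face_gray[y:y + h, x:x + w]
    mirrored = cv2.flip(template, 1)                  # horizontal mirror for the other eye
    sx, sy, sw, sh = search_box
    search = face_gray[sy:sy + sh, sx:sx + sw]
    if search.shape[0] < h or search.shape[1] < w:
        return None
    ncc = cv2.matchTemplate(search, mirrored, cv2.TM_CCOEFF_NORMED)   # zero-mean NCC map
    _, max_val, _, max_loc = cv2.minMaxLoc(ncc)
    if max_val < NCC_MIN:
        return None                                   # no confident match: report failure
    mx, my = max_loc
    return (sx + mx, sy + my, w, h)                   # eye box in face-image coordinates
```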


The eye regions for both eyes, localized using this integrated detection and tracking approach, are used to compute and track the target point on the subject's face. The target point location at each frame is then passed to the PTU processing module, which updates the pan and tilt angles of the PTU in a servo loop to keep the subject's irises in the field of view of the iris camera.

5.3. Subject Servo Loop

Offline calibrations establish homographies between the face and iris cameras. The PTU is then moved to target the subject's irises, centering them in the iris camera. A subject-servo-loop tracks both the motion of the PTU assembly and the subject. Frequent polling of the PTU position is used to form a PTU motion model that estimates past and future positions and velocities of the PTU assembly. Eye tracking in the face camera FOV is combined with range information gathered from the optical rangefinder. Subject coordinates as measured in the face camera are converted to pan and tilt angles relative to the live position of the PTU assembly, with the subject's range used for triangulation. Absolute pan and tilt angles are then calculated using an estimate of the PTU position at the time the face camera's image was recorded. The PTU is instructed to accelerate from its current pan and tilt velocities to coincide with the subject at a future time.
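As a rough illustration of the pointing geometry in this loop, the sketch below converts the target point's pixel offset into relative pan/tilt angles via a pinhole model and adds them to the PTU pose estimated for the image timestamp. The focal length, principal point, constant-velocity PTU model and all names are assumptions for illustration, not the system's calibration, and the range-based parallax correction between the face camera and the PTU axis is omitted.

```python
import math

FOCAL_PX = 2400.0          # assumed face-camera focal length in pixels
CX, CY = 320.0, 240.0      # assumed principal point of the face camera

def relative_angles(u, v):
    """Pinhole model: pixel offset of the target point -> relative pan/tilt (radians)."""
    pan = math.atan2(u - CX, FOCAL_PX)
    tilt = math.atan2(v - CY, FOCAL_PX)
    return pan, tilt

def predict_ptu_pose(last_pose, last_vel, dt):
    """Constant-velocity PTU motion model built from frequent position polling."""
    return (last_pose[0] + last_vel[0] * dt, last_pose[1] + last_vel[1] * dt)

def absolute_target(u, v, image_time, now, ptu_pose, ptu_vel):
    """Absolute pan/tilt command: PTU pose at image capture time plus relative offset."""
    pose_at_capture = predict_ptu_pose(ptu_pose, ptu_vel, image_time - now)
    d_pan, d_tilt = relative_angles(u, v)
    return pose_at_capture[0] + d_pan, pose_at_capture[1] + d_tilt

# Example: target point at pixel (352, 230), image captured 40 ms before "now"
cmd = absolute_target(352, 230, image_time=-0.04, now=0.0,
                      ptu_pose=(0.10, -0.02), ptu_vel=(0.05, 0.0))
```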

6. Iris Camera Processing

The iris acquisition processing is activated once a face has been centered in the face camera. At this stage, the NIR laser illumination is turned on to illuminate the subject's face region. The focus plane and focal length of the iris zoom lens are adjusted to match the subject's range trajectory in the capture volume in order to maintain good focus and the required pixel resolution on the iris. The iris acquisition process performs segmentation of each iris image to isolate the iris region. If the segmentation succeeds, two iris image quality measures are computed from the segmented iris region: a focus score is used to estimate focus and reject blurred iris images, and a coverage score is used to reject iris images with heavy eyelid or eyelash occlusion. A set of iris images that pass the thresholds for these quality measures is stored in the system's cache memory, and once a sufficient number of iris images have been cached, the iris acquisition process stops. The segmented iris region from each acquired image is then encoded into an iris signature bit pattern, which is matched against all such bit patterns stored in the database. Finally, if the subject's iris bit pattern matches any of the stored patterns, a match is declared. Results from this processing are displayed on the user interface.
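The control flow of this acquisition stage can be sketched as follows. The scoring functions, thresholds and cache size are placeholders (the paper does not specify them), and segmentation, encoding and matching are stubbed out as calls to an assumed, hypothetical `iris_lib` object.

```python
FOCUS_MIN, COVERAGE_MIN, CACHE_SIZE = 0.7, 0.8, 4   # placeholder thresholds

def acquire_and_match(iris_frames, iris_lib, database):
    """Quality-gated iris acquisition followed by encoding and exhaustive matching."""
    cache = []
    for frame in iris_frames:                       # frames streamed while the laser is on
        seg = iris_lib.segment(frame)               # isolate the iris region
        if seg is None:
            continue                                # segmentation failed: skip frame
        if iris_lib.focus_score(seg) < FOCUS_MIN:
            continue                                # blurred image rejected
        if iris_lib.coverage_score(seg) < COVERAGE_MIN:
            continue                                # heavy eyelid/eyelash occlusion rejected
        cache.append(seg)
        if len(cache) >= CACHE_SIZE:
            break                                   # enough good images: stop acquiring
    for seg in cache:
        code = iris_lib.encode(seg)                 # iris signature bit pattern
        for identity, stored_code in database.items():
            if iris_lib.match(code, stored_code):   # e.g. Hamming distance below threshold
                return identity                     # match declared
    return None                                     # no match found
```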


[Image: whole iris on the sensor (752 x 480)]

Figure 4: Absolute pixel errors in eye location during detection and tracking of eyes for two subjects.

Figure 5: Tilt angle error and upper bound on angular accuracy for moving and stationary subjects.

7. System Performance Analysis

We have performed a number of experiments aimed at quantitatively measuring the performance of Eagle-Eyes in terms of eye tracking accuracy, subject-servo-loop stability, system processing times, iris acquisition times and, finally, iris recognition performance. The accuracy of the target point computed from the detection and tracking of both eyes is shown in Figure 4 for two stationary subjects at multiple stand-off distances. Absolute pixel errors in target location were measured against a ground truth of manually located eye positions from video data recorded during the experiment; results are aggregated over more than 3,000 video frames. The accuracy of the subject-servo-loop is illustrated in Figure 5. Angle deviations from ground-truth subject angles are shown for two stationary subjects at four different stand-off distances and for two moving subjects as they walk through the capture volume. The tilt axis is shown as it has the stricter pointing accuracy requirement. It can be seen from the figure that for stationary subjects the tilt angle errors are well within the upper bound for successful iris acquisition, and for moving subjects a large number of frame samples contain tilt errors within the upper bound. The frame-rate processing of Eagle-Eyes' video surveillance algorithms is illustrated in Table 1, which shows scene and face camera processing rates together with the average time between PTU move instructions for a subject walking through the capture volume. Iris recognition performance for 13 individuals at multiple distances in the capture volume is detailed in Table 2. A total of 5 attempts per subject were allowed; the transactional recognition rate was 92% for transactions that allowed up to 3 attempts per subject. Finally, the iris acquisition times are reported in Figure 6.

Figure 6: Histogram of acquisition times for different subjects at multiple locations in the capture volume.

Acquisition times are defined as the time between a subject's first detection in the scene and the acquisition of their iris. The histogram labeled "watchlist" is restricted to tests using single subjects in the capture volume, while the histogram labeled "all acquires" includes tests using multiple subjects in the capture volume. The average acquisition time for the "watchlist" experiment was 6.33 seconds with a standard deviation of 4.2 seconds.

Table 1: Processing times on a dual-Xeon processor @ 2.33 GHz with 2 GB RAM and 6 MB L2 cache.

                                    Average Time (ms)   Standard Deviation (ms)
  Scene Processing                  21.0                2.7
  Face Processing                   23.8                4.3
  Time Between PTU Instructions     33.2                5.5

Table 2: True match rate (TMR) for iris recognition.

            Up to 3 Attempts    All Single Attempts
  3.5 m     92% (12/13)         78% (51/65)
  4.5 m     92% (12/13)         80% (52/65)


8. Discussion and Conclusions

In this paper, we have addressed the issue of constrained iris acquisition in current iris recognition systems. We have proposed a long-range multi-biometric acquisition system for human identification. The system is capable of acquiring the face and both irises from multiple humans present anywhere in the capture volume, and has been tested in an indoor environment with multiple subjects. Preliminary performance results in terms of eye tracking and pointing accuracy, processing and acquisition times, and recognition rates are reported. It is shown that our system is able to perform background modeling, human detection, face detection, eye tracking and subject-servo-loop processing at video rate. Sufficient pointing accuracy and stability for iris acquisition has been achieved for stationary subjects and is approaching what is required for moving subjects. Results are preliminary and work is in progress to improve all aspects of the current system. A qualitative comparison of the Eagle-Eyes system with other less constrained iris capture systems is presented in Table 3. The comparison clearly shows that our system significantly improves on existing iris acquisition approaches in terms of stand-off distance and capture volume through the use of hierarchical scene and face tracking. A comprehensive system analysis in terms of iris recognition performance for various scenarios is underway as future work.

Table 3: Comparison of Eagle-Eyes with other long-range iris acquisition systems.

  Feature               Sarnoff [8]   Yoon [10]   MERL [9]   Eagle-Eyes
  Stand-Off Distance    3 m           1.5 m       1.2 m      3 m
  Capture Volume        0.008 m³      1 m³        0.22 m³    18 m³
  Number of Subjects    1             1           1          <4
  Moving Subjects       Yes           No          No         Yes
  Outdoor Capable       No            -           -          Yes
  Camera Resolution     4 MPix        ?           6 MPix     0.3 MPix
  Scalable              No            ?           No         Yes
  Iris Illumination     NIR LED       NIR LED     None       NIR Laser

9. References

[1] G. Williams, "Iris Recognition Technology", IEEE Aerospace and Electronic Systems Magazine, Vol. 12(4), pp. 23-29, 1997.
[2] X. Zhou, R. T. Collins, T. Kanade and P. Metes, "A Master-Slave System to Acquire Biometric Imagery of Humans at Distance", ACM International Workshop on Video Surveillance, Nov. 2003.
[3] J. Daugman, "How Iris Recognition Works", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14(1), pp. 21-30, January 2004.
[4] A. Ross, A. Jain, "Multimodal Biometrics: An Overview", Proceedings of the 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, 2004, pp. 1221-1224.
[5] D. Usher, Y. Tosa, M. Friedman, "Ocular Biometrics: Simultaneous Capture and Analysis of the Retina and Iris", in N. Ratha, V. Govindaraju (eds.), Advances in Biometrics: Sensors, Algorithms and Systems, Springer, 2008.
[6] Y. Wang, T. Tan, A. Jain, "Combining Face and Iris Biometrics for Identity Verification", Proceedings of the 4th International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA), Guildford, UK, June 9-11, 2003.
[7] C. Fancourt, L. Bogoni, K. Hanna, Y. Guao, R. Wildes, N. Takahashi, U. Jain, "Iris Recognition at a Distance", in T. Kanade, A. Jain, N. Ratha (eds.), AVBPA 2005, LNCS Vol. 3546, pp. 1-13, Springer, Heidelberg, 2005.
[8] J. Matey, O. Naroditsky, K. Hanna, R. Kolczynski, D. Loiacono, S. Magru, M. Tinker, T. Zappia, W. Zhao, "Iris on the Move: Acquisition of Images for Iris Recognition in Less Constrained Environments", Proceedings of the IEEE, Vol. 94(11), Nov. 2006.
[9] G. Guo, M. Jones, P. Beardsley, "A System for Automatic Iris Capturing", Mitsubishi Electric Research Laboratories, Technical Report TR2005-044.
[10] S. Yoon, H. Jung, J. Suhr, J. Kim, "Non-intrusive Iris Image Capturing System Using Light Stripe Projection and Pan-Tilt-Zoom Camera", IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2007.
[11] A. Hampapur, L. Brown, H. Connell, A. Ekin, N. Haas, M. Lu, H. Merkl, S. Pankanti, A. Senior, C. Shu, Y. Tian, "Smart Video Surveillance: Exploring the Concept of Multiscale Spatiotemporal Tracking", IEEE Signal Processing Magazine, March 2005, pp. 38-51.
[12] M. Shah, O. Javed, K. Shafique, "Automated Visual Surveillance in Realistic Scenarios", IEEE Multimedia, Vol. 14(1), Jan.-March 2007, pp. 30-39.
[13] F. Bashir, F. Porikli, "Collaborative Tracking of Objects in EPTZ Cameras", Visual Communications and Image Processing, VCIP 2007, Vol. 6508(1).
[14] R. Lienhart, J. Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", IEEE International Conference on Image Processing, 2002.
[15] F. Porikli, "Human Body Tracking by Adaptive Background Models and Mean-shift Analysis", IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, March 2003.
[16] P. Wang, M. Green, Q. Ji, J. Wayman, "Automatic Eye Detection and Its Validation", IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2005, Vol. 3.
[17] D. Comaniciu, V. Ramesh, P. Meer, "Kernel-Based Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25(5), pp. 564-577, May 2003.
[18] "Information Technology - Biometric Data Interchange Formats - Part 6: Iris Image Data", ISO/IEC 19794-6:2005.

