ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring

Pichao Wang1, Wanqing Li1, Zhimin Gao1, Chang Tang2, Jing Zhang1, and Philip Ogunbona1
1 Advanced Multimedia Research Lab, University of Wollongong, Australia
2 School of Electronic Information Engineering, Tianjin University, China

BACKGROUND

1. Action recognition has been an active research topic in computer vision because of its wide range of applications, including intelligent surveillance and human-computer interaction.
2. The release of the Microsoft Kinect sensor has opened up new opportunities for action recognition.
3. Deep learning has achieved great success in a wide range of applications.

Fig. 1. Action recognition (source: T. Lan et al.)
Fig. 2. Kinect sensors (source: Apple)
Fig. 3. Deep learning (source: VLab MIT)

ROTATION TO MIMIC VIRTUAL CAMERAS

[Figure, panel (a): virtual camera geometry. Labels recovered: image center (Cx, Cy), focal length f, points Po, Pt, Pd, angles θ and β, axes x, y, z and X, Z.]

Rotating the 3D points is equivalent to moving a virtual RGB-D camera around the subject so that it points at the subject from different viewpoints. The coordinates of the subject with respect to the virtual camera are given by the transformation

\[ [X', Y', Z', 1]^T = T_{ry} \, T_{rx} \, [X, Y, Z, 1]^T \tag{1} \]

where X', Y', Z' are the 3D coordinates with respect to the virtual camera system, T_{ry} denotes the transformation along the Y axis (right-handed coordinate system), and T_{rx} denotes the transformation along the X axis. Both are given in Eq. (2).

PROPOSED METHOD

The proposed method consists of two major components: three ConvNets [3] and the construction of depth motion maps (DMMs) [2] from sequences of depth maps as the input to the ConvNets. Given a sequence of depth maps, 3D points are created and three DMMs are constructed by projecting the 3D points onto the three orthogonal planes. Each DMM serves as the input to one ConvNet for classification [4]. The final classification of the depth sequence is obtained through late fusion of the three ConvNets.

Three strategies have been developed to deal with the challenges posed by small datasets. First, more training data are synthesized by rotating the input 3D points to mimic different cameras. Second, the same ConvNet architecture as the one used for ImageNet is adopted, so that the model trained on ImageNet can be adapted to our problem through transfer learning. Third, each DMM goes through a pseudocolor coding process that encodes different motion patterns, with enhancement, into the pseudo-RGB channels before being input to the ConvNets.
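The pipeline described above (three projected views, one DMM per view, late fusion of three classifiers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the depth quantization (`depth_bins`, `max_depth`), the use of pixel coordinates in place of metric 3D points, the accumulation of absolute frame differences as the motion measure, and the averaging fusion rule are all hypothetical choices made for the example.

```python
import numpy as np

def project_views(depth, depth_bins=64, max_depth=4000.0):
    """Project one depth map (H x W, 0 = no reading) onto three orthogonal planes.

    Assumption: pixel row/column stand in for the y/x coordinates and the
    depth value is quantized into `depth_bins` bins along z.
    """
    h, w = depth.shape
    front = depth.astype(float)                      # x-y plane: the depth map itself
    ys, xs = np.nonzero(depth)
    zs = (depth[ys, xs] / max_depth * depth_bins).astype(int)
    zs = np.minimum(zs, depth_bins - 1)
    side = np.zeros((h, depth_bins))                 # y-z plane occupancy
    side[ys, zs] = 1.0
    top = np.zeros((depth_bins, w))                  # z-x plane occupancy
    top[zs, xs] = 1.0
    return front, side, top

def depth_motion_maps(frames, **kwargs):
    """Accumulate absolute differences of consecutive projections, per view."""
    views = [project_views(f, **kwargs) for f in frames]
    dmms = []
    for v in range(3):
        stack = np.stack([p[v] for p in views])      # shape (T, rows, cols)
        dmms.append(np.abs(np.diff(stack, axis=0)).sum(axis=0))
    return dmms                                      # [DMM_f, DMM_s, DMM_t]

def late_fusion(scores):
    """Fuse per-view class scores by averaging (hypothetical rule), return the class index."""
    return int(np.argmax(np.mean(scores, axis=0)))
```

Here `depth_motion_maps` takes a list of depth frames and returns the front, side, and top maps; `late_fusion` takes one score vector per ConvNet. Averaging is used only because the text does not fix the fusion rule at this point.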

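\[
T_{ry} = \begin{bmatrix} R_y(\theta) & T_y(\theta) \\ 0 & 1 \end{bmatrix}; \quad
T_{rx} = \begin{bmatrix} R_x(\beta) & T_x(\beta) \\ 0 & 1 \end{bmatrix}
\]

where

\[
R_y(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{bmatrix}; \quad
T_y(\theta) = \begin{bmatrix} 0 \\ Z \cdot \sin(\theta) \\ Z \cdot (1 - \cos(\theta)) \end{bmatrix};
\]
\[
R_x(\beta) = \begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) \\ 0 & 1 & 0 \\ -\sin(\beta) & 0 & \cos(\beta) \end{bmatrix}; \quad
T_x(\beta) = \begin{bmatrix} -Z \cdot \sin(\beta) \\ 0 \\ Z \cdot (1 - \cos(\beta)) \end{bmatrix}. \tag{2}
\]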
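The transformation of Eqs. (1) and (2) can be sketched in NumPy as below. The function names are hypothetical, and treating Z as a single reference depth `z_ref` shared by all points is an assumption of this sketch; the text does not say whether Z is fixed or taken per point.

```python
import numpy as np

def virtual_camera_transform(theta, beta, z_ref):
    """Return the 4x4 matrix Try @ Trx of Eq. (1), built from the blocks of Eq. (2).

    Assumption: z_ref plays the role of Z in Ty and Tx for every point.
    """
    c, s = np.cos(theta), np.sin(theta)
    t_ry = np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0,   c,  -s, z_ref * s],
        [0.0,   s,   c, z_ref * (1.0 - c)],
        [0.0, 0.0, 0.0, 1.0],
    ])
    c, s = np.cos(beta), np.sin(beta)
    t_rx = np.array([
        [  c, 0.0,   s, -z_ref * s],
        [0.0, 1.0, 0.0, 0.0],
        [ -s, 0.0,   c, z_ref * (1.0 - c)],
        [0.0, 0.0, 0.0, 1.0],
    ])
    return t_ry @ t_rx

def rotate_points(points, theta, beta, z_ref):
    """Apply Eq. (1) to an (N, 3) array of [X, Y, Z] points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    transformed = (virtual_camera_transform(theta, beta, z_ref) @ homogeneous.T).T
    return transformed[:, :3]
```

With these matrices the point (0, 0, z_ref) is left unchanged for any θ and β, so under this assumption the virtual camera orbits a pivot located at depth z_ref on the optical axis. Calling `rotate_points` for several (θ, β) pairs yields the synthesized viewpoints from which additional training DMMs would be built.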

PSEUDOCOLORING

Color coding can harness the perceptual capabilities of the human visual system to extract more information from gray images and hence effectively enhance the detailed texture patterns they contain [1]. Motivated by that work, this paper proposes coding each DMM into a pseudocolor image in order to exploit and enhance the texture in the DMM that corresponds to the motion patterns of actions.
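A sine-based pseudocolor mapping of this kind can be sketched as follows, reading Eq. (3) as C_i = {sin[2π · (−I + ϕ_i) · 1/2 + 1/2]}² · f(I), with I the normalized gray level. The function name, the three phase values, and the amplitude function used here are illustrative placeholders, since the poster text does not state them; they are not the authors' settings.

```python
import numpy as np

def pseudocolor(gray, phases=(0.0, 1.0 / 3.0, 2.0 / 3.0),
                amplitude=lambda level: 0.25 + 0.75 * level):
    """Map a gray image to three color channels with a sine-based transform.

    Assumptions: `phases` (one per channel) and `amplitude` (standing in
    for f(I)) are hypothetical placeholder values.
    """
    level = np.asarray(gray, dtype=float)
    span = level.max() - level.min()
    # Normalize gray levels to [0, 1]; a constant image maps to all zeros.
    level = (level - level.min()) / span if span > 0 else np.zeros_like(level)
    channels = [
        np.sin(2.0 * np.pi * (-level + phase) * 0.5 + 0.5) ** 2 * amplitude(level)
        for phase in phases
    ]
    return np.stack(channels, axis=-1)  # shape (..., 3), values in [0, 1]
```

Because each channel uses a different phase, nearby gray levels in a DMM are pushed toward different colors, which is the texture-enhancing effect the paragraph above describes.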

The pseudocolor mapping is

\[ C_{i=1,2,3} = \left\{ \sin\left[ 2\pi \cdot (-I + \phi_i) \cdot \tfrac{1}{2} + \tfrac{1}{2} \right] \right\}^{2} \cdot f(I) \tag{3} \]

[Figure: corresponding normalized color value (0 to 1) versus normalized gray level (0 to 1); curves R, G, B for α = 1 and α = 10, and the amplitude modulation curve.]

[Figure: framework of the proposed method. The input depth maps go through rotation and pseudocoloring to give DMMf, DMMs, and DMMt; each DMM is fed to its own ConvNet (conv1, conv2, ..., conv5, fc6, fc7, fc8, with 4096-unit fully connected layers); the class scores of the three ConvNets are then fused.]

EXPERIMENTAL RESULTS

REFERENCE

1. B. R. Abidi, Y. Zheng, A. V. Gribok, and M. A. Abidi, "Improving weapon detection in single energy X-ray images through pseudocoloring," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 36(6):784–796, 2006.
2. X. Yang, C. Zhang, and Y. Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," ACM MM, 2012.
3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," NIPS, 2012.
4. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv:1408.5093, 2014.
