International Symposium on Mixed and Augmented Reality (ISMAR) 2010, October 13-16, Seoul, Korea
Point-and-Shoot for Ubiquitous Tagging on Mobile Phones
Wonwoo Lee, Y. Park, W. Woo (GIST U-VR Lab.), V. Lepetit (EPFL CVLab.)
Introduction
• We propose a novel 3D augmentation method with minimal user interaction on mobile phones
• In situ 3D augmentation through a simple point-and-shoot approach
• No complex 3D reconstruction
• Target detection from unseen viewpoints
Introduction
• The proposed method follows a standard target learning / detection procedure:
  Input Image → Online Learning → Real-time Detection
Online Target Learning
• Input: image of the target plane
• Output: patch data and camera poses
• Assumptions: known camera parameters; the target surface is horizontal or vertical
[Pipeline: Input Image → Frontal View Generation → Blurred Patch Generation (1st pass: patch warping, 2nd pass: radial blurring, 3rd pass: Gaussian blurring, 4th pass: accumulation) → Post-processing]
Frontal View Generation • We need a frontal view to create the patch data and their associated poses
Targets whose frontal views are available
Frontal View Generation • However, frontal views are not always available in the real world
Targets whose frontal views are NOT available
Frontal View Generation
• Objective: generate a fronto-parallel view image from the input image
• Approach: exploit the phone's built-in accelerometer
• 1 DoF assumption: the target plane differs from the frontal view only by a pitch rotation
Frontal View Generation
• Without loss of generality, the pose of the virtual camera in the fronto-parallel location is set to $[I|0]$; the captured-view camera has pose $[R|t]$
• Under the 1 DoF assumption, the orientation obtained from the accelerometer is $R = \mathrm{Rot}_X(\theta_p)$, a rotation around the X-axis by the pitch angle $\theta_p$
• For a vertical surface at distance $d_0$, the coordinates of the captured camera center are

  $c = [\,0,\; d_0 \sin\theta_p,\; d_0(1-\cos\theta_p)\,]^\top, \qquad t = -Rc$

• The homography that warps the captured image to the virtual frontal view is

  $H_{f \leftarrow c} = \big( K \, ( R - t\,n^\top / d_0 ) \, K^{-1} \big)^{-1}$

  with $K$ the camera calibration matrix and $n$ the plane normal (a numeric sketch follows below)
[Figure 2: defining the camera pose in the case of a vertical surface]
• Note on scale (from the paper, cf. Figure 12): the augmentation corresponds to a large object when the camera is far from the surface and to a small one when it is close; this is intuitive but limited to the range of scales over which the user can move the phone, so the interface also lets the user adjust the scale if needed
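As a rough numeric illustration, the homography above can be assembled in a few lines. This is a minimal sketch assuming a calibrated pinhole camera with intrinsics K, a pitch angle theta_p read from the accelerometer, and a vertical surface at distance d0; the function name and intrinsics values are illustrative, not the authors' implementation:

```python
import numpy as np

def frontal_homography(K, theta_p, d0=1.0):
    """H_{f<-c}: warps the captured image to the virtual frontal view."""
    cth, sth = np.cos(theta_p), np.sin(theta_p)
    # 1 DoF assumption: pitch rotation around the X-axis only.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, cth, -sth],
                  [0.0, sth,  cth]])
    # Captured camera center for a vertical surface at distance d0,
    # and the corresponding translation t = -R c.
    c = np.array([0.0, d0 * sth, d0 * (1.0 - cth)])
    t = -R @ c
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the frontal frame
    H_c_from_f = K @ (R - np.outer(t, n) / d0) @ np.linalg.inv(K)
    return np.linalg.inv(H_c_from_f)

K = np.array([[700.0, 0.0, 160.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])  # illustrative intrinsics for a 320x480 image
H = frontal_homography(K, np.deg2rad(30.0))
```

Note that d0 cancels inside the homography, so only the pitch angle and the intrinsics matter for the rectification itself.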
Guessing Target Pose
• The orientation of the target (horizontal / vertical) is suggested from the current pose of the phone:
  if $-\pi/4 < \theta_p < \pi/4$, the surface is vertical; otherwise, the surface is horizontal
• A rotation that is too steep cannot give a good frontal image (see the sketch below)
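A minimal sketch of the orientation guess, assuming theta_p is the pitch angle in radians; the pi/4 threshold is the one stated on the slide:

```python
import math

def guess_orientation(theta_p: float) -> str:
    # Surface is assumed vertical when the phone is pitched less than 45 degrees.
    if -math.pi / 4 < theta_p < math.pi / 4:
        return "vertical"
    return "horizontal"
```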
Blurred Patch Generation
• Objective: learn the appearance of a target surface quickly
• We adopt the patch-learning approach of Gepard (Hinterstoisser et al. 2009)
• Gepard achieves real-time learning of a patch on a desktop PC
Review: Gepard
• Fast patch learning by linearizing image warping with PCA
• A 'mean patch' serves as the patch descriptor
• Direct comparison with the input image
• No complex descriptor generation
Review: Gepard
• Difficult to apply directly on a mobile phone platform
• Mobile phone CPUs have low performance
• A large amount of pre-computed data is required (about 90 MB)
Our Solution: A Simple Descriptor for Keypoint Recognition & Coarse Pose Estimation
• Approach: use a blurred patch instead of Gepard's mean patch
• Gepard: the descriptor is a mean patch, the average of the input patch warped over many poses
• Ours: the descriptor is a single warped patch followed by blurring
[Diagram: Gepard: input patch → warped patches → mean → mean patch; Ours: input patch → warping → blurring → our descriptor (blurred patch)]
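To make the contrast concrete, here is a hedged CPU sketch of the two descriptors; the helper warp_patch, the homography inputs, and the kernel size are illustrative assumptions, with OpenCV standing in for the GPU pipeline:

```python
import numpy as np
import cv2

def warp_patch(patch: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Warp a grayscale patch by a 3x3 homography (same output size)."""
    h, w = patch.shape
    return cv2.warpPerspective(patch, H, (w, h))

def gepard_mean_patch(patch: np.ndarray, homographies) -> np.ndarray:
    # Gepard-style descriptor: average the patch warped over a range of poses.
    return np.mean([warp_patch(patch, H) for H in homographies], axis=0)

def our_blurred_patch(patch: np.ndarray, H: np.ndarray, ksize: int = 11) -> np.ndarray:
    # Our-style descriptor: warp once for the pose, then blur so the result
    # tolerates nearby poses and image noise.
    return cv2.GaussianBlur(warp_patch(patch, H), (ksize, ksize), 0)
```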
Blurred Patch Generation
• Generate blurred patches through multi-pass rendering on the GPU
• Faster image processing through the GPU's parallelism
[Pipeline: input patch → 1st pass: patch warping → 2nd pass: radial blurring → 3rd pass: Gaussian blurring → 4th pass: accumulation]
Blurred Patch Generation
• 1st pass: patch warping
• The input patch is rendered from a given viewpoint
• Much faster on the GPU than on the CPU
Blurred Patch Generation
• 2nd pass: radial blurring of the warped patch
• Lets the blurred patch cover a range of poses close to the exact pose
Blurred Patch Generation
• 3rd pass: Gaussian blurring of the radially blurred patch
• Makes the blurred patch robust to image noise
Blurred Patch Generation
• 4th pass: accumulation of blurred patches in a texture unit
• Reduces the number of readbacks from GPU memory to CPU memory (a CPU sketch of all four passes follows below)
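On the phone the four passes run as OpenGL ES 2.0 fragment shaders. As a rough CPU reference, here is a sketch under the slide's settings (128x128 patches, about 10 degrees of radial blur, an 11x11 Gaussian kernel); the rotation-averaging approximation of the radial blur and the step count are illustrative assumptions, with OpenCV in place of the shaders:

```python
import numpy as np
import cv2

def radial_blur(img, max_angle_deg=10.0, n_steps=8):
    """Pass 2: average small rotations about the patch center so the result
    covers a range of poses close to the exact one (an approximation)."""
    h, w = img.shape
    acc = np.zeros_like(img, dtype=np.float32)
    for a in np.linspace(-max_angle_deg, max_angle_deg, n_steps):
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), a, 1.0)
        acc += cv2.warpAffine(img, M, (w, h))
    return acc / n_steps

def blurred_patches(patch, homographies, ksize=11):
    """One blurred patch per learned view."""
    h, w = patch.shape
    out = []
    for H in homographies:
        warped = cv2.warpPerspective(patch, H, (w, h)).astype(np.float32)  # pass 1
        radial = radial_blur(warped)                                       # pass 2
        out.append(cv2.GaussianBlur(radial, (ksize, ksize), 0))            # pass 3
    # Pass 4 (accumulation) batches results in a texture on the GPU so only
    # a few readbacks to CPU memory are needed; on the CPU we just stack.
    return np.stack(out)
```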
Post-Processing
• Downsample the blurred patches from 128x128 to 32x32
• Normalize each patch to zero mean and a standard deviation of 1
• Normalization provides robustness to intensity changes (see the sketch below)
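A one-function sketch of the post-processing, assuming float grayscale patches; the epsilon guard against flat patches is an illustrative addition:

```python
import numpy as np
import cv2

def postprocess(patch128: np.ndarray) -> np.ndarray:
    # Downsample 128x128 -> 32x32, then normalize to zero mean and unit
    # standard deviation for robustness to intensity changes.
    small = cv2.resize(patch128, (32, 32), interpolation=cv2.INTER_AREA)
    small = small.astype(np.float32)
    return (small - small.mean()) / (small.std() + 1e-8)
```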
Detection & Tracking
• The user points the camera at the target
• A square patch at the center of the image is used for detection
[Flowchart: input patch at time t → was the patch detected at (t-1)? NO → patch descriptor comparison → patch verification with NCC; YES → pose update → pose refinement]
Detection & Tracking
• The initial pose is retrieved by comparing the input patch with the learned mean patches (see the sketch below)
• ESM-Blur (Y. Park et al., ISMAR 2009) is applied for further pose refinement
• NEON instructions are used to speed up the pose refinement
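A hedged sketch of the initial pose retrieval by descriptor comparison. Because the descriptors are normalized to zero mean and unit standard deviation, NCC reduces to the mean of elementwise products; the 0.7 threshold is an illustrative value, not the paper's:

```python
import numpy as np

def detect(input_desc, learned_descs, poses, ncc_thresh=0.7):
    """Return the coarse pose of the best NCC match, or None if no match."""
    n = input_desc.size
    scores = [float(np.dot(input_desc.ravel(), d.ravel())) / n
              for d in learned_descs]
    best = int(np.argmax(scores))
    return poses[best] if scores[best] > ncc_thresh else None
```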
Experimental Results
• Patch size: 128 x 128
• Number of views used for learning: 225
• Maximum radial blur range: 10 degrees
• Gaussian blur kernel: 11 x 11
• Memory requirement: 900 KB for 225 views
Experimental Results
Images used for learning
Detection from different views
Experimental Results
Detection at different scales
Experimental Results
Targets whose frontal views are unavailable
Experimental Results
More examples in real scenes
Experimental Results • Instant 3D augmentation
Experimental Results • Share the learned data with nearby mobile phones via Bluetooth communication
Experimental Results
• Test platforms:

            iPhone 3GS        PC
  CPU       ARM 600 MHz       Intel Quad-Core 2.2 GHz
  GPU       PowerVR SGX 535   GeForce 8800 GTX
  Renderer  OpenGL ES 2.0     OpenGL 2.0
Experimental Results
• Learning time vs. number of views, broken down by stage (warping, radial blur, Gaussian blur, accumulation, readback, post-processing):

  Number of views      108      135      210      300      420       540
  iPhone 3GS (ms)   2,746.3  3,396.6  5,396.2  7,993.0  11,019.2  14,162.6
  PC (ms)             148.9    169.2    238.6    324.3     420.5     547.6
• More views require more rendering
• Radial blur is slow on the mobile phone
• Possible speed improvement through shader optimization
Experimental Results
• Comparison with Gepard (Hinterstoisser et al. 2009); data sets from www.metaio.com/research/

  Detection performance (%):

  Data set       Gepard   Proposed
  Sign-1          96.4      93.0
  Sign-2          84.0      76.0
  Car             96.3      86.8
  Wall            91.2      74.0
  City            90.6      90.2
  Cafe            97.0      95.0
  Book            98.6      92.7
  Grass           73.2      57.8
  Macmini         51.6      41.2
  Board           93.4      68.2
  graf1           93.8      95.2
  stop_sign_f     69.2      82.2
  book_SMALL2     94.6      98.6

[Bar chart: detection performance (%) of Gepard vs. the proposed method per data set]
Limitations
• Weak on repetitive textures and reflective surfaces
• Currently handles only a single target
Conclusions
• Potential applications
  • AR tagging of the real world
  • AR apps anywhere, anytime
• Future work
  • Relaxing the 1 DoF constraint
  • Further optimization on mobile phones