IEEE Advanced Imagery Pattern Recognition Workshop, October 2006, Washington DC

3D shape estimation and texture generation using texture foreshortening cues

Jeffrey B. Colombe
The MITRE Corporation
[email protected]

Abstract

The surfaces of 3D objects may be represented as a connected distribution of surface patches that point in various directions with respect to the observer. Viewpoint-normal patches are those whose tangent plane is perpendicular to the line of sight. Foreshortening of surface patches results from their obliquity, with a directional wavelength compression, and an accompanying 1-dimensional stretching of the spatial frequency distribution. This stretching of spatial frequency distributions was used to generate plausible depth illusions via local foreshortening of surface textures rendered from a stretched spatial frequency envelope. Texture foreshortening cues were exploited by a multi-stage image analysis method that revealed local dominant orientation, degree of orientation dominance, relative power in spatial frequencies at a given orientation, and a measure of local surface obliquity, which provides incomplete but useful information in a multi-cue depth estimation framework.

1. Introduction

The judgement of the shapes of 3D surfaces from texture has been in evidence in the techniques of painters for centuries, and has been addressed as a scientific subject at least since the 1950s [1]. Theoretical treatments (including [2-11]) have emphasized the judgement of the slant of a plane in perspective, the judgement of degree of curvature, the spatial frequency (or spectral) differences between texture patches at different distances and/or obliquities with respect to viewpoint (e.g., [12]), spectral differences in baseline textures (e.g., [13]), and the intrinsic limitations of generic textures as a source of information about shape (e.g., [14,15]). A very interesting related study applies to the judgement of 3D surface shape from specularity, or shininess [16].

This paper explores the relationship between foreshortening of local surface texture due to viewpoint-obliquity, and locally evaluated spatial frequency content. A texture illusion framework is introduced that uses a spatial-frequency plane envelope on frequency noise to bias, or modulate, the amplitudes of spatial frequencies, simulating the effects of oriented obliquity of surface patches. A two-stage neural network with fixed architecture and weights is used to analyze artificial and natural images based on the degree of deviation from isotropy in the spatial frequency plane locally within images. This analysis is able to extract partially informative cues for the judgement of shape from texture foreshortening. Limitations and other uses of this analytic preprocessing are discussed.
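As a hedged aside, the relationship between obliquity and spatial frequency can be written down for the standard case of orthographic projection of a locally planar patch. The slant angle \sigma (between the surface normal and the line of sight) and the subscripted symbols are notation assumed here for illustration only; the paper's own equations are not included in the text and may be parameterized differently.

```latex
% Along the image direction of foreshortening (subscript fs):
\lambda'_{\mathrm{fs}} = \lambda \cos\sigma ,
\qquad
f'_{\mathrm{fs}} = \frac{f}{\cos\sigma}
% At right angles to it (subscript \perp), nothing changes:
\lambda'_{\perp} = \lambda ,
\qquad
f'_{\perp} = f
```

For example, a patch slanted by \sigma = 60 degrees has \cos\sigma = 0.5, so wavelengths along the direction of foreshortening are halved and the corresponding spatial frequencies are doubled, while the perpendicular components are unchanged.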

2. Methods

Texture illusions were generated using a baseline, or viewpoint-normal, texture that consisted of frequency-plane noise multiplied by a radially isotropic, exponentially decaying envelope centered at the origin of the f(x,y) spatial frequency plane. The pixel amplitudes of points lying on a surface behind the image plane were calculated by elliptically stretching the spatial frequency envelope in a direction and to a degree corresponding to the azimuthal direction of tilt and obliquity, respectively, of the surface point underlying each pixel. The stretched envelope was then passed through a Nyquist antialiasing filter to remove supra-Nyquist frequencies. For each pixel, an entire image was generated from the resulting enveloped frequency distribution using an inverse Fourier transform, and the single pixel of interest was retrieved and stored in a final image buffer. For brevity, equations are not included here; a supplement describing quantitative methods is available on request from the author.
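The per-pixel rendering loop described above can be sketched in NumPy as follows. This is a minimal illustration, not the author's implementation: the function names, the decay constant, the use of complex Gaussian noise, the mapping stretch = 1/cos(slant), the cap on the stretch, and the circular cutoff at 0.5 cycles/pixel (standing in for the antialiasing filter) are all assumptions, since the quantitative details are in the unpublished supplement.

```python
import numpy as np

def stretched_envelope(n, tilt, stretch, decay=0.08):
    """Exponentially decaying envelope, elliptically stretched along `tilt`.

    `decay` (cycles/pixel) is an assumed constant; stretch=1 gives the
    isotropic, viewpoint-normal baseline.
    """
    f = np.fft.fftfreq(n)                       # cycles/pixel in [-0.5, 0.5)
    fx, fy = np.meshgrid(f, f, indexing="xy")
    # Rotate so that u runs along the tilt direction, v perpendicular to it.
    u = fx * np.cos(tilt) + fy * np.sin(tilt)
    v = -fx * np.sin(tilt) + fy * np.cos(tilt)
    # Dividing u by stretch >= 1 widens the envelope along the tilt direction.
    env = np.exp(-np.hypot(u / stretch, v) / decay)
    # Assumed stand-in for the antialiasing step: zero everything beyond
    # a radius of 0.5 cycles/pixel.
    env[np.hypot(fx, fy) > 0.5] = 0.0
    return env

def render_sphere(n=24, seed=0, max_stretch=4.0):
    """Render a textured sphere one pixel at a time (slow by construction)."""
    rng = np.random.default_rng(seed)
    # One frequency-plane noise field, shared by every pixel.
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    img = np.zeros((n, n))
    c = (n - 1) / 2.0
    for iy in range(n):
        for ix in range(n):
            x, y = (ix - c) / c, (iy - c) / c
            r = np.hypot(x, y)
            if r < 1.0:
                # Unit sphere: cos(slant) = sqrt(1 - r^2); cap the stretch
                # near the silhouette where cos(slant) approaches zero.
                cos_slant = np.sqrt(1.0 - r * r)
                stretch = min(1.0 / max(cos_slant, 1e-6), max_stretch)
            else:
                stretch = 1.0                   # flat background
            env = stretched_envelope(n, np.arctan2(y, x), stretch)
            # Generate a whole texture, keep only the pixel of interest.
            img[iy, ix] = np.fft.ifft2(noise * env).real[iy, ix]
    return img
```

Calling `render_sphere(n=24)` returns a small 24 x 24 array; the cost grows quickly with image size because a full inverse transform is computed for every pixel, which mirrors the procedure described in the text rather than an efficient design.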

Figure 1. Shape-from-texture illusion, and depiction of projectional foreshortening. The texture illusion shown here combines the frequency-plane texture foreshortening method described herein, with Lambertian shading. The superimposed graphics show how a surface patch undergoes directional foreshortening as a function of projective geometry: wavelengths and frequencies parallel to the direction of obliquity are not affected by the projection.

Figure 2. Mechanics of texture generation, with reference to the spatial frequency plane. Pixels in the generated texture are sampled from a modulated frequency distribution composed of an elliptical envelope (shown, left panels) that multiplies spatial frequency plane noise (not shown). For each pixel, an entire modulated texture is generated (right panels), and the value at the position of the pixel in question is stored in the final rendered image. The appearance of surface obliquity is created through selective biasing of certain spatial frequencies, relative to a baseline ‘view-normal’ isotropic distribution (upper right panels).

The textural analysis of images was performed using a multi-stage neural network with fixed architecture and connection weights. The first stage was composed of Gabor wavelets spanning 8 orientations, 4 spatial frequencies, and 2 quadrature phases (even and odd; [17]), resembling both the receptive field properties of ‘simple cells’ in mammalian primary visual cortex [18,19], and the results of Independent Components Analysis of natural images [20,21]. This Gabor basis was convolved with the entire analyzed image. The outputs of quadrature pairs were squared and summed to produce a phase-invariant power measure for each orientation / spatial frequency channel (32 channels total), resembling the receptive field properties of ‘complex cells’ in primary visual cortex [22], and providing a non-negative, expansive nonlinearity between otherwise linear processing stages.

The second stage was composed of linear filters on the outputs of the first stage, with weights that corresponded to oriented, elliptical envelope functions in the spatial frequency plane of the same type used to generate the texture illusions. These spanned 4 degrees of elliptical eccentricity (by choice), and 8 dominant orientations (reflecting the underlying 8 orientation channels from stage one).

The third stage involved reducing the 32-dimensions-per-pixel output of stage two into three analytic dimensions. The first corresponded to dominant local orientation (ψ). Subsets of filters with the same orientation, but different spatial-frequency elliptical eccentricities, were summed to produce 8 orientation channels. Projections of these intensities in the spatial frequency plane across 180 different directions were used to identify the direction of greatest projection strength; this was the resulting dominant orientation (note that this 180-direction expansion allowed analysis of directions that were interpolated among the original 8 orientations). The second dimension corresponded to the degree of local orientation dominance (d), calculated as the ratio between the directions of greatest and least orientational projection strength. The third dimension corresponded to an estimate of local elliptical eccentricity (ε) in spatial frequency. Subsets of filters with the same eccentricity, but different orientations, were summed to produce 4 eccentricity channels. The local elliptical eccentricity was calculated as the ratio of the two highest eccentricity channels to the sum of all four.

These three analytic dimensions were assigned hue (dominant local orientation), saturation (degree of local orientation dominance), and value (local elliptical eccentricity) roles, respectively, in a color map for visualization. Local surface obliquity was calculated as the product of local orientation dominance (d) and local elliptical eccentricity (ε), to reflect the foreshortening cue provided by ε, modulated by the degree of orientation dominance d (low-dominance surface patches might show high spatial frequencies without significant foreshortening, for example punctate speckles in images).
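The third-stage reduction to ψ, d, ε and obliquity can be sketched as below. Several details are assumptions made for illustration, because the text does not give formulas: the stage-two output is taken to be an array laid out as (eccentricity, orientation, height, width) with eccentricity ordered from lowest to highest, the "projection strength" in a probe direction is taken to be a sum of channel intensities weighted by the absolute cosine of the angular difference, "the two highest eccentricity channels" is read as the two channels tuned to the highest eccentricity, and a small constant guards the divisions. The function name is hypothetical.

```python
import numpy as np

def reduce_stage_two(stage2, tiny=1e-12):
    """Reduce stage-two outputs to (psi, d, eps, obliquity).

    stage2: non-negative array of shape (4, 8, H, W), assumed layout
            (eccentricity low->high, orientation index k at k*pi/8).
    """
    n_ecc, n_ori = stage2.shape[:2]
    ori = stage2.sum(axis=0)                   # 8 orientation channels
    ecc = stage2.sum(axis=1)                   # 4 eccentricity channels

    theta = np.arange(n_ori) * np.pi / n_ori   # channel orientations
    phi = np.deg2rad(np.arange(180))           # 180 probe directions
    # Assumed projection rule: |cos| of the angle between probe and channel.
    weights = np.abs(np.cos(phi[:, None] - theta[None, :]))    # (180, 8)
    proj = np.tensordot(weights, ori, axes=([1], [0]))         # (180, H, W)

    psi = phi[np.argmax(proj, axis=0)]         # dominant orientation (radians)
    d = proj.max(axis=0) / (proj.min(axis=0) + tiny)   # orientation dominance
    eps = (ecc[-2] + ecc[-1]) / (ecc.sum(axis=0) + tiny)  # eccentricity estimate
    return psi, d, eps, d * eps                # last item: obliquity = d * eps
```

Note that d, computed as a plain ratio, is unbounded above, so some normalization would be needed before using it as a saturation value in an HSV color map; the text does not say how this was done.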

3. Results

The texture illusion mechanism was used to generate spheres, both with (Fig. 1) and without (Fig. 2) the addition of Lambertian shading. Superimposed graphics on Figure 1 show the effects of projective geometry on local surface patches, in particular how the direction and degree of obliquity of surface patches with respect to viewpoint compresses wavelengths and expands spatial frequencies in the direction of foreshortening. Parallel wavelengths and spatial frequencies are not affected. This property results in a relative boosting of perceivable orientations perpendicular to the direction of foreshortening.

Figure 3A shows a sample of the Gabor filters used in the first stage of analysis. Quadrature pairs (columns) across 8 orientations (enumerated within rows) are shown for one spatial frequency; three other spatial frequencies were also used (not shown). Figure 3B shows the spatial-frequency plane weighting imposed by second-order filters on first-order outputs. Columns indicate preferred orientation, while rows indicate preferred degrees of spatial-frequency plane eccentricity (corresponding to degree of foreshortening).

Figure 4 shows the results of the analysis of artificial and natural images. Locally dominant orientations are visualized as hue (or ‘color’), the degree of local orientation dominance is visualized as color saturation, and the relative power in higher spatial frequencies locally is visualized as value (or ‘brightness’). Analysis of the shape-from-texture illusion shows concentrically organized chromatic hue variation, and radially organized saturation and value variation, which reflects the underlying spatial frequency content of the rendered shape. The middle column (rusty faucet image, top, with analysis, bottom) shows both salient edges around figure-ground boundaries, as well as a progressive shift in the lower cylinder portion from relatively neutral orientations in the center to vertically oriented (cyan color code) portions that are foreshortened. The right column (avocado image, top, with analysis, bottom) shows an example of poor performance of the algorithm, due largely to the blurring of higher frequencies in the image, and also due to the specular (or ‘shiny’) nature of the surface, and the resulting effects on relative amplitudes of low and high frequency components across the surface. However, this natural-image example shows how the analysis presented here may be repurposed to engage shape-from-specularity estimates, given that the color-coded green-blue region at the center of the specular-lit surface is indicative of the orientation of the avocado’s surface under specularity constraints [16].

Figure 3. Two stages of filtering for image analysis. A) Gabor filters at eight orientations (columns) and two phases (rows; even and odd). Even and odd phases constituted the components of quadrature pairs (see text). Four spatial frequencies were used in total (three are not shown), for a total of 64 filters per local image patch (e.g., per center pixel of each patch). B) Second-order filters in the spatial frequency plane. Columns indicate dominant orientation preference, rows indicate degree of eccentricity preference (see text).
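The first-stage filters described in Methods and illustrated in Figure 3A (8 orientations, 4 spatial frequencies, even and odd phases, squared and summed per quadrature pair) can be sketched as follows. The specific wavelengths, the Gaussian width of half a wavelength, the 3-sigma kernel support, the unit-sum normalization, and the use of circular FFT convolution are assumptions chosen for illustration; the paper does not state these parameters, and the function names are hypothetical.

```python
import numpy as np

def gabor_pair(wavelength, theta, sigma_ratio=0.5):
    """Even/odd (quadrature) Gabor kernels; sigma = sigma_ratio * wavelength."""
    sigma = sigma_ratio * wavelength
    half = int(3 * sigma)                          # assumed 3-sigma support
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)      # coordinate along the carrier
    gauss = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    gauss /= gauss.sum()                           # comparable gain across scales
    even = gauss * np.cos(2.0 * np.pi * u / wavelength)
    odd = gauss * np.sin(2.0 * np.pi * u / wavelength)
    return even - even.mean(), odd                 # remove residual DC from even

def circular_convolve(image, kernel):
    """Same-size circular convolution via FFT (kernel must fit in the image)."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    # Shift the kernel centre to the array origin so the output is not offset.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)).real

def gabor_energy(image, wavelengths=(4.0, 8.0, 12.0, 16.0), n_orient=8):
    """Phase-invariant power per (frequency, orientation) channel: 4 x 8 = 32."""
    energy = np.zeros((len(wavelengths), n_orient) + image.shape)
    for i, wl in enumerate(wavelengths):
        for k in range(n_orient):
            even, odd = gabor_pair(wl, k * np.pi / n_orient)
            e = circular_convolve(image, even)
            o = circular_convolve(image, odd)
            energy[i, k] = e**2 + o**2             # quadrature pair, squared and summed
    return energy
```

With these assumed parameters the largest kernel is 49 x 49 pixels, so the input image must be at least that large. A grating with an 8-pixel wavelength varying along the horizontal axis should drive the (wavelength 8, orientation 0) channel most strongly.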
Figure 5 (right panel) shows a simple estimation of surface obliquity from viewpoint-normal, from the input image (left panel). The results are confounded to some degree by the local vagaries of surface texture, which only yield limited information about the baseline or ‘source’ texture and its frequency-modulated properties under projectional distortion. This result highlights the problem that luminance profiles alone cannot completely disambiguate obliquity or relative depth.

4. Discussion

Textured surfaces undergo a projective distortion due to the direction and degree of obliquity of local surface patches relative to the direction of view in an image. This distortion involves a local obliquity-dependent compression of wavelengths in the image plane, and a concomitant directional dilation of spatial frequencies. We have demonstrated that an illusion of local surface obliquity, curvature, and depth can be created by local image-plane modulations of a baseline, isotropic frequency-plane texture, without actually dilating the underlying noise distribution in the spatial-frequency plane. We have also demonstrated that a feedforward analytic architecture that uses the same assumptions about relative frequency amplitudes can be used to extract meaningful, but limited, information about surface obliquity and shape.

The major limitation of the described analytic method is that there is no comparison of local estimates of image properties across space, laterally, within the image. Local analysis alone can only give limited information about texture content, such that differences in local textures across an image with the same 3D surface orientation may be attributable either to differences in obliquity (falsely), or differences in apparent baseline texture (ambiguously). Wider-aperture priors, or regularization criteria, are needed to unify apparently different textures in local surface patches under an interpretation of statistically or heuristically ‘plausible’ resulting shapes. In addition, other visual cues are needed to boost the disambiguation of surface shape, such as shape-from-shading, shape-from-disparity (for binocular or multi-ocular data), depth-from-occlusion (also called ‘border ownership’, [23]), depth-from-motion (for video data), and a potential host of other relevant cues.

Figure 4. Analysis of artificial and natural images. Top row: texture illusion and two natural images (taken from the WWW via Flickr and Google Images). Bottom row: HSV colormap visualization of the analysis of local dominant orientation (hue), degree of orientation dominance (saturation), and relative degree of local spatial frequency plane eccentricity (value). The analysis of the faucet is given as an example of successful recovery of surface curvature (lower cylinder, see text), while the example of the avocado is given as an example of unsuccessful recovery of surface curvature by texture analysis (although the center green-blue region shows successful analysis of shape via specularity, see text, also Fleming et al., 2005).

5. Acknowledgements

This work benefitted from useful conversations with Bruno Olshausen, Dave Warland, Issac Trotts, and Matt Caywood. [The author's affiliation with The MITRE Corp is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.]

Figure 5. Analysis of local obliquity and surface normals. The middle panel shows the product of local spatial-frequency eccentricity and local orientation dominance (the latter as a discounting factor when orientations are not significant). The right panel shows estimated pixel-by-pixel surface normal components (x,y,z) rendered in a respective, absolute-valued (r,g,b) colormap. Local variability in the apparent spectral properties of surface texture makes this analysis limited, but still useful, for direct estimates of surface shape.

6. References

[1] J. Gibson, The perception of the visual world, Houghton Mifflin, Boston, 1950.
[2] K. Stevens, “The information content of texture gradients”, Biological Cybernetics 42:95-105, 1981.
[3] A. Witkin, “Recovering surface shape and orientation from texture”, Journal of Artificial Intelligence 17:17-45, 1981.
[4] J.E. Cutting and R.T. Millard, “Three gradients and the perception of flat and curved surfaces”, Journal of Experimental Psychology 113:198-216, 1984.
[5] J. Todd and R. Akerstrom, “Perception of three-dimensional form from patterns of optical texture”, Journal of Experimental Psychology: Human perception and performance 13:242-255, 1987.
[6] A. Blake and C. Marinos, “Shape from texture: Estimation, isotropy and moments”, Artificial Intelligence 45:323-380, 1990.
[7] D. Buckley and J.P. Frisby, “Interaction of stereo, texture, and outline cues in the shape perception of three-dimensional ridges”, Vision Research 33:919-934, 1993.
[8] B. Cumming, E. Johnston, and A. Parker, “Effects of different texture cues on curved surfaces viewed stereoscopically”, Vision Research 33:827-838, 1993.
[9] B. Super and A. Bovik, “Shape from texture using local spectral moments”, IEEE Transactions on Pattern Analysis and Machine Intelligence 17:333-343, 1995.
[10] J. Malik and R. Rosenholtz, “Computing local surface orientation for curved surfaces”, International Journal of Computer Vision 23:149-168, 1997.
[11] M. Clerc and S. Mallat, “The texture gradient equation for recovering shape from texture”, IEEE Transactions on Pattern Analysis and Machine Intelligence 24:536-549, 2002.
[12] A. Li and Q. Zaidi, “Perception of three-dimensional shape from texture is based on patterns of oriented energy”, Vision Research 40:217-242, 2000.
[13] M. Black and R. Rosenholtz, “Robust estimation of multiple surface shapes from occluded textures”, In: Proceedings of the IEEE International Symposium on Computer Vision, Coral Gables, Florida, 1995.
[14] A. Li and Q. Zaidi, “Information limitations in perception of shape from texture”, Vision Research 41:2927-2942, 2001.
[15] Q. Zaidi and A. Li, “Limitations on shape information provided by texture cues”, Vision Research 42:815-835, 2002.
[16] R.W. Fleming, A. Torralba, and E.H. Adelson, “Specular reflections and the perception of shape”, Journal of Vision 4:798-820, 2005.
[17] J.P. Jones and L.A. Palmer, “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex”, Journal of Neurophysiology 58:1233-1258, 1987.
[18] D.H. Hubel and T.N. Wiesel, “Receptive fields, binocular interaction and functional architecture in the cat’s striate cortex”, Journal of Physiology (London) 160:106-154, 1962.
[19] J. Movshon, I. Thompson, and D. Tolhurst, “Spatial summation in the receptive fields of simple cells in the cat’s striate cortex”, Journal of Physiology (London) 283:53-77, 1978a.
[20] B.A. Olshausen and D.J. Field, “Emergence of simple cell receptive field properties by learning a sparse code for natural images”, Nature 381:607-609, 1996.
[21] A.J. Bell and T.J. Sejnowski, “An information maximization approach to blind separation and blind deconvolution”, Neural Computation 7:1129-1159, 1995.
[22] J. Movshon, I. Thompson, and D. Tolhurst, “Receptive field organization of complex cells in cat’s striate cortex”, Journal of Physiology (London) 283:79-99, 1978b.
[23] H. Schuetze, E. Niebur, and R. von der Heydt, “Modeling cortical mechanisms of border ownership coding”, Journal of Vision 3:114a, 2003.
