Fast Light Mixture Estimation with Graphics Processors

Jinghua Fu, Ke Yang*, Qiong Luo^a, Xiaohong Jiang, Jiaoying Shi
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
^a Hong Kong University of Science and Technology, Hong Kong
[email protected]

Abstract The state-of-the-art white balance technique can handle photos taken under mixed lighting, such as indoor/outdoor or flash/ambient light types. However, the faithful result comes at the cost of long CPU times that preclude interaction on personal computers. In this paper, we use commodity graphics processors (GPUs) to accelerate the light mixture estimation for spatially varying white balance. Our implementation on an NVIDIA G80 GPU achieves a performance improvement of around 30X over the dual-thread CPU-based counterpart. Our method enables fast photo white balance in the digital home, and can also be used to accelerate pattern recognition and matting applications.

Keywords: White Balance; Light Mixture Estimation; Graphics Processors

1 Introduction

Figure 1. Left: original photograph (courtesy of [8]). Middle: traditional white balance. Right: white balance with light mixture estimation using [8]'s algorithm (courtesy of [8]). Our CPU implementation of it takes 55.4 seconds; our GPU acceleration takes 1.8 seconds.

Image processing, such as photo editing, is an important part of digital life and home entertainment. White balance, one of the most frequently used photo editing operations, restores the natural rendition of a scene that has been affected by the surrounding illumination. For example, Figure 1 (Left) is a photo taken under mixed flash and indoor lighting, and the wall appears unnaturally orange. Most modern cameras and photo editing packages offer white balance functionality; however, these tools generally assume a single light color, which often contradicts reality. For example, the white balance result in Figure 1 (Middle), produced using Photoshop, is unsatisfactory. Recently, [8] proposed a novel technique that estimates the light mixture at each pixel and produces visually pleasing white balance results, as shown in Figure 1 (Right).

* Corresponding author. Email address: [email protected]


The technique proposed by [8] is effective. However, it involves intensive computation and is time consuming. For example, it takes about one minute to process the photo in Figure 1 (image size: 640×429) on a commodity dual-core CPU. This slow response precludes practical home use, where users expect to click a button and get the result within seconds.

The graphics processing unit (GPU) is a rapidly evolving parallel processor traditionally designed for 3D graphics. Recent research has shown promising results on using GPUs to accelerate general-purpose computations (GPGPU) [1][2][5][6][11]. GPGPU is particularly attractive for digital home applications for two reasons. First, it offers a high performance/cost ratio: a commodity GPU dedicates most transistors to arithmetic rather than cache or control, and therefore achieves much higher GFLOPS (giga floating-point operations per second) than a CPU at a similar price. Second, the GPU hardware is pervasive; it exists on virtually every personal computer and mobile platform, backed by the ever-growing entertainment and multimedia markets. The GPU is therefore well suited as a personal co-processor that offloads computation from the CPU and improves overall performance. There have been studies on image processing on GPUs [2][3]; this related work greatly accelerates traditional image processing such as histograms and filtering [2], or vision processing such as detection and tracking [3]. In this paper, we investigate using the GPU to accelerate a recently published white balance algorithm [8], to make it fast enough for digital life on family computers.

This paper makes the following contributions. First, we develop a GPU parallelization of the state-of-the-art white balance algorithm, which is effective but not efficient on commodity CPUs. Our implementation on an NVIDIA G80 GPU achieves a performance improvement of around 30X over the dual-thread CPU-based counterpart, thus pushing the technique into practical digital life. Second, we conduct detailed code analysis and experimental evaluation to study the behavior of the GPU program, and provide insights on GPU programming for the GPGPU and image processing communities.

The remainder of this paper is organized as follows. In Section 2, we briefly introduce the related work and the GPU background. In Section 3, we review the CPU-based algorithm, which is parallelized on the GPU as described in Section 4. We experimentally evaluate our methods in Section 5 and conclude in Section 6.

2 Related Work

2.1 White balance

White balance is the process of removing unrealistic color casts from a photo, so that objects that appear unnaturally lighted are recovered as if under white illumination. Most modern cameras and photo editing packages include some white balance functionality. Many simple techniques have been proposed for white balance and the related problem of color constancy, such as [4]. However, they assume a single illuminant and may produce unsatisfactory results when the scene is illuminated by a mixture of lights with different color temperatures. [8] presents a light mixture estimation technique for scenes with two light types, estimating the relative contribution of each light color at each pixel. The algorithm includes a material color estimation step using a voting scheme related to pattern recognition algorithms [7], and an interpolation step that retrieves the light mixture at each pixel using matting algorithms [9]. We review [8]'s algorithm in detail in Section 3.


While this algorithm yields more faithful results, the long CPU runtime does not allow fast interaction. The photo in Figure 1 (640×429) takes 55 seconds, and bigger images need to be downsampled, which inevitably loses image information.

2.2 GPUs

The GPU architecture and GPU co-processing can be modeled as in Figure 2. The GPU consists of many SIMD (single-instruction, multiple-data) multiprocessors. The SIMD processors perform best when the program is data-parallel and free of divergence. For example, a data-dependent if-else flow control causes the two execution paths to be serialized, increasing the execution time. The graphics memory (GRAM) has a high access latency (e.g., 200 cycles on the G80), and the GPU caches are relatively small. As a result, the GPU uses massive threading rather than caching to hide the memory latency. Moreover, if neighboring threads access neighboring memory addresses in an aligned way, usually called coherent access, these accesses can be SIMD-optimized into a single memory transaction; this is called coalesced access in CUDA. Nowadays, GPU vendors provide general-purpose APIs such as AMD CAL [1] and NVIDIA CUDA [2]. These APIs drive the hardware directly and do not invoke the graphics pipeline, which gives the programmer more flexibility and generally higher performance [6]. We use CUDA for our implementation. CUDA provides a multithreaded programming interface extended from the C/C++ language. A minimal CUDA program consists of three steps: (1) allocate GRAM space for the data using cudaMalloc() and copy the input data from main memory to GRAM using cudaMemcpy(); (2) initiate the CUDA kernel code (analogous to shader code) by calling the kernel function labeled __global__, where the thread identifier locates the corresponding data for each thread; (3) copy the execution results from GRAM back to main memory.
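As a minimal sketch of these three steps (the kernel name, sizes and the per-element work are illustrative, not from the paper), the following program scales an array on the GPU:

    #include <cuda_runtime.h>
    #include <stdlib.h>

    __global__ void scale_kernel(float* d, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread identifier locates this thread's datum
        if (i < n)
            d[i] *= s;
    }

    int main()
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float* h = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        float* d;
        cudaMalloc((void**)&d, bytes);                      // (1) allocate the GRAM space
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);    //     and copy the input in

        scale_kernel<<<(n + 255) / 256, 256>>>(d, n, 2.0f); // (2) initiate the __global__ kernel

        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);    // (3) copy the results back
        cudaFree(d);
        free(h);
        return 0;
    }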

[Figure: the CPU and main memory connect via the bus to the GPU, which comprises GRAM and several SIMD multiprocessors; the numbered arrows mark the three co-processing steps.]
Figure 2 GPU co-processing model.

Table 1 Notations used in the paper.

color    Structure of 3 or 2 floats, depending on the color space
float2   Structure of 2 floats
I[]      Input image
N        Number of pixels in the image
m        Material color
Ns       Number of material samples
M[]      Set of final materials
Nm       Number of final materials in M
N1       Number of marked pixels
W[]      Intensity weights of the two lights at each pixel
L[]      Matting laplacian matrix for the image
|w|      Number of elements in the matting window

2.3 Image processing on GPUs

There have been studies on image processing on GPUs [2][3]. [2] presents several GPU-based image processing implementations, such as histograms, filters, edge detection and wavelets. [3] accelerates computer vision algorithms such as detection and tracking using multiple GPUs. To our knowledge, there has been no GPU acceleration of white balance or digital matting algorithms. In this paper, we use the GPU to accelerate the state-of-the-art white balance algorithm, and our acceleration can also be applied to related pattern recognition and matting applications. Below, we review the CPU-based algorithm and present our GPU-based algorithm. The notations used in this paper are listed in Table 1.

3 Review of CPU-based Algorithm

In this section, we review the light mixture estimation algorithm [8]. The algorithm models the color at each pixel as I = R(k1L1 + k2L2), where the material's spectral reflectance R and the weights k1 and k2 are unknown and vary per pixel. The two light colors L1 and L2 are constant and given by the user. White balance then produces an image as if both lights were white light of unit intensity, i.e., I_wb = R(k1·1 + k2·1). The key is to solve for the material R and the light mixture k1, k2 at each pixel. Please refer to [8] for algorithm details. For completeness, we briefly review the algorithm in five steps, namely Vote, Mark, Disambiguity, GetLaplacian and SolveMixture, as shown in Algorithm 1. Each step is illustrated in Figure 3.

(a) Vote(in color I[N], out int vote[Ns])
 1   densely sample the space of all possible materials;
 2   for each material i (vote[i] = 0 initially)
 3     for each pixel p in I
 4       solve the weights of the two lights at p;
 5       if the weighted color is close to I[p]
 6         vote[i]++;

(b) Mark(in color I[N], out float2 W[N], out bool marked[N], out color M[])
     // initially, M = {} and Nm = 0
 1   for each material m in decreasing order of vote[m]
 2     if vote[m] is too small, break;
 3     for each pixel p in I
 4       if !marked[p] and I[p] votes for m
 5         marked[p] = true; write weights to W[p];
 6     M.insert(m); ++Nm;

(c) Disambiguity(in color I[N], in color M[Nm], inout bool marked[N])
 1   for each pixel p in I with marked[p] == true
 2     for all materials in M
 3       if I[p] votes for more than one material
 4         marked[p] = false; break;

(d) GetLaplacian(in color I[N], out float2 L[N·|w|²])
     for each pixel p in I
       get the matting laplacian.
     // Element ij of L is defined according to pixels i and j in windows wp centered around p.
     // Here we use float2 to denote both the value and the sparse index.

(e) SolveMixture(in color I[N], in float2 L[N·|w|²], in float2 W[N], out float alpha[N])
     // alpha: the mixture portion
     Use a sparse matrix solver to minimize a quadratic objective function, with the weights of marked pixels as constraints and those of other pixels as unknowns.

Algorithm 1 Light mixture estimation pseudo code.

[Figure: (a) Vote — sampled materials m1-m4 collect votes, e.g. {m1:2, m2:1, m3:3, m4:0}; (b) Mark — materials are taken in decreasing vote order (m3, then m1) until the break, giving M = {m3, m1}; (c) Disambiguity — ambiguously marked pixels are unmarked; (d) GetLaplacian — the matting window wp around each pixel p; (e) SolveMixture — known mixtures α* act as constraints while unknown mixtures α are interpolated.]
Figure 3 Illustration of each step of the light mixture estimation algorithm.

The time complexities of Steps (a)-(e) are O(Ns·N), O(Nm·N), O(N1·Nm), O(N) and O(N), respectively. [8] uses Ns = 1024; Nm is usually less than ten; N1 is only a small portion of N. There are computation-heavy loop bodies: Line 4 of Step (a) employs least squares to solve for the two weights, and Step (d) involves heavy matrix and vector calculations, such as multiplication, inversion, mean and variance, to obtain the laplacian of the matting window wk. In sum, Steps (a) and (d) are expected to have the longest running times on the CPU, while Steps (b) and (c) are expected to have small overheads.
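For concreteness, below is a minimal sketch of the per-pixel least-squares solve of Line 4 of Step (a): it fits the stated model I ≈ k1(R∘L1) + k2(R∘L2) over the three color channels via the 2×2 normal equations. The function name and the float3 color representation are our assumptions, not the paper's code.

    __device__ float2 solve_weights(float3 I, float3 R, float3 L1, float3 L2)
    {
        // u = R*L1 and v = R*L2, component-wise: the material as lit by each light.
        float3 u = make_float3(R.x * L1.x, R.y * L1.y, R.z * L1.z);
        float3 v = make_float3(R.x * L2.x, R.y * L2.y, R.z * L2.z);
        // Normal equations for min ||k1*u + k2*v - I||^2:
        //   [uu uv] [k1]   [uI]
        //   [uv vv] [k2] = [vI]
        float uu = u.x * u.x + u.y * u.y + u.z * u.z;
        float uv = u.x * v.x + u.y * v.y + u.z * v.z;
        float vv = v.x * v.x + v.y * v.y + v.z * v.z;
        float uI = u.x * I.x + u.y * I.y + u.z * I.z;
        float vI = v.x * I.x + v.y * I.y + v.z * I.z;
        float det = uu * vv - uv * uv;   // near zero when the two lit colors are parallel
        return make_float2((uI * vv - uv * vI) / det,    // k1, by Cramer's rule
                           (uu * vI - uv * uI) / det);   // k2
    }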

4 GPU-based Algorithm

In this section, we describe our GPU parallelization methods. We first introduce our parallel paradigm, namely strip mining, then introduce the sum primitive employed by the Vote step, and finally analyze the kernel code.

4.1 Parallel paradigm

We follow the parallel paradigm of "strip mining", where the T threads together consume the whole task list of N pixels step by step. The strip mining paradigm is illustrated in Figure 4. It suits the GPU architecture well, since aligned memory accesses from the SIMD threads are coalesced.


Figure 4 Illustration of strip mining parallel paradigm.

In a kernel, strip mining processing takes the following form:

    int i = tid;        // thread index
    while (i < N) {
        process the i-th pixel;
        i += T;         // T: number of threads in the kernel
    }
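As a concrete illustration (a minimal sketch; the kernel name and the placeholder per-pixel work are ours, not from the paper), tid and T map onto CUDA's built-in indices as follows:

    __global__ void strip_kernel(const float3* I, float* lum, int N)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        int T   = gridDim.x * blockDim.x;                 // total threads in the kernel
        for (int i = tid; i < N; i += T)                  // threads sweep the list in aligned strips
            lum[i] = 0.299f * I[i].x + 0.587f * I[i].y + 0.114f * I[i].z;  // placeholder work
    }

Because consecutive threads touch consecutive pixels in every strip, the loads and stores coalesce.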

In Algorithm 1, Steps (a)-(d) all expose a "for each pixel" loop, and each is parallelized in this strip-mining way. The code outside these loops runs on the CPU; the per-pixel loop bodies are what remain to be done within each GPU kernel. Therefore,


we have four kernels, namely vote_kernel, mark_kernel, disamb_kernel and getLap_kernel, one for each of the four steps. Since the per-pixel processing code is similar to that in the corresponding CPU code, we omit the kernel code here. Step (e) needs a standard sparse matrix solver, which is not directly parallelizable. Since this step takes only a small portion of the total time and is well supported by CPU-based libraries such as LAPACK, we currently perform it on the CPU and leave the GPU parallelization of sparse matrix solvers to future work.

4.2 Sum primitive

The parallelization of vote[i]++ in Line 6 of Vote in Algorithm 1 requires inter-thread communication. A straightforward solution is to maintain a Boolean array bVote[N] recording whether each pixel votes for the material, and then apply a reduction operator to obtain the total number of votes. [2] provides a highly optimized reduction implementation, but it requires the array length to be a power of two, which limits its practicality. We modify their implementation into the strip mining form, choosing one thread from each block to output a per-block result, and finally copy the per-block results to the main memory to finalize the sum. An alternative is to use [10]'s highly optimized scan primitive to finalize the sum, but we find that for a small array (e.g., fewer than 512 elements) the CPU is faster. Using the sum primitive, Lines 3-6 of Vote in Algorithm 1 are parallelized as below:

    int bVote[N];
    vote_kernel(bVote);
    vote = sum_kernel(bVote);

The processing of the i-th pixel in vote_kernel is:

    bVote[i] = (weighted color is close to I[i]) ? 1 : 0;
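Below is a minimal sketch of the strip-mined per-block reduction described above; the fixed block size of 256 and the buffer names are our assumptions, not the paper's code. Each block accumulates a partial sum in shared memory and one thread per block writes it out; the per-block results are then added on the CPU.

    __global__ void sum_kernel(const int* bVote, int N, int* blockSum)
    {
        __shared__ int s[256];                        // assumes blockDim.x == 256
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        int T   = gridDim.x * blockDim.x;
        int acc = 0;
        for (int i = tid; i < N; i += T)              // strip mining: no power-of-two restriction on N
            acc += bVote[i];
        s[threadIdx.x] = acc;
        __syncthreads();
        for (int d = blockDim.x / 2; d > 0; d >>= 1)  // halve the working threads at each depth
        {
            if (threadIdx.x < d)
                s[threadIdx.x] += s[threadIdx.x + d];
            __syncthreads();
        }
        if (threadIdx.x == 0)                         // one thread per block outputs the partial result
            blockSum[blockIdx.x] = s[0];              // the per-block sums are finalized on the CPU
    }

The two guarded lines (the halving loop and the single-writer store) correspond to the two divergent lines counted for sum_kernel in Table 2.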

4.3 Kernel code analysis

It is important to study the factors that influence the performance of the GPU kernels. There are five kernels in total: four named after the four steps, plus sum_kernel. We statically analyze these kernels in two aspects, memory access and divergence. As mentioned in Section 2.2, GPU performance is SIMD-optimized for coherent memory accesses and degraded by divergent execution paths. There are two types of memory access, load and store. We estimate the load/store amount of each kernel (for simplicity, we use a data type's name to denote its size in bytes) and estimate whether the access is coherent. We also count the code lines under divergent (div.) execution paths. For example, Lines 2-4 of Step (c) in Algorithm 1 are on a divergent path caused by the "with marked[p] == true" condition. This is not an accurate estimate, but it helps us understand the severity of divergence. The analysis of the kernels is listed in Table 2. We omit explaining the number of calls (#calls) and load/store amount of each kernel. Although not shown in Algorithm 1, there are two divergent lines in sum_kernel: one during the halving of working threads at each depth [2], the other from choosing one thread from each block to output. There are several places where memory coherence (coh.) is lost. First, in disamb_kernel, the threads compete for access to the same elements of M. Second, in sum_kernel, only one thread writes per block, which loses coherence. Third, in getLap_kernel, the access to the sliding windows wk is misaligned, and the kernel requires frequent stores during the matrix/vector computations.

Table 2 Kernel code analysis.

kernel          #calls  load amount    load coh.  store amount      store coh.  div. lines
vote_kernel     Ns      color·N        y          int·N             y           0
sum_kernel      Ns      int·N          y          O(1)              n           2
mark_kernel     Nm      color·N        y          (bool+float2)·N1  y           4
disamb_kernel   1       color·(N1·Nm)  n          bool·O(N1)        y           11
getLap_kernel   1       color·(N·|w|)  n          float2·O(N·|w|²)  n           0

5 Experiments

We implemented both the CPU-based and GPU-based algorithms on a machine with an NVIDIA 8800 GTX GPU and an AMD Athlon dual-core CPU. The hardware configuration is shown in Table 3.

Table 3 Hardware configuration.

                 GPU                       CPU
Processors       1.35 GHz × 16             2.11 GHz × 2
Cache            local memory: 16KB × 16   L1: 64KB × 4, L2: 512KB × 2
DRAM (MB)        768                       2048
Bus width (bit)  384                       64

The programs are implemented using MSVC 8.0, CUDA 2.0 Beta and Matlab R2008a on Windows XP. For the CPU-based algorithm, we implement Steps (a)-(c) using MSVC and call Matlab code to perform Steps (d)-(e), which are mainly matrix/vector computations. For the GPU-based algorithm, we implement Steps (a)-(d) as CUDA kernels with MSVC host code and call Matlab code to perform Step (e), for the reasons described in Section 4.1. Although this step runs on the CPU, we refer to the whole pipeline as the GPU-based implementation for convenience. We adapt the Matlab code for Steps (d)-(e) from [9]'s Matlab implementation of the matting laplacian, setting |w| = 4. We use OpenMP to create two threads for Steps (a)-(c) of the CPU-based implementation. Specifically, we declare "#pragma omp parallel for" preceding the "for each pixel" loops, with an extra clause "shared(vote)" preceding Line 3 of Step (a) in Algorithm 1 to declare the shared variable vote[i]. The performance of the dual-thread implementation is 1.5-1.8 times that of the single-threaded version. Our experiments include three parts. The first part uses the CUDA profiler tool [2] to profile the behavior within each GPU kernel. The second part measures the time breakdown of both the CPU and GPU implementations and shows the GPU speedup (CPU time / GPU time). The third part studies the performance scalability with varying image size.
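As an illustration of this OpenMP usage, a minimal sketch of the Step (a) pixel loop is given below; the helper weighted_color_close is hypothetical and stands for Lines 4-5 of Vote, and the atomic guard on the shared counter is our addition for correctness.

    #include <omp.h>
    #include <vector_types.h>   // CUDA's float3, usable in host code

    bool weighted_color_close(float3 c, int i);   // hypothetical: Lines 4-5 of Vote

    void vote_for_material(const float3* I, int N, int i, int* vote)
    {
        // The pragma precedes the per-pixel loop; vote[] is the shared voting array.
        #pragma omp parallel for shared(vote)
        for (int p = 0; p < N; ++p) {
            if (weighted_color_close(I[p], i)) {
                #pragma omp atomic        // the shared counter needs an atomic update
                vote[i]++;
            }
        }
    }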

5.1 Profiling

We first use the CUDA Visual Profiler [2] to analyze the internals of each kernel. The profiler instruments the running kernels and collects statistics such as execution time, memory accesses, divergence and instruction counts. Our program takes Figure 1 (Left) as input and outputs the white-balanced result (Figure 1 (Right)). We run the program a hundred times and report the averages. The profiling results are listed in Table 4. The profiled terms are execution time (in us), the number of incoherent/coherent loads/stores, the number of divergent branches, and the number of dynamic instructions (in fetches). The profiling generally agrees with our static code analysis in Table 2. Note that the huge number of incoherent memory accesses and the large instruction count make getLap_kernel the most expensive kernel. The second most expensive kernel, disamb_kernel, owes its cost to frequent incoherent and coherent accesses, severe divergence and a large instruction count. Fortunately, these two kernels are called only once; therefore, they contribute only a small portion of the overall overhead.

Table 4 Profiling of kernel executions.

kernel          GPU usec  ld_incoh  ld_coh  st_incoh  st_coh  div  #instr.
vote_kernel     164       0         4292    0         8584    0    15931
sum_kernel      36        0         2102    59        8       16   4185
mark_kernel     106       0         4423    0         9754    355  12221
disamb_kernel   3324      160752    11736   0         29127   586  63024
getLap_kernel   7170      127817    524     2179270   0       0    172097

5.2 Time breakdown

Table 5 lists the time breakdown of each algorithm step when processing the 640×429 photo shown in Figure 1 (Left). In the CPU-based implementation, the Matlab time for getting the laplacian dominates the overall time and is the performance bottleneck. The next heaviest step is Vote, which is O(Ns·N) and involves intensive least squares computation. The Mark and Disambiguity steps incur insignificant overhead due to their light computation and small number of calls (Nm = 5 in this case). In the GPU-based implementation, the bottleneck shifts to the SolveMixture step, which, as mentioned, is still performed on the CPU. Apart from this step, the GPU bottleneck is Vote, which is heavier than GetLaplacian; this differs from the situation on the CPU. We can deduce that the Ns (= 1024) vote_kernel calls outweigh the single getLap_kernel call, although the latter is much heavier per call. The speedups of Steps (a)-(d) are 35, 37, 31 and 1026, respectively. It is encouraging that the GPU achieves such a large speedup on GetLaplacian, which is highly computation-intensive and free of divergent flow, a typical match for the GPU's strengths. Finally, since the performance bottleneck of the GPU-based implementation is now the CPU-based sparse matrix solver, parallelizing this step in future work would be highly profitable.

Table 5 Time breakdown (unit: ms).

Step       a      b    c   d      e
CPU time   8219   516  31  45162  1480
GPU time   236    14   1   44     -

5.3 Varying workload

We now vary the image size and test the scalability of each step. We up/down-sample the photo of Figure 1 (Left) (using a Lanczos filter, preserving the aspect ratio) so that the image size (the larger dimension) doubles from 80 to 1280. For each image size, we measure the CPU time and the GPU time (excluding Step (e)) of Steps (a)-(e). The speedup is plotted along the secondary y-axis. As shown in Fig. 5, the GPU implementation consistently outperforms the CPU implementation at every size, and the speedup keeps increasing as the image grows. This indicates that the GPU is underutilized on small images, and its potential for Step (a) is yet to be fully exploited even on large images. At an image size of 1280, the speedup exceeds 40X. The situation is similar for Steps (b)-(d) (Figures 6-8). The speedup of Step (b) approaches 60X. For Step (c), Disambiguity is called only once, and its overhead is negligible on the GPU; the timing in Fig. 7 is measured in ms, and we do not measure the speedup at finer time precision. For Step (d), the tremendous speedup again shows the GPU's aptness for heavy data-parallel computation. Finally, Fig. 10 shows the total processing time; the overall speedup is around 30X for images larger than 80.

[Figures 5-10: CPU time, GPU time (left y-axis; seconds, except ms for Step (c)) and speedup (secondary y-axis) versus image size (80 to 1280) for Steps (a)-(d) and the total; Fig. 9 shows the CPU time of Step (e).]

Fig. 5 Performance of Step (a).
Fig. 6 Performance of Step (b).
Fig. 7 Performance of Step (c).
Fig. 8 Performance of Step (d).
Fig. 9 CPU performance of Step (e).
Fig. 10 Overall performance.

6 Conclusion

The graphics processor is an attractive part of the digital platform, due to its high performance/price ratio and its pervasiveness in digital life. Continuing advances in hardware and recent improvements in programmability make GPUs even more suitable for general-purpose processing than before. In this paper, we have proposed a GPU parallelization of the state-of-the-art white balance algorithm. The performance ratio to the dual-threaded CPU-based algorithm is around 30, and would be even higher if we parallelized the sparse matrix solver. We have analyzed and profiled the kernel code in detail; by understanding the factors that influence kernel performance, we can improve the algorithm design and implementation. In the future, we intend to study sparse matrix solvers on the GPU and push white balance to interactive speed. Since the Vote and GetLaplacian steps can serve as fundamental operations in pattern recognition and digital matting algorithms, we would also like to extend our method to accelerate related computer vision and digital matting problems.

Acknowledgement This work is supported by the key NSFC project "Digital Olympic Museum" under Grant No. 60533080 and the 863 project "Digital Media Authoring Platform" under Grant No. 2006AA01Z335. The authors are grateful to Prof. Zhigeng Pan for his guidance and help during the paper submission, and to Eugene Hsu for his help in reading his paper. The graphics card was provided by the GPUQP group of the Hong Kong University of Science and Technology.

References
[1] AMD stream processing SDK. http://ati.amd.com/products/streamprocessor/.
[2] CUDA (Compute Unified Device Architecture). http://developer.nvidia.com/object/cuda.html.
[3] J. Fung, S. Mann, C. Aimone. OpenVIDIA: Parallel GPU Computer Vision. In Proceedings of ACM Multimedia 2005, Singapore, Nov. 6-11, 2005, pages 849-852.
[4] A. Gijsenij, T. Gevers. Color constancy using natural image statistics. In IEEE Computer Vision and Pattern Recognition, pages 1-8.
[5] GPGPU website. http://www.gpgpu.org.
[6] B. He, K. Yang, R. Fang, et al. Relational Joins on Graphics Processors. In SIGMOD 2008.
[7] P. V. C. Hough. Method and means for recognizing complex patterns. U.S. Patent 3,069,654, 1962.
[8] E. Hsu, T. Mertens, S. Paris, et al. Light Mixture Estimation for Spatially Varying White Balance. In SIGGRAPH 2008.
[9] A. Levin, D. Lischinski, Y. Weiss. A closed form solution to natural image matting. In IEEE Computer Vision and Pattern Recognition, pages 61-68.
[10] S. Sengupta, M. Harris, Y. Zhang, J. Owens. Scan Primitives for GPU Computing. In Proceedings of Graphics Hardware 2007.
[11] Q. Yu, C. Chen, Z. Pan. Parallel Genetic Algorithms on Programmable Graphics Hardware. In ICNC (3) 2005, pages 1051-1059.

tree similarity under connectivity-integrality constraint

digital life with family computers. This paper makes the following contributions. First, we develop the GPU parallelization for the state-of-the-art white balance algorithm, which is effective but not efficient on commodity CPUs. Our implementation on an NVIDIA G80 GPU achieves a performance improvement of around.

195KB Sizes 0 Downloads 127 Views

Recommend Documents

Constraint Programming for Optimization under ... - Roberto Rossi
Sep 10, 2008 - Roberto Rossi1. 1Cork Constraint Computation Centre, University College Cork, Ireland ... approaches computer science has yet made to the Holy Grail of programming: ...... Generating good LB during the search. 65. 62. 130.

Optimal Investment under Credit Constraint
We characterize the optimal investment decision, the financing and ...... [6] Dixit, A. and R. Pindyck, 1994, Investment under Uncertainty, Princeton University.

Sparse Filter Design Under a Quadratic Constraint: Low ...
Examples in wireless channel equalization and minimum-variance ... area, or the supply voltage may be lowered to take advantage of a slower computation rate ...

rk narayan under the banyan tree pdf
Page 1 of 1. File: Rk narayan under the banyan tree. pdf. Download now. Click here if your download doesn't start automatically. Page 1 of 1. rk narayan under the banyan tree pdf. rk narayan under the banyan tree pdf. Open. Extract. Open with. Sign I

Summers-Under-The-Tamarind-Tree-Recipes-Memories-From ...
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. Summers-Under-The-Tamarind-Tree-Recipes-Memories-From-Pakistan.pdf. Summers-Under-The-Tamarind-Tree-Recipes-

Approximating the Best–Fit Tree Under Lp Norms
Department of Computer and Information Science, University of Pennsylvania,. Philadelphia, PA ... We also consider a variant, Lrelative, of the best-fit objective.

Support Constraint Machines
by a kernel-based machine, referred to as a support constraint machine. (SCM) ... tor machines. 1 Introduction. This paper evolves a general framework of learning aimed at bridging logic and kernel machines [1]. We think of an intelligent agent actin

From sample similarity to ensemble similarity ...
kernel function only. From a theoretical perspective, this can be justified by the equivalence between the kernel function and the distance metric (i.e., equation (2)): the inner product defines the geometry of the space containing the data points wi

Similarity Defended4
reactions, but a particular chemical reaction which has particular chemicals interacting .... Boulder: Westview Press, 1989. ... Chicago: University of Chicago, 1999.

Support Constraint Machines
For a generic bilateral soft-constraint we need to construct a proper penalty. For .... We consider a benchmark based on 1000 bi-dimensional points belonging to.

Adaptation and Constraint: Overview
argued that wheels might be highly functional for some terrestrial ... a wheel from organic tissues. .... support, is that evolution of specialization in form or func-.

Universal Timed Concurrent Constraint Programming
3 Department of Computer Science, Javeriana University Cali, Colombia. ... Concurrent Constraint Programming (ccp) [3] is a well-established and mature.

object constraint language pdf
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. object constraint ...

Subexponential concurrent constraint programming
Dec 4, 2016 - Preprint submitted to Theoretical Computer Science .... In this case, the monotonicity guarantees that the degree of preference (see [7]).

Feature Term Subsumption using Constraint ...
Feature terms are defined by its signature: Σ = 〈S, F, ≤, V〉. ..... adding symmetry breaking constraints to the CP implementation. ... Tech. Rep. 11, Digital. Research Laboratory (1992). [3] Aıt-Kaci, H., Sasaki, Y.: An axiomatic approach to

Geometrical Constraint Equations and Geometrically ...
Sep 16, 2010 - rectly deduced from the equilibrium differential equa- tions of vesicles. For a vesicle with uniform rigidity, this differential equation (i.e. the ...

Adaptation and Constraint: Overview
birds, do we find feathers with asymmetrical vanes that could assist in creating lift – the flight feathers.Using the phylogeny of Figure 1, we can identify some ...

Articular constraint, handedness, and directional ...
micro-CT data with data obtained by traditional histomorph- ... sampling location, quantitative trabecular analysis with mi- cro-CT has been shown to ... The CTan software employs the ... for SMI to 0.962 for DA, indicating very good to excellent.

intertemporal budget constraint and public
the sum of all current and expected future non-interest outlays — expressed in ... economy by taking into account the growth of national income. In such a case, ...