A Split Node Cache Scheme for Fast Ray Tracing

Jae-ho Nah, Department of Computer Science, Yonsei University

Jin-suk Heo, School of Electrical and Electronic Engineering, Yonsei University

Woo-chan Park, Department of Computer Engineering, Sejong University

Tack-don Han, Department of Computer Science, Yonsei University

ABSTRACT

We propose a node cache scheme for efficient ray tracing hardware. The scheme exploits the fact that accesses to high-level tree nodes have more locality than accesses to low-level nodes. In this method, the node cache is split into a high-level node cache and a low-level node cache, and the data of the high-level nodes is retained for the duration of a frame. In addition, the scheme uses a hybrid tree layout: high-level nodes are stored in the breadth-first layout, which makes them easy to separate from the other nodes, and low-level nodes are stored in the depth-first layout to make effective use of locality. Simulation results show a reduction in cache miss ratio of up to about three percentage points.

KEYWORDS: Ray tracing, cache scheme

INDEX TERMS: I.3.1 [Hardware Architecture]: Graphics processors

MOTIVATION

There are various approaches to real-time ray tracing. Dedicated ray tracing hardware (RT H/W) is one of them, and efficient memory management is an important factor in its performance. RT H/W therefore places caches between the processing units and memory. Among these caches, the node cache serves the traversal unit when it reads tree nodes. According to Woop [1], the hit ratio of the node cache in RT H/W can decline by up to 31.2 percent when complex scenes are rendered. An effective cache scheme is therefore needed to improve cache efficiency.

PROPOSED CACHE SCHEME

Our cache scheme consists of two concepts: a split node cache and a hybrid tree layout (Figure 1).
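The hybrid tree layout can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a complete binary tree whose nodes carry implicit breadth-first ids (the children of node i are 2i+1 and 2i+2), and `high_levels` is a hypothetical parameter for how many top levels count as high-level nodes. The top levels are emitted in breadth-first order with a deque used as a queue, and each remaining subtree is emitted in depth-first preorder with a deque used as a stack.

```python
from collections import deque


def node_level(node):
    # Implicit breadth-first ids (assumption): root is 0, children of i are
    # 2i+1 and 2i+2, so the level of node i is floor(log2(i + 1)).
    return (node + 1).bit_length() - 1


def hybrid_layout(depth, high_levels):
    """Return node ids of a complete binary tree with `depth` levels in a
    hybrid memory order: levels < high_levels breadth-first, then each
    remaining subtree depth-first (preorder)."""
    if depth <= 0:
        return []
    n_nodes = 2 ** depth - 1

    def children(node):
        return [c for c in (2 * node + 1, 2 * node + 2) if c < n_nodes]

    order = []
    subtree_roots = []

    # High-level part: the deque acts as a FIFO queue (breadth-first order).
    work = deque([0])
    while work:
        node = work.popleft()
        if node_level(node) < high_levels:
            order.append(node)
            work.extend(children(node))
        else:
            subtree_roots.append(node)  # root of a low-level subtree

    # Low-level part: the deque acts as a LIFO stack (depth-first preorder).
    for root in subtree_roots:
        work.append(root)
        while work:
            node = work.pop()
            order.append(node)
            # Push the right child first so the left child is stored next.
            work.extend(reversed(children(node)))
    return order


def is_high_level_slot(slot, high_levels):
    # Hypothetical helper: high-level nodes occupy the first
    # 2**high_levels - 1 memory slots, so one comparison picks the cache.
    return slot < 2 ** high_levels - 1


if __name__ == "__main__":
    print(hybrid_layout(depth=4, high_levels=2))
    # [0, 1, 2, 3, 7, 8, 4, 9, 10, 5, 11, 12, 6, 13, 14]
```

Because the high-level nodes form a contiguous prefix, deciding which node cache serves an access reduces to a single comparison, and within each low-level subtree a node's left child is stored immediately after it. A real kd-tree is rarely complete, so an actual layout would also need explicit child offsets.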

Split node cache. In the scheme, the node cache is split into two caches: a high-level node cache and a low-level node cache. We apply different policies to these two caches. Data of high-level nodes is retained throughout a frame, so accesses to these nodes do not cause cache misses. In contrast, the low-level node cache uses a common cache policy. The reason for this split is to increase cache locality. The space of a parent node includes the space of its child nodes, so traversal in ray tracing is performed as a depth-first search that starts at the root. Each traversal therefore passes through high-level nodes, which are shared by many rays, whereas each low-level node is reached only by rays that enter its small region; that is, accesses to high-level nodes have more locality than accesses to low-level nodes. Thus, we use a different policy for the high-level nodes.

Hybrid tree layout. This subsection explains the tree layout used with the split node cache. Typical tree layouts are the breadth-first layout (BFL) and the depth-first layout (DFL). The BFL fits the split node cache: mapping node indices to memory addresses is easier than with the DFL, because the level of a node can be identified directly from its index. However, in the BFL a parent node and its child nodes are not placed sequentially, so locality can decrease during traversal. We therefore propose a hybrid layout in which high-level nodes are represented in the BFL and low-level nodes are represented in the DFL. The hybrid layout does not have the locality problem of the BFL, so in the split node cache it has advantages over both the BFL and the DFL. We use a deque to build the hybrid tree layout because constructing it requires both breadth-first and depth-first traversal. In addition, we modify the traversal algorithm used in ray shooting because the index format of child nodes changes.

Figure 1. Proposed cache scheme.

EXPERIMENTAL SIMULATION RESULTS

We implemented a Whitted-style ray tracer to verify our scheme. The acceleration structure of the ray tracer is a kd-tree. The ray tracer records data on its memory accesses, and this data is used as input to the Dinero IV cache simulator. Our test environment is as follows: the resolution is 800x600, the maximum ray tracing depth is 10, and the node cache is direct-mapped with eight-byte blocks. We assume that the high-level node cache and the low-level node cache have the same size. Table 1 shows the test results. In most cases, our scheme reduces the cache miss ratio; the largest reduction, 2.93 percentage points, occurs for BART Kitchen with a 4KB cache, while for BART Robot with a 4KB cache the miss ratio increases.

Scene (triangles)     Cache size   Without the split scheme   With the split scheme
BART Robot (7K)       4KB          12.42%                     13.29%
BART Robot (7K)       8KB           5.52%                      4.47%
BART Robot (7K)       16KB          1.82%                      1.73%
BART Kitchen (11K)    4KB          13.63%                     10.70%
BART Kitchen (11K)    8KB           7.12%                      5.00%
BART Kitchen (11K)    16KB          4.44%                      2.66%
Fairy Forest (17K)    4KB          11.33%                     10.34%
Fairy Forest (17K)    8KB           7.01%                      5.87%
Fairy Forest (17K)    16KB          4.29%                      3.97%

Table 1. Experimental simulation results - cache miss ratio.
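The numbers in Table 1 come from Dinero IV. As a toy stand-in (not Dinero IV and not the authors' setup), the following sketch counts misses for a unified direct-mapped node cache with eight-byte blocks and for a split configuration with two equal halves: the high-level half keeps every block it loads for the rest of the frame, and the low-level half is an ordinary direct-mapped cache. It assumes that high-level nodes occupy all addresses below a hypothetical `high_bytes` boundary and fit in their half; the function names are illustrative.

```python
def direct_mapped_misses(addresses, cache_bytes, block_bytes=8):
    """Count misses of a direct-mapped cache over a trace of byte addresses."""
    lines = cache_bytes // block_bytes
    tags = [None] * lines
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % lines, block // lines
        if tags[index] != tag:  # empty line or a different block: miss, then fill
            tags[index] = tag
            misses += 1
    return misses


def split_misses(addresses, cache_bytes, high_bytes, block_bytes=8):
    """Count misses when the node cache is split into two equal halves.

    Assumption: addresses below `high_bytes` hold high-level nodes. Blocks
    loaded into the high-level half stay there for the rest of the frame
    (one call models one frame), so only their first access misses.
    """
    half = cache_bytes // 2
    if high_bytes > half:
        raise ValueError("high-level nodes must fit in the high-level cache")
    resident = set()  # high-level blocks loaded during this frame
    low_trace = []
    misses = 0
    for addr in addresses:
        if addr < high_bytes:
            block = addr // block_bytes
            if block not in resident:
                resident.add(block)
                misses += 1
        else:
            low_trace.append(addr)
    # The low-level half is an ordinary direct-mapped cache.
    return misses + direct_mapped_misses(low_trace, half, block_bytes)


if __name__ == "__main__":
    # Toy trace of 8-byte nodes: slots 0-1 are high-level, slots 4-5 are
    # low-level nodes that map to the same lines in a 32-byte unified cache.
    trace = [8 * s for s in [0, 1, 4, 0, 1, 5, 0, 1, 4, 0, 1, 5]]
    print(direct_mapped_misses(trace, 32) / len(trace))  # 0.75
    print(split_misses(trace, 32, high_bytes=16) / len(trace))  # 0.3333333333333333
```

On this toy trace the split configuration misses less because low-level nodes can no longer evict high-level ones. With a larger low-level working set, halving the low-level capacity can instead raise the miss ratio, which is consistent with the 4KB BART Robot row of Table 1.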


CONCLUSION

We proposed a split node cache scheme for fast ray tracing. In the near future, we will release new RT H/W that uses the split cache scheme.

REFERENCES

[1] Sven Woop. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Saarland University, 2007.

IEEE/EG Symposium on Interactive Ray Tracing 2008 9 - 10 August, Los Angeles, California, USA 978-1-4244-2741-3/08/$25.00 ©2008 IEEE
