A Detailed Study of the Numerical Accuracy of GPU-Implemented Math Functions
Dan Fay[1], Ali Sazegari[1], and Dan Connors[2]
[1] Apple Computer, Inc.

Introduction

Motivation: Modern programmable GPUs have demonstrated their ability to significantly accelerate important classes of non-graphics applications; however, GPUs' substandard support for floating-point arithmetic can limit their usefulness in some algorithms. Previous studies of GPUs' numerical accuracy[2][3][4] quantified only the "overall" accuracy of different arithmetic and math functions on the GPU by providing an average error and/or an error bound for each operation. Many algorithms also require correct behavior for edge cases on the floating-point number line:

• Subnormal (denormal) numbers – Allow for gradual precision loss when working with numbers of very small magnitude.
• Infinities – Provide a way to describe numbers with magnitudes too large to represent as normal numbers.
• Not a Number (NaN) – Used to close the number line for invalid operations, such as 0/0. NaNs can carry a “payload” useful for debugging by having a specific bit pattern in their lower bits.
• +/-0 – A negative zero describes a negative number whose magnitude is smaller than that of the smallest possible subnormal number.
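As an illustration of the edge cases listed above, the following minimal sketch shows how each one appears in the IEEE-754 single-precision encoding. It is not taken from the study: the helper names (`bits_to_float32`, `classify`, `nan_payload`) and the payload value 0x1234 are assumptions chosen for the example, and the code works on raw 32-bit patterns so that NaN payloads are never altered by the host.

```python
import math
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def classify(bits):
    """Name the class of a single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    fraction = bits & 0x007FFFFF     # 23 fraction bits
    if exponent == 0xFF:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "subnormal" if fraction else "zero"
    return "normal"

def nan_payload(bits):
    """Return the 22 fraction bits below the quiet bit of a NaN pattern."""
    return bits & 0x003FFFFF

SMALLEST_SUBNORMAL = 0x00000001   # 2**-149
SMALLEST_NORMAL = 0x00800000      # 2**-126
POSITIVE_INFINITY = 0x7F800000
NEGATIVE_ZERO = 0x80000000
TAGGED_NAN = 0x7FC00000 | 0x1234  # quiet NaN; payload 0x1234 is an arbitrary example

for name, bits in [("smallest subnormal", SMALLEST_SUBNORMAL),
                   ("smallest normal", SMALLEST_NORMAL),
                   ("+infinity", POSITIVE_INFINITY),
                   ("-0", NEGATIVE_ZERO),
                   ("tagged NaN", TAGGED_NAN)]:
    print(f"{name:20s} 0x{bits:08X} {classify(bits)}")

# -0 compares equal to +0, but its sign bit is still observable.
negative_zero = bits_to_float32(NEGATIVE_ZERO)
print(negative_zero == 0.0, math.copysign(1.0, negative_zero))
```

Classifying on the integer bit pattern rather than on a converted floating-point value keeps the sketch independent of how the host language handles NaNs.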

Experimental Methodology

ATi Platform:
• Machine: Core Duo iMac
• Operating System: OS X 10.4.7
• GPU: ATi Radeon x1600

nVIDIA Platform:
• Machine: Core 2 Duo iMac
• Operating System: OS X 10.4.7
• GPU: nVIDIA GeForce 7300GT

Input Data: Test vectors were drawn from Jerome Coonen’s Ph.D. thesis, “Contributions to a Proposed Standard for Binary Floating-Point Arithmetic”[6], and supplemented with additional tests designed to exercise algorithm-specific edge cases.

Test Functions: The basic operations tests examine the accuracy of the fundamental arithmetic operators, which form the building blocks of any numerical code. The built-in function tests examine the mathematical functions provided by GLSL. Finally, the ported vForce functions are code directly translated from Apple Computer’s vForce math library[9].

Comparing the Results: In each results table, the first pair of columns gives the percentage of test cases that pass; higher numbers are better. The remaining columns give the percentage of test cases that fail; lower numbers are better. With the exception of the comparison operators, failed test cases are classified by the type of number in one or more of the results: subnormal numbers, infinities, NaNs, +/-0s, and normals. To compare the ATi and nVIDIA GPUs, each cell is color-coded with one of three colors: red if that GPU fails more test cases than the other GPU, blue if it passes more test cases than the other GPU, and yellow if the two tie.
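The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's harness: it assumes a test passes when the result matches the reference bit for bit, and that a failure is filed under the class of the reference result. The function names and the four sample cases are invented for the example.

```python
from collections import Counter

def classify(bits):
    """Name the class of an IEEE-754 single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x007FFFFF
    if exponent == 0xFF:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "subnormal" if fraction else "zero"
    return "normal"

def tally(cases):
    """Return the percentage of passes and of failures per number class.

    `cases` is a list of (expected_bits, actual_bits) pairs.
    Assumption: bit-exact equality counts as a pass.
    """
    counts = Counter()
    for expected, actual in cases:
        counts["pass" if expected == actual else classify(expected)] += 1
    return {label: 100.0 * n / len(cases) for label, n in counts.items()}

def cell_color(own_pass_percent, other_pass_percent):
    """Color rule from the text: red = worse, blue = better, yellow = tie."""
    if own_pass_percent < other_pass_percent:
        return "red"
    if own_pass_percent > other_pass_percent:
        return "blue"
    return "yellow"

# Made-up sample cases, for illustration only.
sample_cases = [
    (0x3F800000, 0x3F800000),  # 1.0 returned correctly
    (0x00000001, 0x00000000),  # subnormal expected, zero returned
    (0x7F800000, 0x7F7FFFFF),  # infinity expected, largest normal returned
    (0x80000000, 0x00000000),  # -0 expected, +0 returned
]
print(tally(sample_cases))
print(cell_color(60.8, 66.2))
```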

Goals of this Study:

• Study the numerical accuracy of basic arithmetic functions on the GPU by testing important edge cases.
• Test the built-in mathematical functionality provided by the OpenGL Shading Language, GLSL.
• Investigate the accuracy of an existing high-performance math library, vForce[9], when it is ported to the GPU.
• Investigate the consistency of results between GPU vendors by comparing the results provided by current ATi and nVIDIA GPUs.

References

[1] "IEEE 754: Standard for Binary Floating-Point Arithmetic." Available at http://grouper.ieee.org/groups/754/ .
[2] Karl E. Hillesland and Anselmo Lastra, "GPU Floating-Point Paranoia." In Proc. GP2, August 2004.
[3] "GPUBench Test: Precision." Available at http://graphics.stanford.edu/projects/gpubench/test_precision.html .
[4] Guillaume Da Graca and David Defour, "Implementation of float-float operators on graphics hardware." In Proc. 7th conference on Real Numbers and Computers, July 2006.
[5] David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic." Available at http://docs.sun.com/source/8063568/ncg_goldberg.html .
[6] Jerome T. Coonen, "Contributions to a Proposed Standard for Binary Floating-Point Arithmetic." PhD dissertation, Univ. of California, Berkeley, 1984.
[7] John Kessenich, "The OpenGL Shading Language." Available at http://www.opengl.org/registry/specs/ARB/GLSLangSpec.Full.1.20.6.pdf .
[8] Steven Moshier, "Cephes Mathematical Library." Available at http://www.moshier.net/#Cephes .
[9] "Vector Libraries." Available at http://developer.apple.com/hardwaredrivers/ve/vector_libraries.html .

Author affiliation [2]: University of Colorado

Accuracy of Basic Operators

Op  # Tests  Pass %        Subnorm %     Infinity %    NaN %         +/-0 %        Normal %
             ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV
+   267      79.8   79.8   16.5   16.5   0.749  0.749  0.00   0.00   0.375  0.375  2.62   2.62
-   267      79.8   79.8   16.5   16.5   0.749  0.749  0.00   0.00   0.375  0.375  2.62   2.62
*   311      60.8   66.2   22.2   19.6   0.00   0.00   5.14   3.86   8.36   8.36   3.54   1.93
/   345      56.2   62.3   11.0   7.25   5.51   5.22   5.22   6.67   11.0   9.86   11.0   8.70

Comparison operators (failures are not classified by number type):

Op  # Tests  Pass %
             ATi    NV
<   400      77.0   100
<=  400      96.0   100
>   400      77.0   100
>=  400      96.0   100
==  400      92.5   100
!=  400      92.5   100

Discussion:

• Neither GPU correctly handles subnormal numbers in its basic arithmetic operators: both GPUs flush subnormal numbers to zero.
• Unlike the ATi GPU, the nVIDIA GPU can correctly compare subnormals. Neither GPU properly treats -0 as a negative number: both GPUs treat it as a positive number.
• Per the IEEE-754 standard[1], if a NaN is encountered as an input, the function should return the exact same NaN payload. Neither GPU does this.
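The effect of flushing subnormals to zero can be sketched on the CPU. The code below is a rough emulation in Python, not GPU code and not part of the study: `to_float32`, `flush_to_zero`, `ieee_multiply`, and `ftz_multiply` are hypothetical helpers, and the flush is assumed to apply to both inputs and outputs and to preserve the sign.

```python
import math
import struct

SMALLEST_NORMAL = 2.0 ** -126  # smallest normal single-precision magnitude

def to_float32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def flush_to_zero(value):
    """Replace a subnormal single-precision value with a zero of the same sign."""
    if value != 0.0 and abs(value) < SMALLEST_NORMAL:
        return math.copysign(0.0, value)
    return value

def ieee_multiply(a, b):
    """Single-precision multiply with gradual underflow (subnormals kept)."""
    return to_float32(a * b)

def ftz_multiply(a, b):
    """Single-precision multiply that flushes subnormal inputs and outputs."""
    product = to_float32(flush_to_zero(a) * flush_to_zero(b))
    return flush_to_zero(product)

# Halving the smallest normal number yields a subnormal under IEEE-754 rules,
# but zero when subnormals are flushed.
print(ieee_multiply(SMALLEST_NORMAL, 0.5))
print(ftz_multiply(SMALLEST_NORMAL, 0.5))

# An IEEE-754 multiply also keeps the sign of a zero result.
print(math.copysign(1.0, ieee_multiply(-1.0, 0.0)))
```

A test vector whose reference result is subnormal fails under the flushing variant even though the error is tiny in absolute terms, which is why such failures show up in the Subnorm % columns rather than in an average-error figure.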

Conclusion

• The nVIDIA GPU produces somewhat better quality results than the ATi GPU.
• Implementing a custom math library for the GPU can produce better results than the built-in GLSL functions.

Future Work

• Test all of the vForce functions (pow, sinh, cosh, tanh, asinh, acosh, and atanh were not tested).
• Thoroughly test the normal number line, in addition to the edge cases tested here.
• Study the performance of the built-in GLSL functions versus the ported vForce functions.

Accuracy of Built-in GLSL Functions

Fn     # Tests  Pass %        Subnorm %     Infinity %    NaN %         +/-0 %        Normal %
                ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV
sqrt   63       22.2   17.5   1.59   0.00   0.00   0.00   50.8   50.8   1.59   1.59   23.8   30.2
sin    821      26.9   33.7   5.60   4.87   1.95   1.95   0.00   0.00   0.244  0.244  65.3   59.2
cos    791      28.4   35.5   1.14   0.00   0.00   0.00   0.00   0.00   0.00   0.00   70.4   64.5
tan    1152     2.69   3.47   1.91   11.3   1.30   0.00   0.00   0.00   0.694  0.694  93.4   84.5
asin   96       0.00   12.5   10.4   10.4   0.00   0.00   59.4   49.0   0.00   2.08   30.2   26.0
acos   96       0.00   25.0   0.00   0.00   0.00   0.00   59.4   49.0   0.00   0.00   40.6   26.0
atan   96       2.08   12.5   10.4   10.4   0.00   0.00   10.4   0.00   2.08   2.08   75.0   75.0
log    37       13.5   59.5   0.00   0.00   5.4    0.00   56.8   18.9   0.00   0.00   24.3   21.6
exp    28       100    100    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
log2   44       18.2   20.5   0.00   0.00   18.2   18.2   40.1   40.1   0.00   0.00   22.7   20.5
exp2   28       100    100    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00

Discussion:

• log2 and exp2 are also important basic functions because they are used, respectively, to extract and to alter the exponent of a floating-point number.
• Overall, the quality of the results for the nVIDIA GPU system is better than that of the ATi GPU system.
• ATi and nVIDIA are fairly consistent in their results for many of the floating-point specials.
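To make the first point concrete, here is a small sketch of extracting and altering an exponent with log2 and exp2 style operations. It runs in double precision on the CPU and is not GLSL; `exponent_of`, `scale_by_power_of_two`, and the deliberately inaccurate `sloppy_log2` (with its 1e-6 error) are hypothetical names and values chosen only for illustration.

```python
import math

def exponent_of(x, log2=math.log2):
    """Extract the binary exponent e such that 2**e <= |x| < 2**(e + 1)."""
    return math.floor(log2(abs(x)))

def scale_by_power_of_two(x, n):
    """Alter the exponent of x by n, i.e. compute x * 2**n."""
    return x * 2.0 ** n

def sloppy_log2(x):
    """A log2 with a small made-up error, standing in for an inaccurate built-in."""
    return math.log2(x) - 1e-6

# With an accurate log2 the result agrees with the library routine frexp,
# which returns (m, e) with 0.5 <= |m| < 1, hence the "- 1".
for value in (8.0, 10.0, 0.75):
    print(value, exponent_of(value), math.frexp(value)[1] - 1)

# A slightly low log2 at an exact power of two yields the wrong exponent.
print(exponent_of(8.0, log2=sloppy_log2))

# Scaling by a power of two matches the library routine ldexp.
print(scale_by_power_of_two(3.0, 4), math.ldexp(3.0, 4))
```

The second print shows why small errors in log2 matter more than an average-error figure suggests: an error far below one unit still changes the extracted exponent when the input sits exactly on a power of two.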

Accuracy of Ported vForce Functions

Fn     # Tests  Pass %        Subnorm %     Infinity %    NaN %         +/-0 %        Normal %
                ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV     ATi    NV
div    345      53.0   55.1   8.70   9.27   8.70   5.22   14.2   15.9   9.86   9.86   5.51   4.64
sqrt   63       76.2   93.7   0.00   0.00   0.00   0.00   22.2   0.00   1.59   1.59   0.00   4.76
sin    821      30.5   37.8   3.65   5.60   1.95   0.00   0.00   0.00   0.243  0.00   63.7   56.6
cos    791      34.9   40.2   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   65.1   59.8
tan    1152     2.69   3.47   1.91   11.3   1.30   0.00   0.00   0.00   0.694  0.694  93.4   84.5
asin   96       76.0   86.5   0.00   0.00   0.00   0.00   0.00   0.00   2.08   0.00   24.0   13.5
acos   96       88.5   88.5   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   11.5   11.5
atan   96       55.2   53.1   10.4   10.4   0.00   0.00   10.4   0.00   2.08   2.08   21.9   34.4
log    37       67.6   97.3   0.00   0.00   0.00   0.00   21.6   0.00   0.00   0.00   10.8   2.70
exp    28       92.8   100    0.00   0.00   0.00   0.00   7.14   0.00   0.00   0.00   0.00   0.00
log2   44       68.1   81.8   0.00   0.00   0.00   18.2   31.8   0.00   0.00   0.00   0.00   0.00
expm1  52       57.7   100    40.4   0.00   0.00   0.00   0.00   0.00   1.92   0.00   0.00   0.00
log1p  86       51.2   98.8   45.3   0.00   0.00   0.00   1.16   0.00   1.16   0.00   1.16   1.16

Discussion:

• vForce actually names log2 logb, but it is listed as log2 for consistency.
• div and sqrt, which both use Newton-Raphson refinement to converge on an answer, use the built-in division and square root functionality in GLSL to provide the initial estimate.
• expm1 and log1p are the only two functions listed that use a table lookup.
• sin, cos, and tan have custom test vectors designed to test around nπ/4. This was done to stress the argument reduction used by these functions’ algorithms.
• Limited language support in GLSL for floating-point special constants required using a third texture to store the important constants. Accessing these constants requires an additional texture lookup for every data element.
• Overall, the quality of the results for the nVIDIA GPU system is better than that of the ATi GPU system.
• Overall, the ported vForce functions provide better results than do the built-in GLSL functions.
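The Newton-Raphson refinement mentioned for div and sqrt can be sketched with the textbook iterations below. This is not the vForce source, which is not reproduced here: the function names, the step count, and the 0.1% error injected into the starting estimates are assumptions made for illustration, and the arithmetic is double precision on the CPU rather than single precision on a GPU.

```python
import math

def refine_reciprocal(d, estimate, steps=2):
    """Refine an estimate of 1/d with y <- y * (2 - d * y).

    Each step roughly squares the relative error of the estimate.
    """
    y = estimate
    for _ in range(steps):
        y = y * (2.0 - d * y)
    return y

def refine_divide(n, d, reciprocal_estimate, steps=2):
    """Compute n / d as n times a refined reciprocal of d."""
    return n * refine_reciprocal(d, reciprocal_estimate, steps)

def refine_sqrt(x, estimate, steps=2):
    """Refine an estimate of sqrt(x) with s <- (s + x / s) / 2."""
    s = estimate
    for _ in range(steps):
        s = 0.5 * (s + x / s)
    return s

# Stand-ins for an imprecise built-in result: the exact value with a
# made-up 0.1% relative error.
crude_reciprocal = (1.0 / 3.0) * 1.001
crude_sqrt = math.sqrt(2.0) * 1.001

print(abs(crude_reciprocal - 1.0 / 3.0),
      abs(refine_reciprocal(3.0, crude_reciprocal) - 1.0 / 3.0))
print(abs(crude_sqrt - math.sqrt(2.0)),
      abs(refine_sqrt(2.0, crude_sqrt) - math.sqrt(2.0)))
print(refine_divide(1.0, 3.0, crude_reciprocal))
```

Because the error shrinks quadratically, a starting estimate with only a few correct digits is enough; the refinement cannot, however, repair an estimate that is already a flushed zero, an infinity, or a NaN.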
