Varun Sampath University of Pennsylvania CIS 565 - Spring 2012
• Future • Note about sources
What is an SoC?
• System-on-a-Chip
– CPU, GPU, DSP, I/O
– Single-chip solution
• Top mobile SoC vendors:
– Qualcomm, Apple, TI, Samsung, NVIDIA
• Advantages of using SoCs?
• Disadvantages?
• We will see all consumer chips converge to SoCs
Mobile SoC Market Share 2011
• Qualcomm 31%
• Apple 23%
• TI 17%
• Samsung 14%
• Others 12%
• NVIDIA 3%
Market Share Data from PC Perspective
What is an SoC?
Image from iFixit
4/11/12
Block Diagram of TI OMAP 4470
Brief Discussion of ARM
• RISC CPU vendor that currently dominates mobile
• Mobile Designs: Cortex-A8, A9, A15
• Fabless Designer
– Core Design Licensees
– Architecture Licensees
• Qualcomm Scorpion/Krait
• NVIDIA
Image from TI
The Constraints of Mobile
Some Energy Numbers
• Energy
– Cell phone battery capacity of 5-7 Wh (tablets 20-40 Wh)
– How much energy can our chips consume?
• Area
– PCB size constraints
– Cooling constraints
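One rough way to approach the "how much energy can our chips consume?" question is to turn battery capacity into an average power budget. The sketch below uses the capacities quoted above. The 10-hour runtime target and the function name are assumptions made for illustration. The result is a whole-device budget, so the SoC only gets a share of it after the display and radios.

```python
def average_power_budget_w(battery_wh: float, runtime_h: float) -> float:
    """Average whole-device power (watts) that drains battery_wh in runtime_h hours."""
    if runtime_h <= 0:
        raise ValueError("runtime_h must be positive")
    return battery_wh / runtime_h


# Capacities come from the slide; the runtime target is a hypothetical value.
TARGET_RUNTIME_H = 10.0

for label, capacity_wh in [
    ("phone, 5 Wh", 5.0),
    ("phone, 7 Wh", 7.0),
    ("tablet, 20 Wh", 20.0),
    ("tablet, 40 Wh", 40.0),
]:
    watts = average_power_budget_w(capacity_wh, TARGET_RUNTIME_H)
    print(f"{label}: {watts:.2f} W average over {TARGET_RUNTIME_H:.0f} h")
```

Under these assumptions a phone has well under 1 W on average for the entire device, which is why the rest of the lecture keeps returning to energy.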
Data from AnandTech
Some Contributors to Switching Energy
Some Theoretical Performance Numbers
• Off-chip Interconnect (to DRAM)
– Bandwidth is expensive
– Minimize reasons to fire up memory bus
• High frequencies
– Requires increased voltages
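The frequency/voltage point can be made concrete with the usual first-order CMOS dynamic power model, P = a * C * V^2 * f. This model is a textbook approximation and is not taken from the slides. The activity factor, capacitance, voltages, and clocks below are hypothetical values chosen only to show the scaling.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    voltage_v: float, freq_hz: float) -> float:
    """First-order switching power: activity * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical operating points: a 1.5x clock bump that needs 1.2x the voltage.
base = dynamic_power_w(activity=0.1, capacitance_f=1e-9, voltage_v=1.0, freq_hz=1.0e9)
fast = dynamic_power_w(activity=0.1, capacitance_f=1e-9, voltage_v=1.2, freq_hz=1.5e9)

print(f"base: {base:.3f} W, fast: {fast:.3f} W, ratio: {fast / base:.2f}x")
```

Because voltage enters squared, the assumed 1.5x frequency increase costs about 2.16x the switching power in this model, not 1.5x.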
                 | Apple iPad 2                      | ASUS Transformer Prime  | Some Nice Desktop
CPU              | A5 @ 1GHz                         | Tegra 3 @ 1.4GHz        | Sandy Bridge @ 3.4GHz
GPU              | POWERVR SGX543MP2 @ 250MHz        | Mobile GeForce @ 500MHz | GTX680 @ 1GHz
Memory Interface | 64-bit @ (maybe) 800MHz = 6.4GB/s | 32-bit                  | 256-bit @ 6GHz = 192GB/s
GPU GFLOPS       | 16 GFLOPS                         | 12 GFLOPS               | 3 TFLOPS
Mobile Data from AnandTech; GTX680 Specs from Newegg
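The memory interface figures follow from bus width times transfer rate. The sketch below reproduces that arithmetic. It assumes the quoted clock is the effective transfer rate and that 1 GB means 10^9 bytes, which is what the slide's numbers imply. The function name is illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9


ipad2 = peak_bandwidth_gb_s(64, 800e6)   # 64-bit @ 800MHz
gtx680 = peak_bandwidth_gb_s(256, 6e9)   # 256-bit @ 6GHz

print(f"iPad 2: {ipad2:.1f} GB/s")
print(f"GTX680: {gtx680:.1f} GB/s")
print(f"desktop/mobile ratio: {gtx680 / ipad2:.0f}x")
```

The two results match the 6.4GB/s and 192GB/s entries, a 30x gap in peak bandwidth between the tablet and the desktop card.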
GeForce GPU in NVIDIA Tegra 2
Tegra 2 Mobile GeForce
• Separate vertex and pixel shaders
– 4 of each, each capable of 1 multiply-add/clock
• Pixel, texture, vertex, and attribute caches
– Reduce memory transactions
– Pixel cache useful for UI components
• Memory controller optimizations
– Arbitrate between CPU & GPU requests
– Reorder requests to limit bank switching
Image from NVIDIA
NVIDIA Tegra 3 (Kal-El)
PowerVR SGX
• Expanded Mobile GeForce
– 4 vertex and 8 pixel shaders
• 4-PLUS-1 architecture
Image from AnandTech
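The shader counts give a quick check on the theoretical GFLOPS figure quoted earlier for Tegra 3. This sketch assumes each shader unit issues one multiply-add per clock (stated on the Tegra 2 slide and assumed here to carry over) and counts a multiply-add as 2 floating-point operations, a common convention. The function name is illustrative.

```python
def peak_gflops(shader_units: int, clock_hz: float,
                flops_per_unit_per_clock: int = 2) -> float:
    """Peak GFLOPS, counting one multiply-add per unit per clock as 2 FLOPs."""
    return shader_units * flops_per_unit_per_clock * clock_hz / 1e9


tegra3_units = 4 + 8  # vertex + pixel shaders
tegra3 = peak_gflops(tegra3_units, 500e6)  # Mobile GeForce @ 500MHz

print(f"Tegra 3: {tegra3:.0f} GFLOPS")
```

Under these assumptions the result is 12 GFLOPS, consistent with the figure in the performance comparison.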
PowerVR SGX Series5XT
• TA (Tile Accelerator) – store scene data and split up screen into tiles
• ISP (Image Synthesis Processor) – perform Hidden Surface Removal with z-testing
• TSP (Texture and Shading Processor) – run pixel shader
Image from ImgTec
Summarizing PowerVR SGX Series5XT
• Used in Apple A5, A5X
• Unified shader architecture (called USSE2)
• Tile based deferred rendering (TBDR)
– Will cover in more detail next week
• Multi-core architecture
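The TA / ISP / TSP split can be illustrated with a toy software model. This is a minimal sketch and not how the hardware is implemented. Primitives are axis-aligned rectangles with a constant depth instead of triangles, the tile is a hypothetical 4x4 pixels, and every name below is invented for illustration.

```python
from dataclasses import dataclass

TILE = 4  # hypothetical tile edge in pixels


@dataclass(frozen=True)
class Rect:
    x0: int  # half-open pixel bounds: [x0, x1) x [y0, y1)
    y0: int
    x1: int
    y1: int
    depth: float  # smaller means closer to the viewer
    color: str


def bin_into_tiles(rects, width, height):
    """TA-like step: for each tile, list the primitives overlapping it."""
    bins = {}
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            bins[(tx, ty)] = [
                r for r in rects
                if r.x0 < tx + TILE and r.x1 > tx
                and r.y0 < ty + TILE and r.y1 > ty
            ]
    return bins


def resolve_visibility(origin, prims, width, height):
    """ISP-like step: per-pixel depth test within one tile, no shading yet."""
    tx, ty = origin
    visible = {}
    for y in range(ty, min(ty + TILE, height)):
        for x in range(tx, min(tx + TILE, width)):
            covering = [r for r in prims
                        if r.x0 <= x < r.x1 and r.y0 <= y < r.y1]
            if covering:
                visible[(x, y)] = min(covering, key=lambda r: r.depth)
    return visible


def render(rects, width, height, shade):
    """TSP-like step: run the shader once per visible pixel."""
    frame = {}
    for origin, prims in bin_into_tiles(rects, width, height).items():
        visible = resolve_visibility(origin, prims, width, height)
        for pixel, prim in visible.items():
            frame[pixel] = shade(prim, pixel)
    return frame


calls = []


def shade(prim, pixel):
    calls.append(pixel)  # record each shader invocation
    return prim.color


scene = [
    Rect(0, 0, 8, 8, depth=0.9, color="far"),
    Rect(2, 2, 6, 6, depth=0.1, color="near"),
]
frame = render(scene, 8, 8, shade)
print(len(calls), frame[(0, 0)], frame[(3, 3)])
```

In this toy scene the shader runs 64 times, once per screen pixel. Drawing the same two rectangles back to front with no depth rejection would shade 80 pixels (64 + 16), which is the overdraw that deferring shading until after hidden surface removal avoids.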
Image from ImgTec
Mobile GPU Families
• Qualcomm Adreno
– Unified shaders, 4-wide SIMD
– immediate mode with early-z
• NVIDIA Mobile GeForce
– Separate vertex (4) & pixel (8/12) shaders, scalar
– immediate mode with early-z
• ARM Mali
– Separate vertex (1) & pixel (4) shaders, 4/2-wide SIMD
– immediate mode with early-z
Analysis by AnandTech
Demands for Mobile
• Higher screen resolutions
– Requires more memory bandwidth
– Pixel count growing higher than geometry?
• Longer battery life
• Higher quality mobile gaming
Case Study: the new iPad
iPad Gaming Performance
• Screen resolution of 2048x1536
– Quadruple the pixels of previous 1024x768 version
– Higher than nearly all desktop and laptop displays
• Battery life approximately equal to previous version
• Gaming performance?
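The "quadruple the pixels" claim, and what it implies for memory traffic, can be checked with a little arithmetic. The pixel counts use the resolutions above. The bandwidth estimate is only a lower bound under assumed values (4 bytes per pixel, 60 frames per second, each pixel written exactly once per frame), none of which come from the slides.

```python
def pixel_count(width: int, height: int) -> int:
    return width * height


def framebuffer_write_gb_s(width: int, height: int,
                           bytes_per_pixel: int = 4, fps: int = 60) -> float:
    """GB/s needed just to write every pixel once per frame (assumed values)."""
    return pixel_count(width, height) * bytes_per_pixel * fps / 1e9


old = pixel_count(1024, 768)
new = pixel_count(2048, 1536)

print(f"old: {old} pixels, new: {new} pixels, ratio: {new / old:.0f}x")
print(f"old write traffic: {framebuffer_write_gb_s(1024, 768):.3f} GB/s")
print(f"new write traffic: {framebuffer_write_gb_s(2048, 1536):.3f} GB/s")
```

Even this minimal traffic scales 4x with the resolution, before counting texture reads, overdraw, or blending, which is why the higher-resolution display comes with a wider memory interface and a larger GPU.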
Image from AnandTech
Apple iPad Statistics
                 | Apple iPad 2               | Apple iPad (2012)          | 11" Apple MBA
CPU              | A5 @ 1GHz                  | A5X @ 1GHz                 | Sandy Bridge @ 1.8GHz
GPU              | POWERVR SGX543MP2 @ 250MHz | POWERVR SGX543MP4 @ 250MHz | Sandy Bridge IGP @ 350MHz/1.2GHz
Memory Interface | 64-bit @ 800MHz = 6.4GB/s  | 128-bit (for GPU)          | 128-bit @ 1.3GHz = 20.8GB/s
Die Size         | 122mm²                     | 163mm²                     | 149mm²
Battery Size     | 25Wh                       | 42.5Wh                     | 35Wh
Apple A5X Die Shot
Image from UBMTechInsights
Data and Image from AnandTech
What will the future bring?
• GPU Compute
– PowerVR SGX Series5XT OpenCL capable, but no drivers
– Could do compute the old-fashioned way with GLSL
– Direct3D 11 means Compute Shader support
• PowerVR Series6 press release suggests 100-1000 GFLOPS
• Kepler-based GPU coming to a super phone near you?