4/11/12  

Mobile GPUs

Varun Sampath
University of Pennsylvania
CIS 565 - Spring 2012

Agenda
• SoCs
• Case Studies
  – NVIDIA Tegra 2, Tegra 3
  – Imagination Technologies' PowerVR SGX Series5XT
  – Apple iPad (2012)
• Future
• Note about sources

What is an SoC?
• System-on-a-Chip
  – CPU, GPU, DSP, I/O
  – Single-chip solution
• Advantages of using SoCs?
• Disadvantages?
• We will see all consumer chips converge to SoCs
Image from iFixit

Mobile SoC Market Share 2011
• Top mobile SoC vendors:
  – Qualcomm, Apple, TI, Samsung, NVIDIA
• Market share (pie chart):
  – Qualcomm 31%
  – Apple 23%
  – TI 17%
  – Samsung 14%
  – NVIDIA 3%
  – Others 12%
Market Share Data from PC Perspective


Block Diagram of TI OMAP 4470
Image from TI

Brief Discussion of ARM
• RISC CPU vendor that currently dominates mobile
• Mobile Designs: Cortex-A8, A9, A15
• Fabless Designer
  – Core Design Licensees
  – Architecture Licensees
    • Qualcomm Scorpion/Krait
    • NVIDIA

The Constraints of Mobile
• Energy
  – Cell phone battery capacity of 5-7 Wh (tablets 20-40 Wh)
  – How much energy can our chips consume?
• Area
  – PCB size constraints
  – Cooling constraints

Some Energy Numbers
Data from AnandTech
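One way to approach the "how much energy can our chips consume?" question is to turn a battery capacity into an average power budget. The sketch below uses the 5-7 Wh and 20-40 Wh capacities from the slide. The 10-hour runtime target and the function name `average_power_budget_w` are assumptions made only for illustration, and the result is a budget for the whole device (screen, radios, SoC), not for the GPU alone.

```python
def average_power_budget_w(battery_wh: float, runtime_hours: float) -> float:
    """Average power in watts the whole device may draw to last runtime_hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return battery_wh / runtime_hours


# Capacities come from the slide; the runtime target is an assumed value.
TARGET_HOURS = 10.0
CAPACITIES_WH = [
    ("phone (low)", 5.0),
    ("phone (high)", 7.0),
    ("tablet (low)", 20.0),
    ("tablet (high)", 40.0),
]

for label, wh in CAPACITIES_WH:
    watts = average_power_budget_w(wh, TARGET_HOURS)
    print(f"{label:14s} {wh:5.1f} Wh -> {watts:.2f} W average over {TARGET_HOURS:.0f} h")
```

Under that assumed target a phone has well under one watt on average for everything in it, which is why the later slides focus on avoiding memory traffic and high clock rates.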


Some Contributors to Switching Energy
• Off-chip Interconnect (to DRAM)
  – Bandwidth is expensive
  – Minimize reasons to fire up memory bus
• High frequencies
  – Requires increased voltages

Some Theoretical Performance Numbers

                 | Apple iPad 2                      | ASUS Transformer Prime  | Some Nice Desktop
CPU              | A5 @ 1GHz                         | Tegra 3 @ 1.4GHz        | Sandy Bridge @ 3.4GHz
GPU              | POWERVR SGX543MP2 @ 250MHz        | Mobile GeForce @ 500MHz | GTX680 @ 1GHz
Memory Interface | 64-bit @ (maybe) 800MHz = 6.4GB/s | 32-bit                  | 256-bit @ 6GHz = 192GB/s
GPU GFLOPS       | 16 GFLOPS                         | 12 GFLOPS               | 3 TFLOPS

Mobile Data from AnandTech
GTX680 Specs from Newegg
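The memory interface figures in the table follow from bus width times transfer rate. A minimal sketch of that arithmetic is below. It assumes the quoted frequencies are effective transfer rates (one transfer per bus-width per cycle), and `peak_bandwidth_gbs` is a name invented here, not something from the sources.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mhz * 1e6 / 1e9


# Width and rate pairs taken from the table above.
ipad2 = peak_bandwidth_gbs(64, 800)     # 64-bit @ 800MHz
gtx680 = peak_bandwidth_gbs(256, 6000)  # 256-bit @ 6GHz

print(f"iPad 2:  {ipad2:.1f} GB/s")
print(f"GTX680:  {gtx680:.1f} GB/s")
print(f"Desktop advantage: {gtx680 / ipad2:.0f}x")
```

These are upper bounds. Sustained bandwidth is lower, and on an SoC the CPU and GPU share the same interface.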

GeForce GPU in NVIDIA Tegra 2
Image from NVIDIA

Tegra 2 Mobile GeForce
• Separate vertex and pixel shaders
  – 4 of each, each capable of 1 multiply-add/clock
• Pixel, texture, vertex, and attribute caches
  – Reduce memory transactions
  – Pixel cache useful for UI components
• Memory controller optimizations
  – Arbitrate between CPU & GPU requests
  – Reorder requests to limit bank switching
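The "1 multiply-add/clock" figure is what the theoretical GFLOPS numbers are built from: shader units times multiply-adds per clock times two floating-point operations per multiply-add times clock rate. The sketch below is a hedged illustration with an invented function name, `peak_gflops`. The 12-unit, 500 MHz case uses the Tegra 3 shader count (4 vertex + 8 pixel) and the "Mobile GeForce @ 500MHz" clock that appear elsewhere in these slides. The Tegra 2 clock is not given in the deck, so the 300 MHz value is purely an assumed placeholder.

```python
FLOPS_PER_MAD = 2  # one multiply plus one add


def peak_gflops(shader_units: int, clock_mhz: float, mads_per_clock: int = 1) -> float:
    """Theoretical peak GFLOPS when every unit issues its multiply-adds each clock."""
    return shader_units * mads_per_clock * FLOPS_PER_MAD * clock_mhz / 1000.0


# Tegra 3: 4 vertex + 8 pixel shaders at 500 MHz (figures from the slides).
tegra3 = peak_gflops(shader_units=4 + 8, clock_mhz=500)

# Tegra 2: 4 vertex + 4 pixel shaders; 300 MHz is an ASSUMED clock for illustration.
tegra2_assumed = peak_gflops(shader_units=4 + 4, clock_mhz=300)

print(f"Tegra 3 peak: {tegra3:.1f} GFLOPS")
print(f"Tegra 2 peak at an assumed 300 MHz: {tegra2_assumed:.1f} GFLOPS")
```

The Tegra 3 result agrees with the 12 GFLOPS entry in the performance table. Because vertex and pixel shaders are separate, the peak is only reachable when both kinds of unit are busy at once.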


NVIDIA Tegra 3 (Kal-El)
• Expanded Mobile GeForce
  – 4 vertex and 8 pixel shaders
• 4-PLUS-1 architecture
Image from AnandTech

PowerVR SGX

PowerVR SGX Series5XT
• TA (Tile Accelerator) – store scene data and split up screen into tiles
• ISP (Image Synthesis Processor) – perform Hidden Surface Removal with z-testing
• TSP (Texture and Shading Processor) – run pixel shader
Image from ImgTec
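The three stages can be mimicked with a toy software model. This is only a sketch of the idea, not ImgTec's hardware design. Primitives are opaque axis-aligned rectangles with a constant depth rather than triangles, they are assumed to lie inside the screen, the 16-pixel tile size is an assumption, and the names `Rect`, `bin_into_tiles`, `resolve_visibility`, and `shade` are invented for this example.

```python
from collections import defaultdict
from typing import NamedTuple

TILE = 16  # assumed tile edge in pixels


class Rect(NamedTuple):
    """Opaque axis-aligned rectangle covering [x0, x1) x [y0, y1) at one depth."""
    ident: str
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float  # smaller means closer to the viewer


def bin_into_tiles(rects):
    """TA stage: record, for every tile, the primitives that overlap it."""
    bins = defaultdict(list)
    for r in rects:
        for ty in range(r.y0 // TILE, (r.y1 - 1) // TILE + 1):
            for tx in range(r.x0 // TILE, (r.x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(r)
    return bins


def resolve_visibility(bins):
    """ISP stage: per tile, z-test each pixel and keep only the nearest primitive."""
    visible = {}
    for (tx, ty), prims in bins.items():
        for y in range(ty * TILE, (ty + 1) * TILE):
            for x in range(tx * TILE, (tx + 1) * TILE):
                nearest = None
                for r in prims:
                    covers = r.x0 <= x < r.x1 and r.y0 <= y < r.y1
                    if covers and (nearest is None or r.depth < nearest.depth):
                        nearest = r
                if nearest is not None:
                    visible[(x, y)] = nearest
    return visible


def shade(visible):
    """TSP stage: run the (stand-in) pixel shader once per visible pixel."""
    return {xy: f"color-of-{r.ident}" for xy, r in visible.items()}


scene = [Rect("far", 0, 0, 32, 32, 0.9), Rect("near", 8, 8, 24, 24, 0.1)]
bins = bin_into_tiles(scene)
visible = resolve_visibility(bins)
colors = shade(visible)

deferred_invocations = len(colors)
# Shading every fragment of every primitive, with no depth rejection at all:
overdraw_invocations = sum((r.x1 - r.x0) * (r.y1 - r.y0) for r in scene)
print("tiles touched:", len(bins))
print("shader runs, deferred:", deferred_invocations)
print("shader runs, no depth rejection:", overdraw_invocations)
```

In this model the shader never runs for a hidden pixel, whatever order the primitives were submitted in. An immediate-mode renderer with early-z reaches the same count only when geometry happens to arrive front to back.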

Summarizing PowerVR SGX Series5XT
• Used in Apple A5, A5X
• Unified shader architecture (called USSE2)
• Tile based deferred rendering (TBDR)
  – Will cover in more detail next week
• Multi-core architecture
Image from ImgTec


Mobile GPU Families
• Qualcomm Adreno
  – Unified shaders, 4-wide SIMD
  – Immediate mode with early-z
• Imagination Technologies' PowerVR SGX Series5XT
  – Unified shaders, 4-wide SIMD
  – Tile based deferred rendering
• NVIDIA Mobile GeForce
  – Separate vertex (4) & pixel (8/12) shaders, scalar
  – Immediate mode with early-z
• ARM Mali
  – Separate vertex (1) & pixel (4) shaders, 4/2-wide SIMD
  – Immediate mode with early-z
Analysis by AnandTech

Demands for Mobile
• Higher screen resolutions
  – Requires more memory bandwidth
  – Pixel count growing higher than geometry?
• Longer battery life
• Higher quality mobile gaming

Case Study: the new iPad
• Screen resolution of 2048x1536
  – Quadruple the pixels of previous 1024x768 version
  – Higher than nearly all desktop and laptop displays
• Battery life approximately equal to previous version
• Gaming performance?

iPad Gaming Performance
Image from AnandTech
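The "quadruple the pixels" claim, and what it implies for memory traffic, can be checked with a few lines of arithmetic. The resolutions are from the slide. The 4 bytes per pixel and 60 frames per second are assumptions chosen for illustration, and `color_write_traffic_gbs` is a made-up helper that counts only a single color write per pixel per frame (no texture reads, depth traffic, or overdraw).

```python
BYTES_PER_PIXEL = 4     # assumed 32-bit color
FRAMES_PER_SECOND = 60  # assumed refresh rate


def pixel_count(width: int, height: int) -> int:
    """Number of pixels on a width x height display."""
    return width * height


def color_write_traffic_gbs(pixels: int) -> float:
    """GB/s needed to write every pixel exactly once per frame."""
    return pixels * BYTES_PER_PIXEL * FRAMES_PER_SECOND / 1e9


old_pixels = pixel_count(1024, 768)
new_pixels = pixel_count(2048, 1536)

print(f"previous iPad: {old_pixels:,} pixels, {color_write_traffic_gbs(old_pixels):.3f} GB/s")
print(f"new iPad:      {new_pixels:,} pixels, {color_write_traffic_gbs(new_pixels):.3f} GB/s")
print(f"pixel ratio:   {new_pixels / old_pixels:.0f}x")
```

Even this lower bound scales linearly with pixel count, so a fourfold increase in pixels means a fourfold increase in the minimum frame-buffer traffic.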


Apple iPad Statistics

                 | Apple iPad 2                | Apple iPad (2012)          | 11" Apple MBA
CPU              | A5 @ 1GHz                   | A5X @ 1GHz                 | Sandy Bridge @ 1.8GHz
GPU              | POWERVR SGX543MP2 @ 250MHz  | POWERVR SGX543MP4 @ 250MHz | Sandy Bridge IGP @ 350MHz/1.2GHz
Memory Interface | 64-bit @ 800MHz = 6.4GB/s   | 128-bit (for GPU)          | 128-bit @ 1.3GHz = 20.8GB/s
Die Size         | 122 mm²                     | 163 mm²                    | 149 mm²
Battery Size     | 25Wh                        | 42.5Wh                     | 35Wh

Apple A5X Die Shot
Image from UBMTechInsights

Data and Image from AnandTech
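A rough way to read the table is to compare how each resource grew between the two iPads against the fourfold growth in pixels. The sketch below only divides numbers taken from these slides. The core counts of 2 and 4 are read from the MP2 and MP4 suffixes, and treating GPU core count as a proxy for GPU throughput at the shared 250 MHz clock is an assumption, not a measured result.

```python
# (iPad 2 value, iPad (2012) value), all taken from the slides.
SPECS = {
    "pixels": (1024 * 768, 2048 * 1536),
    "GPU cores": (2, 4),            # SGX543MP2 vs SGX543MP4
    "die size (mm^2)": (122, 163),
    "battery (Wh)": (25.0, 42.5),
}


def growth(old: float, new: float) -> float:
    """Multiplicative growth from the old value to the new one."""
    return new / old


ratios = {name: growth(old, new) for name, (old, new) in SPECS.items()}
for name, ratio in ratios.items():
    print(f"{name:16s} {ratio:.2f}x")

# ASSUMPTION: GPU throughput scales with core count at the same clock.
cores_per_pixel = ratios["GPU cores"] / ratios["pixels"]
print(f"GPU cores per pixel, relative to iPad 2: {cores_per_pixel:.2f}x")
```

On these numbers the GPU resources per pixel roughly halve, which is one way to frame the "Gaming performance?" question, and the larger battery is consistent with the slide's note that battery life stayed about the same.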

What will the future bring?
• GPU Compute
  – PowerVR SGX Series5XT OpenCL capable, but no drivers
  – Could do compute the old-fashioned way with GLSL
  – Direct3D 11 means Compute Shader support
• PowerVR Series6 press release suggests 100-1000 GFLOPS
• Kepler-based GPU coming to a super phone near you?
