Interprocessor Communication : Towards Cache Integrated Network ...

Interprocessor Communication: Towards Cache Integrated Network Interfaces
Vassilis Papaefstathiou and Michael Papamichael
Foundation for Research and Technology-Hellas (FORTH), Institute of Computer Science (ICS), member of HiPEAC
Computer Architecture and VLSI Systems Laboratory (CARV)
Work funded by SARC

Motivation and Context
♦ Many on-chip cores available today (CMPs)
– Key scalability issue: efficient interprocessor communication
– Network-on-Chip (NoC) for the interconnection
– Memory resources in every tile used as L1 or L2 cache
– NoC interface (NI) in every tile → it must be cheap enough to afford

[Figure: tiles, each with a processor (P), L1 and L2 caches, and a network interface (NI), connected through the Network-on-Chip]

Cache Integrated Network Interfaces
♦ Tightly-coupled processor-network architecture
– Processor and NI share local memory → lightweight NI
– Allocation of memory blocks for NI use: coarse or fine-grain block allocation (cache-line)
– Fast and low latency mechanism for explicit messaging: stores (send) and loads (receive)
– Integrate NI mechanisms into the cache controller: transmission similar to cache write-back, reception similar to cache miss
♦ Desired communication primitives
– RDMA for bulk transfers: post descriptors in cache-lines
– Queues for small explicit transfers: specify destination, size and payload
– Send queues (one-to-many)
– Receive queues (many-to-one)

[Figure: processors P1, P2 and P3, each with a Local Memory & Cache Controller and an NI attached to the Network-on-Chip; cache lines in local memory act as NI queues, with N→1 and 1→N transfers]
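
The queue and RDMA primitives on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class names (`NoC`, `Tile`, `Message`, `RdmaDescriptor`), the 64-byte line size, the `source` field, and the "return False when the receive queue is full" policy are all hypothetical choices made for illustration. A `send` call stands in for a store into a send queue, and a `receive` call stands in for a load from a receive queue.

```python
from collections import deque
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed; the slides do not state a line size


@dataclass
class Message:
    source: int        # sending tile (added for illustration)
    destination: int   # receiving tile
    size: int          # payload length in bytes
    payload: bytes


@dataclass
class RdmaDescriptor:
    src_addr: int      # offset in the sender's local memory
    dst_tile: int      # tile that receives the data
    dst_addr: int      # offset in the receiver's local memory
    length: int        # number of bytes to copy


class NoC:
    """Toy interconnect: routes by tile id, with no latency or ordering model."""

    def __init__(self):
        self.tiles = {}

    def attach(self, tile):
        self.tiles[tile.tile_id] = tile


class Tile:
    def __init__(self, tile_id, noc, queue_lines=4, memory_bytes=1024):
        self.tile_id = tile_id
        self.noc = noc
        self.queue_lines = queue_lines          # cache lines allocated to the receive queue
        self.receive_queue = deque()            # many-to-one: any tile may enqueue here
        self.memory = bytearray(memory_bytes)   # local memory shared by processor and NI
        noc.attach(self)

    def send(self, destination, payload):
        """Models a store into a send queue; one queue can reach many destinations."""
        if len(payload) > CACHE_LINE_BYTES:
            raise ValueError("payload exceeds one cache line; use RDMA for bulk data")
        target = self.noc.tiles[destination]
        if len(target.receive_queue) >= target.queue_lines:
            return False                        # receiver full: caller must retry
        target.receive_queue.append(
            Message(self.tile_id, destination, len(payload), bytes(payload)))
        return True

    def receive(self):
        """Models a load from the receive queue; returns None when it is empty."""
        return self.receive_queue.popleft() if self.receive_queue else None

    def post_rdma(self, desc):
        """Models posting a descriptor; the copy is performed here in one step."""
        target = self.noc.tiles[desc.dst_tile]
        data = self.memory[desc.src_addr:desc.src_addr + desc.length]
        if len(data) != desc.length or desc.dst_addr + desc.length > len(target.memory):
            raise ValueError("transfer out of bounds")
        target.memory[desc.dst_addr:desc.dst_addr + desc.length] = data


noc = NoC()
p1, p2, p3 = Tile(1, noc), Tile(2, noc), Tile(3, noc)

# Many-to-one: P1 and P2 both send small messages to P3's receive queue.
p1.send(3, b"hello")
p2.send(3, b"world")
first = p3.receive()
second = p3.receive()

# Bulk transfer: P1 copies 256 bytes into P3's local memory via a descriptor.
p1.memory[0:256] = bytes(range(256))
p1.post_rdma(RdmaDescriptor(src_addr=0, dst_tile=3, dst_addr=512, length=256))
```

In this model small messages are capped at one cache line and larger transfers go through a descriptor, which mirrors the slide's split between queues for small explicit transfers and RDMA for bulk data. Real hardware would perform the copy asynchronously; the synchronous copy here is a simplification.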

Runtime Configurable Memory Resources
♦ Applications configure the memory type
– Diverse applications → different memory requirements
♦ Computation intensive applications
– Configure the degree of cache associativity
♦ Communication intensive applications
– Fine grain allocation of blocks for NI
♦ Real-time embedded applications
– Configure as addressable local store (scratchpad): predictable performance

[Figure: processor (P) with configurable HW; local memory resources divided into cache lines, NI queues and scratchpad]
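
The idea of configuring local memory at run time can be sketched as a per-line type tag. This is a minimal model under assumptions the slides do not confirm: the names `LineType` and `LocalMemory` are hypothetical, the memory is assumed to be laid out way-major (line index = way × sets + set), and "degree of associativity" is read here as the number of cache lines still available in the most constrained set.

```python
from enum import Enum


class LineType(Enum):
    CACHE = "cache"
    NI_QUEUE = "ni_queue"
    SCRATCHPAD = "scratchpad"


class LocalMemory:
    """Hypothetical local memory whose lines can each be retyped at run time."""

    def __init__(self, ways=4, sets=16):
        self.ways = ways
        self.sets = sets
        # Assumed way-major layout: line index = way * sets + set.
        self.line_type = [LineType.CACHE] * (ways * sets)

    def configure(self, first_line, count, line_type):
        """Retype a contiguous block of lines (coarse or fine grain)."""
        if first_line < 0 or count < 0 or first_line + count > len(self.line_type):
            raise ValueError("block range outside local memory")
        for i in range(first_line, first_line + count):
            self.line_type[i] = line_type

    def count(self, line_type):
        """Number of lines currently configured with the given type."""
        return sum(1 for t in self.line_type if t is line_type)

    def cache_associativity(self):
        """Cache lines left in the most constrained set."""
        return min(
            sum(1 for w in range(self.ways)
                if self.line_type[w * self.sets + s] is LineType.CACHE)
            for s in range(self.sets))


mem = LocalMemory(ways=4, sets=16)

# Real-time use: turn the whole last way (lines 48..63) into scratchpad.
mem.configure(48, 16, LineType.SCRATCHPAD)

# Communication-intensive use: give two individual lines to NI queues.
mem.configure(32, 2, LineType.NI_QUEUE)
```

After these two calls the model holds 46 cache lines, 16 scratchpad lines and 2 NI queue lines, and the sets that lost a line to the NI queue are left with two cache ways. Tagging individual lines rather than whole banks is what allows the fine-grain allocation the slide mentions.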