15th OptoElectronics and Communications Conference (OECC2010) Technical Digest, July 2010, Sapporo Convention Center, Japan

7A3-1 (Invited)

100GbE and Beyond for Warehouse Scale Computing

Bikash Koley*, Vijay Vusirikala*, Cedric Lam*, Vijay Gill*
* Google Inc, USA

Abstract--As computation and storage continue to move from desktops to large internet services, computing platforms running such services are transforming into warehouse-scale computers. 100 Gigabit Ethernet and beyond will be instrumental in scaling the interconnection within and between these ubiquitous warehouse-scale computing infrastructures. In this paper, we describe the drivers for such interfaces and some methods of scaling Ethernet interfaces to speeds beyond 100GbE.

I. INTRODUCTION

As computation continues to move into the cloud, computing platforms are no longer stand-alone servers but homogeneous interconnected computing infrastructures hosted in mega-data-centers. These warehouse-scale computers (WSCs) provide a ubiquitous interconnected compute platform as a shared resource for many distributed services, and are therefore very different from a traditional rack-full of co-located servers in a datacenter [1]. Interconnecting such WSCs in a cost-effective yet scalable way is a unique challenge that needs to be addressed.

II. INTRA-DATACENTER CONNECTIVITY

A WSC is a massive computing infrastructure built with homogeneous hardware and system software arranged in racks and clusters interconnected by massive networking infrastructure [1]. Figure 1 shows a common architecture of a WSC. A set of commodity servers is arranged into racks and interconnected through a top-of-rack (TOR) switch. Rack switches are connected to cluster switches, which provide connectivity between racks and form the cluster fabrics for warehouse-scale computing.

Fig. 1. Typical elements in a Warehouse Scale Computer

Ideally, one would like to have an intra-datacenter switching fabric with sufficient bisection bandwidth to accommodate non-blocking connection from every server to every other server in a datacenter, so that applications do not require location awareness within a WSC infrastructure. However, such a design would be prohibitively expensive. More commonly, interconnections are aggregated with hierarchies of distributed switching fabrics with an oversubscription factor for communication across racks (Fig. 2) [2].

Fig. 2. Hierarchies of intra-datacenter cluster-switching interconnect fabrics (a) within a single building (b) across multiple buildings

Intra-datacenter networking takes advantage of a fiber-rich environment to drive very large bandwidth within and between clusters.
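As a rough illustration of the oversubscription factor for cross-rack communication discussed in Section II, the following Python sketch models a single rack behind a TOR switch. The class name `RackSwitch`, its fields, and all port counts and speeds are hypothetical values chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSwitch:
    """Hypothetical top-of-rack (TOR) switch: server-facing ports and uplinks."""

    servers: int             # number of servers attached to the TOR switch
    server_port_gbps: float  # speed of each server-facing port
    uplinks: int             # number of uplinks toward the cluster switches
    uplink_gbps: float       # speed of each uplink

    def downlink_capacity_gbps(self) -> float:
        """Aggregate bandwidth the servers could offer to the TOR switch."""
        return self.servers * self.server_port_gbps

    def uplink_capacity_gbps(self) -> float:
        """Aggregate bandwidth available for traffic leaving the rack."""
        return self.uplinks * self.uplink_gbps

    def oversubscription(self) -> float:
        """Ratio of offered server bandwidth to uplink bandwidth.

        A value of 1.0 means the rack is non-blocking toward the fabric;
        larger values mean cross-rack traffic is oversubscribed.
        """
        return self.downlink_capacity_gbps() / self.uplink_capacity_gbps()

    def cross_rack_gbps_per_server(self) -> float:
        """Worst-case cross-rack bandwidth if every server sends at once."""
        return self.uplink_capacity_gbps() / self.servers


# Illustrative numbers only (assumed, not from the paper).
non_blocking = RackSwitch(servers=40, server_port_gbps=1.0, uplinks=4, uplink_gbps=10.0)
oversubscribed = RackSwitch(servers=40, server_port_gbps=1.0, uplinks=1, uplink_gbps=10.0)

if __name__ == "__main__":
    for name, rack in [("non-blocking", non_blocking), ("oversubscribed", oversubscribed)]:
        print(f"{name}: {rack.oversubscription():.1f}:1, "
              f"{rack.cross_rack_gbps_per_server():.2f} Gbps per server across racks")
```

Under these assumed numbers, removing three of the four uplinks cuts fabric cost but leaves each server only a quarter of its port speed for cross-rack traffic, which is the trade-off that an oversubscribed hierarchy accepts.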

III. INTER-DATACENTER CONNECTIVITY

A WSC infrastructure can span multiple data-centers. Consequently, the cluster aggregation switching fabric will span multiple data-centers as well, as shown in Fig. 3.

Fig. 3. Inter-datacenter networks connecting multiple WSCs

Fig. 5. Ethernet standards and port-speeds compared to Internet and extrapolated Moore's Law (machine-to-machine) traffic growth. [Plot: logarithmic vertical axis from 0.1 to 10000; horizontal axis: Year, 1995 to 2020; curves: "Internet Traffic Growth" and "Moore's Law Traffic" Growth; labeled points: IEEE 802.3z, IEEE 802.3ae, IEEE 802.3ba, "Tbit Ethernet @ 2013?"]

Typically, inter-datacenter connection fabrics are implemented over a fiber-scarce physical layer, as the link distances are tens to hundreds of kilometers. If capacity per fiber-pair is not maximized, a bottleneck is introduced due to high oversubscription for inter-datacenter communication [1].

Acceleration of broadband penetration and uptake of internet-based applications with rich multimedia content have led to a > 40% compound annual growth rate (CAGR) of internet traffic [3] (Fig. 4), with 9 exabytes of traffic volume per month. While the exponential growth of internet traffic drives bandwidth demand for inter-datacenter networks, the Moore's-law growth of processing and storage capacity [4] utilized in the WSC infrastructure drives bandwidth at an even faster pace. Extrapolating the average CAGR of 60% seen in processing power and storage capacity, one can see that Ethernet standards and port speeds have kept up well with internet-scale traffic growth but are falling behind Moore's-law (machine-to-machine) traffic growth (Fig. 5).
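The comparison of growth rates in this section can be checked with a short compound-growth calculation. The sketch below is an illustration under stated assumptions: the 40% and 60% CAGR figures come from the text, while the ratification years and port speeds used for IEEE 802.3z (1998, 1 Gb/s), IEEE 802.3ae (2002, 10 Gb/s) and IEEE 802.3ba (2010, 100 Gb/s) are taken from the public IEEE record and are not stated in the paper. The function and constant names are hypothetical.

```python
def growth_factor(cagr: float, years: float) -> float:
    """Multiplicative growth after `years` at a compound annual rate `cagr`."""
    return (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# Growth rates quoted in the text.
INTERNET_CAGR = 0.40  # "> 40%" internet traffic growth
MACHINE_CAGR = 0.60   # average growth of processing power and storage capacity

# Assumption: ratification year -> top port speed in Gb/s (not given in the paper).
ETHERNET_GBPS_BY_YEAR = {1998: 1.0, 2002: 10.0, 2010: 100.0}

first_year = min(ETHERNET_GBPS_BY_YEAR)
last_year = max(ETHERNET_GBPS_BY_YEAR)
ethernet_cagr = implied_cagr(
    ETHERNET_GBPS_BY_YEAR[first_year],
    ETHERNET_GBPS_BY_YEAR[last_year],
    last_year - first_year,
)

if __name__ == "__main__":
    print(f"Implied Ethernet port-speed CAGR: {ethernet_cagr:.1%}")
    for label, rate in [("internet", INTERNET_CAGR), ("machine-to-machine", MACHINE_CAGR)]:
        print(f"{label}: x{growth_factor(rate, 10):.1f} over 10 years")
```

With these inputs the implied Ethernet port-speed growth is about 47% per year, which sits above the 40% internet-traffic rate but below the 60% machine-to-machine rate, consistent with the trend described for Fig. 5.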

Therefore, the need for an Ethernet standard supporting 100 Gbps speeds is immediate for inter-datacenter connections.

IV. CONCLUSIONS

The advent of warehouse-scale computing has been driving the need for bandwidth within and between datacenters. While intra-datacenter connections can take advantage of a fiber-rich physical layer, the fiber-scarce inter-datacenter connections will drive the adoption of 100GbE and beyond in massive WSC environments. Deployment of Ethernet technology beyond 100GbE will be needed within the next three to five years for WSC interconnects.

REFERENCES

[1] L. A. Barroso and U. Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool Publishers, 2009. http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006

[2] B. Koley, "Requirements for Data Center Interconnects," paper TuA2, 20th Annual Workshop on Interconnections within High Speed Digital Systems, Santa Fe, New Mexico, 3-6 May 2009.

[3] C. Labovitz et al., ATLAS Internet Observatory 2009 Annual Report. http://www.nanog.org/meetings/nanog47/presentations/Monday/Labovitz_ObserveReport_N47_Mon.pdf

[4] Morris, Truskowski, "The evolution of storage systems," IBM Systems Journal, Vol. 42, No. 2, 2003.

Fig. 4. > 40% CAGR of internet traffic [3]
