Experiences Scaling Use of Google's Sawzall Jeffrey D. Oldham surname at company-name.com Google, Inc. 2011-03-13

Programming, not Theory Not focus on theory. No theorems. No models. No algorithms. Focus on users' programming of parallel systems. Users write code. Not system developers. Users write tests.

Summary Sawzall eases writing map reductions. Structured Sawzall scales. Parallel system API should separate fundamental model concepts. Ex: map reduction = map + reduce + record enumeration ease writing test code.

Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall

Map Reduction

MapReduce: C++ Library

Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall

Sawzall: Simpler Map Reductions

Sawzall Mental Model: One Record

Sample Program Compute the query number per latitude-longitude degree. Sawzall query-location.szl: proto "querylog.proto" queries_per_degree: table sum[lat: int][lon: int] of int; log_record: QueryLogProto = input; loc: Location = locationinfo(log_record.ip); emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Shell code: saw --program=query-location.szl --input=… --output=…

Saw + Sawzall Use Used since 2003 by 100s of Googlers in 1000s of programs to compute a lot of data that is directly or indirectly externally facing.

Outline Map reductions and MapReduce Map reductions and Saw + Sawzall Structured Saw + Sawzall

Scaling Programs Code ecosystems support sharing tested code. + Sawzall function libraries have tests. – Programs shared by copying. – Typically untested.

Sawzall Testing Model: Map Reduction

Structured Pgms: Separate Concepts

Sample Program Compute the query number per latitude-longitude degree. Sawzall query-location.szl: proto "querylog.proto" queries_per_degree: table sum[lat: int][lon: int] of int; log_record: QueryLogProto = input; loc: Location = locationinfo(log_record.ip); emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Shell code: saw --program=query-location.szl --input=… --output=…

Structured Sample Program Compute the query number per latitude-longitude degree. Sawzall query-location.szl: proto "querylog.proto" map: function(log: QueryLogProto, reduce: function(int, int)) { loc: Location = locationinfo(log_record.ip);

reduce(loc.lat, loc.lon); } reduce: function(lat: int, lon: int) { queries_per_degree: table sum[lat: int][lon: int] of int; emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

} log_record: QueryLogProto = input;

map(log_record, reduce);

Shell code: saw --program=query-location.szl --input=… --output=…

Structured Testing Model

Test Structured Programs Test map functions ... one record at a time ... using mocked reduce function. Advantages: No distributed I/O. Single processor only. Not test reduce functions or order enumeration.

Summary Sawzall eases writing map reductions. Structured Sawzall scales. Parallel system API should separate fundamental model concepts. Ex: map reduction = map + reduce + record enumeration ease writing test code.

Experiences Scaling Use of Google's Sawzall Jeffrey D. Oldham surname at company-name.com Google, Inc. 2011-03-13

References Sawzall Pike et al. Open-source implementation Wikipedia article MapReduce Dean and Ghemawat (2004, 2008) Wikipedia article

Experiences Scaling Sawzall 20110310 - Research at Google

Mar 13, 2011 - Google's Sawzall. Jeffrey D. Oldham surname at company-name.com ... QueryLogProto = input; loc: Location = locationinfo(log_record.ip);.

455KB Sizes 0 Downloads 366 Views

Recommend Documents

Experiences with MapReduce, an Abstraction ... - Research at Google
Example: 20+ billion web pages x 20KB = 400+ terabytes ... ~four months to read the web. • ~1,000 hard drives just to .... information (e.g. # of pages on host, important terms on host). – per-host ... 23. MapReduce status: MR_Indexer-beta6-large

Searching for Build Debt: Experiences ... - Research at Google
include additional metadata about building the project [3],. [4]. BUILD files are for the most part manually maintained, and this lack of automation can be a ...

WSABIE: Scaling Up To Large Vocabulary ... - Research at Google
fast algorithm that fits on a laptop, at least at annotation time. ... ever previously reported (10 million training examples ...... IEEE Computer Society, 2008.

Scaling Optical Interconnects in Datacenter ... - Research at Google
Fiber optic technologies play critical roles in datacenter operations. ... optical cables, such as Light Peak Modules [3], will soon ... fabric for future low-latency and energy proportional .... The Quantum dot (QD) laser provides an alternative.

Scaling Up All Pairs Similarity Search - Research at Google
collaborative filtering on data from sites such as Amazon or. NetFlix, the ... network, and computing pairs of similar queries among the 5 ...... Degree distribution of the Orkut social network. 100. 1000. 10000. 100000. 1e+006. 1e+007. 1. 10. 100.

PRedictive Elastic ReSource Scaling for cloud ... - Research at Google
(1) deciding how much resource to allocate is non-trivial ... 6th IEEE/IFIP International Conference on Network and Service Management (CNSM 2010),.

Playable Experiences at AIIDE 2016 - GitHub
ebrates these efforts and emphasizes the development of polished experiences that ..... Conclusion. AIIDE is a meeting ground between entertainment software.

Mathematics at - Research at Google
Index. 1. How Google started. 2. PageRank. 3. Gallery of Mathematics. 4. Questions ... http://www.google.es/intl/es/about/corporate/company/history.html. ○.

Faucet - Research at Google
infrastructure, allowing new network services and bug fixes to be rapidly and safely .... as shown in figure 1, realizing the benefits of SDN in that network without ...

BeyondCorp - Research at Google
41, NO. 1 www.usenix.org. BeyondCorp. Design to Deployment at Google ... internal networks and external networks to be completely untrusted, and ... the Trust Inferer, Device Inventory Service, Access Control Engine, Access Policy, Gate-.

VP8 - Research at Google
coding and parallel processing friendly data partitioning; section 8 .... 4. REFERENCE FRAMES. VP8 uses three types of reference frames for inter prediction: ...

JSWhiz - Research at Google
Feb 27, 2013 - and delete memory allocation API requiring matching calls. This situation is further ... process to find memory leaks in Section 3. In this section we ... bile devices, such as Chromebooks or mobile tablets, which typically have less .

Yiddish - Research at Google
translation system for these language pairs, although online dictionaries exist. ..... http://www.unesco.org/culture/ich/index.php?pg=00206. Haifeng Wang, Hua ...

traits.js - Research at Google
on the first page. To copy otherwise, to republish, to post on servers or to redistribute ..... quite pleasant to use as a library without dedicated syntax. Nevertheless ...

sysadmin - Research at Google
On-call/pager response is critical to the immediate health of the service, and ... Resolving each on-call incident takes between minutes ..... The conference has.

Introduction - Research at Google
Although most state-of-the-art approaches to speech recognition are based on the use of. HMMs and .... Figure 1.1 Illustration of the notion of margin. additional ...

References - Research at Google
A. Blum and J. Hartline. Near-Optimal Online Auctions. ... Sponsored search auctions via machine learning. ... Envy-Free Auction for Digital Goods. In Proc. of 4th ...

BeyondCorp - Research at Google
Dec 6, 2014 - Rather, one should assume that an internal network is as fraught with danger as .... service-level authorization to enterprise applications on a.

Browse - Research at Google
tion rates, including website popularity (top web- .... Several of the Internet's most popular web- sites .... can't capture search, e-mail, or social media when they ..... 10%. N/A. Table 2: HTTPS support among each set of websites, February 2017.

Continuous Pipelines at Google - Research at Google
May 12, 2015 - Origin of the Pipeline Design Pattern. Initial Effect of Big Data on the Simple Pipeline Pattern. Challenges to the Periodic Pipeline Pattern.

Accuracy at the Top - Research at Google
We define an algorithm optimizing a convex surrogate of the ... as search engines or recommendation systems, since most users of these systems browse or ...