Experiences Scaling Use of Google's Sawzall
Jeffrey D. Oldham
surname at company-name.com
Google, Inc.
2011-03-13

Programming, not Theory
The focus is not on theory: no theorems, no models, no algorithms.
The focus is on how users program parallel systems.
Users, not system developers, write the code.
Users write the tests.

Summary
Sawzall eases writing map reductions.
Structured Sawzall scales.
A parallel system's API should separate the fundamental concepts of its model.
Example: map reduction = map + reduce + record enumeration. Separating these eases writing test code.

Outline
Map reductions and MapReduce
Map reductions and Saw + Sawzall
Structured Saw + Sawzall

Map Reduction

MapReduce: C++ Library

Sawzall: Simpler Map Reductions

Sawzall Mental Model: One Record

Sample Program
Count the queries per latitude-longitude degree.

Sawzall (query-location.szl):

    proto "querylog.proto"

    queries_per_degree: table sum[lat: int][lon: int] of int;

    log_record: QueryLogProto = input;
    loc: Location = locationinfo(log_record.ip);
    emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Shell:

    saw --program=query-location.szl --input=… --output=…
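The one-record mental model can be mimicked outside Sawzall. The following is a minimal Python sketch, not Sawzall and not a description of how saw is implemented. `SumTable`, `per_record_program`, `run_per_record`, and the dictionary record layout are hypothetical names chosen for illustration, and each record is assumed to carry `lat` and `lon` already instead of deriving them with `locationinfo`.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical stand-in for a `table sum`: adds emitted values per key."""

    def __init__(self):
        self.cells = defaultdict(int)

    def emit(self, key, value):
        self.cells[key] += value


def per_record_program(record, queries_per_degree):
    """Sees exactly one record, like the body of query-location.szl."""
    # Assumption: the record already holds coordinates; the Sawzall program
    # obtains them from locationinfo(log_record.ip).
    key = (int(record["lat"]), int(record["lon"]))
    queries_per_degree.emit(key, 1)


def run_per_record(program, records):
    """Hypothetical driver: enumerates records and aggregates emitted values."""
    table = SumTable()
    for record in records:
        program(record, table)
    return dict(table.cells)


records = [
    {"lat": 37.4, "lon": -122.1},
    {"lat": 37.9, "lon": -122.8},
    {"lat": 40.7, "lon": -74.0},
]
counts = run_per_record(per_record_program, records)
print(counts)
```

The program function never loops over records; enumeration and aggregation live entirely in the driver, which is the division of labor the one-record model relies on.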

Saw + Sawzall Use
Used since 2003 by hundreds of Googlers in thousands of programs to compute large amounts of data that is directly or indirectly externally facing.

Scaling Programs
Code ecosystems support sharing tested code.
+ Sawzall function libraries have tests.
– Programs are shared by copying.
– Programs are typically untested.

Sawzall Testing Model: Map Reduction

Structured Pgms: Separate Concepts

Structured Sample Program
Count the queries per latitude-longitude degree.

Sawzall (query-location.szl):

    proto "querylog.proto"

    # Map: handle one record and pass its integer coordinates to reduce.
    map: function(log: QueryLogProto, reduce: function(lat: int, lon: int)) {
      loc: Location = locationinfo(log.ip);
      reduce(int(loc.lat), int(loc.lon));
    };

    # Reduce: count one query in the cell for this degree.
    reduce: function(lat: int, lon: int) {
      queries_per_degree: table sum[lat: int][lon: int] of int;
      emit queries_per_degree[lat][lon] <- 1;
    };

    # Record enumeration: the system supplies one record as input.
    log_record: QueryLogProto = input;
    map(log_record, reduce);

Shell:

    saw --program=query-location.szl --input=… --output=…
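The separation of map, reduce, and record enumeration can be illustrated in a few lines of Python. This is a hedged sketch of the idea only, not Sawzall: `map_record`, `make_reducer`, and `enumerate_records` are hypothetical names, and the records are assumed to be dictionaries that already contain coordinates.

```python
from collections import Counter


def map_record(log, reduce):
    """Map: handle one record and hand its integer coordinates to reduce."""
    # Assumption: the record already holds coordinates (no locationinfo lookup).
    reduce(int(log["lat"]), int(log["lon"]))


def make_reducer():
    """Reduce: count one query per (lat, lon) cell.

    Returns the reduce function together with the table it updates.
    """
    queries_per_degree = Counter()

    def reduce(lat, lon):
        queries_per_degree[(lat, lon)] += 1

    return reduce, queries_per_degree


def enumerate_records(records, map_fn, reduce):
    """Record enumeration: feed each record to the map function."""
    for log in records:
        map_fn(log, reduce)


reduce, table = make_reducer()
sample = [{"lat": 1.5, "lon": 2.5}, {"lat": 1.2, "lon": 2.9}]
enumerate_records(sample, map_record, reduce)
print(dict(table))
```

Because each concept is its own function, any one of the three can be swapped for a test double without touching the other two.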

Structured Testing Model

Test Structured Programs
Test map functions one record at a time, using a mocked reduce function.
Advantages: no distributed I/O; runs on a single processor.
Limitation: does not test reduce functions or enumeration order.
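A test in this style might look like the following Python sketch, which uses the standard `unittest` and `unittest.mock` modules. It is an analogue under stated assumptions, not the Sawzall test framework: `map_record` is a hypothetical map function, and the record is a plain dictionary that already carries coordinates.

```python
import unittest
from unittest import mock


def map_record(log, reduce):
    """Hypothetical map function under test: one record in, one reduce call out."""
    reduce(int(log["lat"]), int(log["lon"]))


class MapRecordTest(unittest.TestCase):
    def test_one_record_calls_reduce_once(self):
        # Mocked reduce: it records its calls and aggregates nothing.
        reduce = mock.Mock()
        map_record({"lat": 37.4, "lon": -122.1}, reduce)
        reduce.assert_called_once_with(37, -122)


# Run the single test case in-process: no input files, no cluster.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MapRecordTest)
result = unittest.TextTestRunner().run(suite)
```

The test touches only the map function, so the real reduce function and the record enumeration still need their own coverage.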

References
Sawzall: Pike et al.; open-source implementation; Wikipedia article.
MapReduce: Dean and Ghemawat (2004, 2008); Wikipedia article.