Glow Introduction
A map reduce system for Golang

Architecture: Resource Management

1. Agents run on each server.
2. Agents report resources to master via heartbeats.

[Diagram: Master, four Agents]

Architecture: Resource Allocation

1. Driver asks Master for agents with resources.
2. Driver asks assigned agents to run tasks.

[Diagram: Driver, Master, four Agents]

Architecture: DAG execution

1. Driver divides tasks into a DAG.
2. One group of tasks is assigned to one agent.

[Diagram: Driver, four Agents each running Tasks]

Architecture: Data Flow

1. Outputs of tasks are saved by local agents.
2. Driver remembers all data locations.
3. Inputs of the next group of tasks are pulled from the specified locations.

[Diagram: Driver, four Agents each with Tasks and Data]
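The bookkeeping described in this slide (the driver remembering where each output lives so the next task group knows where to pull from) might look roughly like the following. It is a sketch under assumptions, not Glow's implementation: `Location`, `Driver`, `RecordOutput`, `InputsFor`, and the host, port and path values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Location says where one task's output was saved (hypothetical fields).
type Location struct {
	Agent string // address of the agent holding the data
	Path  string // where the agent stored it locally
}

// Driver remembers the output location of every finished task.
type Driver struct {
	outputs map[string]Location
}

func NewDriver() *Driver {
	return &Driver{outputs: make(map[string]Location)}
}

// RecordOutput is called when an agent reports that a task saved its output.
func (d *Driver) RecordOutput(task string, loc Location) {
	d.outputs[task] = loc
}

// InputsFor returns the locations a downstream task must pull from.
func (d *Driver) InputsFor(upstream []string) ([]Location, error) {
	locs := make([]Location, 0, len(upstream))
	for _, t := range upstream {
		loc, ok := d.outputs[t]
		if !ok {
			return nil, errors.New("no recorded output for task " + t)
		}
		locs = append(locs, loc)
	}
	return locs, nil
}

func main() {
	d := NewDriver()
	d.RecordOutput("map-0", Location{Agent: "agent-1:8931", Path: "/tmp/map-0.dat"})
	d.RecordOutput("map-1", Location{Agent: "agent-2:8931", Path: "/tmp/map-1.dat"})
	locs, err := d.InputsFor([]string{"map-0", "map-1"})
	fmt.Println(locs, err)
}
```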

Architecture: DAG Optimization

Data are streamed to disk only when necessary:
1. when one task produces data for 2 or more tasks
2. when one task consumes data from 2 or more tasks
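One way to read the two rules above is per edge of the task graph: an edge needs disk when its producer fans out to two or more tasks, or its consumer fans in from two or more tasks. The sketch below encodes that reading. The per-edge interpretation, the `Edge` type, and the task names are assumptions made for illustration, not taken from Glow's source.

```go
package main

import "fmt"

// Edge is a data dependency between two tasks in the DAG.
type Edge struct {
	From, To string
}

// edgesNeedingDisk returns the edges whose data would be streamed to disk
// under the two rules: producer fan-out >= 2 or consumer fan-in >= 2.
func edgesNeedingDisk(edges []Edge) []Edge {
	fanOut := make(map[string]int)
	fanIn := make(map[string]int)
	for _, e := range edges {
		fanOut[e.From]++
		fanIn[e.To]++
	}
	var onDisk []Edge
	for _, e := range edges {
		if fanOut[e.From] >= 2 || fanIn[e.To] >= 2 {
			onDisk = append(onDisk, e)
		}
	}
	return onDisk
}

func main() {
	edges := []Edge{
		{"read", "map"},    // 1-to-1: stays in memory
		{"map", "reduceA"}, // map fans out to two tasks
		{"map", "reduceB"},
		{"reduceA", "join"}, // join fans in from two tasks
		{"reduceB", "join"},
		{"join", "sink"}, // 1-to-1: stays in memory
	}
	fmt.Println(len(edgesNeedingDisk(edges))) // prints 4
}
```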

Internal: A lot of channels

Data flows between tasks via Go channels.
Tasks read remote data via Go channels.
Tasks write results to Go channels.
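The channel-to-channel style can be shown with the standard library alone. The following is a generic Go pipeline, not Glow's internals; the stage names `source`, `splitWords` and `count` are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// source writes every line to a channel and closes it when done.
func source(lines []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, l := range lines {
			out <- l
		}
	}()
	return out
}

// splitWords reads lines from one channel and writes words to another.
func splitWords(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for line := range in {
			for _, w := range strings.Fields(line) {
				out <- w
			}
		}
	}()
	return out
}

// count drains the channel and returns how many values it carried.
func count(in <-chan string) int {
	n := 0
	for range in {
		n++
	}
	return n
}

func main() {
	fmt.Println(count(splitWords(source([]string{"a b", "c d e"})))) // prints 5
}
```

Each stage owns its output channel and closes it, so downstream `range` loops end cleanly without extra signalling.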

Distributed Mode vs Standalone Mode

1. Standalone mode is efficient without disk IO.
   ○ Parallelize tasks via goroutines.
   ○ No need for idiomatic but verbose sync/wait, etc.
2. Use distributed mode when you need to scale up.
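For comparison, here is the kind of hand-written goroutine and `sync.WaitGroup` bookkeeping that the slide describes as idiomatic but verbose. It is ordinary standard-library Go, shown only to make the contrast concrete; `parallelSum` is a name chosen for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum sums each partition in its own goroutine, then combines
// the partial sums once every goroutine has finished.
func parallelSum(parts [][]int) int {
	partial := make([]int, len(parts))
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []int) {
			defer wg.Done()
			for _, v := range p {
				partial[i] += v // each goroutine writes only its own slot
			}
		}(i, p)
	}
	wg.Wait()

	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	fmt.Println(parallelSum([][]int{{1, 2, 3}, {4, 5}, {6}})) // prints 21
}
```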

Glow can use Channels as inputs

You can pump data via a Go channel.

    // make a channel of any desired type, and feed it to the flow
    // (a channel that is only declared with var is nil, and sends to it block forever)
    inputChan := make(chan LogLine)
    flow.New().Channel(inputChan).Map(...).Reduce(...).Run()

    // In another goroutine, feed data into the channel:
    inputChan <- LogLine{
        Text: …,
        Time: time.Now(),
    }

Glow can use Channels as outputs

You can peek at any dataset via a Go channel.

    // make a channel with matching type, add it to any dataset
    outChan := make(chan ReducedType)
    flow.New().Map(...).Reduce(...).AddOutput(outChan).Run()

    // In another goroutine, take the data out:
    for x := range outChan {
        println(x.Value)
    }

Fluid functional programming without type casting

You may notice Glow does not have any cumbersome type casting. Just the right amount of type information: not too succinct, not too verbose.
Every function is a normal Go function. No special casting at all.
You can customize the struct type for each dataset.

    flow.New().Source(func(out chan YourType) {
        ...
    }).Map(func(a YourType) (key YourKeyType, value YourValueType) {
    })

Supported Functions (may be already outdated):

Map(), Filter()
Reduce(), ReduceByKey(), LocalReduce(), LocalReduceByKey(), MergeReduce(), ReduceByUserDefinedKey()
Join(), CoGroup(), GroupByKey(), LocalGroupByKey()
Sort(), LocalSort(), MergeSorted()
Source(), TextFile(), Slice(), Channel()
Partition()

Functions: Map()

    Map(func(value) (key, value){})
    Map(func(key, value) (key, value){})
    CoGroup().Map(func(key, leftValues, rightValues){})
    Join().Map(func(key, leftValue, rightValue){})

Functions: Map() with a channel output

The channel should be the last input parameter.

    Map(func(input string, outChan chan someType){})

The channel collects Map() outputs.
● Emit 1 or no data for one input: similar to Filter()
● Emit 1 value for one input: common Map()
● Emit multiple values for one input: same as FlatMap()
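The three cases above follow from the fact that a function holding an output channel decides for itself how many values to send. The sketch below mimics that with the standard library only. `applyMap` is a hypothetical stand-in for what a framework would do with such a function, and it is not Glow's `Map()` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// emitWords sends zero, one, or many values per input on out,
// depending on how many words the line contains.
func emitWords(line string, out chan string) {
	for _, w := range strings.Fields(line) {
		out <- w
	}
}

// applyMap (hypothetical) calls fn once per input and collects
// everything fn sent on the output channel.
func applyMap(inputs []string, fn func(string, chan string)) []string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, in := range inputs {
			fn(in, out)
		}
	}()
	var results []string
	for v := range out {
		results = append(results, v)
	}
	return results
}

func main() {
	// "" emits nothing (like Filter), "c" emits one value (like Map),
	// "a b" emits two values (like FlatMap).
	fmt.Println(applyMap([]string{"a b", "", "c"}, emitWords)) // prints [a b c]
}
```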

Functions: CoGroup()

Group values from 2 sources by the same key.

    a.CoGroup(b).Map(func(key KeyType, valuesFromA []TypeA, valuesFromB []TypeB) {
        // ...
    })

Think of it as a more generic form of Join().
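To make the relationship to Join() concrete, here is a plain-Go sketch of co-grouping two keyed slices. It follows the usual cogroup semantics, where a key present on only one side still appears with an empty list for the other side; whether Glow behaves exactly this way is an assumption. The `KV` and `Group` types and the `coGroup` function are hypothetical and unrelated to Glow's API.

```go
package main

import (
	"fmt"
	"sort"
)

// KV is one keyed record from either source.
type KV struct {
	Key   string
	Value int
}

// Group holds every value seen for Key on each side.
type Group struct {
	Key   string
	Left  []int
	Right []int
}

// coGroup gathers the values of a and b by key, sorted by key.
func coGroup(a, b []KV) []Group {
	left := make(map[string][]int)
	right := make(map[string][]int)
	for _, kv := range a {
		left[kv.Key] = append(left[kv.Key], kv.Value)
	}
	for _, kv := range b {
		right[kv.Key] = append(right[kv.Key], kv.Value)
	}

	// Collect the union of keys from both sides.
	seen := make(map[string]bool)
	var keys []string
	for _, m := range []map[string][]int{left, right} {
		for k := range m {
			if !seen[k] {
				seen[k] = true
				keys = append(keys, k)
			}
		}
	}
	sort.Strings(keys)

	groups := make([]Group, 0, len(keys))
	for _, k := range keys {
		groups = append(groups, Group{Key: k, Left: left[k], Right: right[k]})
	}
	return groups
}

func main() {
	a := []KV{{"x", 1}, {"x", 2}, {"y", 3}}
	b := []KV{{"x", 10}, {"z", 20}}
	for _, g := range coGroup(a, b) {
		fmt.Println(g.Key, g.Left, g.Right)
	}
}
```

An inner join would keep only key `x` here and emit one row per left/right pair, which is why CoGroup can be seen as the more general building block.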

Functions: Source()

Source() is the generic form. TextFile() is just a convenience function.
Both execute on agents, so TextFile() should read from a local file that already exists on the agents.

    flow.New().TextFile("/local/file").
        Map(func(line string){...})

[Diagram: Driver; Agent reading /local/file]

Functions: Channel()

Channel() is the generic form. Slice() is just a convenience function.
Both execute on the driver! Data is sent from the driver to the agent via a remote channel.

    textChan := make(chan string)
    flow.New().Channel(textChan).Map(func(line string){...})

[Diagram: Driver sends data to Agent via remote channel]

Functions: AddOutput()

AddOutput() connects a dataset with the driver via an output channel. Data is received by the driver from the agent via a remote channel.

    outChan := make(chan string)
    flow.New().....AddOutput(outChan)

[Diagram: Agent sends data to Driver via remote channel]
