Graph: composable production systems in Clojure

Jason Wolfe (@w01fe)
Strange Loop ’12

Motivation

• Interesting software has:
  • many components
  • complex web of dependencies
• Developers want:
  • simple, factored code
  • easy testability
  • tools for monitoring and debugging

Graph

• Graph is a simple, declarative way to express system composition
• A Graph is just a map of functions that can depend on previous outputs
• Graphs are easy to create, reason about, test, and build upon

    {:x (fnk [i] ...)
     :y (fnk [j x] ...)
     :z (fnk [x y] ...)}

    input:  {:i 1 :j 2}
    output: {:x 2 :y 5 :z 12}

[Diagram: i -> x; j, x -> y; x, y -> z]

Outline

• Prismatic
• Design Goals
• Graph: specs and compilation
• Applications
  • newsfeed generation
  • production services

Prismatic

• Personalized, interest-based newsfeeds
• Build crawlers, topic models, graph analysis, story clustering, ...
• Backend 99.9% Clojure
• Personalized ranked feeds in real-time (~200ms)

getprismatic.com

Prismatic’s production API service

• >100 components
  • storage systems
  • caches & indices
  • ranking algorithms
• Coordinate in intricate dance to serve feeds fast
• Relentlessly refactored
• Still dozens of top-level components in complex dependency network

[Diagram: API service dependency graph. Nodes: ec2-keys, doc index, index snapshots, feed-builder, top news, handlers, SQL, server, env, log store, observer, update index, pubsub, service-name, logger, service-info. Legend: Parameters; Remote Storage; Caches, Indices; Fns, Other; Thread Pools]

The feed builder

• 20+ steps from query to personalized ranking, 20+ parameters
• Not a simple pipeline
• > 10 feed types w/ slightly different steps, configurations
• Support for early stopping

[Diagram: feed builder graph from user and query to response]

Theme: complexity of composition

• Previous implementations: defns with huge lets
• Unwieldy for large systems with complex or polymorphic dependencies
• Hard to test, debug, and monitor

The ‘monster let’

• Tens of parameters, not compositional
• Mocks/polymorphic flow difficult
• Ad hoc monitoring & shutdown logic per item
• Core issue: structure of (de)composition is locked up in an opaque function

    (defn start [{:keys [a,z]}]
      (let [s1 (store a ...)
            s2 (store b ...)
            db (sql-db c)
            t2 (cron s2 db...)
            ...
            srv (server ...)]
        (fn shutdown []
          (.stop srv)
          ...
          (.flush s1))))

Prismatic software engineering philosophy

• Fine-grained, composable abstractions (FCA)
• Libraries >> Frameworks
• Strive for simplicity, work with the language
• Graph is an FCA for composition

Goal: declarative

• Declarative specifications fix ‘monster let’
  • Explicitly list components, dependencies
  • Enable abstractions over components, reasoning about composition
• Not new: Pregel, Dryad, Storm, ...

Goal: simple

• Distill this idea to its simplest, most idiomatic expression
  • a Graph spec is just a (Clojure) map
  • no XML files or interface hell
• Graphs are ordinary data
  • manipulate them ‘for free’
  • --> unexpected applications

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
- Alan Perlis

From ‘let’ to Graph

    (defn stats [{:keys [xs]}]
      (let [n  (count xs)
            m  (/ (sum xs) n)
            m2 (/ (sum sq xs) n)
            v  (- m2 (* m m))]
        {:n n :m m :m2 m2 :v v}))

    {:n  (fnk [xs]   (count xs))
     :m  (fnk [xs n] (/ (sum xs) n))
     :m2 (fnk [xs n] (/ (sum sq xs) n))
     :v  (fnk [m m2] (- m2 (* m m)))}

[Diagram: xs -> n; xs, n -> m; xs, n -> m2; m, m2 -> v]

Bring on the fnk

• fnk = keyword function
• Similar to {:keys []} destructuring
  • nicer opt. arg. support
  • asserts that keys exist
  • metadata about args
• Quite useful in itself
• Only macros in Graph

    (defnk foo [x y [s 1]]
      (+ x (* y s)))

    (= 8 (foo {:x 2 :y 3 :s 2}))
    (= 5 (foo {:x 2 :y 3}))
    (thrown? Ex. (foo {:x 2}))
    (= (meta foo)
       {:req-ks #{:x :y}
        :opt-ks #{:s}})
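The behavior listed above (required keys asserted, optional keys with defaults, metadata about arguments) can be approximated in plain Clojure without macros. This is only a sketch: `keyword-fn` is a hypothetical helper invented for illustration, not part of Graph, and the real fnk is a macro with its own syntax.

```clojure
;; Sketch only: `keyword-fn` is a hypothetical stand-in for fnk.
;; req-ks:       keys that must be present in the argument map
;; opt-defaults: map of optional key -> default value
;; f:            function of the (defaulted) argument map
(defn keyword-fn
  [req-ks opt-defaults f]
  (with-meta
    (fn [m]
      ;; Assert that every required key exists before calling f.
      (doseq [k req-ks]
        (when-not (contains? m k)
          (throw (IllegalArgumentException. (str "missing required key " k)))))
      ;; Caller-supplied values override the defaults.
      (f (merge opt-defaults m)))
    ;; Record the argument keys as metadata, mirroring the slide.
    {:req-ks (set req-ks)
     :opt-ks (set (keys opt-defaults))}))

(def foo
  (keyword-fn [:x :y] {:s 1}
              (fn [{:keys [x y s]}] (+ x (* y s)))))

(foo {:x 2 :y 3 :s 2}) ;; => 8
(foo {:x 2 :y 3})      ;; => 5
(meta foo)             ;; => {:req-ks #{:x :y}, :opt-ks #{:s}}
```

Calling `(foo {:x 2})` throws because `:y` is missing, which is the check that ordinary `{:keys []}` destructuring does not perform.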

A Graph Specification

• A Graph is just a map from keywords to fnks
• Required keys of each fnk specify graph relationships
• Entire graph specifies a fnk to map of results

    input: {:xs [1 2 3 6]}

    {:n  (fnk [xs]   (count xs))         ; 4
     :m  (fnk [xs n] (/ (sum xs) n))     ; 3
     :m2 (fnk [xs n] (/ (sum sq xs) n))  ; 12.5
     :v  (fnk [m m2] (- m2 (* m m)))}    ; 3.5

[Diagram: xs -> n; xs, n -> m; xs, n -> m2; m, m2 -> v]
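To make "required keys specify graph relationships" concrete, here is a minimal sketch of running such a map in plain Clojure. The names `node` and `run-graph` are hypothetical and this is not how Graph itself is implemented; it simply runs every node whose required keys are already available and repeats until none are left.

```clojure
;; Sketch only: `node` and `run-graph` are hypothetical helpers.
(defn node
  "Tags a function of a map with the set of keys it requires."
  [req-ks f]
  (with-meta f {:req-ks (set req-ks)}))

(defn run-graph
  "Runs every node in g once its required keys are available.
  Returns a map of node key -> result."
  [g input]
  (loop [results input
         pending g]
    (if (empty? pending)
      (select-keys results (keys g))
      (let [ready (filter (fn [[_ f]]
                            (every? #(contains? results %) (:req-ks (meta f))))
                          pending)]
        ;; No runnable node means a missing input or a cycle.
        (when (empty? ready)
          (throw (ex-info "unsatisfiable dependencies"
                          {:pending (keys pending)})))
        (recur (reduce (fn [r [k f]] (assoc r k (f r))) results ready)
               (apply dissoc pending (map first ready)))))))

(def stats-graph
  {:n  (node [:xs]    (fn [{:keys [xs]}]   (count xs)))
   :m  (node [:xs :n] (fn [{:keys [xs n]}] (/ (reduce + xs) n)))
   :m2 (node [:xs :n] (fn [{:keys [xs n]}] (/ (reduce + (map #(* % %) xs)) n)))
   :v  (node [:m :m2] (fn [{:keys [m m2]}] (- m2 (* m m))))})

(run-graph stats-graph {:xs [1 2 3 6]})
;; => {:n 4, :m 3, :m2 25/2, :v 7/2}
```

Plain Clojure division on integers yields exact ratios, so `:m2` and `:v` come back as 25/2 and 7/2, which are numerically equal to the 12.5 and 3.5 shown on the slide.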

Compiling Graphs

• Compile graph to fnk that returns map of outputs
  • error checked
  • can return lazy map
  • can auto-parallelize
• With more tooling, also compile graphs to production services
• Could compile to cross-machine topologies, ...

    (def g {:n  (fnk [xs]   ...)
            :m  (fnk [xs n] ...)
            :m2 (fnk [xs n] ...)
            :v  (fnk [m m2] ...)})

    (def stats (compile g))
    (= (stats {:xs [1 2 3 6]})
       {:n 4 :m 3 :m2 12.5 :v 3.5})
    (thrown? Ex. "missing :xs"
             (stats {:x 1}))

    (def stats (lazy-compile g))
    (= (:m (stats {:xs [1 3 5]})) 3)

    (def stats (par-compile g))
    (= (:v (stats {:xs [1 2 3 6]})) 3.5)
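One way to picture the lazy variant is to wrap each node in a `delay`, so that a node runs only when its value, or the value of something that depends on it, is requested. The sketch below assumes this approach; `node` and `lazy-run` are hypothetical names, and the slides do not say how lazy compilation is actually implemented.

```clojure
;; Sketch only: `node` and `lazy-run` are hypothetical helpers.
(defn node [req-ks f] (with-meta f {:req-ks (set req-ks)}))

(defn lazy-run
  "Returns a lookup function from key to value. Each node is wrapped in a
  delay, so it runs at most once and only if something asks for it."
  [g input]
  (let [delays (atom {})
        lookup (fn lookup [k]
                 (cond
                   (contains? input k)   (get input k)
                   (contains? @delays k) @(get @delays k)
                   :else (throw (ex-info (str "missing " k) {:key k}))))]
    (reset! delays
            (into {}
                  (for [[k f] g]
                    [k (delay (f (into {} (for [r (:req-ks (meta f))]
                                            [r (lookup r)]))))])))
    lookup))

;; Record which nodes actually run.
(def ran (atom []))

(def g
  {:n  (node [:xs]    (fn [{:keys [xs]}]   (swap! ran conj :n)  (count xs)))
   :m  (node [:xs :n] (fn [{:keys [xs n]}] (swap! ran conj :m)  (/ (reduce + xs) n)))
   :m2 (node [:xs :n] (fn [{:keys [xs n]}] (swap! ran conj :m2) (/ (reduce + (map #(* % %) xs)) n)))
   :v  (node [:m :m2] (fn [{:keys [m m2]}] (swap! ran conj :v)  (- m2 (* m m))))})

(def stats (lazy-run g {:xs [1 3 5]}))

(stats :m) ;; => 3
@ran       ;; => [:n :m]
```

Asking only for `:m` never touches `:m2` or `:v`, which is the property the feed builder later relies on for early stopping.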

Before: feed builder

• Real-time personally ranked feeds
• 100-line fn expressed core composition logic, ~20 params
  • several nested lets, escape hatches
• Component polymorphism (10 flavors of feeds)
  • kludge of cases
  • ball of multimethods
  • protocols + hacks

Feed builder in Graph

• Default parameters
• Graph with ‘holes’ captures shared logic

    (def default-params
      {:alpha 0.7
       ...
       :phasers :stun})

    (def partial-graph
      {:query (fnk ...)
       ...
       :y (fnk [a x] ..)
       ...
       :resp (fnk ...)})

Feed builder in Graph

• Each feed type specifies
  • updated parameters
  • missing/new graph nodes
• To make feed fn, just
  • merge in updates
  • compile resulting graph

    (def default-params ..)
    (def partial-graph ..)

    (defn compile-feed-fn [params nodes]
      (let [p (merge default-params params)
            g (compile (merge partial-graph nodes))]
        (fn feed [req]
          (g (merge p req)))))

    (def topic-feed
      (compile-feed-fn
        {:alpha 0.2}
        {:x (fnk ...)
         :q (fnk ...)}))
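The merge-then-compile pattern can be tried end to end with a toy graph. Everything below other than the shape of `compile-feed-fn` is an assumption made for illustration: `node` and `compile-graph` are hypothetical stand-ins for fnk and compile, and the `:docs`, `:score`, and `:resp` nodes are invented.

```clojure
;; Sketch only: `node` and `compile-graph` are hypothetical stand-ins.
(defn node [req-ks f] (with-meta f {:req-ks (set req-ks)}))

(defn compile-graph
  "Returns a function of an input map that runs every node once its
  required keys are available."
  [g]
  (fn [input]
    (loop [results input
           pending g]
      (if (empty? pending)
        results
        (let [ready (filter (fn [[_ f]]
                              (every? #(contains? results %) (:req-ks (meta f))))
                            pending)]
          (when (empty? ready)
            (throw (ex-info "unsatisfiable dependencies"
                            {:pending (keys pending)})))
          (recur (reduce (fn [r [k f]] (assoc r k (f r))) results ready)
                 (apply dissoc pending (map first ready))))))))

(def default-params {:alpha 0.7})

;; Shared logic with a hole: :resp needs :score, which no node here provides.
(def partial-graph
  {:docs (node [:query] (fn [{:keys [query]}]
                          [(str query "-a") (str query "-b")]))
   :resp (node [:docs :score] (fn [{:keys [docs score]}]
                                {:docs docs :score score}))})

(defn compile-feed-fn [params nodes]
  (let [p (merge default-params params)
        g (compile-graph (merge partial-graph nodes))]
    (fn feed [req]
      (g (merge p req)))))

;; A feed type overrides a parameter and fills the hole.
(def topic-feed
  (compile-feed-fn
    {:alpha 0.2}
    {:score (node [:alpha :docs] (fn [{:keys [alpha docs]}]
                                   (* alpha (count docs))))}))

(:resp (topic-feed {:query "clj"}))
;; => {:docs ["clj-a" "clj-b"], :score 0.4}
```

Because parameters and nodes are both plain maps, a new feed type is two `merge` calls rather than a new branch in a large function.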

After: feed builder

• Simpler, cleaner code
• Polymorphism is trivial
• Early stopping for free via lazy compilation

    (def topic-feed
      (compile-feed-fn
        {:alpha 0.2}
        {:x (fnk ...)
         :q (fnk ...)}))

    (def home-feed
      (compile-feed-fn
        {:alpha 0.4}
        {:x (fnk ...)
         :r (fnk ...)
         :s (fnk ...)}))

    (let [h (home-feed req)]
      [(:tt h) (:v h)])

[Diagram: feed graph with intermediate nodes tt and v highlighted ahead of response]

Also: easy to analyze

• Detect mis-wirings at graph compile time
  • Avoid positional constructor wrong # of args errors, arg ordering bugs
• Visualize graphs in 5 loc

    (defn edges [graph]
      (for [[k f] graph
            :let [{:keys [req-ks opt-ks]} (meta f)]
            parent (concat req-ks opt-ks)]
        [parent k]))

[Diagram: rendered graph of parameter (p) and node (N) vertices]
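Because dependencies are data, a mis-wiring check needs nothing more than set arithmetic over the edge list. The sketch below reuses the `edges` idea from the slide on nodes tagged with `:req-ks` metadata; `node` and `missing-keys` are hypothetical names, and the check shown is an assumption about what compile-time detection could look like.

```clojure
;; Sketch only: `node` and `missing-keys` are hypothetical helpers.
(defn node [req-ks f] (with-meta f {:req-ks (set req-ks)}))

(defn edges
  "Returns [parent child] pairs, one per required key of each node."
  [graph]
  (for [[k f] graph
        parent (:req-ks (meta f))]
    [parent k]))

(defn missing-keys
  "Required keys that neither the declared inputs nor any node provides."
  [graph input-ks]
  (let [available (into (set input-ks) (keys graph))]
    (set (remove available (map first (edges graph))))))

(def good
  {:n (node [:xs]    (fn [{:keys [xs]}]   (count xs)))
   :m (node [:xs :n] (fn [{:keys [xs n]}] (/ (reduce + xs) n)))})

;; Same graph with a typo: :n asks for :x instead of :xs.
(def bad
  (assoc good :n (node [:x] (fn [{:keys [x]}] (count x)))))

(set (edges good))        ;; => #{[:xs :n] [:xs :m] [:n :m]}
(missing-keys good [:xs]) ;; => #{}
(missing-keys bad [:xs])  ;; => #{:x}
```

The same edge list could be handed to a graph-drawing tool, which is presumably what the "visualize in 5 loc" claim refers to.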

Also: easy to monitor

• Add monitoring and error reporting by mapping over fnks
• Since a Graph is a Map, can just use map-vals

    node      n     avg ms  errors
    :fetch    2500  1.5     0
    :rank     1001  150.0   1
    :client   1000  70.0    0

    (defn observe-graph [g]
      (into {}
        (for [[k f] g]
          [k (with-meta
               (fn [m]
                 (let [v (f m)]
                   (print k m v)
                   v))
               (meta f))])))
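A variant of `observe-graph` that collects the kind of numbers shown in the table (call count, elapsed time, errors) might look like the following. The `stats` atom and its layout are assumptions made for this sketch; the slides do not show how the real monitoring stores its data.

```clojure
;; Sketch only: the stats atom and its {:n :ns :errors} layout are assumed.
(defn observe-graph
  "Wraps every node so that calls, elapsed nanoseconds, and errors are
  recorded in the stats atom. Node metadata is preserved so the wrapped
  graph still declares the same dependencies."
  [stats g]
  (into {}
        (for [[k f] g]
          [k (with-meta
               (fn [m]
                 (let [start (System/nanoTime)]
                   (try
                     (f m)
                     (catch Exception e
                       (swap! stats update-in [k :errors] (fnil inc 0))
                       (throw e))
                     (finally
                       (swap! stats update-in [k :n] (fnil inc 0))
                       (swap! stats update-in [k :ns] (fnil + 0)
                              (- (System/nanoTime) start))))))
               (meta f))])))

(def stats (atom {}))

(def observed
  (observe-graph
    stats
    {:ok  (with-meta (fn [m] (inc (:x m))) {:req-ks #{:x}})
     :bad (with-meta (fn [_] (throw (ex-info "boom" {}))) {:req-ks #{}})}))

((:ok observed) {:x 1}) ;; => 2
((:ok observed) {:x 5}) ;; => 6
(try ((:bad observed) {}) (catch Exception _ :failed)) ;; => :failed

(get-in @stats [:ok :n])       ;; => 2
(get-in @stats [:bad :errors]) ;; => 1
```

Average milliseconds per node can then be derived as `:ns` divided by `:n`, scaled by 1e6.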

Example 2: production API service

    (def api-service
      (service
        {:service-name "api"
         :backend-port 42424
         :server-threads 100}
        {:store1 (instance store
                   {:type :s3 ...})
         :memo (fnk [store1]
                 {:resource ...})
         ...
         :api-server (...)}))

Service definitions

• Service definition = parameter map + resource graph
• Crane reads params for provisioning, deployment
• Graph = service code
  • parameters are args
  • cron jobs, handlers at leaves

Service built-ins

• Parameters and graph nodes available by convention
• Interface with deployment, other services, dashboard
• Smartly reconfigure with env -- test/staging/prod

    parameters
    {:env :prod
     :instance-id "i-123abc"
     :ec2-keys ...}

    resources
    {:nameserver ...
     :observer ...
     :pubsub ...}

Nodes build Resources

• Resource = component
  • e.g., database, cache, fn
  • Plus metadata for shutdown, handlers, ...
  • Represent as a map
• Library of resources that work with builtins
  • data stores
  • processing queues
  • recurring tasks
  • ...

    (defnk refreshing-atom [f period]
      (let [a (atom (f))
            e (Exec/newExec)]
        (.schedAtFixedRate e
          #(reset! a (f)) period)
        {:res a
         :shutdown #(.sd e)}))
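The slide abbreviates the Java interop to fit. A runnable version using the standard `java.util.concurrent` scheduler could look like this; it takes a plain map instead of using defnk, and the `:period-ms` key name and the choice of a single-threaded scheduler are assumptions of this sketch.

```clojure
(import '(java.util.concurrent Executors TimeUnit))

;; Sketch only: plain-map argument instead of defnk; :period-ms is an
;; assumed parameter name (the slide calls it `period`).
(defn refreshing-atom
  "Builds a resource map: :res is an atom holding the latest (f), refreshed
  every period-ms milliseconds; :shutdown stops the refresh task."
  [{:keys [f period-ms]}]
  (let [a (atom (f))
        e (Executors/newSingleThreadScheduledExecutor)]
    (.scheduleAtFixedRate e
                          (fn [] (reset! a (f)))
                          (long period-ms)
                          (long period-ms)
                          TimeUnit/MILLISECONDS)
    {:res a
     :shutdown #(.shutdown e)}))

;; Usage: always call :shutdown, or the scheduler thread keeps the JVM alive.
(let [r (refreshing-atom {:f (constantly 1) :period-ms 1000})]
  (println @(:res r))
  ((:shutdown r)))
```

Returning the shutdown thunk alongside the value is what lets a generic service runner stop every resource without knowing what each one is.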

Starting and Stopping

• Transform resource graph to ordinary graph
  • map over leaves, pull out :resource
  • assoc new :shutdown key
• Run graph to start service, get clean shutdown hook

    (defn start-service [spec]
      ((->> (:graph spec)
            resource-transform
            compile)
       (:parameters spec)))

    (def api (start-service api-service))
    ((:shutdown api))
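The idea of pulling the resource out of each node and collecting shutdown hooks can be sketched without the graph machinery. In the sketch below, `start-resources` is a hypothetical name, nodes are supplied already in dependency order, and running the hooks in reverse start order is an assumption; the slides do not specify the shutdown order.

```clojure
;; Sketch only: `start-resources` is hypothetical; reverse-order shutdown
;; is an assumption, not something the slides state.
(defn start-resources
  "ordered-nodes is a sequence of [key build-fn] pairs in dependency order.
  Each build-fn takes the map of resources built so far and returns
  {:res value, :shutdown thunk}. Returns the resource map plus a single
  :shutdown hook that runs the collected thunks, last started first."
  [ordered-nodes params]
  (let [[resources hooks]
        (reduce (fn [[rs hooks] [k build]]
                  (let [{:keys [res shutdown]} (build rs)]
                    [(assoc rs k res)
                     (if shutdown (cons shutdown hooks) hooks)]))
                [params ()]
                ordered-nodes)]
    (assoc resources :shutdown (fn [] (doseq [h hooks] (h))))))

(def log (atom []))

(def api
  (start-resources
    [[:store  (fn [_]
                {:res {:kind :store}
                 :shutdown #(swap! log conj :flush-store)})]
     [:server (fn [{:keys [store port]}]
                {:res {:port port :store store}
                 :shutdown #(swap! log conj :stop-server)})]]
    {:port 42424}))

((:shutdown api))
@log ;; => [:stop-server :flush-store]
```

The server stops before the store it depends on is flushed, matching the order of the hand-written shutdown function in the ‘monster let’ example.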

Sub-Components

• Nodes can themselves be Graphs
  • just nested maps
• Package components as sub-graphs
• Sub-graphs are transparent
  • debugging
  • monitoring
  • imperfect abstractions

    (def write-back-cache
      {:store (instance store ...)
       :write-queue (instance queue ...)
       :periodic-prune (instance task ...)})
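Since a sub-graph is just a nested map, tools can walk into it with ordinary map functions. As one illustration of that transparency, the hypothetical `node-paths` helper below lists the path to every leaf node; the example graph contents are placeholders.

```clojure
;; Sketch only: `node-paths` is a hypothetical helper; the leaves here are
;; placeholder functions standing in for real nodes.
(defn node-paths
  "Returns the key path to every leaf node of a (possibly nested) graph."
  [g]
  (mapcat (fn [[k v]]
            (if (map? v)
              (map #(into [k] %) (node-paths v))
              [[k]]))
          g))

(def write-back-cache
  {:store          identity
   :write-queue    identity
   :periodic-prune identity})

(def service-graph
  {:logger identity
   :cache  write-back-cache})

(set (node-paths service-graph))
;; => #{[:logger] [:cache :store] [:cache :write-queue] [:cache :periodic-prune]}
```

A monitoring or debugging tool can use such paths with `get-in` and `update-in` to reach inside a packaged component.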

Easy system testing

• Old xxx-line lets were impossible to test
• With graph, just merge in mock node fnks
  • no elaborate mock objects or redefs
  • automatic, safe shutdown

    (deftest home-feed-systest
      (test-service
        (assoc api-service
          :doc-index (fnk [] {:res fake-idx})
          :get-user (fnk [] {:res (constantly me)}))
        (is (= (titles (slurp url))
               ["doc1" "doc2"]))))
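The mocking step is an ordinary `assoc` on the graph map. The toy below shows the mechanics in plain Clojure; `node` and `run-graph` are hypothetical stand-ins for fnk and compile, and the `:doc-index` and `:titles` nodes are invented for the example.

```clojure
;; Sketch only: `node` and `run-graph` are hypothetical stand-ins.
(defn node [req-ks f] (with-meta f {:req-ks (set req-ks)}))

(defn run-graph [g input]
  (loop [results input
         pending g]
    (if (empty? pending)
      results
      (let [ready (filter (fn [[_ f]]
                            (every? #(contains? results %) (:req-ks (meta f))))
                          pending)]
        (when (empty? ready)
          (throw (ex-info "unsatisfiable dependencies"
                          {:pending (keys pending)})))
        (recur (reduce (fn [r [k f]] (assoc r k (f r))) results ready)
               (apply dissoc pending (map first ready)))))))

;; The production node would talk to a remote index; here it just fails
;; loudly so the test proves it was replaced.
(def feed-graph
  {:doc-index (node [] (fn [_] (throw (ex-info "would hit the network" {}))))
   :titles    (node [:doc-index :user]
                    (fn [{:keys [doc-index user]}] (get doc-index user)))})

;; Swap in a mock node; nothing else about the graph changes.
(def test-graph
  (assoc feed-graph
         :doc-index (node [] (fn [_] {"me" ["doc1" "doc2"]}))))

(:titles (run-graph test-graph {:user "me"}))
;; => ["doc1" "doc2"]
```

No redefinition of vars is needed, so the production graph stays untouched and tests cannot leak mocks into each other.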

Summary

• Graph = way to express complex compositions
  • declaratively
  • simply
• Widely applicable
• Simpler code, better tooling
• Hope to open source soon
• (we’re hiring!)
