CSV, Rinse, Repeat

Javascript Data Exploration
Mathieu Jacomy
Sciences Po Paris médialab
Equipex DIME-SHS

Paris Sciences Po médialab

A hybrid laboratory for social sciences:
- Researchers
- Engineers
- Designers

I’m a sort of « social data scientist »
I just received a CSV

Let me grab my laptop

Exploring data
...is not about statistical metrics
"The greatest value of a picture is when it forces us to notice what we never expected to see." (John W. Tukey)
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." (John W. Tukey)

What is the right question?

Exploring data
...is about iterating over facets of the data

The chain of data mining by Ben Fry

CSV Problem #1: Painful coding
In a spreadsheet environment (e.g. Libre Office, Excel, Open Refine...), coding features are designed for non-coders.
- Consequence for non-coders: you invest your time in nonstandard, broken languages (you waste your time, and it is still complicated)
- Consequence for coders: editing and filtering the data is painful (ranging from inefficient to WTF)

Painful coding, simple example
Goal: retrieve the year from each movie title

Issue: the year is coded within the title

Simple parsing with Libre Office
1. Find the year position
Look, it’s coded in French!

2. Retrieve the string

In this situation the GUI is a problem, not a solution

Simple parsing with Javascript
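A minimal sketch of this kind of parsing, assuming the year sits in parentheses at the end of the title (e.g. "Toy Story (1995)"). The helper name `extractYear` and the sample rows are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: split "Title (1995)" into a title and a year.
// Assumption: the year is a four-digit number in parentheses at the end.
function extractYear(rawTitle) {
  const match = rawTitle.match(/^(.*?)\s*\((\d{4})\)\s*$/);
  if (!match) {
    // No trailing "(YYYY)": keep the title and leave the year empty.
    return { title: rawTitle.trim(), year: null };
  }
  return { title: match[1].trim(), year: Number(match[2]) };
}

// Made-up sample rows, standing in for the rows of an imported CSV.
const rows = [
  { title: "Toy Story (1995)" },
  { title: "Blade Runner (1982)" },
  { title: "Untitled project" },
];

const parsed = rows.map((row) => extractYear(row.title));
console.log(parsed);
```

One regular expression does both steps of the spreadsheet version (find the position, retrieve the string), and rows that do not match are kept rather than silently dropped.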

A real coding language is more efficient
...if you can bring your CSV into the right coding environment

CSV Problem #2: The filter/vis gap

Code (Filter + Edit)

GAP

Visualize

Simple filtering in Tableau Public

Tableau tutorial by Anne Stevens http://stevensanne.com/tableau-tutorial3-filters-and-parameters/

Edit the formula and/or the settings, apply and close

The modal hides the visualization

Formula + settings, in a form, inside a tab, of a modal that you open by a drag-and-drop (true story)

Reopen the modal, select the tab, select the field

Iterating is painful

Visualization

Note: in Libre Office or Excel, it’s even worse

CSV, Rinse, Repeat is about shortening the gap between code and visualization to foster iterative exploration

Real Javascript coding

Simple, ready-made visualizations

CSV, Rinse, Repeat
...is a proposition to solve these problems during exploration
- A simple accessible tool
  - Single web page
- A Javascript coding environment
  - A standard coding panel
  - CSV Import + Export
  - Basic preview
- A layout designed to get rid of the filter/vis gap
  - Input + code on the left
  - Output + visualization on the right

Demo time!

http://tools.medialab.sciences-po.fr/csv-rinse-repeat/

Input preview

Output preview

Vis cards

Coding panel

You can download the modified CSV
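A rough sketch of what exporting modified rows back to CSV text involves, assuming the rows are plain objects. The helper `toCsv` and the sample row are hypothetical; this is not the tool's actual export code.

```javascript
// Hypothetical helper: serialize an array of row objects to CSV text.
// Fields containing a comma, a double quote or a line break are quoted,
// and inner double quotes are doubled (the usual CSV convention).
function toCsv(rows, columns) {
  const escapeField = (value) => {
    const text = value === null || value === undefined ? "" : String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  const lines = [columns.map(escapeField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeField(row[column])).join(","));
  }
  return lines.join("\r\n");
}

// Made-up sample row with a comma and quotes in the text field.
const csv = toCsv(
  [{ text: 'He said "to be, or not to be"', lang: "en" }],
  ["text", "lang"]
);
console.log(csv);
```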

You can resize or hide the panels to fit your needs
You can add & remove vis cards

The goal: Streamlining iterations

1. Preview data
2. Filter & Edit
3. Monitor result
4. Facets on demand

Example: Twitter data about Shakespeare

Not much to see at upload

Let’s filter nothing and take a look at different facets

Add visualization: which one?

We chose «Daily Volume». Data from which column?

Column: «created_at» ...the vis is complete
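A minimal sketch of the aggregation behind a daily-volume view: count the rows per calendar day of the chosen column. It assumes `created_at` holds ISO 8601 timestamps and groups by UTC day; `dailyVolume` and the sample rows are hypothetical and do not come from the tool.

```javascript
// Hypothetical helper: count rows per calendar day (UTC) for a date column.
function dailyVolume(rows, column) {
  const counts = new Map();
  for (const row of rows) {
    const date = new Date(row[column]);
    if (Number.isNaN(date.getTime())) continue; // skip unparseable dates
    const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  // Return the days in chronological order.
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, count]) => ({ day, count }));
}

// Made-up sample rows, assuming ISO 8601 timestamps in "created_at".
const tweets = [
  { created_at: "2016-04-23T10:00:00Z" },
  { created_at: "2016-04-22T09:15:00Z" },
  { created_at: "2016-04-23T18:30:00Z" },
  { created_at: "not a date" },
];

const volume = dailyVolume(tweets, "created_at");
console.log(volume);
```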

Why this peak?

...Back to filtering

We parse the dates and keep only the tweets posted after 2016-04-20
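A sketch of such a filter, assuming ISO 8601 timestamps in `created_at` and reading the cutoff as the start of 2016-04-20 in UTC (both are assumptions). `keepAfter` and the sample rows are hypothetical.

```javascript
// Hypothetical helper: keep rows whose date column is strictly later
// than the cutoff. Rows with unparseable dates are dropped.
function keepAfter(rows, column, cutoffDate) {
  return rows.filter((row) => {
    const time = new Date(row[column]).getTime();
    return !Number.isNaN(time) && time > cutoffDate.getTime();
  });
}

// Assumption: the cutoff is midnight UTC at the start of 2016-04-20.
const cutoff = new Date("2016-04-20T00:00:00Z");

// Made-up sample rows.
const sample = [
  { created_at: "2016-04-18T12:00:00Z", text: "before" },
  { created_at: "2016-04-23T08:00:00Z", text: "during" },
  { created_at: "", text: "no date" },
];

const filtered = keepAfter(sample, "created_at", cutoff);
console.log(filtered);
```

Changing the cutoff and rerunning is the whole iteration: the same filter can be flipped to compare the tweets before and during the peak.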

And we add a vis card to look at the content of the tweets

Before the peak

During the peak

By iterating, we can compare

It’s the anniversary of Shakespeare’s death

Monitoring the output helps validate the hypothesis

Wrap up
- Exploring data requires iterating. That is why CSV, Rinse, Repeat is about constantly rewriting filters.
- The visualizations are too basic, the preview is not comfortable... That’s fine! You don’t really need more during exploration.
- Our design aims at being «KISS»: Keep It Simple, Stupid.
- Exploration is over when you have hypotheses. At this point, just switch to a more analytical environment: Libre Office, Tableau, R, Stata...

Thank you for your attention

http://medialab.sciences-po.fr

