Reproducible graphics with R and ggplot2

Baptiste Auguié
Victoria University of Wellington

May 17, 2012


R: A loyal guide in data analysis

“[A software] to turn ideas into software, quickly and faithfully”

Free software environment for statistical computing and graphics
Born in New Zealand in the 1990s
More than 5000 packages freely available


Importing tabular data

Using the read.table() function,

    d = read.table(file = "../data/spectra/78_radius_0-08_medium_1-33_coating_152_thickness_0-01_gold_Mie_2012-04-21.txt",
                   header = TRUE)
    str(d)

    'data.frame': 801 obs. of 4 variables:
     $ wavelength: num 0.4 0.401 0.402 0.403 0.404 0.405 0.406 0.407 0.408 0.409 ...
     $ extinction: num 0.0662 0.0662 0.0662 0.0663 0.0663 ...
     $ scattering: num 0.0308 0.0308 0.0308 0.0308 0.0308 ...
     $ absorption: num 0.0354 0.0354 0.0354 0.0355 0.0355 ...
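The same import step can be tried without the original data. This is a minimal sketch, assuming a made-up four-column file written to a temporary location; the column names mirror those above, but the file and its values are invented for illustration.

```r
# Hypothetical data: a tiny stand-in for the spectra file (values invented)
tmp <- tempfile(fileext = ".txt")
writeLines(c("wavelength extinction scattering absorption",
             "0.400 0.0662 0.0308 0.0354",
             "0.401 0.0662 0.0308 0.0354",
             "0.402 0.0663 0.0308 0.0355"), tmp)

# header = TRUE takes the column names from the first line;
# whitespace is the default field separator
d_demo <- read.table(file = tmp, header = TRUE)

# str() prints a compact summary: class, dimensions and column types
str(d_demo)
```

Checking the result with str() right after the import is a cheap way to catch a wrong separator or a missing header before any plotting starts.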


Basic plot

Using the plot() function,

    plot(d$wavelength, d$extinction, type = "l", lty = 1,
         xlab = "wavelength", ylab = "cross-section")
    lines(d$wavelength, d$absorption, lty = 2)
    lines(d$wavelength, d$scattering, lty = 3)

[Figure: extinction, absorption and scattering cross-sections vs. wavelength, drawn with three line types]


Improved plot

    matplot(d$wavelength, d[ , 2:4], type = "l", col = 1:3,
            ylim = c(0, 0.12), yaxs = "i", lty = 1,
            xlab = expression(wavelength/mu*m),
            ylab = expression(sigma/mu*m^2))
    legend("topright", expression(sigma[ext], sigma[sca], sigma[abs]),
           col = 1:3, lty = 1, bg = "grey95", bty = "o",
           box.col = NA, inset = 0.05)

[Figure: σext, σsca and σabs (σ / µm²) vs. wavelength / µm, with a legend in the top-right corner]


Multiple plots

    ## lf: character vector of data file paths, defined beforehand
    par(mfrow = c(9, 10), mar = c(0, 0, 0, 0), mgp = c(0, 0, 0))
    for (file in lf){ # loop over all data files
      d = read.table(file, header = TRUE)
      ## empty strings suppress the axis titles (NULL falls back to the defaults)
      matplot(d$wavelength, d[ , 2:4], type = "l", lty = 1,
              ylim = c(0, 1.1*max(d[ , -1])), yaxs = "i",
              xlab = "", xaxt = "n", ylab = "", yaxt = "n",
              frame.plot = FALSE)
      box(lwd = 0.2, col = "grey")
    }
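The loop relies on a vector lf of file paths that the slide does not define. One plausible way to build it is with list.files(); this is a sketch under that assumption, using a temporary directory and invented file names in place of the real data folder.

```r
# Hypothetical stand-in for the data folder: a temporary directory with three files
data_dir <- file.path(tempdir(), "spectra_demo")
dir.create(data_dir, showWarnings = FALSE)
for (name in c("a.txt", "b.txt", "notes.md")) {
  writeLines("wavelength extinction", file.path(data_dir, name))
}

# Keep only the .txt files; full.names = TRUE returns paths that
# read.table() can open directly
lf <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
basename(lf)
```

With full.names = FALSE (the default) only the bare file names are returned, and the directory then has to be pasted back on before reading each file.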


[Figure: grid of small spectra panels, one per data file]


Histograms

    d = read.table("../data/rods.txt", header = TRUE)
    par(mfrow = c(2, 1), # split
        mar = c(4, 4, 1.5, 0.5), mgp = c(2, 1, 0))
    ## top histogram
    with(d, hist(Major, xlim = c(10, 70)))
    ## bottom histogram
    with(d, hist(Minor, prob = TRUE, xlim = c(10, 70)))
    ## add density estimate
    with(d, lines(density(Minor), col = "red"))

[Figure: top, "Histogram of Major" (frequency); bottom, "Histogram of Minor" (density) with a red density estimate; both on an x axis from 10 to 70]

A natural and coherent language to describe graphics

Point-and-click

Yeah but, no but, yeah but, no but, yeah but… I swear *** ***** **** … but yeah.

A grammar of graphics

1. Use data d [x, y, z, t, ...]
2. Plot lines of variable y vs x
3. Colour lines following variable z
4. Split into multiple panels according to variable t
5. Add a layer with new data
6. …
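The steps listed above map almost one-to-one onto ggplot2 calls. This is a minimal sketch, assuming made-up data frames d_toy and d_new whose names and values are invented for illustration.

```r
library(ggplot2)

# Hypothetical toy data with the four variables named in the list
d_toy <- expand.grid(x = 1:10, z = c("a", "b"), t = c("left", "right"))
d_toy$y <- d_toy$x * ifelse(d_toy$z == "a", 1, 2) +
  ifelse(d_toy$t == "left", 0, 5)

# Hypothetical extra data set for the additional layer
d_new <- data.frame(x = c(2, 8), y = c(10, 20))

p_toy <- ggplot(d_toy, aes(x = x, y = y)) +  # 1. use data d
  geom_line(aes(colour = z)) +               # 2. lines of y vs x, 3. coloured by z
  facet_wrap(~ t) +                          # 4. one panel per value of t
  geom_point(data = d_new)                   # 5. a layer with new data
```

Because d_new has no column t, its points are repeated in every panel. Printing p_toy draws the figure.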


ggplot2: A Grammar of Graphics

Mapping: data ↔ aesthetic
Layers
Scales
Coordinates
+ (Stats, …)

    library(ggplot2)
    ggplot(data = ..., mapping = ...) +
      layer(geom = "point", stat = "identity") +
      layer(geom = "point", data = ...) +
      facet_grid(... ~ ...) +
      coord_polar( ) +
      scale_colour(...) +
      opts(...)


    ## Reshape to long format
    library(reshape2)  # provides melt()
    m = melt(d, measure.vars = c("extinction", "scattering", "absorption"))
    head(m, 3)

      wavelength   variable   value
    1      0.400 extinction 0.06616
    2      0.401 extinction 0.06620
    3      0.402 extinction 0.06623

    ggplot(m, aes(x = wavelength, y = value, colour = variable)) +
      geom_path() +
      opts(legend.position = c(0.8, 0.8), legend.direction = "vertical")

[Figure: value vs. wavelength, one coloured line per variable (extinction, scattering, absorption), with the legend placed inside the plot area]
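To see what the wide-to-long reshaping does, it can be written out by hand in base R. The sketch below illustrates the idea and is not how melt() is implemented; the data frame wide and its values are invented.

```r
# Hypothetical wide data: two wavelengths, three measured columns (values invented)
wide <- data.frame(wavelength = c(0.400, 0.401),
                   extinction = c(0.066, 0.067),
                   scattering = c(0.031, 0.032),
                   absorption = c(0.035, 0.036))

measured <- c("extinction", "scattering", "absorption")

# Stack the measured columns: one row per (wavelength, variable) pair
long <- data.frame(
  wavelength = rep(wide$wavelength, times = length(measured)),
  variable   = factor(rep(measured, each = nrow(wide)), levels = measured),
  value      = unlist(wide[measured], use.names = FALSE)
)

long
```

The long layout lets a single aesthetic mapping such as colour = variable separate the three curves, so each one no longer needs its own plotting call.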

Data exploration

    head(Orange, 10)

       Tree  age circumference
    1     1  118            30
    2     1  484            58
    3     1  664            87
    4     1 1004           115
    5     1 1231           120
    6     1 1372           142
    7     1 1582           145
    8     2  118            33
    9     2  484            69
    10    2  664           111

    p = ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
      geom_point(aes(shape = Tree)) +
      layer(geom = "line", stat = "smooth", method = "lm") +
      opts(legend.position = "top", legend.direction = "horizontal")


    p

    p + facet_grid(Tree ~ .)

[Figure: circumference vs. age, coloured by Tree, with the legend on top; shown first in a single panel, then faceted into one panel per Tree]

Small multiples revisited

[Figure: 3 × 3 grid of panels; column labels 0, 0.001, 0.01; row labels 1, 1.33, 1.5; x axis: wavelength / µm (0.4 to 1.2); y axis: σ / µm² (0 to 0.15)]

Coordinate transformations

    ggplot(data = d) +
      geom_path(aes(x = x, y = y)) +
      scale_y_log10() +
      coord_polar() +
      annotate(...) + ...

[Figure: polar plots on a log10 radial scale (10⁻² to 10²), titled "Dipole orientation", for the parallel and perpendicular cases; angular ticks from −90 to 90; regions labelled "Air side", "Au film" and "Glass side"]

DRY principle

    p + theme_article(10, 'serif')

    p + theme_presentation(16)

[Figure: the same circumference vs. age plot (coloured by Tree) rendered twice, once with each theme]
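theme_article() and theme_presentation() are not part of ggplot2; they appear to be the speaker's own helpers. The sketch below shows how such a helper could be written to follow the DRY (Don't Repeat Yourself) idea. The function name theme_article_sketch and its styling choices are invented. It uses theme(), which current ggplot2 releases provide in place of the opts() call seen in these slides.

```r
library(ggplot2)

# Hypothetical helper: not the speaker's theme_article(), only the same idea.
# All styling decisions live in one place and are reused by every plot.
theme_article_sketch <- function(base_size = 10, base_family = "serif") {
  theme_bw(base_size = base_size, base_family = base_family) +
    theme(legend.position = "top",
          panel.grid.minor = element_blank())
}

p_orange <- ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
  geom_point()

# The same plot object, restyled for two different outputs
p_paper <- p_orange + theme_article_sketch(10, "serif")
p_talk  <- p_orange + theme_article_sketch(16, "sans")
```

Changing the helper once then updates every figure that uses it, which is the point of the principle.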

Take home message

Reproducible research, reproducible graphics

1. Saving time: repeat the analysis instantly with a new data set
2. Readable by others: you can share scripts
3. Safer: anyone can follow the analysis by reading lines of code
4. Visual aspect: better aesthetic choices (think LaTeX vs word processors)

Reproduce this!

    library(knitr); knit2pdf("presentation.rnw")


Getting started: resources

1. Get started, download R and RStudio (IDE)
2. ?ggplot : R’s help system
3. Documentation pages http://had.co.nz/ggplot2/, wiki
4. An introduction to R
5. R and ggplot2 mailing lists; Stack Overflow
6. Books: R graphics (Murrell), ggplot2 (Wickham)


Automation

    ## open a pdf file
    pdf(file = "all_plots.pdf", width = 10, height = 6)
    par(mar = c(4, 4, 1.5, 0.5), mgp = c(2, 1, 0))
    for (file in lf){ # loop over all data files
      d = read.table(paste0("../data/spectra/", file), header = TRUE)
      matplot(d$wavelength, d[ , 2:4], type = "l", lty = 1:3,
              ylim = c(0, 1.1*max(d[ , -1])), yaxs = "i",
              xlab = expression(wavelength/mu*m),
              ylab = expression(sigma/mu*m^2))
      legend("topright", expression(sigma[ext], sigma[sca], sigma[abs]),
             lty = 1:3, bg = "grey95", bty = "o",
             box.col = NA, inset = 0.05)
      ## extract parameters from filename as plot title
      title(gsub("_|\\.txt", " ", file))
    }
    ## close pdf file
    dev.off()
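The title line turns a file name into readable text by replacing underscores and the extension with spaces. The sketch below applies that substitution to the file name from the import slide. It then goes one step further and pulls out numeric parameters, assuming the name follows a "name_value" scheme with "-" standing in for the decimal point; the slides do not state this.

```r
# File name taken from the import example earlier in the talk
file <- "78_radius_0-08_medium_1-33_coating_152_thickness_0-01_gold_Mie_2012-04-21.txt"

# Same substitution as in the loop: underscores and ".txt" become spaces
plot_title <- gsub("_|\\.txt", " ", file)

# Assumption (not from the slides): each parameter name is followed by its
# value, and "0-08" encodes 0.08
parts  <- strsplit(sub("\\.txt$", "", file), "_")[[1]]
radius <- as.numeric(sub("-", ".", parts[which(parts == "radius") + 1]))
medium <- as.numeric(sub("-", ".", parts[which(parts == "medium") + 1]))
```

Encoding the parameters in the file name is what allows a single loop to label every page of the PDF without a separate lookup table.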


Fitting data

    model = function(p, x) p[3] / pi * p[2] / ((x - p[1])^2 + p[2]^2) + p[4]
    objective = function(p, d = NULL, x = d$wavenumber, y = d$intensity, ...){
      predicted <- model(p, x, ...)
      sum((predicted - y)^2)
    }
    d = read.table("../data/peaks.txt", header = TRUE)

    guess = c(500, 5, 500, 100)
    fit = optim(guess, objective,
                d = subset(d, wavenumber < 550 & wavenumber > 450))

[Figure: intensity (100 to 130) vs. wavelength / nm (400 to 800)]
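The fitting recipe can be tried without peaks.txt by generating a peak with known parameters and checking that optim() recovers them. This is a sketch on synthetic data: the true parameters, noise level and starting guess are invented, and the objective is simplified to take x and y directly.

```r
# Same Lorentzian-plus-offset model as above:
# p = (centre, half-width, area, offset)
model <- function(p, x) p[3] / pi * p[2] / ((x - p[1])^2 + p[2]^2) + p[4]

# Sum of squared residuals between model and data
objective <- function(p, x, y) sum((model(p, x) - y)^2)

# Synthetic data (invented): a known peak plus a little Gaussian noise
set.seed(1)
true_p <- c(500, 8, 400, 100)
x <- seq(450, 550, by = 1)
y <- model(true_p, x) + rnorm(length(x), sd = 0.1)

# Nelder-Mead (optim's default method) started from a rough guess
guess <- c(505, 5, 500, 100)
fit <- optim(guess, objective, x = x, y = y, control = list(maxit = 5000))

fit$par          # fitted parameters, to compare with true_p
fit$convergence  # 0 means optim reports successful completion
```

Restricting the data to a window around the peak, as the subset() call does, keeps neighbouring features from pulling the fit away from the peak of interest.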
