RLint: Reformatting R Code to Follow the Google Style Guide

Andy Chen, Alex Blocker, Andy Chu, Tim Hesterberg, Jeffrey D. Oldham, Caitlin Sadowski, Tom Zhang
2014-07-02

Summary
RLint checks and reformats R code to follow the R style guide.
RLint is used within Google:
● It eases checking correctness.
● It improves programmer productivity.
Suggestion: experiment with adopting a consistent style guide + RLint.
● Does it improve your team's productivity?

Google Confidential and Proprietary

Style guides improve correctness and productivity
Q: How do we produce correct R code when
● correctness is hard to check, and
● R programmer time is expensive?
Q: How do we maintain correct R code when
● it is modified by different programmers?


Many R files are modified by multiple users
● ~40% of R files are modified by >1 Googler.
● ~50% of R directories contain code written by >1 Googler.

# Googlers modifying R file    % R files
1                              60.9%
2-3                            33.7%
4-5                            3.8%
6+                             1.5%

# Googlers modifying R code in directory    % R directories
1                                           52.0%
2-3                                         36.6%
4-5                                         7.0%
6+                                          4.4%


Style guides improve correctness and productivity
A: An R style guide specifies a uniform coding style.


Style guides specify program structure
The Google R style guide specifies:
● identifier naming: variable.name, FunctionName, kConstantName
● layout: indentation, spacing, ...
● comments
● function commenting
● ...
Success criterion: any programmer should be able to
● instantly understand the structure of any code.
A consistent style is more important than a "perfect" style.
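A minimal sketch of how the three naming conventions above could be checked with regular expressions. The pattern table and the helper names `classify_identifier` and `check_function_name` are hypothetical illustrations, not RLint's actual rules or API.

```python
import re
from typing import List, Optional

# Hypothetical patterns for the naming conventions listed on the slide.
# Illustrative assumptions only, not RLint's actual rules.
NAMING_PATTERNS = {
    "variable": re.compile(r"[a-z][a-z0-9]*(?:\.[a-z0-9]+)*"),  # variable.name
    "function": re.compile(r"[A-Z][A-Za-z0-9]*"),              # FunctionName
    "constant": re.compile(r"k[A-Z][A-Za-z0-9]*"),             # kConstantName
}


def classify_identifier(name: str) -> List[str]:
    """Return the naming conventions that `name` follows (possibly none)."""
    return [kind for kind, pattern in NAMING_PATTERNS.items()
            if pattern.fullmatch(name)]


def check_function_name(name: str) -> Optional[str]:
    """Return a warning if a function name is not in FunctionName style."""
    if "function" in classify_identifier(name):
        return None
    return f"Function name '{name}' should be in FunctionName style"


if __name__ == "__main__":
    for name in ["variable.name", "FunctionName", "kConstantName", "my_var"]:
        print(name, classify_identifier(name))
    print(check_function_name("compute.total"))
```

The patterns are deliberately disjoint (lowercase start, uppercase start, `k` + uppercase), so each well-formed name matches exactly one convention; a name such as `my_var` matches none and would be reported.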

RLint: Automate style checking and correction
Goal: Minimize the overhead of following the style guide.
RLint: a program that warns about style violations.
● It can optionally produce style-conforming code.
● Key idea: Computers are cheap.
Use within Google:
● All style violations are flagged by the code review tool.
● Violations must be corrected before code submission.


Ex: Spacing
Code:
  foo <-function(x){ return (list ( a = sum(x[,1]), b = 1/3+1e-7*(x[1,1])) …
Warnings:
● Place spaces around all binary operators (=, +, -, <-, etc.).
● Place a space before left parenthesis, except in a function call.
Corrected:
  foo <- function(x) { return(list( a = sum(x[, 1]), b = 1/3 + 1e-7 * (x[1, 1]) ...
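A minimal sketch of regex-based spacing checks in the spirit of the warnings above, applied one line at a time. It assumes comments and strings have already been stubbed out, so every bracket and comma is real code. The rule set, messages, and the name `spacing_warnings` are hypothetical simplifications, not RLint's implementation.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified subset of spacing rules. Assumes comments and
# strings were already stubbed out, so every bracket and comma is real code.
START = r"(?<![A-Za-z0-9._])"  # match must begin at the start of a name
KEYWORD_NO_SPACE = re.compile(START + r"(if|for|while)\(")
NAME_SPACE_PAREN = re.compile(START + r"([A-Za-z.][A-Za-z0-9._]*)[ \t]+\(")
COMMA_NO_SPACE = re.compile(r",(?=\S)")
BRACE_NO_SPACE = re.compile(r"\)\{")
# Keywords that are not function calls; "function (" is simply not flagged here.
NOT_CALLS = {"if", "for", "while", "in", "else", "function"}


def spacing_warnings(line: str) -> List[Tuple[int, str]]:
    """Return sorted (column, message) pairs for spacing violations in one line."""
    warnings = []
    for m in KEYWORD_NO_SPACE.finditer(line):
        warnings.append((m.end() - 1, "Place a space before left parenthesis, "
                                      "except in a function call"))
    for m in NAME_SPACE_PAREN.finditer(line):
        if m.group(1) not in NOT_CALLS:  # e.g. "return (" or "list ("
            warnings.append((m.end() - 1, "No space before left parenthesis "
                                          "in a function call"))
    for m in COMMA_NO_SPACE.finditer(line):
        warnings.append((m.start(), "Place a space after a comma"))
    for m in BRACE_NO_SPACE.finditer(line):
        warnings.append((m.start() + 1, "Place a space before '{'"))
    return sorted(warnings)


if __name__ == "__main__":
    code = "foo <-function(x){ return (list ( a = sum(x[,1]), b = 1/3+1e-7*(x[1,1]))"
    for column, message in spacing_warnings(code):
        print(column, message)
```

Running it on the slide's first line flags `return (`, `list (`, both `[,1]`-style commas, and `){`; the corrected line produces no warnings from these rules. Spacing around binary operators such as `<-` needs its own check.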

Ex: Indentation
Code:
  if (x == 5)
    while (x > 1)
      x <- x - 1
      print(x)
Is anything wrong?

R-bleed bug? ;)
Without braces, the body of the while loop is only x <- x - 1, and print(x) is not part of the if statement at all: it always runs. The indentation suggests otherwise.
Corrected code:
  if (x == 5)
    while (x > 1)
      x <- x - 1
  print(x)
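A minimal sketch of the "Align if/while/for (...) not followed by {}" idea: each header whose body is on the next line indents that next line one more level, and any ordinary statement ends the chain. It assumes comments and strings are stubbed out, no braces, no `else`, and conditions that fit on one line; `is_braceless_header` and `indent_braceless` are hypothetical names, not RLint functions.

```python
import re
from typing import List

HEADER_START = re.compile(r"(if|while|for)\s*\(")


def is_braceless_header(line: str) -> bool:
    """True if `line` is `if/while/for (...)` with nothing after the condition."""
    m = HEADER_START.match(line)
    if not m:
        return False
    depth = 0
    for i in range(m.end() - 1, len(line)):  # start at the opening '('
        if line[i] == "(":
            depth += 1
        elif line[i] == ")":
            depth -= 1
            if depth == 0:  # found the parenthesis that closes the condition
                return line[i + 1:].strip() == ""
    return False  # condition continues on the next line; not handled here


def indent_braceless(lines: List[str], step: int = 2) -> List[str]:
    """Re-indent statements whose bodies are not wrapped in {}."""
    out = []
    depth = 0
    for raw in lines:
        line = raw.strip()
        out.append(" " * (step * depth) + line)
        # A header's body is the next line; any other statement ends the chain.
        depth = depth + 1 if is_braceless_header(line) else 0
    return out


if __name__ == "__main__":
    code = ["if (x == 5)", "    while (x > 1)", "    x <- x - 1", "    print(x)"]
    print("\n".join(indent_braceless(code)))
```

Matching parentheses, rather than just checking whether the line ends in `)`, keeps a one-line statement such as `if (a) print(b)` from being mistaken for a header.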

Ex: Ease checking program correctness
Code:
  x <- -5:-1
  x[x <-2]
Is anything wrong?

Hmm ...
Warning: Must have whitespace around <-, <<-, etc
Corrected code:
  x <- -5:-1
  x[x <- 2]
With spacing, it is clear that x <-2 is the assignment x <- 2, not the comparison x < -2 that was probably intended.
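R's tokenizer reads `<-` as a single assignment token, so `x <-2` assigns rather than compares. A minimal sketch of a check that flags `<-` and `<<-` without surrounding whitespace; `assignment_spacing_warnings` is a hypothetical name, and the check assumes strings and comments are already stubbed out.

```python
import re
from typing import List, Tuple

# Try the longer operator first so "<<-" is not read as "<" followed by "<-".
ASSIGNMENT = re.compile(r"<<-|<-")


def assignment_spacing_warnings(line: str) -> List[Tuple[int, str]]:
    """Flag <- and <<- operators that are not surrounded by whitespace."""
    warnings = []
    for m in ASSIGNMENT.finditer(line):
        # Treat the start and end of the line as whitespace.
        before = line[m.start() - 1] if m.start() > 0 else " "
        after = line[m.end()] if m.end() < len(line) else " "
        if not (before.isspace() and after.isspace()):
            warnings.append((m.start(), "Must have whitespace around <-, <<-, etc"))
    return warnings


if __name__ == "__main__":
    for code in ["x[x <-2]", "x[x <- 2]", "x[x < -2]"]:
        print(code, assignment_spacing_warnings(code))
```

Only `x[x <-2]` is flagged. Once the warning forces `x[x <- 2]`, a reader immediately sees an assignment and can change it to `x[x < -2]` if a comparison was intended.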

Ex: Ease checking program correctness
Code:
  if (format(Sys.time(), "%Y") == "2014") {
    print(paste("UseR!", "2014")
  }
Is anything wrong?

Error:
  CRITICAL:root:Unbalanced brackets in { print(paste("UseR!", "2014") }
The call print(paste(...)) is missing its closing parenthesis.
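A minimal sketch of the kind of check that produces an "Unbalanced brackets" error: a stack of open brackets, where each closer must match the most recent opener. It assumes comments and strings are already stubbed out, so bracket characters inside them are ignored; `first_unbalanced_bracket` is a hypothetical name, not RLint's API.

```python
from typing import Optional

OPENERS = "([{"
CLOSER_TO_OPENER = {")": "(", "]": "[", "}": "{"}


def first_unbalanced_bracket(code: str) -> Optional[int]:
    """Return the index of a bracket that breaks balancing, or None if balanced.

    Assumes comments and strings have already been stubbed out, so every
    bracket character in `code` is real R syntax.
    """
    stack = []  # (bracket, index) pairs for brackets not yet closed
    for i, ch in enumerate(code):
        if ch in OPENERS:
            stack.append((ch, i))
        elif ch in CLOSER_TO_OPENER:
            if not stack or stack[-1][0] != CLOSER_TO_OPENER[ch]:
                return i  # closer with no matching opener
            stack.pop()
    return stack[-1][1] if stack else None  # innermost opener left unclosed


if __name__ == "__main__":
    code = 'if (format(Sys.time(), "%Y") == "2014") { print(paste("UseR!", "2014") }'
    print(first_unbalanced_bracket(code), len(code) - 1)
```

On the slide's code, the `}` arrives while the `(` of `print(` is still open, so the closing brace is reported.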

RLint implementation uses Python
RLint uses Python string functions and regular expressions.
Algorithm:
1. Stub out comments, strings, and user-defined operators.
   ● Ex: A comment may contain code!
   ● Ex: A string may span multiple lines.
2. Check spacing.
3. Align and indent lines within {}, (), and [].
   ● Align lines by the opening bracket.
   ● Align lines by ‘=’ if they are in the same bracket.
4. Align if/while/for (...) statements not followed by {}.
5. Unstub comments, strings, and user-defined operators.
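A minimal sketch of the stub/unstub steps using Python regular expressions: comments, string literals (including multi-line ones), and `%op%` operators are swapped for placeholders so that later checks never see brackets or code inside them, then restored at the end. The regex, the `__STUBn__` placeholder format, and the names `stub`/`unstub` are hypothetical; the sketch assumes the source never contains identifiers shaped like the placeholders and ignores backtick-quoted names.

```python
import re
from typing import List, Tuple

STUBBABLE = re.compile(
    r'"(?:[^"\\]|\\.)*"'   # double-quoted string, may span lines
    r"|'(?:[^'\\]|\\.)*'"  # single-quoted string, may span lines
    r"|#[^\n]*"            # comment to end of line
    r"|%[^%\n]*%",         # user-defined operator such as %in%
    re.DOTALL,
)
PLACEHOLDER = re.compile(r"__STUB(\d+)__")


def stub(code: str) -> Tuple[str, List[str]]:
    """Replace comments, strings, and %op% operators with numbered placeholders."""
    saved: List[str] = []

    def replace(m: "re.Match[str]") -> str:
        saved.append(m.group(0))
        return f"__STUB{len(saved) - 1}__"

    return STUBBABLE.sub(replace, code), saved


def unstub(code: str, saved: List[str]) -> str:
    """Restore the original text for every placeholder."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], code)


if __name__ == "__main__":
    source = 'x <- "a{b" # has (brackets\ny <- x %in% z'
    stubbed, saved = stub(source)
    print(stubbed)
    assert unstub(stubbed, saved) == source
```

Because the scan runs left to right, a `#` inside a string is consumed as part of the string, and a quote inside a comment is consumed as part of the comment, which is why both are handled by one pattern rather than separate passes.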

Application: Improve the R community's style consistency
Proposal: Adopt an R style guide + RLint.
● Run experiments to determine the net benefit.
Small scale: Individual teams (packages) adopt the style guide + checker.
● Are these programmers more productive?
● Are there more bug fixes and fewer (unfixed) bug reports?
Medium scale: CRAN packages opt into the style guide + checker.
● Specify the style guide + checker program.
● Enforce it on the CRAN server farm.



Coding conventions and checkers
Coding conventions have existed for decades.
● 1918: The Elements of Style by William Strunk Jr., later revised by E. B. White (writing English)
● 1974: The Elements of Programming Style (writing code)
● 1997: Java code conventions
● 2001: Python style guide
● 2014: Google style guides for 12 languages available
Style checkers have existed for decades.
● 1977: lint checks C style
● 2002: PyChecker checks Python style
● 2011: gofmt reformats Go code (70% adoption in 2013)
