Data 8R Summer 2017

Tables and more Visualizations Discussion 5: July 11, 2017

1 Review

Write down the definitions of the following terms in your own words:

RCT (Randomized Controlled Trial)

Observational Study

Confounding factor

Observational Study or RCT: A researcher at a hospital decides to study a group of students from Berkeley. For an entire year, the researcher records the amount of exercise each student does and, at the same time, the number of colds each student gets. Is this an observational study or an RCT?

If the researcher finds that people who exercise more get fewer colds, what can we say from this?

Is it a valid conclusion to claim that there is an association between exercise and fewer colds in elderly people?


Observational study.

There is an association between exercising more and getting fewer colds. We cannot conclude causation, because only a randomized experiment can establish that.

No. The study lacks external validity: the students we studied are not representative of the elderly, so we cannot claim an association in that group. Note: Age is not a confounding factor here.

Write all the subexpressions for the following expressions:

>>> 5 + (15 * (6 / 2))

5
(15 * (6 / 2))
15
(6 / 2)
6
2

>>> make_array(10, 15, 20).item(0)

make_array(10, 15, 20)
10
15
20
make_array
0
make_array(10, 15, 20).item

Write down the outputs of the following code. Assume that the following lines of code have already been executed:

>>> array1 = np.arange(1, 11)
>>> array2 = make_array(3, 5, 9, 10)

Write down the outputs of the following lines:

>>> np.diff(array2)
[2, 4, 1]

>>> array1 / sum(array1)

[1/55, 2/55, 3/55, 4/55, 5/55, 6/55, 7/55, 8/55, 9/55, 10/55]
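The answers in this section can be checked with a short script. The following is only a sketch under stated assumptions: the helper name `subexpressions` is hypothetical, it relies on `ast.unparse` (Python 3.9 or later), and plain Python lists stand in for `np.arange` and `make_array` so that it runs without numpy or the datascience library.

```python
import ast


def subexpressions(source):
    """Return the text of every expression inside `source`, whole expression first."""
    tree = ast.parse(source, mode="eval")
    # ast.walk also yields operator nodes (e.g. ast.Add); keep only expression nodes.
    return [ast.unparse(node) for node in ast.walk(tree.body)
            if isinstance(node, ast.expr)]


# The expressions are only parsed, never evaluated, so make_array need not exist.
for source in ("5 + (15 * (6 / 2))", "make_array(10, 15, 20).item(0)"):
    for text in subexpressions(source):
        print(text)

# Plain-list stand-ins (an assumption) for np.arange(1, 11) and make_array(3, 5, 9, 10).
array1 = list(range(1, 11))
array2 = [3, 5, 9, 10]

# Differences between neighbouring elements, which is what np.diff computes.
diffs = [later - earlier for earlier, later in zip(array2, array2[1:])]
print(diffs)

# Dividing every element by the total gives proportions that add up to 1.
proportions = [value / sum(array1) for value in array1]
print(sum(array1), sum(proportions))
```

Each list returned by `subexpressions` includes the full expression itself in addition to the parts listed in the answers, and `ast.unparse` drops redundant parentheses, so `(6 / 2)` is printed as `6 / 2`.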

2 Bar Charts

A frequency/probability distribution is a distribution whose amounts have been normalized to add up to 1. In other words, a frequency distribution describes the proportions of some data. On the other hand, when a distribution does not describe proportions, it is called a count distribution: it records how many of each thing there are.

Answer the following questions:

What is the difference between categorical and numerical variables?

Which type of variable should you visualize with a bar chart, and why?

Can bar charts be used to graph proportions?

Categorical variables are values (strings, numbers, etc.) that represent categories, so arithmetic on them is not meaningful. Numerical variables represent measurements or quantities, so arithmetic on them is meaningful.

We should use bar charts to graph categorical variables, because each bar corresponds to one category.

Yes.

Examine the following bar chart and answer some questions about it. A business has graphed the proportion of outputs in each year as a bar chart.


If the business wanted to compare outputs from year to year, does this bar chart serve its purpose?

If the business wanted to compare outputs for each 2-year period, does the bar chart serve its purpose?

Yes, it allows us to compare the years against each other.

No, it is not easy to visually combine pairs of bars.
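To make the count/frequency distinction concrete, here is a minimal sketch that normalizes a count distribution into a frequency distribution and then combines years into 2-year periods. The yearly output figures are hypothetical (the worksheet's chart data is not reproduced here), and plain dictionaries stand in for a table.

```python
# Hypothetical yearly outputs (a count distribution); not taken from the worksheet.
counts = {"2011": 40, "2012": 60, "2013": 50, "2014": 50}

# Normalizing by the total turns counts into proportions that add up to 1.
total = sum(counts.values())
proportions = {year: count / total for year, count in counts.items()}
print(proportions)

# Combining consecutive years gives the 2-year comparison that is hard to read
# directly off a one-bar-per-year chart.
years = list(counts)
two_year = {
    f"{years[i]}-{years[i + 1]}": proportions[years[i]] + proportions[years[i + 1]]
    for i in range(0, len(years), 2)
}
print(two_year)
```

Plotting `two_year` as its own bar chart would serve the 2-year comparison directly, rather than asking the reader to add bar heights by eye.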

3 Histograms

Should you use a histogram to graph categorical or numerical variables and why?

What does the width of a histogram bar represent?

What does the height of a histogram bar represent?


What does the area of a histogram bar represent?

What should the entire area of a histogram sum to (if we’re using a frequency distribution)?

Numerical variables, because a histogram shows how the values are distributed across ranges (bins).

The width represents the size of the bin, that is, the range of values it covers, measured in the units of the variable.

The height represents the density in that bin: the percent of entries per unit, i.e. the percent of entries in the bin divided by the width of that bin.

The area represents the proportion of entries in the bin (i.e. width of the bin * percent of entries per unit in the bin).

The entire area of a histogram should sum to 1 (100%) if we’re using a frequency distribution. With a count distribution, the counts across all bins instead sum to the total number of entries.

Suppose the same business has now made a histogram of their gross revenues:

What was the most common revenue range?

What does the height of each bar of this histogram represent?


What does the area of each bar in this histogram represent?

120-140 million dollars.

The height represents the percent of gross revenues per million dollars in each bin.

The area represents the percentage of gross revenues that fell within each bin (i.e. the proportions).
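The relationship between width, height, and area can be worked through numerically. The sketch below uses hypothetical bin edges and counts (the worksheet's histogram data is not reproduced here) and computes each bar's height as percent per million dollars.

```python
# Hypothetical bins (in millions of dollars) and entry counts; not from the worksheet.
bin_edges = [100, 120, 140, 180]
counts = [10, 20, 10]  # entries in [100, 120), [120, 140), [140, 180)

total = sum(counts)
widths, areas, heights = [], [], []
for i, count in enumerate(counts):
    width = bin_edges[i + 1] - bin_edges[i]  # size of the bin, in millions of dollars
    area = 100 * count / total               # percent of entries in the bin
    height = area / width                    # percent per million dollars
    widths.append(width)
    areas.append(area)
    heights.append(height)
    print(f"{bin_edges[i]}-{bin_edges[i + 1]}: width={width}, height={height}, area={area}")

# The areas of all bars add up to 100 percent.
print(sum(areas))
```

In this made-up data the last bin holds as many entries as the first but is twice as wide, so its bar is half as tall while its area is the same.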
