HW 2: Chapter 1. Data Exploration
STUDENT NAME
Date

PART I: Chapter 1 Problem Sets

PS 1.1: Data Basics

1. OI 1.7: Fisher’s irises. Sir Ronald Aylmer Fisher was an English statistician, evolutionary biologist, and geneticist who worked on a data set containing the sepal length and width and the petal length and width of flowers from three species of iris (setosa, versicolor, and virginica). The data set contained 50 flowers from each species.
   a. How many cases were included in the data?
   b. How many numerical variables are included in the data? Indicate what they are and whether they are continuous or discrete.
   c. How many categorical variables are included in the data, and what are they? List the corresponding levels (categories).

2. OI 1.8: Smoking habits of UK Residents. A survey was conducted to study the smoking habits of UK residents. The textbook shows a data matrix containing a portion of the data collected in this survey. Note that £ stands for British pounds sterling, cig stands for cigarettes, and N/A refers to a missing value.
   a. What does each row of the data matrix represent?
   b. How many participants were included in the survey?
   c. Indicate whether each variable in the study is numerical or categorical. If numerical, identify it as continuous or discrete. If categorical, indicate whether the variable is ordinal.
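For OI 1.7, one way to check your answers is to inspect the copy of Fisher's data that ships with base R as the built-in `iris` data frame (datasets package). This sketch assumes that built-in copy matches the textbook's description. Note that R only reports storage classes (`numeric`, `factor`), so deciding whether a numeric variable is continuous or discrete is still a judgment about how it was measured.

```r
# Fisher's iris measurements are available in base R as the `iris` data frame.

# Number of cases (rows) and variables (columns)
n_cases <- nrow(iris)
n_vars  <- ncol(iris)

# Storage class of each variable: measurements are stored as "numeric",
# the species label as a "factor" (R's type for categorical variables)
var_classes <- sapply(iris, class)
print(var_classes)

# Levels (categories) of the categorical variable
species_levels <- levels(iris$Species)
print(species_levels)

# Number of flowers recorded for each species
print(table(iris$Species))
```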

PS 1.3: Mean vs. Median

A small accounting firm pays each of its six clerks $35,000, its two junior accountants $70,000 each, and the firm’s owner $420,000. Assign the salaries of the six clerks, two junior accountants, and owner to an object:

`# assign salary as an object here`

1. What is the mean salary paid at this firm?

   `# use the mean() function here`

2. How many of the employees earn less than the mean?
3. What is the median salary?

   `# use the median() function here`

4. Which measure tells you more about the typical amount earned at this firm?
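The PS 1.3 arithmetic can be sketched in R as follows. The object name `salary` is an illustrative choice; `rep()` builds the nine salaries without typing each one, and comparing the vector with its mean gives a logical vector whose `TRUE` values `sum()` counts.

```r
# Salaries: six clerks, two junior accountants, one owner (9 employees)
salary <- c(rep(35000, 6), rep(70000, 2), 420000)

mean_salary   <- mean(salary)    # total payroll divided by 9
median_salary <- median(salary)  # 5th of the 9 sorted salaries

# Count employees earning less than the mean (TRUE counts as 1)
n_below_mean <- sum(salary < mean_salary)

print(mean_salary)    # 85555.56
print(median_salary)  # 35000
print(n_below_mean)   # 8
```

Compare the two printed summaries with the list of salaries when answering question 4.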

PS 1.8: Sample correlations

The Organisation for Economic Co-operation and Development (OECD) collects data on central government debt for many countries. The data for this problem are contained in the debt data set.

`# Use read.delim() to import the data here by using the code found on the datasets page`

1. Draw a scatterplot of the 2006 data (y) against the 2005 data (x).

   `# create the scatterplot here. You can use plot(), qplot() or ggplot()`

2. Describe the direction, strength, and form of the relationship in the context of the problem.
3. Calculate the correlation r.

   `# use the cor() function here`
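Because the import code for the debt data lives on the course datasets page, this sketch uses R's built-in `cars` data (speed vs. stopping distance) as a stand-in so it runs on its own. It draws the scatterplot with base `plot()` and computes r twice, once from the z-score definition r = sum(z_x * z_y) / (n - 1) and once with `cor()`, to show that they agree. The debt column names in the comments (`debt$X2005`, `debt$X2006`) are assumptions.

```r
# Stand-in data: R's built-in `cars` (speed in mph, stopping distance in ft).
# For the debt data, use the 2005 and 2006 columns instead, e.g. debt$X2005
# and debt$X2006 (assumed names: read.delim() applies make.names() to the
# header by default, so a column headed 2005 usually becomes X2005).
x <- cars$speed
y <- cars$dist

# Scatterplot: x on the horizontal axis, y on the vertical axis
plot(x, y, xlab = "x", ylab = "y", main = "Scatterplot of y against x")

# Correlation from its definition: sum of z-score products divided by n - 1
n  <- length(x)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
r_manual <- sum(zx * zy) / (n - 1)

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

If some countries are missing one of the years, `cor(x, y, use = "complete.obs")` drops the incomplete pairs instead of returning `NA`.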

PS 1.10: Spurious correlations

Go to the Spurious Correlations website (http://tylervigen.com/discover) and use the drop-down menu to choose two interesting variables whose correlation you want to examine. Include the resulting image in your homework document by replacing the URL in the example below with your own URL. Write a sentence or two describing the trends observed in your example. Explain the difference between correlation and causation in the context of your variables.
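The idea behind PS 1.10 can be illustrated with simulated data. The sketch below generates two hypothetical series (not taken from the website) that share nothing except a steady drift over the same ten years. `cor()` still reports a strong correlation, while correlating what is left after removing each series' linear time trend with `lm()` typically gives a much weaker value.

```r
# Hypothetical simulated series (not real data): two unrelated quantities
# that both happen to drift over the same ten years.
set.seed(1)
year <- 2000:2009
series_a <- 5 + 0.3 * (year - 2000) + rnorm(10, sd = 0.1)   # drifts upward
series_b <- 100 - 2 * (year - 2000) + rnorm(10, sd = 0.5)   # drifts downward

# Strong negative correlation created only by the shared time trend
r_raw <- cor(series_a, series_b)

# Remove the linear time trend from each series and correlate the leftovers;
# the noise terms were generated independently, so this is typically much weaker
resid_a <- residuals(lm(series_a ~ year))
resid_b <- residuals(lm(series_b ~ year))
r_detrended <- cor(resid_a, resid_b)

print(c(raw = r_raw, detrended = r_detrended))
```

Neither series causes the other here by construction; the common dependence on time is a lurking variable that produces the correlation.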

Figure 1: Divorce rate in Alabama vs US whole milk consumption
