Data Science Orientation Lab – Exploring Data

Overview Rosie Reeves is an entrepreneurial middle-school student who sells homemade cold drinks from a stand during the summer months. This summer, Rosie sold two flavors of drink (lemon and orange) at two different locations (the beach and the park); and to promote her drinks stall, she distributed leaflets in the local area. Rosie recorded details of her sales and leaflet distribution in a text file, along with a note of the temperature each day. In this lab, you will explore and visualize the data Rosie recorded. Note: The figures used in the lab files for this lab are not the same as the figures used in the course demonstrations!

What You’ll Need To complete the labs, you will need the following: 



A Windows or Mac OS X computer running Excel 2016. If you do not have Excel 2016 installed, you can sign up for a free trial of Office 365 at http://aka.ms/edx-dat101x-o365, and install Office applications on your Windows PC or Mac. The lab files for this course. You can download and extract these from http://aka.ms/edx-dat101xlabs.

Exercise 1: Exploring Data in Excel In this exercise, you will use Excel to explore and visualize the data that Rosie had collected.

Import the Data into Excel 1. Create a new blank Excel workbook and import the text data from the Lemonade2016.csv comma-delimited text file in the DAT101x_Labfiles folder where you extracted the lab files. 2. Format the imported data as a table.

Cleanse the Data 1. Find and remove any duplicate rows in the data. 2. Find and resolve any missing values in the data, either by interpolating missing values that are in an obvious sequence, or by entering the average value for the column (rounded to the nearest whole number) into any cells where a value is missing.

Add Derived Columns 1. Add a column named Sales to the table, in which the total sales of lemon and orange is calculated. 2. Add a column named Revenue to the table, in which the sales revenue is calculated by multiplying the total sales and price. 3. In an empty cell below the table, insert a formula to calculate the total revenue for the summer, and make a note of this value.

Find the Highest and Lowest Temperatures 1. Use conditional formatting to identify the top 10% and bottom 10% temperatures. 2. Note the highest and lowest temperatures.

Create Charts 1. Create a line chart that shows Date and Revenue. Then add a linear trendline to determine whether the revenue trend is rising, falling, or staying level. 2. Create a scatter-plot chart that shows Leaflets on the X axis and Sales on the Y axis, and determine whether any signs of linear correlation can be detected. 3. Create a histogram that shows Revenue distributed into 10 bins, and note whether the distribution of this data is normal, left skewed, or right skewed.

Exercise 2: Using Statistics in Excel In this exercise, you will use the Data Analysis Pack in Excel to apply some statistical functions to Rosie’s sales data.

Calculate Descriptive Statistics 1. Calculate descriptive statistics for the Temperature, Leaflets, Price, and Sales columns. 2. Note the mean, median, and mode for each column. 3. Note the range, sample variance, and standard deviation for each column.

Calculate Correlation 1. Determine the strength and direction of any correlations between the Temperature, Leaflets, and Price columns and the Sales column.

Compare Sales of Lemon and Orange Flavors 1. Perform an appropriate test to determine whether there is a statistically significant difference between sales of Lemon and Orange drinks.

Perform Regression 1. Perform regression using Sales as the Y input and Temperature, Leaflets, and Price as the X input. 2. Review the results of the regression, noting the relative significance of the input variables and the intercept and weights applied to them to calculate the predicted value for Y. 3. Use the intercept and weights to calculate a predicted Y value for the following features:  Temperature: 80  Leaflets: 110  Price: 0.35

4. Calculate predicted sales for the above scenario if the number of leaflets distributed is increased to 120.

Microsoft Learning Experiences - GitHub

In this lab, you will explore and visualize the data Rosie recorded. ... you will use the Data Analysis Pack in Excel to apply some statistical functions to Rosie's.

662KB Sizes 19 Downloads 268 Views

Recommend Documents

Microsoft Learning Experiences - GitHub
Performance for SQL Based Applications. Then, if you have not already done so, ... In the Save As dialog box, save the file as plan1.sqlplan on your desktop. 6.

Microsoft Learning Experiences - GitHub
A Windows, Linux, or Mac OS X computer. • Azure Storage Explorer. • The lab files for this course. • A Spark 2.0 HDInsight cluster. Note: If you have not already ...

Microsoft Learning Experiences - GitHub
Start Microsoft SQL Server Management Studio and connect to your database instance. 2. Click New Query, select the AdventureWorksLT database, type the ...

Microsoft Learning Experiences - GitHub
performed by writing code to manipulate data in R or Python, or by using some of the built-in modules ... https://cran.r-project.org/web/packages/dplyr/dplyr.pdf. ... You can also import custom R libraries that you have uploaded to Azure ML as R.

Microsoft Learning Experiences - GitHub
Developing SQL Databases. Lab 4 – Creating Indexes. Overview. A table named Opportunity has recently been added to the DirectMarketing schema within the database, but it has no constraints in place. In this lab, you will implement the required cons

Microsoft Learning Experiences - GitHub
create a new folder named iislogs in the root of your Azure Data Lake store. 4. Open the newly created iislogs folder. Then click Upload, and upload the 2008-01.txt file you viewed previously. Create a Job. Now that you have uploaded the source data

Microsoft Learning Experiences - GitHub
will create. The Azure ML Web service you will create is based on a dataset that you will import into. Azure ML Studio and is designed to perform an energy efficiency regression experiment. What You'll Need. To complete this lab, you will need the fo

Microsoft Learning Experiences - GitHub
Lab 2 – Using a U-SQL Catalog. Overview. In this lab, you will create an Azure Data Lake database that contains some tables and views for ongoing big data processing and reporting. What You'll Need. To complete the labs, you will need the following

Microsoft Learning Experiences - GitHub
The final Execute R/Python Script. 4. Edit the comment of the new Train Model module, and set it to Decision Forest. 5. Connect the output of the Decision Forest Regression module to the Untrained model (left) input of the new Decision Forest Train M

Microsoft Learning Experiences - GitHub
Page 1 ... A web browser and Internet connection. Create an Azure ... Now you're ready to start learning how to build data science and machine learning solutions.

Microsoft Learning Experiences - GitHub
created previously. hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. /data/storefile Stocks. 8. Wait for the MapReduce job to complete. Query the Bulk Loaded Data. 1. Enter the following command to start the HBase shell. hbase shell. 2.

Microsoft Learning Experiences - GitHub
videos and demonstrations in the module to learn more. 1. Search for the Evaluate Recommender module and drag it onto the canvas. Then connect the. Results dataset2 (right) output of the Split Data module to its Test dataset (left) input and connect

Microsoft Learning Experiences - GitHub
In this lab, you will create schemas and tables in the AdventureWorksLT database. Before starting this lab, you should view Module 1 – Designing a Normalized ...

Microsoft Learning Experiences - GitHub
Challenge 1: Add Constraints. You have been given the design for a ... add DEFAULT constraints to columns based on the requirements. Challenge 2: Test the ...

Microsoft Learning Experiences - GitHub
Data Science and Machine Learning ... A web browser and Internet connection. ... Azure ML offers a free-tier account, which you can use to complete the labs in ...

Microsoft Learning Experiences - GitHub
Processing Big Data with Hadoop in Azure. HDInsight. Lab 1 - Getting Started with HDInsight. Overview. In this lab, you will provision an HDInsight cluster.

Microsoft Learning Experiences - GitHub
Real-Time Big Data Processing with Azure. Lab 2 - Getting Started with IoT Hubs. Overview. In this lab, you will create an Azure IoT Hub and use it to collect data ...

Microsoft Learning Experiences - GitHub
Real-Time Big Data Processing with Azure. Lab 1 - Getting Started with Event Hubs. Overview. In this lab, you will create an Azure Event Hub and use it to collect ...

Microsoft Learning Experiences - GitHub
Data Science Essentials. Lab 6 – Introduction to ... modules of this course; but for the purposes of this lab, the data exploration tasks have already been ... algorithm requires all numeric features to be on a similar scale. If features are not on

Microsoft Learning Experiences - GitHub
Selecting the best features is essential to the optimal performance of machine learning models. Only features that contribute to ... Page 3 .... in free space to the right of the existing modules: ... Use Range Builder (all four): Unchecked.

Microsoft Learning Experiences - GitHub
Implementing Predictive Analytics with. Spark in Azure HDInsight. Lab 3 – Evaluating Supervised Learning Models. Overview. In this lab, you will use Spark to ...

Microsoft Learning Experiences - GitHub
Microsoft Azure Machine Learning (Azure ML) is a cloud-based service from Microsoft in which you can create and run data science experiments, and publish ...

Microsoft Learning Experiences - GitHub
A Microsoft Windows, Apple Macintosh, or Linux computer ... In this case, you must either use a Visual Studio Dev Essentials Azure account, or ... NET SDK for.

Microsoft Learning Experiences - GitHub
In the new browser tab that opens, note that a Jupyter notebook named ... (Raw) notebook has been created, and that it contains two cells. The first ..... Page 9 ...