Processing Big Data with Hadoop in Azure HDInsight
Lab Setup Guide

Overview
This course includes optional labs in which you can try out the techniques demonstrated in the course for yourself.

What You’ll Need
To complete the labs, you will need the following:
• A web browser
• A Microsoft account
• A Microsoft Azure subscription
• A Microsoft Windows, Linux, or Apple Mac OS X computer
• The lab files for this course

Creating a Free Trial Azure Subscription
If you already have a Microsoft Azure subscription, you can skip this section.

If you do not have an Azure subscription, you can sign up for the Visual Studio Dev Essentials program at https://visualstudio.com/dev-essentials. This will give you $25 of Azure credit per month for a year. Note that HDInsight clusters consume credit even when not in use, so delete your clusters after each lab if you don’t intend to use them immediately; otherwise you will run out of credit before the month ends.

Alternatively, follow these steps to create a free 30-day trial subscription, which includes enough free credit in your local currency to complete the labs. You will need to provide a valid credit card number for verification, but you will not be charged for Azure services. For more information, see the frequently asked questions on the Azure sign-up page.

1. If you already have a Microsoft account that has not already been used to sign up for a free Azure trial subscription, you’re ready to get started. If not, create a new Microsoft account at https://signup.live.com.
2. After you’ve created a Microsoft account, browse to http://aka.ms/edx-dat202.1x-az and follow the instructions to sign up for a free trial subscription to Microsoft Azure. You’ll need to sign in with your Microsoft account if you’re not already signed in. Then you’ll need to:
   a. Enter your cellphone number and have Microsoft send you a text message to verify your identity.
   b. Enter the code you have been sent to verify it.
   c. Provide valid payment details. This is required for verification purposes only: your credit card won’t be charged for any services you use during the trial period, and the account is automatically deactivated at the end of the trial period unless you explicitly decide to keep it active.
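To see why deleting idle clusters matters when you rely on a fixed monthly credit, the following is a rough sketch of the arithmetic. The function name days_of_credit and the hourly rate used in the example are hypothetical; the rate is not an actual HDInsight price, so substitute the figure shown for your own cluster configuration.

```shell
# Estimate how many days a credit lasts if a cluster is left running
# around the clock. The hourly rate is a placeholder that you supply;
# it is NOT an actual HDInsight price.
days_of_credit() {
    credit="$1"        # available credit, for example 25
    hourly_rate="$2"   # combined hourly cost of all cluster nodes
    awk -v c="$credit" -v r="$hourly_rate" \
        'BEGIN { printf "%.1f\n", c / (r * 24) }'
}

# Example with a hypothetical rate of 0.50 per hour:
days_of_credit 25 0.50   # prints 2.1
```

Under that assumed rate, a cluster left running would use up a 25-unit credit in roughly two days, which is why the labs are best done with short-lived clusters.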

Configuring a Client Computer
You can use a variety of tools to work with Hadoop in HDInsight from Windows, Linux, and Mac OS X client computers.

Install Azure Storage Explorer
You will be working with Azure blob storage in this course. You can use any Azure storage client to upload and download files to Azure. If you do not already have an Azure storage client installed, you can install Azure Storage Explorer, which is available for Windows, Mac OS X, and Linux.

1. Browse to http://storageexplorer.com/ and follow the instructions to download and install the latest version of Azure Storage Explorer for your operating system.

Note: We recommend using Azure Storage Explorer to transfer files between your local computer and Azure blob storage because it provides an intuitive, easy-to-use interface. However, if you prefer, you can use the Azure Command Line Interface (which you can download from https://azure.microsoft.com/en-us/downloads/) or any other Azure storage client tool, such as Microsoft Visual Studio or AzCopy.
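If you choose the command line route mentioned in the note above, an upload might look like the following sketch. It assumes the current az version of the Azure CLI, which may differ from the CLI that was available when this guide was written, and the function name upload_blob and all argument values are placeholders.

```shell
# Upload one local file to a blob container with the Azure CLI (az).
# All four arguments are placeholders for your own values.
upload_blob() {
    account="$1"      # storage account name
    container="$2"    # blob container name
    local_file="$3"   # path of the file on your computer
    blob_name="$4"    # name (path) the blob should have in the container
    az storage blob upload \
        --account-name "$account" \
        --container-name "$container" \
        --file "$local_file" \
        --name "$blob_name"
}

# Usage (after signing in with "az login"):
#   upload_blob mystorageaccount mycontainer ./data/sample.txt data/sample.txt
```

Depending on how your storage account is secured, you may also need to supply credentials such as an account key; check the CLI help for the options that apply to you.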

Install a SQL Client Tool
In some exercises, you will need to query an Azure SQL Database instance to verify the data it contains. To do this, you will need a client tool. If you do not already have a graphical SQL Server client tool installed (such as Visual Studio or SQL Server Management Studio on Windows, or Talend Open Studio for Data Integration or Navicat for SQL Server on Linux or Mac OS X), you can follow the steps below to install the cross-platform SQL Server command line interface, an open source tool for working with SQL Server databases from Windows, Linux, or Mac OS X. You can learn more about the SQL CLI at https://www.npmjs.com/package/sql-cli.

1. Browse to https://nodejs.org/en/download/ and follow the instructions to download and install the latest version of Node.js for your operating system (Windows, Mac OS X, or Linux) and architecture (64-bit or 32-bit).
2. Open a Node.js command line and enter the following command to install the SQL Server command line interface package:

   npm install -g sql-cli

   Note: Depending on the security configuration of your system, you may need to run this command as an administrator. On Linux or Mac, you can do this by prefixing the command above with the sudo command and entering the administrator password when prompted. On Windows, you can do this by opening a command line as Administrator.
3. Verify the installation by viewing the help information for the SQL Server command line interface using the following command:

   mssql -h
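Once the tool is installed, connecting to an Azure SQL Database instance from a Linux or Mac shell might look like the sketch below. The helper names have_tool and connect_azure_sql are hypothetical, and the server, login, password, and database values are placeholders for the details of the database you create in the labs.

```shell
# Report whether a command is available on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Connect to an Azure SQL Database with the sql-cli "mssql" command.
# Arguments are placeholders: server name (without the domain suffix),
# login name, password, and database name.
connect_azure_sql() {
    server="$1"; user="$2"; password="$3"; database="$4"
    if ! have_tool mssql; then
        echo "mssql not found; run: npm install -g sql-cli" >&2
        return 1
    fi
    # -e turns on encryption for the connection.
    mssql -s "${server}.database.windows.net" -u "${user}@${server}" \
          -p "$password" -d "$database" -e
}

# Usage:
#   connect_azure_sql myserver mylogin 'MyPassword' mydatabase
```

Passing the password as an argument keeps the example short, but it can be visible to other users of the same computer, so avoid doing this on a shared machine.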

Install PuTTY on Windows
HDInsight Hadoop clusters can be provisioned as Linux virtual machines in Azure. When using a Linux-based HDInsight cluster, you connect to Hadoop services using a remote SSH session. Linux and Mac OS X computers have a built-in SSH client, but if you plan to use a Windows client computer with a Linux-based HDInsight cluster, you must install an SSH client such as PuTTY.

Tip: Complete this procedure only if you are using a Windows client.

1. In a web browser, navigate to http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.
2. Download putty.exe, saving it to a suitable folder on your local file system (for example, C:\putty).
3. Create a shortcut to putty.exe on your desktop for convenience.
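On Linux and Mac OS X, where no installation is needed, an SSH connection can be opened directly from a terminal. The sketch below assumes that the cluster's SSH endpoint follows the pattern CLUSTERNAME-ssh.azurehdinsight.net; that pattern is not stated in this guide, so confirm the host name shown for your cluster in the Azure portal. The function names and argument values are hypothetical.

```shell
# Build the SSH host name for a cluster. The "-ssh.azurehdinsight.net"
# suffix is an assumption about the endpoint naming, not taken from
# this guide.
hdi_ssh_host() {
    printf '%s-ssh.azurehdinsight.net\n' "$1"
}

# Open an SSH session; both arguments are placeholders.
hdi_connect() {
    cluster="$1"    # HDInsight cluster name
    ssh_user="$2"   # SSH user name chosen when the cluster was created
    ssh "${ssh_user}@$(hdi_ssh_host "$cluster")"
}

# Usage:
#   hdi_connect mycluster sshuser
```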

Download the Lab Files
The course materials include files that are required to complete the labs.

1. Download the lab files for this course from https://github.com/MicrosoftLearning/Processing-Big-Data-with-Hadoop-in-Azure-HDInsight/raw/master/Labs/HDILabs.zip.
2. Extract the HDILabs.zip archive you downloaded to a folder on your local computer.
3. Ensure that the extracted folder and all subfolders are not read-only.
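On Linux or Mac OS X, the three steps above can be sketched from a terminal as follows, assuming that curl and unzip are installed. The function names fetch_labs and make_writable are hypothetical, and the download URL is passed in as an argument rather than repeated here.

```shell
# Clear the read-only state for the current user on a folder and
# everything under it.
make_writable() {
    chmod -R u+w "$1"
}

# Download an archive, extract it, and make the extracted files writable.
# $1 = URL of the lab archive (from step 1), $2 = target folder.
fetch_labs() {
    url="$1"; dest="$2"
    mkdir -p "$dest" &&
    curl -L -o "$dest/HDILabs.zip" "$url" &&
    unzip -o "$dest/HDILabs.zip" -d "$dest" &&
    make_writable "$dest"
}

# Usage:
#   fetch_labs "<URL from step 1>" "$HOME/HDILabs"
```

On Windows, you can achieve the same result by extracting the archive in File Explorer and clearing the Read-only attribute in the folder's properties.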
