Real-Time Big Data Processing with Azure
Lab 3 – Processing Real-Time Data with Stream Analytics

Overview
In this lab, you will create an Azure Stream Analytics job to process simulated device data from the applications you created in labs 1 and 2.

What You’ll Need
To complete the labs, you will need the following:
  • A web browser
  • A Microsoft account
  • A Microsoft Azure subscription
  • A Windows, Linux, or Mac OS X computer
  • The lab files for this course
  • The Azure resources created in the previous labs

Important: This lab depends on resources created in Lab 1 and Lab 2. If you have not completed the previous labs (or if you have deleted the resources you created), complete these labs now.

Exercise 1: Processing a Stream of Data
To process real-time data as it arrives in an event hub or IoT hub, you can use an Azure Stream Analytics job. In this procedure, you will create a simple Stream Analytics job that reads device readings from your IoT hub and stores them in a blob storage container.

Create a Storage Account
Your streaming solution will store its output in Azure blob storage, so you will need to create an Azure Storage account.
1. In a web browser, navigate to http://portal.azure.com, and if prompted, sign in using the Microsoft account that is associated with your Azure subscription.
2. In the Microsoft Azure portal, in the Hub Menu, click New. Then in the Storage menu, click Storage account.
3. In the Create storage account blade, enter the following settings and click Create:
  • Name: Enter a unique name (and make a note of it!)
  • Deployment model: Resource manager
  • Account kind: General purpose
  • Performance: Standard
  • Replication: Locally-redundant storage (LRS)
  • Storage service encryption: Disabled
  • Subscription: Select your Azure subscription
  • Resource group: Use the existing resource group you created in the previous labs
  • Location: Select the region where you created your service bus namespace
4. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the storage account to be deployed (this can take a few minutes).

Create a Stream Analytics Job
The first step in using Stream Analytics to process real-time data is to create a Stream Analytics job.
1. In the Microsoft Azure portal, in the Hub Menu, click New. Then in the Internet of Things menu, click Stream Analytics job.
2. In the New Stream Analytics Job blade, enter the following settings, and then click Create:
  • Name: Enter a unique name (and make a note of it!)
  • Subscription: Select your Azure subscription
  • Resource Group: Select the resource group containing your existing resources
  • Location: Select any available region
  • Pin to dashboard: Not selected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the job to be deployed (this can take a few minutes).

Add an Input
Stream Analytics jobs get their data from one or more inputs. In this procedure, you will create an input for the IoT hub you created in the previous lab.
1. In the Azure portal, browse to the Stream Analytics job you created previously.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Inputs tile.
3. In the Inputs blade, click Add.
4. In the New input blade, enter the following settings, and then click Create:
  • Input alias: DeviceData
  • Source Type: Data stream
  • Source: IoT hub
  • Subscription: Use IoT hub from current subscription
  • IoT hub: Select your IoT hub
  • Endpoint: Messaging
  • Shared Access Policy Name: service
  • Consumer group: $Default
  • Event serialization format: JSON
  • Encoding: UTF-8
5. Wait for the input to be created and tested.

Add an Output
Stream Analytics jobs return their results to an output. In this procedure, you will add an output to your Stream Analytics job so that the processed results are stored in Azure Blob storage.
1. In the Azure portal, browse to the Stream Analytics job you created previously.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Outputs tile.
3. In the Outputs blade, click Add.
4. In the New output blade, enter the following settings, and then click Create:
  • Output alias: DeviceReadings
  • Sink: Blob storage
  • Subscription: Use blob storage from current subscription
  • Storage account: Select your storage account
  • Container: Create a new container named device-readings
  • Path pattern: readings/{date}
  • Date format: YYYY/MM/DD
  • Time format: Should be unavailable
  • Event serialization format: CSV
  • Delimiter: comma (,)
  • Encoding: UTF-8
5. Wait for the output to be created and tested.

Add a Query
Now that you have defined an input and an output for your Stream Analytics job, you can connect them by defining a query that will process the data stream.
1. In the Azure portal, browse to the Stream Analytics job you created previously.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Query tile.
3. In the query blade, modify the default query that is provided for you, replacing YourOutputAlias with the output alias you specified for your output, and YourInputAlias with the input alias you specified for your input (the available inputs and outputs are shown on the left of the query editor pane):

SELECT *
INTO [DeviceReadings]
FROM [DeviceData]

4. Save the query, and then close the query pane.

View the Job Diagram
Your Stream Analytics job now consists of an input, connected to an output by a query. You can verify this by viewing the job diagram.
1. In the blade for your Stream Analytics job, click Settings.
2. In the Settings blade, click Job diagram.
3. In the Job diagram blade, verify that your job consists of an IoT Hub input, followed by a query step, followed by a Blob Storage output.
4. Close the Job diagram blade and the Settings blade.

Start the Job
A Stream Analytics job runs perpetually, processing data as it arrives.
1. In the blade for your Stream Analytics job, click Start. Then in the Start job blade, ensure that Now is selected and click Start.
2. Wait for the streaming job to start – this can take a minute or so.
3. When the job has started, in the Node.js console, in the iotdevice folder, enter the following command to run the device simulation script and start submitting messages to the IoT hub (a sketch of what this script might look like appears after these steps):

node iotdevice.js

4. While the script is running, start Azure Storage Explorer, and if necessary, sign in to your Azure subscription using your Microsoft account.
5. Expand your storage account, and then expand Blob Containers.
6. Double-click the device-readings container, and then browse through the readings folder, and the year, month, and day folders, to view the most recent blob that has been generated by your job.
7. Download the blob and open it in a text editor or spreadsheet application, and verify that in addition to the device and reading fields, it contains values for EventProcessedUtcTime, PartitionId, EventEnqueuedUtcTime, and IotHub.
8. Close the downloaded file, and in the Node.js console, press CTRL+C to stop the script.
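For reference, the following is a minimal sketch of what a device simulation script such as iotdevice.js might look like. It assumes the azure-iot-device and azure-iot-device-mqtt packages and the device connection string you noted in Lab 2; the payload fields shown here (device and reading) are illustrative, and your actual script from the previous lab is the authoritative version.

'use strict';
// Illustrative sketch only - your iotdevice.js from Lab 2 is the authoritative version.
const Mqtt = require('azure-iot-device-mqtt').Mqtt;
const Client = require('azure-iot-device').Client;
const Message = require('azure-iot-device').Message;

// Placeholder: use the device connection string you noted in Lab 2.
const connectionString = '<your device connection string>';
const client = Client.fromConnectionString(connectionString, Mqtt);

client.open(function (err) {
  if (err) {
    console.error('Could not connect: ' + err.message);
    return;
  }
  // Send a simulated reading every second until the script is stopped (CTRL+C).
  setInterval(function () {
    const payload = JSON.stringify({ device: 'dev1', reading: Math.random() });
    client.sendEvent(new Message(payload), function (sendErr) {
      if (sendErr) {
        console.error('Send error: ' + sendErr.message);
      } else {
        console.log('Sent: ' + payload);
      }
    });
  }, 1000);
});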

Stop the Job
When you want to stop processing events, you can stop the job.
1. In the blade for your Stream Analytics job, click Stop. When prompted to confirm, click Yes.
2. Wait for the job to stop running.

Exercise 2: Extending a Streaming Solution
A Stream Analytics job can include multiple inputs and outputs, enabling you to combine data streams from multiple sources and send processed results to multiple destinations. Additionally, you can create a streaming topology that uses multiple Stream Analytics jobs to filter and route messages through event hubs. In this exercise, you will extend your streaming solution to filter readings greater than 0.5 and route them to a second Stream Analytics job, which will store them in blob storage in an alerts folder.

Add a Second Output
In this procedure, you will add a second output to your Stream Analytics job.
1. In the Azure portal, browse to the Stream Analytics job you created previously.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Outputs tile.
3. In the Outputs blade, click Add.
4. In the New output blade, enter the following settings, and then click Create:
  • Output alias: HighReadings
  • Sink: Event hub
  • Subscription: Use event hub from current subscription
  • Service bus namespace: Select your service bus namespace
  • Event hub name: Select your event hub
  • Event hub policy name: DeviceAccess (this should be the name of the shared access policy you created in lab 1)
  • Partition key column: Leave blank
  • Event serialization format: JSON
  • Encoding: UTF-8
  • Format: Line separated
5. Wait for the output to be created and tested.

Modify the Query
Now that you have defined a second output for your Stream Analytics job, you can send processed data to it from your query.
1. In the Azure portal, browse to the Stream Analytics job you created previously.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Query tile.
3. Modify the query as shown below, creating a common table expression for all readings, a SELECT statement that routes the device, reading, and EventEnqueuedUtcTime fields to the blob storage output, and a second SELECT statement that filters rows with a reading greater than 0.5 and writes them to the new event hub output:

WITH [AllReadings] AS
(SELECT * FROM [DeviceData])
SELECT device, reading, EventEnqueuedUtcTime
INTO [DeviceReadings]
FROM [AllReadings]
SELECT device, reading, EventEnqueuedUtcTime
INTO [HighReadings]
FROM [AllReadings]
WHERE CAST(reading AS float) > 0.5

4. Save the query, and then close the query pane.

View the Job Diagram
Your Stream Analytics job now consists of an input, connected to two outputs by two steps. You can verify this by viewing the job diagram.
1. In the blade for your Stream Analytics job, click Settings.
2. In the Settings blade, click Job diagram.
3. In the Job diagram blade, verify that your job consists of an IoT Hub input, followed by two query steps, followed by a Blob Storage output and an Event Hub output.
4. Close the Job diagram blade and the Settings blade.

Add a Second Stream Analytics Job
The Stream Analytics job you have created routes readings with a high value to an event hub. You will now add a second Stream Analytics job to process these high readings.
1. In the Microsoft Azure portal, in the Hub Menu, click New. Then in the Internet of Things menu, click Stream Analytics job.
2. In the New Stream Analytics Job blade, enter the following settings, and then click Create:
  • Name: Enter a unique name (and make a note of it!)
  • Subscription: Select your Azure subscription
  • Resource Group: Select the resource group containing your existing resources
  • Location: Select any available region
  • Pin to dashboard: Not selected
3. In the Azure portal, view Notifications to verify that deployment has started. Then wait for the job to be deployed (this can take a few minutes). Then browse to the blade for the new Stream Analytics job.
4. Add an input to the new job, with the following settings:
  • Input alias: HighReadings
  • Source Type: Data stream
  • Source: Event hub
  • Subscription: Use event hub from current subscription
  • Service bus namespace: Select your service bus namespace
  • Event hub name: Select your event hub
  • Event hub policy name: DeviceAccess (this should be the name of the shared access policy you created in lab 1)
  • Event hub consumer group: Leave blank
  • Event serialization format: JSON
  • Encoding: UTF-8
5. Add an output to the new job, with the following settings:
  • Output alias: DeviceAlerts
  • Sink: Blob storage
  • Subscription: Use blob storage from current subscription
  • Storage account: Select your storage account
  • Container: device-readings
  • Path pattern: alerts/{date}
  • Date format: YYYY/MM/DD
  • Time format: Should be unavailable
  • Event serialization format: CSV
  • Delimiter: comma (,)
  • Encoding: UTF-8
6. Add the following query to the new job:

SELECT *
INTO [DeviceAlerts]
FROM [HighReadings]

Start the Jobs
Now you’re ready to start both jobs and test the streaming topology.
1. Start the new Stream Analytics job and wait for it to start – this can take a minute or so.
2. Start the original Stream Analytics job and wait for it to start – this can take a minute or so.
3. When the jobs have started, in the Node.js console, in the iotdevice folder, enter the following command to run the device simulation script and start submitting messages to the IoT hub:

node iotdevice.js

4. While the script is running, start Azure Storage Explorer, and if necessary, sign in to your Azure subscription using your Microsoft account.
5. Expand your storage account, and then expand Blob Containers.
6. Double-click the device-readings container, and then browse through the alerts folder, and the year, month, and day folders, to view the most recent blob that has been generated by your job.
7. Download the blob and open it in a text editor or spreadsheet application, and verify that it contains only readings with a value greater than 0.5.
8. Close the downloaded file, and in the Node.js console, press CTRL+C to stop the script.

Stop the Jobs
When you want to stop processing events, you can stop the jobs.
1. In the Azure portal, stop both Stream Analytics jobs.

Exercise 3: Using Static Reference Data
Many real-time data processing solutions use static reference data to augment the streaming data. In this exercise, you will add a static dataset containing details of the devices submitting readings to your streaming solution.

Upload Reference Data to Azure
The device details data is provided as a text file, which you will upload to your Azure blob storage container.
1. In the folder where you extracted the lab files, open the devices.csv file in a text editor or spreadsheet application.
2. Review the device data, noting that the first value in each row is a device ID (in the format devn) and the second value is the full name of the device (in the format Device n); a hypothetical example of this format is shown after these steps. Then close the file without saving any changes.
3. Start Azure Storage Explorer, and if necessary, sign in to your Azure subscription using your Microsoft account.
4. Expand your storage account, and then expand Blob Containers.
5. Double-click the device-readings container, and then in the Upload drop-down list, click Files.
6. Browse to the devices.csv file. Then in the Blob type list, ensure that Block Blob is selected, and in the Upload to folder box type static-data. Click Upload to upload the file.
7. Verify that the devices.csv file is now stored in a folder named static-data in your blob storage container.
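For reference, the content of devices.csv might look similar to the hypothetical excerpt below. The actual device IDs and names are in the file provided with the lab files, and a header row matching the DeviceID and DeviceName column names used later in the query is assumed here:

DeviceID,DeviceName
dev1,Device 1
dev2,Device 2
dev3,Device 3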

Modify a Stream Analytics Job
Now that you have uploaded the static reference data, you can modify the query in the first of your Stream Analytics jobs to look up the device name.
1. In the Azure portal, browse to the first Stream Analytics job you created, which reads device data from an IoT hub input and routes the processed results to a blob storage output and an event hub output.
2. In the blade for your Stream Analytics job, in the Job Topology section, click the Inputs tile.
3. In the Inputs blade, click Add.
4. In the New input blade, enter the following settings, and then click Create:
  • Input alias: DeviceDetails
  • Source Type: Reference data
  • Subscription: Use blob storage from current subscription
  • Storage account: Select your storage account
  • Container: device-readings
  • Path pattern: static-data/devices.csv
  • Partition key column: Leave blank
  • Event serialization format: CSV
  • Delimiter: comma (,)
  • Encoding: UTF-8
5. Wait for the input to be created and tested.
6. In the blade for your Stream Analytics job, in the Job Topology section, click the Query tile.
7. Verify that the existing query looks like this:

WITH [AllReadings] AS
(SELECT * FROM [DeviceData])
SELECT device, reading, EventEnqueuedUtcTime
INTO [DeviceReadings]
FROM [AllReadings]
SELECT device, reading, EventEnqueuedUtcTime
INTO [HighReadings]
FROM [AllReadings]
WHERE CAST(reading AS float) > 0.5

8. Modify the query as shown below, joining the streaming data source to the static reference data:

WITH [AllReadings] AS
(
    SELECT strm.*, stat.DeviceName
    FROM [DeviceData] AS strm
    JOIN [DeviceDetails] AS stat
    ON strm.device = stat.DeviceID
)
SELECT device, DeviceName, reading, EventEnqueuedUtcTime
INTO [DeviceReadings]
FROM [AllReadings]
SELECT device, DeviceName, reading, EventEnqueuedUtcTime
INTO [HighReadings]
FROM [AllReadings]
WHERE CAST(reading AS float) > 0.5

Note that because DeviceDetails is a reference data input, this join does not require the time-bounded condition (for example, using DATEDIFF) that a join between two data streams would need.

9. Save the query, and then close the query pane.

View the Job Diagram
Your Stream Analytics job now consists of two inputs, connected to two outputs by two steps. You can verify this by viewing the job diagram.
1. In the blade for your Stream Analytics job, click Settings.
2. In the Settings blade, click Job diagram.
3. In the Job diagram blade, verify that your job consists of an IoT Hub input and a blob storage input, followed by two query steps, followed by a Blob Storage output and an Event Hub output.
4. Close the Job diagram blade and the Settings blade.

Start the Jobs
Now you’re ready to start both jobs and test the streaming topology.
1. Start both Stream Analytics jobs and wait for them to start – this can take a minute or so.
2. When the jobs have started, in the Node.js console, in the iotdevice folder, enter the following command to run the device simulation script and start submitting messages to the IoT hub:

node iotdevice.js

3. While the script is running, start Azure Storage Explorer, and if necessary, sign in to your Azure subscription using your Microsoft account.
4. Expand your storage account, and then expand Blob Containers.
5. Double-click the device-readings container, and then browse through the readings folder, and the year, month, and day folders, to view the most recent blob that has been generated by your job.
6. Download the blob and open it in a text editor or spreadsheet application, and verify that it contains the device ID and name for each reading.
7. Close the downloaded file, and in the Node.js console, press CTRL+C to stop the script.

Stop the Jobs
When you want to stop processing events, you can stop the jobs.
1. In the Azure portal, stop both Stream Analytics jobs.

Note: You will use the resources you created in this lab when performing the next lab, so do not delete them. Ensure that all Stream Analytics jobs and Node.js scripts are stopped to minimize ongoing resource usage costs.
