Batch Analytics


Overview

Process big data jobs in seconds with Azure Data Lake Analytics. There is no infrastructure to worry about because there are no servers, virtual machines, or clusters to wait for, manage, or tune. Instantly scale the processing power, measured in Azure Data Lake Analytics Units (AU), from one to thousands for each job. You only pay for the processing that you use per job.

U-SQL is a simple, expressive, and extensible language that allows you to write code once and have it automatically parallelized for the scale you need. Process petabytes of data for diverse workload categories such as querying, ETL, analytics, machine learning, machine translation, image processing, and sentiment analysis by leveraging existing libraries written in .NET languages, R, or Python.
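For orientation, a minimal U-SQL script has the same shape as the jobs you will build in this lab: extract rows from a file, transform them with a SELECT, and output the result. The sketch below is illustrative only; the /data/sample.tsv path and its columns are placeholders, not lab assets.

// Illustrative only: read a TSV, aggregate, and write a CSV.
// The input path and columns are placeholders, not part of the lab.
@rows =
    EXTRACT region string,
            duration int
    FROM "/data/sample.tsv"
    USING Extractors.Tsv();

@totals =
    SELECT region,
           SUM(duration) AS totalDuration
    FROM @rows
    GROUP BY region;

OUTPUT @totals
TO "/output/totals.csv"
USING Outputters.Csv();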

In this lab, you will learn how to:

  • use Data Lake Analytics to run big data analysis jobs that scale to massive data sets
  • create and manage batch, real-time, and interactive analytics jobs
  • query data using the U-SQL language

Task 1: Create Azure Data Lake Analytics Service

Create a Data Lake Analytics service to mine the data stored in the Data Lake Store.

Click on Create a resource

Create Datalake Analytics Service_

Click on Data + Analytics

Create Datalake Analytics Service_

Click on Data Lake Analytics

Create Datalake Analytics Service_

Pick the Data Lake Store where the Stream Analytics job is storing the device telemetry data

Pick A Store_

Use the existing resource group and click on the Create button

Create Service_

Task 2: Create Sample Data and Install Extensions

Click on Sample scripts

Create Sample Data_

Click on the Sample data missing button to copy sample data into the Data Lake Store

Create Sample Data_

You should see a success message after the data is copied

Create Sample Data_

Install Extensions

Install Extensions_

Successful Extension installation

Install Extensions_

Task 3: VS Code Integration

Submit a job using VS Code; try the samples first to learn Data Lake Analytics

Install VS Code Extension for Data Lake Analytics

Install Extensions_

Run Samples

Run Samples to learn Data Lake Analytics

Run Samples_
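The introductory sample queries the SearchLog.tsv data set that the Sample scripts blade copied into the store in Task 2. The sketch below approximates that sample (the script shipped in the portal and the extension may differ slightly; verify the paths in your own store):

// Approximation of the introductory sample: extract the search log and write it back out.
// Paths follow the sample-data layout from Task 2; confirm them in your Data Lake Store.
@searchlog =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int?,
            Urls        string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

OUTPUT @searchlog
TO "/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();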

Compile Script

Compile Script_

Click on List more accounts

![Run Samples](./media/11_VSCode_Open_Sample_Script_Compile_Select_Account.png)

Select a Data Lake Analytics Account

Select Account_

Select master key

Select master key_

Compile as USQL

Compile USQL_

The U-SQL script should compile successfully

Compiled USQL_

Submit Job To Run

Submit Job_

The default priority is 1000 and the default number of nodes to run the script is 5

Submit Job Success_

Job Success with Job Analytics

Job Success_

View Input File

View Input File_

View Output File

View Output File_

Task 4: Create an Analytics Job against MXChip (or Simulator)

Create an analytics job to convert JSON to CSV using U-SQL and Data Lake Analytics

Create a new mxchip_analytics.usql file in the project and add the following script:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

//Extract the Json string using a default Text extractor.

@json =
    EXTRACT jsonString string FROM @"/workshop/streaming/2018/03/{*}/{*}.json" USING Extractors.Tsv(quoting:false);

//Use the JsonTuple function to get the Json Token of the string so it can be parsed later with Json .NET functions

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS rec FROM @json;

@columnized = SELECT 
            rec["deviceId"] AS deviceId,
            rec["temperature"] AS temperature,
            rec["humidity"] AS humidity,
            rec["time"] AS time
    FROM @jsonify;



//Output the file to a tool of your choice.

OUTPUT @columnized
TO @"/workshop/output/out.csv"
USING Outputters.Csv();
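Note that JsonTuple returns string values, so every column in @columnized is a string. If you want typed columns for downstream analysis, a rough sketch of an extra step appended to the same script could look like this (the out_typed.csv path is just an example, and the casts assume the telemetry fields are always present and well formed):

// Optional sketch: cast the string values produced by JsonTuple to typed columns.
// Assumes every record contains valid numeric temperature/humidity and a parseable time.
@typed =
    SELECT deviceId,
           Convert.ToDouble(temperature) AS temperature,
           Convert.ToDouble(humidity) AS humidity,
           DateTime.Parse(time) AS eventTime
    FROM @columnized;

OUTPUT @typed
TO @"/workshop/output/out_typed.csv"
USING Outputters.Csv();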

Register the two assemblies, Newtonsoft.Json and Microsoft.Analytics.Samples.Formats. Download the DLLs from the /libs folder and register them.

U-SQL Analytics_

Select the DLLs from the /libs folder

U-SQL Analytics_

Submit Job

Submit the job to convert all JSON files to CSV files

U-SQL Analytics_

View Jobs

U-SQL Analytics_