Advanced Tools, Techniques and Applications Unit No: 6

• MapReduce
• Combiner
• Partitioner
• MapReduce Word Count Example
• MongoDB and MapReduce Functions
• ETL Processing
• Apache Pig
• Pig Features
• Pig Execution Modes
• Pig Running Modes
• Pig UDFs
• Word Count Example using Pig

Advanced Tools, Techniques, Applications




Large-scale data processing was difficult!
◦ Managing hundreds or thousands of processors
◦ Managing parallelization and distribution
◦ I/O scheduling
◦ Status and monitoring
◦ Fault/crash tolerance

MapReduce provides all of these, easily!
Source: http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0002.html




What is it?
◦ Programming model used by Google
◦ A combination of the Map and Reduce models with an associated implementation
◦ Used for processing and generating large data sets




How does it solve our previously mentioned problems?
◦ MapReduce is highly scalable and can be used across many computers.
◦ Many small machines can be used to process jobs that normally could not be processed by a single large machine.




Map
• Inputs a key/value pair
  ◦ Key is a reference to the input value
  ◦ Value is the data set on which to operate
• Evaluation
  ◦ Function defined by the user
  ◦ Applies to every value in the input
    (might need to parse the input)
• Produces a new list of key/value pairs
  ◦ Can be of a different type from the input pair
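The map step described here can be sketched in plain Java, without any Hadoop dependency. This is only an illustration: the class name MapSketch, the choice of a file name as key, and the whitespace tokenization are assumptions made for the example, not part of the slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapSketch {
    // Hypothetical map function: the key is a reference to the input
    // (here a file name), the value is the text to operate on.
    // It emits one (word, 1) pair per token, so the output pair type
    // (String, Integer) differs from the input pair type (String, String).
    static List<Map.Entry<String, Integer>> map(String key, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Parse the input value into whitespace-separated tokens.
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the list of emitted pairs, duplicates included.
        System.out.println(map("doc1.txt", "to be or not to be"));
    }
}
```

Note that duplicates are emitted as separate pairs; summing them is left to the reduce step.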


Reduce
◦ Starts with intermediate key/value pairs
◦ Ends with finalized key/value pairs
◦ Starting pairs are sorted by key
◦ An iterator supplies the values for a given key to the Reduce function
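A minimal plain-Java sketch of such a reduce function follows. The class name ReduceSketch and the choice of summation as the user-defined operation are assumptions for illustration; the framework, not this code, would be responsible for sorting the pairs and building the iterator.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;

class ReduceSketch {
    // Hypothetical reduce function: receives one key together with an
    // iterator over all intermediate values for that key, and returns
    // a single finalized key/value pair (here, the sum of the values).
    static Map.Entry<String, Integer> reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return new SimpleEntry<>(key, sum);
    }

    public static void main(String[] args) {
        // Three intermediate pairs for the key "be" collapse into one pair.
        Iterator<Integer> ones = Arrays.asList(1, 1, 1).iterator();
        System.out.println(reduce("be", ones));
    }
}
```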




Typically a function that:
• Starts with a large number of key/value pairs
  ◦ One key/value pair for each word in all files being grepped (including multiple entries for the same word)
• Ends with very few key/value pairs
  ◦ One key/value pair for each unique word across all the files, with the number of instances summed into this entry
• Is broken up so that a given worker works with input of the same key
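The grouping that lets each worker see only input of the same key can be sketched as follows. This is a single-process illustration under stated assumptions: the names ShuffleSketch, groupByKey and workerFor are hypothetical, and hashing the key modulo the number of workers is just one common way to assign keys to workers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // Convenience helper for building intermediate pairs.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Collect all values that share a key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Assumed assignment rule: the same key always maps to the same worker.
    static int workerFor(String key, int numWorkers) {
        return Math.floorMod(key.hashCode(), numWorkers);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : "to be or not to be".split(" ")) {
            pairs.add(pair(w, 1));
        }
        // Many input pairs become one summed entry per unique word.
        for (Map.Entry<String, List<Integer>> e : groupByKey(pairs).entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            System.out.println(e.getKey() + " = " + sum
                    + " (worker " + workerFor(e.getKey(), 3) + ")");
        }
    }
}
```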


Map returns information

Reduce accepts information

Reduce applies a user-defined function to reduce the amount of data




• Yahoo!
  ◦ Webmap application uses Hadoop to create a database of information on all known webpages
• Facebook
  ◦ Hive data center uses Hadoop to provide business statistics to application developers and advertisers
• Rackspace
  ◦ Analyzes server log files and usage data using Hadoop




• Creates an abstraction for dealing with complex overhead
  ◦ The computations are simple, the overhead is messy
• Removing the overhead makes programs much smaller and thus easier to use
  ◦ Less testing is required as well. The MapReduce libraries can be assumed to work properly, so only user code needs to be tested
• Division of labor is also handled by the MapReduce libraries, so programmers only need to focus on the actual computation


import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every token in an input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums all counts received for one word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: configures the job and submits it.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = input path, args[1] = output path
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
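The driver registers Reduce as the combiner through conf.setCombinerClass(Reduce.class). The effect of a combiner can be illustrated in plain Java, without Hadoop: counts emitted by a single map task are summed locally, so fewer pairs need to be sent on to the reducers. The class and method names below are hypothetical, and the sketch assumes the map output is available as a simple list of words. Reusing the reducer as a combiner works here because summation gives the same result regardless of how the values are grouped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Locally pre-aggregate the (word, 1) pairs produced by one map task.
    // Each word in the list stands for one emitted (word, 1) pair.
    static Map<String, Integer> combine(List<String> emittedWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : emittedWords) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("to", "be", "or", "not", "to", "be");
        Map<String, Integer> partial = combine(emitted);
        System.out.println(emitted.size() + " pairs before combining, "
                + partial.size() + " after: " + partial);
    }
}
```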




MapReduce is composed of several components, including:

◦ JobTracker -- the master node that manages all jobs and resources in a cluster
◦ TaskTrackers -- agents deployed to each machine in the cluster to run the map and reduce tasks
◦ JobHistoryServer -- a component that tracks completed jobs, and is typically deployed as a separate function or with JobTracker








MapReduce operates in parallel across very large clusters. MapReduce libraries are available in several languages, including C, C++, Java, Ruby, Perl and Python. Programmers can use these libraries to create tasks without dealing with communication or coordination between nodes. MapReduce is also fault-tolerant: each node periodically reports its status to a master node, and if a node doesn't respond as expected, the master node reassigns that piece of the job to other available nodes in the cluster. This resiliency makes it practical to run MapReduce on inexpensive commodity servers.
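The status-reporting and reassignment behaviour can be sketched as a small single-process simulation. Everything here is an assumption for illustration (the class name HeartbeatSketch, the fixed timeout, and the rule of moving tasks to the first live node found); it is not how any particular MapReduce implementation is written.

```java
import java.util.HashMap;
import java.util.Map;

class HeartbeatSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>(); // node -> last report time
    private final Map<String, String> taskOwner = new HashMap<>();   // task -> node

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A node reports its status to the master.
    void heartbeat(String node, long now) {
        lastHeartbeat.put(node, now);
    }

    void assign(String task, String node) {
        taskOwner.put(task, node);
    }

    String ownerOf(String task) {
        return taskOwner.get(task);
    }

    // Move tasks away from nodes that have been silent longer than the
    // timeout. Returns the number of tasks that were reassigned.
    int reassignFromDeadNodes(long now) {
        String live = null;
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() <= timeoutMillis) {
                live = e.getKey();
                break;
            }
        }
        if (live == null) {
            return 0; // no live node available to take over
        }
        int moved = 0;
        for (Map.Entry<String, String> e : taskOwner.entrySet()) {
            Long seen = lastHeartbeat.get(e.getValue());
            if (seen == null || now - seen > timeoutMillis) {
                e.setValue(live);
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        HeartbeatSketch master = new HeartbeatSketch(1000);
        master.heartbeat("nodeA", 0);
        master.heartbeat("nodeB", 0);
        master.assign("map-1", "nodeA");
        master.assign("map-2", "nodeB");
        master.heartbeat("nodeA", 5000); // nodeB stays silent
        System.out.println("reassigned: " + master.reassignFromDeadNodes(5500));
        System.out.println("map-2 now runs on " + master.ownerOf("map-2"));
    }
}
```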


Specifying ranges in FOREACH Operator


The NESTED FOREACH


Splitting Data Sets


UDFs


1. How does Hadoop help to process big data? Explain with a suitable case study.

2. Why is MapReduce important as far as big data is concerned? Explain with an analogy.

3. Write a short note on mapper, reducer, combiner, and partitioner.

4. Write and explain a MapReduce program (Java/Python) to count word occurrences in a text file.

5. How can MongoDB be helpful in the MapReduce paradigm?


6. What is Pig? Where can it be found in the Hadoop architecture?

7. Explain the execution modes of Pig.

8. Write and explain any 10 HDFS commands.

9. Write a short note on UDFs.

10. Write and explain the word count example using Pig.


Thank You http://www.pavanjaiswal.com
