Hadoop MapReduce2 Optimization in Heterogeneous Cluster


I have the following configuration:

  • Hadoop: v2.7.1 (YARN)
  • Input file: size = 100 GB
  • 3 slaves: each has 4 vcores at 2 GHz and 8 GB of RAM
  • 5 slaves: each has 2 vcores at 1 GHz and 2 GB of RAM
  • MapReduce program: WordCount

How can I minimize the WordCount execution time by assigning small input splits to the 5 slower slaves and big input splits to the 3 fastest slaves?

For each machine you can set the number of map/reduce slots. If you want to send less workload to the slower machines, you can define, for example, 2 map/reduce task slots on each slower machine and 4 map/reduce task slots on each of the fast machines. This way you can control how much workload each node in the cluster receives.
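On Hadoop 2.x with YARN there are no fixed slot settings; each NodeManager advertises memory and vcores (yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores), and the number of tasks a node runs at once follows from the per-task request (mapreduce.map.memory.mb, mapreduce.map.cpu.vcores). Here is a rough sketch of that arithmetic for this cluster. The memory figures (6144 MB offered on a fast node, 1536 MB on a slow node, 768 MB per map task) are assumptions chosen to reproduce the 4 and 2 tasks mentioned above, not recommended values, and the vcore limit only applies if the scheduler is configured to take CPU into account.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    count: int      # number of slaves of this type
    memory_mb: int  # assumed yarn.nodemanager.resource.memory-mb on one node
    vcores: int     # assumed yarn.nodemanager.resource.cpu-vcores on one node


def concurrent_tasks(node, task_memory_mb, task_vcores=1):
    """Tasks one node can run at once: the tighter of the memory and vcore limits."""
    return min(node.memory_mb // task_memory_mb, node.vcores // task_vcores)


# Hypothetical values: part of each node's RAM is left for the OS and daemons.
fast = NodeType("fast", count=3, memory_mb=6144, vcores=4)
slow = NodeType("slow", count=5, memory_mb=1536, vcores=2)
TASK_MEMORY_MB = 768  # assumed mapreduce.map.memory.mb

per_node = {n.name: concurrent_tasks(n, TASK_MEMORY_MB) for n in (fast, slow)}
cluster_total = sum(per_node[n.name] * n.count for n in (fast, slow))

for n in (fast, slow):
    group_capacity = per_node[n.name] * n.count
    print(f"{n.name}: {per_node[n.name]} tasks per node, "
          f"{group_capacity} of {cluster_total} concurrent tasks in the cluster")
```

Note that this only caps how many tasks run concurrently on each node; it does not change the split size. The faster nodes still end up processing more splits overall because they finish each task sooner and get the next one.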

