Custom Combiner in MapReduce

username2

Blog category: Hadoop study notes

The following are my own study notes.

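1 The overall data flow in MapReduce.

Map output -> compute the partition and sort by key -> run the Combiner over the map task's local data ->

custom sort during shuffle -> custom grouping -> aggregate the data in reduce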
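
To make the map-side part of this flow concrete, here is a small plain-Java simulation of "partition, sort by key, combine". It is only a sketch: no Hadoop classes are involved, `MapSideFlowSketch` and its methods are made-up names, `String.hashCode()` stands in for `Text.hashCode()`, and every map output value is assumed to be 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Plain-Java simulation of the map-side steps; all names here are hypothetical.
class MapSideFlowSketch {

    // Hash partitioning: mask the sign bit, then take the remainder.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Takes the keys emitted by one map task (each with an implied value of 1)
    // and returns, for every partition, the locally summed <key, count> pairs
    // in sorted key order.
    static List<TreeMap<String, Long>> partitionSortCombine(List<String> mapOutputKeys,
                                                            int numPartitions) {
        List<TreeMap<String, Long>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>()); // a TreeMap keeps its keys sorted
        }
        for (String key : mapOutputKeys) {
            // merge() stands in for the combiner: it sums the values per key locally.
            partitions.get(partitionOf(key, numPartitions)).merge(key, 1L, Long::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("hello", "you", "hello", "me");
        List<TreeMap<String, Long>> result = partitionSortCombine(keys, 2);
        for (int p = 0; p < result.size(); p++) {
            System.out.println("partition " + p + " -> " + result.get(p));
        }
    }
}
```

Each partition's content is what one reduce task would later fetch from this map task during the shuffle.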

Example:

I. Using a custom Combiner

1 Custom Combiner

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {

	@Override
	protected void reduce(Text key, Iterable<LongWritable> values, Context context)
			throws IOException, InterruptedException {

		// Printed once per call of the combine function, i.e. once per k2 group.
		System.out.println("Combiner input group <" + key.toString() + ",N(N>=1)>");
		long count = 0L;
		for (LongWritable value : values) {
			count += value.get();
			// Printed once per input <k2,v2> pair.
			System.out.println("Combiner input pair <" + key.toString() + "," + value.get() + ">" + this);
		}
		context.write(key, new LongWritable(count));
		// Printed once per output <k2,v2> pair.
		System.out.println("Combiner output pair <" + key.toString() + "," + count + ">");
	}
}
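
Hadoop may run a combiner zero, one, or several times on a map task's output, so the combine function has to give the same final result either way. Summation qualifies because partial sums can be summed again. The plain-Java sketch below checks this for one key; `CombinerSumSketch` and the two map tasks are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java check that summing partial sums equals summing the raw values.
class CombinerSumSketch {

    // Mirrors the loop in MyCombiner.reduce(): add up all values of one key.
    static long sum(List<Long> values) {
        long count = 0L;
        for (long v : values) {
            count += v;
        }
        return count;
    }

    public static void main(String[] args) {
        // Values emitted for the same key by two hypothetical map tasks.
        List<Long> mapTask1 = Arrays.asList(1L, 1L, 1L);
        List<Long> mapTask2 = Arrays.asList(1L, 1L);

        // Without a combiner, the reducer receives all five raw values.
        long withoutCombiner = sum(Arrays.asList(1L, 1L, 1L, 1L, 1L));
        // With a combiner, the reducer receives one partial sum per map task.
        long withCombiner = sum(Arrays.asList(sum(mapTask1), sum(mapTask2)));

        System.out.println("without combiner: " + withoutCombiner
                + ", with combiner: " + withCombiner);
    }
}
```

An operation such as a plain average does not have this property, so it cannot be reused as a combiner unchanged.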

2 Using it in the main class
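
A driver is where the combiner and partitioner get registered on the job. The following is a minimal sketch, assuming a word-count style job: `CombinerDemoDriver`, `TokenMapper`, `SumReducer`, the job name, and the choice of two reduce tasks are all hypothetical, while `MyCombiner` and `MyPartintioner` are the classes from these notes. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver; none of the names in this file come from the original notes
// except MyCombiner and MyPartintioner.
public class CombinerDemoDriver {

    // Hypothetical mapper: emits <word, 1> for every whitespace-separated token.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1L);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Hypothetical reducer: sums the (possibly pre-combined) counts per word.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long count = 0L;
            for (LongWritable value : values) {
                count += value.get();
            }
            context.write(key, new LongWritable(count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner-demo");
        job.setJarByClass(CombinerDemoDriver.class);

        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(MyCombiner.class);        // runs on map-side output
        job.setPartitionerClass(MyPartintioner.class); // decides the target reduce task
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(2);                      // assumed; more than one so partitioning is visible

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the combiner's input and output types are both `<Text, LongWritable>`, the same types as the map output, it can be plugged in without changing the mapper or the reducer.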

3 Custom Partitioner

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartintioner extends Partitioner<Text, LongWritable> {

	/**
	 * @param key           the key emitted by the map
	 * @param value         the value emitted by the map
	 * @param numPartitions the number of partitions
	 */
	@Override
	public int getPartition(Text key, LongWritable value, int numPartitions) {
		/*
		 * Note: this uses the default hash-partitioning formula.
		 * With a composite key, the partition should be derived from the key's first field.
		 * Without a custom partitioner, the MapReduce framework applies the default hash
		 * partitioning to the whole composite key, so only fully equal composite keys land
		 * in the same partition, which is clearly not the desired result.
		 */
		// Mask off the sign bit so the partition number is never negative.
		int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
		System.out.println("Partitioner  key:" + key + "  value:" + value + "  " + partition + "   " + this);
		return partition;
	}

}
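
The `& Integer.MAX_VALUE` mask in `getPartition` matters because `hashCode()` can be negative, and a negative partition number is invalid. The plain-Java sketch below compares the masked formula with a bare `%` and with `Math.abs`; `PartitionMaskSketch` is a hypothetical name and the hash codes are passed in directly instead of coming from a `Text` key.

```java
// Plain-Java comparison of three ways to turn a hash code into a partition number.
class PartitionMaskSketch {

    // The formula used in getPartition(), taking the hash code directly.
    static int partitionFor(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        int[] hashes = {7, -1, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.println("hash " + h
                    + ": masked -> " + partitionFor(h, numPartitions)
                    + ", bare % -> " + (h % numPartitions)
                    + ", Math.abs then % -> " + (Math.abs(h) % numPartitions));
        }
    }
}
```

The masked version always stays within `0 .. numPartitions - 1`. `Math.abs` is not a safe substitute, because `Math.abs(Integer.MIN_VALUE)` is still negative.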

2016-01-28 19:07