hadoop1 - Product Recommendation: The Simplest Model of Product Relatedness, Part 1

1. As the title says, this is an idea I picked up in a QQ discussion group.

2. Data file 1.txt:

001={001,002,004,006,008}	003={003,002,001,009,004}

002={002,003,005,006,008,009,007}	004={004,005,006,009,008,007}

005={005,003,007,008,001,002}	006={006,001,004,009,005,008}

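Notes:

1. Each number is a product ID.

2. Each line holds two products, separated by a tab. The braces after a product ID list the products that users jumped to directly from that product. A product also counts as a jump to itself.

For example, the jumps

001 -> 002

001 -> 004

.....

give the record: 001={001,002,004,006,008}

3. Within a line, every product ID that appears in both brace sets counts as one association.

For example: 001={001,002,004,006,008} 003={003,002,001,009,004}

The shared IDs are 001, 002 and 004, which makes 3 associations.

4. The reduce output must be 001:003=3

That is, the relatedness of product 001 and product 003 is 3.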
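The rules above can be checked on the sample line with a few lines of plain Java. This is only a sketch that runs outside Hadoop: the class RelationSpecDemo and its methods parseIds and relate are hypothetical names, and the two records on a line are assumed to be separated by a tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the job itself.
class RelationSpecDemo {

    // Turns "{001,002,004}" into a mutable list of product IDs.
    static List<String> parseIds(String braced) {
        String inner = braced.substring(1, braced.length() - 1);
        return new ArrayList<String>(Arrays.asList(inner.split(",")));
    }

    // Formats one input line as "A:B=n", where n is the number of shared IDs.
    static String relate(String line) {
        String[] records = line.split("\t");
        String[] a = records[0].split("=");
        String[] b = records[1].split("=");
        List<String> shared = parseIds(a[1]);
        shared.retainAll(parseIds(b[1])); // keep only IDs present in both sets
        return a[0] + ":" + b[0] + "=" + shared.size();
    }

    public static void main(String[] args) {
        String line = "001={001,002,004,006,008}\t003={003,002,001,009,004}";
        System.out.println(relate(line)); // prints 001:003=3
    }
}
```

The shared IDs for this line are 001, 002 and 004, so the result agrees with the required output 001:003=3.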

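Design goal:

Compare the relatedness of products pair by pair to find the best recommendation for a product (a fairly simple approach).

Approach:

1. Map side: use the ID pair of product A and product B as the key, then split the line and emit the size of the intersection of their related products as the value.

2. Reduce side: pass the records through unchanged, with no processing (the default reducer can be used).

Code: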

package product;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class SimpleRelation {

	public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>{
		private static Text k = new Text();
		private static IntWritable v = new IntWritable(0);
		
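		@Override
		protected void map(LongWritable key, Text value, Context context) 
				throws java.io.IOException, InterruptedException {
			// sample line: "001={001,002,004,006,008}\t003={003,002,001,009,004}"
			String line = value.toString();
			// split the line into the two product records
			String[] splits = line.split("\t");
			if (splits.length != 2)
				return;
			// split each record into the product ID and its brace set
			String[] proc1 = splits[0].split("=");
			String[] proc2 = splits[1].split("=");
			// skip malformed records that are not of the form ID={...}
			if (proc1.length != 2 || proc2.length != 2)
				return;
			
			// the key is "A:B", built from both product IDs
			k.set(proc1[0] + ":" + proc2[0]);
			v.set(getSameNum(proc1[1], proc2[1]));
			
			context.write(k, v);
		}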
		// Counts the IDs shared by both sets; this part could probably be optimised.
		private int getSameNum(String str1, String str2) {
			// str1 = "{001,002,004,006,008}" str2 = "{003,002,001,009,004}"
			// Taking the intersection is all that is needed.
			// Arrays.asList returns a fixed-size list that rejects add and remove,
			// so proc1 is filled by hand into an ArrayList that retainAll can shrink.
			List<String> proc1 = new ArrayList<String>();
			String[] temp = str1.substring(1, str1.length()-1).split(",");
			for (String s : temp) {
				proc1.add(s);
			}
			List<String> proc2 = Arrays.asList(str2.substring(1, str2.length()-1).split(","));
			// retainAll removes from proc1 every element that is not contained in proc2.
			proc1.retainAll(proc2);
			return proc1.size();
		}
	}
	// The map phase alone solves the problem; the reducer is left at the framework default
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
		if(otherArgs.length != 2){
			System.err.println("Usage: SimpleRelation <in> <out>");
			System.exit(2);
		}
		Job job = new Job(conf,"SimpleRelation");
		job.setJarByClass(SimpleRelation.class);
		
		job.setMapperClass(Map.class);
		
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		
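		// use the arguments left over after GenericOptionsParser has consumed the generic options
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));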
		
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
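The comment on getSameNum says the intersection step could probably be optimised. One possible variant, sketched here as a standalone class, uses HashSet instead of lists, so each membership check is a hash lookup rather than a scan of the second list. The class name SetIntersection and the helper toSet are made up for illustration. Note one behavioural difference: a set ignores an ID that is repeated inside one pair of braces, while the list version would count it each time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone class; a sketch of a set-based getSameNum.
class SetIntersection {

    // Turns "{001,002,004}" into a set of product IDs.
    static Set<String> toSet(String braced) {
        String inner = braced.substring(1, braced.length() - 1);
        return new HashSet<String>(Arrays.asList(inner.split(",")));
    }

    // Returns how many distinct IDs appear in both brace sets.
    static int getSameNum(String str1, String str2) {
        Set<String> ids1 = toSet(str1);
        ids1.retainAll(toSet(str2)); // keep only IDs also present in str2
        return ids1.size();
    }

    public static void main(String[] args) {
        // The three pairs from 1.txt
        System.out.println(getSameNum("{001,002,004,006,008}", "{003,002,001,009,004}"));                 // prints 3
        System.out.println(getSameNum("{002,003,005,006,008,009,007}", "{004,005,006,009,008,007}"));     // prints 5
        System.out.println(getSameNum("{005,003,007,008,001,002}", "{006,001,004,009,005,008}"));         // prints 3
    }
}
```

For brace sets as small as the ones in 1.txt the difference is negligible; the set version would only start to matter for products with long jump lists.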

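Program output:

[root@hadoop ~]# hadoop dfs -cat /output/*

001:001 3

002:002 5

005:005 3

The counts 3, 5 and 3 are the intersection sizes for the three lines of 1.txt. The keys in this run repeat the first product ID; a key built from both IDs, as the required form 001:003=3 calls for, would read 001:003, 002:004 and 005:006.

The program is admittedly simple, since I designed it myself, but the following posts will build on it step by step using what we have learned.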

posted @ 2014-06-24 08:24 jseven