Mahout study notes 1
http://blog.csdn.net/zhoubl668/article/details/13297663
http://www.cnblogs.com/dlts26/archive/2012/06/20/2555772.html
https://mahout.apache.org/users/recommender/userbased-5-minutes.html
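Recommendation
People's behavior follows patterns, and these patterns can be used to predict preferences and to discover things people do not yet know about.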
Collaborative filtering: producing recommendations based on, and only based on, knowledge of users’ relationships to items. These techniques require no knowledge of the properties of the items themselves. This is, in a way, an advantage. This recommender framework doesn’t care whether the items are books, theme parks, flowers, or even other people, because nothing about their attributes enters into any of the input.
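In other words, collaborative filtering recommends purely from the relationships between users and items and needs no knowledge of the items' own attributes, so it does not matter what the recommended items are. That suggests some potential applications: items too complex to describe in a program; cases where no item attributes are available; and discovering unexpected kinds of items to recommend (e.g. people who buy a suit often also buy a black business belt? Maybe; it's just an example). Its drawbacks: it is computationally expensive, and the system gets slower as the numbers of users and items grow; when users have rated only a few items, similarities between users computed from those sparse ratings may be inaccurate (the sparsity problem); and an item that no user has rated will never be recommended (the first-rater problem).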
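Both drawbacks are easy to see on toy data. The sketch below is plain Java with no Mahout dependency; the `Map<Long, Map<Long, Double>>` layout (user ID to item ID to preference) and the class and method names are assumptions made for illustration. It finds the items two users have both rated, which is all a user-to-user similarity can be computed from, and the items nobody has rated, which pure collaborative filtering can never recommend.

```java
import java.util.*;

class SparsityDemo {
    // Items rated by both users; a user-user similarity can only use these.
    static Set<Long> coRated(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = new TreeSet<>(a.keySet());
        common.retainAll(b.keySet());
        return common;
    }

    // Catalog items that no user has rated: no preference data links them
    // to anyone, so collaborative filtering alone never recommends them.
    static Set<Long> unratedItems(Set<Long> catalog, Map<Long, Map<Long, Double>> prefs) {
        Set<Long> unrated = new TreeSet<>(catalog);
        for (Map<Long, Double> userPrefs : prefs.values()) {
            unrated.removeAll(userPrefs.keySet());
        }
        return unrated;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(10L, 5.0, 11L, 3.0));
        prefs.put(2L, Map.of(11L, 4.0, 12L, 2.0));
        Set<Long> catalog = Set.of(10L, 11L, 12L, 13L);
        System.out.println(coRated(prefs.get(1L), prefs.get(2L)));  // [11]
        System.out.println(unratedItems(catalog, prefs));           // [13]
    }
}
```

Users 1 and 2 overlap on a single item, far too little to judge whether their tastes agree, and item 13 is invisible to the recommender.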
User-based recommendation
```java
package recommender.mahoutTest;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

/**
 * Sep 17, 2014
 * @author andy
 */
public class Quickstart {

    /**
     * Load the data set (one "user,item,preference" line per preference).
     */
    private DataModel getDataModel() throws IOException {
        return new FileDataModel(new File("src/main/resources/dataset.csv"));
    }

    public void getUserBasedRecommenderResult() {
        try {
            DataModel model = getDataModel();
            /*
             * Core algorithm: compute similarity.
             * The idea behind this approach is that when we want to compute
             * recommendations for a particular user, we look for other users
             * with a similar taste and pick the recommendations from their
             * items. For finding similar users, we have to compare their
             * interactions. There are several methods for doing this. One
             * popular method is to compute the correlation coefficient between
             * their interactions.
             *
             * In short: to recommend items to a user, find users with the same
             * tastes/preferences and recommend the items they prefer; finding
             * those users means measuring how their preferences correlate.
             */
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);

            // Threshold: only users with similarity >= 0.1 are used as neighbors
            UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
            // Print the IDs of user 2's neighbors
            for (long id : neighborhood.getUserNeighborhood(2)) {
                System.out.println(id);
            }

            // Pull all these components together to create the recommender
            UserBasedRecommender recommender =
                    new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Top 3 recommendations for user 2
            List<RecommendedItem> recommendations = recommender.recommend(2, 3);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        } catch (IOException | TasteException e) {
            e.printStackTrace();
        }
    }

    private static class MyRecommenderBuilder implements RecommenderBuilder {
        // buildRecommender is declared to throw TasteException, so let it
        // propagate instead of continuing with a null similarity.
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
            UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
            UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, dataModel);
            return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
        }
    }

    public void evaluate() {
        try {
            DataModel model = getDataModel();
            RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
            RecommenderBuilder builder = new MyRecommenderBuilder();
            // Use 90% of each user's preferences for training and evaluate over
            // 100% of users; null means the default DataModelBuilder.
            // A lower value (average absolute error) means a better result.
            double result = evaluator.evaluate(builder, null, model, 0.9, 1.0);
            System.out.println(result);
        } catch (IOException | TasteException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Quickstart quickstart = new Quickstart();
        quickstart.getUserBasedRecommenderResult();
        quickstart.evaluate();
    }
}
```
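PearsonCorrelationSimilarity is used above as a black box. As a rough sketch of what a Pearson correlation between two users means (plain Java, not Mahout's code; class and method names are hypothetical), the coefficient below is computed only over the items both users rated, and is treated as undefined (NaN) when there are fewer than two such items or one user's ratings on them do not vary. Mahout's class has extra options, such as weighting, that are not reproduced here.

```java
import java.util.*;

class PearsonDemo {
    // Pearson correlation between two users over their co-rated items:
    // sum(da*db) / sqrt(sum(da^2) * sum(db^2)), where da, db are deviations
    // from each user's mean over those same items. Ranges from -1 to 1.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return Double.NaN;  // too few co-rated items
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double num = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA, db = b.get(item) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) return Double.NaN;  // constant ratings
        return num / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        Map<Long, Double> u1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
        Map<Long, Double> u2 = Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0, 104L, 2.0);
        // Item 104 is ignored: only u2 rated it. The result is negative
        // because the two users rank items 101 and 103 in opposite order.
        System.out.println(pearson(u1, u2));
    }
}
```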
We can now summarize the role of each component. A DataModel implementation stores and provides access to all the preference, user, and item data needed in the computation. A UserSimilarity implementation provides some notion of how similar two users are; this could be based on one of many possible metrics or calculations. A UserNeighborhood implementation defines a notion of a group of users that are most similar to a given user. Finally, a Recommender implementation pulls all these components together to recommend items to users.
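The neighborhood and recommender steps can be sketched in plain Java to make the data flow concrete. This is a simplified illustration under stated assumptions, not Mahout's implementation: similarities to the target user are assumed to be precomputed in a hypothetical `simToTarget` map (excluding the target itself), the neighborhood keeps every user at or above a positive threshold (the same idea as ThresholdUserNeighborhood), and each unrated item is scored by the similarity-weighted average of the neighbors' preferences for it. Mahout's own recommender adds further details not reproduced here.

```java
import java.util.*;

class UserBasedSketch {
    // Neighborhood: every other user whose similarity to the target is >= threshold.
    static List<Long> neighborhood(Map<Long, Double> simToTarget, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : simToTarget.entrySet()) {
            if (e.getValue() >= threshold) result.add(e.getKey());
        }
        Collections.sort(result);
        return result;
    }

    // Similarity-weighted average of the neighbors' preferences for one item;
    // NaN if no neighbor rated it (or the similarities sum to zero).
    static double estimate(long item, List<Long> neighbors, Map<Long, Double> simToTarget,
                           Map<Long, Map<Long, Double>> prefs) {
        double weighted = 0, totalSim = 0;
        for (long n : neighbors) {
            Double p = prefs.get(n).get(item);
            if (p == null) continue;
            double s = simToTarget.get(n);
            weighted += s * p;
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    // Up to howMany items the target has not rated, highest estimate first.
    static List<Long> recommend(long target, int howMany, double threshold,
                                Map<Long, Double> simToTarget, Map<Long, Map<Long, Double>> prefs) {
        List<Long> neighbors = neighborhood(simToTarget, threshold);
        Set<Long> candidates = new TreeSet<>();
        for (long n : neighbors) candidates.addAll(prefs.get(n).keySet());
        candidates.removeAll(prefs.get(target).keySet());
        Map<Long, Double> scores = new TreeMap<>();
        for (long item : candidates) {
            double est = estimate(item, neighbors, simToTarget, prefs);
            if (!Double.isNaN(est)) scores.put(item, est);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = Map.of(
                1L, Map.of(101L, 5.0, 102L, 3.0),
                2L, Map.of(101L, 4.0, 103L, 4.0, 104L, 2.0),
                3L, Map.of(102L, 3.0, 103L, 2.0, 105L, 5.0),
                4L, Map.of(106L, 5.0));
        Map<Long, Double> simToUser1 = Map.of(2L, 0.9, 3L, 0.5, 4L, 0.05);
        System.out.println(recommend(1L, 2, 0.1, simToUser1, prefs));  // [105, 103]
    }
}
```

With these numbers user 4 falls below the 0.1 threshold, so item 106 is never considered. Item 105 tops the list on the strength of a single neighbor's rating, which is one reason real implementations add safeguards to this plain weighted average.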
Difference between user-based and item-based recommendation
A basic illustration of the difference between user-based and item-based recommendation: user-based recommendation (large dashes) finds similar users, and sees what they like. Item-based recommendation (short dashes) sees what the user likes, and then finds similar items.
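The item-based direction (start from what the user likes, then find similar items) can be sketched the same way. Assumptions: plain Java, hypothetical names, and cosine similarity between item rating vectors (a missing rating counts as 0), chosen here only for brevity; Mahout's GenericItemBasedRecommender accepts any ItemSimilarity implementation instead.

```java
import java.util.*;

class ItemBasedSketch {
    // Cosine similarity between two items' rating vectors over all users
    // (a user who did not rate an item contributes 0 for it).
    static double cosine(long i, long j, Map<Long, Map<Long, Double>> prefs) {
        double dot = 0, normI = 0, normJ = 0;
        for (Map<Long, Double> up : prefs.values()) {
            double pi = up.getOrDefault(i, 0.0), pj = up.getOrDefault(j, 0.0);
            dot += pi * pj;
            normI += pi * pi;
            normJ += pj * pj;
        }
        return (normI == 0 || normJ == 0) ? 0 : dot / Math.sqrt(normI * normJ);
    }

    // Score an unrated item from the user's OWN ratings, weighted by how
    // similar each rated item is to the candidate; NaN if nothing is similar.
    static double estimate(long user, long item, Map<Long, Map<Long, Double>> prefs) {
        double weighted = 0, totalSim = 0;
        for (Map.Entry<Long, Double> rated : prefs.get(user).entrySet()) {
            double s = cosine(item, rated.getKey(), prefs);
            weighted += s * rated.getValue();
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    // All items the user has not rated, highest estimate first.
    static List<Long> recommend(long user, Map<Long, Map<Long, Double>> prefs) {
        Set<Long> candidates = new TreeSet<>();
        for (Map<Long, Double> up : prefs.values()) candidates.addAll(up.keySet());
        candidates.removeAll(prefs.get(user).keySet());
        Map<Long, Double> scores = new TreeMap<>();
        for (long item : candidates) {
            double est = estimate(user, item, prefs);
            if (!Double.isNaN(est)) scores.put(item, est);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = Map.of(
                1L, Map.of(101L, 5.0, 102L, 1.0),
                2L, Map.of(101L, 5.0, 102L, 1.0, 103L, 5.0),
                3L, Map.of(102L, 5.0, 104L, 5.0));
        System.out.println(recommend(1L, prefs));  // [103, 104]
    }
}
```

User 1 rated item 101 highly and item 102 low. Item 103 is liked by someone who also liked 101, so it ranks above 104, whose only link is to the disliked 102.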
Data format
user,item,preference
who likes what, and how much
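FileDataModel reads this comma-separated layout directly, so no parser is needed when using Mahout. Purely to make the format concrete, here is a minimal plain-Java reader for `user,item,preference` lines into a user-to-item-to-preference map (class and method names are hypothetical; it skips blank lines and rejects lines with fewer than three fields).

```java
import java.util.*;

class PreferenceCsv {
    // Parse "user,item,preference" lines into userID -> (itemID -> preference).
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;  // skip blank lines
            String[] f = line.split(",");
            if (f.length < 3) throw new IllegalArgumentException("bad line: " + raw);
            long user = Long.parseLong(f[0].trim());    // who
            long item = Long.parseLong(f[1].trim());    // likes what
            double pref = Double.parseDouble(f[2].trim());  // and how much
            prefs.computeIfAbsent(user, k -> new TreeMap<>()).put(item, pref);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("1,10,1.0", "1,11,2.0", "2,10,5.0");
        System.out.println(parse(csv));  // {1={10=1.0, 11=2.0}, 2={10=5.0}}
    }
}
```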
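Questions:
An item recommended to a user fits the algorithm and has a high similarity score, but the user is unfamiliar with it or has never even heard of it. Is that a good recommendation or not?
When preferences are just true/false, with no relative strength of preference, how do we pick a better subset of items from the recommendation results?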
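For boolean data, Mahout provides GenericBooleanPrefDataModel and similarity measures that ignore preference values, such as TanimotoCoefficientSimilarity and LogLikelihoodSimilarity. Below is a plain-Java sketch of one possible answer to the second question, assuming each user is just a set of liked items (class and method names hypothetical): measure user similarity with the Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B|, and rank each item the target lacks by the summed similarity of the users who have it, since there are no ratings to average.

```java
import java.util.*;

class BooleanPrefSketch {
    // Tanimoto (Jaccard) coefficient of two users' item sets: |A ∩ B| / |A ∪ B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : (double) inter.size() / union.size();
    }

    // Rank items the target does not have by the summed similarity of the
    // other users who do have them.
    static List<Long> rank(long target, Map<Long, Set<Long>> likes) {
        Set<Long> mine = likes.get(target);
        Map<Long, Double> score = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : likes.entrySet()) {
            if (e.getKey() == target) continue;
            double s = tanimoto(mine, e.getValue());
            for (long item : e.getValue()) {
                if (!mine.contains(item)) score.merge(item, s, Double::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(score.keySet());
        ranked.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> likes = Map.of(
                1L, Set.of(10L, 11L, 12L),
                2L, Set.of(10L, 11L, 13L),
                3L, Set.of(12L, 14L),
                4L, Set.of(20L));
        System.out.println(rank(1L, likes));  // [13, 14, 20]
    }
}
```

Taking only the top of this ranking, or dropping items whose summed similarity is 0 (item 20 here), is one simple way to choose the subset.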
Core concerns:
Data preprocessing: make sure the data used for training is actually valuable.
Choosing and implementing the similarity algorithm.
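Why the choice of similarity matters: different measures can disagree on the same pair of users. In this plain-Java sketch (hypothetical names; the 1 / (1 + distance) form is one common way to turn a distance into a similarity and is assumed here, not taken from Mahout's EuclideanDistanceSimilarity), two users whose ratings differ by a constant offset get a perfect Pearson correlation but only a modest distance-based similarity.

```java
class SimilarityChoice {
    // Pearson correlation of two rating vectors over the same items, same order.
    // Only measures whether the ratings move together; a constant offset is ignored.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int k = 0; k < n; k++) { mx += x[k]; my += y[k]; }
        mx /= n;
        my /= n;
        double num = 0, vx = 0, vy = 0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - mx, dy = y[k] - my;
            num += dx * dy;
            vx += dx * dx;
            vy += dy * dy;
        }
        return (vx == 0 || vy == 0) ? Double.NaN : num / Math.sqrt(vx * vy);
    }

    // Distance-based similarity 1 / (1 + Euclidean distance): 1 for identical
    // ratings, approaching 0 as the absolute ratings move apart.
    static double euclidean(double[] x, double[] y) {
        double sum = 0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - y[k];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    public static void main(String[] args) {
        double[] strict = {1, 2, 3};    // a harsh rater
        double[] generous = {3, 4, 5};  // same ordering, two points higher
        System.out.println(pearson(strict, generous));  // 1.0
        System.out.println(euclidean(strict, generous));
    }
}
```

Which behavior is preferable depends on the data: Pearson suits users who rate on different personal scales, while a distance-based measure treats absolute agreement as what counts.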