Programming in C on Hadoop [Repost]

Reposted from: https://www.linuxidc.com/Linux/2012-04/58991.htm

Today I tried writing a word-count program for Hadoop in C. The steps were as follows:

1. Write the map and reduce programs

mapper.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 2048
#define DELIM '\n'

int main(int argc, char * argv[])
{
    char buffer[BUF_SIZE];

    // Read stdin line by line; fgets itself reserves room for the terminating NUL.
    while (fgets(buffer, BUF_SIZE, stdin))
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == DELIM) // strip the trailing newline
            buffer[len - 1] = '\0';

        // Emit one "word<TAB>1" record per space-separated word.
        char *query = strtok(buffer, " ");
        while (query)
        {
            printf("%s\t1\n", query);
            query = strtok(NULL, " ");
        }
    }
    return 0;
}
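The mapper splits on the space character only, so a tab inside a line would end up inside a key and collide with the tab that separates key from value. A minimal sketch of a more tolerant tokenizer, assuming that any run of ASCII whitespace separates words; the helper name emit_words is hypothetical and not part of the original program:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one "word\t1\n" record per word in `line`
 * to `out` and returns the number of words emitted.
 * Note: strtok modifies `line` in place. */
static int emit_words(char *line, FILE *out)
{
    static const char *delims = " \t\r\n"; /* assumption: whitespace-separated words */
    int n = 0;
    char *word;

    assert(line != NULL && out != NULL);
    for (word = strtok(line, delims); word != NULL; word = strtok(NULL, delims)) {
        fprintf(out, "%s\t1\n", word);
        n++;
    }
    return n;
}
```

Inside the mapper's read loop this could stand in for the inner strtok loop, for example as emit_words(buffer, stdout). Because the newline is part of the delimiter set, the separate newline-stripping step would no longer be needed.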

reducer.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 1024
#define DELIM "\t"

int main(int argc, char * argv[])
{
    char str_last_key[BUFFER_SIZE];
    char str_line[BUFFER_SIZE];
    int count = 0;

    *str_last_key = '\0';
    // Input arrives sorted by key, so equal keys sit on adjacent lines.
    while (fgets(str_line, BUFFER_SIZE, stdin))
    {
        char *str_cur_key = strtok(str_line, DELIM);
        char *str_cur_num = strtok(NULL, DELIM);

        if (str_cur_key == NULL || str_cur_num == NULL) // skip malformed lines
            continue;

        if (str_last_key[0] == '\0') // first key seen
        {
            strcpy(str_last_key, str_cur_key);
        }
        if (strcmp(str_cur_key, str_last_key)) // key changed: output the previous total
        {
            printf("%s\t%d\n", str_last_key, count);
            count = atoi(str_cur_num);
        }
        else // same key: add the current key's value
        {
            count += atoi(str_cur_num);
        }
        strcpy(str_last_key, str_cur_key);
    }
    if (str_last_key[0] != '\0') // flush the last key, unless there was no input
        printf("%s\t%d\n", str_last_key, count);
    return 0;
}
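The reducer depends on Hadoop delivering its input sorted by key: equal keys then arrive on adjacent lines, and a single running counter is enough. A sketch of that adjacent-key aggregation over an in-memory array, assuming the input is already sorted; the struct kv type and the reduce_sorted function are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: one "key\tcount" line after parsing. */
struct kv {
    const char *key;
    int count;
};

/* Collapses runs of equal adjacent keys in `in` (assumed sorted by key)
 * into `out`, summing their counts. `out` must have room for `n` entries.
 * Returns the number of distinct runs written. */
static size_t reduce_sorted(const struct kv *in, size_t n, struct kv *out)
{
    size_t i, m = 0;

    assert(in != NULL || n == 0);
    assert(out != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (m > 0 && strcmp(out[m - 1].key, in[i].key) == 0) {
            out[m - 1].count += in[i].count; /* same key: accumulate */
        } else {
            out[m++] = in[i];                /* new key: start a new total */
        }
    }
    return m;
}
```

If the input were not sorted (say the keys a, b, a), the function would write three entries instead of two. For the same reason, a local test of the two programs needs a sort step between the mapper and the reducer.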

2. Compile

gcc mapper.c -o mapper

gcc reducer.c -o reducer

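3. Run

(1) After starting Hadoop, put the file whose words are to be counted into the input directory: bin/hadoop fs -put <file-to-count> input

(2) Use the jar under contrib/streaming/ to invoke the mapper and reducer built above: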

bin/hadoop jar /home/huangkq/Desktop/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper /home/huangkq/Desktop/hadoop2/mapper -reducer /home/huangkq/Desktop/hadoop2/reducer -input input -output c_output -jobconf mapred.reduce.tasks=2

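Note: hadoop-streaming-0.20.203.0.jar is Hadoop's streaming utility. It runs the given executables as the map and reduce tasks, feeds input records to them on stdin, and reads tab-separated key/value lines back from their stdout. The mapper and reducer are referenced here by local paths, so they must exist at those paths on every node that runs a task (or be shipped with the job, for example via the streaming -file option).

(3) View the results: bin/hadoop fs -cat c_output/*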

Posted 2018-05-24 15:39 by Sky&Zhang
