Archive: January 2014
Summary:
#!/bin/bash
## File directories
######################## Local Contents ########################
# Main directory
root_dir= hadoop@bigdata03:/data/beiyou/minelab/
# Subdirectories
$root_dir/Src/liming/ /yinhang/ /shaoxianlei/
# Subdirectories
$root_dir/source_data Commmon/search_keywords.data /dat...
Summary: Data preprocessing and format conversion for crawling user IDs by nickname
#!/bin/bash
root_dir=`pwd`
out_all_file="$root_dir"/result_data/user.all
out_map="$root_dir"/result_data/name_id.map
# Remove output left over from a previous run
rm -rf "$out_all_file"
rm -rf "$out_map"
#######put the user.out in the directory $root_dir/source_data/#######
####processing the jar######################
Summary: 1. Word segmentation in Java
package com.bobo.util;
import ICTCLAS.I3S.AC.ICTCLAS50;
public class Cutwords {
    public static String Segment(String microblog) {
        String textSeg = "";
        try {
            ICTCLAS50 testICTCLAS50 = new ICTCLAS50();
            String argu = ".";
            testICTCLAS50.ICTCLAS_Init...
Summary: I. Parsing the JSON file of raw user information
#!/usr/bin/python
# -*- coding=utf-8 -*-
import os
import sys
import json
def main():
    root_dir = sys.argv[1]
    province_file = root_dir +"/conf/province.list"
    fin = open(province_file, 'r')
    provinces = set()
    for line in fin:
        province = line.strip()
        province...
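The last excerpt is cut off just as it starts filling the province set, so what the post does next is not visible here. The following is only a minimal sketch of the visible step (reading conf/province.list into a set), assuming the file holds one province name per line and is UTF-8 encoded. The helper name `load_provinces` and the sample data are hypothetical and do not come from the original post.

```python
import os
import tempfile


def load_provinces(path):
    """Read one province name per line into a set, skipping blank lines.

    Assumption: the file is UTF-8 text with one name per line.
    """
    provinces = set()
    # The with-statement closes the file even if reading fails.
    with open(path, "r", encoding="utf-8") as fin:
        for line in fin:
            province = line.strip()
            if province:
                provinces.add(province)
    return provinces


if __name__ == "__main__":
    # Hypothetical sample data standing in for <root_dir>/conf/province.list.
    with tempfile.TemporaryDirectory() as root_dir:
        os.makedirs(os.path.join(root_dir, "conf"))
        province_file = os.path.join(root_dir, "conf", "province.list")
        with open(province_file, "w", encoding="utf-8") as fout:
            fout.write("北京\n上海\n\n北京\n")
        print(sorted(load_provinces(province_file)))
```

Using a set means duplicate lines collapse to one entry and later membership checks (for example, testing whether a user's location field names a known province) stay cheap.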