Multithreading: Word Frequency Analysis - How Far Can We Really Go Series (5)

How Far Can We Really Go Series (5)

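A day without idle chatter leaves me feeling empty and lost:

　　A watermelon is not made by slicing, and a house is not made by drawing, but a life is made by grinding away at it.

　　The day before yesterday, the C++ expert with 8 years of experience who sat next to me resigned and left. A man ground down by 8 years: balding a little, a little haggard.

　　I figure that even if you learn only a little each day, 8 years is enough to work out quite a lot. He was not the extremely hard-working type, but his fondness for programming and for knowledge meant he picked up a great deal without really trying. After working alongside him for these few months, I noticed a few things about how he works:

　　1. He basically never used Google while coding. (Any googling was done before coding started, to get the general picture.)

　　2. When he hit an error, an exception, or some other problem, he went straight to the underlying implementation. I know this because he sometimes complained that the underlying source package had not been provided.

　　3. He closely followed news about the technologies he used, and then chatted with me about it...

　　4. When implementing a feature, he was mostly thinking about which design pattern would fit better. (A colleague with 5 years of experience once commented on his code: "Isn't this just Java...")

Is this how we will be programming 8 years from now? With no other advantage, all we can do is fill the gap with time.

Honestly, it feels like very few people on this site write about Java, far fewer than write about .NET at least. I read your blogs whenever I have time each day, and I have found hardly any posts about Java.

I write a few lines of code just to keep my programming instincts sharp.

Lately I have been studying English, so I wondered whether I could write a program that gathers the words from one article, or from many, and works out how often each word occurs.

After writing a simple version, I thought I might as well learn a bit of Java multithreading along the way. Hence the rather clumsy code below.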

Program structure:

Implementation:

1. Shared data class

package code.words;

import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllCountModel {

    // Number of worker threads still running
    private int threadCount;

    // Result set for each worker thread, keyed by the list of files that thread processed
    private HashMap<List<File>, Map<String, Integer>> lastMap = new HashMap<List<File>, Map<String, Integer>>();
    
    // synchronized so that a thread reading the count sees the latest value written by another thread
    public synchronized int getThreadCount() {
        return threadCount;
    }
    public synchronized void setThreadCount(int threadCount) {
        this.threadCount = threadCount;
    }
    public HashMap<List<File>, Map<String, Integer>> getLastMap() {
        return lastMap;
    }
    public void setLastMap(HashMap<List<File>, Map<String, Integer>> lastMap) {
        this.lastMap = lastMap;
    }
}

2. Worker thread

package code.words;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

public class ThreadTest extends Thread{
    
    private final List<File> files;
    private Map<String, Integer> wordsMap = new HashMap<String, Integer>();
    private final AllCountModel allCountModel;
    
    // Each thread is given its own list of files, so there is no contention on this object
    public ThreadTest(List<File> files, AllCountModel allCountModel){
        this.files = files;
        this.allCountModel = allCountModel;
    }
    
    public void run() {
        WordsAnalysis wa = new WordsAnalysis();
        // Parse every file that was passed in
        for (File file : files) {
            try {
                // Count the words in this file
                wordsMap = wa.countWords(file, wordsMap);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        // Lock the shared data (required, otherwise the shared data can be corrupted)
        synchronized (allCountModel) {
            // Update the number of running threads
            allCountModel.setThreadCount(allCountModel.getThreadCount() - 1);
            // Store this thread's result set
            allCountModel.getLastMap().put(files, wordsMap);
        }
    }
}

3. Monitor thread

package code.words;

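public class MonitorThread extends Thread{

    // Shared data
    private AllCountModel acm;
    
    public MonitorThread(AllCountModel acm){
        this.acm = acm;
    }
    
    public void run() {
        while(true){
            try {
                // Check again after a short pause
                sleep(500); 
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            boolean finished;
            // Read the count under the same lock the workers use, so their updates are visible here
            synchronized (acm) {
                finished = 0 >= acm.getThreadCount();
            }
            // All worker threads have finished
            if(finished){
                // Print the result
                WordsAnalysis.show(acm);
                return;
            }
        }
    }
}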

4. Main program

package code.words;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordsAnalysis {

    /**
     * Splits the files in the test directory into two halves, counts each half
     * on its own worker thread, and lets a monitor thread print the result.
     *
     * @param args unused
     * @throws IOException if the directory cannot be listed
     */
    public static void main(String[] args) throws IOException {
        
        File f = new File("d:\\我的文档\\test");
        File[] fs = f.listFiles();
        // listFiles() returns null when the path is not a readable directory
        if (fs == null) {
            throw new IOException("Cannot list directory: " + f);
        }
        // Split the files into two halves
        List<File> files1 = new ArrayList<File>();
        for (int i = 0; i < fs.length/2; i++) {
            files1.add(fs[i]);
        }
        List<File> files2 = new ArrayList<File>();
        for (int i = fs.length/2; i < fs.length; i++) {
            files2.add(fs[i]);
        }
        // Shared data
        AllCountModel acm = new AllCountModel();
        // Register both workers before starting either of them. Otherwise the
        // first worker could finish and decrement the count before the second
        // one is registered, and the count would never return to zero.
        acm.setThreadCount(2);
        ThreadTest tt1 = new ThreadTest(files1, acm);
        // Worker thread 1
        tt1.start();
        ThreadTest tt2 = new ThreadTest(files2, acm);
        // Worker thread 2
        tt2.start();
        MonitorThread mt = new MonitorThread(acm);
        // Monitor thread
        mt.start();
        
    }
    
    /**
     * Counts the words in one file and adds them to the given map.
     *
     * @param file the text file to read
     * @param wordsMap the running word counts, updated in place
     * @return the same map, with this file's words added
     * @throws IOException if the file cannot be read
     */
    public Map<String, Integer> countWords(File file, Map<String, Integer> wordsMap) throws IOException{
        
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
            // One line of text
            String str;
            // Read line by line
            while((str = reader.readLine()) != null ){
                str = str.trim();
                // Skip blank lines
                if(str.isEmpty()){
                    continue;
                }
                // Split on runs of whitespace, so repeated spaces do not produce empty "words"
                String[] strs = str.split("\\s+");
                for (int i = 0; i < strs.length; i++) {
                    String word = strs[i];
                    if(wordsMap.containsKey(word)){
                        // A word seen before: increment its count
                        wordsMap.put(word, (wordsMap.get(word) + 1));
                    }else{
                        // A new word seen for the first time
                        wordsMap.put(word, 1);
                    }
                }
            }
        }
        return wordsMap;
    }
    
    /**
     * Prints the result.
     * @param acm the shared result set
     */
    public static void show(AllCountModel acm){
        System.out.println(acm.getThreadCount());
        for (List<File> lists : acm.getLastMap().keySet()) {
            System.out.println(lists);
            for (String str : acm.getLastMap().get(lists).keySet()) {
                System.out.println(str + " : " + acm.getLastMap().get(lists).get(str));
            }
            System.out.println("------------------------------------------");
        }
        
    }
}

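Improvements: there is still a lot to improve. For example, the word parsing needs to be more precise, since non-word content is not filtered out. The thread design was made up entirely out of my own head, and there is surely a better thread structure that could replace it.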
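As one possible direction for both points, here is a minimal sketch rather than a definitive design. It assumes Java 8 or later, takes in-memory strings instead of files to stay short, and assumes that a "word" is a run of ASCII letters with optional inner apostrophes. The class name `WordFrequencySketch` and its methods are hypothetical names made up for this illustration. It swaps the hand-written worker and monitor threads for the standard `ExecutorService`, whose `invokeAll` waits for all tasks to finish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFrequencySketch {

    // Assumed definition of a word: ASCII letters, optionally joined by apostrophes (e.g. "dog's").
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*");

    /** Counts the words in one piece of text, ignoring case, digits, and punctuation. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts each text as a separate task on a fixed-size pool, then merges the partial results. */
    static Map<String, Integer> countAll(List<String> texts, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (String text : texts) {
                tasks.add(() -> countWords(text));
            }
            Map<String, Integer> total = new HashMap<>();
            // invokeAll blocks until every task is done, so no polling monitor thread is needed.
            for (Future<Map<String, Integer>> f : pool.invokeAll(tasks)) {
                // Each task filled its own private map; merging happens on this thread only.
                f.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> texts = Arrays.asList("The cat, the dog.", "A dog's life; the end!");
        System.out.println(countAll(texts, 2));
    }
}
```

Because every task writes only to its own map and the merge runs on the calling thread, this version needs no `synchronized` block and no shared thread counter.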

------------------------------------------------

The data sharing problem that multithreading brings, and what the synchronized keyword does:

Why does multithreading cause data conflicts?

package code.mytest;

public class Test1 extends Thread{

    private String[] strs;
    
    public Test1(String[] strs){
        this.strs = strs;
    }
    
    @Override
    public void run() {
        
        while(true){
            strs[0] = strs[0] + "A";
            System.out.println(strs[0]);
            try {
                sleep(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        
    }

    public static void main(String[] args) {
        String[] strs = new String[]{"A"};
        Test1 t1 = new Test1(strs);
        Test1 t2 = new Test1(strs);
        Test1 t3 = new Test1(strs);
        t1.start();
        t2.start();
        t3.start();
    }
}

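The output of the code above makes it intuitively clear why shared data needs care. Note that AAAA is printed before AAA, and that near the end the same string is printed twice: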

AA
AAAA
AAA
AAAAA
AAAAAA
AAAAAAA
AAAAAAAA
AAAAAAAAA
AAAAAAAAAA
AAAAAAAAAAA
AAAAAAAAAAAA
AAAAAAAAAAAAA
AAAAAAAAAAAAAA
AAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
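The root cause is that `strs[0] = strs[0] + "A"` is a read followed by a write, and another thread can slip in between the two steps. The same effect is easier to measure with a counter. The following is a minimal sketch assuming Java 8 or later, and `LostUpdateSketch` is a hypothetical class name used only for this illustration. Two threads increment one shared `int`, first without a lock and then with one.

```java
public class LostUpdateSketch {

    private int count = 0;

    /** Two threads each add 1 to the shared counter perThread times; returns the final value. */
    static int run(boolean useLock, int perThread) throws InterruptedException {
        LostUpdateSketch shared = new LostUpdateSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                if (useLock) {
                    // Only one thread at a time can run the read-add-write sequence.
                    synchronized (shared) {
                        shared.count++;
                    }
                } else {
                    // Read, add, write back: two threads can read the same old value,
                    // and one of the two increments is then lost.
                    shared.count++;
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        // join() waits for both threads and makes their writes visible to this thread.
        a.join();
        b.join();
        return shared.count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without lock: " + run(false, 100000));
        System.out.println("with lock:    " + run(true, 100000));
    }
}
```

With the lock the result is always 200000. Without it the result can vary from run to run and is often smaller, because some increments overwrite each other.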

And here is the effect after adding the synchronized keyword:

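    public void run() {
        
        while(true){
            // While one thread is working on strs, no other thread can touch it, which avoids the data conflict.
            // The print is inside the lock as well, so each thread prints exactly the value it just wrote.
            synchronized (strs) {
                strs[0] = strs[0] + "A";
                System.out.println(strs[0]);
            }
            try {
                sleep(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        
    }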

The output:

AA
AAA
AAAA
AAAAA
AAAAAA
AAAAAAA
AAAAAAAA
AAAAAAAAA
AAAAAAAAAA
AAAAAAAAAAA
AAAAAAAAAAAA
AAAAAAAAAAAAA

I hope this helps your understanding.

------------------------------------------------------------

A quote for everyone, from the film Kingdom of Heaven:

be without fear in the face of enemies
be brave and upright that god may love thee
speak the truth even if it leads to your death

----------------------------------------------------------------------

Hard work does not guarantee success, but without hard work you will certainly not succeed.
Let's encourage each other.

posted on 2012-09-02 22:27 by 每当变幻时
