[Java/CSV/Regex] Splitting quoted CSV lines with a regular expression to get the row data

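A CSV file consists of text lines whose fields are separated by commas. So that a field can itself contain commas, each field is often also wrapped in double quotes, which gives a file like this: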

"1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","傅宗龙","18","19","20"
"1","2","3","4","5.55","6","7","8","9","10","朱由检","12","13","14","15","16,666,666","17","袁崇焕","19","20"
"醉里挑灯看剑,梦回吹角连营","2","3","4","孙传庭","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"
",,,,,,,,,","2","3","4","熊廷弼","6","7","8","9","10","11","12","卢象升","14","15","16","17","18","19","20"
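
To see why the quotes matter, here is a small sketch of what a plain comma split does to a quoted field that contains commas. The class name `NaiveSplitDemo` and the shortened sample line are made up for illustration; they are not part of the original code.

```java
public class NaiveSplitDemo {
    // Splits on every comma, ignoring the quotes entirely.
    public static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Three quoted fields; the middle one contains two commas.
        String line = "\"15\",\"16,666,666\",\"17\"";
        String[] pieces = naiveSplit(line);
        // The middle field is torn apart, so more than three pieces come back.
        System.out.println(pieces.length + " pieces");
        for (String piece : pieces) {
            System.out.println(piece);
        }
    }
}
```

The three fields come back as five pieces and the quotes are still attached, so the separator has to be the whole `","` sequence rather than the comma alone.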

Parsing such a file is fairly simple; the split just has to take a few details into account. The code is as follows:

import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Parses a CSV file and converts its contents into a nested list
 * @author 逆火
 *
 * 2019-11-23 08:51:15
 */
public class CsvfileParser {
    private List<List<String>> contents;
    
    public CsvfileParser(String filename) throws IOException {
        contents = new ArrayList<List<String>>();

        // try-with-resources closes the reader even if readLine() throws
        try (LineNumberReader fileReader = new LineNumberReader(new FileReader(filename))) {
            String line;
            while ((line = fileReader.readLine()) != null) {
                System.out.println("Line " + fileReader.getLineNumber() + ": " + line);
                contents.add(getArrayFromLine(line));
            }
        }
    }
    
    private List<String> getArrayFromLine(String line) {
        List<String> retval=new ArrayList<String>();
        
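        // (^\\s*\") matches the opening quote at the start of the line; this makes the first array element a zero-length string, so the loop below skips it
        // (\"\\s*,\\s*\") matches the "," between two fields
        // (\"\\s*$) matches the closing quote at the end of the line
        String[] arr=line.split("(^\\s*\")|(\"\\s*,\\s*\")|(\"\\s*$)");
        
        for(int i=1;i<arr.length;i++) {// skip the leading empty string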
            retval.add(arr[i]);
        }
        
        return retval;
    }
    
    public void printContents() {
        for(List<String> ls:contents) {
            System.out.println(String.join("|", ls));
        }
    }
    
    public static void main(String[] args) throws IOException {
        CsvfileParser cp=new CsvfileParser("C:\\Users\\horn1\\Desktop\\sample.csv");
        System.out.println("---------------------------");
        cp.printContents();
    }
}
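
One caveat about the split above: `String.split(regex)` discards trailing empty strings, so a line whose last field is empty (for example `"a",""`) comes back one field short. The sample file has no such line, so the output is unaffected. The sketch below shows one possible workaround, assuming every field on the line is quoted: pass a limit of `-1` so nothing is discarded, then drop the empty strings produced at both ends. `TrailingEmptyFieldDemo` and `splitKeepingEmpty` are hypothetical names, not part of the original class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrailingEmptyFieldDemo {
    // Same separator pattern as in getArrayFromLine.
    public static final String SEPARATORS = "(^\\s*\")|(\"\\s*,\\s*\")|(\"\\s*$)";

    // Hypothetical variant: limit -1 keeps trailing empty strings.
    public static List<String> splitKeepingEmpty(String line) {
        String[] arr = line.split(SEPARATORS, -1);
        if (arr.length < 2) {
            // No opening/closing quote was matched (e.g. an empty line).
            return new ArrayList<String>();
        }
        // With limit -1 the array starts with the empty string before the
        // opening quote and ends with the one after the closing quote.
        return new ArrayList<String>(Arrays.asList(arr).subList(1, arr.length - 1));
    }

    public static void main(String[] args) {
        String line = "\"a\",\"\"";
        // Default limit: the empty last field is lost.
        System.out.println(line.split(SEPARATORS).length - 1);   // prints 1
        // Limit -1: both fields survive.
        System.out.println(splitKeepingEmpty(line).size());      // prints 2
    }
}
```

Lines that are not fully quoted are simply returned as an empty list here; real input may need stricter validation.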

The output is as follows:

Line 1: "1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","傅宗龙","18","19","20"
Line 2: "1","2","3","4","5.55","6","7","8","9","10","朱由检","12","13","14","15","16,666,666","17","袁崇焕","19","20"
Line 3: "醉里挑灯看剑,梦回吹角连营","2","3","4","孙传庭","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"
Line 4: ",,,,,,,,,","2","3","4","熊廷弼","6","7","8","9","10","11","12","卢象升","14","15","16","17","18","19","20"
---------------------------
1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|傅宗龙|18|19|20
1|2|3|4|5.55|6|7|8|9|10|朱由检|12|13|14|15|16,666,666|17|袁崇焕|19|20
醉里挑灯看剑,梦回吹角连营|2|3|4|孙传庭|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20
,,,,,,,,,|2|3|4|熊廷弼|6|7|8|9|10|11|12|卢象升|14|15|16|17|18|19|20
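
The split-based approach treats every `","` as a field boundary, so it has no way to handle a quote character inside a field. As an alternative sketch, the fields can be matched instead of the separators, using `Pattern` and `Matcher`. This assumes every field is quoted and that a literal quote inside a field is written as a doubled quote (`""`), which is the usual CSV convention; neither the sample file nor the original code exercises this. `QuotedFieldMatcher` and `fields` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldMatcher {
    // One quoted field: an opening quote, then any run of non-quote
    // characters or doubled quotes, then the closing quote.
    private static final Pattern FIELD = Pattern.compile("\"((?:[^\"]|\"\")*)\"");

    public static List<String> fields(String line) {
        List<String> result = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Turn each doubled quote back into a single quote.
            result.add(m.group(1).replace("\"\"", "\""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fields: 16,666,666 / (empty) / say "hi"
        String line = "\"16,666,666\",\"\",\"say \"\"hi\"\"\"";
        System.out.println(String.join("|", fields(line)));
        // prints: 16,666,666||say "hi"
    }
}
```

This sketch ignores whatever sits between the quoted fields, so it does not check that the separators are really commas.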

 

--END-- 2019-11-23 09:14:45

posted @ 逆火狂飙