[Pinned] [Original] [.NET Series] Data Cleaning and Grouping
Requirement:
Sample data:
ATLU000824
ATLU000823
ATLU000822
BSIU938615
BSIU938614
BSIU938612
BSIU938611
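We have a set of strings of fixed length: the first 4 characters are letters and the last 6 are digits. They need to be sorted into groups.
Some of them share the same 4-letter prefix, but the numbers that follow are not consecutive.
Those need to be separated.
In other words: group by the letters first. Among codes with the same letters, adjacent codes whose numbers are consecutive form one group. If the numbers are not consecutive, the codes do not belong to the same group, even when the 4 letters are identical.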
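The rule above can be sketched as a small predicate on two adjacent codes. This is only an illustration, not the code from this post: the names AdjacencyRule, Split and SameGroup are made up for the example, and it assumes the codes are already well-formed (4 letters followed by 6 digits), so no validation is done.

```csharp
using System;

static class AdjacencyRule
{
    // Splits a 10-character code such as "ATLU000824" into its 4-letter
    // prefix and its 6-digit number. Assumes the input is well-formed.
    static (string Prefix, int Number) Split(string code)
    {
        return (code.Substring(0, 4), int.Parse(code.Substring(4, 6)));
    }

    // True when two adjacent codes belong to the same group:
    // identical prefix and numbers that differ by exactly 1.
    public static bool SameGroup(string previous, string current)
    {
        var p = Split(previous);
        var c = Split(current);
        return p.Prefix == c.Prefix && Math.Abs(p.Number - c.Number) == 1;
    }

    static void Main()
    {
        Console.WriteLine(SameGroup("ATLU000824", "ATLU000823")); // True
        Console.WriteLine(SameGroup("BSIU938614", "BSIU938612")); // False: same prefix, gap of 2
        Console.WriteLine(SameGroup("ATLU000822", "BSIU938615")); // False: different prefix
    }
}
```

A difference of exactly 1 in either direction counts as consecutive here, so the sketch accepts both ascending and descending lists.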
Expected result:
Implementation language: C# (.NET)
Implementation:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Click handler; it belongs inside the Form class that owns textBox1, textBox2 and button1.
private void button1_Click(object sender, EventArgs e)
{
    textBox2.Clear();
    List<List<string>> category = new List<List<string>>();
    string pattern = @"^([A-Z]{4})(\d{6})$";
    string last4Prefix = string.Empty;
    int last6Number = 0;
    string[] lines = textBox1.Lines;

    for (int i = 0; i < lines.Length; i++)
    {
        string code = lines[i].Trim();
        Match match = Regex.Match(code, pattern);
        if (!match.Success)
        {
            MessageBox.Show("Invalid serial number format!");
            return;
        }

        string prefix = match.Groups[1].Value;
        int number = int.Parse(match.Groups[2].Value);
        List<string> codelist;

        // Start a new group when the prefix changes or the number is not
        // adjacent (+1 or -1) to the previous one.
        if ((last4Prefix != prefix) || ((number != last6Number + 1) && (number != last6Number - 1)))
        {
            codelist = new List<string>();
            category.Add(codelist);
        }
        else
        {
            codelist = category[category.Count - 1];
        }

        codelist.Add(code);
        last4Prefix = prefix;
        last6Number = number;
    }

    // Print each group, followed by a separator line.
    foreach (List<string> codelist in category)
    {
        foreach (string code2 in codelist)
        {
            textBox2.AppendText(code2 + "\r\n");
        }
        textBox2.AppendText("--------------------\r\n");
    }
}
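To try the grouping without a form, the same logic can be pulled out into a plain function. The following is a minimal sketch under that assumption: CodeGrouper and GroupCodes are hypothetical names, and throwing a FormatException on a bad line is a choice made for this example (the handler above shows a message box instead).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CodeGrouper
{
    // Same pattern as in the handler: 4 uppercase letters, then 6 digits.
    static readonly Regex CodePattern = new Regex(@"^([A-Z]{4})(\d{6})$");

    // Walks the lines in order and starts a new group whenever the prefix
    // changes or the number is not adjacent (+1 or -1) to the previous one.
    public static List<List<string>> GroupCodes(IEnumerable<string> lines)
    {
        var groups = new List<List<string>>();
        string lastPrefix = string.Empty;
        int lastNumber = 0;

        foreach (string line in lines)
        {
            string code = line.Trim();
            Match m = CodePattern.Match(code);
            if (!m.Success)
                throw new FormatException("Invalid code: " + code);

            string prefix = m.Groups[1].Value;
            int number = int.Parse(m.Groups[2].Value);

            bool continuesRun = prefix == lastPrefix
                && Math.Abs(number - lastNumber) == 1;
            if (!continuesRun)
                groups.Add(new List<string>());

            groups[groups.Count - 1].Add(code);
            lastPrefix = prefix;
            lastNumber = number;
        }
        return groups;
    }

    static void Main()
    {
        string[] sample =
        {
            "ATLU000824", "ATLU000823", "ATLU000822",
            "BSIU938615", "BSIU938614", "BSIU938612", "BSIU938611"
        };

        // Prints one group per line:
        // ATLU000824, ATLU000823, ATLU000822
        // BSIU938615, BSIU938614
        // BSIU938612, BSIU938611
        foreach (List<string> group in GroupCodes(sample))
            Console.WriteLine(string.Join(", ", group));
    }
}
```

Like the handler, this only compares each code with the one directly before it, so the input is assumed to arrive already ordered. Unsorted input would be split into more groups than necessary.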
Demo code download:
Self-extracting archive: 数据清洗.exe