浇铸


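This is part of the code from a case where I used regular expressions to split a string.

What it does: extract several data fields from a single logistics information string.

The fields to extract are:

origin, destination, goods quantity, goods unit, goods type, number of vehicles, vehicle length, vehicle type, price, price unit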
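One way to picture the target of the extraction is a plain data class. The sketch below is only an illustration: the property names are taken from the doc comment of the method shown later in this post, while the types are assumptions inferred from how that method assigns them, and the real KC_INFO class is not shown here. The values in `Sample6` are filled in by hand from sample 6 in the list of example strings.

```csharp
using System;

// Hypothetical stand-in for the real KC_INFO class, which this post does not show.
// Property names follow the method's doc comment; the types are assumptions.
public class KC_INFO
{
    public string KCI_START_ADDRESS { get; set; } // origin
    public string KCI_END_ADDRESS { get; set; }   // destination
    public int KCP_GOODS_ID { get; set; }         // goods type ID
    public double KCI_GOODS_NUMBER { get; set; }  // goods quantity
    public string KCI_GOODS_UNIT { get; set; }    // goods unit
    public int KCP_CAR_ID { get; set; }           // vehicle type ID
    public string KCI_CAR_NUMBER { get; set; }    // number of vehicles (kept as text)
    public double KCI_CAR_LENGTH { get; set; }    // vehicle length in meters
    public double KCI_PRICE { get; set; }         // price
    public string KCI_PRICE_UNIT { get; set; }    // price unit
}

public static class KcInfoDemo
{
    // Hand-filled values for "湖南长沙->湖南益阳安化县,有1吨货物,求2-3米车,急运".
    public static KC_INFO Sample6()
    {
        return new KC_INFO
        {
            KCI_START_ADDRESS = "湖南长沙",
            KCI_END_ADDRESS = "湖南益阳安化县",
            KCI_GOODS_NUMBER = 1,
            KCI_GOODS_UNIT = "吨"
        };
    }

    public static void Main()
    {
        KC_INFO info = Sample6();
        Console.WriteLine(info.KCI_START_ADDRESS + " -> " + info.KCI_END_ADDRESS);
        Console.WriteLine(info.KCI_GOODS_NUMBER + info.KCI_GOODS_UNIT);
    }
}
```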

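Examples of the strings to split (there are several kinds, including "truck available, seeking cargo" and "cargo available, seeking truck"):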

 

1、湖南衡阳->湖南湘潭、湖南长沙,有6米半封闭车,求1-8吨货

2、湖南长沙->河南南阳,有9.6米平板车,求货

3、湖南长沙->青海、陕西西安、陕西汉中、陕西安康,有2台6.8米高栏车,求12-25吨货

4、湖南湘潭->湖南岳阳临湘,有27-120吨重货,求1-3台9.6-17.5米车,今天定车 明天装货

5、湖南岳阳临湘->山东菏泽,有15-28吨棉花,求9.6-17米车,230元/吨,马上可以装货

6、湖南长沙->湖南益阳安化县,有1吨货物,求2-3米车,急运

7、湖南湘潭->江西赣州,有35-40吨货物,求半挂车,急运

...

 

Of course, these are only examples.

Here is the method's code in detail:

using System.Linq; // needed for Contains on int[] and on the dictionary key collection
using System.Text;
using System.Text.RegularExpressions;

#region Split the string with regular expressions
        /// <summary>
        /// Splits the string and assigns the pieces to the properties of a KC_INFO instance:
        /// KCI_START_ADDRESS,KCI_END_ADDRESS,KCP_GOODS_ID,KCI_GOODS_NUMBER,KCI_GOODS_UNIT,KCP_CAR_ID,KCI_CAR_NUMBER,KCI_CAR_LENGTH,KCI_PRICE,KCI_PRICE_UNIT
        /// </summary>
        /// <param name="ki">The KC_INFO instance to populate.</param>
        public void splitMainInfoString(KC_INFO ki)
        {
            string reg_start_address = string.Format(@"^[^->]*(?=->)");
            string reg_end_address = string.Format(@"(?<=->)[^,]*(?=,)");
            string reg_goods_number = string.Empty;
            string reg_goods_unit = string.Empty;
            string reg_goods_id = string.Empty;
            string reg_car_id = string.Empty;
            string reg_car_number = string.Empty;
            string reg_car_length = string.Empty;
            string reg_price = string.Empty;
            string reg_price_unit = string.Empty;
            // Split the string on commas
            MatchCollection ms = Regex.Matches(InfoText, @"[^,]*[^\s](?=,|$)");
            string goodsInfoString = string.Empty;
            string carInfoString = string.Empty;
            if((new int[]{2,22,5}.Contains(this.Type))) // Cargo available, seeking truck
            {
                // Goods information segment
                goodsInfoString = ms[1].ToString();
                // Vehicle information segment
                carInfoString = ms[2].ToString();
                // Matches the goods quantity
                reg_goods_number = string.Format(@"(?<=有)((\d+\.\d+)|(\d+-\d+)|\d+)(?=(吨|方|件|车|公斤|个|台))");
                // Matches the goods unit
                reg_goods_unit = string.Format(@"(?<=有((\d+\.?\d+)|(\d+-\d+)|\d+))(吨|方|件|车|公斤|个|台)");
                // Matches the goods type (the text that follows the quantity and unit)
                reg_goods_id = string.Format(@"(?<=有((\d+\.\d+)|(\d+-\d+)|(\d+))(吨|方|件|车|公斤|个|台))\w+");

 

                // Matches the number of vehicles needed
                reg_car_number = string.Format(@"(?<=求)((\d+-\d+)|(\d+))(?=台)|(?<=求)大量");
                // Matches the vehicle length
                reg_car_length = @"(?<=求*[^\d])((\d+\.\d+)|(\d+-\d+)|(\d+\.?\d{0,}-\d+\.?\d{0,})|(\d+))(?=米)";
                // Matches the vehicle type
                reg_car_id = string.Format(@"(冷藏|后八轮或前四后八|零担|\s无箱板车|本地车|驳船|60公分栏半挂)(?=车)");
            }
            else                                        // Truck available, seeking cargo
            {
                goodsInfoString = ms[2].ToString();
                carInfoString = ms[1].ToString();
                // Matches the quantity of goods wanted
                reg_goods_number = @"(?<=求)(((\d+\.?\d{0,})-(\d+\.?\d{0,}))|(\d+\.\d+)|(\d+)|大量)";
                // The goods unit is always tons (吨)
                reg_goods_unit = string.Format("吨");
                // Any goods type is accepted
                reg_goods_id = string.Empty;   
                // Matches the number of vehicles
                reg_car_number = string.Format(@"(?<=有)(\d+-\d+|\d+|大量)(?=台)");
                // Matches the vehicle length
                reg_car_length = @"(((\d+\.?\d{0,})-(\d+\.?\d{0,}))|(\d+\.\d+)|\d+)(?=米)";
                // Matches the vehicle type
                reg_car_id= @"(?<=有\d+台*米)\w+(?=车)|(?<=有*米)\w{0,}车";
            }
            }

            // Extract the origin
            ki.KCI_START_ADDRESS = r(reg_start_address);
            // Extract the destination
            ki.KCI_END_ADDRESS = r(reg_end_address);
            // Goods quantity. A captured range such as "1-8" or the word 大量 is not a
            // plain number; TryParse falls back to 0 there, where double.Parse would throw.
            double goodsNumber;
            ki.KCI_GOODS_NUMBER = double.TryParse(r(goodsInfoString, reg_goods_number), out goodsNumber) ? goodsNumber : 0;
            // Goods unit
            ki.KCI_GOODS_UNIT = r(goodsInfoString, reg_goods_unit);
            // Goods type: look up the ID that corresponds to the extracted type name
            string goodsType = r(goodsInfoString, reg_goods_id);
            if (TypeDic.goodsTypeDic.Keys.Contains(goodsType))
                ki.KCP_GOODS_ID = TypeDic.goodsTypeDic[goodsType].ID;
            else
                ki.KCP_GOODS_ID = 1; // default ID
            // Vehicle type: look up the ID for the extracted type name, defaulting to 1
            string carType = r(reg_car_id);
            if (TypeDic.carTypeDic.Keys.Contains(carType))
                ki.KCP_CAR_ID = TypeDic.carTypeDic[carType].ID;
            else
                ki.KCP_CAR_ID = 1;
            // Number of vehicles
            ki.KCI_CAR_NUMBER = r(carInfoString, reg_car_number);
            // Vehicle length (0 when it is missing or when the capture is a range)
            string len = r(carInfoString, reg_car_length);
            double carLength;
            ki.KCI_CAR_LENGTH = double.TryParse(len, out carLength) ? carLength : 0;
            if (ms.Count > 3)      // The information string has a fourth segment, which may hold the price
            {
                string priceInfoString = ms[3].ToString();
                // Matches the price
                reg_price = @"((\d+\.?\d{0,}-\d+\.?\d{0,})|(\d+\.\d+)|(\d+))(?=元)";
                // Matches the price unit
                reg_price_unit = string.Format(@"(?<=元\/)(吨|方|个|台|件|车|箱)");
                // Price (0 when it is missing or when the capture is a range)
                string price = r(priceInfoString, reg_price);
                double priceValue;
                ki.KCI_PRICE = double.TryParse(price, out priceValue) ? priceValue : 0;
                // Price unit
                ki.KCI_PRICE_UNIT = r(priceInfoString, reg_price_unit);
            }
        }
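        /// <summary>
        /// Runs the match and returns the first matching substring (empty if nothing matches).
        /// </summary>
        /// <param name="regularexpression">The pattern to match against InfoText.</param>
        /// <returns>The matched text, or an empty string.</returns>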
        private string r(string regularexpression)
        {
            return Regex.Match(InfoText, regularexpression).ToString();
        }
        private string r(string input,string regularexpression)
        {
            return Regex.Match(input, regularexpression).ToString();
        }
        #endregion
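To make the first steps of the method easier to follow, here is a minimal, self-contained sketch that applies the same three patterns (the comma split and the two address lookarounds) to sample 5 from the list above. The class and method names (`SplitDemo`, `SplitOnCommas`, `StartAddress`, `EndAddress`) are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Same pattern as in splitMainInfoString: a run of non-comma characters that
    // ends in a non-whitespace character and is followed by a comma or end of input.
    public static string[] SplitOnCommas(string infoText)
    {
        return Regex.Matches(infoText, @"[^,]*[^\s](?=,|$)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Everything before the first "->".
    public static string StartAddress(string infoText)
    {
        return Regex.Match(infoText, @"^[^->]*(?=->)").Value;
    }

    // Everything between "->" and the next comma.
    public static string EndAddress(string infoText)
    {
        return Regex.Match(infoText, @"(?<=->)[^,]*(?=,)").Value;
    }

    public static void Main()
    {
        string sample = "湖南岳阳临湘->山东菏泽,有15-28吨棉花,求9.6-17米车,230元/吨,马上可以装货";
        foreach (string part in SplitOnCommas(sample))
            Console.WriteLine(part);
        Console.WriteLine(StartAddress(sample)); // 湖南岳阳临湘
        Console.WriteLine(EndAddress(sample));   // 山东菏泽
    }
}
```

For this sample the split yields five segments: the route, the goods segment, the vehicle segment, the price segment, and a trailing remark. That is why the method reads the price from index 3 only when there are more than three segments.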

 

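At first I wanted to do the whole job with a single regular expression, but that would make the Regex hopelessly complex, even for the fields whose matching looks fairly simple in the code above.

Below is the expression I originally wrote to capture the goods type; it opens with a conditional construct, (?(...)).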

            // Capture expression for a goods type with detailed data:      (?<=,有\d{0,}\.?\d{0,}((吨|方|件|车|公斤|个|台)))\w+(?=,)
            // Capture expression for a goods type without detailed data:   (?<=,有)货物(?=,)
            (?(,有\d))(?<=,有\d{0,}\.?\d{0,}((吨|方|件|车|公斤|个|台)))\w+(?=,)|(?<=,有)货物(?=,)
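As a rough illustration of how the two commented expressions behave, the sketch below joins them with `|` and runs them against a whole information string. It leaves out the leading `(?(,有\d))` group, and the names `GoodsTypeDemo` and `GoodsType` are hypothetical. The second input in `Main` is an invented string, not one of the samples.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GoodsTypeDemo
{
    // Goods type preceded by a quantity and a unit, e.g. ",有20吨棉花,".
    const string WithDetail = @"(?<=,有\d{0,}\.?\d{0,}((吨|方|件|车|公斤|个|台)))\w+(?=,)";
    // Generic goods with no quantity, e.g. ",有货物,".
    const string WithoutDetail = @"(?<=,有)货物(?=,)";

    public static string GoodsType(string infoText)
    {
        return Regex.Match(infoText, WithDetail + "|" + WithoutDetail).Value;
    }

    public static void Main()
    {
        // Sample 6 from the list above.
        Console.WriteLine(GoodsType("湖南长沙->湖南益阳安化县,有1吨货物,求2-3米车,急运")); // 货物
        // Invented input with a single quantity and a named goods type.
        Console.WriteLine(GoodsType("湖南长沙->山东菏泽,有20吨棉花,求9.6米车"));           // 棉花
        // Sample 5: the quantity is a range, which this lookbehind does not allow for.
        Console.WriteLine(GoodsType("湖南岳阳临湘->山东菏泽,有15-28吨棉花,求9.6-17米车,230元/吨,马上可以装货").Length); // 0
    }
}
```

The last call shows one reason the single-expression approach grows quickly: every variation in the quantity (ranges, decimals, no number at all) has to be spelled out inside the lookbehind.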

This is only the code. If anyone needs annotated explanations, let me know in a reply and I will add comments.

 

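After three days of studying regular expressions, I found that what looks complicated is not necessarily complicated to use. Clearly, if I had tried to extract this data purely with program logic and string methods, the logic would have been extremely convoluted, both error-prone and inefficient.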
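For a feel of the difference, here is a small comparison for a single field, the price. The regex version uses the price pattern from the method above; the hand-written version is only a rough approximation that I sketched for contrast (it looks at the first 元 only and accepts any mix of digits, dots, and hyphens before it). The names `PriceDemo`, `PriceWithRegex`, and `PriceWithStringMethods` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PriceDemo
{
    // Regex version: a range, a decimal, or an integer directly before 元.
    public static string PriceWithRegex(string text)
    {
        return Regex.Match(text, @"((\d+\.?\d{0,}-\d+\.?\d{0,})|(\d+\.\d+)|(\d+))(?=元)").Value;
    }

    // Hand-written version: find 元, then walk left over digits, '.' and '-'.
    public static string PriceWithStringMethods(string text)
    {
        int end = text.IndexOf('元');
        if (end <= 0) return string.Empty;
        int start = end;
        while (start > 0 &&
               (char.IsDigit(text[start - 1]) || text[start - 1] == '.' || text[start - 1] == '-'))
        {
            start--;
        }
        return text.Substring(start, end - start);
    }

    public static void Main()
    {
        string segment = "230元/吨"; // the price segment of sample 5
        Console.WriteLine(PriceWithRegex(segment));         // 230
        Console.WriteLine(PriceWithStringMethods(segment)); // 230
    }
}
```

Even for this one field the loop needs its own boundary checks, and each further field (units, vehicle length, goods type) would need another hand-written scanner of the same kind.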

posted on 2010-05-10 23:16 by 浇铸