A hands-on experiment with regular expressions

"""
There is a long piece of text that needs to be parsed into a specific data format.

The parsed result should look like this:
{
    'name': 'Variopartner SICAV',
    'lei': '529900LPCSV88817QH61',
    'sub_fund': [{
        'title': 'TARENO GLOBAL WATER SOLUTIONS FUND',
        'isin': ['LU2001709034', 'LU2057889995', 'LU2001709547']
    }, {
        'title': 'TARENO FIXED INCOME FUND',
        'isin': ['LU1299722972']
    }, {
        'title': 'TARENO GLOBAL EQUITY FUND',
        'isin': ['LU1299721909', 'LU1299722113', 'LU1299722030']
    }, {
        'title': 'MIV GLOBAL MEDTECH FUND',
        'isin': ['LU0329630999', 'LU0329630130']
    }]
}
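Note: the number of entries in the sub_fund array is not fixed at 4, and the number of ISINs per entry varies as well, so the logic must be generic enough to handle up to 100 sub_funds.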
"""

long_text = """
Variopartner SICAV
529900LPCSV88817QH61

  1. TARENO GLOBAL WATER SOLUTIONS FUND
    LU2001709034
    LU2057889995
    LU2001709547
  2. TARENO FIXED INCOME FUND
    LU1299722972
  3. TARENO GLOBAL EQUITY FUND
    LU1299721909
    LU1299722113
    LU1299722030
  4. MIV GLOBAL MEDTECH FUND
    LU0329630999
    LU0329630130
  5. TARENO GLOBAL EQUITY FUND
    LU0329630999
    LU0329630130
    LU0329630999
    LU0329630130
    """
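import re

dicts = {}
lists = []
# Split on the list numbers "1. ", "2. ", ..., "100. ".
# The raw string avoids the invalid "\d" escape warning; \d+ allows
# multi-digit numbers and \. matches a literal dot, not any character.
result = re.split(r'\d+\. ', long_text)
print(result)

# The first chunk holds the name and the LEI; drop blank lines.
data1 = [line.strip() for line in result[0].split('\n') if line.strip()]
print(data1)
dicts['name'] = data1[0]
dicts['lei'] = data1[1]
print(dicts)

# Each remaining chunk is one sub-fund: the title on the first line,
# then its ISINs. Strip the indentation and drop empty lines so the
# 'isin' list contains only the codes.
data2 = result[1:]
for i in data2:
    ii = [line.strip() for line in i.split('\n') if line.strip()]
    lists.append({
        'title': ii[0],
        'isin': ii[1:]
    })

dicts['sub_fund'] = lists
print(dicts)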
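The script above mostly relies on splitting strings. As a sketch of a more regex-driven variant, the block below matches each numbered title line directly and collects ISINs with a pattern instead of by line position. The names `parse_fund_text`, `TITLE_RE` and `ISIN_RE` are hypothetical, not from the original post; the ISIN pattern assumes the usual 12-character shape (2-letter country code, 9 alphanumerics, 1 check digit) and does not verify the check digit. The 100-entry input at the end is synthetic, made up only to exercise the "up to 100 sub_funds" requirement.

```python
import re

# Hypothetical names (parse_fund_text, TITLE_RE, ISIN_RE); not from the original post.
# A numbered sub-fund header such as "  12. SOME FUND", anchored at the start of a line.
TITLE_RE = re.compile(r'^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$', re.M)
# Assumed ISIN shape: 2-letter country code, 9 alphanumerics, 1 check digit
# (the check digit itself is not validated here).
ISIN_RE = re.compile(r'\b[A-Z]{2}[A-Z0-9]{9}\d\b')


def parse_fund_text(text):
    # Header: the first two non-empty lines are the name and the LEI.
    header = [ln.strip() for ln in text.splitlines() if ln.strip()][:2]
    result = {'name': header[0], 'lei': header[1], 'sub_fund': []}

    titles = list(TITLE_RE.finditer(text))
    for idx, m in enumerate(titles):
        # A sub-fund's body runs from the end of its title line to the next title.
        end = titles[idx + 1].start() if idx + 1 < len(titles) else len(text)
        result['sub_fund'].append({
            'title': m.group(1),
            'isin': ISIN_RE.findall(text[m.end():end]),
        })
    return result


sample = """
Variopartner SICAV
529900LPCSV88817QH61

  1. TARENO GLOBAL WATER SOLUTIONS FUND
    LU2001709034
    LU2057889995
  2. TARENO FIXED INCOME FUND
    LU1299722972
"""
parsed = parse_fund_text(sample)
print(parsed)

# Synthetic input with 100 made-up sub-funds, one ISIN-shaped code each.
lines = ['Demo SICAV', '529900LPCSV88817QH61', '']
for n in range(1, 101):
    lines.append(f'  {n}. FUND NUMBER {n}')
    lines.append(f'    LU{n:010d}')
big = parse_fund_text('\n'.join(lines))
print(len(big['sub_fund']))  # → 100
```

Anchoring the title pattern to the start of a line (via `re.M`) means a digit followed by a dot in the middle of a fund name does not start a new record, and collecting ISINs with `findall` keeps indentation and blank lines out of the `isin` list regardless of how many sub-funds or codes there are.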

Posted by 淡然。。