JSON Parsing: JsonPath

JSON Parsing

Given a JSON string, extracting part of its content normally means traversing the parsed structure and testing each node along the way.
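To make the traversal approach concrete, here is a minimal sketch using only the standard library. It assumes a shortened copy of the sample document shown later in this article, and it assumes the structure (store, then book) is known in advance.

```python
import json

# Shortened copy of the sample document shown later in this article.
text = '''
{"store": {"book": [
    {"category": "reference", "author": "Nigel Rees",
     "title": "Sayings of the Century", "price": 8.95},
    {"category": "fiction", "author": "Evelyn Waugh",
     "title": "Sword of Honour", "price": 12.99}
]}}
'''

data = json.loads(text)

# Walk the known structure by hand and test each book as we go.
cheap_titles = []
for book in data["store"]["book"]:
    if book["price"] < 10:
        cheap_titles.append(book["title"])

print(cheap_titles)  # ['Sayings of the Century']
```

This works, but every new question about the data needs another hand-written loop.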

An alternative, analogous to XPath for XML, is JsonPath.

$ pip install jsonpath

Official site: https://goessner.net/articles/JsonPath/

Syntax and examples: https://github.com/json-path/JsonPath

https://github.com/alibaba/fastjson/wiki/JSONPath

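XPath    JSONPath            Description
/        $                   Root node
.        @                   Current node
/        . or []             Child node
..       not supported       Parent node
//       ..                  Any depth below (recursive descent)
*        *                   Wildcard; matches any node
@        not supported       Attribute
[]       []                  Subscript (index)
|        [,]                 Union. In XPath it combines node sets; in JSONPath it also allows alternative names or array indices as a set
n/a      [start:end:step]    Slice (not supported in XPath)
[]       ?()                 Filter condition
n/a      ()                  Expression (not supported in XPath)
()       not supported       Grouping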
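The slice notation [start:end:step] reads much like Python list slicing, so one way to build intuition is to try the same slices on a plain list. The sketch below is only an analogy under that assumption; it does not call any JSONPath library, and the list items are placeholders.

```python
books = ["book0", "book1", "book2", "book3"]  # placeholder items

first_two = books[:2]     # compare with a slice such as $..book[:2]
last_one = books[-1:]     # compare with a slice such as $..book[-1:]
every_other = books[::2]  # a step of 2

print(first_two)    # ['book0', 'book1']
print(last_one)     # ['book3']
print(every_other)  # ['book0', 'book2']
```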

Sample document

{ "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}
XPath                   JSONPath                    Description
/store/book/author      $.store.book[*].author      The author of every book under store at the root
//author                $..author                   All authors
/store/*                $.store.*                   All nodes directly under store at the root
/store//price           $.store..price              Every price at any depth under store
//book[3]               $..book[2]                  The third book, at any depth
//book[last()]          $..book[(@.length-1)]       The last book in order
                        $..book[-1:]
//book[position()<3]    $..book[0,1]                The first two books
                        $..book[:2]
//book[isbn]            $..book[?(@.isbn)]          Books that have an isbn
//book[price<10]        $..book[?(@.price<10)]      Books with price < 10
//*                     $..*                        All nodes at every level
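To check what some of these expressions select, the sketch below spells out rough plain-Python equivalents against the sample document. It does not use a JSONPath library: find_all is a hypothetical helper written for this illustration, and a real implementation may return matches in a different order.

```python
import json

doc = json.loads("""
{"store": {
  "book": [
    {"category": "reference", "author": "Nigel Rees",
     "title": "Sayings of the Century", "price": 8.95},
    {"category": "fiction", "author": "Evelyn Waugh",
     "title": "Sword of Honour", "price": 12.99},
    {"category": "fiction", "author": "Herman Melville",
     "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
    {"category": "fiction", "author": "J. R. R. Tolkien",
     "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99}
  ],
  "bicycle": {"color": "red", "price": 19.95}
}}
""")


def find_all(node, key):
    """Hypothetical helper: collect every value stored under `key` at any depth."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


books = doc["store"]["book"]

authors = [b["author"] for b in books]         # $.store.book[*].author
all_authors = find_all(doc, "author")          # $..author
store_prices = find_all(doc["store"], "price") # $.store..price
third_book = books[2]                          # $..book[2]
with_isbn = [b for b in books if "isbn" in b]  # $..book[?(@.isbn)]
cheap = [b for b in books if b["price"] < 10]  # $..book[?(@.price<10)]

print(authors)
print(store_prices)                    # [8.95, 12.99, 8.99, 22.99, 19.95]
print(third_book["title"])             # Moby Dick
print([b["title"] for b in with_isbn]) # ['Moby Dick', 'The Lord of the Rings']
print([b["title"] for b in cheap])     # ['Sayings of the Century', 'Moby Dick']
```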

As before, the example uses the JSON for Douban's trending movies:
https://movie.douban.com/j/search_subjects?type=movie&tag=热门&page_limit=10&page_start=0

Find the movies rated above 8:

from jsonpath import jsonpath
import requests
import json

url = """https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&page_limit=10&page_start=0"""
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'

with requests.get(url=url, headers={'user-agent': user_agent}) as res:
    print(res.status_code)
    x = json.loads(res.text)
    
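    # All titles
    print(jsonpath(x, '$..title'))
    # title and rate of the entries whose rate is above 8
    # (rate is a string here, so this is a string comparison; '8.0' also matches)
    print(jsonpath(x, '$..subjects[?(@.rate>"8")].title,rate'))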
    
    
    
"""
200
['不要抬头', '杰伊·比姆', '希德尼娅的骑士 编织爱的行星', '温柔酒吧', '弥撒', '黑客帝国:矩阵重启', '魔法满屋', '法兰西特派', '沙丘', '铁道英雄']
['杰伊·比姆', '8.7', '希德尼娅的骑士 编织爱的行星', '8.5']
"""

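jsonpath() must be given a Python object, such as a dict, rather than a JSON string.

The scraped response is a string, so convert it to a Python object with json.loads() first.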
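A minimal illustration of that conversion, using only the standard library. The response body below is made up; only the key names (subjects, title, rate) mirror the query used above.

```python
import json

# Made-up response body; only the key names follow the query above.
raw = '{"subjects": [{"title": "example", "rate": "8.1"}]}'

print(type(raw).__name__)     # str
parsed = json.loads(raw)
print(type(parsed).__name__)  # dict
print(parsed["subjects"][0]["rate"])  # 8.1
```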

Tip

a = ['晒后假日', '8.1', '伊尼舍林的报丧女妖', '7.9', '鱼之子', '8.2', '夜枭', '7.1', '今夜,就算这份爱恋从世界上消失','7.6', '核磁共振', '7.8', '亲密', '8.1', '乐土', '8.6', '晨光正好', '7.5', '穿靴子的猫2', '8.1', '塔尔', '7.4','造梦之家', '7.5', '西线无战事', '8.5', '上帝的笔误', '7.3']
a1 = [(a[i], a[i + 1]) for i in range(0, len(a), 2)]
print(a1)
"""
Format the scraped data into (title, rate) pairs

[('晒后假日', '8.1'), ('伊尼舍林的报丧女妖', '7.9'), ('鱼之子', '8.2'), ('夜枭', '7.1'), ('今夜,就算这份爱恋从世界上消失', '7.6'), ('核磁共振', '7.8'), ('亲密', '8.1'), ('乐土', '8.6'), ('晨光正好', '7.5'), ('穿靴子的猫2', '8.1'), ('塔尔', '7.4'), ('造梦之家', '7.5'), ('西线无战事', '8.5'), ('上帝的笔误', '7.3')]
"""
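The same pairing can also be sketched with zip over two strided slices. This is an alternative for illustration, not the method used above; the list here is just the first few entries of the one above.

```python
# First few entries of the flat list above.
flat = ['晒后假日', '8.1', '伊尼舍林的报丧女妖', '7.9', '鱼之子', '8.2']

# Even positions are titles, odd positions are ratings.
pairs = list(zip(flat[::2], flat[1::2]))
print(pairs)
# [('晒后假日', '8.1'), ('伊尼舍林的报丧女妖', '7.9'), ('鱼之子', '8.2')]
```

One difference to keep in mind: zip silently drops a trailing item that has no partner, while the index-based comprehension raises IndexError on a list of odd length.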

Improved version

import json
from jsonpath import jsonpath
import requests


def request_json(url):
    # Download the response and save it to rp.json
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    }
    with requests.get(url=url, headers=headers) as res:
        if res.status_code == 200:
            with open("rp.json", "wb") as f:
                f.write(res.content)


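def run():
    # Read the saved response and parse it into a Python object
    with open("rp.json", "r", encoding="utf8") as f:
        content = json.load(f)
    # The result is a flat list: [title, rate, title, rate, ...]
    res = jsonpath(content, "$..subjects[?(@.rate>'8')].title,rate")
    # Skip printing when nothing matched
    if res:
        print([(res[i], res[i + 1]) for i in range(0, len(res), 2)])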


if __name__ == '__main__':
    url = "https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&page_limit=10&page_start=0"
    # request_json(url)  # uncomment on the first run to create rp.json
    run()
    
"""
[('巴比伦', '8.0'), ('晒后假日', '8.1'), ('鱼之子', '8.2')]
"""
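Note that '8.0' appears in the output even though the goal was a rating above 8. The filter @.rate>'8' compares strings, not numbers. The sketch below contrasts the two comparisons in plain Python, without a JSONPath library; the first three entries reuse titles and ratings shown in this article, and the last entry is hypothetical, added only to show the edge case.

```python
subjects = [
    {"title": "巴比伦", "rate": "8.0"},
    {"title": "晒后假日", "rate": "8.1"},
    {"title": "夜枭", "rate": "7.1"},
    {"title": "hypothetical", "rate": "10.0"},  # made-up entry for the edge case
]

# String comparison, as in the filter above: character by character.
by_string = [s["title"] for s in subjects if s["rate"] > "8"]

# Numeric comparison after converting the rating to float.
by_number = [s["title"] for s in subjects if float(s["rate"]) > 8]

print(by_string)  # ['巴比伦', '晒后假日']
print(by_number)  # ['晒后假日', 'hypothetical']
```

Converting to float before comparing avoids both surprises: '8.0' is no longer treated as above 8, and a two-digit rating is no longer treated as below it.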
posted @ 2023-02-19 14:28  厚礼蝎