lxinghua


CSV storage

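1. A CSV file stores tabular data as plain text, which makes it a common storage format for large data files; its file structure differs from that of an Excel workbook.

2. CSV is a generic, relatively simple file format that is widely used in consumer, business, and scientific applications. Its most common use is moving tabular data between programs that natively operate on incompatible formats (often proprietary and/or undocumented ones).

3. This works because a large number of programs support some variant of CSV, at least as an optional import/export format.

4. For example, a user may need to transfer information from a database program that stores data in a proprietary format to a spreadsheet that uses a completely different format. Most likely the database program can export its data as CSV, and the exported CSV file can then be imported by the spreadsheet program.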
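As a rough sketch of the exchange described in point 4, the snippet below plays both roles in memory: one side exports rows as CSV text and the other side imports that text again. The sample rows and the helper names `export_csv` and `import_csv` are made up for illustration. A field that contains a comma survives the trip because the `csv` module quotes it.

```python
import csv
import io


def export_csv(headers, rows):
    """Serialize a header row plus data rows to CSV text."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()


def import_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample data; the second name contains a comma on purpose.
rows = [("Li Si", 18, 180), ("Zhang, San", 19, 175)]
text = export_csv(("name", "age", "height"), rows)
records = import_csv(text)
print(records[1]["name"])
```

Note that CSV carries no type information, so every value comes back as a string and numeric columns have to be converted by the importing side.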

import csv

headers = ("name", "age", "height")

students = [
    ("李四", 18, 180),
    ("张三", 18, 180),
    ("张三", 18, 180),
    ("张三", 18, 180),
    ("张三", 18, 180),
    ("张三", 18, 180),
]

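# with open("students.csv", "a+", encoding="utf-8", newline="") as fp:
#     write = csv.DictWriter(fp, headers)
#
#     write.writeheader()  # write the header row
#     # DictWriter expects dicts, so pair each tuple with the headers
#     write.writerows(dict(zip(headers, s)) for s in students)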


with open("students.csv", "w+", encoding="utf-8", newline="") as fp:
    write = csv.writer(fp)
    write.writerow(headers)     # writerow writes a single row (one tuple)


with open("students.csv", "a", encoding="utf-8", newline="") as fp:
    write = csv.writer(fp)
    write.writerows(students)   # writerows writes several rows (a list of tuples)
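The `csv.DictWriter` variant that is commented out above needs each row as a dict, not a tuple. Here is a minimal sketch of that variant together with reading the file back through `csv.DictReader`. The temporary path and the shortened student list are assumptions made so the example does not touch `students.csv`.

```python
import csv
import os
import tempfile

headers = ("name", "age", "height")
students = [("李四", 18, 180), ("张三", 18, 180)]

# Write to a temporary directory so the sketch leaves students.csv alone.
path = os.path.join(tempfile.mkdtemp(), "students_demo.csv")

with open(path, "w", encoding="utf-8", newline="") as fp:
    writer = csv.DictWriter(fp, fieldnames=headers)
    writer.writeheader()
    # DictWriter takes mappings, so pair each tuple with the header names.
    writer.writerows(dict(zip(headers, row)) for row in students)

with open(path, encoding="utf-8", newline="") as fp:
    # DictReader yields strings only; convert the numeric columns by hand.
    loaded = [
        {"name": r["name"], "age": int(r["age"]), "height": int(r["height"])}
        for r in csv.DictReader(fp)
    ]

print(loaded)
```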

MySQL database storage

Basic MySQL commands

1. Log in to the server: mysql -uusername -ppassword

2. List databases: show databases;

3. Create a database: create database database_name;

4. Select a database: use database_name;

5. Create a table: create table if not exists table_name(column1 type attributes, column2 type attributes, ...);

6. View all rows: select * from table_name;

7. Insert a row: insert into table_name(column1, column2, ...) values(value1, value2, ...);
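Statements 5 to 7 can be tried without a MySQL server. The sketch below swaps in Python's built-in `sqlite3` module as a stand-in, so it is not MySQL itself: column types are simplified and the placeholder style is `?`, where pymysql uses `%s`. The `photos` columns follow the insert statement used later in this post; the `id` column and the sample rows are assumptions.

```python
import sqlite3

# In-memory database: nothing is written to disk and no server is needed.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Statement 5: create the table if it does not exist yet.
cursor.execute(
    "create table if not exists photos("
    "id integer primary key, title text, href text, img_url text)"
)

# Statement 7: insert rows (sqlite3 uses ? placeholders).
rows = [
    ("first", "https://example.com/1", "./imgs/1.png"),
    ("second", "https://example.com/2", "./imgs/2.png"),
]
cursor.executemany(
    "insert into photos(title, href, img_url) values(?, ?, ?)", rows
)
conn.commit()

# Statement 6: read everything back.
cursor.execute("select * from photos")
result = cursor.fetchall()
print(result)

conn.close()
```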

Connecting to MySQL from Python

import pymysql
# Connect to MySQL
conn = pymysql.connect(user="root", password="root", host="localhost", port=3306, database="maqu", charset="utf8mb4")

# Get a cursor
cursor = conn.cursor()
# Insert a single row
sql = "insert into photos(title,href,img_url) values(%s,%s,%s)"

data = (
    "this a title",
    "https://www.baidu.com",
    "https://www.baidu.com/imgs/asdfjalsdhflksahdfk.jpg"

)
cursor.execute(sql, data)
conn.commit()  # commit the transaction

# Full example: scrape image data and store it in MySQL
import requests
import re
import json
import hashlib
from bs4 import BeautifulSoup
import pymysql

url = "https://www.huashi6.com/"

document = requests.get(url).text

bs = BeautifulSoup(document, "html.parser")
items = bs.select("div.c-section-waterfall div.c-section-work-item")

photos = []  # collects one (title, href, file path) tuple per image
for item in items:
    try:
        a = item.select_one("div.waterfall-img a")
        title, href = a["title"], a["href"]

        document2 = requests.get(href).text

        # Use a regular expression to pull the JSON out of the script tag
        match_obj = re.search(r'<script type="application/ld\+json">(.*?)</script>', document2, re.S)
        json_str = match_obj.group(1).strip("\n\r\t ")
        img_url = "https:" + json.loads(json_str)["images"][0].split("?")[0]

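        # Download the image (assumes the ./imgs/ directory already exists)
        content = requests.get(img_url).content
        file_name = hashlib.md5(img_url.encode("utf-8")).hexdigest()
        file_dir = "./imgs/" + file_name + ".png"

        print("Downloading: ", img_url)
        with open(file_dir, "wb") as fp:
            fp.write(content)

        # Append the row to the list
        photos.append((title, href, file_dir))
    except Exception as e:
        # Skip items that fail to parse or download, but report why
        print("Skipped item:", e)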

"""

photos = [
    ("asdfa", "adsfasd.com", "asdlfjals;.jpg"),
    ("asdfa", "adsfasd.com", "asdlfjals;.jpg"),
    ("asdfa", "adsfasd.com", "asdlfjals;.jpg"),
    ("asdfa", "adsfasd.com", "asdlfjals;.jpg"),
]

"""

# Connect to MySQL
conn = pymysql.connect(user="root", password="root", host="localhost", port=3306, database="maqu", charset="utf8mb4")

# Get a cursor
cursor = conn.cursor()
# Parameterized insert statement
sql = "insert into photos(title,href,img_url) values(%s,%s,%s)"

# Insert a single row
# data = (
#     "this a title",
#     "https://www.baidu.com",
#     "https://www.baidu.com/imgs/asdfjalsdhflksahdfk.jpg"
#
# )
# cursor.execute(sql, data)


# Insert multiple rows
try:
    cursor.executemany(sql, photos)
    conn.commit()  # commit the transaction
except Exception as e:
    print(e)
    conn.rollback()


cursor.close()
conn.close()
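The JSON-LD extraction step in the scraper above can be tried offline. The sketch below runs the same regular expression against a made-up HTML snippet; the sample markup, the image URL, and the helper names `extract_image_url` and `local_path` are assumptions, and the real page layout may differ. Unlike the scraper, it returns `None` explicitly when the script tag or the image list is missing, where the scraper falls into its `except` branch.

```python
import hashlib
import json
import re

# Hypothetical detail-page HTML; the real page may be structured differently.
sample_html = """
<html><head>
<script type="application/ld+json">
{"title": "demo", "images": ["//img.example.com/a/b.jpg?x-oss-process=resize"]}
</script>
</head></html>
"""

LD_JSON = re.compile(r'<script type="application/ld\+json">(.*?)</script>', re.S)


def extract_image_url(html):
    """Return the first image URL from the page's JSON-LD block, or None."""
    match = LD_JSON.search(html)
    if match is None:
        return None
    data = json.loads(match.group(1).strip())
    images = data.get("images") or []
    if not images:
        return None
    # Drop the query string and add a scheme to the protocol-relative URL.
    return "https:" + images[0].split("?")[0]


def local_path(img_url):
    """Derive a stable file name from the MD5 hex digest of the URL."""
    digest = hashlib.md5(img_url.encode("utf-8")).hexdigest()
    return "./imgs/" + digest + ".png"


img_url = extract_image_url(sample_html)
print(img_url)
print(local_path(img_url))
```

Hashing the URL gives every image a fixed-length, filesystem-safe name, and downloading the same URL twice overwrites the same file.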

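MongoDB database storage

1. Start MongoDB on Ubuntu: sudo service mongodb start; stop it: sudo service mongodb stop

2. Open the MongoDB shell: mongo

3. List databases: show dbs

4. Switch the current database to test: use test [test is the database name and can be changed; if the database does not exist, it is created once data is first written to it]

5. List the collections in the current database: show collections

6. Create a collection: db.createCollection('collection_name') [a collection that does not exist yet is created automatically the first time it is used]

7. Find data: db.data.find() [data is the collection name]

8. Insert data: db.data.insert({'x': 1, 'y': 2})  [the inserted data must be a document, written like a dict]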

from pymongo import MongoClient

client = MongoClient("127.0.0.1", 27017)

# Get the database (it is created lazily on the first write)
maqu = client.maqu
music = maqu.music  # collection

# Insert documents
# music.insert_one({"title": "Using MongoDB"})
# music.insert_one({"name": "friendship"})


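# Fetch data

# rv = music.find()  # fetch all documents; returns a cursor object
# for item in rv:
#     print(item)

# print(list(rv))  # convert the cursor to a list

# Fetch a single document
# rv = music.find_one()  # returns a dict, or None if nothing matches
# print(rv, type(rv), rv["title"])

# rv = music.find_one({"name": "friendship"})  # filtered query, single result
# print(rv)

# rv = music.find({"name": "friendship"})  # filtered query, multiple results
# print(list(rv))

# Second example: database and collection names held in variables
from pymongo import MongoClient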
import datetime

client = MongoClient("127.0.0.1", 27017)

db = "maqushop"

maqushop = client[db]  # subscript access works when the database name is held in a variable

# Get a test collection named test_collection
collect_name = "test_collection"
collect = maqushop[collect_name]

# Build a document
# post = {"author": "Mike",
#         "text": "My first blog post!",
#         "tags": ["mongodb", "python", "pymongo"],
#         "date": datetime.datetime.utcnow()}
# obj_id = collect.insert_one(post).inserted_id
#
# print(obj_id)

# Insert multiple documents

posts = [
    {"author": "Mike",
     "text": "My first blog post!",
     "tags": ["mongodb", "python", "pymongo"],
     "date": datetime.datetime.utcnow()},
    {"author": "friendship",
     "text": "My first blog post!",
     "tags": ["mongodb", "python", "pymongo"],
     "date": datetime.datetime.utcnow()},
    {"author": "yuer",
     "text": "My first blog post!",
     "tags": ["mongodb", "python", "pymongo"],
     "date": datetime.datetime.utcnow()},
]

ids = collect.insert_many(posts).inserted_ids
# print(ids)
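Once documents like the ones above are stored, they can be queried by any field, including array fields such as `tags`. The sketch below is one possible way to wrap that; `build_post` and `authors_with_tag` are hypothetical helper names, not part of pymongo. The functions accept any object that offers a pymongo-style `find(filter, projection)` method, so the block itself does not need a running server.

```python
import datetime


def build_post(author, text, tags):
    """Return a document (a plain dict) ready for insert_one / insert_many."""
    return {
        "author": author,
        "text": text,
        "tags": list(tags),
        # Timezone-aware UTC timestamp.
        "date": datetime.datetime.now(datetime.timezone.utc),
    }


def authors_with_tag(collection, tag):
    """Return the sorted, de-duplicated authors whose documents carry `tag`."""
    # Matching a scalar against an array field matches any array element.
    # The projection keeps only the author field and drops _id.
    cursor = collection.find({"tags": tag}, {"author": 1, "_id": 0})
    return sorted({doc["author"] for doc in cursor})
```

With the collection from the script above, a call could look like `authors_with_tag(collect, "python")`, and `collect.insert_one(build_post("Mike", "Hello", ["mongodb"]))` would store a new document.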

 

posted on 2023-03-15 20:05  興華