pymysql通过DBUtils实现连接池技术

DBUtils 是一套 Python 数据库连接池包，并允许对非线程安全的数据库接口进行线程安全包装。

一、安装

pip install DBUtils

二、导入模块

# 针对不同版本，可能导入方式存在差别
try:
    from dbutils.pooled_db import PooledDB
    from dbutils.persistent_db import PersistentDB
except:
    from DBUtils.PooledDB import PooledDB
    from DBUtils.PersistentDB import PersistentDB

三、使用

1、DBUtils提供两种外部接口：

PersistentDB：

为每个线程都会创建一个新的数据库连接，并始终服务于该特定线程，线程即使调用了close方法，也不会关闭，只是把连接重新放到连接池，供自己线程再次使用。当线程终止时，连接自动关闭
通过重用数据库连接来提高数据库的访问性能，同时确保了线程间不共享数据库连接（当线程数一定的情况下，推荐使用）
注意：避免短时间内创建多个线程（创建多个线程时，可使用延时操作）
实例化时常用参数设置：
```
import pymysql
from dbutils.persistent_db import PersistentDB

pool = PersistentDB(pymysql, host=MYSQL_HOST, port=MYSQL_PORT, user=MYSQL_USER, db=MYSQL_DATABASE, password=MYSQL_PASSWORD, charset='utf8mb4')
```
第一个参数指定的是连接数据库的模块pymysql，其它使用默认参数（maxusage=None或者0表示每个连接的使用次数不受限，ping=1表示每次获取连接时，检查连接是否可用...），后面都是使用pymysql中的参数

示例

import threading
import time
import requests
from bs4 import BeautifulSoup
import pymysql
from dbutils.persistent_db import PersistentDB

# 数据库连接参数
MYSQL_HOST = 'your_host'
MYSQL_PORT = 3306
MYSQL_USER = 'your_user'
MYSQL_PASSWORD = 'your_password'
MYSQL_DATABASE = 'your_database'

# 创建持久连接池
pool = PersistentDB(pymysql,
                    maxusage=None,  # 一个链接最多被重复使用的次数，None表示无限制
                    setsession=[],  # 开始会话前执行的命令列表。如：["set datestyle to ...", "set time zone ..."]
                    ping=1, # 每次使用连接之前都会检查连接是否有效。如果连接无效，连接池会自动尝试重新建立连接，默认值：1
                    host=MYSQL_HOST,
                    port=MYSQL_PORT,
                    user=MYSQL_USER,
                    password=MYSQL_PASSWORD,
                    database=MYSQL_DATABASE,
                    charset='utf8mb4')

# 定义一个函数，用于从网页提取数据
def extract_data_from_url(url):
    response = requests.get(url)
    response.raise_for_status()  # 检查HTTP请求是否成功
    soup = BeautifulSoup(response.content, 'html.parser')
    # 根据网页结构提取数据
    # 例如，提取标题和链接：
    title = soup.find('h1').text.strip()
    link = url
    return title, link

def execute_with_retry(cursor, sql, params):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            cursor.execute(sql, params)
            return True
        except pymysql.OperationalError as e:
            if attempt == max_retries - 1:
                raise
            print(f"Database operation failed, retrying... (Attempt {attempt + 1})")
            time.sleep(1)  # 等待一秒后重试

def crawl_and_insert_data(thread_id, urls):
    conn = None
    cursor = None
    try:
        conn = pool.connection()
        cursor = conn.cursor()
        for url in urls:
            try:
                title, link = extract_data_from_url(url)

                # 执行插入操作，使用重试机制
                sql = "INSERT INTO your_table (title, link) VALUES (%s, %s)"
                if execute_with_retry(cursor, sql, (title, link)):
                    conn.commit()
                    print(f"Thread {thread_id} inserted data: {title} - {link}")
                else:
                    print(f"Thread {thread_id} failed to insert data for {url}")
            except requests.RequestException as e:
                print(f"Thread {thread_id} failed to fetch {url}: {e}")
                continue
            except Exception as e:
                print(f"Thread {thread_id} encountered an error processing {url}: {e}")
                conn.rollback()
                continue

    except Exception as e:
        print(f"Thread {thread_id} encountered a database error: {e}")
    finally:
        if cursor:
            cursor.close()
        if conn:
            conn.close()

# 创建 URL 列表
urls = [f"https://example.com/page/{i}" for i in range(1, 101)]  # 假设有 100 个页面

# 将 URL 列表分成 5 份
chunk_size = len(urls) // 5

url_chunks = [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

# 创建和启动 5 个线程
threads = []
for i in range(5):
    thread = threading.Thread(target=crawl_and_insert_data, args=(i, url_chunks[i]))
    threads.append(thread)
    thread.start()

# 等待所有线程完成
for thread in threads:
    thread.join()
print("All threads finished and connection pool closed.")

PooledDB

能够提供线程间可共享的数据库连接，但是需要通过maxshared参数指定最多可共享连接数
对于应用中需要频繁地创建和销毁线程时，推荐使用

实例化常用参数设置：

import pymysql
from dbutils.pooled_db import PooledDB

pool = PooledDB(pymysql, maxconnections=10, blocking=True, host=MYSQL_HOST, port=MYSQL_PORT, user=MYSQL_USER, db=MYSQL_DATABASE, password=MYSQL_PASSWORD, charset='utf8mb4')

参数：

mincached：初始时连接池中的连接数，默认为0。
maxcached：连接池中最大的闲置连接数，默认为0表示不限制。
maxconnections：连接池允许的最大连接数，默认为0表示不限制。
blocking：当连接池达到最大连接数并且所有连接都在使用中时，是否阻塞等待可用连接，默认为False。
ping：每次使用连接之前都会检查连接是否有效，默认1表示检查

示例

import pymysql
import threading

from dbutils.pooled_db import PooledDB

# 数据库连接参数
MYSQL_HOST = 'your_host'
MYSQL_PORT = 3306
MYSQL_USER = 'your_user'
MYSQL_PASSWORD = 'your_password'
MYSQL_DATABASE = 'your_database'

# 创建连接池
pool = PooledDB(pymysql,
                 mincached=5,  # 初始化时，链接池中至少创建的空闲链接：5个
                 maxcached=5,  # 链接池中最多闲置的链接：5个，多余的会被关闭
                 maxconnections=10,  # 链接池允许的最大连接数：10个，包括使用中的和空闲的连接，多余的请求会阻塞
                 blocking=True,  # 连接池中如果没有可用连接后，是否阻塞等待
                 host=MYSQL_HOST,
                 port=MYSQL_PORT,
                 user=MYSQL_USER,
                 password=MYSQL_PASSWORD,
                 database=MYSQL_DATABASE,
                 ping=1,   # 每次使用连接之前都会检查连接是否有效。如果连接无效，连接池会自动尝试重新建立连接，默认值：1
                 charset='utf8mb4')

def insert_data(thread_id):
    # 从连接池中获取连接
    conn = pool.connection()
    cursor = conn.cursor()

    try:
        # 执行插入操作
        sql = "INSERT INTO your_table (column1, column2) VALUES (%s, %s)"
        cursor.execute(sql, ('value1', 'value2'))
        conn.commit()
        print(f"Thread {thread_id} inserted data successfully.")
    except Exception as e:
        conn.rollback()
        print(f"Thread {thread_id} failed to insert data: {e}")
    finally:
        # 关闭连接和游标
        cursor.close()
        conn.close() # 不会真正关闭数据库连接，而是将这个连接标记为可用状态，并将其放回连接池中

# 创建和启动5个线程
threads = []
for i in range(5):
    thread = threading.Thread(target=insert_data, args=(i,))
    threads.append(thread)
    thread.start()

# 等待所有线程完成
for thread in threads:
    thread.join()

# 关闭连接池
pool.close()
print("All threads finished.")

处理异常重试

def insert_data(thread_id):
    max_retries = 3
    retry_count = 0
    
    while retry_count < max_retries:
        try:
            conn = pool.connection()
            cursor = conn.cursor()
            
            # 执行插入操作
            sql = "INSERT INTO your_table (column1, column2) VALUES (%s, %s)"
            cursor.execute(sql, ('value1', 'value2'))
            conn.commit()
            print(f"Thread {thread_id} inserted data successfully.")
            break  # 操作成功，跳出循环
        
        except pymysql.OperationalError as e:
            # 处理数据库操作错误
            print(f"Thread {thread_id} encountered an operational error: {e}")
            retry_count += 1
            if retry_count >= max_retries:
                print(f"Thread {thread_id} failed after {max_retries} retries")
            else:
                print(f"Thread {thread_id} retrying... (Attempt {retry_count + 1})")
                time.sleep(1)  # 等待一秒后重试
        
        except Exception as e:
            # 处理其他异常
            print(f"Thread {thread_id} encountered an unexpected error: {e}")
            break
        
        finally:
            # 确保连接被正确关闭
            if 'cursor' in locals():
                cursor.close()
            if 'conn' in locals():
                conn.close()

是否需要加锁？

在大多数情况下，使用连接池进行多线程数据库操作不需要额外的锁。
数据库系统和连接池已经提供了必要的并发控制机制

只有在特定的、需要在应用层保证操作原子性的场景下，才需要考虑额外的同步措施。

import threading
from DBUtils.PooledDB import PooledDB
import pymysql

pool = PooledDB(pymysql, ...)
lock = threading.Lock()

def update_counter(thread_id):
    with lock:
        conn = pool.connection()
        cursor = conn.cursor()
        try:
            # 读取当前计数
            cursor.execute("SELECT count FROM counter WHERE id = 1")
            current_count = cursor.fetchone()[0]
            
            # 更新计数
            new_count = current_count + 1
            cursor.execute("UPDATE counter SET count = %s WHERE id = 1", (new_count,))
            conn.commit()
            print(f"Thread {thread_id} updated counter to {new_count}")
        except Exception as e:
            conn.rollback()
            print(f"Thread {thread_id} failed to update counter: {e}")
        finally:
            cursor.close()
            conn.close()

# 使用线程执行更新操作
threads = [threading.Thread(target=update_counter, args=(i,)) for i in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

View Code

2、是否需要手动关闭连接池
- 但出于资源管理的可靠性和及时性考虑，对于 PooledDB，强烈建议手动关闭连接池。这种做法可以确保资源被正确和及时地释放，避免潜在的资源泄漏问题。
3、如何选择PersistentDB以及PooledDB
- 比较两者的特点
  - PersistentDB
    - PersistentDB 是 DBUtils 提供的另一种连接池实现。它通过维护一个持久的、固定大小的连接池来提供性能。连接在创建后会一直保持打开状态,直到连接池被销毁。适用于高并发、长连接的场景，比如 Web 服务器。
  - PooledDB
    - PooledDB 是更常见和推荐使用的连接池实现。它会根据需求动态地创建和关闭连接，以适应不同的负载情况。连接在使用后会被归还到连接池，以供后续复用。适用于并发较高，但连接时间较短的场景，比如批处理任务。
- 比较两者的优缺点
  - 性能
    - PersistentDB 由于连接一直保持打开状态,在高并发场景下性能更优。
    - PooledDB 需要频繁创建和关闭连接,但对于短连接场景更加灵活。
  - 资源利用
    - PersistentDB 会一直占用数据库连接资源,即使连接处于空闲状态。
    - PooledDB 可以更好地管理连接,在空闲时释放连接,节约资源。
  - 可扩展性
    - PooledDB 可以根据负载动态调整连接数,更容易扩展。
    - PersistentDB 由于连接池大小固定,扩展性相对较差。
  - 异常处理
    - PooledDB 能更好地处理连接失效等异常情况,自动重试或创建新连接。
    - PersistentDB 对于连接失效的处理相对较弱。
- 综合考虑
  - 如果你的应用程序有较高的并发需求，且连接时间较长(如 Web 服务器)，那么 PersistentDB 可能是更好的选择。
  - 如果你的应用程序有较高的并发需求,但连接时间较短(如批处理任务),那么 PooledDB 可能更适合。
  - 如果你的应用程序并发不高,或者需求不太明确，那么使用 PooledDB 通常是一个更安全的选择。它能够更好地适应不同的场景，并提供更好的异常处理能力。
  - 总的来说，根据你的具体需求和场景特点，选择合适的连接池实现是很重要的。两种方式都有各自的优缺点，需要权衡考虑。如果你对选择还有疑问,可以先尝试使用 PooledDB，它是更通用和推荐的选择。

posted @ 2023-02-22 01:16 eliwang 阅读(1115) 评论(0) 编辑收藏举报

刷新页面返回顶部

eliwang

学无止境的小渣渣

pymysql通过DBUtils实现连接池技术

一、安装

二、导入模块

三、使用

1、DBUtils提供两种外部接口：

PersistentDB：

PooledDB

2、是否需要手动关闭连接池

3、如何选择PersistentDB以及PooledDB

比较两者的特点

比较两者的优缺点

综合考虑

公告