Web Crawlers: Connecting to Databases

  Scraped data usually needs to be written to a database, so this post covers connecting to three mainstream databases (MySQL, Redis, MongoDB). If your database servers run on Linux, first edit each service's configuration file and change bind 127.0.0.1 to bind 0.0.0.0 so the database can be reached from other hosts. Also check the Linux firewall settings; if it is running, stop it (or open the relevant ports).

 

Check the firewall status:
systemctl status firewalld.service

If the Active: line shows active (running), the firewall is on and needs to be stopped.

Stop the firewall:
systemctl stop firewalld.service

Active: inactive (dead) means it has been stopped.

 

Connecting to MySQL:

First make sure pymysql is installed (pip install pymysql).

import pymysql

# host: server address; port defaults to 3306; user/password are your credentials.
# To select a schema directly, also pass a database argument.
conn = pymysql.connect(host='172.16.70.130', port=3306, user='user', password='passwd')

cur = conn.cursor()
cur.execute('select version()')
data = cur.fetchall()
print(data)  # prints the server version

The output looks like this:

(('5.7.27',),)
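Once the connection works, scraped rows can be written with a parameterized INSERT. A minimal sketch, assuming an open pymysql connection `conn` and a hypothetical items table (the table name and columns are illustrative, not part of the setup above):

```python
# Hypothetical table for illustration:
#   CREATE TABLE items (title VARCHAR(255), url VARCHAR(255))
items = [('first page', 'http://example.com/1'),
         ('second page', 'http://example.com/2')]

SQL = 'INSERT INTO items (title, url) VALUES (%s, %s)'

def save_items(conn, rows):
    """Insert scraped rows with a parameterized query, then commit."""
    cur = conn.cursor()
    cur.executemany(SQL, rows)  # placeholders keep page content from injecting SQL
    conn.commit()               # pymysql does not autocommit by default
```

With the connection above, calling save_items(conn, items) writes both rows in one round of executemany.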

 

Connecting to Redis:

First make sure redis is installed (pip install redis).

import redis

# host: server address; port defaults to 6379; pass a password argument if one is set.
# db selects the database index (0 by default);
# decode_responses=True returns str instead of bytes.
conn = redis.StrictRedis(host='172.16.70.130', port=6379, decode_responses=True, db=1)
print(conn.info())

The output (truncated; the redis_version field confirms the connection):

{'redis_version': '5.0.5', 'redis_mode': 'standalone', 'os': 'Linux 3.10.0-1160.21.1.el7.x86_64 x86_64', 'arch_bits': 64, 'tcp_port': 6379, 'connected_clients': 2, 'used_memory_human': '4.79M', 'role': 'master', ..., 'db1': {'keys': 1, 'expires': 0, 'avg_ttl': 0}}
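A common crawler use for Redis is de-duplicating URLs with a set. A minimal sketch, assuming the `conn` client from above; the key name crawler:seen is hypothetical:

```python
def seen_before(r, url):
    """Record url in a Redis set; return True if it was already there.

    SADD returns the number of members actually added,
    so 0 means the url had been seen before.
    """
    return r.sadd('crawler:seen', url) == 0
```

seen_before(conn, 'http://example.com/1') returns False the first time and True on every later call, so the crawler can skip pages it has already fetched.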

 

Connecting to MongoDB:

First make sure pymongo is installed (pip install pymongo).

import pymongo

# host: server address; port defaults to 27017.
client = pymongo.MongoClient(host='172.16.70.130', port=27017)

# If authentication is enabled, pass the credentials to MongoClient instead
# (Database.authenticate() was removed in PyMongo 4.0):
# client = pymongo.MongoClient(host='172.16.70.130', port=27017,
#                              username=username, password=password)

db = client['databasename']  # select a database

Then insert or query some data to verify that the connection works.
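That verification step can be a simple round trip; a sketch assuming the `db` handle from above, with a hypothetical collection name connection_test:

```python
def verify_connection(db):
    """Insert a probe document and read it back by _id."""
    coll = db['connection_test']
    inserted_id = coll.insert_one({'probe': 'ok'}).inserted_id
    doc = coll.find_one({'_id': inserted_id})
    return doc is not None and doc['probe'] == 'ok'
```

verify_connection(db) returning True means both the write and the read reached the server.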

 

Once connected, you can create databases and tables as needed and save your data, or write it to a local file through an ordinary file handle.
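Saving locally can be as simple as appending one JSON record per line; a sketch with a hypothetical output path:

```python
import json

def save_local(path, rows):
    """Append scraped records to a local file, one JSON object per line."""
    with open(path, 'a', encoding='utf-8') as fh:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII page text readable in the file
            fh.write(json.dumps(row, ensure_ascii=False) + '\n')
```

For example, save_local('items.jsonl', [{'title': 'first page', 'url': 'http://example.com/1'}]) appends one line, so repeated crawler runs accumulate records in the same file.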

posted @ 2021-04-09 22:37  Ccdjun