博客系统备份
转载: http://www.codingsoho.com/zh/blog/blog-system-backup/
前言
我的网站codingsoho运行了一段时间,数据库也日渐庞大,感觉数据库的备份变得刻不容缓了。周末整理了一下,实现了基本的备份功能。包括数据库及媒体文件的本地以及云备份,同时实时邮件通知备份进展。
工具备份MySQL文件 - Navicat
如果你不打算用代码来实现,Navicat可以帮忙你实现基本的数据库备份功能。刚好我也是用Navicat管理MySQL的数据库的,所以顺便学习了一下。
假定你已经创建了自己的连接。打开连接属性,在[高级]标签里设置"设置保存路径",这样你的备份文件将会存在此路经。
设置完成之后,打开[计划]功能,点击下方[新建批处理作业]
会出现下方的批处理窗口,在左边选择对应的数据库,它就会出现在右边的[可用任务]里,点击它,将将会被选中出现在下面的[已选择任务]列表里。
点击做上方的[保存按钮],保存该任务,在弹出的下方对话框填入文件名确定即可。
这时可以看到,刚刚创建的backup任务已出现在计划列表里。
选中该任务,点击上方的[设置任务计划],出现一个弹出对话框,选中[计划]标签栏,点击[新建]按钮开始新建一个计划,根据下面的选项,你可以选择按天或者其他的选择。
确定之后,它会再让你输入密码确认。
设置完成之后,它机会自动的去备份你的数据库了。看了一下它备份的psc
格式,可以通过Navicat转为sql文件。
python脚本备份
上面只是简单介绍一下工具,下面开始我们的主题:如何用python来实现。
备份MySQL
MySQL备份的命令是mysqldump
常用的格式为
mysqldump -u[username] -p[password] [dbname] > [save-path]
所以,我们备份的核心功能其实就是调用这句命令
cmd_dump='mysqldump -u%s -p%s %s > %s' %(conf.sql_username,conf.sql_pwd,db_name,output_filename)
os.system(cmd_dump)
在上面的语句中,用户名和密码我们存在配置文件里,db_name
是数据库名字,因为我不仅要备份一个数据库,所以这个会是一个动态的结果,最终我们要设置好备份位置output_filename
配置文件为conf.py,内容如下
sql_username = "root"
sql_pwd = "******"
sql_exe_path = "D:\Program Files (x86)\MySQL\MySQL Server 5.5\bin"
output_sql_path = "C:\\backup\sql"
output_media_path = "C:\\backup\media"
logs_path = "C:\\backup\log"
完整代码如下
# set log path
logname = datatime_util.filename_prefix + '.log'
logpath = os.path.join(conf.logs_path, logname)
# set filename
filename = datatime_util.filename_prefix + '.sql'
# loop each db
for db_name in conf.dbs:
msg = '-'*2 + datatime_util.logname_prefix + " "
# set file path
output_filename = conf.output_sql_path + os.sep + db_name + "_" + filename
# execute dump command
cmd_dump='mysqldump -u%s -p%s %s > %s' %(conf.sql_username,conf.sql_pwd,db_name,output_filename)
if os.system(cmd_dump)==0:
msg += 'database backup successfully: {}'.format(output_filename)
else:
msg += 'database backup failed! {}'.format(output_filename)
write_logs(logpath,msg)
dbs定义如下
dbs = (
"sales6",
"sales7",
)
其中,调用了一个命令write_logs
,用于写日志,就是常见的写文件操作,追加方式,代码如下
def write_logs(filename,contents):
with open(filename,'a') as f:
f.write(contents + '\n')
datatime_util
提供根据时间设置一些文件名和log的方法,这样我们能够保证创建的文件或内容可溯性强,同时不会重复,实现如下
class DateTimeUtil(object):
@property
def foldername_day(self):
return time.strftime('%Y-%m-%d')
@property
def filename_prefix(self):
return time.strftime('%Y-%m-%d-%H%M%S')
@property
def logname_prefix(self):
return time.strftime('%Y-%m-%d,%H:%M:%S')
备份媒体文件
备份媒体文件用到了window的命令xcopy
,实现方法类似,代码如下
for media in conf.medias:
prefix = '+'*2 + datatime_util.logname_prefix + ' '
foldername = media[0] + "-" + datatime_util.filename_prefix #
output_foldername = conf.output_media_path + os.sep + foldername
if not os.path.exists(output_foldername):
if not os.mkdir(output_foldername):
msg = prefix + "succeed to create media folder {}".format(output_foldername)
write_logs(logpath, msg)
else:
msg = prefix + "fail to create media folder {}".format(output_foldername)
write_logs(logpath, msg)
sys.exit()
cmd_copy='xcopy %s %s /s /e' %(media[1], output_foldername)
if os.system(cmd_copy)==0:
msg = prefix + 'media copy successfully: {}'.format(media[1])
else:
msg = prefix + 'media copy failed! {}'.format(media[1])
write_logs(logpath,msg)
medias的定义如下,里面放的是应用的media文件
medias = (
("sales6","E:\\Computer\\virtualenv\\djangocmsblog\\djangocms35-csblog\\media"),
)
它同样轮询媒体库,然后将目录下的文件拷到指定目录,并按合适规则命名
媒体文件压缩
媒体文件往往很大,如果直接拷贝对磁盘要求太高,所以我们接下来对它压缩一下。
高效的方法应该是读原文件直接压缩,我这儿本来是分两步介绍的,所以先拷贝,再压缩,最后再删除文件夹。实际使用时可以进行优化。
f = zipfile.ZipFile( output_zipname, 'w', zipfile.ZIP_DEFLATED )
zip_subdir = "media"
for dirpath, dirnames, filenames in os.walk( output_foldername ):
for filename in filenames:
filepath = os.path.join(dirpath, filename)
zip_path = os.path.join("media", filepath[len(conf.output_media_path)+1:])
f.write(filepath, zip_path)
cmd_remove='rm -rf %s' %(output_foldername)
os.system(cmd_remove)
f.close()
关于压缩技术已在Django压缩包下载一文做过介绍,这儿就不再详述了。
提醒一下的是,对于命令f.write
,第二个参数最好指定,否则它会根据源文件路径压缩,层级太多。
有的windows电脑上rm
命令不支持,我们可以用下面命令代替
例如:
del /s /q *.svn
rmdir /s /q *.svn
/q
disables Yes/No prompting
/s
means delete the file(s) from all subdirectories.
定时调度
到这儿,我们的基本功能就实现了。但是我们希望它是周期性调度的,而不是只执行一次。
接下来我们用python的schedule
功能来实现。
首先如果没安装先安装一下
pip install schedule
使用非常简单,它的demo代码如下
import schedule
import time
def job():
print("I'm working...")
schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)
while True:
schedule.run_pending()
time.sleep(1)
假如说我想每分钟都运行一次,可以按照下面这样实现。
schedule.every(1).minutes.do(backup)
while True:
schedule.run_pending()
time.sleep(1)
邮件通知
每一次备份都有可能出错,我们希望结果可以实时通知到管理员,不管备份是成功还是失败了。
这儿我采用邮件方式来实现。
我用的是sina邮箱,首先你要在sina客户端配置好,让它支持smtp.
接下来实现发送邮件的功能,我做了一个封装类
import smtplib
from email.mime.text import MIMEText
import conf
class MailUtil():
def __init__(self, *args, **kwargs):
self.SMTPserver = kwargs.get('address', None)
self.sender = kwargs.get('sender', None)
self.password = kwargs.get('password', None)
self.destination = args[0]
def sendmail(self, subject, message):
msg = MIMEText(message,_charset='utf-8')
msg['Subject'] = subject
msg['From'] = self.sender
msg['To'] = self.destination
mailserver = smtplib.SMTP(self.SMTPserver, 25)
mailserver.login(self.sender, self.password)
mailserver.sendmail(self.sender, [self.destination], msg.as_string())
mailserver.quit()
# print('send email success')
在conf.py里配置好邮箱发送需要的相关配置
sender_user = dict(address='smtp.sina.com',sender='codingsoho@sina.com',password='codingsoho')
receiver_user = ('codingsoho@163.com',)
有点奇怪,我往live.com发送总是不成功。后来网上搜了一下
注意:使用的sender和receivers 都需要在对应的邮箱设置中开启SMTP服务,否则发送不会成功。
调用发送则非常简单
mailsending = MailUtil(*conf.receiver_user, **conf.sender_user)
mailsending.sendmail("codingsoho backup notification", msg)
但是部署到阿里云上时,这儿又出错了。
Traceback (most recent call last):
File "C:\virtualenv\codingsohobackup\mailutils.py", line 26, in <module>
mailsending.sendmail("codingsoho backup notification", "send sucessfully")
File "C:\virtualenv\codingsohobackup\mailutils.py", line 18, in sendmail
mailserver = smtplib.SMTP(self.SMTPserver, 25, timeout=120)
File "C:\Python27\lib\smtplib.py", line 256, in __init__
(code, msg) = self.connect(host, port)
File "C:\Python27\lib\smtplib.py", line 316, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "C:\Python27\lib\smtplib.py", line 291, in _get_socket
return socket.create_connection((host, port), timeout)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
socket.error: [Errno 10060]
原因是新开通的ECS默认禁止连接外网的25端口,可以尝试连接SSL 465端口,如 smtplib.SMTP('smtp.sina.com', 465)
。
这两个协议如下
25端口(SMTP):25端口为SMTP(Simple Mail TransferProtocol,简单邮件传输协议)服务所开放的,是用于发送邮件。
465端口(SMTPS):465端口是为SMTPS(SMTP-over-SSL)协议服务开放的,这是SMTP协议基于SSL安全协议之上的一种变种协议,它继承了SSL安全协议的非对称加密的高度安全可靠性,可防止邮件泄露。SMTPS和SMTP协议一样,也是用来发送邮件的,只是更安全些,防止邮件被黑客截取泄露,还可实现邮件发送者抗抵赖功能。防止发送者发送之后删除已发邮件,拒不承认发送过这样一份邮件。
我试了一下,至少telnet smtp.sina.com 465
能连上,但是还是报错。
Traceback (most recent call last):
File "C:\virtualenv\codingsohobackup\mailutils.py", line 26, in <module>
mailsending.sendmail("codingsoho backup notification", "send sucessfully")
File "C:\virtualenv\codingsohobackup\mailutils.py", line 18, in sendmail
mailserver = smtplib.SMTP(self.SMTPserver, 465)
File "C:\Python27\lib\smtplib.py", line 256, in __init__
(code, msg) = self.connect(host, port)
File "C:\Python27\lib\smtplib.py", line 317, in connect
(code, msg) = self.getreply()
File "C:\Python27\lib\smtplib.py", line 368, in getreply
raise SMTPServerDisconnected("Connection unexpectedly closed")
smtplib.SMTPServerDisconnected: Connection unexpectedly closed
查了一下资料,原来SMTP
和SMTPS
使用不同的函数来创建。
另外,上面代码中只支持单个接收人,我同时修改了代码支持多个接收地址,修改后的代码如下:
class MailUtil():
def __init__(self, *args, **kwargs):
self.SMTPserver = kwargs.get('address', None)
self.sender = kwargs.get('sender', None)
self.password = kwargs.get('password', None)
self.destination = args[0]
self.port = kwargs.get('port', 25)
def sendmail(self, subject, message, port=25):
msg = MIMEText(message,_charset='utf-8')
msg['Subject'] = Header(subject, 'utf-8')
msg['From'] = self.sender
msg['To'] = ", ".join(self.destination) # multi recipient concatenated with ","
try:
if 25 == self.port:
mailserver = smtplib.SMTP(self.SMTPserver, self.port)
elif 465 == self.port:
mailserver = smtplib.SMTP_SSL(self.SMTPserver, self.port)
else:
mailserver = smtplib.SMTP(self.SMTPserver, 25)
# for debug
'''
mailserver.set_debuglevel(1)
mailserver.ehlo()
'''
try:
mailserver.login(self.sender, self.password)
except smtplib.SMTPException as e:
print("Authentication failed:", e)
sys.exit(1)
mailserver.sendmail(self.sender, self.destination, msg.as_string())
except (socket.gaierror, socket.error, socket.herror, smtplib.SMTPException) as e:
print(" *** your message may not have been sent! ")
print(e)
sys.exit(2)
else:
print("Message successfully sent to %d recipient(s) " %len(self.destination ))
mailserver.quit()
配置文件同样修改
sender_user = dict(address='smtp.sina.com',sender='codingsoho@sina.com',password='codingsoho')
receiver_user = (['codingsoho@163.com','codingsoho@126.com'],)
完成之后,邮件就可以正常发送了。
上传到百度云
安装
pip install bypy
授权
执行bypy info
,然后根据提示通过百度的网址访问获取授权码,激活授权。
bypy info
(env35) E:\Computer\virtualenv\codingsohobackup\src>bypy info
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> You are running Python on Windows, which doesn't support Unicode so well.
Files with non-ASCII names may not be handled correctly.
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> WARNING: System locale is not 'UTF-8'.
Files with non-ASCII names may not be handled correctly.
You should set your System Locale to 'UTF-8'.
Current locale is 'cp936'
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<W> Encoding for StdOut: cp936
<W> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Please visit:
https://openapi.baidu.com/oauth/2.0/authorize?scope=basic+netdisk&redirect_uri=oob&response_type=code&client_id=q8WE4EpCsau1oS0MplgMKNBn
And authorize this app
Paste the Authorization Code here within 10 minutes.
Press [Enter] when you are done
根据上面提示的网址https://openapi.baidu.com/oauth/2.0/authorize?scope=basic+netdisk&redirect_uri=oob&response_type=code&client_id=q8WE4EpCsau1oS0MplgMKNBn
去获取授权码,然后输入并回车
56dd11c4921359297a44901686fadcce
Authorizing, please be patient, it may take upto 300 seconds...
Authorizing/refreshing with the OpenShift server ...
OpenShift server failed, authorizing/refreshing with the Heroku server ...
Successfully authorized
Quota: 2.007TB
Used: 172.078MB
授权成功。
测试
接下来测试百度云的同步功能,包括上传、下载及同步比较。由于百度PCS API权限限制,程序只能存取百度云端/apps/bypy目录下面的文件和目录。
执行
bypy list
/apps/bypy ($t $f $s $m $d):
上传命令:
bypy upload
比如创建本地文件1.txt,执行该命令,即可将文件上传
下载命令:
bypy downdir
完成之后可以通过下面命令比较
bypy compare
上面的例子中,比较结果如下
==== Same files ===
F - 1.txt
==== Different files ===
==== Local only ====
==== Remote only ====
Statistics:
--------------------------------
Same: 1
Different: 0
Local only: 0
Remote only: 0
程序实现媒体文件上传
我们会用 bypy upload
实现文件上传,查看源代码https://github.com/houtianze/bypy/blob/master/bypy/bypy.py,它的定义如下
def upload(self, localpath = '', remotepath = '', ondup = "overwrite"):
''' Usage: upload [localpath] [remotepath] [ondup] - \
upload a file or directory (recursively)
localpath - local path, is the current directory '.' if not specified
remotepath - remote path at Baidu Yun (after app root directory at Baidu Yun)
ondup - what to do upon duplication ('overwrite' or 'newcopy'), default: 'overwrite'
这儿localpath可以是目录,也可以是指定的文件
接下来用subprocess
来调用该命令执行上传
import subprocess
subprocess.call(['bypy', 'upload', str(result_file), "/upload/to/"])
这样,文件就会上传到百度云的/app/bypy/upload/to
目录
Next
剩下的需要完成的
- 备份媒体文件时,只备份差异部分,否则媒体库变多后对磁盘是很大的负担