Large File Downloads in Django
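When Django serves a file for download and the file is small, the usual approach is to build the entire content in memory and then pass it to the response object in one go: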

    from django.http import HttpResponse

    def simple_file_download(request):
        # do something...
        with open("simplefile", "rb") as f:
            content = f.read()  # the whole file is held in memory
        return HttpResponse(content)

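When the file is very large, the simplest solution is to let a static file server such as Apache or Nginx handle the download. Sometimes, however, we need to restrict access by user permission, or we do not want to expose the real location of the file, or the large content is generated on the fly (for example, by merging several files). In those cases a static file server alone is not enough.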
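The Django documentation notes that an iterator can be passed as the response content so that data is streamed to the client. Since Django 1.5, HttpResponse consumes such an iterator immediately and keeps the result in memory, so real streaming requires StreamingHttpResponse.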
To write the iterator yourself, use yield:
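
    from django.http import StreamingHttpResponse

    def read_file(filename, buf_size=8192):
        """Yield the file in blocks of at most buf_size bytes."""
        with open(filename, "rb") as f:
            while True:
                content = f.read(buf_size)
                if content:
                    yield content
                else:
                    break

    def big_file_download(request):
        filename = "filename"
        # StreamingHttpResponse sends each block as it is produced; a plain
        # HttpResponse would read the whole iterator into memory first.
        response = StreamingHttpResponse(read_file(filename))
        return response
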
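The same chunked read can also be written with the two-argument form of `iter`, which keeps calling `f.read(buf_size)` until it returns the sentinel `b""`. This is only a minimal sketch using the standard library; the name `iter_file_chunks` is made up for illustration and is not part of Django.

```python
from functools import partial


def iter_file_chunks(filename, buf_size=8192):
    """Yield the file's contents in blocks of at most buf_size bytes."""
    with open(filename, "rb") as f:
        # iter(callable, sentinel) stops as soon as read() returns b"" (EOF).
        yield from iter(partial(f.read, buf_size), b"")
```

The resulting generator can be handed to a streaming response in the same way as a hand-written `yield` loop.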
Alternatively, use a generator expression. The following example from the Django documentation streams a large CSV file:

    import csv

    # django.utils.six is gone from current Django; on Python 3 the built-in
    # range is already lazy, so no extra import is needed.
    from django.http import StreamingHttpResponse

    class Echo:
        """An object that implements just the write method of the file-like
        interface.
        """
        def write(self, value):
            """Write the value by returning it, instead of storing in a buffer."""
            return value

    def some_streaming_csv_view(request):
        """A view that streams a large CSV file."""
        # Generate a sequence of rows. The range is based on the maximum number of
        # rows that can be handled by a single sheet in most spreadsheet
        # applications.
        rows = (["Row {0}".format(idx), str(idx)] for idx in range(65536))
        pseudo_buffer = Echo()
        writer = csv.writer(pseudo_buffer)
        response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                         content_type="text/csv")
        response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
        return response

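The trick in this example is that `csv.writer.writerow()` returns whatever the underlying `write()` call returned, so an `Echo`-style object turns the writer into a row formatter. The sketch below shows this without Django; `iter_csv_lines` is a hypothetical helper name used only for illustration.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv_lines(rows):
    """Lazily yield one formatted CSV line per input row."""
    writer = csv.writer(Echo())
    # writerow() returns what Echo.write() returned: the formatted line.
    return (writer.writerow(row) for row in rows)


lines = iter_csv_lines([["Row 0", "0"], ["a,b", "1"]])
first = next(lines)  # only the first row has been formatted at this point
```

Because nothing is buffered, memory use stays constant no matter how many rows the source produces.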
Python's standard library also provides a file wrapper (wsgiref.util.FileWrapper) that turns a file-like object into an iterator:

    class FileWrapper:
        """Wrapper to convert file-like objects to iterables"""

        def __init__(self, filelike, blksize=8192):
            self.filelike = filelike
            self.blksize = blksize
            if hasattr(filelike, 'close'):
                self.close = filelike.close

        def __getitem__(self, key):
            data = self.filelike.read(self.blksize)
            if data:
                return data
            raise IndexError

        def __iter__(self):
            return self

        def __next__(self):  # this method is called next() on Python 2
            data = self.filelike.read(self.blksize)
            if data:
                return data
            raise StopIteration

Usage:

    import os
    from wsgiref.util import FileWrapper  # the wrapper lives in the standard library

    from django.http import StreamingHttpResponse

    def file_download(request, filename):
        wrapper = FileWrapper(open(filename, 'rb'))
        response = StreamingHttpResponse(wrapper, content_type='application/octet-stream')
        response['Content-Length'] = os.path.getsize(filename)
        response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(filename)
        return response

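To see what the wrapper actually yields, it can be exercised outside Django. In this small sketch an in-memory `io.BytesIO` stands in for an open file; the 20000-byte payload is an arbitrary value chosen for illustration.

```python
import io
from wsgiref.util import FileWrapper

payload = io.BytesIO(b"x" * 20000)            # stand-in for an open file
wrapper = FileWrapper(payload, blksize=8192)  # read at most 8192 bytes per step

# Iterating the wrapper calls read(blksize) until the data is exhausted.
chunk_sizes = [len(chunk) for chunk in wrapper]
```

Each iteration step holds only one block in memory, which is what keeps memory use flat for large files.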
Django also provides the StreamingHttpResponse class, which takes the place of HttpResponse when the response body is streamed.
Compressing files into a ZIP archive for download:

    import tempfile, zipfile
    from wsgiref.util import FileWrapper

    from django.http import StreamingHttpResponse

    def send_zipfile(request):
        """
        Create a ZIP file on disk and transmit it in chunks of 8KB,
        without loading the whole file into memory. A similar approach can
        be used for large dynamic PDF files.
        """
        temp = tempfile.TemporaryFile()
        archive = zipfile.ZipFile(temp, 'w', zipfile.ZIP_DEFLATED)
        for index in range(10):
            filename = __file__  # Select your files here.
            archive.write(filename, 'file%d.txt' % index)
        archive.close()
        wrapper = FileWrapper(temp)
        response = StreamingHttpResponse(wrapper, content_type='application/zip')
        response['Content-Disposition'] = 'attachment; filename=test.zip'
        response['Content-Length'] = temp.tell()  # position after close() = archive size
        temp.seek(0)  # rewind so the wrapper reads from the first byte
        return response

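The archive-building step can be tried on its own, which makes the `tell()` and `seek(0)` calls easier to follow. This is a sketch under simple assumptions: `build_zip` is a hypothetical helper, and the members are passed as in-memory bytes rather than read from disk.

```python
import tempfile
import zipfile


def build_zip(members):
    """Write {archive_name: bytes} into a temporary ZIP; return (file, size)."""
    temp = tempfile.TemporaryFile()
    with zipfile.ZipFile(temp, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    size = temp.tell()  # after the archive is closed, the position is its end
    temp.seek(0)        # rewind so a reader starts at the first byte
    return temp, size
```

The returned size is what would go into `Content-Length`, and the rewound file object is what would be wrapped for streaming.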
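Either way, serving large files through Django itself is not a good idea. The better approach is to let Django check permissions and then hand the download over to the static file server.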
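This relies on the sendfile mechanism: "When a traditional web server handles a file download, it always reads the file content into application memory first and then sends the in-memory content to the client browser. Under the heavy load of today's websites, this approach consumes more server resources. sendfile is a high-performance network I/O method supported by modern operating systems: the kernel's sendfile call can push file content directly into the network card's buffer, which spares the web server the cost of reading and writing the file and achieves a 'zero-copy' mode."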
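The system call itself is reachable from Python, which gives a feel for the mechanism. The sketch below is only an illustration of the idea, not a description of how Nginx or Apache are implemented; `send_file` is a hypothetical helper name.

```python
import socket


def send_file(sock, path):
    """Send the whole file at `path` over a connected stream socket.

    socket.sendfile() uses os.sendfile() where the platform supports it
    and falls back to ordinary send() calls otherwise.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)  # total number of bytes sent
```

With the kernel path in use, the file data never passes through a buffer in the Python process.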
On Apache, this handoff requires the mod_xsendfile module; Nginx provides it through a feature called X-Accel-Redirect.
Nginx configuration:

    # Will serve /var/www/files/myfile.tar.gz
    # When passed URI /protected_files/myfile.tar.gz
    location /protected_files {
        internal;
        alias /var/www/files;
    }

Or:

    # Will serve /var/www/protected_files/myfile.tar.gz
    # When passed URI /protected_files/myfile.tar.gz
    location /protected_files {
        internal;
        root /var/www;
    }

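Note the difference between alias and root: alias replaces the matched location prefix with the given path, whereas root appends the full request URI to the given path.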
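The two mappings can be modelled in a few lines. This is a deliberately simplified model of plain prefix locations (it ignores regex locations and other Nginx details), and both function names are hypothetical.

```python
def resolve_with_alias(location, alias, uri):
    """alias: the matched location prefix is replaced by the alias path."""
    return alias + uri[len(location):]


def resolve_with_root(root, uri):
    """root: the full request URI is appended to the root path."""
    return root + uri


uri = "/protected_files/myfile.tar.gz"
via_alias = resolve_with_alias("/protected_files", "/var/www/files", uri)
via_root = resolve_with_root("/var/www", uri)
```

The two results correspond to the file paths named in the comments of the two configuration snippets.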
In Django:

    response['X-Accel-Redirect'] = '/protected_files/%s' % filename

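If the file name comes from user input, it is worth validating it before it goes into the header, so that a value like `../secret` cannot leave the protected directory. The helper below is a framework-free sketch; `accel_redirect_headers` and its `prefix` default are assumptions for illustration, and the check is not an exhaustive sanitizer.

```python
import posixpath


def accel_redirect_headers(filename, prefix="/protected_files"):
    """Return headers for an X-Accel-Redirect download of `filename`.

    `prefix` is assumed to match the internal Nginx location.
    """
    # Accept only a plain file name: no directories, no "." or "..".
    if (not filename or filename in (".", "..")
            or posixpath.basename(filename) != filename):
        raise ValueError("unsafe file name: %r" % filename)
    return {
        "X-Accel-Redirect": "%s/%s" % (prefix, filename),
        "Content-Disposition": 'attachment; filename="%s"' % filename,
    }
```

In a view, each key and value of the returned dictionary would be copied onto the response object.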
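With this setup, when a request reaches the Django view, Django checks the user's permissions or does whatever else is needed, and then returns a response whose X-Accel-Redirect header points to /protected_files/filename. Nginx performs the internal redirect and serves the file /var/www/protected_files/filename itself: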

    import os

    from django.contrib.auth.decorators import login_required
    from django.http import HttpResponse

    from .models import Book  # assumed location of the Book model

    @login_required
    def document_view(request, document_id):
        book = Book.objects.get(id=document_id)
        response = HttpResponse()
        name = book.myBook.name.split('/')[-1]
        response['Content-Type'] = 'application/octet-stream'
        # On Python 3, format the str directly; encoding it first would put
        # a b'...' literal into the header.
        response["Content-Disposition"] = "attachment; filename={0}".format(name)
        response['Content-Length'] = os.path.getsize(book.myBook.path)
        response['X-Accel-Redirect'] = "/protected/{0}".format(book.myBook.name)
        return response