A small example of fetching the Sina homepage with Python
Reference
Liao Xuefeng's Python tutorial: http://www.liaoxuefeng.com/wiki/001374738125095c955c1e6d8bb493182103fac9270762a000/001386832653051fd44e44e4f9e4ed08f3e5a5ab550358d000
Code:
```python
#!/usr/bin/env python3

# import module
import socket

# create a TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# connect to sina
s.connect(('www.sina.com.cn', 80))
# send the request as bytes; sendall keeps sending until everything is written
s.sendall(b'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n')
# receive data
buffer = []
while True:
    # receive at most 1 KB at a time
    d = s.recv(1024)
    if d:
        buffer.append(d)
    else:
        # an empty result means the server closed the connection
        break
data = b''.join(buffer)
# close socket
s.close()
# split the response into the header and the body
header, html = data.split(b'\r\n\r\n', 1)
print(header.decode('utf-8', errors='replace'))
# write the received body to a file
with open('sina.html', 'wb') as f:
    f.write(html)
```
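The script mimics a browser visiting a web server: it sends an HTTP request over a TCP connection and reads back the response that the server returns.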
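To make the request and response format easier to see without touching the network, here is a minimal sketch that builds the same kind of GET request and splits a raw response into its parts. The helper names `build_request` and `parse_response` and the sample response bytes are made up for illustration; they are not part of the original script or of the tutorial.

```python
def build_request(host, path='/'):
    """Build the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        'GET {} HTTP/1.1'.format(path),
        'Host: {}'.format(host),
        'Connection: close',  # ask the server to close the socket when done
        '',                   # the two empty items produce the blank line
        '',                   # that ends the request header
    ]
    return '\r\n'.join(lines).encode('ascii')


def parse_response(data):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # The header and the body are separated by the first blank line.
    head, _, body = data.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(' ', 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A hand-written sample response, used here instead of a live server.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/html\r\n'
          b'Content-Length: 13\r\n'
          b'\r\n'
          b'<html></html>')

status, headers, body = parse_response(sample)
print(status, headers['content-type'], len(body))  # prints: 200 text/html 13
```

In the script above, the bytes returned by `build_request('www.sina.com.cn')` could be passed to `s.sendall`, and the joined `data` could be passed to `parse_response`. A real server may answer with a redirect or a chunked body, which this sketch does not handle.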