Python crawler notes: multithreading, XML parsing, and basics (updated from time to time)

1  Python tutorial site: http://www.runoob.com/python/python-multithreading.html

    Pay particular attention to the XML parsing and multithreading chapters in the advanced section.
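
As a pointer for the XML parsing chapter mentioned above, here is a minimal sketch using the standard library's xml.etree.ElementTree (written for Python 3). The sample document, its tag names (collection, movie, year), and the helper list_movies are made up for illustration and are not taken from the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; tag and attribute names are assumptions.
SAMPLE_XML = """
<collection shelf="New Arrivals">
    <movie title="Avatar">
        <year>2009</year>
    </movie>
    <movie title="Transformers">
        <year>2007</year>
    </movie>
</collection>
"""


def list_movies(xml_text):
    """Return (title, year) pairs for every <movie> element."""
    root = ET.fromstring(xml_text.strip())
    movies = []
    for movie in root.iter("movie"):
        title = movie.get("title")           # attribute lookup
        year = int(movie.findtext("year"))   # text of the <year> child
        movies.append((title, year))
    return movies


if __name__ == "__main__":
    for title, year in list_movies(SAMPLE_XML):
        print(title, year)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring.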

 

 

 

 

Reference notes by 虫师: http://www.cnblogs.com/fnng/p/3576154.html

 

# Automatically visit a URL over and over

from selenium import webdriver
import time
M = 100000 
i = 0
URL = 'http://www.baidu.com'
browser = webdriver.Firefox()  # browser name; use whichever browser is installed on this machine
while i < M:
    browser.get(URL)
    time.sleep(1)
    i += 1
browser.quit()
print('Python opened the page %d times in this run' % i)


# Extract heading text (the <h4> elements on the page)

import urllib2
from sgmllib import SGMLParser
URL = 'http://www.baidu.com' 
class ListName(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self.is_h4 = False  # True while the parser is inside an <h4> element
        self.name = []
    def start_h4(self, attrs):
        self.is_h4 = True
    def end_h4(self):
        self.is_h4 = False
    def handle_data(self, text):
        if self.is_h4:
            self.name.append(text)
 
content = urllib2.urlopen(URL).read()
listname = ListName()
listname.feed(content)
for item in listname.name:
    print item.decode('gbk').encode('utf8')    
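
sgmllib was removed in Python 3, so the extraction above only runs on Python 2. A rough Python 3 equivalent, as a sketch, can use html.parser.HTMLParser from the standard library. The class name HeadingCollector, the helper extract_h4, and the inline sample HTML are hypothetical; for a live page the HTML string would come from urllib.request.urlopen(URL).read(), decoded with the page's encoding.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text inside every <h4> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_h4 = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside an <h4> element.
        if self.in_h4 and data.strip():
            self.names.append(data.strip())


def extract_h4(html_text):
    """Feed an HTML string to the parser and return the collected headings."""
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    return parser.names


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = "<html><body><h4>News</h4><p>skip me</p><h4>Maps</h4></body></html>"

if __name__ == "__main__":
    for name in extract_h4(SAMPLE_HTML):
        print(name)
```

Unlike SGMLParser, HTMLParser has no per-tag start_h4/end_h4 hooks, so the tag name is checked inside the generic handlers.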


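# Visit Baidu and fill in the search form (pass Chinese text as a unicode literal; English works as a plain string)

# coding=utf-8
# (the encoding declaration must be on the first or second line of the script, with no space before "=")
import time
from selenium import webdriver

browser = webdriver.Firefox()

browser.get("http://www.baidu.com")
# send_keys() returns None, so nothing can be chained onto it; pass the text as unicode instead
browser.find_element_by_id("kw").send_keys(u"你好")
browser.find_element_by_id("su").click()
time.sleep(3)  # sleep for 3 seconds
browser.quit()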

 

 

3 Python multithreading: http://www.cnblogs.com/fnng/p/3670789.html

 

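Technology advances and CPUs keep getting faster. The CPU complains that every trivial job ties it up for a while, when it could easily handle several jobs at once; and so operating systems entered the multitasking era. Listening to music while eating hot pot is no longer just a dream. Python provides two modules for multithreading, thread and threading. thread has some shortcomings that threading makes up for, so to save time we go straight to threading. The example below uses threading to play music and a movie at the same time: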

 

#coding=utf-8

import threading
from time import ctime,sleep


def music(func):
    for i in range(2):
        print "I was listening to %s. %s" %(func,ctime())
        sleep(1)

def move(func):
    for i in range(2):
        print "I was at the %s! %s" %(func,ctime())
        sleep(5)

threads = []
t1 = threading.Thread(target=music,args=(u'爱情买卖',))
threads.append(t1)
t2 = threading.Thread(target=move,args=(u'阿凡达',))
threads.append(t2)

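if __name__ == '__main__':
    for t in threads:
        t.setDaemon(True)  # daemon threads are killed when the main thread exits
        t.start()
    # wait for every thread, not only the last one started
    for t in threads:
        t.join()
    print "all over %s" %ctime()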

Output:

I was listening to 爱情买卖. Thu Jul 09 14:39:20 2015
I was at the 阿凡达! Thu Jul 09 14:39:20 2015
I was listening to 爱情买卖. Thu Jul 09 14:39:21 2015
I was at the 阿凡达! Thu Jul 09 14:39:25 2015
all over Thu Jul 09 14:39:30 2015
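
The timing in this output follows from the two threads running concurrently: the whole run takes about as long as the slower thread (2 x 5 s for the movie), not the sum of both. The same pattern can be sketched in Python 3 with shortened sleeps. The names worker and run_concurrently and the delays are illustrative assumptions, not part of the referenced notes.

```python
import threading
import time


def worker(name, delay, repeats, log):
    """Append one entry per iteration, then sleep `delay` seconds."""
    for _ in range(repeats):
        log.append(name)  # list.append is atomic in CPython
        time.sleep(delay)


def run_concurrently():
    """Start both workers, wait for both, and return (log, elapsed seconds)."""
    log = []
    threads = [
        threading.Thread(target=worker, args=("music", 0.1, 2, log)),
        threading.Thread(target=worker, args=("movie", 0.3, 2, log)),
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    # Join every thread so the main thread waits for all of them.
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return log, elapsed


if __name__ == "__main__":
    log, elapsed = run_concurrently()
    print(log, round(elapsed, 2))
```

Because every thread is joined, the threads do not need to be marked as daemons here; the main thread cannot exit before they finish.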

 

Python learning resources

http://www.scipy-lectures.org/

 

GUI programming

https://wiki.python.org/moin/GuiProgramming

posted @ 2015-07-03 08:34  kongmeng