Some interview questions and some solutions

玖富火眼 (Jiufu Huoyan) Python crawler engineer interview questions

December 2017. Searching Baidu or Google is allowed, and so is asking for help over WeChat or by phone.

S1 = u'\xe6\x97\xa0\xe5\x90\x8d'

S2 = '\xe6\x97\xa0\xe5\x90\x8d'

Print each of these as Chinese text, on both Windows and Linux.

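# Question 1 (Python 3)
S1 = u'\xe6\x97\xa0\xe5\x90\x8d'
# Each character of S1 is really one UTF-8 byte, so turn the characters
# back into bytes and then decode those bytes as UTF-8.
s2 = S1.encode('raw_unicode_escape').decode('utf8')
print(s2)
S2 = '\xe6\x97\xa0\xe5\x90\x8d'
print(S2.encode('raw_unicode_escape').decode('utf8'))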

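Both lines currently print the same thing. That looks suspicious, but it is expected on Python 3: the u prefix has no effect there, so S1 and S2 are the same str. The question was presumably written for Python 2, where S2 is a byte string. Either way, the output satisfies the question.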

Reference: http://www.cnblogs.com/fanjp666888/p/7797720.html
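The idea behind the answer above can be spelled out a little more explicitly. The following is a minimal Python 3 sketch, not the reference solution: it assumes the string was produced by reading UTF-8 bytes as Latin-1, and the helper name `fix_mojibake` is made up for illustration.

```python
def fix_mojibake(text, misread_as="latin-1", really="utf-8"):
    """Recover the raw bytes of a mis-decoded string, then decode them properly."""
    return text.encode(misread_as).decode(really)


# Python 3: both literals from the question are the same str of six code points.
S1 = u'\xe6\x97\xa0\xe5\x90\x8d'
print(fix_mojibake(S1))

# The Python 2 byte-string case (S2) corresponds to a bytes literal here,
# which only needs a single decode step.
S2_bytes = b'\xe6\x97\xa0\xe5\x90\x8d'
print(S2_bytes.decode('utf-8'))
```

Encoding with `latin-1` maps code points 0 to 255 straight back to the bytes of the same value, which is the same effect `raw_unicode_escape` has on this particular input.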

html_str = """

<div id="box1">this from blog.csdn.net/lncxydjq , DO NOT COPY!

<div id="box2">*****

</div>

"""

Extract the comment from this document in three ways: XPath (lxml), BeautifulSoup, and a regular expression.

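# Question 2
import re

from bs4 import BeautifulSoup
from lxml import etree

# The HTML comment that the question asks for does not survive in the
# snippet as shown here. The indexing below assumes it sits inside box2,
# right after the asterisks.
contents = """
<div id="box1">this from blog.csdn.net/lncxydjq , DO NOT COPY!
<div id="box2">*****

</div>
</div>
"""

# Regular expression: capture the text between <!-- and -->.
demo = re.compile('<!--(.*?)-->', re.S)
lists = demo.findall(contents)
print(lists[0])

# XPath: node() returns text and comment nodes; index 1 is the comment.
print(etree.HTML(contents).xpath('//div[@id="box1"]/div/node()')[1].text)

# BeautifulSoup: the comment is the second child of box2.
soup = BeautifulSoup(contents, 'lxml')
a = soup.find_all('div', id='box2')
for i in a:
    print(i.contents[1])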
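Since the comment itself is missing from the snippet as shown, here is a small self-contained sketch that can be run as-is. The comment text `hypothetical comment` is a placeholder of my own, not the original. It shows the regular-expression approach and, as a dependency-free cross-check, the standard library's `html.parser` (swapped in here for lxml and BeautifulSoup so that nothing needs to be installed).

```python
import re
from html.parser import HTMLParser

# The comment text below is a placeholder, not the one from the interview.
SAMPLE = """
<div id="box1">this from blog.csdn.net/lncxydjq , DO NOT COPY!
<div id="box2">*****<!-- hypothetical comment -->*****
</div>
</div>
"""


def comments_by_regex(html):
    """Return the text of every <!-- ... --> block, including multi-line ones."""
    return re.findall(r'<!--(.*?)-->', html, re.S)


class CommentCollector(HTMLParser):
    """Collects comment nodes while the parser walks the document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def comments_by_parser(html):
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


print(comments_by_regex(SAMPLE))
print(comments_by_parser(SAMPLE))
```

The regular expression is fine for a one-off snippet like this; a real parser is the safer choice when the markup may contain `-->` inside scripts or attribute values.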

import requests

url = 'http://xyzfgjj.xys.gov.cn/chaxun_geren.asp'

response = requests.get(url)

print(response.text)  # garbled

print(response.content)  # garbled

# Output:

# <td><font color="#ff0000">µ±Ç°Î»ÖÃ£º</font><a href="index.asp">Ê×Ò³

# </a> >> ¸öÈËÕÊ»§²éÑ¯</td>

Make it print Chinese.

# Question 3
import requests
url = 'http://xyzfgjj.xys.gov.cn/chaxun_geren.asp'
response = requests.get(url)
print(response.content.decode('gbk'))  # decoded as GBK, no longer garbled
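Why GBK works can be checked offline against the garbled output quoted in the question. The sketch below assumes the page is served as GBK and that the text was decoded as Latin-1 (ISO-8859-1), which is likely what happens when a response carries no charset. The helper name `repair` is made up for illustration.

```python
def repair(garbled, served_as="gbk", misread_as="latin-1"):
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    return garbled.encode(misread_as).decode(served_as)


# Two fragments copied from the garbled output in the question.
samples = ["µ±Ç°Î»ÖÃ£º", "Ê×Ò³"]
for sample in samples:
    print(repr(sample), "->", repair(sample))
```

With requests itself, an alternative to decoding `response.content` by hand is to set `response.encoding = 'gbk'` before reading `response.text`.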

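In a production environment, how would you automate online-banking login for a bank website that only works in IE and uses an ActiveX password control?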

No standard answer yet. My guess is that the cookie could be generated by calling the site's login interface directly.

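Using tornado, write one or more asynchronous coroutine crawler demos. Points to show: GET requests, POST requests, getting and updating cookies, and setting and reading the full request headers.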

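There are examples of this online, but I had never used this framework before and could not absorb it in such a short time. I will study it properly later.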

https://blog.csdn.net/WuLex/article/details/78398304
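As a starting point, here is a minimal sketch of what such a demo might look like. It is not taken from the linked article. It assumes tornado 6 with its default `AsyncHTTPClient`, and the header values, the dict used as a cookie jar, and the function names are all made up for illustration.

```python
import asyncio
from http.cookies import SimpleCookie
from urllib.parse import urlencode

from tornado.httpclient import AsyncHTTPClient, HTTPRequest

# Hypothetical header set; replace with whatever the target site expects.
BASE_HEADERS = {
    "User-Agent": "demo-crawler/0.1",
    "Accept": "text/html,application/json",
}


def cookie_header(jar):
    """Serialize a {name: value} dict into a Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


def update_cookies(jar, response):
    """Merge every Set-Cookie header of a response into the jar."""
    for raw in response.headers.get_list("Set-Cookie"):
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            jar[name] = morsel.value


async def fetch(url, jar, method="GET", form=None):
    """Send one request with the base headers and the current cookies."""
    headers = dict(BASE_HEADERS)
    if jar:
        headers["Cookie"] = cookie_header(jar)
    body = urlencode(form) if form is not None else None
    request = HTTPRequest(url, method=method, headers=headers, body=body)
    response = await AsyncHTTPClient().fetch(request)
    update_cookies(jar, response)
    return response


async def crawl(get_url, post_url, form):
    """GET first to pick up cookies, then POST with them attached."""
    jar = {}
    first = await fetch(get_url, jar)
    second = await fetch(post_url, jar, method="POST", form=form)
    # response.request.headers holds the headers attached to the request.
    sent_headers = dict(second.request.headers)
    return first, second, jar, sent_headers
```

It can be driven with `asyncio.run(crawl(get_url, post_url, {"q": "hello"}))`, where both URLs are whatever pages you are testing against. Several `crawl()` calls can be run concurrently by passing them to `asyncio.gather`.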

Write hello world in a language other than Python.

alert('hello world');

printf("hello world");

<?php

print('hello world');

posted @ 2018-07-24 17:28 猪啊美
