Main-content detection and main-content extraction
http://www.pythontip.com/blog/post/4165/
http://www.weixinxi.wang/open/extract.html Online API written in Java; a paid API service is available
http://dataunion.org/424.html Roundup of Python toolkits for main-content extraction
https://www.baidu.com/s?wd=python%20%20%E8%AF%86%E5%88%AB%E6%AD%A3%E6%96%87&rsv_spt=1&rsv_iqid=0xb631ad110007500b&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&oq=mongodb%20linux%E5%90%AF%E5%8A%A8%20centOs&rsv_t=4201777ACfHHdawIugmM9%2Bp0BQhSYyZjZdevxz7oeznUXqpxHxiDmfEjtvWUAHDsZHbv&inputT=11423&rsv_pq=a97e6ea3000a4fa7&rsv_sug3=326&rsv_sug1=199&rsv_sug7=100&rsv_sug2=0&rsv_sug4=11424
Algorithm introductions
http://www.cnblogs.com/phoenixnudt/articles/2382140.html
http://blog.csdn.net/wangran51/article/details/8110082