fetch_20newsgroups fails to load the dataset: "No handlers could be found for logger ..."
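On Python 2, this message comes from the logging module. scikit-learn logs a warning (for example, that it is downloading the dataset), but no handler is configured, so the warning itself is never shown. If you only want to see that underlying warning before applying one of the fixes below, a minimal sketch is to configure a root handler before calling fetch_20newsgroups. The helper name enable_warning_output is hypothetical.

```python
import logging


def enable_warning_output(level=logging.INFO):
    """Attach a stderr handler to the root logger so library warnings get printed.

    enable_warning_output is a hypothetical helper name, not a scikit-learn API.
    """
    # basicConfig does nothing if the root logger already has handlers.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    return logging.getLogger()


root_logger = enable_warning_output()
# Call fetch_20newsgroups() after this point; its warnings
# (e.g. the download attempt) are now written to stderr.
```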
The simplest fix
Download '20news-bydate.pkz' and put it under C:\Users\[Current user]\scikit_learn_data.
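The copy step can be sketched as follows. This assumes the .pkz file has already been downloaded to a local path. The helper name install_pkz and the path in the usage line are hypothetical. On Python 3, some scikit-learn versions look for a file named 20news-bydate_py3.pkz instead, so check the cache file name your installed version expects before relying on this.

```python
import os
import shutil


def install_pkz(pkz_path, data_home=None, cache_name=None):
    """Copy a manually downloaded 20news-bydate.pkz into the scikit-learn data directory.

    install_pkz is a hypothetical helper, not part of scikit-learn.
    """
    if data_home is None:
        # Default location described above: <user home>/scikit_learn_data
        data_home = os.path.join(os.path.expanduser("~"), "scikit_learn_data")
    os.makedirs(data_home, exist_ok=True)
    # Keep the downloaded file name unless a different cache name is needed
    # (e.g. "20news-bydate_py3.pkz", assumed for some Python 3 versions).
    target = os.path.join(data_home, cache_name or os.path.basename(pkz_path))
    shutil.copyfile(pkz_path, target)
    return target
```

Usage: `install_pkz(r"C:\Downloads\20news-bydate.pkz")`. After that, `fetch_20newsgroups()` should find the cache in the default directory instead of trying to download it.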
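More precisely:
scikit-learn's default data directory is C:\Users\[Current user]\scikit_learn_data, i.e. scikit_learn_data inside the current user's home directory.
You can also set the environment variable 'SCIKIT_LEARN_DATA'. scikit-learn then uses the directory named by that variable as the dataset directory.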
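The lookup order above can be sketched with the standard library alone: an explicit argument first, then SCIKIT_LEARN_DATA, then ~/scikit_learn_data. This assumes the logic matches get_data_home in your installed version. Unlike the real function, this sketch does not create the directory, and resolve_data_home is a hypothetical name.

```python
import os


def resolve_data_home(data_home=None):
    """Return the dataset directory following scikit-learn's lookup order.

    resolve_data_home is a hypothetical helper; unlike sklearn's get_data_home,
    it does not create the directory.
    """
    if data_home is None:
        # SCIKIT_LEARN_DATA, if set, is used as the directory itself;
        # otherwise fall back to ~/scikit_learn_data.
        data_home = os.environ.get(
            "SCIKIT_LEARN_DATA", os.path.join("~", "scikit_learn_data")
        )
    return os.path.expanduser(data_home)
```

To see the directory your real installation uses, run `from sklearn.datasets import get_data_home` and then `print(get_data_home())`.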
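If you don't want to use either of these directories, you can modify the function get_data_home(data_home=None) in site-packages/sklearn/datasets/base.py (the module is named _base.py in newer releases).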
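Editing base.py changes the behaviour for every project on the machine. A less invasive option is to pass the directory on each call. This sketch assumes your version's fetch_20newsgroups accepts the data_home and download_if_missing keyword arguments, which recent releases do. The helper names and the directory in the usage line are hypothetical.

```python
from pathlib import Path


def prepare_data_home(path):
    """Create the directory if needed and return it as an absolute path string."""
    home = Path(path).expanduser().resolve()
    home.mkdir(parents=True, exist_ok=True)
    return str(home)


def load_20newsgroups_offline(data_home, subset="train"):
    """Load 20 Newsgroups from a chosen directory without touching the network.

    Hypothetical helper; relies on fetch_20newsgroups' data_home and
    download_if_missing parameters.
    """
    # Imported lazily so this module can be loaded where scikit-learn is absent.
    from sklearn.datasets import fetch_20newsgroups

    return fetch_20newsgroups(
        data_home=prepare_data_home(data_home),
        subset=subset,
        download_if_missing=False,  # raise instead of downloading if the cache is missing
    )
```

Usage: `data = load_20newsgroups_offline(r"D:\datasets\scikit_learn_data")`. With download_if_missing=False, a missing cache produces an error immediately instead of a hanging download.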
Another solution:
1. Manually download http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz
and save it under scikit_learn_data/20news_home/.
2. Edit the function download_20newsgroups in site-packages/sklearn/datasets/twenty_newsgroups.py.
Comment out the following code:
    if not os.path.exists(target_dir):
        os.makedirs(target_dir)

    if os.path.exists(archive_path):
        # Download is not complete as the .tar.gz file is removed after
        # download.
        logger.warning("Download was incomplete, downloading again.")
        os.remove(archive_path)

    logger.warning("Downloading dataset from %s (14 MB)", URL)
    opener = urlopen(URL)
    with open(archive_path, 'wb') as f:
        f.write(opener.read())
3. Run your program again. It automatically extracts 20news-bydate.tar.gz and generates the cache file 20news-bydate.pkz.
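Step 1 can be scripted with the standard library. This is a minimal sketch that assumes the archive's top-level folders are 20news-bydate-train and 20news-bydate-test, the folder names scikit-learn's loader expects. It also assumes data_home is the directory resolved as described earlier, and stage_archive is a hypothetical helper. The script only places the archive. The source edit in step 2 is still needed; otherwise scikit-learn treats the archive as an incomplete download, deletes it, and downloads again.

```python
import os
import shutil
import tarfile

TRAIN_FOLDER = "20news-bydate-train"
TEST_FOLDER = "20news-bydate-test"


def stage_archive(tar_path, data_home):
    """Check a downloaded 20news-bydate.tar.gz and copy it to <data_home>/20news_home/.

    stage_archive is a hypothetical helper, not part of scikit-learn.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        # First path component of each member, ignoring a leading "./".
        top_level = {
            os.path.normpath(name).split(os.sep)[0] for name in tar.getnames()
        }
    missing = {TRAIN_FOLDER, TEST_FOLDER} - top_level
    if missing:
        raise ValueError("archive lacks expected folders: %s" % sorted(missing))
    target_dir = os.path.join(data_home, "20news_home")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "20news-bydate.tar.gz")
    shutil.copyfile(tar_path, target)
    return target
```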