Computing node similarity in Python with the DeepWalk model
Draft notes, still to be tidied up.
GitHub: https://github.com/prateekjoshi565/DeepWalk
Method:
https://blog.csdn.net/gdh756462786/article/details/79108665/
I. Installing the dependencies straight from requirements.txt fails with:
ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec'
The fix is to pin gensim to version 3.8.3.
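Before reinstalling, it can help to check which gensim version is actually installed. The following is a minimal sketch, assuming the import error affects gensim 4.0 and later; the cutoff and the helper names (needs_pin, check_installed_gensim) are illustrative assumptions, not part of DeepWalk or gensim.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '3.8.3'."""
    return int(version.split(".")[0])


def needs_pin(version: str) -> bool:
    """True if this gensim version is assumed to break the DeepWalk import.

    Assumption: the `Vocab` import fails from gensim 4.0 onward.
    """
    return major_version(version) >= 4


def check_installed_gensim() -> str:
    """Report the installed gensim version and whether it should be pinned."""
    try:
        installed = metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return "gensim is not installed"
    if needs_pin(installed):
        return f"gensim {installed} found; install gensim==3.8.3 instead"
    return f"gensim {installed} found; no change needed"


print(check_installed_gensim())
```

If the check reports a 4.x release, running pip install gensim==3.8.3 in the same environment applies the pin described above.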
II. Step-by-step procedure
Download the source code:
https://github.com/phanein/deepwalk
Dataset definition:
http://leitang.net/social_dimension.html
Core code
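# Build the corpus: number_walks random walks of length walk_length per node
walks = graph.build_deepwalk_corpus(G, num_paths=args.number_walks,
                                    path_length=args.walk_length, alpha=0,
                                    rand=random.Random(args.seed))
print("Training...")
# Train Word2Vec on the walks; `size` is the gensim 3.x name of this argument
model = Word2Vec(walks, size=args.representation_size, window=args.window_size,
                 min_count=0, workers=args.workers)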
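To make the corpus-building step more concrete, here is a simplified sketch of truncated random walks over a plain adjacency dictionary. It illustrates the idea behind build_deepwalk_corpus and is not the library's own code; the names random_walk, build_corpus and toy_graph are hypothetical, and treating alpha as a restart probability is an assumption.

```python
import random


def random_walk(adj, start, path_length, alpha=0.0, rand=None):
    """One truncated random walk from `start` over the adjacency dict `adj`."""
    if rand is None:
        rand = random.Random()
    path = [start]
    while len(path) < path_length:
        neighbors = adj[path[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        if rand.random() >= alpha:
            path.append(rand.choice(neighbors))  # step to a uniform random neighbor
        else:
            path.append(path[0])  # assumed meaning of alpha: jump back to the start
    # Word2Vec expects sequences of string tokens
    return [str(node) for node in path]


def build_corpus(adj, num_paths, path_length, alpha=0.0, rand=None):
    """Start `num_paths` walks from every node, shuffling the node order per pass."""
    if rand is None:
        rand = random.Random(0)
    nodes = list(adj)
    walks = []
    for _ in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, path_length, alpha, rand))
    return walks


# Made-up 4-node graph, for illustration only
toy_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
walks = build_corpus(toy_graph, num_paths=2, path_length=5, rand=random.Random(42))
print(len(walks))  # 8 walks: 2 passes over 4 nodes
```

Each walk then plays the role of a sentence and each node ID the role of a word, which is why the walks can be handed to Word2Vec unchanged.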
Installation
cd deepwalk-master
pip install -r requirements.txt
python setup.py install
Reproducing the experimental results
1. BlogCatalog dataset
Generate the embeddings
deepwalk --format mat --input example_graphs/blogcatalog.mat --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.embeddings
Evaluation
python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings --network example_graphs/blogcatalog.mat --num-shuffles 10 --all
2. Karate dataset
Generate the embeddings
--format defaults to an .adjlist file, so it can be omitted here.
deepwalk --input example_graphs/karate.adjlist --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/karate.embeddings
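Once the embeddings file exists, node similarity can be computed from it. The sketch below assumes the output uses the word2vec text format (a header line with the node count and dimension, then one line per node with its ID followed by the vector values); that format and the helper names are assumptions, and the sample vectors are made up.

```python
import math


def parse_embeddings(lines):
    """Parse word2vec text format: header 'count dim', then 'node v1 ... vd'."""
    rows = [line.split() for line in lines if line.strip()]
    count, dim = int(rows[0][0]), int(rows[0][1])
    emb = {parts[0]: [float(x) for x in parts[1:]] for parts in rows[1:]}
    if len(emb) != count or any(len(vec) != dim for vec in emb.values()):
        raise ValueError("embeddings do not match the header line")
    return emb


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(emb, node, topn=3):
    """Rank all other nodes by cosine similarity to `node`, highest first."""
    scores = [(other, cosine(emb[node], vec))
              for other, vec in emb.items() if other != node]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up 2-dimensional vectors standing in for a real embeddings file
sample = ["3 2", "1 1.0 0.0", "2 0.9 0.1", "3 0.0 1.0"]
emb = parse_embeddings(sample)
print(most_similar(emb, "1", topn=2))
```

For a real run, an open file handle on example_graphs/karate.embeddings can be passed to parse_embeddings in place of the sample list, since the function only needs an iterable of lines.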
Evaluation
--network requires a .mat file.
The options of scoring.py are as follows:
usage: scoring [-h] --emb EMB --network NETWORK
               [--adj-matrix-name ADJ_MATRIX_NAME]
               [--label-matrix-name LABEL_MATRIX_NAME]
               [--num-shuffles NUM_SHUFFLES] [--all]

optional arguments:
  -h, --help            show this help message and exit
  --emb EMB             Embeddings file (default: None)
  --network NETWORK     A .mat file containing the adjacency matrix and node
                        labels of the input network. (default: None)
  --adj-matrix-name ADJ_MATRIX_NAME
                        Variable name of the adjacency matrix inside the .mat
                        file. (default: network)
  --label-matrix-name LABEL_MATRIX_NAME
                        Variable name of the labels matrix inside the .mat
                        file. (default: group)
  --num-shuffles NUM_SHUFFLES
                        Number of shuffles. (default: 2)
  --all                 The embeddings are evaluated on all training percents
                        from 10 to 90 when this flag is set to true. By
                        default, only training percents of 10, 50 and 90 are
                        used. (default: False)
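The interaction between --num-shuffles and --all can be sketched as a schedule of evaluation runs. This is only an illustration of the help text above: the 10-point step between 10 and 90, and the assumption that every shuffle is evaluated at every training fraction, are my reading rather than something the help output states, and the function names are hypothetical.

```python
def training_percents(evaluate_all=False):
    """Training fractions used for evaluation, following the --all help text."""
    if evaluate_all:
        # Assumption: steps of 10 percentage points from 10% to 90%
        return [p / 10 for p in range(1, 10)]
    return [0.1, 0.5, 0.9]


def evaluation_runs(num_shuffles=2, evaluate_all=False):
    """List every (shuffle index, training fraction) pair to be evaluated."""
    return [(shuffle, percent)
            for shuffle in range(num_shuffles)
            for percent in training_percents(evaluate_all)]


print(len(evaluation_runs()))                                    # 6
print(len(evaluation_runs(num_shuffles=10, evaluate_all=True)))  # 90
```

Under these assumptions, the BlogCatalog command with 10 shuffles and --all performs 90 train/test evaluations, which explains why it runs noticeably longer than the defaults.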
Reference: https://blog.csdn.net/YizhuJiao/article/details/81095346
GitHub: https://github.com/phanein/deepwalk