Defining different Jieba objects in one Python file

I have three different vocab.txt files (glove, tencent.ai, fasttext).

Target: use each vocab.txt to initialize its own jieba object, all within a single Python file.

Method: if three different jieba objects are defined, each one should get its own cache file, so there will be three cache files. That raises the question of how to pass a different cache path to each object. In /home/user/anaconda3/envs/py36/lib/python3.6/site-packages/jieba/__init__.py, change the parameters of Tokenizer.__init__() so that it accepts tmp_dir:

class Tokenizer(object):

    # tmp_dir is the added parameter; it replaces the hard-coded
    # "self.tmp_dir = None" further down.
    def __init__(self, tmp_dir=None, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = tmp_dir
        self.cache_file = None
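
If editing site-packages is not an option, the same effect can probably be had without touching jieba: as far as I can tell, Tokenizer only reads its tmp_dir attribute when the prefix dict is first built, so assigning it right after construction should be enough. A minimal sketch, assuming an unmodified jieba install; the helper names cache_dir_for and build_tokenizer, and the layout <base_dir>/<model_name>/vocab.txt, are my own assumptions for illustration:

```python
import os


def cache_dir_for(base_dir, model_name):
    """Return the directory that should hold jieba.cache for one model."""
    return os.path.join(base_dir, model_name)


def build_tokenizer(base_dir, model_name, vocab_name="vocab.txt"):
    """Create an independent jieba Tokenizer for one vocabulary.

    Assumes an unmodified jieba install and that
    <base_dir>/<model_name>/<vocab_name> exists (hypothetical layout).
    """
    import jieba  # imported lazily so cache_dir_for works without jieba

    tokenizer = jieba.Tokenizer()  # starts from the default dictionary
    # Set tmp_dir before anything triggers initialization; the cache
    # file is then written into this directory instead of the system
    # temp directory.
    tokenizer.tmp_dir = cache_dir_for(base_dir, model_name)
    # load_userdict initializes the tokenizer and adds the extra words.
    tokenizer.load_userdict(os.path.join(tokenizer.tmp_dir, vocab_name))
    return tokenizer
```

Calling build_tokenizer(base_dir, 'glove.model'), and likewise for the other two models, would then give three tokenizers that do not share a cache file.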

 

Result:

import os

from jieba import Tokenizer

# Directory that holds one sub-directory per embedding model.
DATA_DIR = "/home/user/models/serving_embedding_torch/model_path/torch/data"


class Jieba(object):
    """Wraps one jieba Tokenizer with its own cache directory and user dictionary."""

    def __init__(self, vocab_path, model_path):
        super(Jieba, self).__init__()
        # tmp_dir is the parameter added to Tokenizer.__init__ above;
        # jieba.cache is written into this directory.
        self.jieba = Tokenizer(tmp_dir=os.path.join(DATA_DIR, model_path))
        self.jieba.load_userdict(vocab_path)

    def seg(self, text):
        print(list(self.jieba.cut(text, cut_all=False)))


a = Jieba('glove.model/vocab.txt', 'glove.model')
b = Jieba('tencent.model/vocab.txt', 'tencent.model')
c = Jieba('fb.model/vocab.txt', 'fb.model')
text = "区块链是一个好方向海派青年公寓龙爪槐"
a.seg(text)
b.seg(text)
c.seg(text)

(py36) user@big-001:~/models/serving_embedding_torch/model_path/torch/data$ python3 peel.py
Building prefix dict from the default dictionary ...
2019-10-17 17:14:20,745 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/glove.model/jieba.cache
2019-10-17 17:14:21,575 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/glove.model/jieba.cache
Loading model cost 0.899 seconds.
2019-10-17 17:14:21,644 DEBUG: Loading model cost 0.899 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:14:21,644 DEBUG: Prefix dict has been built succesfully.
Building prefix dict from the default dictionary ...
2019-10-17 17:14:26,352 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/tencent.model/jieba.cache
2019-10-17 17:14:27,101 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/tencent.model/jieba.cache
Loading model cost 0.805 seconds.
2019-10-17 17:14:27,158 DEBUG: Loading model cost 0.805 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:14:27,159 DEBUG: Prefix dict has been built succesfully.
Building prefix dict from the default dictionary ...
2019-10-17 17:18:41,279 DEBUG: Building prefix dict from the default dictionary ...
Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/fb.model/jieba.cache
2019-10-17 17:18:42,045 DEBUG: Dumping model to file cache /home/user/models/serving_embedding_torch/model_path/torch/data/fb.model/jieba.cache
Loading model cost 0.822 seconds.
2019-10-17 17:18:42,101 DEBUG: Loading model cost 0.822 seconds.
Prefix dict has been built succesfully.
2019-10-17 17:18:42,102 DEBUG: Prefix dict has been built succesfully.
['区块', '链是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐']
['区块链', '是', '一个', '好方向', '海派青年公寓', '龙爪槐']
['区块链', '是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐']
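
To see at a glance where the three segmentations disagree, one option is to list the tokens that only one tokenizer produced. This is just a small sketch over the output above; the function name unique_tokens and the labels glove / tencent / fasttext for a / b / c are my own:

```python
def unique_tokens(results):
    """Map each name to the tokens that only that segmentation produced."""
    unique = {}
    for name, tokens in results.items():
        others = set()
        for other_name, other_tokens in results.items():
            if other_name != name:
                others.update(other_tokens)
        # Keep the original order, drop tokens seen in any other result.
        unique[name] = [t for t in tokens if t not in others]
    return unique


results = {
    "glove": ['区块', '链是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐'],
    "tencent": ['区块链', '是', '一个', '好方向', '海派青年公寓', '龙爪槐'],
    "fasttext": ['区块链', '是', '一个', '好', '方向', '海派', '青年', '公寓', '龙爪槐'],
}

for name, tokens in unique_tokens(results).items():
    print(name, tokens)
```

For this sentence only the glove result contains '区块' and '链是', and only the tencent result contains '好方向' and '海派青年公寓'; the fasttext result has no token of its own.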

 

posted @ 2019-10-17 17:50  寒杰士