Miscellaneous

Segmenting text directly with jieba

　　python -m jieba -d ' ' allTrain.txt > train_contents.txt

Using Redis

　　cmd1: redis-server.exe redis.windows.conf

　　cmd2: redis-cli.exe -h 127.0.0.1 -p 6379

　　scrapy-redis: copy the scrapy-redis folder under src in the scrapy-redis repository into the Scrapy project

Redis commands

　　keys * : list all keys

　　https://github.com/rmax/scrapy-redis

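　　type jobbole:requests : type of the key

　　zrange jobbole:requests 0 1 : elements of the zset at indexes 0 through 1

　　scard jobbole:dupefilter : number of elements in the set

　　smembers jobbole:dupefilter : all members of the set

Finding the MySQL data directory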

　　show global variables like "%datadir%"

From the directory that holds the TensorFlow summaries, run tensorboard --logdir=C:\redis\logs

　　TensorBoard 0.1.6 at http://DESKTOP-FIPG2GH:6006 (Press CTRL+C to quit)

Once that line appears, open localhost:6006 in a browser to view the TensorFlow model's graph and HISTOGRAMS views.

jupyter

　　'sha1:f0147912cfac:fe72a5a54b1bb234881e4fdc5d04419d70dc4e58'

Batch-renaming the files in a folder on Linux (replace ext with the real extension)

　　i=1; for x in *; do mv "$x" "$i.ext"; let i=i+1; done
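
The same batch rename can be sketched in Python, which avoids shell quoting problems with file names that contain spaces. The helper name `rename_sequentially`, the temporary-name prefix, and the `txt` extension are assumptions made for illustration; they are not part of the original notes.

```python
import os
import tempfile

def rename_sequentially(folder, ext):
    """Rename every regular file in `folder` to 1.ext, 2.ext, ... in sorted name order."""
    names = sorted(n for n in os.listdir(folder)
                   if os.path.isfile(os.path.join(folder, n)))
    # First pass: move everything to temporary names so that a new name
    # such as 1.txt can never overwrite a file that has not been renamed yet
    temp_names = []
    for index, name in enumerate(names, start=1):
        temp = "__tmp_rename_{0}".format(index)
        os.rename(os.path.join(folder, name), os.path.join(folder, temp))
        temp_names.append(temp)
    # Second pass: assign the final numbered names
    new_names = []
    for index, temp in enumerate(temp_names, start=1):
        new_name = "{0}.{1}".format(index, ext)
        os.rename(os.path.join(folder, temp), os.path.join(folder, new_name))
        new_names.append(new_name)
    return new_names

# Demonstration on a throwaway directory
demo_dir = tempfile.mkdtemp()
for name in ["b file.txt", "a.txt", "c.txt"]:
    open(os.path.join(demo_dir, name), "w").close()
renamed = rename_sequentially(demo_dir, "txt")
print(renamed)
```

The two-pass approach is a design choice for safety; the one-line shell loop renames in place and can clobber a file if a target name already exists.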

Deleting a folder and everything inside it

　　rm -rf folder
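
From Python, the standard library's `shutil.rmtree` does the same job as `rm -rf`. A minimal sketch on a throwaway temporary directory (the directory layout is made up for the demonstration); like `rm -rf`, it deletes without asking for confirmation.

```python
import os
import shutil
import tempfile

# Build a small throwaway tree: folder/sub/deeper/file.txt
folder = tempfile.mkdtemp()
nested = os.path.join(folder, "sub", "deeper")
os.makedirs(nested)
with open(os.path.join(nested, "file.txt"), "w") as f:
    f.write("scratch data\n")

existed_before = os.path.exists(folder)
shutil.rmtree(folder)          # removes the folder and everything below it
exists_after = os.path.exists(folder)
print(existed_before, exists_after)
```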

Replacing the newlines in a string in Python

　　str.replace('\n',' ')
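
Because Python strings are immutable, `replace` returns a new string, so the result has to be assigned. A small sketch, with `flatten_newlines` as a hypothetical helper name, that also covers Windows-style `\r\n` line endings:

```python
def flatten_newlines(text):
    """Return `text` with every line break replaced by a single space."""
    # str.splitlines() splits on \n, \r\n and \r (among other line boundaries)
    return " ".join(text.splitlines())

s = "first line\nsecond line\r\nthird line"
t = s.replace("\n", " ")      # s itself is unchanged; the \r is still in t
u = flatten_newlines(s)
print(repr(t))
print(repr(u))
```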

Processing data with re

import os
import re

# Collect every .json file in the current directory
dir_list = [name for name in sorted(os.listdir()) if name.endswith('.json')]
print("JSON files: {0}".format(len(dir_list)))

# Output folder: one text file of abstracts per ISSN
path = '../pubmedData/'
if not os.path.exists(path):
    os.makedirs(path)

for file in dir_list:
    print("Processing: {0}".format(file))
    with open(file, 'r') as f:
        x = f.read()
    # Each record runs from 'cit {' up to the next 'Pubmed-entry'
    cit_pubmed = re.findall(r'cit \{(.*?)Pubmed-entry', x, re.DOTALL)
    print("Total matches: {0}".format(len(cit_pubmed)))

    i = 0  # records with a usable title block
    j = 0  # abstracts written to disk
    k = 0  # records with exactly one ISSN
    set_title_list = []
    set_abstract_list = []
    set_issn_list = []
    issn_class = []
    for entry in cit_pubmed:
        # title
        title = re.findall(r'title \{(.*?)authors \{', entry, re.DOTALL)
        set_title_list.append(len(title))
        # Handle one or two title blocks in a single branch so i is counted once per record
        if len(title) in (1, 2):
            # The '.' before the closing quote drops the last character (usually the final period)
            title = re.findall(r'name "(.*?)."', title[0], re.DOTALL)
            i += 1

        # issn
        issn = re.findall(r'issn "(.*?)",', entry, re.DOTALL)
        if len(issn) == 1:
            # abstract
            abstract = re.findall(r'abstract "(.*?).",', entry, re.DOTALL)
            if len(abstract) == 1:
                # Append the abstract as one line to the file named after the ISSN
                with open(path + issn[0] + '.txt', 'a') as f:
                    f.write(abstract[0].replace("\n", " ") + '\n')
                j += 1
            set_abstract_list.append(len(abstract))

            issn_class.append(issn[0])
            k += 1
        set_issn_list.append(len(issn))

    # Distinct match counts seen per field, plus totals
    set_title_list = set(set_title_list)
    set_abstract_list = set(set_abstract_list)
    set_issn_list = set(set_issn_list)
    print("TITLE match counts: {0}, total: {1}".format(set_title_list, i))
    print("ABSTRACT match counts: {0}, total: {1}".format(set_abstract_list, j))
    print("ISSN match counts: {0}, total: {1}".format(set_issn_list, k))
    print("ISSN_CLASS: {0} classes".format(len(set(issn_class))))
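
The non-greedy `(.*?)` combined with `re.DOTALL` is what lets each pattern above span several lines while still stopping at the first closing marker. A minimal sketch on a made-up record (the sample text is invented for illustration and only imitates the layout the script expects):

```python
import re

sample = '''cit {
  title {
    name "A short example title."
  },
  authors {
  },
  issn "1932-6203",
  abstract "Line one
line two.",
}
Pubmed-entry'''

pattern = r'abstract "(.*?).",'

# Without re.DOTALL, "." does not match a newline, so the two-line abstract is missed
without_dotall = re.findall(pattern, sample)

# With re.DOTALL the match crosses the line break; the lone "." consumes the final period
with_dotall = re.findall(pattern, sample, re.DOTALL)
issn = re.findall(r'issn "(.*?)",', sample, re.DOTALL)

print(without_dotall)
print(with_dotall[0].replace("\n", " "))
print(issn)
```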

numpy argsort()

import numpy as np
x = np.array([5, 4, 3, 2, 1])
y = x.argsort()
# y is array([4, 3, 2, 1, 0]): the indices that would sort x in ascending order

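Getting the indices of the five largest values in each row of an ndarray

x = np.array([[5, 4, 3, 2, 1, 7, 8, 9], [1, 2, 3, 4, 5, 9, 8, 6]])
# For each row: argsort ascending, then take the last five indices in reverse order
t = list(map(lambda label: label.argsort()[-1:-6:-1], x))
# result: [array([7, 6, 5, 0, 1]), array([5, 6, 7, 4, 3])]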
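
For large arrays a full sort does more work than needed. A hedged alternative sketch uses `np.argpartition`, which only guarantees that the five largest values end up in the last five positions, and then sorts just those five. The helper name `top_k_indices` is hypothetical.

```python
import numpy as np

def top_k_indices(row, k):
    """Indices of the k largest values in a 1-D array, largest first."""
    # argpartition places the k largest values in the last k slots, in no particular order
    candidates = np.argpartition(row, -k)[-k:]
    # Order only those k candidates by value, descending
    return candidates[np.argsort(row[candidates])[::-1]]

x = np.array([[5, 4, 3, 2, 1, 7, 8, 9],
              [1, 2, 3, 4, 5, 9, 8, 6]])
top5 = [top_k_indices(row, 5) for row in x]
print(top5)
```

With distinct values, as here, this gives the same indices as the argsort slice; with ties the order among equal values may differ.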

numpy.hstack() (horizontal): with a = array([1,2,3]) and b = array([4,5,6]), hstack((a, b)) gives c = array([1,2,3,4,5,6])

numpy.vstack() (vertical): with a = array([1,2,3]) and b = array([4,5,6]), vstack((a, b)) gives c = array([[1,2,3],[4,5,6]])
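
A runnable sketch of both calls, printing the shapes so the difference is visible:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h = np.hstack((a, b))   # joins the two 1-D arrays end to end
v = np.vstack((a, b))   # stacks each 1-D array as a row of a 2-D array

print(h, h.shape)
print(v, v.shape)
```

Note that both functions take a single sequence of arrays, hence the double parentheses.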

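Finding the two least frequent values in an array

from collections import Counter
a = [1,2,3,4,2,3,4,5]
# most_common() sorts by count, highest first, so the last two entries are the rarest
x = Counter(a).most_common()[-2:]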
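
A short sketch of what the slice returns and how to pull out just the values. Each entry is a `(value, count)` pair; when several values tie for the lowest count, which of them land in the last two slots depends on insertion order.

```python
from collections import Counter

a = [1, 2, 3, 4, 2, 3, 4, 5]
counts = Counter(a)

# Tail of the descending-by-count list: the two rarest (value, count) pairs
least_two = counts.most_common()[-2:]
# Keep only the values themselves
least_values = [value for value, count in least_two]

print(least_two)
print(least_values)
```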

Checking the size of a folder

　　du -h --max-depth=1 pubmedData

Checking the size of a single file

　　ls -sh 1932-6203.txt

Listing the ten largest entries under the current folder

　　du -a | sort -n -r | head -n 10
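
`du -a` lists directories as well as files, so the top ten can include folders. A rough Python sketch that counts regular files only, using byte sizes from `os.path.getsize`; the function name `largest_files` and the demo files are assumptions for illustration.

```python
import heapq
import os
import tempfile

def largest_files(root, n=10):
    """Return up to n (size_in_bytes, path) pairs for the largest files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    return heapq.nlargest(n, sizes)

# Demonstration on a throwaway directory with files of known sizes
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for rel_path, size in [("small.txt", 10),
                       (os.path.join("sub", "big.txt"), 300),
                       ("mid.txt", 20)]:
    with open(os.path.join(demo_root, rel_path), "wb") as f:
        f.write(b"x" * size)

top_two = largest_files(demo_root, n=2)
for size, path in top_two:
    print(size, os.path.relpath(path, demo_root))
```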

Python references

x = [1, 2, 3]
y = x          # y refers to the same list object as x
print(y)       # [1, 2, 3]
x.pop()
print(y)       # [1, 2]
x = [1, 2, 3]
y = x[:]       # slicing creates a new list (a shallow copy)
print(y)       # [1, 2, 3]
x.pop()
print(y)       # [1, 2, 3]
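
A slice copy is only one level deep. For nested lists, the inner lists are still shared, which the following sketch illustrates alongside `copy.deepcopy` (the variable names are chosen for the example):

```python
import copy

x = [[1, 2], [3, 4]]
alias = x                   # same list object as x
shallow = x[:]              # new outer list, inner lists still shared with x
deep = copy.deepcopy(x)     # fully independent copy

x[0].append(99)             # mutate an inner list
x.pop()                     # mutate the outer list

print(alias)
print(shallow)
print(deep)
```

Here `alias` follows both changes, `shallow` sees the inner mutation but keeps its own outer list, and `deep` is unaffected.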

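Every Python object carries two pieces of header information: 1. a type designator that identifies the object's type; 2. a reference counter that determines whether the object can be reclaimed.

Types belong to objects, not to variables. A Python variable simply refers to a particular object at a particular time: a = 123 (int), a = '123' (str), a = 1.23 (float).

Garbage collection of objects: with a = 123 (int), a = '123' (str), a = 1.23 (float), when a changes from pointing at the int object 123 to pointing at the str object '123', the int object 123 is reclaimed once nothing else references it, and the reclaimed space is automatically returned to the free memory pool.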
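
Both points can be observed from the interpreter. The sketch below assumes CPython, where `sys.getrefcount` exposes the reference counter (the reported number includes the temporary reference held by the call itself, so only the differences are meaningful):

```python
import sys

# The type travels with the object, not with the name a
kinds = []
a = 123
kinds.append(type(a).__name__)
a = '123'
kinds.append(type(a).__name__)
a = 1.23
kinds.append(type(a).__name__)
print(kinds)

# Reference counting on a fresh list object
obj = []
before = sys.getrefcount(obj)
other = obj                      # a second name bound to the same object
after = sys.getrefcount(obj)
del other                        # dropping the name lowers the count again
final = sys.getrefcount(obj)
print(after - before, final - before)
```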

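Recursively summing the elements of an arbitrarily nested list

# Named nested_sum so that it does not shadow the built-in sum()
def nested_sum(l):
    total = 0
    for x in l:
        if not isinstance(x, list):
            total += x
        else:
            # Recurse into the sub-list and add its total
            total += nested_sum(x)
    return total
nested_sum([[1,2,3],[1,[2]]])  # 9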
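
Very deeply nested input can hit Python's recursion limit. An iterative sketch with an explicit stack avoids that; the name `nested_sum_iterative` is hypothetical and this is one possible variant, not the original author's method.

```python
def nested_sum_iterative(items):
    """Sum all numbers in an arbitrarily nested list without recursion."""
    total = 0
    stack = [items]              # work list of things still to visit
    while stack:
        current = stack.pop()
        if isinstance(current, list):
            # Defer the children; visiting order does not matter for a sum
            stack.extend(current)
        else:
            total += current
    return total

result = nested_sum_iterative([[1, 2, 3], [1, [2]]])
print(result)
```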

posted @ 2017-11-21 15:28 WangLC
