Python Quick Reference for Data Analysis / Data Mining Tools

If you are already familiar with how Python loads modules and packages, the tables below should be easy to use.

Each Python entry in the tables below is given as a module path. Some modules are not part of the standard library and must be installed first with pip install <package-name>.

Machine Learning

Category | Subcategory | Python
LDA | Linear discriminant analysis | sklearn.discriminant_analysis.LinearDiscriminantAnalysis
QDA | Quadratic discriminant analysis | sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis
SVM (Support Vector Machine) | Support Vector Classifier (SVC) | sklearn.svm.SVC
SVM (Support Vector Machine) | Nu-Support Vector Classifier (NuSVC) | sklearn.svm.NuSVC
SVM (Support Vector Machine) | Linear Support Vector Classifier (LinearSVC) | sklearn.svm.LinearSVC
Nearest neighbors | K-nearest neighbors classifier | sklearn.neighbors.KNeighborsClassifier
Nearest neighbors | Radius neighbors classifier | sklearn.neighbors.RadiusNeighborsClassifier
Nearest neighbors | Nearest centroid classifier | sklearn.neighbors.NearestCentroid
Bayes | Gaussian Naive Bayes | sklearn.naive_bayes.GaussianNB
Bayes | Multinomial Naive Bayes | sklearn.naive_bayes.MultinomialNB
Bayes | Bernoulli Naive Bayes | sklearn.naive_bayes.BernoulliNB
Decision tree | Decision tree classifier | sklearn.tree.DecisionTreeClassifier
Decision tree | Decision tree regressor | sklearn.tree.DecisionTreeRegressor
Ensemble methods | Bagging: random forest classifier | sklearn.ensemble.RandomForestClassifier
Ensemble methods | Bagging: random forest regressor | sklearn.ensemble.RandomForestRegressor
Ensemble methods | Boosting: gradient boosting | xgboost
Ensemble methods | Boosting: AdaBoost | sklearn.ensemble.AdaBoostClassifier
Clustering | k-means | scipy.cluster.vq.kmeans
Clustering | Hierarchical clustering | scipy.cluster.hierarchy.fcluster
Clustering | DBSCAN | sklearn.cluster.DBSCAN
Clustering | BIRCH | sklearn.cluster.Birch
Clustering | K-medoids | pyclust.KMedoids (reliability unknown)
Association rules | Apriori algorithm | apriori (reliability unknown, does not support Python 3); PyFIM (reliability unknown, cannot be installed with pip)
Association rules | FP-Growth algorithm | fp-growth (reliability unknown, does not support Python 3); PyFIM (reliability unknown, cannot be installed with pip)
Neural networks | Neural network | neurolab.net, keras.*
Neural networks | Deep learning | keras.*
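Most of the scikit-learn classifiers in the table share one estimator interface (fit, predict, score), so they can be swapped without changing the surrounding code. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset purely as demo data; the settings (n_neighbors=5, n_estimators=100, a stratified 70/30 split) are illustrative choices, not recommendations.

```python
# A minimal sketch: several classifiers from the table are trained and scored
# through the same fit/score calls. Dataset and settings are demo assumptions.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(random_state=0):
    """Fit each classifier on the iris data and return its test accuracy."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state, stratify=y
    )
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVC": SVC(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "GaussianNB": GaussianNB(),
        "DecisionTree": DecisionTreeClassifier(random_state=random_state),
        "RandomForest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns the mean accuracy on the held-out split
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name:>12}: {accuracy:.3f}")
```

Because every estimator exposes the same methods, trying another row from the table (for example sklearn.svm.LinearSVC) only means adding one dictionary entry.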
 


Connector & IO

Database

Category | Python
MySQL | mysql-connector-python (official)
Oracle | cx_Oracle
Redis | redis
MongoDB | pymongo
Neo4j | py2neo
Cassandra | cassandra-driver
ODBC | pyodbc
JDBC | unknown (Jython only)
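The relational drivers in this table (mysql-connector-python, cx_Oracle, pyodbc) implement the Python DB-API 2.0 interface (PEP 249): connect, get a cursor, execute a parameterized query, fetch the rows. The clients for Redis, MongoDB, Neo4j and Cassandra have their own APIs. The sketch below shows the DB-API pattern using the standard-library sqlite3 module as a stand-in, so it runs without a database server; the sales table and its columns are hypothetical.

```python
import sqlite3
from contextlib import closing


def load_rows(conn, min_amount):
    """Run a parameterized query with the DB-API 2.0 cursor pattern."""
    with closing(conn.cursor()) as cur:
        # sqlite3 uses "?" placeholders; other drivers may use a different paramstyle
        cur.execute(
            "SELECT name, amount FROM sales WHERE amount >= ? ORDER BY amount",
            (min_amount,),
        )
        return cur.fetchall()


def demo():
    """Create a hypothetical in-memory sales table and query it."""
    # ":memory:" stands in for a real MySQL/Oracle/ODBC connection here
    conn = sqlite3.connect(":memory:")
    try:
        with closing(conn.cursor()) as cur:
            cur.execute("CREATE TABLE sales (name TEXT, amount REAL)")
            cur.executemany(
                "INSERT INTO sales VALUES (?, ?)",
                [("a", 10.0), ("b", 25.5), ("c", 40.0)],
            )
        conn.commit()  # DB-API connections do not autocommit by default
        return load_rows(conn, 20)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Placeholder syntax is the main thing to change when porting: sqlite3 and pyodbc use ?, while mysql-connector-python uses %s. Each driver module reports its style in its paramstyle attribute.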

IO

Category | Python
Excel | XlsxWriter, pandas.read_excel / pandas.DataFrame.to_excel, openpyxl
CSV | csv.writer
JSON | json
Image | PIL (installed as Pillow)
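The csv and json entries are part of the standard library. A minimal round trip, assuming in-memory buffers instead of files and hypothetical name/score records, writes the records with csv.writer, reads them back with csv.DictReader, and serializes them with json.dumps.

```python
import csv
import io
import json


def to_csv_text(records):
    """Write a list of {"name", "score"} dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "score"])
    for rec in records:
        writer.writerow([rec["name"], rec["score"]])
    return buf.getvalue()


def csv_text_to_json(text):
    """Parse CSV text back into dicts and return them as a JSON string."""
    reader = csv.DictReader(io.StringIO(text))
    # CSV values are always strings, so the numeric column is converted back
    records = [{"name": row["name"], "score": int(row["score"])} for row in reader]
    return json.dumps(records)


if __name__ == "__main__":
    data = [{"name": "a", "score": 90}, {"name": "b", "score": 75}]
    print(csv_text_to_json(to_csv_text(data)))
```

When writing to a real file instead of a buffer, open it with newline="" as the csv module documentation recommends; otherwise extra blank lines can appear on Windows.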


Statistics

Category | Python
Descriptive statistics summary | scipy.stats.describe
Mean | scipy.stats.gmean (geometric mean), scipy.stats.hmean (harmonic mean), numpy.mean, numpy.nanmean, pandas.Series.mean
Median | numpy.median, numpy.nanmedian, pandas.Series.median
Mode | scipy.stats.mode, pandas.Series.mode
Quantile | numpy.percentile, numpy.nanpercentile, pandas.Series.quantile
Empirical cumulative distribution function (ECDF) | statsmodels.distributions.empirical_distribution.ECDF
Standard deviation | numpy.std, numpy.nanstd, pandas.Series.std
Variance | numpy.var, pandas.Series.var
Coefficient of variation | scipy.stats.variation
Covariance | numpy.cov, pandas.Series.cov
Pearson correlation coefficient | scipy.stats.pearsonr, numpy.corrcoef, pandas.Series.corr
Kurtosis | scipy.stats.kurtosis, pandas.Series.kurt
Skewness | scipy.stats.skew, pandas.Series.skew
Histogram | numpy.histogram, numpy.histogram2d, numpy.histogramdd
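The libraries in this table do not always share defaults. In particular, numpy.std and numpy.var use the population formula (ddof=0), while pandas.Series.std and pandas.Series.var use the sample formula (ddof=1), and scipy.stats.describe reports the sample variance. The sketch below, on a small made-up sample, computes several of the listed statistics side by side so the differences are visible; it assumes NumPy, pandas and SciPy are installed.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize(values):
    """Compute several statistics from the table on the same sample."""
    arr = np.asarray(values, dtype=float)
    s = pd.Series(arr)
    desc = stats.describe(arr)  # nobs, minmax, mean, variance, skewness, kurtosis
    return {
        "mean": float(np.mean(arr)),
        "gmean": float(stats.gmean(arr)),      # geometric mean
        "median": float(np.median(arr)),
        "mode": float(s.mode().iloc[0]),       # pandas returns all modes, sorted
        "p75": float(np.percentile(arr, 75)),  # linear interpolation by default
        "std_numpy": float(np.std(arr)),       # population formula, ddof=0
        "std_pandas": float(s.std()),          # sample formula, ddof=1
        "var_describe": float(desc.variance),  # sample variance, ddof=1
        "cv": float(stats.variation(arr)),     # std (ddof=0) divided by the mean
    }


if __name__ == "__main__":
    for key, value in summarize([2, 4, 4, 4, 5, 5, 7, 9]).items():
        print(f"{key:>12}: {value:.4f}")
```

For the sample [2, 4, 4, 4, 5, 5, 7, 9], numpy.std gives 2.0 while pandas.Series.std gives about 2.138 (the square root of 32/7), so pass ddof explicitly whenever results from different libraries are compared.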

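Regression (statistical and machine learning methods)

Category | Python
Ordinary least squares (OLS) | statsmodels.api.OLS, sklearn.linear_model.LinearRegression
Generalized least squares (GLS) | statsmodels.api.GLS (generalized linear models are statsmodels.api.GLM)
Quantile regression | statsmodels.api.QuantReg
Ridge regression | sklearn.linear_model.Ridge
LASSO | sklearn.linear_model.Lasso
Least angle regression (LARS) | sklearn.linear_model.Lars, sklearn.linear_model.LassoLars
Robust regression | statsmodels.api.RLM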

 

Hypothesis Testing

Category | Python
t-test | statsmodels.stats.weightstats.ttest_ind, statsmodels.stats.weightstats.ttost_ind, statsmodels.stats.weightstats.ttost_paired; scipy.stats.ttest_1samp, scipy.stats.ttest_ind, scipy.stats.ttest_ind_from_stats, scipy.stats.ttest_rel
Kolmogorov-Smirnov (KS) test (testing distributions) | scipy.stats.kstest, scipy.stats.ks_2samp
Wilcoxon (nonparametric tests for differences) | scipy.stats.wilcoxon, scipy.stats.mannwhitneyu
Shapiro-Wilk normality test | scipy.stats.shapiro
Pearson correlation coefficient test | scipy.stats.pearsonr
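The scipy and statsmodels two-sample t-tests listed above compute the same pooled-variance statistic by default; statsmodels additionally returns the degrees of freedom. The sketch below draws two synthetic normal samples (the means, sizes and seed are arbitrary assumptions) and runs several of the listed tests. Each test returns a statistic and a p-value; the null hypotheses are noted in the comments.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ttest_ind as sm_ttest_ind


def run_tests(seed=0):
    """Run several of the listed tests on two synthetic samples."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=0.0, scale=1.0, size=100)
    b = rng.normal(loc=1.0, scale=1.0, size=100)  # hypothetical shifted mean

    sp = stats.ttest_ind(a, b)            # H0: equal means (pooled variance)
    t_sm, p_sm, dof = sm_ttest_ind(a, b)  # same test, plus degrees of freedom
    _, p_shapiro = stats.shapiro(a)       # H0: a is normally distributed
    p_ks = stats.ks_2samp(a, b).pvalue    # H0: a and b share one distribution
    # H0: neither sample tends to take larger values than the other
    p_mw = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    return {
        "t_scipy": float(sp.statistic),
        "p_scipy": float(sp.pvalue),
        "t_statsmodels": float(t_sm),
        "p_statsmodels": float(p_sm),
        "dof": float(dof),
        "p_shapiro": float(p_shapiro),
        "p_ks": float(p_ks),
        "p_mannwhitney": float(p_mw),
    }


if __name__ == "__main__":
    for key, value in run_tests().items():
        print(f"{key:>14}: {value:.4g}")
```

Passing equal_var=False to scipy.stats.ttest_ind (or usevar="unequal" to the statsmodels function) switches both to Welch's t-test when the two samples may have different variances.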

Time series

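Category | Python
AR | statsmodels.tsa.ar_model.AR (replaced by statsmodels.tsa.ar_model.AutoReg in newer releases)
ARIMA | statsmodels.tsa.arima_model.ARIMA (replaced by statsmodels.tsa.arima.model.ARIMA in newer releases)
VAR | statsmodels.tsa.vector_ar.var_model.VAR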

Posted on 2019-04-26 13:32 by 胖fufu的海鸥
