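Commonly Used Websites for Data Analysis (Continuously Updated!)
Additions are welcome: just leave a comment below. The list is not limited to R, Excel, and SQL; Python learners and statisticians are welcome too.
Competitions that involve big-data analysis projects will be posted here over time, and you are welcome to team up.
Big Data Competitions
Competition Announcements
R
Foundations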
- The R Project for Statistical Computing https://www.r-project.org/
- RStudio https://www.rstudio.com
- r-bloggers http://www.r-bloggers.com/
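- A roundup of R resources (this one alone will last you a year) https://github.com/qinwf/awesome-R
Please follow <https://github.com/qinwf/awesome-R>, which many people on GitHub keep maintaining (it is mostly foreign websites and GitHub repositories, but very useful!).
Awesome R, reposted from https://github.com/qinwf/awesome-R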
A curated list of awesome R packages and tools. Inspired by awesome-machine-learning.
For better navigation, see https://awesome-r.com
Icons in the original list mark [Top 50](https://github.com/rstudio/RStartHere/blob/master/top_downloads_2016/top_packages) CRAN downloaded packages or repos with 400+ stars.
- Awesome R
- Integrated Development Environments
- Syntax
- Data Manipulation
- Graphic Displays
- HTML Widgets
- Reproducible Research
- Web Technologies and Services
- Parallel Computing
- High Performance
- Language API
- Database Management
- Machine Learning
- Natural Language Processing
- Bayesian
- Optimization
- Finance
- Bioinformatics
- Network Analysis
- R Development
- Logging
- Other Tools
- Other Interpreters
- Learning R
- Resources
- Other Awesome Lists
- Contributing
Integrated Development Environments
Integrated Development Environment
- RStudio - A powerful and productive user interface for R. Works great on Windows, Mac, and Linux.
- Emacs + ESS - Emacs Speaks Statistics is an add-on package for Emacs text editors.
- Sublime Text + R-Box - Add-on package for Sublime Text 2/3.
- TextMate + r.tmbundle - Add-on package for TextMate 1/2.
- StatET - An Eclipse based IDE for R.
- Revolution R Enterprise - Revolution R is offered free to academic users, while the commercial software focuses on big data and large-scale multiprocessor functionality.
- R Commander - A package that provides a basic graphical user interface.
- IRkernel - R kernel for Jupyter.
- Deducer - A menu-driven data analysis GUI with a spreadsheet-like data editor.
- Radiant - A platform-independent browser-based interface for business analytics in R, based on Shiny.
- Vim-R - Vim plugin for R.
- Nvim-R - Neovim plugin for R.
- JASP - A complete package for both Bayesian and Frequentist methods, that is familiar to users of SPSS.
- Bio7 - An IDE containing tools for model creation, scientific image analysis and statistical analysis for ecological modelling.
- RTVS - R Tools for Visual Studio.
Syntax
Packages change the way you use R.
- magrittr - Let's pipe it.
- pipeR - Multi-paradigm Pipeline Implementation.
- lambda.r - Functional programming and simple pattern matching in R.
- purrr - An FP package for R in the spirit of underscore.js.
Data Manipulation
Packages for cooking data.
- dplyr - Fast data frame manipulation and database queries.
- data.table - Fast data manipulation in a short and flexible syntax.
- reshape2 - Flexibly rearrange, reshape and aggregate data.
- readr - A fast and friendly way to read tabular data into R.
- haven - Improved methods to import SPSS, Stata and SAS files in R.
- tidyr - Easily tidy data with spread and gather functions.
- broom - Convert statistical analysis objects into tidy data frames.
- rlist - A toolbox for non-tabular data manipulation with lists.
- jsonlite - A robust and quick way to parse JSON files in R.
- ff - Data structures designed to store large datasets.
- lubridate - A set of functions to work with dates and times.
- stringi - ICU based string processing package.
- stringr - Consistent API for string processing, built on top of stringi.
- bigmemory - Shared memory and memory-mapped matrices. The big* packages provide additional tools including linear models (biglm) and Random Forests (bigrf).
- fuzzyjoin - Join tables together on inexact matching.
Graphic Displays
Packages for showing data.
- ggplot2 - An implementation of the Grammar of Graphics.
- ggfortify - A unified ggplot2 interface to popular statistical packages, using one line of code.
- ggrepel - Repel overlapping text labels away from each other.
- ggalt - Extra Coordinate Systems, Geoms and Statistical Transformations for ggplot2.
- ggplot2 Extensions - Showcases of ggplot2 extensions.
- lattice - A powerful and elegant high-level data visualization system.
- rgl - 3D visualization device system for R.
- Cairo - R graphics device using cairo graphics library for creating high-quality display output.
- extrafont - Tools for using fonts in R graphics.
- showtext - Enable R graphics device to show text using system fonts.
- animation - A simple way to produce animated graphics in R, using ImageMagick.
- gganimate - Create easy animations with ggplot2.
- misc3d - Powerful functions to deal with 3d plots, isosurfaces, etc.
- xkcd - Use xkcd style in graphs.
- imager - An image processing package based on CImg library to work with images and display them.
HTML Widgets
Packages for interactive visualizations.
- d3heatmap - Interactive heatmaps with D3.
- DataTables - Displays R matrices or data frames as interactive HTML tables.
- DiagrammeR - Create JS graph diagrams and flowcharts in R.
- dygraphs - Charting time-series data in R.
- formattable - Formattable Data Structures.
- ggvis - Interactive grammar of graphics for R.
- Leaflet - One of the most popular JavaScript libraries for interactive maps.
- MetricsGraphics - Enables easy creation of D3 scatterplots, line charts, and histograms.
- networkD3 - D3 JavaScript Network Graphs from R.
- scatterD3 - Interactive scatterplots with D3.
- plotly - Interactive ggplot2 and Shiny plotting with plot.ly.
- rCharts - Interactive JS Charts from R.
- rbokeh - R Interface to Bokeh.
- threejs - Interactive 3D scatter plots and globes.
Reproducible Research
Packages for literate programming.
- knitr - Easy dynamic report generation in R.
- xtable - Export tables to LaTeX or HTML.
- rapport - An R templating system.
- rmarkdown - Dynamic documents for R.
- slidify - Generate reproducible html5 slides from R markdown.
- Sweave - A package designed to write LaTeX reports using R.
- texreg - Formatting statistical models in LaTeX and HTML.
- checkpoint - Install packages from snapshots on the checkpoint server.
- brew - Pre-compute data to enhance your report templates. Can be combined with knitr.
- ReporteRs - An R package to generate Microsoft Word, Microsoft PowerPoint and HTML reports.
- bookdown - Authoring Books with R Markdown.
Web Technologies and Services
Packages to surf the web.
- Web Technologies List - Information about how to use R and the world wide web together.
- shiny - Easy interactive web applications with R.
- RCurl - General network (HTTP/FTP/...) client interface for R.
- httr - User-friendly RCurl wrapper.
- httpuv - HTTP and WebSocket server library.
- XML - Tools for parsing and generating XML within R.
- rvest - Simple web scraping for R, using CSS selectors or XPath syntax.
- OpenCPU - HTTP API for R.
- Rfacebook - Access to Facebook API via R.
- RSiteCatalyst - R client library for the Adobe Analytics.
Parallel Computing
Packages for parallel computing.
- parallel - Included in R since release 2.14.0; incorporates (slightly revised) copies of the packages multicore and snow.
- Rmpi - Rmpi provides an interface (wrapper) to MPI APIs. It also provides an interactive R slave environment.
- foreach - Executing the loop in parallel.
- SparkR - R frontend for Spark.
- DistributedR - A scalable high-performance platform from HP Vertica Analytics Team.
- ddR - Provides distributed data structures and simplifies distributed computing in R.
High Performance
Packages for making R faster.
- Rcpp - Rcpp provides a powerful C++ API on top of R, making R functions much faster.
- Rcpp11 - Rcpp11 is a complete redesign of Rcpp, targeting C++11.
- compiler - Speeds up your R code using the JIT.
Language API
Packages for other languages.
- rJava - Low-level R to Java interface.
- jvmr - Integration of R, Java, and Scala.
- rJython - R interface to Python via Jython.
- rPython - Package allowing R to call Python.
- runr - Run Julia and Bash from R.
- RJulia - R package to call Julia.
- RinRuby - A Ruby library that integrates the R interpreter in Ruby.
- R.matlab - Reading and writing of MAT files together with R-to-MATLAB connectivity.
- RcppOctave - Seamless Interface to Octave and Matlab.
- RSPerl - A bidirectional interface for calling R from Perl and Perl from R.
- V8 - Embedded JavaScript Engine.
- htmlwidgets - Bring the best of JavaScript data visualization to R.
- rpy2 - Python interface for R.
Database Management
Packages for managing data.
- RODBC - ODBC database access for R.
- DBI - Defines a common interface between the R and database management systems.
- elastic - Wrapper for the Elasticsearch HTTP API
- mongolite - Streaming Mongo Client for R
- RMySQL - R interface to the MySQL database.
- ROracle - OCI based Oracle database interface for R.
- RPostgreSQL - R interface to the PostgreSQL database system.
- RSQLite - SQLite interface for R
- RJDBC - Provides access to databases through the JDBC interface.
- rmongodb - R driver for MongoDB.
- rredis - Redis client for R.
- RCassandra - Direct interface (not Java) to the most basic functionality of Apache Cassandra.
- RHive - R extension facilitating distributed computing via Apache Hive.
- RNeo4j - Neo4j graph database driver.
Machine Learning
Packages for making R cleverer.
- AnomalyDetection - AnomalyDetection R package from Twitter.
- ahaz - Regularization for semiparametric additive hazards regression.
- arules - Mining Association Rules and Frequent Itemsets
- bigrf - Big Random Forests: Classification and Regression Forests for Large Data Sets
- bigRR - Generalized Ridge Regression (with special advantage for p >> n cases)
- bmrm - Bundle Methods for Regularized Risk Minimization Package
- Boruta - A wrapper algorithm for all-relevant feature selection
- BreakoutDetection - Breakout Detection via Robust E-Statistics from Twitter.
- bst - Gradient Boosting
- CausalImpact - Causal inference using Bayesian structural time-series models.
- C50 - C5.0 Decision Trees and Rule-Based Models
- caret - Classification and Regression Training
- Clever Algorithms For Machine Learning
- CORElearn - Classification, regression, feature evaluation and ordinal evaluation
- CoxBoost - Cox models by likelihood based boosting for a single survival endpoint or competing risks
- Cubist - Rule- and Instance-Based Regression Modeling
- e1071 - Misc Functions of the Department of Statistics (e1071), TU Wien
- earth - Multivariate Adaptive Regression Spline Models
- elasticnet - Elastic-Net for Sparse Estimation and Sparse PCA
- ElemStatLearn - Data sets, functions and examples from the book: "The Elements of Statistical Learning, Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman
- evtree - Evolutionary Learning of Globally Optimal Trees
- FSelector - A feature selection framework, based on subset-search or feature ranking approaches.
- frbs - Fuzzy Rule-based Systems for Classification and Regression Tasks
- GAMBoost - Generalized linear and additive models by likelihood based boosting
- gamboostLSS - Boosting Methods for GAMLSS
- gbm - Generalized Boosted Regression Models
- glmnet - Lasso and elastic-net regularized generalized linear models
- glmpath - L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
- GMMBoost - Likelihood-based Boosting for Generalized mixed models
- grplasso - Fitting user specified models with Group Lasso penalty
- grpreg - Regularization paths for regression models with grouped covariates
- h2o - Deep learning, Random forests, GBM, KMeans, PCA, GLM
- hda - Heteroscedastic Discriminant Analysis
- ipred - Improved Predictors
- kernlab - kernlab: Kernel-based Machine Learning Lab
- klaR - Classification and visualization
- kohonen - Supervised and Unsupervised Self-Organising Maps.
- lars - Least Angle Regression, Lasso and Forward Stagewise
- lasso2 - L1 constrained estimation aka ‘lasso’
- LiblineaR - Linear Predictive Models Based On The Liblinear C/C++ Library
- lme4 - Mixed-effects models
- LogicReg - Logic Regression
- maptree - Mapping, pruning, and graphing tree models
- mboost - Model-Based Boosting
- Machine Learning For Hackers
- mvpart - Multivariate partitioning
- MXNet - MXNet brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
- ncvreg - Regularization paths for SCAD- and MCP-penalized regression models
- nnet - Feed-forward Neural Networks and Multinomial Log-Linear Models
- oblique.tree - Oblique Trees for Classification Data
- pamr - Pam: prediction analysis for microarrays
- party - A Laboratory for Recursive Partytioning
- partykit - A Toolkit for Recursive Partytioning
- penalized - L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
- penalizedLDA - Penalized classification using Fisher's linear discriminant
- penalizedSVM - Feature Selection SVM using penalty functions
- quantregForest - quantregForest: Quantile Regression Forests
- randomForest - randomForest: Breiman and Cutler's random forests for classification and regression.
- randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC).
- rattle - Graphical user interface for data mining in R.
- rda - Shrunken Centroids Regularized Discriminant Analysis
- rdetools - Relevant Dimension Estimation (RDE) in Feature Spaces
- REEMtree - Regression Trees with Random Effects for Longitudinal (Panel) Data
- relaxo - Relaxed Lasso
- rgenoud - R version of GENetic Optimization Using Derivatives
- rgp - R genetic programming framework
- Rmalschains - Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
- rminer - Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
- ROCR - Visualizing the performance of scoring classifiers
- RoughSets - Data Analysis Using Rough Set and Fuzzy Rough Set Theories
- rpart - Recursive Partitioning and Regression Trees
- RPMM - Recursively Partitioned Mixture Model
- RSNNS - Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
- RWeka - R/Weka interface
- RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
- sda - Shrinkage Discriminant Analysis and CAT Score Variable Selection
- SDDA - Stepwise Diagonal Discriminant Analysis
- SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
- svmpath - svmpath: the SVM Path algorithm
- tgp - Bayesian treed Gaussian process models
- tree - Classification and regression trees
- varSelRF - Variable selection using random forests
- xgboost - eXtreme Gradient Boosting Tree model, well known for its speed and performance.
Natural Language Processing
Packages for Natural Language Processing.
- text2vec - Fast Text Mining Framework for Vectorization and Word Embeddings.
- tm - A comprehensive text mining framework for R.
- openNLP - Apache OpenNLP Tools Interface.
- koRpus - An R Package for Text Analysis.
- zipfR - Statistical models for word frequency distributions.
- NLP - Basic functions for Natural Language Processing.
- LDAvis - Interactive visualization of topic models.
- topicmodels - Topic modeling interface to the C code developed by David M. Blei for Topic Modeling (Latent Dirichlet Allocation (LDA), and Correlated Topics Models (CTM)).
- syuzhet - Extracts sentiment from text using three different sentiment dictionaries.
- SnowballC - Snowball stemmers based on the C libstemmer UTF-8 library.
- quanteda - R functions for Quantitative Analysis of Textual Data.
- Topic Models Resources - Topic Models learning and R related resources.
- NLP for Chinese - NLP related resources in R. @Chinese
Bayesian
Packages for Bayesian Inference.
- coda - Output analysis and diagnostics for MCMC.
- mcmc - Markov Chain Monte Carlo.
- MCMCpack - Markov chain Monte Carlo (MCMC) Package.
- R2WinBUGS - Running WinBUGS and OpenBUGS from R / S-PLUS.
- BRugs - R interface to the OpenBUGS MCMC software.
- rjags - R interface to the JAGS MCMC library.
- rstan - R interface to the Stan MCMC software.
Optimization
Packages for Optimization.
- minqa - Derivative-free optimization algorithms by quadratic approximation.
- nloptr - NLopt is a free/open-source library for nonlinear optimization.
- lpSolve - Interface to Lp_solve to Solve Linear/Integer Programs.
Finance
Packages for dealing with money.
- quantmod - Quantitative Financial Modelling & Trading Framework for R.
- TTR - Functions and data to construct technical trading rules with R.
- PerformanceAnalytics - Econometric tools for performance and risk analysis.
- zoo - S3 Infrastructure for Regular and Irregular Time Series.
- xts - eXtensible Time Series.
- tseries - Time series analysis and computational finance.
- fAssets - Analysing and Modelling Financial Assets.
Bioinformatics
Packages for processing biological datasets.
- Bioconductor - Tools for the analysis and comprehension of high-throughput genomic data.
- genetics - Classes and methods for handling genetic data.
- gap - An integrated package for genetic data analysis of both population and family data.
- ape - Analyses of Phylogenetics and Evolution.
- pheatmap - Pretty heatmaps made easy.
Network Analysis
Packages to construct, analyze and visualize network data.
- Network Analysis List - Network Analysis related resources.
- igraph - A collection of network analysis tools.
- network - Basic tools to manipulate relational data in R.
- sna - Basic network measures and visualization tools.
- networkDynamic - Support for dynamic, (inter)temporal networks.
- ndtv - Tools to construct animated visualizations of dynamic network data in various formats.
- statnet - The project behind many R network analysis packages.
- ergm - Exponential random graph models in R.
- latentnet - Latent position and cluster models for network objects.
- tnet - Network measures for weighted, two-mode and longitudinal networks.
- rgexf - Export network objects from R to GEXF, for manipulation with network software like Gephi or Sigma.
- visNetwork - Using vis.js library for network visualization.
R Development
Packages for packages.
- Package Development List - R packages to improve package development.
- devtools - Tools to make an R developer's life easier.
- testthat - An R package to make testing fun.
- R6 - A simpler, faster, lighter-weight alternative to R's built-in classes.
- pryr - Make it easier to understand what's going on in R.
- roxygen - Describe your functions in comments next to their definitions.
- lineprof - Visualise line profiling results in R.
- packrat - Make your R projects more isolated, portable, and reproducible.
- installr - Functions for installing software from within R (for Windows).
- import - An import mechanism for R.
- Rocker - R configurations for Docker.
- RStudio Addins - List of RStudio addins.
- drat - Creation and use of R repositories on GitHub or other repos.
- covr - Test coverage for your R package and (optionally) upload the results to coveralls or codecov.
- lintr - Static code analysis for R to enforce code style.
- staticdocs - Generate static html documentation for an R package.
Logging
Packages for Logging
- futile.logger - A logging package in R similar to log4j
- log4r - A log4j derivative for R
- logging - A logging package emulating the python logging package.
Other Tools
Handy Tools for R
- git2r - Gives you programmatic access to Git repositories from R.
Other Interpreters
Alternative R engines.
- CXXR - Refactorising R into C++.
- fastR - FastR is an implementation of the R Language in Java atop Truffle and Graal.
- incanter - Clojure-based, R-like statistical computing and graphics environment for the JVM with Lisp spirit.
- pqR - a "pretty quick" implementation of R
- renjin - a JVM-based interpreter for R.
- rho - Refactor the interpreter of the R language into a fully-compatible, efficient, VM for R.
- riposte - a fast interpreter and JIT for R.
- RRO - Revolution R Open.
- TERR - TIBCO Enterprise Runtime for R.
Learning R
Packages for Learning R.
- swirl - An interactive R tutorial directly in your R console.
- DataScienceR - a list of R tutorials for Data Science, NLP and Machine Learning.
Resources
Where to discover new R-esources.
Websites
- R-project - The R Project for Statistical Computing.
- R Bloggers - There are people scattered across the Web who blog about R. This is simply an aggregator of many of those feeds.
- DataCamp - Learn R data analytics online.
- Quick-R - An excellent quick reference.
- Advanced R - An online version of the Advanced R book.
- Efficient R Programming - An online home of the O’Reilly book: Efficient R Programming.
- CRAN Task Views - Task Views for CRAN packages.
- The R Programming Wikibook - A collaborative handbook for R.
- R-users - A job board for R users (and the people who are looking to hire them)
- R Cookbook - A problem-oriented website that supports the R Graphics Cookbook.
- tryR - A quick course for getting started with R.
Books
- R Books List - List of R Books.
- The Art of R Programming - It's a good resource for systematically learning fundamentals such as types of objects, control statements, variable scope, classes and debugging in R.
- Free Books - CRAN Contributed Documentation in many languages.
- R Cookbook - A quick and simple introduction to conducting many common statistical tasks with R.
- Books written as part of the Johns Hopkins Data Science Specialization:
  - Exploratory Data Analysis with R - Basic analytical skills for all sorts of data in R.
  - R Programming for Data Science - More advanced data analysis that relies on R programming.
  - Report Writing for Data Science in R - R-based methods for reproducible research and report generation.
- R Packages - A book (in paper and website formats) on writing R packages.
- R in Action - This book aims at all levels of users, with sections for beginning, intermediate and advanced R ranging from "Exploring R data structures" to running regressions and conducting factor analyses.
- Use R! - A series of inexpensive, focused books from Springer aimed at practitioners. Books can discuss the use of R in a particular subject area, such as Bayesian networks, ggplot2 and Rcpp.
- R for SAS and SPSS users - An excellent resource for users already familiar with SAS or SPSS.
- An Introduction to R - A very good introductory text on R, also covers some advanced topics.
- An Introduction to Statistical Learning with Applications in R - A simplified and "operational" version of The Elements of Statistical Learning. A free electronic copy is provided by its authors.
- The R Inferno - Patrick Burns gives insight into R's ins and outs along with its quirks!
Podcasts
- Not So Standard Deviations - The Data Science Podcast.
- R World News - R World News helps you keep up with happenings within the R community. @Bob Rudis and @Jay Jacobs.
- The R-Podcast - Giving practical advice on how to use R.
- R Talk - News and discussions of statistical software and language R.
Reference Cards
- R Reference Card 2.0 - Material from R for Beginners by permission of Emmanuel Paradis (Version 2 by Matt Baggott).
- Regression Analysis Refcard - R Reference Card for Regression Analysis.
- Reference Card for ESS - Reference Card for ESS.
- R Markdown Cheat sheet - Quick reference guide for writing reports with R Markdown.
- Shiny Cheat sheet - Quick reference guide for building Shiny apps.
- ggplot2 Cheat sheet - Quick reference guide for data visualisation with ggplot2.
- devtools Cheat sheet - Quick reference guide to package development in R.
MOOCs
Massive open online courses.
- The Analytics Edge - Hands-on introduction to data analysis with R from MITx.
- Johns Hopkins University Data Science Specialization - 9 courses including: Introduction to R, literate analysis tools, Shiny and some more.
- HarvardX Biomedical Data Science - Introduction to R for the Life Sciences.
- Explore Statistics with R - Covers introduction, data handling and statistical analysis in R.
Lists
Great resources for learning domain knowledge.
- Books - List of R Books.
- DataScienceR - a list of R tutorials for Data Science, NLP and Machine Learning.
- ggplot2 Extensions - Showcases of ggplot2 extensions.
- Natural Language Processing - NLP related resources in R. @Chinese
- Network Analysis - Network Analysis related resources.
- Open Data - Using R to obtain, parse, manipulate, create, and share open data.
- Posts - Great R blog posts or Rticles.
- Package Development - R packages to improve package development.
- R Project Conferences - Information about useR! Conferences and DSC Conferences.
- RStartHere - A guide to some of the most useful R packages, organized by workflow.
- RStudio Addins - List of RStudio addins.
- Topic Models - Topic Models learning and R related resources.
- Web Technologies - Information about how to use R and the world wide web together.