Papers on GitHub

Interesting Readings

Big Data Benchmark – Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
NoSQL Comparison – Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris.

Interesting Papers

2013 – 2014

2014 – Stanford – Mining of Massive Datasets.
2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 – AMPLab – MLbase: A Distributed Machine-learning System.
2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm.
2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
2013 – Google – Online, Asynchronous Schema Change in F1.
2013 – Google – F1: A Distributed SQL Database That Scales.
2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 – Facebook – Scuba: Diving into Data at Facebook.
2013 – Facebook – Unicorn: A System for Searching the Social Graph.
2013 – Facebook – Scaling Memcache at Facebook.

2011 – 2012

2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 – Microsoft – Paxos Made Parallel.
2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 – Google – Processing a Trillion Cells per Mouse Click.
2012 – Google – Spanner: Google’s Globally-Distributed Database.
2011 – AMPLab – Scarlett: Coping with Skewed Popularity Content in MapReduce Clusters.
2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

2001 – 2010

2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
2010 – AMPLab – Spark: Cluster Computing with Working Sets.
2010 – Google – Storage Architecture and Challenges.
2010 – Google – Pregel: A System for Large-Scale Graph Processing.
2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
2010 – Yahoo – S4: Distributed Stream Computing Platform.
2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
2008 – AMPLab – Chukwa: A large-scale monitoring system.
2007 – Amazon – Dynamo: Amazon’s Highly Available Key-value Store.
2006 – Google – The Chubby lock service for loosely-coupled distributed systems.
2006 – Google – Bigtable: A Distributed Storage System for Structured Data.
2004 – Google – MapReduce: Simplified Data Processing on Large Clusters.
2003 – Google – The Google File System.

posted @ 2015-01-04 09:49 njuzhoubing
