Papers on GitHub
Interesting Readings
- Big Data Benchmark – Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
- NoSQL Comparison – Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris.
- 2014 – Stanford – Mining of Massive Datasets.
- 2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
- 2013 – AMPLab – MLbase: A Distributed Machine-learning System.
- 2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
- 2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
- 2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
- 2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
- 2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
- 2013 – Google – Online, Asynchronous Schema Change in F1.
- 2013 – Google – F1: A Distributed SQL Database That Scales.
- 2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
- 2013 – Facebook – Scuba: Diving into Data at Facebook.
- 2013 – Facebook – Unicorn: A System for Searching the Social Graph.
- 2013 – Facebook – Scaling Memcache at Facebook.
- 2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
- 2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
- 2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
- 2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
- 2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
- 2012 – Microsoft – Paxos Made Parallel.
- 2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
- 2012 – Google – Processing a trillion cells per mouse click.
- 2012 – Google – Spanner: Google’s Globally-Distributed Database.
- 2011 – AMPLab – Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
- 2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
- 2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
- 2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
- 2010 – AMPLab – Spark: Cluster Computing with Working Sets.
- 2010 – Google – Storage Architecture and Challenges.
- 2010 – Google – Pregel: A System for Large-Scale Graph Processing.
- 2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
- 2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
- 2010 – Yahoo – S4: Distributed Stream Computing Platform.
- 2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
- 2008 – AMPLab – Chukwa: A large-scale monitoring system.
- 2007 – Amazon – Dynamo: Amazon’s Highly Available Key-value Store.
- 2006 – Google – The Chubby lock service for loosely-coupled distributed systems.
- 2006 – Google – Bigtable: A Distributed Storage System for Structured Data.
- 2004 – Google – MapReduce: Simplified Data Processing on Large Clusters.
- 2003 – Google – The Google File System.