Here is a collection of articles and news on scalable systems. The links are updated automatically from RSS sources. You can subscribe to this page via RSS feed or by email.

  • Posted on Monday December 05, 2016
    How we treat each other is based on empathy. Empathy is based on shared experience. What happens when we have nothing in common? Systems are now being constructed so we’ll never see certain kinds of information. Each of us lives in our own algorithmically created Skinner box/silo/walled garden, fed ... Continue Reading »
  • Posted on Friday December 02, 2016
    Hey, it's HighScalability time: A phrase you've probably heard a lot this week: AWS announces... If you like this sort of Stuff then please support me on Patreon. 18 minutes: latency to Mars; 100TB: biggest DynamoDB table; 55M: visits to Kaiser were virtual; $2 Billion: yearly Uber losses; 91%: Apple's take of smartphone profits; 825: AI patents ... Continue Reading »
  • Posted on Monday November 28, 2016
    This is a guest repost by Barzan Mozafari, an assistant professor at the University of Michigan and an advisor to a new startup, snappydata.io, which recently launched an open source OLTP + OLAP database built on Spark. Almost everyone these days is complaining about performance in one way or another. It’s not uncommon for database administrators ... Continue Reading »
  • Posted on Friday November 25, 2016
    Hey, it's HighScalability time: Margaret Hamilton was honored with the Presidential Medal of Freedom for writing the Apollo guidance software. Oddly, she's absent from "best programmers of all time" lists. If you like this sort of Stuff then please support me on Patreon. 98 seconds: before camera infected with malware; zeptosecond: smallest fragment of time ever measured; 50%: Google ... Continue Reading »
  • Posted on Tuesday November 22, 2016
    Sergey Ignatchenko continues his excellent book series with a new chapter on databases. This is a guest repost. The idea of a single-write-connection is used extensively in the post. Because it's defined elsewhere, I asked Sergey for a definition so the article would make a little more sense... As for single-write-connection - I mean that ... Continue Reading »
  • Posted on Friday November 18, 2016
    Hey, it's HighScalability time: Now you don't have to shrink yourself to see inside a computer. Here's the Bighex machine, a fully functional 16-bit computer that covers over 26 square feet! If you like this sort of Stuff then please support me on Patreon. 50%: drop in latency and CPU load after adopting PHP 7 at Tumblr; ... Continue Reading »
  • Posted on Wednesday November 16, 2016
    Our mission at Optimizely is to help decision makers turn data into action. This requires us to move data with speed and reliability. We track billions of user events, such as page views, clicks and custom events, on a daily basis. To provide our customers with immediate access to key ... Continue Reading »
  • Posted on Monday November 14, 2016
    This is a guest post by Urban Airship. Contributors: Adam Lowry, Sean Moran, Mike Herrick, Lisa Orr, Todd Johnson, Christine Ciandrini, Ashish Warty, Nick Adlard, Mele Sax-Barnett, Niall Kelly, Graham Forest, and Gavin McQuillan. Urban Airship is trusted by thousands of businesses looking to grow with mobile. Urban Airship is a ... Continue Reading »
  • Posted on Sunday November 29, 2015
    Amazon S3 is a widely used public cloud storage system. S3 allows an object (file) to be up to 5 TB, which is enough for most applications. The AWS Management Console provides a web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file that is ... Continue Reading »
  • Posted on Tuesday March 10, 2015
    Retail is one of the most important business domains for data science and data mining applications because of its abundant data and numerous optimization problems, such as optimal prices, discounts, recommendations, and stock levels, that can be solved using data analysis methods. The rise of omni-channel retail that integrates marketing, ... Continue Reading »
  • Posted on Sunday September 14, 2014
    Hadoop 2 (with YARN) is the new version of Hadoop. It adds the YARN resource manager alongside the HDFS and MapReduce components. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was designed and implemented by Google initially for ... Continue Reading »
  • Posted on Tuesday March 18, 2014
    Benchmarks are important for understanding the performance of different systems and for comparing them quantitatively and qualitatively. Many analytic frameworks, such as Hive, Impala and Shark, have been designed and implemented in recent years and have become fundamental software for processing big data. How to benchmark these big data analytic systems is an interesting problem. ... Continue Reading »
  • Posted on Tuesday February 04, 2014
    Public cloud storage services such as Amazon S3, Google Cloud Storage and Windows Azure Storage replicate data to ensure high availability. Because the data is replicated, however, each storage service exhibits a certain data consistency model, and different cloud service providers currently employ different models. In this ... Continue Reading »
  • Posted on Tuesday August 20, 2013
    The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea has gained a lot of traction and ... Continue Reading »
  • Posted on Friday July 19, 2013
    Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. You can download the slides from Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. These slides contain the "Numbers Everyone Should Know", which everyone working on systems should be familiar with. Numbers Everyone Should Know L1 ... Continue Reading »
  • Posted on Wednesday July 17, 2013
    Here is a list of tutorials for learning how to write MapReduce programs on Hadoop, the open-source MapReduce implementation with HDFS. MapReduce Tutorials: The official tutorial on the Hadoop MapReduce framework: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html. Yahoo! Hadoop Tutorial: A comprehensive tutorial on Hadoop from the Yahoo! Developer Network: http://developer.yahoo.com/hadoop/tutorial/. More about MapReduce: To better understand ... Continue Reading »
  • Posted on Tuesday January 22, 2013
    Storage Architecture and Challenges, presented at the Faculty Summit on July 29, 2010, by Andrew Fikes, Principal Engineer. Download PDF. These slides introduce some of Google’s storage systems, with insights and a discussion of problems. Related posts: Colossus: Successor to the Google File System (GFS); Data Consistency Models of Public Cloud Storage Services: Amazon S3, Google ... Continue Reading »
  • Posted on Tuesday January 22, 2013
    Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean. Everyone who is interested in large distributed systems should read: PDF for Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean. Related posts: Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean; Favorite Sayings by ... Continue Reading »
  • Posted on Friday December 21, 2012
    MapReduce is a well-known programming model designed for generating and processing large data sets. There are various MapReduce implementations; one of the most widely known and used is Hadoop. Benchmarking MapReduce frameworks is therefore important. Faraz Ahmad et al. developed a benchmark suite: the PUMA MapReduce Benchmark. During our work on MapReduce, ... Continue Reading »
  • Posted on Tuesday December 11, 2012
    Research on cloud computing has made great progress, and many excellent large-scale systems have been designed in recent years. I compiled the following list of large-scale data storage and processing systems used in datacenters. Storage systems: Google File System (GFS): http://research.google.com/archive/gfs.html HDFS implementation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html Colossus (GFS2): Colossus: Successor to ... Continue Reading »
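Several of the posts listed above (the Hadoop 2, MapReduce tutorial and PUMA benchmark entries) revolve around the MapReduce programming model. As a rough illustration only, here is a minimal single-process sketch of the classic word-count example in Python. It is not Hadoop's API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical. It mimics the three stages a MapReduce framework runs: map emits key/value pairs, the shuffle groups the values by key, and reduce combines each group into a result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Hypothetical reducer: sum the counts emitted for one word.
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Run map over every document, shuffle the pairs, then reduce each key group.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog saw the fox"]))
```

In a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves intermediate data across the network; this sketch keeps everything in one process to show only the data flow.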

One comment:

  1. Note for blog authors: if you do not want your articles to appear here (we post only an excerpt, not the full content), please drop me a message and I will delete them. If you have good suggestions for blogs/sites (with an RSS feed) to add to this list, please also let me know.
