SysTutorials

  • Linux | Tutorial

    How to Compress/Uncompress Files in Linux Using gzip, bzip2, 7z, rar and zip

    By Eric Ma | Mar 25, 2013 | Aug 23, 2020

    Compressing and uncompressing files are frequent operations. The usual tools for compressing/uncompressing in Linux are gzip, bzip2, 7z, rar and zip. This post introduces how to compress and uncompress files in Linux using these tools. We use the best compression rate with all these tools and mark the options for “best rate” in bold. We can delete…

  • Insights | Systems

    Storage Architecture and Challenges by Andrew Fikes at Google Faculty Summit 2010

    By Eric Ma | Jan 22, 2013 | Aug 30, 2020

    Storage Architecture and Challenges, presented at the Faculty Summit on July 29, 2010, by Andrew Fikes, Principal Engineer. Download PDF (from archive.org). These slides introduce some of Google’s storage systems, with insights and discussion of problems.

  • Insights | Systems

    Designs, Lessons and Advice from Building Large Distributed Systems

    By Eric Ma | Jan 22, 2013 | Aug 30, 2020

    Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean. Everyone who is interested in large distributed systems should read it: PDF for Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean.

  • Computing systems | News

    PUMA: A MapReduce Benchmark Suite

    By Eric Ma | Dec 20, 2012 | Sep 5, 2020

    MapReduce is a well-known programming model designed for generating and processing large data sets. There are various MapReduce implementations; a widely known and used one is Hadoop. Benchmarking MapReduce frameworks has become important. Faraz Ahmad et al. developed a benchmark suite: PUMA MapReduce Benchmark. During our work on MapReduce, we developed a benchmark suite…

  • Tutorial

    Hadoop TeraSort Benchmark

    By Eric Ma | Dec 18, 2012 | Sep 5, 2020

    TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sorting. Here, we provide a short tutorial for using the Hadoop TeraSort benchmark. TeraGen generates random data that can be used as input data for a subsequent running…

  • Computing systems | Storage systems

    Large-scale Data Storage and Processing System in Datacenters

    By Eric Ma | Dec 11, 2012 | Aug 30, 2020

    Research on cloud computing has made great progress, and many excellent large-scale systems have been designed in recent years. I compiled a list of some large-scale data storage and processing systems in datacenters as follows. Storage systems Google File System (GFS): http://research.google.com/archive/gfs.html HDFS implementation: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html Colossus (GFS2): Colossus: Successor to the Google File System (GFS)…

  • Computing systems | Resource management | Storage systems

    Microsoft’s Cosmos Service

    By Eric Ma | Dec 10, 2012 | May 31, 2020

    Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”. No paper or technical report about Cosmos has been published yet. I compiled a list of information about Cosmos on the Web as follows. What is Microsoft’s Cosmos service? by Yaron Y. Goland. Microsoft Cosmos: Petabytes perfectly processed perfunctorily by Seth…

  • Storage systems | Systems

    Colossus: Successor to the Google File System (GFS)

    By Eric Ma | Nov 29, 2012 | Aug 2, 2020

    Colossus is the successor to the Google File System (GFS), as mentioned in the paper on Spanner at OSDI 2012. Colossus is also used by Spanner to store its tablets. Information about Colossus is slim compared with GFS, which was published in a paper at SOSP 2003. There is still some information about Colossus…

  • News

    Conference Ranking by Average Number of Citations in the Last 5 Years, 2012

    By Eric Ma | Oct 24, 2012

    I tried to find a ranking on the Internet of the top conferences by average number of citations in the last 5 years, but failed to find one. However, there are many rankings of overall citations and numbers of publications. Hence, it is not hard to calculate the average number of citations…

  • Computing systems | Storage systems | Systems

    Hadoop Installation Tutorial (Hadoop 1.x)

    By Eric Ma | Oct 9, 2012 | Nov 28, 2020

    Update: If you are new to Hadoop and trying to install it, please check the newer version: Hadoop Installation Tutorial (Hadoop 2.x). Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications. It is an open-source variant of MapReduce, which was initially designed…

  • Tutorial

    Reading List for Distributed Systems and Cloud Computing

    By Eric Ma | Sep 15, 2012 | Aug 30, 2020

    Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help a lot for those who are just starting out in cloud computing research. Prof. Lin Gu, my PhD supervisor, compiled a reading list for systems research on cloud computing. The reading…

  • News

    Conferences on Cloud Computing 2013

    By Eric Ma | Sep 1, 2012

    This post lists important conferences related to Cloud Computing in the year 2013. SOSP 2013 SOSP’13: The 24th ACM Symposium on Operating Systems Principles. November 3-6, 2013, Nemacolin Woodlands Resort, Pennsylvania. The biennial ACM Symposium on Operating Systems Principles is the world’s premier forum for researchers, developers, programmers, and teachers of computer systems technology. Academic and…

  • Linux

    Managing Repositories on Git Server Using Gitosis

    By Eric Ma | Mar 25, 2012 | Aug 23, 2020

    This post introduces how to manage users and repositories and how to use these repositories. Please refer to Setting Up a Git Server Using Gitosis for how to set up the git server. Please refer to Howto for New Git Users for how to use git as a new user. Create a…

  • Linux | Tutorial

    Setting Up a Git Server Using Gitosis

    By Eric Ma | Feb 25, 2012 | Sep 26, 2014

    Update: Since gitosis is no longer maintained or supported, please check out gitolite for setting up a new git server (see the comment from Sitaram Chamarty, the author of gitolite). Gitosis is a piece of software written by Tommi Virtanen for hosting git repositories. It manages multiple repositories under the same user account….

  • Tutorial

    Hadoop Default Ports

    By Eric Ma | Jan 15, 2012 | Mar 27, 2018

    Hadoop’s namenode and datanodes expose a bunch of TCP ports used by Hadoop’s daemons to communicate with each other or to listen directly for users’ requests. This port information is needed by both Hadoop users and cluster administrators to write programs or configure firewalls/gateways accordingly. A post written by Philip Zeyliger on Cloudera’s blog summarizes the…

  • Tutorial

    A Simple Sort Benchmark on Hadoop

    By Eric Ma | Jan 7, 2012 | Apr 5, 2016

    After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop installation tutorial, we show a very simple example that greps strings from a simple set of files. In this post, we introduce the Sort program for testing and benchmarking Hadoop. The Sort program is also…

  • News

    Conferences on Cloud Computing 2012

    By Eric Ma | May 11, 2011 | Mar 27, 2018

    This post lists important conferences on Cloud Computing in the year 2012. OSDI 2012 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’12) October 8–10, 2012, Hollywood, CA “The tenth OSDI seeks to present innovative, exciting research in computer systems. OSDI brings together professionals from academic and industrial backgrounds in what has become a…

  • Tutorial

    Pitfalls and Lessons on Configuring and Tuning Hadoop

    By Eric Ma | Apr 26, 2011 | Mar 27, 2018

    This post lists pitfalls and lessons learned when configuring and tuning Hadoop. Hadoop with IPv6 Hadoop doesn’t support IPv6 currently (up to 0.20.2 and 0.21.0): Hadoop and IPv6. The performance of a cluster may suffer when IPv6 is turned on: mail archive. One good practice is to disable IPv6 on servers in the Hadoop…

  • Tutorial

    Setting Up Standalone (Local) Hadoop

    By Eric Ma | Apr 6, 2011 | Apr 5, 2016

    Hadoop is designed to run on hundreds to thousands of computers inside a cluster. However, Hadoop is configured by default to run in a non-distributed mode as a single Java process. This is especially useful for debugging, since distributed debugging is really a nightmare. This post introduces how to set up a standalone Hadoop environment….

  • News

    Conferences on Cloud Computing 2011

    By Eric Ma | Mar 29, 2011 | Feb 26, 2019

    This post lists important conferences on Cloud Computing in the year 2011. ACM Symposium on Cloud Computing October 27 and 28, 2011, Cascais, Portugal Submission Deadline: April 30, 2011 23rd ACM Symposium on Operating Systems Principles (SOSP) October 23-26, 2011, Cascais, Portugal Submission deadline: March 18, 2011, 11:59 PM GMT EuroSys 2011 April 10-13, 2011. Salzburg,…

© 2026 SysTutorials
