Cluster

Understanding Cloud Storage Consistency Models

Cloud storage systems use various consistency models to balance performance, availability, and data accuracy. This article explores these models, their trade-offs, and examples of systems that use them. We’ll also discuss the CAP theorem and its implications. Consistency Models: Strong Consistency. Definition: guarantees that any read operation returns the most recent write for a given piece…
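To make the strong-consistency definition concrete, here is a toy, single-process sketch (the class and method names are hypothetical and not taken from any real storage system). With synchronous replication, a write returns only after every replica has it, so a read from any replica sees the latest write. With asynchronous replication, a read can return stale data until a background propagation step runs, which is the behavior eventual consistency permits.

```python
class ReplicatedStore:
    """Toy key-value store with n replicas (hypothetical, illustration only)."""

    def __init__(self, n_replicas=3, synchronous=True):
        self.replicas = [{} for _ in range(n_replicas)]
        self.synchronous = synchronous  # True ~ strong, False ~ eventual
        self.pending = []               # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # Replica 0 acts as the primary and always applies the write first.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            if self.synchronous:
                # Strong: update every replica before the write returns.
                self.replicas[i][key] = value
            else:
                # Eventual: return immediately and propagate later.
                self.pending.append((i, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)

    def propagate(self):
        # Background step that eventually brings all replicas up to date.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


strong = ReplicatedStore(synchronous=True)
strong.write("x", 1)
print(strong.read("x", replica=2))    # 1

eventual = ReplicatedStore(synchronous=False)
eventual.write("x", 1)
print(eventual.read("x", replica=2))  # None (stale read)
eventual.propagate()
print(eventual.read("x", replica=2))  # 1
```

Real systems must also handle concurrent clients, node failures, and network partitions, which is where the CAP trade-offs appear; this sketch ignores all of them.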

Understanding the Raft Consensus Protocol

The Raft consensus protocol is a distributed consensus algorithm designed to be easier to understand than other consensus algorithms such as Paxos. It ensures that a cluster of servers can agree on the state of a system even in the presence of failures. Key Concepts: Raft divides the consensus problem into three relatively independent subproblems: Leader Election:…
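Leader election in Raft depends on a simple vote-granting rule applied by each server. The sketch below shows how a server might handle a RequestVote request under the rules from the Raft paper: reject a candidate with a stale term, adopt a newer term, vote at most once per term, and vote only for a candidate whose log is at least as up-to-date as its own. The names are hypothetical, and persistence, election timers, the follower/candidate role change, and RPC transport are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestVote:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class ServerState:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_index: int = 0   # index of the last entry in this server's log
    last_log_term: int = 0    # term of that entry


def log_at_least_as_up_to_date(req, state):
    # Compare the terms of the last entries first; on a tie, the longer log wins.
    if req.last_log_term != state.last_log_term:
        return req.last_log_term > state.last_log_term
    return req.last_log_index >= state.last_log_index


def handle_request_vote(state, req):
    """Return (current_term, vote_granted), updating state in place."""
    if req.term < state.current_term:
        return state.current_term, False   # candidate is from an older term
    if req.term > state.current_term:
        state.current_term = req.term      # adopt the newer term
        state.voted_for = None             # no vote cast yet in this term
    can_vote = state.voted_for in (None, req.candidate_id)
    if can_vote and log_at_least_as_up_to_date(req, state):
        state.voted_for = req.candidate_id
        return state.current_term, True
    return state.current_term, False


s = ServerState(current_term=2, last_log_index=5, last_log_term=2)
print(handle_request_vote(s, RequestVote(3, "b", 5, 2)))  # (3, True)
print(handle_request_vote(s, RequestVote(3, "c", 9, 3)))  # (3, False): already voted for "b"
```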

Decentralized Exchanges (DEX) vs. Centralized Exchanges (CEX): A Technical Comparison

Cryptocurrency exchanges have revolutionized the way we trade digital assets, and two main types of exchanges dominate the market: decentralized exchanges (DEX) and centralized exchanges (CEX). In this article, we’ll compare DEXs and CEXs from a technical perspective. Decentralized Exchanges (DEX): DEXs operate on a decentralized blockchain network, such as Ethereum, and are built…

Installing R and RStudio Server in Ubuntu Linux

R is a language and environment for statistical computing and graphics, providing a wide variety of statistical and graphical techniques. R is open-source software released under the GNU GPL. It has a rich collection of packages and is widely used for statistical analysis. RStudio Server is an integrated development environment (IDE) for R that provides many useful features…

What is the Future of Big Data Analytics and Hadoop?

Big Data has taken a leading role in the IT industry and plays a significant part in business growth and decision-making, giving organizations an edge over their competitors. This applies equally to organizations and to professionals working in the analytics domain. Big Data analytics brings an ocean of opportunities…

How to handle missing blocks and blocks with corrupt replicas in HDFS?

On one of our HDFS clusters, hdfs dfsadmin -report shows:

Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0

The under-replicated blocks can be re-replicated automatically after some time. But how should the missing blocks and the blocks with corrupt replicas in HDFS be handled? Understanding these blocks: A block is “with corrupt replicas” in HDFS…
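To watch these counters from a script, one minimal approach is to parse the text output of hdfs dfsadmin -report. This sketch assumes the report prints the three lines exactly as shown above; the labels may differ between Hadoop versions, and the function names are hypothetical.

```python
import re

# Counter labels as shown in the report excerpt above (may vary by Hadoop version).
LABELS = {
    "under_replicated": "Under replicated blocks",
    "corrupt_replicas": "Blocks with corrupt replicas",
    "missing": "Missing blocks",
}


def parse_block_counters(report_text):
    """Extract the block counters from `hdfs dfsadmin -report` text output."""
    counters = {}
    for key, label in LABELS.items():
        # Anchor on the whole line so "Missing blocks (with ...)" variants don't match.
        m = re.search(rf"^\s*{re.escape(label)}:\s*(\d+)\s*$", report_text, re.MULTILINE)
        if m:
            counters[key] = int(m.group(1))
    return counters


def needs_attention(counters):
    # Under-replicated blocks are re-replicated automatically; missing blocks
    # and blocks with corrupt replicas are the ones to look into.
    return counters.get("missing", 0) > 0 or counters.get("corrupt_replicas", 0) > 0
```

The input could come from subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout.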

How to email admins automatically after a Linux server starts?

Managing a cluster of servers, I would like to be notified when a server starts. How can I make the Linux servers email me or other admins automatically after they start? I did this by adding a crontab entry like the following on each server:

@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com…
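If mailx is not available, a rough Python equivalent of that one-liner might look like the sketch below. It assumes Python 3 and an SMTP relay reachable without authentication at the placeholder host smtp.example.com on port 25. The addresses are the example ones from the post, and the function names are hypothetical. A small script run from an @reboot crontab entry would call send_boot_notice().

```python
import smtplib
import socket
from datetime import datetime
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"   # placeholder relay, as in the mailx example
SENDER = "zma@example.com"
RECIPIENTS = ["zma@example.com"]


def build_boot_message(hostname, sender, recipients, when):
    """Build a mail like the mailx one-liner: subject '<host> started', body with the time."""
    msg = EmailMessage()
    msg["Subject"] = f"{hostname} started"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"{hostname} started at {when}\n")
    return msg


def send_boot_notice():
    """Send the notice through the (assumed unauthenticated) SMTP relay."""
    msg = build_boot_message(
        socket.gethostname(), SENDER, RECIPIENTS,
        datetime.now().isoformat(timespec="seconds"),
    )
    with smtplib.SMTP(SMTP_HOST, 25, timeout=30) as smtp:
        smtp.send_message(msg)
```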

How to add a new HDFS NameNode metadata directory to an existing cluster?

We have a running HDFS cluster. Currently, only one NameNode metadata directory is configured in hdfs-site.xml:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>NameNode directory for namespace and transaction logs storage.</description>
</property>

We would like to add a new directory to dfs.namenode.name.dir so that a replica of the metadata is kept on a separate disk for higher data reliability…
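dfs.namenode.name.dir accepts a comma-separated list of directories, and the NameNode replicates its name table into each one. If hdfs-site.xml is managed with scripts, the sketch below appends a directory to that list using Python's standard xml.etree.ElementTree. The path file:///mnt/disk2/hdfs/name is a hypothetical example. The sketch only edits the configuration text: it does not create the new directory or copy existing metadata into it.

```python
import xml.etree.ElementTree as ET


def add_name_dir(xml_text, new_dir):
    """Append new_dir to dfs.namenode.name.dir in an hdfs-site.xml string."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.namenode.name.dir":
            value_el = prop.find("value")
            # The value is a comma-separated list of directories.
            dirs = [d.strip() for d in (value_el.text or "").split(",") if d.strip()]
            if new_dir not in dirs:          # keep the edit idempotent
                dirs.append(new_dir)
            value_el.text = ",".join(dirs)
            return ET.tostring(root, encoding="unicode")
    raise KeyError("dfs.namenode.name.dir not found")
```

ElementTree drops XML comments by default, so compare the result with the original file before replacing it.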

How to check the replication factor of a file in HDFS?

A related question: how can I find the replication factors of files in an HDFS cluster? Method 1: use the HDFS command line to ls the file. The second column of the output shows the file's replication factor. For example,

$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22…
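When checking many files, the second column can be extracted programmatically. A minimal sketch, assuming the default hdfs dfs -ls output layout shown above: it skips the "Found N items" header printed when listing a directory and returns None for directory entries, which show - instead of a replication factor. The function name is hypothetical.

```python
import re

# Permission string such as "-rw-r--r--" or "drwxr-xr-x" (optionally "+" for ACLs).
PERM_RE = re.compile(r"^[-dl][-rwxsStT]{9}\+?$")


def replication_from_ls_line(line):
    """Return the replication factor from one `hdfs dfs -ls` line, or None."""
    fields = line.split()
    if len(fields) < 2 or not PERM_RE.match(fields[0]):
        return None  # not a file entry, e.g. the "Found N items" header
    if fields[1] == "-":
        return None  # directories have no replication factor
    return int(fields[1])


print(replication_from_ls_line("-rw-r--r-- 3 hadoop zma 11906625598 2014-10-22"))  # 3
```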

How to change a running HDFS cluster’s replication factor?

I have a running HDFS cluster storing lots of files, and I want to change its default replication factor. How can I change it, and what happens after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor…
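Because the replication factor is recorded per file when the client creates it, changing dfs.replication affects only files written afterwards. For existing files, HDFS provides hdfs dfs -setrep: given a directory, it changes every file under it, and -w makes the command wait until re-replication completes. A small hypothetical Python wrapper could build and run that command:

```python
import subprocess


def setrep_command(path, replication, wait=False):
    """Build the argv for `hdfs dfs -setrep [-w] <numReplicas> <path>`."""
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # block until the target replication is reached
    cmd += [str(replication), path]
    return cmd


def set_replication(path, replication, wait=False):
    """Run the command; raises CalledProcessError if hdfs exits non-zero."""
    subprocess.run(setrep_command(path, replication, wait), check=True)
```

With -w on a large tree, the command can take a long time, since it waits for the extra replicas to be written.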

Systems Conferences

Which ones are good systems conferences?

Top ones by ACM and USENIX:
OSDI: https://www.usenix.org/conferences/byname/179
SOSP: http://sosp.org/
Other SIGOPS Events: http://www.sigops.org/conf-sponsored.html
EuroSys: http://www.eurosys.org/
SoCC: http://www.socc2013.org/ (SoCC 2013)
ASPLOS: http://www.sigplan.org/Conferences/ASPLOS/Main
VEE: http://www.sigplan.org/vee.htm
USENIX ATC: https://www.usenix.org/conferences/byname/131
NSDI: https://www.usenix.org/conferences/byname/178

IEEE Conferences:
ICDCS: http://www.temple.edu/cis/icdcs2013/ (2013)
IPDPS: http://www.ipdps.org/

Other related ones and workshops:
HPCA: Search HPCA Conference
SC: http://www.supercomp.org/
IEEE CLUSTER: http://www.clustercomp.org/
HotCloud:…