This post summarizes the new features, bug fixes and changes in the Linux 4.9.60 release. Linux 4.9.60 contains 24 changes, patches or new features. In total, 64,224 lines of Linux source code were changed or added in Linux 4.9.60 compared to Linux 4.9. To view the source code of the Linux 4.9.60 kernel release online,
Tag: Cluster
Installing R and RStudio Server on Ubuntu Linux 20.04
R is a language and environment for statistical computing and graphics, providing a wide variety of statistical and graphical techniques. The R environment is open-source software released under the GPL. R has a rich set of software packages and is widely used for statistical analysis. RStudio Server is an R integrated development environment (IDE) that provides many useful features
The cultural impact of cloud technology
Cloud technology is one of the newest forms of technology. A cloud is the place where data is stored, and also where that data is managed and processed. The cloud ensures that the data is managed on a cluster, or network, of servers. All of these servers are available remotely
What is the Future of Big Data Analytics and Hadoop?
Big Data has taken the lead in the IT industry and plays a significant role in business growth and in the decision-making processes that give you an edge over competitors. This applies equally to organizations and to professionals working in the analytics domain. Big Data analytics brings an ocean of opportunities
How to estimate the memory usage of the HDFS NameNode for an HDFS cluster?
HDFS keeps the metadata of files and blocks in the NameNode's memory. How can you estimate the NameNode memory usage of an HDFS cluster? Each file and each block takes around 150 bytes of metadata on the NameNode, so you can base the calculation on that. For example, assume the block size is
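A minimal bash sketch of this arithmetic, assuming the ~150 bytes per file and per block figure above, that all files have the same size, and that directories and JVM overhead can be ignored. The function names and the demo cluster (10 million files of 300 MiB, 128 MiB blocks) are hypothetical.

```bash
#!/usr/bin/env bash
# Rough NameNode metadata estimate: ~150 bytes per file object plus
# ~150 bytes per block object. Ignores directories and JVM overhead,
# so treat the result as a lower bound when sizing the heap.

BYTES_PER_OBJECT=150

# blocks_for_file SIZE_BYTES BLOCK_SIZE_BYTES
# Prints how many blocks one file occupies (ceiling division; 0 for an empty file).
blocks_for_file() {
  local size=$1 block_size=$2
  echo $(( (size + block_size - 1) / block_size ))
}

# nn_metadata_bytes NUM_FILES BLOCKS_PER_FILE
# Prints the estimated metadata size in bytes for NUM_FILES equal-sized files.
nn_metadata_bytes() {
  local files=$1 blocks_per_file=$2
  echo $(( (files + files * blocks_per_file) * BYTES_PER_OBJECT ))
}

# Hypothetical cluster: 10 million files of 300 MiB each, 128 MiB block size.
blocks=$(blocks_for_file $(( 300 * 1024 * 1024 )) $(( 128 * 1024 * 1024 )))
bytes=$(nn_metadata_bytes 10000000 "$blocks")
echo "blocks per file: $blocks"
echo "estimated metadata: $bytes bytes (about $(( bytes / 1000000 )) MB)"
```

For these inputs each file needs 3 blocks, so the estimate is (10,000,000 + 30,000,000) × 150 = 6,000,000,000 bytes, roughly 6 GB of NameNode heap for file and block objects alone.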
Which filesystem operations in HDFS are atomic?
Atomicity is a fundamental property of filesystems. Application semantics, and many features, depend on the atomicity model of the underlying filesystem and are only available when it provides the right guarantees. Which filesystem operations in HDFS are atomic, so that locks can be implemented on top of them? In a reasonably widely usable filesystem, some
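The usual way to build a lock is on an operation that checks for existence and creates in one atomic step. As far as I know, the Hadoop FileSystem specification requires FileSystem.create with overwrite set to false to be atomic in exactly this sense, and requires file rename to be atomic. The bash sketch below shows the same pattern on a local POSIX filesystem, where mkdir plays that role; it is an analogy and does not talk to HDFS. The function names and lock path are hypothetical.

```bash
#!/usr/bin/env bash
# Lock built on an atomic check-and-create primitive. On a local POSIX
# filesystem, mkdir either creates the directory or fails because it
# already exists, with no window in between, so only one caller wins.

# acquire_lock DIR: returns 0 if this caller now holds the lock.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

# release_lock DIR: gives the lock back (only the holder should call this).
release_lock() {
  rmdir "$1"
}

lock="${TMPDIR:-/tmp}/example-job.lock"   # hypothetical lock path
if acquire_lock "$lock"; then
  echo "lock acquired; doing exclusive work"
  release_lock "$lock"
else
  echo "another process holds $lock" >&2
fi
```

A holder that crashes leaves the directory behind, so real use also needs a way to detect and clear stale locks.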
How to handle missing blocks and blocks with corrupt replicas in HDFS?
On one HDFS cluster, hdfs dfsadmin -report shows:
Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0
HDFS re-replicates the “Under replicated blocks” automatically after some time. How should the missing blocks and the blocks with corrupt replicas be handled in HDFS?
Understanding these blocks
A block is “with corrupt replicas” in HDFS
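Before deciding how to act, it helps to pull these counters out of the report in a script, for example in a cron job that alerts when either number is non-zero. A small bash sketch, assuming the report prints each counter as "Label: number" on its own line (possibly indented); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# report_count LABEL
# Reads `hdfs dfsadmin -report` output on stdin and prints the number that
# follows "LABEL: " on the first matching line (leading whitespace ignored).
report_count() {
  awk -F': ' -v label="$1" '
    { key = $1; sub(/^[ \t]+/, "", key) }
    key == label { print $2; exit }
  '
}

# Demo on the counters quoted above.
sample='Under replicated blocks: 139016
Blocks with corrupt replicas: 9
Missing blocks: 0'

corrupt=$(printf '%s\n' "$sample" | report_count 'Blocks with corrupt replicas')
missing=$(printf '%s\n' "$sample" | report_count 'Missing blocks')
echo "corrupt replicas: $corrupt, missing: $missing"
```

On a live cluster you would pipe the real report in, e.g. hdfs dfsadmin -report | report_count 'Missing blocks'. When that number is non-zero, hdfs fsck / -list-corruptfileblocks prints the missing blocks and the files they belong to.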
How to email admins automatically after a Linux server starts?
Managing a cluster of servers, I would like to be notified when a server starts. How can the Linux servers email me or other admins automatically after they start? I did this by adding a crontab entry like this on each server: @reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com
How to add a new HDFS NameNode metadata directory to an existing cluster?
We have a running HDFS cluster. Currently, only one NameNode metadata directory is configured for dfs.namenode.name.dir in hdfs-site.xml:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///home/hadoop/hdfs/</value>
  <description>NameNode directory for namespace and transaction logs storage.</description>
</property>
We would like to add a new directory to dfs.namenode.name.dir so that replicas of the metadata are kept on a separate disk for higher data reliability.
How to check the replication factor of a file in HDFS?
A related question: how can you find the replication factors of files in an HDFS cluster? Method 1: use the HDFS command line to ls the file. The second column of the output shows the replication factor of the file. For example:
$ hdfs dfs -ls /usr/GroupStorage/data1/out.txt
-rw-r--r--   3 hadoop zma 11906625598 2014-10-22
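For many files at once, the second column can be extracted with awk. A bash sketch, assuming the default hdfs dfs -ls layout (8 whitespace-separated fields, with directories showing - in the replication column) and paths without spaces; the function name and the demo listing are hypothetical. For a single file, hdfs dfs -stat %r <path> prints just the replication factor.

```bash
#!/usr/bin/env bash
# replication_from_ls
# Reads `hdfs dfs -ls` output on stdin and prints "<replication> <path>"
# for each file. Skips the "Found N items" header (fewer than 8 fields)
# and directories, whose replication column is "-".
# Assumes paths contain no spaces ($NF is the last field).
replication_from_ls() {
  awk 'NF >= 8 && $2 != "-" { print $2, $NF }'
}

# Hypothetical listing in the usual format.
listing='Found 2 items
drwxr-xr-x   - hadoop zma    0 2014-10-22 10:00 /data/logs
-rw-r--r--   3 hadoop zma 1024 2014-10-22 10:00 /data/out.txt'

printf '%s\n' "$listing" | replication_from_ls
```

On a cluster, hdfs dfs -ls -R /data | replication_from_ls lists every file under /data with its replication factor.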
How to change a running HDFS cluster’s replication factor?
I have a running HDFS cluster storing lots of files, and I want to change its default replication factor. How do I change it, and what happens after it is changed? For example, if I change it from 2 to 3, will HDFS automatically re-replicate the data chunks? First, the replication factor is decided by the client. Second, the replication factor
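In practice, dfs.replication in hdfs-site.xml only sets the default for files that clients write afterwards; files already in HDFS keep their factor until it is changed explicitly with hdfs dfs -setrep. A bash sketch that validates the new factor and prints the command; it deliberately only prints so the command can be reviewed first. The function name and path are hypothetical.

```bash
#!/usr/bin/env bash
# setrep_cmd FACTOR PATH
# Validates FACTOR and prints the command that changes the replication
# factor of files already stored under PATH. It only prints, so the
# command can be reviewed before it is run on the cluster.
setrep_cmd() {
  local factor=$1 path=$2
  case $factor in
    ''|*[!0-9]*)
      echo "invalid replication factor: '$factor'" >&2
      return 1 ;;
  esac
  if [ "$factor" -lt 1 ]; then
    echo "replication factor must be at least 1" >&2
    return 1
  fi
  # On a directory, -setrep changes every file beneath it; -w waits until
  # the new factor is reached, which can take a long time on big trees.
  echo "hdfs dfs -setrep -w $factor $path"
}

setrep_cmd 3 /user/hadoop/data   # hypothetical path
```

For the 2 to 3 example in the question, running the printed command makes the NameNode schedule the extra replica for each block of the existing files.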
How to balance DataNode storage in HDFS?
As nodes are added to and removed from a Hadoop cluster, storage usage across DataNodes can become uneven: some DataNodes’ disks are almost full while others’ are almost empty. How can data be balanced across DataNodes in HDFS? Hadoop provides the balancer to redistribute the data. Brief introduction to the balancer in Hadoop: balancer. The design
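The balancer's goal can be stated concretely: with a threshold of t percent, the cluster counts as balanced when every DataNode's utilization (used / capacity) differs from the whole cluster's utilization by no more than t; hdfs balancer -threshold 10 uses 10, which is also the default. A bash/awk sketch that classifies nodes against that rule; the input format and node names are hypothetical.

```bash
#!/usr/bin/env bash
# classify_nodes THRESHOLD_PERCENT
# stdin: one "name used capacity" line per DataNode (same units, capacity > 0).
# Prints "name utilization% status", where status is over/under/ok relative
# to the cluster-wide utilization plus or minus the threshold.
classify_nodes() {
  awk -v t="$1" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
    END {
      if (tc == 0) exit 1
      avg = 100 * tu / tc
      for (i = 1; i <= NR; i++) {
        u = 100 * used[i] / cap[i]
        if (u > avg + t) s = "over"
        else if (u < avg - t) s = "under"
        else s = "ok"
        printf "%s %.1f %s\n", name[i], u, s
      }
    }'
}

# Hypothetical three-node cluster.
printf '%s\n' 'dn1 90 100' 'dn2 10 100' 'dn3 50 100' | classify_nodes 10
```

Here the cluster is 50% full, so with a 10% threshold dn1 (90%) is over-utilized, dn2 (10%) is under-utilized, and dn3 (50%) is within range; the balancer moves blocks from nodes like dn1 toward nodes like dn2.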
How to totally disable firewall or iptables on Fedora 20
Our servers run inside our own cluster and no firewall is needed. How can the firewall or iptables be totally disabled on Fedora 20? Fedora 20 uses FirewallD as the firewall service. To totally disable firewalld, run these two commands as root: systemctl disable firewalld, then systemctl stop firewalld.
Directly SSH to hosts using internal IPs through the gateway
We have many hosts with internal IPs like 10.0.3.* behind a gateway, say gateway.example.org. The hosts with internal IPs connect to the Internet through the gateway. How can we SSH directly to the hosts with internal IPs through the gateway? Here is the solution: Directly SSH to Hosts with LAN IPs Through the Gateway
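With OpenSSH 7.3 or newer, one way to do this is the ProxyJump option (or -J on the command line), which makes ssh hop through the gateway transparently; this may differ from the method in the linked solution. A bash sketch that prints the ~/.ssh/config stanza for the addresses in the question; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# proxyjump_stanza HOST_PATTERN GATEWAY
# Prints an ssh_config block that routes every host matching HOST_PATTERN
# through GATEWAY using ProxyJump (OpenSSH 7.3+).
proxyjump_stanza() {
  printf 'Host %s\n    ProxyJump %s\n' "$1" "$2"
}

proxyjump_stanza '10.0.3.*' gateway.example.org
```

After appending the output to ~/.ssh/config (proxyjump_stanza '10.0.3.*' gateway.example.org >> ~/.ssh/config), ssh 10.0.3.5 connects through the gateway; for a one-off connection, ssh -J gateway.example.org 10.0.3.5 does the same without editing the config. The address 10.0.3.5 is only an example.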
Random string password generator in Scala
Managing our research cluster, I frequently need to generate strings for new users’ passwords. How can they be generated automatically and randomly in Scala? The passwords must contain only the characters ‘a’ – ‘z’, ‘A’ – ‘Z’ and ‘0’ – ‘9’. This piece of code works very well for me: def randomString(len: Int): String = { val
Rsync with non-standard ssh ports
This problem appeared when I tried to rsync directories with hosts inside a cluster that uses NAT to forward ports to internal nodes, so the ssh ports of the internal nodes are not the default 22. How can rsync be used with non-standard ssh ports? The -e option of rsync does the trick very well. For
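For instance, -e "ssh -p PORT" replaces the remote shell command that rsync starts. A bash sketch of a small wrapper; the function name, the port 2222, and the host in the commented example are hypothetical.

```bash
#!/usr/bin/env bash
# rsync_port PORT RSYNC_ARGS...
# Runs rsync in archive mode with compression, using ssh on a non-default
# port. The -e value must reach rsync as a single argument, hence the quotes.
rsync_port() {
  local port=$1
  shift
  rsync -az -e "ssh -p $port" "$@"
}

# Example (hypothetical host and port; uncomment to run):
# rsync_port 2222 ./data/ user@node1.example.org:/srv/data/
```

Setting the RSYNC_RSH environment variable (for example RSYNC_RSH='ssh -p 2222') has the same effect as -e, and a Port entry for the host in ~/.ssh/config lets plain rsync pick up the port without any option.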
Systems Conferences
Which ones are good systems conferences?
Top ones by ACM and USENIX:
OSDI: https://www.usenix.org/conferences/byname/179
SOSP: http://sosp.org/
Other SIGOPS Events: http://www.sigops.org/conf-sponsored.html
EuroSys: http://www.eurosys.org/
SoCC: http://www.socc2013.org/ (SoCC 2013)
ASPLOS: http://www.sigplan.org/Conferences/ASPLOS/Main
VEE: http://www.sigplan.org/vee.htm
USENIX ATC: https://www.usenix.org/conferences/byname/131
NSDI: https://www.usenix.org/conferences/byname/178
IEEE Conferences:
ICDCS: http://www.temple.edu/cis/icdcs2013/ (2013)
IPDPS: http://www.ipdps.org/
Other related ones and workshops:
HPCA: Search HPCA Conference
SC: http://www.supercomp.org/
IEEE CLUSTER: http://www.clustercomp.org/
HotCloud:
How to install and configure a MySQL cluster on CentOS/RHEL 6.3?
Is there a good tutorial on how to install and configure a MySQL cluster on CentOS/RHEL 6.3? Check these posts: Installing MySQL Cluster on CentOS 6.3; Configuring the MySQL Cluster. General tutorials: MySQL Cluster Installation and CentOS 6: Install MySQL Cluster – The Simple Way.
Quartz Implementation in Java
In this post, Java development experts based in India explain the concept of Quartz, and you will also learn how to set up Quartz. You can ask the experts if anything bothers you.
Technology
Quartz is an open-source Java technology for scheduling background jobs. If we want to execute the task
Three Methods of Executing Commands on Many Nodes in Parallel via SSH on Linux
It is common to execute commands on many nodes/hosts via SSH when managing a cluster of Linux servers, and on Linux there are many choices for this task. Generally, there are two modes for running commands on many nodes: serial mode and parallel mode. In serial mode, the command is executed on the nodes one by
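A minimal bash sketch of the parallel mode using only background jobs and wait; it is not necessarily one of the three methods the post describes. Each host's output is buffered in its own temporary file so lines do not interleave, and BatchMode=yes makes ssh fail instead of prompting for a password. The function name and hosts are hypothetical, and key-based login to every host is assumed.

```bash
#!/usr/bin/env bash
# run_parallel "COMMAND" HOST...
# Starts one ssh per host in the background, waits for all of them, then
# prints each host's output prefixed with "[host] ", in argument order.
run_parallel() {
  local cmd=$1
  shift
  local tmp host i=0
  tmp=$(mktemp -d) || return 1
  for host in "$@"; do
    ssh -o BatchMode=yes "$host" "$cmd" > "$tmp/$i" 2>&1 &
    i=$((i + 1))
  done
  wait   # block until every background ssh has finished
  i=0
  for host in "$@"; do
    sed "s/^/[$host] /" "$tmp/$i"
    i=$((i + 1))
  done
  rm -rf "$tmp"
}

# Example (hypothetical hosts; uncomment to run):
# run_parallel uptime node1 node2 node3
```

The serial mode is the same loop without the trailing & and without the temporary files: for h in node1 node2 node3; do ssh "$h" uptime; done. Parallel mode trades that simplicity for a total run time close to that of the slowest host.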