
Setting Up Standalone (Local) Hadoop

By Eric Ma, Apr 6, 2011 (updated Apr 5, 2016)

Hadoop is designed to run on [[hadoop-installation-tutorial|hundreds to thousands of computers]] in a cluster. By default, however, Hadoop is configured to run in non-distributed mode, as a single Java process. This is especially useful for debugging, since debugging a distributed job can be a real nightmare. This post shows how to set up a standalone Hadoop environment.

1. Hadoop package and software installation

Follow the “1. Install needed packages” part of the [[hadoop-installation-tutorial|Hadoop Installation Tutorial]] to install the required packages. Then follow “4. Hadoop Configurations” to configure hadoop-env.sh (only this file; leave the other configuration files untouched).
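The stock hadoop-env.sh states in a comment that JAVA_HOME is the only required environment variable, so for standalone mode the edit usually comes down to one uncommented line. A minimal sketch; the JDK path below is only an example and is an assumption, so replace it with the location of the JDK on your machine:

```shell
# conf/hadoop-env.sh
# Example path only (assumption): point JAVA_HOME at the JDK installed on your system.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
```

If JAVA_HOME is missing or wrong, bin/hadoop stops with an error instead of running anything, so running bin/hadoop version from the Hadoop directory is a quick way to check the setting.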

2. Just run Hadoop!

In standalone mode no daemons need to be started: just run Hadoop jobs whose input and output are in local directories. We use a simple example to show how to start a Hadoop job.

The example finds every match of the given regular expression in the input files, counts how often each distinct match occurs, and writes the results to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*
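Each output line has the form count<TAB>match, sorted by count in descending order. To cross-check the result without Hadoop, the same counts can be produced with standard Unix tools. This is a sketch, not part of Hadoop: it assumes GNU grep (for -o) and matches that contain no whitespace, and the function name count_matches is hypothetical. For a simple character-class pattern such as [a-z.]+, grep's extended regex and Java's regex find the same matches, but matches with equal counts may be listed in a different order than in Hadoop's output.

```shell
# Hypothetical helper (not part of Hadoop): count every match of an extended
# regex in the files of a directory and print "count<TAB>match", most frequent first.
count_matches() {
    local dir=$1 regex=$2
    # LC_ALL=C keeps [a-z] ASCII-only and makes the sort order predictable.
    LC_ALL=C grep -ohE "$regex" "$dir"/* \
        | LC_ALL=C sort \
        | LC_ALL=C uniq -c \
        | LC_ALL=C sort -k1,1nr -k2,2 \
        | awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Usage (bash), comparing with Hadoop's result while ignoring tie order:
#   diff <(sort output/part-*) <(count_matches input '[a-z.]+' | sort)
```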

The examples jar file’s name may differ depending on the Hadoop version.
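Because the examples jar is named differently from release to release, a small helper can pick it up by pattern. A minimal sketch, assuming (as in the command above) that the jar sits in the top level of the Hadoop directory; find_examples_jar is a hypothetical name. The usage lines also delete the old output directory first, because Hadoop refuses to start a job whose output directory already exists, which otherwise makes a second run of the example fail.

```shell
# Hypothetical helper: print the first examples jar found in a Hadoop directory.
# The glob covers names such as hadoop-mapred-examples-0.21.0.jar.
find_examples_jar() {
    local hadoop_home=${1:-.} jar
    for jar in "$hadoop_home"/hadoop-*examples*.jar; do
        [ -e "$jar" ] || continue   # the glob matched nothing
        echo "$jar"
        return 0
    done
    echo "no examples jar found in $hadoop_home" >&2
    return 1
}

# Usage, from the Hadoop directory:
#   jar=$(find_examples_jar .) &&
#   rm -rf output &&   # Hadoop will not write into an existing output directory
#   bin/hadoop jar "$jar" grep input output '[a-z.]+'
```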

Simple, isn't it? Enjoy, and then go further with the [[hadoop-installation-tutorial|Fully-distributed Hadoop Installation]].

