FUSE for HDFS

Working with HDFS has many benefits, but it has one disadvantage that I, at least, found very painful: it doesn’t behave like a POSIX file system. If you want to access the filesystem, you have to go through the hadoop fs commands.

To work around this limitation, there’s FUSE for HDFS, which exposes HDFS as a mount point on your machine. This means you can expose files from HDFS to any service that expects a POSIX file system, as well as to ordinary command-line tools. A nice side effect is that you can use a visual client, like WinSCP, to explore your HDFS.

In my case I went with Cloudera’s implementation, which is still quite raw (as in “do not use in production” raw), and installed it on a 64-bit CentOS 5.5 server. Here’s a quick guide to installing it.

First, install Cloudera’s repository package (this assumes the RPM has already been downloaded into the current directory):

yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

Now install Java, the Hadoop client libraries and FUSE:

yum install java-1.6.0-openjdk-devel
yum install hadoop-0.20-native.x86_64
yum install hadoop-0.20-libhdfs.x86_64
yum install fuse-libs.x86_64

Next, install the FUSE implementation for HDFS:

yum install hadoop-0.20-fuse.x86_64

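Since JAVA_HOME is hard-coded to /usr/lib/j2sdk1.6-sun, you need to create a symbolic link at that path pointing to your actual JDK (this gave me quite a bit of grief). The command below assumes $JAVA_HOME is set in your shell: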

ln -s $JAVA_HOME /usr/lib/j2sdk1.6-sun
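If you want the link step to be a bit more defensive, something like the following might help. It is only a sketch: the function name link_jdk is made up for illustration (it is not part of Hadoop or CDH), and it assumes JAVA_HOME already points at the JDK installed above.

```shell
#!/bin/sh
# link_jdk TARGET LINK
# Hypothetical helper: create the symlink LINK -> TARGET, but only if
# TARGET is a real directory and LINK does not exist yet.
link_jdk() {
    target=$1
    link=$2
    if [ ! -d "$target" ]; then
        echo "JDK directory not found: $target" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        # Do not clobber an existing file, directory or (dangling) link.
        echo "$link already exists, leaving it alone" >&2
        return 0
    fi
    ln -s "$target" "$link"
}

# Usage (as root), assuming JAVA_HOME is set:
#   link_jdk "$JAVA_HOME" /usr/lib/j2sdk1.6-sun
```

The point of the first check is that an empty or wrong JAVA_HOME would otherwise produce a dangling link without any complaint.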

Now create the mount point and mount HDFS on it:

mkdir [mount point]
hadoop-fuse-dfs dfs://[server]:[hdfs port] [mount point]
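To confirm that the mount actually took, you can look it up in the kernel’s mount table. The helper below is a rough sketch, not something that ships with the package: is_fuse_mounted is a made-up name, /mnt/hdfs in the usage note is just an example path, and the check assumes the mount is reported with a file system type starting with "fuse", which is how FUSE mounts generally show up in /proc/mounts on Linux.

```shell
#!/bin/sh
# is_fuse_mounted MOUNT_POINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if MOUNT_POINT is listed in the mount
# table with a file system type beginning with "fuse".
# MOUNTS_FILE defaults to /proc/mounts.
is_fuse_mounted() {
    mnt=$1
    table=${2:-/proc/mounts}
    # Mount table fields: device, mount point, fs type, options, dump, pass
    awk -v m="$mnt" '
        $2 == m && $3 ~ /^fuse/ { found = 1 }
        END { exit !found }
    ' "$table"
}

# Usage, with /mnt/hdfs standing in for your own mount point:
#   if is_fuse_mounted /mnt/hdfs; then ls /mnt/hdfs; fi
#   fusermount -u /mnt/hdfs    # unmount when you are done
```

The optional second argument exists mostly so the check can be tried against a saved copy of the mount table.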

And you’re good to go.

Bonus Reading: What is FUSE?
