Working with HDFS has many benefits, but one disadvantage that I, at least, found very painful is that it doesn’t behave like a POSIX file system. If you want to access the filesystem, you have to go through the hadoop fs commands.
To work around this limitation, there’s FUSE for HDFS, which exposes HDFS as a mount point on your machine. This means that you can now expose files from HDFS to any service that speaks POSIX and also to the command line. A nice side effect is that you can now use a visual client, like WinSCP, to explore your HDFS.
In my case I’ve gone with Cloudera’s implementation, which is still quite raw (as in “do not use in production” raw), and installed it on a 64-bit CentOS 5.5 server. Here’s a quick guide on how to install it.
First, install Cloudera’s repository (localinstall expects the repository RPM to already be downloaded to the current directory):
yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
Now, install Java, Hadoop Client and FUSE:
yum install java-1.6.0-openjdk-devel
yum install hadoop-0.20-native.x86_64
yum install hadoop-0.20-libhdfs.x86_64
yum install fuse-libs.x86_64
Next, install the FUSE implementation for HDFS:
yum install hadoop-0.20-fuse.x86_64
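Since JAVA_HOME is hard-coded to be /usr/lib/j2sdk1.6-sun, you need to create a symbolic link at that path pointing to your actual Java installation. Make sure $JAVA_HOME is set in your shell before running this (this gave me quite a bit of grief):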
ln -s $JAVA_HOME /usr/lib/j2sdk1.6-sun
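If you’d rather not create a broken link or overwrite something that is already there, the link step above can be wrapped in a small guard. This is only a sketch: the function name link_java_home is made up for illustration, and it assumes JAVA_HOME points at the JDK you installed earlier.

```shell
# Hypothetical helper (not part of any Cloudera package): create the symlink
# at the hard-coded path, but only when JAVA_HOME is usable and the path is free.
link_java_home() {
    # Default is the hard-coded path mentioned above; an argument overrides it.
    ljh_link="${1:-/usr/lib/j2sdk1.6-sun}"

    # Refuse to create a dangling link.
    if [ -z "${JAVA_HOME:-}" ] || [ ! -d "$JAVA_HOME" ]; then
        echo "JAVA_HOME is not set or is not a directory" >&2
        return 1
    fi

    # Leave any existing file, directory, or link untouched.
    if [ -e "$ljh_link" ] || [ -L "$ljh_link" ]; then
        echo "$ljh_link already exists, leaving it alone" >&2
        return 0
    fi

    ln -s "$JAVA_HOME" "$ljh_link"
}
```

Called with no arguments as root, link_java_home does the same thing as the ln -s line above, just with the two checks in front of it.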
Now create the mount point and mount HDFS on it:
mkdir [mount point]
hadoop-fuse-dfs dfs://[server]:[hdfs port] [mount point]
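If you end up running the mount step more than once (after a reboot, say), it may be worth wrapping it so it doesn’t try to mount twice. Here’s a rough sketch of how that could look. The helper names hdfs_uri, is_mounted and mount_hdfs are my own invention, and the check assumes a Linux box where /proc/mounts lists the active mounts.

```shell
# Hypothetical helper: build the dfs:// URI that hadoop-fuse-dfs expects.
hdfs_uri() {
    printf 'dfs://%s:%s\n' "$1" "$2"
}

# Hypothetical helper: succeed if the directory is listed as a mount point.
# The second argument (file to read) defaults to /proc/mounts (Linux only).
# Pass the path exactly as mounted, without a trailing slash.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Hypothetical wrapper around the two commands above.
mount_hdfs() {
    mh_server="$1"
    mh_port="$2"
    mh_dir="$3"

    # mkdir -p does not complain if the directory already exists.
    mkdir -p "$mh_dir" || return 1

    if is_mounted "$mh_dir"; then
        echo "$mh_dir is already mounted" >&2
        return 0
    fi

    hadoop-fuse-dfs "$(hdfs_uri "$mh_server" "$mh_port")" "$mh_dir"
}
```

Usage mirrors the original command: mount_hdfs [server] [hdfs port] [mount point]. To detach the mount later, the usual FUSE way is fusermount -u [mount point].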
And you’re good to go.
Bonus Reading: What is FUSE?