HDFS

Warning

This documentation is deprecated; please check here for its new home.

The HDFS client runs entirely in userspace, so nothing special is needed to run it inside a container.

Below is an example of accessing an HDFS cluster.

Launch the container

In this example we use the OpenStack Docker image we maintain, which already includes the CERN Kerberos setup:

$ sudo docker run -it gitlab-registry.cern.ch/cloud/ciadm /bin/bash
[root@a1a0e64b71e4 /]#

Client setup

This is a CentOS-based Docker image; on other distributions, use the corresponding package manager.

Cloudera provides the required client packages.

wget https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm
rpm -ivh cloudera-cdh-5-0.x86_64.rpm
yum clean all
yum install -y hadoop java-1.7.0

The configuration is cluster-specific; you should already have a sample for your cluster.

/etc/hadoop/conf should have contents similar to this:

ls -l /etc/hadoop/conf
total 92
-rw-r--r-- 1 hdfs hdfs 4146 May  3  2016 capacity-scheduler.xml
-r-------- 1 root yarn  159 May  3  2016 container-executor.cfg
-rw-r--r-- 1 hdfs hdfs 8240 Oct 10 13:24 core-site.xml
-rw-r--r-- 1 hdfs hdfs 1272 Oct 10 13:24 dfs.includes
-rw-r--r-- 1 hdfs hdfs  787 May  3  2016 hadoop-env.sh
-rw-r--r-- 1 hdfs hdfs 3251 May  3  2016 hadoop-metrics.properties
-rw-r--r-- 1 hdfs hdfs 4214 May  3  2016 hadoop-policy.xml
-rw-r--r-- 1 hdfs hdfs 7973 Oct 10 13:24 hdfs-site.xml
-rw-r--r-- 1 hdfs hdfs 8684 May  3  2016 log4j.properties
-rw-r--r-- 1 hdfs hdfs 5878 May  3  2016 mapred-site.xml
-rw-r--r-- 1 hdfs hdfs 1272 Oct 10 13:24 mapred.includes
-rw-r--r-- 1 root root  127 May  3  2016 taskcontroller.cfg
-rw-r--r-- 1 root yarn   70 May  3  2016 yarn-env.sh
-rw-r--r-- 1 hdfs hdfs 6456 Oct 10 13:24 yarn-site.xml
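Before running any commands it can help to confirm that the client configuration is actually in place. The function below is a minimal sketch, assuming that `core-site.xml` and `hdfs-site.xml` are the files the HDFS client needs, and that the configuration directory is `$HADOOP_CONF_DIR`, falling back to `/etc/hadoop/conf`. The name `check_hadoop_conf` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: check that the client-side Hadoop configuration exists.
# Assumption: core-site.xml and hdfs-site.xml are the files an HDFS client
# needs. The directory is taken from the first argument, then from
# $HADOOP_CONF_DIR, then defaults to /etc/hadoop/conf.
check_hadoop_conf() {
    local conf_dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
    local missing=0
    local f
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -r "$conf_dir/$f" ]; then
            echo "missing or unreadable: $conf_dir/$f" >&2
            missing=1
        fi
    done
    # 0 when every file is present and readable, 1 otherwise
    return "$missing"
}
```

For example, `check_hadoop_conf && hdfs dfs -ls /` only lists the filesystem root once both files are found.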

Client usage

This example uses Kerberos-based authentication.

# kinit rbritoda@CERN.CH
# hdfs dfs -ls /
Found 9 items
drwxr-xr-x   - hdfs  zp            0 2014-11-13 15:29 /atlas
drwxrwxr-x+  - hdfs  zh            0 2016-09-25 06:36 /cms
drwx------   - hbase hdfs          0 2016-11-08 14:51 /hbase
drwxr-xr-x   - hdfs  hdfs          0 2016-11-04 13:42 /lost+found
drwxr-xr-x   - hdfs  hdfs          0 2016-02-12 13:37 /project
drwxr-xr-x   - hdfs  hdfs          0 2016-11-09 00:48 /system
drwxrwxrwt   - hdfs  hdfs          0 2016-11-09 09:20 /tmp
drwxr-xr-x   - hdfs  hdfs          0 2016-11-08 10:42 /user
drwxr-xr-x   - hdfs  hdfs          0 2014-05-21 11:22 /var
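Commands such as `hdfs dfs -ls /` fail with an authentication error when no valid Kerberos ticket is cached. A minimal guard is sketched below, assuming the MIT Kerberos tools, where `klist -s` prints nothing and exits non-zero if the credentials cache is missing or expired. The function name `ensure_kerberos_ticket` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run kinit only when there is no valid cached ticket.
# Assumes MIT Kerberos: `klist -s` is silent and exits non-zero when the
# credentials cache cannot be read or its tickets have expired.
ensure_kerberos_ticket() {
    local principal="$1"
    if klist -s; then
        return 0            # a valid ticket is already cached
    fi
    kinit "$principal"      # prompts for the password interactively
}
```

Usage: `ensure_kerberos_ticket username@CERN.CH && hdfs dfs -ls /`, where `username` is a placeholder for your own account.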

If you need to do this often, consider building a Docker image with this setup.

Write a Dockerfile based on the instructions above, but do not put any secrets (such as passwords or Kerberos keytabs) into the image or its configuration.
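One way to follow this advice is sketched below: a shell snippet that writes a Dockerfile repeating the client setup steps, while leaving the Hadoop configuration and all credentials out of the image. It assumes the base image above still ships `wget`, `rpm` and `yum`, and that the Cloudera URL is still reachable; treat it as a starting point rather than a tested build.

```shell
#!/bin/bash
# Hypothetical sketch: write a Dockerfile for an HDFS client image.
# Assumptions: same base image and packages as in the steps above.
# No Hadoop configuration, keytabs or passwords are copied into the image;
# they are supplied when the container is started.
cat > Dockerfile <<'EOF'
FROM gitlab-registry.cern.ch/cloud/ciadm

# Add the Cloudera CDH5 repository and install the Hadoop client.
RUN wget -O /tmp/cloudera-cdh-5-0.x86_64.rpm \
        https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm \
    && rpm -ivh /tmp/cloudera-cdh-5-0.x86_64.rpm \
    && rm -f /tmp/cloudera-cdh-5-0.x86_64.rpm \
    && yum clean all \
    && yum install -y hadoop java-1.7.0 \
    && yum clean all

# Intentionally no COPY of /etc/hadoop/conf or credentials: mount them at run time.
CMD ["/bin/bash"]
EOF
```

The image could then be built with `docker build -t hdfs-client .` and started with the cluster configuration mounted read-only, e.g. `docker run -it -v /etc/hadoop/conf:/etc/hadoop/conf:ro hdfs-client`, running `kinit` inside the container as shown above. The tag `hdfs-client` and the host path are placeholders.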


Last update: June 1, 2022