HDFS
Warning
This documentation is deprecated; please check here for its new home.
The HDFS client runs entirely in userspace, so running it inside a container needs no special setup.
Below is an example of accessing an HDFS cluster.
Launch the container
In this case we use the OpenStack Docker image we maintain, which includes the CERN Kerberos setup:
$ sudo docker run -it gitlab-registry.cern.ch/cloud/ciadm /bin/bash
[root@a1a0e64b71e4 /]#
Client setup
This is a CentOS-based Docker image; use the appropriate packaging tools on other distributions.
Cloudera provides the required client packages.
wget -O cloudera-cdh-5-0.x86_64.rpm https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm
rpm -ivh cloudera-cdh-5-0.x86_64.rpm
yum clean all
yum install -y hadoop java-1.7.0
The configuration is cluster-specific; you should already have a sample available.
/etc/hadoop/conf should have contents similar to this:
ls -l /etc/hadoop/conf
total 92
-rw-r--r-- 1 hdfs hdfs 4146 May 3 2016 capacity-scheduler.xml
-r-------- 1 root yarn 159 May 3 2016 container-executor.cfg
-rw-r--r-- 1 hdfs hdfs 8240 Oct 10 13:24 core-site.xml
-rw-r--r-- 1 hdfs hdfs 1272 Oct 10 13:24 dfs.includes
-rw-r--r-- 1 hdfs hdfs 787 May 3 2016 hadoop-env.sh
-rw-r--r-- 1 hdfs hdfs 3251 May 3 2016 hadoop-metrics.properties
-rw-r--r-- 1 hdfs hdfs 4214 May 3 2016 hadoop-policy.xml
-rw-r--r-- 1 hdfs hdfs 7973 Oct 10 13:24 hdfs-site.xml
-rw-r--r-- 1 hdfs hdfs 8684 May 3 2016 log4j.properties
-rw-r--r-- 1 hdfs hdfs 5878 May 3 2016 mapred-site.xml
-rw-r--r-- 1 hdfs hdfs 1272 Oct 10 13:24 mapred.includes
-rw-r--r-- 1 root root 127 May 3 2016 taskcontroller.cfg
-rw-r--r-- 1 root yarn 70 May 3 2016 yarn-env.sh
-rw-r--r-- 1 hdfs hdfs 6456 Oct 10 13:24 yarn-site.xml
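Before trying the client, it can help to confirm that the configuration directory is populated. The following is a minimal sketch, not part of the original setup: the function name check_hadoop_conf is hypothetical, and the list of files it checks is an assumption (core-site.xml and hdfs-site.xml are the two site files the HDFS client reads; your cluster may need more).

```shell
#!/bin/sh
# Report which of the expected client configuration files are missing.
# Assumption: only core-site.xml and hdfs-site.xml are checked; extend the
# list if your cluster's sample configuration requires other files.
check_hadoop_conf() {
    conf_dir="${1:-/etc/hadoop/conf}"
    missing=0
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -r "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    # Exit status 0 when every file is readable, 1 otherwise.
    return "$missing"
}
```

For example, check_hadoop_conf /etc/hadoop/conf prints one line per missing file and returns a non-zero status if any are absent.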
Client usage
We're using Kerberos-based authentication in this example.
# kinit rbritoda@CERN.CH
# hdfs dfs -ls /
Found 9 items
drwxr-xr-x - hdfs zp 0 2014-11-13 15:29 /atlas
drwxrwxr-x+ - hdfs zh 0 2016-09-25 06:36 /cms
drwx------ - hbase hdfs 0 2016-11-08 14:51 /hbase
drwxr-xr-x - hdfs hdfs 0 2016-11-04 13:42 /lost+found
drwxr-xr-x - hdfs hdfs 0 2016-02-12 13:37 /project
drwxr-xr-x - hdfs hdfs 0 2016-11-09 00:48 /system
drwxrwxrwt - hdfs hdfs 0 2016-11-09 09:20 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-11-08 10:42 /user
drwxr-xr-x - hdfs hdfs 0 2014-05-21 11:22 /var
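If the listing above fails with an authentication error, the usual cause is a missing or expired ticket. As a rough sketch, the two steps can be wrapped so that the hdfs command only runs when a ticket is present. The function name hdfs_with_ticket is hypothetical; it relies on klist -s, which prints nothing and exits non-zero when the credential cache is missing or expired.

```shell
#!/bin/sh
# Run an "hdfs dfs" subcommand only when a valid Kerberos ticket exists.
hdfs_with_ticket() {
    # klist -s is silent; its exit status says whether the cache is usable.
    if ! klist -s; then
        echo "no valid Kerberos ticket; run kinit first" >&2
        return 1
    fi
    # Pass all arguments through unchanged, e.g. -ls /
    hdfs dfs "$@"
}
```

With this defined, hdfs_with_ticket -ls / behaves like the hdfs dfs -ls / call above, but stops early with a hint when kinit has not been run.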
If you need to do this often, you might want to build a Docker image with this setup.
Write a Dockerfile based on the instructions above, but avoid putting any secrets inside the image or its configuration.
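One possible shape for such an image is sketched below as a shell function that writes the Dockerfile. This is an illustration under assumptions, not a tested recipe: the function name write_hdfs_client_dockerfile is hypothetical, the repository package is installed directly from its URL with rpm instead of being downloaded with wget first, and the cluster configuration is deliberately left out of the image.

```shell
#!/bin/sh
# Write a Dockerfile that repeats the client setup steps from this page.
# No cluster configuration or credentials are copied into the image.
write_hdfs_client_dockerfile() {
    out_dir="${1:-.}"
    cat > "$out_dir/Dockerfile" <<'EOF'
FROM gitlab-registry.cern.ch/cloud/ciadm
# Install the Cloudera repository package, then the Hadoop client and Java.
RUN rpm -ivh https://archive.cloudera.com/cdh5/one-click-install/redhat/7/x86_64/cloudera-cdh-5-0.x86_64.rpm \
 && yum clean all \
 && yum install -y hadoop java-1.7.0 \
 && yum clean all
CMD ["/bin/bash"]
EOF
}
```

After calling write_hdfs_client_dockerfile in an empty directory, the image could be built with docker build -t hdfs-client . and started with the configuration mounted read-only, for example docker run -it -v /path/to/hadoop/conf:/etc/hadoop/conf:ro hdfs-client /bin/bash. The tag hdfs-client and the host path are placeholders. Kerberos tickets are still obtained with kinit inside the running container, so no credentials end up in the image.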