How to set hadoop.tmp.dir if I have multiple disks per node?

+1 vote

I have ten disks per node, and I don't know what value I should set for "hadoop.tmp.dir". Some say this property refers to a location on the local disk, while others say it refers to a directory in HDFS. I am confused; can someone explain it?

I want to spread I/O across the ten disks on each node, so should I set "hadoop.tmp.dir" to a comma-separated list of directories, each on a different disk?

posted Dec 16, 2013 by Sheetal Chauhan

Make sure to also set mapred.local.dir to the same set of output directories; this is where the intermediate key-value pairs are stored!
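
A rough sketch of what that could look like, assuming the Hadoop 1.x property names used in this thread and a mapred-site.xml file. mapred.local.dir takes a comma-separated list of local directories. The /data/1, /data/2 and /data/3 mount points are hypothetical, so substitute the paths where your own disks are mounted:

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: illustrative sketch only.
     /data/1, /data/2, /data/3 are hypothetical mount points, one per physical disk. -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <!-- Comma-separated list, no spaces; intermediate map output is spread across these directories -->
    <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
  </property>
</configuration>
```

Each listed directory should exist and be writable by the user that runs the TaskTracker.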

2 Answers

+2 votes

hadoop.tmp.dir is a directory created on the local file system. For example, suppose you have set the hadoop.tmp.dir property to /home/training/hadoop.

This directory will be created when you format the namenode by running the command:
hadoop namenode -format

When you open this folder you will see two subfolders, dfs and mapred. The path /home/training/hadoop/mapred will also exist on HDFS.
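
For reference, a minimal sketch of the setting described above, assuming it is placed in core-site.xml (the file where hadoop.tmp.dir is normally overridden). The path is the example one from this answer, not a recommendation:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: illustrative sketch only, using the example path from this answer -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Base directory on the local file system; dfs and mapred subfolders appear beneath it -->
    <value>/home/training/hadoop</value>
  </property>
</configuration>
```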

Hope this helps

answer Dec 16, 2013 by Deepankar Dubey
+1 vote

You can set hadoop.tmp.dir to a directory or to a whole disk: mount the disk and put its path (like /mnt) in the configuration file.

You should also set the right permissions on the mounted disk.

answer Dec 16, 2013 by Sonu Jindal
