How to install Snappy for Hadoop

+1 vote
1,616 views

I am running on Hadoop 1.0.4 and I would like to use Snappy for map output compression. I am adding the configurations:

configuration.setBoolean("mapred.compress.map.output", true);
configuration.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");

I've also added libsnappy.so.1 to $HADOOP_HOME/lib/native/Linux-amd64-64/.
Still, all map tasks fail with "native snappy library not available". Could anyone elaborate on how to install Snappy for Hadoop?

posted Jan 1, 2014 by Deepak Dasgupta


1 Answer

+2 votes

Did you build it for your platform? You can run "ldd" on the .so file to check whether its dependent libraries are present. Also make sure you placed it in the right directory for your platform (Linux-amd64-64 or Linux-i386-32).
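
For example, these checks can be run from a shell roughly like this (the path assumes the Hadoop 1.x native layout from the question; adjust it to your installation):

ls -l $HADOOP_HOME/lib/native/Linux-amd64-64/libsnappy.so.1    # is the library where Hadoop expects it?
file $HADOOP_HOME/lib/native/Linux-amd64-64/libsnappy.so.1     # was it built for the right architecture (x86-64 vs i386)?
ldd $HADOOP_HOME/lib/native/Linux-amd64-64/libsnappy.so.1      # any "not found" line means a missing dependent library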

answer Jan 1, 2014 by Ahmed Patel
I did everything in the link Ted mentioned, and the test actually works, but using Snappy for MapReduce map output compression still fails with "native snappy library not available".
Your native libraries should be on LD_LIBRARY_PATH or java.library.path for Hadoop to pick them up. You can try adding export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=" to hadoop-env.sh on the TaskTrackers and on clients/gateways, restart the TaskTrackers, and give it another try. The reason it works for HBase is that you are manually pointing HBASE_LIBRARY_PATH to the native libraries.

My guess is that they are in the wrong location.
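
As a rough sketch of the suggestion above, hadoop-env.sh on the TaskTrackers and client/gateway nodes could be extended as follows (NATIVE_DIR is an assumed example path; point it at the directory that actually contains your libsnappy, then restart the TaskTrackers):

# Example addition to hadoop-env.sh -- the path below is an assumption, not a fixed location
NATIVE_DIR=$HADOOP_HOME/lib/native/Linux-amd64-64
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$NATIVE_DIR"
export LD_LIBRARY_PATH=$NATIVE_DIR:$LD_LIBRARY_PATH
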
Similar Questions
+8 votes

I'm trying to enable the Hadoop native library and the snappy library for compression in Hadoop 2.2.0, but I always end up with:

./hadoop/bin/hadoop checknative -a
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false

I compiled hadoop-2.2.0-src from scratch for x64 and put the resulting .so files in hadoop/lib/native/. I also compiled snappy from scratch and put it there. As a different approach, I installed snappy via sudo apt-get and then linked the resulting .so to hadoop/lib/native/libsnappy.so; still no luck.

What is going on here? Why won't Hadoop find my native libraries? Is there any log where I can check what went wrong during loading?
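
One way to get more information, sketched below, is to rerun the check with debug logging, which makes the native code loader print the java.library.path it searched and the error it hit, and to inspect the compiled libraries themselves (the paths match the relative layout used above):

export HADOOP_ROOT_LOGGER=DEBUG,console
./hadoop/bin/hadoop checknative -a

# sanity-check the libraries: they must be 64-bit and their dependencies must resolve
file hadoop/lib/native/libhadoop.so hadoop/lib/native/libsnappy.so
ldd hadoop/lib/native/libhadoop.so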

+1 vote

I am trying to use Ambari (Hortonworks) to install Hadoop. One step is to pre-configure DNS, as described in the manual linked below: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_using_Ambari_book/content/ambari-chap1-5-5.html

Since I am only using an internal network, I am not sure how to configure fully.qualified.domain.name. My hostname -f only shows localhost, and /etc/sysconfig/network gives HOSTNAME=localhost.localdomain.

If anyone already has Hadoop running, what are the actual DNS requirements for Hadoop? Any suggestions would be appreciated.
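
A minimal sketch for an internal-only network is to invent an internal FQDN and make hostname -f resolve to it on every node (node1.hadoop.local and 192.168.1.11 below are made-up example values):

# /etc/sysconfig/network (RHEL/CentOS)
HOSTNAME=node1.hadoop.local

# /etc/hosts on every node: map the FQDN to the node's internal address
192.168.1.11   node1.hadoop.local   node1

# apply and verify; hostname -f should now print the FQDN that Ambari asks for
hostname node1.hadoop.local
hostname -f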

+2 votes

I want to use Hive on Hadoop 2.2.0, so I executed the following steps:

$ tar -xzf hive-0.11.0.tar.gz 
$ export HIVE_HOME=/home/software/hive 
$ export PATH=${HIVE_HOME}/bin:${PATH} 
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse 
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse 
$ hive

Error creating temp dir in hadoop.tmp.dir file:/home/software/temp due to Permission denied

How can I make the Hive installation succeed?
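
The error refers to the local directory behind hadoop.tmp.dir, not to HDFS, so one possible fix, sketched here, is to make that directory writable by the user running Hive, or to point hadoop.tmp.dir at a directory you own (/home/software/temp is taken from the error message; the alternative path is just an example):

sudo mkdir -p /home/software/temp
sudo chown $USER /home/software/temp     # give the user that launches hive ownership of the temp dir

# or override the setting when starting hive
hive --hiveconf hadoop.tmp.dir=/tmp/hive-$USER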

0 votes

I want to ask: what is the best way to implement a job that imports files into HDFS?

I have an external system offering data accessible through a REST API. My goal is to have a periodic job running in Hadoop (maybe started by cron?) that checks the REST API for new data.

It would also be nice if this job could run on multiple data nodes. But unlike all the MapReduce examples I found, my job looks for new or changed data from an external interface and compares it with the data already stored.

This is a conceptual example of the job:

  1. The job asks the REST API whether there are new files.
  2. If so, the job imports the first file in the list.
  3. It checks whether that file already exists in HDFS.
  4. If not, the job imports the file.
  5. If yes, the job compares the new data with the data already stored, and updates the file if it has changed.
  6. If more files exist, the job continues with step 2; otherwise it ends.

Can anybody give me a little help on how to start (it's the first job I am writing...)?
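
A minimal shell sketch of that loop, just as a starting point, might look like the script below; it is meant to be run from cron, and the REST endpoint, file names, and HDFS paths are all made-up placeholders for your real API:

#!/bin/bash
# Hypothetical REST endpoint that lists available file names, one per line
API=http://example.com/api/files
HDFS_DIR=/data/imports

for f in $(curl -s "$API"); do
    tmp=$(mktemp)
    curl -s "$API/$f" -o "$tmp"          # download the file from the REST API

    if hadoop fs -test -e "$HDFS_DIR/$f"; then
        # file already exists: compare checksums and replace it only if the data changed
        old=$(hadoop fs -cat "$HDFS_DIR/$f" | md5sum | cut -d' ' -f1)
        new=$(md5sum "$tmp" | cut -d' ' -f1)
        if [ "$old" != "$new" ]; then
            hadoop fs -rm "$HDFS_DIR/$f"
            hadoop fs -put "$tmp" "$HDFS_DIR/$f"
        fi
    else
        # new file: import it
        hadoop fs -put "$tmp" "$HDFS_DIR/$f"
    fi
    rm -f "$tmp"
done

A crontab entry such as 0 * * * * /path/to/import.sh would run it hourly; for anything that needs to scale across nodes or track state more carefully, a scheduler like Oozie or a small Java client built on the FileSystem API would be the next step.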

...