How to migrate an HBase table from HBase 0.94 to HBase 0.98 when each runs on a different Hadoop cluster



I copied the data from the old cluster to the new cluster with the hadoop distcp command, as follows:

hadoop distcp -update -pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

Then I ran the HBase Import tool to load the data into HBase 0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

The command completes successfully, but the 'ParsedData' table is always empty. Any suggestions?

posted Jan 8, 2015 by anonymous
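A minimal sketch of the whole path, assuming it is run on the destination (0.98) cluster and that ParsedData already exists there with the same column families, since Import writes into an existing table rather than creating one. A likely culprit: per the HBase Reference Guide, when importing files written by Export on 0.94 into 0.96 or later, hbase.import.version should name the version that produced the export (0.94), not the version doing the import. The target directory is named explicitly because, with -update, DistCp copies the contents of the source directory into the target; HFTP is read-only, which is why DistCp runs on the destination side. DRY_RUN and run are illustrative helpers that only print the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: copy an HBase 0.94 Export of ParsedData to the 0.98 cluster and import it.
# Run on the destination (0.98) cluster; HFTP is read-only.
set -eu

# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

TABLE="ParsedData"
# Old cluster's NameNode HTTP address, read over HFTP.
SRC="hftp://192.168.200.21:50070/user/root/ParsedData"
# Named explicitly: with -update, DistCp copies the contents of SRC into the target.
DST="/user/root/ParsedData"
# Version that wrote the export files, not the version importing them.
EXPORT_VERSION="0.94"

# 1. Copy the files; -pb keeps block sizes, -skipcrccheck skips the CRC comparison.
run hadoop distcp -update -pb -skipcrccheck "$SRC" "$DST"

# 2. Check that the files landed where Import will read them.
run hdfs dfs -ls "$DST"

# 3. Import into the existing table (Import does not create it).
run hbase "-Dhbase.import.version=$EXPORT_VERSION" \
  org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$DST"

# 4. Count rows with the bundled RowCounter MapReduce job.
run hbase org.apache.hadoop.hbase.mapreduce.RowCounter "$TABLE"
```

Review the printed commands first, then rerun with DRY_RUN=0 once the paths match your clusters.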


Similar Questions


Apache Hadoop 2.x includes HDFS Federation.
Does anyone know how to migrate HDFS from Apache Hadoop 1.x to Apache Hadoop 2.x?

I am getting the following error:

$ bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId 
Error: Could not find or load main class start
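The error likely comes from argument order: the 2.x hdfs script expects an optional --config first and then the subcommand, and it treats an unrecognized subcommand as a Java class name, hence "Could not find or load main class start". A sketch of the corrected invocation, assuming a Hadoop 2.x tarball layout; HADOOP_PREFIX, its default path, and CLUSTER_ID are hypothetical placeholders, and the cluster ID value is left for you to supply because the original command omits it. DRY_RUN and run are illustrative helpers that only print the command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: start a NameNode with -upgrade on Hadoop 2.x.
set -eu

# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical install and config locations; adjust to your layout.
HADOOP_PREFIX="${HADOOP_PREFIX:-/opt/hadoop}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/etc/hadoop}"
# Hypothetical placeholder: supply the cluster ID the namespace should use.
CLUSTER_ID="${CLUSTER_ID:-CID-REPLACE-ME}"

# Background daemon: options, then start|stop, then the hdfs subcommand and its arguments.
run "$HADOOP_PREFIX/sbin/hadoop-daemon.sh" --config "$HADOOP_CONF_DIR" --script hdfs \
  start namenode -upgrade -clusterId "$CLUSTER_ID"

# Foreground equivalent: --config first, then the "namenode" subcommand.
#   "$HADOOP_PREFIX/bin/hdfs" --config "$HADOOP_CONF_DIR" namenode -upgrade -clusterId "$CLUSTER_ID"
```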


How do I upgrade Hadoop from 2.0 to 2.4.0?

I have a Hadoop 2.0 cluster in production and want to upgrade it to the latest release.
Current version (output of hadoop version): Hadoop 2.0.0-cdh4.6.0

The cluster runs the following services:
HBase, Hive, Hue, Impala, MapReduce, Oozie, Sqoop, ZooKeeper

Can someone point me to instructions for upgrading Hadoop from 2.0 to 2.4.0?
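A generic sketch of only the HDFS part of a non-rolling upgrade, assuming plain Apache tarball installs driven by the sbin/ scripts and run as the HDFS superuser, with the NameNode metadata directories backed up beforehand. CDH packages use their own service scripts, and each of the other services listed (HBase, Hive, Impala and so on) needs a release built for the new Hadoop version, which this does not cover. OLD_HOME and NEW_HOME are hypothetical paths, and DRY_RUN and run are illustrative helpers that only print the commands unless DRY_RUN=0. Finalization is left as a comment because it removes the ability to roll back.

```shell
#!/bin/sh
# Sketch: the HDFS part of a non-rolling upgrade between two tarball installs.
set -eu

# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical install directories; NEW_HOME must already carry the cluster's
# configuration (same NameNode and DataNode storage directories).
OLD_HOME="${OLD_HOME:-/opt/hadoop-2.0.0-cdh4.6.0}"
NEW_HOME="${NEW_HOME:-/opt/hadoop-2.4.0}"

# 1. On the old release, checkpoint the namespace (saveNamespace requires safe mode).
run "$OLD_HOME/bin/hdfs" dfsadmin -safemode enter
run "$OLD_HOME/bin/hdfs" dfsadmin -saveNamespace

# 2. Stop HDFS (stop the services that depend on it, such as HBase, first).
run "$OLD_HOME/sbin/stop-dfs.sh"

# 3. Start HDFS from the new release; -upgrade starts the NameNode in upgrade mode.
run "$NEW_HOME/sbin/start-dfs.sh" -upgrade

# 4. Only after verifying the cluster, make the upgrade permanent; rollback
#    is no longer possible afterwards:
#    "$NEW_HOME/bin/hdfs" dfsadmin -finalizeUpgrade
```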


I am having a problem with Hadoop filling the drives on a select few nodes when I run an HBase job. The scenario is this:

The job is a data import using MapReduce and HBase
The data is being imported into one table
The table has only a couple of regions
As the job runs, HBase (or Hadoop?) places the data in HDFS on the DataNodes/RegionServers that host the regions
As the job progresses and more data is imported, the two DataNodes hosting the regions fill up until their drives reach 100% utilization, while the other nodes in the cluster stay at 40% or less
The Hadoop job then starts to hang with multiple "out of space" errors and eventually fails.

Running the Hadoop balancer during the job helped, but it only delayed the eventual failure.

How can I get Hadoop/HBase to distribute the data more evenly across HDFS instead of favoring the nodes that host the regions?

Am I missing something here?
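A common remedy is to pre-split the target table so that its regions, and therefore the region servers writing them, can be spread across the cluster: by default HDFS places the first replica of each block on the DataNode local to the writer, so a table with two regions concentrates its writes on two nodes. A sketch using the RegionSplitter utility that ships with HBase; the table name ImportTable, the column family cf, and the region count are hypothetical, and HexStringSplit only spreads data evenly if row keys are (or start with) uniformly distributed hex strings. DRY_RUN and run are illustrative helpers that only print the command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: create a pre-split table so imported data spreads across region servers.
set -eu

# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical names and sizing; choose several regions per region server.
TABLE="ImportTable"
FAMILY="cf"
NUM_REGIONS=20

# RegionSplitter creates TABLE with NUM_REGIONS regions whose boundaries are
# evenly spaced hex strings (HexStringSplit).
run hbase org.apache.hadoop.hbase.util.RegionSplitter "$TABLE" HexStringSplit \
  -c "$NUM_REGIONS" -f "$FAMILY"
```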


...
