
How to migrate hbase table from hbase-0.94 to hbase-0.98 which both belong to different hadoop clusters

+2 votes
350 views

How can I migrate an HBase table from hbase-0.94 to hbase-0.98 when the two versions run on different Hadoop clusters?

I exported the data from the old cluster to the new cluster using the hadoop distcp command, as follows:

hadoop distcp -update pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

I then ran the HBase Import command to load the data into hbase-0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

The command executed successfully, but the 'ParsedData' table is always empty. Any suggestions?
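
For comparison, here is a rough sketch of the usual Export / distcp / Import sequence, not a confirmed fix for this cluster. It assumes /user/root/ParsedData holds SequenceFiles written by the Export tool on the 0.94 cluster, that the 'ParsedData' table has already been created on the 0.98 cluster with the same column families, and that the source is read over the hftp:// scheme on the NameNode web port. Two details differ from the commands in the question: hbase.import.version names the version that produced the dump (0.94), and the distcp target is the ParsedData directory itself, because with -update distcp copies the contents of the source directory into the target. The run helper and the RUN variable are made up for this example so that the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch only: prints each command unless RUN=1 is set in the environment.
# TABLE and EXPORT_DIR come from the question; run() and RUN are hypothetical helpers.
TABLE=ParsedData
EXPORT_DIR=/user/root/ParsedData
SRC="hftp://192.168.200.21:50070${EXPORT_DIR}"

run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# 1. On the 0.94 cluster: dump the table to HDFS with the Export tool.
run hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$EXPORT_DIR"

# 2. On the 0.98 cluster: copy the dump across. With -update the contents of
#    the source directory land directly in the target directory.
run hadoop distcp -update -skipcrccheck "$SRC" "$EXPORT_DIR"

# 3. On the 0.98 cluster: import into the existing table, declaring that the
#    files were written by 0.94.
run hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$EXPORT_DIR"
```

After the import job finishes, a count on the table in the hbase shell is a quick way to check whether any rows arrived.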

posted Jan 8, 2015 by anonymous


Similar Questions
+2 votes

Apache Hadoop includes HDFS Federation.
Does anyone know how to migrate Apache Hadoop 1.x HDFS to Apache Hadoop 2.x HDFS?

I am getting the following error:

$ bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId 
Error: Could not find or load main class start 
+1 vote

I currently have a Hadoop 2.0 cluster in production and want to upgrade it to the latest release.
Current version (output of hadoop version): Hadoop 2.0.0-cdh4.6.0

Cluster has the following services:
hbase hive hue impala mapreduce oozie sqoop zookeeper

Can someone point me to instructions for upgrading Hadoop from 2.0 to 2.4.0?

+4 votes

I am having a problem with Hadoop maxing out drive space on a select few nodes when I am running an HBase job. The scenario is this:

  • The job is a data import using Map/Reduce / HBase
  • The data is being imported to one table
  • The table only has a couple of regions
  • As the job runs, HBase or Hadoop (I am not sure which) begins placing the data in HDFS on the datanodes / regionservers that host the regions
  • As the job progresses (and more data is imported) the two datanodes hosting the regions start to get full and eventually drive space hits 100% utilization whilst the other nodes in the cluster are at 40% or less drive space utilization
  • The job in Hadoop then begins to hang with multiple "out of space" errors and eventually fails.

I have tried running hadoop balancer during the job run and this helped but only really succeeded in prolonging the eventual job failure.

How can I get Hadoop / HBase to distribute the data to HDFS more evenly when it is favoring the nodes that the regions are on?

Am I missing something here?

...