What are the different operational commands in HBase at record level and table level?

+1 vote
214 views
posted Dec 26, 2016 by Karthick.c
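
For reference, a rough sketch of the commonly used HBase shell commands at each level ('mytable', 'cf', and the row keys below are placeholder names; run help in the shell for the full, version-specific list):

# record (row/cell) level
put 'mytable', 'row1', 'cf:col1', 'value1'        # write a cell
get 'mytable', 'row1'                             # read a row
delete 'mytable', 'row1', 'cf:col1'               # delete a cell
deleteall 'mytable', 'row1'                       # delete an entire row
scan 'mytable'                                    # scan over rows
incr 'mytable', 'row1', 'cf:counter', 1           # atomic counter increment
count 'mytable'                                   # count rows

# table level
create 'mytable', 'cf'
list
describe 'mytable'
alter 'mytable', {NAME => 'cf', VERSIONS => 3}
disable 'mytable'
enable 'mytable'
truncate 'mytable'
drop 'mytable'                                    # table must be disabled first
exists 'mytable'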


Similar Questions
+2 votes

How to migrate an HBase table from hbase-0.94 to hbase-0.98, where the two versions belong to different Hadoop clusters?

I exported data from the old cluster to the new cluster using the hadoop distcp command, as follows:

hadoop distcp -update -pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

and then executed the HBase import command to load the data into hbase-0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

This command executed successfully, but the 'ParsedData' table is always empty. Any suggestions?
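
Not an answer, but for comparison, the sequence described in the HBase documentation for this kind of cross-cluster, cross-version copy is roughly the following (host names are placeholders; note that hbase.import.version is meant to name the HBase version that wrote the exported files, and the destination table must already exist with the same column families):

# On the source (0.94) cluster: export the table to SequenceFiles in HDFS
hbase org.apache.hadoop.hbase.mapreduce.Export ParsedData /user/root/ParsedData

# Copy the exported directory to the destination cluster
hadoop distcp -update -pb -skipcrccheck hftp://<source-namenode>:50070/user/root/ParsedData /user/root/ParsedData

# On the destination (0.98) cluster: create the table first, then import
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData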

+4 votes

My requirement is a typical data warehouse and ETL requirement. I need to accomplish the following:

1) Daily insert of transaction records into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day), and I don't want to partition the table/file.

A few articles mention that we need to load into a staging table in Hive first, and then insert like the statement below:

insert overwrite table finaltable select * from staging;

I am not getting this logic. How should I populate the staging table daily?
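
For what it's worth, a minimal sketch of the staging pattern those articles describe (the table names, columns, and landing path are made up; the idea is that each day's file is loaded into the staging table and then appended to the final table):

-- one-time setup: a delimited staging table plus the final table
CREATE TABLE staging (txn_id STRING, amount DOUBLE, txn_date STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE finaltable (txn_id STRING, amount DOUBLE, txn_date STRING);

-- daily: replace the staging contents with today's file, then append to the final table
LOAD DATA INPATH '/landing/transactions_20161226.csv' OVERWRITE INTO TABLE staging;
INSERT INTO TABLE finaltable SELECT * FROM staging;

Note that INSERT INTO appends, whereas the INSERT OVERWRITE shown above would replace the contents of finaltable on every run.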

+1 vote

I have a roughly 5 GB file where each row is a key-value pair. I would like to use it as a "hashmap" against another large set of files. From searching around, one way to do it would be to turn it into a dbm like DBD and put it into the distributed cache. Another is joining the data. A third is putting it into HBase and using it for lookups.

I'm more familiar with the first approach, so it seems simpler to me. However, I have read that using the distributed cache for files beyond a few megabytes is not recommended, because the file is replicated across all the data nodes. This doesn't seem that bad to me, because I pay that overhead only once at the beginning of the job, and then each node gets a local copy, right? If I were to go with a join, would it not increase the workload (more entries) and create the same network congestion issue? And wouldn't going with HBase mean making it a bottleneck?

What are the advantages and disadvantages of going with one solution over the others? What if, for example, that "hashmap" needs to come from, say, a 40 GB file? How would my options change? At which point would each option make sense?
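
For concreteness, shipping a lookup file through the distributed cache from the command line looks roughly like this (the jar, driver class, and paths are made-up placeholders; the driver needs to go through ToolRunner/GenericOptionsParser for -files to be honoured):

# Ship lookup.dat to every task node via the distributed cache;
# it appears in each task's working directory under its base name.
hadoop jar myjob.jar com.example.JoinJob -files hdfs:///user/root/lookup.dat /input /output

Each mapper can then open ./lookup.dat locally (typically in setup()) and build the in-memory map, so the replication cost is paid once per job, as described above.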

...