What are the different operational commands in HBase at record level and table level?

+1 vote
214 views
posted Dec 26, 2016 by Karthick.c
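
For reference, a rough sketch of the commonly used HBase shell commands at each level ('mytable', 'cf', and the row keys below are placeholder names; run help in the shell for the full, version-specific list):

# record (row/cell) level
put 'mytable', 'row1', 'cf:col1', 'value1'        # write a cell
get 'mytable', 'row1'                             # read a row
delete 'mytable', 'row1', 'cf:col1'               # delete a cell
deleteall 'mytable', 'row1'                       # delete an entire row
scan 'mytable'                                    # scan over rows
incr 'mytable', 'row1', 'cf:counter', 1           # atomic counter increment
count 'mytable'                                   # count rows

# table level
create 'mytable', 'cf'
list
describe 'mytable'
alter 'mytable', {NAME => 'cf', VERSIONS => 3}
disable 'mytable'
enable 'mytable'
truncate 'mytable'
drop 'mytable'                                    # table must be disabled first
exists 'mytable'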


Similar Questions
+2 votes

How to migrate an HBase table from hbase-0.94 to hbase-0.98, where the two versions belong to different Hadoop clusters?

I exported data from the old cluster to the new cluster using the hadoop distcp command, as follows:

hadoop distcp -update -pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

and then executed the HBase import command to load the data into hbase-0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

This command executed successfully, but the 'ParsedData' table is always empty. Any suggestions?
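
Not an answer, but for comparison, the sequence described in the HBase documentation for this kind of cross-cluster, cross-version copy is roughly the following (host names are placeholders; note that hbase.import.version is meant to name the HBase version that wrote the exported files, and the destination table must already exist with the same column families):

# On the source (0.94) cluster: export the table to SequenceFiles in HDFS
hbase org.apache.hadoop.hbase.mapreduce.Export ParsedData /user/root/ParsedData

# Copy the exported directory to the destination cluster
hadoop distcp -update -pb -skipcrccheck hftp://<source-namenode>:50070/user/root/ParsedData /user/root/ParsedData

# On the destination (0.98) cluster: create the table first, then import
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData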

+4 votes

My requirement is a typical data warehouse and ETL requirement. I need to accomplish the following:

1) Daily insert of transaction records into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day), and I don't want to partition the table/file.

A few articles mention that we need to load into a staging table in Hive first, and then insert like the statement below:

insert overwrite table finaltable select * from staging;

I am not getting this logic. How should I populate the staging table daily?
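
For what it's worth, a minimal sketch of the staging pattern those articles describe (the table names, columns, and landing path are made up; the idea is that each day's file is loaded into the staging table and then appended to the final table):

-- one-time setup: a delimited staging table plus the final table
CREATE TABLE staging (txn_id STRING, amount DOUBLE, txn_date STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE finaltable (txn_id STRING, amount DOUBLE, txn_date STRING);

-- daily: replace the staging contents with today's file, then append to the final table
LOAD DATA INPATH '/landing/transactions_20161226.csv' OVERWRITE INTO TABLE staging;
INSERT INTO TABLE finaltable SELECT * FROM staging;

Note that INSERT INTO appends, whereas the INSERT OVERWRITE shown above would replace the contents of finaltable on every run.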

+1 vote

I have a roughly 5 GB file where each row is a key-value pair. I would like to use it as a "hashmap" against another large set of files. From searching around, one way to do it would be to turn it into a dbm like DBD and put it into the distributed cache. Another is joining the data. A third is putting it into HBase and using it for lookups.

I'm more familiar with the first approach, so it seems simpler to me. However, I have read that using the distributed cache for files beyond a few megabytes is not recommended, because the file is replicated across all the data nodes. This doesn't seem that bad to me, because I pay that overhead only once at the beginning of the job, and then each node gets a local copy, right? If I were to go with a join, would it not increase the workload (more entries) and create the same network congestion issue? And wouldn't going with HBase mean making it a bottleneck?

What are the advantages and disadvantages of going with one solution over the others? What if, for example, that "hashmap" needs to come from, say, a 40 GB file? How would my options change? At which point would each option make sense?
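
For concreteness, shipping a lookup file through the distributed cache from the command line looks roughly like this (the jar, driver class, and paths are made-up placeholders; the driver needs to go through ToolRunner/GenericOptionsParser for -files to be honoured):

# Ship lookup.dat to every task node via the distributed cache;
# it appears in each task's working directory under its base name.
hadoop jar myjob.jar com.example.JoinJob -files hdfs:///user/root/lookup.dat /input /output

Each mapper can then open ./lookup.dat locally (typically in setup()) and build the in-memory map, so the replication cost is paid once per job, as described above.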

...