Hadoop client setup to access remote cluster

+3 votes
337 views

I have set up an HDP 2.3 cluster on Linux (CentOS). Now I am trying to use my ETL programs to access this cluster from a Windows environment.
Should I set up Apache Hadoop locally on the Windows machine/server? What setup do I need, and what goes into core-site.xml (do I just point it at my remote HDFS URL)?
Any pointers would be helpful.
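
For reference, a minimal sketch of the client-side core-site.xml, assuming a hypothetical NameNode host nn1.example.com on the default port 8020 (substitute your own); on Windows you would typically also need a Hadoop build with winutils.exe and HADOOP_HOME set:

<!-- core-site.xml on the Windows client; hostname and port are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1.example.com:8020</value>
  </property>
</configuration>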

posted Oct 7, 2015 by anonymous


Similar Questions
+3 votes
  1. How does Hadoop provide multi-tenancy through its schedulers, or in simple terms, what are the steps to configure a multi-tenant Hadoop cluster? (See the sketch after this list.)
    Here multi-tenancy means that different users can run their (similar or different) applications in such a way that each user is completely unaware of the others, no user can interfere with another user's data in HDFS, the data stays secure, and each user gets a fair share of resources to run applications in parallel.

  2. And is there any way to verify that tenants can get their applications executed easily, without outside intervention, while keeping their data safe and secure in HDFS?
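
One common approach (not the only one) is YARN's CapacityScheduler with one ACL-restricted queue per tenant, plus ordinary HDFS permissions for data isolation. A sketch of capacity-scheduler.xml, with hypothetical tenants teamA and teamB:

<!-- capacity-scheduler.xml: one queue per tenant; names and shares are placeholders -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>teamA,teamB</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.teamA.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.teamB.capacity</name>
  <value>50</value>
</property>
<property>
  <!-- only userA may submit to teamA's queue -->
  <name>yarn.scheduler.capacity.root.teamA.acl_submit_applications</name>
  <value>userA</value>
</property>

On the HDFS side, restricting each tenant's home directory (e.g. hdfs dfs -chmod 700 /user/userA) keeps one tenant out of another's data.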

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes too long and we want to stop it midway, which command should we use? Or is there any other way to do that?
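
For what it's worth, a sketch of the usual way on Hadoop 2.x (the IDs below are placeholders):

$ mapred job -list                                         # find the running job's ID
$ mapred job -kill job_1444204821234_0007                  # kill it at the MapReduce level
$ yarn application -kill application_1444204821234_0007    # or kill the whole YARN application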

+1 vote

I want to upgrade my cluster. In the docs, one of the steps is to back up the NameNode's dfs.namenode.name.dir directory.
I have two directories defined in hdfs-site.xml; should I back up both of them, or just one?

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/namespace/1,file:///data/namespace/2</value>
</property>
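
Since the NameNode writes identical copies of its metadata to every directory listed in dfs.namenode.name.dir, one consistent copy is enough in principle, though backing up both is cheap insurance. A sketch of one way to take the backup, using the paths from the question above:

$ hdfs dfsadmin -safemode enter        # freeze namespace changes
$ hdfs dfsadmin -saveNamespace         # roll edits into a fresh fsimage
$ tar czf nn-backup-1.tar.gz /data/namespace/1
$ tar czf nn-backup-2.tar.gz /data/namespace/2
$ hdfs dfsadmin -safemode leave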
+1 vote

Assume I have a machine on the same network as a Hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file, or local XML files, to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's Hadoop machine.

Does this work? What parameters do I need to set? Where does the jar file need to be? What issues would I see if the machine is running Windows with Cygwin installed?
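
For what it's worth, this is broadly how remote submission is usually configured: the client ships the jar into the cluster's HDFS staging area for you. A sketch of the client-side entries, assuming hypothetical NameNode and ResourceManager hosts:

<!-- client-side core-site.xml / yarn-site.xml entries; hostnames are placeholders -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nn1.example.com:8020</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

With those in place, the jar can stay on the local machine and $ hadoop jar example.jar inputpath outputpath submits from there.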

+3 votes

From the documentation and code: "when Kerberos is enabled, all tasks are run as the end user (e.g. as user "joe" and not as the Hadoop user "mapred") using the task-controller (which is setuid root and, when it runs, does a setuid/setgid etc. to Joe and his groups). For this to work, user "joe"'s Linux account has to be present on all nodes of the cluster."

In an environment with a large and dynamic user population, it is not practical to add every end user to every node of the cluster (and drop the user when the end user is deactivated, etc.).

What other options are there to get this working? I am assuming that if the users are in LDAP, using PAM for LDAP could solve the issue. Any other suggestions?
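
One common way to make LDAP accounts visible as local Linux users on every node is SSSD, rather than raw PAM/NSS LDAP modules. A sketch of /etc/sssd/sssd.conf, with placeholder server and search base:

# /etc/sssd/sssd.conf on each cluster node; URI and search base are placeholders
[sssd]
services = nss, pam
domains = EXAMPLE

[domain/EXAMPLE]
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://ldap.example.com
ldap_search_base = dc=example,dc=com

Hadoop can also resolve group membership straight from LDAP via org.apache.hadoop.security.LdapGroupsMapping in core-site.xml, which avoids per-node group maintenance.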

...