Hadoop doesn't find the input file


I am trying to run Nutch 2.2.1 on a two-node Hadoop cluster. The cluster is running fine and I have successfully added the input and output directories to HDFS. But when I run

$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5

I am getting something like:

INFO input.FileInputFormat: Total input paths to process : 0

As I understand it, this means that Hadoop cannot locate the input files. The job then fails with a NullPointerException, which is to be expected when there is no input.

Can someone help me out?

posted Jan 4, 2014 by Anderson


Can you pastebin the stack trace for the NPE?

commented Jan 4, 2014 by Amit Mishra

Thanks for your help.
I just removed the crawl directory (the output directory) from the command and it works! I'm storing the output in a Cassandra cluster using Gora anyway, so I don't think I need to store it on HDFS.

commented Jan 4, 2014 by anonymous

Similar Questions

0 votes

Hadoop: does WholeFileInputFormat take the entire input file as input, or each record (input split) as a whole?

+2 votes

How to find the min, max and mean of word counts from a text file in Hadoop MapReduce?

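import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Note: global min/max/mean are only correct if the job uses a single reducer,
// so that every word passes through this one instance.
public class MaxMinReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    int max_sum = 0;
    int min_sum = Integer.MAX_VALUE;
    int total = 0;   // sum of all word counts seen so far
    int count = 0;   // number of distinct words seen so far
    Text max_occured_key = new Text();
    Text min_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the occurrences of this word.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total += sum;
        count++;

        // Only one key is remembered for each extreme; on a tie the first word seen wins.
        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }
        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Nothing to report if the reducer received no keys.
        if (count == 0) {
            return;
        }
        // Mean occurrences per distinct word (integer division, since the output type is IntWritable).
        int mean = total / count;
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}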

Here I am writing the minimum, maximum and mean of the word counts.

My input file :

high low medium high low high low large small medium

Expected output:

high - 3------maximum

low - 3--------maximum

large - 1------minimum

small - 1------minimum

But I am not getting the output above. Can anyone please help me?
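
For reference, the numbers expected from that sample input can be checked outside Hadoop. The following is only a plain-Java sketch, not the poster's MapReduce job: the class WordStats and its methods are hypothetical names made up for illustration, and it assumes "mean" is the average number of occurrences per distinct word. It keeps every word that ties for the minimum or maximum, which is what the expected output lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop). WordStats is a hypothetical helper, not part of any library.
class WordStats {
    // Count occurrences of each whitespace-separated word (sorted by word).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Largest and smallest count; both assume the map is not empty.
    static int max(Map<String, Integer> counts) {
        return Collections.max(counts.values());
    }

    static int min(Map<String, Integer> counts) {
        return Collections.min(counts.values());
    }

    // Assumption: mean = total occurrences / number of distinct words.
    static double mean(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    // Every word whose count equals target, so ties are all reported.
    static List<String> wordsWithCount(Map<String, Integer> counts, int target) {
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == target) {
                words.add(e.getKey());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("high low medium high low high low large small medium");
        System.out.println("max " + max(counts) + " " + wordsWithCount(counts, max(counts)));
        System.out.println("min " + min(counts) + " " + wordsWithCount(counts, min(counts)));
        System.out.println("mean " + mean(counts));
    }
}
```

For the sample input this reports high and low at 3, large and small at 1, and a mean of 2.0 (10 occurrences over 5 distinct words). In a MapReduce job the same tie handling would probably mean collecting lists of keys in the reducer rather than a single Text per extreme.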

+3 votes

Why doesn't Hadoop support updates and appends?

+2 votes

Hadoop: namenode doesn't update block locations when the data directories of a datanode are changed?

I am running a hadoop-2.4.0 cluster. Each datanode has 10 disks, and the directories for all 10 disks are specified in dfs.datanode.data.dir.

A few days ago, I modified dfs.datanode.data.dir on one datanode to reduce the number of disks, so two disks were excluded from dfs.datanode.data.dir. After the datanode was restarted, I expected the namenode to update the block locations. In other words, I thought the namenode would remove this datanode from the block locations of the blocks that were stored on the excluded disks, but the namenode didn't update the block locations...

As I understand it, a datanode sends a block report to the namenode when it starts, so the namenode should update the block locations immediately.

Is this a bug? Could anyone please explain?

0 votes

The archive file created in Hadoop always has the extension of

...
