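This post walks through how I resolved Hadoop's "input path does not exist" error when running a job from IntelliJ IDEA.
Environment: Mac OS X 10.9.5, IntelliJ IDEA 13.1.4, Hadoop 1.2.1
Hadoop runs in a virtual machine that the host connects to over SSH; the IDE and the data files are on the host.
This is my third day of teaching myself Hadoop. I have done a bit of .NET development before, but Mac, IntelliJ IDEA, Hadoop, and CentOS are all quite unfamiliar to me. My very first piece of Hadoop code ran into a problem.
The following code is the first listing in Chapter 4 of "Hadoop in Action".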
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    // Swaps key and value, so each output record is keyed by the original value.
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, key);
        }
    }

    // Joins all values that share a key into one comma-separated string.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv += ", ";
                }
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        // Each input line is split into key and value at the first ",".
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) {
        try {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The main function adds exception handling; the rest matches the book.
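To make the job's intent concrete, here is a minimal plain-Java sketch of the same inversion without any Hadoop dependency. It splits each line at the first comma, swaps key and value as MapClass does, then joins the grouped values with ", " as Reduce does. The class name InvertCitationsSketch and the three sample lines are made up for illustration and are not taken from cite75_99.txt. Lines without a comma are simply skipped here, which is a simplification, and a real job does not guarantee the order of the joined values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of the book's listing.
class InvertCitationsSketch {

    /** Swaps key and value of each "key,value" line and joins the keys per value with ", ". */
    static Map<String, String> invert(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(',');
            if (sep < 0) {
                continue; // simplification: ignore lines without a separator
            }
            String key = line.substring(0, sep);
            String value = line.substring(sep + 1);
            // "map" step: emit (value, key); grouping by the new key happens here
            grouped.computeIfAbsent(value, k -> new ArrayList<>()).add(key);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            // "reduce" step: build the comma-separated list
            result.put(entry.getKey(), String.join(", ", entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input in "citing,cited" form.
        List<String> lines = Arrays.asList("1,100", "2,100", "3,200");
        invert(lines).forEach((cited, citing) -> System.out.println(cited + "\t" + citing));
    }
}
```

With the sample input, key 100 ends up with the two entries that referenced it and key 200 with one.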
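I ran the code directly in IDEA. My data file lives in a different directory from the book's, so the program arguments differ slightly from the original:
/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output
The IDEA run configuration is shown in the figure.
The data file path is shown in the figure.
There are no typos in the configuration above. I then happily pressed 'Run MyJob.main()' and waited for the result, ready to carry on with the book.
No such luck: IDEA printed "input path does not exist". The input path it reported was /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt. That is the working directory with my first argument appended to it. What is going on?
In the whole program, only the run method uses Path, so the problem should be there.
I added System.out.println(FileInputFormat.getInputPaths(job)[0].toUri()); after FileOutputFormat.setOutputPath(job, out); and found that the input path really had been merged into the working directory. No wonder it failed. (Someone on StackOverflow had said this error appears when the data file has not been uploaded to Hadoop.)
At this point the problem can be traced to FileInputFormat.setInputPaths(job, in);. Let's look at the source to see how it works.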
/**
 * Set the array of {@link Path}s as the list of inputs
 * for the map-reduce job.
 *
 * @param conf Configuration of the job.
 * @param inputPaths the {@link Path}s of the input directories/files
 * for the map-reduce job.
 */
public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
        str.append(StringUtils.COMMA_STR);
        path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
        str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set("mapred.input.dir", str.toString());
}
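As the source shows, the very first statement combines the working directory from conf with inputPaths[0]. Since it merges in the working directory, the simplest thing is to take the working directory out of the picture.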
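The joining behavior described above can be illustrated without Hadoop. The sketch below is only an approximation: it assumes that Hadoop's Path(parent, child) constructor follows the same resolution rules as java.net.URI, which I believe to be the case but have not verified here against the 1.2.1 source. The class PathJoinSketch and its join helper are hypothetical names.

```java
import java.net.URI;

// Hypothetical illustration using java.net.URI as a stand-in for Hadoop's Path.
class PathJoinSketch {

    /** Resolves child against workingDir using java.net.URI resolution rules. */
    static String join(String workingDir, String child) {
        // A trailing slash makes URI treat the base as a directory.
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return URI.create(base).resolve(child).getPath();
    }

    public static void main(String[] args) {
        String workingDir = "/Users/michael/IdeaProjects/Hadoop";
        String file = "Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt";

        // Relative child: lands underneath the working directory.
        System.out.println(join(workingDir, file));
        // Absolute child (leading slash): the working directory is ignored.
        System.out.println(join(workingDir, "/" + file));
        // Relative child against "/": same result as the absolute form.
        System.out.println(join("/", file));
    }
}
```

If the assumption holds, a doubled path like the one reported above is what a relative argument (one without a leading slash) would produce, while an argument that starts with / would pass through unchanged. It may therefore be worth checking how the argument actually arrives in args[0] before changing the working directory.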
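Before FileInputFormat.setInputPaths(job, in);, save the current working directory:
Path workingDirectoryBak = job.getWorkingDirectory();
Then set the working directory to the root directory:
job.setWorkingDirectory(new Path("/"));
After the setInputPaths call, set it back:
job.setWorkingDirectory(workingDirectoryBak);
Add a print statement to confirm the result:
System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());
The new code is below. The input method on the Mac is awkward to use, so I wrote the comments directly in my rough English.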
public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    JobConf job = new JobConf(configuration, MyJob.class);

    Path in = new Path(args[0]);
    Path out = new Path(args[1]);

    // back up the current working directory, namely /Users/michael/IdeaProjects/Hadoop, where the source is located
    Path workingDirectoryBak = job.getWorkingDirectory();
    // set it to the root dir
    job.setWorkingDirectory(new Path("/"));
    // let it combine the root dir and the input path
    FileInputFormat.setInputPaths(job, in);
    // set it back
    job.setWorkingDirectory(workingDirectoryBak);
    // print to confirm
    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

    FileOutputFormat.setOutputPath(job, out);

    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");

    JobClient.runJob(job);

    return 0;
}
I ran it again and it worked, finishing in just under a minute. That is what a low-spec setup gets you.
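One possible refinement of the save, override, restore steps above is to put the restore in a finally block, so the working directory is put back even if the call in between throws. The sketch below shows only that pattern. FakeConf is a hypothetical stand-in for the two JobConf methods used in the workaround (it stores a plain String rather than a Path), and withWorkingDirectory is an illustrative helper, not a Hadoop API.

```java
// Hypothetical sketch of the save/override/restore pattern; no Hadoop classes involved.
class WorkingDirSwapSketch {

    /** Stand-in for JobConf's getWorkingDirectory/setWorkingDirectory. */
    static class FakeConf {
        private String workingDirectory;

        FakeConf(String workingDirectory) {
            this.workingDirectory = workingDirectory;
        }

        String getWorkingDirectory() {
            return workingDirectory;
        }

        void setWorkingDirectory(String workingDirectory) {
            this.workingDirectory = workingDirectory;
        }
    }

    /** Runs action with the working directory set to tempDir, then restores the old value. */
    static void withWorkingDirectory(FakeConf conf, String tempDir, Runnable action) {
        String backup = conf.getWorkingDirectory();
        conf.setWorkingDirectory(tempDir);
        try {
            action.run();
        } finally {
            // Restore even when action throws.
            conf.setWorkingDirectory(backup);
        }
    }

    public static void main(String[] args) {
        FakeConf conf = new FakeConf("/Users/michael/IdeaProjects/Hadoop");
        withWorkingDirectory(conf, "/",
                () -> System.out.println("inside: " + conf.getWorkingDirectory()));
        System.out.println("after: " + conf.getWorkingDirectory());
    }
}
```

In the real run method, the action would be the FileInputFormat.setInputPaths(job, in) call.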