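admin · posted 2021-7-4 18:02:13

This post walks through how I resolved Hadoop's "input path does not exist" error when running a job from IntelliJ IDEA.

Environment: Mac OS X 10.9.5, IntelliJ IDEA 13.1.4, Hadoop 1.2.1

Hadoop runs inside a virtual machine that the host connects to over SSH; the IDE and the data files live on the host.

This is day three of teaching myself Hadoop. I have done a bit of .NET development before, but Mac, IntelliJ IDEA, Hadoop, and CentOS are all fairly unfamiliar to me. My very first piece of Hadoop code ran into a problem.

The code below is the first listing in Chapter 4 of Hadoop in Action.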

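import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    // Mapper: swaps key and value, so each record is emitted as (value, key).
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(value, key);
        }
    }

    // Reducer: joins all values of a key into one comma-separated string.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) {
                    csv += ", ";
                }
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration configuration = getConf();

        JobConf job = new JobConf(configuration, MyJob.class);

        // args[0] is the input file, args[1] is the output directory.
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Each input line is split into key and value at the first comma.
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String[] args) {
        try {
            int res = ToolRunner.run(new Configuration(), new MyJob(), args);
            System.exit(res);
        } catch (Exception e) {
            e.printStackTrace();
            // Report the failure through a non-zero exit code.
            System.exit(1);
        }
    }
}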

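Apart from the exception handling added to main, the code matches the book.

I ran the code directly in IDEA. My data files are in a different directory from the book's, so the program arguments differ slightly from the original:

/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt output

[Screenshot: IDEA run configuration]

[Screenshot: location of the data file]

Nothing in the configuration above is misspelled. So I happily pressed 'Run MyJob.main()' and sat back to wait for the result and carry on with the book.

No such luck. IDEA printed input path does not exist, and the input path it reported was /Users/michael/IdeaProjects/Hadoop/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt. Isn't that just the working directory with my first argument appended? What is going on?

In the whole program, only the run method uses Path, so the problem should be there.

After FileOutputFormat.setOutputPath(job, out); I added System.out.println(FileInputFormat.getInputPaths(job)[0].toUri()); and found that the input path really had been merged under the working directory. No wonder it failed. (Someone on StackOverflow suggested this error appears because the data file had not been uploaded to Hadoop.)

At this point the culprit looks like FileInputFormat.setInputPaths(job, in);. Let's step into the source and see how it works.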

  /**
   * Set the array of {@link Path}s as the list of inputs
   * for the map-reduce job.
   * 
   * @param conf Configuration of the job. 
   * @param inputPaths the {@link Path}s of the input directories/files 
   * for the map-reduce job.
   */ 
  public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
      str.append(StringUtils.COMMA_STR);
      path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set("mapred.input.dir", str.toString());
  }

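As you can see, the very first statement of the source combines conf.getWorkingDirectory() with inputPaths[0]. Since the working directory is what gets merged in, let's take it out of the picture.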
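To make the string handling in that method easier to follow, here is a minimal JDK-only sketch of how the value stored under mapred.input.dir is assembled. It is not Hadoop code. InputDirSketch, prefix, escape, and buildInputDir are hypothetical names; prefix only mimics the result reported in this post (working directory glued in front of the argument) rather than Hadoop's real Path resolution, and escape assumes that StringUtils.escapeString puts a backslash before backslashes and commas.

```java
import java.util.ArrayList;
import java.util.List;

class InputDirSketch {
    // Hypothetical stand-in for new Path(workingDir, input). It reproduces the
    // merged path seen in this post, not Hadoop's actual resolution rules.
    public static String prefix(String workingDir, String input) {
        String base = workingDir.endsWith("/")
                ? workingDir.substring(0, workingDir.length() - 1)
                : workingDir;
        String child = input.startsWith("/") ? input : "/" + input;
        return base + child;
    }

    // Assumed escaping rule: a backslash goes before every '\' and ','.
    public static String escape(String s) {
        return s.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Prefixes and escapes every input, then joins the results with commas,
    // mirroring the loop in setInputPaths.
    public static String buildInputDir(String workingDir, String... inputs) {
        List<String> parts = new ArrayList<>();
        for (String input : inputs) {
            parts.add(escape(prefix(workingDir, input)));
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        String in = "/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt";
        // Working directory of the IDEA project: the argument ends up underneath it.
        System.out.println(buildInputDir("/Users/michael/IdeaProjects/Hadoop", in));
        // Working directory set to the root: the argument comes through unchanged.
        System.out.println(buildInputDir("/", in));
    }
}
```

The escaping step matters only when a path itself contains a comma, because the comma is also the separator between multiple input paths.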

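Before FileInputFormat.setInputPaths(job, in);, save the current working directory:

    Path workingDirectoryBak = job.getWorkingDirectory();

Then set the working directory to the root:

    job.setWorkingDirectory(new Path("/"));

After the setInputPaths call, set it back:

    job.setWorkingDirectory(workingDirectoryBak);

Add a print statement to confirm the result:

    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

The new code is below. The input method on the Mac is awkward to use, so I wrote the comments directly in Chinglish.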

public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    JobConf job = new JobConf(configuration, MyJob.class);

    Path in = new Path(args[0]);
    Path out = new Path(args[1]);

    // backup current directory, namely /Users/michael/IdeaProjects/Hadoop where source located
    Path workingDirectoryBak = job.getWorkingDirectory();
    // set to root dir
    job.setWorkingDirectory(new Path("/"));
    // let it combine root and input path
    FileInputFormat.setInputPaths(job, in);
    // set it back
    job.setWorkingDirectory(workingDirectoryBak);
    // print to confirm
    System.out.println(FileInputFormat.getInputPaths(job)[0].toUri());

    FileOutputFormat.setOutputPath(job, out);

    job.setJobName("MyJob");
    job.setMapperClass(MapClass.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormat(KeyValueTextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.set("key.value.separator.in.input.line", ",");

    JobClient.runJob(job);

    return 0;
}
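One possible refinement of this workaround is to restore the working directory in a finally block, so that it is put back even if something between the two calls throws. The sketch below shows only that save, override, restore shape. FakeJob and setAbsoluteInput are hypothetical stand-ins so the example runs without Hadoop on the classpath, and FakeJob.setInputPath simply mimics the merging behaviour described in this post. With the real JobConf the same shape would use the getWorkingDirectory and setWorkingDirectory calls shown above.

```java
class RestoreWorkingDirSketch {
    // Hypothetical stand-in for JobConf: holds a working directory and an input path.
    static class FakeJob {
        private String workingDirectory;
        private String inputDir;

        FakeJob(String workingDirectory) {
            this.workingDirectory = workingDirectory;
        }

        String getWorkingDirectory() { return workingDirectory; }

        void setWorkingDirectory(String dir) { this.workingDirectory = dir; }

        String getInputDir() { return inputDir; }

        // Mimics the behaviour seen in this post: the working directory is
        // placed in front of the given input path.
        void setInputPath(String input) {
            String base = workingDirectory.equals("/") ? "" : workingDirectory;
            this.inputDir = base + input;
        }
    }

    // Temporarily switches the working directory to "/" while the input is set,
    // and restores the original value even if setInputPath throws.
    static void setAbsoluteInput(FakeJob job, String absoluteInput) {
        String backup = job.getWorkingDirectory();
        job.setWorkingDirectory("/");
        try {
            job.setInputPath(absoluteInput);
        } finally {
            job.setWorkingDirectory(backup);
        }
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob("/Users/michael/IdeaProjects/Hadoop");
        setAbsoluteInput(job, "/Users/michael/Desktop/Hadoop/HadoopInAction/cite75_99.txt");
        System.out.println(job.getInputDir());          // input path without the project prefix
        System.out.println(job.getWorkingDirectory());  // original working directory, restored
    }
}
```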

I tried again and the job ran normally, finishing in just under a minute. That is what you get with a low-spec machine.

Hadoop on Mac with IntelliJ IDEA - 1: Fixing the "input path does not exist" error