Spark-Java in local mode: NullPointerException when writing output files


1. Exporting the RDD result with collect() raises no error.
2. The problem does not occur with the spark-1.1.0 package imported via Maven.
3. The problem below appears when I use the 1.4.1 package imported via Maven. My guess is that it is a dependency issue, but I don't know how to track it down; please advise.
4. The test environment is Eclipse on Windows 7, and the error is thrown when the program is compiled and run directly from the IDE.

Where does this problem go wrong, and how can it be fixed? Thanks.
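
For reference, here is a minimal sketch of the kind of program that triggers this (the class name matches the SparkTest.java in the log below, but the data and output path are placeholders, not my exact code):

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SparkTest").setMaster("local[2]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c"));

            // Collecting to the driver works fine (point 1 above):
            System.out.println(rdd.collect());

            // Writing to the local file system is what throws the
            // NullPointerException below (output path is a placeholder):
            rdd.saveAsTextFile("D:/tmp/spark-out");

            sc.stop();
        }
    }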


 15/08/06 09:01:12 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 5)
java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:798)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/08/06 09:01:12 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 6, localhost, PROCESS_LOCAL, 1165 bytes)
15/08/06 09:01:12 INFO Executor: Running task 1.0 in stage 3.0 (TID 6)
15/08/06 09:01:12 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 5, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:798)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

15/08/06 09:01:12 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
15/08/06 09:01:12 INFO TaskSchedulerImpl: Cancelling stage 3
15/08/06 09:01:12 INFO TaskSchedulerImpl: Stage 3 was cancelled
15/08/06 09:01:12 INFO Executor: Executor is trying to kill task 1.0 in stage 3.0 (TID 6)
15/08/06 09:01:12 INFO DAGScheduler: ResultStage 3 (saveAsTextFile at SparkTest.java:73) failed in 0.530 s
15/08/06 09:01:12 INFO DAGScheduler: Job 1 failed: saveAsTextFile at SparkTest.java:73, took 10.054969 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 5, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:468)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:798)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1104)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/08/06 09:01:12 INFO SparkContext: Invoking stop() from shutdown hook
15/08/06 09:01:12 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/08/06 09:01:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/08/06 09:01:12 INFO Executor: Executor killed task 1.0 in stage 3.0 (TID 6)
15/08/06 09:01:12 WARN TaskSetManager: Lost task 1.0 in stage 3.0 (TID 6, localhost): TaskKilled (killed intentionally)
15/08/06 09:01:12 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
15/08/06 09:01:12 INFO SparkUI: Stopped Spark web UI at http://192.168.134.1:4040
15/08/06 09:01:12 INFO DAGScheduler: Stopping DAGScheduler
15/08/06 09:01:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/08/06 09:01:12 INFO Utils: path = C:\Users\Administrator\AppData\Local\Temp\spark-23087a14-2765-4609-a0c3-4c65ef12dad3\blockmgr-51dbd4ec-0363-468a-b232-4800bc189e9c, already present as root for deletion.
15/08/06 09:01:12 INFO MemoryStore: MemoryStore cleared
15/08/06 09:01:12 INFO BlockManager: BlockManager stopped
15/08/06 09:01:12 INFO BlockManagerMaster: BlockManagerMaster stopped
15/08/06 09:01:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/08/06 09:01:12 INFO SparkContext: Successfully stopped SparkContext
15/08/06 09:01:12 INFO Utils: Shutdown hook called
15/08/06 09:01:12 INFO Utils: Deleting directory C:\Users\Administrator\AppData\Local\Temp\spark-23087a14-2765-4609-a0c3-4c65ef12dad3
15/08/06 09:01:12 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/08/06 09:01:12 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
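
To help pin down the suspected dependency conflict, here is a small diagnostic sketch that can be run in the same Eclipse project (the class name is hypothetical; VersionInfo and Shell come from the hadoop-common jar that the Maven spark-core artifact pulls in transitively):

    import org.apache.hadoop.util.Shell;
    import org.apache.hadoop.util.VersionInfo;

    public class HadoopDepCheck {
        public static void main(String[] args) {
            // Which Hadoop client did Maven actually put on the classpath?
            System.out.println("Hadoop version: " + VersionInfo.getVersion());

            // On Windows, Hadoop's Shell class shells out to winutils.exe under
            // %HADOOP_HOME%\bin when setting file permissions; if that binary
            // cannot be located, the command handed to ProcessBuilder can end up
            // null, which would match the NPE at ProcessBuilder.start above.
            System.out.println("HADOOP_HOME: " + System.getenv("HADOOP_HOME"));
            System.out.println("Detected as Windows: " + Shell.WINDOWS);
        }
    }

Running mvn dependency:tree on the project would likewise show which hadoop-client version each Spark version resolves to, which may explain why 1.1.0 works and 1.4.1 does not.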

scala package-conflict spark java

BD-艾斯 9 years, 6 months ago
