RevolutionAnalytics / rmr2
A package that allows R developers to use Hadoop MapReduce
Continuing from issue RevolutionAnalytics/RHadoop#167
Was RevolutionAnalytics/RHadoop#164 in the old repo
I tried to run the rmr2 tutorial on CDH4. I compiled R from source and made it executable by all users. I can now run the WordCount Java code, but I am stuck on the rmr2 package, because I saw this message first:
Error: java.lang.RuntimeException: Error in configuring object
I wonder how I can debug the rmr2 package; any thoughts?
Here are details.
Source code:
[cloudera@localhost ~]$ cat test.rmr.R
library(rmr2)
small.ints = to.dfs(1:1000)
mapreduce(
input = small.ints,
map = function(k, v) cbind(v, v^2))
Rscript is installed on /usr/bin:
[cloudera@localhost ~]$ ls -l /usr/bin/Rscript
lrwxrwxrwx 1 root root 37 Apr 1 15:19 /usr/bin/Rscript -> /home/cloudera/software/R/bin/Rscript
[cloudera@localhost ~]$ ls -l /home/cloudera/software/R/bin/Rscript
-rwxr-xr-x 1 cloudera cloudera 17730 Apr 1 10:23 /home/cloudera/software/R/bin/Rscript
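A note while reading the listing above: the -rwxr-xr-x mode on Rscript itself looks fine, but the exec happens as the task user (typically yarn or mapred on CDH4), which also needs the execute (traverse) bit on every directory leading to the file, and a home directory like /home/cloudera is often mode 700. As a sketch (check_path_perms is a throwaway helper written here, not part of rmr2; /usr/bin/env just stands in for whatever path you want to check):

```shell
#!/bin/sh
# List the mode of a file and of every directory above it.
# The task user needs 'x' on each directory, or running the file
# fails with error=13 (Permission denied) even if the file itself is 755.
check_path_perms() {
  p=$1
  while [ "$p" != "/" ]; do
    ls -ld "$p"
    p=$(dirname "$p")
  done
  ls -ld /
}

check_path_perms /usr/bin/env
```

On systems with util-linux, namei -m /home/cloudera/software/R/bin/Rscript prints the same chain in one call.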
Error message:
[cloudera@localhost ~]$ Rscript test.rmr.R
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
13/04/01 17:35:27 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/01 17:35:27 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning message:
In to.dfs(1:1000) : Converting to.dfs argument to keyval with a NULL key
13/04/01 17:35:29 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/tmp/Rtmps3CHWS/rmr-local-env406f381b22dd, /tmp/Rtmps3CHWS/rmr-global-env406f575cacdf, /tmp/Rtmps3CHWS/rmr-streaming-map406f54bdce9f] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.0.jar] /tmp/streamjob34376797830229279.jar tmpDir=null
13/04/01 17:35:31 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/01 17:35:33 INFO mapred.FileInputFormat: Total input paths to process : 1
13/04/01 17:35:33 INFO mapreduce.JobSubmitter: number of splits:2
13/04/01 17:35:33 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/04/01 17:35:33 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
13/04/01 17:35:33 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/04/01 17:35:33 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/04/01 17:35:33 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/04/01 17:35:33 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/04/01 17:35:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1364677243840_0008
13/04/01 17:35:33 INFO client.YarnClientImpl: Submitted application application_1364677243840_0008 to ResourceManager at /0.0.0.0:8032
13/04/01 17:35:33 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1364677243840_0008/
13/04/01 17:35:33 INFO mapreduce.Job: Running job: job_1364677243840_0008
13/04/01 17:35:45 INFO mapreduce.Job: Job job_1364677243840_0008 running in uber mode : false
13/04/01 17:35:45 INFO mapreduce.Job: map 0% reduce 0%
13/04/01 17:36:00 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:424)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "Rscript": java.io.IOException: error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.&lt;init&gt;(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more
13/04/01 17:36:00 - 17:36:27 INFO mapreduce.Job: [the identical stack trace repeats for failed task attempts attempt_1364677243840_0008_m_000000_0, attempt_1364677243840_0008_m_000001_1, attempt_1364677243840_0008_m_000000_1, attempt_1364677243840_0008_m_000001_2 and attempt_1364677243840_0008_m_000000_2]
13/04/01 17:36:36 INFO mapreduce.Job: map 50% reduce 0%
13/04/01 17:36:36 INFO mapreduce.Job: Job job_1364677243840_0008 failed with state FAILED due to: Task failed task_1364677243840_0008_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
13/04/01 17:36:37 INFO mapreduce.Job: Counters: 6
Job Counters
Failed map tasks=7
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=85605
Total time spent by all reduces in occupied slots (ms)=0
13/04/01 17:36:37 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Calls: mapreduce -> mr
Execution halted
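The root cause sits at the bottom of each trace: java.io.IOException: error=13, Permission denied means the operating system refused to exec Rscript, not that R crashed. A minimal sketch that reproduces the same errno with a throwaway script (the temp file here is hypothetical and unrelated to the job above):

```shell
#!/bin/sh
# error=13 (EACCES): the file exists and is readable, but may not be executed.
tmp=$(mktemp)
printf '#!/bin/sh\necho ok\n' > "$tmp"

chmod 644 "$tmp"                          # no execute bit
"$tmp" 2>&1 || echo "exit status: $?"     # shell reports Permission denied

chmod 755 "$tmp"                          # grant the execute bit
"$tmp"                                    # prints: ok

rm -f "$tmp"
```

The same EACCES is also returned when a parent directory lacks the traverse bit for the task user, which is worth checking when, as here, the file's own mode already looks correct.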
Picking up from RevolutionAnalytics/RHadoop#177
It does seem a bit complicated. Your data is structured, but it has two structures, so we need to deal with it as unstructured data, that is, to use lists as arguments to keyval. One way of doing that is to split the data frames by the key you want, as in
splitcars = split(mtcars, mtcars$mpg)
keyval(names(splitcars), splitcars)
The same for the other type of frames. One caveat is that if you have many different keys this is going to create lots of small data frames, which is very inefficient. rmr2 has the same limitation, which comes straight from R. I am not sure I understand the overall goals of your program, so if this doesn't do it please give me a bit more context.
The current setting of N records doesn't generalize from small to large records. What we want is to load as much data as possible in one shot, without running out of memory. For binary formats this is actually easier than a set number of records. The short-term approach could be to leave it as a number of lines for text formats and make it a number of bytes or MB for binary formats.
Started work with 7a03b29; still need to change the man pages.
The keyval.length defaults are not a good fit for vectorized reduce
Makes for less boilerplate code
File cleanup works, but the directory is left behind
It's not the number of calls, it's the number of records that go through any number of calls. In a way this doesn't help with the vectorization issue, because what we need to keep in check is the number of reduce calls, not the number of records, which can be big and yet not imply inefficiency if the number of distinct keys is small and the code vectorized. Unfortunately an efficient fix is not obvious, because the split into distinct keys happens inside the reduce.keyval function, which doesn't return the number of calls to reduce. Moving the counter call inside reduce.keyval would, it seems to me, violate separation of concerns (the keyval concept doesn't depend on any backend or any other part of mapreduce).
Hi,
I was trying to use the equijoin function of rmr 2.1. While using equijoin I encountered the error "rbind.fill must be a dataframe". Only a few of my reduce tasks were affected by this error.
I checked the code, and my suspicion was that there might be keys being passed with NA or null values; I checked, and that wasn't the case.
Then I even coerced my object to a data frame and tried running equijoin again; it still fails.
Can you help me figure out the possible other causes for this problem?
Thanks,
Mayank
Hi,
I was using the output.format argument for my mapreduce output with rmr 2.1.
The output file gets created fine if I specify my output format as
output.format = make.output.format("csv", sep=",")
but if I specify it as
output.format = make.output.format("csv", sep="|")
the last column of my data frame has a "\t" appended at the end of each row.
Is there a reason for the "\t" to be present?
Thanks,
Mayank
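For what it's worth, Hadoop streaming's text output writes key and value separated by a tab by default, so with a NULL key a trailing "\t" on each line is plausibly the streaming separator rather than something rmr appends; treat that explanation as a guess. Until the cause is confirmed, a workaround is to strip the trailing tab from the part files; the inline printf below stands in for a real part file, and the /out/part-00000 path in the comment is hypothetical:

```shell
#!/bin/sh
# Strip a trailing tab from every line of a streaming output file.
# In practice, replace the printf with something like:
#   hadoop fs -cat /out/part-00000
TAB=$(printf '\t')
printf "a|b|c${TAB}\nd|e|f${TAB}\n" | sed "s/${TAB}\$//"
```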
After reading http://www.slideshare.net/Hadoop_Summit/innovations-in-apache-hadoop-mapreduce-pig-hive-for-improving-query-performance (slide 19 in particular), I was reminded not only of the large performance advantages of map-side joins, but also that they have natural use cases in things like star schemas. Moreover, it seems like an rmr implementation shouldn't be all that difficult. One decision is whether we should hide it behind the regular equijoin interface as an implementation change, with at most an API hint to use the map-side algorithm, or add something to the API. The latter is less conservative and makes the API more complex, but it also allows doing the big-to-many-small joins typical of a star schema in one step, if all the small tables fit in memory, which allows skipping the persisting of intermediate results, one for each small table.
I am facing the following problem:
these two R commands:
library(rmr2)
small.ints = to.dfs(1:1000)
produce the following output:
library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
small.ints = to.dfs(1:1000)
sh: 1: /usr/local/hadoop: Permission denied
Warning message:
In to.dfs(1:1000) : Converting to.dfs argument to keyval with a NULL key
HADOOP_CMD is set to /usr/local/hadoop. The directory should be accessible; I even used chmod -R 777...
I use Hadoop v1.1.2, which resides in /usr/local/hadoop.
R is self-compiled v3.0.1 and stored in /usr/local/R.
Does anyone have good ideas, or can anyone advise on what I can try next?
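One thing to check: "sh: 1: /usr/local/hadoop: Permission denied" is the message a shell prints when asked to execute a directory, so a plausible cause is that HADOOP_CMD names the install directory instead of the launcher script. A hedged sketch for a typical Hadoop 1.x layout (both paths below are assumptions about the usual install; verify them against yours):

```shell
# HADOOP_CMD must name the hadoop executable, not the directory containing it.
export HADOOP_CMD=/usr/local/hadoop/bin/hadoop
# rmr2 also reads HADOOP_STREAMING; for Hadoop 1.1.2 the jar usually lives here:
export HADOOP_STREAMING=/usr/local/hadoop/contrib/streaming/hadoop-streaming-1.1.2.jar
```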
If backend.parameters is considered deprecated, what is the best practice for things like rmr-wide options? For example, a multi-purpose Hadoop cluster is unlikely to have its configuration optimized for R tasks. In particular, things like mapred.child.java.opts are likely to be set for Java jobs, allocating a large amount of memory to the JVM. mapreduce jobs need to either reduce the maximum heap space allocated to the JVM (128 MB is as low as I could go) or increase mapred.job.map.memory.mb, which is likely to make everyone else angry at you, since this is probably configured for your cluster's specific hardware to allow efficient distribution of tasks.
Is there, or should there be, another way to set rmr-wide Hadoop parameters (as opposed to job-specific parameters)?
Right now a big data object (returned by mapreduce when output is NULL) is an alternative to an explicit hdfs path, with some interesting properties (it is garbage collected). I would like to explore the possibility of turning this concept into a class, to unify support of different I/O possibilities and better hide the implementation. A big data object could be an explicit path, a temporary garbage-collected file or directory, a managed file which is refreshed when it is stale compared to its inputs and generating function, an HBase table, and so on. It would incorporate a notion of a format, which would no longer be handled separately.
Hi,
I was using the rmr2.1 package and while using the to.dfs I encountered the error below
"Converting to.dfs argument to keyval with a NULL key log4j:ERROR Could not find value for key log4j.appender."
This error is specific to rmr 2.1; I tried the same commands on rmr 2.0.2 and they run fine.
Is there any reason for this to fail in rmr2.1?
Also what could be a fix to solve this issue?
Thanks,
Mayank
partially picking up from RevolutionAnalytics/RHadoop#48
We need to file an issue upstream with the streaming folks, and even better, submit a patch. We need to remark how the Hive and streaming implementations of typedbytes have diverged, and that this is not good.
splitting this from issue #50
Picking up from RevolutionAnalytics/RHadoop#178. What you are trying to do is not supported. Please try with two separate inputs.
It is common practice to apply transformations to mapreduce programs that change the number and nature of the jobs involved, usually to minimize I/O while preserving the same function. This is done in Hive, Pig and Cascading, for example. In rmr it is a little more challenging because
On the positive side are the reflection capabilities of R, which allow inspecting the parse tree, for instance. A little example of what could be done is in a function optimize in the source, completely untested. The only optimization applied is to collapse a chain of mapreduce calls with a reduce only at the end into a single mapreduce job by composing the mappers.
According to the changes in 2.1, we cannot return NULL keys from the reduce function. Instead, we should use 0-length values. This works with the local backend but fails on hadoop:
WORKS:
rmr.options(backend = "local")
mapreduce(to.dfs(keyval(1:3, 1:3)), reduce = function(k, v) if (k %% 2) keyval(k, v) else keyval(integer(0), integer(0)))
WORKS:
rmr.options(backend = "hadoop")
mapreduce(to.dfs(keyval(1:2, 1:2)), reduce = function(k, v) if (k %% 2) keyval(k, v) else keyval(integer(0), integer(0)))
FAILS:
rmr.options(backend = "hadoop")
mapreduce(to.dfs(keyval(1:3, 1:3)), reduce = function(k, v) if (k %% 2) keyval(k, v) else keyval(integer(0), integer(0)))
opening on behalf of @everdark
Hi,
Recently I updated the package to 2.1.0 (from 1.3.1!) and found something unexpected in even the simplest form of a mapreduce job. Here it is:
test <- from.dfs(
mapreduce(
input=fname.sample,
map=function(.,obs) keyval(NULL,1),
input.format=make.input.format(format="csv", sep=","),
reduce=NULL,
combine=NULL
))
test
where fname.sample is a string indicating the path of a .csv file stored in Hadoop.
The result given was:
test
$key
NULL
$val
[1] 1 1
which is quite weird; it's hard for me to understand what's going on.
Why did the values get duplicated? The same thing happens in my code for a more serious job (where the key unnecessarily recycles...), which in turn makes the result quite unpredictable.
Does anyone have ideas on this issue?
The complete log in the R console is as follows.
packageJobJar: [/tmp/RtmpVpDPTq/rmr-local-env16ec54892904, /tmp/RtmpVpDPTq/rmr-global-env16ec49ffa7b3, /tmp/RtmpVpDPTq/rmr-streaming-map16ec5fd2c921, /tmp/hadoop-mis/hadoop-unjar6866723637623951400/] [] /tmp/streamjob7363209296587117900.jar tmpDir=null
13/03/15 00:09:31 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/15 00:09:32 INFO streaming.StreamJob: getLocalDirs(): [/data/hadoop/mapred/temp/]
13/03/15 00:09:32 INFO streaming.StreamJob: Running job: job_201302201645_2649
13/03/15 00:09:32 INFO streaming.StreamJob: To kill this job, run:
13/03/15 00:09:32 INFO streaming.StreamJob: /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=s1dhd02.buyabs.corp:8021 -kill job_201302201645_2649
13/03/15 00:09:32 INFO streaming.StreamJob: Tracking URL: XXXXXX
13/03/15 00:09:33 INFO streaming.StreamJob: map 0% reduce 0%
13/03/15 00:09:43 INFO streaming.StreamJob: map 100% reduce 0%
13/03/15 00:09:45 INFO streaming.StreamJob: map 100% reduce 100%
13/03/15 00:09:45 INFO streaming.StreamJob: Job complete: job_201302201645_2649
13/03/15 00:09:45 INFO streaming.StreamJob: Output: /tmp/RtmpVpDPTq/file16ec62d597d8
13/03/15 00:09:52 WARN snappy.LoadSnappy: Snappy native library is available
13/03/15 00:09:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/15 00:09:52 INFO snappy.LoadSnappy: Snappy native library loaded
13/03/15 00:09:52 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:52 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:52 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:52 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:54 WARN snappy.LoadSnappy: Snappy native library is available
13/03/15 00:09:54 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/15 00:09:54 INFO snappy.LoadSnappy: Snappy native library loaded
13/03/15 00:09:54 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:54 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:54 INFO compress.CodecPool: Got brand-new decompressor
13/03/15 00:09:54 INFO compress.CodecPool: Got brand-new decompressor
http://dumbotics.com/2009/06/08/multiple-outputs/
https://github.com/klbostee/dumbo/blob/master/dumbo/backends/streaming.py
possibly reuse some classes from the dumbo-related feathers project; also useful for https://github.com/RevolutionAnalytics/RHadoop/issues/4. Keep in mind, though, that all keys are raw type (bytes in Java) when using the native format.
I tried to run the rmr2 tutorial on CDH4. I compiled R from source and made it executable by all users. Now I can run the WordCount Java example, but I am stuck at the rmr2 package, because I saw this message first:
Error: java.lang.RuntimeException: Error in configuring object
Here are details.
Source code:
[cloudera@localhost ~]$ cat test.rmr.R
library(rmr2)
small.ints = to.dfs(1:1000)
mapreduce(
  input = small.ints,
  map = function(k, v) cbind(v, v^2))
Rscript is installed in /usr/bin:
[cloudera@localhost ~]$ ls -l /usr/bin/Rscript
lrwxrwxrwx 1 root root 37 Apr 1 15:19 /usr/bin/Rscript -> /home/cloudera/software/R/bin/Rscript
[cloudera@localhost ~]$ ls -l /home/cloudera/software/R/bin/Rscript
-rwxr-xr-x 1 cloudera cloudera 17730 Apr 1 10:23 /home/cloudera/software/R/bin/Rscript
Error message:
[cloudera@localhost ~]$ Rscript test.rmr.R
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
13/04/01 17:35:27 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/01 17:35:27 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning message:
In to.dfs(1:1000) : Converting to.dfs argument to keyval with a NULL key
13/04/01 17:35:29 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/tmp/Rtmps3CHWS/rmr-local-env406f381b22dd, /tmp/Rtmps3CHWS/rmr-global-env406f575cacdf, /tmp/Rtmps3CHWS/rmr-streaming-map406f54bdce9f] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.0.jar] /tmp/streamjob34376797830229279.jar tmpDir=null
13/04/01 17:35:31 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/01 17:35:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/01 17:35:33 INFO mapred.FileInputFormat: Total input paths to process : 1
13/04/01 17:35:33 INFO mapreduce.JobSubmitter: number of splits:2
13/04/01 17:35:33 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/04/01 17:35:33 WARN conf.Configuration: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
13/04/01 17:35:33 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/04/01 17:35:33 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/04/01 17:35:33 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/04/01 17:35:33 WARN conf.Configuration: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
13/04/01 17:35:33 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/04/01 17:35:33 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/04/01 17:35:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1364677243840_0008
13/04/01 17:35:33 INFO client.YarnClientImpl: Submitted application application_1364677243840_0008 to ResourceManager at /0.0.0.0:8032
13/04/01 17:35:33 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1364677243840_0008/
13/04/01 17:35:33 INFO mapreduce.Job: Running job: job_1364677243840_0008
13/04/01 17:35:45 INFO mapreduce.Job: Job job_1364677243840_0008 running in uber mode : false
13/04/01 17:35:45 INFO mapreduce.Job: map 0% reduce 0%
13/04/01 17:36:00 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:424)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "Rscript": java.io.IOException: error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more
13/04/01 17:36:00 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000000_0, Status : FAILED
13/04/01 17:36:12 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000001_1, Status : FAILED
13/04/01 17:36:14 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000000_1, Status : FAILED
13/04/01 17:36:23 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000001_2, Status : FAILED
13/04/01 17:36:27 INFO mapreduce.Job: Task Id : attempt_1364677243840_0008_m_000000_2, Status : FAILED
[the identical stack trace above is repeated for each failed attempt]
13/04/01 17:36:36 INFO mapreduce.Job: map 50% reduce 0%
13/04/01 17:36:36 INFO mapreduce.Job: Job job_1364677243840_0008 failed with state FAILED due to: Task failed task_1364677243840_0008_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
13/04/01 17:36:37 INFO mapreduce.Job: Counters: 6
Job Counters
Failed map tasks=7
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=85605
Total time spent by all reduces in occupied slots (ms)=0
13/04/01 17:36:37 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Calls: mapreduce -> mr
Execution halted
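The root cause in the trace is `Cannot run program "Rscript": error=13, Permission denied`: the streaming task runs as a different user (typically mapred or yarn, not cloudera), and that user must be able to traverse every directory on the way to the Rscript binary, not just execute the file itself. A small diagnostic sketch, assuming GNU stat on the cluster node; the paths in the comments are the ones from this report:

```shell
# Walk every directory component above a file and report any directory that
# is not executable by "others". Such a directory blocks other users (e.g.
# the mapred/yarn user running the streaming task) from reaching the file,
# which surfaces as "error=13, Permission denied".
check_exec() {
  dir=$(dirname "$1")
  while [ "$dir" != "/" ]; do
    mode=$(stat -c '%a' "$dir" 2>/dev/null) || mode=""
    case $mode in
      *[1357]) : ;;                       # others-execute bit is set
      *)       echo "missing o+x: $dir" ;;
    esac
    dir=$(dirname "$dir")
  done
}

# Usage against the paths in this report (run on the cluster node):
#   check_exec /home/cloudera/software/R/bin/Rscript
# Typical fix, assuming it is acceptable to open these directories up:
#   chmod o+x /home/cloudera /home/cloudera/software /home/cloudera/software/R
```

The home directory is the usual culprit: /home/cloudera is often mode 700, so the rwxr-xr-x bits on Rscript itself are never reached.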
see https://issues.apache.org/jira/browse/HADOOP-5528
use hashing to make it more user-friendly
With the current reduce interface this doesn't matter that much, because one can sort the values associated with a key in memory. If we revisit the idea of an iterator-type interface for reduce (for when the values are big and cannot be held in memory more than a few at a time), then this will go on the short track.

The reference to hashing above means the following. Since this binary partitioner is very low level and only allows specifying the keys as a number of bytes to consider or skip, it would be very hard to support complex keys and provide a user-friendly API. If we take two lists of primary and secondary keys, hash the former and then prepend the hash to the key, we can use this simple binary partitioner even with complex keys. The next hurdle, though, is ordering: the byte ordering that Java would perform is unlikely to be the correct ordering for the original key domain. An additional hurdle is the efficient implementation of all of this.

One wonders why the author of the patch for the above issue didn't use typedbytes serialization for this case, as he did with the multiplefileoutputformat. In that case the key is an ArrayList with the first element being the filename and the second the actual key. The same could have been done for primary and secondary keys, and we should consider submitting our own patch to make it that way.
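The hash-prepend scheme can be sketched in R; `make.composite.key` is a hypothetical helper for illustration, not part of rmr2 (digest is the same package rmr2 already depends on):

```r
library(digest)

# Sketch of the scheme described above: hash the primary key and prepend the
# hash, so a byte-oriented partitioner that only looks at a fixed-length
# prefix sends all records sharing a primary key to the same reducer, while
# the secondary key stays available for sorting within the group.
make.composite.key = function(primary, secondary) {
  h = digest(primary, algo = "crc32")   # fixed-length 8-hex-char prefix
  paste(h, secondary, sep = "\t")
}
```

Because the hash prefix has a fixed byte length, "partition on the first 8 bytes" works for arbitrarily complex primary keys; the ordering problem discussed above remains, since the hash destroys any natural order of the primary keys.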
The dumbo author comments
No grand reasons for it really, it just seemed sensible to keep it general/low-level. Writing a custom partitioner using typed bytes isn't hard though, e.g.:
picking up from RevolutionAnalytics/RHadoop#113
Hi,
I was looking into join and couldn't find an option for equijoin. Is there any way which would let me specify the output format for equijoin?
Thanks,
Mayank
it seems cmp is a remnant of a bygone version with only one use
When you use from.dfs to load a directory from Hadoop, rmr spawns a separate hadoop dfs -get task for each individual file. Instead it could use a single hadoop fs -getmerge, doing the same job more efficiently and drastically decreasing the JVM startup overhead.
Since the rmr2 format is referred to as "csv", shouldn't it actually call read.csv so that it has the expected default parameters? Of particular importance is comment.char = "", which I spent a surprising amount of time debugging before I finally noticed that rmr actually calls read.table. I think the documentation specifies somewhere that read.table is being called, but I still found it surprising that it's not calling read.csv.
Here is some code:
#!/usr/bin/env Rscript
#
# Based on Breen's Example 2: airline
#
library(rmr2)
# assumes 'airline' and airline/data exists on HDFS under user's home directory
hdfs.data.root = 'airline'
hdfs.data = file.path(hdfs.data.root, 'data')
# unless otherwise specified, directories on HDFS should be relative to user's home
hdfs.out.root = hdfs.data.root
hdfs.out = file.path(hdfs.out.root, 'out')
mapper.year.market.enroute_time = function(k, fields) {
  # Skip header line in csv formatted file
  if (!(as.character(fields[[1]]) == "Year")) {
    keyval(as.character(fields[[9]]), 1)
  }
}
reducer.year.market.enroute_time = function(key, vv) {
  # count values for each key
  keyval(key, sum(as.numeric(vv), na.rm = TRUE))
}
mr.year.market.enroute_time = function(input, output) {
  mapreduce(input = input,
            output = output,
            input.format = make.input.format("csv", sep = ","),
            map = mapper.year.market.enroute_time,
            reduce = reducer.year.market.enroute_time)
}
out = from.dfs(mr.year.market.enroute_time(hdfs.data, hdfs.out))
results.df = as.data.frame(out,stringsAsFactors=F )
colnames(results.df) = c('carrier', 'count')
print(results.df)
Here is a sample csv file:
Year,Month,DayofMonth,DayOfWeek,DepTime,CRSDepTime,ArrTime,CRSArrTime,UniqueCarrier,FlightNum,TailNum,ActualElapsedTime,CRSElapsedTime,AirTime,ArrDelay,DepDelay,Origin,Dest,Distance,TaxiIn,TaxiOut,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
2004,3,25,4,848,840,1241,1225,HA,1,N587HA,353,345,218,16,8,LAX,HNL,2556,4,11,0,,0,16,0,0,0,0
2004,3,25,4,1426,1425,2135,2140,HA,2,N592HA,309,315,402,-5,1,HNL,LAX,2556,10,17,0,,0,0,0,0,0,0
2004,3,25,4,1222,1220,1551,1605,HA,3,N583HA,329,345,192,-14,2,LAX,HNL,2556,4,13,0,,0,0,0,0,0,0
2004,3,25,4,2220,2225,524,525,HA,4,N583HA,304,300,400,-1,-5,HNL,LAX,2556,5,19,0,,0,0,0,0,0,0
2004,3,25,4,1016,1010,1431,1430,HA,7,N591HA,375,380,228,1,6,LAS,HNL,2762,3,24,0,,0,0,0,0,0,0
2004,3,25,4,2243,2250,617,615,HA,8,N584HA,334,325,434,2,-7,HNL,LAS,2762,7,13,0,,0,0,0,0,0,0
2004,3,25,4,1717,1725,2046,2110,HA,9,N584HA,329,345,196,-24,-8,LAX,HNL,2556,6,7,0,,0,0,0,0,0,0
Approximately half of the real rows of the file will be "missed" when processed by the script. There is no warning or error message.
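The silent row loss is consistent with read.table's defaults (comment.char = "#", quote = "\"'") swallowing records whose fields happen to contain those characters. A workaround sketch, assuming (as the report implies) that read.table-style arguments are passed through by make.input.format, is to disable comment and quote handling explicitly:

```r
library(rmr2)

# Workaround sketch: pass read.table-style arguments explicitly so that '#'
# and quote characters occurring inside fields cannot silently swallow rows.
airline.input.format = make.input.format("csv",
                                         sep = ",",
                                         comment.char = "",
                                         quote = "")
```

These are exactly the defaults read.csv would have used, which is the point of the complaint above.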
Hi,
I was reading the earlier issues about backend.parameters being deprecated and your plan to remove it in rmr 2.1.
I think there is still a case for keeping the backend.parameters option. I agree with your view that it is not required on a properly configured cluster, but security and access restrictions sometimes make it hard to tell a misconfigured cluster apart from erroneous code. In these cases backend.parameters helps you figure it out.
Recently I had an issue where only one reducer was being spawned; it turned out to be a wrongly configured cluster, which I was able to diagnose only after using backend.parameters to set the number of reduce tasks explicitly.
So I would really appreciate it if you kept that option. It is helpful.
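For reference, the usage being defended looks roughly like this (a sketch; input, my.map, and my.reduce are placeholders, and the property name differs between MR1 and MR2, mapred.reduce.tasks vs mapreduce.job.reduces):

```r
# Sketch: pin the number of reduce tasks through backend.parameters, which
# forwards -D options to the hadoop streaming command line.
mapreduce(input = input,
          map = my.map,
          reduce = my.reduce,
          backend.parameters = list(
            hadoop = list(D = "mapred.reduce.tasks=16")))
```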
provide a way to set options to Rscript
I'm using RMR and I'd like to serialize multiple randomForests to hdfs.
reducer <- function(k, v) {
  rf <- randomForest(formula = model.formula,
                     data = v,
                     na.action = na.roughfix,
                     ntree = number.trees,
                     do.trace = TRUE)
  keyval(k, list(forest = rf))
}
I'm calling the reducer like this:
mapreduce(input = "train_clean.csv",
          input.format = titanic.input.format,
          map = mapper,
          reduce = reducer,
          output.format = "native",
          output = "titanic-out")
When I run this, the reducers fail like this:
2013-05-02 08:18:27,372 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:334)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:458)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:218)
at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:376)
2013-05-02 08:18:27,375 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.io.EOFException
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:334)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:458)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:218)
at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:376)
If I change to output.format="text", the same code works.
I'd like to be able to specify the name of my MapReduce job, to identify it in the JobTracker among all the others.
While this can be done by specifying "mapred.job.name" in backend.parameters, that feature is deprecated, as discussed in #9.
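For reference, the deprecated route looks like this; this is a sketch only, since backend.parameters support may change, and the surrounding mapreduce arguments are illustrative:

```r
# Passes -D mapred.job.name=my-rmr2-job through to the streaming jar.
# This works with current rmr2 but relies on the deprecated
# backend.parameters option discussed above.
mapreduce(input = small.ints,
          map = function(k, v) keyval(k, v),
          backend.parameters = list(
            hadoop = list(D = "mapred.job.name=my-rmr2-job")))
```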
There is a warning in equijoin: "Doesn't work with multiple inputs like mapreduce".
I am guessing this means that when multiple files are present inside a folder, mapreduce takes them as multiple input arguments.
If equijoin does not support this, would it also be unable to read the output of a reduce job when multiple part files are created?
Hi,
I'm getting the following error while running MapReduce jobs in rmr2 v2.0.2. The output of the standard error stream is as follows:
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: methods
Loading required package: int64
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Warning: NAs introduced by coercion
Error in .Call("typed_bytes_reader", data, nobjs, PACKAGE = "rmr2") :
negative length vectors are not allowed
Calls: ... keyval.reader -> format -> typed.bytes.reader -> .Call Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:502)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Can you shed some light on this issue?
Thanks for your help.
rmr2 uses the tempdir() function to generate names for temporary directories on HDFS. While it is possible to change the base directory by setting the environment variable TMPDIR (or alternatively TMP or TEMP) before starting R, there are limitations. In particular, the directory containing temp files has to be the same on both the local file system and HDFS. The following paragraph illustrates this with an example.
For example, if I want to set TMPDIR to a specific HDFS location, e.g. /user/hduser/tmp, I cannot, because /user/hduser/tmp does not exist on my local file system. (R only honors a temp-dir setting if the directory exists on the local file system and is writable.) This is a problem in itself; furthermore, it gets worse if I also need tempdir() for my own code.
There is another reason why changing TMPDIR, TMP, or TEMP might not be a good idea: it affects other parts of the system as well. For example, it affects LXDE and Openbox; if you set any of those variables to a non-existent local directory, your background wallpaper will not load after logging in, and you can no longer change the desktop preferences.
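The constraint runs even deeper than the existence check: tempdir() is fixed for the lifetime of the session, so the variable must be set before R starts. A minimal illustration (the HDFS-style path is taken from the example above):

```r
# tempdir() is determined once at startup: R reads TMPDIR (or TMP/TEMP),
# falls back to a default such as /tmp if the named directory does not
# exist locally or is not writable, and never re-reads the variable.
Sys.setenv(TMPDIR = "/user/hduser/tmp")  # too late: session already started
tempdir()  # still returns the directory chosen at startup
```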
I'm not exactly sure why, but Rprofmem doesn't always succeed:
> Rprofmem()
Error in Rprofmem() : memory profiling is not available on this system
It looks like activate.profiling works fine, but close.profiling fails because it tries to close both Rprof(NULL) and Rprofmem(NULL).
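Rprofmem is only available when R was compiled with memory-profiling support, and capabilities("profmem") reports whether it is. A minimal sketch of a guarded shutdown, assuming the activate/close split described above (the function name is illustrative, not the actual close.profiling code):

```r
# Guard the Rprofmem call so that shutting profiling down does not
# error on R builds that lack memory-profiling support.
close.profiling.safe <- function() {
  Rprof(NULL)                                  # stop time profiling
  if (capabilities("profmem")) Rprofmem(NULL)  # stop memory profiling only if supported
}
```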
Hi,
While installing rmr2 on a system with CDH4, I get the following message, stating "Cannot find hadoop-core jar file in hadoop home".
This seems to be more of a warning, as the package runs as expected regardless.
Is this something that we should be bothered about?
The exact install log is as follows:
* installing to library '/usr/local/lib64/R/library'
* installing *source* package 'rmr2' ...
** libs
g++ -I/usr/local/lib64/R/include -DNDEBUG -I/usr/local/include `/usr/local/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic -g -O2 -c extras.cpp -o extras.o
g++ -I/usr/local/lib64/R/include -DNDEBUG -I/usr/local/include `/usr/local/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic -g -O2 -c hbase-to-df.cpp -o hbase-to-df.o
g++ -I/usr/local/lib64/R/include -DNDEBUG -I/usr/local/include `/usr/local/lib64/R/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic -g -O2 -c typed-bytes.cpp -o typed-bytes.o
g++ -shared -L/usr/local/lib64 -o rmr2.so extras.o hbase-to-df.o typed-bytes.o -L/usr/local/lib64/R/library/Rcpp/lib -lRcpp -Wl,-rpath,/usr/local/lib64/R/library/Rcpp/lib -L/usr/local/lib64/R/lib -lR
((which hbase && (mkdir -p ../inst; cd hbase-io; sh build_linux.sh; cp build/dist/* ../../inst)) || echo "can't build hbase IO classes, skipping" >&2)
/usr/bin/hbase
build_linux.sh: line 159: [: missing `]'
Using /usr/lib/hadoop as hadoop home
Using /usr/lib/hbase as hbase home
Copying libs into local build directory
ls: cannot access /usr/lib/hadoop/hadoop-*-core.jar: No such file or directory
ls: cannot access /usr/lib/hadoop/hadoop-core-*.jar: No such file or directory
Cannot find hadoop-core jar file in hadoop home
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/local/lib64/R/library/rmr2/libs
** R
** preparing package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'quickcheck'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (rmr2)
Continuing issue RevolutionAnalytics/RHadoop#115
This has become issue https://issues.apache.org/jira/browse/HADOOP-9300
Hi,
I've written a MapReduce job that takes as input the output of a previous MapReduce job.
The first few lines of the job look something like this:
map_1 = function(k, input_data) {
  index = which(k == 1)
  if (length(index) > 0) {
    input_data = do.call("rbind", input_data[index])
    input_data = as.data.frame(input_data)
    if (nrow(input_data) != 0) {
      # some processing
      keyval("constant_key", processed_input_data)
    }
  }
}
Now, in the processing part I've clipped some columns, but nothing changes the number of rows. Yet this is the error I am getting:
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
data ENTERING FOR at risk19
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/usr/lib/hadoop/bin/hadoop dfs -put /mnt/data/mapred/local/taskTracker/musigma/jobcache/job_201303201047_0032/attempt_201303201047_0032_m_000000_0/work/tmp/RtmpF1qXQp/file708455830e3d c(" new/output/alltags-v33_1/AtRisk/part-00001", " new/output/alltags-v33_1/AtRisk/part-00002")'
Length of Keys = 1
Length of Values = 3
Error in rmr.recycle(k, v) : Can't recycle 0-length argument
Calls: <Anonymous> ... keyval.writer -> format -> recycle.keyval -> keyval -> rmr.recycle
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Since there is already an if condition that should handle the zero-row case, we cannot figure out why the error is being thrown at keyval.
What could be the problem?
I wrote some sample code for equijoin, and I am getting a 'segfault: memory not mapped' error. My R session crashes if I run it from RStudio, but from plain R I can exit gracefully (please see the output below).
The code I am trying to run is:
library("rmr2")
authors = data.frame(
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)))
books = data.frame(
  name = I(c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core")),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA,
                   "Venables & Smith"))
to.dfs(kv=authors, output="authors.csv", format=make.output.format("csv", sep=","))
to.dfs(kv=books, output="books.csv", format=make.output.format("csv", sep=","))
eqj = function(left.input="authors.csv", right.input="books.csv", output=NULL) {
  map.left = function(k, v) {
    names(v) = c("surname", "nationality", "deceased")
    keyval(v[, "surname"], v[, drop=FALSE])
  }
  map.right = function(k, v) {
    names(v) = c("name", "title", "other.author")
    keyval(v[, "name"], v[, "title", drop=FALSE])
  }
  equijoin(left.input="authors.csv", right.input="books.csv",
           input.format=make.input.format("csv", sep=",", as.is=TRUE),
           outer="left", map.left=map.left, map.right=map.right)
}
merged = eqj()
The output while running through R is:
> library("rmr2")
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
> from.dfs(equijoin(left.input = to.dfs(keyval(1:10, 1:10^2)), right.input = to.dfs(keyval(1:10, 1:10^3))))
*** caught segfault ***
address 0x8, cause 'memory not mapped'
Traceback:
1: .Call("typedbytes_writer", objects, native, PACKAGE = "rmr2")
2: writeBin(.Call("typedbytes_writer", objects, native, PACKAGE = "rmr2"), con)
3: typedbytes.writer(interleave(keys(kvs), values(kvs)), con, native)
4: format(kv, con)
5: keyval.writer(kv)
6: write.file(kv, tmp)
7: to.dfs(keyval(1:10, 1:10^2))
8: xor(!is.null(left.input), !is.null(input) && (is.null(left.input) == is.null(right.input)))
9: stopifnot(xor(!is.null(left.input), !is.null(input) && (is.null(left.input) == is.null(right.input))))
10: equijoin(left.input = to.dfs(keyval(1:10, 1:10^2)), right.input = to.dfs(keyval(1:10, 1:10^3)))
11: to.dfs.path(input)
12: from.dfs(equijoin(left.input = to.dfs(keyval(1:10, 1:10^2)), right.input = to.dfs(keyval(1:10, 1:10^3))))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 3
[user@localhost ~]$
Any clue as to why this is happening?
I would like to propose the definition of a keyval object and an interface for objects that can be keys or values. The goal is to provide an extensible definition for keyval, which is currently limited to containing vectors, lists, matrices, or data frames. Not too shabby, but what about sparse matrices? The design is already sketched in the file keyval.R. The functions starting with rmr. define an interface for things that can be keys or values. It doesn't actually have to be the same interface for the two, and I think that for the sake of generality the symmetry should be dropped. The functions ending in .keyval define an interface for the keyval class, and hopefully also an implementation that relies completely on the rmr interface to hide the differences between the concrete data structures. There are some exceptions, like the c.or.rbind function, which is an rmr.-type function whose name is maybe too tied to a possible implementation, or key.normalize, which should be part of a key interface, as the name suggests. The goal here is to allow the user to mapreduce any data structure that satisfies a small number of properties: the ability to be split, unsplit, sliced, and measured, plus serializability and a few more, all reasonable.
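The proposal above can be sketched with S3 generics; the names below (rmr.length, rmr.slice) follow the rmr. convention described in the proposal but are illustrative only, not the actual keyval.R code:

```r
# Generic interface: any class that implements these can serve as a
# container of keys or values. Here only "measure" and "slice" are shown.
rmr.length <- function(x) UseMethod("rmr.length")
rmr.slice  <- function(x, r) UseMethod("rmr.slice")

# Default methods cover plain vectors and lists:
rmr.length.default <- function(x) length(x)
rmr.slice.default  <- function(x, r) x[r]

# A new container type, e.g. a sparse matrix class, would just add its
# own methods (row-wise semantics, as a hypothetical example):
# rmr.length.dgCMatrix <- function(x) nrow(x)
# rmr.slice.dgCMatrix  <- function(x, r) x[r, , drop = FALSE]
```

With this split, the .keyval functions could be written once against the generics and never inspect the concrete data structure.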