nexr / RHive
RHive is an R extension facilitating distributed computing via Apache Hive.
Home Page: http://nexr.github.io/RHive
I'm trying to work with RHive on Amazon EMR and I'm getting an error with rhive.connect, but the connection seems to be working:
{code}
library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-6. For overview type ‘?RHive’.
HIVE_HOME=/home/hadoop/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
rhive.connect(port=10003)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2012-10-16 21:15:43,446 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:121)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:225)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:190)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1330)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1348)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:246)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,497 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,517 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:702)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:242)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,546 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,563 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3079)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:598)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:548)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:529)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,593 INFO [LeaseChecker] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1235)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1247)
at java.lang.Thread.run(Thread.java:662)
2012-10-16 21:15:43,609 INFO [Thread-7] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3338)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3202)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2415)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2656)
2012-10-16 21:15:43,656 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,659 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,664 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,668 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,681 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.<init>(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3725)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3640)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:96)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
{code}
Error in 1:listStatus$length : argument of length 0
though I can run SELECT queries and rhive.desc.table('table') commands.
I recently installed Hive (0.11.0) and Hadoop (1.2.1) on Ubuntu 13.10 x64. R (3.0.1 "Good Sport") is installed correctly and works well with other packages. I also installed the RHive (2.0.0) package, but when I tried connecting to Hive from R it showed the error message below. Please help me with this.
issue case:
aaa = rhive.big.query("select *,
CASE
WHEN petallength < 2.45 THEN 'first'
WHEN petallength >= 2.45 THEN 'second'
END as separation
from iris_3")
expected output:
1 5.1 3.5 1.4 0.2 setosa first
2 4.9 3.0 1.4 0.2 setosa first
.
.
50 5.0 3.3 1.4 0.2 setosa first
51 7.0 3.2 4.7 1.4 versicolor second
.
.
A progress bar or verbose message is needed so users can estimate the ETA when fetching a large result set from a Hive query with the "rhive.query" function. Sometimes "rhive.query" takes a very long time; some kind of indicator is needed.
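Until such an indicator exists, a client-side stopgap is possible. This is only a sketch (the wrapper name and the rows/sec report are hypothetical, not part of RHive), and it still cannot show progress while the fetch is in flight:

```r
library(RHive)

# Stopgap sketch (hypothetical helper, not part of RHive): time the fetch
# and report throughput afterwards. A real fix would have to stream progress
# from inside rhive.query itself.
rhive.query.timed <- function(sql, ...) {
  t0 <- Sys.time()
  res <- rhive.query(sql, ...)   # blocks until the full result arrives
  secs <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  message(sprintf("fetched %d rows in %.1f s (%.0f rows/s)",
                  nrow(res), secs, nrow(res) / max(secs, 1e-9)))
  res
}
```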
Hi,
We are trying to execute the examples (https://github.com/nexr/RHive/wiki/RHive-example-code).
When we execute the query, our jobs fail with a KryoException.
It seems that a UDF instance is serialized even though it contains converters that are not designed for serialization (no default constructor).
We are using Hadoop 2.2 and Hive 0.12 (Hortonworks distribution).
Are those examples still correct?
Do you have an idea of the cause of our error?
Regards,
Philippe
The example:
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient',coefficient)
rhive.assign('scoring',scoring)
rhive.exportAll('scoring')
rhive.query("select R('scoring',col_sal,0.0) from emp")
Exception :
Error: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:314)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:263)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:376)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:552)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter
Serialization trace:
converters (com.nexr.rhive.hive.udf.RUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109)
at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:810)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:720)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:733)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:287)
... 13 more
In mrapply, mapapply and reduceapply, the user should be able to pass a custom environment,
and then use that custom environment inside the mapper and reducer functions.
Hi,
I hit an error when building a package that depends on RHive 0.0-3, as shown below:
installing source package ‘clog’ ...
** R
** data
** inst
** preparing package for lazy loading
Warning in file(file, "rt") :
cannot open file '/srv/clog/hadoop-0.20.203.0/conf/slaves': No such file or directory
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
Error : package ‘RHive’ could not be loaded
ERROR: lazy loading failed for package ‘clog’
The above error occurs when I run 'R CMD check packagefile', and I cannot finish building the package.
I think it is caused by RHive returning an error code when it is loaded via library().
Hi, this is what I get after installing version 2.0.0
rhive.init()
rhive.env()
hadoop home: /usr/local/hadoop
hive home: /usr/local/hive
rhive.connect(host='master', port='10000')
Error: class not found
Please consider that version 0.0.7 (the previous version I used) worked just fine. The environment I got from 0.0.7 was:
Hive Home Directory : /usr/local/hive
Hadoop Home Directory : /usr/local/hadoop
Hadoop Conf Directory :
Default RServe List
master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10
master : RHIVE_DATA = /home/hduser/RData/
slave1 : RHIVE_DATA = /home/hduser/RData
slave2 : RHIVE_DATA = /home/hduser/RData
slave3 : RHIVE_DATA = /home/hduser/RData
slave4 : RHIVE_DATA = /home/hduser/RData
slave5 : RHIVE_DATA = /home/hduser/RData
slave6 : RHIVE_DATA = /home/hduser/RData
slave7 : RHIVE_DATA = /home/hduser/RData
slave8 : RHIVE_DATA = /home/hduser/RData
slave9 : RHIVE_DATA = /home/hduser/RData
slave10 : RHIVE_DATA = /home/hduser/RData
Connected HiveServer : master:10000
Recommending use of the rhive.big.query function.
Hello,
When I do a simple
rhive.query("select * from X limit 10000")
it takes 90 s to get the answer once the query has completed on the hiveserver (OK displayed on the console).
The time increases linearly with data size, always exactly 9 ms per line, and it does not depend on the line length.
This is several orders of magnitude slower than any other kind of data transfer between R and anything else. My guess is that there is some kind of timeout somewhere.
rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Error in rdata[[i]] : subscript out of bounds
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
traceback()
3: FUN(X[[1L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
1: rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Please check.
rhive.write.table(editweights)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:575 mismatched input '0' expecting Identifier near ',' in column specification
, errorCode:11, SQLState:42000)
head(editweights)
V1 a A b B c C d D e E f F g G h H i I j J
1 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1
2 -1 0.0 -0.3 -2.0 -3.0 -1.5 -3.0 -1.0 -3.0 -1.0 -1 -1.5 -3.0 -2.0 -3.0 -2.0 -3.0 -2.0 -3 -2 -3
3 -1 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -3
4 -1 -2.0 -3.0 0.0 -0.3 -1.0 -3.0 -1.5 -3.0 -2.0 -2 -1.0 -3.0 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1 -3
5 -1 -3.0 -3.0 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -0.5 -0.5 -0.5 -0.5 -3.0 -3 -3 -3
6 -1 -1.5 -3.0 -1.0 -3.0 0.0 -0.3 -0.5 -0.5 -0.5 -1 -0.5 -0.5 -1.0 -3.0 -1.5 -3.0 -2.0 -3 -2 -3
k K l L m M n N o O p P q Q r R s S t T u U v
1 -1.0 -1 -1 -1 -1 -1 -1.0 -1.0 -1 -1 -1 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1.0
2 -2.0 -3 -2 -3 -2 -3 -2.0 -3.0 -2 -2 -2 -2 -0.5 -0.5 -1.5 -1.5 -0.5 -0.5 -2.0 -2.0 -2 -3 -2.0
3 -3.0 -3 -3 -3 -3 -3 -3.0 -3.0 -3 -3 -3 -3 -0.5 -0.5 -3.0 -3.0 -0.5 -0.5 -3.0 -3.0 -3 -3 -3.0
4 -1.5 -3 -2 -3 -1 -3 -0.5 -0.5 -2 -2 -2 -2 -2.0 -2.0 -1.5 -1.5 -2.0 -3.0 -1.0 -1.0 -1 -3 -0.5
5 -3.0 -3 -3 -3 -3 -3 -0.5 -0.5 -3 -3 -3 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -0.5
6 -2.0 -3 -2 -3 -2 -3 -1.5 -3.0 -2 -2 -2 -2 -2.0 -2.0 -1.0 -1.0 -1.0 -3.0 -0.5 -0.5 -2 -3 -0.5
V w W x X y Y z Z 0 1 2 3 4 5 6 7 8 9
1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0
2 -3.0 -0.5 -0.5 -1.0 -3.0 -2.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
3 -3.0 -0.5 -0.5 -3.0 -3.0 -3.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
4 -0.5 -2.0 -2.0 -1.5 -3.0 -1.0 -3 -2.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
5 -0.5 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
6 -0.5 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
_ $ #
1 -0.5 -0.15 -0.2
2 -1.5 -100.00 -100.0
3 -1.5 -100.00 -100.0
4 -1.5 -100.00 -100.0
5 -1.5 -100.00 -100.0
6 -1.5 -100.00 -100.0
Please fix this as soon as possible.
I have used rhive.query with no problem. But when I use rhive.write.table(myTableName), the following error occurs:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: File myTableName.rhive does not exist.
[1] "myTableName"
It created an empty table with correct column names. I double checked that the class of myTableName is a data frame.
Has anyone come across this problem? Thanks in advance.
Recommend using the rhive.big.query function.
The R UDF function should accept arguments of multiple types.
A Windows setup guide is necessary, because many people use Windows as their operating system. We need to test on Windows and write a guide for those users.
Hive lets users run any script inside a Hive query; it uses Hadoop Streaming to provide this.
RHive could provide a function to run an R script in HQL through this Hive feature.
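For reference, the pattern already appears elsewhere in these reports: export an R closure to the cluster and invoke it from HQL via RHive's R() UDF (the table and column names here are illustrative):

```r
library(RHive)

# Export an R closure and call it from a Hive query via the R() UDF,
# which RHive implements on top of Hadoop Streaming.
coefficient <- 1.1
scoring <- function(sal) coefficient * sal

rhive.assign("coefficient", coefficient)  # ship the free variable too
rhive.assign("scoring", scoring)
rhive.exportAll("scoring")

rhive.query("SELECT R('scoring', col_sal, 0.0) FROM emp")
```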
Hi,
I had a serious problem when I tried to use RHive on a just-recovered cluster.
The problem was that rhive.connect never finished, even after a very long time.
I figured out that the cause was that the MySQL server backing Hive was down.
I am not sure whether this can be solved inside RHive, but I think a timeout parameter with a sensible default may be necessary in the rhive.connect() function.
Thanks.
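Pending a real timeout parameter, one client-side workaround is to bound the call with base R's setTimeLimit. This is only a sketch (the wrapper is hypothetical), and setTimeLimit is checked only at R-level interrupt points, so a connect that blocks inside rJava/C code may still hang:

```r
library(RHive)

# Hypothetical wrapper: give up on rhive.connect() after timeout_sec seconds.
# Caveat: setTimeLimit() cannot interrupt code blocked inside Java/C calls,
# so it will not catch every possible hang.
connect_with_timeout <- function(..., timeout_sec = 30) {
  setTimeLimit(elapsed = timeout_sec, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)  # always clear the limit
  rhive.connect(...)
}
```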
rhive.mrapply
function (tablename, mapperFUN, reducerFUN, mapinput = NULL,
mapoutput = NULL, by = NULL, reduceinput = NULL, reduceoutput = NULL,
mapper_args = NULL, reducer_args = NULL, buffersize = -1L,
verbose = FALSE, hiveclient = rhive.defaults("hiveclient"))
.....
...
..
Please make the parameters after reducerFUN optional,
so that a call like rhive.mrapply("weights", map, reduce) applies map to all columns.
rhive.connect prints a long message when I use it.
I don't know what the message means.
Could you provide a way to save this output to a log file, so the noisy message is hidden, and explain what it is?
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
I tried to connect to a HiveServer2 instance, and it remained stuck at the rhive.connect() call.
I was able to connect to a HiveServer1 instance.
However, the latest Cloudera Manager (4.5) has support for managing only HiveServer2.
Are there any plans for HiveServer2 support?
When I run any kind of query, the results returned by HiveClient are not tab-separated but delimited with the literal string "\001", resulting in improper results. Is this happening to anyone else?
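As a client-side workaround while the delimiter bug stands, the mis-joined fields can be re-split in R. This sketch assumes each row arrives as one string containing the literal four-character text "\001" (the function name is hypothetical; change sep to "\u0001" if the real ASCII SOH control character turns out to be the delimiter):

```r
# Re-split rows that came back as single "\001"-joined strings.
# sep = "\\001" is the literal backslash-0-0-1 text described in the report;
# use "\u0001" instead if the actual control character is the delimiter.
resplit_rows <- function(rows, sep = "\\001") {
  fields <- strsplit(rows, sep, fixed = TRUE)   # list of character vectors
  as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
}
```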
reported from Haven
Develop an RHive aggregate function similar to R's aggregate function.
The proposed syntax is:
FUN(table-name, hiveFUN, col, ..., groups)
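A rough sketch of how the proposed function could work, by translating the call into a Hive GROUP BY query. Nothing here exists in RHive yet; all names are illustrative, and the sketch ignores identifier quoting and multiple aggregation columns:

```r
library(RHive)

# Illustrative only: map the proposed signature onto a GROUP BY statement.
rhive.aggregate <- function(tablename, hiveFUN, col, groups) {
  sql <- sprintf("SELECT %s, %s(%s) FROM %s GROUP BY %s",
                 groups, hiveFUN, col, tablename, groups)
  rhive.query(sql)
}

# e.g. rhive.aggregate("emp", "avg", "sal", "deptno")
```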
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}
rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)
Has anyone faced the same problem? Can anyone give me some advice on solving it? Thanks very much!
My rhive package version is RHive_0.0-6.
Hello there. I recently started using your RHive package, and have been mostly happy so far. One concern I have is with code of "rhive.hdfs.connect". It tries to make changes in the root HDFS directory. In most real systems, including ours, it would be disallowed. Would it make sense to make changes in /tmp instead of /? I had to change your code to make it work with our system. Thanks!
= Yakov
[root@hadoop RHive]# ant build
Buildfile: build.xml
compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone who can help me?
We need rhive.hdfs.chown and rhive.hdfs.chmod functions (not a strong requirement).
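A hypothetical shape for the requested functions, mirroring the existing rhive.hdfs.* naming and the semantics of `hadoop fs -chmod` / `-chown` (these signatures are a proposal, not an existing API):

```r
# Proposed signatures; mode and owner strings follow the HDFS shell.
rhive.hdfs.chmod("755", "/rhive/data", recursive = FALSE)
rhive.hdfs.chown("hive:hadoop", "/rhive/data", recursive = FALSE)
```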
rhive.write.table(iris)
[1] "iris"
rhive.desc.table("iris")
col_name data_type comment
1 rowname string
2 sepallength double
3 sepalwidth double
4 petallength double
5 petalwidth double
6 species string
rhive.napply('iris', function(column1) { column1 * 10}, 'sepallength')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:40 cannot recognize input near 'CREATE' 'TABLE' 'iris_napply1328157031_table' in select clause
, errorCode:11, SQLState:42000)
Please check.
1. Query: select logdata from ulog limit 1
2. Hive console result:
{"{"body":"SEQ_ID":"20120709160001430307","HOST_NAME":"u2dlpweb01","LOG_TIME":"20120709160001","REQ_TIME":"20120709160001","LOG_KIND":"SVC","KT_USER_ID":"","KT_SVC_ID":"X","SESSION_KEY":"","FILE_ID":"X","RT_CODE":"1","DIVIDE1":"211.55.29.102","DIVIDE2":"http://gate2.ucloud.com/api/1/pcclient/pcauth","DIVIDE3":"POST","DIVIDE4":"200","DIVIDE5":"0000","DIVIDE6":"WIN","DIVIDE7":"7","DIVIDE8":"uCloud","DIVIDE9":"1.0.2","DIVIDE10":"personal","DIVIDE11":"GATEWAY","DIVIDE12":"4000","DIVIDE13":"X","DIVIDE14":"X","DIVIDE15":"X","DIVIDE16":"X","DIVIDE17":"X","DIVIDE18":"uCloud/1.0.2 WIN/7 PC personal","DIVIDE19":"X","DIVIDE20":"X","DIVIDE21":"X","DIVIDE22":"X","DIVIDE23":"X","DIVIDE24":"X","DIVIDE26":"X","DIVIDE26":"X","DIVIDE27":"X","DIVIDE28":"X","DIVIDE29":"X","DIVIDE30":"X","timestamp":1341820971766,"pri":"INFO","nanos":784667370878488,"host":"u2dlpweb01","fields":{"AckTag":"20120709-170250760+0900.784666364804488.00000018","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000:\u001F짰I","tailSrcFile":"ucloud-003.log","rolltag":"20120709-170444530+0900.524399892735665.00000020"}}"}
Time taken: 14.89 seconds
3. RStudio rhive.query:
rhive.query('select logdata from ulog limit 1')
logdata
1 NA
Warning message:
NAs introduced by coercion
4. ulog table creation script:
CREATE EXTERNAL TABLE IF NOT EXISTS ulog (
logdata MAP<STRING,STRING>
)
PARTITIONED BY(logdt STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '='
LOCATION '/ucloud/collected/ucloudpersonal'
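Given the DDL above (map entries separated by '|', keys and values by '='), one client-side workaround for the NA coercion would be to fetch logdata as a plain string and parse the map in R. A minimal sketch (hypothetical helper, not part of RHive; it assumes values contain no '='):

```r
# Parse a Hive MAP<STRING,STRING> rendered as "k1=v1|k2=v2" into a named vector.
parse_hive_map <- function(s) {
  entries <- strsplit(s, "|", fixed = TRUE)[[1]]   # split map entries
  kv <- strsplit(entries, "=", fixed = TRUE)       # split key/value pairs
  setNames(vapply(kv, `[`, character(1), 2),
           vapply(kv, `[`, character(1), 1))
}

parse_hive_map("HOST_NAME=u2dlpweb01|LOG_KIND=SVC|RT_CODE=1")
```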
RHive should handle huge results from Hive.
I am getting the following error with Hive-0.9.0:11:
Error in rdata[[i]] : subscript out of bounds
traceback() reveals:
3: FUN(X[[25L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
What does this error mean? Can this be fixed? Thanks in advance!
Hi There,
Will RHive work with a HiveServer2 instance that has Kerberos security enabled?
When I try to connect to Hive, I get the following exception in my RStudio console:
rhive.connect(host="hostname.domain.com/default;principal=hive/[email protected]",defaultFS="hdfs://namenode.domain.com:8020/user/me",hiveServer2=TRUE)
14/05/05 15:10:52 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "Thread-35" java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: hive/[email protected]:10000/default
at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:64)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:198)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:138)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:123)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at com.nexr.rhive.hive.DatabaseConnection.connect(DatabaseConnection.java:51)
at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.connect(HiveJdbcClient.java:330)
at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.run(HiveJdbcClient.java:322)
Error: java.lang.IllegalStateException: Not connected to hiveserver
Thanks,
Prabhu.
In RUDF and RUDAF, null data is passed to the R function, but R cannot handle this data, because the null value in R is not a scalar null but NULL, a zero-length object.
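A defensive pattern (a sketch, not current RHive behavior) is to translate an incoming NULL to NA before invoking the user's function, since NA is a scalar that ordinary R code handles gracefully:

```r
# Wrap a UDF so SQL null (arriving as R NULL) becomes NA.
safe_udf <- function(fun) {
  function(x) if (is.null(x)) NA else fun(x)
}

scoring <- safe_udf(function(sal) 1.1 * sal)
scoring(NULL)  # NA, instead of numeric(0) or an error
scoring(100)
```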
RHive uses the Hive Thrift service, so error logs are dropped into the Thrift server's stdout.
This causes inconvenience for users: it is hard to debug and to find where errors occur.
We need an appropriate way to capture remote debug messages, and it should work per dedicated account session.
Design a distributed aggregate function similar to R's aggregate function.
Develop RHive apply functions using RUDF.
We designed two apply functions, distinguished by return type. Their syntax is:
[n|s]apply(hive-tablename, FUN, col1, ...)
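Hypothetical usage following that syntax (rhive.napply appears elsewhere in these reports; rhive.sapply is assumed here to be the string-returning counterpart):

```r
# Numeric-returning apply over one column (requires a live Hive connection):
rhive.napply("iris", function(x) x * 10, "sepallength")
# String-returning apply:
rhive.sapply("iris", function(x) toupper(x), "species")
```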
I have a table created using "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';".
While rhive.query("SELECT * FROM serdetable") works, selecting a specific column with
rhive.query("SELECT col1 FROM serdetable")
returns
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
I tried to run the same query directly from the Hive shell and it works, which means the jar containing the org.openx.data.jsonserde.JsonSerDe class was loaded by Hive.
I should mention that trying the same on another table created with the default Regex SerDe returns the same error.
Any help would be appreciated!
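One common cause of this symptom (not confirmed for this report) is that the SerDe jar is visible to the Hive shell but not shipped to the map-reduce tasks that column-projection queries launch, while SELECT * can be answered without a MR job. A hedged workaround is to register the jar in the same RHive session before querying (the jar path below is hypothetical):

```r
# Hypothetical jar location; use the path where your JsonSerDe jar lives.
rhive.query("ADD JAR /usr/lib/hive/lib/json-serde.jar")
rhive.query("SELECT col1 FROM serdetable")
```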
I downloaded the RHive 2.0 package, cd'd into the working directory, then ran 'ant build'; it failed:
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml
compile:
[mkdir] Created dir: /opt/RHive/build/classes
[javac] Compiling 21 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:44: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!stat.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:115: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!src.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:147: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] length[i] = items[i].isDir() ? fs.getContentSummary(items[i].getPath()).getLength() : items[i].getLen();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/JobManager.java:33: warning: [deprecation] getUsedMemory() in org.apache.hadoop.mapred.ClusterStatus has been deprecated
[javac] clusterStatus.getUsedMemory();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
[javac] 4 warnings
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Total time: 3 seconds
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml
compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone who can help me?
Hello All,
I installed RHive 0.7 and CDH 4.4, and rhive.connect() worked.
When I run rhive.exportAll('scoring'), it prints the following error:
rhive.exportAll('scorint')
Error in RSeval(rcon, command) : remote evaluation failed
In addition: Warning messages:
1: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
2: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
I want to know what the problem is. I'm looking forward to your reply.
I am trying to use RHive with CDH4, and at rhive.connect() it gives me the following error:
WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Error in .jfindClass(as.character(class)) : class not found
Any ideas on this?
Implement an RHive API to connect to HDFS and to read/write data on HDFS.
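A sketch of what such an API could look like (rhive.hdfs.connect is mentioned elsewhere in these reports; the other names and signatures are assumptions for illustration):

```r
rhive.hdfs.connect("hdfs://namenode:9000")            # attach to a namenode
rhive.hdfs.put("local.csv", "/rhive/data/local.csv")  # upload a local file
rhive.hdfs.get("/rhive/data/local.csv", "copy.csv")   # download to local disk
rhive.hdfs.ls("/rhive/data")                          # list a directory
```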
Design a distributed apply function similar to R's apply function.