nexr / rhive Goto Github PK

View Code? Open in Web Editor NEW

122.0 122.0 63.0 3.72 MB

RHive is an R extension facilitating distributed computing via Apache Hive.

Home Page: http://nexr.github.io/RHive

R 50.49% Java 49.51%

rhive's People

Contributors

Stargazers

Watchers

Forkers

echiu64 euriion adityashanbhag qiubo zwqjsj0404 bradleyjs miere03 terrymeng hengqujushi prakratiagrawal ashwanthkumar patrickliu95 bluemir wewang y-lan strategist922 bsspirit verdit apatil chromeu askyer karamcse maple-wang kwangnam prabhunkl andrewzhang1 smartdw lisun7170 mykook kenahoo sjehan carolssnz alexvorobiev rainstar82 qlikviewmlr kimmyoungsoo2 wangbin83-gmail-com datasciencemom tinosm phoenixhadoop sch8312 luojiahuli feilx ashishchandan tahirh rickatozline diyaaang minyk sugia0805 calculus-ask wypb coreysk8wow yunxingwang helianthuslulu ryu3065 bertcarnell jiyulongxu antoniosecchi leeseongsub cristianpachacama amulmgr henrynkoh

rhive's Issues

rhive.connect Error but it still works?

I'm trying to work with RHive on Amazon EMR and I'm getting an error with rhive.connect, but the connection seems to be working:

{code}

library(RHive)
Loading required package: rJava
Loading required package: Rserve
This is RHive 0.0-6. For overview type ‘?RHive’.
HIVE_HOME=/home/hadoop/hive
call rhive.init() because HIVE_HOME is set.
rhive.init()
rhive.connect(port=10003)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2012-10-16 21:15:43,446 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:121)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:225)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:190)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1330)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1348)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:246)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,497 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,517 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:702)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:242)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,546 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:760)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:712)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:355)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:211)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,563 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.(DFSClient.java:3079)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:598)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:548)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:529)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:229)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
2012-10-16 21:15:43,593 INFO [LeaseChecker] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.renewLease(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:1235)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1247)
at java.lang.Thread.run(Thread.java:662)
2012-10-16 21:15:43,609 INFO [Thread-7] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3338)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3202)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2415)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2656)
2012-10-16 21:15:43,656 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,659 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,664 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2692)
2012-10-16 21:15:43,668 INFO [DataStreamer for file /rhive/lib/rhive_udf.jar block blk_-6216429567430606302_1405] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2693)
2012-10-16 21:15:43,681 INFO [main] metrics.MetricsSaver(?): add metric {}
java.lang.ArrayIndexOutOfBoundsException: -1
at amazon.emr.metrics.MetricsUtil.getProcessMainClassName(Unknown Source)
at amazon.emr.metrics.MetricsSaver.(Unknown Source)
at amazon.emr.metrics.MetricsSaver.ensureSingleton(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addInternal(Unknown Source)
at amazon.emr.metrics.MetricsSaver.addValue(Unknown Source)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:230)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3725)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3640)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:96)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:50)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at RJavaTools.invokeMethod(RJavaTools.java:386)
{code}

rhive.load.table('table') command throwing following error:

Error in 1:listStatus$length : argument of length 0

though i can run select query and rhive.desc.table('table') commands

Connection error with 2.0.0

I recently install Hive(0.11.0) and Hadoop(1.2.1) on ubuntu(13.10 x64). R(3.0.1 "Good Sport") is installed correctly and working well with other packages. I also installed Rhive(2.0.0) package, when I tried connecting to hive from R it is showing error message as shown below. please help me on the same.

rhive.query function dose not handle hive's "CASE ~ WHEN ~ END" syntax properly

issue case:
aaa = rhive.big.query("select *,
CASE
WHEN petallength < 2.45 THEN 'first'
WHEN petallength >= 2.45 THEN 'second'
END as separation
from iris_3")

expected output:
1 5.1 3.5 1.4 0.2 setosa first
2 4.9 3.0 1.4 0.2 setosa first
.
.
50 5.0 3.3 1.4 0.2 setosa first
51 7.0 3.2 4.7 1.4 versicolor second
.
.

Need verbose mode when fetching big result

Need progress bar or verbose message, in order to know ETA when we fetch big data from Hive query result using "rhive.query" function. Sometimes, the "rhive.query" takes too long time. need a sort of indicator.

RUDF rhive.query failed due to serialisation exception

Hi,

We are trying to execute the exemples (https://github.com/nexr/RHive/wiki/RHive-example-code).
When trying to execute the query, our jobs failed with a KryoException.
It seems that an UDF instance is serialized even it contained converters that are not designed for serialization (no default constructor).

We are using Hadoop 2.2 and hive 0.12 (Horton distribution).

Are those examples still correct ?
Do you have an idea of the cause of our error ?

Regards,
Philippe

The exemple :
coefficient <- 1.1 scoring <- function(sal) { coefficient * sal } rhive.assign('coefficient',coefficient) rhive.assign('scoring',scoring) rhive.exportAll(‘scoring’) rhive.query("select R('scoring',col_sal,0.0) from emp")

Exception :
Error: java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter Serialization trace: converters (com.nexr.rhive.hive.udf.RUDF) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:314) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:263) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:376) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:552) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter Serialization trace: converters (com.nexr.rhive.hive.udf.RUDF) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097) at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109) at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672) at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:810) at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:720) at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:733) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:287) ... 13 more

support custom environment using mrapply

In mrapply, mapapply and reduceapply, user can use custom environment.
user can use this custom environment in mapper and reduce function.

Checking Hadoop slaves shoud be added

Hi,
I met an error when I build a package which have dependency on RHive 0.0-3 as like below

installing source package ‘clog’ ...
** R
** data
** inst
** preparing package for lazy loading
Warning in file(file, "rt") :
cannot open file '/srv/clog/hadoop-0.20.203.0/conf/slaves': No such file or directory
Error : .onLoad failed in loadNamespace() for 'RHive', details:
call: .jnew("org/apache/hadoop/conf/Configuration")
error: java.lang.ClassNotFoundException
Error : package ‘RHive’ could not be loaded
ERROR: lazy loading failed for package ‘clog’

removing ‘/Users/aidenhong/Documents/workspace/cog_git/src/main/R/cog.Rcheck/clog’

Above error is occurred when I run 'R CMD check packagefile' and I can't finish building package.
I thinks it's cause by returning error code when RHive is loaded by library.

version 2.0.0 Error at init

Hi, this is what I get after installing version 2.0.0

rhive.init()
rhive.env()
hadoop home: /usr/local/hadoop
hive home: /usr/local/hive

rhive.connect(host='master',port='10000')
Error: class not found

Please consider that version 0.0.7 (the previous version I used) worked just fine. The environment I got from 0.0.7 was

Hive Home Directory : /usr/local/hive
Hadoop Home Directory : /usr/local/hadoop
Hadoop Conf Directory :
Default RServe List
master slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 slave9 slave10master : RHIVE_DATA = /home/hduser/RData/
slave1 : RHIVE_DATA = /home/hduser/RData
slave2 : RHIVE_DATA = /home/hduser/RData
slave3 : RHIVE_DATA = /home/hduser/RData
slave4 : RHIVE_DATA = /home/hduser/RData
slave5 : RHIVE_DATA = /home/hduser/RData
slave6 : RHIVE_DATA = /home/hduser/RData
slave7 : RHIVE_DATA = /home/hduser/RData
slave8 : RHIVE_DATA = /home/hduser/RData
slave9 : RHIVE_DATA = /home/hduser/RData
slave10 : RHIVE_DATA = /home/hduser/RData

Connected HiveServer : master:10000

Remove Rserver

rhive.query function show warning message when query result is too big

recommanding use rhive.big.query function

RHive query results transfer extremely slow

Hello,

When I do a simple
rhive.query("select * from X limit 10000")

it takes 90s to answer once the query is completed on the hiveserver (OK displayed on the console).

It increases linearily with data size, always exactly 9 ms per line, it does not depend on the line length.

It is several order of magnitude slower than any other kind of data transfer between R and whatever. My guess is that there is some kind of timeout somewhere.

rhive.query

rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")
Error in rdata[[i]] : subscript out of bounds
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

traceback()
3: FUN(X[[1L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})
1: rhive.query("SELECT * FROM listvirtualmachinesresponse_virtualmachine limit 6")

please check.

rhive.write.table

rhive.write.table(editweights)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:575 mismatched input '0' expecting Identifier near ',' in column specification
, errorCode:11, SQLState:42000)

head(editweights)
V1 a A b B c C d D e E f F g G h H i I j J
1 0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1
2 -1 0.0 -0.3 -2.0 -3.0 -1.5 -3.0 -1.0 -3.0 -1.0 -1 -1.5 -3.0 -2.0 -3.0 -2.0 -3.0 -2.0 -3 -2 -3
3 -1 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -3
4 -1 -2.0 -3.0 0.0 -0.3 -1.0 -3.0 -1.5 -3.0 -2.0 -2 -1.0 -3.0 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1 -3
5 -1 -3.0 -3.0 -0.3 0.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -0.5 -0.5 -0.5 -0.5 -3.0 -3 -3 -3
6 -1 -1.5 -3.0 -1.0 -3.0 0.0 -0.3 -0.5 -0.5 -0.5 -1 -0.5 -0.5 -1.0 -3.0 -1.5 -3.0 -2.0 -3 -2 -3
k K l L m M n N o O p P q Q r R s S t T u U v
1 -1.0 -1 -1 -1 -1 -1 -1.0 -1.0 -1 -1 -1 -1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1 -1.0
2 -2.0 -3 -2 -3 -2 -3 -2.0 -3.0 -2 -2 -2 -2 -0.5 -0.5 -1.5 -1.5 -0.5 -0.5 -2.0 -2.0 -2 -3 -2.0
3 -3.0 -3 -3 -3 -3 -3 -3.0 -3.0 -3 -3 -3 -3 -0.5 -0.5 -3.0 -3.0 -0.5 -0.5 -3.0 -3.0 -3 -3 -3.0
4 -1.5 -3 -2 -3 -1 -3 -0.5 -0.5 -2 -2 -2 -2 -2.0 -2.0 -1.5 -1.5 -2.0 -3.0 -1.0 -1.0 -1 -3 -0.5
5 -3.0 -3 -3 -3 -3 -3 -0.5 -0.5 -3 -3 -3 -3 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3 -0.5
6 -2.0 -3 -2 -3 -2 -3 -1.5 -3.0 -2 -2 -2 -2 -2.0 -2.0 -1.0 -1.0 -1.0 -3.0 -0.5 -0.5 -2 -3 -0.5
V w W x X y Y z Z 0 1 2 3 4 5 6 7 8 9
1 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1 -1.0 -1.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0 -2.0
2 -3.0 -0.5 -0.5 -1.0 -3.0 -2.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
3 -3.0 -0.5 -0.5 -3.0 -3.0 -3.0 -3 -0.5 -0.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
4 -0.5 -2.0 -2.0 -1.5 -3.0 -1.0 -3 -2.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
5 -0.5 -3.0 -3.0 -3.0 -3.0 -3.0 -3 -3.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
6 -0.5 -0.5 -0.5 -0.5 -0.5 -1.5 -3 -1.0 -3.0 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5 -1.5
_ $ #
1 -0.5 -0.15 -0.2
2 -1.5 -100.00 -100.0
3 -1.5 -100.00 -100.0
4 -1.5 -100.00 -100.0
5 -1.5 -100.00 -100.0
6 -1.5 -100.00 -100.0

plz fix as soon as possible.

rhive.write.table error

I have used rhive.query with no problem. But when I use rhive.write.table(myTableName), the following error occurs:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.io.FileNotFoundException: File myTableName.rhive does not exist.
[1] "myTableName"

It created an empty table with correct column names. I double checked that the class of myTableName is a data frame.

Does anyone come across this problem? Thanks in advance.

rhive.query function show warning message when query result is too big

recommand use rhive.big.query function

support multi type arguments of R-UDF function

R-UDF function should take multi type variables as arguments.

Windows setup guide is necessary

Windows setup guide is necessary. because many people use windows as their operating system. we need test and make a guide for that people.

export function can be overwritten by other RHive users when they use same name

support R script with mapreduce script function of HIVE

Hive provides user to use any script in hive query. Hive uses hadoop streaming to provide it.
RHive can provide function to run R script in HQL with this Hive function.

Timeout checking in rhive.connect()

Hi,
I had a serious problem when I try to use RHive on just recovered custler.
The problem was rhive.connect was not finished never for long long time.
I figured it out, the cause of problem is that mysql server for Hive was down.
I am not sure this problem can be solved in RHive or not but anyway, I think timeout parameter may be necessary with default value in rhive.connect() function.

Thanks.

rhive.mrapply

rhive.mrapply
function (tablename, mapperFUN, reducerFUN, mapinput = NULL,
mapoutput = NULL, by = NULL, reduceinput = NULL, reduceoutput = NULL,
mapper_args = NULL, reducer_args = NULL, buffersize = -1L,
verbose = FALSE, hiveclient = rhive.defaults("hiveclient"))
.....
...
..

Plz modify function parameters especially all remains after reducerFUN to be neglectable parameters.

like "rhive.mrapply("weights", map, reduce)" to apply map for all columns.

rhive.connect makes long log messages

rhive.connect makes long message when I use.
I don't know what is the mean of the message.
can you guys make a way to save these to log file to hide the hard message and can you explain what is this?
rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

RHive doesn't seem to support HiveServer2

I tried to connect to a HiveServer2 instance, and it remained stuck at the rhive.connect() call.

I was able to connect to a HiveServer1 instance.

However, the latest Cloudera Manager (4.5) has support for managing only HiveServer2.

Are there any plans for HiveServer2 support?

Results not split correctly?

When I run any kind of query, the results returned by HiveClient are not tab-separated, but delimitted with the literal string "\001" resulting in improper results. Is this happening to anyone else?

rhive.big.query function cause a number overflow against what rhive.query do.

reported from Haven

implement aggregate function

develop RHive aggregate function similar to R aggregate function

this function syntax is

FUN(table-name, hiveFUN, col, ..., groups)

use RUDF and export R Object with rhive error!

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {

coefficient * sal

}
rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
error .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)

Anyone face the same problem? And who can give me some advise to solve it? Thanks very much!

use RUDF and export R Object with rhive error!

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hive/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
coefficient <- 1.1
scoring <- function(sal) {
coefficient * sal
}

rhive.assign('coefficient', coefficient)
[1] TRUE
rhive.assign('scoring', scoring)
[1] TRUE
rhive.exportAll('scoring')
[1] TRUE
rhive.query("select * from iris limit 10")
rhive_row sepallength sepalwidth petallength petalwidth species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
7 7 4.6 3.4 1.4 0.3 setosa
8 8 5.0 3.4 1.5 0.2 setosa
9 9 4.4 2.9 1.4 0.2 setosa
10 10 4.9 3.1 1.5 0.1 setosa
test2 <- rhive.query("select rhive_row, sepallength, sepalwidth from iris limit 10")
test2
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
test2 <- rhive.write.table(test2)
rhive.desc.table(test2)
col_name data_type comment
1 rhive_row string
2 sepallength double
3 sepalwidth double
rhive.query("select * from test2")
rhive_row sepallength sepalwidth
1 1 5.1 3.5
2 2 4.9 3.0
3 3 4.7 3.2
4 4 4.6 3.1
5 5 5.0 3.6
6 6 5.4 3.9
7 7 4.6 3.4
8 8 5.0 3.4
9 9 4.4 2.9
10 10 4.9 3.1
rhive.query("select R('scoring', sepallength, 0.0) from test2")
error .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:9, SQLState:08S01)

Anyone face the same problem? And who can give me some advise to solve it? Thanks very much!
My rhive package version is RHive_0.0-6.

Unnecessary attempt to modify HDFS root directory

Hello there. I recently started using your RHive package, and have been mostly happy so far. One concern I have is with code of "rhive.hdfs.connect". It tries to make changes in the root HDFS directory. In most real systems, including ours, it would be disallowed. Would it make sense to make changes in /tmp instead of /? I had to change your code to make it work with our system. Thanks!

= Yakov

ant build Faild

[root@hadoop RHive]# ant build
Buildfile: build.xml

compile:
[javac] Compiling 4 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Is there anyone who can help me??

rhive.hdfs.chmod

Need rhive.hdfs.chown, rhive.hdfs.chmod functions (not much strongly)

rhive.query function dose not handle "hive map table" properly

rhive.napply

rhive.write.table(iris)
[1] "iris"
rhive.desc.table("iris")
col_name data_type comment
1 rowname string
2 sepallength double
3 sepalwidth double
4 petallength double
5 petalwidth double
6 species string
rhive.napply('iris', function(column1) { column1 * 10}, 'sepallength')
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
HiveServerException(message:Query returned non-zero code: 11, cause: FAILED: Parse Error: line 1:40 cannot recognize input near 'CREATE' 'TABLE' 'iris_napply1328157031_table' in select clause
, errorCode:11, SQLState:42000)

plz check.

rhive.query function dose not handle "hive map table" properly

1.query : select logdata from ulog limit1

2.hive console result
{"{"body":"SEQ_ID":"20120709160001430307","HOST_NAME":"u2dlpweb01","LOG_TIME":"20120709160001","REQ_TIME":"20120709160001","LOG_KIND":"SVC","KT_USER_ID":"","KT_SVC_ID":"X","SESSION_KEY":"","FILE_ID":"X","RT_CODE":"1","DIVIDE1":"211.55.29.102","DIVIDE2":"http://gate2.ucloud.com/api/1/pcclient/pcauth","DIVIDE3":"POST","DIVIDE4":"200","DIVIDE5":"0000","DIVIDE6":"WIN","DIVIDE7":"7","DIVIDE8":"uCloud","DIVIDE9":"1.0.2","DIVIDE10":"personal","DIVIDE11":"GATEWAY","DIVIDE12":"4000","DIVIDE13":"X","DIVIDE14":"X","DIVIDE15":"X","DIVIDE16":"X","DIVIDE17":"X","DIVIDE18":"uCloud/1.0.2 WIN/7 PC personal","DIVIDE19":"X","DIVIDE20":"X","DIVIDE21":"X","DIVIDE22":"X","DIVIDE23":"X","DIVIDE24":"X","DIVIDE26":"X","DIVIDE26":"X","DIVIDE27":"X","DIVIDE28":"X","DIVIDE29":"X","DIVIDE30":"X","timestamp":1341820971766,"pri":"INFO","nanos":784667370878488,"host":"u2dlpweb01","fields":{"AckTag":"20120709-170250760+0900.784666364804488.00000018","AckType":"msg","AckChecksum":"\u0000\u0000\u0000\u0000:\u001F짰I","tailSrcFile":"ucloud-003.log","rolltag":"20120709-170444530+0900.524399892735665.00000020"}}"}
Time taken: 14.89 seconds

3.Rstudio rhive.query
rhive.query('select logdata from ulog limit 1')
logdata
1 NA
Warning message:
NAs introduced by coercion

4.ulog table script
CREATE EXTERNAL TABLE IF NOT EXISTS ulog (
logdata MAP<STRING,STRING>
)
PARTITIONED BY(logdt STRING)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '='
LOCATION '/ucloud/collected/ucloudpersonal'

handle big result from Hive

RHive should handle huge result from Hive.

rhive.query resulting in subscript out of bounds

I am getting the following error with Hive-0.9.0:11:
Error in rdata[[i]] : subscript out of bounds

traceback() reveals:

3: FUN(X[[25L]], ...)
2: lapply(list, function(item) {
item <- .jcast(item, new.class = "java/lang/String", check = FALSE,
convert.array = FALSE)
record <- strsplit(item$toString(), "\t")
for (i in seq.int(record[[1]])) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], as.numeric(record[[1]][i]))
}
else {
rdata[[i]] <<- c(rdata[[i]], record[[1]][i])
}
}
if (length(rdata) > length(record[[1]])) {
gap <- length(rdata) - length(record[[1]])
for (i in seq.int(length(record[[1]]) + 1, length(rdata))) {
if (is.numeric(rdata[[i]])) {
rdata[[i]] <<- c(rdata[[i]], NA)
}
else {
rdata[[i]] <<- c(rdata[[i]], "")
}
}
}
})

What does this error mean? Can this be fixed? Thanks in advance!

RHive with hiveserver2 + kerberos

Hi There,

Is the RHive will work with hiveserver2 which is enabled with kerberos security ?

When I try to connect to Hive I am getting following exception in my R Studio console

rhive.connect(host="hostname.domain.com/default;principal=hive/[email protected]",defaultFS="hdfs://namenode.domain.com:8020/user/me",hiveServer2=TRUE)
14/05/05 15:10:52 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "Thread-35" java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: hive/[email protected]:10000/default
        at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:64)
        at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:198)
        at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:138)
        at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:123)
        at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:185)
        at com.nexr.rhive.hive.DatabaseConnection.connect(DatabaseConnection.java:51)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.connect(HiveJdbcClient.java:330)
        at com.nexr.rhive.hive.HiveJdbcClient$HiveJdbcConnector.run(HiveJdbcClient.java:322)
Error: java.lang.IllegalStateException: Not connected to hiveserver

Thanks,
Prabhu.

convert java null to R NULL

In RUDF and RUDAF, null data is passed to R function. but R can not handle this data because null in R is not null but NULL.

rhive omit HIVE error log

rhive use hive-trift service so error log drop into hive-trift's stdout
this cause inconvenience to user

Need a way to get debug message from remote node

It's hard to debug and find error poins.
We need an appropriate way to catch the remote debug messages and It should works for dedicate account session.

design aggregate function

design distributed aggregate function similar to R aggregate function

implement apply function

develop RHive apply function using RUDF.
We designed two apply function depended on return type.

napply : numeric type
sapply : string type

these functions syntax is below :
[n|s]apply(hive-tablename, FUN, col1, ...)

quering SerDe tables

I have a table created using "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';".
While rhive.query("SELECT * FROM serdetable") works,
selecting special column rhive.query("SELECT col1 FROM serdetable")
returns
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
I tried to run the same query directly from hive shell and it works - which means that the jar that contains org.openx.data.jsonserde.JsonSerDe class was loaded by the hive.
I have to mention that trying on other table created with default Regex SerDe returns the same error.
Any help would be appriciated!

RHive 2.0 Compile Failed!

I download the RHive 2.0 package,cd into the work directory,then run 'ant build' ,it faild:
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

compile:
[mkdir] Created dir: /opt/RHive/build/classes
[javac] Compiling 21 source files to /opt/RHive/build/classes
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:44: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!stat.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:115: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] if (!src.isDir()) {
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/FSUtils.java:147: warning: [deprecation] isDir() in org.apache.hadoop.fs.FileStatus has been deprecated
[javac] length[i] = items[i].isDir() ? fs.getContentSummary(items[i].getPath()).getLength() : items[i].getLen();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/hadoop/JobManager.java:33: warning: [deprecation] getUsedMemory() in org.apache.hadoop.mapred.ClusterStatus has been deprecated
[javac] clusterStatus.getUsedMemory();
[javac] ^
[javac] /opt/RHive/RHive/inst/javasrc/src/com/nexr/rhive/util/DFUtils.java:50: cannot find symbol
[javac] symbol : method getServerAddress(org.apache.hadoop.conf.Configuration,java.lang.String,java.lang.String,java.lang.String)
[javac] location: class org.apache.hadoop.net.NetUtils
[javac] return NetUtils.getServerAddress(getConf(), "dfs.info.bindAddress", "dfs.info.port",
[javac] ^
[javac] 1 error
[javac] 4 warnings

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.

Total time: 3 seconds
[root@192-168-11-59 RHive]# ant build
Buildfile: build.xml

BUILD FAILED
/opt/RHive/build.xml:37: Compile failed; see the compiler error output for details.
Is there anyone can help me ?

rhive.exportAll('scoring') Error

Hello All,
I installed RHive-0.7 and CDH4.4,rhive.connect() is worked.
when I run 'rhive.exportAll('scoring')',its print the Error as follow:

rhive.exportAll('scorint')
Error in RSeval(rcon, command) : remote evaluation failed
In addition: Warning messages:
1: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2
2: In readBin(c, "int", 4, signed = FALSE, endian = "little") :
'signed = FALSE' is only valid for integers of sizes 1 and 2

I want to know what the problem ......I'm looking forward to your reply.

RHive not working with CDH4

I am trying to use RHive with CDH4. And at rhive.connect() it gives me the following error-

WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Error in .jfindClass(as.character(class)) : class not found

Any ideas on this?