cascading.hive's People

Contributors

4ndypanda, hellertime, zwang-duke

cascading.hive's Issues

cascading-hive does not write to HCatalog in Parquet format

While trying to write data through HCatalog in Parquet format, I see the exception below in the Hadoop logs. Could you please point out what is going wrong? I am using the latest master of cascading-hive (built right after you mentioned Parquet support). The code I use is extremely simple: it just reads data from one table and writes it to another in Parquet format.

Thanks in advance and best regards,
Boris

Error: java.lang.RuntimeException: Should never be used
    at parquet.hive.DeprecatedParquetOutputFormat.getRecordWriter(DeprecatedParquetOutputFormat.java:74)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.init(MapTask.java:799)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:160)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:155)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143

Cannot use byte[] in RCFile scheme for Hive binary type

According to the object inspector, the Java type for Hive's binary type is supposed to be byte[]. However, when a byte[] object is provided in the tuple, the RCFile scheme first converts the object to a string and then to an array of bytes when sinking the field. This is a problem because calling toString() on a byte[] returns the array's type name and identity hash, not its actual contents.

Is there another way around this issue or do we have to make an exception for a byte[] object in the RCFile code?
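To make the problem concrete, here is a minimal, self-contained sketch in plain Java. It does not use the actual RCFile scheme code: the class name BinaryFieldDemo and both helper methods are hypothetical stand-ins that only mimic the conversion described above, plus one possible special case for byte[].

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryFieldDemo {
    // Hypothetical stand-in for the lossy path described in this issue:
    // Object -> String -> bytes.
    public static byte[] viaToString(Object field) {
        return field.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical special case: pass a byte[] through untouched and
    // fall back to the string conversion for everything else.
    public static byte[] viaTypeCheck(Object field) {
        if (field instanceof byte[]) {
            return (byte[]) field;
        }
        return field.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = {0x01, 0x02, (byte) 0xFF};

        // toString() on an array yields something like "[B@1b6d3586",
        // so the original three bytes are lost.
        System.out.println(Arrays.equals(payload, viaToString(payload)));  // prints "false"

        // With the type check the bytes survive unchanged.
        System.out.println(Arrays.equals(payload, viaTypeCheck(payload))); // prints "true"
    }
}
```

Whether a special case like this belongs in the RCFile scheme itself, or should be handled before the tuple reaches the sink, is still the open question.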

unclear where jars are deployed

I was wondering if you have already deployed jars of this project and if so, in which repo they are. I searched conjars, but I did not find it. Thx!

Problems reading from a partitioned Hive table

I am using the HCatTap to read from a partitioned Hive table. The table is partitioned into this pattern of paths:

hdfs://nameservice1/datasets/nowtv/mpp/mpp_order_report/p=<partition>/

where every directory contains a file named MPP-CONSOLIDATED-OrderReport-.osv, giving the following example path:

hdfs://nameservice1/datasets/nowtv/mpp/mpp_order_report/p=20100501/MPP-CONSOLIDATED-OrderReport-.osv

I am getting this error:

Caused by: java.io.IOException: Not a file: hdfs://nameservice1/datasets/nowtv/mpp/mpp_order_report/p=20100501
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:212)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:200)
    at cascading.tap.hadoop.io.MultiInputFormat.getSplits(MultiInputFormat.java:134)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1106)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1098)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:922)
    at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:105)
    at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:196)
    at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
    at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

Looking at the code, I see that the HCatTap gets the location of every partition and passes it to the MultiSourceTap, but the actual source file is located under the partition directory.

Create ORC file with Partition

Hi Team,
We would like to create an ORC file with partitions, but it looks like partitioning is not supported when creating an ORC file. Please advise.

Best Regards
Manoj K Nair

RCFileLocalScheme to run on local

Hi there,

First, thanks for the good work.

I wanted to add support for running RCFile locally with Scalding.
Basically, I would create a class like RCFileLocalScheme extends Scheme <...>
so that we can override the localScheme of the RCFile class in the ColumnarSerDeSource.

Would that interest you?

MAP datatype support in ORCFile

Hi Team,
We would like to read and write ORC files that have a MAP field (MAP). We found that the current version does not support MAP. Is there any way to support it? If not, is there any plan to support it in the near future? Please advise.

Thanks in advance.
Best Regards
Manoj K Nair

Support to accept TezConfiguration in ORCFile

Hi

We were testing PartitionTap on Tez (our inputs and outputs are ORC files) using the Cascading 3.0.0-wip-63 libs, Tez 0.5.3, and the cascading.hive 0.0.4 snapshot jar, and encountered the following ClassCastException:

Caused by: java.lang.ClassCastException: org.apache.tez.dag.api.TezConfiguration cannot be cast to org.apache.hadoop.mapred.JobConf
at cascading.hive.ORCFile.sinkConfInit(ORCFile.java:72)
at cascading.tap.Tap.sinkConfInit(Tap.java:206)
at cascading.tap.hadoop.Hfs.sinkConfInit(Hfs.java:399)
at cascading.tap.hadoop.Hfs.sinkConfInit(Hfs.java:106)
at cascading.tap.hadoop.io.TapOutputCollector.initialize(TapOutputCollector.java:96)
at cascading.tap.hadoop.io.TapOutputCollector.<init>(TapOutputCollector.java:91)
at cascading.tap.hadoop.PartitionTap.createTupleEntrySchemeCollector(PartitionTap.java:159)
at cascading.tap.partition.BasePartitionTap$PartitionCollector.getCollector(BasePartitionTap.java:130)
at cascading.tap.partition.BasePartitionTap$PartitionCollector.collect(BasePartitionTap.java:228)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:95)
at cascading.flow.stream.element.SinkStage.receive(SinkStage.java:98)

The exception is thrown in the following method of ORCFile in cascading.hive:

public void sinkConfInit(FlowProcess flowProcess, Tap<JobConf, RecordReader, OutputCollector> tap, JobConf conf)

It seems that ORCFile cannot accept a TezConfiguration. Can you please check this?

Thanks.
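The cast fails because JobConf and TezConfiguration are sibling subclasses of Hadoop's Configuration, so neither can be cast to the other. The sketch below reproduces that shape in plain Java. The nested classes are simplified stand-ins, not the real Hadoop or Tez types, and asJobConf is a hypothetical helper; it relies only on the fact that the real JobConf offers a JobConf(Configuration) copy constructor.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCastDemo {
    // Simplified stand-in for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        final Map<String, String> props = new HashMap<>();
        Configuration() {}
        Configuration(Configuration other) { props.putAll(other.props); }
    }

    // Stand-in for org.apache.hadoop.mapred.JobConf.
    static class JobConf extends Configuration {
        JobConf() {}
        JobConf(Configuration other) { super(other); }
    }

    // Stand-in for org.apache.tez.dag.api.TezConfiguration.
    static class TezConfiguration extends Configuration {}

    // Mirrors the failing pattern: an unchecked downcast to JobConf.
    static JobConf unsafeCast(Configuration conf) {
        return (JobConf) conf;
    }

    // Hypothetical helper: reuse a JobConf as-is, otherwise copy the
    // settings into a new JobConf.
    static JobConf asJobConf(Configuration conf) {
        return conf instanceof JobConf ? (JobConf) conf : new JobConf(conf);
    }

    public static void main(String[] args) {
        Configuration tez = new TezConfiguration();
        tez.props.put("example.key", "example.value");

        try {
            unsafeCast(tez);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed with ClassCastException");
        }

        JobConf copy = asJobConf(tez);
        System.out.println(copy.props.get("example.key")); // prints "example.value"
    }
}
```

Note that copying is only a partial answer: sinkConfInit is meant to modify the configuration it receives, and changes made to a copy do not flow back to the original TezConfiguration. A real fix would probably need to write those settings to the original object as well, or avoid assuming JobConf in the scheme.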

Using cascading.hive with Scalding

I am using cascading.hive with Scalding. To do so, I created some new classes derived from the SchemedSource object in Scalding.

They may be of use to others, so I wanted to share them.

https://gist.github.com/hellertime/10020639

I'm happy to make a pull request with the changes, but wasn't sure how best to integrate the code, so I would appreciate some feedback in that regard before I begin.

-Chris

Code License

Hi,

The work here looks very interesting; however, I'm a little unsure about the license under which it is released. Could you please clarify?
