hive-json-serde
Automatically exported from code.google.com/p/hive-json-serde
What steps will reproduce the problem?
Put a null value in the JSON input data.
What is the expected output? What do you see instead?
Expect to see Hive NULL in output.
Instead you get:
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.json.JSONObject$Null cannot be cast to java.lang.String
Please provide any additional information below.
Patch attached.
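The attached patch is the authoritative fix; the following is only a minimal sketch of the idea. Class and method names here are illustrative, and a stand-in sentinel replaces org.json's JSONObject.NULL so the example is self-contained: map the null sentinel to a Java null before casting, so Hive renders NULL instead of throwing.

```java
// Sketch of the null-handling idea (illustrative, not the attached patch).
// org.json represents JSON null as the sentinel JSONObject.NULL, which is not
// a String; casting it blindly throws ClassCastException. A stand-in sentinel
// is used here so the example compiles on its own.
public class NullHandlingSketch {
    // Stand-in for org.json's JSONObject.NULL sentinel.
    public static final Object NULL = new Object();

    // Convert a raw JSON value to the String Hive expects, mapping the
    // null sentinel to Java null so Hive displays it as NULL.
    public static String toHiveString(Object jsonValue) {
        if (jsonValue == null || jsonValue == NULL) {
            return null;
        }
        return jsonValue.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHiveString(NULL));  // prints "null"
        System.out.println(toHiveString("abc")); // prints "abc"
    }
}
```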
Original issue reported on code.google.com by [email protected]
on 14 Jan 2011 at 11:16
Attachments:
Steps
1. Created an external table that points to the gzip log files
2. Select query with limit 10 or 100 returned results.
3. Created a secondary external table that points to a different location.
4. Used an Insert Overwrite clause to pull records from a certain day/month
into a partition.
5. The select statement succeeds, but file creation fails.
What is the expected output? What do you see instead?
A flat table in text file format. Instead, the job fails with the following error:
=====================================
ERROR="java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"atype":"type1","operation":"orize","status":"Allow","tme":156.25900000000001,"starttime":"/Date(1314981024895)/","remoteip":"x.y.z.t, x1.y1.z1.t1","requesturi":"uri","userid":"x","eidmid":"y","userlanguage":"en","usercountry":"US","mode":"normal_mode","servicekey":"key1","consumerkey":"ke2","line":null,"number":null,"dt":"2011.09.02"}
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:312)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"atype":"type1","operation":"orize","status":"Allow","tme":156.25900000000001,"starttime":"/Date(1314981024895)/","remoteip":"x.y.z.t, x1.y1.z1.t1","requesturi":"uri","userid":"x","eidmid":"y","userlanguage":"en","usercountry":"US","mode":"normal_mode","servicekey":"key1","consumerkey":"ke2","line":null,"number":null,"dt":"2011.09.02"}
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:483)
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
  ... 4 more
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:97)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:606)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
  at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
  at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:87)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:77)
  at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:466)
  ... 5 more"
=====================================
What version of the product are you using? On what operating system?
Mac OSX, Amazon EMR/Hive, --hadoop-version 0.20 --hive-interactive
--hive-versions 0.7, hive-json-serde-0.2.jar
Please provide any additional information below.
I am also having additional issues with null values in columns, but I will
probably open a new issue for that.
Original issue reported on code.google.com by [email protected]
on 4 Oct 2011 at 1:23
We have a common case where many JSON entries do not contain all of the
possible "columns". This triggers the LOG.warn and spams the logs, causing a
noticeable slowdown of the job. Since the missing columns are expected, could
the log level be reduced to debug?
Original issue reported on code.google.com by johan%[email protected]
on 10 Jan 2011 at 11:43
Attachments:
We have a data stream with columns which are usually integers, but sometimes
strings. To handle this, we use a string column in Hive and convert integers to
their string representation.
Right now the SerDe requires all objects in a column to be the same type. If
you feed an integer into a column that is expecting a string you get:
Failed with exception java.io.IOException:java.lang.ClassCastException:
java.lang.Integer cannot be cast to java.lang.String
Patch attached. If the column is a string, and the JSON data is a Number,
convert it automatically instead of failing.
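The attached patch is the actual change; as a self-contained sketch of the conversion (names here are illustrative), the idea is to check for a Number before casting:

```java
// Sketch of the coercion described above (illustrative, not the attached
// patch): if the Hive column is a string but the JSON value is a Number,
// convert it instead of casting and failing with ClassCastException.
public class NumberToStringSketch {
    public static String asHiveString(Object jsonValue) {
        if (jsonValue instanceof Number) {
            return jsonValue.toString(); // e.g. Integer 42 -> "42"
        }
        return (String) jsonValue; // already a string (or null)
    }

    public static void main(String[] args) {
        System.out.println(asHiveString(42));  // prints "42" instead of failing
        System.out.println(asHiveString("x")); // prints "x"
    }
}
```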
Original issue reported on code.google.com by [email protected]
on 14 Jan 2011 at 11:31
Attachments:
Add a SerDe property to use a different name for the Hive column name and the
JSON key name.
This helps in case you have a data stream with columns named the same as Hive
reserved words (eg 'timestamp' and 'bucket').
Patch attached. Use like so:
CREATE TABLE foo (
ts double,
bckt string,
event string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES ('rename_columns'='timestamp>ts,bucket>bckt');
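For illustration only (the attached patch is authoritative), a property value like 'timestamp>ts,bucket>bckt' could be parsed into a JSON-key-to-Hive-column map along these lines; the class name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing a 'rename_columns' SerDe property of the form
// 'jsonKey>hiveColumn,jsonKey2>hiveColumn2' (illustrative, not project code).
public class RenameColumnsSketch {
    public static Map<String, String> parse(String property) {
        Map<String, String> jsonToHive = new HashMap<>();
        for (String pair : property.split(",")) {
            String[] parts = pair.split(">");
            // JSON key -> Hive column name
            jsonToHive.put(parts[0].trim(), parts[1].trim());
        }
        return jsonToHive;
    }

    public static void main(String[] args) {
        Map<String, String> m = parse("timestamp>ts,bucket>bckt");
        System.out.println(m.get("timestamp")); // prints "ts"
    }
}
```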
Original issue reported on code.google.com by [email protected]
on 14 Jan 2011 at 11:48
Attachments:
The SerDe deserializes basic JSON objects with simple key/value pairs without
problems, but it cannot handle nested objects or arrays. Hive supports complex
types, so this SerDe should too.
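A sketch of what nested support could look like (illustrative, not project code, and assuming the parser yields Map for objects and List for arrays): the converter must recurse instead of assuming primitives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of recursive deserialization: nested JSON objects map to Hive
// structs/maps and JSON arrays map to Hive lists (illustrative only).
public class NestedJsonSketch {
    @SuppressWarnings("unchecked")
    public static Object convert(Object jsonValue) {
        if (jsonValue instanceof Map) { // nested object -> map/struct
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e
                    : ((Map<String, Object>) jsonValue).entrySet()) {
                out.put(e.getKey(), convert(e.getValue()));
            }
            return out;
        }
        if (jsonValue instanceof List) { // array -> Hive list
            List<Object> out = new ArrayList<>();
            for (Object item : (List<Object>) jsonValue) {
                out.add(convert(item));
            }
            return out;
        }
        return jsonValue; // primitive: pass through unchanged
    }
}
```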
Original issue reported on code.google.com by [email protected]
on 16 Feb 2010 at 9:58
I'm experiencing some incompatibility with Json SerDe and Partitioning, here's
an example query :
CREATE TABLE clicks (
condition_set STRING,
creative STRING,
date_created STRING,
from_app STRING,
from_campaign STRING,
meta_country STRING,
meta_model STRING,
meta_os STRING,
to_app STRING,
to_campaign STRING,
uuid STRING,
`time` STRING,
`hour` STRING
)
PARTITIONED BY (`date` STRING)
ROW FORMAT
SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='
condition_set,
creative,
date_created,
from_app,
from_campaign,
meta_country,
meta_model,
meta_os,
to_app,
to_campaign,
uuid,
time,
date,
hour')
LOCATION '/mnt/hdfsmall/'
;
Error is : Error in metadata: org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Expected a
one-one correspondance between paths 14 and columns 13)
If I add the `date` column to the table's column definition, partitioning
gives an error too. (The mismatch arises because `date` is listed in 'paths'
but, being a partition column, is not part of the table's 13-column
definition.) I think I have tried every possibility in Hive to work around
the problem and have no idea left on how to solve it, so I suspect it is
simply not possible.
Original issue reported on code.google.com by [email protected]
on 1 Dec 2011 at 7:01
One of my fields is of type bigint in Hive. When running a query over that
table I get the following exception.
I assumed that the JSON library reads the field as an Integer and when Hive
expects a Long things blow up. Tried changing the field to int in Hive, but
then I get the reversed class cast exception. Also tried making it a string in
Hive, but then it also fails with a class cast exception.
java.io.IOException: java.lang.ClassCastException: java.lang.Integer cannot be
cast to java.lang.Long
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:39)
at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:190)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:480)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:426)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:129)
... 9 more
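A possible fix (a sketch, not the project's actual code) is to treat every JSON number as a java.lang.Number and widen it explicitly to the type the Hive column expects, instead of casting:

```java
// Sketch of a bigint coercion (illustrative, not project code): the JSON
// parser hands back Integer for small numbers, but a Hive bigint column
// expects Long. Going through Number and widening explicitly avoids the
// ClassCastException in both directions.
public class BigintCoercionSketch {
    public static Long asHiveLong(Object jsonValue) {
        if (jsonValue == null) {
            return null;
        }
        // Works whether the parser produced Integer, Long, or Double.
        return ((Number) jsonValue).longValue();
    }

    public static void main(String[] args) {
        Object fromJson = Integer.valueOf(12345); // what the parser produced
        System.out.println(asHiveLong(fromJson)); // prints "12345", as a Long
    }
}
```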
Original issue reported on code.google.com by johan%[email protected]
on 10 Sep 2010 at 1:09
For now, this SerDe only supports reading data (deserialization). In order
to write data in JSON format, the serialization process needs to be
implemented.
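To illustrate the shape of the missing piece (a sketch under the assumption of a simple row model; real code would go through Hive's ObjectInspector API and a proper JSON writer with string escaping):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of what a serialize() path would need to do: turn a Hive row into a
// JSON object string (illustrative, not project code; no string escaping).
public class SerializeSketch {
    public static String toJson(List<String> columns, List<Object> values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append('"').append(columns.get(i)).append("\":");
            Object v = values.get(i);
            if (v == null) sb.append("null");             // JSON null
            else if (v instanceof Number) sb.append(v);   // unquoted number
            else sb.append('"').append(v).append('"');    // quoted string
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(Arrays.asList("a", "b"),
                                  Arrays.asList(1, "x")));
        // prints {"a":1,"b":"x"}
    }
}
```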
Original issue reported on code.google.com by [email protected]
on 16 Feb 2010 at 9:57
What steps will reproduce the problem?
1. Checkout the project
2. Build with "ant build"
3. Observe the created artifact is build/hive-json-serde.jar
What is the expected output? What do you see instead?
The expected output would have a version number, so it can be referenced with
ivy.
What version of the product are you using? On what operating system?
Unknown version :)
Please provide any additional information below.
One of my coworkers made a few changes to this project at:
https://github.com/johanoskarsson/hive-json-serde
I pinged him about adding versioning, and got pointed at this project since
we'd like to get all changes merged back here. We have this working
experimentally but would like to use in production, and need to push the jar
into ivy so we can build, which is when versioning came up. I see there are
some versioned jars published in the downloads section here, but we'd like to
build from source. Any objection to building versioned jars?
Original issue reported on code.google.com by [email protected]
on 6 Jan 2011 at 7:05
What steps will reproduce the problem?
1. Try to create a table over a JSON file that has an upper-case field. For
example, with data {"TEST":1,"case":2}, create external table test1 (TEST int, case int).
2. Running select * from test1 gives NULL for TEST and 2 for case.
What is the expected output? What do you see instead?
I would expect it to return 1 for TEST rather than NULL
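The likely cause is that Hive lowercases column names while JSON keys are case-sensitive, so the column "test" never matches the key "TEST". A sketch of a case-insensitive fallback lookup (illustrative, not project code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a case-insensitive key lookup: try the exact (lowercased) Hive
// column name first, then fall back to scanning keys ignoring case.
public class CaseInsensitiveLookupSketch {
    public static Object get(Map<String, Object> json, String hiveColumn) {
        if (json.containsKey(hiveColumn)) {
            return json.get(hiveColumn); // exact match first
        }
        for (Map.Entry<String, Object> e : json.entrySet()) {
            if (e.getKey().equalsIgnoreCase(hiveColumn)) {
                return e.getValue(); // recovered despite case mismatch
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("TEST", 1);
        row.put("case", 2);
        System.out.println(get(row, "test")); // prints "1"
    }
}
```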
Original issue reported on code.google.com by [email protected]
on 8 Oct 2010 at 10:39
There is placeholder code for JUnit tests, but none of the tests are
implemented.
Get this thing tested!
Original issue reported on code.google.com by [email protected]
on 17 Feb 2010 at 12:19
What steps will reproduce the problem?
1. Create any table using the SerDe (for me it was the one from the "Getting
Started" section)
2. Execute in Hive: select count(1) from table;
What is the expected output? What do you see instead?
I expect to see number of records. Instead I got error:
hive> select count(1) from jsontest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201205071112_0089, Tracking URL = [...]
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=[...] -kill
job_201205071112_0089
2012-06-06 09:28:49,506 Stage-1 map = 0%, reduce = 0%
2012-06-06 09:29:26,655 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201205071112_0089 with errors
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
What version of the product are you using? On what operating system?
Hadoop: Hadoop 0.20.2-cdh3u3
Hive: 0.7.0-cdh3u0
OS: Ubuntu Server
Please provide any additional information below.
Full stack trace from jobtracker can be found in attached file.
"select * from jsontest" works fine.
Original issue reported on code.google.com by [email protected]
on 6 Jun 2012 at 7:39
Attachments:
What steps will reproduce the problem?
1. Use this serde with Hive 0.10.0
What is the expected output? What do you see instead?
I'd like it to work, but naturally it doesn't since Hive APIs have changed
slightly for 0.10.0.
What version of the product are you using? On what operating system?
Hive 0.10.0
I've attached a simple patch that allows me to use this serde in 0.10.0.
Sorry, I'm a git person so the patch came from git. But the diff is simple
enough to be applied manually if necessary. The API differences are largely
cosmetic and the new method added is a no-op since implementation of that
method is optional (and ignored even for some of the native Hive serdes
themselves).
Note, this patch omits the upgrading of the lib/*.jar files. I upgraded the
following jar files (note I'm a Cloudera user, so I'm using CDH 4.2.0 jars):
* lib/hadoop-0.20.1-core.jar -> lib/hadoop-common-2.0.0-cdh4.2.0.jar
* lib/hive_serde.jar -> lib/hive-serde-0.10.0-cdh4.2.0.jar
Original issue reported on code.google.com by [email protected]
on 3 May 2013 at 9:05
Attachments:
This is mostly for my convenience. Would it be possible to add the directory
"build" to svn:ignore? It would make it easier to work with the project and svn.
Original issue reported on code.google.com by johan%[email protected]
on 10 Jan 2011 at 11:45