evidens / json2csv Goto Github PK
View Code? Open in Web Editor NEWConverts JSON files to CSV (pulling data from nested structures). Useful for Mongo data
License: MIT License
Converts JSON files to CSV (pulling data from nested structures). Useful for Mongo data
License: MIT License
I am trying to convert JSON data into a CSV in Python3, but it no longer works with this script, giving me different errors. Anyone know how to fix for Python 3? Thanks.
Below is my JSON data:
{
"fruit": [
{
"name": "Apple",
"binomial name": "Malus domestica",
"major_producers": [
"China",
"United States",
"Turkey"
],
"nutrition": {
"carbohydrates": "13.81g",
"fat": "0.17g",
"protein": "0.26g"
}
},
{
"name": "Orange",
"binomial name": "Citrus x sinensis",
"major_producers": [
"Brazil",
"United States",
"India"
],
"nutrition": {
"carbohydrates": "11.75g",
"fat": "0.12g",
"protein": "0.94g"
}
},
{
"name": "Mango",
"binomial name": "Mangifera indica",
"major_producers": [
"India",
"China",
"Thailand"
],
"nutrition": {
"carbohydrates": "15g",
"fat": "0.38g",
"protein": "0.82g"
}
}
]
}
Although the outline is specified as an array, the row doesn't appear to preserve the intended order
Original message from @Manoj-Nagarajan:
Hello... i was able to convert json file to a csv file using the steps in the read me file. Thanks a lot for sharing the same. When i convert the json to csv, the output is stored in alphabetical order. Is there any way, which can convert to same order as the soure...
Example :
Source : id, name ,b1_test, url, a1_test . Result is stored in below format in csv file.
Result : a1_test, b1_test, id, name , urlBut i want the output csv in below format.
Result : id, name ,b1_test, url, a1_test.
Can you suggest if the same is possible . Thank you in advance...
Traceback (most recent call last):
File "json2csv.py", line 155, in
loader.write_csv(filename=outfile, make_strings=args.strings)
File "json2csv.py", line 105, in write_csv
writer.writerows(out)
File "/usr/lib/python2.7/csv.py", line 158, in writerows
return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
I am using python with the workfront api to return JSON objects (theoretically) like the sample below and am trying to create an outline from either the python results object of from a file that I wrote the results to but I haven't been able to sucessfully generate a results file.
Using this command 👍 python gen_outline.py --collection nodes C:\Python27\resultsj.txt -o C:\Python27\testfile_outlinej.json
produces this
Traceback (most recent call last):
File "gen_outline.py", line 85, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 26, in coll_iter
for obj in data[coll_key]:
KeyError: 'nodes'
Using this command produces that attached outline file, that when run with json2csv produces a file of headers and 'no data'.
python gen_outline.py --each-line C:\Python27\resultsj.txt -o C:\Python27\testfile_outline.json
Your suggestions would be appreciated. Thanks, Victor
sample json line:
[{u'status': u'CPL', u'name': u'H557 Nano Valve Operator', u'plannedStartDate': u'2014-04-01T08:00:00:000-0400', u'objCode': u'PROJ', u'priority': 0, u'percentComplete': 100.0, u'projectedCompletionDate': u'2015-06-01T07:42:38:615-0400', u'ID': u'52ceb3a1001da497df97a63c7e99d6c8', u'plannedCompletionDate': u'2015-06-01T17:00:00:000-0400'} ]
I am trying to use json2csv to convert output from mailchimp api, but I am getting an error
python gen_outline.py --collection nodes text.json -o templatejsonoutput.json
ValueError: Expecting property name: line 1 column 2 (char 1)
The original file text.json has a structure like that:
{'members': [{'id': 'xxxxx', 'email_address': '[email protected]', 'unique_email_id': 'xxx', 'email_type': 'html', 'status': 'unsubscribed', 'unsubscribe_reason': 'N/A (Unsubscribed by admin)', 'merge_fields': {'FNAME': '', 'LNAME': '', 'MMERGE3': ''}, 'stats':
Do you know why I am getting this error? Thank you very much for any help.
Hi - I am running some large json files (some as large as 2.5GB) and am getting a windows memory error popup like this:
"
Close programs to prevent information loss
Your computer is low on memory. Save your files and close these programs:
python.exe
Windows will only close enough programs to restore needed memory.
"
json2csv stops at that point and no CSV is created. Is there a way to have python continue parsing, even against memory issues?
Note that I keep all other programs closed while running the program.
Thanks,
SJB
I wonder if there is some way or intention to handle properties that are array.
To convert this:
[
{
"autor": "Some autor",
"data_protocolo": "26/02/1953",
"data_publicacao_dpc": "26/02/1953",
"descricao": "Description",
"proposicao": "PL 2/1953",
"fases": [
{
"acao": "Prosseguir",
"data_envio": "11/09/1953 00:00:00",
"data_recebimento": "11/09/1953 00:00:00",
"despacho": "Lei",
"nome": "Protocolo",
"setor": "Plenário"
},
{
"acao": "Prosseguir",
"data_envio": "",
"data_recebimento": "26/02/1953 00:00:00",
"despacho": "",
"nome": "Protocolo",
"setor": "Plenário"
}
]
}
]
to this
autor | data_protocolo | data_publicacao_dpc | descricao | proposicao | fases.acao | fases.data_envio | fases.data_recebimento | fases.despacho | fases.nome | fases.setor |
---|---|---|---|---|---|---|---|---|---|---|
Algum Autor | 26/02/1953 | 26/02/1953 | Description | PL 2/1953 | Prosseguir | 11/09/1953 00:00:00 | 11/09/1953 00:00:00 | Lei | publicada no D.O. de 11/09/1953. | Protocolo |
Algum Autor | 26/02/1953 | 26/02/1953 | Description | PL 2/1953 | Prosseguir | 26/02/1953 00:00:00 | Protocolo | Plenário |
Outline is creating all instances within a node as a single line.
From the mapping:
[
"PM_REPOSITORY_FOLDER_SOURCE_1_SOURCEFIELD_16_@BUSINESSNAME",
"PM.REPOSITORY.FOLDER.SOURCE.1.SOURCEFIELD.16.@BusinessName"
],
[
"PM_REPOSITORY_FOLDER_SOURCE_1_SOURCEFIELD_16_@BUSINESSDATE",
"PM.REPOSITORY.FOLDER.SOURCE.1.SOURCEFIELD.16.@BUSINESSDATE"
],
As a result, the CSV is a single line (exc header).
I'm sorry for asking, but is this intended? If not, what am I doing wrong? Any help you can give would be appreciated!
How should nested arrays be handled in the outline files?
I have the following dataset, containing nested array named "categories":
{ "results": [
{
"categories": [
"foo", "bar"
],
"city": "Vienna"
}
] }
Thanks, Maks
Looks useful! Are you going to make it available on PyPi?
output I get from installer
Collecting unicodecsv==0.9.0 (from -r .\requirements.txt (line 1))
Using cached unicodecsv-0.9.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Bryan\AppData\Local\Temp\pip-build-4vog3hcr\unicodecsv\setup.py", line 5, in
version = import('unicodecsv').version
File "c:\users\bryan\appdata\local\temp\pip-build-4vog3hcr\unicodecsv\unicodecsv__init__.py", line 48
except TypeError, e:
^
SyntaxError: invalid syntax
----------------------------------------
Tried gen_outline.py and failed.
Got the following error, as I'm not a developer I have no clue what have I missed..
Traceback (most recent call last):
File "gen_outline.py", line 85, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 26, in coll_iter
for obj in data[coll_key]:
TypeError: list indices must be integers, not str
It's a standard mongodb json extract.
Hi, I am trying to get gen_outline to work. In the docs it says to use
python gen_outline.py --collection nodes /path/to/the.json
I am using exactly this:
python gen_outline.py --collection nodes F:\electoral_map\candidates_python\candidates0_to_250.json
And I get KeyError: 'nodes', so I took it out and just tried
python gen_outline.py -collection F:\electoral_map\candidates_python\candidates0_to_250.json
And then it says that it is missing the argument "json_file"
Am I just not entering it right? Please help I am new to github
Hi Gabriel,
I came across this library when research over nested json to csv conversion library, this covers all the use-cases that we want to address. I would like to use this in our application but there is no licence as well as releases for it. It would be really helpful if you could add those to these repository as it will make it easier for others to use this tool.
I apologize for contacting you over an issue as I couldn't find your email. Looking forward to your response.
Thanks,
Vaibhav
Running Python 2.7.8, in a PyDev environment.
I have verified that all of the input parameters (sys.argv) are the same in each scenario, yet when I call gen_outline.py's main function from another Python program (first modifying the sys.argv as necessary), I receive the following result:
gen_outline.main()
File "C:<my_path>\gen_outline.py", line 90, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "C:<my_path>\gen_outline.py", line 61, in make_outline
key_map = gather_key_map(iterator)
File "C:<my_path>\gen_outline.py", line 37, in gather_key_map
for d in iterator:
File "C:<my_path>\gen_outline.py", line 28, in coll_iter
data = json.load(f)
File "C:\Python278\lib\json__init__.py", line 290, in load
**kw)
File "C:\Python278\lib\json__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python278\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python278\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 200693 (char 200692)
At that specific character in the JSON is the starting quote for a value on the penultimate "line" of the JSON file (even though it's all technically a single line, but this is how my editor is displaying it with word wrapping).
The data entry is
"ad_hoc_command_events": "/api/v1/hos
ts/111/ad_hoc_command_events/"
Again, the character in question is the quotation mark just before "/api/v1...etc.". As I said, I don't get this error if I call gen_outline.py by itself. This JSON data name/value pair is hardly unique in the entire document, so I'm not sure why this exact place in the document is choking it, but the only possible theory I have is that, again, it starts the value of the last "line" in the JSON document (but as I said before, it is all on a single line, so it shouldn't matter). I've spent quite awhile trying to figure this out, using PyDev's debugger and everything, but I'm still not getting anywhere.
Please let me know if you can help, and if you need any more information on my end to do so.
Thanks in advance,
Michael
Traceback (most recent call last):
File "json2csv.py", line 155, in
loader.write_csv(filename=outfile, make_strings=args.strings)
File "json2csv.py", line 105, in write_csv
writer.writerows(out)
File "/usr/lib64/python2.7/csv.py", line 158, in writerows
return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 2: ordinal not in range(128)
When I try to outline the json file,
I get the above error.
Please help
Hi, I am trying to convert this json file to csv: https://www.bcbsal.org/cms/data/drugs.json
First I set out to create the outline file, but got this error:
TypeError: list indices must be integers, not str
The attached file shows the detail of the error. I tried using 'unquoting strings' but was unable to get it to work.
Any advice is much appreciated!
Thanks
What is the key_map_file.json?
Can you send an example?
Thanks
I'm getting below error with basic run. unicode has been installed. I do not have any workaround and blocked
python json2csv.py /root/dbseries.json /root/dbseries.csv
File "json2csv.py", line 82
for k, val in row.items()})
^
SyntaxError: invalid syntax
If i run the same json file against https://json-csv.com/api i'm getting a proper csv
I'm getting this error when i try to convert my json file to csv.
Traceback (most recent call last):
File "json2csv.py", line 120, in
loader.write_csv(filename=outfile)
File "json2csv.py", line 71, in write_csv
writer.writeheader()
AttributeError: DictWriter instance has no attribute 'writeheader'
need some help
Project description
The unicodecsv is a drop-in replacement for Python 2.7’s csv module which supports unicode strings without a hassle. Supported versions are python 2.7, 3.3, 3.4, 3.5, and pypy 2.4.0.
I am only getting this error once in a while, but it looks like this:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 7: invalid continuation byte
Can this be solved by changing the requirements.txt file? Or, is some other solution appropriate here?
Thanks,
SJB
Could not find a version that satisfies the requirement requirements.txt (from versions: )
No matching distribution found for requirements.txt
Hi
I am trying to convert a json file to csv using your utility. However the output is not as expected.
Sample json file format is
{
"Datapoints": [
{
"Timestamp": "2015-09-21T02:55:00Z",
"Maximum": 2310711.0,
"Unit": "Bytes"
},
{
"Timestamp": "2015-09-18T02:55:00Z",
"Maximum": 1531247.0,
"Unit": "Bytes"
},
{
"Timestamp": "2015-09-09T02:55:00Z",
"Maximum": 1909968.0,
"Unit": "Bytes"
},
{
"Timestamp": "2015-09-19T02:55:00Z",
"Maximum": 1811538.0,
"Unit": "Bytes"
}
]
}
I use the following command to create the outline file.
$ python gen_outline.py filename.json --collection Datapoints
$ python json2csv.py filename.json filename.outline.json --strings
The output is as given below
INFO:root:{u'Timestamp': u'2015-09-21T02:55:00Z', u'Maximum': 2310711.0, u'Unit': u'Bytes'}
INFO:root:{u'Timestamp': u'2015-09-18T02:55:00Z', u'Maximum': 1531247.0, u'Unit': u'Bytes'}
INFO:root:{u'Timestamp': u'2015-09-09T02:55:00Z', u'Maximum': 1909968.0, u'Unit': u'Bytes'}
INFO:root:{u'Timestamp': u'2015-09-19T02:55:00Z', u'Maximum': 1811538.0, u'Unit': u'Bytes'}
Can you please suggest where I am going wrong
I am using this on ubuntu 14.04 , python version 2.7
Thanking you
Kanthan
Hi, I have a bunch of weblinks that are json files. Is there a way to have your tool parse directly from the weblink instead of me having to download the contents of each page, save to my computer, then parse?
Thanks!
My fault, but I have python 3 as the default interpreter, when launching the generator I just get this message.
$ python3 gen_outline.py --collection responses file.json
Traceback (most recent call last):
File "gen_outline.py", line 85, in <module>
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 32, in gather_key_map
for path in key_paths(d):
File "gen_outline.py", line 9, in helper
for k, v in x.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
Just by launching gen_outline.py with python2 it works as expected. Just leaving it here for reference
Hi,
So, I have a large Python program I'm writing that calls both gen_outline.py and json2csv.py as initial steps. Following this, I call Rufus Pollock's "csv2sqlite.py" (https://github.com/rgrp/csv2sqlite) to get the data in SQLITE format. In order to invoke these Python module from within my larger program I use the "subprocess" Python library. Up until now I was using Popen, but soon found I was running into race conditions (rufuspollock/csv2sqlite#20), After doing some research, I decided to either stick to subprocess.call(), or at least invoke Popen.wait(). While these solutions appear to solve the race condition, they unfortunately present another error, albeit a non-fatal one.
Basically, my Python program calls gen_outline.py and json2csv.py each four times, each one corresponding to a separate JSON file. The resulting stack trace is as follows, which again only manifests when I invoke Popen.wait() (or use the call() command instead), and did not occur before when I simply spawned a new process (but risked a race condition). I am not sure what the problem is, but I would very much appreciate help. Thanks in advance!
Traceback (most recent call last):
File "gen_outline.py", line 86, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 25, in coll_iter
data = json.load(f)
File "C:\Python278\lib\json__init__.py", line 290, in load
*kw)
File "C:\Python278\lib\json__init_.py", line 338, in loads
return default_decoder.decode(s)
File "C:\Python278\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=w(s, 0).end())
File "C:\Python278\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: end is out of bounds
Traceback (most recent call last):
File "gen_outline.py", line 86, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 25, in coll_iter
data = json.load(f)
File "C:\Python278\lib\json__init.py", line 290, in load
*kw)
File "C:\Python278\lib\json__init_.py", line 338, in loads
return default_decoder.decode(s)
File "C:\Python278\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=w(s, 0).end())
File "C:\Python278\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 200693 (char 200692)
Traceback (most recent call last):
File "gen_outline.py", line 86, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 25, in coll_iter
data = json.load(f)
File "C:\Python278\lib\json__init.py", line 290, in load
*kw)
File "C:\Python278\lib\json__init_.py", line 338, in loads
return default_decoder.decode(s)
File "C:\Python278\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=w(s, 0).end())
File "C:\Python278\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 57339 (char 57338)
Traceback (most recent call last):
File "gen_outline.py", line 86, in
main()
File "gen_outline.py", line 75, in main
outline = make_outline(args.json_file, args.each_line, args.collection)
File "gen_outline.py", line 50, in make_outline
key_map = gather_key_map(iterator)
File "gen_outline.py", line 31, in gather_key_map
for d in iterator:
File "gen_outline.py", line 25, in coll_iter
data = json.load(f)
File "C:\Python278\lib\json__init.py", line 290, in load
*kw)
File "C:\Python278\lib\json__init_.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python278\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python278\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: end is out of bounds
how would I add a single key locator for something that is outside of the specified array? (i.e. something outside of what I define as the "collection"?)
I want to pull something in a json object above the hierarchy of the collection "object" that I pointed at.
Ok, I am either doing something wrong, or there is something wrong with the code :/.
I am literally testing the code by using the following as example.
"testfile.json"
{
"nodes": [
{"source": {"author": "Someone"}, "message": {"original": "Hey!", "Revised": "Hey yo!"}},
{"source": {"author": "Another"}, "message": {"original": "Howdy!", "Revised": "Howdy partner!"}},
{"source": {"author": "Me too"}, "message": {"original": "Yo!", "Revised": "Yo, 'sup?"}}
]
}
Then I am creating the outline file with the following piece of code
python gen_outline.py --collection nodes /home/irving/workspace/json2csv_2/testfile.json -o /home/irving/workspace/json2csv_2/testfile_outline.json
Granted the results of "testfile_outline.json" are not the same as what it is on the "README.md", but I kept on going.
Now, everything is good and dandy, and I decide to create the CSV. So I type the following piece of code.
python json2csv.py /home/irving/workspace/json2csv_2/testfile.json /home/irving/workspace/json2csv_2/testfile_outline.json -o /home/irving/workspace/json2csv_2/testfile.csv
This give me the following error
File "json2csv.py", line 82
for k, val in row.items()})
^
SyntaxError: invalid syntax
What am I doing wrong? Thoughts?
(By the way I did test the outline JSON from "README.md" and got same results)
I have been facing this error:
Traceback (most recent call last):
File "json2csv.py", line 155, in
loader.write_csv(filename=outfile, make_strings=args.strings)
File "json2csv.py", line 105, in write_csv
writer.writerows(out)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 17-18: ordinal not in range(128)
how i solved it (return unicode(item).encode("utf-8")):
def make_string(self, item):
if isinstance(item, list) or isinstance(item, set) or isinstance(item, tuple):
return self.SEP_CHAR.join([self.make_string(subitem) for subitem in item])
elif isinstance(item, dict):
return self.DICT_OPEN + self.DICT_SEP_CHAR.join([self.KEY_VAL_CHAR.join([k, self.make_string(val)]) for k, val in item.items()]) + self.DICT_CLOSE
else:
return unicode(item).encode("utf-8")
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.