hay / xml2json

Python script converts XML to JSON or the other way around

Home Page: http://github.com/hay/xml2json

License: MIT License

Python 100.00%

xml2json's Introduction

XML2JSON

This module is deprecated and will not be updated anymore (May 2019)

To convert between text-based data formats (including XML and JSON) use my library dataknead.
To work with XML in Python, I recommend the excellent xmltodict.

Deprecated documentation

Python script converts XML to JSON or the other way around

Usage

Make this executable

$ chmod +x xml2json

Then invoke it from the command line like this

$ xml2json -t xml2json -o file.json file.xml

Or the other way around

$ xml2json -t json2xml -o file.xml file.json

Without the -o parameter xml2json writes to stdout

$ xml2json -t json2xml file.json

Additional options: strip text (#text and #tail) in the JSON

$ xml2json -t xml2json -o file.json file.xml --strip_text

Strip namespaces in the JSON

$ xml2json -t xml2json -o file.json file.xml --strip_namespace

In code

from xml2json import json2xml
d = {'r': {'@p': 'p1', '#text': 't1', 'c': 't2'}}
print(json2xml(d))
> b'<r p="p1">t1<c>t2</c></r>'

Installation

Either clone this repo or use pip like this:

pip install https://github.com/hay/xml2json/zipball/master

License

xml2json is released under the terms of the MIT license.

Contributors

This script was originally written by R. White and rewritten as a command-line utility by Hay Kranen, with contributions from George Hamilton (gmh04) and Dan Brown (jdanbrown).

How it works

xml2json relies on ElementTree for XML parsing. It is based on pesterfish.py but uses a different XML -> JSON mapping, which is described here:

XML                              JSON
<e/>                             "e": null
<e>text</e>                      "e": "text"
<e name="value" />               "e": { "@name": "value" }
<e name="value">text</e>         "e": { "@name": "value", "#text": "text" }
<e> <a>text</a> <b>text</b> </e> "e": { "a": "text", "b": "text" }
<e> <a>text</a> <a>text</a> </e> "e": { "a": ["text", "text"] }
<e> text <a>text</a> </e>        "e": { "#text": "text", "a": "text" }

This is very similar to the mapping used for Yahoo Web Services
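The mapping in the table above can be sketched in a few lines on top of ElementTree. This is an illustrative reimplementation under simplifying assumptions, not the script's own code: the names `elem_to_value` and `xml_to_dict` are hypothetical, and namespaces and `#tail` text are ignored.

```python
import xml.etree.ElementTree as ET


def elem_to_value(elem):
    """Map one element to a JSON-ready value, following the table's rules."""
    result = {}
    # Attributes become keys prefixed with "@".
    for name, value in elem.attrib.items():
        result["@" + name] = value
    # Child elements become keys; repeated tags collapse into a list.
    for child in elem:
        value = elem_to_value(child)
        if child.tag in result:
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    # No attributes and no children: plain text, or null when empty.
    if not result:
        return text or None
    if text:
        result["#text"] = text
    return result


def xml_to_dict(xml_string):
    """Parse an XML string and wrap the root value under the root tag."""
    root = ET.fromstring(xml_string)
    return {root.tag: elem_to_value(root)}
```

Passing the result to `json.dumps` gives the JSON side of the table; for example, `xml_to_dict('<e name="value">text</e>')` yields a dict with the keys `@name` and `#text` under `e`.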

This is a mess in that it is so unpredictable -- it requires lots of testing (e.g. to see if values are lists or strings or dictionaries). For use in Python this could be vastly cleaner. Think about whether the internal form can be more self-consistent while maintaining good external characteristics for the JSON.

Look at the Yahoo version closely to see how it works. Maybe can adopt that completely if it makes more sense...

R. White, 2006 November 6
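The type checks that the note above complains about can at least be confined to one place. The helper below is a hypothetical addition (`as_list` is not part of xml2json) that normalizes a mapped value so that calling code can always iterate over it.

```python
def as_list(value):
    """Return the mapped value as a list, whatever shape the converter gave it."""
    if value is None:
        return []          # key missing, or an empty element such as <e/>
    if isinstance(value, list):
        return value       # the tag was repeated
    return [value]         # a single string or dict


# One <a> child produces a string, two produce a list; both iterate the same way.
single = {"e": {"a": "text"}}
repeated = {"e": {"a": ["text", "text"]}}
for doc in (single, repeated):
    for item in as_list(doc["e"].get("a")):
        print(item)
```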

xml2json's People

pbiernacki ian-llewellyn mutaku widgital gmh04 vdveer trey-jones akimboio deeshank jdanbrown skopp mobyle2 glukose antback gerpsh akesterson cemmanouilidis hesaul zhanglc aashish24 arcodergh tbkraf08 yellowcrescent gpmidi ralfarama milker90 aloaisa baojie vovkd scottchiefbaker sl45sms doctoruna shortcut75 larrycai zyggyrat tedelblu webmalex asifhj robato edyesed simba707 rub21 pombredanne vicgc maruthiprithivi xiaobb cohorte emiloslavsky abetusk jifferent jnbala elbow-jason sq6jnx belbis saurabh20n alexflanagan soniro bakytzhanakzhol chapayevdauren magzhan123 kulyash123 edigerad tamerlanimanov begadil bekzattt aibek-av sagynysh sdurecheck kaskabayev just-mura koniskair markosski gumpyoung wtrevino larrymartell dongohpark sebadiaz ninelives21 nmadhire harshalgalgale narenq7 kaiyuwang16 shejianmin faionweb withanage xrwang letiziaap cdht javacym g33klord borkhalenko thehatter skeledrew mraburn josejamilena fireflycoco var121 steveharrison82 pavelik mishasaggi

xml2json's Issues

Python 3.6 import failure

One line python3 script: from xml2json import xml2json

Executing that with Python 3.6 fails with a syntax error at line 58: the print statement needs parentheses.

Root cause - software not released to Pip3-land

Converting integer response to string

var xml = '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><ERROR_CODE>0</ERROR_CODE><ERROR_DESC>success</ERROR_DESC></soapenv:Header><soapenv:Body><MODIFY_BALANCE_RESPONSE>60.0000<OLD_BALANCE>0.0000</OLD_BALANCE><NETWORK_ID>7</NETWORK_ID></MODIFY_BALANCE_RESPONSE></soapenv:Body></soapenv:Envelope>'

resp = JSON.parse(require("xml2json").toJson(xml));

{ 'soapenv:Envelope':
{ 'xmlns:soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
'soapenv:Header': { ERROR_CODE: '0', ERROR_DESC: 'success' },
'soapenv:Body': { MODIFY_BALANCE_RESPONSE: [Object] } } }

Out of memory error on ~1.5 GB XML file

I'm unable to convert a ~1.5 GB file. ~600 MB files work fine. Any idea how to get past the out of memory error? The system I'm on has 32 GB of RAM.

$ wget https://dumps.wikimedia.org/enwiki/20180920/enwiki-20180920-pages-articles26.xml-p41067204p42567204.bz2
$ lbunzip2 enwiki-20180920-pages-articles26.xml-p41067204p42567204.bz2
$ xml2json --strip_namespace enwiki-20180920-pages-articles26.xml-p41067204p42567204 -o test.json

Traceback (most recent call last):
  File "/mnt/c/Users/mark/Desktop/.feed/bin/xml2json", line 9, in <module>
    load_entry_point('xml2json==0.1', 'console_scripts', 'xml2json')()
  File "/mnt/c/Users/mark/Desktop/.feed/local/lib/python2.7/site-packages/xml2json.py", line 237, in main
    out = xml2json(input, options, strip_ns, strip)
  File "/mnt/c/Users/mark/Desktop/.feed/local/lib/python2.7/site-packages/xml2json.py", line 176, in xml2json
    elem = ET.fromstring(xmlstring)
  File "<string>", line 124, in XML
cElementTree.ParseError: out of memory: line 1, column 0
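The traceback shows the whole document being parsed in one `ET.fromstring` call, which needs the full tree in memory. One common workaround is to stream the file with `ElementTree.iterparse` and convert one record at a time. The sketch below is not part of xml2json: `iter_records` and `local_name` are hypothetical names, and it assumes the dump consists of many repeated record elements whose direct children carry the text of interest.

```python
import io
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, record_tag):
    """Yield one dict per record element without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == record_tag:
            yield {local_name(child.tag): child.text for child in elem}
            elem.clear()  # release the children of the finished record


# Small in-memory stand-in for a large dump file.
sample = io.BytesIO(
    b'<wiki xmlns="http://example.org/ns">'
    b"<page><title>A</title><id>1</id></page>"
    b"<page><title>B</title><id>2</id></page>"
    b"</wiki>"
)
for record in iter_records(sample, "page"):
    print(json.dumps(record))  # one JSON object per line
```

`source` can also be a file name. Writing one JSON object per line avoids holding the complete output in memory as well, at the cost of producing a different output shape than a single JSON document.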

Handle numbers and booleans

In xml to json conversion, all the values (text) are converted to strings.
It would be nice if numbers and booleans are returned in their own types, as supported by json.

For example:

_Input (xml)_

    <pi>3.14</pi> 
    <isCool>true</isCool>

_Actual behavior (json output)_

{
        "pi":"3.14",    // value is a string type
        "isCool":"true" // value is a string type
}

_Proposed behaviour (json output)_

{
        "pi": 3.14,     // value is a number type
        "isCool": true  // value is a boolean type
}
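A minimal sketch of the proposed behaviour, applied to each text value before serialization. `coerce_scalar` is a hypothetical helper, not an existing xml2json option. It only converts text that matches JSON's own number grammar, so values such as "007" stay strings.

```python
import json
import re

# JSON number grammar: optional sign, no leading zeros, optional fraction/exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def coerce_scalar(text):
    """Return a bool, int or float when the text looks like one, else the text."""
    if text == "true":
        return True
    if text == "false":
        return False
    match = _JSON_NUMBER.fullmatch(text)
    if match is None:
        return text
    if match.group(1) or match.group(2):
        return float(text)
    return int(text)


print(json.dumps({"pi": coerce_scalar("3.14"), "isCool": coerce_scalar("true")}))
```

Automatic coercion can be lossy: identifiers, version strings or postal codes that happen to look numeric change type, so an opt-in switch is probably safer than changing the default.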

Why is -t xml2json required?

xml2json -t xml2json -o file.json file.xml

Why do I need to specify -t xml2json to make this work? It seems like that should be the default, since that's the name of the program. We could keep the option to go from JSON to XML, but the default shouldn't require it.

Would you accept a patch to make this the default behavior?

xml2json should have some way to strip namespaces but retain attributes that are namespaced

See subject.

usage description problem

Hi. Had a small problem calling your script.

Usage says:
xml2json -t xml2json -f file.json file.xml

Readme says:
xml2json -t xml2json -f file.xml file.json

Correct (as far as I understand :) call:
xml2json -t xml2json -o file.xml file.json

btw nice script, thx 👍

Issue with & in

I seem to have an issue with & in my xml.

result.content = "<result><contact_id id='561'><note date='2015-04-22T04:35:38-04:00' author='[email protected]'>28/04/2015-Replace DVR & LCD monitor and test.</note></contact_id></result>"
json_data = json.loads(xml2json(result.content, options, strip_ns))

  File "C:\Users\dm\Documents\Projects\s_merge\venv34-64\lib\site-packages\xml2json.py", line 176, in xml2json
    elem = ET.fromstring(xmlstring)
  File "C:\Python34-64\Lib\xml\etree\ElementTree.py", line 1325, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 115

Do you think forking your solution and attempting
http://stackoverflow.com/questions/13046240/parseerror-not-well-formed-invalid-token-using-celementtree
would be the best solution?
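A bare & is not well-formed XML, so the parser is right to reject it; the clean fix is for the producer to emit &amp;. When the input cannot be fixed at the source, a pre-processing step can escape stray ampersands before parsing. The sketch below is one such workaround, not something xml2json does; `escape_bare_ampersands` is a hypothetical name, and the regex would also rewrite ampersands inside CDATA sections, which may not be wanted.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not start an entity or character reference.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace stray '&' with '&amp;' and leave existing references alone."""
    return _BARE_AMP.sub("&amp;", xml_text)


raw = "<note>28/04/2015-Replace DVR & LCD monitor and test.</note>"
note = ET.fromstring(escape_bare_ampersands(raw))
print(note.text)
```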

latest commit getting error

Hello, When i use the latest commit, i get an error.(below)
If i use the commit dated june 7 2013 the xml2json function works fine.

170     elem = ET.fromstring(xmlstring)

--> 171 return elem2json(elem, strip_ns=strip_ns, strip=strip)
172
173

/pype/ce/petex_ws/extra/xml2json.py in elem2json(elem, strip_ns, strip)
149 if hasattr(elem, 'getroot'):
150 elem = elem.getroot()
--> 151 return json.dumps(elem_to_internal(elem, strip_ns=strip_ns, strip=strip))
152
153

/pype/ce/petex_ws/extra/xml2json.py in elem_to_internal(elem, strip_ns, strip)
56 elem_tag = elem.tag
57 if strip_ns:
---> 58 elem_tag = strip_tag(elem.tag)
59 else:
60 for key, value in list(elem.attrib.items()):

/pype/ce/petex_ws/extra/xml2json.py in strip_tag(tag)
45 strip_ns_tag = tag
46 split_array = tag.split('}')
---> 47 strip_ns_tag = split_array[1]
48 tag = strip_ns_tag
49 return tag

IndexError: list index out of range
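The traceback points at `strip_tag` indexing `split_array[1]` for a tag that contains no `}`, which is the case for any element without a namespace. A defensive version could look like the sketch below; it is written from the traceback alone and is not the fix that was committed upstream.

```python
def strip_tag(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    # rsplit leaves tags without a '}' untouched instead of raising IndexError.
    return tag.rsplit("}", 1)[-1]


print(strip_tag("{http://www.w3.org/2005/Atom}entry"))
print(strip_tag("entry"))
```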

option to use short namespace name?

This may be a duplicate of #20.

From https://www.sec.gov/Archives/edgar/data/1318605/000156459016013195/tsla-20151231.xml

<xbrl
...
xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31"
...
>
    ...
    <dei:TradingSymbol id="F_000005" contextRef="C_0001318605_20150101_20151231">TSLA</dei:TradingSymbol>
    ...
</xbrl>

I would like xml2json to output dei:TradingSymbol as the JSON key rather than {http://xbrl.sec.gov/dei/2014-01-31}TradingSymbol or TradingSymbol, depending on options. Is there currently a way to do this? If not, is it possible to add?
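ElementTree expands prefixes into `{uri}local` names while parsing, but the prefix declarations can be collected separately with the `start-ns` event of `iterparse` and used to shorten tags afterwards. The sketch below illustrates that idea; `prefix_map` and `short_name` are hypothetical helpers rather than xml2json options, and the first prefix seen for a URI is the one that is kept.

```python
import io
import xml.etree.ElementTree as ET


def prefix_map(xml_bytes):
    """Collect {namespace URI: prefix} from the document's xmlns declarations."""
    mapping = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        mapping.setdefault(uri, prefix)
    return mapping


def short_name(tag, mapping):
    """Turn '{uri}local' into 'prefix:local' using the collected prefixes."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = mapping.get(uri, "")
    return f"{prefix}:{local}" if prefix else local


doc = (
    b'<xbrl xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31">'
    b'<dei:TradingSymbol id="F_000005">TSLA</dei:TradingSymbol>'
    b"</xbrl>"
)
mapping = prefix_map(doc)
root = ET.fromstring(doc)
print(short_name(root[0].tag, mapping))
```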

BTW I just wanted to say thanks for making this. xml2json has saved me countless hours of time parsing XML.

utf8 conversion

The script worked great for me from XML to Json, but it was unable to process correctly words like "Président" which became "Pr\u00e9sident". Is it possible to fix this?
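The \u00e9 form is valid JSON and decodes back to é in any JSON parser; Python's `json.dumps` produces it because `ensure_ascii` defaults to True. If readable output is preferred, the standard-library switch below changes that. The example is independent of xml2json and only shows the `json` module behaviour.

```python
import json

record = {"title": "Président"}

escaped = json.dumps(record)                       # ASCII-only output
readable = json.dumps(record, ensure_ascii=False)  # keeps the character itself

print(escaped)
print(readable)
# Both strings decode to the same data.
print(json.loads(escaped) == json.loads(readable))
```

When writing the readable form to a file, open the file with an explicit `encoding="utf-8"`.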

I have this error in your script ('ascii' codec can not encode)?

Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode character u'\xe9' in position 136: ordinal not in range(128)
Exception Location: /home/projeto/xml2json.py in xml2json, line 153
Python Executable: /home/projeto/env/bin/python
Python Version: 2.6.5

api.py

from django.http import HttpResponse
from suds.client import Client
import xml2json

def num_cep_json(request, cep):
    url = 'http://www.toolsweb.com.br/webservice/serverWebService.php?wsdl'
    client = Client(url)
    cep == u'[]'
    cep_xml = client.service.consultaCEP(cep)
    cep_json = xml2json.xml2json(cep_xml)

    return HttpResponse(cep_json, mimetype='application/json')

Example code needed for using xml2json() programmatically (non-CLI use)

Specifically, options isn't a dict it is an object - showing how to set that up will be helpful to noobs.

Objects fail to parse xml2json

I have 2 files
input https://www.dropbox.com/s/ccaaj2aymomgqgo/20141004Flemington_5start.xml?dl=0

output https://www.dropbox.com/s/ur5q8404dfuukrj/20141004Flemington_5start.json?dl=0

Every item with multiple attributes, such as the ones in these files, fails to convert. The input is the opening section of the attached XML file, and the output is what xml2json produced from it. Is there anything I can do to stop it failing so badly?

The module optparse has been deprecated since version 3.2

xml2json/xml2json.py

Line 196 in 99de3ef

p = optparse.OptionParser(

The module optparse has been deprecated since Python 3.2, so I suggest using the argparse module instead.
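A rough sketch of what the command line could look like with argparse. The flag names are the ones shown in the Usage section; `build_parser`, the help texts and the positional name `inputfile` are assumptions for illustration. The default of `xml2json` for `-t` is also an assumption that follows the request in the "Why is -t xml2json required?" issue rather than the script's actual behaviour.

```python
import argparse


def build_parser():
    """Build an argparse parser that mirrors the documented xml2json flags."""
    parser = argparse.ArgumentParser(
        prog="xml2json",
        description="Converts XML to JSON or the other way around.",
    )
    parser.add_argument(
        "-t", "--type",
        choices=("xml2json", "json2xml"),
        default="xml2json",  # assumption: not the original script's behaviour
        help="conversion direction",
    )
    parser.add_argument("-o", "--out", help="write to this file instead of stdout")
    parser.add_argument("--strip_text", action="store_true",
                        help="strip text (#text and #tail) in the JSON")
    parser.add_argument("--strip_namespace", action="store_true",
                        help="strip namespaces in the JSON")
    parser.add_argument("inputfile", help="file to convert")
    return parser


args = build_parser().parse_args(["-o", "file.json", "file.xml", "--strip_text"])
print(args.type, args.out, args.inputfile, args.strip_text)
```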

Should this module be deprecated?

Here's a general question to all users of this library: should I deprecate it? I haven't updated this repo in years, because xmltodict does roughly the same thing, but better, more reliably and faster. I'm wondering if anyone has a specific use case for this library that other libraries don't handle (better).

I would also be open to transferring ownership of this library/repo to somebody else.

@fambon, @danielgrijalva, @grunde73, @xiuyuanjun, @marklit, @suyashjain since you've all recently added an issue and/or added a pull request, maybe you can chime in? Thanks!

xml2json method documentation

Sorry if this is super elementary, but I'm trying to use the xml2json function, but you only provide an example for json2xml and I have no idea what the options parameter should be when passing it in. Any help would be appreciated.
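A possible way to call the function from code, offered as an unverified sketch. The call shape `xml2json(xmlstring, options, strip_ns, strip)` is taken from the tracebacks in the issues above. That `options` only needs a `pretty` attribute, and that 1 for `strip_ns` and `strip` corresponds to the --strip_namespace and --strip_text flags, are assumptions that should be checked against the installed copy of xml2json.py. The `convert` wrapper is a hypothetical helper.

```python
import optparse

# Assumption: the converter only reads options.pretty (pretty-printed output).
options = optparse.Values({"pretty": False})


def convert(xml_text, strip_ns=1, strip=1):
    """Convert an XML string to a JSON string with the deprecated module."""
    # Imported lazily so this file loads even when xml2json is not installed.
    from xml2json import xml2json
    # Assumption: strip_ns=1 and strip=1 mirror --strip_namespace / --strip_text.
    return xml2json(xml_text, options, strip_ns, strip)
```

Because the module is deprecated, new code is probably better served by xmltodict, as the README recommends.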

support to skip xml attribute

I hope one extra option can be added to skip the attribute name/value
The XML file looks like this:

<project>
  <logRotator class="hudson.tasks.LogRotator">
    <daysToKeep>-1</daysToKeep>
    <numToKeep>20</numToKeep>
    <artifactDaysToKeep>-1</artifactDaysToKeep>
    <artifactNumToKeep>-1</artifactNumToKeep>
  </logRotator>
   ...

The json output is

{
    "project": {
        "logRotator": {
            "@class": "hudson.tasks.LogRotator",
            "artifactDaysToKeep": "-1",
            "artifactNumToKeep": "-1",
            "daysToKeep": "-1",
            "numToKeep": "20"
        }
    }
}

The @class attribute is not needed as data in my case.

It would be nice to have --strip_attribute as well.
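Until such an option exists, attributes can be dropped after conversion, because the mapping marks them with a leading @. The sketch below post-processes the parsed JSON; `drop_attributes` is a hypothetical helper, not an xml2json feature.

```python
import json


def drop_attributes(value):
    """Recursively remove every key that starts with '@'."""
    if isinstance(value, dict):
        return {
            key: drop_attributes(item)
            for key, item in value.items()
            if not key.startswith("@")
        }
    if isinstance(value, list):
        return [drop_attributes(item) for item in value]
    return value


converted = json.loads(
    '{"project": {"logRotator": {"@class": "hudson.tasks.LogRotator",'
    ' "daysToKeep": "-1", "numToKeep": "20"}}}'
)
print(json.dumps(drop_attributes(converted), indent=4))
```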

The case is http://stackoverflow.com/questions/21351996/how-to-convert-jenkins-job-configuration-config-xml-to-yaml-format-in-python-to

-strip_text not working

Hi,
When I try --strip_text, it does not simplify the JSON output: the #text and #tail keys are not removed.

I am executing the following command

./xml2json.py -t xml2json -o file.json confluence.settings.xml --strip_text

Thanks
Suyash
