
xml2json's Introduction

XML2JSON

This module is deprecated and will not be updated anymore (May 2019)

  • To convert between text-based data formats (including XML and JSON) use my library dataknead.
  • To work with XML in Python I recommend the excellent xmltodict.

Deprecated documentation

A Python script that converts XML to JSON, or the other way around.

Usage

Make this executable

$ chmod +x xml2json

Then invoke it from the command line like this

$ xml2json -t xml2json -o file.json file.xml

Or the other way around

$ xml2json -t json2xml -o file.xml file.json

Without the -o parameter xml2json writes to stdout

$ xml2json -t json2xml file.json

Additional options: strip text (#text and #tail) in the JSON

$ xml2json -t xml2json -o file.json file.xml --strip_text

Strip namespaces in the JSON

$ xml2json -t xml2json -o file.json file.xml --strip_namespace

In code

from xml2json import json2xml
d = {'r': {'@p': 'p1', '#text': 't1', 'c': 't2'}}
print(json2xml(d))
> b'<r p="p1">t1<c>t2</c></r>'

Installation

Either clone this repo or use pip like this:

pip install https://github.com/hay/xml2json/zipball/master

License

xml2json is released under the terms of the MIT license.

Contributors

This script was originally written by R. White and rewritten as a command-line utility by Hay Kranen, with contributions from George Hamilton (gmh04) and Dan Brown (jdanbrown).

Links

How it works

xml2json relies on ElementTree for the XML parsing. It is based on pesterfish.py but uses a different XML -> JSON mapping, which is described in the table below.

XML                              JSON
<e/>                             "e": null
<e>text</e>                      "e": "text"
<e name="value" />               "e": { "@name": "value" }
<e name="value">text</e>         "e": { "@name": "value", "#text": "text" }
<e> <a>text</a> <b>text</b> </e> "e": { "a": "text", "b": "text" }
<e> <a>text</a> <a>text</a> </e> "e": { "a": ["text", "text"] }
<e> text <a>text</a> </e>        "e": { "#text": "text", "a": "text" }
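
To make the table concrete, here is a minimal sketch of the same mapping built on the standard library's ElementTree. It is not the xml2json source: the names `element_to_value` and `xml_to_dict` are made up for illustration, and the sketch assumes that tail text and namespaces can be ignored and that whitespace-only text is dropped.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Return the JSON-ready value for one element (illustrative helper)."""
    value = {}

    # Attributes become keys prefixed with "@".
    for name, attr in elem.attrib.items():
        value["@" + name] = attr

    # Child elements become keys; repeated tags are collected into a list.
    for child in elem:
        child_value = element_to_value(child)
        if child.tag in value:
            existing = value[child.tag]
            if isinstance(existing, list):
                existing.append(child_value)
            else:
                value[child.tag] = [existing, child_value]
        else:
            value[child.tag] = child_value

    text = (elem.text or "").strip()
    if not value:
        # No attributes and no children: plain text, or None when empty.
        return text or None
    if text:
        value["#text"] = text
    return value


def xml_to_dict(xml_string):
    """Parse an XML string and wrap the root value under the root tag."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_value(root)}
```

Passing the result to `json.dumps` would give the JSON column of the table, with `None` serialized as `null`.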

This is very similar to the mapping used for Yahoo Web Services

This is a mess in that it is so unpredictable -- it requires lots of testing (e.g. to see if values are lists or strings or dictionaries). For use in Python this could be vastly cleaner. Think about whether the internal form can be more self-consistent while maintaining good external characteristics for the JSON.

Look at the Yahoo version closely to see how it works. Maybe can adopt that completely if it makes more sense...

R. White, 2006 November 6
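
The note above points out that a consumer has to test whether a value is a string, a dictionary, or a list. One way to contain that testing on the consumer side is a pair of small normalizing helpers. This is only a sketch; `as_list` and `text_of` are hypothetical names and are not part of xml2json.

```python
def as_list(value):
    """Return value as a list: [] for None, unchanged for lists, else [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def text_of(value):
    """Return the text of a mapped element, whether it is a string or a dict."""
    if isinstance(value, dict):
        return value.get("#text")
    return value
```

With these, `for a in as_list(doc["e"].get("a")):` works the same whether the source XML had zero, one, or several `a` children, provided `doc["e"]` is a dictionary.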

xml2json's People

Contributors

cemmanouilidis, hay, jdanbrown, larrycai, parconoel, scottchiefbaker, shuxin, sl45sms, webmalex


xml2json's Issues

Python 3.6 import failure

One line python3 script: from xml2json import xml2json

Executing that with Python 3.6 fails because the print statement on line 58 needs parentheses.

Root cause - software not released to Pip3-land

Converting integer response to string

var xml = '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Header><ERROR_CODE>0</ERROR_CODE><ERROR_DESC>success</ERROR_DESC></soapenv:Header><soapenv:Body><MODIFY_BALANCE_RESPONSE>60.0000<OLD_BALANCE>0.0000</OLD_BALANCE><NETWORK_ID>7</NETWORK_ID></MODIFY_BALANCE_RESPONSE></soapenv:Body></soapenv:Envelope>'

resp = JSON.parse(require("xml2json").toJson(xml));

{ 'soapenv:Envelope':
{ 'xmlns:soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
'soapenv:Header': { ERROR_CODE: '0', ERROR_DESC: 'success' },
'soapenv:Body': { MODIFY_BALANCE_RESPONSE: [Object] } } }

Out of memory error on ~1.5 GB XML file

I'm unable to convert a ~1.5 GB file. ~600 MB files work fine. Any idea how to get past the out of memory error? The system I'm on has 32 GB of RAM.

$ wget https://dumps.wikimedia.org/enwiki/20180920/enwiki-20180920-pages-articles26.xml-p41067204p42567204.bz2
$ lbunzip2 enwiki-20180920-pages-articles26.xml-p41067204p42567204.bz2
$ xml2json --strip_namespace enwiki-20180920-pages-articles26.xml-p41067204p42567204 -o test.json
Traceback (most recent call last):
  File "/mnt/c/Users/mark/Desktop/.feed/bin/xml2json", line 9, in <module>
    load_entry_point('xml2json==0.1', 'console_scripts', 'xml2json')()
  File "/mnt/c/Users/mark/Desktop/.feed/local/lib/python2.7/site-packages/xml2json.py", line 237, in main
    out = xml2json(input, options, strip_ns, strip)
  File "/mnt/c/Users/mark/Desktop/.feed/local/lib/python2.7/site-packages/xml2json.py", line 176, in xml2json
    elem = ET.fromstring(xmlstring)
  File "<string>", line 124, in XML
cElementTree.ParseError: out of memory: line 1, column 0
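
The traceback shows the failure inside `ET.fromstring`, which parses the whole document into memory at once. One possible workaround, outside xml2json itself, is to stream the file with `ET.iterparse` and release each record once it has been handled. The sketch below assumes the file is a flat sequence of repeated record elements (such as `page` in a MediaWiki dump) and keeps only the text of each record's direct children; `stream_records` and `local_name` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def stream_records(source, record_tag):
    """Yield one dict per record element without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == record_tag:
            # Simplification: only direct children, text only.
            yield {local_name(child.tag): child.text for child in elem}
            # Drop finished records so memory use stays roughly constant.
            root.clear()
```

Each record could then be written as one JSON object per line with `json.dumps`, so that neither the XML tree nor the full JSON output has to be held in memory.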

Handle numbers and booleans

In xml to json conversion, all the values (text) are converted to strings.
It would be nice if numbers and booleans are returned in their own types, as supported by json.

For example:

_Input (xml)_

    <pi>3.14</pi> 
    <isCool>true</isCool>

_Actual behavior (json output)_

{
        "pi":"3.14",    // value is a string type
        "isCool":"true" // value is a string type
}

_Proposed behaviour (json output)_

{
        "pi": 3.14,     // value is a number type
        "isCool": true  // value is a boolean type
}
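
Until such an option exists, the proposed behaviour can be approximated by post-processing the parsed JSON. This is a sketch rather than an xml2json feature: `coerce_scalar` and `coerce_tree` are hypothetical names, and the sketch assumes that only strings matching the JSON number grammar or the literals `true` and `false` should be converted (so a value like `007` stays a string).

```python
import re

# Matches the JSON number grammar: optional sign, integer, fraction, exponent.
_JSON_NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def coerce_scalar(text):
    """Convert 'true'/'false' and JSON-style numbers; leave other strings."""
    if text == "true":
        return True
    if text == "false":
        return False
    match = _JSON_NUMBER.fullmatch(text)
    if match is None:
        return text
    if match.group(2) is None and match.group(3) is None:
        return int(text)
    return float(text)


def coerce_tree(value):
    """Apply coerce_scalar to every string value in nested dicts and lists."""
    if isinstance(value, dict):
        return {key: coerce_tree(item) for key, item in value.items()}
    if isinstance(value, list):
        return [coerce_tree(item) for item in value]
    if isinstance(value, str):
        return coerce_scalar(value)
    return value
```

Note that attribute values (keys starting with `@`) are converted too, which may or may not be wanted.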

Why is -t xml2json required?

xml2json -t xml2json -o file.json file.xml

Why do I need to specify -t xml2json to make this work? It seems like that should be the default, since that's the name of the program. I think we could keep the option to go from JSON -> XML, but the default shouldn't require it.

Would you accept a patch to make this the default behavior?
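
For illustration, this is roughly what such a default could look like with argparse. It is a hypothetical sketch, not the script's actual option handling: the short flags `-t` and `-o` come from the README, while the long names `--type` and `--out` and the function `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser in which the conversion direction defaults to xml2json."""
    parser = argparse.ArgumentParser(
        prog="xml2json",
        description="Convert XML to JSON, or the other way around.",
    )
    parser.add_argument(
        "-t", "--type",  # long name is an assumption
        choices=["xml2json", "json2xml"],
        default="xml2json",
        help="conversion direction (default: xml2json)",
    )
    parser.add_argument(
        "-o", "--out",  # long name is an assumption
        help="write to this file instead of stdout",
    )
    parser.add_argument("file", help="input file")
    return parser
```

With this parser, `xml2json -o file.json file.xml` would convert XML to JSON without `-t`, and `-t json2xml` would still select the reverse direction.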

usage description problem

Hi. Had a small problem calling your script.

Usage says:
xml2json -t xml2json -f file.json file.xml

Readme says:
xml2json -t xml2json -f file.xml file.json

Correct (as far as I understand :) call:
xml2json -t xml2json -o file.xml file.json

btw nice script, thx 👍

Issue with & in

Hi

I seem to have an issue with & in my xml.

result.content = "<result><contact_id id='561'><note date='2015-04-22T04:35:38-04:00' author='[email protected]'>28/04/2015-Replace DVR & LCD monitor and test.</note></contact_id></result>"
json_data = json.loads(xml2json(result.content, options, strip_ns))

File "C:\Users\dm\Documents\Projects\s_merge\venv34-64\lib\site-packages\xml2json.py", line 176, in xml2json
elem = ET.fromstring(xmlstring)
File "C:\Python34-64\Lib\xml\etree\ElementTree.py", line 1325, in XML
parser.feed(text)
File "", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 115

Do you think forking your solution and attempting
http://stackoverflow.com/questions/13046240/parseerror-not-well-formed-invalid-token-using-celementtree
would be the best solution?
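
A bare `&` is not well-formed XML, so the parser is right to reject it; the clean fix is for the producer to emit `&amp;`. When the input cannot be fixed at the source, one possible workaround is to escape bare ampersands before parsing. The sketch below is an assumption-laden helper (`escape_bare_ampersands` is a hypothetical name): it leaves existing entity and character references alone, but it would also rewrite ampersands inside CDATA sections, which may not be wanted.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not start a named entity (&amp;), a decimal character
# reference (&#38;), or a hexadecimal one (&#x26;).
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(xml_text):
    """Replace bare '&' characters with '&amp;' so the text can be parsed."""
    return _BARE_AMPERSAND.sub("&amp;", xml_text)


note = "<note>Replace DVR & LCD monitor and test.</note>"
elem = ET.fromstring(escape_bare_ampersands(note))
print(elem.text)
```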

latest commit getting error

Hello. When I use the latest commit, I get the error below.
If I use the commit dated June 7, 2013, the xml2json function works fine.

170     elem = ET.fromstring(xmlstring)

--> 171 return elem2json(elem, strip_ns=strip_ns, strip=strip)
172
173

/pype/ce/petex_ws/extra/xml2json.py in elem2json(elem, strip_ns, strip)
149 if hasattr(elem, 'getroot'):
150 elem = elem.getroot()
--> 151 return json.dumps(elem_to_internal(elem, strip_ns=strip_ns, strip=strip))
152
153

/pype/ce/petex_ws/extra/xml2json.py in elem_to_internal(elem, strip_ns, strip)
56 elem_tag = elem.tag
57 if strip_ns:
---> 58 elem_tag = strip_tag(elem.tag)
59 else:
60 for key, value in list(elem.attrib.items()):

/pype/ce/petex_ws/extra/xml2json.py in strip_tag(tag)
45 strip_ns_tag = tag
46 split_array = tag.split('}')
---> 47 strip_ns_tag = split_array[1]
48 tag = strip_ns_tag
49 return tag

IndexError: list index out of range
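
The traceback shows `strip_tag` indexing `split_array[1]` unconditionally, which fails for any tag that carries no `{namespace}` prefix. A guarded version could look like the sketch below; it illustrates the fix and is not necessarily the change that was made in the repository.

```python
def strip_tag(tag):
    """Return the tag without a leading '{namespace}' part, if it has one."""
    if "}" in tag:
        # ElementTree writes namespaced tags as '{uri}local'.
        return tag.split("}", 1)[1]
    return tag
```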

option to use short namespace name?

This may be a duplicate of #20.

From https://www.sec.gov/Archives/edgar/data/1318605/000156459016013195/tsla-20151231.xml

<xbrl
...
xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31"
...
>
    ...
    <dei:TradingSymbol id="F_000005" contextRef="C_0001318605_20150101_20151231">TSLA</dei:TradingSymbol>
    ...
</xbrl>

I would like xml2json to output dei:TradingSymbol as the JSON key rather than {http://xbrl.sec.gov/dei/2014-01-31}TradingSymbol or TradingSymbol, depending on options. Is there currently a way to do this? If not, is it possible to add?


BTW I just wanted to say thanks for making this. xml2json has saved me countless hours of time parsing XML.
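
ElementTree expands prefixes into `{uri}local` names while parsing, but the original prefixes can be captured with the `start-ns` event of `ET.iterparse`. The following sketch shows how prefixed names could be rebuilt outside xml2json; `parse_with_prefixes` and `prefixed_name` are hypothetical helpers, and the sketch assumes each namespace URI is bound to a single prefix in the document.

```python
import xml.etree.ElementTree as ET


def parse_with_prefixes(source):
    """Parse a file or file-like object; return (root, {uri: prefix})."""
    prefix_for_uri = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            # Keep the first prefix seen for each URI.
            prefix_for_uri.setdefault(uri, prefix)
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefix_for_uri


def prefixed_name(tag, prefix_for_uri):
    """Turn '{uri}local' back into 'prefix:local' where a prefix is known."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = prefix_for_uri.get(uri)
    return f"{prefix}:{local}" if prefix else local
```

Applied to the filing above, the `TradingSymbol` element's tag would map back to `dei:TradingSymbol`.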

utf8 conversion

The script worked great for me from XML to Json, but it was unable to process correctly words like "Président" which became "Pr\u00e9sident". Is it possible to fix this?
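
For context, `Pr\u00e9sident` is a valid JSON encoding of "Président": Python's `json.dumps` escapes non-ASCII characters by default, and any JSON parser decodes them back. If readable output is preferred, `ensure_ascii=False` keeps the characters as they are. The snippet below demonstrates this with the standard library only; whether xml2json exposes such a setting is not covered here.

```python
import json

data = {"title": "Président"}

escaped = json.dumps(data)                       # non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)
print(readable)

# Both strings decode to the same data.
assert json.loads(escaped) == json.loads(readable) == data
```

When writing the readable form to a file, open the file with an explicit encoding such as `encoding="utf-8"`.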

I have this error in your script ('ascii' codec can not encode)?

Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode character u'\xe9' in position 136: ordinal not in range(128)
Exception Location: /home/projeto/xml2json.py in xml2json, line 153
Python Executable: /home/projeto/env/bin/python
Python Version: 2.6.5

api.py

from django.http import HttpResponse
from suds.client import Client
import xml2json

def num_cep_json(request, cep):
    url = 'http://www.toolsweb.com.br/webservice/serverWebService.php?wsdl'
    client = Client(url)
    cep == u'[]'  # as posted: a comparison whose result is discarded
    cep_xml = client.service.consultaCEP(cep)
    cep_json = xml2json.xml2json(cep_xml)
    return HttpResponse(cep_json, mimetype='application/json')

Should this module be deprecated?

Here's a general question to all users of this library: should I deprecate it? I haven't updated this repo in years, because xmltodict does roughly the same thing, but is much better, more reliable and faster. I'm wondering if anyone has a specific use case for this library that other libraries don't handle (better).

I would also be open for transferring ownership of this library/repo to somebody else.

@fambon, @danielgrijalva, @grunde73, @xiuyuanjun, @marklit, @suyashjain since you've all recently added an issue and/or added a pull request, maybe you can chime in? Thanks!

xml2json method documentation

Sorry if this is super elementary, but I'm trying to use the xml2json function, but you only provide an example for json2xml and I have no idea what the options parameter should be when passing it in. Any help would be appreciated.

support to skip xml attribute

I hope an extra option can be added to skip attribute names and values.
The XML file looks like this:

<project>
  <logRotator class="hudson.tasks.LogRotator">
    <daysToKeep>-1</daysToKeep>
    <numToKeep>20</numToKeep>
    <artifactDaysToKeep>-1</artifactDaysToKeep>
    <artifactNumToKeep>-1</artifactNumToKeep>
  </logRotator>
   ...

The json output is

{
    "project": {
        "logRotator": {
            "@class": "hudson.tasks.LogRotator",
            "artifactDaysToKeep": "-1",
            "artifactNumToKeep": "-1",
            "daysToKeep": "-1",
            "numToKeep": "20"
        }
}

The @class entry is not needed as data in my case.

It will be nice to have --strip_attribute as well

The case is http://stackoverflow.com/questions/21351996/how-to-convert-jenkins-job-configuration-config-xml-to-yaml-format-in-python-to
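
Without a `--strip_attribute` option, attributes can be dropped after conversion by filtering out the `@`-prefixed keys. This is a sketch of that post-processing step, not an xml2json feature; `strip_attributes` is a hypothetical name.

```python
def strip_attributes(value):
    """Return a copy of the parsed JSON without keys that start with '@'."""
    if isinstance(value, dict):
        return {
            key: strip_attributes(item)
            for key, item in value.items()
            if not key.startswith("@")
        }
    if isinstance(value, list):
        return [strip_attributes(item) for item in value]
    return value
```

One side effect to be aware of: an element that had only attributes ends up as an empty dictionary rather than `null`.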

--strip_text not working

Hi,
When I try --strip_text, it does not simplify the JSON output. The #text and #tail entries are not removed.

I am executing the following command

./xml2json.py -t xml2json -o file.json confluence.settings.xml --strip_text

Thanks
Suyash
