elharo / xom

XOM™ is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order.

Home Page: https://xom.nu

License: Other

Languages: Java 90.71%, HTML 8.80%, XSLT 0.49%

Topics: xml, xpath, xslt, xinclude, java

xom's Introduction

XOM: a new XML object model

XOM™ is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order. It includes built-in support for a number of XML technologies including Namespaces in XML, XPath 1.0, XSLT 1.0, XInclude, xml:id, xml:base, Canonical XML, and Exclusive Canonical XML. XOM documents can be converted to and from SAX and DOM.

XOM is designed to be easy to learn and easy to use. It works straightforwardly and has a gentle learning curve. Assuming you're already familiar with XML, you should be able to get up and running with XOM very quickly.

XOM is the only XML API that makes no compromises on correctness. XOM only accepts namespace well-formed XML documents, and only allows you to create namespace well-formed XML documents. (In fact, it's a little stricter than that: it actually guarantees that all documents are round-trippable and have well-defined XML infosets.) XOM manages your XML so you don't have to. With XOM, you can focus on the unique value of your application, and trust XOM to get the XML right.

XOM is unusual in that it is a dual streaming/tree-based API. Individual nodes in the tree can be processed while the document is still being built. This enables XOM programs to operate almost as fast as the underlying parser can supply data. You don't need to wait for the document to be completely parsed before you can start working with it.

XOM is very memory efficient. If you read an entire document into memory, XOM uses as little memory as possible. More importantly, XOM allows you to filter documents as they're built so you don't have to build the parts of the tree you aren't interested in. For instance, you can skip building text nodes that only represent boundary white space, if such white space is not significant in your application. You can even process a document piece by piece and throw away each piece when you're done with it. XOM has been used to process documents that are gigabytes in size.

The current version of XOM is 1.3.9 and is backwards compatible with 1.2, 1.1 and 1.0. You should not need to recompile any code to upgrade to 1.3.9. XOM is believed to be quite stable and robust. Future releases should be backwards compatible with the 1.0 API for the foreseeable future.

Adding XOM to your build

XOM's Maven group ID is xom and its artifact ID is xom. To add a dependency on XOM using Maven, add this dependency element to your pom.xml:

<dependency>
  <groupId>xom</groupId>
  <artifactId>xom</artifactId>
  <version>1.3.9</version>
</dependency>

To add a dependency using Gradle:

dependencies {
  implementation 'xom:xom:1.3.9'
}

Dependencies

XOM is not complete unto itself. It depends on an underlying SAX parser to read documents and feed the data into a tree structure. While theoretically any SAX2 compliant parser should work, Xerces 2.6.1 and later is the only one that I am fairly confident does work. Xerces 2.8.0 is included with the full distribution. This product includes software developed by the Apache Software Foundation (http://www.apache.org/). Piccolo 1.0.3, Crimson, GNU JAXP 1.0b1, the Oracle XML Parser for Java 9.2.0.2.0D and 9.2.0.5.0, and Xerces versions prior to 2.6.1 all have bugs that prevent them from doing what XOM needs them to do. (Note to XML parser vendors: XOM's test suite gives parsers a very thorough workout, and delves into some of the more obscure parts of the XML spec that many parsers get wrong. You could do a lot worse for testing than making sure all the XOM unit tests pass when using your parser.)
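
As a rough illustration of what "an underlying SAX parser" means in practice, the sketch below asks the JDK's JAXP factory for a namespace-aware SAX2 XMLReader and reports which class implements it. It uses only JDK classes. The class name SaxParserCheck is made up for this example, and handing the reader to XOM is left as a comment because that step needs the XOM jar on the classpath.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class SaxParserCheck {

    // SAX2 feature URI that reports whether namespace processing is on
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Obtains whatever SAX2 parser JAXP is configured to supply
    static XMLReader newNamespaceAwareReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException ex) {
            throw new IllegalStateException("No usable SAX parser found", ex);
        }
    }

    // True if the reader reports namespace processing as enabled
    static boolean supportsNamespaces(XMLReader reader) {
        try {
            return reader.getFeature(NAMESPACES);
        } catch (SAXException ex) {
            // Feature not recognized or not supported by this parser
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newNamespaceAwareReader();
        System.out.println("Parser class: " + reader.getClass().getName());
        System.out.println("Namespace aware: " + supportsNamespaces(reader));
        // With XOM on the classpath, this reader could be passed to
        // new nu.xom.Builder(reader) instead of letting XOM pick a parser.
    }
}
```

Printing the implementation class is a quick way to see which parser is actually in use before comparing it against the list of known-good and known-bad parsers above.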

Similarly XSLT support depends on a TrAX processor. XInclude and XML canonicalization, however, are native.
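
For readers unfamiliar with TrAX (the javax.xml.transform API), here is a minimal sketch of a TrAX processor at work, using only the transformer that ships with the JDK. It is not XOM's own XSLT API. The class name TraxCheck and the tiny stylesheet are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TraxCheck {

    // Stylesheet that writes the text of the root element as plain text
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='/*'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML with whatever TrAX processor JAXP finds
    static String transform(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(
                new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<greeting>hello</greeting>")); // prints "hello"
    }
}
```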

Learning More

If you'd like to know more about XOM, I suggest starting with the tutorial. XOM also includes a large collection of small sample programs that demonstrate various parts of the library. If you're curious about why XOM is the way it is, or if you would like to suggest future directions for XOM, you should read the design principles on which XOM is based. If you have a question about XOM that is not answered in the API documentation or the FAQ, you can ask it on Stack Overflow or the xom-interest mailing list. You do not need to be subscribed to post, but non-subscriber questions are moderated. (Due to increasing amounts of non-subscriber spam, it is possible non-subscriber questions are missed. If you don't get an answer, please subscribe and try again.)

xom's People

Contributors

elharo

xom's Issues

Prolog control

Consider adding the ability to control whether a prolog is added and what that prolog contains. I process XML using Xalan callouts from XSLT into the Java tier. I have functions that iterate and generate XML using BaseX. Each iteration generates an XML fragment that is concatenated into an output string. I only need the prolog on the first iteration, not on each one. I created an overload to handle this. This product was a great way to get away from the extremely slow DOM processing built into Java. Excellent work.
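
The behavior requested here can be sketched with the JDK's own serializer, which already exposes a switch for the XML declaration. This is not XOM's API and not the overload mentioned above; it only illustrates the "declaration on the first fragment only" idea. PrologControl, serialize, and concatenate are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PrologControl {

    // Re-serializes the XML, with or without the XML declaration
    static String serialize(String xml, boolean includeDeclaration) {
        try {
            // A Transformer created without a stylesheet performs an identity copy
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, includeDeclaration ? "no" : "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Serialization failed", ex);
        }
    }

    // Emits the declaration for the first fragment only
    static String concatenate(String... fragments) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < fragments.length; i++) {
            result.append(serialize(fragments[i], i == 0));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenate("<item>1</item>", "<item>2</item>"));
    }
}
```

Note that the concatenated string has several root elements, so it is a sequence of fragments rather than a well-formed document unless the caller wraps it.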

What are the official dependency coordinates for XOM?

Let's say I'm writing a non-toy project, and want to consume XOM. I'm using Ant+Ivy, Maven, Gradle, SBT, etc. ... What coordinates do I use to import XOM?

Those build tools declare external dependencies using coordinates. Manually dropping an external JAR into source control is not easily supported by most build tools and would be frowned upon by most consuming projects.

If a goal is to make the project consumable, it needs to be in a Maven repo. I want to stress that this doesn't mean consumers must use "Maven the build tool" when consuming the JAR from a "Maven-layout repository".

Maven Central and Bintray JCenter are the 2 most popular repositories.

XOM is great and I'd like to use it but I can't because of the current lack of standard publishing practices.

Update xalan dep to 2.7.2

Having used the ossindex-maven-plugin on a maven project that uses xom 1.2.10 (via com.io7m.xom on Maven central), I see the following warning:

mvn compile net.ossindex:ossindex-maven-plugin:audit
...snip...

[ERROR] xalan:xalan:2.7.0  [VULNERABLE]
[ERROR]   required by com.io7m.xom:xom:1.2.10
[ERROR] 1 known vulnerabilities, 1 affecting installed version
[ERROR] 
[ERROR] [CVE-2014-0107]  Permissions, Privileges, and Access Controls
[ERROR] https://ossindex.net/resource/cve/359764
[ERROR] The TransformerFactory in Apache Xalan-Java before 2.7.2 does not properly restrict access to certain properties when FEATURE_SECURE_PROCESSING is enabled, which allows remote attackers to bypass expected restrictions and load arbitrary classes or access external resources via a crafted (1) xalan:content-header, (2) xalan:entities, (3) xslt:content-header, or (4) xslt:entities property, or a Java property that is bound to the XSLT 1.0 system-property function.
[ERROR] 
[ERROR] --------------------------------------------------------------
[ERROR] 

XSLTransform.toDocument should allow whitespace only text nodes

The result of a transformation is a XOM Nodes object.
The Nodes list returned by the transform method
may contain zero, one, or more than one node, depending on what the stylesheet
produced. After all, there’s no guarantee that an XSL transformation produces a well-formed XML document.
Sometimes it only produces a well-balanced document fragment, and sometimes it produces nothing at all.
However, many stylesheets do produce well-formed XML documents.
XSLTransform includes a static toDocument
utility method that converts a Nodes object into a Document object.
However, if the Nodes passed to this method contains no elements, more than one element, or any Text
objects, then toDocument throws an XMLException.
For example,

Test failure: testXIncludeTestSuite

In nu.xom.tests.XIncludeTest:

White spaces are required between publicId and systemId.

nu.xom.ParsingException: White spaces are required between publicId and systemId. at line 1, column 50 in http://dev.w3.org/cvsweb/~checkout~/2001/XInclude-Test-Suite/testdescr.xml?content-type=text/plain&only_with_tag=HEAD
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.tests.XIncludeTest.testXIncludeTestSuite(Unknown Source)
Caused by: org.xml.sax.SAXParseException; systemId: http://dev.w3.org/cvsweb/~checkout~/2001/XInclude-Test-Suite/testdescr.xml?content-type=text/plain&only_with_tag=HEAD; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

Suspect this one is down to bit rot as well: Perhaps something changed on w3.org?
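
One plausible way to end up with this message (an assumption, not something confirmed in the report) is that the URL now returns an HTML page rather than the XML test description. An HTML 4 style DOCTYPE with a public identifier but no system identifier is not well-formed XML, and a parser rejects it at the DOCTYPE. The sketch below reproduces that situation with the JDK's built-in parser; DoctypeCheck and its sample strings are invented for the example, and the exact wording of the message depends on the parser and locale.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DoctypeCheck {

    // HTML-style DOCTYPE: PUBLIC identifier with no system identifier after it
    static final String HTML_PAGE =
        "<!DOCTYPE html PUBLIC '-//W3C//DTD HTML 4.01//EN'>"
      + "<html><body></body></html>";

    // Returns the parser's error message, or null if the text is well-formed
    static String parseError(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException ex) {
            return ex.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("HTML page: " + parseError(HTML_PAGE));
        System.out.println("XML document: " + parseError("<root/>"));
    }
}
```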

javadoc: warning - Multiple sources of package comments found for package

[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.tests"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.canonical"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.converters"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.xinclude"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.xslt"

Merge excludes from build.xml and .gitignore

    <property name="excludes"       value=".clover, .DS_Store, **/.DS_Store, **/.AppleFileInfo, **/*.zip, **/.thumbnails/**, clover_html/**, clover/**, xom.gif, data/XInclude-Test-Suite/**, data/xmlconf/**, data/canonical/xmlconf/**, data/oasis*/**, **/testresults/**, **/pantry/**, **/workspace/**, **/junit*properties, **/.nautilus-metafile.xml, website/**, **/.project, **/.classpath, build/**, dist/**, .settings/**, lib2/**, xom.fb, jester*, trademark*"/>

Document release process

  1. Update release notes and docs.
  2. Remove SNAPSHOT from build.xml
  3. $ ant dist
  4. Commit and push.
  5. Squash and merge
  6. Tag on GitHub with release
  7. Upload binaries to IBiblio
  8. $ gcloud app deploy ./dist/website/WEB-INF/appengine-web.xml --project=xom-website --no-promote --version=TAG
  9. Set up snapshot for next release in build.xml. Commit and push. Squash and merge
  10. Migrate site in cloud console

Test failure: testISO2022JP

In nu.xom.tests.EncodingTest:

expected:<128> but was:<65311>

junit.framework.AssertionFailedError: expected:<128> but was:<65311>
at nu.xom.tests.EncodingTest.checkAll(Unknown Source)
at nu.xom.tests.EncodingTest.testISO2022JP(Unknown Source)

Test failure: testRelativeURIResolutionAgainstARedirectedBase

In nu.xom.tests.BaseURITest:

expected:<...cafeconleche.org...> but was:<...ibiblio.org/xml...>

junit.framework.ComparisonFailure: expected:<...cafeconleche.org...> but was:<...ibiblio.org/xml...>
at nu.xom.tests.BaseURITest.testRelativeURIResolutionAgainstARedirectedBase(Unknown Source)

Not the most helpful message there. I suspect this one is down to something having changed on the server since these tests were written.

Back out Billion Laughs protection

Before the release of 1.2.11 I'm thinking about backing out the experimental limits on document memory sizes; that is, billion laugh protection.

As best I can tell this doesn't truly work. It will catch some problems, but can be bypassed by a clever attacker. I'd rather not provide a false sense of security, and I think this can be better addressed at the parser level using techniques like XMLConstants.FEATURE_SECURE_PROCESSING
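
A minimal sketch of the parser-level approach, assuming the JDK's built-in JAXP parser: secure processing is switched on through the factory, and an entity-expansion bomb is then rejected by the parser itself. SecureParsing, entityBomb, and isRejected are hypothetical names, and the limits enforced depend on the parser and JDK version.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SecureParsing {

    // Builds a "billion laughs" style document: each entity
    // expands to ten references to the previous one
    static String entityBomb(int levels) {
        StringBuilder dtd = new StringBuilder("<!DOCTYPE bomb [<!ENTITY e0 'ha'>");
        for (int i = 1; i <= levels; i++) {
            dtd.append("<!ENTITY e").append(i).append(" '");
            for (int j = 0; j < 10; j++) {
                dtd.append("&e").append(i - 1).append(';');
            }
            dtd.append("'>");
        }
        return dtd + "]><bomb>&e" + levels + ";</bomb>";
    }

    // True if a parser with secure processing enabled refuses the document
    static boolean isRejected(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXException ex) {
            // The parser gave up, for example on hitting its entity expansion limit
            return true;
        } catch (ParserConfigurationException | IOException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("Bomb rejected: " + isRejected(entityBomb(9)));
        System.out.println("Plain document rejected: " + isRejected("<ok/>"));
    }
}
```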

Clarify minimum Java version

README says

It requires Java 1.2 or later.

But the Ant file uses Java 1.6 for compiling. Documentation should be updated to correctly define the minimum Java version.

Add Maven documentation

It would be good to add some additional Maven documentation to the project files to say which version to get from where.

Currently you have xom : xom containing up to 1.2.5 and com.io7m.xom : xom containing just 1.2.10 in the central Maven repository. Combined with 1.2.10 missing here on GitHub (issue #1), it doesn't give developers any assurance about which version they should get from where. To me it feels like 1.2.10 has been added by a third party, which makes it less trustworthy.

Invalid artifact 1.3.0 in the Maven Central

Invalid artifact: 1.3.0 in the Maven Central.
Downloaded jar file has the wrong content. It has no packages with class files inside. Instead, it contains different jars (with classes, sources, javadocs).
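
A quick way to detect this kind of packaging mistake is to list the archive's entries and check whether it holds .class files or only nested .jar files. The sketch below does that with java.util.zip. JarContentCheck and the entry names in the demo archives are made up for illustration; a real check would read the downloaded file's bytes instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class JarContentCheck {

    // Counts non-directory entries whose names end with the given suffix
    static int countEntries(byte[] jarBytes, String suffix) {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory() && e.getName().endsWith(suffix)) {
                    count++;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return count;
    }

    // A usable library jar should contain at least one class file
    static boolean looksLikeLibraryJar(byte[] jarBytes) {
        return countEntries(jarBytes, ".class") > 0;
    }

    // Builds an in-memory archive with empty entries, for demonstration only
    static byte[] archiveOf(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] good = archiveOf("nu/xom/Element.class");
        byte[] bad = archiveOf("inner-classes.jar", "inner-sources.jar");
        System.out.println("good: " + looksLikeLibraryJar(good));
        System.out.println("bad: " + looksLikeLibraryJar(bad)
            + ", nested jars: " + countEntries(bad, ".jar"));
    }
}
```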

Test failure: testIllegalIP6Addresses

In nu.xom.tests.VerifierTest:

Allowed illegal IPv6 address: ::FFFF:129.144.52.+22

junit.framework.AssertionFailedError: Allowed illegal IPv6 address: ::FFFF:129.144.52.+22
at nu.xom.tests.VerifierTest.testIllegalIP6Addresses(Unknown Source)

Add location information to Node class

It would be useful to have optional location information (line, column, character offset int fields, and ideally base URI or some way to acquire it) on the Node class which would allow better diagnostics to be presented to users in the event of an application error.

Such a solution should be resilient with respect to XInclude, so that a node which is examined would have the location information of the physical document that the Node in question originated from. Also it would be useful to be able to recursively list the include points in this case so the user can understand exactly how the document was included. Making the location information mutable might also be handy in cases where a XOM tree is built up programmatically.

This would also be useful in creating adapters between StAX and XOM, which is an idea I'm tinkering with at the moment.

The memory overhead should be very minimal (four fields totalling around 16-20 bytes per Node typically).

WDYT?
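
For context, the raw location data is already available from the underlying SAX parser through org.xml.sax.Locator, which reports line, column, and system ID (but no character offset). The sketch below records that information for each element using only JDK classes. LocationDemo and its Location holder are hypothetical and are not part of XOM.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LocationDemo {

    // Hypothetical holder for the per-node fields proposed above
    static final class Location {
        final String name;
        final int line;
        final int column;
        final String systemId;

        Location(String name, int line, int column, String systemId) {
            this.name = name;
            this.line = line;
            this.column = column;
            this.systemId = systemId;
        }
    }

    // Parses the document and records where each start-tag ended
    static List<Location> elementLocations(String xml, String systemId) {
        List<Location> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (locator == null) {
                    // Parsers are not required to supply a Locator
                    result.add(new Location(qName, -1, -1, null));
                } else {
                    result.add(new Location(qName, locator.getLineNumber(),
                        locator.getColumnNumber(), locator.getSystemId()));
                }
            }
        };
        try {
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId(systemId);
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (SAXException | ParserConfigurationException | IOException ex) {
            throw new IllegalStateException(ex);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b/>\n</a>";
        for (Location loc : elementLocations(xml, "file:///tmp/example.xml")) {
            System.out.println(loc.name + " at line " + loc.line
                + ", column " + loc.column + " in " + loc.systemId);
        }
    }
}
```

SAX reports the position just after the event that triggered the callback, so the line and column point at the end of the start-tag rather than its beginning.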

tag for 1.2.10

The tag for 1.2.10 (XOM_1210) seems to be missing. It may belong on commit 1f1d628: git tag XOM_1210 1f1d628. Thank you! :)

ICU4J license

ICU4J is no longer used in XOM. Pull the license off the website.
