elharo / xom

XOM™ is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order.

Home Page: https://xom.nu

License: Other

HTML 8.80% XSLT 0.49% Java 90.71%

xml xpath xslt xinclude java

xom's Introduction

XOM: a new XML object model

XOM™ is a new XML object model. It is an open source (LGPL), tree-based API for processing XML with Java that strives for correctness, simplicity, and performance, in that order. It includes built-in support for a number of XML technologies including Namespaces in XML, XPath 1.0, XSLT 1.0, XInclude, xml:id, xml:base, Canonical XML, and Exclusive Canonical XML. XOM documents can be converted to and from SAX and DOM.

XOM is designed to be easy to learn and easy to use. It works straightforwardly and has a very shallow learning curve. Assuming you're already familiar with XML, you should be able to get up and running with XOM very quickly.

XOM is the only XML API that makes no compromises on correctness. XOM only accepts namespace well-formed XML documents, and only allows you to create namespace well-formed XML documents. (In fact, it's a little stricter than that: it actually guarantees that all documents are round-trippable and have well-defined XML infosets.) XOM manages your XML so you don't have to. With XOM, you can focus on the unique value of your application, and trust XOM to get the XML right.

XOM is fairly unique in that it is a dual streaming/tree-based API. Individual nodes in the tree can be processed while the document is still being built. This enables XOM programs to operate almost as fast as the underlying parser can supply data. You don't need to wait for the document to be completely parsed before you can start working with it.

XOM is very memory efficient. If you read an entire document into memory, XOM uses as little memory as possible. More importantly, XOM allows you to filter documents as they're built so you don't have to build the parts of the tree you aren't interested in. For instance, you can skip building text nodes that only represent boundary white space, if such white space is not significant in your application. You can even process a document piece by piece and throw away each piece when you're done with it. XOM has been used to process documents that are gigabytes in size.

The current version of XOM is 1.3.9 and is backwards compatible with 1.2, 1.1 and 1.0. You should not need to recompile any code to upgrade to 1.3.9. XOM is believed to be quite stable and robust. Future releases should be backwards compatible with the 1.0 API for the foreseeable future.

Adding XOM to your build

XOM's Maven group ID is xom and its artifact ID is xom. To add a dependency on XOM using Maven, add this dependency element to your pom.xml:

<dependency>
  <groupId>xom</groupId>
  <artifactId>xom</artifactId>
  <version>1.3.9</version>
</dependency>

To add a dependency using Gradle:

dependencies {
  implementation 'xom:xom:1.3.9'
}

Dependencies

XOM is not complete unto itself. It depends on an underlying SAX parser to read documents and feed the data into a tree structure. While theoretically any SAX2 compliant parser should work, Xerces 2.6.1 and later is the only one that I am fairly confident does work. Xerces 2.8.0 is included with the full distribution. This product includes software developed by the Apache Software Foundation (http://www.apache.org/). Piccolo 1.0.3, Crimson, GNU JAXP 1.0b1, the Oracle XML Parser for Java 9.2.0.2.0D and 9.2.0.5.0, and Xerces versions prior to 2.6.1 all have bugs that prevent them from doing what XOM needs them to do. (Note to XML parser vendors: XOM's test suite gives parsers a very thorough workout, and delves into some of the more obscure parts of the XML spec that many parsers get wrong. You could do a lot worse for testing than making sure all the XOM unit tests pass when using your parser.)

Similarly, XSLT support depends on a TrAX processor. XInclude and XML canonicalization, however, are native.

Learning More

If you'd like to know more about XOM, I suggest starting with the tutorial. XOM also includes a large collection of small sample programs that demonstrate various parts of the library. If you're curious about why XOM is the way it is, or if you would like to suggest future directions for XOM, you should read the design principles on which XOM is based. If you have a question about XOM that is not answered in the API documentation or the FAQ, you can ask it on Stack Overflow or the xom-interest mailing list. You do not need to be subscribed to post, but non-subscriber questions are moderated. (Due to increasing amounts of non-subscriber spam, it is possible non-subscriber questions are missed. If you don't get an answer, please subscribe and try again.)

xom's Issues

Include line and column number in ParsingException message if available

That is, change "The content of elements must consist of well-formed character data or markup."
to "The content of elements must consist of well-formed character data or markup at line 32, column 76" or some such.
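A minimal sketch of how such a message could be assembled from the underlying SAX exception, using only JDK classes. The class and method names here are hypothetical and are not part of XOM; dropping the trailing period before appending the position is also just one possible choice.

```java
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of XOM: builds a message that includes
// the position reported by the parser, when one is available.
final class ParseMessageSketch {

    static String describe(SAXParseException ex) {
        String message = ex.getMessage() == null ? "" : ex.getMessage();
        int line = ex.getLineNumber();     // -1 when the parser has no line
        int column = ex.getColumnNumber(); // -1 when the parser has no column
        if (line < 1) {
            return message;                // no position: leave message as is
        }
        // Drop a trailing period so the position reads as part of the sentence.
        String base = message.endsWith(".")
                ? message.substring(0, message.length() - 1)
                : message;
        StringBuilder sb = new StringBuilder(base).append(" at line ").append(line);
        if (column >= 1) {
            sb.append(", column ").append(column);
        }
        return sb.toString();
    }
}
```

When the parser reports no position, the original message is returned unchanged, so callers that already parse the message text would not be affected.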

Prolog control

Consider adding the ability to control whether a prolog is added and what that prolog contains. I process XML using Xalan callouts from XSLT into the Java tier. I have functions which iterate and generate XML using BaseX. Each loop generates an XML fragment which is concatenated into an output string. I only need the prolog on the first iteration, not on each one. I created an overload to handle this. This product was great for getting away from the extremely slow DOM processing built into Java. Excellent work.

Update Bundle-RequiredExecutionEnvironment

      	<attribute name="Bundle-RequiredExecutionEnvironment" value="JavaSE-1.6"/>

Add FAQ and tutorial to top list on home page

sign the jar

Better build for docs

Maybe even based on maven

EncodingTest and GenericWriter can now use the Charset class

That is, get rid of this hack:

    private static boolean charsetAvailable(String name) {
        // hack to avoid using 1.4 classes
        try {
            "d".getBytes(name);
            return true;
        }
        catch (UnsupportedEncodingException ex) {
            return false;
        }
    }
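One possible replacement, sketched with `java.nio.charset.Charset` (available since Java 1.4). The class name is hypothetical; treating a syntactically illegal charset name as "not available" is an assumption about the desired behavior, not something the issue specifies.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical stand-in for the helper in EncodingTest and GenericWriter.
final class CharsetCheckSketch {

    static boolean charsetAvailable(String name) {
        try {
            // True only if this JVM can actually encode and decode the charset.
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException ex) {
            // Assumption: a malformed name counts as "not available".
            return false;
        }
    }
}
```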

What are the official dependency coordinates for XOM?

Let's say I'm writing a non-toy project, and want to consume XOM. I'm using Ant+Ivy, Maven, Gradle, SBT, etc. ... What coordinates do I use to import XOM?

Those build tools declare external dependencies using coordinates. Manually dropping an external JAR into source control is not easily supported by most build tools and would be frowned upon by most consuming projects.

If a goal is to make the project consumable, it needs to be in a Maven repo. I want to stress that this doesn't mean consumers must use "Maven the build tool" when consuming the JAR from a "Maven-layout repository".

Maven Central and Bintray JCenter are the 2 most popular repositories.

XOM is great and I'd like to use it but I can't because of the current lack of standard publishing practices.

Update xalan dep to 2.7.2

Having used the ossindex-maven-plugin on a maven project that uses xom 1.2.10 (via com.io7m.xom on Maven central), I see the following warning:

mvn compile net.ossindex:ossindex-maven-plugin:audit
...snip...

[ERROR] xalan:xalan:2.7.0  [VULNERABLE]
[ERROR]   required by com.io7m.xom:xom:1.2.10
[ERROR] 1 known vulnerabilities, 1 affecting installed version
[ERROR] 
[ERROR] [CVE-2014-0107]  Permissions, Privileges, and Access Controls
[ERROR] https://ossindex.net/resource/cve/359764
[ERROR] The TransformerFactory in Apache Xalan-Java before 2.7.2 does not properly restrict access to certain properties when FEATURE_SECURE_PROCESSING is enabled, which allows remote attackers to bypass expected restrictions and load arbitrary classes or access external resources via a crafted (1) xalan:content-header, (2) xalan:entities, (3) xslt:content-header, or (4) xslt:entities property, or a Java property that is bound to the XSLT 1.0 system-property function.
[ERROR] 
[ERROR] --------------------------------------------------------------
[ERROR]

Setup continuous integration

Once all tests are passing again.

Bundle API docs with website

ant dist does not automatically generate them and put them in the right place

XSLTransform.toDocument should allow whitespace only text nodes

The result of a transformation is a XOM Nodes object.
The Nodes list returned by the transform method
may contain zero, one, or more than one node, depending on what the stylesheet
produced. After all, there’s no guarantee that an XSL transformation produces a well-formed XML document.
Sometimes it only produces a well-balanced document fragment, and sometimes it produces nothing at all.
However, many stylesheets do produce well-formed XML documents.
XSLTransform includes a static toDocument
utility method that converts a Nodes object into a Document object.
However, if the Nodes passed to this method contains no elements, more than one element, or any Text
objects, then toDocument throws an XMLException.
For example,

Test failure: testXIncludeTestSuite

In nu.xom.tests.XIncludeTest:

White spaces are required between publicId and systemId.

nu.xom.ParsingException: White spaces are required between publicId and systemId. at line 1, column 50 in http://dev.w3.org/cvsweb/~checkout~/2001/XInclude-Test-Suite/testdescr.xml?content-type=text/plain&only_with_tag=HEAD
at nu.xom.Builder.build(Unknown Source)
at nu.xom.Builder.build(Unknown Source)
at nu.xom.tests.XIncludeTest.testXIncludeTestSuite(Unknown Source)
Caused by: org.xml.sax.SAXParseException; systemId: http://dev.w3.org/cvsweb/~checkout~/2001/XInclude-Test-Suite/testdescr.xml?content-type=text/plain&only_with_tag=HEAD; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)

Suspect this one is down to bit rot as well: Perhaps something changed on w3.org?

Test that index.html and other html files in website are valid XHTML

Delete maven1 target in build.xml

javadoc: warning - Multiple sources of package comments found for package

[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.tests"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.canonical"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.converters"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.xinclude"
[javadoc] javadoc: warning - Multiple sources of package comments found for package "nu.xom.xslt"

Merge excludes from build.xml and .gitignore

    <property name="excludes"       value=".clover, .DS_Store, **/.DS_Store, **/.AppleFileInfo, **/*.zip, **/.thumbnails/**, clover_html/**, clover/**, xom.gif, data/XInclude-Test-Suite/**, data/xmlconf/**, data/canonical/xmlconf/**, data/oasis*/**, **/testresults/**, **/pantry/**, **/workspace/**, **/junit*properties, **/.nautilus-metafile.xml, website/**, **/.project, **/.classpath, build/**, dist/**, .settings/**, lib2/**, xom.fb, jester*, trademark*"/>

web site still points to java.net

revise to point to github

Remove docs dependence on Saxon 6.5.5

I.e. use XOM to do the transforms

Document release process

Update release notes and docs.
Remove SNAPSHOT from build.xml
$ ant dist
Commit and push.
Squash and merge
Tag on Github with release
Upload binaries to IBiblio
$ gcloud app deploy ./dist/website/WEB-INF/appengine-web.xml --project=xom-website --no-promote --version=TAG
Set up snapshot for next release in build.xml. Commit and push. Squash and merge
Migrate site in cloud console

Make document size feature optional

That is, no limit by default. Not sure this actually works.

Test failure: testISO2022JP

In nu.xom.tests.EncodingTest:

expected:<128> but was:<65311>

junit.framework.AssertionFailedError: expected:<128> but was:<65311>
at nu.xom.tests.EncodingTest.checkAll(Unknown Source)
at nu.xom.tests.EncodingTest.testISO2022JP(Unknown Source)

Test failure: testRelativeURIResolutionAgainstARedirectedBase

In nu.xom.tests.BaseURITest:

expected:<...cafeconleche.org...> but was:<...ibiblio.org/xml...>

junit.framework.ComparisonFailure: expected:<...cafeconleche.org...> but was:<...ibiblio.org/xml...>
at nu.xom.tests.BaseURITest.testRelativeURIResolutionAgainstARedirectedBase(Unknown Source)

Not the most helpful message there. I suspect this one is down to something having changed on the server since these tests were written.

Back out Billion Laughs protection

Before the release of 1.2.11 I'm thinking about backing out the experimental limits on document memory sizes; that is, billion laugh protection.

As best I can tell this doesn't truly work. It will catch some problems, but can be bypassed by a clever attacker. I'd rather not provide a false sense of security, and I think this can be better addressed at the parser level using techniques like XMLConstants.FEATURE_SECURE_PROCESSING
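The parser-level approach can be sketched with the JDK's own JAXP classes. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the limits that actually get enforced under `FEATURE_SECURE_PROCESSING` depend on the parser implementation in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: creates a SAX reader with secure processing enabled.
final class SecureParserSketch {

    static XMLReader newSecureReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Ask the parser to apply its own resource limits
            // (for example on entity expansion).
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException ex) {
            throw new IllegalStateException("Parser does not support secure processing", ex);
        }
    }

    // Parses the text and reports whether the parser accepted it.
    static boolean accepts(XMLReader reader, String xml) {
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException ex) {
            return false;
        }
    }
}
```

A reader configured this way could then be passed to XOM through the `Builder(XMLReader)` constructor, which leaves enforcement of the limits to the parser.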

Clarify minimum Java version

README says

It requires Java 1.2 or later.

But the Ant file uses Java 1.6 for compiling. Documentation should be updated to correctly define the minimum Java version.

Update web page

Need to upload revised web page to xom.nu

Add Maven documentation

It would be good to add some additional Maven documentation to the project files to tell developers which version to get from where.

Currently you have xom : xom containing up to 1.2.5 and com.io7m.xom : xom containing just 1.2.10 in the central Maven repository. Combined with 1.2.10 missing here on GitHub (issue #1), it doesn't give developers any assurance about which version they should get from where. For me it feels like 1.2.10 has now been added by a 3rd party, which makes it less trustworthy.

xom/classes15/nu/xom/ isn't needed

No longer need to support pre Java 1.5. This can be moved into the main code base.

http://www.xom.nu

Make sure www.xom.nu works or redirects to xom.nu and update internal links.

Update the xom entry in Maven Central

The XOM artifact in Maven Central is from 2010 (1.2.5), while the current version is 1.2.11.
Please update the version in Maven Central.

Document release process in wiki

Make integration tests work without manual setup and download

Maybe use ant get tasks to load relevant data

Update email address across codebase

metalab.unc.edu --> ibiblio.org

Remove ICU4J from docs

and licenses. It hasn't been included since 1.1.

Consider adding Path variants to methods that currently take Files

E.g. in Builder et al.

This requires Java 1.7 as a minimum version.
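One low-risk way to add such variants is to delegate each `Path` overload to the existing `File` method. The sketch below uses a hypothetical stand-in class rather than XOM's real `Builder`, purely to show the delegation pattern.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical stand-in for a class that already exposes a File-based method.
final class PathOverloadSketch {

    // Existing File-based method (placeholder body for illustration).
    static String describe(File in) {
        return "file:" + in.getName();
    }

    // New Path-based overload: delegates to the File version.
    // Path.toFile() only works for paths on the default file system.
    static String describe(Path in) {
        return describe(in.toFile());
    }
}
```

Delegating through `toFile()` keeps behavior identical to the `File` method, at the cost of not supporting paths from non-default file system providers such as zip file systems.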

Make API Doc part of website build

sample links are broken on website

They point to java.net. Need to point to github.

Update XSLTransform javadoc

to reflect more recent JDKs

Invalid artifact 1.3.0 in the Maven Central

Invalid artifact: 1.3.0 in the Maven Central.
Downloaded jar file has the wrong content. It has no packages with class files inside. Instead, it contains different jars (with classes, sources, javadocs).

Commit App Engine config files to repo

web.xml, appengine-web.xml etc.

Clover integrations

Now that clover is open source make it a standard part of the build.

Add instructions for importing project into Eclipse

Import Existing project (not ant project)
Configure Java versions
Configure warnings.

Third party license links are broken

on website

Test failure: testIllegalIP6Addresses

In nu.xom.tests.VerifierTest:

Allowed illegal IPv6 address: ::FFFF:129.144.52.+22

junit.framework.AssertionFailedError: Allowed illegal IPv6 address: ::FFFF:129.144.52.+22
at nu.xom.tests.VerifierTest.testIllegalIP6Addresses(Unknown Source)

Add location information to Node class

It would be useful to have optional location information (line, column, character offset int fields, and ideally base URI or some way to acquire it) on the Node class which would allow better diagnostics to be presented to users in the event of an application error.

Such a solution should be resilient with respect to XInclude, so that a node which is examined would have the location information of the physical document that the Node in question originated from. Also it would be useful to be able to recursively list the include points in this case so the user can understand exactly how the document was included. Making the location information mutable might also be handy in cases where a XOM tree is built up programmatically.

This would also be useful in creating adapters between StAX and XOM, which is an idea I'm tinkering with at the moment.

The memory overhead should be very minimal (four fields totalling around 16-20 bytes per Node typically).
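To make the proposal concrete, the four fields could be grouped in a small value class like the one below. This is a hypothetical sketch, not an existing XOM type; the field names and the `describe` format are assumptions.

```java
// Hypothetical value object holding the location of a node in its source document.
final class NodeLocationSketch {

    private final int line;       // 1-based line number, or -1 if unknown
    private final int column;     // 1-based column number, or -1 if unknown
    private final int offset;     // character offset from document start, or -1
    private final String baseURI; // URI of the physical document, may be null

    NodeLocationSketch(int line, int column, int offset, String baseURI) {
        this.line = line;
        this.column = column;
        this.offset = offset;
        this.baseURI = baseURI;
    }

    int getLine() { return line; }
    int getColumn() { return column; }
    int getOffset() { return offset; }
    String getBaseURI() { return baseURI; }

    // Renders the location in a compact "uri:line:column" form for diagnostics.
    String describe() {
        String source = baseURI == null ? "(unknown)" : baseURI;
        return source + ":" + line + ":" + column;
    }
}
```

Keeping the location in a separate immutable object rather than as fields on every node would let nodes without location data hold a single null reference, though that is a design choice the issue leaves open.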

WDYT?
