
amara3-xml's People

Contributors

distobj, uogbuji


amara3-xml's Issues

Implement an XML builder

I came across the now-archived XML Witch and thought it might be handy to have something like it in Amara. It might even be worth supporting async with, for cases where XML building is interleaved with I/O.
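To make the idea concrete, here is a minimal sketch of a context-manager-based builder using only the Python standard library. The XmlBuilder class and its element and text methods are hypothetical names chosen for illustration; they are not part of Amara or of XML Witch.

```python
from contextlib import contextmanager
from xml.sax.saxutils import escape, quoteattr


class XmlBuilder:
    """Hypothetical builder for illustration only; not an Amara API."""

    def __init__(self):
        self._parts = []

    @contextmanager
    def element(self, name, **attrs):
        # quoteattr() escapes the value and adds the surrounding quotes
        attr_text = ''.join(
            f' {key}={quoteattr(str(value))}' for key, value in attrs.items()
        )
        self._parts.append(f'<{name}{attr_text}>')
        try:
            yield self
        finally:
            # Emit the end tag even if the body raises, so tags stay balanced
            self._parts.append(f'</{name}>')

    def text(self, content):
        # escape() handles &, < and > in character data
        self._parts.append(escape(content))

    def __str__(self):
        return ''.join(self._parts)


doc = XmlBuilder()
with doc.element('catalog', source='demo'):
    with doc.element('title'):
        doc.text('Fish & chips')
result = str(doc)
```

An async with variant could presumably be built the same way on contextlib.asynccontextmanager, so that awaits on I/O can sit between the start and end tags.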

Release 3.4.0

Release notes (update in place, for easy cut & paste):

  • MicroXpath: Fix selections from // and * axes
  • HTML5: Fix treatment of comment nodes
  • Code cleanup (e.g. formatting & avoiding HumpCase class names)
  • Parse URL sources directly from the microx command line

Migrate to Oori Data and reclaim the amara PyPI project

The Amara saga continues! I don't exactly remember why I decided to dead-end the Amara PyPI project when it hit 2.0, but I moved to a series of Amara 3 generation projects (amara3.iri, amara3.xml & amara3-names). Those were far more lone-wolf efforts, but at Oori Data we're seeing a lot of need for the sort of capability that's inchoate in Amara 3.

Time to re-consolidate the Amara projects, call it a 4th generation, and move it to Oori Data GitHub.

Note: Pip & PyPI are case insensitive. PEP 426 says "All comparisons of distribution names MUST be case insensitive, and MUST consider hyphens and underscores to be equivalent." An amara 4.0.0a1 package will seamlessly supersede Amara 2.0.0.
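As a rough illustration of why the two spellings collide, the sketch below applies the name normalization rule published in PEP 503, which additionally treats periods as equivalent to hyphens and underscores. The normalize_name helper is a name chosen here for illustration, not a pip or PyPI API.

```python
import re


def normalize_name(name):
    # PEP 503 rule: collapse runs of '-', '_' and '.' to a single '-', then lowercase
    return re.sub(r'[-_.]+', '-', name).lower()


same_project = normalize_name('Amara') == normalize_name('amara')
dotted = normalize_name('amara3.xml')
```

Under this rule, Amara and amara compare as the same project, and amara3.xml normalizes to amara3-xml.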

Erroneous processing of MicroXPath // & ancestor axis (also * name tests)

import requests

from amara3.uxml import html5
from amara3.uxml.treeutil import descendants
from amara3.uxml.uxpath import context as xpathcontext, parse as xpathparse

resp = requests.get('http://garybyker.library.link/resource/5bLglR2qVao/')
root = html5.parse(resp.text)
xpathctx = xpathcontext(root)
ALL_IMAGES = xpathparse('//img')
it = ALL_IMAGES.compute(xpathctx)
i = next(it)  # raises StopIteration
# root => {uxml.element (-9223363273091541553) "html" with 3 children}
# root.xml_children[2] => {uxml.element (-9223363273092099405) "body" with 10 children}

X = xpathparse('/html//img')
it = X.compute(xpathctx)
n = next(it)

The last three lines result in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/uche/.local/pyenv/main/lib/python3.6/site-packages/amara3/uxml/uxpath/ast.py", line 313, in compute
    yield from self.relative.compute(new_ctx)
  File "/home/uche/.local/pyenv/main/lib/python3.6/site-packages/amara3/uxml/uxpath/ast.py", line 234, in compute
    yield from self.right.compute(new_ctx)
  File "/home/uche/.local/pyenv/main/lib/python3.6/site-packages/amara3/uxml/uxpath/ast.py", line 379, in compute
    to_process = list(child.xml_children) + to_process[1:]
AttributeError: 'comment' object has no attribute 'xml_children'

Note that this alternative works fine:

imgs = [ e for e in descendants(root) if e.xml_name == 'img' ]
# imgs => [{uxml.element (8763762573951) "img" with 0 children}, {uxml.element (8763762576069) "img" with 0 children}, {uxml.element (8763762579471) "img" with 0 children}, {uxml.element (-9223363273092194223) "img" with 0 children}, {uxml.element (8763762583444) "img" with 0 children}, {uxml.element (-9223363273092209555) "img" with 0 children}]
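The AttributeError in the traceback suggests that the descendant walk reads xml_children from every child, which fails on comment nodes. Below is a minimal sketch of a guarded walk. The Element and Comment classes are stand-ins defined here for illustration, not amara3's node types, and descendant_elements is a hypothetical helper rather than the library's actual fix.

```python
class Element:
    """Stand-in for an element node (assumption, not amara3's class)."""

    def __init__(self, name, children=()):
        self.xml_name = name
        self.xml_children = list(children)


class Comment:
    """Stand-in for a comment node; deliberately has no xml_children."""

    def __init__(self, value):
        self.xml_value = value


def descendant_elements(node):
    # Depth-first, document-order walk that yields only nodes with children lists
    to_process = list(getattr(node, 'xml_children', []))
    while to_process:
        child = to_process.pop(0)
        kids = getattr(child, 'xml_children', None)
        if kids is None:
            continue  # comment or text node: nothing to descend into
        yield child
        to_process = list(kids) + to_process


root = Element('html', [
    Element('head', [Comment('metadata')]),
    Element('body', [
        Comment('navigation'),
        Element('img'),
        'some text',
        Element('div', [Element('img')]),
    ]),
])
names = [e.xml_name for e in descendant_elements(root)]
imgs = [e for e in descendant_elements(root) if e.xml_name == 'img']
```

With this guard, comments and text are skipped instead of raising, and both img elements are found.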

Also not working:

PARENT_RESOURCE = xpathparse('ancestor::div[@class="thumbnail-holder"]/a/@href')
img = imgs[0]
imgctx = xpathcontext(img) #, force_root=False)
res = next(PARENT_RESOURCE.compute(imgctx), None)

Working alternative:

preparent, parent = img, img.xml_parent
while parent:
    if parent.xml_name == 'div' and 'thumbnail-holder' in parent.xml_attributes.get('class', ''):
        break
    preparent = parent
    parent = parent.xml_parent

uxml.writer not escaping attribute cdata?

I wrote some code using amara3.uxml to modify MARCXML records. I thought I'd be able to use an xmlter.sender coroutine for input and write the result out losslessly with uxml.writer. That's not happening, though. When the input contains character references, specifically the quot character reference in this case, each one is turned into a literal quote character on output, producing non-well-formed XML.

Here's a script and some input data to reproduce
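For illustration of the symptom and the expected behavior, here is a minimal stdlib-only sketch. It does not use amara3's writer; serialize_attr is a hypothetical helper, and the subfield element and note attribute are made-up sample data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def serialize_attr(name, value):
    # escape() handles &, < and >; the extra mapping also escapes double quotes
    escaped = escape(value, {'"': '&quot;'})
    return f'{name}="{escaped}"'


value = 'the "Great" war'

# Writing the parsed value back verbatim breaks the attribute delimiters
unescaped = f'<subfield note="{value}"/>'
# Escaping on output keeps the document well-formed
escaped_doc = f'<subfield {serialize_attr("note", value)}/>'

try:
    ET.fromstring(unescaped)
    unescaped_is_well_formed = True
except ET.ParseError:
    unescaped_is_well_formed = False

round_tripped = ET.fromstring(escaped_doc).get('note')
```

The unescaped form fails to parse, while the escaped form parses and round-trips to the original attribute value.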
