xml's Issues

Break XML::Document (and others) into separate files

Would you oppose breaking the classes/roles into separate files so they can be exported from the module separately? I'm asking because I'd like to use only XML::Document and the other ::Node objects in the NativeCall (Expat) version of this module that I'm currently working on.

The return value from appending a node is unexpected

When I iteratively append new nodes to a node set, the return value of append() is the stringified concatenation of all of the nodes that have been appended up to that iteration. For example, given

for @array-of-nodes -> $node {
    my $test = $nodeset.append($node);
    say "$test";
}

On the first iteration, $test holds the first $node stringified. On the next iteration, it holds the string from the first iteration concatenated with the string from the second, and so on.

Great module

I am only using the from-file method, but this module lets me get my DNS data via the Namecheap.com API without any major problems. The only minor problem was that it took some trial and error to work out exactly how to extract the tidbits I needed.

Thanks for a very useful module!

Cannot parse names containing periods (.)

Parsing fails when an element name contains a period.

my $xml-part = "<name_contain.periods>some text</name_contain.periods>";
my $doc = XML::Grammar.parse($xml-part);
# $doc ends up as (Any)

I tried changing the pident token in the grammar to:

token pident {
  <!before \d> [ \d+ <.ident>* || <.ident>+ ]+ % <[-.]>
}

This worked when I defined the modified grammar in my own script.
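A minimal sketch of a name token that accepts periods, written as a standalone grammar rather than as a patch to the module's XML::Grammar. It covers only the ASCII subset of the XML 1.0 Name production: a start character (letter, underscore or colon) followed by letters, digits, hyphens, periods, underscores or colons. The grammar name XMLName and its token names are hypothetical.

```raku
# Hypothetical standalone grammar, not the module's XML::Grammar.
# ASCII subset of the XML 1.0 Name production only; the real production
# also allows many Unicode letters.
grammar XMLName {
    token TOP        { <name-start> <name-char>* }
    token name-start { <[ A..Z a..z _ \: ]> }
    token name-char  { <name-start> | <[ 0..9 \- \. ]> }
}

for <name_contain.periods item-1 1item> -> $candidate {
    say $candidate, ' => ', so XMLName.parse($candidate);
}
```

Names may not start with a digit, so 1item is rejected. Names like name_contain.periods and item-1 both parse.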

Get XML text out unprocessed

When I do something like $xml-text ~= $xml-document.root;, all text from XML::Text nodes is processed so that whitespace is collapsed to a single space. For some HTML elements this is not desired. How can I change this? Or is a different method in your package needed?

Allow insert before/after to proceed when the reference node is undefined

Currently the library won't let you insert before/after if the reference node is not found in the parent element's node list, which includes the case where it is undefined. I want to insert a node after a node found by some criterion, such as tag name. If none is found, the node should go before the first child or after the last child, and if there are no child nodes at all, it should be inserted as the first child. The first two cases are easy with a one-liner like $parent.insertAfter($node,$parent.elements(:TAG<item>).tail // $parent.lastChild()), but for the last one you have to check for child nodes and call an entirely different method, such as insert, which complicates the code quite a bit.

Can't parse elements whose content is whitespace

XML can't parse documents where an element contains only whitespace. In t/parser.t, change $text to the following; note the <monkey> </monkey> element at the end:
my $text = '<test><title>The title</title><bullocks><item name="first"/><item name="second"/></bullocks><monkey> </monkey></test>';

Running the tests then fails with "could not parse XML". I've tried tweaking the textnode rule, but I don't understand grammars well enough to make it work.

Tests Fail

$ perl6 -v
This is rakudo version 2015.11-316-ga4ca12a built on MoarVM version 2015.11-22-g6e4b90f implementing Perl v6.b.
$ panda install XML
==> Fetching XML
==> Building XML
==> Testing XML

# Failed test 'set using Boolean.'
# at t/emitter.t line 30
# expected: 'standalone'
#      got: 'True'

# Failed test 'element after set serialized properly'
# at t/emitter.t line 31
# expected: '<test><title alt="Alternate text">The title</title><bullocks standalone="standalone"><item name="first"/><item name="second"/></bullocks></test>'
#      got: '<test><title alt="Alternate text">The title</title><bullocks standalone="True"><item name="first"/><item name="second"/></bullocks></test>'
# Looks like you failed 2 tests of 5
t/emitter.t ........... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/5 subtests 
t/example.t ........... ok
t/make.t .............. ok
t/namespaces.t ........ ok
t/parser.t ............ ok
t/preamble.t .......... ok
t/proxies.t ........... ok
t/query-methods.t ..... ok
t/query-positional.t .. ok

Test Summary Report
-------------------
t/emitter.t         (Wstat: 512 Tests: 5 Failed: 2)
  Failed tests:  4-5
  Non-zero exit status: 2
Files=9, Tests=118,  7 wallclock secs ( 0.04 usr  0.01 sys +  6.58 cusr  0.34 csys =  6.97 CPU)
Result: FAIL
The spawned process exited unsuccessfully (exit code: 1)
  in sub run-and-gather-output at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/C27BE995DC18074CA8F64980F69FEB80BADF5619:86
  in block  at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/A40B6CBA2E85D9DAA45064316EBEB9E42B0036E1:24
  in sub indir at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/C27BE995DC18074CA8F64980F69FEB80BADF5619:20
  in method test at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/A40B6CBA2E85D9DAA45064316EBEB9E42B0036E1:5
  in method install at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/1BC9777EC40C29C8331437E926CD4C13B983C026:141
  in method resolve at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/sources/1BC9777EC40C29C8331437E926CD4C13B983C026:219
  in sub MAIN at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/resources/9FF75FC978A3556E531F982825B3EDBBBA834D9E:18
  in block <unit> at /home/zoffix/.rakudobrew/moar-nom/install/share/perl6/site/resources/9FF75FC978A3556E531F982825B3EDBBBA834D9E:145

HTML -> XML Doc

I want to use XPath in a Perl6 Web::Scraper - is there anything in the works for building the XML tree from an HTML file?

Pretty-printing when saving XML

Hi, first of all, thank you for the module.

I use it to dynamically update my Visual Studio project file, and it works great. One thing could be improved, though: after saving, the project file's XML is all on one line. That isn't a functional problem, but it is hard to read in git and in an editor.

Would it be possible to add a pretty-printing option that inserts newlines into the XML string when saving it?

Thanks

I don't think entities are being considered at all

I have an XML file with &amp; in it, and this is not decoded when I stringify the text node. The documentation doesn't mention XML entities at all, so I'm not sure whether to expect this or not.

<NAME>MARKS&amp;SPENCER</NAME>

say $doc.lookfor(:TAG('NAME') :SINGLE)[0]; # MARKS&amp;SPENCER

At the very least, one would expect a facility to decode it, but none is provided.

Ideally, the user should never see an XML entity, and they would be encoded/decoded transparently along the way.
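Until the module handles entities itself, a post-processing step could decode them. Below is a minimal sketch of such a step. decode-xml-entities is a hypothetical helper, not part of the XML module. It decodes the five entities predefined by XML 1.0 plus decimal and hexadecimal character references, and leaves any other entity (such as &nbsp;, which XML does not predefine) untouched.

```raku
# Hypothetical helper, not part of the XML module.
sub decode-xml-entities(Str:D $text --> Str) {
    # The five entities predefined by XML 1.0.
    my %predefined = amp => '&', lt => '<', gt => '>', quot => '"', apos => "'";
    $text.subst(
        / '&' [ '#x' $<hex>=[<xdigit>+] | '#' $<dec>=[\d+] | $<ent>=[\w+] ] ';' /,
        {
            # subst places the current match in $/ for this block.
            $<hex>.defined ?? (~$<hex>).parse-base(16).chr
            !! $<dec>.defined ?? (+$<dec>).chr
            !! %predefined{~$<ent>} // ~$/    # unknown entities are left as-is
        },
        :g
    )
}

say decode-xml-entities('MARKS&amp;SPENCER');
```

Applied to the text from the report, this should print MARKS&SPENCER.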

Accept IO::Path as filename

Raku uses IO::Path to represent filenames, so it would make sense for functions to accept an IO::Path wherever a filename is expected.

[0] > my $svg = dir('.', test => *.ends-with('.svg'))[0];
"thumbnail.svg".IO
[1] > from-xml-file($svg)
Type check failed in binding to parameter '$file'; expected Str but got IO::Path (IO::Path.new("thumbn...)
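One common way to make a Raku routine accept either a Str or an IO::Path is to give the parameter an IO() coercion type. This calls .IO on a Str argument and passes an IO::Path through unchanged. The sketch uses a hypothetical read-xml-text helper rather than the module's from-xml-file. Until the module accepts IO::Path, calling from-xml-file(~$svg) should sidestep the type check, since ~ turns the path into a Str.

```raku
# Hypothetical helper (not part of the XML module) showing an IO() coercion:
# a Str argument is turned into an IO::Path via .IO; an IO::Path is used as-is.
sub read-xml-text(IO() $file --> Str) {
    $file.slurp
}

my $path = $*TMPDIR.add('io-coercion-demo.svg');
$path.spurt('<svg/>');
say read-xml-text($path);    # called with an IO::Path
say read-xml-text(~$path);   # called with a Str
$path.unlink;
```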

Really bad performance

Hi!

I'm trying to parse a 52 MB XML file and the performance is really bad.

Following the instructions, I'm simply doing:

use XML;

my $XML = 'ec_inventory_en.xml';
sub MAIN() {
    my $xml = from-xml-file($XML);
}

This code uses more than 5 GB of memory [1], runs on only one core [2], and takes more than 3m30s. By comparison, a Perl version parses the file in around 15 seconds.

[1] - Reported by cat /proc/$PID/smaps | grep -i pss | awk '{Total+=$2} END {print Total/1024" MB"}'
[2] - htop image

Tests failing: Not enough positional parameters passed

Building/Installing XML.pm6 or other modules that run XML tests gives this kind of error:

Not enough positional parameters passed; got 0 but expected 1
in sub from-xml at lib/XML.pm6:1067
in block at t/example.t:10

lib/XML.pm6:1067 is this:
proto from-xml ($) is export {*}

No method to remove a namespace

It would be nice to have a method for removing a namespace. For now I can get by with

$el.attribs{"xmlns:xyz"}:delete;

Regards,
Marcel

XML prolog is not parsed correctly (patch could not be attached)

If an XML input (file or string) has an XML prolog (<?xml.....), the version and encoding values keep their quotes, and the output then does not parse correctly.

In addition, the code looked like it would not handle a prolog that uses single quotes.

Also, the standalone parameter does not appear to be supported.

I fixed the first two items, but although I have 26 years of Perl experience, I am just starting with Perl 6.
Attached (or at least, I am going to try to attach it) is a git patch file with two commits that:

  1. add a test (t/prolog.t) that tests out these issues and will fail on current master branch.
  2. fixes the Grammar file to handle the single quote.
  3. fixes XML.pm6 to remove the quotes from the version and encoding.

OK, I can't attach a patch file, so I will try to find an email address and send it to you.
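For illustration, here is a sketch of reading prolog pseudo-attributes so that either quote style is accepted and the quotes are stripped from the stored values. The prolog-attributes helper is hypothetical. It is neither the module's grammar nor the patch described above. It also picks up standalone alongside version and encoding.

```raku
# Hypothetical helper; not the XML module's grammar or the patch above.
sub prolog-attributes(Str:D $prolog --> Hash) {
    my %attrs;
    my @pairs = $prolog.match(
        / $<key>=[\w+] \s* '=' \s* $<value>=[ '"' <-[\"]>* '"' | "'" <-[\']>* "'" ] /,
        :g
    );
    for @pairs -> $m {
        my $quoted = ~$m<value>;
        # Drop the opening and closing quote, whichever kind was used.
        %attrs{~$m<key>} = $quoted.substr(1, $quoted.chars - 2);
    }
    %attrs
}

say prolog-attributes(q{<?xml version='1.0' encoding="UTF-8" standalone="yes"?>});
```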

Cannot parse valid file

I'm not sure at the moment where the parse failure is happening, but other validators confirm the file (from Unicode) is valid XML.

(Rename the file from ee.txt to ee.xml; GitHub doesn't accept XML uploads for some reason.)

I'll try to investigate further, but my guess is that whatever is causing this also breaks kn.xml, ks.xml, lo.xml, ml.xml, mr.xml, and yav.xml in the main/ directory of the CLDR repository.

ee.txt

Appending to XML file fails

The problem seems to be in the module. The code:

#!/usr/bin/env perl6
use v6;
use XML;

my Str $log-file;

if %*ENV<MQ_LOG>:exists {
    $log-file = %*ENV<MQ_LOG>;
} else {
    $log-file = "$*HOME/.mq/log.xml";
}
unless $log-file.IO.e {
    spurt $log-file, make-xml('log', \('meta', :version<1>)).Str;
}
my XML::Document $log = from-xml-file($log-file);

sub MAIN {
    $log[3].append(make-xml('group',
                            :actual-length<1>,
                            :original-length<2>,
                            :max<3>,
                            :timestamp<4>,
                            :score<5>,
                            :level<6>
                           )
                  );
    say "Üks!";
    spurt $log-file, $log.Str;
    say "Kaks!";
}

The error:

Use of uninitialized value of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block  at /home/ron/rakudo/install/share/perl6/site/sources/23A69E0485BA94AAA7B51C8E2892B44F68D5C5DF (XML::Element) line 774
Use of uninitialized value of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block  at /home/ron/rakudo/install/share/perl6/site/sources/23A69E0485BA94AAA7B51C8E2892B44F68D5C5DF (XML::Element) line 774

Iteratively using replace() on nodes gives unexpected results

For example:

my @elements = $mydocument.root.nodes;
for @elements -> $element {
    my $new-element = @new-elements.pop;
    $mydocument.root.replace($element, $new-element);
}

seems to result in only the first and last $new-element ending up in the document. Sorry, I have not golfed this down further. I see a similar result with the replaceChild() syntax.

Node 'indexing' from root gives strange `mod 2` result (alternating populated/unpopulated lines)

While trying to reproduce some examples (from the test folder), I stumbled onto an indexing oddity.

Below, $xml.root[1], $xml.root[3], $xml.root[5], and $xml.root[7] return text.

Conversely, $xml.root[0], $xml.root[2], $xml.root[4], $xml.root[6], and $xml.root[8] return blank lines.

Is this canonical XML behavior? Thx.

~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.say;' ~/exemel_text.xml
<?xml version="1.0"?><root>
  <file>text1</file>
  <file>text2</file>
  <file>text3</file>
  <file>text4</file>
</root>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.say;' ~/exemel_text.xml
<root>
  <file>text1</file>
  <file>text2</file>
  <file>text3</file>
  <file>text4</file>
</root>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root[0].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[0].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[1].say;' ~/exemel_text.xml
<file>text1</file>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[2].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[3].say;' ~/exemel_text.xml
<file>text2</file>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[4].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[5].say;' ~/exemel_text.xml
<file>text3</file>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[6].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[7].say;' ~/exemel_text.xml
<file>text4</file>
~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[8].say;' ~/exemel_text.xml


~$ raku -MXML -e 'my $xml=open-xml($*ARGFILES.Str); $xml.root.[9].say;' ~/exemel_text.xml
(Any)
~$

Rakudo 2023.05 / MacOS;
XML:ver<0.3.3>:auth<zef:raku-community-modules>

@supernovus
@jonathanstowe
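The blank results at even indices are consistent with the indentation between the tags being kept as whitespace-only text nodes, which would leave each element child at an odd index. Assuming that is the cause, here is a minimal sketch that skips the text nodes by filtering the root's nodes down to XML::Element objects. It uses only from-xml, .root and .nodes, which appear elsewhere in this issue list.

```raku
use XML;

# The newlines and indentation between tags appear to be kept as text nodes,
# so element children sit at every other index of .nodes.
my $xml = from-xml("<root>\n  <file>text1</file>\n  <file>text2</file>\n</root>");

# Keep only the element children, dropping whitespace-only text nodes.
my @files = $xml.root.nodes.grep(XML::Element);
say @files[0];   # the first <file> element
say @files.elems;
```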

Using .perl loops forever

This might be considered a Rakudo bug rather than an exemel bug, but the current built-in .perl method does not appear to handle circular data well: running .perl on an XML::Node never returns. In the meantime, it might be easy to create a version that outputs something like:

from-xml("...")

where the ... is replaced with the stringified representation of the XML::Node. It's not perfect, but at least it would work.
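A sketch of the suggested workaround, using a hypothetical TreeNode class rather than the module's XML::Node. Each child keeps a back-reference to its parent, so the structure is cyclic. The class overrides raku to print a from-xml(...) expression built from the serialised subtree instead of walking the object graph.

```raku
# Hypothetical tree node with a parent back-link; NOT the XML module's XML::Node.
class TreeNode {
    has Str $.name is required;
    has TreeNode $.parent is rw;           # back-reference makes the structure cyclic
    has TreeNode @.children;

    method add(TreeNode:D $child --> TreeNode) {
        $child.parent = self;
        @!children.push($child);
        $child
    }

    # Serialise the subtree as compact XML text.
    method Str(--> Str) {
        @!children
            ?? "<$!name>" ~ @!children.map(*.Str).join ~ "</$!name>"
            !! "<$!name/>"
    }

    # Emit a constructor-like expression instead of following parent links.
    method raku(--> Str) { 'from-xml(' ~ self.Str.raku ~ ')' }
}

my $root = TreeNode.new(name => 'root');
$root.add(TreeNode.new(name => 'file'));
say $root.raku;
```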

XML hangs in Web::Scraper

Related to #63?

$ git clone https://github.com/tony-o/perl6-web-scraper
$ cd perl6-web-scraper
$ raku -I. -MXML -e "from-xml('t/data/s05.html'.IO.slurp)"

It seems to hang, using a single CPU core and never returning (at least within my limited patience).

Mac M2:

$ sw_vers
ProductName:		macOS
ProductVersion:		14.1.2
BuildVersion:		23B92

Error if tagname ~~ / '-' \d /

This program:

use XML;
my $xml = from-xml-file('test.xml');

outputs an error when test.xml looks like this (the file passes the validator at http://www.w3schools.com/xml/xml_validator.asp):

<?xml version="1.0" encoding="UTF-8"?>
<test>
<greeting en="hello">world</greeting>
<for>
<item-1>Yes</item-1>
<item-1>No</item-1>
<item-1>Maybe</item-1>
<item-1>Who cares?</item-1>
</for>
</test>

but works fine if test.xml is:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<greeting en="hello">world</greeting>
<for>
<item-dash>Yes</item-dash>
<item-dash>No</item-dash>
<item-dash>Maybe</item-dash>
<item-dash>Who cares?</item-dash>
</for>
</test>

Translation of single quotes in attribute values is undesirable

Hi,
Translating a single quote in an attribute value to &#39; is unhelpful in some situations. For example, in <img onclick="alert('clicked')" /> the JavaScript is rendered unusable if it is translated into <img onclick="alert(&#39;clicked&#39;)" />.

Regards,
Marcel
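For reference, the XML 1.0 grammar for a double-quoted attribute value only excludes <, & and the double quote itself, so a single quote may legally stay literal there. Below is a minimal sketch of an escaper that relies on this, assuming attributes are always written with double quotes. escape-attribute is a hypothetical name, not the module's routine.

```raku
# Hypothetical helper, not the XML module's escaping code.
# Assumes the value will be written inside double quotes, so only
# &, < and " need escaping; single quotes are left alone.
sub escape-attribute(Str:D $value --> Str) {
    $value.trans(['&', '<', '"'] => ['&amp;', '&lt;', '&quot;'])
}

say '<img onclick="' ~ escape-attribute("alert('clicked')") ~ '" />';
```

With single-quoted attributes the rule flips: the apostrophe must be escaped and the double quote may stay literal. The serializer would therefore need to know which delimiter it emits.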

When the attribute matcher is a code object, the missing-attribute case must be considered too

Say we have a table row where some cells have a style attribute and some don't. We need the cells whose style attribute does not contain a given substring:

$tr.lookfor(:TAG<td>, :style{ ! (.defined && .contains("display:none")) });

The problem here is that the code sees :style and skips all nodes where it is missing.

I think the right approach for a code match is to pass in Nil for every missing style attribute.
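A sketch of the proposed semantics as a standalone helper. When the matcher is a code object, a missing attribute is passed in as Nil instead of causing the node to be skipped. Non-code matchers still require the attribute to be present. attribute-matches is a hypothetical name and not part of the XML module's lookfor implementation.

```raku
# Hypothetical helper illustrating the proposed lookfor semantics;
# not the XML module's implementation.
sub attribute-matches(%attribs, Str:D $name, $matcher --> Bool) {
    # Bind (not assign) so a missing attribute really is Nil.
    my $value := (%attribs{$name}:exists) ?? %attribs{$name} !! Nil;

    # Code matchers also see missing attributes (as Nil).
    return $matcher($value).Bool if $matcher ~~ Callable;

    # Other matchers (Str, Regex, ...) still require the attribute to exist.
    $value.defined && ($value ~~ $matcher).Bool
}

my &visible = { !(.defined && .contains('display:none')) };
say attribute-matches({}, 'style', &visible);                          # no style attribute
say attribute-matches({ style => 'display:none' }, 'style', &visible);
```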
