ikorin24 / u8xmlparser
Extremely fast UTF-8 xml parser library
License: MIT License
I am trying to get the outer XML of a node. Currently it looks like you only have InnerText. Is this something that could be added?
EDIT:
An example would be
<Messages>
  <Message Id="1">
    <Name>Foo</Name>
  </Message>
  <Message Id="2">
    <Name>Foo</Name>
  </Message>
</Messages>
And the result of outerxml selecting the first message in Messages.Children should be:
<Message Id="1">
  <Name>Foo</Name>
</Message>
Hi,
We are writing a C# version of the MJML rendering engine for HTML email. MJML is an XML language.
I am trying out your library, but when I open this file, I get a FormatException without any further details:
https://github.com/SebastianStehle/mjml-test/blob/main/TestRunner/ManyHeroes.mjml
Describe the bug:
If there is a comment at the end of the xml, parsing will fail.
Environment:
library version: 1.6.0
.NET version: .NET6
OS: Windows10
Steps to Reproduce:
Execute the following code.
using var xml = XmlParser.Parse(
@"<foo></foo>
<!-- comment -->");
Expected behavior:
No errors.
Actual Behavior:
A FormatException is thrown.
Message:
"(line 2, char 1): Xml does not have multiple root nodes."
It would be great to have XmlValidation as part of this library using XmlSchemaSet.
Currently (using XDocument):
var schemas = new XmlSchemaSet();
schemas.Add("<namespace>", XmlReader.Create("<XsdLocation>"));
var xml = "<test></test>";
var doc = XDocument.Parse(xml);
doc.Validate(schemas, ValidationCallBack);
private void ValidationCallBack(object? sender, ValidationEventArgs e)
{
...
}
System.Xml.Schema.Extensions.Validate
I'm not sure how the API would look, but I would prefer a returned object from Validate instead of a callback.
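The "returned object instead of a callback" idea can be sketched on top of the existing XDocument API. This is only an illustration under stated assumptions: ValidateToList and SchemaFromString are hypothetical helper names, nothing here is part of U8XmlParser, and the tiny inline schema is made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical helper (not a U8XmlParser API): runs the callback-based
    // XDocument.Validate and returns everything it reported as a list.
    public static IReadOnlyList<ValidationEventArgs> ValidateToList(XDocument doc, XmlSchemaSet schemas)
    {
        var problems = new List<ValidationEventArgs>();
        doc.Validate(schemas, (sender, e) => problems.Add(e));
        return problems;
    }

    // Hypothetical helper: builds a schema set from an XSD held in a string.
    public static XmlSchemaSet SchemaFromString(string xsd)
    {
        var schemas = new XmlSchemaSet();
        using var reader = XmlReader.Create(new StringReader(xsd));
        // null means "use the targetNamespace declared in the schema itself".
        schemas.Add(null, reader);
        return schemas;
    }

    public static void Main()
    {
        // Made-up schema for the demo: a single <test> element holding an integer.
        const string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""test"" type=""xs:int""/>
</xs:schema>";
        var schemas = SchemaFromString(xsd);

        var ok = ValidateToList(XDocument.Parse("<test>42</test>"), schemas);
        var bad = ValidateToList(XDocument.Parse("<test>abc</test>"), schemas);

        Console.WriteLine($"valid document: {ok.Count} problem(s)");
        foreach (var problem in bad)
            Console.WriteLine($"invalid document: {problem.Severity}: {problem.Message}");
    }
}
```

The caller gets a plain list to inspect, which is the shape asked for above; whether the library would want to expose something similar is left open."), schemas).Count != 0) throw new System.Exception("expected no problems for valid document");
if (ValidationSketch.ValidateToList(System.Xml.Linq.XDocument.Parse("<test>abc</test>"), schemas).Count == 0) throw new System.Exception("expected at least one problem for invalid document");
</test>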
Afaik the normal xml is totally valid.
<div>
<strong>Hello</strong> U8XMLParser
</div>
But in this case I have no idea how to get the inner xml.
I am currently trying to parse this xml file:
<?xml version="1.0"?>
<!DOCTYPE datafile PUBLIC "-//FB Alpha//DTD ROM Management Datafile//EN" "http://www.logiqx.com/Dats/datafile.dtd">
<datafile>
</datafile>
A FormatException is being thrown when parsing the DOCTYPE element.
System.FormatException
HResult=0x80131537
Message=Exception of type 'System.FormatException' was thrown.
Source=U8XmlParser
StackTrace:
at U8Xml.XmlParser.<TryParseDocType>g__SkipUntil|17_0(Byte ascii, RawString data, Int32& i)
at U8Xml.XmlParser.TryParseDocType(RawString data, Int32& i, Boolean hasNode, OptionalNodeList optional, RawStringTable& entities)
at U8Xml.XmlParser.StartStateMachine(RawString data, CustomList`1 nodes, CustomList`1 attrs, OptionalNodeList optional, RawStringTable& entities)
at U8Xml.XmlParser.ParseCore(UnmanagedBuffer& utf8Buf, Int32 length)
at U8Xml.XmlParser.ParseFileCore(String filePath, Encoding encoding)
at U8Xml.XmlParser.ParseFile(String filePath, Encoding encoding)
at U8Xml.XmlParser.ParseFile(String filePath)
Tested on U8XmlParser v1.5.0
According to the XML specs entities need to be registered, e.g. this is not valid XML:
<?xml version="1.0" encoding="UTF-8"?>
<SomeData>
  <Data>&copy;</Data>
</SomeData>
You have to register these entities:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SomeData[
<!ENTITY copy "©;">
]>
<SomeData>
  <Data>&copy;</Data>
</SomeData>
RawString.StartsWith and RawString.EndsWith treat any unpaired surrogate in the argument string as "�".
So...
[Fact]
public unsafe void UnpairedSurrogateComparison()
{
    // "\ufffd" == "�". It is the default fallback character for UTF8Encoding.
    const string FallbackCharStr = "\ufffd";
    // "\ud83d" is an unpaired high surrogate.
    const string SurrogateCharStr = "\ud83d";
    var fallbackCharUtf8Bytes = Encoding.UTF8.GetBytes(FallbackCharStr);
    fixed(byte* ptr = fallbackCharUtf8Bytes) {
        var fallbackCharRawStr = new RawString(ptr, fallbackCharUtf8Bytes.Length);
        Assert.False(fallbackCharRawStr.StartsWith(SurrogateCharStr));
        Assert.False(fallbackCharRawStr.EndsWith(SurrogateCharStr));
    }
}
This kind of test fails.
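One plausible explanation, assuming the string argument is converted with the default UTF-8 encoder before comparison (an assumption about the library's internals, not something confirmed here): the default encoder replaces an unpaired surrogate with U+FFFD, so both strings end up as the same bytes. The sketch below uses only System.Text; SurrogateFallbackSketch and Hex are illustrative names.

```csharp
using System;
using System.Text;

public static class SurrogateFallbackSketch
{
    // Formats bytes as hyphen-separated hex, e.g. "EF-BF-BD".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        // Default encoder: the lone high surrogate is replaced by U+FFFD.
        var lone = Encoding.UTF8.GetBytes("\ud83d");
        var replacement = Encoding.UTF8.GetBytes("\ufffd");
        Console.WriteLine(Hex(lone));         // EF-BF-BD
        Console.WriteLine(Hex(replacement));  // EF-BF-BD

        // A strict encoder throws instead of silently replacing.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetBytes("\ud83d");
            Console.WriteLine("strict encoder accepted the unpaired surrogate");
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("strict encoder rejected the unpaired surrogate");
        }
    }
}
```

If that assumption holds, comparing with a throwing encoder, or treating invalid UTF-16 input as "no match", would be two ways to make the test above pass.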
This is just another example for an invalid XML that is accepted by this library:
<?xml version="1.0" encoding="UTF-8"?>
<SomeData>
<Foo url="http://google.com?quer1=1&query2=2"></Foo>
</SomeData>
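For comparison, a strict parser rejects both this document and the earlier one with the undeclared &copy; entity. The sketch below uses XDocument from the .NET base class library purely as a reference point; IsWellFormed is a hypothetical helper name and nothing here is a U8XmlParser API.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class StrictParseSketch
{
    // Hypothetical helper: true when XDocument accepts the text,
    // false when it raises an XmlException.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Reference to an entity that was never declared.
        Console.WriteLine(IsWellFormed("<SomeData><Data>&copy;</Data></SomeData>"));

        // Raw '&' inside an attribute value.
        Console.WriteLine(IsWellFormed(
            "<SomeData><Foo url=\"http://google.com?quer1=1&query2=2\"></Foo></SomeData>"));

        // Same attribute with the ampersand escaped as &amp;.
        Console.WriteLine(IsWellFormed(
            "<SomeData><Foo url=\"http://google.com?quer1=1&amp;query2=2\"></Foo></SomeData>"));
    }
}
```

The first two calls report false and the third reports true, which is the behavior a strict mode in the library could mirror.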
Describe the bug:
Intermittent bug when trying to get an attribute from a node while enumerating an IEnumerable<XmlNode> from XmlNodeDescendantList.
Our code for reference:
var type = _root.Descendants
    .FirstOrDefault(node => node.Name != "xs:attribute" && GetAttribute<string>(node, "name") == inheritedTypeName)
    is { IsNull: false } type
    ? Visit(type) with { Name = name ?? inheritedTypeName }
    : null;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static T? GetAttribute<T>(XmlNode? node, string name)
{
    if (node is null || !node.Value.TryFindAttribute(name, out var attribute)) // fails here
        return default;
    var value = attribute.Value.ToString();
    if (string.IsNullOrWhiteSpace(value) || value is "unbounded")
        return default;
    return (T?)TypeDescriptor.GetConverter(typeof(T)).ConvertFromString(value);
}
Environment:
library version: 1.6.1
.NET version: .NET6 6.0.13
OS: Windows10
Steps to Reproduce:
Call node.TryFindAttribute(<string value>, out var attribute)
Expected behavior:
Return the attribute
Actual Behavior:
System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at U8Xml.XmlAttributeEnumerableExtension.FindOrDefault[[U8Xml.XmlAttributeList, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](U8Xml.XmlAttributeList, System.ReadOnlySpan`1<Byte>)
at U8Xml.XmlAttributeEnumerableExtension.FindOrDefault[[U8Xml.XmlAttributeList, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](U8Xml.XmlAttributeList, System.ReadOnlySpan`1<Char>)
at U8Xml.XmlAttributeEnumerableExtension.FindOrDefault[[U8Xml.XmlAttributeList, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](U8Xml.XmlAttributeList, System.String)
at U8Xml.XmlAttributeEnumerableExtension.TryFind[[U8Xml.XmlAttributeList, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](U8Xml.XmlAttributeList, System.String, U8Xml.XmlAttribute ByRef)
at U8Xml.XmlNode.TryFindAttribute(System.String, U8Xml.XmlAttribute ByRef)
at CMA.Common.Xml.Validation.Xsd.XsdParser.GetAttribute[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.Nullable`1<U8Xml.XmlNode>, System.String)
at CMA.Common.Xml.Validation.Xsd.XsdParser+<>c__DisplayClass12_0.<VisitAttribute>b__0(U8Xml.XmlNode)
at System.Linq.Enumerable.TryGetFirst[[U8Xml.XmlNode, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](System.Collections.Generic.IEnumerable`1<U8Xml.XmlNode>, System.Func`2<U8Xml.XmlNode,Boolean>, Boolean ByRef)
at System.Linq.Enumerable.FirstOrDefault[[U8Xml.XmlNode, U8XmlParser, Version=1.6.1.0, Culture=neutral, PublicKeyToken=null]](System.Collections.Generic.IEnumerable`1<U8Xml.XmlNode>, System.Func`2<U8Xml.XmlNode,Boolean>)
Your example was just throwing a FormatException.
A question mark is missing at the end of the XML declaration, and you just get a FormatException without any explanation. It is hard to find issues like that.
<?xml version="1.0" encoding="UTF-8">
<SomeData>
<Data aa="20">bbb</Data>
<Data aa="30">ccc</Data>
</SomeData>
It would be great to have an (optional) position (line number + column) for each element.
Hey @ikorin24 ,
What an amazing job, you're working with unmanaged code. =)
A question: what would be the most suitable way to append/concatenate RawStrings? I haven't found any function that could already do this.
Thanks,
Cheers.
Hi, the idea is to use this XmlReader to deserialize entire POCO objects using source generators for .NET 5+.