rmzelle commented on May 10, 2024

I can only reproduce this on Windows with Google Chrome, by the way. I don't get the error when submitting a file on OS X with Firefox or Chrome.

from validator.

rmzelle commented on May 10, 2024

Another user can reproduce this on Windows as well, while things work correctly on Linux. See citation-style-language/csl-validator#6 (comment)

sideshowbarker commented on May 10, 2024

We always use these settings: https://validator.nu/?schema=https%3A%2F%2Fgithub.com%2Fcitation-style-language%2Fschema%2Fraw%2Fv1.0.1%2Fcsl.rnc&parser=xml&laxtype=yes&showsource=yes
but recently (somewhere since November) the File Upload option stopped working. It now gives the error: IO Error: Non-XML Content-Type: text/x-csl., so it looks like the "Be lax about HTTP Content-Type" check-box (laxtype=yes) is being ignored.
I can only reproduce this on Windows with Google Chrome, by the way. I don't get the error when submitting a file on OS X with Firefox or Chrome.
Another user can reproduce this on Windows as well, while things work correctly on Linux. See citation-style-language/csl-validator#6

I suspect this is caused by some JavaScript in the validator frontend code failing to execute as expected.

So, if possible, to help troubleshoot this, can you please open the console in Chrome's devtools feature (e.g., by right-clicking and choosing "Inspect Element") and check if it's reporting any JavaScript errors?

I'd expect it might be reporting an error from https://validator.nu/script.js somewhere either in lines 272-292 or in lines 311-319.

Looking at those lines you'll see functions named formSubmission & maybeMoveDocumentRowDown.

The maybeMoveDocumentRowDown function is a hack that—if it works as expected when you check a document using the "file upload" feature—ensures that during form submission, in the multipart/form-data HTTP POST request that gets sent with the uploaded file, the file field is the last field sent.

But, if the laxtype=yes field ends up included in the request after the file field instead of before it, then I think the result would be pretty much what you've reported here: It'd fail because the validator backend basically won't properly recognize the laxtype=yes field as having been sent in the request at all.
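To make the ordering requirement concrete, here is a minimal sketch that models the form as an ordered list of [name, value] pairs and moves the file field to the end. The function name moveFileFieldLast is hypothetical; the real frontend moves a DOM row rather than reordering an array.

```javascript
// Hypothetical helper (not the actual script.js code): reorders form fields so
// that the "file" field is sent after every other field, which is what
// maybeMoveDocumentRowDown is described as ensuring.
function moveFileFieldLast(fields) {
  // Non-file fields keep their original relative order.
  const others = fields.filter(([name]) => name !== "file");
  const fileFields = fields.filter(([name]) => name === "file");
  return [...others, ...fileFields];
}

// Example: laxtype ended up after the file field.
const submitted = [
  ["schema", "https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc"],
  ["file", "<style/>"],
  ["laxtype", "yes"],
];
console.log(moveFileFieldLast(submitted).map(([name]) => name).join(", "));
// → schema, laxtype, file
```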

rmzelle commented on May 10, 2024

I'm not getting any output in the dev console (in Chrome or Firefox) when I get the validation error. Is the frontend for validator.nu under version control?

rmzelle commented on May 10, 2024

(also, we're using a custom frontend for validator.nu at http://validator.citationstyles.org/; if you're interested, I'd be happy to help you switch to a Bootstrap-based frontend)

sideshowbarker commented on May 10, 2024

I'm not getting any output in the dev console (in Chrome or Firefox) when I get the validation error. Is the frontend for validator.nu under version control?

Yes. The JS code is at https://github.com/validator/validator/blob/master/site/script.js

sideshowbarker commented on May 10, 2024

(also, we're using a custom frontend for validator.nu at http://validator.citationstyles.org/;

If your custom frontend code isn't causing the file field to be sent as the last field in the multipart/form-data HTTP POST request that gets sent when checking file upload, then I think that's what's causing the problem you reported. Basically, you need to ensure that the part of the form that sets the file field is the last part of the form in document order—either by putting it last in document order in your form, or by using DOM manipulation to move it last before the form actually gets submitted.

if you're interested, I'd be happy to help you switch to a Bootstrap-based frontend

Thanks but no thanks 😄 I'm pretty sure @hsivonen would agree we don't want to switch.

rmzelle commented on May 10, 2024

The problem occurs both at https://validator.nu/ and at our custom frontend. Our users will use one or the other website, so it would still be good to have it fixed on your site as well.

Any ideas why the OS seems to play a role here?

rmzelle commented on May 10, 2024

Based on your suggestion, I tried editing our frontend, but I'm still getting an error.

The original code (see https://github.com/citation-style-language/csl-validator/blob/gh-pages/libraries/csl-validator.js#L144 ), which already defined the "laxtype" field first:

        var formData = new FormData();
        formData.append("schema", schemaURL);
        formData.append("parser", "xml");
        formData.append("laxtype", "yes");
        formData.append("level", "error");
        formData.append("out", "json");
        formData.append("showsource", "yes");

        if (sourceMethod == "textarea") {
            formData.append("content", documentContent);
        } else {
            formData.append("file", documentContent);
        }

gives "Non-XML Content-Type: “text/x-csl”."

The modified code

        var formData = new FormData();
        formData.append("schema", schemaURL);
        formData.append("parser", "xml");
        formData.append("level", "error");
        formData.append("out", "json");
        formData.append("showsource", "yes");

        if (sourceMethod == "textarea") {
            formData.append("content", documentContent);
        } else {
            formData.append("file", documentContent);
        }
        formData.append("laxtype", "yes");

gives "Non-XML Content-Type: “text/plain”."

sideshowbarker commented on May 10, 2024

The problem occurs both at https://validator.nu/ and at our custom frontend. Our users will use one or the other website, so it would still be good to have it fixed on your site as well.

From what I've seen so far, I doubt there's any way to fix this either in the code for the https://validator.nu/ frontend or in the code for your custom frontend. In both cases the request the frontend code is sending is formatted in the way the validator backend expects it. So it seems like the cause of the problem is that the request is getting mangled somewhere after that.

Any ideas why the OS seems to play a role here?

No ideas. When I saw the report saying it was happening only in Chrome on Windows, I first thought it was probably just a browser bug in that version of Chrome. But then I read the comment at citation-style-language/csl-validator#6 where @gracile-fr said:

I have this issue on Win7 with both Firefox and chrome. On Ubuntu 12.04 LTS with Firefox, no problem.

…which would seem to indicate it's not a browser-specific bug.

@rmzelle Have you tried it with Firefox on Windows yourself?

sideshowbarker commented on May 10, 2024

Based on your suggestion, I tried editing our frontend, but I'm still getting an error.

The original code (see https://github.com/citation-style-language/csl-validator/blob/gh-pages/libraries/csl-validator.js#L144 ), which already defined the "laxtype" field first:

        var formData = new FormData();
        formData.append("schema", schemaURL);
        formData.append("parser", "xml");
        formData.append("laxtype", "yes");
        formData.append("level", "error");
        formData.append("out", "json");
        formData.append("showsource", "yes");

        if (sourceMethod == "textarea") {
            formData.append("content", documentContent);
        } else {
            formData.append("file", documentContent);
        }

That all looks right, in that I think it will format the request in the way the validator backend expects, with the file field last.

gives "Non-XML Content-Type: “text/x-csl”."

The modified code

        var formData = new FormData();
        formData.append("schema", schemaURL);
        formData.append("parser", "xml");
        formData.append("level", "error");
        formData.append("out", "json");
        formData.append("showsource", "yes");

        if (sourceMethod == "textarea") {
            formData.append("content", documentContent);
        } else {
            formData.append("file", documentContent);
        }
        formData.append("laxtype", "yes");

Yeah that's wrong and I would expect it to fail. Sorry if I wasn't clear, but I wasn't suggesting that laxtype=yes should be last. It shouldn't be. The file field should be (as it is in your existing code).

gives "Non-XML Content-Type: “text/plain”."

Yeah, that's actually exactly what I'd expect to see in this case if laxtype=yes is sent after the file field. Because in that case, the validator never properly sees laxtype=yes at all, so the first thing that fails is the step on the validator backend where it goes to fetch the RelaxNG schema from https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc and fails with an error because the csl.rnc file is being served from that github URL with a Content-Type: text/plain header instead of with the expected Content-Type: application/relax-ng-compact-syntax header.

(The "Non-XML" part of the error message is misleading here. It comes from a part of the code that's (re)used both to check for correct XML content-type headers and to check for the correct application/relax-ng-compact-syntax content type for RNC files. I should probably change that code to emit an "Unexpected Content-Type: …" message instead of "Non-XML Content-Type: …".)
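A minimal sketch of that reuse, assuming a single check that takes the list of accepted types as a parameter. checkContentType and RNC_TYPES are hypothetical names, and this is not the validator's Java code; it only shows why a neutral message reads correctly whether a document or an RNC schema is being checked.

```javascript
// Hypothetical shared check: one function, called with different accepted
// types for documents (XML types) and for RNC schemas.
function checkContentType(contentTypeHeader, acceptedTypes) {
  // Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
  const type = contentTypeHeader.split(";")[0].trim().toLowerCase();
  if (acceptedTypes.includes(type)) {
    return { ok: true, type };
  }
  // "Unexpected" rather than "Non-XML", since the expected type may not be XML.
  return { ok: false, type, message: `Unexpected Content-Type: "${type}".` };
}

const RNC_TYPES = ["application/relax-ng-compact-syntax"];

// An RNC schema served as text/plain fails the schema check:
console.log(checkContentType("text/plain; charset=utf-8", RNC_TYPES).message);
// → Unexpected Content-Type: "text/plain".
```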

sideshowbarker commented on May 10, 2024

@rmzelle @gracile-fr can you please examine the contents of the request payload using devtools in Windows Chrome—and if possible, Windows Firefox also, if it's failing for you there too.

In case you don't know how to do that already: for Chrome, you need to open the Network tab and then reload the page (to re-post the file-upload request); then, under the Headers tab, scroll down and there should be a "Request Payload" section that you can examine. When things are working correctly, its contents should look something like this:

------WebKitFormBoundary8eBByVrfDAq2eq5U
Content-Disposition: form-data; name="schema"

https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc
------WebKitFormBoundary8eBByVrfDAq2eq5U
Content-Disposition: form-data; name="laxtype"

yes
------WebKitFormBoundary8eBByVrfDAq2eq5U
Content-Disposition: form-data; name="file"; filename="apa.csl"
Content-Type: application/octet-stream

------WebKitFormBoundary8eBByVrfDAq2eq5U--

Specifically, the Content-Disposition: form-data; name="laxtype" part should be in there somewhere and overall it should end with that Content-Disposition: form-data; name="file"; filename="apa.csl" part.

And for Firefox devtools, you also need to open the Network tab and reload, then open the Params tab underneath that, which should look pretty similar:

Content-Type: multipart/form-data; boundary=---------------------------13292144091981675969704129660
Content-Length: 18415

-----------------------------13292144091981675969704129660
Content-Disposition: form-data; name="schema"

https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc
-----------------------------13292144091981675969704129660
Content-Disposition: form-data; name="laxtype"

yes
-----------------------------13292144091981675969704129660
Content-Disposition: form-data; name="file"; filename="apa.csl"
Content-Type: application/octet-stream
…
-----------------------------13292144091981675969704129660--

…with the main difference being that Firefox dumps out the entire contents of the apa.csl file there too (which I've elided with "…" above). But otherwise the thing to check for there too is that the Content-Disposition: form-data; name="laxtype" part should be in there somewhere and the payload should end with the part that contains Content-Disposition: form-data; name="file"; filename="apa.csl".
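If you copy the payload out of devtools, a small script can do the checking described above. This sketch assumes you have the payload text and the boundary value from the request's Content-Type header (Chrome's delimiter lines are "--" plus that boundary). partNames and checkPayload are hypothetical helpers, and the sample is the Chrome payload above with a placeholder body for apa.csl.

```javascript
// Lists the form-field names of a multipart/form-data body in the order they
// appear, so you can check that "laxtype" is present and "file" comes last.
function partNames(payload, boundary) {
  const names = [];
  // Parts are separated by "--<boundary>" delimiter lines; the leading empty
  // piece and the trailing "--" piece contain no Content-Disposition header.
  for (const part of payload.split("--" + boundary)) {
    const match = part.match(/Content-Disposition:\s*form-data;\s*name="([^"]*)"/i);
    if (match) names.push(match[1]);
  }
  return names;
}

function checkPayload(payload, boundary) {
  const names = partNames(payload, boundary);
  return {
    names,
    hasLaxtype: names.includes("laxtype"),
    fileIsLast: names.length > 0 && names[names.length - 1] === "file",
  };
}

const boundary = "----WebKitFormBoundary8eBByVrfDAq2eq5U";
const payload = [
  "--" + boundary,
  'Content-Disposition: form-data; name="schema"',
  "",
  "https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc",
  "--" + boundary,
  'Content-Disposition: form-data; name="laxtype"',
  "",
  "yes",
  "--" + boundary,
  'Content-Disposition: form-data; name="file"; filename="apa.csl"',
  "Content-Type: application/octet-stream",
  "",
  "<style/>", // placeholder for the file contents
  "--" + boundary + "--",
].join("\r\n");

console.log(JSON.stringify(checkPayload(payload, boundary)));
// → {"names":["schema","laxtype","file"],"hasLaxtype":true,"fileIsLast":true}
```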

sideshowbarker commented on May 10, 2024

@rmzelle @gracile-fr can you please also try with http://qa-dev.w3.org:8888/ instead of https://validator.nu/

Right now on the host that's serving http://qa-dev.w3.org:8888/ I have tcpdump running to log all the raw request packets. So if you try from http://qa-dev.w3.org:8888/ I should then be able to see the requests in exactly the same form the validator backend is seeing them there.

rmzelle commented on May 10, 2024

@rmzelle Have you tried it with Firefox on Windows yourself?

Yeah, it fails for me there as well.

rmzelle commented on May 10, 2024

I can't reproduce the error at http://qa-dev.w3.org:8888/ on Windows (both Firefox and Chrome validate files correctly). I noticed that this site isn't served over HTTPS. Could that be related to the issue? It's something that recently changed for validator.nu, and I'm sure file validation used to work until recently.

sideshowbarker commented on May 10, 2024

Please try again with http://qa-dev.w3.org:8888/ right now. Just a few minutes ago I made a change on the server side that I think should actually fix this.

rmzelle commented on May 10, 2024

I assume I already tested this with the fix in? http://qa-dev.w3.org:8888/ works (just tested again).

sideshowbarker commented on May 10, 2024

I assume I already tested this with the fix in?

Yeah, I think so.

Anyway, the cause isn't related to being served over https. Also, I realize now that despite what I wrote in #30 (comment) and #30 (comment), it has nothing to do with where the laxtype field appears in the request payload.

Instead, it seems to be caused by the fact that on Windows, for some reason, the request gets sent with a text/x-csl content-type for the uploaded file. Maybe there's some kind of OS-level mapping table of extensions to content types that the browsers end up consulting when they get the file from the filesystem.

Anyway, after examining the validator backend code further earlier today, I realize now that even when laxtype=yes is sent in the request as expected, the only non-XML content-types the validator will accept when checking XML documents are text/plain, text/html, and text/xsl.

So what I did to work around that is, I added .csl to the list of file extensions for uploaded files that the validator backend code will always treat as application/xml, regardless of the particular content-type that might be sent in a request along with the file contents.
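The rules described in the last two paragraphs can be sketched as a small decision function. This is a hypothetical model for illustration, not the validator backend's code: it covers only the rules named here and leaves out other special cases the backend has (such as the .svg, Atom, and DocBook handling mentioned later). Treating application/octet-stream as acceptable is an assumption, based only on the working payload earlier in this thread having been sent with that type.

```javascript
// Hypothetical sketch of the rules described above; names and structure are
// illustrative, not the validator backend's actual code.

// Non-XML types accepted for XML checking only when laxtype=yes.
const LAX_NON_XML_TYPES = ["text/plain", "text/html", "text/xsl"];

// Upload extensions always treated as application/xml, whatever type the
// browser sent (".csl" is the newly added entry).
const FORCE_XML_EXTENSIONS = [".csl"];

// Assumption: generic type browsers send for files they know nothing about.
const GENERIC_UPLOAD_TYPE = "application/octet-stream";

function isXmlType(type) {
  return type === "application/xml" || type === "text/xml" || type.endsWith("+xml");
}

// The type the checker acts on for an uploaded file.
function effectiveUploadType(filename, sentType) {
  const lower = filename.toLowerCase();
  if (FORCE_XML_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    return "application/xml";
  }
  return sentType.split(";")[0].trim().toLowerCase();
}

function acceptForXmlParsing(type, laxtype) {
  if (isXmlType(type) || type === GENERIC_UPLOAD_TYPE) return true;
  // laxtype widens the set only to a few known non-XML types; an
  // unregistered type such as text/x-csl is still rejected.
  return laxtype && LAX_NON_XML_TYPES.includes(type);
}

// Before the fix, a Windows upload of apa.csl arrived as text/x-csl:
console.log(acceptForXmlParsing("text/x-csl", true)); // false
// With the extension override, the same upload is treated as XML:
console.log(acceptForXmlParsing(effectiveUploadType("apa.csl", "text/x-csl"), true)); // true
```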

rmzelle commented on May 10, 2024

Thanks for fixing this!

But while I appreciate the fix, wouldn't it be better to just completely ignore the content-type? Is there a specific reason laxtype doesn't extend to uploaded files? With the current solution of white-listing a specific extension, users that have different extensions might still have issues (like ".svg", which is also XML: http://fileformats.wikia.com/wiki/Scalable_Vector_Graphics ).

sideshowbarker commented on May 10, 2024

But while I appreciate the fix, wouldn't it be better to just completely ignore the content-type?

No, we don't think that'd be better. The current design is intentional.

Is there a specific reason laxtype doesn't extend to uploaded files?

laxtype does extend to uploaded files already. The laxtype-controlled logic applies in exactly the same way to both uploaded files and to remote files provided through URLs. For uploaded files, we're actually even less strict in that if a content-type is provided with the file upload, we have some additional logic (beyond the laxtype-controlled logic) for ignoring that content-type and instead setting a content-type based on the file extension. That's what I did for .csl to work around the cause of this bug and in general what we can do for other cases where a bogus content-type is getting sent with the upload.

And note that in the remote-file-provided-by-URL case, if a bogus (unknown, unregistered) content-type like text/x-csl is sent in the headers of the response the validator gets when it fetches the file, the check will fail even if you have laxtype set. Because, basically, it's not the intent of the service to facilitate serving up documents over the Web with bogus content types. The right solution in that case is to fix the broken content type, not to just give up on trying to get it corrected. And if you can't fix that directly yourself, the workaround is to copy and paste the content into the text-field area.

Also, note that the laxtype option is purposefully called "laxtype" and not, say, "ignoretype", because the intent isn't to provide a way to just completely ignore all bad content types. Instead, the intent is just to provide some rational level of laxity for some somewhat-reasonable known cases.

With the current solution of white-listing a specific extension, users that have different extensions might still have issues (like ".svg", which is also XML: http://fileformats.wikia.com/wiki/Scalable_Vector_Graphics ).

The validator already has specific logic for ensuring that .svg files are handled as expected, as well as for some other known cases like Atom and DocBook files.

rmzelle commented on May 10, 2024

@sideshowbarker, okay, fair enough. Thanks for the explanation. Do you know if the content type "application/vnd.citationstyles.style+xml" would be considered valid by the validator? It looks like we can specify the content type in our POST request (http://stackoverflow.com/questions/2845459/jquery-how-to-make-post-use-contenttype-application-json/2845487#2845487), which should sidestep the need for you to treat ".csl" files as having the "application/xml" content type.
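The idea of setting the type ourselves can be sketched by adapting the csl-validator FormData code quoted earlier. Instead of letting the browser pick the part's Content-Type from the OS, the content is wrapped in a Blob with an explicit type. This assumes a runtime with global FormData and Blob (a browser, or Node 18+); buildFormData and CSL_MEDIA_TYPE are hypothetical names, and whether the validator accepts this type is a separate question.

```javascript
// Sketch: re-wrap the uploaded content so the multipart part's Content-Type
// is set by the page, not by whatever the OS registered for ".csl".
const CSL_MEDIA_TYPE = "application/vnd.citationstyles.style+xml";

function buildFormData(schemaURL, documentContent, filename) {
  const formData = new FormData();
  formData.append("schema", schemaURL);
  formData.append("parser", "xml");
  formData.append("laxtype", "yes");
  formData.append("level", "error");
  formData.append("out", "json");
  formData.append("showsource", "yes");
  // Blob parts may be strings or other Blobs/Files, so this works both for
  // pasted text and for a file the user picked; the new Blob carries our type.
  const body = new Blob([documentContent], { type: CSL_MEDIA_TYPE });
  // Appended last, as the validator frontend expects for the file field.
  formData.append("file", body, filename);
  return formData;
}

const fd = buildFormData(
  "https://github.com/citation-style-language/schema/raw/v1.0.1/csl.rnc",
  '<?xml version="1.0" encoding="utf-8"?><style xmlns="http://purl.org/net/xbiblio/csl"/>',
  "apa.csl"
);
console.log([...fd.keys()].join(", "));
console.log(fd.get("file").type);
```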

@dstillman, any thoughts? You can catch up quickly by reading the issue description at the top (#30 (comment)) and @sideshowbarker's solution above (#30 (comment)). In short, validator.nu doesn't like the "text/x-csl" content type that Windows provides for uploaded files. @sideshowbarker has introduced a fix for ".csl" files, but I'm wondering if there are any lessons to be learned for us here.

(see also http://sourceforge.net/p/xbiblio/mailman/message/29806809/ and zotero/styles-repo@1f9c126)

dstillman commented on May 10, 2024

Looks like Zotero Standalone on Windows is still registering text/x-csl for .csl files. We'll change that to application/vnd.citationstyles.style+xml in a future release.

sideshowbarker commented on May 10, 2024

Do you know if the content type "application/vnd.citationstyles.style+xml" would be considered valid by the validator?

Yes. Because the intent of the relevant validator logic for this is to treat anything served with an XML content type as being XML content. Specifically, the validator respects the RFC 7303 (RFC 3023) convention that anything served with a content type that has a +xml suffix should be considered XML content—that is, intended to be parseable as XML by any conforming XML parser.

The validator code that actually handles that is at https://github.com/validator/validator/blob/master/util/src/nu/validator/xml/ContentTypeParser.java#L115
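A minimal sketch of the convention described above: a media type counts as XML if it is application/xml or text/xml, or if its subtype ends in "+xml". isXmlMediaType is a hypothetical name; this illustrates the RFC 7303 rule and is not a port of ContentTypeParser.java.

```javascript
// Returns true if the media type is XML under the RFC 7303 conventions.
function isXmlMediaType(contentType) {
  // Strip parameters ("; charset=...") and normalize case.
  const essence = contentType.split(";")[0].trim().toLowerCase();
  const slash = essence.indexOf("/");
  if (slash <= 0) return false; // not a type/subtype pair
  const type = essence.slice(0, slash);
  const subtype = essence.slice(slash + 1);
  if ((type === "application" || type === "text") && subtype === "xml") return true;
  // The structured-syntax suffix: anything "+xml" is meant to be parseable as XML.
  return subtype.endsWith("+xml");
}

for (const ct of [
  "application/vnd.citationstyles.style+xml", // true
  "application/xml; charset=utf-8",           // true
  "text/x-csl",                               // false
  "application/relax-ng-compact-syntax",      // false
]) {
  console.log(ct, "->", isXmlMediaType(ct));
}
```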

see also http://sourceforge.net/p/xbiblio/mailman/message/29806809/

So in response to this statement:

It could also be application/vnd.citationstyles.style+xml, since RFC3023 recommends the use of "+xml" for XML-based media types. I don't think there's much to be gained from that, but there's also no particular reason not to do it.

My answer about what's to be gained from that is: Interoperability. Because along with the validator there are many other XML-aware tools which support the RFC 3023/7303 convention that anything served with a +xml media type is meant to be “parseable as XML by an XML parser”, whereas text/x-csl is not formally documented anywhere I can find as meaning that (or meaning anything at all, really).

sideshowbarker commented on May 10, 2024

Looks like Zotero Standalone on Windows is still registering text/x-csl for .csl files. We'll change that to application/vnd.citationstyles.style+xml in a future release.

Cool :-)

I recommend you also consider formally registering the application/vnd.citationstyles.style+xml media type with IANA so that IANA will add it to the Media Types registry http://www.iana.org/assignments/media-types/media-types.xhtml

It's a pretty painless process and doesn't take a lot of time to complete.

dstillman commented on May 10, 2024

My answer about what's to be gained from that is: Interoperability.

Yes, we adopted application/vnd.citationstyles.style+xml long ago — that post @rmzelle linked to is from 2.5 years ago, and I quoted the justification for using +xml from the RFC. The only issue is that the Zotero installer has some old code that's still registering the old type.

sideshowbarker commented on May 10, 2024

Yes, we adopted application/vnd.citationstyles.style+xml long ago — that post @rmzelle linked to is from 2.5 years ago, and I quoted the justification for using +xml from the RFC. The only issue is that the Zotero installer has some old code that's still registering the old type.

Ah ok, cool—sorry, I had just taken a quick look at it without bothering to notice the date.

rmzelle commented on May 10, 2024

I just tried renaming the ".csl" file to ".dsl", which is an unrecognized extension on my Windows 7 system, and validation worked fine with that. After that I uncovered that it's indeed Zotero that registered "text/x-csl" in my registry for ".csl":

HKEY_CLASSES_ROOT\.csl

Name            Data
(Default)       ZoteroCSL
Content Type    text/x-csl

sideshowbarker commented on May 10, 2024

After that I uncovered that it's indeed Zotero that registered "text/x-csl" in my registry for ".csl":

HKEY_CLASSES_ROOT\.csl

Name            Data
(Default)       ZoteroCSL
Content Type    text/x-csl

Aha. I didn't realize there was a way for an application to register a content type in Windows that way (and then to have that affect the handling of file uploads in Web browsers running on the system).

dstillman commented on May 10, 2024

Yeah, we're really just registering the file extension for opening purposes. I guess the content type is a requirement of that. The web browser behavior is a totally unexpected side effect.

rmzelle commented on May 10, 2024

I submitted a registration request for "application/vnd.citationstyles.style+xml" by the way. We'll see how that goes. @dstillman, I couldn't find the code that does this (is it in the Zotero repo?). Will you open a ticket?

@sideshowbarker, you don't mind leaving in the fix, at least until we sort this out?

sideshowbarker commented on May 10, 2024

I submitted a registration request for "application/vnd.citationstyles.style+xml" by the way.

Nice. If you have any process problems with getting it accepted, please let me know—I'd be glad to help (I've registered media types successfully in the past).

@sideshowbarker, you don't mind leaving in the fix, at least until we sort this out?

I have no plans to remove the fix regardless, so no worries there.

rmzelle commented on May 10, 2024

@sideshowbarker, @dstillman, @fbennett, can I pick your brain about the registration (http://www.iana.org/form/media-types)? The reviewer has two main questions:

dstillman commented on May 10, 2024

I don't think we need to define it. These should be UTF-8, and if someone really needs them not to be, they can define the charset in the XML. Particularly since this isn't a text/* type, I don't see much point in allowing the charset parameter.

sideshowbarker commented on May 10, 2024

I don't think we need to define it. These should be UTF-8, and if someone really needs them not to be, they can define the charset in the XML.

Yep.

And actually maybe this registration should explicitly say that conforming instances of this media type must be encoded in UTF-8. There's no prohibition against a registration making that requirement. And if you're not aware of anybody actually producing non-UTF-8 Citation Style Language content and not aware of any use cases for doing it, then it makes a lot of sense to require UTF-8.

Particularly since this isn't a text/* type, I don't see much point in allowing the charset parameter.

RFC 7303/3023 provide a pretty detailed rationale for what the intended point is (or originally was), and they also say this:

   Media types following the naming convention '+xml' SHOULD define the
   charset parameter for consistency, since XML-generic processing by
   definition treats all XML MIME entities uniformly as regards
   character encoding information.

And so almost all XML media type registrations seem to follow that "should" requirement in the RFC.

But that said, there are exceptions like http://www.iana.org/assignments/media-types/application/vnd.apple.installer+xml which omit the charset param. And the RFC even specifically mentions the possibility of exceptions to that "should" requirement:

   However, there are some cases that the charset parameter need not be defined.
   For example:

      When an XML-based media type is restricted to UTF-8, it is not
      necessary to define the charset parameter.  UTF-8 is the default
      for XML.

      When an XML-based media type is restricted to UTF-8 and UTF-16, it
      might not be unreasonable to omit the charset parameter.  Neither
      UTF-8 nor UTF-16 require XML encoding declarations.

So I guess what I'd suggest you guys consider doing is either what I asked about above (have the registration state that this media type is restricted to UTF-8) or else have the Optional Parameters section of the registration just say something like the following:

Optional parameters : none.
Parsed entities of this media type which are stored in an encoding other than UTF-8 or UTF-16 must
begin with a text declaration containing an encoding declaration, following the requirements specified in the "Character Encoding in Entities" section of [XML] http://www.w3.org/TR/xml/
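For the first option (restricting the type to UTF-8), a conformance check could be as simple as reading the document's encoding declaration. This is a hypothetical sketch, not part of the registration or of any tool mentioned here: it works on already-decoded text, only looks at the XML declaration, and does not sniff bytes the way a real XML parser does.

```javascript
// Returns the value of encoding="..." from the XML declaration, or null.
function declaredEncoding(xmlText) {
  // Drop a leading BOM character; the declaration, if present, must start the document.
  const text = xmlText.replace(/^\uFEFF/, "");
  const m = text.match(/^<\?xml\s[^?]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/);
  return m ? m[1] : null;
}

// Checks a "restricted to UTF-8" rule against the declaration only.
function isUtf8OnlyConformant(xmlText) {
  const enc = declaredEncoding(xmlText);
  // No encoding declaration: XML's default (UTF-8) applies.
  return enc === null || enc.toUpperCase() === "UTF-8";
}

console.log(isUtf8OnlyConformant('<?xml version="1.0" encoding="utf-8"?><style/>'));      // true
console.log(isUtf8OnlyConformant('<?xml version="1.0" encoding="ISO-8859-1"?><style/>')); // false
console.log(isUtf8OnlyConformant("<style/>"));                                            // true
```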

sideshowbarker commented on May 10, 2024

whether the encoding should be 7-bit text, 8-bit text, or binary

I don't think this registration needs to explicitly say anything about 7-bit vs 8-bit vs binary; you can instead just have the Encoding Considerations section of the registration say this:

Encoding Considerations
Same as encoding considerations of application/xml as specified in RFC 7303.

rmzelle commented on May 10, 2024

The registration just went through: http://www.iana.org/assignments/media-types/application/vnd.citationstyles.style+xml

Thanks both!

sideshowbarker commented on May 10, 2024

The registration just went through: http://www.iana.org/assignments/media-types/application/vnd.citationstyles.style+xml

rock & roll

Thanks for taking time to submit the registration and follow through on it

rmzelle commented on May 10, 2024

@sideshowbarker, is 46de9bb already supposed to be live at https://validator.nu/? It doesn't seem to be.

sideshowbarker commented on May 10, 2024

@sideshowbarker, is 46de9bb already supposed to be live at https://validator.nu/? It doesn't seem to be.

It's not live at https://validator.nu/ yet and you probably need to expect that it'll be a while before it is. It typically takes some time before changes get pushed to there. I'll ping @hsivonen about it.

But that said, I think you guys are probably going to be happier going forward if you set up your own instance of the validator. It's not all that hard to do; see https://validator.github.io/validator/#build-instructions

rmzelle commented on May 10, 2024

I think you guys are probably going to be happier going forward if you set up your own instance of the validator

Our validator has low traffic (<500 visits/month), and by piggybacking on the validator API and GitHub Pages we don't have to host anything ourselves, so I like the current arrangement. I'll keep the option in mind, though. Thanks!

sideshowbarker commented on May 10, 2024

Our validator has low traffic (<500 visits/month), and by piggybacking on the validator API and GitHub Pages we don't have to host anything ourselves, so I like the current arrangement.

Fair enough. FWIW http://validator.w3.org/nu/ exposes the same REST API and I push updates to it relatively often (~weekly) and I think it should work for you as expected. If not, let me know and I can spend some time figuring out how to make it work as you need.

It’s true it doesn’t itself provide a frontend Web UI with the same options that https://validator.nu/ does but as far as I understand, you guys don’t need that frontend.

rmzelle commented on May 10, 2024

Will http://validator.w3.org/nu/ move to using HTTPS as well?

sideshowbarker commented on May 10, 2024

At some point, yes, definitely, but it's not a high priority right now.

rmzelle commented on May 10, 2024

When I use "http://validator.w3.org/nu/" instead of "https://validator.nu/" for my GET/POST request I get "Oops. That was not supposed to happen. A bug manifested itself in the application internals. Unable to continue. Sorry. The admin was notified."

Is there API documentation for http://validator.w3.org/nu/?

sideshowbarker commented on May 10, 2024

As far as the API documentation goes: if it's working as expected, it should work exactly the same as https://validator.nu/ does. The fact that it's not working the same indicates I've not got something configured the way it should be.
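Since the expectation is that both services take the same parameters, one way to compare them is to build the identical GET request for each base URL. This minimal sketch uses the parameters from the settings and FormData code quoted earlier in this thread (schema, parser, laxtype, level, out) plus doc for the document URL; buildCheckUrl is a hypothetical helper. Several schema URLs go in one space-separated schema value, matching the doc/schema values shown in the server log later in this thread.

```javascript
// Builds a GET URL asking the service to check the document at docUrl.
function buildCheckUrl(base, docUrl, schemaUrls) {
  const url = new URL(base);
  url.searchParams.set("doc", docUrl);
  // Multiple schemas are passed as one space-separated value.
  url.searchParams.set("schema", schemaUrls.join(" "));
  url.searchParams.set("parser", "xml");
  url.searchParams.set("laxtype", "yes");
  url.searchParams.set("level", "error");
  url.searchParams.set("out", "json");
  return url.toString();
}

const doc = "https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl";
const schemas = ["https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc"];

// Same request against both hosts; only the base URL differs.
for (const base of ["https://validator.nu/", "http://validator.w3.org/nu/"]) {
  console.log(buildCheckUrl(base, doc, schemas));
}
```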

So to help me troubleshoot it, can you please try using "http://qa-dev.w3.org:8888" instead and let me know if you get the same “Oops” message back?

rmzelle commented on May 10, 2024

Same message. I also get the error "Could not compile stylesheet", by the way.

sideshowbarker commented on May 10, 2024

OK that helps. I’ll take a look at the logs on the qa-dev host and get back to you later.

sideshowbarker commented on May 10, 2024

Well first thing I see in the logs is, java.lang.RuntimeException: Namespace for prefix 'cs' has not been declared

rmzelle commented on May 10, 2024

We have namespace cs = "http://purl.org/net/xbiblio/csl" in our RNC schema.

https://github.com/citation-style-language/schema/blob/v1.0.1/csl.rnc#L3

sideshowbarker commented on May 10, 2024

yeah I think it's complaining about https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch but I see xmlns:cs="http://purl.org/net/xbiblio/csl" in there so I don't understand what the problem is.

FWIW the relevant part of the stack trace looks like this:

nu.validator.xml.PrudentHttpEntityResolver - https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch
Warning:  org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Compiler warnings:
  WARNING:  'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.'
Warning:  com.thaiopensource.validate.schematron.SchemaReaderImpl$TransformStage: http://javax.xml.XMLConstants/property/accessExternalDTD
Warning:  com.thaiopensource.validate.schematron.SchemaReaderImpl$TransformStage: http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit
nu.validator.servlet.VerifierServletTransaction - RuntimeException, doc: https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl schema: https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch lax: true
java.lang.RuntimeException: Namespace for prefix 'cs' has not been declared.
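
Schematron schemas conventionally declare the prefixes used in their XPath expressions with `<ns prefix="…" uri="…"/>` elements, which are separate from ordinary `xmlns:` declarations. The thread later traces this error to Xalan, so the following is only a quick diagnostic sketch that lists both kinds of declaration in a schema file. `declared_prefixes` is a hypothetical helper, `SAMPLE` is a toy document (not the real csl.sch), and the Schematron namespace constant is the one reported for csl.sch later in this thread; an ISO Schematron file would need a different constant.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of csl.sch as reported in the logs in this thread (Schematron 1.5).
SCHEMATRON_NS = "http://www.ascc.net/xml/schematron"


def declared_prefixes(xml_bytes):
    """Return (xmlns declarations, Schematron <ns> declarations) as prefix->URI dicts."""
    xmlns, sch_ns = {}, {}
    events = ("start-ns", "start")
    for event, item in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            prefix, uri = item  # the default namespace is reported as prefix ""
            xmlns[prefix] = uri
        elif item.tag == "{%s}ns" % SCHEMATRON_NS:
            sch_ns[item.get("prefix")] = item.get("uri")
    return xmlns, sch_ns


SAMPLE = b"""<schema xmlns="http://www.ascc.net/xml/schematron"
        xmlns:cs="http://purl.org/net/xbiblio/csl">
  <ns prefix="cs" uri="http://purl.org/net/xbiblio/csl"/>
</schema>"""

print(declared_prefixes(SAMPLE))
```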

sideshowbarker avatar sideshowbarker commented on May 10, 2024

I just now made a change on the http://qa-dev.w3.org:8888 instance, so can you please retry your GET/POST there?

And if it doesn't work can you please open http://qa-dev.w3.org:8888 in a browser and try it manually from the frontend there?
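
For the POST variant, the Nu checker also accepts the document itself as the request body, with its media type given in the `Content-Type` header. A minimal sketch using only Python's standard library that builds the request without sending it; `build_post_request` is hypothetical, and labelling the body `application/xml` is an assumption that side-steps the server-supplied `text/x-csl` type discussed earlier in this issue.

```python
import urllib.request
from urllib.parse import urlencode


def build_post_request(base, document, schema, content_type="application/xml; charset=utf-8"):
    """Build, but do not send, a POST that uploads `document` as the request body."""
    query = urlencode({"schema": schema, "parser": "xml", "laxtype": "yes", "out": "json"})
    if not base.endswith("/"):
        base += "/"
    return urllib.request.Request(
        base + "?" + query,
        data=document,  # raw bytes of the .csl file
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_post_request(
    "http://qa-dev.w3.org:8888",
    b'<style xmlns="http://purl.org/net/xbiblio/csl"/>',
    "https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req).read()
```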

rmzelle avatar rmzelle commented on May 10, 2024

Yeah, it was definitely https://github.com/citation-style-language/schema/blob/master/csl.sch. Removing that schema fixed the issue (with http://qa-dev.w3.org:8888/).

I get the same error in the front-end.

The .sch schema was generated with the RNG2Schtrn.xsl XSLT style sheet based on the embedded Schematron rules in our .rnc schema, and Jing never complains about it.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

> Yeah, it was definitely https://github.com/citation-style-language/schema/blob/master/csl.sch. Removing that schema fixed the issue (with http://qa-dev.w3.org:8888/).
>
> I get the same error in the front-end.

OK. But you're saying it does work if you use the same files with https://validator.nu/ instead? If so, then either I broke something recently or the Java environment on validator.nu is somehow different.

> The .sch schema was generated with the RNG2Schtrn.xsl XSLT style sheet based on the embedded Schematron rules in our .rnc schema, and Jing never complains about it.

Yeah, from just eyeballing it, at least, I can't see any problems with it.

rmzelle avatar rmzelle commented on May 10, 2024

> But you're saying it does work if you use the same files with https://validator.nu/ instead?

Yes.

rmzelle avatar rmzelle commented on May 10, 2024

Compare

https://validator.nu/?doc=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language%2Fstyles%2Fmaster%2Fapa.csl&schema=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language%2Fschema%2Fmaster%2Fcsl.sch&parser=xml&laxtype=yes (pass)

and

http://qa-dev.w3.org:8888/?doc=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language%2Fstyles%2Fmaster%2Fapa.csl&schema=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language%2Fschema%2Fmaster%2Fcsl.sch&parser=xml&laxtype=yes (fail)
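
A quick way to confirm that these two URLs differ only in scheme and host is to split and decode them. A small sketch with the standard library (`split_check_url` is a hypothetical name; the URL strings are copied from this comment):

```python
from urllib.parse import parse_qs, urlsplit

PASS_URL = (
    "https://validator.nu/?doc=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language"
    "%2Fstyles%2Fmaster%2Fapa.csl&schema=https%3A%2F%2Fraw.githubusercontent.com"
    "%2Fcitation-style-language%2Fschema%2Fmaster%2Fcsl.sch&parser=xml&laxtype=yes"
)
FAIL_URL = (
    "http://qa-dev.w3.org:8888/?doc=https%3A%2F%2Fraw.githubusercontent.com%2Fcitation-style-language"
    "%2Fstyles%2Fmaster%2Fapa.csl&schema=https%3A%2F%2Fraw.githubusercontent.com"
    "%2Fcitation-style-language%2Fschema%2Fmaster%2Fcsl.sch&parser=xml&laxtype=yes"
)


def split_check_url(url):
    """Return (scheme, host, decoded query parameters) for a checker URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parse_qs(parts.query)


pass_scheme, pass_host, pass_params = split_check_url(PASS_URL)
fail_scheme, fail_host, fail_params = split_check_url(FAIL_URL)
print(pass_scheme, pass_host, "vs", fail_scheme, fail_host)
print("same parameters:", pass_params == fail_params)
```

Since the decoded parameters are identical, the pass/fail difference has to come from the server side.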

sideshowbarker avatar sideshowbarker commented on May 10, 2024

So yeah, I'm not sure what's going on, but it seems like it might be related to the fact that the JVM on qa-dev is newer than the one on validator.nu, and I'm seeing things like this:

https://issues.apache.org/jira/browse/RAT-158?focusedCommentId=14102148&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14102148

> newer JDKs are setting properties that Xerces don't know about. I had a look in the latest release of Xerces and the problem is still there, so upgrading to that version will not work.

Will poke around some more to see what else I can find.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

OK, I think I may have tracked down the cause and figured out how to fix it. It appears to be a bug in Xalan, and we can avoid it by explicitly pointing Java to Saxon instead, using:

-Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl

I'll make the change and push it to the W3C validator so you can retry there.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

Pushed e027c69 to http://validator.w3.org/nu/

So please try again now at http://validator.w3.org/nu/ and I think you should find that it works as expected for you. If not, lemme know.

rmzelle avatar rmzelle commented on May 10, 2024

Now I'm getting

XMLHttpRequest cannot load http://validator.w3.org/nu/. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'null' is therefore not allowed access.

(didn't see that before)

rmzelle avatar rmzelle commented on May 10, 2024

Maybe relevant: http://stackoverflow.com/a/10143166/1712389

rmzelle avatar rmzelle commented on May 10, 2024

Oh, and http://validator.w3.org/nu/ now shows "Excessive traffic pattern blocked". I didn't click the validate button that much :P.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

> Oh, and http://validator.w3.org/nu/ now shows "Excessive traffic pattern blocked". I didn't click the validate button that much :P.

Oh. I think that means you're going to remain blocked for at least 2 hours. :(

sideshowbarker avatar sideshowbarker commented on May 10, 2024

Yeah, the "No 'Access-Control-Allow-Origin' header is present on the requested resource" message is from your browser, but I don't understand why you'd be seeing it, since we do actually already send an Access-Control-Allow-Origin: * header on all responses.

rmzelle avatar rmzelle commented on May 10, 2024

> we do actually already send an Access-Control-Allow-Origin: * header on all responses.

Well, maybe not on your 503s :).
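
Browsers apply the CORS check to every response, error statuses included, so a 503 page served without an `Access-Control-Allow-Origin` header surfaces in script as the CORS error quoted above rather than as a 503. A rough sketch of that origin check for a simple request without credentials (`browser_would_allow` is a hypothetical helper; real browsers do more, such as preflight handling). The default origin `"null"` matches the one in the browser message above.

```python
def browser_would_allow(headers, origin="null"):
    """Approximate the CORS origin check for a simple, non-credentialed request."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    if allowed is None:
        return False  # e.g. an error page served without the header
    return allowed == "*" or allowed == origin


print(browser_would_allow({"Access-Control-Allow-Origin": "*"}))  # True
print(browser_would_allow({"Content-Type": "text/html"}))  # False
```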

rmzelle avatar rmzelle commented on May 10, 2024

I seemed to get a correct validation once or twice before I got blocked, though, so I'm pretty sure your fix worked.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

Ah yeah :-)

If you let me know your IP address I can try to ask the W3C systems team to unblock you. Otherwise, you probably just need to try again in 2-3 hours.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

> I seemed to get a correct validation once or twice before I got blocked, though, so I'm pretty sure your fix worked.

Ah OK, good. Yeah, as far as I can see, it should be fine once the block disappears.

rmzelle avatar rmzelle commented on May 10, 2024

I'll wait.

sideshowbarker avatar sideshowbarker commented on May 10, 2024

All right, I need to head off to sleep anyway, so I'll check back here sometime tomorrow.

rmzelle avatar rmzelle commented on May 10, 2024

I bypassed the block via VPN. Now I get an empty response though:

{"url":"https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl","messages":[],"source":{"code":""}}

Submitting the exact same request to validator.nu produces the correct output:

{"url":"https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl","messages":[{"type":"error","lastLine":5,"lastColumn":17,"firstColumn":5,"message":"Unknown element “title-short” from namespace “http://purl.org/net/xbiblio/csl” not allowed as child of element “info” from namespace “http://purl.org/net/xbiblio/csl”.","extract":"itle>\n    <title-short>APA</t","hiliteStart":10,"hiliteLength":13},{"type":"error","lastLine":31,"lastColumn":69,"firstColumn":5,"message":"Attribute “license” not allowed on element “rights” from namespace “http://purl.org/net/xbiblio/csl” at this point.","extract":"ated>\n    <rights license=\"http://creativecommons.org/licenses/by-sa/3.0/\">This w","hiliteStart":10,"hiliteLength":65},{"type":"error","lastLine":191,"lastColumn":61,"firstColumn":11,"message":"Bad value “version” for attribute “term” on element “text” from namespace “http://purl.org/net/xbiblio/csl”.","extract":"          <text term=\"version\" text-case=\"capitalize-first\"/>\n     ","hiliteStart":10,"hiliteLength":51},{"type":"error","lastLine":506,"lastColumn":129,"firstColumn":3,"message":"Attribute “et-al-use-last” not allowed on element “bibliography” from namespace “http://purl.org/net/xbiblio/csl” at this point.","extract":"tation>\n  <bibliography hanging-indent=\"true\" et-al-min=\"8\" et-al-use-first=\"6\" et-al-use-last=\"true\" entry-spacing=\"0\" line-spacing=\"2\">\n    <","hiliteStart":10,"hiliteLength":127}],"source":{"type":"text/plain","encoding":"utf-8","code":"..."}}
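
When scripting against `out=json`, an empty report like the first one is easy to mistake for a clean pass. A minimal sketch of a check, assuming requests are made with `showsource=yes` (as in the settings used for this issue) so that a `source` object is echoed back; the `summarize` name and its "suspicious" heuristic are illustrative, not part of the validator's API:

```python
import json


def summarize(report_text):
    """Return (error count, suspicious) for a JSON report from the checker."""
    report = json.loads(report_text)
    messages = report.get("messages", [])
    errors = [m for m in messages if m.get("type") == "error"]
    # Heuristic: no messages *and* an echoed-but-empty source suggests the check
    # did not really run, rather than that the document is valid.
    source = report.get("source")
    suspicious = not messages and source is not None and source.get("code") == ""
    return len(errors), suspicious


EMPTY = ('{"url":"https://raw.githubusercontent.com/citation-style-language/styles/master/apa.csl",'
         '"messages":[],"source":{"code":""}}')
print(summarize(EMPTY))  # (0, True)
```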

rmzelle avatar rmzelle commented on May 10, 2024

And I get a proper error report for csl.rnc if I change the schema URL from

https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch

to

https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc

So it looks like csl.sch is still tripping up the validation at http://validator.w3.org/nu/.

Validating with

https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc https://raw.githubusercontent.com/citation-style-language/schema/v1.0/csl.rnc

gives the expected results, so it's not an issue with multiple schemas.
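
In the requests in this thread, several schema URLs travel space-separated in the single `schema` field, which is what makes dropping one schema at a time a one-line change. A sketch of how that field encodes and decodes with the standard library; the URL list is the one from these comments, and the variable names are illustrative:

```python
from urllib.parse import parse_qs, urlencode

SCHEMAS = [
    "https://raw.githubusercontent.com/citation-style-language/schema/v1.0.1/csl.rnc",
    "https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch",
]

# Join with spaces; urlencode() then encodes each space as "+".
query = urlencode({"schema": " ".join(SCHEMAS), "parser": "xml", "laxtype": "yes"})
print(query)

# Decoding the query recovers the original list.
decoded = parse_qs(query)["schema"][0].split(" ")
print(decoded == SCHEMAS)  # True

# Bisecting: drop the Schematron schema and rebuild the query.
without_sch = urlencode({"schema": " ".join(SCHEMAS[:1]), "parser": "xml", "laxtype": "yes"})
```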

sideshowbarker avatar sideshowbarker commented on May 10, 2024

Yeah, in the logs I'm seeing this:

org.xml.sax.SAXParseException; systemId: https://raw.githubusercontent.com/citation-style-language/schema/master/csl.sch; lineNumber: 2; columnNumber: 107; No implementation available for schema language with namespace URI “http://www.ascc.net/xml/schematron”.

…which means it's not actually doing the Schematron checking at all, because for some reason it can't find Saxon (despite the fact that I've got the classpath set up correctly as far as I can tell).

Anyway, will poke around again more later.
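
The log message identifies the schema language by the namespace URI of the schema's root element, and csl.sch uses the older Schematron 1.5 namespace rather than the ISO Schematron one (`http://purl.oclc.org/dsdl/schematron`). A small hypothetical helper, `schematron_flavor`, that reports which of the two a schema file uses:

```python
import io
import xml.etree.ElementTree as ET

SCHEMATRON_NAMESPACES = {
    "http://www.ascc.net/xml/schematron": "Schematron 1.5",
    "http://purl.oclc.org/dsdl/schematron": "ISO Schematron",
}


def schematron_flavor(xml_bytes):
    """Label the Schematron namespace of the root element, if it has one."""
    root = ET.parse(io.BytesIO(xml_bytes)).getroot()
    # ElementTree writes namespaced tags as "{uri}local-name".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    return SCHEMATRON_NAMESPACES.get(ns, "not Schematron (%s)" % (ns or "no namespace"))


print(schematron_flavor(b'<schema xmlns="http://www.ascc.net/xml/schematron"/>'))
# Schematron 1.5
```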

sideshowbarker avatar sideshowbarker commented on May 10, 2024

OK the cause turned out to be a simple classpath flub. Fixed now—please try again with http://validator.w3.org/nu/ one more time.

rmzelle avatar rmzelle commented on May 10, 2024

Yes, it works! Validation works with the .rnc and .sch schemas, and validating a .csl file with the text/x-csl media type no longer gives an error.

Thanks for all your time spent on this!

sideshowbarker avatar sideshowbarker commented on May 10, 2024

> Will http://validator.w3.org/nu/ move to using HTTPS as well?

It has now moved to HTTPS, with HSTS.