
CollateX, bug, now with e-mail (about collatex, OPEN)

kkynde commented on August 25, 2024
CollateX, bug, now with e-mail


Comments (2)

rhdekker commented on August 25, 2024

Hi Karsten,

Gregor Middell forwarded your example data files to me. I ran the collation and looked at the internal alignment result, which is represented as a variant graph and is independent of the chosen output format. It looks as follows:

[Image: alignment_result_colx1_colx2]

This means that CollateX finds only two points of variation: a ":" (W1) replaced by a "," (W2), and "night" (W1) replaced by "knight" (W2). This seems correct to me.

If you agree with this then the question becomes how that internal result should be represented in the requested output format.

In the TEI output there is a <app><rdg wit="w1">Now, It was a dark and stormy</rdg><rdg wit="w2">Now, it was a dark and stormy</rdg></app><app> reading, which is what I suspect your report is about. CollateX doesn't find a meaningful semantic difference here, but it notices a difference in casing: "it" versus "It". Changes in upper- and lowercasing are ignored by default during alignment, but we have to put them somewhere in the TEI to be able to reconstruct the original witnesses from the output. I think this is what causes the confusion, or in other words, the difference in expectations.
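To make the distinction concrete, here is a minimal sketch of aligning on a normalized form while keeping the original form of each token. It is not CollateX's algorithm: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in aligner, and the witness strings, the `tokenize` and `align` helpers, and the `t`/`n` key names are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def tokenize(text):
    """Split on whitespace; keep the original form (t) and a lowercased form (n)."""
    return [{"t": word, "n": word.lower()} for word in text.split()]

def align(w1, w2):
    """Align two token lists by comparing only the normalized forms."""
    a = [tok["n"] for tok in w1]
    b = [tok["n"] for tok in w2]
    return SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()

# Hypothetical, shortened witnesses (not the actual example files).
w1 = tokenize("Now, It was a dark and stormy night")
w2 = tokenize("Now, it was a dark and stormy knight")

# Collect only the spans the aligner reports as different.
variation = [
    (tag, [t["t"] for t in w1[i1:i2]], [t["t"] for t in w2[j1:j2]])
    for tag, i1, i2, j1, j2 in align(w1, w2)
    if tag != "equal"
]
print(variation)  # [('replace', ['night'], ['knight'])]
```

The aligner reports "It" and "it" as a match because their normalized forms are equal, yet the original forms still differ and have to appear somewhere if the output is to reproduce both witnesses.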

Before we discuss possible solutions: am I thinking in the right direction so far or is there something else that you wanted to bring to our attention with the example in your report?

Best,
Ronald


kkynde commented on August 25, 2024

Dear Ronald Dekker

Thank you very much for your rapid reply. You are indeed thinking in the right direction.

It does confirm my suspicion that a change of case is somehow a difference and somehow not.

Your graph is correct in the sense that the difference in case does not constitute a 'semantic difference'. Nevertheless, you have saved the 'not semantically different' version (It) somewhere. It is not represented in the graph (nor in the --format graphml output), but it is represented in the TEI output by two separate elements.

My problem is that the non-semantic difference (it vs. It) is this way mixed up with the truly invariant text surrounding it, which may be very extensive. I would have expected either (it and It are different)

Now, <app><rdg wit="w1">It</rdg><rdg wit="w2">it</rdg></app> was a dark and stormy

or (it and It are not different, consistent with the graph)

Now, it was a dark and stormy

I do take your point that the latter would prevent you from reconstructing the original witnesses. I also think the former is preferable (the element could be given the attribute type="notSemanticalDifference"), but you would not be able to construct it from your graph unless you run a recursive collation on the readings.
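As a rough illustration of the first alternative, the sketch below makes a second pass over the matched tokens and wraps only those whose original forms differ. It is not CollateX code: `difflib.SequenceMatcher` stands in for the aligner, the `app` and `to_tei` helpers and the witness strings are hypothetical, and the `type` value is simply the one suggested above.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import escape

def app(r1, r2, type_=None):
    """Build an <app> element with one reading per witness."""
    attr = f' type="{type_}"' if type_ else ""
    return (f'<app{attr}><rdg wit="w1">{escape(r1)}</rdg>'
            f'<rdg wit="w2">{escape(r2)}</rdg></app>')

def to_tei(text1, text2):
    """Align on lowercased tokens, then compare original forms inside matches."""
    t1, t2 = text1.split(), text2.split()
    n1 = [w.lower() for w in t1]
    n2 = [w.lower() for w in t2]
    out = []
    matcher = SequenceMatcher(a=n1, b=n2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Second pass: matched tokens may still differ in their original form.
            for x, y in zip(t1[i1:i2], t2[j1:j2]):
                out.append(x if x == y else app(x, y, "notSemanticalDifference"))
        else:
            out.append(app(" ".join(t1[i1:i2]), " ".join(t2[j1:j2])))
    return " ".join(out)

# Hypothetical, shortened witnesses (not the actual example files).
result = to_tei("Now, It was a dark and stormy night",
                "Now, it was a dark and stormy knight")
print(result)
```

With these inputs the invariant text stays outside any `<app>`, the It/it pair gets its own typed `<app>`, and both witnesses can still be reconstructed from the output.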

Have I missed something in the documentation stating that changes in upper- and lowercasing are ignored by default during alignment (BTW, the same goes for changes in spacing)? And does 'by default' mean that I can change it, as suggested in the documentation, with the --script option? If so, how do I do this (back to the first question)?

Yours,
Karsten Kynde
