Comments (10)
Hi Hayim, I am able to reproduce the error. I am not yet sure what is causing it. The algorithm detects a transposition, but then something unexpected happens while that transposition is being processed.
from collatex.
During the alignment the algorithm traverses the graph. It turns out that not all the nodes are visited. The graph contains 29 nodes (excluding the start and end vertices) and only 20 (including the start vertex) are visited. The question now becomes why that is the case.
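A check of this kind can be sketched in a few lines. The following is only an illustration, not CollateX's own traversal code: it assumes the graph is available as a plain adjacency dictionary, and the function name and the toy graph are hypothetical.

```python
from collections import deque


def unvisited_vertices(graph, start):
    """Return the vertices that a breadth-first walk from `start` never reaches.

    `graph` is assumed to be a dict mapping each vertex to a list of its
    successors (an illustrative stand-in for the real variant graph).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return set(graph) - seen


# Hypothetical toy graph: no edge leads to "c", so a walk from "start" misses it.
toy_graph = {"start": ["a"], "a": ["b"], "b": ["end"], "c": ["end"], "end": []}
missing = unvisited_vertices(toy_graph, "start")
```

Comparing the size of the returned set with the total vertex count gives the same kind of "visited versus expected" figure quoted above.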
Thank you for looking at this. It is very vexing b/c it appears unpredictable.
I replaced the graph traversal algorithm with a well-known, tried and tested algorithm and it did not change the result. With this specific dataset, for some reason, the graph is not traversed in full. So I will need to look into it further.
Still investigating. Do you by any chance have a dataset in a Latin-script language that triggers this bug? I understand that this may be a weird request, but I have a hard time figuring out which tokens should be aligned or transposed because I can't read the Hebrew text. Right now the algorithm states that 004-P179204:16:'השנ' and 004-P179204:17:'הרת' are transposed compared to the previous witnesses. Does that sound plausible to you?
Thanks for pointing out that S01520 likely has missing text and that transpositions are not expected. That actually gives me a huge hint and a new direction in which to look into the issue.
A short update. I got a bit further in identifying the problem. The algorithm consists of several steps: 1. find an optimal set of matches -> 2. identify transpositions -> 3. mark transpositions in the graph -> 4. graph traversal -> crash. At first I looked at step 4, but that is not the cause of the crash. Then I turned my attention to step 2. If step 2 ignores a transposition, it causes a cycle in the graph, which makes the traversal fail. I thought that might be the problem. But after your previous post, indicating that there is a gap in one of the witnesses and that there are no transpositions, I realised that the problem is rather that too many transpositions are found. I checked that piece of code multiple times and could not find a mistake. Then I realised that the problem might actually be in step 1. Each token of a witness should align with a unique vertex in the graph. It turns out that there is a bug somewhere in the code of step 1 that causes multiple tokens of the witness to be aligned with one and the same vertex. That should not happen, but somehow it does, and it causes steps 2, 3 and 4 to fail.
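The invariant described for step 1 (each witness token aligns with its own vertex) can be checked mechanically. The sketch below is an assumption-laden illustration rather than the actual CollateX code: it supposes the alignment is available as a dictionary from token to vertex, and the function name, token labels and vertex labels are all hypothetical.

```python
from collections import defaultdict


def vertices_with_multiple_tokens(alignment):
    """Return {vertex: [tokens]} for every vertex that more than one token maps to.

    `alignment` is assumed to map each witness token to the graph vertex it
    was aligned with. An empty result means the mapping is one-to-one.
    """
    tokens_by_vertex = defaultdict(list)
    for token, vertex in alignment.items():
        tokens_by_vertex[vertex].append(token)
    return {
        vertex: tokens
        for vertex, tokens in tokens_by_vertex.items()
        if len(tokens) > 1
    }


# Hypothetical alignment in which two tokens of one witness share vertex "v7".
alignment = {"w3:16": "v7", "w3:17": "v7", "w3:18": "v9"}
clashes = vertices_with_multiple_tokens(alignment)
```

Running such a check right after step 1 would flag the violation before steps 2 to 4 act on it.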
Thanks so much for the update!
For the time being I have the workaround of using the option "algorithm":"needleman-wunsch". In fact, since I am using JSON tabular output rather than graph output, I am not at present actually getting the benefit of detected transpositions.
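For readers who want to try the same workaround, here is a minimal sketch of building a JSON input document that carries that option. Only the "algorithm":"needleman-wunsch" pair comes from this thread; the "witnesses", "id" and "content" keys are assumed from CollateX's JSON input format, and the helper name and witness texts are hypothetical.

```python
import json


def build_collation_input(witnesses, algorithm="needleman-wunsch"):
    """Serialise witnesses and an algorithm choice as a JSON input document.

    `witnesses` maps a witness id to its text. The key names other than
    "algorithm" are assumptions about the input format, not taken from
    this thread.
    """
    document = {
        "witnesses": [
            {"id": witness_id, "content": text}
            for witness_id, text in witnesses.items()
        ],
        "algorithm": algorithm,
    }
    # ensure_ascii=False keeps non-Latin text (e.g. Hebrew) readable in the output.
    return json.dumps(document, ensure_ascii=False)


# Hypothetical two-witness input.
payload = build_collation_input({"A": "the quick brown fox", "B": "the brown quick fox"})
```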