replace HOMER PARALLEL READER with one using GRC's translations and which is driven by

Data: <a href="https://docs.google.com/spreadsheets/d/1-Zz1TCm1bVygngmWeXuskzlfQHN3ER6


Adapt parallel reader endpoint (open)

scaife-viewer commented on July 17, 2024

Adapt parallel reader endpoint

from readhomer.

Comments (5)

jtauber commented on July 17, 2024

DDN: do we serve up the texts and the alignment together? A better approach would be to decouple them: serve each language separately, and serve the alignment itself separately.


jacobwegner commented on July 17, 2024

Notes from our discussion:

Pass a Greek reference (1.1-1.18)
Return alignment chunks, expanding if necessary (1.1-1.7, 1.8, 1.9, 1.9-1.12, 1.12-1.16, 1.17-1.19)
Each chunk has text from both translations
Will follow up with ISSUE GOES HERE to strip back the endpoint to not include the text, but rather references
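A minimal sketch of the "expanding if necessary" step, assuming references are plain `book.line` strings and that a chunk is returned whenever it overlaps the requested passage. The helper names (`parse_ref`, `parse_range`, `chunks_for`) and the trailing `1.20-1.21` chunk are hypothetical and only there for illustration; this is not the readhomer implementation.

```python
from typing import List, Tuple

Ref = Tuple[int, int]  # (book, line)


def parse_ref(ref: str) -> Ref:
    """Turn "1.9" into (1, 9) so references compare numerically."""
    book, line = ref.split(".")
    return int(book), int(line)


def parse_range(passage: str) -> Tuple[Ref, Ref]:
    """Turn "1.1-1.7" or "1.8" into an inclusive (start, end) pair."""
    start, _, end = passage.partition("-")
    return parse_ref(start), parse_ref(end or start)


def chunks_for(passage: str, chunks: List[str]) -> List[str]:
    """Return every chunk that overlaps the requested passage."""
    want_start, want_end = parse_range(passage)
    selected = []
    for chunk in chunks:
        start, end = parse_range(chunk)
        # Two inclusive ranges overlap when each starts before the other ends.
        if start <= want_end and end >= want_start:
            selected.append(chunk)
    return selected


# Chunk boundaries from the notes above, plus one hypothetical later chunk.
CHUNKS = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19", "1.20-1.21"]
print(chunks_for("1.1-1.18", CHUNKS))
```

Requesting 1.1-1.18 selects the first six chunks, so the response runs through 1.19 even though the request stopped at 1.18. Note that 1.9 and 1.12 each appear in two chunks.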

First pass at a spec:

Show other chunking schemes, completion
1.1 to 1.19 is chapter 1
May be a "follow-on" like with GraphQL

{
    "metadata": {
        "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/<range>/",
        "refs_url": "/<text-identifier-1>:<range>/",
        "refs": {
            "start": "1.1",
            "end": "1.7"
        }
    },
    "chunks": [
        {
            "metadata": {
                "id": 123456,
                "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/by-id/<id>/"
            },
            "items": [
                {
                    "metadata": {
                        "self_url": "/<text-identifier-1>:<range>/"
                    },
                    "text_html": "",
                    "refs": {
                        "start": "1.1",
                        "end": "1.7"
                    }
                },
                {
                    "metadata": {},
                    "text_html": "",
                    "refs": {}
                }
            ]
        }
    ]
}


jacobwegner commented on July 17, 2024

Endpoints

I've made the first pass with the following endpoints:

Alignment by reference

/<work_1_urn>/alignment/eng/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/3.411-3.412/

Retrieves the alignment based on the provided reference.

Returns 404 if the reference is not valid.

Alignment by offset

/<work_1_urn>/alignment/eng/paginate/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/

Paginates through alignment milestones.

Supports limit (defaults to 10) and offset (defaults to 0) arguments. Includes previous/next URLs.
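A rough sketch of how limit/offset pagination with previous/next URLs can work, assuming the milestones are held in an in-memory list. The function names and the response keys (`previous`, `next`, `chunks`) are assumptions for illustration and may not match the actual response.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def page_url(base_url: str, offset: int, limit: int) -> str:
    """Build a paginate URL carrying the offset and limit arguments."""
    return f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"


def paginate(
    milestones: List[Any], base_url: str, offset: int = 0, limit: int = 10
) -> Dict[str, Any]:
    """Slice one page of milestones and link to the neighbouring pages."""
    previous: Optional[str] = None
    next_: Optional[str] = None
    if offset > 0:
        # Never step back past the first milestone.
        previous = page_url(base_url, max(offset - limit, 0), limit)
    if offset + limit < len(milestones):
        next_ = page_url(base_url, offset + limit, limit)
    return {
        "previous": previous,
        "next": next_,
        "chunks": milestones[offset : offset + limit],
    }
```

The first page has no previous URL and the last page has no next URL, so a client can simply follow `next` until it is empty.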

Alignment by offset from reference

/<work_1_urn>/alignment/eng/paginate/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/3.411-3.412/

Redirects to "Alignment by offset" at the first offset where the milestone contains the passage (e.g. https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/?offset=985&limit=10)

I thought this could be a useful shortcut to start pagination without having to work out what the offset is for a particular reference.
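The redirect can be sketched roughly like this, assuming milestones are an ordered list of `book.line` ranges and that "contains the passage" means the first milestone overlapping it. `parse_range`, `first_offset` and `redirect_target` are hypothetical names, and the sketch returns `None` for an unknown reference rather than guessing at the real error handling.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Ref = Tuple[int, ...]


def to_ref(ref: str) -> Ref:
    """Turn "3.411" into (3, 411) so references compare numerically."""
    return tuple(int(part) for part in ref.split("."))


def parse_range(passage: str) -> Tuple[Ref, Ref]:
    """Turn "3.411-3.412" into an inclusive pair; a single ref is its own end."""
    start, _, end = passage.partition("-")
    return to_ref(start), to_ref(end or start)


def first_offset(reference: str, milestones: List[str]) -> Optional[int]:
    """Index of the first milestone overlapping the reference, if any."""
    want_start, want_end = parse_range(reference)
    for offset, milestone in enumerate(milestones):
        start, end = parse_range(milestone)
        if start <= want_end and end >= want_start:
            return offset
    return None


def redirect_target(
    paginate_url: str, reference: str, milestones: List[str], limit: int = 10
) -> Optional[str]:
    """URL of the "Alignment by offset" page that starts at the reference."""
    offset = first_offset(reference, milestones)
    if offset is None:
        return None
    return f"{paginate_url}?{urlencode({'offset': offset, 'limit': limit})}"
```

With this approach the client only needs a reference to start paginating; the server does the linear scan for the matching milestone.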

Alignment milestone detail

/<work_1_urn>/alignment/eng/by-id/<milestone_id>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/by-id/2275092/

A detail view for an alignment milestone by its particular milestone id. Same as the milestones returned in chunks by the other endpoints, but just a quick way to load a particular milestone.
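For anyone poking at the dev API, here is a small sketch of client-side helpers that build the four URLs above. The base URL and URN come from the examples in this comment; the function names are made up for illustration, and the `eng` pairing is hardcoded just as it is in the endpoints.

```python
from urllib.parse import urlencode

BASE_URL = "https://readhomer-dev-api.herokuapp.com"
ILIAD_URN = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"


def alignment_root(urn: str, base: str = BASE_URL) -> str:
    """Common prefix shared by all alignment endpoints for a work."""
    return f"{base}/{urn}/alignment/eng"


def by_reference(urn: str, reference: str) -> str:
    """Alignment by reference."""
    return f"{alignment_root(urn)}/{reference}/"


def by_offset(urn: str, offset: int = 0, limit: int = 10) -> str:
    """Alignment by offset, using the documented limit/offset arguments."""
    query = urlencode({"offset": offset, "limit": limit})
    return f"{alignment_root(urn)}/paginate/?{query}"


def by_offset_from_reference(urn: str, reference: str) -> str:
    """Alignment by offset from reference (the server redirects from here)."""
    return f"{alignment_root(urn)}/paginate/{reference}/"


def by_id(urn: str, milestone_id: int) -> str:
    """Alignment milestone detail."""
    return f"{alignment_root(urn)}/by-id/{milestone_id}/"


print(by_reference(ILIAD_URN, "3.411-3.412"))
```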

Gotchas hit along the way

My original spec had assumed a 1:1 relationship between references and milestones, but several references appear in multiple milestones. As part of resolving alignment milestones from a reference (and calculating the offset for that particular milestone), we peek at the next and previous milestones from where the milestone is indexed. Here are two samples:
- https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/1.9/
- https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/1.47/
Because that relationship is not 1:1, I'm not making an attempt to resolve "subreferences" from leaves within the text server. Instead, I'm stripping out references from the "Greek" field on each row in the source CSV and returning that content.
I found that there are several errors in the "Citation" field in the source data. For example, look at 2185582 for the Iliad: citation="1.8" but greek="[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:". Since I was already parsing the Greek content, I just re-created the citations from the Greek content.
Greg's Iliad has a citation for 18.616-18.617, but the Iliad text we're using doesn't have 18.617. I wrote an edge case fix for this that can be expanded as desired (HEALED_CITATIONS).
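The citation re-creation can be sketched roughly as below, assuming every reference shows up in the Greek field as a `[book.line]` marker like the `[1.80]` in the sample row. `REF_PATTERN`, `split_greek` and `citation_from_greek` are hypothetical names, and the shape of the `healed` mapping is a guess at how something like HEALED_CITATIONS could be applied, not a copy of it.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed marker format: "[1.80]" followed by optional whitespace.
REF_PATTERN = re.compile(r"\[(\d+\.\d+)\]\s*")


def split_greek(greek: str) -> Tuple[List[str], str]:
    """Separate the embedded references from the Greek text itself."""
    refs = REF_PATTERN.findall(greek)
    content = REF_PATTERN.sub("", greek).strip()
    return refs, content


def citation_from_greek(greek: str, healed: Optional[Dict[str, str]] = None) -> str:
    """Rebuild a citation from the first and last markers in the Greek field."""
    refs, _ = split_greek(greek)
    if not refs:
        raise ValueError("no [book.line] markers found in the Greek field")
    if refs[0] == refs[-1]:
        citation = refs[0]
    else:
        citation = f"{refs[0]}-{refs[-1]}"
    # Hypothetical override table for citations the target text cannot resolve.
    return (healed or {}).get(citation, citation)


row = "[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:"
print(citation_from_greek(row))
```

For the sample row this yields 1.80 instead of the 1.8 recorded in the "Citation" field, and the content comes back without the bracketed marker.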

TODOs

There are some circular dependencies between the various Python modules that I'd like to clean up. Might take the opportunity to move from Flask to Django too.
I'd also like to port the alignment functionality over to the text server backends; everything is currently in memory but probably needs to be available in Redis as well. Doing the pagination bits has also made me think that an RDBMS (hello, Postgres!) backend might be something we should consider (and that might allow us to leverage something like Django Rest Framework on the backend and get pagination, cursors, etc. "for free").
Add a top-level metadata endpoint for each work that lists available alignments (right now we're hardcoding to a <work-urn> and eng pairing, but we know in things like Digital Sira that won't be hardcoded)
Get access to the homer-api app on Heroku (likely work with @jtauber on that). Currently, we're hosting on a new readhomer-dev-api app under the SV team.
Determine how to resolve other formatting errors in the Greek content
Determine if we want to standardize the output format (rather than returning text for the Greek and English items in the alignment, returning tokens instead, etc.)


jacobwegner commented on July 17, 2024

Add-ons

Retain line breaks
text --> reference, content, continuation triples (English retains text)
continuation has a kind or None

Future

Ideally, "mute" the portion of the line that overlaps
Enumerate continuation kinds

