
Comments (5)

jtauber commented on July 17, 2024

DDN: do we serve up both languages and the alignment together? A better approach might be to decouple them: serve each language's text separately, and serve the alignment itself separately.

from readhomer.

jtauber commented on July 17, 2024

Data: https://docs.google.com/spreadsheets/d/1-Zz1TCm1bVygngmWeXuskzlfQHN3ER6HERX0DC-VRI0/edit#gid=0


jacobwegner commented on July 17, 2024

Notes from our discussion:

  • Pass a Greek reference (1.1-1.18)
  • Return alignment chunks, expanding if necessary (1.1-1.7, 1.8, 1.9, 1.9-1.12, 1.12-1.16, 1.17-1.19)
  • Each chunk has text from both translations
  • Will follow up with ISSUE GOES HERE to strip the endpoint back so that it returns references rather than the text
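The "expanding if necessary" rule in these notes amounts to an interval-overlap query: every chunk that touches the requested range is returned whole, so the response can begin before and end after the references that were asked for. A minimal sketch, assuming references are plain "book.line" strings and that chunk boundaries are already loaded as a list; the function names are hypothetical, not taken from the readhomer code.

```python
from typing import List, Tuple

Ref = Tuple[int, int]  # (book, line): "1.9" -> (1, 9)


def parse_ref(ref: str) -> Ref:
    book, line = ref.split(".")
    return int(book), int(line)


def parse_range(range_str: str) -> Tuple[Ref, Ref]:
    # "1.1-1.18" -> ((1, 1), (1, 18)); a bare "1.8" is treated as "1.8-1.8".
    start, _, end = range_str.partition("-")
    return parse_ref(start), parse_ref(end or start)


def chunks_for_range(chunks: List[str], requested: str) -> List[str]:
    """Return every chunk that overlaps the requested range (hypothetical helper).

    Partly overlapping chunks are returned whole, which is how the
    response "expands" past the edges of the requested references.
    """
    req_start, req_end = parse_range(requested)
    selected = []
    for chunk in chunks:
        start, end = parse_range(chunk)
        # Two closed intervals overlap when each one starts before the other ends.
        if start <= req_end and req_start <= end:
            selected.append(chunk)
    return selected


# Chunk boundaries taken from the example in the notes above.
CHUNKS = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19"]

print(chunks_for_range(CHUNKS, "1.1-1.18"))  # all six; 1.17-1.19 extends past 1.18
print(chunks_for_range(CHUNKS, "1.12"))      # ['1.9-1.12', '1.12-1.16']
```

Comparing (book, line) tuples rather than strings keeps "1.9" ordered before "1.12" and works across book boundaries.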

First pass at a spec:

  • Show other chunking schemes, completion
  • 1.1 to 1.19 is chapter 1
  • May be a "follow-on" like with GraphQL
{
    "metadata": {
        "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/<range>/",
        "refs_url": "/<text-identifier-1>:<range>/",
        "refs": {
            "start": "1.1",
            "end": "1.7"
        }
    },
    "chunks": [
        {
            "metadata": {
                "id": 123456,
                "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/by-id/<id>/"
            },
            "items": [
                {
                    "metadata": {
                        "self_url": "/<text-identifier-1>:<range>/"
                    },
                    "text_html": "",
                    "refs": {
                        "start": "1.1",
                        "end": "1.7"
                    }
                },
                {
                    "metadata": {},
                    "text_html": "",
                    "refs": {}
                }
            ]
        }
    ]
}
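One way to guarantee that responses in this shape are valid JSON is to build them from plain dictionaries and serialize with json.dumps, rather than templating strings. A minimal sketch, assuming a Python backend (the thread mentions Flask); the builder functions are hypothetical, and the Iliad URN and eng identifiers are borrowed from the endpoint examples in this thread.

```python
import json
from typing import Any, Dict, List


def format_range(start: str, end: str) -> str:
    return start if start == end else f"{start}-{end}"


def build_item(text_id: str, start: str, end: str, text_html: str) -> Dict[str, Any]:
    return {
        "metadata": {"self_url": f"/{text_id}:{format_range(start, end)}/"},
        "text_html": text_html,
        "refs": {"start": start, "end": end},
    }


def build_chunk(text_1: str, text_2: str, chunk_id: int, items: List[Dict[str, Any]]) -> Dict[str, Any]:
    return {
        "metadata": {
            "id": chunk_id,
            "self_url": f"/{text_1}/alignment/{text_2}/by-id/{chunk_id}/",
        },
        "items": items,
    }


def build_payload(text_1: str, text_2: str, start: str, end: str,
                  chunks: List[Dict[str, Any]]) -> Dict[str, Any]:
    range_str = format_range(start, end)
    return {
        "metadata": {
            "self_url": f"/{text_1}/alignment/{text_2}/{range_str}/",
            "refs_url": f"/{text_1}:{range_str}/",
            "refs": {"start": start, "end": end},
        },
        "chunks": chunks,
    }


GREEK = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2"
chunk = build_chunk(GREEK, "eng", 123456, [
    build_item(GREEK, "1.1", "1.7", ""),
    {"metadata": {}, "text_html": "", "refs": {}},  # second item left empty, as in the spec
])
payload = build_payload(GREEK, "eng", "1.1", "1.7", [chunk])
print(json.dumps(payload, indent=4, ensure_ascii=False))
```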


jacobwegner commented on July 17, 2024

Endpoints

I've made the first pass with the following endpoints:

Alignment by reference

/<work_1_urn>/alignment/eng/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/3.411-3.412/

Retrieves the alignment based on the provided reference.

Returns 404 if the reference is not valid.
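The 404 rule can be sketched as a small validation step that runs before any alignment lookup. This assumes a reference is either "book.line" or "book.line-book.line", and that the set of valid line references for the Greek text is available in memory; resolve_reference, status_for and known_lines are hypothetical names.

```python
import re
from typing import Optional, Set, Tuple

# "3.411" or "3.411-3.412"
REFERENCE = re.compile(r"^(\d+)\.(\d+)(?:-(\d+)\.(\d+))?$")


def resolve_reference(reference: str, known_lines: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (start, end) for a valid reference, or None if the endpoint should 404."""
    match = REFERENCE.match(reference)
    if match is None:
        return None  # not shaped like a reference at all
    start = f"{match[1]}.{match[2]}"
    end = f"{match[3]}.{match[4]}" if match[3] else start
    start_key = (int(match[1]), int(match[2]))
    end_key = (int(match[3]), int(match[4])) if match[3] else start_key
    if start_key > end_key:
        return None  # range runs backwards
    if start not in known_lines or end not in known_lines:
        return None  # well-formed, but not a line in this text
    return start, end


def status_for(reference: str, known_lines: Set[str]) -> int:
    return 200 if resolve_reference(reference, known_lines) else 404
```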

Alignment by offset

/<work_1_urn>/alignment/eng/paginate/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/

Paginates through alignment milestones.

Supports limit (defaults to 10) and offset (defaults to 0) arguments. Includes previous/next URLs.
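Limit/offset pagination with previous/next links can be sketched over an in-memory list of milestones. This is an illustration under assumptions, not the endpoint's code: the response keys previous, next and milestones are hypothetical, and the query-string order (offset, then limit) simply mirrors the example redirect URL in this thread.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def paginate(items: List[Any], base_url: str, limit: int = 10, offset: int = 0) -> Dict[str, Any]:
    """Slice one page of milestones and build previous/next links."""

    def page_url(page_offset: int) -> str:
        return f"{base_url}?{urlencode({'offset': page_offset, 'limit': limit})}"

    previous_url: Optional[str] = None
    if offset > 0:
        # Clamp at zero so an offset smaller than limit still links back to the start.
        previous_url = page_url(max(offset - limit, 0))

    next_url: Optional[str] = None
    if offset + limit < len(items):
        next_url = page_url(offset + limit)

    return {
        "previous": previous_url,
        "next": next_url,
        "milestones": items[offset:offset + limit],
    }


page = paginate(list(range(25)), "/paginate/", limit=10, offset=20)
print(page["previous"])    # /paginate/?offset=10&limit=10
print(page["next"])        # None
print(page["milestones"])  # [20, 21, 22, 23, 24]
```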

Alignment by offset from reference

/<work_1_urn>/alignment/eng/paginate/<reference>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/3.411-3.412/

Redirects to "Alignment by offset" at the first offset where the milestone contains the passage (e.g. https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/?offset=985&limit=10)

I thought this could be a useful shortcut for starting pagination without having to work out the offset for a particular reference.
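Finding the offset for that redirect can be sketched as a scan over the milestones in reading order, stopping at the first one whose range overlaps the requested passage. A minimal sketch, assuming milestones are stored as "book.line" range strings; the milestone list below is made up for illustration, and redirect_url is a hypothetical helper.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Ref = Tuple[int, int]


def parse_range(range_str: str) -> Tuple[Ref, Ref]:
    def parse_ref(ref: str) -> Ref:
        book, line = ref.split(".")
        return int(book), int(line)

    start, _, end = range_str.partition("-")
    return parse_ref(start), parse_ref(end or start)


def first_offset_for(milestones: List[str], reference: str) -> Optional[int]:
    """Index of the first milestone whose range overlaps the reference."""
    req_start, req_end = parse_range(reference)
    for offset, milestone in enumerate(milestones):
        start, end = parse_range(milestone)
        if start <= req_end and req_start <= end:
            return offset
    return None


def redirect_url(base_url: str, milestones: List[str], reference: str, limit: int = 10) -> Optional[str]:
    offset = first_offset_for(milestones, reference)
    if offset is None:
        return None  # caller would respond with a 404
    return f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"


# Hypothetical milestone ranges in reading order; 3.411 appears in two of them.
MILESTONES = ["3.400-3.410", "3.411", "3.411-3.415", "3.416"]
print(redirect_url("/paginate/", MILESTONES, "3.411-3.412"))  # /paginate/?offset=1&limit=10
```

Because one reference can fall in several milestones, returning the first match lands the reader on the earliest page that shows the passage. The scan is linear; a precomputed reference-to-offset index would be the obvious optimization if this moves to Redis or a database.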

Alignment milestone detail

/<work_1_urn>/alignment/eng/by-id/<milestone_id>/

https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/by-id/2275092/

A detail view for a single alignment milestone, looked up by its milestone id. It returns the same data as the milestones returned in chunks by the other endpoints, but is a quick way to load one particular milestone.

Gotchas hit along the way

  • My original spec had assumed a 1:1 relationship between references and milestones, but several references appear in multiple milestones. When resolving alignment milestones from a reference (and calculating the offset for that milestone), we peek at the previous and next milestones around the one where the reference is indexed. Here are two samples:

  • Because that relationship is not 1:1, I'm not making an attempt to resolve "subreferences" from leaves within the text server. Instead, I'm stripping out references from the "Greek" field on each row in the source CSV and returning that content.

  • I found several errors in the "Citation" field in the source data. For example, look at 2185582 for the Iliad: citation="1.8" but greek="[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:". Since I was already parsing the Greek content, I re-created the citations from it.

  • Greg's Iliad has a citation for 18.616-18.617, but the Iliad text we're using doesn't have 18.617. I wrote an edge-case fix for this (HEALED_CITATIONS) that can be expanded as needed.
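Stripping the bracketed references out of the "Greek" field and re-creating the citation from them can be sketched with a single regular expression. This assumes every line in the field starts with a marker like "[1.80] ", as in the 2185582 example; split_greek_field and recreate_citation are hypothetical names, and the HEALED_CITATIONS handling is not reproduced here.

```python
import re
from typing import List, Tuple

# A citation marker such as "[1.80] " at the start of a line in the Greek field.
CITATION_MARKER = re.compile(r"\[(\d+\.\d+)\]\s*")


def split_greek_field(greek: str) -> Tuple[List[str], str]:
    """Return (citations, text) with the bracketed markers removed from the text."""
    citations = CITATION_MARKER.findall(greek)
    text = CITATION_MARKER.sub("", greek).strip()
    return citations, text


def recreate_citation(greek: str) -> str:
    """Derive the row's citation from the Greek field instead of the Citation column."""
    citations, _ = split_greek_field(greek)
    if not citations:
        raise ValueError(f"no citation marker in {greek!r}")
    if len(citations) == 1:
        return citations[0]
    return f"{citations[0]}-{citations[-1]}"


# The row from the example above: the Citation column says "1.8", the Greek says 1.80.
row = {"citation": "1.8", "greek": "[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:"}
print(recreate_citation(row["greek"]))     # 1.80
print(split_greek_field(row["greek"])[1])  # τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:
```

The marker pattern does not consume the newline that precedes the next marker, so multi-line Greek content keeps its line breaks.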

TODOs

  • There are some circular dependencies between the various Python modules that I'd like to clean up. I might take the opportunity to move from Flask to Django too.
  • I'd also like to port the alignment functionality over to the text server backends; everything is currently in memory but probably needs to be available in Redis as well. Doing the pagination bits has also made me think that an RDBMS (hello, Postgres!) backend might be worth considering (and that might allow us to leverage something like Django REST Framework on the backend and get pagination, cursors, etc. "for free").
  • Add a top-level metadata endpoint for each work that lists available alignments (right now we're hardcoding a <work-urn> and eng pairing, but we know that in projects like Digital Sira the pairing won't be hardcoded)
  • Get access to the homer-api app on Heroku (likely work with @jtauber on that). Currently, we're hosting on a new readhomer-dev-api app under the SV team.
  • Determine how to resolve other formatting errors in the Greek content
  • Determine if we want to standardize the output format (rather than returning text for the Greek and English items in the alignment, returning tokens instead, etc.)


jacobwegner commented on July 17, 2024

Add-ons

  • Retain line breaks
  • text --> (reference, content, continuation) triples (English retains text)
  • continuation has a kind or None
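One reading of these add-ons is that each Greek line becomes a (reference, content, continuation) triple while the English side stays a single text field. A minimal sketch under that assumption; the GreekLine class, the build_alignment_item helper, and the greek/english keys are hypothetical. Since the continuation kinds have not been enumerated yet, continuation is typed as an optional string.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GreekLine:
    reference: str                      # e.g. "1.80"
    content: str                        # the line's text, citation marker stripped
    continuation: Optional[str] = None  # a continuation kind, or None


def build_alignment_item(greek_lines: List[GreekLine], english_text: str) -> Dict[str, Any]:
    # Greek becomes a list of triples (one per line, so line breaks are retained);
    # English keeps its text.
    return {
        "greek": [asdict(line) for line in greek_lines],
        "english": english_text,
    }


item = build_alignment_item(
    [GreekLine("1.80", "τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:")],
    "<English text for the same chunk>",
)
print(item["greek"][0]["continuation"])  # None
```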

Future

  • Ideally, "mute" the portion of the line that overlaps
  • Enumerate continuation kinds

