Comments (5)
DDN: do we serve the texts and the alignment together? A better approach would be to decouple them: serve each language separately, and serve the alignment itself separately as well.
from readhomer.
Data: https://docs.google.com/spreadsheets/d/1-Zz1TCm1bVygngmWeXuskzlfQHN3ER6HERX0DC-VRI0/edit#gid=0
from readhomer.
Notes from our discussion:
- Pass a greek reference (1.1 - 1.18)
- Return alignment chunks, expanding if necessary (1.1-1.7, 1.8, 1.9, 1.9-1.12, 1.12-1.16, 1.17-1.19)
- Each chunk has text from both translations
- Will follow up with ISSUE GOES HERE to strip back the endpoint to not include the text, but rather references
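The "expanding if necessary" step in the notes above can be sketched as an overlap test between the requested range and each alignment chunk. This is a minimal sketch, not the implementation: `parse_ref`, `parse_range`, `overlapping_chunks`, and the `CHUNKS` list (including the trailing `1.20-1.21` chunk) are all hypothetical names and data made up for illustration, and references are assumed to be simple `book.line` pairs.

```python
from typing import List, Tuple

Ref = Tuple[int, int]  # (book, line); assumed reference shape


def parse_ref(ref: str) -> Ref:
    """Turn "1.18" into (1, 18) so references compare numerically."""
    book, line = ref.split(".")
    return int(book), int(line)


def parse_range(value: str) -> Tuple[Ref, Ref]:
    """Turn "1.1-1.7" (or a single "1.8") into a (start, end) pair."""
    start, _, end = value.partition("-")
    start_ref = parse_ref(start)
    return start_ref, (parse_ref(end) if end else start_ref)


def overlapping_chunks(requested: str, chunks: List[str]) -> List[str]:
    """Return every chunk that overlaps the requested range.

    A chunk that starts inside the range but ends after it is still
    returned, which is what expands 1.1-1.18 out to 1.19.
    """
    req_start, req_end = parse_range(requested)
    selected = []
    for chunk in chunks:
        start, end = parse_range(chunk)
        if start <= req_end and end >= req_start:
            selected.append(chunk)
    return selected


# Hypothetical chunk list; the first six match the example in the notes.
CHUNKS = ["1.1-1.7", "1.8", "1.9", "1.9-1.12", "1.12-1.16", "1.17-1.19", "1.20-1.21"]

if __name__ == "__main__":
    print(overlapping_chunks("1.1-1.18", CHUNKS))
```

Comparing parsed tuples rather than raw strings matters here, because `"1.8" < "1.18"` is false as a string comparison.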
First pass at a spec:
- Show other chunking schemes, completion
- 1.1 to 1.19 is chapter 1
- May be a "follow-on" like with GraphQL
{
  "metadata": {
    "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/<range>/",
    "refs_url": "/<text-identifier-1>:<range>/",
    "refs": {
      "start": "1.1",
      "end": "1.7"
    }
  },
  "chunks": [
    {
      "metadata": {
        "id": 123456,
        "self_url": "/<text-identifier-1>/alignment/<text-identifier-2>/by-id/<id>/"
      },
      "items": [
        {
          "metadata": {
            "self_url": "/<text-identifier-1>:<range>/"
          },
          "text_html": "",
          "refs": {
            "start": "1.1",
            "end": "1.7"
          }
        },
        {
          "metadata": {},
          "text_html": "",
          "refs": {}
        }
      ]
    }
  ]
}
from readhomer.
Endpoints
I've made the first pass with the following endpoints:
Alignment by reference
/<work_1_urn>/alignment/eng/<reference>/
Retrieves the alignment based on the provided reference.
Returns 404 if the reference is not valid
Alignment by offset
/<work_1_urn>/alignment/eng/paginate/
Paginates through alignment milestones.
Supports limit (defaults to 10) and offset (defaults to 0) arguments. Includes previous/next URLs.
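For illustration, limit/offset pagination with previous/next URLs could look roughly like the following. This is a framework-agnostic sketch under stated assumptions: `paginate`, `page_url`, and the `previous_url` / `next_url` keys are hypothetical names, not necessarily what the deployed endpoint returns.

```python
from typing import Any, Dict, Optional, Sequence
from urllib.parse import urlencode


def page_url(base_url: str, offset: int, limit: int) -> str:
    """Build the URL for one page of results."""
    return f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"


def paginate(
    milestones: Sequence[Any],
    base_url: str,
    offset: int = 0,
    limit: int = 10,
) -> Dict[str, Any]:
    """Slice the milestones and attach previous/next links where they exist."""
    total = len(milestones)
    chunks = list(milestones[offset:offset + limit])

    next_url: Optional[str] = None
    if offset + limit < total:
        next_url = page_url(base_url, offset + limit, limit)

    previous_url: Optional[str] = None
    if offset > 0:
        # Clamp at zero so a small offset never produces a negative one.
        previous_url = page_url(base_url, max(offset - limit, 0), limit)

    return {
        "metadata": {"previous_url": previous_url, "next_url": next_url},
        "chunks": chunks,
    }


if __name__ == "__main__":
    page = paginate(list(range(25)), "/work/alignment/eng/paginate/", offset=10)
    print(page["metadata"])
```

Returning `None` for a missing link lets a client stop paging without having to know the total count.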
Alignment by offset from reference
/<work_1_urn>/alignment/eng/paginate/<reference>/
Redirects to "Alignment by offset" at the first offset where the milestone contains the passage (e.g. https://readhomer-dev-api.herokuapp.com/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2/alignment/eng/paginate/?offset=985&limit=10)
I thought this could be a useful shortcut to start pagination without having to work out the offset for a particular reference.
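The lookup behind that redirect might be sketched as a scan for the first milestone whose range contains the reference. The names here (`first_offset_containing`, `redirect_target`, `MILESTONES`) and the sample milestone ranges are assumptions for illustration; the real service may index milestones differently.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Ref = Tuple[int, int]  # (book, line); assumed reference shape


def parse_ref(ref: str) -> Ref:
    """Turn "1.12" into (1, 12) so references compare numerically."""
    book, line = ref.split(".")
    return int(book), int(line)


def first_offset_containing(
    reference: str, milestones: List[Tuple[str, str]]
) -> Optional[int]:
    """Return the index of the first milestone whose range contains the reference."""
    target = parse_ref(reference)
    for offset, (start, end) in enumerate(milestones):
        if parse_ref(start) <= target <= parse_ref(end):
            return offset
    return None


def redirect_target(
    base_url: str,
    reference: str,
    milestones: List[Tuple[str, str]],
    limit: int = 10,
) -> Optional[str]:
    """Build the paginate URL to redirect to, or None if nothing matches."""
    offset = first_offset_containing(reference, milestones)
    if offset is None:
        return None  # a caller would presumably answer 404 here
    return f"{base_url}?{urlencode({'offset': offset, 'limit': limit})}"


# Hypothetical milestone ranges; note that 1.12 appears in two of them.
MILESTONES = [("1.1", "1.7"), ("1.8", "1.8"), ("1.9", "1.9"), ("1.9", "1.12"), ("1.12", "1.16")]

if __name__ == "__main__":
    print(redirect_target("/work/alignment/eng/paginate/", "1.12", MILESTONES))
```

Because a reference can fall in more than one milestone, taking the first match means the page opens on the earliest milestone that includes the passage.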
Alignment milestone detail
/<work_1_urn>/alignment/eng/by-id/<milestone_id>/
A detail view for an alignment milestone by its particular milestone id. Same as the milestones returned in chunks by the other endpoints, but just a quick way to load a particular milestone.
Gotchas hit along the way
- My original spec had assumed a 1:1 relationship between references and milestones, but there are several references that appear in multiple milestones. As part of resolving alignment milestones from a reference (and calculating the offset for that particular milestone), we will peek at the next and previous milestone from where the milestone is indexed. Here are two samples:
- Because that relationship is not 1:1, I'm not making an attempt to resolve "subreferences" from leaves within the text server. Instead, I'm stripping out references from the "Greek" field on each row in the source CSV and returning that content.
- I found that there are several errors in the "Citation" field in the source data. For example, look at 2185582 for the Iliad: citation="1.8" but greek="[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:". Since I was already parsing the Greek content, I just re-created the citations from the Greek content.
- Greg's Iliad has a citation for 18.616-18.617, but the Iliad text we're using doesn't have 18.617. I wrote an edge case fix for this that can be expanded as desired (HEALED_CITATIONS).
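Re-creating citations from the Greek field, as described in the list above, could be sketched with a regular expression over the bracketed markers. This is only an approximation of the approach: the `[book.line]` marker format is inferred from the single example in the thread, `split_greek` and `citation_from_refs` are hypothetical helpers, and the shape and content of `HEALED_CITATIONS` shown here are guesses (only the name comes from the thread).

```python
import re
from typing import Dict, List, Tuple

# Matches markers such as "[1.80]" plus any whitespace that follows.
REF_PATTERN = re.compile(r"\[(\d+\.\d+)\]\s*")

# Assumed shape: map a citation the text cannot satisfy to one it can.
# The entry below is illustrative, not the project's actual table.
HEALED_CITATIONS: Dict[str, str] = {"18.616-18.617": "18.616"}


def split_greek(greek: str) -> Tuple[List[str], str]:
    """Separate the embedded references from the Greek text itself."""
    refs = REF_PATTERN.findall(greek)
    text = REF_PATTERN.sub("", greek).strip()
    return refs, text


def citation_from_refs(refs: List[str]) -> str:
    """Rebuild a citation from the first and last embedded reference."""
    if not refs:
        return ""
    citation = refs[0] if refs[0] == refs[-1] else f"{refs[0]}-{refs[-1]}"
    return HEALED_CITATIONS.get(citation, citation)


if __name__ == "__main__":
    row = "[1.80] τὸν δ᾽ ἠμείβετ᾽ ἔπειτα θεά, γλαυκῶπις Ἀθήνη:"
    refs, text = split_greek(row)
    print(citation_from_refs(refs), "|", text)
```

Deriving the citation from the markers sidesteps the bad "Citation" column entirely, at the cost of trusting the markers in the Greek field.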
TODOs
- There are some circular dependencies between the various Python modules that I'd like to clean up. Might take the opportunity to move from Flask to Django too.
- I'd also like to port the alignment functionality over to the text server backends; everything is currently in memory but probably needs to be available in Redis as well. Doing the pagination bits has also made me think that an RDBMS (hello, Postgres!) backend might be something we should consider (and that might allow us to leverage something like Django Rest Framework on the backend and get pagination, cursors, etc. "for free").
- Add a top-level metadata endpoint for each work that lists available alignments (right now we're hardcoding to a <work-urn> and eng pairing, but we know in things like Digital Sira that won't be hardcoded)
- Get access to the homer-api app on Heroku (likely work with @jtauber on that). Currently, we're hosting on a new readhomer-dev-api app under the SV team.
- Determine how to resolve other formatting errors in the Greek content
- Determine if we want to standardize the output format (rather than returning text for the Greek and English items in the alignment, returning tokens instead, etc.)
from readhomer.
Add-ons
- Retain line breaks
- text --> reference, content, continuation triples (English retains text)
- continuation has a kind or None
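One possible reading of the triples in the list above is one record per Greek line, each carrying its reference, its content, and an optional continuation kind. The sketch below assumes the `[book.line]` marker format and uses hypothetical names (`Line`, `to_triples`) and placeholder content; the continuation kinds have not been enumerated yet, so it always leaves that field as `None`.

```python
import re
from typing import List, NamedTuple, Optional

# Capturing group so that re.split keeps the reference in its output.
REF_PATTERN = re.compile(r"\[(\d+\.\d+)\]")


class Line(NamedTuple):
    reference: str
    content: str
    continuation: Optional[str]  # a kind, or None; kinds are not defined yet


def to_triples(greek: str) -> List[Line]:
    """Split a Greek field into one (reference, content, continuation) per marker.

    re.split with a capturing group yields
    [before, ref1, text1, ref2, text2, ...]; text before the first
    marker is ignored in this sketch.
    """
    parts = REF_PATTERN.split(greek)
    triples = []
    for ref, content in zip(parts[1::2], parts[2::2]):
        triples.append(Line(ref, content.strip(), None))
    return triples


if __name__ == "__main__":
    # Placeholder content, not real text from the source data.
    for line in to_triples("[1.1] first line\n[1.2] second line"):
        print(line)
```

Keeping one triple per marker is what preserves the line breaks: a client can render each triple on its own line instead of receiving a single text blob.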
Future
- Ideally, "mute" the portion of the line that overlaps
- Enumerate continuation kinds
from readhomer.