perseids-publications / treebank-template

Template for publishing collections of treebanks.

Home Page: https://perseids-publications.github.io/treebank-template/

License: MIT License

Languages: JavaScript 85.47%, HTML 10.75%, Shell 2.29%, CSS 1.49%
Topics: perseids-project, alpheios-project, arethusa, ancient-greek, latin, classics, treebank

Contributors: balmas, dependabot[bot], zfletch

treebank-template's Issues

Alpheios integration: words annotated, but disambiguation data sometimes does not show up

I am using the treebanks https://nevenjovanovic.github.io/treebank-template/fragment-SBJ-10-1-10/11 and https://nevenjovanovic.github.io/treebank-template/fragment-SBJ-10-1-10/12 (the same sentence, once without the treebank diagram and once with a partial one) on our site, which integrates Alpheios with treebanks: http://croala.ffzg.unizg.hr/eklogai/theca/10-sent-sbj/. Words 3, 5, 9-13, 20, 24, 26-28, and 30 do not show disambiguation data, not even those words that have multiple possible morphological identifications.
I don't understand why this is happening, because the morphological data and lemmata are present in the source, for example:

<word id="24" form="ἀήθεις" lemma="ἀήθης" postag="a-p---ma-" relation="" head=""/>
<word id="25" form="," lemma="punc1" postag="u--------" relation="AuxX" head="0"/>
<word id="26" form="τροφῆς" lemma="τροφή" postag="n-s---fg-" relation="" head=""/>
<word id="27" form="δ’" lemma="δέ" postag="d-------p" relation="" head=""/>
<word id="28" form="ἡμέρου" lemma="ἥμερος" postag="a-s---fgp" relation="" head=""/>
<word id="29" form="παντελῶς" lemma="παντελῶς" postag="d-------p" relation="" head=""/>
<word id="30" form="ἀνεννοήτους" lemma="ἀνεννόητος" postag="a-p---ma-" relation="" head=""/>
<word id="31" form="." lemma="punc1" postag="u--------" relation="AuxK" head="0"/>

On the other hand, at another page of the same site: http://croala.ffzg.unizg.hr/eklogai/theca/ps-xen-athen-2-14-2/ all words are disambiguated according to source https://nevenjovanovic.github.io/treebank-template/ps-xen-2-14-2-16/1
I use mkdocs to produce the pages, but the sentences aligned with treebanks are HTML fragments.
Tested in Firefox and Chrome.
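
One way to double-check the claim that lemmata and morphological tags are present for every word is a small script like the one below. It is only a diagnostic sketch: the function names are hypothetical, and the regular expressions assume the flat, self-closing <word .../> elements quoted above (a real XML parser would be more robust). It does not explain why Alpheios omits the data; it only rules out missing attributes as the cause.

```javascript
// Hypothetical helper: collect the attributes of each self-closing <word/> element.
function parseWords(xml) {
  const words = [];
  const wordPattern = /<word\b([^>]*)\/>/g;
  const attrPattern = /([\w-]+)="([^"]*)"/g;
  for (const [, attrText] of xml.matchAll(wordPattern)) {
    const attrs = {};
    for (const [, name, value] of attrText.matchAll(attrPattern)) {
      attrs[name] = value;
    }
    words.push(attrs);
  }
  return words;
}

// Return the ids of words whose lemma or postag is absent or empty.
function wordsMissingMorphology(xml) {
  return parseWords(xml)
    .filter((w) => !w.lemma || !w.postag)
    .map((w) => w.id);
}

// Three of the words quoted in the issue.
const sample = `
<word id="24" form="ἀήθεις" lemma="ἀήθης" postag="a-p---ma-" relation="" head=""/>
<word id="25" form="," lemma="punc1" postag="u--------" relation="AuxX" head="0"/>
<word id="26" form="τροφῆς" lemma="τροφή" postag="n-s---fg-" relation="" head=""/>
`;

console.log(wordsMissingMorphology(sample));
```

An empty result for the whole file would support the report that the source data is complete and that the problem lies elsewhere.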

Additional instructions

I'm creating one issue to keep track of all the changes related to user instructions (public/getting-started, public/doi, etc.).

  • Add some lines about the .env file to public/getting-started/index.html. The values in the .env file are used for the page's metadata, including the title and the OpenGraph data. (Previously tracked in #49)

  • Add detailed instructions with pictures explaining how to update the template code through GitHub Actions (introduced in #48). (Previously tracked in #44)

  • Add detailed instructions with pictures explaining how to create releases through GitHub Actions (introduced in #53). These instructions should also mention how to integrate with Zenodo and explain that a user should add the DOI provided by Zenodo "for all versions." (The initial release won't have the DOI in it, but this way all future releases will.)

Automatic sentence detection with Treebank React

When Treebank React is used as the rendering engine, we should automatically be able to generate the sentence list without chunks in config.json. Adding this feature will make it much easier to work with the Treebank Template when XML files change.
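
A minimal sketch of what such detection could look like, assuming the XML marks sentences as <sentence id="..."> elements (an assumption about the file format; only <word> elements are quoted elsewhere in these issues). The function name sentenceIds is hypothetical and is not part of Treebank React.

```javascript
// Sketch: derive the sentence list from the treebank XML itself,
// instead of reading chunk boundaries from config.json.
// Assumes sentences appear as <sentence id="..."> elements.
function sentenceIds(xml) {
  const ids = [];
  // \bid= skips attributes such as document_id, because "_" is a word character.
  const pattern = /<sentence\b[^>]*?\bid="([^"]*)"/g;
  for (const [, id] of xml.matchAll(pattern)) {
    ids.push(id);
  }
  return ids;
}

const xml = `
<treebank>
  <sentence id="1" document_id="" subdoc=""><word id="1" form="a"/></sentence>
  <sentence id="2" document_id="" subdoc=""><word id="1" form="b"/></sentence>
</treebank>`;

console.log(sentenceIds(xml)); // [ '1', '2' ]
```

Because the list is computed from the file, it would stay in sync when sentences are added to or removed from the XML.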

Citability

Dear @zfletch ,
I think it would be great to hook the repositories here to zenodo.org :) I was going to cite the different corpora in a paper, but there is no long-term solution available, nor any real bibliographic information. Using release versioning + a Zenodo hook + metadata, you'd be able to provide useful information plus a DOI, which would definitely be nice :)

alpheios instructions: fix examples

suggested by @nevenjovanovic

  • confusing: Step 2: data-alpheios_tb_app_version="3.1.0" (specify
    whichever version of the template you are using) -- I don't know how or
    where to find the version of the template, so I just left it as is and
    hoped for the best

  • inconsistent: Step 3: <div lang="lat" -- but in the rest of the document
    "grc" is used; it would be clearer if there were two examples, for Greek
    and for Latin (or an explanation as to what to change and where)

  • incorrect: Step 3: <div lang="lat" class="alpheios-enabled" data-alpheios_tb_doc="on-the-murder-of-eratosthenes-1-50/"> -- until I removed the forward slash at the end ("-1-50/" to the equivalent of "-1-50"), the connection to my treebank-template GitHub Pages site did not work; also, On the Murder of Eratosthenes is probably not in Latin, so
    "lat" should be changed to "grc"

  • inconsistent: between Step 3 and Demo Scenarios: in Step 3 the element is
    <div lang="lat" class="alpheios-enabled" data-alpheios_tb_doc="on-the-murder-of-eratosthenes-1-50/">

but in "Text and treebank aligned by sentence." example the element is:
<div lang="grc" class="alpheios-enabled" data-alpheios_tb_doc="on-the-crown-1-50" data-alpheios_tb_sent="1">
-- I'm confused as to why there is the additional "data-alpheios_tb_sent"
in this element, as well as in its p child (there with the value "2"); I
can guess why it is so, but this is inconsistent with the example in Step
3 (also, pedagogically, perhaps it would be clearer if the same (Greek)
document is used throughout)
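
To make the points above concrete, here is a small sketch that builds the opening tag consistently. The helper name alpheiosDivOpenTag is hypothetical; the attribute names and values are the ones quoted above. It assumes that the trailing slash should always be dropped (as reported) and that data-alpheios_tb_sent is optional and holds a sentence number, which is the reporter's guess rather than something confirmed here.

```javascript
// Hypothetical helper: build the opening <div> used to link text to a treebank.
function alpheiosDivOpenTag({ lang, doc, sentence }) {
  // Strip trailing slashes, since "-1-50/" reportedly breaks the connection.
  const docId = doc.replace(/\/+$/, "");
  const attrs = [
    `lang="${lang}"`,
    'class="alpheios-enabled"',
    `data-alpheios_tb_doc="${docId}"`,
  ];
  // Assumption: the sentence attribute is only needed for sentence-aligned text.
  if (sentence !== undefined) {
    attrs.push(`data-alpheios_tb_sent="${sentence}"`);
  }
  return `<div ${attrs.join(" ")}>`;
}

// Greek text, whole document:
console.log(alpheiosDivOpenTag({ lang: "grc", doc: "on-the-murder-of-eratosthenes-1-50/" }));
// Greek text aligned by sentence:
console.log(alpheiosDivOpenTag({ lang: "grc", doc: "on-the-crown-1-50", sentence: 1 }));
```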

GitHub workflow for publishing a release

Right now creating a new release requires manually updating the package.json and then creating the release on GitHub.

Instead, we should create a GitHub workflow similar to update that uses the create-release action or something similar. Whenever a user has added enough new treebanks that they're ready to create a release, they could use the GitHub Actions interface the same way this action is used.

Doing this might even be a way to get around the issue with the GitHub Zenodo integration not generating a DOI until after the archive is created. The "new release" action could use the Zenodo API to pre-reserve a DOI, update src/config.json, commit and push, then create a release on GitHub.

(From this conversation)
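
The file updates that such a workflow would perform before committing can be sketched as pure functions. This is an illustration only: bumpVersion and prepareRelease are hypothetical names, the doi key in src/config.json is an assumption, and the DOI shown is a placeholder. Reserving the DOI through the Zenodo API and creating the GitHub release are deliberately left out.

```javascript
// Sketch: compute the next semantic version string.
function bumpVersion(version, part = "patch") {
  const [major, minor, patch] = version.split(".").map(Number);
  if (part === "major") return `${major + 1}.0.0`;
  if (part === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Sketch: return updated copies of package.json and src/config.json contents.
// The `doi` config key is an assumption about the template's config format.
function prepareRelease(pkg, config, { part = "patch", doi } = {}) {
  const nextPkg = { ...pkg, version: bumpVersion(pkg.version, part) };
  const nextConfig = doi ? { ...config, doi } : { ...config };
  return { pkg: nextPkg, config: nextConfig };
}

const result = prepareRelease(
  { name: "treebank-template", version: "3.1.0" },
  { title: "My treebanks" },
  { part: "minor", doi: "10.5281/zenodo.0000000" } // placeholder DOI
);

console.log(result.pkg.version); // 3.2.0
console.log(result.config.doi);
```

Keeping these steps free of side effects would make them easy to test before wiring them into a workflow that commits, pushes, and creates the release.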

Use a CSV file to manage publications

(per discussion with @rgorman)

Editing config.json is tricky for people without good JSON skills.

Could we consider having a simple CSV file (or files) that people could put the file metadata in, and then have the code construct the config object automatically from that data?

What do you think @zfletch ?
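
A rough sketch of the idea, under several assumptions: the column names (collection, path, author, work, xml) and the shape of the resulting object are hypothetical and only loosely modeled on a collections/publications layout, the file names are invented, and the CSV parser is deliberately naive (no quoted fields or embedded commas).

```javascript
// Naive CSV parser: first line is the header, fields contain no commas or quotes.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split(",").map((c) => c.trim());
      return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]));
    });
}

// Group rows by their "collection" column into a config-like object.
// The output shape is an assumption, not the template's documented schema.
function buildConfig(title, rows) {
  const collections = new Map();
  for (const { collection, ...publication } of rows) {
    if (!collections.has(collection)) {
      collections.set(collection, { title: collection, publications: [] });
    }
    collections.get(collection).publications.push(publication);
  }
  return { title, collections: [...collections.values()] };
}

const csv = `collection,path,author,work,xml
Lysias,on-the-murder-of-eratosthenes,Lysias,On the Murder of Eratosthenes,lysias-1.xml
Demosthenes,on-the-crown,Demosthenes,On the Crown,demosthenes-18.xml`;

console.log(JSON.stringify(buildConfig("My treebanks", parseCsv(csv)), null, 2));
```

A spreadsheet export would then be the only file that non-technical editors need to touch.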

could we make morph retrieval an optional parameter?

For the purposes of publishing trees, there isn't really any value in retrieving data from the morphology service, and it just adds a lot of unnecessary service calls.

It can be disabled by setting noRetrieval: "online" in the morph plugin configuration. Could we make it possible for the treebank-template to have this set in its config.json file (and thus carried over to ArethusaConfig.js) so that users of the template can decide whether they want it or not? (I don't really know if there is ever a reason it would be desirable, but I don't know if we want to rule that out.)
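
One possible shape for this, as a sketch: the config.json key disableMorphRetrieval and the function morphPluginConfig are hypothetical names, and the base morph configuration passed in is a stand-in. Only the noRetrieval: "online" setting comes from the description above.

```javascript
// Sketch: carry an opt-out flag from the template's config.json
// over to the morph plugin configuration.
function morphPluginConfig(baseMorphConfig, templateConfig) {
  // `disableMorphRetrieval` is a hypothetical config.json key.
  if (templateConfig.disableMorphRetrieval) {
    return { ...baseMorphConfig, noRetrieval: "online" };
  }
  // Default: leave retrieval behavior unchanged.
  return { ...baseMorphConfig };
}

const base = { name: "morph" }; // stand-in for the existing plugin settings

console.log(morphPluginConfig(base, { disableMorphRetrieval: true }));
console.log(morphPluginConfig(base, {}));
```

Making the flag opt-in keeps current behavior for existing collections that do not set it.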

Show morphology is broken in Safari

Reported by @monzug in #55

In Safari, the morphology sidebar does not show up. This is due to the CSS rule width: 35% !important;. For some reason, it causes Arethusa to calculate a height of 0 for the sidebar in Safari, but it does not cause this issue in other browsers.

Alpheios integration doc: add instructions to use a fixed release number

Is there any chance that @latest will stop working when the latest version is updated? Or will it remain backwards compatible? If it has a chance of changing, it might be safer to specify the version explicitly.

It's a good question. Ideally, I would like to be able to commit to not making breaking changes, so that @latest is a synonym for @stable, but I cannot guarantee we won't have to make breaking changes in the future. The advantage of using the @latest tag is that sites will automatically get bug fixes without having to make any changes on their side. The disadvantage is that their sites might break at some point in the future. Perhaps I should include instructions for both options and let users make their own decision about which way to go?

That's a difficult choice. I tend to lean towards not breaking projects over delivering bug fixes because it seems like a lot of DH projects are made by students or with a grant and end up unmaintained after creation. But I don't know the particular pros and cons with the Alpheios Components library so I'll leave it up to you. I have no hesitations merging this PR either way.

Originally posted by @zfletch in #38

changes to the look of treebank

  1. The numbering scheme in the treebank does not reflect the sentence numbers from the book in Alpheios (see screenshot).
  2. Can we make the first/back and last/next buttons highlighted (or not selectable) when at the beginning or the end of the treebank?
  3. In Show morphology, the checkbox is out of focus when mousing over a word in Firefox on PC.
  4. In Safari, Show morphology does not do anything on either mobile or Mac. It does shrink the panel on Mac, but I do not get the morphs and selector tabs as in Chrome.

Feedback system

Develop a feedback and feedback notification system for Treebank Template repositories. This could involve recommended notification settings or setting up a GitHub action to do something when an issue is created. We could also recommend disabling issues for published collections that are not being actively maintained.

Some common situations we would need to address:

  • Treebanks are copied from another repository or collection
  • Treebanks are not actively maintained
  • Treebanks are generated from some non-treebank source and backporting changes is difficult

Update step-by-step instructions to include env file update

Btw, I think perhaps the update of the .env file is something that is missing from the instructions I wrote. Should it be there?

Everything will still work if the .env file isn't updated. The values are only used for the title of the page and for the preview that shows up when posting a link to social media like Twitter or Facebook (here).

Originally posted by @zfletch in #48 (comment)

I think it would be good to add this to the step-by-step instructions at some point.

issue with path to arethusa resources

I've been working through getting the deployment to work using AWS S3 and CloudFront.
I have run into a small issue with the path to the arethusa resources.

In ArethusaWrapper, the path is set to

${process.env.PUBLIC_URL}/arethusa/

When deployed at the root of the website, this then results in the path to the arethusa resources getting an extra '/'

e.g.

alpheios.trees.com/arethusa//dist/arethusa.packages.min.js

which is not working with the AWS webserver

If I remove the trailing slash, it works great for my environment, but when I tested it on the origin repo, the yarn dev server fails to find the d3 code. I'm a little lost because I can't seem to reproduce that problem in my own yarn dev environment (using the alpheios-project/alpheios-trees repo), although I can't see any other differences.

Any thoughts? I know the arethusa code uses a relative path back to the d3 module so maybe that's a factor here, but I can't figure out why it would work in one environment and not another with the same code.
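
For reference, the doubled slash itself can be avoided by normalizing the segments when they are joined, as in the sketch below. The helper joinPath is hypothetical and is not part of ArethusaWrapper or Arethusa; it also does not address the d3 lookup failure described above, which may depend on how Arethusa resolves its relative paths.

```javascript
// Sketch: join URL path segments with exactly one "/" between them.
function joinPath(base, ...rest) {
  // Drop trailing slashes from the base ("" and "/" both become "").
  const head = base.replace(/\/+$/, "");
  // Drop leading and trailing slashes from the remaining segments.
  const tail = rest
    .map((segment) => segment.replace(/^\/+|\/+$/g, ""))
    .filter(Boolean);
  return [head, ...tail].join("/");
}

// Deployed at the root of the site (PUBLIC_URL is empty):
console.log(joinPath("", "arethusa/", "/dist/arethusa.packages.min.js"));
// -> /arethusa/dist/arethusa.packages.min.js

// Deployed under a sub-path, as on GitHub Pages:
console.log(joinPath("/treebank-template", "arethusa/", "/dist/arethusa.packages.min.js"));
// -> /treebank-template/arethusa/dist/arethusa.packages.min.js
```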

Replace Arethusa Widget with Treebank React

  • Add Treebank React option for user-facing trees (655caf8)
  • Use Treebank React for embedded trees (981f20d)
  • Update TreebankService (which uses Alpheios Messaging) to work with Treebank React
  • Update tests to account for the necessary fetch calls performed by Treebank React (4aead7f)

add update instructions to getting started

requested by @nevenjovanovic:

one thing bothers me about the current treebank templates and their Git
workflow. What happens if I have forked the repo, adapted the
configuration and added my documents -- and then you make a change to the
upstream repository, and I want to update my forked one?

In other words, are we, after the fork, committed to the forked version,
and we have to do everything again if the upstream repository is updated?
And what would be the best workflow for this "do everything again"?

Maybe, for advanced users who want to keep up with your work on the upstream repository, another workflow could
be developed and described?
