translators' People

Contributors

abejellinek, adam3smith, adomasven, aurimasv, avram, bclinthall, boan-anbo, brendan-oconnell, dependabot[bot], dstillman, guyaglionby, jpwarren, mcburton, mrtcode, owcz, retorquere, rm2342, rmzelle, simonster, smjwsk, sonali0901, stakats, step21, swifterslb, symac, tnajdek, vcarret, wragge, zoe-translates, zuphilip

translators' Issues

Embedded metadata missing authors

The page at http://respiratory-research.com/content/11/1/133 has copious, well-formed RDF. With the old translator, we got the whole author list intact, as you can see in the BioMed Central test case, which was made with Scaffold back when the BMC translator got the complete author list by calling Embedded RDF. Something in the revised translator is limiting us to just the first two authors.

Hoping @simonster can take a look; I can try to work this out in several days (and the motivating issue, http://forums.zotero.org/discussion/17365, needs to be resolved even more promptly).

HighWire (1.0) multiple test failing

The "multiple" test fails on the server but passes locally (I have full-text access). I'm not sure it's worth fixing; there aren't many HighWire 1.0 sites left.

Escaping of note content in RDF export

People in the TEI world have noticed that our RDF export makes a mess of HTML tags in item data:

<rdf:value>&lt;h6>1256 to 1272&lt;/h6>
&lt;p>&amp;nbsp;&lt;/p>
&lt;p>page 32&amp;nbsp; roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres&lt;/p>
&lt;p>page 40 ditto&lt;/p>
&lt;p>&amp;nbsp;p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne&amp;nbsp; re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton .&amp;nbsp; William de hylton gives up dower amongst others.&amp;nbsp; Makes one wonder whether&amp;lt;per corresp='#williamofhultonclerk' role='m'&amp;gt;William de Hulton&amp;lt;/per&amp;gt; and William the clerk are the same person.&lt;/p>
&lt;p>page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby&amp;nbsp; and lands in the custody of Edmund king's son&lt;/p>
&lt;p>page 9&amp;nbsp; and 10 1258 Information re Henry of Ashbourne.&amp;nbsp; Holds a court. Case of villeinage.&amp;nbsp; Confirms Henry heir of&amp;nbsp; Robert of Ashbourne.&amp;nbsp; Stephen of Ireton one of the pledges for Henry.&lt;/p>
</rdf:value>
</bib:Memo>

We are presumably doing the same with things like <i> in item titles. A proper solution to this, as suggested in the linked thread on eXist-TEIXML, is to namespace those tags. We would also need to replace non-XML entities like &nbsp;.

Unfortunately, this behavior has its roots in the underlying Tabulator RDF engine; I don't know how we'd convince it to handle this with namespacing.

I'd appreciate help with this if anyone still on the team has experience with the RDF engine.
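A minimal sketch of the two steps suggested above, written in Python rather than the translators' JavaScript and independent of the Tabulator engine: named HTML entities that XML does not predefine (such as &nbsp;) are rewritten as numeric character references, and the note is then wrapped in an element carrying a namespace so its tags parse as real markup instead of escaped text. Using the XHTML namespace, and the helper names `html_entities_to_numeric` and `note_to_xhtml`, are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these are already safe.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# Assumption: note HTML is treated as an XHTML fragment.
XHTML_NS = "http://www.w3.org/1999/xhtml"

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text: str) -> str:
    """Replace HTML-only named entities (e.g. &nbsp;) with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # XML-safe or unknown: leave untouched (unknown names will still fail to parse).
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return ENTITY_RE.sub(repl, text)


def note_to_xhtml(note_html: str) -> ET.Element:
    """Wrap note HTML in an XHTML-namespaced <div> and parse it as XML."""
    fixed = html_entities_to_numeric(note_html)
    return ET.fromstring(f'<div xmlns="{XHTML_NS}">{fixed}</div>')


if __name__ == "__main__":
    root = note_to_xhtml("<h6>1256 to 1272</h6><p>page 40 ditto&nbsp;</p>")
    for child in root:
        # Each child tag now carries the namespace, e.g. {http://www.w3.org/1999/xhtml}p
        print(child.tag, repr(child.text))
```

The same wrapping would apply to inline markup such as <i> in titles. Note HTML that is not well-formed XML would still need cleanup before this step.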

MODS - incorrectly puts last name of personal authors in single field mode

See here:
http://groups.google.com/group/zotero-dev/tree/browse_frm/thread/4c00e8ebacfcfcf1/7a5041c27d3389c5?rnum=1&_done=%2Fgroup%2Fzotero-dev%2Fbrowse_frm%2Fthread%2F4c00e8ebacfcfcf1%3F#doc_a1e870dfc15a8c8d

"MODS allows for either breaking up parts of the name (given and family, for example) in different elements or enclosing the entire name in one element."
http://www.loc.gov/standards/mods/userguide/name.html#namepart

Currently, the MODS translator handles only the first form (name split into separate parts) correctly.
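A sketch of handling both forms, in Python rather than the translator's JavaScript. The output dicts are modeled on Zotero creator objects (lastName/firstName, with fieldMode: 1 for single-field names). The rule for splitting a single personal namePart ("Family, Given" when there is a comma, otherwise the last word is the family name) is a hypothetical heuristic, as is the helper name `parse_mods_name`.

```python
import xml.etree.ElementTree as ET

MODS = {"m": "http://www.loc.gov/mods/v3"}


def parse_mods_name(name_el: ET.Element) -> dict:
    """Turn a MODS <name> element into a Zotero-style creator dict (hypothetical helper)."""
    parts = name_el.findall("m:namePart", MODS)
    typed = {p.get("type"): (p.text or "").strip() for p in parts if p.get("type")}
    untyped = " ".join((p.text or "").strip() for p in parts if not p.get("type"))

    # Form 1: name already split into family/given parts.
    if "family" in typed or "given" in typed:
        return {"lastName": typed.get("family", ""), "firstName": typed.get("given", "")}

    # Corporate, conference, or untyped names stay in single-field mode.
    if name_el.get("type") != "personal":
        return {"lastName": untyped, "fieldMode": 1}

    # Form 2: whole personal name in one namePart; split heuristically (assumption).
    if "," in untyped:
        last, first = (s.strip() for s in untyped.split(",", 1))
        return {"lastName": last, "firstName": first}
    if " " in untyped:
        first, last = untyped.rsplit(" ", 1)
        return {"lastName": last, "firstName": first}
    return {"lastName": untyped, "fieldMode": 1}


if __name__ == "__main__":
    xml = ('<name xmlns="http://www.loc.gov/mods/v3" type="personal">'
           '<namePart>Smith, John</namePart><namePart type="date">1900-1980</namePart></name>')
    print(parse_mods_name(ET.fromstring(xml)))
```

Typed parts such as type="date" are ignored here. The last-word rule will mis-split particles like "van der Berg", so a real fix might prefer single-field mode when there is no comma.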

Pleade translator detects false positives

See this example. The problem appears to be the iframe URL of the Twitter Tweet button, which contains both "ead.html" and "id=". Since those two substrings are all that detectWeb checks for, the Pleade translator is offered even though the site has nothing to do with Pleade. Ideally we would find something more distinctive to detect Pleade by, but even if there isn't anything, we should be able to make the regexp stricter.
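A sketch contrasting the substring check described above with a stricter one, in Python rather than translator JavaScript. The stricter version parses the URL and requires "ead.html" to be the last path segment and "id" to be an actual query parameter. That Pleade pages follow the shape …/ead.html?id=… is an assumption, and the example URLs are made up to show how substring matching misfires.

```python
from urllib.parse import parse_qs, urlparse


def loose_detect(url: str) -> bool:
    """Substring check as described in the issue: matches far too much."""
    return "ead.html" in url and "id=" in url


def strict_detect(url: str) -> bool:
    """Require ead.html as the final path segment and a non-empty id query parameter."""
    parsed = urlparse(url)
    last_segment = parsed.path.rsplit("/", 1)[-1]
    # parse_qs drops parameters with blank values by default.
    return last_segment == "ead.html" and "id" in parse_qs(parsed.query)


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    widget = ("https://widgets.example.com/tweet_button.html"
              "?url=http%3A%2F%2Fexample.org%2Fhead.html&id=widget-0")
    finding_aid = "http://archives.example.org/ead.html?id=FRAD000_0001"
    for url in (widget, finding_aid):
        print(loose_detect(url), strict_detect(url), url)
```

The widget URL passes the loose check only because "head.html" contains "ead.html" and an unrelated id parameter is present; the strict check rejects it. The equivalent in a translator would be a regexp anchored on the path segment rather than two independent substring tests.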

Create Labels

We should have some issue labels here: major, minor, error, new translator needed.

Sirsi eLibrary support

@usclibraries has prepared a site-agnostic version (8b921dca58871474568d55c825035126bc318c94) of the Rice/Rutgers modifications to the Sirsi translator. The translator is intended to cover the most recent iteration of the SirsiDynix OPAC, called eLibrary.

It looks like it can immediately serve as a drop-in replacement for the Rice and Rutgers translators, and it works for USC. The translator still needs some work: we can probably tear out some of the old code inherited from the original Sirsi translator and streamline this one. More importantly, it doesn't yet work with all eLibrary installations. Joyce at USC pointed out a list of Association of Research Libraries members using SirsiDynix (listed as Unicorn): http://www.librarytechnology.org/arl.pl

Of these installations, the present translator definitely doesn't work with the Indiana catalog.

We need to review the rest of the major installations, see which ones we're missing, and work out how to shoehorn support for them into a unified translator.

Expand embedded metadata detection

When looking for embedded metadata, we should also use the <link rel="alternate" /> syntax, which sites use to advertise alternate representations of a page. A recent discussion points to a dissertation site that we don't import correctly. In addition to the Google/HighWire metadata we already parse, it includes <link rel="alternate" /> references to structured descriptions:

<link href="http://umu.diva-portal.org/smash/getreferences?referenceFormat=librismarcxml&pids=diva2:459013"
  rel="alternate" title="MARC-XML Representation" type="text/xml" />
<link href="http://umu.diva-portal.org/smash/getreferences?referenceFormat=swepubmods&pids=diva2:459013"
  rel="alternate" title="MODS Representation" type="text/xml" />

I don't think we can expect to read these as-is, since the text/xml type is too vague, but we should look for the known types of formats we do read, just as we do when intercepting RIS/BibTeX downloads. That means application/mods+xml for MODS, etc.
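A sketch of the proposed detection, in Python rather than translator JavaScript: scan a page's <link> elements, keep those whose rel includes "alternate" and whose type is a known, specific MIME type, and ignore vague ones such as text/xml. application/mods+xml comes from the issue and application/marcxml+xml is its registered MARCXML counterpart; the RIS and BibTeX types are the commonly used unregistered ones and are assumptions, as are the helper names.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Assumed mapping from MIME type to the import format we would hand it to.
KNOWN_TYPES = {
    "application/mods+xml": "MODS",
    "application/marcxml+xml": "MARCXML",
    "application/x-research-info-systems": "RIS",
    "application/x-bibtex": "BibTeX",
}


class AlternateLinkFinder(HTMLParser):
    """Collect (format, href) pairs from <link rel="alternate"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        # Drop parameters such as "; charset=utf-8" before looking up the type.
        mime = (a.get("type") or "").lower().split(";")[0].strip()
        if "alternate" in rels and mime in KNOWN_TYPES and a.get("href"):
            self.found.append((KNOWN_TYPES[mime], a["href"]))


def find_alternates(html: str) -> List[Tuple[str, str]]:
    finder = AlternateLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    # The DiVA links above use text/xml, so nothing is picked up from them.
    page = ('<link href="http://umu.diva-portal.org/smash/getreferences?referenceFormat=swepubmods'
            '&pids=diva2:459013" rel="alternate" title="MODS Representation" type="text/xml" />'
            '<link rel="alternate" type="application/mods+xml" href="/record.mods">')
    print(find_alternates(page))
```

Self-closing <link ... /> tags also reach handle_starttag, since HTMLParser's default handle_startendtag delegates to it.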
