zotero / translators
Zotero Translators
Home Page: http://www.zotero.org/support/dev/translators
(the search result page is a bit odd & takes a long time to load, so this is probably to be expected).
I had to disable a translator test in the Huffington Post translator because it was intermittently trying to fetch http://search.huffingtonpost.com/false an infinite number of times in both Gecko and Chrome. I don't actually know the framework well enough to understand why this is. @adam3smith, since it's your translator, can you take a look at this?
The page at http://respiratory-research.com/content/11/1/133 has copious, nice RDF. In the old translator, we got the whole author list intact, as you can see in the test case for BioMed Central, which was made using Scaffold when BMC was able to get the complete author list by calling Embedded RDF. Something in the revised translator is limiting us to just the first two authors.
Hoping @simonster can take a look; I can try to work this out in several days (and the motivating issue, http://forums.zotero.org/discussion/17365, needs to be resolved even more promptly).
The multiple test fails on the server, but works here (I have full text access). Not sure if we should care about fixing this one; there aren't many HighWire 1.0 sites left.
People in the TEI world have noticed that our RDF export makes a mess of HTML tags in item data:
<rdf:value><h6>1256 to 1272</h6>
<p>&nbsp;</p>
<p>page 32&nbsp; roll 1218a 1272 John the Clerk against William de Grendon regarding the warrant of 8 acres</p>
<p>page 40 ditto</p>
<p>&nbsp;p108 roll 144 1269 Claim by Margery who was the wife of Henry of Ashbourne&nbsp; re dower from various individuals including Stephen of Ireton the third part of an acre of meadow in Snelston, and ?( William de ) Hulton in Clifton .&nbsp; William de hylton gives up dower amongst others.&nbsp; Makes one wonder whether&lt;per corresp='#williamofhultonclerk' role='m'&gt;William de Hulton&lt;/per&gt; and William the clerk are the same person.</p>
<p>page 109 ditto Roger is the son of Henry of Ashbourne and is in the custody of Margaret countess Derby&nbsp; and lands in the custody of Edmund king's son</p>
<p>page 9&nbsp; and 10 1258 Information re Henry of Ashbourne.&nbsp; Holds a court. Case of villeinage.&nbsp; Confirms Henry heir of&nbsp; Robert of Ashbourne.&nbsp; Stephen of Ireton one of the pledges for Henry.</p>
</rdf:value>
</bib:Memo>
We are presumably doing the same with things like <i> in item titles. A proper solution to this, as suggested in the linked thread on eXist-TEIXML, is to namespace those tags. We would also need to replace non-XML entities like &nbsp;.
Unfortunately, this behavior has its roots in the underlying Tabulator RDF engine; I don't know how we'd convince it to handle this with namespacing.
I would like help on this, if we have anyone still on the team who has experience with the RDF engine.
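The entity half of the problem can be sketched independently of the RDF engine. The snippet below is a minimal illustration, not the exporter's actual code: the function name `replaceNonXmlEntities` and the small entity table are assumptions made for this example, and a real fix would need a complete HTML entity table. It rewrites named HTML entities that XML does not predefine as numeric character references and leaves the five XML built-ins alone.

```javascript
// The five entities that XML predefines; these must be left untouched.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Illustrative subset of HTML named entities (an assumption for this sketch).
const HTML_ENTITY_CODEPOINTS = new Map([
  ["nbsp", 160],
  ["copy", 169],
  ["eacute", 233],
  ["mdash", 8212],
]);

// Hypothetical helper: replace non-XML named entities with numeric references.
function replaceNonXmlEntities(text) {
  return text.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    if (XML_PREDEFINED.has(name)) {
      return match; // already valid in XML
    }
    const codePoint = HTML_ENTITY_CODEPOINTS.get(name);
    // Unknown names are left as-is rather than guessed at.
    return codePoint === undefined ? match : `&#${codePoint};`;
  });
}

console.log(replaceNonXmlEntities("page 32&nbsp; roll 1218a &lt;per&gt; &amp; more"));
```

Numeric references are valid in any XML document without a DTD, which is why they are a safer target than the named forms.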
It's likely that new.com.au will be able to use a very similar translator as the two sites look alike.
Works pretty well with DOIs, though, which are displayed for all items.
"MODS allows for either breaking up parts of the name (given and family, for example) in different elements or enclosing the entire name in one element."
http://www.loc.gov/standards/mods/userguide/name.html#namepart
Currently, the MODS translator only deals correctly with the first form (name parts in separate elements).
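A rough sketch of handling both forms follows. It assumes the `namePart` elements have already been read into plain objects with an optional `type` and a `text` value; the function name `creatorFromNameParts`, the input shape, and the `singleField` flag are all hypothetical and are not the translator's actual code.

```javascript
// Hypothetical helper: build a creator from already-parsed MODS <namePart> data.
// Each entry looks like { type: "family" | "given" | undefined, text: "..." }.
function creatorFromNameParts(nameParts) {
  const family = nameParts.filter((p) => p.type === "family").map((p) => p.text.trim());
  const given = nameParts.filter((p) => p.type === "given").map((p) => p.text.trim());

  // First form: the name is split into typed parts.
  if (family.length > 0 || given.length > 0) {
    return { lastName: family.join(" "), firstName: given.join(" ") };
  }

  // Second form: the whole name sits in one untyped <namePart>.
  const untyped = nameParts.find((p) => !p.type);
  if (!untyped) {
    return null;
  }
  const full = untyped.text.trim();
  const comma = full.indexOf(",");
  if (comma !== -1) {
    // Assume "Family, Given" ordering when a comma is present.
    return {
      lastName: full.slice(0, comma).trim(),
      firstName: full.slice(comma + 1).trim(),
    };
  }
  // No reliable way to split; keep the name in a single field.
  return { lastName: full, firstName: "", singleField: true };
}

console.log(creatorFromNameParts([{ text: "Smith, John" }]));
```

The comma heuristic is only one possible choice; names without a comma are deliberately not split, since guessing the family name from word order would be wrong for many names.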
See this example. It looks like the problem is the Twitter tweet button's iframe URL, which contains both "ead.html" and "id=". Since this is all detectWeb checks, the Pleade translator comes up, even though the site has nothing to do with Pleade. Ideally, there would be something more unique that we could use to detect Pleade, but even if there isn't, we should be able to make the regexp stricter.
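To illustrate what a stricter check might look like, here is a small sketch. It is not the Pleade translator's actual `detectWeb`; both function names and the example URLs are made up, and it assumes that real Pleade pages have a path ending in `/ead.html` with an `id` query parameter, which would need to be confirmed against real installations.

```javascript
// Loose check, in the spirit of the current behavior: two substring tests.
function looseLooksLikePleade(url) {
  return url.includes("ead.html") && url.includes("id=");
}

// Stricter check (assumption: Pleade pages are served from .../ead.html?id=...).
function strictLooksLikePleade(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (e) {
    return false; // not an absolute URL
  }
  return /\/ead\.html$/.test(parsed.pathname) && parsed.searchParams.has("id");
}

// Hypothetical widget URL: "thread.html" happens to contain "ead.html".
const widgetUrl =
  "https://platform.twitter.com/widgets/tweet_button.html?url=http://example.org/thread.html&id=twitter-widget-0";
const pleadeUrl = "http://archives.example.org/ead.html?id=FRAD001";

console.log(looseLooksLikePleade(widgetUrl), strictLooksLikePleade(widgetUrl));
console.log(looseLooksLikePleade(pleadeUrl), strictLooksLikePleade(pleadeUrl));
```

Matching on the parsed path and query, rather than on the raw string, keeps text buried inside another parameter's value from triggering detection.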
fails on
http://scholar.google.com/scholar?hl=en&q=smith&btnG=Search&as_sdt=0%2C22&as_ylo=&as_vis=0
which is one of the tests.
The reason is the link to the author biography between the 3rd and 4th item, which has the same XPath as the article titles/links.
We should have some labels here - major, minor, error, new translator needed
We should probably write a translator for Fairfax Media, since there are several news sites that use this content delivery suite.
See http://www.fairfax.com.au/network-map.aspx for a list of sites
@usclibraries has prepared a site-agnostic version (8b921dca58871474568d55c825035126bc318c94) of Rice/Rutgers modifications of the Sirsi translator. The translator is intended to cover the most recent iteration of the SirsiDynix OPAC, called eLibrary.
It looks like it can be used as a drop-in replacement for Rice and Rutgers immediately, and it works for USC. The translator still needs some work, since we can probably tear out some of the old code from the original Sirsi translator and streamline this one. More importantly, it doesn't yet work with all eLibrary installations. Joyce at USC pointed out a list of Association of Research Libraries members using SirsiDynix, Unicorn on the list: http://www.librarytechnology.org/arl.pl
Of these installations, the present translator definitely doesn't work with the Indiana catalog.
We need to review the rest of the major installations and see which ones we're missing, and see how we can shoehorn support for them into a unified translator.
The <link rel="alternate" /> syntax for providing alternate representations should be used when we look for embedded metadata. A recent discussion notes a site providing dissertations that we don't import correctly. In addition to Google/Highwire metadata which we're parsing, it includes such <link rel="alternate" /> references to structured descriptions:
<link href="http://umu.diva-portal.org/smash/getreferences?referenceFormat=librismarcxml&pids=diva2:459013"
rel="alternate" title="MARC-XML Representation" type="text/xml" />
<link href="http://umu.diva-portal.org/smash/getreferences?referenceFormat=swepubmods&pids=diva2:459013"
rel="alternate" title="MODS Representation" type="text/xml" />
I don't think we can expect to read these as-is, since the text/xml type is too vague, but we should look for known types for formats we do read, just like we do for intercepting RIS/BibTeX download. That means application/mods+xml for MODS, etc.
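As a sketch of the idea, the lookup could be a table from media type to format. Only `application/mods+xml` comes from the discussion above; the other entries, the function name `findImportableAlternates`, and the plain-object input shape are assumptions for illustration, not existing Zotero code.

```javascript
// Media types we might recognize. Only application/mods+xml is named in the
// issue; the remaining entries are assumptions for this sketch.
const KNOWN_ALTERNATE_TYPES = new Map([
  ["application/mods+xml", "MODS"],
  ["application/marcxml+xml", "MARCXML"],
  ["application/x-research-info-systems", "RIS"],
  ["application/x-bibtex", "BibTeX"],
]);

// Hypothetical helper: `links` is a list of { rel, type, href } objects
// already extracted from the page's <link> elements.
function findImportableAlternates(links) {
  return links
    .filter((link) => (link.rel || "").toLowerCase().split(/\s+/).includes("alternate"))
    .map((link) => ({
      href: link.href,
      format: KNOWN_ALTERNATE_TYPES.get((link.type || "").trim().toLowerCase()),
    }))
    .filter((hit) => hit.format !== undefined);
}

console.log(
  findImportableAlternates([
    { rel: "alternate", type: "text/xml", href: "http://example.org/record.xml" },
    { rel: "alternate", type: "application/mods+xml", href: "http://example.org/record.mods" },
  ])
);
```

With this approach the DiVA links quoted above would be skipped, because they are declared as text/xml; the site would have to advertise a more specific type for them to be picked up.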
www.loc.gov/standards/mods/userguide/physicaldescription.html
Since this can contain other info for non-book item types we should check that this doesn't cause any problems.
I'm hoping @AJLyon can have a look - the translator does some funky stuff with creators that I don't want to mess with w/o understanding it.
\textit{ } can be put into our italics, possibly including round-trip
Per http://forums.zotero.org/discussion/19316
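A minimal sketch of the round trip, assuming italics are stored as <i> tags in the field and that the \textit{ } argument contains no nested braces. The function names are hypothetical, and this is not the BibTeX translator's actual code.

```javascript
// Import direction: \textit{...} becomes <i>...</i>.
// Only handles arguments without nested braces (an assumption of this sketch).
function texItalicsToHtml(value) {
  return value.replace(/\\textit\{([^{}]*)\}/g, "<i>$1</i>");
}

// Export direction: <i>...</i> becomes \textit{...}.
function htmlItalicsToTex(value) {
  return value.replace(/<i>([^<]*)<\/i>/g, "\\textit{$1}");
}

const original = "Growth of \\textit{E. coli} in culture";
const imported = texItalicsToHtml(original);
console.log(imported);
console.log(htmlItalicsToTex(imported) === original);
```

Nested markup such as \textit{a {B} c} is left untouched by this pattern, so a real implementation would need a proper brace-matching parser.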