jhellingman / tei2html
XSLT stylesheets to convert TEI to HTML and ePub format.
License: GNU General Public License v3.0
Open the ePub in various ePub readers. Depending on the device, the cover
image is not neatly rendered or not recognized at all.
Try to find out reasonable defaults:
http://blog.threepress.org/2009/11/20/best-practices-in-epub-cover-images/
Note: the Mobi format seems to use an extension:
<x-metadata><Cover>images\cover.jpg</Cover></x-metadata>
Original issue reported on code.google.com by jhellingman
on 25 Apr 2010 at 9:50
What steps will reproduce the problem?
1. Create a TEI document with a chapter with two head elements (two-line
heading)
2. Generate an ePub from it.
3. Look at the table of contents in the left-hand pane (using the Calibre
viewer)
What is the expected output? What do you see instead?
Expected: the lines of a multi-line heading are kept separate. Actual: they are joined together.
Original issue reported on code.google.com by jhellingman
on 7 Dec 2009 at 7:48
Currently, all IDs used for internal cross references in the output HTML
are generated, and thus are meaningless alphanumeric strings, whereas ids
in the source TEI are often manually added and have sensible meanings. It
would help to retain these IDs in the HTML output as well, and only
generate IDs when no ID is present in the TEI. This would also help to make
the HTML more stable, as now, every re-run of the stylesheet may generate
different IDs.
Original issue reported on code.google.com by jhellingman
on 1 Nov 2006 at 1:32
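The rule described above (keep the source ID, generate one only when none is present) can be sketched as follows. This is a minimal Python illustration, not the stylesheet's implementation: the function name `output_id`, the positional `path` key, and the use of a hash are all assumptions made for the example, and the `id` attribute is assumed to be unnamespaced.

```python
import hashlib
import xml.etree.ElementTree as ET


def output_id(element: ET.Element, path: str) -> str:
    """Return the source id if present; otherwise derive a stable one.

    `path` is a hypothetical positional key such as "body/div1[2]/p[5]".
    Hashing it, rather than using a per-run counter, yields the same id
    on every run as long as the document structure is unchanged.
    """
    source_id = element.get("id")
    if source_id:
        return source_id
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()[:8]
    return f"x{digest}"  # letter prefix so the id never starts with a digit
```

Deriving the fallback from the element's position keeps it repeatable between runs, although it still changes when elements are inserted before it.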
We need to support ePub, for easier deployment of ebooks to eReaders.
Since ePub is basically XHTML with a subset of CSS 2.0, we need to do the
following:
* Generate valid XHTML
* Replace unsupported CSS features with supported constructs (Where this
leads to loss in functionality in the HTML version, this should depend on
an ePub switch.)
* Generate the required metadata files for ePub.
* Package the whole into an ePub-compliant zip archive.
Original issue reported on code.google.com by jhellingman
on 12 Nov 2009 at 2:14
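The packaging step has one detail that is easy to get wrong: the ePub container format expects the mimetype entry to be the first entry in the archive and to be stored without compression. A minimal Python sketch, assuming the other files have already been generated (the function name `package_epub` and the `files` mapping are illustrative only):

```python
import zipfile
from typing import Dict


def package_epub(target, files: Dict[str, bytes]) -> None:
    """Write `files` (archive name -> content) into an ePub container.

    `target` is a file name or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # All other entries (XHTML, CSS, OPF, NCX, images) may be deflated.
        for name, content in files.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
```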
A proposed new feature to replace the non-standard Adobe page-map.
See
http://www.epubbooks.com/blog/20081209/marking-up-page-numbers-in-the-epub-ncx/
Original issue reported on code.google.com by jhellingman
on 15 Jan 2010 at 7:57
Add support for running headers and footers.
See http://wiki.mobileread.com/wiki/EBook_Publisher for some implementation
hints, but realize that <header> and <footer> are not valid HTML tags.
Do something like:
in CSS:
.pagehead {display:none; display:oeb-page-head}
.pagefoot {display:none; display:oeb-page-foot}
And in HTML at opportune locations:
<div class="pagehead">Text of running header</div>
Original issue reported on code.google.com by jhellingman
on 22 Apr 2010 at 7:30
Currently, the table-normalization step is done in a separate transformation
before the main transformation. This requires additional glue code in Perl to
run this transformation. To simplify this, the normalization should be done at
the same time the tables are formatted in tables.xsl.
So far, a simple integration of the table normalization into the main
stylesheet leads to a number of unexpected results, either because other
(unrelated) templates match, or because of some other overlooked complication.
This makes the trivial integration step (using a temporary node tree in a
variable) incorrect so far.
Original issue reported on code.google.com by jhellingman
on 10 Sep 2014 at 5:29
Tei2html does not support PGTEI, as used by Project Gutenberg volunteers.
Minimal support will be needed for the following:
* <div> elements. (DONE)
* @rend attributes. (should be a flag, consider pre-processing and use of @style attribute; @style support DONE.)
* <pgExtensions> elements.
* <q> elements by inserting quotation marks. (should be a flag; a big issue here is that <q> elements are often used as wrappers for elements that do not directly fit the structure of a TEI document, as to make them valid. We somehow need to distinguish those uses from the intended usage of text between quotation marks.)
* <divGen> types. (Partly DONE)
Original issue reported on code.google.com by jhellingman
on 24 Mar 2011 at 10:50
Currently, the generated HTML contains a link back to the TOC. This points to
the top level of the TOC. It would be neater if that link pointed to the
relevant entry in the TOC when possible.
Three cases:
<divN id=toc>:
- For encoded pre-existing TOCs, we need to find the element in the TOC whose
target is the current chapter, and link to that. (Assuming we have one; if
we have more, linking to the first will do.)
<divGen type=toc>:
- For generated TOCs, we need to generate the links back to the TOC, using
generated ids that we will know of.
No TOC:
- No link back will be generated.
Original issue reported on code.google.com by jhellingman
on 24 Sep 2012 at 2:52
1. value of attribute "http-equiv" is invalid; must be a string matching the
regular expression
"([Dd][Ee][Ff][Aa][Uu][Ll][Tt]\-[Ss][Tt][Yy][Ll][Ee])|([Rr][Ee][Ff][Rr][Ee][Ss][Hh])"
See: https://code.google.com/p/epubcheck/issues/detail?id=135
Use <meta charset="utf-8" /> instead.
2. Obsolete or irregular DOCTYPE statement. External DTD entities are not
allowed. Use '<!DOCTYPE html>' instead.
The HTML5 doctype is tricky to generate; this may need some post-processing.
3. attribute "summary" not allowed here; expected attribute "accesskey", "...
or "xml:space" (with xmlns:ns1="http://www.idpf.org/2007/ops"
xmlns:ns2="http://www.w3.org/2001/10/synthesis")
Remove the summary attribute from tables.
Original issue reported on code.google.com by jhellingman
on 6 May 2014 at 2:19
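If the three fixes above were applied as a post-processing step on the serialized files, it could look roughly like the following Python sketch. This is an assumption about one possible approach, not what the stylesheets do; the function name `to_epub3_xhtml` is made up, and the regular expressions assume that `http-equiv` is the first attribute of the `meta` element and that attribute values contain no `>` character.

```python
import re


def to_epub3_xhtml(text: str) -> str:
    """Apply the three fixes to one serialized XHTML file."""
    # 1. Replace the http-equiv content-type declaration by <meta charset>.
    text = re.sub(
        r'<meta\s+http-equiv="Content-Type"[^>]*>',
        '<meta charset="utf-8" />',
        text,
        flags=re.IGNORECASE,
    )
    # 2. Replace a DOCTYPE that names an external DTD by the HTML5 doctype.
    text = re.sub(r'<!DOCTYPE\s+html\b[^>]*>', '<!DOCTYPE html>', text, count=1)
    # 3. Drop summary attributes from table start tags.
    text = re.sub(r'(<table\b[^>]*?)\s+summary="[^"]*"', r'\1', text)
    return text
```

Doing this on text rather than inside the transformation sidesteps the doctype problem, at the cost of a fragile, pattern-based step that has to run after every transformation.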
Documents contain external references as hyperlinks.
In some formats those hyperlinks are live and can be followed; in others (print)
they are not. In those cases it should be possible to collect them in a
dedicated section generated by
<divGen type="ExternalReferences"/>
This section includes all external references:
* Each reference only occurs once in the list.
* Each reference links back to the source(s) (as a link or by page number,
depending on the output format).
Original issue reported on code.google.com by jhellingman
on 27 Aug 2010 at 9:26
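The two requirements above (each reference listed once, each with links back to every place it is used) amount to grouping the links by URL while keeping document order. A small Python sketch of that grouping, where the `(source_id, url)` pairs and the function name are assumptions made for illustration:

```python
from typing import Dict, Iterable, List, Tuple


def collect_external_references(
    links: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each external URL to the ids of the places that cite it.

    `links` yields (source_id, url) pairs in document order. The result
    keeps the order of first occurrence, so each URL appears once.
    """
    references: Dict[str, List[str]] = {}
    for source_id, url in links:
        if not url.startswith(("http://", "https://")):
            continue  # internal cross reference, not part of the list
        references.setdefault(url, []).append(source_id)
    return references
```

For print output, the collected source ids would be resolved to page numbers instead of being rendered as links.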
What steps will reproduce the problem?
1. I don't know exactly how to explain this, so please see the attached file.
What is the expected output? What do you see instead?
Expected: correct characters.
Actual: only half of them are correct. I'm not familiar with Perl.
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 13 Oct 2008 at 7:01
Attachments:
Currently, deploying this software is rather complex, requiring
* Download of various tools.
* Adjusting lines of code in various tools to point to the actual tool
locations.
* Heavy use of command-line interface.
Not all of this is avoidable:
* Not all tools can be bundled, because of licensing.
* Configuration needs to be done.
However, I can prepare compiled executables, and an archive of all required
files in the right locations to make life easier for those wishing to use
these XSLT files.
Original issue reported on code.google.com by jhellingman
on 12 Nov 2009 at 2:07
Add semantic annotations to elements.
Original issue reported on code.google.com by jhellingman
on 14 May 2014 at 9:50
Write a short XSLT stylesheet that accepts the messages.xml file, and dumps
a .po file, and a Perl script that can reverse that process.
Put those .po files in a special directory.
This will enable the use of transifex.net for translations.
Original issue reported on code.google.com by jhellingman
on 19 Apr 2010 at 11:16
Mark corrections in footnotes as such in list of corrections.
That is, say "page 23 (footnote)" or similar if the correction appears in a
footnote on that page.
Original issue reported on code.google.com by jhellingman
on 14 Feb 2011 at 2:56
The element replaces ad-hoc methods of including references to external images in P3.
The current code is developed for producing web-based editions with
low-resolution images. When we also want to produce a version suitable for
printing, we need a way to specify alternative, high-resolution images that
can be printed, preferably without changes to the master files themselves.
The idea is to pull the images from an alternative path containing the
high-resolution versions: a hires/ folder next to images/, which should contain
the high-resolution version of each low-resolution image. In images.xml, we
collect all information about both, and verify this is the case.
Then, when generating a PDF for print, we use the hires folder instead of the
images folder as the source.
Original issue reported on code.google.com by jhellingman
on 27 Aug 2010 at 9:38
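The verification step described above could be as simple as the following Python sketch. It assumes that a high-resolution image has exactly the same file name as its low-resolution counterpart, which the issue does not state; both function names and the "pdf" format key are hypothetical.

```python
from pathlib import Path
from typing import List


def missing_hires(base: Path) -> List[str]:
    """Return names of files in images/ with no counterpart in hires/."""
    images = base / "images"
    hires = base / "hires"
    return sorted(
        p.name
        for p in images.iterdir()
        if p.is_file() and not (hires / p.name).is_file()
    )


def image_folder(output_format: str) -> str:
    """Pick the source folder: hires for print PDF, images otherwise."""
    return "hires" if output_format == "pdf" else "images"
```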
$ git clone https://github.com/jhellingman/tei2html
$ cd tei2html/tools
$ perl tei2html.pl
Can't locate SgmlSupport.pm in @INC (you may need to install the SgmlSupport module) (@INC contains: /usr/local/lib64/perl5/5.30 /usr/local/share/perl5/5.30 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at tei2html.pl line 14.
BEGIN failed--compilation aborted at tei2html.pl line 14.
Why isn't Perl loading the file from the current directory?
These files use language codes that are not supported.
Validation:
MythsOfTheCherokee-pl-05.xhtml contains: la-x-bio
The xml:lang and lang attributes must use language codes that conform to
[RFC5646]. The official list of language codes can be found here.
Best Practices:
Much of the Table of Contents is written with all capital letters. We recommend
using mixed case.
Text color should not be set to black using CSS, because the text will be
unreadable in Night Reading Mode on some devices. This EPUB has the following
CSS rules which set text to black:
MythsOfTheCherokee.css – .transcribernote
MythsOfTheCherokee.css – .advertisment
MythsOfTheCherokee.css – body, a.hidden
This EPUB contains content files over 300 KB in size. While there is no formal
standard regarding individual file sizes, larger file sizes can slow down the
loading and response time on older eBook devices. Also, some major devices in
the marketplace use the Adobe RMSDK, and that display engine has a hard limit
of 300 KB on all content files. Please consider breaking these files into
multiple parts to keep them under this limit.
See this file: MythsOfTheCherokee-ch2.xhtml
See this file: MythsOfTheCherokee-ch5.xhtml
See this file: MythsOfTheCherokee-ch6.xhtml
See this file: MythsOfTheCherokee-ix.xhtml
This EPUB does not define the beginning of its main content. Some reading
systems use this setting when a reader opens your book for the first time or
wants to navigate to the beginning of the book.
Please see our Handbook for an example showing how to define the start location.
http://ebookflightdeck.com/handbook/startlocation
We recommend adding a landmark navigation section to your nav document. Some
retailers recommend specifying the location of the cover image, table of
contents, and the start of the text in the landmarks.
Please see our Handbook entry for a template landmarks section, and the
Retailer Grid for more information about retailer requirements.
http://ebookflightdeck.com/handbook/landmarks
Original issue reported on code.google.com by jhellingman
on 13 Jun 2014 at 11:40
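The 300 KB warning in the report above can be checked locally before uploading. A Python sketch, assuming that "KB" means 1024 bytes and that content documents can be recognized by their .xhtml or .html extension (the function name and the `LIMIT` constant are illustrative):

```python
import zipfile
from typing import List, Tuple

LIMIT = 300 * 1024  # the 300 KB limit mentioned in the report


def oversized_content_files(epub, limit: int = LIMIT) -> List[Tuple[str, int]]:
    """List (name, uncompressed size) of content documents above `limit`.

    `epub` is a file name or a readable binary file object.
    """
    with zipfile.ZipFile(epub) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.endswith((".xhtml", ".html"))
            and info.file_size > limit
        ]
```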
Currently the rend attributes are used in a somewhat haphazard way. The
idea is to make them map to CSS 2.0 features in a consistent way.
Basically, we can recognize three types of values in the rend attributes:
1. Shortcut values, for example rend="sc" to make the content of an element
small caps. This typically works on selected elements only.
2. tei2html specific values, for example rend="image(example.gif)" to
achieve certain rendering effects during the transformation from TEI to HTML.
3. CSS pass-through values, for example rend="background-color(red)", which
is translated to a CSS style-sheet rule background-color: red.
Currently, the rend attributes are not handled for all elements. (Although
CSS style-sheet rules are produced for them.)
Original issue reported on code.google.com by jhellingman
on 21 Jan 2010 at 2:05
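The three types of values can be told apart mechanically. The Python sketch below illustrates the classification only; it assumes that several values in one rend attribute are separated by spaces, and the two lookup tables are hypothetical stand-ins for whatever the stylesheets actually recognize.

```python
import re
from typing import Dict, List

# Hypothetical tables; the real stylesheets may recognize other names.
SHORTCUTS: Dict[str, str] = {"sc": "font-variant: small-caps"}
TEI2HTML_SPECIFIC = {"image"}


def rend_to_css(rend: str) -> List[str]:
    """Translate a rend attribute value into CSS declarations."""
    declarations = []
    for token in re.findall(r"[\w-]+(?:\([^)]*\))?", rend):
        match = re.fullmatch(r"([\w-]+)\(([^)]*)\)", token)
        if match is None:
            # Type 1: shortcut value such as "sc".
            if token in SHORTCUTS:
                declarations.append(SHORTCUTS[token])
        elif match.group(1) not in TEI2HTML_SPECIFIC:
            # Type 3: CSS pass-through, name(value) becomes "name: value".
            declarations.append(f"{match.group(1)}: {match.group(2)}")
        # Type 2: tei2html-specific values produce no CSS declaration here.
    return declarations
```

With these tables, `rend_to_css("sc background-color(red) image(example.gif)")` yields the small-caps and background-color declarations and leaves the image value for the transformation itself.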
The tei2html stylesheets have been developed with the now very old TEI P4 version of TEI. The stylesheets should also work with TEI P5 and later documents. For this, a few changes need to be made:
For multi-volume works, it is sometimes necessary to nest the top-level element. Support this in a reasonable way.
Footnotes are typically placed at the end of the div1 element they appear
in. This works fine, except when footnotes appear outside a div1 element,
in which case they have nowhere to go.
For this a number of solutions are possible.
1. Place at end of div1 (this may place some footnotes wildly out-of-order)
2. Place at end of chunk of div0 before its first div1.
3. Insert using explicit <divGen type=footnotes> element.
Original issue reported on code.google.com by jhellingman
on 7 Dec 2009 at 7:46
What steps will reproduce the problem?
1. Create a TEI file in which an <lg> element appears.
2. Create a <pb n="123"> tag somewhere in the <lg> element.
3. Render with the tei2html stylesheet.
What is the expected output? What do you see instead?
Expected: The page number is rendered in the right margin, aligned as all
other line numbers.
Actual: The page number is not fully in the right margin.
Original issue reported on code.google.com by jhellingman
on 19 Oct 2006 at 10:07
The inclusion of images is currently complex:
* image file names for images are derived from various types of information
** id attribute
** url attribute
** rend attribute
* File metadata is collected in, and retrieved from an imageinfo.xml file
generated on the fly.
* 'Standard' mechanisms are not used.
* Different images for use in screen and print output are not supported.
* XSLT code concerned is complex.
We need to investigate the proper way to include illustrations in generated
HTML, and refactor the related code.
Original issue reported on code.google.com by jhellingman
on 12 Nov 2009 at 2:11
ePub 3.1 has been out since January 2017. For details see http://www.idpf.org/epub/31/spec/epub-spec.html, and in particular http://www.idpf.org/epub/31/spec/epub-changes.html.
For now, the intent is to put this behind a setting.
The current CSS stylesheets used contain various constructs that are not
consistently supported in most ePub readers, to wit:
* margin: auto
* position: relative, absolute
* background-image: <url>
* float: <position>
These need to be replaced by constructs that do work even in simple ePub
devices.
Original issue reported on code.google.com by jhellingman
on 5 Jul 2010 at 12:11
What steps will reproduce the problem?
1. Open any generated ePub in Adobe Digital Editions.
2. Look at the title page, it doesn't look nice, and is sometimes even
completely white.
What is the expected output? What do you see instead?
A neat looking titlepage, also recognizable as thumbnail.
The idea here is to do the following.
1. Some rules to recognize content that can be used for creating a neat
titlepage (for example, a specific illustration.)
2. Some standard title-page templates that can be used to create a
title-page from the metadata, more complex than the current simple title-page.
3. Place the title-page in a file of its own (taking care that references
leading to it still work).
Original issue reported on code.google.com by jhellingman
on 10 Jan 2010 at 12:27
Currently there are few options to change the generated output, other than
setting parameters on the input, or changing the actual code or document.
It would be nice to activate or deactivate various features using a
configuration file, e.g. tei2html.config, with the ability to set the following:
* Include images (Y/N/All/Important)
* Image path (<path>)
* Include external references (Y/N)
* Footnote location (Page/Chapter/Work)
* Generate colophon (Y/N)
* Generate table of contents (Front/Back/None)
* Additional CSS stylesheets (<name>)
* CSS stylesheet location (Internal/External)
* Generate marginal page-numbers (Y/N)
* Generate links to page-images (Y/N)
Things that can also be handled via CSS:
* Default table alignment (Left/Right/Center)
* Default verse alignment (Left/Right/Center)
Original issue reported on code.google.com by jhellingman
on 1 Feb 2011 at 1:50
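To make the proposal concrete, the Python sketch below shows one way such settings could be modeled, with defaults that a configuration file overrides. The `key = value` file syntax, the key names, and the default values are all invented for this example; the issue does not specify a format for tei2html.config.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    include_images: str = "All"         # Y / N / All / Important
    image_path: str = "images"
    external_references: bool = True    # Y / N in the file
    footnote_location: str = "Chapter"  # Page / Chapter / Work
    toc_location: str = "Front"         # Front / Back / None


def load_settings(text: str) -> Settings:
    """Parse `key = value` lines; keys that are absent keep their default.

    Unknown keys make `replace` raise a TypeError, so typos are not
    silently ignored.
    """
    overrides = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "external_references":
            overrides[key] = value.upper() == "Y"
        else:
            overrides[key] = value
    return replace(Settings(), **overrides)
```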
What steps will reproduce the problem?
1. Create a TEI file with a footnote in a paragraph with embedded text
document with verse (that is <q><text><body><div1>....)
2. Render using tei2html
3. Validate in HTML validator
What is the expected output? What do you see instead?
Expected: Valid HTML.
Actual: Invalid HTML (additional </p> tags.)
Since the paragraph model of HTML and TEI do not match, the current code
closes HTML paragraphs at certain points where the stylesheet needs to
place an element (such as a table) in the HTML file. To do this, it checks
whether we are in a <p> in TEI. Sometimes, in TEI, however, we can be
inside a <p> twice (for example in <note>s). This is not handled correctly.
Original issue reported on code.google.com by jhellingman
on 20 Oct 2006 at 7:12
Heads with @type='super' are included in the NCX file. They should be skipped.
Original issue reported on code.google.com by jhellingman
on 10 Dec 2009 at 1:17
Add relevant attributes to documents, and process these to produce texts with consistent metadata.
See: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.canonical.html
Then add things like:
<name ref="https://viaf.org/viaf/109557338/" type="person">Seamus Heaney</name>
and
<author> <name key="Hugo, Victor (1802-1885)" ref="https://www.idref.fr/026927608/">Victor Hugo</name> </author>
OPF file contains entries for divisions hidden by rend="display(none)"
Original issue reported on code.google.com by jhellingman
on 18 Jan 2011 at 11:23