Comments (8)
The Schematron passed just fine, but the RelaxNG schema produced 159 problems initially. I fixed these using a combination of approaches:
- The schema flagged hi/@rend="smallcaps" and hi/@rend="roman". Added these values to our ODD file, frus.odd.
- The schema flagged <opener> and <salute>; these will be nice to have to translate into flush-left paragraphs as an alternative to p/@rend="flushleft" and a good complement to closer/signed. Added these elements to our ODD.
- Added <gap> to our ODD, constraining the attributes to just @quantity and @unit; will need to notify DSCS to use @quantity instead of @extent.
- Deleted instances of orgName, affiliation not allowed in our ODD; while nice from a semantic perspective, they don't add particular analytical value in the mode applied by DSCS.
- Changed ref/@ana in d290 to ref/@target
Also spotted these problems in the course of the schema review:
- Line breaks <lb/> needed on pgII between lines, "DEPARTMENT OF STATE Office of the Historian...". Added the missing line break elements.
from frus-tei.
From previous SVN commits:
- Added missing <lb> line break elements in multi-line signatures. Found these instances with this XPath in oXygen: //closer[.//affiliation and not(.//lb)]. This leverages vendor's use of <affiliation> elements for the 2nd and subsequent lines following the signature.
- Also, worked on Published & Unpublished Sources headings in the source note.
- Added missing @type attributes to subject and participant lists (vendor seems to have been thrown by lists whose headings were variants of the usual entries, i.e., PRESENT, PRECIS, RE, CRYPTONYM, etc.) TODO add to our guidelines.
- Fixed missing space in d83fn4: "Congo Crisis,Document 71"
- In scanning cross references to other volumes, found generally good tagging, but (TODO) we should standardize our style guide for linked cross references. There's a lot of room for interpretation about how much of, and which portions of a cross reference to tag, and when to take enumerated volume, document, or footnote numbers.
- Another issue is paragraphs that were tightly spaced (vertically) in the PDF but are tagged simply as paragraphs, indistinguishable from other paragraphs. We often use this tight spacing to set off lists, quotes, etc. from the normal flow of paragraphs. We should decide if tight spacing needs to be tagged or not, and whether to continue with the current practice (of list/item, sans @type). (TODO)
- Added space missing at start of numbered paragraphs (#19-27) in d579, e.g.: <p>27.The US enjoys...
- Similarly, a space was missing between document number and heading of d580: <head>580.Memorandum From...
- Based on these two examples, I searched with this regular expression: \d\.[A-Z] (i.e., one digit followed by a period and a capital letter) and found other instances of this in d342, d495, d496, d501, d504-6, d509-10, d512-5, d517-8, d520-2, d524-5, d528-9, d531, d533, d535, d539, d541, d544, d548, d550, d560, d565, d568-70. Besides conjoined document numbers/headings, this phenomenon was manifest in cable numbers, e.g.: <p>2402.Ref...
The document heading cases could be a candidate for a schematron error. The paragraph-level instances could be a warning, since they're not strictly forbidden?
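A rough Python sketch of the search described above, in case it helps to prototype the check before writing a Schematron rule. The pattern is the one quoted in the notes; the function names and the error/warning split by `<head>` are assumptions for illustration, not an existing rule of ours.

```python
import re

# Pattern quoted in the notes: a digit, a period, then a capital letter.
MISSING_SPACE = re.compile(r"\d\.[A-Z]")


def find_missing_spaces(lines):
    """Return (line_number, line) pairs whose text matches the pattern."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if MISSING_SPACE.search(line)
    ]


def classify(line):
    """Hypothetical severity split: headings are errors, anything else a warning."""
    return "error" if line.lstrip().startswith("<head>") else "warning"


sample = [
    "<head>580.Memorandum From...",
    "<p>27.The US enjoys...",
    "<p>2402.Ref...",
    "<p>27. The US enjoys...",  # already correct, should not be reported
]

hits = find_missing_spaces(sample)
for number, line in hits:
    print(number, classify(line), line)
```

The pattern will also report legitimate strings that happen to have a digit, period, and capital letter in a row, so the hits still need a human look.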
Initial notes on the random sample:
- The PDF has 921 pages. 5% = 46 pages.
- Setting aside the front matter, which I already looked at closely, the body has 887 pages. 5% = 44 pages. Pages 1-44 would cover documents 1-32. Going by documents, 5% of 582 documents would be 29 documents.
- Best to take a random 30 documents. How about documents 1-5 of each 100 documents? (Other reviews could take other approaches - best that we vary our approaches.)
- d1: the <pb> for page 2 broke in the middle of the word. Our guidelines have always been not to break a word, but to place the pb after the final word of a page.
- d3: dang, I should've replaced (in signatures) with .
- d3: "Conakat" not tagged as a term (CONAKAT is in the terms list)
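For the record, the sampling arithmetic and selection above can be sketched in Python. This assumes "documents 1-5 of each 100" means d1-d5, then d100-d104, d200-d204, and so on (which matches the ranges reviewed in the later comments); the function names are made up for illustration.

```python
def five_percent(count):
    """Integer 5% of a count, rounded down."""
    return count * 5 // 100


def sample_documents(total, block=100, per_block=5):
    """Pick the first `per_block` documents of each block of `block` documents.

    The first block starts at document 1 (there is no d0); later blocks
    start at d100, d200, ... This reading of the sampling plan is an assumption.
    """
    picked = []
    for start in range(0, total + 1, block):
        first = max(start, 1)
        last = min(first + per_block, total + 1)
        picked.extend(range(first, last))
    return picked


picked = sample_documents(582)
print(five_percent(921), five_percent(887), five_percent(582))
print(len(picked), picked[:10])
```

With 582 documents this yields six blocks of five, i.e. the 30 documents proposed above.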
#d100-#d104
- no issues
#d200-#d204
- Noticed in this range of documents that here and throughout Stan and Leop were not tagged with the <gloss> element. Added it in this range, but should be added elsewhere.
- Silently corrected typo in #d208: assasinate > assassinate
- #d203 for tight spacing text in 4A-E, changed <p> to <list>-nested <item> elements sans @type. (TODO: clarify guidelines on this, esp. wrt. @type.)
- #d204 tagged ChiCom as <gloss>. Also caught 5 instances elsewhere with regex search for \schicoms?\s (whitespace + chicom + optional s + whitespace). Wondering why this (and Stan and Leop) were missed - perhaps because of case variation? If so, perhaps this was a prudent, intentional omission. And this could point to something we should be on the lookout for.
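To illustrate the case-variation question, here is a small Python sketch comparing a case-sensitive search with the case-insensitive regex from the note. The sample sentence is invented, and the word-boundary variant is only a suggestion, not what was actually run.

```python
import re

# The search from the note, applied case-insensitively.
CHICOM_WHITESPACE = re.compile(r"\schicoms?\s", re.IGNORECASE)
# Suggested variant: word boundaries also catch a term followed by punctuation.
CHICOM_BOUNDARY = re.compile(r"\bchicoms?\b", re.IGNORECASE)
# Case-sensitive match on the exact spelling, for comparison.
CHICOM_EXACT = re.compile(r"\sChiCom\s")

# Invented sample text, not taken from the volume.
sample = "The ChiCom view differs; CHICOMS objected, but chicoms, too, agreed."

exact_count = len(CHICOM_EXACT.findall(sample))
whitespace_count = len(CHICOM_WHITESPACE.findall(sample))
boundary_count = len(CHICOM_BOUNDARY.findall(sample))
print(exact_count, whitespace_count, boundary_count)
```

The whitespace-delimited pattern misses "chicoms," because a comma follows the term, so a boundary-based search may be worth a second pass.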
Also
- fixed all instances of <pb> breaking in the middle of words, moving the <pb> to the end of the word:
Find: ([^\s]+)(<pb[^>]+?>)([^\s]+)
Replace with: $1$3 $2
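The same find/replace can be sketched outside oXygen with Python's `re` module. The pattern is the one above; Python writes the replacement groups as `\1\3 \2` rather than `$1$3 $2`. The sample strings and the `facs` attribute value are invented.

```python
import re

# Same pattern as the find/replace above: word start, <pb>, word end.
PB_IN_WORD = re.compile(r"([^\s]+)(<pb[^>]+?>)([^\s]+)")


def move_pb_after_word(text):
    """Rejoin a word split by <pb> and place the <pb> after it."""
    return PB_IN_WORD.sub(r"\1\3 \2", text)


# Invented sample text.
before = 'the Govern<pb n="2" facs="0002"/>ment of the Congo'
after = move_pb_after_word(before)
print(after)

# A <pb> that already sits between words is left alone.
untouched = 'end of page. <pb n="3"/> Next page'
print(move_pb_after_word(untouched) == untouched)
```

One caveat: because the groups match any non-whitespace, a `<pb>` placed directly after an opening tag with no space (e.g. `<p><pb n="3"/>Text`) is also rewritten, so the results are worth reviewing before committing.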
#d300-#d304
- #d300: Noticed that Leo was tagged - this matched case of entry in terms list. But noticed that there is a "Leo G. Cyr" in the persons list. A possibility for mistagging, especially in cryptic telegrams? Similarly, many names are tagged, even if only the last name is present. I recall our guidance was to tag people only if the full name or title + last name was present. The concern about tagging instances where only the last name is present is that there could be ambiguity and thus mistagging.
- #d301: Noticed smooshed spacing in item C, between <hi> and <gloss>. TODO: add check for sibling elements like these, which results in a space being inserted if serialization parameter indent=no. Similarly, sibling <gloss> elements (e.g., #d402 "AmbLeo")
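One possible shape for the sibling check in the TODO, sketched in Python with the standard library's ElementTree. It reports adjacent elements with no text between them; the function name, the default element names, and the sample paragraph are assumptions, and the sample omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET


def adjacent_inline_siblings(root, names=("hi", "gloss")):
    """Return (left, right) sibling pairs from `names` with no text between them."""
    found = []
    for parent in root.iter():
        children = list(parent)
        for left, right in zip(children, children[1:]):
            both_listed = left.tag in names and right.tag in names
            # `tail` holds the text between an element and its next sibling.
            if both_listed and not (left.tail or ""):
                found.append((left, right))
    return found


# Invented sample; real files would carry the TEI namespace on every tag.
sample = ET.fromstring(
    "<p>C. <hi rend='italic'>Amb</hi><gloss>Leo</gloss> reports; "
    "<gloss>Stan</gloss> <gloss>Leop</gloss> agree.</p>"
)

pairs = adjacent_inline_siblings(sample)
for left, right in pairs:
    print(left.tag, left.text, "|", right.tag, right.text)
```

This only lists candidates; whether a given pair really needs a space is still an editorial call.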
#d400-#d404
- #d402: odd double accent mark on the "e" in "Chargé" in the PDF was luckily not preserved in the XML!
- #d402: noticed extraneous @corresp on the <signed> element. Deleted all 235 instances of this in the volume. Tell DSCS to omit this in the future.
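A bulk cleanup like the @corresp deletion could be scripted along these lines. This is a Python sketch using ElementTree and the TEI namespace; the function name and the sample fragment (including the `corresp` value) are hypothetical, and this is not necessarily how the deletion was actually done.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def strip_corresp_from_signed(root):
    """Remove @corresp from every tei:signed element; return how many were removed."""
    removed = 0
    for signed in root.iter(f"{{{TEI_NS}}}signed"):
        if signed.attrib.pop("corresp", None) is not None:
            removed += 1
    return removed


# Hypothetical fragment; the corresp value is invented.
sample = ET.fromstring(
    f'<div xmlns="{TEI_NS}">'
    '<closer><signed corresp="#p_XYZ">A. Name</signed></closer>'
    "<closer><signed>B. Name</signed></closer>"
    "</div>"
)

removed = strip_corresp_from_signed(sample)
remaining = [s for s in sample.iter(f"{{{TEI_NS}}}signed") if "corresp" in s.attrib]
print(removed, len(remaining))
```

The returned count gives a quick sanity check against the number of instances expected in the volume.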
#d500-#d504
- #d501: since the decision options follow the signature (and TEI doesn't allow paragraph content to appear following a <closer> element), DSCS followed our previous practice and tagged the signature with a <p rend="right">. But we now encase the material following the closer, like this decision option block, in a <postscript> element, which is allowed following a <closer>. (TODO: document use of <postscript>, as well as frus:attachment if we don't just use […] in its stead - perhaps better to use a core TEI element rather than creating a new element, but only if we're not abusing the tag.) Applied this closer/postscript change to #d86, #d226, #d246 (I moved the interesting right-aligned phrase right above the signature from its own paragraph into the signed element... I'm thinking closers should make bold explicit instead of implicit; and should @rend="roman" reset both italic and bold or just italic?). There are still about 20 cases of this, which can be found with //p[@rend='right'] - should be addressed when we flesh out the guidelines on this. Many good cases of this here that can be used as illustrations for the guidelines.
- #d501: the 3 options in <p> elements at the end: @rend="flushleft" to ensure they're rendered flushleft.
- #d503: telegraph number (?) - the thing to the left of the dateline - needs @rend="flushleft"
In summary:
- No significant issues in the volume to hold up release, but many areas where DSCS can improve for next time, illustrating where our guidelines could be tighter.
Spotted a few things during the ebook review:
- #d142 has a table - do we need borders? no, but there is a "total" line that is missing. TODO figure out how to encode/render these total/subtotal lines.
- #d569 fn2 is empty - indeed, the footnote is missing in the PDF too. resolved: delete the empty footnote.
This issue was moved to HistoryAtState/frus#10