
repertorium's Introduction

Repertorium of Old Bulgarian literature and letters

Overview

The Repertorium of Old Bulgarian Literature and Letters was conceived as an archival repository capable of encoding and preserving in SGML (and, subsequently, XML) format archeographic, paleographic, codicological, textological, and literary-historical data concerning original and translated medieval texts represented in Balkan and other Slavic manuscripts. The files are intended to serve both as documentation (fulfilling the goals of traditional manuscript catalogues) and as direct input for computer-assisted philological research. The project site, located at http://repertorium.obdurodon.org, was designed and implemented by David J. Birnbaum, Andrej Bojadžiev, Anisava Miltenova, and Diljana Radoslavova.

Reference materials

repertorium's People

Contributors

atoboy, djbpitt

Forkers

lb42 claudius108

repertorium's Issues

note in msItemStruct

We have various notes as children of the <msItemStruct> element. Some of them have no attributes at all, while others are encoded in various ways, e.g., in AA1322NBKM:

 <note place="foot" xml:id="note1">The text presents the type of Thunderbook,
                            predicting by the Zodiac circle. ...</note>

or from the same description:

<note place="foot" type="articleData" xml:id="note2">The text is wrongly
                            titled as Lunary. ...</note>

In most of the descriptions <note> in this context has no attributes at all.

Suggestion:
Use <note> without any attributes. When it is a child of <msItemStruct>, display it as a paragraph, not as a footnote.

It would be good if the content of <note> were presented in the HTML edition of the description.
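Here is a minimal sketch of the suggested rule. It is written in Python with the standard-library xml.etree.ElementTree, not in the project's actual rendering pipeline, which is not described here. Every <note> that is a direct child of <msItemStruct> has its attributes removed and is rendered as an HTML paragraph. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without the TEI namespace."""
    return tag.rsplit("}", 1)[-1]


def strip_note_attributes(root: ET.Element) -> int:
    """Remove all attributes from <note> children of <msItemStruct>; return how many changed."""
    changed = 0
    for item in root.iter():
        if local(item.tag) != "msItemStruct":
            continue
        for child in item:
            if local(child.tag) == "note" and child.attrib:
                child.attrib.clear()
                changed += 1
    return changed


def notes_as_paragraphs(root: ET.Element) -> list[str]:
    """Render each <note> child of <msItemStruct> as an HTML <p> instead of a footnote."""
    paragraphs = []
    for item in root.iter():
        if local(item.tag) != "msItemStruct":
            continue
        for child in item:
            if local(child.tag) == "note":
                p = ET.Element("p")
                # Flatten inline markup and normalize whitespace left by pretty-printing
                p.text = " ".join("".join(child.itertext()).split())
                paragraphs.append(ET.tostring(p, encoding="unicode"))
    return paragraphs
```

The renderer ignores @place and @type. Legacy files would therefore display the same way before and after their attributes are cleaned up.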

Clean up indexes

Indexes contain elements and attributes no longer needed, which should be removed.

Transliterating Serbian names (Cyrillic to Latin)

Andrej writes:

Shouldn't we have Matić instead of Matich?

David replied:

If we're rendering a Serbian name in Latin letters I think we should use the BCS spelling, so yes, Matić. The decision becomes difficult, though, because people publish under different Romanizations of the same name. What if there's some American named Matich who publishes in English and spells his name "Matich"?

Andrej replied:

I would propose that if it is a Serbian/Russian/Bulgarian name and we have to transliterate it, we follow http://repertorium.obdurodon.org/transliteration.xhtml (as you suggest); otherwise we should spell the name as the author prefers. So in this particular case it will be Matić, because it is a transliteration.

New response (David):

I agree with Andrej's proposal above. At some point we may run into situations where authors have published with their names in Latin letters and in Cyrillic, and in inconsistent ways; cf. Čiževskij or Isačenko. We may want to revisit this question should we run into authors in our materials whose names pose that sort of problem. For now, though, I agree that if the name on the original work is spelled "Матић", we should write "Matić".

I will move this card to the To be implemented column. It does not require a change in ODD or schema, but it requires revising the XML documents and then my revising the addendum to our markup guidelines.
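The sketch below is a rough illustration of the agreed rule. It transliterates a name only when the name appears in Cyrillic on the cited work; a name already published in Latin letters passes through unchanged. It assumes the standard Serbian Cyrillic-to-Latin (BCS) letter correspondences. The project's own table at http://repertorium.obdurodon.org/transliteration.xhtml remains the authority, and Russian and Bulgarian would need their own tables. The names SERBIAN_TO_LATIN and transliterate are hypothetical.

```python
# ASSUMPTION: standard Serbian Cyrillic -> Latin (BCS) letters, standing in for the
# project's own transliteration table, which should take precedence.
SERBIAN_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def transliterate(name: str) -> str:
    """Transliterate Serbian Cyrillic letters; any other character (e.g. in a name
    the author published in Latin letters, such as "Matich") is kept unchanged."""
    out = []
    for ch in name:
        latin = SERBIAN_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first letter of a digraph: Љ -> Lj, Џ -> Dž
            out.append(latin[0].upper() + latin[1:])
        else:
            out.append(latin)
    return "".join(out)
```

With this table, "Матић" becomes "Matić", while "Matich" is left as the author spelled it.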

respStmt in scriptDesc/scriptNote

We have the following statement about respStmt in this place: The <gi>respStmt</gi> element and its children are to be used inside the <gi>scriptNote</gi> element only in situations where the responsibility belongs to the encoder (compiler) of the electronic description. They should be wrapped inside <gi>bibl</gi> element.
And further below:
In situations where the identification or description comes from a publication, though, instead of respStmt we must use a bibliographic pointer.
Why do we need a special statement in the former case? Isn't it clear that the information belongs to the encoder (the author of the file)?
I don't think we need <gi>respStmt</gi> and its content here at all.
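If the statement is dropped, the affected descriptions could be cleaned up mechanically. This Python sketch assumes, as the quoted guideline prescribes, that the encoder's <respStmt> sits inside a <bibl> that is a direct child of <scriptNote>. It removes only those <bibl> elements whose sole child is a <respStmt>, so bibliographic pointers are left alone. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child while keeping the text that followed it (stored in child.tail)."""
    siblings = list(parent)
    i = siblings.index(child)
    tail = child.tail or ""
    if i == 0:
        parent.text = (parent.text or "") + tail
    else:
        siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
    parent.remove(child)


def drop_encoder_resp(root: ET.Element) -> int:
    """Remove <bibl> children of <scriptNote> whose only child is a <respStmt>."""
    # Collect first: modifying the tree while iterating over it is undefined in ElementTree
    targets = [
        (note, bibl)
        for note in root.iter()
        if local(note.tag) == "scriptNote"
        for bibl in note
        if local(bibl.tag) == "bibl" and [local(c.tag) for c in bibl] == ["respStmt"]
    ]
    for note, bibl in targets:
        remove_keeping_tail(note, bibl)
    return len(targets)
```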

Element date as part of watermark element

@djbpitt
Right now the Schematron rule states:

In an album reference in a watermark, the num and date elements must alternate, starting with a num.

But when the description gives only the number in the watermark album and no date, and I cannot check it, the markup will be:

<watermark>
    <re:motif>Anchor in circle with anchor</re:motif>
    <ref type="bibl" target="bib:Moshin1973">
        <num>1942</num>
        <date/>
    </ref>
</watermark>

For the Zograf library I encoded the information like this:

<watermark>
<motif xmlns="http://www.ilit.bas.bg/repertorium/ns/3.0" facs="http://memoryofpaper.oeaw.ac.at/mosin/mosin.php?wmid=3050">Камбана в кръг</motif> <term>много подобен</term> <ref target="#Mošin1957" type="bibl">Mošin, Traljić 1957</ref><num>3050</num><date type="watermark" notBefore="1355" notAfter="1359">1355–1359</date>
</watermark>

And when I have no information about the date, I just do not use the element. Do we need an empty date element in this place?
How we should proceed?
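To make the two options concrete, here is a Python sketch of the current alternation rule next to a relaxed variant. In the relaxed variant a <num> may stand without a following <date>, which would make the empty <date/> unnecessary. The sketch assumes the current rule is read as requiring complete num/date pairs, which would explain the empty <date/> in the example above. The real check lives in the project's Schematron; these functions are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def album_sequence(container_xml: str) -> list[str]:
    """Local names of the <num>/<date> children of a <ref> (or <watermark>), in order."""
    container = ET.fromstring(container_xml)
    return [local(c.tag) for c in container if local(c.tag) in ("num", "date")]


def strict_ok(seq: list[str]) -> bool:
    """Current rule, read as complete pairs: num, date, num, date, ..."""
    return (
        len(seq) > 0
        and len(seq) % 2 == 0
        and all(tag == ("num" if i % 2 == 0 else "date") for i, tag in enumerate(seq))
    )


def relaxed_ok(seq: list[str]) -> bool:
    """Relaxed rule: start with num; every date directly follows a num; a num may stand alone."""
    if not seq or seq[0] != "num":
        return False
    return all(seq[i - 1] == "num" for i, tag in enumerate(seq) if tag == "date")
```

Under the relaxed rule, <num>1942</num> alone would be valid, and the encoding without a date element would need no placeholder.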

Element <material>

In our Guidelines, ODD, and schema files we modified the TEI element <material> to include a usage attribute:

<elementSpec ident="material"
 module="msdescription" mode="change">
 <classes mode="change">
  <memberOf key="att.global"/>
  <memberOf key="att.global.linking"/>
  <memberOf key="att.global.analytic"/>
  <memberOf key="att.global.facs"/>
  <memberOf key="att.global.change"/>
  <memberOf key="att.canonical"/>
 </classes>
 <attList>
  <attDef ident="usage">
   <desc>describes the usage of the material</desc>
  </attDef>
 </attList>
</elementSpec>

The current TEI Guidelines provide an attribute, @function, with the same meaning (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-material.html).

Proposal: 1) use the TEI declaration instead of ours. So <material> without our modifications; 2) change the encoding of our descriptions accordingly.
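If the proposal is accepted, step 2 could be a simple attribute rename. Below is a minimal sketch with Python's xml.etree.ElementTree, assuming @usage is the only attribute that needs to change. Note that ElementTree drops XML comments by default, so a production conversion might be better done in XSLT. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def usage_to_function(root: ET.Element) -> int:
    """Rename @usage to TEI's @function on every <material>; return the number changed."""
    changed = 0
    for el in root.iter():
        if local(el.tag) == "material" and "usage" in el.attrib:
            if "function" in el.attrib:
                # Both present: do not guess which value should win
                raise ValueError("<material> has both @usage and @function")
            el.set("function", el.attrib.pop("usage"))
            changed += 1
    return changed
```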

Multiple specific or general msName elements?

Some mss have multiple specific or general <msName> elements. We use a single <msName> as the name of the ms in listings (such as in the browsing interface). This means that although multiple names of the same type are allowed and can be seen in the full description, only one is rendered in short-title listings. The order of names for a ms with more than one name does not seem to be specified or constrained, though. Where there is more than one, the name that ends up in short-title listings therefore appears to be arbitrary.

Neither the Scripta documentation of <msName> (pp. 22–23) nor the HTML addendum says anything about how to order multiple specific or generic names. Shouldn't we document and enforce a consistent policy?
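One possible policy is sketched below in Python, for discussion only. It prefers the first specific name, then the first general name, in document order. It also flags any type that occurs more than once, so the order can be checked by hand. The @type values "specific" and "general" and the priority between them are assumptions, not the current Scripta rules.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional

# ASSUMPTION: @type values and their priority are placeholders for an agreed policy
PREFERENCE = ("specific", "general")


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def ms_names(xml_text: str) -> list[tuple[str, str]]:
    """(@type, normalized text) of every <msName>, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("type", ""), " ".join("".join(el.itertext()).split()))
        for el in root.iter()
        if local(el.tag) == "msName"
    ]


def short_title(names: list[tuple[str, str]]) -> Optional[str]:
    """First name of the most preferred type; fall back to the first name of any type."""
    for wanted in PREFERENCE:
        for name_type, text in names:
            if name_type == wanted:
                return text
    return names[0][1] if names else None


def ambiguous_types(names: list[tuple[str, str]]) -> list[str]:
    """Types that occur more than once: the cases an ordering policy has to settle."""
    counts = Counter(name_type for name_type, _ in names)
    return sorted(t for t, n in counts.items() if n > 1)
```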

Orthographic description is inconsistent

Anisava writes:

I am a minimalist concerning the description.
In my opinion, the less data we input, the better.
I understand the need to differentiate between MSS with regular juses and jers and the others. But for late manuscripts it is impossible to define the orthography in detail, because the texts come from different sources and depend on those sources. I prefer a summary and a very short description.
Maybe it is not convenient for indices, but it is more useful for late MSS.

She is responding to Andrej's earlier message:

You are quite right that this situation is complicated. We have the following description in AMAdd39628BBL.xml:

<scribeLang>
    <orthography>
        <p>Old Church Slavonic.</p>
        <p>One-<emph>jus</emph> (<foreign xml:lang="cu">ѧ</foreign>) and
            one-<emph>jer</emph> (<foreign xml:lang="cu">ь</foreign>); sporadical usage of the so-called "middle jus".
            Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign>, regular initial and post-vocalic
            <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign> (see <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>. </p>
    </orthography>
    <lexis>In the synaxarion the Slavonic names of the months are used
        beside the Greek ones.</lexis>
</scribeLang>

First attempt:

<scribeLang>
    <summary>Old Church Slavonic</summary>
    <orthography>
        <p>One-<emph>jus</emph> (<foreign xml:lang="cu">ѧ</foreign>) and
            one-<emph>jer</emph> (<foreign xml:lang="cu">ь</foreign>); sporadical usage of the so-called "middle jus".
            Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign>, regular initial and post-vocalic
            <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign> (see <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>. </p>
    </orthography>
    <lexis>In the synaxarion the Slavonic names of the months are used
        beside the Greek ones.</lexis>
</scribeLang>

It is quite obvious that Old Church Slavonic in this case is just a summary. The other question is how to relate this summary to other descriptions (cf. below).
According to our Guidelines, the rest of the description should be divided into several paragraphs. If we use the <langNote> element, the description will be something like:

<scribeLang>
    <summary>Old Church Slavonic</summary>
    <langNote type="jer" subtype="front">One-jer</langNote>
    <langNote type="jus" subtype="nonEtymReg">One-jus (<foreign xml:lang="cu">ѧ</foreign>). Sporadical usage of the so-called "middle jus".</langNote>
    <langNote type="jotVowel">Regular initial and post-vocalic <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign></langNote>
    <langNote type="otherLetters">Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign></langNote>
    <langNote type="lexis">In the synaxarion the Slavonic names of the months are used beside the Greek ones.</langNote>
    <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>
</scribeLang>

The problem is that in most cases our description of the orthography/language is something like:

Without juses, with two jers, irregular; West Bulgarian dialect features

So, where should this statement go? Now it is encoded as (AM82NIK.xml):

<scribeLang>
    <orthography>
        <p xmlns="http://www.tei-c.org/ns/1.0">Without juses, with two jers, irregular; West Bulgarian dialect
            features</p>
    </orthography>
</scribeLang>

First variant:

<scribeLang>
    <summary>Without juses, with two jers, irregular; West Bulgarian dialect features</summary>
</scribeLang>

This variant does not fit well with the Old Church Slavonic summary above. Alternatively, we can replace Old Church Slavonic in the description of AMAdd39628BBL.xml with "One-jus, one-jer orthography" in the <summary>.
Second variant: we do not use <summary>, but something like:

<langNote type="general">Without juses, with two jers, irregular; West Bulgarian dialect features</langNote>

Then <summary> in this context, if we need it, will just be free prose.
Or we can make this:

<scribeLang>
    <summary>West Bulgarian dialect features</summary>
    <langNote type="general">Without juses, with two jers, irregular</langNote>
</scribeLang>

Then <summary> will be in accordance with Old Church Slavonic and will refer only to language, not to orthography.
The most complicated approach will be something like:

<scribeLang>
    <summary>West Bulgarian dialect features</summary>
    <langNote type="jus" subtype="nonJus">Without juses</langNote>
    <langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>
</scribeLang>

Then, when you have just "Without juses, with two jers, irregular", it will be encoded simply as:

<langNote type="jus" subtype="nonJus">Without juses</langNote>
<langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>

without <summary>

For "Without juses, with two jers, irregular; Resavian orthography" (called the Resavian school in most of the descriptions):

<scribeLang>
    <summary>Resavian orthography</summary>
    <langNote type="jus" subtype="nonJus">Without juses</langNote>
    <langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>
</scribeLang>

I like the last one.
Hm. How to proceed?
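The trade-off Anisava describes is mainly about indices. A free-text <summary> can only be searched as text, whereas <langNote> attributes can be grouped mechanically across descriptions. The small Python sketch below shows what the structured variants make possible. It uses the @type/@subtype values from the proposals above; the function lang_index and the manuscript identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def lang_index(descriptions: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Map each (@type, @subtype) pair of <langNote> to the manuscripts that use it."""
    index: dict[tuple[str, str], list[str]] = defaultdict(list)
    for ms_id, xml_text in descriptions.items():
        root = ET.fromstring(xml_text)
        for note in root.iter():
            if local(note.tag) == "langNote":
                key = (note.get("type", ""), note.get("subtype", ""))
                if ms_id not in index[key]:
                    index[key].append(ms_id)
    return dict(index)
```

With the "most complicated" variant, both West Bulgarian and Resavian manuscripts would appear under ("jus", "nonJus"), whatever their <summary> says.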

<supportDesc> redundancy: @material and <material>

When you have

<supportDesc material="paper">
    <support>
        <material>Paper</material>
        ...
    </support>
</supportDesc>

and there is no further information in <material> than "Paper" or "Parchment", I suggest removing the <material> element entirely.

Alternatively, we could remove material="paper" and have:

<supportDesc>
     <support>
         <material>Paper</material>
     ...
     </support>
</supportDesc>

This information should be unified somehow; repeating the same data makes no sense.

If, however, we have

<supportDesc material="mixed">
    <support>
        <material>Paper</material>
        ...
    </support>
</supportDesc>

then of course we should provide more information. The same applies if we want to record more data about the paper or parchment.
Then we will need:

<supportDesc material="paper">
    <support>
        <material>Paper is of low quality</material>
        ...
    </support>
</supportDesc>

This situation is more complicated because there are several possible variants, and I agree that the most important thing is for us to be consistent. Your examples above are of three types:

  1. Attribute and element are identical and simple, e.g., both say just "paper".
  2. Attribute is "mixed" and there are multiple elements.
  3. Attribute is simple (e.g., "paper") and element is more detailed (e.g., "paper is of low quality").

We might want to approach this question by asking how we want to use the values. Here is a proposal (for discussion; I don't mean to suggest that it is necessarily what we should do):

  1. The @material attribute on the <supportDesc> element is for structured search and retrieval. For that reason, it's a token list drawn from a fixed inventory of strings: "paper", "parchment", and whatever else might actually occur (stone? wax? birchbark?). Because it is a token list, we would not use a value like "mixed"; if a manuscript includes both parchment and paper, we would write <supportDesc material="parchment paper">. The order of the values in a token list is not informational, so "parchment paper" and "paper parchment" are equivalent. The attribute is required and the value must include at least one token from the allowed list.

A: I don't quite understand what is wrong with "mixed", but in any case I tried writing <supportDesc material="parchment paper">, and it immediately triggers an error. According to the TEI schema:

attribute material { "paper" | "parch" | "mixed" | teidata.enumerated }?,

(teidata.enumerated: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.enumerated.html)

https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-supportDesc.html

Attributes using this datatype must contain a single ‘word’ which
contains only letters, digits, punctuation characters, or symbols: thus
it cannot include whitespace.

That means that only one word is allowed here. So, if we want to indicate something other than just paper or just parchment, we can only use mixed. The alternative is to change the schema for supportDesc. Should we do it?

  2. The <material> child of <support> is for human eyes, that is, it's what we render in the codicological description. It is optional, but whether we use it is governed by the following considerations:

a) Where there is a single material and no supplementary information (e.g., paper), we omit the <material> element. In the codicological description we'll upper-case the value of the attribute. This lets us avoid the duplication. We can validate this with Schematron: if the count of tokens in @material is not equal to 1, there must be at least one <material> element.

b) Where there are multiple materials, the @material attribute contains more than one token and a separate <material> element is required for each type of support. If, for example, the manuscript is on a combination of parchment and paper, there will be two tokens in the @material attribute value and at least two <material> elements. There might be more than two if, for example, there are multiple types of paper.

c) Even when there is one type of material, the <material> element can be used if the description presented to humans should be more detailed than what the attribute allows. For example, the @material value might be just "paper", while the content of the <material> element would read something like "Paper of poor quality".
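David's three rules can be checked mechanically, and the Python sketch below stands in for the Schematron rule he mentions. The allowed inventory is an assumption, since the list is still open. As Andrej's reply above notes, the TEI datatype teidata.enumerated does not accept a whitespace-separated list, so a token-list @material would also require a schema change. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the fixed inventory is still to be agreed (stone? wax? birchbark?)
ALLOWED_MATERIALS = {"paper", "parchment"}


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def check_support_desc(xml_text: str) -> list[str]:
    """Check a <supportDesc> against the proposed @material / <material> rules."""
    root = ET.fromstring(xml_text)
    tokens = root.get("material", "").split()
    materials = [el for el in root.iter() if local(el.tag) == "material"]
    errors = []
    if not tokens:
        errors.append("@material must contain at least one token")
    errors += [f"unknown material token: {t!r}" for t in tokens if t not in ALLOWED_MATERIALS]
    # Rule b): several materials need at least one <material> element per type of support
    if len(tokens) > 1 and len(materials) < len(tokens):
        errors.append(f"{len(tokens)} tokens in @material but only {len(materials)} <material> element(s)")
    return errors


def render_material(xml_text: str) -> list[str]:
    """Text for the codicological description: the <material> contents if present
    (rules b and c), otherwise the single attribute value with an initial capital (rule a)."""
    root = ET.fromstring(xml_text)
    materials = [el for el in root.iter() if local(el.tag) == "material"]
    if materials:
        return [" ".join("".join(el.itertext()).split()) for el in materials]
    return [t.capitalize() for t in root.get("material", "").split()]
```

Because render_material falls back to the attribute only when no <material> element exists, "Paper" never has to be written twice.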

A: David, you describe the possible situations very well. So I would suggest:

a) If you know that the material is paper or parchment but have no further information, encode this as:

<supportDesc material="Paper">
     <support>
     ...
     </support>
</supportDesc>

or

<supportDesc material="Parchment">
     <support>
     ...
     </support>
</supportDesc>

With an upper-case initial letter.

b) If you have a mixture of paper and parchment, we should decide between two options. We can change the model of supportDesc to allow both words as the value, material="Parchment Paper" (upper case), or we can stick with the value "mixed". (If we decide to change supportDesc, I don't know which attribute class or datatype would allow us to have two words as the attribute value.)
What do you think? Then, as you suggested, we will have two <material> elements (the element is repeatable).

c) If you have more information about the paper, parchment, etc., encode this as:

<supportDesc material="Paper">
     <support>
         <material>Paper is of low quality</material>
     ...
     </support>
</supportDesc>

So, in principle, we should decide whether to change the model for supportDesc or leave it as it is.

  1. Current possibility:

<supportDesc material="mixed">
    <support>
        <material>Paper is of low quality</material>
        <material>Thin parchment. There is almost no distinction between the flesh and hair side ...</material>
        ...
    </support>
</supportDesc>

  2. Changing @material:

<supportDesc material="Paper Parchment">
    <support>
        <material>Paper is of low quality</material>
        <material>Thin parchment. There is almost no distinction between the flesh and hair side ...</material>
        ...
    </support>
</supportDesc>

We may want to retain both views: the description of the MS as a database, and the description of the MS from the user's perspective (read as text). If so, maybe the second one is better. What do you think?

p in binding

@djbpitt Right now the content model of binding in TEI is:

element binding { (model.pLike | condition | decoNote)+ ...

Should we replace p with summary in this context? Cf. ours:

element binding { (model.pLike | bindingNote | condition | decoNote)+

So it would become:

element binding { (summary | bindingNote | condition | decoNote)+
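Here is a sketch, in Python with xml.etree.ElementTree, of how the proposed content model could be checked and existing files migrated. It only tests child element names against the proposed (summary | bindingNote | condition | decoNote)+ model and renames <p> children of <binding> to <summary>. It does not validate attributes or the children's own content. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Proposed model: element binding { (summary | bindingNote | condition | decoNote)+ }
ALLOWED_IN_BINDING = {"summary", "bindingNote", "condition", "decoNote"}


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def binding_problems(root: ET.Element) -> list[str]:
    """List child elements of <binding> that the proposed model would reject."""
    problems = []
    for binding in root.iter():
        if local(binding.tag) != "binding":
            continue
        names = [local(child.tag) for child in binding]
        if not names:
            problems.append("<binding> needs at least one child")  # the model ends in '+'
        problems += [f"<{n}> is not allowed in <binding>" for n in names if n not in ALLOWED_IN_BINDING]
    return problems


def p_to_summary(root: ET.Element) -> int:
    """Rename <p> children of <binding> to <summary>, keeping namespace and content."""
    renamed = 0
    for binding in root.iter():
        if local(binding.tag) == "binding":
            for child in binding:
                if local(child.tag) == "p":
                    child.tag = child.tag[:-1] + "summary"  # '{ns}p' -> '{ns}summary'
                    renamed += 1
    return renamed
```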

Unknown place of origin for a manuscript

In some descriptions we have:

<origPlace>Unknown</origPlace>

In most of the descriptions <origPlace> is missing, which means it is Unknown.
Are we going to add

<origPlace>Unknown</origPlace>

in the descriptions where this information is not explicitly encoded? Or are we going to remove the element along with its content and keep it only where we have some idea about the place of origin?
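If the second option is chosen, the cleanup and the display rule could look like the Python sketch below. An <origPlace> whose whole content is "Unknown", and which has no attributes, is removed. The renderer then shows "Unknown" whenever the element is absent. The function names are hypothetical, and the exact display string is an assumption.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child while keeping the text that followed it (stored in child.tail)."""
    siblings = list(parent)
    i = siblings.index(child)
    tail = child.tail or ""
    if i == 0:
        parent.text = (parent.text or "") + tail
    else:
        siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
    parent.remove(child)


def is_unknown(el: ET.Element) -> bool:
    """True for a bare <origPlace>Unknown</origPlace> (no attributes, no child elements)."""
    return not el.attrib and len(el) == 0 and (el.text or "").strip().lower() == "unknown"


def drop_unknown_orig_place(root: ET.Element) -> int:
    # Collect first: modifying the tree while iterating over it is undefined in ElementTree
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if local(child.tag) == "origPlace" and is_unknown(child)
    ]
    for parent, child in targets:
        remove_keeping_tail(parent, child)
    return len(targets)


def orig_place_label(root: ET.Element) -> str:
    """Text of the first <origPlace>, or "Unknown" when the element is missing."""
    for el in root.iter():
        if local(el.tag) == "origPlace":
            return " ".join("".join(el.itertext()).split())
    return "Unknown"
```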
