
cdli-gh / data


This is a copy of the daily dump of catalogue and ATF data from the Cuneiform Digital Library Initiative (http://cdli.ucla.edu)

Home Page: http://cdli.ucla.edu/bulk_data

atf catalogue cuneiform metadata

data's People

Contributors

chiarcos, epageperron, larsgw, withgaurav


data's Issues

Instead of @column, some objects have @columnn (two n)

In cdliatf_unblocked.atf :

1 | BRM 3, 031 | @columnn 1
2 | BRM 3, 031 | @columnn 2
3 | BRM 3, 050 | @columnn 1
4 | BRM 3, 050 | @columnn 2
5 | CST 696 | @columnn 1
6 | CST 696 | @columnn 2
7 | PDT 1, 0377 | @columnn 1
8 | PDT 1, 0377 | @columnn 2
9 | PDT 1, 0388 | @columnn 1
10 | PDT 1, 0388 | @columnn 2
11 | PDT 1, 0396 | @columnn 1
12 | PDT 1, 0396 | @columnn 2
13 | PDT 1, 0398 | @columnn 1
14 | PDT 1, 0398 | @columnn 2
15 | PDT 1, 0482 | @columnn 1
16 | PDT 1, 0483 | @columnn 1
17 | PDT 1, 0483 | @columnn 2
18 | PDT 1, 0488 | @columnn 1
19 | PDT 1, 0488 | @columnn 2
20 | PDT 1, 0498 | @columnn 1
21 | PDT 1, 0498 | @columnn 2
22 | PDT 1, 0522 | @columnn 1
23 | PDT 1, 0522 | @columnn 2
24 | PDT 1, 0528 | @columnn 1
25 | PDT 1, 0528 | @columnn 2
26 | PDT 1, 0538 | @columnn 1
27 | PDT 1, 0538 | @columnn 2
28 | PDT 1, 0569 | @columnn 1
29 | PDT 1, 0569 | @columnn 2
30 | PDT 1, 0586 | @columnn 1
31 | PDT 1, 0586 | @columnn 2
32 | PDT 1, 0587 | @columnn 1
33 | PDT 1, 0587 | @columnn 2
34 | PDT 1, 0609 | @columnn 1
35 | PDT 1, 0609 | @columnn 2
36 | PDT 1, 0610 | @columnn 1
37 | PDT 1, 0610 | @columnn 2
38 | PDT 1, 0682 | @columnn 1
39 | PDT 1, 0682 | @columnn 2
40 | SAT 3, 1359 | @columnn 1
41 | SAT 3, 1359 | @columnn 2
42 | CBS 09275 | @columnn 1
43 | RIME 1.14.20.01, ex. 63 | @columnn 1
44 | RIME 2.13.01.01b | @columnn 1
45 | RIME 2.13.01.01b | @columnn 2
46 | RIME 4.01.05.04, ex. add120 | @columnn 1
47 | RIME 4.01.05.04, ex. add120 | @columnn 2
48 | RINAP 3/1 Sennacherib 24 composite | @columnn 1
49 | RIME 3/1.01.07.041, ex. add403 | @columnn 1
50 | ARTA 2015/003 | @columnn 1
51 | ARTA 2015/003 | @columnn 2
52 | CTMMA 1, 002 | @columnn 1
53 | CTMMA 1, 002 | @columnn 2
54 | TSŠ 0936 | @columnn 1
55 | TSŠ 0936 | @columnn 2
56 | MARI 05, p. 071, 104-105 no. 06 | @columnn 1
57 | MARI 05, p. 071, 104-105 no. 06 | @columnn 2


By the way, thanks for the amazing work!
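
A list like the one above could be produced with a short scan of the ATF dump. The following is a minimal sketch, assuming that every object starts with an `&`-line of the form `&<id> = <designation>`; the function name `find_columnn` and the sample record are hypothetical and only illustrate the idea.

```python
import re

# An &-line opens a new object: "&<id> = <designation>".
HEADER = re.compile(r"^&(\S+)\s*=\s*(.*)$")


def find_columnn(lines):
    """Return (designation, line) pairs for every '@columnn' label found."""
    hits = []
    designation = None
    for raw in lines:
        line = raw.rstrip()
        match = HEADER.match(line)
        if match:
            # Remember which object the following lines belong to.
            designation = match.group(2).strip()
        elif line.startswith("@columnn"):
            hits.append((designation, line))
    return hits


# Hypothetical record, not taken from the dump.
sample = """&P000000 = Example 1
@tablet
@obverse
@columnn 1
1. ...
@column 2
1. ...
""".splitlines()

print(find_columnn(sample))  # [('Example 1', '@columnn 1')]
```

To run it against the real file, the lines could be read with something like `open("cdliatf_unblocked.atf", encoding="utf-8")`, assuming the dump is UTF-8 encoded.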

P498859 column 5 Missing colon (':') on #tr directives.

In P498859 column 5 lines 29 and 33 some of the translation lines are missing a colon (':').

29. ki-iz-za-ta u3 ni-{szi}szir3-ta5
#tr.ts: kizzata u niširta 
#tr.en curtailment and deduction 
[...]
33. _a-sza3_ ad-di-na-asz2-szu a-na _nam_ ut-ter
#tr.ts eqel addinaššu ana pīhāti uttēr 
#tr.en the field that I gave he returns to the province, 
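
Problems of this kind can be found mechanically. The sketch below assumes that a well-formed translation line starts with `#tr.`, a lowercase language code, and a colon (as in `#tr.en:` and `#tr.ts:`); the function name `malformed_tr_lines` is hypothetical. It flags missing colons, missing language codes, and stray spaces, but it does not judge whether a language code is a valid one.

```python
import re

# Assumed well-formed shape: "#tr." + lowercase language code + ":".
VALID_TR = re.compile(r"^#tr\.[a-z]+:")


def malformed_tr_lines(lines):
    """Return (line_number, text) for '#tr' lines that lack the 'code:' form."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if line.startswith("#tr") and not VALID_TR.match(line)
    ]


sample = [
    "29. ki-iz-za-ta u3 ni-{szi}szir3-ta5",
    "#tr.ts: kizzata u niširta",
    "#tr.en curtailment and deduction",
]

print(malformed_tr_lines(sample))
# [(3, '#tr.en curtailment and deduction')]
```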

P203171 obverse line 5 Blank translation lines

In P203171 obverse line 5 and reverse line 1, the second empty translation lines should probably be removed.

5. ki-es3-sa2{ki#} 
#tr.en: Ki’eša, 
#tr.en: 
@reverse 
1. e2 dingir-re-ne#
#tr.en: houses of the gods, 
#tr.en: 
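
As a rough illustration of the cleanup, empty translation lines could be filtered out as follows. This is only a sketch: it assumes that a translation line with nothing after the colon carries no information, and the helper name `drop_empty_translations` is hypothetical.

```python
import re

# A translation directive with nothing (or only whitespace) after the colon.
EMPTY_TR = re.compile(r"^#tr\.[a-z]+:\s*$")


def drop_empty_translations(lines):
    """Return the lines with empty '#tr.xx:' directives removed."""
    return [line for line in lines if not EMPTY_TR.match(line)]


sample = [
    "5. ki-es3-sa2{ki#} ",
    "#tr.en: Ki'eša, ",
    "#tr.en: ",
    "@reverse ",
]

for line in drop_empty_translations(sample):
    print(line)
```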

P497998 Missing colon (':') on #tr directives

In P497998 law 24 line 269, and law 42 line 553, the English translation line is missing a colon after the #tr.en directive.

269. tal-ta-du-du-u2-ni _dam_-su    
#tr.ts: taltaduduni aššassu
#tr.en had drawn away, his wife
>>QMAL 269
553. lu-u2 i-na sza-ku-ul-te
#tr.ts: lū ina šākulte
#tr.en whether at a banquet
>>QMAL 553

P497998 invalid #tr directives

In P497998 law 23, line 241, the translation marker is missing the language designation. #tr. should be #tr.en:.

241. u2-usz-szu-ru-szu-nu   
#tr.ts: uššurūšunu  
#tr. they shall release them;
>>QMAL 241    

Law 24, line 269 is missing a colon (':') on the translation #tr directive.

269. tal-ta-du-du-u2-ni _dam_-su
#tr.ts: taltaduduni aššassu
#tr.en had drawn away, his wife
>>QMAL 269

Law 42, line 553 is also missing a colon (':') on the translation #tr directive.

553. lu-u2 i-na sza-ku-ul-te
#tr.ts: lū ina šākulte
#tr.en whether at a banquet
>>QMAL 553

P402035 has invalid description

The description field of P402035 is just a copy of the #atf: lang directive on the next line. It should probably be AMT pl. 005 04 instead.

&P402035 = #atf: lang akk 
#atf: lang akk 
@tablet 
@obverse
$ beginning broken
1'. [...] _{szim#}buluh# {szim#}li_ x [...]

P272901 reverse line 7'+ Missing colon (':') on #tr directive.

In P272901, starting with reverse line 7' and going until the end of the tablet, the translation lines are missing a colon (':').

6'. i-na ki-sze2-er-szi2-im wa-asz2-ba-ku-ni _tug2-hi-a_ ta-ta#-[ad]-na-ni
#tr.en: in jail you sold textiles for me.
7'. i-na a-mu-tim u2 _tug2-hi-a_ ta-da-nim a-szur3-ma-lik
#tr.en When the iron and the textiles were sold in Aszszur-malik,
8'. kur-ub-isztar _szesz_-szu a-szur3-i-mi3-ti2 _dumu_ i-ku-pi2-a
#tr.en Kurub-Isztar, his brother Aszszur-imitti, son of Ikkupija,
[...]

Better Readme

Can we update the readme with an example showing what the dump looks like? It could show the first five entries of the data.

P333111 @tablet misspelled

In P333111 the @tblet label should be @tablet.

&P333111 = AbB 11, 134
#atf: lang akk
@tblet
@obverse
1. a-na _{d}suen_-i-ri-ba-am#
2. qi2-bi2-ma

Why are you storing zip files?

Every commit adds a new copy of the zip files. Git doesn't handle binary files very well, so each version gets stored in the history separately. If the data were stored as text, then presumably git could just store the deltas, and the total size of the repo, currently 2.2 GB, would grow much more slowly.

P504598, P504600, P504601 missing P-marker in atf

These objects have a bare id number on the &-line of their atf representation, without the normal P-prefix, which is required to look up the entries on the website.

&504600 = CDLI Seals 013473 (physical)
[...]
&504601 = CDLI Seals 013474 (physical)
[...]
&504598 = CDLI Seals 013481 (physical)

Those should be &P504600, &P504601, and &P504598, respectively.
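
A check for this could look like the sketch below. It assumes that a valid `&`-line has the form `&P` plus six digits, then ` = ` and a designation, which matches the ids quoted in these reports but is not taken from a specification; the name `bad_headers` is hypothetical.

```python
import re

# Assumed shape of a valid &-line: "&P" + six digits + " = " + designation.
HEADER = re.compile(r"^&P\d{6} = \S")


def bad_headers(lines):
    """Return every &-line that does not match the assumed header shape."""
    return [
        line.rstrip()
        for line in lines
        if line.startswith("&") and not HEADER.match(line)
    ]


sample = [
    "&504600 = CDLI Seals 013473 (physical)",
    "&P504601 = CDLI Seals 013474 (physical)",
]

print(bad_headers(sample))
# ['&504600 = CDLI Seals 013473 (physical)']
```

The same pattern would also reject a header with extra characters after the id number, or an `&`-line that is not a header at all.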

P125779 duplicate indication

P125779 is marked as a duplicate/copy of P126262. I'm not sure what the correct syntax for this is, but using a &-line probably isn't it and confuses parsers.

&P125779 = PDT 1, 0363
& (obverse & obverse copy of P126262 = PDT 2, 0902)

Perhaps it should be a $-line, >>, or comment instead? Also it's not clear what obverse & obverse refers to. Should that be obverse & reverse?

Error importing cdlicat csv with Python

Reported by @jnovotny-lmu: error on line 124209 (115 columns instead of 63) of cdli_catalogue_1of2.csv.

Line 124209 of the file reads:

,,,,21198/zz001w65mw,"no atf",,nn,,,,,"University of Pennsylvania Museum of Archaeology and Anthropology, Philadelphia, Pennsylvania, USA",,"obv damaged",10/24/2005,,,10/21/2018,,"20051024 fitzgerald_upenn","N 2004",,,,,,,,,Administrative,,,?,124245,0,277115,,Akkadian,,clay,"N 2004",,tablet,"Neo-Babylonian (ca. 626-539 BC)",,"600ppi 20160630","unpublished unassigned ?","Nippur (mod. Nuffar)",,nd,,,,,,,"Account; payments of shekel of ?; 10x16x2(u.e.)x2(le.e.,,,,21198/zz001w65nd,"no atf",,nn,,,,,"University of Pennsylvania Museum of Archaeology and Anthropology, Philadelphia, Pennsylvania, USA",,"rev destroyed",10/24/2005,,-/VIII/-,10/21/2018,,"20051024 fitzgerald_upenn","N 2005",,,,,,,,,Administrative,,,?,124246,0,277116,,Akkadian,,clay,"N 2005",,tablet,"Middle Babylonian (ca. 1400-1100 BC)",,"600ppi 20160630","unpublished unassigned ?","Nippur (mod. Nuffar)",,nd,,,,,,,"Ledger; accounts for certain months?; 8x3 lines",,,?,"no translation",?
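
Rows like this can be located before a full import by comparing each row's field count with the header's. The following is a minimal sketch using Python's standard `csv` module; the function name `rows_with_wrong_width` and the tiny demo data are hypothetical.

```python
import csv
import io


def rows_with_wrong_width(handle, expected=None):
    """Return (line_number, field_count) for rows whose width is unexpected.

    If `expected` is None, the width of the first row (the header) is used.
    """
    reader = csv.reader(handle)
    bad = []
    for row in reader:
        if expected is None:
            expected = len(row)  # take the width from the header row
            continue
        if len(row) != expected:
            # line_num is the number of source lines read so far.
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical three-column data with one over-long row.
demo = io.StringIO("a,b,c\n1,2,3\n4,5,6,7,8\n")
print(rows_with_wrong_width(demo))  # [(3, 5)]
```

For the catalogue itself, the handle could come from `open("cdli_catalogue_1of2.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.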

P204453 obverse line 2 extra #tr directive.

P204453 has a duplicate translation directive on obverse line 2.

2. gurum2#-ak kiszib3-ba
#tr.en: inspections, sealed documents, 
#tr. inspections, sealed documents 

The second #tr. line should be removed.

P100643 invalid markup

In P100643 the & blank space state markup should be $ blank space (wrong sigil) and the previous line should probably be labelled 2 instead of the duplicate 3.

@reverse        
1. ur-ba-gara2      
3. szu ba-ti 
& blank space           
3. mu us2-sa an-sza-an{ki} ba-hul

P345966 missing $ after reverse line 2

In P345966 the blank space annotation is missing its initial $ sigil on the reverse after line 2. The line should be $ blank space.

2. kiszib3# ur-{d}nu-musz#-da 
#tr.en: the sealed tablet of Ur-Numushda. 
 blank space 
3. mu# us2-sa {d}szu-{d}suen lugal#-e bad3 mar-tu mu-du3
#tr.en: Year after: “The king Šu-Suen erected the Amorite wall.” 
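
A missing sigil like this can be caught by flagging any non-blank line that starts with neither a known sigil nor a line number. The sketch below uses a simplified, assumed set of line starts (`&`, `#`, `@`, `$`, `>>`, and labels such as `3.` or `7'.`); real ATF allows more label forms, so this is an illustration rather than a validator. The name `unrecognised_lines` is hypothetical.

```python
import re

# Simplified, assumed set of valid line starts: the sigils &, #, @, $, >>
# and numbered text lines such as "3. " or "7'. ".
KNOWN_START = re.compile(r"^(&|#|@|\$|>>|\d+'?\.\s)")


def unrecognised_lines(lines):
    """Return (line_number, text) for non-blank lines with no known start."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if line.strip() and not KNOWN_START.match(line)
    ]


sample = [
    "2. kiszib3# ur-{d}nu-musz#-da ",
    "#tr.en: the sealed tablet of Ur-Numushda. ",
    " blank space ",
    "3. mu# us2-sa {d}szu-{d}suen lugal#-e bad3 mar-tu mu-du3",
]

print(unrecognised_lines(sample))  # [(3, ' blank space ')]
```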

P464358 malformed translation directives

In P464358 law 16 line 527, there's a space between the translation directive and the language code. #tr. en should be #tr.en. On lines 529 and 530, #tr.tr should be #tr.ts.

@law 16
[...]
527. la usz-te-s,i2-a-am
#tr.ts: lā uštēṣiam
#tr. en: has not let him go out,
[...]
529. id-da-ak          
#tr.tr: iddâk             
#tr.en: shall be killed.
@law 17
530. szum-ma a-wi-lum
#tr.tr: šumma awīlum
#tr.en: If a man

P338870 obverse line 9 incorrect #tr directive.

In P338870 on the obverse, line 9, the translation is marked as another line of normalization and is missing a colon. The second #tr.ts should be #tr.en:

9. u3 u2-sza-ri-a-kum 
#tr.ts: u ušari’akkum 
#tr.ts further I had (them) led to you. 

Catalogue data for P277115 and P277116 run together

There seems to be some corruption in the catalogue data export for P277115 and P277116. On line 124209 of cdli_catalogue_1of2.csv, the sub-genre comments column of the first tablet stops abruptly, without a closing quotation mark, and is followed by the entry for the second tablet on the same line.

[...],"Account; payments of shekel of ?; 10x16x2(u.e.)x2(le.e.,,,,21198/zz001w65nd,"no atf",[..]

P346149 column 1' line 6' ruling should use $-line

In P346149 column 1' line 6', the double ruling annotation is marked as part of the English translation. It should probably use $-line markup to match other tablets.

6'. sza3 gi4-[...] 
#tr.en: You can argue with me by means of your truthful(?) heart? 
#tr.en: (double ruling) 
7'. x [...]   

The line before 7' should instead be:

$ double ruling

P513444 has spurious .jpg

The ATF record for P513444 has a spurious .jpg after the CDLI id number on the first line.

&P513444.jpg = RIME 4.04.01.02, ex. add175

It should be

&P513444 = RIME 4.04.01.02, ex. add175

Some catalogue fields have incorrect text encodings

Some fields in the catalogue csv file have data in non-utf-8 encodings. This is confusing for readers, and also results in incorrect display on the object webpage.

For example in P222716 Frühdyn. Beterstatuetten displays as Fr√ºhdyn. Beterstatuetten in the secondary publications field.

It's common in the CDLI comments field as well. For example in P282483 Fs. Košak displays as Fs Ko√∂ak.
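
One possible explanation for the first example is that correct UTF-8 bytes were decoded as Mac OS Roman somewhere along the way. The sketch below reverses that single step under this assumption; the function name is hypothetical, and a value that has been mis-decoded more than once would not be fully repaired by one pass.

```python
def undo_mac_roman_mojibake(text):
    """Reverse 'UTF-8 bytes decoded as Mac OS Roman', if that is what happened.

    Returns the text unchanged when the round trip is not possible, which is
    the case for text that was decoded correctly in the first place.
    """
    try:
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# U+221A (square root sign) followed by U+00BA (masculine ordinal indicator)
# is how the two UTF-8 bytes of "ü" look when read as Mac OS Roman.
garbled = "Fr\u221a\u00bahdyn. Beterstatuetten"
print(undo_mac_roman_mojibake(garbled))  # Frühdyn. Beterstatuetten
```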
