lexibank / halenepal

CLDF dataset derived from Hale's "Wordlists in Selected Languages of Nepal" from 1973

License: Creative Commons Attribution 4.0 International

Languages: Python 68.65%, TeX 31.35%
Topics: dataset, clics3, calc, mosel, clts, lexibank1

halenepal's Introduction

How to cite

If you use these data, please cite:

  • the original source

    Hale, Austin (1973): Clause, sentences, and discourse patterns in selected languages of Nepal. Kathmandu: Institute of Nepal and Asiatic Studies.

  • the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://stedt.berkeley.edu/~stedt-cgi/rootcanal.pl/source/AH-CSDPN

Conceptlists in Concepticon:

Statistics

CLDF validation; Glottolog: 100%; Concepticon: 78%; Source: 100%; BIPA: 100%; CLTS SoundClass: 100%

  • Varieties: 13
  • Concepts: 997
  • Lexemes: 11,041
  • Sources: 1
  • Synonymy: 1.14
  • Invalid lexemes: 0
  • Tokens: 77,613
  • Segments: 158 (0 BIPA errors, 0 CLTS sound class errors, 158 CLTS modified)
  • Inventory size (avg): 69.46

Contributors

| Name | GitHub user | Description | Role |
|---|---|---|---|
| Austin Hale | | | Author |
| Christoph Rzymski | @chrzyki | maintainer | Other |
| Johann-Mattis List | @LinguList | maintainer | Other |
| Natalia Morozowa | | concept mapping | Other |
| STEDT | | digitization | Editor, Distributor |

CLDF Datasets

The following CLDF datasets are available in cldf:

halenepal's Issues

non-matched srcids in stedt digitization

The reason we currently have only some 7000 instead of 10000 forms in the data is that there are srcids that are in STEDT but not in Hale:

1 , early , XIID
2 , bark , 01.027
3 , leaf (small) , 01.025
4 , can / be able to do something , XIIIA1
5 , eighty , 09.20
6 , old (of objects) , XIID
7 , we (pl) , 01.003
8 , up , XIIC
9 , nineteen (inanimate) , 09.10
10 , some (solids) , 09.54
11 , pair , 09.41
12 , thirty-nine , 09.15
13 , fly , 01.064
14 , bark (of tree) , 01.027
15 , frequently , XIID
16 , hundred (inanimate) , 09.24
17 , name , 01.100
18 , BENEFACTIVE , XIIIA3
19 , hot , 01.093
20 , round , 01.098
21 , bird , 01.020
22 , person , 01.018
23 , away from , XIIC
24 , all (for things) , 01.009;12d
25 , toward , XIIC
26 , know , 01.058
27 , until / as long as , XIID
28 , leaf (large) , 01.025
29 , twenty-two , 09.12
30 , know (fact / person) , XIIIA2
31 , you, honorific singular , 01.002
32 , man (masc) , 01.017
33 , during , XIID
34 , neither , 09.43
35 , round (sphere) , 01.098
36 , someone , 09.05
37 , long , 01.014
38 , hither and thither , XIIC
39 , last , 09.34
40 , hear , 01.058
41 , that , 01.005
42 , during / in the midle , XIID
43 , old , XIID
44 , frequent , XIID
45 , forty , 09.16
46 , every , 09.45
47 , never , XIID
48 , drink , 01.054
49 , cold , 01.094
50 , know (sthg) , XIIIA2
51 , this , 01.004
52 , not , 01.008
53 , dry , 01.099
54 , four , 09.02
55 , can / be able to do something or know something , XIIIA1
56 , nothing , 09.49
57 , more , XIID7
58 , we (excl) , 01.003
59 , eight , 09.06
60 , white , 01.090
61 , less , XIID
62 , big , 01.013
63 , above (directly) , XIIC
64 , armpit , 66
65 , where , XIIC
66 , more , XIID
67 , behind , XIIC
68 , through , XIIC
69 , all , XIID
70 , full , 01.095
71 , between , XIIC
72 , unmixed / without condiment , XIID
73 , late , XIID
74 , third, one- , 09.37
75 , red , 01.087
76 , all (for things) , 01.009,12d
77 , new , XIID
78 , five , 09.03
79 , stone , 01.077
80 , we (pl exclu) , 01.003
81 , place , 06a.0406a.
82 , moon , 01.073
83 , let (smn do sthg) , XIIIA4
84 , know , 01.059
85 , something , 09.48
86 , behind (not visible) , XIIC
87 , no one , 09.51
88 , earth , 01.079
89 , get up , 0 2b1.59
90 , cloud , 01.080
91 , daily , XIID
92 , fish , 01.019
93 , ninety-nine (inanimate) , 09.23
94 , eighth , 09.33
95 , where (w. to) , XIIC
96 , hundred, two , 09.28
97 , die , 01.061
98 , quarter, one- , 09.38
99 , we (incl) , 01.003
100 , false , XIID14
101 , one , 01.011
102 , after / at last , XIID
103 , unmixed / pure , XIID
104 , root , 01.026
105 , ninety , 09.21
106 , give , 01.070
107 , some (fluids) , 09.54
108 , eat , 01.055
109 , come , 01.066
110 , sand , 01.078
111 , seven , 09.05
112 , we (du) , 01.003
113 , good , 01.097
114 , man , 01.017
115 , hundred and ninety (inanimate) , 09.27
116 , both , 09.44
117 , seventy , 09.19
118 , jackal , Hale 73 CSD
119 , up (straight up) , XIIC
120 , together , 09.47
121 , total , 09.46
122 , beyond , XIIC
123 , mixed , XIID
124 , hundred , 09.24
125 , half way , 09.36
126 , say , 01.071
127 , where , XIIC64
128 , new , 01.096
129 , sun , 01.072
130 , quarters, three- , 09.40
131 , sixty , 09.18
132 , bite , 01.056
133 , another , 09.52
134 , gloss , srcid
135 , fifty , 09.17
136 , third , 09.32
137 , lie , 01.067
138 , around , XIIC
139 , I , 01.001
140 , real , XIID
141 , tree , 01.023
142 , whole , XIID
143 , burn , 01.084
144 , all , I.9
145 , both (inanimate) , 09.44
146 , hundred and two (inanimate) , 09.25
147 , see , 01.057
148 , under (below) , XIIC
149 , water , 01.075
150 , in front of , XIIC56
151 , none , 09.42
152 , ash , 01.083
153 , beneath , XIIC
154 , second , 09.31
155 , half , 09.35
156 , hand , 
157 , smoke , 01.081
158 , woman , 01.016
159 , first , 09.30
160 , yellow , 01.089
161 , under , XIIC
162 , three , 09.01
163 , we (pl incl) , 01.003
164 , seed , 01.024
165 , dog , 01.021
166 , twenty-nine , 09.13
167 , thou , 01.002
168 , hundred and thirty , 09.26
169 , beside , XIIC
170 , some (grain) , 09.54
171 , until , XIID
172 , ten , 09.08
173 , some , 09.54
174 , nineteen , 09.10
175 , stand , 01.069
176 , thirteen , 09.09
177 , all , 01.009,12.
178 , something (unknown thing) , 09.48
179 , both (animate) , 09.44
180 , none (inanimate) , 09.42
181 , louse (head) , 01.022
182 , where (w. at) , XIIC
183 , when , XIID
184 , all , 01.009,12d
185 , hundred and two , 09.25
186 , too much , 09.55
187 , sleep , 01.060
188 , over , XIIC
189 , root / tuber , 01.026
190 , who , 01.006
191 , bite (past) , 01.056
192 , frequently / sometimes , XIID
193 , after , XIID
194 , sit , 01.068
195 , out of , XIIC
196 , rain , 01.076
197 , what , 01.007
198 , green , 01.088
199 , wing , 93
200 , partial , XIID
201 , nine , 09.07
202 , hundred and ninety , 09.27
203 , many , 01.010
204 , arm , 66
205 , path , 01.085
206 , two , 01.012
207 , most , XIID
208 , tickle , 158
209 , mountain , 01.086
210 , palm of hand , 66
211 , thirds, two- , 09.39
212 , none (animate) , 09.42
213 , thirty-one , 09.14
214 , twenty , 09.11
215 , louse , 01.022
216 , hundred and thirty (inanimate) , 09.26
217 , in / inside of , XIIC
218 , all , 01.009
219 , someone , 09.50
220 , far , XIIC1
221 , forty (inanimate) , 09.16
222 , over (above) , XIIC
223 , small , 01.015
224 , we , 01.003
225 , we (du incl) , 01.003
226 , behind (visible) , XIIC
227 , across , XIIC
228 , swim , 01.063
229 , until , XIID31
230 , old (of clothing) , XIID
231 , this , 01.100
232 , seed (includes fruit) , 01.024
233 , walk , 01.065
234 , cause (smn to do sthg) , XIIIA5
235 , down , XIIC
236 , before , XIID
237 , ninety-eight (inanimate) , 09.22
238 , all (for people) , 01.009
239 , ninety-eight , 09.22
240 , up (up country) , XIIC
241 , ninety (inanimate) , 09.21
242 , four (inanimate) , 09.02
243 , least , XIID
244 , star , 01.074
245 , night , 01.092
246 , black , 01.091
247 , ninety-nine , 09.23
248 , thousand, one , 09.29
249 , six , 09.04
250 , under (beneath) , XIIC
251 , we (du excl) , 01.003
252 , unmixed , XIID
253 , new one; one which is new , XIID
254 , kill , 01.062
255 , leaf , 01.025
256 , fire , 01.082
257 , let (smn do sthg) / permit , XIIIA4
258 , cold (wet) , 01.094
259 , infrequent , XIID
260 , above , XIIC

If these are identified (and ideally corrected in a JSON file or similar), we should have a full account of the data.
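One way to produce a list like the one above is a set comparison between the srcids in the STEDT digitization and those in Hale's concept list. The sketch below assumes both have already been loaded as plain Python data; the function name unmatched_srcids and the srcid given as "known to Hale" in the sample are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def unmatched_srcids(stedt_rows, hale_srcids):
    """Group STEDT glosses whose srcid does not occur in Hale's concept list.

    stedt_rows: iterable of (gloss, srcid) pairs from the STEDT digitization.
    hale_srcids: collection of srcids known from Hale's concept list.
    Returns a dict mapping each unmatched srcid to the sorted glosses using it.
    """
    known = {srcid.strip() for srcid in hale_srcids}
    missing = defaultdict(set)
    for gloss, srcid in stedt_rows:
        srcid = srcid.strip()
        if srcid not in known:
            missing[srcid].add(gloss.strip())
    return {srcid: sorted(glosses) for srcid, glosses in sorted(missing.items())}


# Sample STEDT rows taken from the list above; the Hale srcid is made up.
stedt = [("bark", "01.027"), ("bark (of tree)", "01.027"), ("eighty", "09.20")]
hale = ["09.20"]
print(unmatched_srcids(stedt, hale))
```

Grouping by srcid rather than by gloss makes near-duplicates such as "bark" and "bark (of tree)" show up together, which should help when correcting them in a single mapping file.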

ids in halenepal not matching

As I just found on inspection, IDs are handled in a generally problematic way in this dataset, so I'm cleaning this up now. @chrzyki, please review once I submit, as this is why we had trouble with missing forms.

checklist for release

  • contributors list (add URL for STEDT or GitHub)
  • check the concept list mapping once more

add sources to dataset

This is easy, as the data was freshly collected by the authors, so the data source is the BibTeX entry of the book.
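A minimal sketch of what this could look like, assuming the source is stored as a single BibTeX entry and each form row is a plain dict with a list-valued Source field (similar to the Source column of CLDF form tables). The key Hale1973 and the helper names bibtex_entry and add_source are hypothetical; the field values are copied from the How to cite section.

```python
def bibtex_entry(key, fields):
    """Render a single BibTeX @book entry from a dict of field values."""
    body = ",\n".join(f"    {name} = {{{value}}}" for name, value in fields.items())
    return f"@book{{{key},\n{body}\n}}\n"


# Fields copied from the "How to cite" section; the key below is hypothetical.
HALE_1973 = {
    "author": "Hale, Austin",
    "year": "1973",
    "title": "Clause, sentences, and discourse patterns in selected languages of Nepal",
    "address": "Kathmandu",
    "publisher": "Institute of Nepal and Asiatic Studies",
}


def add_source(forms, key):
    """Return copies of the form rows with the single source key attached."""
    return [{**form, "Source": [key]} for form in forms]


print(bibtex_entry("Hale1973", HALE_1973))
rows = add_source([{"ID": "form-1", "Value": "example"}], "Hale1973")
```

Because every form comes from the same book, one key attached to every row is enough; no per-form source lookup is needed.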

Refactor code

As per @LinguList's suggestions:

Looks okay, but we have to get rid of the re module, as it is only used in one place. Ideally, we would take the same approach as in Huber-1992-375, i.e., Hale-1973-1798 would have a digital_in_source (or similar) column that links to STEDT. It is too late for that now, so an explicit mapping as a dictionary seems like a good idea. You could easily extract it by collecting the mappings from our current approach, printing them, and adding them to the code as a dictionary.
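A minimal sketch of that workflow, assuming the current matching can be called as a function on each ID. The names legacy_lookup, dump_mapping, SRCID_MAP and lookup, as well as the whitespace-stripping pattern and the sample IDs, are hypothetical stand-ins, not the repository's actual code.

```python
import re
from pprint import pformat


def legacy_lookup(srcid):
    """Hypothetical stand-in for the current regex-based matching.

    Here it only removes stray whitespace inside an ID (e.g. "0 2b1.59");
    the real pattern used in the repository may differ.
    """
    return re.sub(r"\s+", "", srcid)


def dump_mapping(srcids, lookup):
    """One-off helper: record what the current lookup returns for every ID
    and render the result as a dict literal to paste into the code."""
    mapping = {srcid: lookup(srcid) for srcid in sorted(set(srcids))}
    return "SRCID_MAP = " + pformat(mapping)


print(dump_mapping(["01.027", "0 2b1.59"], legacy_lookup))

# After pasting the printed literal, the runtime code needs only a dict lookup,
# so the re import can be dropped from the dataset module itself.
SRCID_MAP = {"0 2b1.59": "02b1.59", "01.027": "01.027"}


def lookup(srcid):
    """Return the mapped ID, or None if the ID is not in the explicit mapping."""
    return SRCID_MAP.get(srcid)
```

Only the one-off extraction script still imports re; the explicit dictionary also makes every mapping visible and reviewable in the code, which the regex did not.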

Furthermore, since we're already at it (and I did not have time for it), I'd propose getting rid of the namedtuples, as the same can be done with a dictionary, right? That way we'd have fewer obscure definitions at the start of the file.
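The proposed change can be sketched as follows; the record name Concept and its fields are hypothetical, used only to contrast the two styles.

```python
from collections import namedtuple

# Before: a type defined at the top of the module (hypothetical fields).
Concept = namedtuple("Concept", ["id", "gloss", "srcid"])
old = Concept(id="1", gloss="bark", srcid="01.027")

# After: the same record as a plain dict, with no type definition needed.
new = {"id": "1", "gloss": "bark", "srcid": "01.027"}

# Access changes from attribute syntax to key syntax; the values are the same.
assert old.gloss == new["gloss"]

# Existing namedtuple instances convert losslessly via _asdict(),
# which makes a step-by-step migration straightforward.
assert dict(old._asdict()) == new
```

The trade-off is that namedtuples are immutable and raise an error on a misspelled attribute, while dicts are mutable and accept any key; for short-lived records built while writing the CLDF data, the simpler dict is usually sufficient.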

Concepticon change

PR 872 over on Concepticon changed a concept mapping, so the list needs to be re-run for the next release.
