adobe-research / deft_corpus
The Definition Extraction From Text corpus and relevant formatting scripts
License: Other
The evaluated labels for subtask 3 include "Qualifies", but the label is not present in the training data because it is mixed with the "Supplements" label (as confirmed in the forum). I just wanted to make sure this is being tracked.
Hello,
I was checking the released labeled data.
It looks like, for task 2, the unlabeled data does not match the size of the labeled data.
For instance, for the file task_2_t1_biology_0_0.deft, the labeled version has 519 lines while the unlabeled one has 475 lines.
Is there a reason for this difference?
Thanks
I might be missing something, but why do some sentences appear twice in a row in the corpus? For example, the sentence "There are usually acknowledgment and reference sections as well as an abstract ( a concise summary ) at the beginning of the paper ." appears twice in a row in the file data/deft_files/train/t1_biology_0_0.deft.
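A minimal sketch of how such back-to-back repeats could be located, assuming the sentences of a file have already been read into a list of strings. The function name `consecutive_duplicates` and the shortened sample sentences are hypothetical and are not part of the repository scripts.

```python
from typing import Iterable, List, Tuple


def consecutive_duplicates(sentences: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (index, sentence) for each sentence equal to the one right before it."""
    hits = []
    previous = None
    for index, sentence in enumerate(sentences):
        # Collapse whitespace so spacing differences do not hide a repeat.
        normalized = " ".join(sentence.split())
        if normalized and normalized == previous:
            hits.append((index, normalized))
        previous = normalized
    return hits


# Shortened stand-in sentences, not the real corpus content.
sample = [
    "There are usually acknowledgment and reference sections .",
    "There are usually acknowledgment and reference sections .",
    "A different sentence .",
]
duplicates = consecutive_duplicates(sample)
print(duplicates)
```

Running this over every file would give a rough count of how widespread the repetition is.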
I found some mismatches between the deft files and the source files.
In file deft_files/train/t7_government_0_101.deft and source_txt/train/t7_government_0_101.txt:
The text after L110 and this line does not exist in the source text file.
This text in the source file is not present in the deft file.
Also, there is a tokenization error in
This is one of the examples; there might be more of this kind. Please try to resolve this as soon as possible.
@sashaspala @Franck-Dernoncourt
Thanks
These IDs seem to have no rules to follow.
https://github.com/adobe-research/deft_corpus/tree/master/evaluation and https://competitions.codalab.org/competitions/20900#learn_the_details-evaluation lack a high-level description of the evaluation metrics.
While working on the first task, I noticed that some sentences consist of fewer than 5 words.
After checking the parsing script, I could not really understand the idea behind splitting the CoNLL file into sentences using these regexps.
deft_corpus/task1_converter.py
Line 40 in a4667bb
One of the strange sentences is line 45 in the parsed file t7_government_1_404.deft:
" 1993 . 7073 ." "0"
From the task's point of view, these sentences aren't definitions. But I am not sure whether this was done on purpose or not.
Thanks a lot.
In the train and dev sets, I found 266 examples (context windows) that contain tokens whose root_id is "0" and whose tag_id is some TXXX, while no token in the same example has root_id TXXX.
For example, see the T105 tokens here:
data/source_txt/t3_physics_2_101.deft
TOKEN ROOT_ID TAG_ID RELATION
3161 -1 -1 0
. -1 -1 0
Another -1 -1 0
is -1 -1 0
what -1 -1 0
Democritus -1 -1 0
in -1 -1 0
particular -1 -1 0
believed -1 -1 0
— -1 -1 0
that -1 -1 0
there 0 T106 0
is 0 T106 0
a 0 T106 0
smallest 0 T106 0
unit 0 T106 0
that 0 T106 0
can 0 T106 0
not 0 T106 0
be 0 T106 0
further 0 T106 0
subdivided 0 T106 0
. -1 -1 0
Democritus -1 -1 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
. -1 -1 0
We -1 -1 0
now -1 -1 0
know -1 -1 0
that -1 -1 0
atoms -1 -1 0
themselves -1 -1 0
can -1 -1 0
be -1 -1 0
subdivided -1 -1 0
, -1 -1 0
but -1 -1 0
their -1 -1 0
identity -1 -1 0
is -1 -1 0
destroyed -1 -1 0
in -1 -1 0
the -1 -1 0
process -1 -1 0
, -1 -1 0
so -1 -1 0
the -1 -1 0
Greeks -1 -1 0
were -1 -1 0
correct -1 -1 0
in -1 -1 0
a -1 -1 0
respect -1 -1 0
. -1 -1 0
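The check described above can be sketched as follows, assuming each context window has been parsed into (token, root_id, tag_id, relation) tuples in the column order shown in the listing. The function name `unreferenced_root_tags` is hypothetical, and the sketch only reports the pattern; it does not decide whether the annotation is actually wrong.

```python
from typing import Iterable, Set, Tuple

Row = Tuple[str, str, str, str]  # (token, root_id, tag_id, relation)


def unreferenced_root_tags(rows: Iterable[Row]) -> Set[str]:
    """Tag IDs that sit on root tokens (root_id "0") but are never used as a root_id."""
    rows = list(rows)
    # Every tag ID that some token points to through its root_id column.
    referenced = {root for _, root, _, _ in rows if root not in ("0", "-1")}
    # Every tag ID carried by a token whose root_id is "0".
    root_tags = {tag for _, root, tag, _ in rows if root == "0" and tag != "-1"}
    return root_tags - referenced


# A few rows copied from the listing above.
window = [
    ("that", "-1", "-1", "0"),
    ("there", "0", "T106", "0"),
    ("is", "0", "T106", "0"),
    ("called", "-1", "-1", "0"),
    ("this", "T106", "T194", "Refers-To"),
    ("the", "0", "T105", "0"),
    ("atom", "0", "T105", "0"),
]
dangling = unreferenced_root_tags(window)
print(dangling)
```

Here T106 is referenced by the token "this", while T105 is not referenced by any token in the window.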
Subtask 1: Sentence Classification
Given a sentence, classify whether or not it contains a definition. This is the traditional definition extraction task.
Does this mean that the sentence does not contain a definition only when the tag of each token in a sentence is “O”?
I am not able to access the Google Group for SemEval-2020 Task 6. I also tried emailing [email protected], but could not get through.
Please also provide an alternate contact email address on the CodaLab page.
Please have a look.
Thanks
train/t4_psychology_2_303.deft
In data/source_txt/train/t4_psychology_2_303.txt 12007 12009 O -1 -1 0
this data/source_txt/train/t4_psychology_2_303.txt 12010 12014 O -1 -1 0
dimension data/source_txt/train/t4_psychology_2_303.txt 12015 12024 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12024 12025 O -1 -1 0
people data/source_txt/train/t4_psychology_2_303.txt 12026 12032 O -1 -1 0
who data/source_txt/train/t4_psychology_2_303.txt 12033 12036 O -1 -1 0
are data/source_txt/train/t4_psychology_2_303.txt 12037 12040 O -1 -1 0
high data/source_txt/train/t4_psychology_2_303.txt 12041 12045 O -1 -1 0
on data/source_txt/train/t4_psychology_2_303.txt 12046 12048 O -1 -1 0
psychoticism data/source_txt/train/t4_psychology_2_303.txt 12049 12061 O -1 -1 0
tend data/source_txt/train/t4_psychology_2_303.txt 12062 12066 O -1 -1 0
to data/source_txt/train/t4_psychology_2_303.txt 12067 12069 O -1 -1 0
be data/source_txt/train/t4_psychology_2_303.txt 12070 12072 O -1 -1 0
independent data/source_txt/train/t4_psychology_2_303.txt 12073 12084 O -1 -1 0
thinkers data/source_txt/train/t4_psychology_2_303.txt 12085 12093 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12093 12094 O -1 -1 0
cold data/source_txt/train/t4_psychology_2_303.txt 12095 12099 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12099 12100 O -1 -1 0
nonconformists data/source_txt/train/t4_psychology_2_303.txt 12101 12115 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12115 12116 O -1 -1 0
impulsive data/source_txt/train/t4_psychology_2_303.txt 12117 12126 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12126 12127 O -1 -1 0
antisocial data/source_txt/train/t4_psychology_2_303.txt 12128 12138 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12138 12139 O -1 -1 0
and data/source_txt/train/t4_psychology_2_303.txt 12140 12143 O -1 -1 0
hostile data/source_txt/train/t4_psychology_2_303.txt 12144 12151 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12151 12152 O -1 -1 0
whereas data/source_txt/train/t4_psychology_2_303.txt 12153 12160 O -1 -1 0
people data/source_txt/train/t4_psychology_2_303.txt 12161 12167 O -1 -1 0
who data/source_txt/train/t4_psychology_2_303.txt 12168 12171 O -1 -1 0
are data/source_txt/train/t4_psychology_2_303.txt 12172 12175 O -1 -1 0
high data/source_txt/train/t4_psychology_2_303.txt 12176 12180 O -1 -1 0
on data/source_txt/train/t4_psychology_2_303.txt 12181 12183 O -1 -1 0
superego data/source_txt/train/t4_psychology_2_303.txt 12184 12192 O -1 -1 0
control data/source_txt/train/t4_psychology_2_303.txt 12193 12200 O -1 -1 0
tend data/source_txt/train/t4_psychology_2_303.txt 12201 12205 O -1 -1 0
to data/source_txt/train/t4_psychology_2_303.txt 12206 12208 O -1 -1 0
have data/source_txt/train/t4_psychology_2_303.txt 12209 12213 O -1 -1 0
high data/source_txt/train/t4_psychology_2_303.txt 12214 12218 O -1 -1 0
impulse data/source_txt/train/t4_psychology_2_303.txt 12219 12226 O -1 -1 0
control data/source_txt/train/t4_psychology_2_303.txt 12227 12234 O -1 -1 0
— data/source_txt/train/t4_psychology_2_303.txt 12234 12235 O -1 -1 0
they data/source_txt/train/t4_psychology_2_303.txt 12235 12239 O -1 -1 0
are data/source_txt/train/t4_psychology_2_303.txt 12240 12243 O -1 -1 0
more data/source_txt/train/t4_psychology_2_303.txt 12244 12248 O -1 -1 0
altruistic data/source_txt/train/t4_psychology_2_303.txt 12249 12259 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12259 12260 O -1 -1 0
empathetic data/source_txt/train/t4_psychology_2_303.txt 12261 12271 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12271 12272 O -1 -1 0
cooperative data/source_txt/train/t4_psychology_2_303.txt 12273 12284 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12284 12285 O -1 -1 0
and data/source_txt/train/t4_psychology_2_303.txt 12286 12289 O -1 -1 0
conventional data/source_txt/train/t4_psychology_2_303.txt 12290 12302 O -1 -1 0
( data/source_txt/train/t4_psychology_2_303.txt 12303 12304 O -1 -1 0
Eysenck data/source_txt/train/t4_psychology_2_303.txt 12304 12311 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12311 12312 O -1 -1 0
Eysenck data/source_txt/train/t4_psychology_2_303.txt 12313 12320 O -1 -1 0
& data/source_txt/train/t4_psychology_2_303.txt 12321 12322 O -1 -1 0
Barrett data/source_txt/train/t4_psychology_2_303.txt 12323 12330 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12330 12331 O -1 -1 0
1985).While data/source_txt/train/t4_psychology_2_303.txt 12332 12343 O -1 -1 0
Cattell data/source_txt/train/t4_psychology_2_303.txt 12344 12351 O -1 -1 0
’s data/source_txt/train/t4_psychology_2_303.txt 12351 12353 O -1 -1 0
16 data/source_txt/train/t4_psychology_2_303.txt 12354 12356 O -1 -1 0
factors data/source_txt/train/t4_psychology_2_303.txt 12357 12364 O -1 -1 0
may data/source_txt/train/t4_psychology_2_303.txt 12365 12368 O -1 -1 0
be data/source_txt/train/t4_psychology_2_303.txt 12369 12371 O -1 -1 0
too data/source_txt/train/t4_psychology_2_303.txt 12372 12375 O -1 -1 0
broad data/source_txt/train/t4_psychology_2_303.txt 12376 12381 O -1 -1 0
, data/source_txt/train/t4_psychology_2_303.txt 12381 12382 O -1 -1 0
the data/source_txt/train/t4_psychology_2_303.txt 12383 12386 O -1 -1 0
Eysenck data/source_txt/train/t4_psychology_2_303.txt 12387 12394 O -1 -1 0
’s data/source_txt/train/t4_psychology_2_303.txt 12394 12396 O -1 -1 0
two data/source_txt/train/t4_psychology_2_303.txt 12397 12400 O -1 -1 0
- data/source_txt/train/t4_psychology_2_303.txt 12400 12401 O -1 -1 0
factor data/source_txt/train/t4_psychology_2_303.txt 12401 12407 O -1 -1 0
system data/source_txt/train/t4_psychology_2_303.txt 12408 12414 O -1 -1 0
has data/source_txt/train/t4_psychology_2_303.txt 12415 12418 O -1 -1 0
been data/source_txt/train/t4_psychology_2_303.txt 12419 12423 O -1 -1 0
criticized data/source_txt/train/t4_psychology_2_303.txt 12424 12434 O -1 -1 0
for data/source_txt/train/t4_psychology_2_303.txt 12435 12438 O -1 -1 0
being data/source_txt/train/t4_psychology_2_303.txt 12439 12444 O -1 -1 0
too data/source_txt/train/t4_psychology_2_303.txt 12445 12448 O -1 -1 0
narrow data/source_txt/train/t4_psychology_2_303.txt 12449 12455 O -1 -1 0
. data/source_txt/train/t4_psychology_2_303.txt 12455 12456 O -1 -1 0
Lines 1693-1777. Error in line 1753.
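Tokens such as 1985).While in the listing above, where a sentence-final period is glued to the next word, could be flagged with a simple heuristic. This is only a sketch: the regular expression and the name `FUSED_BOUNDARY` are assumptions chosen for illustration, and the pattern deliberately ignores abbreviations such as U.S. while also missing other kinds of tokenization errors.

```python
import re
from typing import Iterable, List

# A lowercase letter, digit, ")" or "]" followed by "." and then at least two
# letters: a likely sentence boundary that the tokenizer did not split.
FUSED_BOUNDARY = re.compile(r"[a-z0-9)\]]\.[A-Za-z]{2,}")


def fused_tokens(tokens: Iterable[str]) -> List[str]:
    """Return the tokens that look like two sentences glued together."""
    return [token for token in tokens if FUSED_BOUNDARY.search(token)]


tokens = ["Barrett", ",", "1985).While", "Cattell", "’s", "U.S.", "."]
flagged = fused_tokens(tokens)
print(flagged)
```

A scan like this over the token column of every .deft file would give a rough list of candidate lines to inspect by hand.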
Update: the examples were originally taken from old data; they now come from the current repository data.
train/t4_psychology_1_0.deft
In data/source_txt/t4_psychology_mkaplan_0.txt 31686 31688 O -1 -1 0
central data/source_txt/t4_psychology_mkaplan_0.txt 31689 31696 B-Term T189 0 Direct-Defines
sleep data/source_txt/t4_psychology_mkaplan_0.txt 31697 31702 I-Term T189 0 Direct-Defines
apnea data/source_txt/t4_psychology_mkaplan_0.txt 31703 31708 I-Term T189 0 Direct-Defines
, data/source_txt/t4_psychology_mkaplan_0.txt 31708 31709 O -1 -1 0
disruption data/source_txt/t4_psychology_mkaplan_0.txt 31710 31720 B-Definition T190 T189 Direct-Defines
in data/source_txt/t4_psychology_mkaplan_0.txt 31721 31723 I-Definition T190 T189 Direct-Defines
signals data/source_txt/t4_psychology_mkaplan_0.txt 31724 31731 I-Definition T190 T189 Direct-Defines
sent data/source_txt/t4_psychology_mkaplan_0.txt 31732 31736 I-Definition T190 T189 Direct-Defines
from data/source_txt/t4_psychology_mkaplan_0.txt 31737 31741 I-Definition T190 T189 Direct-Defines
the data/source_txt/t4_psychology_mkaplan_0.txt 31742 31745 I-Definition T190 T189 Direct-Defines
brain data/source_txt/t4_psychology_mkaplan_0.txt 31746 31751 I-Definition T190 T189 Direct-Defines
that data/source_txt/t4_psychology_mkaplan_0.txt 31752 31756 I-Definition T190 T189 Direct-Defines
regulate data/source_txt/t4_psychology_mkaplan_0.txt 31757 31765 I-Definition T190 T189 Direct-Defines
breathing data/source_txt/t4_psychology_mkaplan_0.txt 31766 31775 I-Definition T190 T189 Direct-Defines
cause data/source_txt/t4_psychology_mkaplan_0.txt 31776 31781 I-Definition T190 T189 Direct-Defines
periods data/source_txt/t4_psychology_mkaplan_0.txt 31782 31789 I-Definition T190 T189 Direct-Defines
of data/source_txt/t4_psychology_mkaplan_0.txt 31790 31792 I-Definition T190 T189 Direct-Defines
interrupted data/source_txt/t4_psychology_mkaplan_0.txt 31793 31804 I-Definition T190 T189 Direct-Defines
breathing data/source_txt/t4_psychology_mkaplan_0.txt 31805 31814 I-Definition T190 T189 Direct-Defines
( data/source_txt/t4_psychology_mkaplan_0.txt 31815 31816 O -1 -1 0
White data/source_txt/t4_psychology_mkaplan_0.txt 31816 31821 O -1 -1 0
, data/source_txt/t4_psychology_mkaplan_0.txt 31821 31822 O -1 -1 0
2005) data/source_txt/t4_psychology_mkaplan_0.txt 31823 31828 O -1 -1 0
. data/source_txt/t4_psychology_mkaplan_0.txt 31828 31829 O -1 -1 0
One data/source_txt/t4_psychology_mkaplan_0.txt 31829 31832 O -1 -1 0
of data/source_txt/t4_psychology_mkaplan_0.txt 31833 31835 O -1 -1 0
the data/source_txt/t4_psychology_mkaplan_0.txt 31836 31839 O -1 -1 0
most data/source_txt/t4_psychology_mkaplan_0.txt 31840 31844 O -1 -1 0
common data/source_txt/t4_psychology_mkaplan_0.txt 31845 31851 O -1 -1 0
treatments data/source_txt/t4_psychology_mkaplan_0.txt 31852 31862 O -1 -1 0
for data/source_txt/t4_psychology_mkaplan_0.txt 31863 31866 O -1 -1 0
sleep data/source_txt/t4_psychology_mkaplan_0.txt 31867 31872 O -1 -1 0
apnea data/source_txt/t4_psychology_mkaplan_0.txt 31873 31878 O -1 -1 0
involves data/source_txt/t4_psychology_mkaplan_0.txt 31879 31887 O -1 -1 0
the data/source_txt/t4_psychology_mkaplan_0.txt 31888 31891 O -1 -1 0
use data/source_txt/t4_psychology_mkaplan_0.txt 31892 31895 O -1 -1 0
of data/source_txt/t4_psychology_mkaplan_0.txt 31896 31898 O -1 -1 0
a data/source_txt/t4_psychology_mkaplan_0.txt 31899 31900 O -1 -1 0
special data/source_txt/t4_psychology_mkaplan_0.txt 31901 31908 O -1 -1 0
device data/source_txt/t4_psychology_mkaplan_0.txt 31909 31915 O -1 -1 0
during data/source_txt/t4_psychology_mkaplan_0.txt 31916 31922 O -1 -1 0
sleep data/source_txt/t4_psychology_mkaplan_0.txt 31923 31928 O -1 -1 0
. data/source_txt/t4_psychology_mkaplan_0.txt 31928 31929 O -1 -1 0
Lines 5794-5838. Error in line 5817.
I tested the evaluation scripts with the provided code, config file, and test files.
However, I found that the reported performance is quite different from a manual calculation.
I suspect that the parameter name should be labels rather than target_names in semeval2020_0601_eval.py and semeval2020_0602_eval.py.
Alternatively, it would be helpful if the authors could specify the version of scikit-learn they used.
The script output is shown below and my scikit-learn version is 0.20.3.
              precision    recall  f1-score   support

HasDef             0.00      0.00      0.00         2
NoDef              0.60      1.00      0.75         3

micro avg          0.60      0.60      0.60         5
macro avg          0.30      0.50      0.37         5
weighted avg       0.36      0.60      0.45         5

              precision    recall  f1-score   support

B-Term             0.50      0.50      0.50         2
I-Term             1.00      1.00      1.00         2
B-Definition       0.88      0.78      0.82         9
I-Definition       0.67      0.80      0.73         5

micro avg          0.78      0.78      0.78        18
macro avg          0.76      0.77      0.76        18
weighted avg       0.79      0.78      0.78        18

{'Direct-Defines': {'p': 1.0, 'r': 1.0, 'f': 1.0}, 'macro': {'p': 1.0, 'f': 1.0}}
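One way a report like this can end up with misleading row names: in scikit-learn's classification_report, target_names only supplies display names, which are paired by position with the labels, and when labels is omitted the labels are the sorted values found in the data. The pure-Python sketch below imitates just that pairing rule. The function `displayed_names`, the integer encoding (0 = NoDef, 1 = HasDef), and the sample vectors are hypothetical; the actual evaluation scripts may differ.

```python
from typing import Dict, List, Optional, Sequence


def displayed_names(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    labels: Optional[List[int]] = None,
    target_names: Optional[List[str]] = None,
) -> Dict[int, str]:
    """Map each label to the row name it would be shown under."""
    if labels is None:
        # Without explicit labels, the sorted label values define the row order.
        labels = sorted(set(y_true) | set(y_pred))
    if target_names is None:
        target_names = [str(label) for label in labels]
    # Display names are matched to labels purely by position.
    return dict(zip(labels, target_names))


y_true = [1, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0]
names = ["HasDef", "NoDef"]

swapped = displayed_names(y_true, y_pred, target_names=names)
fixed = displayed_names(y_true, y_pred, labels=[1, 0], target_names=names)
print(swapped)  # label 0 is shown as "HasDef"
print(fixed)    # label 1 is shown as "HasDef"
```

Under these assumptions, passing the names alone attaches "HasDef" to label 0, while passing labels in the matching order restores the intended pairing.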
Update: the examples were originally taken from old data; they now come from the current repository data.
train/t1_biology_2_202.deft
As data/source_txt/t1_biology_rlacroix_202.txt 13106 13108 O -1 -1 0
shown data/source_txt/t1_biology_rlacroix_202.txt 13109 13114 O -1 -1 0
in data/source_txt/t1_biology_rlacroix_202.txt 13115 13117 O -1 -1 0
[ data/source_txt/t1_biology_rlacroix_202.txt 13118 13119 O -1 -1 0
link]a data/source_txt/t1_biology_rlacroix_202.txt 13119 13125 O -1 -1 0
, data/source_txt/t1_biology_rlacroix_202.txt 13125 13126 O -1 -1 0
some data/source_txt/t1_biology_rlacroix_202.txt 13127 13131 B-Definition T77 0 Refers-To
individual data/source_txt/t1_biology_rlacroix_202.txt 13132 13142 I-Definition T77 0 Refers-To
prokaryotes data/source_txt/t1_biology_rlacroix_202.txt 13143 13154 I-Definition T77 0 Refers-To
were data/source_txt/t1_biology_rlacroix_202.txt 13155 13159 I-Definition T77 0 Refers-To
responsible data/source_txt/t1_biology_rlacroix_202.txt 13160 13171 I-Definition T77 0 Refers-To
for data/source_txt/t1_biology_rlacroix_202.txt 13172 13175 I-Definition T77 0 Refers-To
transferring data/source_txt/t1_biology_rlacroix_202.txt 13176 13188 I-Definition T77 0 Refers-To
the data/source_txt/t1_biology_rlacroix_202.txt 13189 13192 I-Definition T77 0 Refers-To
bacteria data/source_txt/t1_biology_rlacroix_202.txt 13193 13201 I-Definition T77 0 Refers-To
that data/source_txt/t1_biology_rlacroix_202.txt 13202 13206 I-Definition T77 0 Refers-To
caused data/source_txt/t1_biology_rlacroix_202.txt 13207 13213 I-Definition T77 0 Refers-To
mitochondrial data/source_txt/t1_biology_rlacroix_202.txt 13214 13227 I-Definition T77 0 Refers-To
development data/source_txt/t1_biology_rlacroix_202.txt 13228 13239 I-Definition T77 0 Refers-To
to data/source_txt/t1_biology_rlacroix_202.txt 13240 13242 I-Definition T77 0 Refers-To
the data/source_txt/t1_biology_rlacroix_202.txt 13243 13246 I-Definition T77 0 Refers-To
new data/source_txt/t1_biology_rlacroix_202.txt 13247 13250 I-Definition T77 0 Refers-To
eukaryotes data/source_txt/t1_biology_rlacroix_202.txt 13251 13261 I-Definition T77 0 Refers-To
, data/source_txt/t1_biology_rlacroix_202.txt 13261 13262 I-Definition T77 0 Refers-To
whereas data/source_txt/t1_biology_rlacroix_202.txt 13263 13270 I-Definition T77 0 Refers-To
other data/source_txt/t1_biology_rlacroix_202.txt 13271 13276 I-Definition T77 0 Refers-To
species data/source_txt/t1_biology_rlacroix_202.txt 13277 13284 I-Definition T77 0 Refers-To
transferred data/source_txt/t1_biology_rlacroix_202.txt 13285 13296 I-Definition T77 0 Refers-To
the data/source_txt/t1_biology_rlacroix_202.txt 13297 13300 I-Definition T77 0 Refers-To
bacteria data/source_txt/t1_biology_rlacroix_202.txt 13301 13309 I-Definition T77 0 Refers-To
that data/source_txt/t1_biology_rlacroix_202.txt 13310 13314 I-Definition T77 0 Refers-To
gave data/source_txt/t1_biology_rlacroix_202.txt 13315 13319 I-Definition T77 0 Refers-To
rise data/source_txt/t1_biology_rlacroix_202.txt 13320 13324 I-Definition T77 0 Refers-To
to data/source_txt/t1_biology_rlacroix_202.txt 13325 13327 I-Definition T77 0 Refers-To
chloroplasts data/source_txt/t1_biology_rlacroix_202.txt 13328 13340 I-Definition T77 0 Refers-To
. data/source_txt/t1_biology_rlacroix_202.txt 13340 13341 O -1 -1 0
Lines 2402-2437. Error in line 2406.
The data/source_txt/t1_biology_rlacroix_202.txt 12685 12688 B-Term T72 0 Direct-Defines
nucleus data/source_txt/t1_biology_rlacroix_202.txt 12689 12696 I-Term T72 0 Direct-Defines
- data/source_txt/t1_biology_rlacroix_202.txt 12696 12697 I-Term T72 0 Direct-Defines
first data/source_txt/t1_biology_rlacroix_202.txt 12697 12702 I-Term T72 0 Direct-Defines
hypothesis data/source_txt/t1_biology_rlacroix_202.txt 12703 12713 I-Term T72 0 Direct-Defines
proposes data/source_txt/t1_biology_rlacroix_202.txt 12714 12722 B-Definition T73 T72 Direct-Defines
that data/source_txt/t1_biology_rlacroix_202.txt 12723 12727 I-Definition T73 T72 Direct-Defines
the data/source_txt/t1_biology_rlacroix_202.txt 12728 12731 I-Definition T73 T72 Direct-Defines
nucleus data/source_txt/t1_biology_rlacroix_202.txt 12732 12739 I-Definition T73 T72 Direct-Defines
evolved data/source_txt/t1_biology_rlacroix_202.txt 12740 12747 I-Definition T73 T72 Direct-Defines
in data/source_txt/t1_biology_rlacroix_202.txt 12748 12750 I-Definition T73 T72 Direct-Defines
prokaryotes data/source_txt/t1_biology_rlacroix_202.txt 12751 12762 I-Definition T73 T72 Direct-Defines
first data/source_txt/t1_biology_rlacroix_202.txt 12763 12768 I-Definition T73 T72 Direct-Defines
( data/source_txt/t1_biology_rlacroix_202.txt 12769 12770 I-Definition T73 T72 Direct-Defines
[ data/source_txt/t1_biology_rlacroix_202.txt 12770 12771 I-Definition T73 T72 Direct-Defines
link]a data/source_txt/t1_biology_rlacroix_202.txt 12771 12777 I-Definition T73 T72 Direct-Defines
) data/source_txt/t1_biology_rlacroix_202.txt 12777 12778 I-Definition T73 T72 Direct-Defines
, data/source_txt/t1_biology_rlacroix_202.txt 12778 12779 I-Definition T73 T72 Direct-Defines
followed data/source_txt/t1_biology_rlacroix_202.txt 12780 12788 I-Definition T73 T72 Direct-Defines
by data/source_txt/t1_biology_rlacroix_202.txt 12789 12791 I-Definition T73 T72 Direct-Defines
a data/source_txt/t1_biology_rlacroix_202.txt 12792 12793 I-Definition T73 T72 Direct-Defines
later data/source_txt/t1_biology_rlacroix_202.txt 12794 12799 I-Definition T73 T72 Direct-Defines
fusion data/source_txt/t1_biology_rlacroix_202.txt 12800 12806 I-Definition T73 T72 Direct-Defines
of data/source_txt/t1_biology_rlacroix_202.txt 12807 12809 I-Definition T73 T72 Direct-Defines
the data/source_txt/t1_biology_rlacroix_202.txt 12810 12813 I-Definition T73 T72 Direct-Defines
new data/source_txt/t1_biology_rlacroix_202.txt 12814 12817 I-Definition T73 T72 Direct-Defines
eukaryote data/source_txt/t1_biology_rlacroix_202.txt 12818 12827 I-Definition T73 T72 Direct-Defines
with data/source_txt/t1_biology_rlacroix_202.txt 12828 12832 I-Definition T73 T72 Direct-Defines
bacteria data/source_txt/t1_biology_rlacroix_202.txt 12833 12841 I-Definition T73 T72 Direct-Defines
that data/source_txt/t1_biology_rlacroix_202.txt 12842 12846 I-Definition T73 T72 Direct-Defines
became data/source_txt/t1_biology_rlacroix_202.txt 12847 12853 I-Definition T73 T72 Direct-Defines
mitochondria data/source_txt/t1_biology_rlacroix_202.txt 12854 12866 I-Definition T73 T72 Direct-Defines
. data/source_txt/t1_biology_rlacroix_202.txt 12866 12867 O -1 -1 0
Lines 2322-2345. Error in line 2337.
Subtask 2: Sequence labeling. We will report P/R/F1 for each evaluated class, as well as macro- and micro-averaged F1 for the evaluated classes. The official score will be based on the macro-averaged F1 of the evaluated classes.
Subtask 3: Relation extraction. We will report P/R/F1 for each evaluated relation, as well as macro- and micro-averaged F1 for the evaluated relations. The official score will be based on the macro-averaged F1 of the evaluated relations.
The list of evaluated classes/relations should be specified. Is it all of the classes in tables 2 and 3 of https://sigann.github.io/LAW-XIII-2019/pdf/W19-4015.pdf?
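To make clear why the list of evaluated classes matters, here is a small sketch of macro- and micro-averaged F1 restricted to a chosen subset of classes. The class names, the counts, and the function names are hypothetical and serve only to illustrate the arithmetic; this is not the official scorer.

```python
from typing import Dict, Iterable, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts, with 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_micro_f1(counts: Dict[str, Counts], evaluated: Iterable[str]) -> Tuple[float, float]:
    evaluated = list(evaluated)
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(prf(*counts[name])[2] for name in evaluated) / len(evaluated)
    # Micro: pool the counts first, so frequent classes dominate.
    tp = sum(counts[name][0] for name in evaluated)
    fp = sum(counts[name][1] for name in evaluated)
    fn = sum(counts[name][2] for name in evaluated)
    micro = prf(tp, fp, fn)[2]
    return macro, micro


# Hypothetical counts; "Qualifier" is left out of the evaluated set.
counts = {"Term": (8, 2, 2), "Definition": (1, 1, 3), "Qualifier": (5, 5, 5)}
macro, micro = macro_micro_f1(counts, ["Term", "Definition"])
print(round(macro, 4), round(micro, 4))
```

Adding or removing a class from the evaluated list changes both averages, which is why the exact list needs to be stated.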
Update: the examples were originally taken from old data; they now come from the current repository data.
train/t5_economic_0_0.deft
In data/source_txt/t5_economic_jlee_0.txt 9719 9721 O -1 -1 0
this data/source_txt/t5_economic_jlee_0.txt 9722 9726 O -1 -1 0
case data/source_txt/t5_economic_jlee_0.txt 9727 9731 O -1 -1 0
, data/source_txt/t5_economic_jlee_0.txt 9731 9732 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 9733 9736 O -1 -1 0
addition data/source_txt/t5_economic_jlee_0.txt 9737 9745 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 9746 9748 O -1 -1 0
still data/source_txt/t5_economic_jlee_0.txt 9749 9754 O -1 -1 0
more data/source_txt/t5_economic_jlee_0.txt 9755 9759 O -1 -1 0
barbers data/source_txt/t5_economic_jlee_0.txt 9760 9767 O -1 -1 0
would data/source_txt/t5_economic_jlee_0.txt 9768 9773 O -1 -1 0
actually data/source_txt/t5_economic_jlee_0.txt 9774 9782 O -1 -1 0
cause data/source_txt/t5_economic_jlee_0.txt 9783 9788 O -1 -1 0
output data/source_txt/t5_economic_jlee_0.txt 9789 9795 O -1 -1 0
to data/source_txt/t5_economic_jlee_0.txt 9796 9798 O -1 -1 0
decrease data/source_txt/t5_economic_jlee_0.txt 9799 9807 O -1 -1 0
, data/source_txt/t5_economic_jlee_0.txt 9807 9808 O -1 -1 0
as data/source_txt/t5_economic_jlee_0.txt 9809 9811 O -1 -1 0
shown data/source_txt/t5_economic_jlee_0.txt 9812 9817 O -1 -1 0
in data/source_txt/t5_economic_jlee_0.txt 9818 9820 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 9821 9824 O -1 -1 0
last data/source_txt/t5_economic_jlee_0.txt 9825 9829 O -1 -1 0
row data/source_txt/t5_economic_jlee_0.txt 9830 9833 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 9834 9836 O -1 -1 0
[ data/source_txt/t5_economic_jlee_0.txt 9837 9838 O -1 -1 0
link].This data/source_txt/t5_economic_jlee_0.txt 9838 9848 O -1 -1 0
pattern data/source_txt/t5_economic_jlee_0.txt 9849 9856 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 9857 9859 O -1 -1 0
diminishing data/source_txt/t5_economic_jlee_0.txt 9860 9871 O -1 -1 0
marginal data/source_txt/t5_economic_jlee_0.txt 9872 9880 O -1 -1 0
returns data/source_txt/t5_economic_jlee_0.txt 9881 9888 O -1 -1 0
is data/source_txt/t5_economic_jlee_0.txt 9889 9891 O -1 -1 0
common data/source_txt/t5_economic_jlee_0.txt 9892 9898 O -1 -1 0
in data/source_txt/t5_economic_jlee_0.txt 9899 9901 O -1 -1 0
production data/source_txt/t5_economic_jlee_0.txt 9902 9912 O -1 -1 0
. data/source_txt/t5_economic_jlee_0.txt 9912 9913 O -1 -1 0
Lines 1456-1491. Error in line 1481.
This data/source_txt/t5_economic_jlee_0.txt 10467 10471 O -1 -1 0
pattern data/source_txt/t5_economic_jlee_0.txt 10472 10479 O -1 -1 0
was data/source_txt/t5_economic_jlee_0.txt 10480 10483 O -1 -1 0
illustrated data/source_txt/t5_economic_jlee_0.txt 10484 10495 O -1 -1 0
earlier data/source_txt/t5_economic_jlee_0.txt 10496 10503 O -1 -1 0
in data/source_txt/t5_economic_jlee_0.txt 10504 10506 O -1 -1 0
[ data/source_txt/t5_economic_jlee_0.txt 10507 10508 O -1 -1 0
link].In data/source_txt/t5_economic_jlee_0.txt 10508 10516 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 10517 10520 O -1 -1 0
middle data/source_txt/t5_economic_jlee_0.txt 10521 10527 O -1 -1 0
portion data/source_txt/t5_economic_jlee_0.txt 10528 10535 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 10536 10538 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 10539 10542 O -1 -1 0
long data/source_txt/t5_economic_jlee_0.txt 10543 10547 O -1 -1 0
- data/source_txt/t5_economic_jlee_0.txt 10547 10548 O -1 -1 0
run data/source_txt/t5_economic_jlee_0.txt 10548 10551 O -1 -1 0
average data/source_txt/t5_economic_jlee_0.txt 10552 10559 O -1 -1 0
cost data/source_txt/t5_economic_jlee_0.txt 10560 10564 O -1 -1 0
curve data/source_txt/t5_economic_jlee_0.txt 10565 10570 O -1 -1 0
, data/source_txt/t5_economic_jlee_0.txt 10570 10571 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 10572 10575 O -1 -1 0
flat data/source_txt/t5_economic_jlee_0.txt 10576 10580 O -1 -1 0
portion data/source_txt/t5_economic_jlee_0.txt 10581 10588 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 10589 10591 O -1 -1 0
the data/source_txt/t5_economic_jlee_0.txt 10592 10595 O -1 -1 0
curve data/source_txt/t5_economic_jlee_0.txt 10596 10601 O -1 -1 0
around data/source_txt/t5_economic_jlee_0.txt 10602 10608 O -1 -1 0
Q3 data/source_txt/t5_economic_jlee_0.txt 10609 10611 O -1 -1 0
, data/source_txt/t5_economic_jlee_0.txt 10611 10612 O -1 -1 0
economies data/source_txt/t5_economic_jlee_0.txt 10613 10622 O -1 -1 0
of data/source_txt/t5_economic_jlee_0.txt 10623 10625 O -1 -1 0
scale data/source_txt/t5_economic_jlee_0.txt 10626 10631 O -1 -1 0
have data/source_txt/t5_economic_jlee_0.txt 10632 10636 O -1 -1 0
been data/source_txt/t5_economic_jlee_0.txt 10637 10641 O -1 -1 0
exhausted data/source_txt/t5_economic_jlee_0.txt 10642 10651 O -1 -1 0
. data/source_txt/t5_economic_jlee_0.txt 10651 10652 O -1 -1 0
Lines 1567-1602. Error in line 1574.
There is a data mismatch between the deft files and their corresponding source files: the source text does not correspond to the deft file.
For example, in data/deft_files/dev/t1_biology_0_0.deft
2 data/source_txt/dev/t1_biology_0_0.txt 0 1 O -1 -1 0
. data/source_txt/dev/t1_biology_0_0.txt 1 2 O -1 -1 0
It data/source_txt/dev/t1_biology_0_0.txt 3 5 O -1 -1 0
becomes data/source_txt/dev/t1_biology_0_0.txt 6 13 O -1 -1 0
and in data/source_txt/dev/t1_biology_0_0.txt
5. Science includes such diverse fields as astronomy, biology, computer sciences, geology, logic, physics, chemistry
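A mismatch like this can be checked mechanically. The sketch below assumes that the two numbers after the source path are character offsets (start, end) into the source file and that the quoted source line is where the file begins; neither assumption is confirmed here. The function name `offset_mismatches` is hypothetical.

```python
from typing import Iterable, List, Tuple


def offset_mismatches(
    source_text: str, rows: Iterable[Tuple[str, int, int]]
) -> List[Tuple[str, str]]:
    """Return (token, text found at its offsets) for every row that disagrees."""
    mismatches = []
    for token, start, end in rows:
        found = source_text[start:end]
        if found != token:
            mismatches.append((token, found))
    return mismatches


# Rows from the .deft excerpt above and the start of the quoted source line.
rows = [("2", 0, 1), (".", 1, 2), ("It", 3, 5), ("becomes", 6, 13)]
source_text = "5. Science includes such diverse fields"
mismatches = offset_mismatches(source_text, rows)
print(mismatches)
```

Only the period happens to line up; every other token points at unrelated text.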
Also, for the dev dataset, the deft folder contains files that have no corresponding file in the source folder, and vice versa:
t4_psychology_2_202.deft, t5_economic_2_202.deft, t5_economic_2_303.deft, t7_government_2_101.deft, t7_government_2_202.deft
t4_psychology_1_202.txt, t5_economic_1_202.txt, t5_economic_1_303.txt, t7_government_1_101.txt
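A sketch of how the two folders could be compared by file stem, assuming a .deft file and its .txt source are meant to share the same base name. The function name `unmatched` is hypothetical, and the sample lists contain only a few names (one matching pair plus names taken from the lists above).

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def unmatched(
    deft_names: Iterable[str], source_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (stems only in the deft folder, stems only in the source folder)."""
    deft_stems = {PurePosixPath(name).stem for name in deft_names}
    source_stems = {PurePosixPath(name).stem for name in source_names}
    return sorted(deft_stems - source_stems), sorted(source_stems - deft_stems)


deft_names = ["t1_biology_0_0.deft", "t4_psychology_2_202.deft", "t5_economic_2_202.deft"]
source_names = ["t1_biology_0_0.txt", "t4_psychology_1_202.txt"]
deft_only, source_only = unmatched(deft_names, source_names)
print(deft_only)
print(source_only)
```

With real directories, the name lists could come from listing each folder before calling the function.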
Update: the examples were originally taken from old data; they now come from the current repository data.
train/t7_government_2_202.deft
Someone data/source_txt/t7_government_rlacroix_202.txt 23368 23375 O -1 -1 0
concerned data/source_txt/t7_government_rlacroix_202.txt 23376 23385 O -1 -1 0
about data/source_txt/t7_government_rlacroix_202.txt 23386 23391 O -1 -1 0
protecting data/source_txt/t7_government_rlacroix_202.txt 23392 23402 O -1 -1 0
individual data/source_txt/t7_government_rlacroix_202.txt 23403 23413 O -1 -1 0
rights data/source_txt/t7_government_rlacroix_202.txt 23414 23420 O -1 -1 0
might data/source_txt/t7_government_rlacroix_202.txt 23421 23426 O -1 -1 0
join data/source_txt/t7_government_rlacroix_202.txt 23427 23431 O -1 -1 0
a data/source_txt/t7_government_rlacroix_202.txt 23432 23433 O -1 -1 0
group data/source_txt/t7_government_rlacroix_202.txt 23434 23439 O -1 -1 0
like data/source_txt/t7_government_rlacroix_202.txt 23440 23444 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 23445 23448 B-Term T36 0 AKA
American data/source_txt/t7_government_rlacroix_202.txt 23449 23457 I-Term T36 0 AKA
Civil data/source_txt/t7_government_rlacroix_202.txt 23458 23463 I-Term T36 0 AKA
Liberties data/source_txt/t7_government_rlacroix_202.txt 23464 23473 I-Term T36 0 AKA
Union data/source_txt/t7_government_rlacroix_202.txt 23474 23479 I-Term T36 0 AKA
( data/source_txt/t7_government_rlacroix_202.txt 23480 23481 O -1 -1 0
ACLU data/source_txt/t7_government_rlacroix_202.txt 23481 23485 B-Alias-Term T37 T36 AKA
) data/source_txt/t7_government_rlacroix_202.txt 23485 23486 O -1 -1 0
because data/source_txt/t7_government_rlacroix_202.txt 23487 23494 O -1 -1 0
it data/source_txt/t7_government_rlacroix_202.txt 23495 23497 O -1 -1 0
supports data/source_txt/t7_government_rlacroix_202.txt 23498 23506 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 23507 23510 O -1 -1 0
liberties data/source_txt/t7_government_rlacroix_202.txt 23511 23520 O -1 -1 0
guaranteed data/source_txt/t7_government_rlacroix_202.txt 23521 23531 O -1 -1 0
in data/source_txt/t7_government_rlacroix_202.txt 23532 23534 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 23535 23538 O -1 -1 0
U.S. data/source_txt/t7_government_rlacroix_202.txt 23539 23543 O -1 -1 0
Constitution data/source_txt/t7_government_rlacroix_202.txt 23544 23556 O -1 -1 0
, data/source_txt/t7_government_rlacroix_202.txt 23556 23557 O -1 -1 0
even data/source_txt/t7_government_rlacroix_202.txt 23558 23562 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 23563 23566 O -1 -1 0
free data/source_txt/t7_government_rlacroix_202.txt 23567 23571 O -1 -1 0
expression data/source_txt/t7_government_rlacroix_202.txt 23572 23582 O -1 -1 0
of data/source_txt/t7_government_rlacroix_202.txt 23583 23585 O -1 -1 0
unpopular data/source_txt/t7_government_rlacroix_202.txt 23586 23595 O -1 -1 0
views.https://www.aclu.org/ data/source_txt/t7_government_rlacroix_202.txt 23596 23623 O -1 -1 0
( data/source_txt/t7_government_rlacroix_202.txt 23624 23625 O -1 -1 0
March data/source_txt/t7_government_rlacroix_202.txt 23625 23630 O -1 -1 0
1 data/source_txt/t7_government_rlacroix_202.txt 23631 23632 O -1 -1 0
, data/source_txt/t7_government_rlacroix_202.txt 23632 23633 O -1 -1 0
2016 data/source_txt/t7_government_rlacroix_202.txt 23634 23638 O -1 -1 0
) data/source_txt/t7_government_rlacroix_202.txt 23638 23639 O -1 -1 0
. data/source_txt/t7_government_rlacroix_202.txt 23639 23640 O -1 -1 0
Lines 3908-3951. Error in line 3944.
The data/source_txt/t7_government_rlacroix_202.txt 47189 47192 O -1 -1 0
Republican data/source_txt/t7_government_rlacroix_202.txt 47193 47203 O -1 -1 0
Senate data/source_txt/t7_government_rlacroix_202.txt 47204 47210 O -1 -1 0
and data/source_txt/t7_government_rlacroix_202.txt 47211 47214 O -1 -1 0
Judiciary data/source_txt/t7_government_rlacroix_202.txt 47215 47224 O -1 -1 0
Committee data/source_txt/t7_government_rlacroix_202.txt 47225 47234 O -1 -1 0
will data/source_txt/t7_government_rlacroix_202.txt 47235 47239 O -1 -1 0
welcome data/source_txt/t7_government_rlacroix_202.txt 47240 47247 O -1 -1 0
a data/source_txt/t7_government_rlacroix_202.txt 47248 47249 O -1 -1 0
Trump data/source_txt/t7_government_rlacroix_202.txt 47250 47255 O -1 -1 0
nominee data/source_txt/t7_government_rlacroix_202.txt 47256 47263 O -1 -1 0
in data/source_txt/t7_government_rlacroix_202.txt 47264 47266 O -1 -1 0
early data/source_txt/t7_government_rlacroix_202.txt 47267 47272 O -1 -1 0
2017.Other data/source_txt/t7_government_rlacroix_202.txt 47273 47283 O -1 -1 0
presidential data/source_txt/t7_government_rlacroix_202.txt 47284 47296 O -1 -1 0
selections data/source_txt/t7_government_rlacroix_202.txt 47297 47307 O -1 -1 0
are data/source_txt/t7_government_rlacroix_202.txt 47308 47311 O -1 -1 0
not data/source_txt/t7_government_rlacroix_202.txt 47312 47315 O -1 -1 0
subject data/source_txt/t7_government_rlacroix_202.txt 47316 47323 O -1 -1 0
to data/source_txt/t7_government_rlacroix_202.txt 47324 47326 O -1 -1 0
Senate data/source_txt/t7_government_rlacroix_202.txt 47327 47333 O -1 -1 0
approval data/source_txt/t7_government_rlacroix_202.txt 47334 47342 O -1 -1 0
, data/source_txt/t7_government_rlacroix_202.txt 47342 47343 O -1 -1 0
including data/source_txt/t7_government_rlacroix_202.txt 47344 47353 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 47354 47357 O -1 -1 0
president data/source_txt/t7_government_rlacroix_202.txt 47358 47367 O -1 -1 0
’s data/source_txt/t7_government_rlacroix_202.txt 47367 47369 O -1 -1 0
personal data/source_txt/t7_government_rlacroix_202.txt 47370 47378 O -1 -1 0
staff data/source_txt/t7_government_rlacroix_202.txt 47379 47384 O -1 -1 0
( data/source_txt/t7_government_rlacroix_202.txt 47385 47386 O -1 -1 0
whose data/source_txt/t7_government_rlacroix_202.txt 47386 47391 O -1 -1 0
most data/source_txt/t7_government_rlacroix_202.txt 47392 47396 O -1 -1 0
important data/source_txt/t7_government_rlacroix_202.txt 47397 47406 O -1 -1 0
member data/source_txt/t7_government_rlacroix_202.txt 47407 47413 O -1 -1 0
is data/source_txt/t7_government_rlacroix_202.txt 47414 47416 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 47417 47420 O -1 -1 0
White data/source_txt/t7_government_rlacroix_202.txt 47421 47426 O -1 -1 0
House data/source_txt/t7_government_rlacroix_202.txt 47427 47432 O -1 -1 0
chief data/source_txt/t7_government_rlacroix_202.txt 47433 47438 O -1 -1 0
of data/source_txt/t7_government_rlacroix_202.txt 47439 47441 O -1 -1 0
staff data/source_txt/t7_government_rlacroix_202.txt 47442 47447 O -1 -1 0
) data/source_txt/t7_government_rlacroix_202.txt 47447 47448 O -1 -1 0
and data/source_txt/t7_government_rlacroix_202.txt 47449 47452 O -1 -1 0
various data/source_txt/t7_government_rlacroix_202.txt 47453 47460 O -1 -1 0
advisers data/source_txt/t7_government_rlacroix_202.txt 47461 47469 O -1 -1 0
( data/source_txt/t7_government_rlacroix_202.txt 47470 47471 O -1 -1 0
most data/source_txt/t7_government_rlacroix_202.txt 47471 47475 O -1 -1 0
notably data/source_txt/t7_government_rlacroix_202.txt 47476 47483 O -1 -1 0
the data/source_txt/t7_government_rlacroix_202.txt 47484 47487 O -1 -1 0
national data/source_txt/t7_government_rlacroix_202.txt 47488 47496 O -1 -1 0
security data/source_txt/t7_government_rlacroix_202.txt 47497 47505 O -1 -1 0
adviser data/source_txt/t7_government_rlacroix_202.txt 47506 47513 O -1 -1 0
) data/source_txt/t7_government_rlacroix_202.txt 47513 47514 O -1 -1 0
. data/source_txt/t7_government_rlacroix_202.txt 47514 47515 O -1 -1 0
Lines 8260-8313. Error in line 8273.
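Both excerpts above show the same pattern: a sentence-final period fused with the start of the next sentence inside one token ("views.https://www.aclu.org/", "2017.Other"). A minimal sketch of how such tokens could be located automatically is given below. It assumes the token is the first whitespace-separated field of each .deft line; the regular expression and the function name `find_fused_tokens` are illustrative heuristics, not part of the corpus scripts, and they may miss cases or flag legitimate tokens.

```python
import re

# Heuristic (an assumption, not the corpus authors' rule): a lowercase letter
# or digit, then a period, then either a capitalized word or a URL scheme,
# all inside a single token.
FUSED = re.compile(r"[a-z0-9]\.(?:[A-Z][a-z]|https?://)")


def find_fused_tokens(lines):
    """Yield (line_number, token) for tokens that look like two fused sentences."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines separate sentences in .deft files
        token = fields[0]
        if FUSED.search(token):
            yield number, token


sample = [
    "unpopular data/source_txt/t7_government_rlacroix_202.txt 23586 23595 O -1 -1 0",
    "views.https://www.aclu.org/ data/source_txt/t7_government_rlacroix_202.txt 23596 23623 O -1 -1 0",
    "U.S. data/source_txt/t7_government_rlacroix_202.txt 23539 23543 O -1 -1 0",
    "2017.Other data/source_txt/t7_government_rlacroix_202.txt 47273 47283 O -1 -1 0",
]
for number, token in find_fused_tokens(sample):
    print(number, token)
```

Abbreviations such as "U.S." are not flagged because the character before the period is uppercase; a stricter or looser pattern may be needed for other files.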
The output of the task1_converter program doesn't seem to be very clean: I see a lot of sentences like " . 178" "0". Is this expected? Are we supposed to clean such sentences up, or am I using the program wrongly? To run the program I use python task1_converter.py ./data/deft_files/train ./output
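If such fragments do need to be removed by the user, one possible post-filter is sketched below. It works on plain sentence strings only and makes no assumption about the exact output file layout of task1_converter.py. The function name `looks_degenerate` and the threshold of three alphabetic words are hypothetical choices for illustration, not something the corpus maintainers have specified.

```python
import re

# A "word" here is any run of two or more ASCII letters (an assumption
# chosen so that stray punctuation and numbers do not count).
WORD = re.compile(r"[A-Za-z]{2,}")


def looks_degenerate(sentence, min_words=3):
    """Return True when a sentence has fewer than min_words alphabetic words."""
    return len(WORD.findall(sentence)) < min_words


sentences = [
    " . 178",
    "There are usually acknowledgment and reference sections .",
]
kept = [s for s in sentences if not looks_degenerate(s)]
print(kept)
```

Whether short sentences should be dropped at all depends on how the task organizers treat them during evaluation, so this is only a local cleaning step.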
It could be interesting to have some documentation mapping IOB tags with the DEFT paper's Tables 2 and 3. E.g.
has the IOB tag B-Definiti-frag
, which might not be obvious to link to DEFT paper's Tables 2
While attempting to parse the corpus, I ran into a number of inconsistencies in terms of how the context windows are separated, etc. Here's the output of my code:
Extra sentence on line 4617 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_101.deft
Malformed context window separator on line 4877 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_101.deft
Malformed context window separator on line 5227 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_101.deft
Malformed context window separator on line 5322 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_101.deft
Extra sentence on line 3110 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_202.deft
Potential missing line-break on line 110 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_404.deft
Extra sentence on line 191 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_0_404.deft
Extra sentence on line 4818 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_101.deft
Potential missing line-break on line 2075 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_202.deft
Extra sentence on line 2174 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_202.deft
Suspiciously short sentence on line 4346 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_202.deft
Malformed context window separator on line 4352 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_202.deft
Potential missing line-break on line 134 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_303.deft
Extra sentence on line 213 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_303.deft
Suspiciously short sentence on line 629 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_1_606.deft
Malformed context window separator on line 1804 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_0.deft
Malformed context window separator on line 4688 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_101.deft
Malformed context window separator on line 5471 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_101.deft
Extra sentence on line 4113 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_303.deft
Suspiciously short sentence on line 4110 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_303.deft
Potential missing line-break on line 556 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_606.deft
Extra sentence on line 644 in file ..\..\deft_corpus\data\deft_files\train\t1_biology_2_606.deft
Potential missing line-break on line 251 in file ..\..\deft_corpus\data\deft_files\train\t2_history_1_0.deft
Malformed context window separator on line 6017 in file ..\..\deft_corpus\data\deft_files\train\t2_history_1_101.deft
Potential missing line-break on line 171 in file ..\..\deft_corpus\data\deft_files\train\t2_history_2_0.deft
Extra sentence on line 262 in file ..\..\deft_corpus\data\deft_files\train\t2_history_2_0.deft
Malformed context window separator on line 7322 in file ..\..\deft_corpus\data\deft_files\train\t2_history_2_0.deft
Extra sentence on line 2959 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_0.deft
Extra sentence on line 3769 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_0.deft
Extra sentence on line 5945 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_0.deft
Extra sentence on line 643 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_101.deft
Potential missing line-break on line 1033 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_101.deft
Extra sentence on line 1566 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_101.deft
Malformed context window separator on line 1935 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_101.deft
Malformed context window separator on line 4028 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_101.deft
Extra sentence on line 119 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_202.deft
Extra sentence on line 546 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_0_202.deft
Extra sentence on line 1336 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_0.deft
Malformed context window separator on line 3600 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_0.deft
Extra sentence on line 5856 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_0.deft
Potential missing line-break on line 370 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Extra sentence on line 1755 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Suspiciously short sentence on line 1818 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Malformed context window separator on line 2210 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Suspiciously short sentence on line 2250 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Malformed context window separator on line 3674 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Extra sentence on line 4650 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Suspiciously short sentence on line 4756 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Extra sentence on line 4852 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Suspiciously short sentence on line 5335 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_101.deft
Extra sentence on line 529 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_202.deft
Extra sentence on line 999 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_1_202.deft
Extra sentence on line 1456 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_0.deft
Extra sentence on line 1540 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_0.deft
Malformed context window separator on line 2374 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_0.deft
Malformed context window separator on line 3849 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_0.deft
Extra sentence on line 835 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_101.deft
Malformed context window separator on line 4355 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_101.deft
Extra sentence on line 4759 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_101.deft
Extra sentence on line 740 in file ..\..\deft_corpus\data\deft_files\train\t3_physics_2_202.deft
Extra sentence on line 101 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_0.deft
Extra sentence on line 1153 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_0.deft
Potential missing line-break on line 440 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_101.deft
Extra sentence on line 507 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_101.deft
Potential missing line-break on line 645 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_202.deft
Extra sentence on line 702 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_202.deft
Potential missing line-break on line 345 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_303.deft
Extra sentence on line 451 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_0_303.deft
Extra sentence on line 254 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_0.deft
Potential missing line-break on line 789 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_0.deft
Suspiciously short sentence on line 2442 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_0.deft
Extra sentence on line 3449 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_0.deft
Suspiciously short sentence on line 1136 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_101.deft
Suspiciously short sentence on line 5649 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_101.deft
Malformed context window separator on line 446 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_1_303.deft
Extra sentence on line 2644 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_2_101.deft
Extra sentence on line 2669 in file ..\..\deft_corpus\data\deft_files\train\t4_psychology_2_202.deft
Potential missing line-break on line 56 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_0_101.deft
Malformed context window separator on line 3823 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_0_101.deft
Malformed context window separator on line 3 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_0_202.deft
Extra sentence on line 5649 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_0_202.deft
Suspiciously short sentence on line 1604 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_1_0.deft
Potential missing line-break on line 4976 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_1_101.deft
Extra sentence on line 2050 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_1_202.deft
Malformed context window separator on line 2263 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_1_202.deft
Extra sentence on line 2798 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_2_0.deft
Extra sentence on line 175 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_2_101.deft
Extra sentence on line 3742 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_2_101.deft
Extra sentence on line 343 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_2_202.deft
Extra sentence on line 1682 in file ..\..\deft_corpus\data\deft_files\train\t5_economic_2_202.deft
Potential missing line-break on line 983 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_0_0.deft
Extra sentence on line 1091 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_0_0.deft
Malformed context window separator on line 1299 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_0_0.deft
Malformed context window separator on line 2425 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_0_0.deft
Malformed context window separator on line 391 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_1_0.deft
Extra sentence on line 811 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_1_0.deft
Malformed context window separator on line 2300 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_1_0.deft
Extra sentence on line 110 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_1_101.deft
Suspiciously short sentence on line 5941 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_2_0.deft
Malformed context window separator on line 4347 in file ..\..\deft_corpus\data\deft_files\train\t6_sociology_2_101.deft
Potential missing line-break on line 470 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 534 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 1124 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Potential missing line-break on line 3057 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 3566 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 4029 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Potential missing line-break on line 4181 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Potential missing line-break on line 4775 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 5387 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Potential missing line-break on line 5473 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 6515 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 7021 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_0.deft
Extra sentence on line 957 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Suspiciously short sentence on line 955 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 1312 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 1735 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 1772 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 2127 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 2259 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 2430 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 2484 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 2522 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 2550 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 2691 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 3022 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 3390 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 3677 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 3761 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 3834 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 4235 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 4337 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Malformed context window separator on line 4389 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 4589 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 4661 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 4716 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 4780 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 4828 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Potential missing line-break on line 5216 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 5384 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 5481 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 5505 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 5544 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 5737 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 6321 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 6667 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 7738 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 7796 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_101.deft
Extra sentence on line 108 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 314 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Suspiciously short sentence on line 836 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 895 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 1176 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 1434 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 1868 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 1905 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 2825 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 3085 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 3128 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 3452 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 3656 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 3754 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 3785 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 4389 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 4586 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 4672 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 5418 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Suspiciously short sentence on line 5498 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 6113 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Potential missing line-break on line 6593 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 6731 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 7265 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 7566 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 7592 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_202.deft
Extra sentence on line 674 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 2627 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 3097 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 3641 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 3867 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 3956 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Potential missing line-break on line 4290 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_303.deft
Extra sentence on line 223 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_404.deft
Extra sentence on line 1546 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_404.deft
Extra sentence on line 1786 in file ..\..\deft_corpus\data\deft_files\train\t7_government_0_404.deft
Extra sentence on line 669 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 863 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Potential missing line-break on line 914 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Suspiciously short sentence on line 2193 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Potential missing line-break on line 2754 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Potential missing line-break on line 3177 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 3773 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Suspiciously short sentence on line 4868 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 5227 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Potential missing line-break on line 5318 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 5551 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 6378 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 6889 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 7652 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 7801 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 8062 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 8106 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_0.deft
Extra sentence on line 263 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 417 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Malformed context window separator on line 512 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 1404 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 1469 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 1525 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 1966 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 2649 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 3780 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 3805 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 4050 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 6265 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Potential missing line-break on line 6384 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 6474 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Potential missing line-break on line 6548 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Potential missing line-break on line 6891 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 7107 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 7602 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 7752 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 7778 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 7913 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 8082 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 8110 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 8160 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 9342 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_101.deft
Extra sentence on line 506 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1025 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1068 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1372 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1561 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1598 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 2510 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 2959 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Potential missing line-break on line 3204 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 3479 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 3800 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 4025 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 4249 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 4713 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Potential missing line-break on line 4772 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 4999 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 5027 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 6853 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 7425 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Potential missing line-break on line 7934 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 8147 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Potential missing line-break on line 8327 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 8732 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_202.deft
Extra sentence on line 1325 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 1637 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 2225 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 2701 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 3027 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 3036 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 3049 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 4627 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Potential missing line-break on line 5160 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 5489 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 5632 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 6227 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 7134 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Suspiciously short sentence on line 7170 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Extra sentence on line 7581 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_303.deft
Malformed context window separator on line 60 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Extra sentence on line 276 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Extra sentence on line 304 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Potential missing line-break on line 1292 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Potential missing line-break on line 1349 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Extra sentence on line 1697 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Potential missing line-break on line 1827 in file ..\..\deft_corpus\data\deft_files\train\t7_government_1_404.deft
Extra sentence on line 809 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Potential missing line-break on line 2091 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 3422 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 3729 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 3862 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 3901 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 4276 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 4642 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 4841 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 4987 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Potential missing line-break on line 6109 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 6333 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 6366 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 6391 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Potential missing line-break on line 6807 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 7020 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 8427 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 8860 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 8894 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_0.deft
Extra sentence on line 240 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 362 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 403 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 1247 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 2004 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 3496 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Potential missing line-break on line 3580 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 3898 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Potential missing line-break on line 4079 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Potential missing line-break on line 5286 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 5362 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 5870 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 5891 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 6158 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 6906 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 6927 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 7156 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_101.deft
Extra sentence on line 223 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 252 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 441 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 792 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 828 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1036 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1294 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 1365 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1472 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1492 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1978 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1981 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Suspiciously short sentence on line 1978 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 2942 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 3363 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 3718 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 4227 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 4449 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 5209 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 5272 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 5463 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 5967 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 6147 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 6207 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 6689 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 7132 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 7537 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Potential missing line-break on line 7621 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_202.deft
Extra sentence on line 1066 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 1255 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 1852 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Potential missing line-break on line 2299 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Potential missing line-break on line 2334 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Malformed context window separator on line 3512 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 3882 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 4087 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 4684 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Potential missing line-break on line 5191 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 5322 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Potential missing line-break on line 5496 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Potential missing line-break on line 6558 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 6790 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 7237 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_303.deft
Extra sentence on line 266 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_404.deft
Potential missing line-break on line 1504 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_404.deft
Extra sentence on line 1567 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_404.deft
Extra sentence on line 1806 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_404.deft
Extra sentence on line 2095 in file ..\..\deft_corpus\data\deft_files\train\t7_government_2_404.deft
Extra sentence on line 260 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_0.deft
Potential missing line-break on line 420 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_0.deft
Extra sentence on line 540 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_0.deft
Potential missing line-break on line 688 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_0.deft
Extra sentence on line 768 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_0.deft
Extra sentence on line 369 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_101.deft
Extra sentence on line 425 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_101.deft
Extra sentence on line 540 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_101.deft
Extra sentence on line 882 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_0_101.deft
Extra sentence on line 128 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_0.deft
Extra sentence on line 146 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_0.deft
Extra sentence on line 164 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_0.deft
Potential missing line-break on line 256 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_0.deft
Extra sentence on line 408 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_0.deft
Suspiciously short sentence on line 415 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_303.deft
Extra sentence on line 458 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_1_303.deft
Extra sentence on line 212 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_101.deft
Potential missing line-break on line 401 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_101.deft
Potential missing line-break on line 281 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_202.deft
Potential missing line-break on line 337 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_202.deft
Potential missing line-break on line 25 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_303.deft
Extra sentence on line 470 in file ..\..\deft_corpus\data\deft_files\dev\t7_government_2_303.deft
Hi
The conversion script task1_converter.py does not handle the end of any file in deft_files that doesn't end in a blank line (which is all of those in the train subdirectory). After going through all the lines, the code doesn't check whether there is still a final sentence left to concatenate.
This brings up another question: the corpus size in terms of sentences. I've not been able to match the figures in the paper against any of the sets of files in this repo, so I wanted to check how many sentences there should in fact be in total.
Thanks
Tony
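A minimal sketch of the missing end-of-file check, not the actual code of task1_converter.py. The function name `read_sentences` is hypothetical, and the sketch assumes that a blank line marks the end of a sentence.

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group token lines into sentences, assuming a blank line ends a sentence."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            sentences.append(current)
            current = []
    # Flush the trailing sentence: without this step, a file that does not
    # end in a blank line loses whatever was collected after the last break.
    if current:
        sentences.append(current)
    return sentences
```

The only point of the sketch is the final `if current:` check after the loop, which is the step the report says is absent.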
Should be validating labels against the set of all possible labels, not just ones in the dev set.
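One possible shape for such a check, as a rough sketch only. The helper name `unknown_labels` is hypothetical, and the label set below is just the relation names that appear in the reports on this page, not an official list.

```python
from typing import Iterable, List

# Illustrative label set (an assumption): fixed up front rather than
# collected from whichever labels happen to occur in the dev files.
ALLOWED_LABELS = {"Direct-Defines", "Supplements", "Qualifies", "AKA", "fragment"}


def unknown_labels(predicted: Iterable[str], allowed: Iterable[str] = ALLOWED_LABELS) -> List[str]:
    """Return the predicted labels that are not in the allowed set, sorted."""
    return sorted(set(predicted) - set(allowed))
```

With a fixed set, a label that is valid but absent from the dev files is still accepted, while a misspelled label is reported.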
Update: the examples were originally from old data. They are now taken from the current repository data.
train/t4_psychology_0_101.deft
Merkel data/source_txt/t4_psychology_jlee_101.txt 4569 4575 O -1 -1 0
’s data/source_txt/t4_psychology_jlee_101.txt 4575 4577 O -1 -1 0
disks data/source_txt/t4_psychology_jlee_101.txt 4578 4583 O -1 -1 0
respond data/source_txt/t4_psychology_jlee_101.txt 4584 4591 O -1 -1 0
to data/source_txt/t4_psychology_jlee_101.txt 4592 4594 O -1 -1 0
light data/source_txt/t4_psychology_jlee_101.txt 4595 4600 O -1 -1 0
pressure data/source_txt/t4_psychology_jlee_101.txt 4601 4609 O -1 -1 0
, data/source_txt/t4_psychology_jlee_101.txt 4609 4610 O -1 -1 0
while data/source_txt/t4_psychology_jlee_101.txt 4611 4616 O -1 -1 0
Ruffini data/source_txt/t4_psychology_jlee_101.txt 4617 4624 O -1 -1 0
corpuscles data/source_txt/t4_psychology_jlee_101.txt 4625 4635 O -1 -1 0
detect data/source_txt/t4_psychology_jlee_101.txt 4636 4642 O -1 -1 0
stretch data/source_txt/t4_psychology_jlee_101.txt 4643 4650 O -1 -1 0
( data/source_txt/t4_psychology_jlee_101.txt 4651 4652 O -1 -1 0
Abraira data/source_txt/t4_psychology_jlee_101.txt 4652 4659 O -1 -1 0
& data/source_txt/t4_psychology_jlee_101.txt 4660 4661 O -1 -1 0
Ginty data/source_txt/t4_psychology_jlee_101.txt 4662 4667 O -1 -1 0
, data/source_txt/t4_psychology_jlee_101.txt 4667 4668 O -1 -1 0
2013).There data/source_txt/t4_psychology_jlee_101.txt 4669 4680 O -1 -1 0
are data/source_txt/t4_psychology_jlee_101.txt 4681 4684 O -1 -1 0
many data/source_txt/t4_psychology_jlee_101.txt 4685 4689 O -1 -1 0
types data/source_txt/t4_psychology_jlee_101.txt 4690 4695 O -1 -1 0
of data/source_txt/t4_psychology_jlee_101.txt 4696 4698 O -1 -1 0
sensory data/source_txt/t4_psychology_jlee_101.txt 4699 4706 O -1 -1 0
receptors data/source_txt/t4_psychology_jlee_101.txt 4707 4716 O -1 -1 0
located data/source_txt/t4_psychology_jlee_101.txt 4717 4724 O -1 -1 0
in data/source_txt/t4_psychology_jlee_101.txt 4725 4727 O -1 -1 0
the data/source_txt/t4_psychology_jlee_101.txt 4728 4731 O -1 -1 0
skin data/source_txt/t4_psychology_jlee_101.txt 4732 4736 O -1 -1 0
, data/source_txt/t4_psychology_jlee_101.txt 4736 4737 O -1 -1 0
each data/source_txt/t4_psychology_jlee_101.txt 4738 4742 O -1 -1 0
attuned data/source_txt/t4_psychology_jlee_101.txt 4743 4750 O -1 -1 0
to data/source_txt/t4_psychology_jlee_101.txt 4751 4753 O -1 -1 0
specific data/source_txt/t4_psychology_jlee_101.txt 4754 4762 O -1 -1 0
touch data/source_txt/t4_psychology_jlee_101.txt 4763 4768 O -1 -1 0
- data/source_txt/t4_psychology_jlee_101.txt 4768 4769 O -1 -1 0
related data/source_txt/t4_psychology_jlee_101.txt 4769 4776 O -1 -1 0
stimuli data/source_txt/t4_psychology_jlee_101.txt 4777 4784 O -1 -1 0
. data/source_txt/t4_psychology_jlee_101.txt 4784 4785 O -1 -1 0
Lines 797-835. Error in line 815.
There appears to be some duplicate information, at least in data/deft_files, in three files which have "jlee" in their names rather than the 0, 1, 2 used in the rest. Are these files meant to be present, or have they slipped in by mistake? Could you clarify this?
Also, I noted that the same sentences can appear multiple times (even in the same group of three) within an individual file, which I assume has arisen from sampling the different "bold" terms and producing independent instances. Were these annotated in BRAT in separate documents or within the same one? I'm just looking at the potential reuse of annotation IDs (Txx etc.) which may occur.
Thanks
Tony
Update: the examples were originally from old data. They are now taken from the current repository data.
train/t6_sociology_1_101.deft
Madame data/source_txt/t6_sociology_mkaplan_101.txt 15507 15513 O -1 -1 0
Jeanne data/source_txt/t6_sociology_mkaplan_101.txt 15514 15520 O -1 -1 0
Calment data/source_txt/t6_sociology_mkaplan_101.txt 15521 15528 O -1 -1 0
of data/source_txt/t6_sociology_mkaplan_101.txt 15529 15531 O -1 -1 0
France data/source_txt/t6_sociology_mkaplan_101.txt 15532 15538 O -1 -1 0
was data/source_txt/t6_sociology_mkaplan_101.txt 15539 15542 O -1 -1 0
the data/source_txt/t6_sociology_mkaplan_101.txt 15543 15546 O -1 -1 0
world data/source_txt/t6_sociology_mkaplan_101.txt 15547 15552 O -1 -1 0
's data/source_txt/t6_sociology_mkaplan_101.txt 15552 15554 O -1 -1 0
oldest data/source_txt/t6_sociology_mkaplan_101.txt 15555 15561 O -1 -1 0
living data/source_txt/t6_sociology_mkaplan_101.txt 15563 15569 O -1 -1 0
person data/source_txt/t6_sociology_mkaplan_101.txt 15570 15576 O -1 -1 0
until data/source_txt/t6_sociology_mkaplan_101.txt 15577 15582 O -1 -1 0
she data/source_txt/t6_sociology_mkaplan_101.txt 15583 15586 O -1 -1 0
died data/source_txt/t6_sociology_mkaplan_101.txt 15587 15591 O -1 -1 0
at data/source_txt/t6_sociology_mkaplan_101.txt 15592 15594 O -1 -1 0
122 data/source_txt/t6_sociology_mkaplan_101.txt 15595 15598 O -1 -1 0
years data/source_txt/t6_sociology_mkaplan_101.txt 15599 15604 O -1 -1 0
old data/source_txt/t6_sociology_mkaplan_101.txt 15605 15608 O -1 -1 0
; data/source_txt/t6_sociology_mkaplan_101.txt 15608 15609 O -1 -1 0
there data/source_txt/t6_sociology_mkaplan_101.txt 15610 15615 O -1 -1 0
are data/source_txt/t6_sociology_mkaplan_101.txt 15616 15619 O -1 -1 0
currently data/source_txt/t6_sociology_mkaplan_101.txt 15620 15629 O -1 -1 0
six data/source_txt/t6_sociology_mkaplan_101.txt 15630 15633 O -1 -1 0
women data/source_txt/t6_sociology_mkaplan_101.txt 15634 15639 O -1 -1 0
in data/source_txt/t6_sociology_mkaplan_101.txt 15640 15642 O -1 -1 0
the data/source_txt/t6_sociology_mkaplan_101.txt 15643 15646 O -1 -1 0
world data/source_txt/t6_sociology_mkaplan_101.txt 15647 15652 O -1 -1 0
whose data/source_txt/t6_sociology_mkaplan_101.txt 15653 15658 O -1 -1 0
ages data/source_txt/t6_sociology_mkaplan_101.txt 15659 15663 O -1 -1 0
are data/source_txt/t6_sociology_mkaplan_101.txt 15664 15667 O -1 -1 0
well data/source_txt/t6_sociology_mkaplan_101.txt 15668 15672 O -1 -1 0
documented data/source_txt/t6_sociology_mkaplan_101.txt 15673 15683 O -1 -1 0
as data/source_txt/t6_sociology_mkaplan_101.txt 15684 15686 O -1 -1 0
115 data/source_txt/t6_sociology_mkaplan_101.txt 15687 15690 O -1 -1 0
years data/source_txt/t6_sociology_mkaplan_101.txt 15691 15696 O -1 -1 0
or data/source_txt/t6_sociology_mkaplan_101.txt 15697 15699 O -1 -1 0
older data/source_txt/t6_sociology_mkaplan_101.txt 15700 15705 O -1 -1 0
( data/source_txt/t6_sociology_mkaplan_101.txt 15706 15707 O -1 -1 0
Diebel data/source_txt/t6_sociology_mkaplan_101.txt 15707 15713 O -1 -1 0
2014) data/source_txt/t6_sociology_mkaplan_101.txt 15714 15719 O -1 -1 0
. data/source_txt/t6_sociology_mkaplan_101.txt 15719 15720 O -1 -1 0
Lines 2489-2530. Error in line 2529.
E.g. the first line of https://github.com/adobe-research/deft_corpus/blob/master/data/deft_files/dev/t1_biology_0_0.deft#L1:
2 /Users/sspala/dev/definition_extraction/textbook_sentences/adjudication_files_082219_FINAL/ksun/biology/t1_biology_jlee_0.txt 0 1 O -1 -1 0
-> it refers to the file t1_biology_jlee_0.txt
However, on https://github.com/adobe-research/deft_corpus/tree/master/data/source_txt/dev the file names don't contain the annotator names.
There are several annotated relations that link to a missing head entity id. I have fixed some in the dev set (#19), but there are a lot more examples in the training set.
This is an example from "data/deft_files/train/t1_biology_2_0.deft" (missing T220):
1802 99 data/source_txt/train/t1_biology_2_0.txt 12106 12108 O -1 -1 0
1803 . data/source_txt/train/t1_biology_2_0.txt 12108 12109 O -1 -1 0
1804 litmus data/source_txt/train/t1_biology_2_0.txt 12119 12125 B-Alias-Term T219 T220 AKA
1805 or data/source_txt/train/t1_biology_2_0.txt 12126 12128 O -1 -1 0
1806 pH data/source_txt/train/t1_biology_2_0.txt 12129 12131 O -1 -1 0
1807 paper data/source_txt/train/t1_biology_2_0.txt 12132 12137 B-Alias-Term-frag T219-frag T219 fragment
1808 , data/source_txt/train/t1_biology_2_0.txt 12137 12138 O -1 -1 0
1809 filter data/source_txt/train/t1_biology_2_0.txt 12139 12145 B-Definition T221 T220 Direct-Defines
1810 paper data/source_txt/train/t1_biology_2_0.txt 12146 12151 I-Definition T221 T220 Direct-Defines
1811 that data/source_txt/train/t1_biology_2_0.txt 12152 12156 I-Definition T221 T220 Direct-Defines
1812 has data/source_txt/train/t1_biology_2_0.txt 12157 12160 I-Definition T221 T220 Direct-Defines
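A rough sketch of how such dangling heads could be listed automatically. The function name `find_dangling_heads` is hypothetical. The sketch assumes that the last three whitespace-separated fields of each row are the tag id, the head (root) id and the relation, as in the rows quoted above, that `-1` and `0` mean "no id", and that ids are unique within one file.

```python
from typing import Iterable, List, Set, Tuple

# Placeholder values that mean "no tag" / "no head" in the quoted rows.
NO_ID = {"-1", "0"}


def find_dangling_heads(rows: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (tag_id, root_id) pairs whose root_id never occurs as a tag_id."""
    parsed: List[Tuple[str, str]] = []
    for row in rows:
        fields = row.split()
        if len(fields) < 3:
            continue  # blank line or separator
        parsed.append((fields[-3], fields[-2]))
    defined: Set[str] = {tag_id for tag_id, _ in parsed if tag_id not in NO_ID}
    dangling: List[Tuple[str, str]] = []
    for pair in parsed:
        _, root_id = pair
        if root_id not in NO_ID and root_id not in defined and pair not in dangling:
            dangling.append(pair)
    return dangling
```

Run on the rows above, a check like this would flag T219 and T221, since both point at T220 and no row carries T220 as its own tag id.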
There are some tokenization errors in your data, and these cause label errors. For example:
requickened”—assigned data/source_txt/train/t2_history_2_0.txt 4149 4170 Term T19 0 Direct-Defines
(
money”—an data/source_txt/train/t2_history_1_101.txt 766 775 Term T3 0 Direct-Defines
(
law”—is data/source_txt/train/t6_sociology_2_0.txt 18785 18792 Qualifier T210 T211 Supplements
(
Update: the examples were originally from old data. They are now taken from the current repository data.
train/t1_biology_2_505.deft
As data/source_txt/t1_biology_rlacroix_505.txt 23587 23589 O -1 -1 0
illustrated data/source_txt/t1_biology_rlacroix_505.txt 23590 23601 O -1 -1 0
in data/source_txt/t1_biology_rlacroix_505.txt 23602 23604 O -1 -1 0
[ data/source_txt/t1_biology_rlacroix_505.txt 23605 23606 O -1 -1 0
link]a data/source_txt/t1_biology_rlacroix_505.txt 23606 23612 O -1 -1 0
Fish data/source_txt/t1_biology_rlacroix_505.txt 23613 23617 O -1 -1 0
have data/source_txt/t1_biology_rlacroix_505.txt 23618 23622 O -1 -1 0
a data/source_txt/t1_biology_rlacroix_505.txt 23623 23624 O -1 -1 0
single data/source_txt/t1_biology_rlacroix_505.txt 23625 23631 O -1 -1 0
circuit data/source_txt/t1_biology_rlacroix_505.txt 23632 23639 O -1 -1 0
for data/source_txt/t1_biology_rlacroix_505.txt 23640 23643 O -1 -1 0
blood data/source_txt/t1_biology_rlacroix_505.txt 23644 23649 O -1 -1 0
flow data/source_txt/t1_biology_rlacroix_505.txt 23650 23654 O -1 -1 0
and data/source_txt/t1_biology_rlacroix_505.txt 23655 23658 O -1 -1 0
a data/source_txt/t1_biology_rlacroix_505.txt 23659 23660 O -1 -1 0
two data/source_txt/t1_biology_rlacroix_505.txt 23661 23664 O -1 -1 0
- data/source_txt/t1_biology_rlacroix_505.txt 23664 23665 O -1 -1 0
chambered data/source_txt/t1_biology_rlacroix_505.txt 23665 23674 O -1 -1 0
heart data/source_txt/t1_biology_rlacroix_505.txt 23675 23680 O -1 -1 0
that data/source_txt/t1_biology_rlacroix_505.txt 23681 23685 O -1 -1 0
has data/source_txt/t1_biology_rlacroix_505.txt 23686 23689 O -1 -1 0
only data/source_txt/t1_biology_rlacroix_505.txt 23690 23694 O -1 -1 0
a data/source_txt/t1_biology_rlacroix_505.txt 23695 23696 O -1 -1 0
single data/source_txt/t1_biology_rlacroix_505.txt 23697 23703 O -1 -1 0
atrium data/source_txt/t1_biology_rlacroix_505.txt 23704 23710 O -1 -1 0
and data/source_txt/t1_biology_rlacroix_505.txt 23711 23714 O -1 -1 0
a data/source_txt/t1_biology_rlacroix_505.txt 23715 23716 O -1 -1 0
single data/source_txt/t1_biology_rlacroix_505.txt 23717 23723 O -1 -1 0
ventricle data/source_txt/t1_biology_rlacroix_505.txt 23724 23733 O -1 -1 0
. data/source_txt/t1_biology_rlacroix_505.txt 23733 23734 O -1 -1 0
Lines 4248-4277. Error in line 4252
The csv parsers are missing a configuration for "quote_char" and "quoting". This results in incorrect parses of some examples.
One example is the character at offset 773 near the beginning of the file "data/deft_files/dev/t3_physics_2_101.deft":
2951 data/source_txt/t3_physics_2_101.deft 759 763 O -1 -1 0
. data/source_txt/t3_physics_2_101.deft 763 764 O -1 -1 0
3 data/source_txt/t3_physics_2_101.deft 765 766 O -1 -1 0
times data/source_txt/t3_physics_2_101.deft 767 772 O -1 -1 0
" data/source_txt/t3_physics_2_101.deft 773 774 O -1 -1 0
10 data/source_txt/t3_physics_2_101.deft 774 776 O -1 -1 0
" data/source_txt/t3_physics_2_101.deft 776 777 O -1 -1 0
rSup data/source_txt/t3_physics_2_101.deft 778 782 O -1 -1 0
{ data/source_txt/t3_physics_2_101.deft 783 784 O -1 -1 0
size data/source_txt/t3_physics_2_101.deft 785 789 O -1 -1 0
...
This results in some of the failed assertions reported in the forums.
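A small illustration of the effect, using Python's standard `csv` module as a stand-in for whichever parser the scripts actually use. The helper name `read_deft_rows` and the tab delimiter are assumptions.

```python
import csv
import io
from typing import List, TextIO


def read_deft_rows(handle: TextIO) -> List[List[str]]:
    """Read tab-separated rows without giving '"' any special meaning."""
    # QUOTE_NONE makes the reader treat a double quote as an ordinary
    # character, so a token that is a bare '"' cannot open a quoted field
    # that swallows the following lines.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Three rows modelled on the quoted example; the file name "f" is a dummy.
sample = (
    '"\tf\t773\t774\tO\t-1\t-1\t0\n'
    "10\tf\t774\t776\tO\t-1\t-1\t0\n"
    '"\tf\t776\t777\tO\t-1\t-1\t0\n'
)
rows = read_deft_rows(io.StringIO(sample))
```

With the default quoting settings, the first `"` token is read as the start of a quoted field and the rows up to the next `"` are merged into it, which would explain row counts that no longer match.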
It would be nice to add a readme file in /evaluation/program. There is a readme in evaluation/old/README.md, but since it is in an old folder it is unclear whether it is still up to date.
This is the first report on 47 tokenization problems found (in the train data only).
train/t1_biology_1_606.deft
When data/source_txt/train/t1_biology_1_606.txt 21268 21272 O -1 -1 0
the data/source_txt/train/t1_biology_1_606.txt 21273 21276 O -1 -1 0
population data/source_txt/train/t1_biology_1_606.txt 21277 21287 O -1 -1 0
size data/source_txt/train/t1_biology_1_606.txt 21288 21292 O -1 -1 0
, data/source_txt/train/t1_biology_1_606.txt 21292 21293 O -1 -1 0
N data/source_txt/train/t1_biology_1_606.txt 21294 21295 O -1 -1 0
, data/source_txt/train/t1_biology_1_606.txt 21295 21296 O -1 -1 0
is data/source_txt/train/t1_biology_1_606.txt 21297 21299 O -1 -1 0
plotted data/source_txt/train/t1_biology_1_606.txt 21300 21307 O -1 -1 0
over data/source_txt/train/t1_biology_1_606.txt 21308 21312 O -1 -1 0
time data/source_txt/train/t1_biology_1_606.txt 21313 21317 O -1 -1 0
, data/source_txt/train/t1_biology_1_606.txt 21317 21318 O -1 -1 0
a data/source_txt/train/t1_biology_1_606.txt 21319 21320 O -1 -1 0
J data/source_txt/train/t1_biology_1_606.txt 21321 21322 O -1 -1 0
- data/source_txt/train/t1_biology_1_606.txt 21322 21323 O -1 -1 0
shaped data/source_txt/train/t1_biology_1_606.txt 21323 21329 O -1 -1 0
growth data/source_txt/train/t1_biology_1_606.txt 21330 21336 O -1 -1 0
curve data/source_txt/train/t1_biology_1_606.txt 21337 21342 O -1 -1 0
is data/source_txt/train/t1_biology_1_606.txt 21343 21345 O -1 -1 0
produced data/source_txt/train/t1_biology_1_606.txt 21346 21354 O -1 -1 0
( data/source_txt/train/t1_biology_1_606.txt 21355 21356 O -1 -1 0
[ data/source_txt/train/t1_biology_1_606.txt 21356 21357 O -1 -1 0
link]).The data/source_txt/train/t1_biology_1_606.txt 21357 21367 O -1 -1 0
bacteria data/source_txt/train/t1_biology_1_606.txt 21368 21376 O -1 -1 0
example data/source_txt/train/t1_biology_1_606.txt 21377 21384 O -1 -1 0
is data/source_txt/train/t1_biology_1_606.txt 21385 21387 O -1 -1 0
not data/source_txt/train/t1_biology_1_606.txt 21388 21391 O -1 -1 0
representative data/source_txt/train/t1_biology_1_606.txt 21392 21406 O -1 -1 0
of data/source_txt/train/t1_biology_1_606.txt 21407 21409 O -1 -1 0
the data/source_txt/train/t1_biology_1_606.txt 21410 21413 O -1 -1 0
real data/source_txt/train/t1_biology_1_606.txt 21414 21418 O -1 -1 0
world data/source_txt/train/t1_biology_1_606.txt 21419 21424 O -1 -1 0
where data/source_txt/train/t1_biology_1_606.txt 21425 21430 O -1 -1 0
resources data/source_txt/train/t1_biology_1_606.txt 21431 21440 O -1 -1 0
are data/source_txt/train/t1_biology_1_606.txt 21441 21444 O -1 -1 0
limited data/source_txt/train/t1_biology_1_606.txt 21445 21452 O -1 -1 0
. data/source_txt/train/t1_biology_1_606.txt 21452 21453 O -1 -1 0
Lines 3522-3558, error in 3544
This mistake mixes two sentences
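Tokens of this kind can be searched for with a simple heuristic. The sketch below is an assumption about what counts as suspicious, not the method used to produce this report. The names `FUSED` and `find_fused_tokens` are hypothetical, and the token is assumed to be the first whitespace-separated field of each row.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic: a period immediately followed by a capitalised word inside a
# single token, as in "link]).The".
FUSED = re.compile(r"\.[A-Z][a-z]")


def find_fused_tokens(rows: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based row number, token) for tokens that look like two
    sentences glued together."""
    hits: List[Tuple[int, str]] = []
    for number, row in enumerate(rows, start=1):
        fields = row.split()
        if fields and FUSED.search(fields[0]):
            hits.append((number, fields[0]))
    return hits
```

The pattern only looks at the token column, so periods in the file path are ignored. It is a heuristic and can still miss cases or flag legitimate tokens, so hits need manual review.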
Update: the examples were originally from old data. They are now taken from the current repository data.
train/t1_biology_1_101.deft
Sturtevant data/source_txt/t1_biology_mkaplan_101.txt 20151 20161 O -1 -1 0
divided data/source_txt/t1_biology_mkaplan_101.txt 20162 20169 O -1 -1 0
his data/source_txt/t1_biology_mkaplan_101.txt 20170 20173 O -1 -1 0
genetic data/source_txt/t1_biology_mkaplan_101.txt 20174 20181 O -1 -1 0
map data/source_txt/t1_biology_mkaplan_101.txt 20182 20185 O -1 -1 0
into data/source_txt/t1_biology_mkaplan_101.txt 20186 20190 O -1 -1 0
map data/source_txt/t1_biology_mkaplan_101.txt 20191 20194 B-Qualifier T151 T150 Supplements
units data/source_txt/t1_biology_mkaplan_101.txt 20195 20200 I-Qualifier T151 T150 Supplements
, data/source_txt/t1_biology_mkaplan_101.txt 20200 20201 O -1 -1 0
or data/source_txt/t1_biology_mkaplan_101.txt 20202 20204 O -1 -1 0
centimorgans data/source_txt/t1_biology_mkaplan_101.txt 20205 20217 B-Alias-Term T148 T149 AKA
( data/source_txt/t1_biology_mkaplan_101.txt 20218 20219 O -1 -1 0
cM data/source_txt/t1_biology_mkaplan_101.txt 20219 20221 B-Term T149 0 AKA
) data/source_txt/t1_biology_mkaplan_101.txt 20221 20222 O -1 -1 0
, data/source_txt/t1_biology_mkaplan_101.txt 20222 20223 O -1 -1 0
in data/source_txt/t1_biology_mkaplan_101.txt 20224 20226 O -1 -1 0
which data/source_txt/t1_biology_mkaplan_101.txt 20227 20232 O -1 -1 0
a data/source_txt/t1_biology_mkaplan_101.txt 20233 20234 B-Definition T150 T149 Direct-Defines
recombination data/source_txt/t1_biology_mkaplan_101.txt 20235 20248 I-Definition T150 T149 Direct-Defines
frequency data/source_txt/t1_biology_mkaplan_101.txt 20249 20258 I-Definition T150 T149 Direct-Defines
of data/source_txt/t1_biology_mkaplan_101.txt 20259 20261 I-Definition T150 T149 Direct-Defines
0.01 data/source_txt/t1_biology_mkaplan_101.txt 20262 20266 I-Definition T150 T149 Direct-Defines
corresponds data/source_txt/t1_biology_mkaplan_101.txt 20267 20278 I-Definition T150 T149 Direct-Defines
to data/source_txt/t1_biology_mkaplan_101.txt 20279 20281 I-Definition T150 T149 Direct-Defines
1 data/source_txt/t1_biology_mkaplan_101.txt 20282 20283 I-Definition T150 T149 Direct-Defines
cM.By data/source_txt/t1_biology_mkaplan_101.txt 20284 20289 Definition T150 T149 Direct-Defines
representing data/source_txt/t1_biology_mkaplan_101.txt 20290 20302 O -1 -1 0
alleles data/source_txt/t1_biology_mkaplan_101.txt 20303 20310 O -1 -1 0
in data/source_txt/t1_biology_mkaplan_101.txt 20311 20313 O -1 -1 0
a data/source_txt/t1_biology_mkaplan_101.txt 20314 20315 O -1 -1 0
linear data/source_txt/t1_biology_mkaplan_101.txt 20316 20322 O -1 -1 0
map data/source_txt/t1_biology_mkaplan_101.txt 20323 20326 O -1 -1 0
, data/source_txt/t1_biology_mkaplan_101.txt 20326 20327 O -1 -1 0
Sturtevant data/source_txt/t1_biology_mkaplan_101.txt 20328 20338 O -1 -1 0
suggested data/source_txt/t1_biology_mkaplan_101.txt 20339 20348 O -1 -1 0
that data/source_txt/t1_biology_mkaplan_101.txt 20349 20353 O -1 -1 0
genes data/source_txt/t1_biology_mkaplan_101.txt 20354 20359 O -1 -1 0
can data/source_txt/t1_biology_mkaplan_101.txt 20360 20363 O -1 -1 0
range data/source_txt/t1_biology_mkaplan_101.txt 20364 20369 O -1 -1 0
from data/source_txt/t1_biology_mkaplan_101.txt 20370 20374 O -1 -1 0
being data/source_txt/t1_biology_mkaplan_101.txt 20375 20380 O -1 -1 0
perfectly data/source_txt/t1_biology_mkaplan_101.txt 20381 20390 O -1 -1 0
linked data/source_txt/t1_biology_mkaplan_101.txt 20391 20397 O -1 -1 0
( data/source_txt/t1_biology_mkaplan_101.txt 20398 20399 O -1 -1 0
recombination data/source_txt/t1_biology_mkaplan_101.txt 20399 20412 O -1 -1 0
frequency data/source_txt/t1_biology_mkaplan_101.txt 20413 20422 O -1 -1 0
= data/source_txt/t1_biology_mkaplan_101.txt 20423 20424 O -1 -1 0
0 data/source_txt/t1_biology_mkaplan_101.txt 20425 20426 O -1 -1 0
) data/source_txt/t1_biology_mkaplan_101.txt 20426 20427 O -1 -1 0
to data/source_txt/t1_biology_mkaplan_101.txt 20428 20430 O -1 -1 0
being data/source_txt/t1_biology_mkaplan_101.txt 20431 20436 O -1 -1 0
perfectly data/source_txt/t1_biology_mkaplan_101.txt 20437 20446 O -1 -1 0
unlinked data/source_txt/t1_biology_mkaplan_101.txt 20447 20455 O -1 -1 0
( data/source_txt/t1_biology_mkaplan_101.txt 20456 20457 O -1 -1 0
recombination data/source_txt/t1_biology_mkaplan_101.txt 20457 20470 O -1 -1 0
frequency data/source_txt/t1_biology_mkaplan_101.txt 20471 20480 O -1 -1 0
= data/source_txt/t1_biology_mkaplan_101.txt 20481 20482 O -1 -1 0
0.5 data/source_txt/t1_biology_mkaplan_101.txt 20483 20486 O -1 -1 0
) data/source_txt/t1_biology_mkaplan_101.txt 20486 20487 O -1 -1 0
when data/source_txt/t1_biology_mkaplan_101.txt 20488 20492 O -1 -1 0
genes data/source_txt/t1_biology_mkaplan_101.txt 20493 20498 O -1 -1 0
are data/source_txt/t1_biology_mkaplan_101.txt 20499 20502 O -1 -1 0
on data/source_txt/t1_biology_mkaplan_101.txt 20503 20505 O -1 -1 0
different data/source_txt/t1_biology_mkaplan_101.txt 20506 20515 O -1 -1 0
chromosomes data/source_txt/t1_biology_mkaplan_101.txt 20516 20527 O -1 -1 0
or data/source_txt/t1_biology_mkaplan_101.txt 20528 20530 O -1 -1 0
genes data/source_txt/t1_biology_mkaplan_101.txt 20531 20536 O -1 -1 0
are data/source_txt/t1_biology_mkaplan_101.txt 20537 20540 O -1 -1 0
separated data/source_txt/t1_biology_mkaplan_101.txt 20541 20550 O -1 -1 0
very data/source_txt/t1_biology_mkaplan_101.txt 20551 20555 O -1 -1 0
far data/source_txt/t1_biology_mkaplan_101.txt 20556 20559 O -1 -1 0
apart data/source_txt/t1_biology_mkaplan_101.txt 20560 20565 O -1 -1 0
on data/source_txt/t1_biology_mkaplan_101.txt 20566 20568 O -1 -1 0
the data/source_txt/t1_biology_mkaplan_101.txt 20569 20572 O -1 -1 0
same data/source_txt/t1_biology_mkaplan_101.txt 20573 20577 O -1 -1 0
chromosome data/source_txt/t1_biology_mkaplan_101.txt 20578 20588 O -1 -1 0
. data/source_txt/t1_biology_mkaplan_101.txt 20588 20589 O -1 -1 0
Lines 3764-3840. Error in line 3789
Update: the examples above were from old data. The example below is from the current repository data:
train/t4_psychology_2_202.deft
Behaviorists data/source_txt/t4_psychology_rlacroix_202.txt 32067 32079 O -1 -1 0
such data/source_txt/t4_psychology_rlacroix_202.txt 32080 32084 O -1 -1 0
as data/source_txt/t4_psychology_rlacroix_202.txt 32085 32087 O -1 -1 0
Joseph data/source_txt/t4_psychology_rlacroix_202.txt 32088 32094 O -1 -1 0
Wolpe data/source_txt/t4_psychology_rlacroix_202.txt 32095 32100 O -1 -1 0
also data/source_txt/t4_psychology_rlacroix_202.txt 32101 32105 O -1 -1 0
influenced data/source_txt/t4_psychology_rlacroix_202.txt 32106 32116 O -1 -1 0
Ellis data/source_txt/t4_psychology_rlacroix_202.txt 32117 32122 O -1 -1 0
’s data/source_txt/t4_psychology_rlacroix_202.txt 32122 32124 O -1 -1 0
therapeutic data/source_txt/t4_psychology_rlacroix_202.txt 32125 32136 O -1 -1 0
approach data/source_txt/t4_psychology_rlacroix_202.txt 32137 32145 O -1 -1 0
( data/source_txt/t4_psychology_rlacroix_202.txt 32146 32147 O -1 -1 0
National data/source_txt/t4_psychology_rlacroix_202.txt 32147 32155 O -1 -1 0
Association data/source_txt/t4_psychology_rlacroix_202.txt 32156 32167 O -1 -1 0
of data/source_txt/t4_psychology_rlacroix_202.txt 32168 32170 O -1 -1 0
Cognitive data/source_txt/t4_psychology_rlacroix_202.txt 32171 32180 O -1 -1 0
- data/source_txt/t4_psychology_rlacroix_202.txt 32180 32181 O -1 -1 0
Behavioral data/source_txt/t4_psychology_rlacroix_202.txt 32181 32191 O -1 -1 0
Therapists data/source_txt/t4_psychology_rlacroix_202.txt 32192 32202 O -1 -1 0
, data/source_txt/t4_psychology_rlacroix_202.txt 32202 32203 O -1 -1 0
2009).Cognitive data/source_txt/t4_psychology_rlacroix_202.txt 32204 32219 B-Term T161 0 AKA
- data/source_txt/t4_psychology_rlacroix_202.txt 32219 32220 I-Term T161 0 AKA
behavioral data/source_txt/t4_psychology_rlacroix_202.txt 32220 32230 I-Term T161 0 AKA
therapy data/source_txt/t4_psychology_rlacroix_202.txt 32231 32238 I-Term T161 0 AKA
( data/source_txt/t4_psychology_rlacroix_202.txt 32239 32240 O -1 -1 0
CBT data/source_txt/t4_psychology_rlacroix_202.txt 32240 32243 B-Alias-Term T160 T161 AKA
) data/source_txt/t4_psychology_rlacroix_202.txt 32243 32244 O -1 -1 0
helps data/source_txt/t4_psychology_rlacroix_202.txt 32245 32250 B-Definition T159 T161 Direct-Defines
clients data/source_txt/t4_psychology_rlacroix_202.txt 32251 32258 I-Definition T159 T161 Direct-Defines
examine data/source_txt/t4_psychology_rlacroix_202.txt 32259 32266 I-Definition T159 T161 Direct-Defines
how data/source_txt/t4_psychology_rlacroix_202.txt 32267 32270 I-Definition T159 T161 Direct-Defines
their data/source_txt/t4_psychology_rlacroix_202.txt 32271 32276 I-Definition T159 T161 Direct-Defines
thoughts data/source_txt/t4_psychology_rlacroix_202.txt 32277 32285 I-Definition T159 T161 Direct-Defines
affect data/source_txt/t4_psychology_rlacroix_202.txt 32286 32292 I-Definition T159 T161 Direct-Defines
their data/source_txt/t4_psychology_rlacroix_202.txt 32293 32298 I-Definition T159 T161 Direct-Defines
behavior data/source_txt/t4_psychology_rlacroix_202.txt 32299 32307 I-Definition T159 T161 Direct-Defines
. data/source_txt/t4_psychology_rlacroix_202.txt 32307 32308 O -1 -1 0
Lines 5568-5604. Error in line 5588.
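A rough way to look for more cases like the fused token `2009).Cognitive` above is sketched below. It assumes the `.deft` rows are whitespace-separated with the token in the first column, as in the excerpts pasted here. The function name `find_fused_tokens` and the regular expression are illustrative only and are not part of the corpus scripts; the pattern is a heuristic and may miss some errors or flag legitimate tokens.

```python
import re

# Heuristic (an assumption, not the corpus tokenizer's rule): a letter or
# digit, an optional closing bracket, sentence-final punctuation, and then
# a capitalised word, all inside a single token.
FUSED = re.compile(r"[A-Za-z0-9][)\]]?[.!?][A-Z][a-z]")

def find_fused_tokens(lines):
    """Return (line_number, token) pairs for tokens that look like two
    sentences glued together, e.g. '2009).Cognitive'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines separate sentences in .deft files
        token = fields[0]
        if FUSED.search(token):
            hits.append((number, token))
    return hits
```

Running it over every file under `deft_files` and reviewing the hits by hand would give a candidate list to report, rather than a definitive list of errors.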
Hi
It appears there are a tiny number of tags in the files missing the appropriate BIO prefix:
Definition | 15
Term | 10
Referential-Definition | 2
Alias-Term | 1
Secondary-Definition | 1
Could you confirm whether these tags should carry the missing prefix or whether they signify something else? Thanks.
Tony
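For anyone who wants to reproduce counts like these, here is a minimal sketch. It assumes whitespace-separated rows with the tag in the fifth column, as in the excerpts pasted in this thread, and treats `O` plus anything starting with `B-` or `I-` as well-formed. The name `count_unprefixed_tags` is hypothetical and not part of the repository.

```python
from collections import Counter

def count_unprefixed_tags(lines):
    """Count tags that are neither 'O' nor prefixed with 'B-' / 'I-'.

    Assumes the tag is the fifth whitespace-separated field of each row.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed row
        tag = fields[4]
        if tag != "O" and not tag.startswith(("B-", "I-")):
            counts[tag] += 1
    return counts
```

Feeding it the lines of each `.deft` file and summing the resulting counters gives a per-tag total that can be compared against the numbers reported above.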