erroneous week calculation based on dctWeek/getxnextweek

there's something going wrong with the week values generated in 
HeidelTime.specifyAmbiguousValues() likely originating from 
system(/locale)-specific behavior within DateCalculator.getXNextWeek().
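
To illustrate how locale-dependent week rules can shift a week number, here is a minimal, self-contained sketch using only java.util.Calendar. It is not HeidelTime code: the class WeekOfYearDemo and its methods are hypothetical, and whether DateCalculator.getXNextWeek() is affected in exactly this way is an assumption.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class WeekOfYearDemo {
    /** Week of year for a date, using whatever week rules come with the given locale. */
    static int weekOfYear(Locale locale, int year, int month, int day) {
        Calendar cal = new GregorianCalendar(locale);
        cal.clear();
        cal.set(year, month, day);
        return cal.get(Calendar.WEEK_OF_YEAR);
    }

    /** Week of year with explicitly pinned week rules, independent of the system locale. */
    static int weekOfYear(int firstDayOfWeek, int minimalDaysInFirstWeek,
                          int year, int month, int day) {
        Calendar cal = new GregorianCalendar(Locale.ROOT);
        cal.setFirstDayOfWeek(firstDayOfWeek);
        cal.setMinimalDaysInFirstWeek(minimalDaysInFirstWeek);
        cal.clear();
        cal.set(year, month, day);
        return cal.get(Calendar.WEEK_OF_YEAR);
    }

    public static void main(String[] args) {
        // 1 January 2012 was a Sunday.
        // Locale-driven results depend on the locale data shipped with the JVM.
        System.out.println(weekOfYear(Locale.US, 2012, Calendar.JANUARY, 1));
        System.out.println(weekOfYear(Locale.GERMANY, 2012, Calendar.JANUARY, 1));
        // ISO 8601 rules (Monday first, at least 4 days in week 1): week 52 of 2011.
        System.out.println(weekOfYear(Calendar.MONDAY, 4, 2012, Calendar.JANUARY, 1));
        // Sunday first, 1 day is enough for week 1: week 1 of 2012.
        System.out.println(weekOfYear(Calendar.SUNDAY, 1, 2012, Calendar.JANUARY, 1));
    }
}
```

Pinning the first day of the week and the minimal days in the first week, as in the second method, is one way to make such a calculation reproducible across systems.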

this results in week numbers that are decremented by one compared to their gold 
standard values. likely in relation to the relative value calculation based on 

Original issue reported on by [email protected] on 23 May 2012 at 3:57

Changing the license to Apache

Due to some requests, we are currently discussing changing HeidelTime's 
license from GPL to Apache. Please participate in the discussion and tell us 
your thoughts about that issue:
1. GNU:
2. Apache:

Original issue reported on by [email protected] on 24 May 2012 at 7:31

Sharing resources for Russian

I am a student of computational linguistics, and I am writing my master's 
thesis on temporal expressions in Russian. In my work, I use HeidelTime, and I 
would like to make my Russian resources publicly available. Do you think I 
could make a commit to the main HeidelTime development trunk?

My changes mainly consist of creating new resources for Russian, and I have 
also added a function to calculate some Russian holidays.

Original issue reported on by [email protected] on 4 Apr 2014 at 9:24

German compounds consisting of weekday + time of day not extracted


Running HeidelTime on news texts, I encountered a type of temporal expression 
that is currently not recognized: according to the German spelling reform, 
combinations of weekday (e.g., 'Montag') and time of day (e.g., 'abend') are 
written as one word. This holds for nouns and adverbs, for instance:

HeidelTime (tested version: 1.8) currently doesn't extract these temporal 
expressions. In the following text, only 'Mittwoch' is extracted and 
correctly normalized; all other temporal expressions are missed:

"Am Montagabend hat Peter telefoniert. Am Dienstagabend auch. Am Mittwoch auch. 
Montagmorgens wird er ebenfalls telefonieren."
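
As a rough illustration of what an extraction pattern for such compounds could look like, here is a self-contained Java regex sketch. The pattern and the class WeekdayCompoundDemo are hypothetical and are not taken from HeidelTime's German resource files; the list of time-of-day parts is an assumption and certainly incomplete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WeekdayCompoundDemo {
    // Weekday, optionally followed by a time of day, optionally followed by an
    // adverbial "s" (e.g. "Montagmorgens"). Hypothetical pattern for illustration.
    private static final Pattern WEEKDAY_COMPOUND = Pattern.compile(
        "\\b(?:Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonnabend|Sonntag)"
        + "(?:morgen|vormittag|mittag|nachmittag|abend|nacht)?s?\\b");

    /** Returns every weekday or weekday compound found in the text, in order. */
    static List<String> findWeekdayExpressions(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = WEEKDAY_COMPOUND.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Am Montagabend hat Peter telefoniert. Am Dienstagabend auch. "
            + "Am Mittwoch auch. Montagmorgens wird er ebenfalls telefonieren.";
        // Prints [Montagabend, Dienstagabend, Mittwoch, Montagmorgens]
        System.out.println(findWeekdayExpressions(text));
    }
}
```

This only covers extraction; normalizing the time-of-day part would still need its own mapping.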

Attached you can find the entered command and HeidelTime's output.

Maybe you can find time to add this feature at some point :)

Original issue reported on by boegel.thomas on 13 Jan 2015 at 9:17


HeidelTimeStandalone default constructor is missing

Good afternoon,

I noticed that HeidelTimeStandalone has no default constructor. 
So if you want to instantiate the class dynamically, it produces an error: 

You just need to add
public HeidelTimeStandalone() {
}

in HeidelTimeStandalone to avoid that.
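
To show why a missing no-argument constructor matters for dynamic instantiation, here is a minimal reflection sketch. The two nested classes are hypothetical stand-ins, not HeidelTime classes, and the error the reporter saw is assumed to be of this kind.

```java
class ReflectiveInstantiationDemo {
    /** Stand-in for a class that offers a no-argument constructor. */
    static class WithDefaultConstructor {
        public WithDefaultConstructor() { }
    }

    /** Stand-in for a class that only offers a parameterized constructor. */
    static class WithoutDefaultConstructor {
        public WithoutDefaultConstructor(String language) { }
    }

    /** Tries to create an instance reflectively through the no-argument constructor. */
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // No no-argument constructor is declared on the class.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(WithDefaultConstructor.class));    // true
        System.out.println(canInstantiate(WithoutDefaultConstructor.class)); // false
    }
}
```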

PS: Is a new release planned soon? The last one was in May.

Original issue reported on by [email protected] on 11 Dec 2012 at 1:07

Inflected variants of "ein"(einer, einem) not recognized

What steps will reproduce the problem?
1. goto
2. put strings like "mit einer Frist von einem Monat zum Monatsende." "in einer 
Woche." into the input field
3. Press "Compute" 

What is the expected output? What do you see instead?
It should detect "einem Monat" and "einer Woche" as time periods,
but it didn't.

What version of the product are you using? On what operating system?
online demo

Original issue reported on by [email protected] on 4 Jul 2013 at 1:59

Sentence splitting bug in de.unihd.dbs.uima.annotator.stanfordtagger.StanfordPOSTaggerWrapper

Hi heideltime team,

I'm a master's student at the University of Mannheim and am currently building a 
temporal information extraction system using HeidelTime as the temporal tagger.
I encountered a bug in the StanfordPOSTaggerWrapper UIMA component.

What steps will reproduce the problem?
1. Check the attached file "Breaking_Sample.txt"; it's a plain text version of 
Apple's Wikipedia article.
2. Apply de.unihd.dbs.uima.annotator.stanfordtagger.StanfordPOSTaggerWrapper on 
3. Check the JCas sentence annotations, or rather the sentence text you get 
when taking substrings using the annotations' "begin" and "end" indexes.

What is the expected output? What do you see instead?
Expected: Sentences as shown in "Output_MyStanfordPOSTaggerWrapper.txt"
Actual: Sentences as shown in "Output_StanfordPOSTaggerWrapper.txt"
Issue starts with Sentence 117

What version of the product are you using? On what operating system?

Please provide any additional information below.
The results of my analysis are as follows:
The weakness of the current implementation is that it calculates its own offset 
value and, in conjunction with that,
relies on searching the document text with ".indexOf(thisWord, offset)".

To fit my needs, I copied and reimplemented your component; the code can be found 
in "".
From my perspective, this implementation is more robust, as it reuses the offsets 
calculated by the Stanford Tokenizer.
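
One way an indexOf-based alignment can go wrong is sketched below. This is not the wrapper's actual code: the class IndexOfAlignmentDemo is hypothetical, and the example assumes a tokenizer that rewrites some tokens (here parentheses become "-LRB-" and "-RRB-") so that their surface form no longer occurs in the document text.

```java
import java.util.Arrays;
import java.util.List;

class IndexOfAlignmentDemo {
    /**
     * Aligns tokens to the text by searching forward from a running offset.
     * A begin offset of -1 means the token text was not found in the document.
     */
    static int[] alignByIndexOf(String text, List<String> tokens) {
        int[] begins = new int[tokens.size()];
        int offset = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int begin = text.indexOf(token, offset);
            begins[i] = begin;
            if (begin >= 0) {
                offset = begin + token.length();
            }
        }
        return begins;
    }

    public static void main(String[] args) {
        String text = "He said (twice) that he said it.";
        // Assumed tokenizer output in which the parentheses have been rewritten.
        List<String> tokens = Arrays.asList(
            "He", "said", "-LRB-", "twice", "-RRB-", "that", "he", "said", "it", ".");
        // Prints [0, 3, -1, 9, -1, 16, 21, 24, 29, 31]
        System.out.println(Arrays.toString(alignByIndexOf(text, tokens)));
    }
}
```

Tokens that carry their own begin and end offsets from the tokenizer avoid this search step entirely, which is the approach the report argues for.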

If you have further questions please do not hesitate to contact me.


Original issue reported on by [email protected] on 23 Jul 2014 at 2:31


problems to run heideltime on ubuntu

I've spent several hours trying to get HeidelTime Standalone to run on my Ubuntu 
system, but it still doesn't work. 
I followed the how-to-use instructions in the readme file exactly, and I also 
installed the TreeTagger with the whole 
package: the tagging scripts, the installation script, and the parameter files 
for the languages I use. I also set the path to the folder 
containing the TreeTagger in config.props, under "treeTaggerHome" (treeTaggerHome 
= /home/chuulio/Dokumente/TreeTagger/).
I tried HeidelTime on a German text document about Moscow, and this is 
what I got:

chuulio@chuulio-UX32VD:~/Dokumente/Temporal_Annotation/Standalone$ java -jar 
de.unihd.dbs.heideltime.standalone.jar /home/chuulio/Dokumente/Moskau.txt -l 
german -vv
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Verbosity: '-vv'; Logging level set to ALL.
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Encoding '-e': NOT FOUND OR RECOGNIZED; set to 'UTF-8'
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Language '-l': GERMAN
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Document Creation Time '-dct': NOT FOUND; skipping.
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Locale '-locale': NOT FOUND, set to environment locale: de_CH
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Configuration path '-c': config.props
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone readConfigFile
INFO: trying to read in file config.props
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Interval Tagger '-it': NOT FOUND OR RECOGNIZED; set to false
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone main
INFO: Reading document using charset: UTF-8
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone initialize
INFO: HeidelTimeStandalone initialized with language german
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone initialize
INFO: HeidelTime initialized
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone initialize
INFO: JCas factory initialized
Aug 26, 2014 10:47:17 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone process
INFO: Processing started
[de.unihd.dbs.uima.annotator.heideltime.HeidelTime] HeidelTime has not found 
any sentence tokens in this document. HeidelTime needs sentence tokens tagged 
by a preprocessing UIMA analysis engine to do its work. Please check your UIMA 
workflow and add an analysis engine that creates these sentence tokens.
Aug 26, 2014 10:47:19 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone process
INFO: Processing finished
Aug 26, 2014 10:47:19 PM 
de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone process
INFO: Result formatted
<?xml version="1.0"?>

Moskau (russisch Москва́ Zum Anhören bitte klicken! [mɐˈskva], 
Moskwa) ist die Hauptstadt der Russischen Föderation und mit rund 11,55 
Millionen Einwohnern (Stand 14. Oktober 2010)[1] die größte Stadt bzw. mit 
15,1 Millionen (2012)[2] die größte Agglomeration Europas. Am 1. Juli 2012 
wurde Moskau durch Eingemeindung der beiden Verwaltungsbezirke Nowomoskowski 
und Troizk im Südwesten der Stadt auf Kosten der Moskauer Oblast um 1480 km², 
d. h. um das 1,39-Fache, auf 2550 km² vergrößert. Durch die Eingliederung 
wuchs die Moskauer Bevölkerung um etwa 235.000 Menschen.

Moskau ist das politische, wirtschaftliche und kulturelle Zentrum des Landes 
mit Hochschulen und Fachschulen sowie zahlreichen Kirchen, Theatern, Museen, 
Galerien und dem 540 Meter hohen Ostankino-Turm. Moskau ist Sitz der 
Russisch-Orthodoxen Kirche: Der Patriarch residiert im Danilow-Kloster, das 
größte russisch-orthodoxe Kirchengebäude ist die Moskauer 
Christ-Erlöser-Kathedrale. Es gibt im Stadtgebiet von Moskau über 300 
Kirchen.[3] Seit dem 16. Jahrhundert wird Moskau auch als Drittes Rom 
bezeichnet. Nach Ende des Zweiten Weltkriegs erhielt Moskau die Auszeichnung 
einer „Heldenstadt“.

Der Kreml und der Rote Platz im Zentrum Moskaus stehen seit 1990 auf der 
UNESCO-Liste des Weltkulturerbes. Mit acht Fernbahnhöfen, drei internationalen 
Flughäfen und drei Binnenhäfen ist die Stadt wichtigster Verkehrsknoten und 
größte Industriestadt Russlands.

Denkmal für den Stadtgründer Juri Dolgoruki

Eine der Sagen kündet davon, dass der Fürst Juri Dolgoruki (1090–1157) im 
Land der Wjatitschen eine hölzerne Stadt zu errichten befahl, und dass diese 
Stadt nach dem Fluss benannt wurde, an dessen Ufern sie emporwuchs. Die erste 
schriftliche Erwähnung Moskaus stammt aus dem Jahre 1147, das darum als das 
Gründungsjahr Moskaus gilt. Doch schon lange davor gab es an der Stelle, wo 
heute Moskau steht, menschliche Niederlassungen. Archäologische Ausgrabungen 
bezeugen, dass die ältesten von ihnen vor etwa 5000 Jahren entstanden waren.

Um 1156 entstand eine erste, noch hölzerne Wehranlage des Kremls, in deren 
Schutz sich der Marktflecken allmählich zu einer beachtlichen Ansiedlung 
entwickelte. Im Jahre 1238 ist die Stadt von den Mongolen erobert und 
niedergebrannt worden. 1263 wurde das Umland zu einem Teilfürstentum im 
Großfürstentum Wladimir-Susdal, wenig später unter Fürst Daniel ein 
eigenständiges Fürstentum. In der ersten Hälfte des 14. Jahrhunderts – die 
Stadt zählte mittlerweile 30.000 Einwohner – erkannte der tatarische 
Großkhan den Moskauer Großfürsten als (ihm allerdings tributpflichtiges) 
Oberhaupt von Russland an.

Der Sieg über die Tataren in der Schlacht von Kulikowo am 8. September 1380, 
angeführt durch den Moskauer Großfürsten Dmitri Donskoi, befreite zwar nicht 
von der Hegemonie der Goldenen Horde (1382 wurde Moskau sogar abermals 
niedergebrannt und geplündert), doch die Stadt festigte dadurch ihr 
politisches und militärisches Ansehen erheblich und gewann mithin beständig 
an wirtschaftlicher Macht. 1480 konnte sie die Tatarenherrschaft endgültig 
abschütteln und wurde zur Hauptstadt des russischen Reiches.

Der seit 1462 regierende Großfürst von Moskau Iwan III., der Große 
(1440–1505), heiratete 1472 die byzantinische Prinzessin Sofia (Zoe) 
Palaiologos, eine Nichte des letzten oströmischen Kaisers Konstantin XI. 
Palaiologos, und übernahm von dort die autokratische Staatsidee und ihre 
Symbole: den Doppeladler und das Hofzeremoniell. Seither gilt Moskau als 
„Drittes Rom“ und Hort der Orthodoxie.
Moskau wird Großstadt
Moskau am Ende des 17. Jahrhunderts

In den beiden letzten Jahrzehnten des 15. Jahrhunderts begann der Ausbau des 
Kreml, in dessen Umkreis sich nun in großer Zahl Handwerker und Kaufleute 
niederließen. Die Einwohnerzahl stieg bald darauf auf mehr als 100.000, so 
dass um 1600 eine Ringmauer um Moskau und eine Erdverschanzung hinzukamen, die 
die blühende Stadt fortan nach außen abschirmten. 1571 war sie ein letztes 
Mal von den Tataren heimgesucht worden, als die überwiegend aus Holz gebaute 
Stadt abbrannte. Bereits ein Jahr später war die Tatarengefahr in der Schlacht 
von Molodi südlich von Moskau aber endgültig gebannt. In der Zeit der Wirren, 
die durch unklare Thronfolgeverhältnisse ausgelöst wurde, rückten polnische 
Truppen in die Stadt und versuchten, eigene Marionetten zu installieren. Eine 
Volksarmee aus Nischni Nowgorod belagerte die Polen jedoch im Moskauer Kreml 
und zwang sie zur Kapitulation. Diese Ereignisse ebneten den Weg für die 
Romanow-Dynastie auf den russischen Thron.

Während die ersten Tuch-, Papier- und Ziegelmanufakturen, Glasfabriken und 
Pulvermühlen entstanden, kulminierten die sozialen Gegensätze des 
Großreiches: 1667 erhoben sich die Bauern im Wolga- und Dongebiet gegen die 
wachsende Unterdrückung, ihr Führer, Stepan Rasin, wurde 1671 auf dem Roten 
Platz in Moskau hingerichtet. Im Jahre 1687 ist die erste Hochschule Russlands, 
die „Slawisch-Griechische Akademie“ eröffnet worden, 1703 erschien die 
erste gedruckte russische Zeitung „Wedomosti“. Im Jahre 1712 ging unter Zar 
Peter dem Großen (1672–1725) das Privileg der Hauptstadt auf das neu 
gegründete Sankt Petersburg über, aber Moskau blieb das wirtschaftliche und 
geistig-kulturelle Zentrum des Landes. 1755 wurde in Moskau mit der heutigen 
Lomonossow-Universität die erste russische Universität eröffnet.
Der Brand von Moskau vor der Einnahme der Stadt durch Napoleon 1812
Twerskaja-Straße im 19. Jahrhundert

Mit dem Moskau des 18. Jahrhunderts ist das Schaffen hervorragender russischer 
Schriftsteller und Dichter verknüpft wie Alexander Sumarokow, Denis Fonwisin, 
Nikolai Karamsin und vieler anderer. In Moskau trat der große russische 
Gelehrte Michail Lomonossow seinen Weg in die Wissenschaft an. Auch in 
späteren Zeiten lebten und wirkten in Moskau viele berühmte russische 
Schriftsteller und Dichter, Wissenschaftler und Künstler, die durch ihr 
Schaffen nicht nur zur russischen, sondern auch zur Weltkultur einen immensen 
Beitrag geleistet haben.

Im Vaterländischen Krieg von 1812, als Napoleon Bonaparte (1769–1821) mit 
seiner „Großen Armee“ auf Moskau zumarschierte, verlor die Stadt in einem 
Flächenbrand – die Bewohner zündeten ihre Häuser an und flohen aus der 
Stadt – zwei Drittel ihrer Bausubstanz. Aber in Moskau kam die französische 
Armee zum Stehen, hier wurde sie wegen Hunger und Kälte zur Umkehr gezwungen, 
die mit ihrem Untergang endete.

Der im Frühjahr 1813 einsetzende großstilige Wieder- und Neuaufbau sprengte 
rasch den alten städtischen Verteidigungsring und verschaffte der Stadt von 
der Mitte des 19. Jahrhunderts an durch zügigen Straßen- und Bahnstreckenbau 
Anschluss an die wichtigsten Städte des Landes. 1890 fuhren die ersten 
elektrischen Straßenbahnen; die erste Volkszählung des Landes fand am 28. 
Januar 1897 statt, die Bevölkerung der Stadt war auf etwa eine Milli


java and ubuntu version:

chuulio@chuulio-UX32VD:~/Dokumente/Temporal_Annotation/Standalone$ lsb_release 
-a && java -version
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.10
Release:    12.10
Codename:   quantal
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.10.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

any idea?

Original issue reported on by [email protected] on 26 Aug 2014 at 8:54

  • Merged into: #18

strange rule matching error

Can someone explain why this rule

RULENAME="date_r15k",EXTRACTION="il %reDayNumber %reMonthLong 

works fine and matches

  il 3 aprile prossimo

but if I add this other rule


I get this error:

 DEBUGGING: tonormalize:UNDEF-%normThisNextLast(group(1))-%normUnit(group(2))
 DEBUGGING: 3 aprile prossimo
 DEBUGGING: hmR...:null
 Maybe problem with normalization of the resource: normThisNextLast
 Maybe problem with part to replace? 3

The second rule by itself works fine and matches "prossima settimana".

Thank you.

Original issue reported on by [email protected] on 19 Oct 2014 at 10:15

POS matching

It should be possible to use a regex to specify a POS_CONSTRAINT.

The POS tagger I am using provides morphological information, hence to detect a 
plural (either Smp or Sfp), I need to use S.p

It is enough to change checkPosConstraint to use:

     if (pos_as_is.matches(pos))
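
A small sketch of what regex-based POS matching means in plain Java follows. The class PosConstraintDemo is hypothetical; only String.matches is taken from the suggestion above.

```java
import java.util.regex.Pattern;

class PosConstraintDemo {
    /** True if the whole POS tag matches the constraint, read as a regular expression. */
    static boolean matchesPosConstraint(String tokenPos, String constraint) {
        return tokenPos != null && tokenPos.matches(constraint);
    }

    public static void main(String[] args) {
        System.out.println(matchesPosConstraint("Smp", "S.p")); // true
        System.out.println(matchesPosConstraint("Sfp", "S.p")); // true
        System.out.println(matchesPosConstraint("Sms", "S.p")); // false
        // Caveat: tags containing regex metacharacters change meaning.
        // "$" is an end-of-input anchor, so the literal tag "PRP$" no longer matches itself.
        System.out.println(matchesPosConstraint("PRP$", "PRP$"));                // false
        System.out.println(matchesPosConstraint("PRP$", Pattern.quote("PRP$"))); // true
    }
}
```

Because String.matches requires the whole tag to match, plain tags without metacharacters keep behaving like an equality test.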


Original issue reported on by [email protected] on 20 Oct 2014 at 11:26

Mod "debug"

Another improvement that I think would be nice: an optional "debug" mode. 

Not all of the trace output (apart from errors) is necessarily useful, and if 
you run HeidelTime on a big collection to see whether there are problems, it 
produces a lot of text.

So there could be a field in the configuration to activate or deactivate this 
option.

However, this requires changing every place in the code that prints what is 
being done, adding something like this:
if(Config.get(Config.DEBUG).equals("true")) {
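
A minimal sketch of such a switch, using java.util.Properties instead of HeidelTime's Config class, could look like this. The property name "debug" and the class DebugFlagDemo are assumptions made for illustration.

```java
import java.util.Properties;

class DebugFlagDemo {
    // Hypothetical property name; the real configuration key may differ.
    static final String DEBUG_KEY = "debug";

    /** Debug output is off unless the property is present and set to "true". */
    static boolean isDebugEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty(DEBUG_KEY, "false"));
    }

    /** Prints trace output only when debugging is enabled. */
    static void debug(Properties config, String message) {
        if (isDebugEnabled(config)) {
            System.err.println("[DEBUG] " + message);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        debug(config, "not printed, debugging is off by default");
        config.setProperty(DEBUG_KEY, "true");
        debug(config, "printed, debugging was switched on");
    }
}
```

Routing all trace output through one helper like debug(...) keeps the check in a single place instead of repeating the if statement at every print.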

Original issue reported on by [email protected] on 25 May 2012 at 8:02

Incorrect value for decades/centuries?

I decided to update to 1.5 and noticed that something has changed.
If I submit this sentence:
"Near the southern end, signs saying 'Hatfield and the North' inspired the 
eponymous 1970s rock band Hatfield and the North."
The date "1970s" is tagged: <TIMEX3 tid="t86" type="DATE" 
Before, it was <TIMEX3 tid="t85" type="DATE" value="197X">1970s</TIMEX3>, 
which makes more sense (no ambiguity with the year 197). 

The same with centuries:
now: <TIMEX3 tid="t4" type="DATE" value="11">the 12th century</TIMEX3>
before: <TIMEX3 tid="t3" type="DATE" value="11XX">the 12th century</TIMEX3>

Original issue reported on by [email protected] on 4 Feb 2014 at 4:19

NullPointerException in TreeTaggerWrapper

What steps will reproduce the problem?
1. Simple test using the TreeTaggerWrapper (the environment is well configured 
with the resources from here:

import java.util.Calendar;
import java.util.Date;

import de.unihd.dbs.heideltime.standalone.DocumentType;
import de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone;
import de.unihd.dbs.heideltime.standalone.OutputType;
import de.unihd.dbs.heideltime.standalone.POSTagger;
import de.unihd.dbs.heideltime.standalone.components.impl.TimeMLResultFormatter;
import de.unihd.dbs.uima.annotator.heideltime.resources.Language;

public class HeidelTest {

    public static void main(String[] args)
        throws DocumentCreationTimeMissingException {

    final String configProps = "/config.props";

    String configPath = HeidelTest.class.getResource(configProps).getFile();

    HeidelTimeStandalone heidel = new HeidelTimeStandalone(Language.FRENCH,
        DocumentType.NEWS, OutputType.TIMEML, configPath,

    Date documentCreationTime = Calendar.getInstance().getTime();

    String result = heidel.process(
"samedi 13 décembre 2014 à 20h00.",
        documentCreationTime, new TimeMLResultFormatter());




What is the expected output? What do you see instead?

    at de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper.doTreeTag(
    at de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper.process(
    at de.unihd.dbs.heideltime.standalone.components.impl.TreeTaggerWrapper.process(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.establishPartOfSpeechInformation(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.establishHeidelTimePreconditions(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.process(
    at com.glue.feed.html.demo.dates.HeidelTest.main(

What version of the product are you using? On what operating system?
HeidelTime 1.8 on Ubuntu 64

Please provide any additional information below.

Looking at the code in de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper 
class, line 494:

 if ((!(token.getPos().equals(null))) &&

token.getPos() cannot be compared to null by calling the equals method on it if 
it is actually null...

So, the previous portion of code should be replaced by:

if (token.getPos() != null
            && token.getPos().equals("EMPTYLINE")) {
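
The difference between the two checks can be reproduced in isolation. In this sketch the class NullSafePosCheckDemo is hypothetical, a plain String stands in for token.getPos(), and the second half of the unsafe condition is an assumption, since the original line is cut off in the report.

```java
class NullSafePosCheckDemo {
    /** Mirrors the reported pattern: calling equals on a null reference throws. */
    static boolean isEmptyLineUnsafe(String pos) {
        return !pos.equals(null) && pos.equals("EMPTYLINE");
    }

    /** Null-safe variant: calling equals on the constant never dereferences pos. */
    static boolean isEmptyLineSafe(String pos) {
        return "EMPTYLINE".equals(pos);
    }

    public static void main(String[] args) {
        System.out.println(isEmptyLineSafe(null));        // false
        System.out.println(isEmptyLineSafe("EMPTYLINE")); // true
        try {
            isEmptyLineUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the report");
        }
    }
}
```

Putting the constant on the left is equivalent to the explicit "!= null" check proposed above.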

Original issue reported on by [email protected] on 14 Dec 2014 at 10:24

improper handling of newline when reading files

The main() in class HeidelTimeStandalone reads input with this loop:

    while ((line = fileReader.readLine()) != null)
This has the effect of adding a newline at the beginning and leaving the last 
line unterminated.

This affects the tokenizer and POS tagger I am using, which get an extra empty 
token at the beginning, causing a misalignment of tokens.

It should be changed to:

    while ((line = fileReader.readLine()) != null)
       sb.append(line + System.getProperty("line.separator"));
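
The effect of the two loop bodies can be compared on a small in-memory input. This is a sketch, not the standalone's code: the class LineJoinDemo is hypothetical, and the "separator first" variant is an assumption reconstructed from the description above, since the original loop body is not shown in the report.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineJoinDemo {
    /** Puts the separator before each line: leading newline, last line unterminated. */
    static String separatorBeforeEachLine(String input, String sep) {
        return join(input, sep, true);
    }

    /** Puts the separator after each line: no leading newline, every line terminated. */
    static String separatorAfterEachLine(String input, String sep) {
        return join(input, sep, false);
    }

    private static String join(String input, String sep, boolean separatorFirst) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(separatorFirst ? sep + line : line + sep);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Newlines are shown escaped to make the difference visible.
        System.out.println(separatorBeforeEachLine("a\nb", "\n").replace("\n", "\\n")); // \na\nb
        System.out.println(separatorAfterEachLine("a\nb", "\n").replace("\n", "\\n"));  // a\nb\n
    }
}
```

Note that the second variant shifts no character offsets within the text, but it always terminates the last line, even if the input file did not.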

Original issue reported on by [email protected] on 18 Oct 2014 at 8:20

"Standalone" is not standalone

What steps will reproduce the problem?
1. Download Heideltime
2. Try to run example on the front page

What is the expected output? 

A date.

What do you see instead?

stuff about perl

ryan@3G08:~/Downloads/heideltime-standalone-1.3$ pwd
ryan@3G08:~/Downloads/heideltime-standalone-1.3$ cat cat.txt 
Jannik Strötgen, Julian Zell, and Michael Gertz: HeidelTime: Tuning English 
and Developing Spanish Resources for TempEval-3. In SemEval13, 15-19, 2013
ryan@3G08:~/Downloads/heideltime-standalone-1.3$ java -jar 
de.unihd.dbs.heideltime.standalone.jar -t news cat.txt 
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use 
TreeTagger tokenizer: english-abbreviations
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use 
TreeTagger tokenizer: english.par
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] File missing to use 
TreeTagger tokenizer: utf8-tokenize.perl
Cannot find tree tagger (SET ME IN CONFIG.PROPS!/cmd/utf8-tokenize.perl). Make 
sure that path to tree tagger is set correctly in config.props!
If path is set correctly:

[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] Maybe you need to 
download the TreeTagger tagger-scripts.tar.gz
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] from
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] Extract this file 
and copy the missing file into the corresponding TreeTagger directories.
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy 
english-abbreviations into SET ME IN CONFIG.PROPS!/lib
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy 
english.par into SET ME IN CONFIG.PROPS!/lib
[de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper] If missing, copy 
utf8-tokenize.perl into SET ME IN CONFIG.PROPS!/cmd

What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on by [email protected] on 21 Jun 2013 at 2:02

"Charset mismatch" when running the standalone version under Ubuntu

I have previously used HeidelTime's standalone version on Mac OSX and on a 
Lubuntu machine without any problem. 
Today I tried it on Ubuntu (running in VirtualBox) and can't get rid of this 
error:
usr@usr-VirtualBox:~/Downloads/Temporal_Annotation_Initial/Standalone$ java 
-jar de.unihd.dbs.heideltime.standalone.initial.jar 
/home/usr/Temporal_Annotation/lill_sample.txt -l german
Error: Unable to access jarfile de.unihd.dbs.heideltime.standalone.initial.jar
usr@usr-VirtualBox:~/Downloads/Temporal_Annotation_Initial/Standalone$ java 
-jar de.unihd.dbs.heideltime.standalone.jar 
/home/usr/Temporal_Annotation/lill_sample.txt -l german
java.lang.RuntimeException: Opps! Could not find token f�rbringen in JCas 
after tokenizing with TreeTagger. Hmm, there may exist a charset missmatch! 
Default encoding is UTF-8 and should always be UTF-8 (use 
-Dfile.encoding=UTF-8). If input document is not UTF-8 use -e option to set it 
according to the input, additionally.
    at de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper.tokenize(
    at de.unihd.dbs.uima.annotator.treetagger.TreeTaggerWrapper.process(
    at de.unihd.dbs.heideltime.standalone.components.impl.TreeTaggerWrapper.process(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.establishPartOfSpeechInformation(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.establishHeidelTimePreconditions(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.process(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.process(
    at de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone.main(
<?xml version="1.0"?>

178. Frondienst der Wurmsbacher Lehenleute 
1605 <TIMEX3 tid="t4" type="DATE" value="XXXX-06-06">Juni 6</TIMEX3>. 
Uf fürbringen unnd clagen der frawen äbbtißin unnd convent des würdigen 
gozhuß Wurmbspach gegen und wider jre lehenlüthen zu Wagen, jm Buech und in 
der Auw: Das dieselbigen vermeinen, die wyl die fraw andere jres gozhuß 
güetter verlichen, also dz sy keiner acherlüthen nit mer mangelbar und aber 
so sy die behallten, jnnen die ächer zebuwen 12 tag schuldig und sonnsten nit.

It annotates the first occurrence of a temporal expression, and then stops... 
My input files are in UTF-8:

usr@usr-VirtualBox:~/Temporal_Annotation$ file lill_sample.txt 
lill_sample.txt: UTF-8 Unicode text, with very long lines

I also tried to indicate -Dfile.encoding=UTF-8, as the error message says, but 
it doesn't help a bit...
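
For readers unfamiliar with how a replacement character such as the one in "f�rbringen" can arise, here is a generic Java sketch of a charset mismatch. It does not reproduce the reporter's setup (whose input file is UTF-8); the class CharsetMismatchDemo is hypothetical, and where exactly the mismatch happens between HeidelTime and TreeTagger is not something this example can show.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String word = "f\u00fcrbringen"; // "fürbringen", escaped to keep the source ASCII-only

        // Same charset on both sides: the text survives unchanged.
        System.out.println(word.equals(
            roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Latin-1 bytes read as UTF-8: the umlaut byte is malformed UTF-8 and is
        // replaced by U+FFFD, the replacement character.
        String broken = roundTrip(word, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken.contains("\uFFFD"));

        // UTF-8 bytes read as Latin-1: the umlaut turns into two unrelated characters.
        String mojibake = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.length() == word.length() + 1);
    }
}
```

In either direction the decoded token no longer equals the token in the original text, which is the kind of lookup failure the error message describes.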

Original issue reported on by [email protected] on 28 Jul 2014 at 2:27

Erroneous Date Recognized

What steps will reproduce the problem?
1. HeidelTimeStandalone hts_sci = new HeidelTimeStandalone(Language.ENGLISH, 
DocumentType.NARRATIVES, OutputType.TIMEML);
    String f = hts_sci.process("19-Nov-12", new Date(2012,01,05), new TimeMLResultFormatter());

It should pick out 19, November, 2012. Instead this is the result produced:

<?xml version="1.0"?>
19-<TIMEX3 tid="t0" type="DATE" value="3912-11">Nov</TIMEX3>-12

What is the work-around for this?
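
One part of this result can be explained independently of HeidelTime's rules: the deprecated java.util.Date(int, int, int) constructor expects the year minus 1900 and a zero-based month, so new Date(2012,01,05) denotes 5 February 3912, which matches the year in value="3912-11". The sketch below shows this and one way to build the intended document creation time; the class DocumentCreationTimeDemo is hypothetical, and it assumes 5 January 2012 was meant. It does not address why only "Nov" is extracted from "19-Nov-12".

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class DocumentCreationTimeDemo {
    /** The date as constructed in the report, via the deprecated constructor. */
    @SuppressWarnings("deprecation")
    static Date legacyDate() {
        return new Date(2012, 1, 5); // year is an offset from 1900, month is zero-based
    }

    /** Builds a date from a real calendar year and a Calendar month constant. */
    static Date calendarDate(int year, int calendarMonth, int day) {
        return new GregorianCalendar(year, calendarMonth, day).getTime();
    }

    static int yearOf(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return cal.get(Calendar.YEAR);
    }

    static int monthOf(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return cal.get(Calendar.MONTH);
    }

    public static void main(String[] args) {
        System.out.println(yearOf(legacyDate()));                               // 3912
        System.out.println(monthOf(legacyDate()) == Calendar.FEBRUARY);         // true
        System.out.println(yearOf(calendarDate(2012, Calendar.JANUARY, 5)));    // 2012
    }
}
```

Passing the result of calendarDate(2012, Calendar.JANUARY, 5) as the document creation time should at least remove the year 3912 from the normalized value.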

Original issue reported on by [email protected] on 4 Mar 2013 at 8:06

Regular expression


I would like to process a series of French dates such as:

"lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 samedi 25 et dimanche 26 

An equivalent in English would be:

"Monday 20, Tuesday 21 Wednesday 22 Thursday 23 Friday 24 Saturday 25 and 
Sunday, April 26"

where all the dates should apply ("be relative") to April.

I ended up writing the following rule:

RULENAME="date_r4d2",EXTRACTION="(%reWeekday %reDayNumber%reAndOrTo)+%reWeekday 

Note: %reAndOrTo is ( et | ou | au |,\s|\s) in my case

And I get this result:

Monday, March 9, 2015 0:00
Tuesday, March 10, 2015 0:00
Wednesday, March 11, 2015 0:00
Thursday, March 12, 2015 0:00
Friday, March 13, 2015 0:00
Saturday, April 25, 2015 0:00
Sunday, April 26, 2015 0:00

The XML version:
<TIMEX3 tid="t5" type="DATE" value="2015-03-09">lundi</TIMEX3> 20, <TIMEX3 
tid="t6" type="DATE" value="2015-03-10">mardi</TIMEX3> 21 <TIMEX3 tid="t7" 
type="DATE" value="2015-03-11">mercredi</TIMEX3> 22 <TIMEX3 tid="t8" 
type="DATE" value="2015-03-12">jeudi</TIMEX3> 23 <TIMEX3 tid="t9" type="DATE" 
value="2015-03-13">vendredi</TIMEX3> 24 <TIMEX3 tid="t4" type="DATE" 
value="2015-04-25">samedi 25</TIMEX3> et <TIMEX3 tid="t3" type="DATE" 
value="2015-04-26">dimanche 26 avril</TIMEX3>

As you can see, only the two last dates are correct.

The key here is that I have a repeatable group (%reWeekday 

I tried a lot of alternatives in my regular expression, like using a 
non-capturing group as in \(hello\), etc.

Actually, I do not know whether it is an OFFSET issue or whether HeidelTime is 
unable to handle such regular expressions.
Do you have any clue that could help me?
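
A general property of Java regular expressions may be relevant here: a capturing group inside a repeated group only retains its last iteration. The sketch below demonstrates this with plain java.util.regex; the class RepeatedGroupDemo and the simplified patterns are hypothetical, and whether HeidelTime's rule groups behave exactly like this is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    /** With a repeated group, group(1) and group(2) hold only the final iteration. */
    static String lastIterationOnly(String text) {
        Matcher m = Pattern.compile("(?:(\\w+) (\\d+),? )+").matcher(text);
        return m.matches() ? m.group(1) + " " + m.group(2) : null;
    }

    /** Matching one weekday/day pair at a time with find() visits every pair. */
    static List<String> everyIteration(String text) {
        List<String> pairs = new ArrayList<>();
        Matcher m = Pattern.compile("(\\w+) (\\d+)").matcher(text);
        while (m.find()) {
            pairs.add(m.group(1) + " " + m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String text = "lundi 20, mardi 21 mercredi 22 ";
        System.out.println(lastIterationOnly(text)); // mercredi 22
        System.out.println(everyIteration(text));    // [lundi 20, mardi 21, mercredi 22]
    }
}
```

If the same holds for rule groups, a single rule with a repeated group cannot normalize each weekday/day pair individually, and one match per pair would be needed instead.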

Thank you very much,


Original issue reported on by [email protected] on 14 Mar 2015 at 9:39

Installation issue

I'm trying to run heideltime on the commandline as follows: 

one-1.4>java -jar de.unihd.dbs.heideltime.standalone.jar test.txt -vv 

jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Verbosity: '-vv'; Logging level set to ALL.
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Encoding '-e': NOT FOUND OR RECOGNIZED; set to 'UTF-8'
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Language '-l': NOT FOUND; set to ENGLISH
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Document Creation Time '-dct': NOT FOUND; skipping.
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Locale '-locale': NOT FOUND, set to environment locale: en_US
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Configuration path '-c': config.props
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: trying to read in file config.props
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Reading document using charset: UTF-8
jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: HeidelTimeStandalone initialized with language english
Jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: HeidelTime initialized
Jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: JCas factory initialized
Jul 26, 2013 11:31:23 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Processing started
Jul 26, 2013 11:31:24 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Processing finished
Jul 26, 2013 11:31:24 AM de.unihd.dbs.heideltime.standalone.HeidelTimeStandalone
INFO: Result formatted

<?xml version="1.0"?>

Upcoming UIMA Seminars

April 7, 2004 Distillery Lunch Seminar
UIMA and its Metadata
12:00PM-1:00PM in HAW GN-K35.

Dave Ferrucci will give a UIMA overview and discuss the types of component
metadata that UIMA components provide.  Jon Lenchner will give a demo of the
Text Analysis Engine configurator tool.

April 16, 2004 KM & I Department Tea
Title: An Eclipse-based TAE Configurator Tool
3:00PM-4:30PM in HAW GN-K35 .

Jon Lenchner will demo an Eclipse plugin for configuring TAE descriptors, which
will be available soon for you to use.  No more editing XML descriptors by hand!

May 11, 2004 UIMA Tutorial
9:00AM-5:00PM in HAW GN-K35.

This is a full-day, hands-on tutorial on UIMA, covering the development of Text
Analysis Engines and Collection Processing Engines, as well as how to include
these components in your own applications.

What version of the product are you using? On what operating system?
I am using the heideltime-standalone-1.4 on Windows. 

Please provide any additional information below.
No dates are tagged in the TimeML file, and I don't get any error message. It's 
the first time I'm trying to use heideltime, so probably something went wrong 
in the installation, but without error message it's hard to fix it. Any ideas 
what it could be?

Original issue reported on by [email protected] on 26 Jul 2013 at 9:37

availability of resources for Portuguese

I'm a software engineer working on a project that uses HeidelTime for English, 
Spanish, and Arabic.
Now I started developing resources for Portuguese and I wonder if there's any 
sharable preliminary work done in resources of Portuguese even if it's not 
officially released.
Thanks much.

Original issue reported on by [email protected] on 7 Jan 2015 at 2:20

Pass as parameter

Being forced to keep the configuration file in the same folder can be a 
problem. My software already has a configuration file. It would be better to 
be able to pass in the config file we want.

So I added a "run" function to HeidelTimeStandalone so that I can provide the 
config file, the input file, and the output file. I attach it in case it is 
useful for others ;)

Original issue reported on by [email protected] on 25 May 2012 at 8:55


overlapping timexes produce broken XML in the standalone

When the standalone version of heideltime is used to tag a document that 
contains multiple overlapping temporal expressions, the TimeML writer module 
will produce invalid XML code.

Due to the nature of inline tags in TimeML documents, this condition cannot be 
resolved entirely satisfactorily; two overlapping timexes would produce 
overlapping XML tags which would be semantically invalid.

The condition of two overlapping timexes should *ideally* never occur, since if 
a temporal expression produces two overlapping timexes, this temporal 
expression should also be representable by a single timex that spans both of 
the smaller timexes. The recognition of temporal expressions however is subject 
to the utilized resources/rules and whether they include such a "larger" rule.

Different domains such as poetry however can produce unexpected sentence syntax 
which may elude any of the existing rulesets otherwise thought of as 

To resolve the bug that produces broken XML/TimeML tags, we will, for 
overlapping timexes, only create an XML tag for the first recognized timex, 
omitting all of the subsequent timexes that overlap with the first one.
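
A minimal sketch of the resolution described above, not the actual TimeML
writer code. The names OverlapFilter, Span and keepFirstOfOverlapping are
hypothetical, and timexes are assumed to be given as character offsets in the
order in which they were recognized:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapFilter {
    /** Hypothetical stand-in for a timex: character offsets [begin, end). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Two half-open spans overlap when each one starts before the other ends. */
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    /**
     * Walks the timexes in recognition order and keeps a timex only if it
     * does not overlap a timex that was already kept.
     */
    static List<Span> keepFirstOfOverlapping(List<Span> recognized) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : recognized) {
            boolean clash = false;
            for (Span earlier : kept) {
                if (overlaps(candidate, earlier)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> recognized = Arrays.asList(
                new Span(0, 10), new Span(5, 15), new Span(20, 30));
        // The second span overlaps the first one and is therefore omitted.
        for (Span s : keepFirstOfOverlapping(recognized)) {
            System.out.println(s.begin + "-" + s.end);
        }
    }
}
```

Only the kept spans would then be written as inline tags, so no two tags can
interleave.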

Our thanks go to Armin Hoenen for bringing this bug to our attention.

Original issue reported on by [email protected] on 31 Jul 2012 at 12:19

Strange online result

What steps will reproduce the problem?
1. Browse to
2. Enter the String "02/08/2012 4:49 PM" (w/o ") into the form field "Input"
3. Press "Compute"

What is the expected output?
complete string underlined w/ normalized date 2012-02-08 16:49:00

What do you see instead?
only "02/08/2012 4:49" underlined (the trailing "PM" is not included) w/ normalized date 2012-02-08 4:49:00

What version of the product are you using? On what operating system?
online version
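
For reference, the expected 24-hour value can be reproduced with plain
java.time. This only illustrates the expected normalization and says nothing
about how HeidelTime's rules compute it; the class name MeridiemExample and
the month/day/year reading of the input are assumptions:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class MeridiemExample {
    // "h" is the 1-12 clock hour and "a" the AM/PM marker.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy h:mm a", Locale.ENGLISH);
    // "HH" is the 0-23 hour of day.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);

    /** Parses a 12-hour timestamp and renders it with a 24-hour clock. */
    static String normalize(String text) {
        return LocalDateTime.parse(text, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("02/08/2012 4:49 PM")); // prints 2012-02-08 16:49:00
    }
}
```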

Original issue reported on by [email protected] on 25 Jun 2013 at 1:11

StanfordPOSTaggerWrapper model path


The initialize method of StanfordPOSTaggerWrapper class tests whether the file 
denoted by model_path exists, and then attempts to instantiate a MaxentTagger 
object with it. The Javadoc for the MaxentTagger constructor says that the 
modelFile parameter can be interpreted as a URL if it starts with "https?://" 
or can be loaded directly from the classpath as in 

I put my model file in my project's classpath (and I configured my config.props 
according to this resource path). HeidelTime fails because of the check of the 
pathname's existence. If I remove this check, it works like a charm.

It would be nice to reflect the MaxentTagger specification in 
StanfordPOSTaggerWrapper. I think StanfordPOSTaggerWrapper should only check 
that model_path is not null, and should leave the responsibility of the other 
checks to MaxentTagger. What do you think?
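
A rough sketch of what I have in mind, assuming the check is pulled out into a
small helper. ModelPathValidation and requireModelPath are hypothetical names,
not existing HeidelTime code, and the example paths are made up:

```java
import java.util.Objects;

final class ModelPathValidation {
    /**
     * Rejects only a missing model path. There is deliberately no file
     * existence check here, so URLs and classpath resources pass through and
     * the tagger's own loader decides whether the path can be resolved.
     */
    static String requireModelPath(String modelPath) {
        return Objects.requireNonNull(modelPath, "model_path must be set");
    }

    public static void main(String[] args) {
        // Hypothetical example values; none of them has to exist as a local file.
        String[] candidates = {
            "https://example.org/models/example.tagger",
            "models/example.tagger"
        };
        for (String candidate : candidates) {
            System.out.println("accepted: " + requireModelPath(candidate));
        }
    }
}
```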

Thank you for this great library and the work you have done so far!

Original issue reported on by [email protected] on 17 Jan 2015 at 1:56

Descriptors of Chinese text


I would like to process Chinese documents. They are .txt files without any 
tokenization or segmentation. Which reader and annotation descriptors should I 
use?
I have tested the FileSystemCollectionReader in the example project of UIMA, 
and ACETernReader, Eventi2014Reader, Tempeval2Reader and Tempeval3Reader in 
HeidelTime, with the StanfordPosTagger and HeidelTime annotation descriptors, 
but I cannot get the right result with any of these choices. The output 
contains no annotations at all.

When I copy the input text from the .txt file into the input dialogue of the 
HeidelTime online demo, it works. How does the demo generate the right result? 
Do I need to write a new reader for my input myself?

My HeidelTime version is: heideltime-kit 1.8.
My system is: Ubuntu 14.04.1

By the way, my input is a piece of Chinese news whose creation time is January 
29, 2014. The input is as follows:
        中新网1月29日电 综合马来西亚、新加坡等媒体消息,马来西亚民航局总监阿兹哈鲁丁于29日下午6时,通过国营电视台TV1针对MH370事件最新进展作出汇报。

The expected output is the same as that of the HeidelTime online demo, which 
annotates "1月29日", "今天", "目前", "下午3时30分".

But the actual output with heideltime-kit 1.8 is identical to the input, 
without any annotation.

Thank you very much!



Original issue reported on by [email protected] on 24 Apr 2015 at 4:02

