tuetschek / en-deep

MLProcess – a framework for batch parallel processing of various NLP tasks

TeX 13.37% Perl 0.94% Shell 0.27% Java 61.55% PostScript 23.87%

en-deep's Issues

Line counting in ScenarioParser is faulty

The reported line numbers do not reflect the actual position (i.e. the
currently parsed task).

Original issue reported on code.google.com by [email protected] on 23 Apr 2010 at 3:56

Usage of java.util.Vector and java.util.Hashtable

Reduce the usage of obsolete synchronized collections.

Original issue reported on code.google.com by [email protected] on 21 Jun 2010 at 3:16
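
As an illustration of the kind of change this issue asks for, here is a minimal sketch assuming Java 8 or later. The class, method, and variable names are hypothetical and do not come from the en-deep code base. It shows the usual replacements: ArrayList for Vector, HashMap for Hashtable, and ConcurrentHashMap where a map really is shared between threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; none of these names come from the en-deep code base.
final class CollectionMigrationSketch {

    // Before: Vector<String> names = new Vector<String>();
    // After: an unsynchronized list, declared against the List interface.
    static List<String> taskNames() {
        List<String> names = new ArrayList<>();
        names.add("task1");
        names.add("task2");
        return names;
    }

    // Before: Hashtable<String, Integer> counts = new Hashtable<String, Integer>();
    // After: HashMap when only one thread touches the map.
    static Map<String, Integer> singleThreadedCounts() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("task1", 1);
        return counts;
    }

    // ConcurrentHashMap when the map is shared between threads.
    static Map<String, Integer> sharedCounts() {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        counts.merge("task1", 1, Integer::sum);
        counts.merge("task1", 1, Integer::sum);
        return counts;
    }
}
```

Declaring fields and parameters as List or Map rather than as a concrete class keeps later swaps local to the place where the collection is created.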

Find a legal way to stop the running workers

There should be a "legal" way to stop the running workers, such as a special
file whose existence is checked upon each task retrieval.

Original issue reported on code.google.com by [email protected] on 9 Jul 2010 at 5:03
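
The stop-file idea from this issue could look roughly like the following sketch. The worker loop, the queue of Runnable tasks, and all names are assumptions made for illustration; the actual worker code in en-deep is not shown on this page.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;

// Hypothetical worker loop; class, method, and file names are illustrative only.
final class StopFileSketch {

    // Runs queued tasks until the queue is empty or the stop file appears.
    // Returns the number of tasks that were actually run.
    static int runWorker(Queue<Runnable> tasks, Path stopFile) {
        int processed = 0;
        while (!tasks.isEmpty()) {
            // Check for the stop file before each task retrieval.
            if (Files.exists(stopFile)) {
                break;
            }
            tasks.poll().run();
            processed++;
        }
        return processed;
    }
}
```

With this design a running task is never interrupted; the worker simply stops picking up new tasks once the file exists.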

StToArff should clear output files before filling new data

StToArff should clear all the output files before each run.

Original issue reported on code.google.com by [email protected] on 16 Apr 2010 at 11:42
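
One possible way to get the requested behavior is to delete every output file before the run starts. The sketch below assumes the output paths are known up front; the class and method names are hypothetical and are not part of StToArff.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; not part of the actual StToArff class.
final class ClearOutputsSketch {

    // Deletes each listed output file if it exists, so that a later run
    // starts from empty files instead of appending to stale data.
    static void clearOutputs(List<Path> outputs) throws IOException {
        for (Path output : outputs) {
            Files.deleteIfExists(output);
        }
    }
}
```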

Bugs in Task reset

The task reset feature should be reviewed; its behavior is sometimes strange
(for a prefix, it resets the whole plan).

Original issue reported on code.google.com by [email protected] on 9 Jul 2010 at 5:04

Partial rebuild of plan on reset tasks

If there is a change in the scenario file, it should be recognized upon
task reset (just for the tasks that are to be reset).

Original issue reported on code.google.com by [email protected] on 16 Apr 2010 at 10:09

Issue warnings on pattern collision

Issue warnings if there may be a pattern collision (one pattern is a
subpattern of another) in order to prevent problems in scenario reruns.

E.g.: file*.txt and file1*.txt exist in two different tasks, but file1*.txt
is produced later. If file1*.txt is already produced and the task that
produces file*.txt gets reset, the pattern expansion also includes
file1*.txt, and the results may get messy.

Original issue reported on code.google.com by [email protected] on 30 Apr 2010 at 12:22
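
A subpattern check of the kind described in this issue might be sketched as follows. It assumes that '*' is the only wildcard and that it matches any run of characters; the class and method names are hypothetical. The test is a heuristic: the narrower pattern is read as a plain string, and if the broader pattern still matches it, the broader pattern's wildcards can absorb whatever the narrower one expands to.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes '*' is the only wildcard in file patterns.
final class PatternCollisionSketch {

    // Translates a file pattern with '*' wildcards into a regular expression.
    static Pattern toRegex(String filePattern) {
        String[] literals = filePattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true when 'broad' matches 'narrow' read as a literal string,
    // which implies that 'broad' also matches every expansion of 'narrow'.
    static boolean covers(String broad, String narrow) {
        return toRegex(broad).matcher(narrow).matches();
    }

    // A warning would be issued when either pattern covers the other.
    static boolean mayCollide(String a, String b) {
        return covers(a, b) || covers(b, a);
    }
}
```

For the example in the issue, covers("file*.txt", "file1*.txt") holds while the reverse does not, so the pair would be flagged. Patterns that merely overlap without one containing the other, such as a* and *b, are not detected by this check.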

Add support for local temp-files

There should be a switch specifying a path prefix that is considered local,
e.g. /tmp. All tasks that have this prefix in their I/O specs and depend on
each other should then be lined up for computation on the same machine.

Original issue reported on code.google.com by [email protected] on 17 Jul 2010 at 8:33

Improve expansion performance

In the current setting, the expanded task is copied along with all the
dependencies of the original task, which are then removed. This increases
complexity and reduces performance once there are tens of thousands of tasks.

Original issue reported on code.google.com by [email protected] on 24 Jul 2010 at 5:48

UTF-8 characters broken

The program is not yet able to handle UTF-8 characters in the input ST files.

Original issue reported on code.google.com by [email protected] on 21 Aug 2010 at 11:38
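
The issue does not state the cause. One common source of broken non-ASCII input in Java is reading files with the platform default charset, so the sketch below assumes that this is the problem and decodes the file explicitly as UTF-8. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader; assumes the ST file is encoded in UTF-8.
final class Utf8ReadSketch {

    // Reads all lines of the file, decoding explicitly as UTF-8 instead of
    // relying on the platform default charset.
    static List<String> readLines(Path stFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(stFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The same charset would need to be passed when writing the output files, otherwise the characters can be damaged again on the way out.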

Parsing does not report error if two algorithm parameters are not separated by comma

If an input like this is provided:

params: lang_conf="st-en.conf", omit_semclass="1", predicted="1",
pred_only="1" generate="Children";

then the parser doesn't report an error, but the last parameter is not
recognized at all.

The parser should report an error.

Original issue reported on code.google.com by [email protected] on 16 Apr 2010 at 10:07
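
The stricter behavior requested here can be illustrated with a small stand-alone parser. This is a sketch only: it is not the ScenarioParser from the project, the names are hypothetical, and it assumes that the input is the text after "params:" and consists of name="value" pairs separated by commas and terminated by a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone parser; not the project's ScenarioParser.
final class ParamListSketch {

    // One name="value" pair with optional surrounding whitespace.
    private static final Pattern PAIR =
            Pattern.compile("\\s*(\\w+)\\s*=\\s*\"([^\"]*)\"\\s*");

    // Parses the parameter list and throws if a separator is missing.
    // Anything after the terminating ';' is ignored.
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        int pos = 0;
        while (true) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Expected name=\"value\" at offset " + pos);
            }
            String name = m.group(1);
            params.put(name, m.group(2));
            pos = m.end();
            if (pos >= text.length()) {
                throw new IllegalArgumentException(
                        "Missing ';' after parameter '" + name + "'");
            }
            char separator = text.charAt(pos);
            if (separator == ';') {
                return params;
            }
            if (separator != ',') {
                throw new IllegalArgumentException(
                        "Expected ',' or ';' after parameter '" + name
                                + "' at offset " + pos);
            }
            pos++; // skip the comma and continue with the next pair
        }
    }
}
```

With the input from the issue, the pair pred_only="1" is followed directly by generate instead of a comma, so this parser throws rather than silently dropping the last parameter.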

GreedyAttributeSearch + Incomplete rankings

GreedyAttributeSearch does not work well together with attribute rankings that 
do not contain all attributes.

Original issue reported on code.google.com by [email protected] on 31 Jul 2010 at 9:49

Add Children patterns without function words

Add new generated feature: children patterns without function words.

Original issue reported on code.google.com by [email protected] on 16 Apr 2010 at 10:49

Create a parse-only option for Process

There should be an option which, when selected, just parses the scenario file
and ends the whole program. It would be useful for checking the plan file for
errors before launching the process.

Original issue reported on code.google.com by [email protected] on 9 Jul 2010 at 5:49

Partial task patterns should be implemented

For a task that outputs a-**.txt, some of which are a-*-x.txt, there is no way 
to capture just the a-*-x.txt in the input of another task. 

There should be something like a-*|-x|.txt, which would depend on a task
producing a-**.txt, but take only a-*-x.txt as input.

Original issue reported on code.google.com by [email protected] on 7 Jul 2010 at 3:52
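
One possible reading of the proposed syntax is that the text between the bars narrows the input selection, while for dependency purposes it stands for a wildcard. The sketch below implements only that reading; the class and method names are hypothetical, and the issue does not define the syntax beyond its single example.

```java
// Hypothetical helper implementing one reading of the proposed |...| syntax.
final class PartialPatternSketch {

    // Pattern used to find the producing task: each |...| section becomes '*'.
    static String dependencyPattern(String spec) {
        return spec.replaceAll("\\|[^|]*\\|", "*");
    }

    // Pattern used to select the input files: the bars are dropped and the
    // text between them is kept as a literal part of the pattern.
    static String inputPattern(String spec) {
        return spec.replace("|", "");
    }
}
```

For the spec a-*|-x|.txt from the issue, dependencyPattern gives a-**.txt and inputPattern gives a-*-x.txt.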

Expansion -- prefixes and suffixes not working

In the current version, sub-specifications of prefixes and suffixes for the
expansions do not work unless they are at the beginning of the expansion
transitive line.

Original issue reported on code.google.com by [email protected] on 26 Jul 2010 at 4:45
