rug-compling / dact

Decaffeinated Alpino Corpus Tool

Home Page: https://rug-compling.github.io/dact/

License: GNU Lesser General Public License v2.1

Languages: C++ 89.20%, Python 0.23%, XSLT 6.48%, Objective-C++ 0.24%, NSIS 1.00%, Nix 1.34%, Meson 1.46%, Shell 0.05%

Topics: alpino, treebank, search, xpath

dact's People

matswillemsen evdmade01

dact's Issues

Zooming behavior for small trees

Dact uses zoom to fit when no highlight expression is used. This makes tree nodes enormous if the tree is small. For smaller trees, we should zoom a bit more cautiously.
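One cautious approach is to compute the usual zoom-to-fit scale but never enlarge the tree beyond its natural size. A minimal sketch, assuming a hypothetical helper `cautiousFitScale` whose result would then be applied with `QGraphicsView::scale()`; the names are illustrative, not Dact's code:

```cpp
#include <algorithm>

// Hypothetical helper: scale factor for "zoom to fit" that never
// enlarges the tree beyond maxScale (1.0 = natural size).
double cautiousFitScale(double viewWidth, double viewHeight,
                        double sceneWidth, double sceneHeight,
                        double maxScale = 1.0)
{
    if (sceneWidth <= 0.0 || sceneHeight <= 0.0)
        return maxScale;  // empty scene: nothing to fit

    // Plain zoom-to-fit: the largest scale at which the whole tree is visible.
    double fit = std::min(viewWidth / sceneWidth, viewHeight / sceneHeight);

    // Large trees still shrink to fit; small trees are not blown up.
    return std::min(fit, maxScale);
}
```

A small tree in a large view thus stays at 1.0, while a tree twice as wide as the view is scaled to 0.5.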

Ability to cancel query

Currently, queries can be cancelled by pressing the Escape key while the text field is focused. The problem is that the iterator blocks while searching for results, and there is no way to interrupt this block. As long as results are found frequently, this is not a problem, but when running a query that doesn't yield any results, there is no way to interrupt it.

I already tried running the query in a separate thread and killing the thread when I want to cancel a query. This sort of works, but I can't run a second query afterwards: the terminated thread still held locks on some internal dbxml resources, which of course were never released.

Apparently dbxml has an API to cancel a query, XmlQueryContext::interruptQuery() (I found this by searching through the source code of its shell utility). This would need to be implemented in AlpinoCorpus.
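The general idea behind such an interrupt is cooperative cancellation: the searching thread checks a flag and returns normally, so its locks are released, instead of being killed. Inside DB XML the search loop is not under Dact's control, which is why interruptQuery() would be needed there. A sketch of the pattern, with a hypothetical `QueryJob` class and a plain loop standing in for the real search:

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <vector>

// Hypothetical class illustrating cooperative cancellation; not part
// of AlpinoCorpus. The worker exits normally, so held locks are freed.
class QueryJob {
public:
    // Called from the UI thread (e.g. when Escape is pressed).
    void cancel() { d_cancelled.store(true); }

    // Runs in the worker thread; returns the matches found before
    // completion or cancellation.
    std::vector<std::string> run(std::vector<std::string> const &entries,
                                 std::function<bool(std::string const &)> matches)
    {
        std::vector<std::string> results;
        for (auto const &entry : entries) {
            // Checked for every candidate, so cancellation also works
            // when no result has been found yet.
            if (d_cancelled.load())
                break;
            if (matches(entry))
                results.push_back(entry);
        }
        return results;
    }

private:
    std::atomic<bool> d_cancelled{false};
};
```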

Some queries crash Dact

As I was working on finding a solution for #32, I found that //node[@rel='su']/@root/string() crashes Dact.

When run in the BracketedWindow, the query fails with: An attempt was made to perform an axis step when the Context Item was not a node [err:XPTY0020], :1:56. After that, when switching to the dependency tree and file list, an exception is thrown somewhere in a QtConcurrent thread. Apparently AlpinoCorpus throws an exception that is not caught by catching alpinocorpus::Error.
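One defensive option is to wrap the work done in the worker thread so that any exception, not only alpinocorpus::Error, is turned into an error message. A sketch with a hypothetical `runGuarded` function; which exception type AlpinoCorpus actually lets escape here is not known from this report:

```cpp
#include <exception>
#include <functional>
#include <string>

// Hypothetical wrapper: runs a query task and converts *any* exception
// into an error message, so nothing escapes the worker thread.
bool runGuarded(std::function<void()> const &task, std::string &error)
{
    try {
        task();
        return true;
    } catch (std::exception const &e) {
        // Library exceptions derived from std::exception.
        error = e.what();
    } catch (...) {
        // Anything that does not derive from std::exception.
        error = "unknown exception";
    }
    return false;
}
```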

Dact segfaults after opening two corpora consecutively

Opening a corpus with Dact works fine, but opening another (or the same) corpus causes a segfault in QListModel::data() when the QListWidget for files is redrawn. Since the Qt GUI library is not thread-safe, I think this is caused by calling addFiles() in readAndShowFiles(). If I move this call to a method that is called in the UI thread (e.g. corpusRead()), Dact does not crash.
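The general rule is that worker threads never touch widgets; they hand work to the UI thread instead. In Qt this is usually done with a queued signal/slot connection. The plain C++ class below (hypothetical, for illustration only) shows the same idea: workers post closures, and the UI thread runs them.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical illustration of marshalling work to the UI thread.
class UiTaskQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(d_mutex);
        d_tasks.push(std::move(task));
    }

    // Call only from the UI thread (e.g. from its event loop).
    void drain()
    {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(d_mutex);
            std::swap(pending, d_tasks);
        }
        // Run outside the lock so tasks may post further tasks.
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }

private:
    std::mutex d_mutex;
    std::queue<std::function<void()>> d_tasks;
};
```

A worker would post something like "add these files to the list widget", and the widget is only modified when the UI thread calls drain().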

Switching away from the Statistics tab is slow

When I switch between the three tabs, switching between Tree and Sentences is always quick, and switching to Statistics is also quite speedy. But when I switch from Statistics to one of the other two, it takes one or two seconds.

I don't yet know what causes it, and whether it only happens on my setup or on other configurations as well.

start dact with multiple .dact files

See e.g. /net/aistaff/vannoord/z/Alpino/Treebank/Machine/NLWIKI20110804/COMPACT/
I want to be able to run dact with 5 or 10 or 20 dact files.
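As a first step, the command line could accept any number of .dact files. A sketch that collects them, assuming a hypothetical `dactFilesFromArgs` and a simple filename-extension rule:

```cpp
#include <string>
#include <vector>

// Hypothetical: collect every argument that names a .dact file, so
// several corpora could be opened in one Dact session.
std::vector<std::string> dactFilesFromArgs(int argc, char const *const argv[])
{
    std::string const ext = ".dact";
    std::vector<std::string> files;
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        std::string arg = argv[i];
        if (arg.size() > ext.size() &&
            arg.compare(arg.size() - ext.size(), ext.size(), ext) == 0)
            files.push_back(arg);
    }
    return files;
}
```

How the resulting corpora would then be combined into one view is left open here.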

statistics tab: display number of hits for which relevant attribute is not defined

If you are counting the attribute "word", there may be hits of your query for which that attribute is not defined. It would be very useful to know the number of such hits.

For example, the query //node[@rel="hd"] also matches multi-word units, which go unnoticed if you only count the word attribute.

Other options could be to count combinations of attributes, or the attributes of all descendants of a node?
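Counting the missing case costs only one extra counter. A sketch, modelling each hit's attribute value as a `std::optional` where `std::nullopt` means "not defined"; the types and names are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: frequencies per value, plus the number of
// hits lacking the attribute (e.g. multi-word units without @word).
struct AttributeCounts {
    std::map<std::string, std::size_t> byValue;
    std::size_t undefined = 0;
};

AttributeCounts countAttribute(std::vector<std::optional<std::string>> const &hits)
{
    AttributeCounts counts;
    for (auto const &value : hits) {
        if (value)
            ++counts.byValue[*value];
        else
            ++counts.undefined;
    }
    return counts;
}
```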

Add functionality to show only matches

Comparable to the bracketed sentence window, but only show matches. Maybe we should integrate this with the bracketed sentences window and #3.

Suggested by Gertjan van Noord.

Copying multiple entries

Make it possible to copy multiple entries from the entry list.

Progress bars vs. lazy querying

It's impossible to get the size of a corpus subset when using DB XML's lazy query mechanism, since results are generated on-the-fly rather than as a complete set. I suggest replacing the progress bars with a simple counter window that counts up as results are retrieved.

(Using eager querying may take huge amounts of memory on the larger treebanks and defeats the purpose of the database compression that I just got working.)
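The counter can be driven directly by the lazy iteration: each retrieved result bumps a running count, and no total is needed. A sketch, where `next` is a hypothetical stand-in for the lazy result iterator that returns `std::nullopt` when exhausted:

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical: consume a lazy result stream and report a running
// count after every result, instead of a percentage of an unknown total.
std::size_t consumeLazily(std::function<std::optional<std::string>()> const &next,
                          std::function<void(std::size_t)> const &reportCount)
{
    std::size_t count = 0;
    while (next()) {
        ++count;
        reportCount(count);  // e.g. update an "N results so far" label
    }
    return count;
}
```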

Cache queries

Cache previous queries.
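A common choice for this is a least-recently-used (LRU) cache keyed by the query string. A sketch under assumptions: results are stored as a vector of entry names, the capacity is arbitrary, and a real cache would also have to be cleared when another corpus is opened. `QueryCache` is hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical LRU cache for query results, keyed by query string.
class QueryCache {
public:
    explicit QueryCache(std::size_t capacity) : d_capacity(capacity) {}

    // Returns the cached results, or nullptr on a miss.
    std::vector<std::string> const *find(std::string const &query)
    {
        auto it = d_index.find(query);
        if (it == d_index.end())
            return nullptr;
        // Move the entry to the front: it is now the most recently used.
        d_entries.splice(d_entries.begin(), d_entries, it->second);
        return &it->second->second;
    }

    void insert(std::string const &query, std::vector<std::string> results)
    {
        auto it = d_index.find(query);
        if (it != d_index.end()) {
            d_entries.erase(it->second);
            d_index.erase(it);
        }
        d_entries.emplace_front(query, std::move(results));
        d_index[query] = d_entries.begin();
        // Evict the least recently used entry when over capacity.
        if (d_entries.size() > d_capacity) {
            d_index.erase(d_entries.back().first);
            d_entries.pop_back();
        }
    }

private:
    using Entry = std::pair<std::string, std::vector<std::string>>;
    std::size_t d_capacity;
    std::list<Entry> d_entries;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> d_index;
};
```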

Incorrect matrix check

From DactTreeView.cpp:

if (matrix().m11() > 1.0 || matrix().m12() > 1.0)

Note that m12 is the vertical shearing factor, don't we want to look at m22 instead (vertical scaling factor)?
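In Qt's matrix naming, m11 and m22 are the horizontal and vertical scaling factors, while m12 and m21 are shearing factors, so the check would presumably compare m11 and m22. A sketch with a hypothetical free function; in DactTreeView.cpp the arguments would be matrix().m11() and matrix().m22():

```cpp
// Hypothetical helper: true if the view is scaled up in either
// direction. Only the scaling factors (m11, m22) are considered;
// the shearing factors (m12, m21) are ignored.
bool isZoomedIn(double m11, double m22)
{
    return m11 > 1.0 || m22 > 1.0;
}
```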

BracketedWindow shows empty entries

The KeywordInContext and Visibility delegates both use the hits counter (the second column) of the QueryModel to guess their size in sizeHint. However, not all hits are drawn.

For example: //node[@rel="su"] also matches moved nodes, but these nodes don't have @root or @word so they are not drawn.

Possible solution: alter the query entered by the user and add 'and @root' as a second condition, so that only hits that will be drawn are counted. The downside is that @root is an assumption which need not hold if the user has altered bracketed-sentence.xsl.

(edit: I don't think this will work. e.g. the NP node will be su, but it is a nonterminal. The terminals inside it won't match the query, but do need to be displayed.)

QFSFileEngine::open: No file name specified messages

Dact on Linux gives the following message when opening a directory-based corpus:

QFSFileEngine::open: No file name specified

Match highlighting in the main window

It would be nice if matches are also highlighted in the sentence edit in the main window.

Build breaks on Linux

I can't build on Linux after Jelmer's merge of master into dbxml. A file DactMainWindow is reported missing. I can't push modifications into GitHub either, since I have to "fast-forward" to the broken build (which I don't want to do).

Dact sometimes hangs on Linux

On Linux, I sometimes (not consistently) get the error:

QObject::setParent: Cannot set parent, new parent is in a different thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.

on stderr, with the program hanging. I've no idea what's causing this, since it doesn't always happen. It might just be the X11 config on RUG LWP Kubuntu boxes.

Add 'Copy' to the 'Edit' menu

Currently, entries can be copied using Cmd/Ctrl+C. Add an item to the 'Edit' menu as well. On Linux, make sure we also copy to the X11 clipboard.

Use of single quotes in queries

Using single quotes in queries doesn't work.

Enable the inspector by default

See title.

Qt is not 64-bit clean

Qt uses int instead of size_t for container sizes, so once corpus sizes reach the 2^31 (about 2.1 billion) barrier, things are going to break badly on Intel platforms, where int is 32 bits even in 64-bit builds.
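Until then, sizes could at least be converted with an explicit range check, so an oversized corpus produces an error instead of a silently wrapped (negative) count. A sketch with a hypothetical `toQtSize`:

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical: convert a 64-bit size to the int used for Qt container
// sizes and row counts. Values above INT_MAX (2^31 - 1) are rejected.
std::optional<int> toQtSize(std::size_t n)
{
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max()))
        return std::nullopt;
    return static_cast<int>(n);
}
```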

Corpus export causes segfaults

Corpus exports cause segfaults on Linux in the UI thread. Probably caused by operations on d_exportProgressDialog from a non-UI thread.

Crash when changing tabs with no corpus loaded

Apparently Dact tries to run a query when you switch to the Statistics or Graph tab, even when there is no corpus reader. Note that switching to the BracketedWindow tab does not cause a crash.

0   nl.rug.Dact                     0x0000000100019581 QueryModel::runQuery(QString const&) + 21
1   nl.rug.Dact                     0x0000000100035813 StatisticsWindow::startQuery() + 267
2   nl.rug.Dact                     0x000000010002217f MainWindow::tabChanged(int) + 345
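The crash occurs early in QueryModel::runQuery, which is consistent with a null corpus reader being dereferenced. A guard could refuse to query when no corpus is loaded. A sketch using simplified stand-in classes (`CorpusStub` and this `QueryModel` are hypothetical, not Dact's real classes):

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a corpus reader.
struct CorpusStub {
    std::string query(std::string const &q) const { return "results for " + q; }
};

class QueryModel {
public:
    explicit QueryModel(std::shared_ptr<CorpusStub> corpus) : d_corpus(std::move(corpus)) {}

    // Returns false instead of dereferencing a null reader.
    bool runQuery(std::string const &q)
    {
        if (!d_corpus)
            return false;  // no corpus loaded: nothing to query
        d_lastResult = d_corpus->query(q);
        return true;
    }

    std::string const &lastResult() const { return d_lastResult; }

private:
    std::shared_ptr<CorpusStub> d_corpus;
    std::string d_lastResult;
};
```

The same check could equally live in StatisticsWindow::startQuery(), before the model is asked to run anything.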

Dead links on Dact homepage

http://www.inl.nl/en/corpora/lassy-corpus and http://www.inl.nl/en/corpora/lassy-large-corpus, listed at http://rug-compling.github.com/dact/, both end in a 404. Maybe http://www.let.rug.nl/vannoord/Lassy/ is a better link?

Zoom in/out on tree using scrollwheel

The zoom buttons are on the top left of the window by default, but the tree view is on the right. Being able to zoom with Ctrl+scrollwheel would be a great feature.
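In a QGraphicsView subclass, wheelEvent() could check for the Control modifier and call scale(f, f). Qt reports wheel deltas in eighths of a degree (QWheelEvent::angleDelta() in Qt 5), with one notch usually being 120 units. A sketch of the factor computation, assuming a hypothetical helper and an arbitrary 1.25 zoom per notch:

```cpp
#include <cmath>

// Hypothetical: map a wheel delta (120 units per notch) to a zoom
// factor. Using a fixed ratio per notch keeps zooming symmetric: one
// notch in followed by one notch out restores the original scale.
double wheelZoomFactor(int angleDelta, double perNotch = 1.25)
{
    return std::pow(perNotch, angleDelta / 120.0);
}
```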

progress indicator for query processing

Some queries take a long time and/or have few hits. It would be extremely useful to know the progress of the query. A poor man's approach would be to use the list of all file names and the file name of the last match to estimate how far the query has progressed (this does not work if there are no hits at all, though).

Be more explicit on what is counted in the statistics widget

Currently, it is not clear to the user whether the statistics window counts the number of entries or the number of nodes for which the query matches.

Crash in BracketedWindow when no corpus is loaded

When you start Dact without loading a corpus and enter a query in the BracketedWindow, Dact crashes with a segfault. I think the problem lies deep inside FilterModel, which cannot handle null pointers at the moment.

updateTreeNodeButtons crashes on quit

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000001bb928498
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0                                   0x0000000100005f9c DactMainWindow::updateTreeNodeButtons() + 156
1                                   0x000000010003b97e DactMainWindow::qt_metacall(QMetaObject::Call, int, void**) + 1150
2   QtCore                          0x0000000100de1c1b QMetaObject::activate(QObject*, QMetaObject const*, int, void**) + 603
3   QtGui                           0x000000010073da88 QGraphicsScenePrivate::removeItemHelper(QGraphicsItem*) + 1128
4   QtGui                           0x000000010070dd4e QGraphicsItem::~QGraphicsItem() + 286
5                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
6   QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
7                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
8   QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
9                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
10  QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
11                                  0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
12  QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
13                                  0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
14  QtGui                           0x0000000100730ab5 QGraphicsScene::clear() + 85
15  QtGui                           0x0000000100730c18 QGraphicsScene::~QGraphicsScene() + 72
16                                  0x000000010003d498 DactTreeScene::~DactTreeScene() + 56
17                                  0x0000000100009b78 DactMainWindow::~DactMainWindow() + 168
18                                  0x000000010003b0cc DactApplication::~DactApplication() + 44
19                                  0x000000010003a9a2 main + 978
20                                  0x0000000100003c88 start + 52

Dact crashes on addFiles when trying to open a nonexistent file

I just noticed this when I tried to load a renamed (and therefore no longer existing) dbxml file from the command line: Dact would no longer start up and crashed with an EXC_BAD_ACCESS signal.
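A simple safeguard is to check that the path exists before handing it to the corpus reader, and to show an error message otherwise. A sketch with a hypothetical `canOpenCorpus`, using C++17 `std::filesystem`; the proper fix may instead belong in the reader itself:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical pre-check: report a missing corpus instead of crashing.
bool canOpenCorpus(std::string const &path, std::string &error)
{
    std::error_code ec;  // non-throwing overload of exists()
    if (!std::filesystem::exists(path, ec)) {
        error = "Corpus not found: " + path;
        return false;
    }
    return true;
}
```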

Dact does not display words in nodes

Here at U. Amsterdam, Dact seems not to display words in nodes, while in Groningen it did. This makes interpreting the tree quite hard. I didn't change the code in any way. Screenshot:

only show "inspector" if tree-tab is displayed

macros in a separate file that I can edit with my favorite editor

To use macros effectively, I'd much rather be able to edit them with Emacs. Also, there could then be different sets of macros (in different files) depending on my application (or corpus). So macros would not be part of the settings; instead, there would be a way to load a file containing macros (from the command line as well as from the graphical interface).
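Reading such a file is straightforward. A sketch assuming a made-up format of one `name = expansion` per line, with blank lines and `#` comments ignored; Dact's own macro syntax may differ, and `readMacros` is hypothetical:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical macro file reader for an assumed "name = expansion" format.
std::map<std::string, std::string> readMacros(std::istream &in)
{
    auto trim = [](std::string s) {
        auto const ws = " \t\r";
        s.erase(0, s.find_first_not_of(ws));
        s.erase(s.find_last_not_of(ws) + 1);
        return s;
    };

    std::map<std::string, std::string> macros;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#')
            continue;  // blank line or comment
        auto eq = line.find('=');
        if (eq == std::string::npos)
            continue;  // not a macro definition
        // Split at the first '='; the expansion may contain more '=' signs.
        macros[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return macros;
}
```

Different macro sets would then simply be different files, e.g. passed on the command line or opened from a menu.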

Export subset of corpus

Add some sort of export functionality that can be used to create a subset of the loaded corpus, e.g. the results after a filter query.

Make previous/next node context-sensitive

The next/previous nodes buttons could be disabled if there is no highlighted node.

Change file extension for DB XML?

We're currently using .dbxml for Berkeley DB-based treebanks, but shouldn't we switch to (say) .alpino to prevent confusion, esp. if we're going to associate Dact with the extension?

Pressing enter in the highlight query field does nothing

For some reason d_ui->fileListWidget->selectionModel()->selectedRows().size(); in DactMainWindow::highlightChanged is always zero.

Node inspector

It would be useful to have an inspector for tree nodes. When selecting a node from a tree, this inspector would simply show a table with attributes and values.

Improve visibility of matches in the bracketed sentences overview

Feature request: improve the visibility of matches in the bracketed sentences overview.

Reported by Gosse Bouma.

dact filetype icon

I don't know who made the original Dact coffee cup icon, but it would be nice if we also had an icon for the .dact databases and maybe even for the .data.dz files*

It could be something generic like the coffee cup painted on the generic document icon, or something more creative like a pack of coffee with the coffee cup as a logo in it. Its purpose is to indicate that double-clicking the file will open it with Dact.

*.data.dz files are a bit problematic since .dz is the extension seen by most operating systems. Maybe it is too generic to be associated with Dact.

Sentence field should show the complete sentence

The sentence is now only partially shown, and there are no scrollbars. We need to find a more creative solution. One proposal is to draw the sentence in the tree view, under the relevant nodes.

save results for each of the tabs

From the Tree tab, I can save the results using cut-and-paste. The same holds for the Statistics tab. However, I cannot select multiple lines in the Sentences tab, so there is currently no way to save or export these results to a text file.

For each of the three tabs, it might be useful to add an explicit "export" function for the selected lines?

Matching nodes without the given attribute in the statistics widget

The statistics widget now only counts cases where (1) the query matches and (2) the selected attribute exists. Should we provide counts of cases where only (1) is true?

Consider performance and presentation before implementing.

Crash when changing filter query

When you enter a filter query, select a result (play with the tree and show the inspector etc) and then enter a new filter query which does not match your previously selected result, Dact crashes.

0   libSystem.B.dylib               0x00007fff885085d6 __kill + 10
1   libSystem.B.dylib               0x00007fff885a8cd6 abort + 83
2   QtCore                          0x0000000100f69455 qt_message_output(QtMsgType, char const*) + 117
3   QtCore                          0x0000000100f69637 qt_message(QtMsgType, char const*, __va_list_tag*) + 183
4   QtCore                          0x0000000100f697fa qFatal(char const*, ...) + 170
5                                   0x0000000100017ac2 QList::at(int) const + 66 (qlist.h:439)
6                                   0x000000010002ec2b MainWindow::entrySelected(QItemSelection const&, QItemSelection const&) + 101 (MainWindow.cpp:314)
7                                   0x000000010002ed54 MainWindow::setHighlight(QString const&) + 128 (MainWindow.cpp:826)
8                                   0x000000010002efb5 MainWindow::filterChanged() + 395 (MainWindow.cpp:446)
9                                   0x00000001000551f5 MainWindow::qt_metacall(QMetaObject::Call, int, void**) + 843 
...

ASSERT failure in QList::at: "index out of range", file /Developer/SDKs/MacOSX10.6.sdk/Library/Frameworks/QtCore.framework/Headers/qlist.h, line 439
Abort trap

(version 1bb9e1b)
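The trace shows QList::at() being called from MainWindow::entrySelected() with an index that no longer exists after the filter changed. One way to avoid the assertion is to validate the stored selection before the lookup. A sketch with a hypothetical `selectedEntry`, using `std::vector` in place of QList:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical guarded lookup: a stale row index yields "no selection"
// instead of an out-of-range access.
std::optional<std::string> selectedEntry(std::vector<std::string> const &entries,
                                         int row)
{
    if (row < 0 || static_cast<std::size_t>(row) >= entries.size())
        return std::nullopt;  // selection no longer valid after filtering
    return entries[row];
}
```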

Statistics window blocks when searching

Quoting Jelmer: "It looks like my last commit, which adds the macros, somehow locks up the UI thread while the mapper is working.

Removing the insertRow() and updateResultsPercentages() calls from the DactQueryWindow::attributeFound() method removes the effect. Maybe we could insert results in batches so these calls are made less often."
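Batching could look like this: found values are buffered and handed over in groups, so the expensive model updates (the insertRow() and updateResultsPercentages() work) happen once per batch. A sketch; `ResultBatcher`, the callback and the batch size are assumptions, and a time-based flush would be an alternative:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batcher: buffers results and calls flush() per batch.
class ResultBatcher {
public:
    using Flush = std::function<void(std::vector<std::string> const &)>;

    ResultBatcher(std::size_t batchSize, Flush flush)
        : d_batchSize(batchSize), d_flush(std::move(flush)) {}

    void add(std::string value)
    {
        d_pending.push_back(std::move(value));
        if (d_pending.size() >= d_batchSize)
            flush();
    }

    // Also call once the query has finished, to deliver the remainder.
    void flush()
    {
        if (d_pending.empty())
            return;
        d_flush(d_pending);
        d_pending.clear();
    }

private:
    std::size_t d_batchSize;
    Flush d_flush;
    std::vector<std::string> d_pending;
};
```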

Document use of number() for @begin and @end

When using the begin/end attributes, it is often necessary to explicitly convert these attributes to numbers. Document this in the manual. Inspiration is available from:

http://www.let.rug.nl/vannoord/Lassy/sa-man_lassy.pdf
http://www.let.rug.nl/Presentations/Gent11/Update/post.pdf
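The underlying pitfall: attribute values are untyped, and in XPath 2.0 / XQuery 1.0 a general comparison between two untyped values compares them as strings. As strings, "10" sorts before "9", so @begin < @end can give the wrong answer unless both sides are wrapped in number(). A small C++ illustration of the difference (the functions are for illustration only):

```cpp
#include <string>

// String comparison: what happens without number().
bool lessAsStrings(std::string const &a, std::string const &b) { return a < b; }

// Numeric comparison: roughly what number(@begin) < number(@end) does.
bool lessAsNumbers(std::string const &a, std::string const &b)
{
    return std::stod(a) < std::stod(b);
}
```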

Keywords in context

Feature request: besides the bracketed sentences, a keyword/match in context (KWIC) overview would be nice.

Suggestion courtesy of Gosse Bouma.
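A KWIC line shows the matched words plus a fixed number of context words on each side. A sketch using a half-open token range [matchBegin, matchEnd), in the same way Alpino's @begin/@end delimit a node's words; `kwicLine` and the [ ] markers are hypothetical, since Dact's bracketed sentences are produced by an XSL stylesheet:

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical KWIC formatter: match in [ ], with up to `context`
// tokens of surrounding text on each side.
std::string kwicLine(std::vector<std::string> const &tokens,
                     std::size_t matchBegin, std::size_t matchEnd,
                     std::size_t context)
{
    std::size_t from = matchBegin > context ? matchBegin - context : 0;
    std::size_t to = std::min(tokens.size(), matchEnd + context);
    std::ostringstream out;
    for (std::size_t i = from; i < to; ++i) {
        if (i > from) out << ' ';
        if (i == matchBegin) out << '[';
        out << tokens[i];
        if (i + 1 == matchEnd) out << ']';
    }
    return out.str();
}
```

For the tokens "Dit is een lange zin over Dact", a match on tokens 2 to 4 with one word of context yields "is [een lange] zin".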

Add an application font configuration option

Add a preferences window that allows the user to change the application font and size.

Suggested by Gertjan van Noord.

Note: this is done, but does not work correctly on OS X.

Statistics window results are unsorted

Using QSortFilterProxyModel is not a real option since it's painfully slow. I think we need to implement sorting in QueryModel itself. It could perform well if we just keep our internal results sorted. I do not yet know how to combine this with the dataChanged signal when we need to insert something at the top of the list.
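If results are kept sorted by frequency, an increment only ever moves one entry upwards past entries with a lower count. The old and new row could then be reported as a row move (for instance via QAbstractItemModel::beginMoveRows()) rather than a dataChanged signal. A sketch with a hypothetical `SortedCounts` class, using a linear search for brevity:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: statistics rows kept sorted by count, descending.
class SortedCounts {
public:
    // Returns {oldRow, newRow}; for a new value, oldRow is the row
    // where it was appended.
    std::pair<std::size_t, std::size_t> increment(std::string const &value)
    {
        std::size_t row = 0;
        while (row < d_rows.size() && d_rows[row].first != value)
            ++row;
        if (row == d_rows.size())
            d_rows.emplace_back(value, 0);  // new values start at the bottom
        std::size_t const oldRow = row;
        ++d_rows[row].second;
        // Bubble up while the entry above has a lower count.
        while (row > 0 && d_rows[row - 1].second < d_rows[row].second) {
            std::swap(d_rows[row - 1], d_rows[row]);
            --row;
        }
        return {oldRow, row};
    }

    std::vector<std::pair<std::string, std::size_t>> const &rows() const { return d_rows; }

private:
    std::vector<std::pair<std::string, std::size_t>> d_rows;
};
```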

Opening a directory corpus fails

Backtrace:

#1  0x00000001011598ca in alpinocorpus::DirectoryCorpusReader::DirIter::equals (this=0x11aa173e0, other=0x0) at /Users/daniel/git/alpinocorpus/src/DirectoryCorpusReader.cpp:62
62          DirIter const &that = dynamic_cast<DirIter const &>(*other);

I used master branches of both alpinocorpus and Dact.
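The backtrace shows equals() being called with other=0x0, and *other then dereferences a null pointer before the reference dynamic_cast can do anything. The pointer form of dynamic_cast yields nullptr for a null pointer or a different iterator type, which can be checked. A sketch with simplified, hypothetical stand-ins for the alpinocorpus classes; treating null as "not equal" is an assumption, and the real fix may be to avoid passing null in the first place:

```cpp
// Hypothetical, simplified stand-ins for the alpinocorpus iterator classes.
struct IterImpl {
    virtual ~IterImpl() = default;
    virtual bool equals(IterImpl const *other) const = 0;
};

struct DirIter : IterImpl {
    explicit DirIter(int position) : d_position(position) {}

    bool equals(IterImpl const *other) const override
    {
        // Pointer cast: nullptr if other is null or not a DirIter.
        auto that = dynamic_cast<DirIter const *>(other);
        if (that == nullptr)
            return false;
        return d_position == that->d_position;
    }

    int d_position;
};
```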