
rug-compling / dact


Decaffeinated Alpino Corpus Tool

Home Page: https://rug-compling.github.io/dact/

License: GNU Lesser General Public License v2.1

C++ 89.20% Python 0.23% XSLT 6.48% Objective-C++ 0.24% NSIS 1.00% Nix 1.34% Meson 1.46% Shell 0.05%
alpino treebank search xpath

dact's People

Contributors

danieldk, jelmervdl, larsmans, pebbe


dact's Issues

Zooming behavior for small trees

Dact uses zoom to fit when no highlight expression is used. This makes tree nodes enormous if the tree is small. For smaller trees, we should zoom a bit more cautiously.
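
One possible way to zoom more cautiously is to cap the zoom-to-fit factor. The following is a minimal sketch, not Dact's actual code: the function name `cautiousFitScale` and the `maxScale` parameter are hypothetical, and sizes are plain doubles instead of Qt rectangles.

```cpp
#include <algorithm>
#include <cassert>

// Scale factor that fits a scene of the given size into a viewport, but never
// magnifies beyond maxScale (hypothetical cap; 1.0 means natural size).
double cautiousFitScale(double viewWidth, double viewHeight,
                        double sceneWidth, double sceneHeight,
                        double maxScale = 1.0)
{
    assert(sceneWidth > 0.0 && sceneHeight > 0.0);

    // Plain zoom-to-fit: the largest uniform scale at which the whole scene is visible.
    double fit = std::min(viewWidth / sceneWidth, viewHeight / sceneHeight);

    // Large trees are still shrunk to fit; small trees stop growing at maxScale.
    return std::min(fit, maxScale);
}
```

With a cap of 1.0 a small tree is shown at its natural size, while a tree larger than the viewport is scaled down exactly as before.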

Ability to cancel query

Currently, queries can be cancelled by pressing the escape key while the text field is focused. The problem is that the iterator blocks while searching for results, and there is no way to interrupt this block. As long as results are found frequently, this is not a problem, but when running a query that doesn't yield any results, there is no way to interrupt it.

I already tried running the query in a separate thread and killing the thread when I want to cancel a query. This works, sort of, but I can't run a second query: when the previous thread was terminated, it still owned the lock on some internal dbxml resources, which of course it never unlocked.

Apparently dbxml has an API to cancel the query, XmlQueryContext::interruptQuery() (I found this by searching through the source code of their shell utility). This would need to be implemented in AlpinoCorpus.
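
To illustrate why a cooperative interrupt avoids the stale-lock problem, here is a minimal sketch using only the standard library. `CancellableQuery` and its members are hypothetical stand-ins; this is not the AlpinoCorpus or DB XML API, and the "query" is just a scan over a vector of flags.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a corpus query; not the AlpinoCorpus or DB XML API.
class CancellableQuery
{
public:
    // May be called from any thread, e.g. the UI thread when Escape is pressed.
    void interrupt() { d_cancelled.store(true); }

    // Clears the cancellation flag before a new query is started.
    void reset() { d_cancelled.store(false); }

    // Returns the indices of matching entries. The flag is checked on every
    // iteration, so a query without any hits can still be stopped promptly.
    std::vector<std::size_t> run(std::vector<bool> const &matches)
    {
        // Released on every exit path, including early cancellation.
        std::lock_guard<std::mutex> guard(d_resourceLock);

        std::vector<std::size_t> hits;
        for (std::size_t i = 0; i < matches.size(); ++i) {
            if (d_cancelled.load())
                break; // leave normally instead of being killed
            if (matches[i])
                hits.push_back(i);
        }
        return hits;
    }

private:
    std::atomic<bool> d_cancelled{false};
    std::mutex d_resourceLock;
};

// Usage sketch: a cancelled run returns early and a second run still works.
inline void cancelThenRerun()
{
    CancellableQuery query;
    std::vector<bool> const matches{true, false, true};

    query.interrupt();
    assert(query.run(matches).empty()); // cancelled before the first entry

    query.reset();
    assert(query.run(matches).size() == 2); // lock was released, no deadlock
}
```

Because the worker leaves `run()` through a normal return, the lock guard unlocks the mutex, which is exactly what a forcibly terminated thread cannot do.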

Some queries crash Dact

As I was working on finding a solution for #32, I found that //node[@rel='su']/@root/string() crashes Dact.

The error, when run in the BracketedWindow, is: "An attempt was made to perform an axis step when the Context Item was not a node [err:XPTY0020], :1:56". After that, when switching to the dependency tree and file list, an exception is thrown somewhere in a QtConcurrent thread. Apparently AlpinoCorpus throws an exception that is not caught by catching alpinocorpus::Error.
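
A defensive pattern for the worker side could look like the sketch below. It assumes that the stray exception derives from std::exception (the final catch-all covers the case where it does not). `CorpusError` is a hypothetical stand-in for alpinocorpus::Error, and `runGuarded` is an illustrative helper, not existing Dact code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for alpinocorpus::Error.
struct CorpusError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Runs a query and converts any exception into a message, so that nothing
// escapes from a worker thread. Returns an empty string on success.
std::string runGuarded(std::function<void()> const &query)
{
    assert(static_cast<bool>(query)); // a callable must be supplied

    try {
        query();
        return std::string();
    } catch (CorpusError const &e) {
        return std::string("corpus error: ") + e.what();
    } catch (std::exception const &e) {
        // E.g. an error type of an underlying library that was not wrapped.
        return std::string("query error: ") + e.what();
    } catch (...) {
        return "unknown query error";
    }
}
```

The returned message can then be handed to the UI thread and shown to the user instead of crashing.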

Dact segfaults after opening two corpora consecutively

Opening a corpus with Dact works fine, but opening another (or the same) corpus causes a segfault in QListModel::data() when redrawing the QListWidget for files. Since the Qt GUI classes are not thread-safe, I think this is caused by calling addFiles() in readAndShowFiles(), which runs outside the UI thread. If I move this call to a method that is called in the UI thread (e.g. corpusRead()), Dact does not crash.
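
The general idea (workers only hand over work, the UI thread applies it) can be sketched without Qt as follows. `UiTaskQueue` is a hypothetical illustration; in Qt itself the same effect is normally achieved with queued signal/slot connections rather than a hand-written queue.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical task queue: any thread may post, only the owning ("UI") thread
// executes the tasks. Construct it on the UI thread.
class UiTaskQueue
{
public:
    // Safe to call from worker threads.
    void post(std::function<void()> task)
    {
        std::lock_guard<std::mutex> guard(d_mutex);
        d_tasks.push(std::move(task));
    }

    // Must be called on the thread that created the queue.
    void drain()
    {
        assert(std::this_thread::get_id() == d_uiThread);

        // Take the pending tasks out under the lock, run them without it.
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> guard(d_mutex);
            std::swap(pending, d_tasks);
        }
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
    }

private:
    std::thread::id d_uiThread = std::this_thread::get_id();
    std::mutex d_mutex;
    std::queue<std::function<void()>> d_tasks;
};
```

A reader thread would `post()` a task that adds the files, and the widget is only touched when the UI thread calls `drain()`.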

Switching away from the Statistics tab is slow

When I switch between the three tabs, switching between Tree and Sentences is always quick, and switching to Statistics is also quite speedy. But when I switch from Statistics to one of the other two, it takes one or two seconds.

I don't yet know what causes it, and whether it only happens on my setup or on other configurations as well.

statistics tab: display number of hits for which relevant attribute is not defined

If you are counting the attribute "word", there may be hits of your query for which that attribute is not defined. It would be very useful to know the number of such hits.

E.g., the query //node[@rel="hd"] will also match multi-word units, which will go unnoticed if you only count the word attribute.

Other options could be to count combinations of attributes, or the attributes of all of the descendants of a node.
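
A sketch of the requested counting, under the assumption that each hit can be viewed as a map from attribute names to values. `Node`, `AttributeCounts` and `countAttribute` are hypothetical names, not part of Dact.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of a matching node: attribute name -> value.
using Node = std::map<std::string, std::string>;

struct AttributeCounts
{
    std::map<std::string, std::size_t> byValue; // value -> number of hits
    std::size_t undefined = 0;                  // hits lacking the attribute
};

// Counts the values of one attribute over all hits and separately records
// how many hits do not define that attribute at all.
AttributeCounts countAttribute(std::vector<Node> const &hits,
                               std::string const &attribute)
{
    AttributeCounts counts;
    for (Node const &hit : hits) {
        Node::const_iterator found = hit.find(attribute);
        if (found == hit.end())
            ++counts.undefined;
        else
            ++counts.byValue[found->second];
    }

    // Every hit is accounted for exactly once.
    std::size_t total = counts.undefined;
    for (auto const &entry : counts.byValue)
        total += entry.second;
    assert(total == hits.size());

    return counts;
}
```

The `undefined` count could be shown as an extra row in the statistics table, so that hits such as multi-word units no longer go unnoticed.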

Add functionality to show only matches

Comparable to the bracketed sentence window, but only show matches. Maybe we should integrate this with the bracketed sentences window and #3.

Suggested by Gertjan van Noord.

Progress bars vs. lazy querying

It's impossible to get the size of a corpus subset when using DB XML's lazy query mechanism, since results are generated on-the-fly rather than as a complete set. I suggest replacing the progress bars with a simple counter window that counts up as results are retrieved.

(Using eager querying may take huge amounts of memory on the larger treebanks and defeats the purpose of the database compression that I just got working.)

Incorrect matrix check

From DactTreeView.cpp:

if (matrix().m11() > 1.0 || matrix().m12() > 1.0)

Note that m12 is the vertical shearing factor. Don't we want to look at m22 (the vertical scaling factor) instead?
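
If the intent of the check is "the view is scaled up in either direction", it could be written as in the sketch below. `Matrix2D` is a hypothetical stand-in for the scaling and shearing elements of the Qt matrix (same element names), and `isZoomedIn` is an illustrative name.

```cpp
#include <cassert>

// Stand-in for the 2x2 part of a transformation matrix, using Qt's names:
// m11/m22 are horizontal/vertical scaling, m12/m21 are shearing factors.
struct Matrix2D
{
    double m11, m12, m21, m22;
};

// Builds a pure scaling matrix (no shearing).
Matrix2D scaled(double sx, double sy)
{
    assert(sx > 0.0 && sy > 0.0);
    return Matrix2D{sx, 0.0, 0.0, sy};
}

// True when the view is magnified horizontally or vertically.
bool isZoomedIn(Matrix2D const &m)
{
    return m.m11 > 1.0 || m.m22 > 1.0;
}
```

With the original condition, a view scaled only vertically (m22 > 1.0, m12 == 0.0) would not be detected.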

BracketedWindow shows empty entries

The KeywordInContext and Visibility delegates both use the hits counter (the second column) of the QueryModel to guess their size for sizeHint. However, not all hits are drawn.

For example: //node[@rel="su"] also matches moved nodes, but these nodes don't have @root or @word so they are not drawn.

Possible solution: alter the query entered by the user and add 'and @root' as a second condition. This way only hits that will be drawn are counted. The downside is that relying on @root is an assumption that need not hold if the user has altered bracketed-sentence.xsl.

(Edit: I don't think this will work. E.g., the NP node will be su, but it is a nonterminal. The terminals inside it won't match the query, but do need to be displayed.)

Build breaks on Linux

I can't build on Linux after Jelmer's merge of master into dbxml. A file DactMainWindow is reported missing. I can't push modifications into GitHub either, since I have to "fast-forward" to the broken build (which I don't want to do).

Dact sometimes hangs on Linux

On Linux, I sometimes (not consistently) get the error:

QObject::setParent: Cannot set parent, new parent is in a different thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.

on stderr, with the program hanging. I've no idea what's causing this, since it doesn't always happen. It might just be the X11 config on RUG LWP Kubuntu boxes.

Add 'Copy' to the 'Edit' menu

Currently, entries can be copied using Cmd/Ctrl+C. Add an item to the 'Edit' menu as well. On Linux, make sure we copy to the X11 clipboard as well.

Qt is not 64-bit clean

Qt uses int instead of size_t for container sizes, so once we hit the 2^31 (about 2.1 billion) barrier for corpus sizes, things are going to break badly on Intel platforms.
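
Until then, a guard at the boundary between size_t-based code and int-based containers would at least make the overflow explicit. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// True when a size or index can be represented by the int used by the
// container API.
bool fitsInInt(std::size_t n)
{
    return n <= static_cast<std::size_t>(INT_MAX);
}

// Converts a size_t index to int, asserting instead of silently wrapping
// around to a negative value.
int toContainerIndex(std::size_t n)
{
    assert(fitsInInt(n));
    return static_cast<int>(n);
}
```

Callers that may legitimately exceed the limit should test `fitsInInt()` first and report an error rather than rely on the assertion.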

Corpus export causes segfaults

Corpus exports cause segfaults on Linux in the UI thread. Probably caused by operations on d_exportProgressDialog from a non-UI thread.

Crash when changing tabs with no corpus loaded

Apparently Dact tries to run a query when you switch to the statistics or graph tab, even when there is no corpus reader. Note that switching to the BracketedWindow tab does not cause a crash.

0   nl.rug.Dact                     0x0000000100019581 QueryModel::runQuery(QString const&) + 21
1   nl.rug.Dact                     0x0000000100035813 StatisticsWindow::startQuery() + 267
2   nl.rug.Dact                     0x000000010002217f MainWindow::tabChanged(int) + 345

Zoom in/out on tree using scrollwheel

The zoom buttons are on the top left of the window by default, but the tree view is on the right. Being able to zoom with Ctrl+scrollwheel would be a great feature.

progress indicator for query processing

Some queries take a long time and/or have few hits. It would be extremely useful to know the progress of the query. One poor man's way would be to use the list of all file names, and to use the file name of the last match to guess how far along the query is (this does not work if there are no hits at all, though).
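
The poor man's estimate could be sketched as follows, assuming the list of file names is in the same order in which the query visits the files. `estimateProgress` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Estimates query progress as a fraction in [0, 1] from the position of the
// last matching file in the list of all files. Assumes the list has the same
// order as the query traversal.
double estimateProgress(std::vector<std::string> const &files,
                        std::string const &lastMatch)
{
    assert(!files.empty());

    std::vector<std::string>::const_iterator pos =
        std::find(files.begin(), files.end(), lastMatch);
    if (pos == files.end())
        return 0.0; // no hit yet (or unknown file): nothing to base a guess on

    return static_cast<double>(pos - files.begin() + 1) /
           static_cast<double>(files.size());
}
```

The linear search is only for illustration; for large corpora a precomputed map from file name to index would be the obvious refinement. As noted above, the estimate stays at zero for queries without hits.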

Crash in BracketedWindow when no corpus is loaded

When you start Dact clean without loading a corpus and enter a query in the BracketedWindow, Dact crashes with a segfault. I think the cause lies deep inside FilterModel, which can't handle null pointers at the moment.

updateTreenodeButtons crashes on quit

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000001bb928498
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0                                   0x0000000100005f9c DactMainWindow::updateTreeNodeButtons() + 156
1                                   0x000000010003b97e DactMainWindow::qt_metacall(QMetaObject::Call, int, void**) + 1150
2   QtCore                          0x0000000100de1c1b QMetaObject::activate(QObject*, QMetaObject const*, int, void**) + 603
3   QtGui                           0x000000010073da88 QGraphicsScenePrivate::removeItemHelper(QGraphicsItem*) + 1128
4   QtGui                           0x000000010070dd4e QGraphicsItem::~QGraphicsItem() + 286
5                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
6   QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
7                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
8   QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
9                                   0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
10  QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
11                                  0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
12  QtGui                           0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
13                                  0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
14  QtGui                           0x0000000100730ab5 QGraphicsScene::clear() + 85
15  QtGui                           0x0000000100730c18 QGraphicsScene::~QGraphicsScene() + 72
16                                  0x000000010003d498 DactTreeScene::~DactTreeScene() + 56
17                                  0x0000000100009b78 DactMainWindow::~DactMainWindow() + 168
18                                  0x000000010003b0cc DactApplication::~DactApplication() + 44
19                                  0x000000010003a9a2 main + 978
20                                  0x0000000100003c88 start + 52

Dact does not display words in nodes

Here at U. Amsterdam, Dact seems not to display words in nodes, while in Groningen it did. This makes interpreting the tree quite hard. I didn't change the code in any way. Screenshot:

screenshot

macros in separate file that I can edit with my favorite editor

To use macros effectively, I'd much rather be able to edit them with Emacs. Also, there could then be different sets of macros (in different files) depending on my application (or corpus). So macros would not be part of the settings, but there would be a way to load a file which contains macros (from the command line as well as from the graphical interface).

Export subset of corpus

Add some sort of export functionality that can be used to create a subset of the loaded corpus, e.g. the results after a filter query.

Change file extension for DB XML?

We're currently using .dbxml for Berkeley DB-based treebanks, but shouldn't we switch to (say) .alpino to prevent confusion, esp. if we're going to associate Dact with the extension?

Node inspector

It would be useful to have an inspector for tree nodes. When selecting a node from a tree, this inspector would simply show a table with attributes and values.

dact filetype icon

I don't know who made the original Dact coffee cup icon, but it would be nice if we also had an icon for the .dact databases and maybe even for the .data.dz files*

It could be something generic like the coffee cup painted on the generic document icon, or something more creative like a pack of coffee with the coffee cup as a logo in it. Its purpose is to indicate that double-clicking the file will open it with Dact.

*.data.dz files are a bit problematic since .dz is the extension seen by most operating systems. Maybe it is too generic to be associated with Dact.

Sentence field should show the complete sentence

The sentence is now only partially shown, and there are no scrollbars. We need to find a more creative solution. One proposal is to draw the sentence in the tree view, under the relevant nodes.

save results for each of the tabs

From the tree-tab, I can save the results using cut-and-paste. Same for the statistics-tab. However, I cannot select multiple lines in the Sentences-tab, so there currently is no way to save/export these results into a text file.

For each of the three tabs, it might be useful to add an explicit "export" function for the selected lines.

Crash when changing filter query

When you enter a filter query, select a result (play with the tree and show the inspector etc) and then enter a new filter query which does not match your previously selected result, Dact crashes.

0   libSystem.B.dylib               0x00007fff885085d6 __kill + 10
1   libSystem.B.dylib               0x00007fff885a8cd6 abort + 83
2   QtCore                          0x0000000100f69455 qt_message_output(QtMsgType, char const*) + 117
3   QtCore                          0x0000000100f69637 qt_message(QtMsgType, char const*, __va_list_tag*) + 183
4   QtCore                          0x0000000100f697fa qFatal(char const*, ...) + 170
5                                   0x0000000100017ac2 QList::at(int) const + 66 (qlist.h:439)
6                                   0x000000010002ec2b MainWindow::entrySelected(QItemSelection const&, QItemSelection const&) + 101 (MainWindow.cpp:314)
7                                   0x000000010002ed54 MainWindow::setHighlight(QString const&) + 128 (MainWindow.cpp:826)
8                                   0x000000010002efb5 MainWindow::filterChanged() + 395 (MainWindow.cpp:446)
9                                   0x00000001000551f5 MainWindow::qt_metacall(QMetaObject::Call, int, void**) + 843 
...
ASSERT failure in QList::at: "index out of range", file /Developer/SDKs/MacOSX10.6.sdk/Library/Frameworks/QtCore.framework/Headers/qlist.h, line 439
Abort trap

(version 1bb9e1b)
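
The assertion in the trace fires because a remembered row index is used after the result list has changed. A minimal sketch of re-validating the index first, with a std::vector standing in for the QList and hypothetical function names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True when row refers to an existing entry.
bool isValidRow(int row, std::size_t rowCount)
{
    return row >= 0 && static_cast<std::size_t>(row) < rowCount;
}

// Like QList::at(): the caller must pass a valid index.
std::string const &entryAt(std::vector<std::string> const &entries, int row)
{
    assert(isValidRow(row, entries.size()));
    return entries[static_cast<std::size_t>(row)];
}

// Re-validates a selection remembered from before the filter changed and
// falls back to an empty entry when that row no longer exists.
std::string selectedEntryOrEmpty(std::vector<std::string> const &entries,
                                 int rememberedRow)
{
    if (!isValidRow(rememberedRow, entries.size()))
        return std::string();
    return entryAt(entries, rememberedRow);
}
```

Whether an empty entry or clearing the selection is the right fallback is a design decision for the UI; the point is only that the stale index is checked before it is used.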

Statistics window blocks when searching

Quoting Jelmer: "It looks like my last commit, which adds the macros, somehow locks up the UI thread while the mapper is working.

Removing the insertRow() and updateResultsPercentages() calls from the DactQueryWindow::attributeFound() method removes the effect. Maybe we could insert results in batches so these calls are made less often."
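
The batching idea from the quote could be sketched like this. `BatchedInserter` is a hypothetical helper; the flush callback stands in for whatever inserts rows and updates percentages in the UI thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Collects results and hands them over in batches, so that the expensive
// UI update runs once per batch instead of once per result.
class BatchedInserter
{
public:
    typedef std::function<void(std::vector<std::string> const &)> Flush;

    BatchedInserter(std::size_t batchSize, Flush flush)
        : d_batchSize(batchSize), d_flush(std::move(flush))
    {
        assert(d_batchSize > 0);
        assert(static_cast<bool>(d_flush));
    }

    // Called for every result found by the worker.
    void add(std::string value)
    {
        d_pending.push_back(std::move(value));
        if (d_pending.size() >= d_batchSize)
            flush();
    }

    // Also call this once when the query finishes, to deliver the remainder.
    void flush()
    {
        if (d_pending.empty())
            return;
        d_flush(d_pending);
        d_pending.clear();
    }

private:
    std::size_t d_batchSize;
    Flush d_flush;
    std::vector<std::string> d_pending;
};
```

A time-based flush (say, a few times per second) would be an alternative when results arrive at very uneven rates.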

Keywords in context

Feature request: besides bracketed sentences, a keyword/match in context (KWIC) overview would be nice.

Suggestion courtesy of Gosse Bouma.

Add an application font configuration option

Add a preferences window that allows the user to change the application font and size.

Suggested by Gertjan van Noord.

Note: this is done, but does not work correctly on OS X.

Statistics window results are unsorted

Using QSortFilterProxyModel is not a real option since it's painfully slow. I think we need to implement sorting in QueryModel itself. It could be pretty performant if we just keep our internal results sorted. I do not yet know how to combine this with the dataChanged signal when we need to insert something at the top of the list.
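
Keeping the internal results sorted could look like the sketch below, which uses a binary search to find the insertion point. `Row` and `insertSorted` are hypothetical; the returned index is the row a model would announce as inserted (in Qt's model API, insertions are announced with beginInsertRows()/endInsertRows() rather than dataChanged).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result row: attribute value and its hit count.
typedef std::pair<std::string, int> Row;

// Inserts a row while keeping the rows sorted by descending count.
// Returns the index at which the row was inserted.
std::size_t insertSorted(std::vector<Row> &rows, Row row)
{
    auto byCountDesc = [](Row const &a, Row const &b) {
        return a.second > b.second;
    };
    assert(std::is_sorted(rows.begin(), rows.end(), byCountDesc));

    // Binary search: first position whose count is lower than the new one.
    std::vector<Row>::iterator pos =
        std::upper_bound(rows.begin(), rows.end(), row, byCountDesc);
    std::size_t index = static_cast<std::size_t>(pos - rows.begin());

    rows.insert(pos, std::move(row));
    return index;
}
```

Note that in the statistics window counts also change for existing rows, so a changed row may have to be moved; this sketch only covers insertion.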

Opening a directory corpus fails

Backtrace:

#1  0x00000001011598ca in alpinocorpus::DirectoryCorpusReader::DirIter::equals (this=0x11aa173e0, other=0x0) at /Users/daniel/git/alpinocorpus/src/DirectoryCorpusReader.cpp:62
62          DirIter const &that = dynamic_cast<DirIter const &>(*other);

I used master branches of both alpinocorpus and Dact.
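
The backtrace shows `other=0x0` being dereferenced for a reference dynamic_cast. A sketch of the pointer form, which tolerates a null argument, is given below. The classes are simplified stand-ins, not the alpinocorpus sources, and treating a null `other` as "not equal" is an assumption (it might instead be meant to denote an end iterator).

```cpp
#include <cassert>

// Simplified stand-in for an iterator implementation base class.
struct IterImpl
{
    virtual ~IterImpl() {}
    virtual bool equals(IterImpl const *other) const = 0;
};

// Simplified stand-in for DirectoryCorpusReader::DirIter.
struct DirIter : IterImpl
{
    explicit DirIter(int position) : d_position(position)
    {
        assert(position >= 0);
    }

    bool equals(IterImpl const *other) const override
    {
        // The pointer form of dynamic_cast yields a null pointer for a null
        // argument or a different dynamic type, without dereferencing other.
        DirIter const *that = dynamic_cast<DirIter const *>(other);
        if (that == 0)
            return false;
        return d_position == that->d_position;
    }

    int d_position;
};
```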
