rug-compling / dact
Decaffeinated Alpino Corpus Tool
Home Page: https://rug-compling.github.io/dact/
License: GNU Lesser General Public License v2.1
Dact uses zoom to fit when no highlight expression is used. This makes tree nodes enormous if the tree is small. For smaller trees, we should zoom a bit more cautiously.
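One way to zoom "more cautiously" is to cap the fit factor, so that large trees are still shrunk to fit but small trees are never enlarged beyond a chosen maximum. The following is a minimal sketch; `cautiousFitScale` and its `maxScale` parameter are hypothetical names, not part of Dact.

```cpp
#include <algorithm>

// Hypothetical helper: returns the scale factor for "zoom to fit",
// capped so that small trees are not blown up beyond maxScale.
double cautiousFitScale(double viewWidth, double viewHeight,
                        double treeWidth, double treeHeight,
                        double maxScale = 1.0)
{
    if (treeWidth <= 0.0 || treeHeight <= 0.0)
        return 1.0; // nothing to fit; keep the identity scale

    // Plain zoom-to-fit: the largest scale at which the whole tree is visible.
    double fit = std::min(viewWidth / treeWidth, viewHeight / treeHeight);

    // Large trees are still shrunk; small trees are enlarged at most to maxScale.
    return std::min(fit, maxScale);
}
```

With the default cap of 1.0, a 100x100 tree in an 800x600 view stays at its natural size instead of being scaled six times.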
Currently, queries can be cancelled by pressing the escape key while the textfield is focussed. The problem is that the iterator blocks while searching for results, and there is no way to interrupt this block. As long as there are results found frequently, this is not a problem, but when running a query that doesn't yield any results, there is no way to interrupt it.
I already tried running the query in a separate thread and killing that thread when I want to cancel a query. This works, sort of, but I can't run a second query afterwards: the terminated thread still owned the lock on some internal dbxml resources, which of course it never unlocked.
Apparently dbxml has an API to cancel a query, XmlQueryContext::interruptQuery() (I found it by searching through the source code of their shell utility). This would need to be implemented in AlpinoCorpus.
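The general idea behind an interrupt call, as opposed to killing the thread, is cooperative cancellation: the worker checks a flag between steps and returns normally, so it can release any locks it holds. The sketch below illustrates that pattern only; `CancellableScan` is a hypothetical stand-in and does not use the DB XML API.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a long-running query loop. It does not call
// DB XML; it only shows a worker that can be asked to stop cleanly.
class CancellableScan {
public:
    // May be called from another thread (e.g. the UI thread on Escape).
    void interrupt() { d_cancelled.store(true); }

    bool wasInterrupted() const { return d_cancelled.load(); }

    // Counts entries equal to 'wanted'; stops early once interrupted.
    std::size_t run(const std::vector<int>& entries, int wanted)
    {
        std::size_t hits = 0;
        for (int entry : entries) {
            if (d_cancelled.load())
                break; // leave the loop normally so cleanup code still runs
            if (entry == wanted)
                ++hits;
        }
        return hits;
    }

private:
    std::atomic<bool> d_cancelled{false};
};
```

Because the loop exits through its normal path, any resource guards owned by the worker are destroyed as usual, which is what the killed-thread approach could not guarantee.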
As I was working on finding a solution for #32, I found that //node[@rel='su']/@root/string()
crashes Dact.
The error, when run in the BracketedWindow: "An attempt was made to perform an axis step when the Context Item was not a node [err:XPTY0020], :1:56". After that, when switching to the dependency tree and file list, an exception is thrown somewhere in a QtConcurrent thread. Apparently AlpinoCorpus throws an exception that is not caught by catching alpinocorpus::Error.
Opening a corpus with Dact works fine, but opening another corpus (or the same one again) causes a segfault in QListModel::data() when redrawing the QListWidget for files. Since the Qt GUI library is not thread-safe, I think this is caused by calling addFiles() in readAndShowFiles(). If I move this call to a method that is called in the UI thread (e.g. corpusRead()), Dact does not crash.
When I switch between the three tabs, switching between Tree and Sentences is always quick, and switching to Statistics is also quite speedy. But when I switch from Statistics to one of the other two, it takes one or two seconds.
I don't yet know what causes it, and whether it only happens on my setup or on other configurations as well.
cf e.g. /net/aistaff/vannoord/z/Alpino/Treebank/Machine/NLWIKI20110804/COMPACT/
I want to be able to run dact with 5 or 10 or 20 dact files.
If you are counting the attribute "word", there may be hits of your query for which that attribute is not defined. It would be very useful to know the number of such hits.
E.g. the query //node[@rel="hd"] will also match multi-word units, which will go unnoticed if you only count the word attribute.
Other options could be to count combinations of attributes, or the attributes of all of the descendants of a node.
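The counting described above can be sketched by keeping a separate bucket for hits that lack the attribute. This is an illustration under assumptions: `AttributeCounts` and `countAttribute` are hypothetical names, and each hit is modelled as an optional attribute value rather than a real tree node.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: counts per attribute value, plus the number
// of hits for which the attribute was not defined at all.
struct AttributeCounts {
    std::map<std::string, std::size_t> byValue;
    std::size_t missing = 0;
};

// Each element is the attribute value of one hit, or std::nullopt when
// the matching node does not carry the attribute.
AttributeCounts countAttribute(
    const std::vector<std::optional<std::string>>& hits)
{
    AttributeCounts counts;
    for (const auto& value : hits) {
        if (value)
            ++counts.byValue[*value];
        else
            ++counts.missing;
    }
    return counts;
}
```

The total number of hits is then the sum of all per-value counts plus `missing`, so multi-word units no longer go unnoticed.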
Comparable to the bracketed sentence window, but only show matches. Maybe we should integrate this with the bracketed sentences window and #3.
Suggested by Gertjan van Noord.
Make it possible to copy multiple entries from the entry list.
It's impossible to get the size of a corpus subset when using DB XML's lazy query mechanism, since results are generated on-the-fly rather than as a complete set. I suggest replacing the progress bars with a simple counter window that counts up as results are retrieved.
(Using eager querying may take huge amounts of memory on the larger treebanks and defeats the purpose of the database compression that I just got working.)
Cache previous queries.
From DactTreeView.cpp:
if (matrix().m11() > 1.0 || matrix().m12() > 1.0)
Note that m12 is the vertical shearing factor. Don't we want to look at m22 (the vertical scaling factor) instead?
The KeywordInContext and Visibility delegates both use the hits counter (the second column) of the QueryModel to guess their size for sizeHint. However, not all hits are drawn.
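To make the suggested change concrete, here is a small sketch of the check using m22. `Matrix2` is a hypothetical stand-in that only borrows the component names of Qt's matrix class (m11 and m22 for scaling, m12 and m21 for shearing); it is not the code from DactTreeView.cpp.

```cpp
// Hypothetical stand-in for the linear part of a 2D transformation,
// with the same component names as used in the snippet above.
struct Matrix2 {
    double m11; // horizontal scaling
    double m12; // vertical shearing
    double m21; // horizontal shearing
    double m22; // vertical scaling
};

// True when the view is enlarged along either axis.
bool isZoomedIn(const Matrix2& m)
{
    return m.m11 > 1.0 || m.m22 > 1.0;
}
```

With this version a pure shear (m12 > 1.0, scaling factors at 1.0) no longer counts as zoomed in, while a vertical zoom does.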
For example: //node[@rel="su"] also matches moved nodes, but these nodes don't have @root or @word so they are not drawn.
Possible solution: alter the query entered by the user and add 'and @root' as a second condition. This way only hits that will be drawn are counted. Downside of this is that @root is an assumption which does not need to hold if the user altered bracketed-sentence.xsl.
(edit: I don't think this will work. e.g. the NP node will be su, but it is a nonterminal. The terminals inside it won't match the query, but do need to be displayed.)
Dact on Linux gives the following message when opening a directory-based corpus:
QFSFileEngine::open: No file name specified
It would be nice if matches are also highlighted in the sentence edit in the main window.
I can't build on Linux after Jelmer's merge of master into dbxml. A file DactMainWindow is reported missing. I can't push modifications into GitHub either, since I have to "fast-forward" to the broken build (which I don't want to do).
On Linux, I sometimes (not consistently) get the error:
QObject::setParent: Cannot set parent, new parent is in a different thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
QPixmap: It is not safe to use pixmaps outside the GUI thread
: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
on stderr, with the program hanging. I've no idea what's causing this, since it doesn't always happen. It might just be the X11 config on RUG LWP Kubuntu boxes.
Currently, entries can be copied by using Cmd/Control - c. Add an item to the 'Edit' menu as well. On Linux, make sure we copy to the X11 clipboard as well.
$SUBJECT doesn't work.
See title.
Qt uses int instead of size_t for container sizes, so once we hit the 2^31 (2.1bln) barrier for corpus sizes, things are going to break badly on Intel platforms.
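Until the container sizes themselves change, one defensive option is to check the conversion explicitly and fail loudly instead of wrapping around. A minimal sketch, assuming a hypothetical helper `toContainerIndex`:

```cpp
#include <cstddef>
#include <limits>
#include <optional>

// Hypothetical helper: converts a corpus size to an int-based container
// index, or returns std::nullopt when the value does not fit in an int.
std::optional<int> toContainerIndex(std::size_t n)
{
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max()))
        return std::nullopt; // would overflow the int-based container API

    return static_cast<int>(n);
}
```

A caller can then report "corpus too large" to the user rather than passing a negative or truncated index into a container.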
Corpus exports cause segfaults on Linux in the UI thread. Probably caused by operations on d_exportProgressDialog from a non-UI thread.
Apparently it tries to run a query when you switch to the statistics- or graph-tab, even when there is no corpusreader. Note that switching to the bracketedwindow tab does not cause a crash.
0 nl.rug.Dact 0x0000000100019581 QueryModel::runQuery(QString const&) + 21
1 nl.rug.Dact 0x0000000100035813 StatisticsWindow::startQuery() + 267
2 nl.rug.Dact 0x000000010002217f MainWindow::tabChanged(int) + 345
http://www.inl.nl/en/corpora/lassy-corpus and http://www.inl.nl/en/corpora/lassy-large-corpus, listed at http://rug-compling.github.com/dact/, both end in a 404. Maybe http://www.let.rug.nl/vannoord/Lassy/ is a better link?
The zoom buttons are on the top left of the window by default, but the tree view is on the right. Being able to zoom with Ctrl+scrollwheel would be a great feature.
Some queries take a long time and/or have few hits. It would be extremely useful to know the progress of the query. One poor man's way would be to use the list of all file names, and to use the file name of the last match to guess how far the query is (this does not work if there are no hits at all, though).
Currently, it is not clear to the user whether the statistics window counts the number of entries or the number of nodes for which the query matches.
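The poor man's estimate can be sketched as a binary search in the list of file names. This assumes, hypothetically, that the list is sorted and that the query visits entries in that same order; `estimateProgress` is not an existing Dact function.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: estimates query progress as the fraction of
// entries up to and including the entry of the last match.
// Assumes 'sortedNames' is sorted and entries are visited in that order.
double estimateProgress(const std::vector<std::string>& sortedNames,
                        const std::string& lastMatch)
{
    if (sortedNames.empty())
        return 0.0;

    // Position just past the last matching entry.
    auto pos = std::upper_bound(sortedNames.begin(), sortedNames.end(),
                                lastMatch);

    return static_cast<double>(pos - sortedNames.begin())
         / static_cast<double>(sortedNames.size());
}
```

The estimate is a lower bound: the query may already be well past the last match, and without any match there is nothing to estimate from.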
When you start Dact clean without loading a corpus and enter a query in the BracketedWindow, Dact will crash on a segfault. I think it lies deep inside FilterModel which can't handle null pointers at this moment.
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000001bb928498
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 0x0000000100005f9c DactMainWindow::updateTreeNodeButtons() + 156
1 0x000000010003b97e DactMainWindow::qt_metacall(QMetaObject::Call, int, void**) + 1150
2 QtCore 0x0000000100de1c1b QMetaObject::activate(QObject*, QMetaObject const*, int, void**) + 603
3 QtGui 0x000000010073da88 QGraphicsScenePrivate::removeItemHelper(QGraphicsItem*) + 1128
4 QtGui 0x000000010070dd4e QGraphicsItem::~QGraphicsItem() + 286
5 0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
6 QtGui 0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
7 0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
8 QtGui 0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
9 0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
10 QtGui 0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
11 0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
12 QtGui 0x000000010070dd13 QGraphicsItem::~QGraphicsItem() + 227
13 0x00000001000341ad DactTreeNode::~DactTreeNode() + 333
14 QtGui 0x0000000100730ab5 QGraphicsScene::clear() + 85
15 QtGui 0x0000000100730c18 QGraphicsScene::~QGraphicsScene() + 72
16 0x000000010003d498 DactTreeScene::~DactTreeScene() + 56
17 0x0000000100009b78 DactMainWindow::~DactMainWindow() + 168
18 0x000000010003b0cc DactApplication::~DactApplication() + 44
19 0x000000010003a9a2 main + 978
20 0x0000000100003c88 start + 52
Just noticed this: when I tried to load a renamed (and therefore no longer existing) dbxml file from the command line, Dact would no longer start up and crashed with an EXC_BAD_ACCESS signal.
To use macros effectively, I'd much rather be able to edit them with Emacs. Also, there could then be different sets of macros (in different files) depending on my application (or corpus). So macros would not be part of the settings, but there would be a way to load a file which contains macros (from the command line as well as from the graphical interface).
Add some sort of export functionality that can be used to create a subset of the loaded corpus, e.g. the results after a filter query.
The next/previous nodes buttons could be disabled if there is no highlighted node.
We're currently using .dbxml for Berkeley DB-based treebanks, but shouldn't we switch to (say) .alpino to prevent confusion, esp. if we're going to associate Dact with the extension?
For some reason d_ui->fileListWidget->selectionModel()->selectedRows().size();
in DactMainWindow::highlightChanged
is always zero.
It would be useful to have an inspector for tree nodes. When selecting a node from a tree, this inspector would simply show a table with attributes and values.
Feature request: improve the visibility of matches in the bracketed sentences overview.
Reported by Gosse Bouma.
I don't know who made the original Dact coffee cup icon, but it would be nice if we also had an icon for the .dact databases and maybe even for the .data.dz files*
It could be something generic like the coffee cup painted on the generic document icon, or something more creative like a pack of coffee with the coffee cup as a logo in it. Its purpose is to indicate that double-clicking the file will open it with Dact.
*.data.dz files are a bit problematic since .dz is the extension seen by most operating systems. Maybe it is too generic to be associated with Dact.
The sentence is now only partially shown, and there are no scrollbars. We need to find a more creative solution. One proposal is to draw the sentence in the treeview, under the relevant nodes.
From the tree-tab, I can save the results using cut-and-paste. Same for the statistics-tab. However, I cannot select multiple lines in the Sentences-tab, so there currently is no way to save/export these results into some text file.
For each of the three tabs, it might be useful to add an explicit "export" function for the selected lines?
The statistics widget now only counts cases where 1.) the query matches, and 2.) the selected attribute exists. Should we provide counts of cases where only (1) is true?
Consider performance and presentation before implementing.
When you enter a filter query, select a result (play with the tree and show the inspector etc) and then enter a new filter query which does not match your previously selected result, Dact crashes.
0 libSystem.B.dylib 0x00007fff885085d6 __kill + 10
1 libSystem.B.dylib 0x00007fff885a8cd6 abort + 83
2 QtCore 0x0000000100f69455 qt_message_output(QtMsgType, char const*) + 117
3 QtCore 0x0000000100f69637 qt_message(QtMsgType, char const*, __va_list_tag*) + 183
4 QtCore 0x0000000100f697fa qFatal(char const*, ...) + 170
5 0x0000000100017ac2 QList::at(int) const + 66 (qlist.h:439)
6 0x000000010002ec2b MainWindow::entrySelected(QItemSelection const&, QItemSelection const&) + 101 (MainWindow.cpp:314)
7 0x000000010002ed54 MainWindow::setHighlight(QString const&) + 128 (MainWindow.cpp:826)
8 0x000000010002efb5 MainWindow::filterChanged() + 395 (MainWindow.cpp:446)
9 0x00000001000551f5 MainWindow::qt_metacall(QMetaObject::Call, int, void**) + 843
...
ASSERT failure in QList::at: "index out of range", file /Developer/SDKs/MacOSX10.6.sdk/Library/Frameworks/QtCore.framework/Headers/qlist.h, line 439
Abort trap
(version 1bb9e1b)
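The assertion suggests that a row index from the old selection is used after the filtered list has shrunk. A bounds-checked lookup avoids the abort; the sketch below uses a plain `std::vector` and a hypothetical `entryAt` helper, not the actual code in MainWindow.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the entry at 'row', or std::nullopt when
// the row no longer exists (e.g. after a new filter shrank the list).
std::optional<std::string> entryAt(const std::vector<std::string>& entries,
                                   int row)
{
    if (row < 0 || static_cast<std::size_t>(row) >= entries.size())
        return std::nullopt; // stale selection: nothing to show

    return entries[static_cast<std::size_t>(row)];
}
```

A caller that receives no entry can simply clear the tree view and the inspector instead of dereferencing a stale index.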
Quoting Jelmer: "It looks like my last commit, which adds the macros, somehow locks up the UI thread while the mapper is working.
Removing the insertRow() and updateResultsPercentages() calls from the DactQueryWindow::attributeFound() method removes the effect. Maybe we could insert results in batches so these calls are made less often."
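The batching idea from the quote can be sketched as a small buffer that forwards results to the UI only once enough have accumulated. `ResultBatcher` and its sink callback are hypothetical; in Dact the sink would stand for whatever inserts rows and updates the percentages.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer that collects results and hands them to 'sink'
// in batches, so the expensive UI update runs less often.
class ResultBatcher {
public:
    using Sink = std::function<void(const std::vector<std::string>&)>;

    ResultBatcher(std::size_t batchSize, Sink sink)
        : d_batchSize(batchSize == 0 ? 1 : batchSize),
          d_sink(std::move(sink))
    {
    }

    // Stores one result; flushes automatically when the batch is full.
    void add(std::string result)
    {
        d_pending.push_back(std::move(result));
        if (d_pending.size() >= d_batchSize)
            flush();
    }

    // Forwards any pending results; call once more when the query ends.
    void flush()
    {
        if (d_pending.empty())
            return;
        d_sink(d_pending);
        d_pending.clear();
    }

private:
    std::size_t d_batchSize;
    Sink d_sink;
    std::vector<std::string> d_pending;
};
```

The batch size trades responsiveness for throughput: a larger batch means fewer UI updates but a longer wait before the first results appear.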
When using the begin/end attributes, it is often necessary to explicitly convert these attributes to numbers. Document this in the manual. Inspiration is available from:
http://www.let.rug.nl/vannoord/Lassy/sa-man_lassy.pdf
http://www.let.rug.nl/Presentations/Gent11/Update/post.pdf
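For the manual, a short illustration of why the conversion matters may help. One likely reason, stated here as an assumption, is that attribute values compared as strings are ordered lexicographically rather than numerically. The C++ sketch below shows the difference with two hypothetical helpers; it does not evaluate XPath.

```cpp
#include <string>

// Compares two attribute values as plain strings (lexicographic order).
bool lessAsStrings(const std::string& a, const std::string& b)
{
    return a < b;
}

// Compares two attribute values after converting them to integers.
// Assumes both strings hold valid integers; std::stoi throws otherwise.
bool lessAsNumbers(const std::string& a, const std::string& b)
{
    return std::stoi(a) < std::stoi(b);
}
```

As strings, "10" sorts before "9" because '1' precedes '9'; as numbers, 9 is smaller. A query comparing positions such as begin and end would give surprising results in the first case.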
Feature request: besides the bracketed sentences, a keyword/match in context (KWIC) overview would be nice.
Suggestion courtesy of Gosse Bouma.
Add a preferences window that allows the user to change the application font and size.
Suggested by Gertjan van Noord.
Note: this is done, but does not work correctly on OS X.
Using QSortFilterProxyModel is not a real option since it's painfully slow. I think we need to implement sorting in QueryModel itself. It could be pretty performant if we just keep our internal results sorted. I do not yet know how to combine this with the dataChanged signal when we need to insert something at the top of the list.
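Keeping the internal results sorted can be sketched with a binary search for the insertion point. `Row` and `insertSorted` are hypothetical, and the ordering (descending by hit count) is an assumption about what the list should show.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result row: an attribute value and its hit count.
struct Row {
    std::string value;
    int hits;
};

// Inserts 'row' so that 'rows' stays sorted by descending hit count.
// Returns the index at which the row was inserted.
std::size_t insertSorted(std::vector<Row>& rows, Row row)
{
    // First position whose row has fewer hits than the new row.
    auto pos = std::upper_bound(
        rows.begin(), rows.end(), row,
        [](const Row& a, const Row& b) { return a.hits > b.hits; });

    std::size_t index = static_cast<std::size_t>(pos - rows.begin());
    rows.insert(pos, std::move(row));
    return index;
}
```

The returned index is what a model would need in order to announce a row insertion at that position (in Qt, via beginInsertRows() and endInsertRows()) rather than signalling that existing data changed.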
Backtrace:
#1 0x00000001011598ca in alpinocorpus::DirectoryCorpusReader::DirIter::equals (this=0x11aa173e0, other=0x0) at /Users/daniel/git/alpinocorpus/src/DirectoryCorpusReader.cpp:62
62 DirIter const &that = dynamic_cast<DirIter const &>(*other);
I used master branches of both alpinocorpus and Dact.
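The backtrace shows `other=0x0` being dereferenced for a reference-form dynamic_cast. A pointer-form cast with a null check avoids that. The sketch below uses hypothetical types (`IterImpl`, `DirIterSketch`) and is not the alpinocorpus implementation.

```cpp
// Hypothetical iterator interface, loosely modelled on the backtrace.
struct IterImpl {
    virtual ~IterImpl() = default;
    virtual bool equals(const IterImpl* other) const = 0;
};

// Hypothetical directory iterator identified by a position.
struct DirIterSketch : IterImpl {
    explicit DirIterSketch(int pos) : position(pos) {}

    bool equals(const IterImpl* other) const override
    {
        // The pointer form of dynamic_cast yields nullptr for a null
        // pointer or a different dynamic type, instead of dereferencing.
        const DirIterSketch* that =
            dynamic_cast<const DirIterSketch*>(other);
        return that != nullptr && that->position == position;
    }

    int position;
};
```

Whether a null `other` should compare unequal or be treated as an end iterator is a design decision for the library; the sketch only shows how to avoid the crash.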