datalad / datalad-gooey

A graphical user interface for DataLad (datalad.org)

Home Page: https://docs.datalad.org/projects/gooey

License: Other

Makefile 0.12% Python 98.42% Batchfile 0.02% PowerShell 1.05% NSIS 0.38%

datalad gui rdm closember

datalad-gooey's Introduction

DataLad Gooey (pronounce "GUI")

This package provides a graphical user interface (GUI) for DataLad. It aims to make key data management tasks more accessible and more convenient, without requiring users to become familiar with the command line.

This simplified interface to DataLad is built on a foundation that is capable of providing graphical user interfaces for any DataLad command, including those provided by extension packages. Moreover, extension packages can even provide their own GUI suites, by mixing and tuning a custom set of commands and parameters.

To try it out, install this package, and run datalad gooey.

Acknowledgements

DataLad development is supported by a US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).

This DataLad extension was developed with additional funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

Michael Hanke 💻 🤔 📆 🧑‍🏫	Yaroslav Halchenko 💻	Chris Markiewicz 💻	Adina Wagner 💻	John T. Wodder II 💻	Benjamin Poldrack 💻	Stephan Heunis 💻
Michał Szczepanik 💻	Alex Waite 📓 🤔	Leonardo Muller-Rodriguez 📓 💻	Laura Waite 💻	Christian Mönch 💻

This project follows the all-contributors specification. Contributions of any kind welcome!


datalad-gooey's Issues

Connect `ui.message()` to console browser

This would make any result rendering accessible (of sorts), even when no dedicated widget is available yet

Tree nodes do not inherit their sorting properties to newly discovered directory-type children

Let `cmd_exec` group results before submitting a signal

The number of signals emitted seems to be a major source of slowdown. Grouping results into a single signal may help with this.

However, it may also be that the amount of data passing through the signals is the bottleneck. This is yet to be investigated. The tree execution via cmd_exec that is now in main provides a suitable test bed for this.
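A minimal sketch of the grouping idea, without any Qt dependency. The names `batched` and `emit_grouped` are hypothetical and not part of datalad-gooey; `emit` stands in for whatever would submit the Qt signal, and the batch size of 50 is an arbitrary assumption that would need benchmarking.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(results: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `size` result records from `results`."""
    it = iter(results)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def emit_grouped(results: Iterable[dict],
                 emit: Callable[[List[dict]], None],
                 size: int = 50) -> int:
    """Call `emit` once per batch and return the number of emissions.

    `emit` is a placeholder for a signal's emit method (assumption).
    """
    n_emitted = 0
    for batch in batched(results, size):
        emit(batch)
        n_emitted += 1
    return n_emitted
```

With this, 120 result records and a batch size of 50 lead to three emissions instead of 120. Whether this helps depends on which of the two suspected bottlenecks is the real one.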

Ability to kill

Probably relates to #31, since those commands would need to be killed first. ATM it seems the gooey process cannot be killed either, because some underlying datalad call is still holding it, or something like that.

$> datalad gooey --path ~/datalad
qt.pysideplugin: Environment variable PYSIDE_DESIGNER_PLUGINS is not set, bailing out.
qt.pysideplugin: No instance of QPyDesignerCustomWidgetCollection was found.
Qt WebEngine seems to be initialized from a plugin. Please set Qt::AA_ShareOpenGLContexts using QCoreApplication::setAttribute and QSGRendererInterface::OpenGLRhi using QQuickWindow::setGraphicsApi before constructing QGuiApplication.
QUEUEDIR /home/yoh/datalad
ANNOTATE! 1
EXECINTHREAD status {'dataset': '/home/yoh/datalad', 'path': PosixPath('/home/yoh/datalad')}
qt.pysideplugin: No instance of QPyDesignerCustomWidgetCollection was found.
^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/app.py", line 95, in <lambda>
    lambda i, cmd, args, ce: self.get_widget('statusbar').showMessage(
KeyboardInterrupt
^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
^C^C^C^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
^C^C^C^C^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
...

DOC: create basic documentation

Once we're ready for wider consumption, the docs will need to be created, linked to readthedocs.

Possibly useful sections:

Installation
Overview (what is and why have a datalad GUI, disclaimers)
Getting started (how to install and use the app)
Design (mainly for developers, so that they understand how to contribute)

Initial user feedback

I am creating this issue to list all the points of feedback that I thought of when testing the GUI locally (MacOS).

Disclaimers: Some of my expectations might not be in line with the design decisions made/planned for the GUI. Some issues might already be known. Some might not be issues at all.

From this list, we can create new and separate issues per topic if so desired. I'm dumping it all in a list for now just to make it easy for myself.

It would be cool to have an immediate visual representation (with icons) of what is a dataset and what is not. Distinctions could extend to content in git vs annex, etc. I am aware of #2 and #5.
I would expect the content in the Actions/Properties/History tab to be responsive to user selections in the file browser tab. A simple current example is that the content of the last executed action in the top right tab will remain visible (in a greyed-out state) even when selecting a different dataset in the file browser or when selecting a different toolbar option and then selecting "Action" again.
I'm assuming the available dataset actions are derived from what is available in the environment from which the GUI is launched, because I can e.g. see actions from datalad extensions. This is great. (more of a comment than an issue 😄 )
I tried running some commands (meta_extract, search, create), but none seem to execute, and no output is seen in the Log tab. Perhaps this should not work yet. In either case, my expectation would be for the full command line output to show in the Log/Console tab, including results records/summaries as well as exceptions/warnings, and for the file browser to update in case changes were made there (e.g. a newly created dataset).
Properties of a dataset also seem like something that users would expect as a menu item in the right-click dropdown.
I think we need to find an intuitive and consistent way to deal with how users find and select information/options. The right click as well as tab selection options could be duplicating functionality in some ways, and this could confuse users.

To be continued...

Make and keep `datalad tree` fast

The snappiness of filesystem tree navigation in the UI is directly connected to the performance of the tree command. This makes issues like datalad/datalad#6940 critical.

It would make sense to include a benchmark of some kind in here.

FS model broken, may segfault Python

Creating and removing file nodes via external manipulation of the file system can trigger it.

It seems I can reliably trigger it with:

1. browse to a dataset with a file (it seems to be irrelevant what state that file is in)
2. left-click on the file (select it)
3. remove the file outside the datalad-gooey
4. with no action in datalad-gooey, recreate a file with the same name

-> segfault

When the removal comes after the status report for reannotation of the changed directory, traces of the coming segfault, due to data corruption of the tree model, are already visible in the terminal after (3).

When the FS removal happens before the status report comes in, the app immediately segfaults:

Traceback (most recent call last):
  File "/home/mih/hacking/datalad/gooey/datalad_gooey/fsview_model.py", line 348, in parent
    pnode = self._tree[child_node.path.parent]
AttributeError: 'cell' object has no attribute 'path'

Consider switching the internal pointers in the FSModel to DataLadTreeNode objects

Right now it points to their .path properties. This makes little sense from my current POV. Maybe sorting has made me make this choice, but I somehow doubt it.

It feels like using node objects would also remove the need for node lookup by path via DataladTree.__getitem__()

Connect TreeView item expansion to FSwatcher

And unwatch when collapsed. Also consider dropping tree nodes when collapsing to reduce maintenance demands.

SVG icons

QSvgRenderer can render directly onto QImage instances. A QImage can also be saved to a file, which is useful for caching in case the SVG rendering is slow.

Decide how to handle symlinks

Should they be followed, or just represented as (possibly dead) symlinks?

Following them in the filesystem browser could lead to confusing situations, where some content looks as if it would be included in a dataset, although only the symlink is, and is pointing outside that particular dataset.

Following symlinks also requires deduplication of tree model nodes, because a symlink could point anywhere inside a directory/dataset hierarchy, such that the child nodes underneath a symlink have at least one alternative address on the filesystem.

Decorate treeview items

i.e. add icons (see #2)

It should communicate what is a

dataset
directory (already done by the expansion tick)
symlink (valid vs broken)

everything else would be a file.

It would be meaningful to decorate files in git vs annex, but this would be expensive to do correctly across platforms (not every symlink to a file is an annexed file, not every non-symlinked file is not in an annex).

Disable launching more gooey instances

Or at least launch them as independent subprocesses

Implement tree sorting

We need sorting by

name
time
type

A natural UI would be to have columns in the tree view showing each of the three properties, and have the column headers be clickable for sorting using that property. The latter is already enabled in the UI, but not connected to any actual sorting.

This is somewhat in conflict with #5, because a dedicated column for type would make some decorations redundant.

Sorting by time raises the question what kind of time to consider. Modification time makes sense. But should it be the filesystem reported one (fast), or the timestamp of the last commit in a dataset (slower, and calls for corner cases to be handled ("what if the dataset is not clean?")).
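A rough sketch of the three sort keys, using only the filesystem-reported modification time (the fast option). The function names and the three type labels are hypothetical; detecting datasets and using commit timestamps are deliberately left out.

```python
from pathlib import Path
from typing import Iterable, List


def node_type(path: Path) -> str:
    """Coarse type label; dataset detection is intentionally omitted."""
    if path.is_symlink():
        return 'symlink'
    if path.is_dir():
        return 'directory'
    return 'file'


def sort_children(paths: Iterable[Path], by: str = 'name',
                  reverse: bool = False) -> List[Path]:
    """Sort sibling paths by 'name', 'time' (filesystem mtime), or 'type'."""
    keys = {
        'name': lambda p: p.name.lower(),
        # lstat() avoids following symlinks, so dead links do not raise
        'time': lambda p: p.lstat().st_mtime,
        # names break ties within the same type
        'type': lambda p: (node_type(p), p.name.lower()),
    }
    return sorted(paths, key=keys[by], reverse=reverse)
```

A clicked column header would then only need to pick `by` and toggle `reverse`.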

Clear command tab

When executing a command, the command specification sticks "forever". While not instantly clearing that tab as soon as the command finished is fine (so one can see what was specified when assessing its outcome), at least when subsequently selecting something else in the tree view the previous command specification should disappear, I think.

`DataladQtWorkerBridge`

Analogous to DataladQtUIBridge, we need something to wrap the execution of a command (optionally or always) in a worker thread. It needs to represent command execution in a way that Qt can understand:

emit a signal when a process starts (with info on that process) -- could be used to update a status widget
emit a signal when a process ends (with info on the status/exit code) -- could be used to connect callbacks that perform an action conditional on the process outcome

This need not be a runner wrapper, but should be able to take any callable (incl. something like GitRepo.call_git()).

We likely need dedicated support for generators, and maybe even more dedicated support for datalad result records (for datalad command execution). For generators, we should have an additional result_recieved signal, to be emitted whenever the generator yielded an item.
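To make the shape of such a bridge more concrete, here is a Qt-free sketch. Plain callbacks stand in for the started/result/finished signals, and the class name `WorkerBridge` and its method names are hypothetical. A real implementation would presumably emit Qt signals instead.

```python
import inspect
import threading
from typing import Any, Callable, Optional


class WorkerBridge:
    """Run any callable in a thread and report progress via callbacks.

    The three callbacks are stand-ins for Qt signals (assumption).
    """

    def __init__(self,
                 on_started: Callable[[str], None],
                 on_result: Callable[[Any], None],
                 on_finished: Callable[[bool, Optional[BaseException]], None]):
        self.on_started = on_started
        self.on_result = on_result
        self.on_finished = on_finished

    def run(self, func: Callable, *args, **kwargs) -> threading.Thread:
        def _target():
            self.on_started(getattr(func, '__name__', repr(func)))
            try:
                out = func(*args, **kwargs)
                if inspect.isgenerator(out):
                    # report each yielded item as soon as it arrives
                    for item in out:
                        self.on_result(item)
                else:
                    self.on_result(out)
            except Exception as exc:
                self.on_finished(False, exc)
            else:
                self.on_finished(True, None)

        thread = threading.Thread(target=_target, daemon=True)
        thread.start()
        return thread
```

The generator branch corresponds to the additional per-item signal described above; non-generator callables report a single result.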

Design question dialog

It needs to represent

text
title
possibly choices
with an optional default
the possibility for hidden text entry
and the ability to gather an item twice and compare for equality

in order to be able to handle ui.question() calls. But see datalad/datalad#6991 for possible developments.
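The listed requirements could be captured in a small data structure that a dialog is built from. This is only a sketch: `QuestionSpec` and `accept_answer` are hypothetical names, and the field names are assumptions loosely modeled on the list above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuestionSpec:
    """What a question dialog needs to represent (hypothetical)."""
    text: str
    title: Optional[str] = None
    choices: Optional[Sequence[str]] = None
    default: Optional[str] = None
    hidden: bool = False   # hidden text entry, e.g. for passwords
    repeat: bool = False   # gather the answer twice and compare


def accept_answer(spec: QuestionSpec, first: str,
                  second: Optional[str] = None) -> Optional[str]:
    """Validate dialog input against the spec, or raise ValueError."""
    if spec.repeat and first != second:
        raise ValueError('entries do not match')
    # an empty entry falls back to the default, if there is one
    answer = first if first else spec.default
    if spec.choices is not None and answer not in spec.choices:
        raise ValueError(f'answer must be one of {tuple(spec.choices)}')
    return answer
```

Keeping validation separate from the widget would let it be tested without a GUI.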

`datalad status` call is not restricted to directory

With reference to this code:

# trigger datalad-status execution
# giving the target directory as a `path` argument should
# avoid undesired recursion into subDIRECTORIES
self._app.execute_dataladcmd.emit(
    'status', dict(dataset=dsroot, path=d))

This call does in fact recurse into subdirectories. We could instead do the status call per child (that is not a directory) in the expanded directory. E.g.:

# trigger datalad-status execution
# giving the target directory as a `path` argument should
# avoid undesired recursion into subDIRECTORIES
for dc in d.iterdir():
    if not dc.is_dir():
        self._app.execute_dataladcmd.emit(
            'status', dict(dataset=dsroot, path=dc, annex='basic'))

I've tested this locally and it works, but the annotations are then done one by one, which makes for a weird user experience. Also, with this approach we don't annotate any immediate subdirectories/subdatasets.

Any thoughts?

Implement `GooeyUI.question(repeat=True)`

Right now, this parameter is ignored.

Make `datalad gooey` command suite entrypoint launch the app

Right now, this is still the helloworld code.

Come up with a facility to indicate "busy"

Qt has many ways (status bar, cursor shape/type, ...), but we need to pick one, and wire things appropriately to have busy go on and off as needed.

Example use cases:

FS browser items are still being annotated
A command is running

Test failures due to no graphical interface in CI setup

Looking at the test failure: We are now trying to start the GUI, but no CI setup has a graphical interface, hence all fail, including the doc-builds. I think there is little point in doing this startup test unconditionally.

Originally posted by @mih in #12 (comment)

Report on crashes in the console log

#54 makes most details inaccessible to GUI users.

Group dataset action menu items

Right now it is one big monster list. The items could be grouped by source extension, or by other means, to make them easier to navigate.

Each time the dataset menu is opened, more items are added to it

Implement fast but definitive DataLadTreeModel.hasChildren()

ATM it makes a guess based on the node type. But not all directories have something in them
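One cheap but definitive check could be to ask the filesystem for the first directory entry only. This is a sketch under the assumption that the model can afford one `os.scandir()` call per node; `has_children` is a hypothetical helper, not the current implementation.

```python
import os


def has_children(path) -> bool:
    """Return True if `path` is a directory with at least one entry.

    Only the first entry is requested, so large directories stay cheap.
    Non-directories, missing paths, and unreadable directories count
    as having no children.
    """
    try:
        with os.scandir(path) as entries:
            return next(entries, None) is not None
    except OSError:
        return False
```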

Needs more surgical treeview update

See #22 (comment) for evidence that going into the tree model data and replacing the node object is not sufficient. It likely needs at least a call to removeRow().

quit from drop-down menu

Currently I see no equivalent of File->Quit.

I assume that most people quit by hitting the "X" for the window (especially as keyboard interrupt doesn't yet work, see #38). I don't have such chrome when using my WM (sway), so I use keyboard shortcuts to close the application.

The number of people who will lack window chrome is vanishingly small. But, the File->Quit idiom is a common one. Given the target audience of the gooey-GUI, I think such a dialog would increase familiarity for a certain set of users.

Channel `ui.question()` to the UI

`functools.cached_property` not available in PY3.7

Need to reimplement use in datalad_gooey/app.py
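One possible way to stay compatible is a small fallback descriptor that is only used when the import fails. This is a sketch and not the fix that was applied; `simple_cached_property` is a hypothetical name, and unlike the stdlib version it does no sanity checks.

```python
class simple_cached_property:
    """Minimal stand-in for functools.cached_property (hypothetical)."""

    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.__doc__ = func.__doc__

    def __set_name__(self, owner, name):
        # remember under which attribute name the result is cached
        self.attrname = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # non-data descriptor: the instance attribute shadows it from now on
        instance.__dict__[self.attrname] = value
        return value


try:
    from functools import cached_property  # available from Python 3.8
except ImportError:  # Python 3.7
    cached_property = simple_cached_property
```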

GUI can freeze when populating large directories in the tree view

This is all behind FSBrowserItem.from_path.

Idea: Underneath this is running datalad tree. So instead of running it synchronously, it could run in a thread, with a call-back signal that the respective parent item can receive to accept the result, and turn it into an item.

File dialog only allows for the selection of file paths that are present

For example, when using the get command to select a specific file path, files which are not present locally are not visible.

Have display name conventions

Right now, command names and command parameter names are taken literally from the Python API (all lower case, underscores, etc). Decide how they should be displayed (e.g. auto-caps, or whatever).
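As a starting point for discussion, one possible convention (an assumption, not a decision) is to replace separators with spaces and capitalize only the first word. `display_name` is a hypothetical helper.

```python
def display_name(api_name: str) -> str:
    """Turn a Python API name into a label for display.

    Underscores and dashes become spaces; only the first word is
    capitalized.
    """
    words = api_name.replace('_', ' ').replace('-', ' ').split()
    if not words:
        return api_name
    return ' '.join([words[0].capitalize()] + words[1:])
```

For example, `display_name('meta_extract')` gives `'Meta extract'`. Acronym-like names would need an explicit override table on top of this.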

TESTS REQUIRED!!

The code has reached a complexity where changes can easily cause breakage that is not noticeable, unless ALL functionality is expensively exercised in the GUI, manually.

This is not a workable setup.

datalad_gooey/tests/test_param_widget.py has a sketch of how some GUI functionality can be tested in a headless setup (even without xvfb).

Qt provides means to test GUI components (e.g. https://doc.qt.io/qt-6/qttestlib-tutorial3-example.html). These need to be explored.

Who's working on what?

test_dataladcmd_exec.py - @bpoldrack
test_dataladcmd_ui.py - @bpoldrack
test_fsbrowser_item.py - @jsheunis
test_fsbrowser.py
test_lsdir.py - @jsheunis
test_param_widget.py - @bpoldrack
test_register.py - @bpoldrack thinks there's nothing to do here
test_resource_provider.py - @jsheunis
test_status_light.py - @jsheunis

Command config UI should show name of to be configured command

Right now it is implicit, because one would just have clicked on a menu item with the name

Add app icon (tray / window / taskbar)

Add icons for the application to be recognisable in the system tray, taskbar, and desktop and wherever else seems useful.

Auto-generate parameter entry widgets for any datalad command

Starting from any API component, we can get the underlying class and its parameters

>>> import datalad.api as dl
>>> dl.wtf
<function datalad.local.wtf.WTF.__call__(*, dataset=None, sensitive=None, sections=None, flavor='full', decor=None, clipboard=None)>

>>> from datalad.utils import get_wrapped_class
>>> cls=get_wrapped_class(dl.wtf)
>>> cls._params_
{'dataset': <datalad.support.param.Parameter at 0x7fdff72162f0>,
 'sensitive': <datalad.support.param.Parameter at 0x7fdff72163e0>,
 'sections': <datalad.support.param.Parameter at 0x7fdff72164a0>,
 'flavor': <datalad.support.param.Parameter at 0x7fdff7216560>,
 'decor': <datalad.support.param.Parameter at 0x7fdff7216620>,
 'clipboard': <datalad.support.param.Parameter at 0x7fdff7216650>,
 'return_type': <datalad.support.param.Parameter at 0x7fdff7fa3880>,
 'result_filter': <datalad.support.param.Parameter at 0x7fdff7fa39a0>,
 'result_xfm': <datalad.support.param.Parameter at 0x7fdff7fa3b20>,
 'result_renderer': <datalad.support.param.Parameter at 0x7fdff7fa3b80>,
 'on_failure': <datalad.support.param.Parameter at 0x7fdff7fa3c40>}

As seen above, this includes all common command parameters too, so no special handling is needed for them.

ATM we cannot tell which parameters are deprecated and could be ignored datalad/datalad#6998

Each Parameter instance can provide additional info. Parameter.constraints() could be called to validate parameter values prior to execution. type(Parameter.constraints) could be used to select specialized input widgets (in some cases). Parameter.get_autodoc() is not very useful, because it spells out constraints, and a GUI would reflect them in the interface directly.

>>> p = cls._params_['sections']
>>> p.constraints('some')
ValueError: value is not one of (None, 'configuration', ...
>>> type(p.constraints)
datalad.support.constraints.EnsureChoice
>>> p.get_autodoc('some')
"some : list of {None, 'configuration', 'credentials', 'datalad', 'dataset', 'dependencies', 'environment', 'extensions', 'git-annex', 'location', 'metadata_extractors', 'metadata_indexers', 'python', 'system', '*'}\n  section to include.  If not set - depends on flavor. '*' could be\n  used to force all sections. [CMD: This option can be given multiple\n  times. CMD]."

It is not possible to extract from a constraint whether one or more values can be passed to a parameter. For that, it seems the argparse "action" configuration needs to be inspected and acted upon:

>>> p.cmd_kwargs
{'action': 'append', 'dest': 'sections', 'metavar': 'SECTION'}

Likely other settings need to be taken into account too (nargs, const, choices, required). This is a complex problem. It seems to be best solved by a specialized solution like https://github.com/chriskiehl/Gooey, but we are only using a subset of the capabilities of argparse, so we might get away with something cheaper.

Parameter exposes all argument diversity supported by argparse though...
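A "cheaper" heuristic could look only at the argparse `action` and `nargs` settings found in `cmd_kwargs`. The helper name `accepts_multiple` is hypothetical, and the rule set is an assumption that covers standard argparse semantics only, not custom actions.

```python
def accepts_multiple(cmd_kwargs: dict) -> bool:
    """Guess from argparse settings whether several values can be given."""
    action = cmd_kwargs.get('action')
    nargs = cmd_kwargs.get('nargs')
    # 'append'/'extend' collect values across repeated options
    if action in ('append', 'extend'):
        return True
    # '*' and '+' take a variable number of values per option
    if nargs in ('*', '+'):
        return True
    # a fixed count larger than one also means multiple values
    return isinstance(nargs, int) and nargs > 1
```

For the `sections` parameter shown above, this returns True because of `'action': 'append'`.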

The parameter entry widget should likely maintain the signature order:

>>> from datalad.utils import get_sig_param_names
>>> get_sig_param_names(wtf, ('any',))
(['dataset', 'sensitive', 'sections', 'flavor', 'decor', 'clipboard'],)

Any common parameters (result rendering, etc) could be added via a standard widget set (unconditionally).

Generate parameter widget from alternative constraints

Currently, _get_parameter_widget_factory only accounts for single constraints like a plain EnsureChoice. However, in reality we have several EnsureSome() | EnsureOther() | EnsureNone() resulting in an AltConstraints instance.

This could be addressed by a to-be-implemented multi-widget. The idea would be for this to (potentially) have a checkbox to enable/disable the entire thing if EnsureNone() is part of it, and within a radio button selection of multiple input widgets - one per each constraint except the EnsureNone().

This is not accounting for Constraints (AND'ed constraints). However, not clear ATM whether this would be the place to fully account for it, plus: I don't think it's actually used anywhere in datalad. It's pretty much always OR.

Support glob expressions in `MultiValueInput(PathParamWidget)`

It could be implemented in PathParamWidget with a dedicated flag turning it on (to make sure that it can still be used to retrieve just a single parameter value). It can then be enabled when used with MultiValueInputWidget, e.g., for path arguments that can take any number of values.

Consider `QTreeWidget` instead of `QTreeView + DataladTreeModel`

With #47 in mind, we know that implementing a working model is not simple. This raises the question what would be lost when switching to use QTreeWidget with its ready-to-use internal model.

The experience with using the internal model of QListWidget for MultiValueInputWidget was pretty straightforward. I did not have the impression that it would do something more expensive than what would have been possible with a custom (possibly flawed) implementation.

It needs a closer look at the current access patterns to DataladTreeModel that are not pure UI, and an assessment of whether they could be implemented using QTreeWidget directly and cheaply enough.

The one essential information of a tree view item could be included via a custom item data role

https://doc.qt.io/qtforpython/PySide6/QtCore/Qt.html?highlight=qt%20itemdatarole#PySide6.QtCore.PySide6.QtCore.Qt.ItemDataRole

Alternatively, the path could also be stored in an item as Qt.EditRole, while path.name is stored as (or returned as) Qt.DisplayRole.

FSBrowser update based on FS changes can come too quickly

Command execution is happening in a thread, hence commands run while the FS is being modified. Running a create can trigger the FS watcher update report before a full dataset is built. Hence the tree command, triggered too early by the FS watcher, will see a directory node, and not a dataset.

Reproduce by right-clicking on a dataset item -> create -> create a subdataset. The parent dataset item is expanded (hence watched), and a child item is added, which is not labeled as a dataset for this reason.

Run datalad commands in a thread (but provide UI access)

For many (all?) datalad command invocations, we likely want to execute in a thread (if not even a subprocess) in order to not let the GUI freeze up.

A somewhat critical requirement for that would be to have the datalad UI be able to communicate with the GUI. For example, a ui.question() triggered inside a thread, needs to result in a GUI response (e.g. a dialog pop up), and deliver the outcome back into the thread. It is OK for the thread to be blocking until the question is answered, but not OK for the GUI to freeze until the thread has finished.

For the main results (result records) produced by any command, we can have a wrapper that emits them as Qt signals. Any connected slot would be able to receive them in a thread-safe manner automatically.

The same would be possible for ui.message() being called inside a thread. It could merely emit them as a ui_message signal, and the GUI needs to connect a receiver.

I presently have no solid idea how to approach ui.message() in a thread-safe manner.

The best I can come up with right now is some kind of two-way queue. A ui.question() called from within a thread:

obtains a lock answer_question (to make sure only one question is processed at a time, and the thread blocks until this is possible)
it emits a special question signal that the GUI connected to a dedicated handler -- which can fire up a dialog or whatever is appropriate to collect an answer IN THE GUI THREAD
once the question signal was emitted from the worker thread, it runs queue.get(block=True) to sleep until the GUI thread has deposited the answer on the queue for retrieval
the answer_question lock is released
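The steps above can be sketched with the standard library alone. `QuestionChannel` and `emit_question` are hypothetical names. In the real application `emit_question` would be a Qt signal's emit method, and `answer()` would be called by the dialog handler in the GUI thread.

```python
import queue
import threading
from typing import Any, Callable


class QuestionChannel:
    """Two-way channel between a worker thread and the GUI thread."""

    def __init__(self, emit_question: Callable[[str], None]):
        self._emit = emit_question          # stand-in for a Qt signal
        self._lock = threading.Lock()       # the answer_question lock
        self._answers: "queue.Queue[Any]" = queue.Queue(maxsize=1)

    def ask(self, text: str) -> Any:
        """Called in a worker thread; blocks until answered."""
        with self._lock:                    # one question at a time
            self._emit(text)                # notify the GUI thread
            return self._answers.get(block=True)

    def answer(self, value: Any) -> None:
        """Called in the GUI thread once the dialog is done."""
        self._answers.put(value)
```

The worker sleeps in `queue.get()` while the GUI thread stays responsive, and the lock is released automatically when `ask()` returns.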

Discontinue QModelIndex.internalPointer() access outside Model class

From docs:

It is not advisable to access this internal pointer outside of the model. Use the data() function instead.

For this to work, we likely need to store the full path in the tree items.

NF: add ability to clear the log view

Just a nice to have from a user's perspective. Clearing the log output and "starting anew" would be cool to do.

Use "file-open-dialog" for `path` command parameter configuration

Right now this is still done with a line edit that cannot handle more than one path.

NF: Ability to kill a command

I want to be able to "hit CTRL-C" if a command takes too long to run or for whatever other reason.

`path` argument interpretation with CLI assumptions -- suboptimal

A command configured as a "datasetmethod" (right-click, fixed dataset path argument) fails when a path argument is entered as a relative path.

This is because the dataset argument is supplied as a path string, and the underlying command treats it with the assumption of coming from the CLI, hence interprets the relative path as relative to CWD -- which makes little sense for a GUI

Command arguments to create as gathered from the GUI and passed on to the command:

EXECINTHREAD create {'path': 'subds1', 'dataset': '/tmp/ebrains/dummy7', 'annex': False}
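One possible workaround on the GUI side would be to make relative paths absolute against the dataset before the command is executed. This is a sketch of that idea only; `resolve_against_dataset` is a hypothetical helper and not something DataLad provides.

```python
from pathlib import Path
from typing import Union


def resolve_against_dataset(dataset: Union[str, Path],
                            path: Union[str, Path]) -> Path:
    """Interpret a relative `path` as relative to `dataset`, not CWD."""
    p = Path(path)
    return p if p.is_absolute() else Path(dataset) / p
```

Applied to the arguments above, 'subds1' would be turned into '/tmp/ebrains/dummy7/subds1' before being passed on.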

Dataset menu entries not sorted in any intelligible fashion

Approach to unwatch on tree item collapse is not optimal

If a subtree of items exists, but potential modifications are not watched, they will go out of sync.

I believe it would be best to keep watching them, but make the actual update conditional on the item being expanded. If it is not expanded, the modification is recorded (set a flag), and on the next expansion, the modifications are inspected.
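A minimal sketch of that flag-based approach, independent of Qt. `WatchedItem` and its method names are hypothetical, and `refresh` stands in for whatever re-inspects a directory.

```python
from typing import Callable


class WatchedItem:
    """Tree item that defers updates while it is collapsed."""

    def __init__(self, path: str, refresh: Callable[[str], None]):
        self.path = path
        self.expanded = False
        self._dirty = False       # modification seen while collapsed
        self._refresh = refresh

    def on_fs_change(self) -> None:
        """Handler for a watcher report; the item stays watched."""
        if self.expanded:
            self._refresh(self.path)
        else:
            self._dirty = True    # just record it for later

    def on_expand(self) -> None:
        self.expanded = True
        if self._dirty:
            self._dirty = False
            self._refresh(self.path)

    def on_collapse(self) -> None:
        self.expanded = False
```

Any number of modifications while collapsed results in a single refresh on the next expansion.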