datalad / datalad-gooey

A graphical user interface for DataLad (datalad.org)

Home Page: https://docs.datalad.org/projects/gooey

License: Other

Makefile 0.12% Python 98.42% Batchfile 0.02% PowerShell 1.05% NSIS 0.38%
datalad gui rdm closember

datalad-gooey's Introduction

DataLad Gooey (pronounce "GUI")

This package provides a graphical user interface (GUI) for DataLad. It specifically aims to make key data management tasks more accessible and more convenient, without requiring users to become familiar with the command line.

This simplified interface to DataLad is built on a foundation that is capable of providing graphical user interfaces for any DataLad command, including those provided by extension packages. Moreover, extension packages can even provide their own GUI suites, by mixing and tuning a custom set of commands and parameters.

To try it out, install this package, and run datalad gooey.

Acknowledgements

DataLad development is supported by a US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411).

This DataLad extension was developed with additional funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Michael Hanke 💻 🤔 📆 🧑‍🏫
  • Yaroslav Halchenko 💻
  • Chris Markiewicz 💻
  • Adina Wagner 💻
  • John T. Wodder II 💻
  • Benjamin Poldrack 💻
  • Stephan Heunis 💻
  • Michał Szczepanik 💻
  • Alex Waite 📓 🤔
  • Leonardo Muller-Rodriguez 📓 💻
  • Laura Waite 💻
  • Christian Mönch 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

datalad-gooey's People

Contributors

adswa, allcontributors[bot], bpoldrack, christian-monch, effigies, jsheunis, jwodder, loj, manukapp, mih, mslw, yarikoptic

datalad-gooey's Issues

Let `cmd_exec` group results before submitting a signal

The number of signals emitted seems to be a major source of slowdown. Grouping results into a single signal may help with this.

However, it may also be that the amount of data passing through the signals is the bottleneck; this is yet to be investigated. The tree execution via cmd_exec that is in main now provides a suitable test bed for this.
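A minimal sketch of the grouping idea, assuming results arrive from a generator. The helper `batched` and the stand-in generator `fake_status_results` are hypothetical names for illustration and are not part of datalad-gooey; in the application, each yielded batch would be the payload of a single signal emission.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(results: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `size` results instead of single results."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(results)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_status_results(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for a datalad command's result generator."""
    for i in range(n):
        yield {"action": "status", "path": f"file{i}", "status": "ok"}


# One emission per batch rather than one per result record.
batch_sizes = [len(batch) for batch in batched(fake_status_results(7), size=3)]
print(batch_sizes)
```

Whether this helps depends on which of the two suspected bottlenecks dominates: batching reduces the number of emissions, but not the total amount of data passed.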

Ability to kill

Probably relates to #31, since those commands would need to be killed first. At the moment it seems the gooey process itself cannot be killed either, because some underlying datalad call is still holding it, or something like that.

$> datalad gooey --path ~/datalad
qt.pysideplugin: Environment variable PYSIDE_DESIGNER_PLUGINS is not set, bailing out.
qt.pysideplugin: No instance of QPyDesignerCustomWidgetCollection was found.
Qt WebEngine seems to be initialized from a plugin. Please set Qt::AA_ShareOpenGLContexts using QCoreApplication::setAttribute and QSGRendererInterface::OpenGLRhi using QQuickWindow::setGraphicsApi before constructing QGuiApplication.
QUEUEDIR /home/yoh/datalad
ANNOTATE! 1
EXECINTHREAD status {'dataset': '/home/yoh/datalad', 'path': PosixPath('/home/yoh/datalad')}
qt.pysideplugin: No instance of QPyDesignerCustomWidgetCollection was found.
^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/app.py", line 95, in <lambda>
    lambda i, cmd, args, ce: self.get_widget('statusbar').showMessage(
KeyboardInterrupt
^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
^C^C^C^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
^C^C^C^C^CTraceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-gooey/datalad_gooey/fsbrowser.py", line 79, in _directory_annotation
    def _directory_annotation(self):
KeyboardInterrupt
...

DOC: create basic documentation

Once we're ready for wider consumption, the docs will need to be created, linked to readthedocs.

Possibly useful sections:

  • Installation
  • Overview (what is and why have a datalad GUI, disclaimers)
  • Getting started (how to install and use the app)
  • Design (mainly for developers, so that they understand how to contribute)

Initial user feedback

I am creating this issue to list all the points of feedback that I thought of when testing the GUI locally (MacOS).

Disclaimers: Some of my expectations might not be in line with the design decisions made/planned for the GUI. Some issues might already be known. Some might not be issues at all.

From this list, we can create new and separate issues per topic if so desired. I'm dumping it all in a list for now just to make it easy for myself.

  1. It would be cool to have an immediate visual representation (with icons) of what is a dataset and what is not. Distinctions could extend to content in git vs annex, etc. I am aware of #2 and #5.
  2. I would expect the content in the Actions/Properties/History tab to be responsive to user selections in the file browser tab. A simple current example is that the content of the last executed action in the top right tab will remain visible (in a greyed-out state) even when selecting a different dataset in the file browser or when selecting a different toolbar option and then selecting "Action" again.
  3. I'm assuming the available dataset actions are derived from what is available in the environment from which the GUI is launched, because I can e.g. see actions from datalad extensions. This is great. (more of a comment than an issue 😄)
  4. I tried running some commands (meta_extract, search, create), but none seem to execute, and no output is seen in the Log tab. Perhaps this is not expected to work yet. In either case, my expectation would be for the full command line output to show in the Log/Console tab, including result records/summaries as well as exceptions/warnings, and for the file browser to update in case changes were made there (e.g. a newly created dataset).
  5. Properties of a dataset also seem like something that users would expect as a menu item in the right-click dropdown.
  6. I think we need to find an intuitive and consistent way to deal with how users find and select information/options. The right click as well as tab selection options could be duplicating functionality in some ways, and this could confuse users.

To be continued...

Make and keep `datalad tree` fast

The snappiness of filesystem tree navigation in the UI is directly connected to the performance of the tree command. This makes issues like datalad/datalad#6940 critical.

It would make sense to include a benchmark of some kind in here.

FS model broken, may segfault Python

Creating and removing file nodes via external manipulation of the file system can trigger it.

It seems I can reliably trigger it with:

  1. browse to a dataset with a file (it seems to be irrelevant what state that file is in)
  2. left-click on the file (select it)
  3. remove the file outside datalad-gooey
  4. with no action in datalad-gooey, recreate a file with the same name

-> segfault

When the removal comes after the status report for reannotation of the changed directory, traces of the coming segfault, due to data corruption of the tree model, are already visible in the terminal after (3).

When the FS removal happens before the status report comes in, the app immediately segfaults:

Traceback (most recent call last):
  File "/home/mih/hacking/datalad/gooey/datalad_gooey/fsview_model.py", line 348, in parent
    pnode = self._tree[child_node.path.parent]
AttributeError: 'cell' object has no attribute 'path'

SVG icons

QSvgRenderer can render directly onto QImage instances, which can also save the image. This is useful for caching, in case the SVG rendering is slow.

Decide how to handle symlinks

Should they be followed, or just represented as (possibly dead) symlinks?

Following them in the filesystem browser could lead to confusing situations, where some content looks as if it would be included in a dataset, although only the symlink is, and is pointing outside that particular dataset.

Following symlinks also requires deduplication of tree model nodes, because a symlink could point anywhere inside a directory/dataset hierarchy, such that the child nodes underneath a symlink have at least one alternative address on the filesystem.
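To illustrate the two concerns above, here is a small standard-library sketch. The function names `canonical_key` and `points_outside` are hypothetical; the idea is only that a symlink-free path could serve as a deduplication key for tree nodes, and that a resolved target can be compared against a dataset root.

```python
import os
import tempfile
from pathlib import Path


def canonical_key(path: Path) -> Path:
    """Return the symlink-free location that a tree node refers to."""
    return Path(os.path.realpath(path))


def points_outside(link: Path, dataset_root: Path) -> bool:
    """True if `link` resolves to a location outside `dataset_root`."""
    root = canonical_key(dataset_root)
    target = canonical_key(link)
    return root != target and root not in target.parents


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        ds = Path(tmp) / "ds"
        (ds / "data").mkdir(parents=True)
        (ds / "data" / "file.txt").write_text("content")
        outside = Path(tmp) / "elsewhere"
        outside.mkdir()
        # one symlink staying inside the hierarchy, one leaving it
        (ds / "alias").symlink_to(ds / "data", target_is_directory=True)
        (ds / "external").symlink_to(outside, target_is_directory=True)
        # the same file is reachable under two addresses
        same_node = (canonical_key(ds / "alias" / "file.txt")
                     == canonical_key(ds / "data" / "file.txt"))
        return (same_node,
                points_outside(ds / "alias", ds),
                points_outside(ds / "external", ds))


print(demo())
```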

Decorate treeview items

i.e. add icons (see #2)

It should communicate what is a

  • dataset
  • directory (already done by the expansion tick)
  • symlink (valid vs broken)

everything else would be a file.

It would be meaningful to decorate files in git vs annex, but this would be expensive to do correctly across platforms (not every symlink to a file is an annexed file, not every non-symlinked file is not in an annex).
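The item categories listed above could be told apart roughly as follows. This is a sketch only: `classify` is a hypothetical helper, and treating a directory with a `.datalad` subdirectory as a dataset is a simplifying assumption made for this example, not necessarily how datalad-gooey or `datalad tree` decide.

```python
import tempfile
from pathlib import Path


def classify(path: Path) -> str:
    """Return a coarse item type for decorating a tree view entry."""
    # Check symlinks first: is_dir()/is_file() would follow them.
    if path.is_symlink():
        # exists() follows the link, so it is False for a dead symlink
        return "symlink" if path.exists() else "broken-symlink"
    if path.is_dir():
        # ASSUMPTION (illustration only): `.datalad` marks a dataset
        return "dataset" if (path / ".datalad").is_dir() else "directory"
    return "file"


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ds" / ".datalad").mkdir(parents=True)
        (root / "plain").mkdir()
        (root / "file.txt").write_text("content")
        (root / "good").symlink_to(root / "file.txt")
        (root / "dead").symlink_to(root / "missing")
        names = ("ds", "plain", "file.txt", "good", "dead")
        return {name: classify(root / name) for name in names}


print(demo())
```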

Implement tree sorting

We need sorting by

  • name
  • time
  • type

A natural UI would be to have columns in the tree view showing each of the three properties, and have the column headers be clickable for sorting using that property. The latter is already enabled in the UI, but not connected to any actual sorting.

This is somewhat in conflict with #5, because a dedicated column for type would make some decorations redundant.

Sorting by time raises the question what kind of time to consider. Modification time makes sense. But should it be the filesystem reported one (fast), or the timestamp of the last commit in a dataset (slower, and calls for corner cases to be handled ("what if the dataset is not clean?")).
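For reference, a sketch of the three sort keys using only what the filesystem reports (the fast option mentioned above). `sort_children` is a hypothetical helper; "type" is reduced to directory vs. file here, which is coarser than what the tree view distinguishes.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def sort_children(directory: Path, key: str = "name",
                  reverse: bool = False) -> List[Path]:
    """Sort the entries of `directory` by name, modification time, or type."""
    def type_of(path: Path) -> str:
        return "directory" if path.is_dir() else "file"

    keyfuncs = {
        "name": lambda p: p.name.lower(),
        # filesystem-reported modification time, not the last commit
        "time": lambda p: p.lstat().st_mtime,
        # sort by type, then by name within a type
        "type": lambda p: (type_of(p), p.name.lower()),
    }
    return sorted(directory.iterdir(), key=keyfuncs[key], reverse=reverse)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "b.txt").write_text("b")
        (root / "a.txt").write_text("a")
        (root / "sub").mkdir()
        # set explicit modification times to make the order deterministic
        os.utime(root / "a.txt", (300, 300))
        os.utime(root / "b.txt", (100, 100))
        os.utime(root / "sub", (200, 200))
        return {k: [p.name for p in sort_children(root, k)]
                for k in ("name", "time", "type")}


print(demo())
```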

Clear command tab

When executing a command, the command specification sticks "forever". While not instantly clearing that tab as soon as the command finished is fine (so one can see what was specified when assessing its outcome), at least when subsequently selecting something else in the tree view the previous command specification should disappear, I think.

`DataladQtWorkerBridge`

Analogous to DataladQtUIBridge, we need something to wrap the execution of a command (optionally or always) in a worker thread. It needs to represent command execution in a way that Qt can understand:

  • emit a signal when a process starts (with info on that process) -- could be used to update a status widget
  • emit a signal when a process ends (with info on the status/exit code) -- could be used to connect callbacks that perform an action conditional on the process outcome

This need not be a runner wrapper, but should be able to take any callable (incl. something like GitRepo.call_git()).

We likely need dedicated support for generators, and maybe even more dedicated support for datalad result records (for datalad command execution). For generators, we should have an additional result_recieved signal, to be emitted whenever the generator yielded an item.
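The shape of such a bridge can be sketched without Qt. In the sketch below, plain callback lists (`on_start`, `on_result`, `on_end`) stand in for Qt signals, and `WorkerBridge` and `fake_command` are hypothetical names. Unlike queued Qt signal connections, these callbacks run in the worker thread, so this shows the control flow only, not the thread-safe delivery.

```python
import inspect
import threading
from typing import Callable, List


class WorkerBridge:
    """Run any callable in a thread; callback lists stand in for signals."""

    def __init__(self) -> None:
        self.on_start: List[Callable] = []
        self.on_result: List[Callable] = []
        self.on_end: List[Callable] = []

    @staticmethod
    def _emit(listeners: List[Callable], *args) -> None:
        for listener in listeners:
            listener(*args)

    def _run(self, func: Callable, args: tuple, kwargs: dict) -> None:
        self._emit(self.on_start, getattr(func, "__name__", repr(func)), kwargs)
        try:
            outcome = func(*args, **kwargs)
            if inspect.isgenerator(outcome):
                # dedicated generator support: report each yielded item
                for item in outcome:
                    self._emit(self.on_result, item)
            else:
                self._emit(self.on_result, outcome)
        except Exception as exc:
            self._emit(self.on_end, False, exc)
        else:
            self._emit(self.on_end, True, None)

    def execute(self, func: Callable, *args, **kwargs) -> threading.Thread:
        thread = threading.Thread(target=self._run, args=(func, args, kwargs))
        thread.start()
        return thread


def fake_command(n: int):
    """Hypothetical generator standing in for a datalad command."""
    for i in range(n):
        yield {"status": "ok", "path": f"file{i}"}


events = []
bridge = WorkerBridge()
bridge.on_start.append(lambda name, kwargs: events.append(("start", name)))
bridge.on_result.append(lambda res: events.append(("result", res["path"])))
bridge.on_end.append(lambda ok, exc: events.append(("end", ok)))
bridge.execute(fake_command, n=2).join()
print(events)
```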

Design question dialog

It needs to represent

  • text
  • title
  • possibly choices
  • with an optional default
  • the possibility for hidden text entry
  • and the ability to gather an item twice and compare for equality

in order to be able to handle ui.question() calls. But see datalad/datalad#6991 for possible developments.
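One way to pin down what the dialog has to represent is a small data structure that covers the listed items. `QuestionSpec` and `accept_answer` are hypothetical names; the field names follow the list above and are not claimed to match the exact signature of ui.question().

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuestionSpec:
    text: str
    title: Optional[str] = None
    choices: Optional[Sequence[str]] = None
    default: Optional[str] = None
    hidden: bool = False   # hidden text entry, e.g. for passwords
    repeat: bool = False   # gather the item twice and compare

    def __post_init__(self) -> None:
        if (self.choices is not None and self.default is not None
                and self.default not in self.choices):
            raise ValueError("default must be one of the choices")


def accept_answer(spec: QuestionSpec, first: str,
                  second: Optional[str] = None) -> Optional[str]:
    """Validate what a dialog collected; return the answer or raise."""
    if spec.repeat and first != second:
        raise ValueError("the two entries differ")
    # empty input falls back to the default, if there is one
    answer = first if first else spec.default
    if spec.choices is not None and answer not in spec.choices:
        raise ValueError(f"{answer!r} is not one of {list(spec.choices)}")
    return answer


confirm = QuestionSpec("Proceed?", title="Confirm",
                       choices=("yes", "no"), default="no")
password = QuestionSpec("Password", hidden=True, repeat=True)
print(accept_answer(confirm, ""), accept_answer(password, "s3cret", "s3cret"))
```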

`datalad status` call is not restricted to directory

With reference to this code:

# trigger datalad-status execution
# giving the target directory as a `path` argument should
# avoid undesired recursion into subDIRECTORIES
self._app.execute_dataladcmd.emit(
    'status', dict(dataset=dsroot, path=d))

This call does in fact recurse into subdirectories. We could do the status call per child (that is not a directory) in the expanded directory. E.g.:

# trigger datalad-status execution
# giving the target directory as a `path` argument should
# avoid undesired recursion into subDIRECTORIES
for dc in d.iterdir():
    if not dc.is_dir():
        self._app.execute_dataladcmd.emit(
            'status', dict(dataset=dsroot, path=dc, annex='basic'))

I've tested this locally and it works, but the annotations are then done one by one, which makes for a weird user experience. Also, with this approach we don't annotate any immediate subdirectories/subdatasets.

Any thoughts?

Come up with a facility to indicate "busy"

Qt has many ways (status bar, cursor shape/type, ...), but we need to pick one, and wire things appropriately to have busy go on and off as needed.

Example use cases:

  • FS browser items are still being annotated
  • A command is running

quit from drop-down menu

Currently I see no equivalent of File->Quit.

I assume that most people quit by hitting the "X" for the window (especially as keyboard interrupt doesn't work yet, see #38). I don't have such chrome when using my WM (sway), so I use keyboard shortcuts to close the application.

The number of people who will lack window chrome is vanishingly small. But, the File->Quit idiom is a common one. Given the target audience of the gooey-GUI, I think such a dialog would increase familiarity for a certain set of users.

GUI can freeze when populating large directories in the tree view

This is all behind FSBrowserItem.from_path.

Idea: Underneath this is running datalad tree. So instead of running it synchronously, it could run in a thread, with a call-back signal that the respective parent item can receive to accept the result, and turn it into an item.

Have display name conventions

Right now, command names and command parameter names are taken literally from the Python API (all lower case, underscores, etc). Decide how they should be displayed (e.g. auto-caps, or whatever).
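As one possible convention (an illustration, not a decision), underscores could become spaces and the first letter could be capitalized. `display_name` is a hypothetical helper.

```python
def display_name(api_name: str) -> str:
    """Turn a Python API name into a label, e.g. for a form field."""
    words = [word for word in api_name.split("_") if word]
    if not words:
        return api_name
    label = " ".join(words)
    # capitalize only the first letter, leave the rest untouched
    return label[0].upper() + label[1:]


for name in ("result_renderer", "on_failure", "dataset"):
    print(name, "->", display_name(name))
```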

TESTS REQUIRED!!

The code has reached a complexity where changes can easily cause breakage that is not noticeable unless ALL functionality is expensively exercised in the GUI, manually.

This is not a workable setup.

datalad_gooey/tests/test_param_widget.py has a sketch of how some GUI functionality can be tested in a headless setup (even without xvfb).

Qt provides means to test GUI components (e.g. https://doc.qt.io/qt-6/qttestlib-tutorial3-example.html). These need to be explored.

Who's working on what?

Auto-generate parameter entry widgets for any datalad command

Starting from any API component, we can get the underlying class and its parameters

>>> import datalad.api as dl
>>> dl.wtf
<function datalad.local.wtf.WTF.__call__(*, dataset=None, sensitive=None, sections=None, flavor='full', decor=None, clipboard=None)>

>>> from datalad.utils import get_wrapped_class
>>> cls=get_wrapped_class(dl.wtf)
>>> cls._params_
{'dataset': <datalad.support.param.Parameter at 0x7fdff72162f0>,
 'sensitive': <datalad.support.param.Parameter at 0x7fdff72163e0>,
 'sections': <datalad.support.param.Parameter at 0x7fdff72164a0>,
 'flavor': <datalad.support.param.Parameter at 0x7fdff7216560>,
 'decor': <datalad.support.param.Parameter at 0x7fdff7216620>,
 'clipboard': <datalad.support.param.Parameter at 0x7fdff7216650>,
 'return_type': <datalad.support.param.Parameter at 0x7fdff7fa3880>,
 'result_filter': <datalad.support.param.Parameter at 0x7fdff7fa39a0>,
 'result_xfm': <datalad.support.param.Parameter at 0x7fdff7fa3b20>,
 'result_renderer': <datalad.support.param.Parameter at 0x7fdff7fa3b80>,
 'on_failure': <datalad.support.param.Parameter at 0x7fdff7fa3c40>}

As seen above, this includes all common command parameters too, so no special handling is needed for them.

ATM we cannot tell which parameters are deprecated and could be ignored (see datalad/datalad#6998).

Each Parameter instance can provide additional info. Parameter.constraints() could be called to validate parameter values prior to execution. type(Parameter.constraints) could be used to select specialized input widgets (in some cases). Parameter.get_autodoc() is not very useful, because it spells out constraints, and a GUI would reflect them in the interface directly.

>>> p = cls._params_['sections']
>>> p.constraints('some')
ValueError: value is not one of (None, 'configuration', ...
>>> type(p.constraints)
datalad.support.constraints.EnsureChoice
>>> p.get_autodoc('some')
"some : list of {None, 'configuration', 'credentials', 'datalad', 'dataset', 'dependencies', 'environment', 'extensions', 'git-annex', 'location', 'metadata_extractors', 'metadata_indexers', 'python', 'system', '*'}\n  section to include.  If not set - depends on flavor. '*' could be\n  used to force all sections. [CMD: This option can be given multiple\n  times. CMD]."

It is not possible to extract from a constraint whether one or more values can be passed to a parameter. For that, it seems the argparse "action" configuration needs to be inspected and acted upon:

>>> p.cmd_kwargs
{'action': 'append', 'dest': 'sections', 'metavar': 'SECTION'}

Likely other settings need to be taken into account too (nargs, const, choices, required). This is a complex problem. It seems to be best solved by a specialized solution like https://github.com/chriskiehl/Gooey but we are only using a subset of the capabilities of argparse, so we might get away with something cheaper.

Parameter exposes all argument diversity supported by argparse though...
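A "cheaper" inspection could start out like the sketch below, which only looks at the `action` and `nargs` keys of a cmd_kwargs dict and ignores `const`, `choices`, and `required`. `accepts_multiple` is a hypothetical helper; the argparse part merely confirms how the dict shown above behaves when handed to a parser.

```python
import argparse


def accepts_multiple(cmd_kwargs: dict) -> bool:
    """Guess from argparse settings whether several values can be given."""
    action = cmd_kwargs.get("action")
    nargs = cmd_kwargs.get("nargs")
    if action in ("append", "extend"):
        return True
    if nargs in ("*", "+"):
        return True
    return isinstance(nargs, int) and nargs > 1


# the settings reported for the `sections` parameter of `wtf`
cmd_kwargs = {"action": "append", "dest": "sections", "metavar": "SECTION"}

parser = argparse.ArgumentParser()
parser.add_argument("--sections", **cmd_kwargs)
parsed = parser.parse_args(["--sections", "python", "--sections", "system"])
print(accepts_multiple(cmd_kwargs), parsed.sections)
```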

The parameter entry widget should likely maintain the signature order:

>>> from datalad.utils import get_sig_param_names
>>> get_sig_param_names(wtf, ('any',))
(['dataset', 'sensitive', 'sections', 'flavor', 'decor', 'clipboard'],)

Any common parameters (result rendering, etc) could be added via a standard widget set (unconditionally).

Generate parameter widget from alternative constraints

Currently, _get_parameter_widget_factory only accounts for single constraints like a plain EnsureChoice. However, in reality we have several cases of EnsureSome() | EnsureOther() | EnsureNone(), resulting in an AltConstraints instance.

This could be addressed by a to-be-implemented multi-widget. The idea would be for it to (potentially) have a checkbox to enable/disable the entire thing if EnsureNone() is part of the alternatives, and within it a radio-button selection of multiple input widgets, one per constraint except EnsureNone().

This is not accounting for Constraints (AND'ed constraints). However, not clear ATM whether this would be the place to fully account for it, plus: I don't think it's actually used anywhere in datalad. It's pretty much always OR.

Support glob expressions in `MultiValueInput(PathParamWidget)`

It could be implemented in PathParamWidget with a dedicated flag turning it on (to make sure that it can still be used to retrieve just a single parameter value). It can then be enabled when used with MultiValueInputWidget, e.g., for path arguments that can take any number of values.
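The flag-controlled behavior could look roughly like this, independent of any widget code. `expand_path_value` and its `allow_glob` parameter are hypothetical; with the flag off, the entered text is always treated as a single literal path.

```python
import glob
import os
import tempfile
from pathlib import Path
from typing import List

GLOB_CHARS = "*?["


def expand_path_value(value: str, base: Path,
                      allow_glob: bool = False) -> List[Path]:
    """Return the path(s) that an entered value stands for."""
    candidate = os.path.join(base, value)
    if allow_glob and any(char in value for char in GLOB_CHARS):
        return sorted(Path(match) for match in glob.glob(candidate))
    # single-value mode: never expand, even if the text contains '*'
    return [Path(candidate)]


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for name in ("a.csv", "b.csv", "c.txt"):
            (base / name).write_text("x")
        expanded = [p.name for p in expand_path_value("*.csv", base, True)]
        literal = [p.name for p in expand_path_value("*.csv", base)]
        return expanded, literal


print(demo())
```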

Consider `QTreeWidget` instead of `QTreeView + DataladTreeModel`

With #47 in mind, we know that implementing a working model is not simple. This raises the question of what would be lost when switching to QTreeWidget with its ready-to-use internal model.

The experience with the internal model use of QListWidget for MultiValueInputWidget was pretty straightforward. I did not have the impression that it would do something more expensive than what would have been possible with a custom (possibly flawed) implementation.

This needs a closer look at the current access patterns to DataladTreeModel that are not pure UI, and an assessment of whether they could be implemented using QTreeWidget directly and cheaply enough.

The one essential information of a tree view item could be included via a custom item data role

https://doc.qt.io/qtforpython/PySide6/QtCore/Qt.html?highlight=qt%20itemdatarole#PySide6.QtCore.PySide6.QtCore.Qt.ItemDataRole

Alternatively, the path could also be stored in an item as Qt.EditRole, while path.name is stored as (or returned as) Qt.DisplayRole.

FSBrowser update based on FS changes can come too quickly

Command execution happens in a thread, hence commands run while the FS is being modified. Running a create can trigger the FS watcher update report before a full dataset is built. Hence the tree command triggered by the FS watcher runs too early and sees a directory node, not a dataset.

Reproduce by right-clicking on a dataset item -> create -> create a subdataset. The parent dataset item is expanded (hence watched), and a child item is added, which is not labeled as a dataset for this reason.

Run datalad commands in a thread (but provide UI access)

For many (all?) datalad command invocations, we likely want to execute in a thread (if not even a subprocess) in order to not let the GUI freeze up.

A somewhat critical requirement for that would be to have the datalad UI be able to communicate with the GUI. For example, a ui.question() triggered inside a thread, needs to result in a GUI response (e.g. a dialog pop up), and deliver the outcome back into the thread. It is OK for the thread to be blocking until the question is answered, but not OK for the GUI to freeze until the thread has finished.

For the main results (result records) produced by any command, we can have a wrapper that emits them as Qt signals. Any connected slot would be able to receive them in a thread-safe manner automatically.

The same would be possible for ui.message() being called inside a thread. It could merely emit them as a ui_message signal, and the GUI needs to connect a receiver.

I presently have no solid idea how to approach ui.question() in a thread-safe manner.

The best I can come up with right now is some kind of two-way queue. A ui.question() called from within a thread:

  1. obtains a lock answer_question (to make sure only one question is processed at a time, and the thread blocks until this is possible)
  2. it emits a special question signal that the GUI connected to a dedicated handler -- which can fire up a dialog or whatever is appropriate to collect an answer IN THE GUI THREAD
  3. once the question signal was emitted from the worker thread, it runs queue.get(block=True) to sleep until the GUI thread has deposited the answer on the queue for retrieval
  4. the answer_question lock is released
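The four steps can be sketched with the standard library alone. `QuestionChannel` is a hypothetical name, and a second queue (`questions`) stands in for the queued delivery of the question signal to the GUI thread; in the real application that delivery would be done by Qt.

```python
import queue
import threading


class QuestionChannel:
    """Two-way hand-over of a question and its answer between threads."""

    def __init__(self) -> None:
        self._answer_question = threading.Lock()
        self.questions = queue.Queue()          # worker -> GUI (the "signal")
        self._answers = queue.Queue(maxsize=1)  # GUI -> worker

    def ask(self, text: str) -> str:
        """Called in a worker thread; blocks until the GUI has answered."""
        with self._answer_question:               # step 1 (released: step 4)
            self.questions.put(text)              # step 2: "emit" the question
            return self._answers.get(block=True)  # step 3: sleep until answered

    def answer(self, value: str) -> None:
        """Called in the GUI thread once the dialog was closed."""
        self._answers.put(value)


def demo() -> list:
    channel = QuestionChannel()
    received = []

    def worker() -> None:
        received.append(channel.ask("Proceed?"))

    thread = threading.Thread(target=worker)
    thread.start()
    # "GUI thread": pick up the question, show a dialog (here: canned answer)
    text = channel.questions.get(timeout=5)
    channel.answer("yes" if text == "Proceed?" else "no")
    thread.join(timeout=5)
    return received


print(demo())
```

The worker blocks only itself while waiting; the thread playing the GUI role stays free to process other events between picking up the question and depositing the answer.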

`path` argument interpretation with CLI assumptions -- suboptimal

A command configured as a "datasetmethod" (right-click, fixed dataset path argument) fails when a path argument is entered as a relative path.

This is because the dataset argument is supplied as a path string, and the underlying command treats it with the assumption that it comes from the CLI. It therefore interprets the relative path as relative to CWD, which makes little sense for a GUI.

Command arguments to create as gathered from the GUI and passed on to the command:

EXECINTHREAD create {'path': 'subds1', 'dataset': '/tmp/ebrains/dummy7', 'annex': False}
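One conceivable GUI-side workaround, shown here only as a sketch, is to anchor relative path arguments at the dataset before handing them to the command. `anchor_path_args` is a hypothetical helper; which argument names count as paths (`keys`) is an assumption of this example.

```python
from pathlib import Path


def anchor_path_args(kwargs: dict, keys=("path",)) -> dict:
    """Return a copy of `kwargs` with relative paths made dataset-relative."""
    fixed = dict(kwargs)
    dataset = fixed.get("dataset")
    if dataset is None:
        return fixed
    for key in keys:
        value = fixed.get(key)
        if value is not None and not Path(value).is_absolute():
            fixed[key] = Path(dataset) / value
    return fixed


# the arguments from the report above
gathered = {"path": "subds1", "dataset": "/tmp/ebrains/dummy7", "annex": False}
fixed = anchor_path_args(gathered)
print(fixed["path"])
```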

Approach to unwatch on tree item collapse is not optimal

If a subtree of items exists, but potential modifications are not watched, they will go out of sync.

I believe it would be best to keep watching them, but make the actual update conditional on the item being expanded. If it is not expanded, the modification is recorded (set a flag), and on the next expansion, the modifications are inspected.
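The proposed behavior amounts to a small state machine per item. In this sketch, `WatchedItem` and its method names are hypothetical, and `refresh()` merely counts calls where the real application would re-inspect the directory.

```python
class WatchedItem:
    """Keep watching while collapsed; defer the update until expansion."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.expanded = False
        self.dirty = False       # the flag recording a missed modification
        self.refresh_count = 0

    def refresh(self) -> None:
        # placeholder for re-inspecting the directory content
        self.refresh_count += 1
        self.dirty = False

    def on_fs_change(self) -> None:
        if self.expanded:
            self.refresh()
        else:
            self.dirty = True    # record only, inspect later

    def on_expand(self) -> None:
        self.expanded = True
        if self.dirty:
            self.refresh()

    def on_collapse(self) -> None:
        self.expanded = False    # note: the watcher stays in place


item = WatchedItem("/some/dir")
item.on_fs_change()              # collapsed: flagged, not refreshed
flagged = (item.dirty, item.refresh_count)
item.on_expand()                 # expansion triggers the deferred update
print(flagged, (item.dirty, item.refresh_count))
```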
