wettenhj / mytardis

This project forked from mytardis/mytardis

MyTardis - a data management system for private lab/facility data

Home Page: http://mytardis.github.com

License: GNU General Public License v3.0

Shell 0.28% Python 66.77% HTML 14.39% JavaScript 9.16% CSS 9.40% Gherkin 0.01%

mytardis's Issues

Default experiments (to be used by MyData) - please discuss MyTardis model changes required

Background

The initially targeted users of MyData (https://github.com/wettenhj/mydata) have requested that their users shouldn't have to interact with MyData at all if they don't want to, i.e. MyData will be primarily used by facility managers for adding new instrument PCs to MyTardis and for diagnosing failed uploads. So general microscope users should be able to simply save a folder (e.g. "Dataset 1") in their user folder (e.g. "jsmith") and leave it up to MyData to put the dataset in a sensible default experiment in MyTardis (which the user can later modify if they wish). The proposed method for defining a default experiment is to group datasets by:
(i) instrument,
(ii) user who collected the data (the researcher), and
(iii) the date on which the data was collected.

So for example, if MyData found a "Dataset 1" folder with a creation date of "2014-10-11" within a "jsmith" folder on instrument "Test Microscope 1", then it would query MyTardis to see if a default experiment record already exists which is suitable for this dataset, i.e. an experiment record tagged with "Test Microscope 1", "jsmith" and "2014-10-11". If it didn't already exist, MyData would create this default experiment record. It would initially create the record using a facility role account, e.g. MyTardis username="myfacility", and then user "jsmith" would be given full ownership access to the experiment record by creating an appropriate ObjectACL record.
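The lookup-then-create flow described above can be sketched roughly as follows. This is only an illustration: it uses an in-memory list in place of MyTardis, and the `Experiment` class, its field names, and the `owners` set (standing in for an ObjectACL record) are all hypothetical, not the actual MyTardis models or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Experiment:
    # Hypothetical stand-in for a MyTardis experiment record.
    title: str
    instrument: str
    collector: str
    collected_on: date
    created_by: str
    owners: set = field(default_factory=set)  # stands in for ObjectACL records


def find_or_create_default_experiment(experiments, instrument, collector,
                                      collected_on,
                                      facility_account="myfacility"):
    """Return (experiment, created) for the (instrument, user, date) group."""
    key = (instrument, collector, collected_on)
    for exp in experiments:
        if (exp.instrument, exp.collector, exp.collected_on) == key:
            return exp, False
    # No match: create the record as the facility role account,
    # then give the researcher ownership.
    exp = Experiment(
        title=f"{instrument} - {collector} - {collected_on.isoformat()}",
        instrument=instrument,
        collector=collector,
        collected_on=collected_on,
        created_by=facility_account,
    )
    exp.owners.add(collector)
    experiments.append(exp)
    return exp, True
```

Calling this twice with "Test Microscope 1", "jsmith" and date(2014, 10, 11) would create one record on the first call and return that same record on the second.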

The question is how to implement these experiment "tags" (instrument, data-collector and date-of-collection) nicely in MyTardis.

Option 1. (already implemented in MyData's current MyTardis test instance)

Create a schema and parameters (as shown here: https://github.com/wettenhj/mydata/raw/master/UserGuideImages/Experiment%20Schema%20and%20Parameter%20Names.PNG)
Add some functionality to the MyTardis API to allow easy filtering of experiments, based on the values of these parameters: https://github.com/wettenhj/mytardis/blob/mydata/tardis/tardis_portal/api.py#L693

Option 2.

Make use of MyTardis's new Instrument model (accessible as an optional field in the dataset model), but try to avoid introducing any new schemas, parameters or changes to the experiment model.
This doesn't look feasible, because for "default experiments", we really want the instrument to be a property of the experiment, not the dataset. And we still need to find a way to record the date of data collection (NOT the same as the date of creation of a database record). There is already functionality in MyTardis's ObjectACLs which could be used to tag an experiment with the researcher who collected the data, but it may not be easy to filter experiments in the TastyPie API using ObjectACLs when determining whether a default experiment already exists for a given instrument, data owner, and date of collection.

Option 3.

Add new fields to MyTardis's Experiment model to allow "default experiments" of this form to be defined and queried easily.
- Having an instrument field in both the Experiment and Dataset models might go against database normalization principles, but it could certainly be useful here, and there would be no problem with just setting it to NULL for Experiments containing Datasets from multiple instruments.
- Adding a data-collection-date field to the Experiment model would be easy, but it would be good to bounce the idea off other MyTardis users and see if it would cause confusion with the creation date of the database record, and whether some users would argue that date of collection should go in the Dataset model instead of the Experiment model (which certainly wouldn't help with the objective here of defining "default experiments").
- Adding a field to the Experiment model for the user who collected the data would be easy, but there could be confusion with the ObjectACL records which indicate who currently has access to the data. For now, I would prefer having a new field in the Experiment model for this (and documenting the new fields together as a way of grouping datasets collected by the same user on the same instrument on the same date). But we could use ObjectACLs if we can work out an appropriate way to filter by ObjectACL when querying experiment records in the TastyPie API.
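As a rough illustration of the "NULL for multiple instruments" idea in Option 3, the experiment-level instrument could be derived from the instruments of its datasets. The function name below is hypothetical and the sketch is plain Python, not a Django model or anything taken from MyTardis.

```python
from typing import Iterable, Optional


def experiment_level_instrument(
        dataset_instruments: Iterable[Optional[str]]) -> Optional[str]:
    """Return the shared instrument, or None (NULL) if there isn't exactly one."""
    distinct = set(dataset_instruments)
    if len(distinct) == 1:
        # All datasets agree, so the experiment can carry that instrument.
        return next(iter(distinct))
    # Zero datasets, or datasets from several instruments: leave it NULL.
    return None
```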

MyData uses a new Uploader model in MyTardis. Discuss pros/cons of including in core MyTardis

Use cases for the new Uploader model added to (this fork of) MyTardis are discussed below.

Fields of the Uploader model are highlighted in bold below, because there has been some discussion of whether we need so many fields.

When a MyTardis administrator receives a request for staging access (automatically generated by MyData), they can look up the uploader record associated with the request, and check the instrument record (which gives them the Facility record via a foreign key), the contact name and contact email for the instrument PC in the uploader record, so they know who to contact when the upload-to-staging access has been set up.
User-facing instrument PCs are difficult to identify uniquely in a reliable way (often it's easy for users to change IP addresses, hostnames etc.). The best we can do is use the MAC address of the network interface (e.g. Ethernet) as a unique identifier for the "uploader" record. It's important that we don't accidentally grant staging access to the wrong instrument PC (or other upload PC).
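For illustration, here is one way a Python client could read a MAC address using only the standard library. This is a sketch under assumptions and not necessarily how MyData does it: `uuid.getnode()` does not let you choose which network interface is used, and if it cannot find a hardware address it falls back to a random 48-bit number with the multicast bit set, which the second helper checks for. The helper names are hypothetical.

```python
import uuid


def format_mac(node: int) -> str:
    """Format a 48-bit integer as a colon-separated MAC address string."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


def is_probably_real_mac(node: int) -> bool:
    """False if the multicast bit (lowest bit of the first octet) is set.

    uuid.getnode() sets this bit when it had to invent a random number.
    """
    return ((node >> 40) & 0x01) == 0


# Usage on the local machine; the value depends on the hardware.
local_node = uuid.getnode()
local_mac = format_mac(local_node)
```

A randomly generated fallback value would change between runs, so a client relying on this approach would probably want to refuse to register an uploader record when `is_probably_real_mac` returns False.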
It is envisaged that Facility records will be created by the MyTardis administrator, and will not be modifiable by MyData; however, Instrument records need to be modifiable / creatable by MyData, because the purpose of MyData is to make it easy to add a new instrument PC to MyTardis. If a MyData user tries to assign an instrument name which is not specific enough and is already used elsewhere in their Facility, e.g. "Nikon Microscope", then MyData should be able to give the user some indication of which instrument PC has already used the duplicate instrument name. In this case, MyData could ask MyTardis to report the hostname (e.g. nikontraining.mmi.monash.edu.au), the OS name (e.g. "Windows"), and maybe the OS username, e.g. "nikontraining", which can help the facility manager using MyData to determine which instrument PC has the duplicate instrument name. Custom authorization is required in TastyPie, because generally a user should only be able to access their own Uploader record (whose MAC address matches theirs), but in this case, MyData users (facility managers) need to be able to access a few fields (hostname, os_name, os_username) from another uploader record with the same instrument_name.
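The access rule described above could look something like the following. This is a plain-Python sketch of the rule only, not the TastyPie authorization class in the fork; the `mac_address` key and the function name are assumptions, while `hostname`, `os_name`, `os_username` and `instrument_name` are the field names mentioned above.

```python
# Fields a facility manager may see on someone else's uploader record
# when it shares their instrument name.
SHARED_FIELDS = ("hostname", "os_name", "os_username")


def readable_fields(record: dict, requester_mac: str,
                    requester_instrument_name: str) -> dict:
    """Return the part of an uploader record the requester may read."""
    if record["mac_address"] == requester_mac:
        # The requester's own record: everything is visible.
        return dict(record)
    if record["instrument_name"] == requester_instrument_name:
        # A different PC using the same instrument name: a few fields only.
        return {name: record[name] for name in SHARED_FIELDS}
    # Unrelated record: nothing is visible.
    return {}
```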
Uploader records contain various fields which can be used by MyTardis/MyData/Store.Star support staff to diagnose problems with MyData installations without having to visit the instrument PC. These fields include the User Agent Name being used to upload to MyTardis, e.g. "MyData", and the User Agent Version e.g. "0.0.3" (maybe add git commit hash), the User Agent Install Location, e.g. "C:\Program Files (x86)\MyData", the architecture (os_platform) which MyData was built with (e.g. MyData.exe could be built with a 32-bit Python), the architecture of the instrument PC (machine), the memory capacity of the PC, the OS version, the number of CPUs and the disk usage and capacity. The value of these fields is known from previous experience with supporting other wxPython GUIs like MASSIVE Launcher/Strudel. Often users don't give clear answers to these questions, and it can be difficult to visit instrument PCs physically to diagnose problems when they are spread across multiple sites (Clayton, AMREP etc.).
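Most of the diagnostic values listed above can be gathered with the Python standard library alone. The sketch below is an assumption about how a client might do that, not MyData's actual code; the dictionary keys are hypothetical apart from `os_name`, `machine` and `hostname`, and memory capacity is left out because the standard library has no portable call for it.

```python
import os
import platform
import shutil


def collect_diagnostics(install_location: str) -> dict:
    """Gather basic information about this PC for a support record."""
    # Disk usage of the root of the current drive ("/" or e.g. "C:\\").
    usage = shutil.disk_usage(os.path.abspath(os.sep))
    return {
        "hostname": platform.node(),
        "os_name": platform.system(),        # e.g. "Windows"
        "os_version": platform.release(),
        "machine": platform.machine(),       # architecture of the PC
        # Bit width of the running Python build, e.g. "32bit" or "64bit".
        "python_bits": platform.architecture()[0],
        "cpus": os.cpu_count(),
        "disk_total_bytes": usage.total,
        "disk_used_bytes": usage.used,
        "install_location": install_location,
    }
```

Reporting `python_bits` separately from `machine` matters for the case mentioned above, where a 32-bit build runs on a 64-bit PC.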
I'm expecting fierce debate on this issue, but I believe that the TastyPie custom authorization for the Uploader model in this git branch does the right thing in allowing an anonymous user to create an Uploader record without authenticating to MyTardis first. Authentication problems with GUIs are extremely common - I.T. help desks are often asking users "Are you sure you typed your password correctly?" So if MyData waits until it has successfully authenticated to MyTardis before uploading diagnostic information for MyTardis administrators/support staff, then they could end up in a situation where the MyData user thinks that they have installed MyData and submitted a request for RSYNC upload access, but the MyTardis user might not see any request due to a bug in MyData or due to the user's inability to enter their password or API key. The MyData installation wizard should ask the installing user (facility manager) to agree to the terms and conditions of using MyData, which are that once a valid MyTardis URL (which supports the new Uploader model) has been entered in MyData, diagnostic information about the PC will be sent to that MyTardis URL.
The created time and updated time fields of the uploader record show the date when MyData was first run successfully on an instrument PC, and the last time it was run (on the same network interface). This could be useful for help desk staff to diagnose problems, e.g. if a user says "Our MyTardis uploads from this PC are broken", it is useful to the help desk staff / MyTardis administrator to determine how long it has been broken for, and how big the data backlog is which needs to be uploaded.
The ipv4 address, ipv6 address, subnet mask and wan_ip_address fields are used for granting access to staging areas. Sometimes the administrator of the staging host will need to add the instrument PC's IP address to a hosts.allow file, or to an iptables firewall, or (in the case of NeCTAR/OpenStack) to a security group. Some instrument PCs have a public-facing IP address, so the IP address you get from "ipconfig" or "ifconfig" will be the same as the one you see when you navigate to http://www.whatismyip.com/, whereas other instrument PCs will be stuck behind firewalls / routers / gateways, so their internal and public IP addresses could be completely different. The wan_ip_address is determined on the server side (in TastyPie), using the django-ipware module, which uses the HTTP_X_FORWARDED_FOR header. The wan_ip_address is probably by far the most useful IP address to store in the uploader record (for granting access to the staging host through a firewall), although this IP address might not be unique to an instrument PC, as multiple instrument PCs can connect to the Internet through a common gateway. There may be use cases (e.g. an I.T. help desk diagnosing MyData problems) where having an internal IP address is useful too, e.g. the I.T. administrator might be able to connect to the instrument PC via Remote Desktop Protocol or using a UNC path to access its filesystem via CIFS. Note that the hostname field in the uploader record is basically just the result of running "hostname" on the client machine - MyData makes no effort to contact a DNS server to determine a fully-qualified hostname from the PC's IP address. So there's no guarantee that the hostname can be used by an I.T. help desk to remotely connect to the PC, but the hostname might help a facility manager to identify another PC in their facility.
For example, if a MyData user (facility manager) tries to enter a duplicate instrument name, MyData might say "The instrument name 'Test Microscope 1' for facility 'Test Facility' is already being used on hostname 'JamesLaptop'."
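To make the server-side wan_ip_address lookup more concrete, here is a deliberately simplified sketch of reading the client address from request metadata. It is not django-ipware's actual algorithm, and the function name is hypothetical; it just takes the first valid address in HTTP_X_FORWARDED_FOR and falls back to REMOTE_ADDR. The `meta` dict stands in for Django's `request.META`.

```python
import ipaddress
from typing import Optional


def client_ip_from_meta(meta: dict) -> Optional[str]:
    """Return the first valid IP from X-Forwarded-For, else REMOTE_ADDR."""
    forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
    # The header is a comma-separated list; the original client is
    # conventionally the first entry.
    candidates = [part.strip() for part in forwarded.split(",")]
    candidates.append(meta.get("REMOTE_ADDR", ""))
    for candidate in candidates:
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # empty or malformed entry, try the next one
    return None
```

Since X-Forwarded-For is supplied by the client or by intermediate proxies, it can be spoofed, so a value obtained this way is probably best treated as a hint for the administrator rather than as proof of identity.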
