allenneuraldynamics / aind-data-transfer-service
FastAPI service to run data compression and transfer jobs on the hpc
License: MIT License
As a user, I want to use the latest version of aind-data-schema, so I can submit jobs with the latest modality values.
As a user, I want to be able to modify an existing job from a list.
As a user, I want a button that will add a job to a list if it's valid.
As a user, I want to see the status of submitted jobs for longer, so I can check a job's status a few days after I've submitted it.
We can get this info from a different endpoint: "api/slurmdb/v0.0.37/jobs"
When a user submits a job, they need to be able to return to see the status of the job.
Acceptance criteria:
http://<service-url>/jobs
JobStatus.name <-- include asset name in job name please
JobStatus.job_state
JobStatus.submit_time, formatted in human readable form, local time zone
Is your feature request related to a problem? Please describe.
The job template lets people put acquisition datetime in human readable form. This should be a supported format.
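One way to accept more than one acquisition datetime format is to try a short list of patterns in order. This is only a sketch: the function name `parse_acq_datetime` and the list of accepted formats are assumptions for illustration, not the service's actual behavior.

```python
from datetime import datetime

# Hypothetical list of accepted formats; the service's real list may differ.
ACCEPTED_FORMATS = (
    "%Y-%m-%d %H:%M:%S",     # e.g. 2023-12-14 12:43:11
    "%m/%d/%Y %I:%M:%S %p",  # e.g. 12/14/2023 12:43:11 PM
)


def parse_acq_datetime(value: str) -> datetime:
    """Parse an acquisition datetime written in any accepted format."""
    text = value.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized acq-datetime: {value!r}")
```

Both `parse_acq_datetime("2023-12-14 12:43:11")` and `parse_acq_datetime("12/14/2023 12:43:11 PM")` resolve to the same `datetime` value.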
e.g. jobs | job status | upload template
also, update to match data-transfer (acq-datetime, Platform, etc) and update UI
As a devops engineer, I want to build the app in a docker file, so I can deploy it to a k8s cluster.
As a devops engineer, I want to pull the docker image from a registry, so I can use it in a k8s environment easily.
As a developer, I want the repo to be renamed to aind-data-transfer-service, so I can more accurately convey the purpose of the repo.
Is your feature request related to a problem? Please describe.
As a developer, I want to be able to run the unit tests and have the same experience on Windows as when developing on Linux.
Describe the solution you'd like
Some issues with unit tests have already been resolved in previous PR #62 for path comparisons.
One remaining issue is that the error messages displayed from /api/validate_csv are slightly different depending on OS, which breaks the unit tests when running on Windows. We can pull out the error message text returned by /api/validate_csv using e.Exception or another format rather than directly returning repr(e), and update all affected unit tests.
Example:
Currently expected on Linux: "AttributeError('WRONG_MODALITY_HERE')"
Currently expected on Windows: "AttributeError(\"type object 'Modality' has no attribute 'WRONG_MODALITY_HERE'\")"
We should check whether there are any other issues related to running on a different OS.
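As a rough illustration of the idea, the exception class name can be reported separately from its message, so tests can compare the stable part only. The helper name `summarize_error` and the returned dictionary shape are assumptions, not the service's current response format.

```python
def summarize_error(exc: Exception) -> dict:
    """Split an exception into a stable type name and a free-form message."""
    # The class name is the same in both environments; the message text is
    # kept for display but need not be compared verbatim in unit tests.
    return {"type": type(exc).__name__, "message": str(exc)}


# The two variants quoted in the example above.
linux_style = AttributeError("WRONG_MODALITY_HERE")
windows_style = AttributeError(
    "type object 'Modality' has no attribute 'WRONG_MODALITY_HERE'"
)

# Both variants agree on the type and both mention the bad modality name.
assert summarize_error(linux_style)["type"] == summarize_error(windows_style)["type"]
assert "WRONG_MODALITY_HERE" in summarize_error(windows_style)["message"]
```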
Describe the bug
There is an occasional error uploading a csv file generated from excel
To Reproduce
Steps to reproduce the behavior:
\ufeff
Additional context
We can probably fix this by adding an encoding such as f = open('file', mode='r', encoding='utf-8-sig')
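The effect of the `utf-8-sig` codec can be seen on a small in-memory sample. The column names and the helper `read_header` below are made up for illustration; only the codec behavior is standard Python.

```python
import csv
import io

# Bytes as a spreadsheet "CSV UTF-8" export might write them: a BOM, then data.
raw = "\ufeffmodality0,subject-id\necephys,676909\n".encode("utf-8")


def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode CSV bytes with the given codec and return the header row."""
    reader = csv.reader(io.StringIO(data.decode(encoding)))
    return next(reader)


# With plain utf-8 the BOM stays glued to the first column name.
print(read_header(raw, "utf-8"))
# With utf-8-sig the BOM is stripped during decoding.
print(read_header(raw, "utf-8-sig"))
```

Since `utf-8-sig` also decodes files that have no BOM, using it for every uploaded CSV should be safe.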
As a user, I want to input data using a browser-based GUI, so I can more easily define data-transfer jobs
As a user, I want a button that can remove a specific job from the list of jobs.
As a user, I want my jobs to execute with a higher priority, so I don't have to wait as long in the queue.
As a user, I want to submit a csv file for validation and submit jobs via json, so I can use a REST API to run these operations.
Describe the bug
modality0.source is acting like the modality0 column with dropdowns and validation
As a user, I want a serialized response returned when I hit submit.
As a user, I want the server to have AWS credentials for the open-data bucket, so I can write to aind-open-data.
As a developer, I want the excel template automatically generated, so I can avoid maintaining the template in a sharepoint folder.
job_upload_template, then the upload template will be returned
David wrote a capsule: https://codeocean.allenneuraldynamics.org/capsule/8742749/tree
Add an API endpoint that runs this and returns the template for download as an xlsx file
Describe the bug
Even though aind-data-schema is pinned to 0.26.5, "behavior-videos" isn't being recognized
Expected behavior
behavior-videos should be recognized as a valid modality
As a user, I want to submit a rest request to the HPC, so I can run a data compression and upload job.
As a user, I want to have the option to filter by service and upload jobs from the status page, so I can better select/view specific jobs
After a user uploads a CSV and before they submit a job, they should be able to review the jobs and see any validation error messages.
Acceptance criteria:
As an admin, I want users to authenticate when accessing the app, so I can ensure only trusted actors can submit jobs to the hpc.
As a user, I want to use the latest aind-data-schema version
UI text input. Include in job description somewhere.
Display it in the job status table.
As a user, I want to be able to upload an xlsx file, so I can avoid having to convert an xlsx file to csv before using the service.
validate_csv endpoint, then it will check the file extension and convert an xlsx sheet to a csv file before validation.
As a user, I want the GUI to have a panel to input a single job definition
Entering many jobs by hand will be hard. Let users upload a CSV file and parse it into JSON.
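A minimal sketch of the CSV-to-JSON step, using only the standard library. The function name `csv_to_job_json` and the top-level `"jobs"` key are assumptions; the real payload shape is not specified here.

```python
import csv
import io
import json


def csv_to_job_json(csv_text: str) -> str:
    """Turn CSV text (header row plus one row per job) into a JSON string."""
    # DictReader maps each row to {column name: cell value}.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # "jobs" is a hypothetical wrapper key chosen for this example.
    return json.dumps({"jobs": rows}, indent=2)


sample = "subject-id,platform\n676909,ecephys\n"
print(csv_to_job_json(sample))
```

All values come out as strings; any type conversion or validation would happen in a later step.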
Acceptance Criteria:
Is your feature request related to a problem? Please describe.
As a developer, I want all config classes to be consistent and use pydantic models when possible. The JobUpdateTemplate class currently configures and creates the job upload template using a static method. This can be refactored to be a pydantic model and can also ease future conversion to form-based templating.
Describe the solution you'd like
Refactor /configs/job_upload_template.py as a pydantic model.
Is your feature request related to a problem? Please describe.
A service account user registers the data asset on Code Ocean and makes the asset viewable to everyone. However, it will be helpful to add an option to make the person who initiated the upload an owner in case the data asset needs to be archived.
Describe the solution you'd like
Add an option to make the person who initiated the upload an owner of the data asset on Code Ocean
Is your feature request related to a problem? Please describe.
It's not entirely clear how users point to the directory where the metadata files associated with a data asset are located.
Describe the solution you'd like
Add a column with metadata_dir field
Describe alternatives you've considered
At some point, we should have a fillable form
Is your feature request related to a problem? Please describe.
As a developer, I want the /api/job_upload_template endpoint to be more efficient, so that a new template file is not generated every time.
Describe the solution you'd like
Implement caching for the /api/job_upload_template endpoint (and any others as appropriate).
Additional context
Note: this feature may not be needed or a priority, since the current implementation is already quite fast.
As a user, I want to see the validation results as soon as a file is selected, so I don't have to click too many buttons.
On uploading the sheet, automatically show the preview.
Get rid of the preview button.
Is your feature request related to a problem? Please describe.
I usually do multiple recordings from the same animal within a given day, so the only difference in the asset name between them right now is the timestamp. There's a lot of redundancy in one of the columns.
Describe the solution you'd like
Would it be possible down the road to have some other column in the upload .csv to designate some unique aspect of an experiment, like an index value or something?
Describe alternatives you've considered
Generating the csv file via a python script
As a user, I want this published to a docker registry, so I can deploy it easily to a k8s cluster.
Is your feature request related to a problem? Please describe.
By accident, we managed to submit duplicate upload jobs twice this week: once through the website (not sure how this happened) and once via HTTP request (ran the same request twice).
We then weren't able to cancel the duplicate jobs - for one of them we didn't even realize until Jon alerted us.
In the end, running two uploads of the same data simultaneously didn't cause an issue: the second operation on each file must have seen that the data already existed and skipped it. A duplicate run of the sorting capsule was started after upload, and would have been wasteful had Jon not canceled it.
I can't think of a case where someone would want to upload the same session multiple times simultaneously, so I propose that the server could prevent this from happening.
Describe the solution you'd like
Before allowing a new upload job to be submitted, check there isn't already an upload job for that session in progress.
If there is a reason to allow multiple uploads with the same session ID, then compare csv/job upload parameters instead.
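The proposed check could look roughly like the following in-memory sketch. `UploadRegistry` and `DuplicateUploadError` are hypothetical names; a real service would more likely query the job scheduler for in-progress jobs rather than keep state in process memory.

```python
class DuplicateUploadError(Exception):
    """Raised when an upload for the same session is already in progress."""


class UploadRegistry:
    """Tracks which sessions currently have an upload job in progress."""

    def __init__(self) -> None:
        self._in_progress: set[str] = set()

    def start(self, session_id: str) -> None:
        # Reject the submission if this session is already being uploaded.
        if session_id in self._in_progress:
            raise DuplicateUploadError(f"Upload already in progress: {session_id}")
        self._in_progress.add(session_id)

    def finish(self, session_id: str) -> None:
        # Called when the job completes, fails, or is cancelled.
        self._in_progress.discard(session_id)
```

To compare upload parameters instead of session IDs, the same structure works with a key derived from the job's parameters.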
Is your feature request related to a problem? Please describe.
There is very little explanation for how to use the site. This makes it hard for users to know how to submit jobs effectively.
Describe the solution you'd like
On the main page, a clear indication of:
Notes
Our private buckets are confusing to users.
Once s3://aind-private-data exists:
s3_bucket column from the transfer service spreadsheet
s3://aind-private-data
Describe the bug
Trying to upload a session with 3 modalities specified in job csv: ecephys, behavior, behavior-videos
To Reproduce
At http://aind-data-transfer-service/, manually attach the following files:
modality0, modality1:
\allen\programs\mindscope\workgroups\np-exp\codeocean\DRpilot_676909_20231214\upload.csv
modality0, modality1, modality2:
\allen\programs\mindscope\workgroups\np-exp\codeocean\DRpilot_702131_20240226\upload.csv
As a developer, I want to containerize the service, so I can run it on a k8s cluster.
Is your feature request related to a problem? Please describe.
Sometimes a user wants to cancel a job that has been submitted.
Describe the solution you'd like
Add a button on the jobs status page to cancel a running job.
Describe alternatives you've considered
Currently, an admin needs to be contacted to cancel a running job.
As a developer, I want the docker image registered, so I can pull it down and run it in a k8s cluster.
As a user, I want to define hpc configs, so I can modify the hpc resources better.
Describe the bug
Submitting jobs to the service was working fine at the end of last year; now I'm encountering a problem with /api/validate_csv.
To Reproduce
As per one of the tests:
aind-data-transfer-service/tests/test_server.py
Lines 61 to 77 in 6859fb6
with the following csv:
modality0.source,modality0,s3-bucket,subject-id,platform,modality1.source,modality1,acq-datetime
//allen/programs/mindscope/workgroups/np-exp/codeocean/DRpilot_676909_20231214/ephys,ecephys,aind-ephys-data,676909,ecephys,//allen/programs/mindscope/workgroups/np-exp/codeocean/DRpilot_676909_20231214/behavior-videos,behavior-videos,2023-12-14 12:43:11
import pathlib
import requests

def _raise_for_status(response: requests.Response) -> None:
    """pydantic validation errors are returned as strings that can be eval'd
    to get the real error class + message."""
    if response.status_code != 200:
        try:
            raise eval(response.json()['data']['errors'][0])
        except (KeyError, IndexError, requests.exceptions.JSONDecodeError, SyntaxError) as exc1:
            try:
                response.raise_for_status()
            except requests.exceptions.HTTPError as exc2:
                raise exc2 from exc1

csv_path = pathlib.Path("//allen/programs/mindscope/workgroups/np-exp/codeocean/DRpilot_676909_20231214/upload.csv")
url = "http://aind-data-transfer-service/api/validate_csv"
validate_csv_response = requests.post(url=url, files=dict(file=csv_path.read_bytes()))
_raise_for_status(validate_csv_response)
output:
Traceback (most recent call last):
  File "<stdin>", line 6, in _raise_for_status
  File "<string>", line 1
    Invalid input file type
            ^^^^^
SyntaxError: invalid syntax

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in _raise_for_status
  File "<stdin>", line 9, in _raise_for_status
  File "C:\Users\ben.hardcastle\github\np_codeocean\.venv\Lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 406 Client Error: Not Acceptable for url: http://aind-data-transfer-service/api/validate_csv
which seems to originate from here, suggesting a file path should now be provided instead of file contents:
aind-data-transfer-service/src/aind_data_transfer_service/server.py
Lines 59 to 60 in 6859fb6
However, I'm not sure what the format of the request should be. The following gave a 500 error:
validate_csv_response = requests.post(url=url, json=dict(file=csv_path.as_posix()))