coedl / elpiscloud
Refactor of the Elpis project, with a web service in mind.
For example, within lib/api/files.ts, getSignedUploadURLs and getUserFiles are async methods that make network calls such as fetch and getDocs. These operations could fail, and the resulting errors (for example, HTTP errors) need to be handled properly.
Similarly, code that uses methods from lib/api/files.ts, such as components/datasets/DatasetViewer.tsx (loading the user's datasets), depends on this error handling being done correctly.
libsndfile (which cloud functions don't have as far as I can see). Librosa also uses soundfile to do the audio processing so probably neither of these will work.
Traceback (most recent call last):
File "/layers/google.python.pip/pip/bin/functions-framework", line 8, in <module>
sys.exit(_cli())
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/functions_framework/_cli.py", line 37, in _cli
app = create_app(target, source, signature_type)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/functions_framework/__init__.py", line 288, in create_app
spec.loader.exec_module(source_module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/workspace/main.py", line 4, in <module>
from functions.datasets.process_file import process_dataset_file
File "/workspace/functions/datasets/process_file.py", line 9, in <module>
import utils.audio as audio
File "/workspace/utils/audio.py", line 3, in <module>
import soundfile as sf
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/soundfile.py", line 142, in <module>
raise OSError('sndfile library not found')
OSError: sndfile library not found
The process_model cloud function fails when trying to serialize the model status enum, as seen in the logs below.
Traceback (most recent call last):
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/layers/google.python.pip/pip/lib/python3.10/site-packages/functions_framework/__init__.py", line 171, in view_func
function(data, context)
File "/workspace/functions/process_model.py", line 31, in process_model
publish_to_topic(PUBSUB_TOPIC, [model.to_dict()])
File "/workspace/utils/pubsub.py", line 33, in publish_to_topic
serialized = json.dumps(obj)
File "/opt/python3.10/lib/python3.10/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/opt/python3.10/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/python3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/python3.10/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TrainingStatus is not JSON serializable
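One possible way around this kind of error is to give json.dumps a fallback that converts enum members to their plain values. The sketch below is only an illustration: the members of TrainingStatus and the encode_enum helper are assumptions, since the real enum and the body of publish_to_topic are not shown in the issue.

```python
import json
from enum import Enum


class TrainingStatus(Enum):
    # Hypothetical members; the real enum's values are not shown in the issue.
    WAITING = "waiting"
    TRAINING = "training"
    FINISHED = "finished"


def encode_enum(obj):
    """Fallback for json.dumps: turn Enum members into their plain values."""
    if isinstance(obj, Enum):
        return obj.value
    # Anything else is still unsupported, so keep the standard error.
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Stand-in for the dictionary returned by model.to_dict() (assumed shape).
model = {"name": "demo", "status": TrainingStatus.TRAINING}
serialized = json.dumps(model, default=encode_enum)
print(serialized)  # {"name": "demo", "status": "training"}
```

An alternative with the same effect would be to convert the status to its value inside to_dict, so that the dictionary only ever contains JSON-friendly types.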
Primarily creation of models from dataset and file uploads.
The trainer subscription can fire the same training event multiple times, which would start several separate training runs on Cloud Run (wasting resources, overwriting models, etc.).
This is not a priority to fix at this point, but something to think about.
Copying over a reply below from Nick.
Maybe I just put it as a flag on the firestore Model that training has begun?
If firebase has atomic transactions this might work, otherwise you're just going to have the same race condition in a different place. The ideal scenario is to have idempotent training jobs, where the end state isn't affected by multiple runs (either because the job realises that a training job with the given inputs has already occurred and short-circuits, or because it does the training again in a non-destructive way - both of these typically involve hash comparisons between inputs/outputs). This isn't easy to pull off, and probably isn't a priority at the moment, but something to think about.
Originally posted by @nicklambourne in #89 (comment)
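The short-circuit variant of the idempotency idea above could look roughly like the following sketch. Everything here is hypothetical: completed_runs is an in-memory stand-in for a persistent store, and job_fingerprint and train_once are illustrative names rather than anything from the project.

```python
import hashlib
import json

# Stand-in for a persistent store of finished runs (assumption for this sketch).
completed_runs = {}


def job_fingerprint(dataset_files, options):
    """Hash the training inputs so identical requests map to the same key."""
    payload = json.dumps(
        {"files": sorted(dataset_files), "options": options}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def train_once(dataset_files, options, train_fn):
    """Run train_fn only if no run with the same inputs has been recorded."""
    key = job_fingerprint(dataset_files, options)
    if key in completed_runs:
        # Duplicate event: return the earlier result instead of retraining.
        return completed_runs[key]
    result = train_fn(dataset_files, options)
    completed_runs[key] = result
    return result
```

The check and the write in this sketch are not atomic, so two events arriving at the same moment could still both train. A real deployment would need the lookup and the record to happen atomically in the backing store, which is the concern raised in the reply above.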
Currently this is not implemented in the UI and it would be good to have this feature for completeness.
Currently, upon signing in to elpis.cloud, the router pops the last page off the stack, which can have the effect of navigating off the site.
We want the router to redirect to the home page instead after users log in.
The terraform formatter (terraform fmt) removes new lines from the ends of files, as seen in #59, despite saying that it doesn't. Generally this is gross, but it is a low priority fix.
Essentially we'll have 4 representations of models like Dataset, Model, etc. that could get out of sync in the future.
The cloud function dependencies aren't currently pinned at any version, which has the potential to create problems down the track if there are breaking changes (as outlined by Nick in a previous PR).
So essentially we just need to go through the requirements.txt file and find a set of compatible versions to freeze at.
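As a starting point for that pass, a small helper could list which entries are not yet pinned to an exact version. This is a rough sketch only: find_unpinned is a hypothetical name, the package names in the example are placeholders, and it treats only the == form as pinned.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Placeholder package names and versions, for illustration only.
example = """
# cloud function dependencies
package-a
package-b>=0.10
package-c==1.2.3
"""
print(find_unpinned(example))  # ['package-a', 'package-b>=0.10']
```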
As above.
We need to test valid and invalid dataset processing requests.
globals.css contains CSS for the tagging functionality; this needs to be changed to use Tailwind CSS. The tagging also needs to use a cross icon rather than the rotated plus sign it uses at the moment.
To reproduce
Arose from #86
This wasn't seen in the terraform planning stage, but apparently exactly once delivery and push config are incompatible options to have on the pubsub subscription, and it cried during the build pipeline:
│ Error: Error updating Subscription "projects/elpiscloud/subscriptions/trainer_subscription": googleapi: Error 400: A subscription cannot have push config or bigquery config set with exactly once delivery configured.
│
│ with module.trainer.google_pubsub_subscription.subscription,
│ on ../../modules/service/main.tf line 40, in resource "google_pubsub_subscription" "subscription":
│ 40: resource "google_pubsub_subscription" "subscription" {
After issues with the soundfile library (#71), I tried to rewrite the audio utilities using the Python built-in wave module.
When I did so, I thought that resampling was just modifying the sample_rate metadata on the audio file, which has caused some funny bugs with the processed dataset files.
The processed dataset files are not anything like expected, and instead of writing a custom resampling algo, we should find a better external audio utility library.
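The effect described above can be reproduced with the standard library alone. The sketch below (helper names are hypothetical) writes the same samples twice with different sample rates in the header, which shows that relabelling changes the apparent duration and playback speed without converting any audio.

```python
import io
import wave


def write_wav(frames, sample_rate):
    """Write raw 16-bit mono PCM frames to an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)
        wav_out.setsampwidth(2)
        wav_out.setframerate(sample_rate)
        wav_out.writeframes(frames)
    buffer.seek(0)
    return buffer


def duration_seconds(buffer):
    """Duration implied by the header: number of frames / sample rate."""
    with wave.open(buffer, "rb") as wav_in:
        return wav_in.getnframes() / wav_in.getframerate()


one_second = b"\x00\x00" * 44100            # 44100 silent samples, 2 bytes each
original = write_wav(one_second, 44100)
relabelled = write_wav(one_second, 16000)   # same samples, different header only

print(duration_seconds(original))    # 1.0
print(duration_seconds(relabelled))  # 2.75625
```

Proper resampling has to compute a new set of samples (here, 16000 of them for one second of audio), which is the part best left to a dedicated audio library.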
<li> elements wrapped in a next/link component
Currently in DatasetViewer.tsx, when the user is shown a list of all their datasets, a column should contain a button to 'view' the dataset in that row. This button should show a simple view containing a list of all the user's files that belong in that dataset.
Not high priority, but the scripts taken from the desktop versions could, and probably should be improved in the following ways (stolen from @nicklambourne):
There's an error that occurs while running the trainer service, where some file isn't found where it's expected.
Instead of using hyphens in resource names in Terraform, underscores should be used. For example, in architecture/modules/frontend_bucket/main.tf, google_storage_bucket is named static-site. This will be changed to static_site.
Currently the preprocessing workflow blindly attempts to create processing jobs from the files supplied in a dataset.
This would fail if the user selected improper files, not enough files etc.
Instead, after forming a dataset object from the incoming event, the process_dataset function should check dataset validity before batching and pushing jobs to the pubsub queue.
If the dataset is found to be invalid, we should indicate so on the dataset model in firestore (setting status to error or something).
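A validity check of this kind might be sketched as follows. The rules are assumptions made for illustration (every audio file needs a transcript with the same stem, and at least one such pair must exist), as are the file extensions, the status strings, and the plan_processing_jobs stand-in; the issue does not specify any of them.

```python
from pathlib import PurePosixPath

# Assumed rules and values, not taken from the project.
AUDIO_EXTENSIONS = {".wav"}
TRANSCRIPT_EXTENSIONS = {".eaf", ".txt"}
MIN_PAIRS = 1


def validate_dataset(file_names):
    """Return a list of problems; an empty list means the dataset looks valid."""
    audio, transcripts, problems = set(), set(), []
    for name in file_names:
        path = PurePosixPath(name)
        suffix = path.suffix.lower()
        if suffix in AUDIO_EXTENSIONS:
            audio.add(path.stem)
        elif suffix in TRANSCRIPT_EXTENSIONS:
            transcripts.add(path.stem)
        else:
            problems.append(f"unsupported file type: {name}")
    # Stems present on only one side have no partner file.
    for stem in sorted(audio ^ transcripts):
        problems.append(f"missing audio or transcript partner for: {stem}")
    if len(audio & transcripts) < MIN_PAIRS:
        problems.append("not enough audio/transcript pairs")
    return problems


def plan_processing_jobs(dataset):
    """Validate first; only build jobs when the dataset passes."""
    problems = validate_dataset(dataset["files"])
    if problems:
        dataset["status"] = "error"      # would be written to the stored model
        dataset["errors"] = problems
        return []
    dataset["status"] = "processing"
    return [{"file": name} for name in dataset["files"]]
```

Returning the list of problems, rather than a bare boolean, leaves room to show the user why a dataset was rejected.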
Tags should be used to organise the user uploaded files in firestore.
Each user file in firestore currently has an empty list of tags that a user should be able to update from the frontend. These tags should appear when viewing the files that a user has uploaded, e.g. from the files page or from the create new dataset page. They allow users to filter files not only by the name but also by some other information if they should so choose.