ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.

License: GNU General Public License v3.0

Dockerfile 0.65% Python 99.35%
object-detection deep-learning marine-protected-areas citizen-science

kso's Introduction

KSO System

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.


KSO overview

The KSO system has been developed to:

  • move and process underwater footage and its associated data (e.g. location, date, sampling device),
  • make these data available to citizen scientists on Zooniverse for annotation, and
  • train and evaluate machine learning models (customised YOLOv5 or YOLOv8 models).

koster_info_diag

The system is built around a series of easy-to-use Jupyter Notebooks. Each notebook allows users to perform a specific task of the system (e.g. upload footage to the citizen science platform or analyse the classified data).

Users can run these notebooks via Google Colab (by clicking on the Colab links in the table below), locally or on a high-performance computing (HPC) environment.

Notebooks

Our notebooks are modular and grouped into four main task categories: Set up, Classify, Analyse, and Publish.

Task Notebook Description Try it!
Set up Check_metadata Check the format and contents of the footage and the sites, media and species CSV files Colab / Binder
Classify Upload_subjects_to_Zooniverse Prepare original footage and upload short clips to Zooniverse; extract frames of interest from the original footage and upload them to Zooniverse Colab / Binder
Classify Process_classifications Pull and process up-to-date classifications from Zooniverse Colab / Binder
Analyse Train_models Prepare the training and test data, set model parameters and train models Colab / Binder
Analyse Evaluate_models Use ecologically relevant metrics to test the models Colab / Binder
Publish Publish_models Publish the model to a public repository Colab / Binder
Publish Publish_observations Automatically classify new footage and export observations to GBIF Colab / Binder

Local Installation

Docker Installation

Requirements

Pull KSO Docker image

Bash
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev

Conda Installation

Requirements

Download this repository

Clone this repository using

git clone https://github.com/ocean-data-factory-sweden/kso.git

Prepare your system

Depending on your system (Windows/Linux/MacOS), you might need to install some extra tools. If this is the case, you will get a message about what you need to install in the next steps. For example, Microsoft Build Tools C++ with a version higher than 14.0 is required for Windows systems.

Set up the environment with Conda

  1. Open the Anaconda Prompt.
  2. Navigate to the folder where you cloned the repository (or unzipped the manually downloaded one), then go into the kso folder:
cd kso
  3. Create an Anaconda environment with Python 3.8, replacing <name env> with a name of your choice:
conda create -n <name env> python=3.8
  4. Activate the environment:
conda activate <name env>
  5. Specify your GPU details.

5a. Find the PyTorch installation command you need: navigate to the system options (example below) and select your device/platform details.

CUDA Requirements

5b. Add the recommended command to KSO's gpu_requirements_user.txt file.

  6. Install all the requirements:
pip install -r requirements.txt -r gpu_requirements_user.txt
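After installing the requirements, it can be useful to confirm that PyTorch actually sees your GPU. This is a minimal sketch (not part of the KSO notebooks) that degrades gracefully when torch is not installed:

```python
# Quick post-install check: does PyTorch see a CUDA device?
# Safe to run anywhere; it only imports torch if it is available.
import importlib.util

def cuda_status():
    """Return a short string describing PyTorch/CUDA availability."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    return "cuda available" if torch.cuda.is_available() else "cpu only"

print(cuda_status())
```

If this reports "cpu only" on a GPU machine, revisit step 5 and the command you added to gpu_requirements_user.txt.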

Cloudina

Cloudina is a hosted version of KSO (powered by JupyterHub) on NAISS Science Cloud. It allows users to scale and automate larger workflows using a powerful processing backend. This is currently an invitation-only service. To access the platform, please contact jurie.germishuys[at]combine.se.

The current portals are accessible as:

  1. Console (object storage) - storage
  2. Album (JupyterHub) - notebooks
  3. Vendor (MLFlow) - mlflow

Starting a new project

To start a new project you will need to:

  1. Create the initial information for the database: input the information about the underwater footage files, sites and species of interest. You can use a template of the CSV files and move the directory to the "db_starter" folder.
  2. Link your footage to the database: you will need files of underwater footage to run this system. You can download some samples and move them to "db_starter", or store your own files and specify their directory in the notebooks.

Please note that the format of the underwater media is standardised (typically .mp4 or .jpg) and that the associated metadata, captured in three CSV files ("movies", "sites" and "species"), should follow the Darwin Core (DwC) standard.
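As a sketch of what this kind of metadata check involves, the snippet below compares a CSV header against a set of required columns. The column names here are illustrative assumptions, not the exact fields of the KSO templates:

```python
import csv
import io

# Hypothetical required headers per CSV type; the real DwC-aligned fields
# depend on your project's template files.
REQUIRED = {
    "movies": {"filename", "site_id", "created_on"},
    "sites": {"site_id", "siteName", "decimalLatitude", "decimalLongitude"},
    "species": {"species_id", "scientificName"},
}

def missing_columns(csv_text, kind):
    """Return the required columns absent from the CSV header."""
    header = set(next(csv.reader(io.StringIO(csv_text))))
    return REQUIRED[kind] - header

# A sites CSV missing its coordinate columns:
print(missing_columns("site_id,siteName\n1,Koster", "sites"))
# → the latitude/longitude columns are reported as missing
```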

Developer instructions

If you would like to expand and improve the KSO capabilities, please follow the instructions above to set the project up on your local computer.

When you make changes, please create your branch on top of the current 'dev' branch. Before submitting a Pull Request, please:

  • Run Black on the code you have edited
black filename 
  • Clean up the commit history on your branch so that every commit represents one logical change (squash and edit commits so that the history is understandable to others).
  • For the commit messages, please follow the Conventional Commits guidelines (table below) to facilitate code sharing, and describe the logic behind the commit in the body of the message.

    Commit types

Commit Type Title Description Emoji
feat Features A new feature
fix Bug Fixes A bug fix 🐛
docs Documentation Documentation only changes 📚
style Styles Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc) 💎
refactor Code Refactoring A code change that neither fixes a bug nor adds a feature 📦
perf Performance Improvements A code change that improves performance 🚀
test Tests Adding missing tests or correcting existing tests 🚨
build Builds Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm) 🛠
ci Continuous Integrations Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs) ⚙️
chore Chores Other changes that don't modify src or test files ♻️
revert Reverts Reverts a previous commit 🗑
  • Rebase on top of dev (never merge; only use rebase).
  • Submit a Pull Request and assign at least 2 reviewers.

Citation

If you use this code or its models in your research, please cite:

Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021) An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548

Collaborations/Questions

You can find out more about the project at https://subsim.se.

We are always excited to collaborate and help other marine scientists. Please feel free to contact us (matthias.obst(at)marine.gu.se) with your questions.

Troubleshooting

If you experience issues importing panoptes_client on Windows, it is a known issue with the libmagic package. Pmason's suggestions on the Zooniverse Talk board can be useful for troubleshooting it.

kso's People

Contributors

dependabot[bot], diewertje11, jannesgg, pilarnavarro, victor-wildlife

kso's Issues

Dockerfile includes extra python packages but unknown what they do

While re-creating the CI pipeline to automatically test the notebooks, it was found that the master branch of kso points to commit a306499 of kso_utils, which is on branch origin/feat/pyav-backand.
The dev branch of kso points to the dev branch of kso_utils, commit f2ac787. (I believe these commits were made to try to fix the problem of extracting frames from movies that Emil had just before the summer holidays.)
The requirements file in kso_utils differs between these two commits. This creates the error shown in the image below when the notebook tests are run in a container based on the requirements in dev (commit f2ac787).

Since the tests did work when the container was built from commit a306499 and its requirements, these Python packages have been added to the Dockerfile. (Temporarily! We do not want them here; they should either be removed or moved to the requirements.)
They have not been added to the requirements in dev yet, since I do not know why these packages were added, what their function is, or whether we want to use them in the end.

So this issue needs to be resolved by finding out what these packages do and whether we actually use them. If we do, they should be added to the kso_utils requirements in dev and master/main. If we do not want them, we need to find out why the error in the image occurs and how to solve it.

Image

Notebook 4+8: Generalize workflow and species selection

The species and aggregation factors are selected in a different order in Notebooks 4 and 8, and both notebooks always display options that yield no annotations. The idea is to make the order the same in both notebooks and to filter the options first on whether annotations are available. This can be done on zoo_info_dict. I am working on this.

Google Colab package dependency error

While working on issue #191, Colab gives the following error during the installation of all the packages.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.11.0 which is incompatible. google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 1.4.0 which is incompatible.

This comes from the code where all requirements are installed in one line. The code to reproduce this error can be found in commit:
778aaadc18076834617aa53e2636db432723ce58

Since we currently only use google-colab to clear the output, and that still works, and since pip list shows our versions of the packages (8.11.0 and 1.4.0), we will ignore this error for now. But it is something to keep in mind if something does not work on Google Colab in the future.
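One way to confirm which versions actually ended up installed (the 8.11.0 and 1.4.0 mentioned above) is to query the package metadata directly; a small stdlib-only sketch:

```python
# Check what is really installed, independent of pip's resolver warnings.
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None
```

For example, `installed_version("ipython")` in the Colab session described above should report the version that `import IPython` will actually use.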

Rename main repo and update all links

We should come up with a new name for the repo, now that we only have one repo left, and then update the name everywhere:

  • update the name in the ReadMe (also in the links to all images)
  • update the name in the link of the repo / docker image (not koster-yolo4 anymore)
  • update the name in the links in the Dockerfile
  • update the name in the Jupyter notebooks
  • use ghcr instead of dockerhub for image builds

Dockerfile nvidia starting image

The Dockerfile (the one for both repositories combined, data-management and object-detection) currently first loads the NVIDIA CUDA devel Docker image in order to build ffmpeg from scratch.

Then it starts over from a new image, copies the final ffmpeg installation and builds up the rest of the environment. In theory this should be possible with the runtime image, which has the advantage of being smaller. However, when trying that, the Dockerfile could not get through the builder test on GitHub since it ran out of disk space. This is the error that occurred:

Image

This error was resolved by using the devel image a second time instead of the runtime image. However, we now end up with a larger final image (which is not a problem in itself), but it is not the neatest solution, so it is something to revisit in the future.

Improve table next to frame display

In the launch_viewer function, the table displayed next to the frames is not very useful. It should show the names of the actual labels instead of the colours.

Create requirements.txt file for 1 env (works on local, SNIC, Colab)

Currently there are multiple requirements files across the 3 different repos, plus 2 extra in yolov5 and yolov5_tracker, all of which we pip install on separate lines. As a result, pip cannot ensure that everything is compatible with everything else. Therefore all the requirements should be installed in one invocation instead.

On top of that, our 3 requirements files contradict each other. The goal is now to remove these contradictions and find the minimal combination of packages that makes everything work.

This should work with the same requirement files on Colab, SNIC and locally.

Add a project-specific "compress_video" option

While standardising the format of the movies, we should offer a project-specific option to compress the videos or not.

The "standarise_movie_format" function within "movies.utils" uses a hard-coded rule that compresses the videos unless the project is Spyfish Aotearoa.

Ideally, "projects_list.csv" should have a "compress_video" column set to True or False.
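The proposed column could be read with a small helper along these lines; the "Project_name" column and the CSV shape are assumptions for illustration:

```python
import csv
import io

# Sketch of reading the proposed per-project flag; column names are assumed.
def should_compress(projects_csv_text, project_name, default=True):
    """Return the project's compress_video flag, falling back to a default."""
    for row in csv.DictReader(io.StringIO(projects_csv_text)):
        if row.get("Project_name") == project_name:
            value = (row.get("compress_video") or "").strip().lower()
            return value == "true" if value else default
    return default

csv_text = "Project_name,compress_video\nSpyfish Aotearoa,False\nTemplate,True\n"
print(should_compress(csv_text, "Spyfish Aotearoa"))  # False
```

This keeps the current behaviour as the default while letting individual projects opt out, replacing the hard-coded project-name check.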

Combine dm and ml repos

  • Unification of Dockerfile for single image build
  • Unification of ReadME files for both repos
  • Transfer notebooks from dm repo to ml repo
  • Change the jupyter.sh files on SNIC, we now only need 1

Add a feature to extract occurrences for publishing them to GBIF/OBIS via ITP

Select the model of interest from Zenodo, run it on the movies you want, and publish the observations to OBIS.

We started the "format_to_gbif_occurence" function in tut#8. At the moment the function converts classifications from citizen scientists to the OBIS format, but we will need to add the functionality to process ML and expert classifications.

The Integrated Publishing Toolkit and the Python package to read and parse Darwin Core files might be useful.
Also worth keeping an eye on is the GBIF Python client; currently it only seems possible to download datasets, but maybe there is an option to upload?

Using Wildlife.ai's GBIF account, publish observations to OBIS.
Consider using the camera-trap data format and standards:
https://tdwg.github.io/camtrap-dp/

Using Docker Playground and following the IPT installation guidelines, you can temporarily run your own IPT to privately test the occurrence files.

Update movie selection method from AWS in choose_footage

The "choose_footage" function in widget_utils, used to choose movies from AWS, generates a temporary link for every available movie and then lets the user select the movie of interest. To make better use of our resources, we should first select the movies of interest and only then generate temporary HTTP links for those movies.
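The proposed change could be sketched as a helper that presigns only the selected keys. It accepts any client exposing boto3's generate_presigned_url signature; the helper itself is hypothetical, not code from widget_utils:

```python
# Sketch: generate temporary URLs only for the movies the user already picked,
# instead of presigning every movie in the bucket up front.
def presign_selected(s3_client, bucket, selected_keys, expires=3600):
    """Return {key: temporary URL} for the selected movies only."""
    return {
        key: s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires,
        )
        for key in selected_keys
    }
```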

Tutorial 3 upload clips issue

🐛 Bug


To Reproduce (REQUIRED)

Input:

pp.upload_zoo_subjects("clip")

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 pp.upload_zoo_subjects("clip")

File /usr/src/app/kso/kso_utils/kso_utils/project.py:692, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    684 """
    685 This function uploads clips or frames to Zooniverse, depending on the subject_type argument
    686 
   (...)
    689 :type subject_type: str
    690 """
    691 if subject_type == "clip":
--> 692     upload_df, sitename, created_on = zoo_utils.set_zoo_clip_metadata(
    693         project=self.project,
    694         generated_clipsdf=self.generated_clips,
    695         sitesdf=self.local_sites_csv,
    696         moviesdf=self.local_movies_csv,
    697     )
    698     zoo_utils.upload_clips_to_zooniverse(
    699         project=self.project,
    700         upload_to_zoo=upload_df,
    701         sitename=sitename,
    702         created_on=created_on,
    703     )
    704     # Clean up subjects after upload

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1303, in set_zoo_clip_metadata(project, generated_clipsdf, sitesdf, moviesdf)
   1301 # Combine site info to the generated_clips df
   1302 if "site_id" in generated_clipsdf.columns:
-> 1303     upload_to_zoo = generated_clipsdf.merge(sitesdf, on="site_id")
   1304     sitename = upload_to_zoo["#siteName"].unique()[0]
   1305 else:

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:9329, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   9310 @Substitution("")
   9311 @Appender(_merge_doc, indents=2)
   9312 def merge(
   (...)
   9325     validate: str | None = None,
   9326 ) -> DataFrame:
   9327     from pandas.core.reshape.merge import merge
-> 9329     return merge(
   9330         self,
   9331         right,
   9332         how=how,
   9333         on=on,
   9334         left_on=left_on,
   9335         right_on=right_on,
   9336         left_index=left_index,
   9337         right_index=right_index,
   9338         sort=sort,
   9339         suffixes=suffixes,
   9340         copy=copy,
   9341         indicator=indicator,
   9342         validate=validate,
   9343     )

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:107, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     90 @Substitution("\nleft : DataFrame or named Series")
     91 @Appender(_merge_doc, indents=0)
     92 def merge(
   (...)
    105     validate: str | None = None,
    106 ) -> DataFrame:
--> 107     op = _MergeOperation(
    108         left,
    109         right,
    110         how=how,
    111         on=on,
    112         left_on=left_on,
    113         right_on=right_on,
    114         left_index=left_index,
    115         right_index=right_index,
    116         sort=sort,
    117         suffixes=suffixes,
    118         copy=copy,
    119         indicator=indicator,
    120         validate=validate,
    121     )
    122     return op.get_result()

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:704, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    696 (
    697     self.left_join_keys,
    698     self.right_join_keys,
    699     self.join_names,
    700 ) = self._get_merge_keys()
    702 # validate the merge keys dtypes. We may need to coerce
    703 # to avoid incompatible dtypes
--> 704 self._maybe_coerce_merge_keys()
    706 # If argument passed to validate,
    707 # check if columns specified as unique
    708 # are in fact unique.
    709 if validate is not None:

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:1257, in _MergeOperation._maybe_coerce_merge_keys(self)
   1251     # unless we are merging non-string-like with string-like
   1252     elif (
   1253         inferred_left in string_types and inferred_right not in string_types
   1254     ) or (
   1255         inferred_right in string_types and inferred_left not in string_types
   1256     ):
-> 1257         raise ValueError(msg)
   1259 # datetimelikes must match exactly
   1260 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Expected behavior

Uploading of created clips, error might be due to large file size?
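The ValueError indicates that site_id is a string (object) column on one side of the merge and int64 on the other. A minimal reproduction with made-up rows, and one possible fix (casting the join key to a common dtype before merging):

```python
import pandas as pd

# Made-up rows reproducing the mismatch: strings on one side, ints on the other.
clips = pd.DataFrame({"site_id": ["1", "2"], "clip": ["a.mp4", "b.mp4"]})
sites = pd.DataFrame({"site_id": [1, 2], "#siteName": ["Site N", "Site S"]})

# Cast both join keys to the same dtype before merging.
clips["site_id"] = clips["site_id"].astype(str)
sites["site_id"] = sites["site_id"].astype(str)
merged = clips.merge(sites, on="site_id")
print(len(merged))  # → 2
```

So the likely culprit is the dtype of site_id in the sites CSV versus the generated clips, rather than file size.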

Environment

(environment screenshots)

Set up Tutorial 9 (Run ML on new footage)

Add workflow that runs the model over a selection of footage, and finally aggregates this by site and returns the maximum count for a given species within the given movies.

We should check that it works for the template project as well as for active projects (e.g. Spyfish)
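The aggregation step described above could be sketched as follows; the (site, species, count) detection shape is an assumption for illustration:

```python
from collections import defaultdict

# Sketch of the proposed aggregation: the maximum count of each species
# observed within any of a site's movies.
def max_count_per_site(detections):
    """detections: iterable of (site, species, count) tuples (assumed shape)."""
    best = defaultdict(int)
    for site, species, count in detections:
        best[(site, species)] = max(best[(site, species)], count)
    return dict(best)

dets = [("A", "cod", 3), ("A", "cod", 5), ("B", "cod", 1)]
print(max_count_per_site(dets))  # {('A', 'cod'): 5, ('B', 'cod'): 1}
```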

Improve uploading of frames from third parties to Zooniverse

In tutorial #4, we should improve the current approach to uploading your own frames (e.g. not retrieving them from clips classified by Zooniverse volunteers).
This will help other projects that have classified their own videos or collected underwater images.

Re-organise releases of stable versions

Currently (after PR #235) we use the master branch as the 'stable' version. That is also why a new Docker image is created every time we push to master. (The Docker image contains the code that users run on SNIC; it contains not only the requirements but also the actual code.) Because master serves as the stable version, we should not push changes to this branch too often.

However, in the future we could move towards tagging stable releases. In that case we would only need to build a new Docker image for each release. (It is desirable not to build and push a new Docker image too often, since it is quite big.) We could then push new changes to master more freely.

We should discuss what we think to be a good way forward.

For development, we can use the dev docker-image on SNIC and our own mounted clone of the code.

Notebook 5: locale.getpreferredencoding() gets changed during training, causing the notebook to be unable to train again or run the evaluation part

When you run Notebook 5 and request the preferred encoding at the beginning, or just before the cell where you run train.run(...), you get 'UTF-8' (using the code below).

import locale
locale.getpreferredencoding()

However, when you run the same thing after the training cell, it returns 'ANSI_X3.4-1968' (which is ASCII). So somewhere during the training performed by the YOLOv5 code, this default gets changed. This causes an error when reading the names in the train.txt or valid.txt file when you train again or run the validation (since, in the template project, these files contain Swedish letters).

Exception: train: Error loading data from /content/koster_yolov4/tutorials/ml-template-data/train.txt: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

This comes from line 470 in /content/koster_yolov4/yolov5/utils/dataloaders.py, where the text file is opened with open(). This open() call uses the default encoding, and ASCII cannot read the ä.

We have not located exactly where this change in locale is made. We could not find anything in the YOLOv5 code when searching with git grep for ANSI, locale, encoding, ASCII or coding; only the file utils/aws/mime.sh does something with ASCII, but we do not think that file gets used.

Possible solutions are to prevent this change (if we can locate where it is made), or to set the encoding back to the correct default each time. However, we have not yet found a command that can set it back. We have tried the following:

  • locale.setlocale(locale.LC_ALL, '') (returns 'en_US.UTF-8')
  • sys.getfilesystemencoding() (returns 'utf-8')
  • locale.getlocale() (returns ('en_US', 'UTF-8'))
  • result = _locale.nl_langinfo(_locale.CODESET) (result contains 'ANSI_X3.4-1968')
  • _locale.CODESET (returns 14)

So it seems there are 2 different encoding settings: a system-wide one that stays at UTF-8 and is not changed, and a locale-level one that does get changed. However, trying to change it back gives an error:

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968

The ways we have tried to set it back:

  • !chcp 65001
  • !vim /etc/default/locale
  • !echo $PYTHONIOENCODING
  • locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')

The code below seems to set it back, but it does not solve the issue when training/validating; it presumably just replaces the lookup with a fixed string.

import locale

def getpreferredencoding(do_setlocale=True):
    return "UTF-8"

locale.getpreferredencoding = getpreferredencoding

locale.getpreferredencoding()

To have the template project working for the workshop on 02-03-2023, we simply changed the names of the files so that they do not contain ä or other Swedish letters.
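A more robust fix (not applied upstream here) would be to read the split files with an explicit encoding, so the preferred-encoding drift no longer matters. A sketch:

```python
from pathlib import Path

def read_split(txt_path):
    """Read a train/valid split file with an explicit encoding, instead of the
    locale-dependent default that a bare open() falls back to."""
    return Path(txt_path).read_text(encoding="utf-8").splitlines()
```

With this, file names containing ä are read correctly even after the preferred encoding has drifted to ASCII.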

Notebook 5+6: Error while importing panoptes_client

When you try to run notebook 5 in Colab with a clean runtime, you get an error in the 2nd cell.

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/dist-packages/urllib3-1.24.3.dist-info/METADATA'

This comes from: import kso_utils.tutorials_utils as t_utils
In which it tries to import: from panoptes_client import Project

If you look in the Google Colab files, it indeed does not contain this file. However, it does contain a different version:
/usr/local/lib/python3.8/dist-packages/urllib3-1.26.14.dist-info

The strange thing is that when you run the command again (import kso_utils.tutorials_utils as t_utils), there is no problem and it just runs.
I do not understand why this is the case and I cannot find it online. Does anyone know?

I have 2 workarounds to prevent the error from occurring:

  1. Install from the requirements.txt file of the data-management repository. The error then does not occur, so the packages in that requirements list are probably in better agreement with each other. (This also raises the question: why are there 2 different repositories and 2 different requirements files? Or is this just for historical reasons?)
  2. Import panoptes_client in tutorials_utils with a try/except, since it can be imported on the second attempt:

try:
    from panoptes_client import Project
except:
    from panoptes_client import Project
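The retry can also be written generically. This is a hedged generalisation of workaround 2, not code from the repository:

```python
import importlib

def import_with_retry(module_name, attempts=2):
    """Retry an import; the failure reported above only occurs on the first try."""
    last_exc = None
    for _ in range(attempts):
        try:
            return importlib.import_module(module_name)
        except Exception as exc:  # the reported failure is a FileNotFoundError
            last_exc = exc
    raise last_exc
```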

Issues with panoptes import

I have successfully installed the KSO software from scratch following the guidelines in the readme. However, the notebooks get stuck importing the ProjectProcessor. I have dug into it and found that the problem occurs when trying to import panoptes_client.
I know I have had this problem before and fixed it by manually uninstalling and reinstalling different packages, but I can't find the solution in the readme.

Image

Enable users to download raw and aggregated Zooniverse classifications in tut#8

Researchers from Spyfish Aotearoa would like to download the processed (i.e. JSON-unnested) classifications from Zooniverse to analyse them in R.
This will be possible in tutorial https://github.com/ocean-data-factory-sweden/kso-data-management/pull/9, which is in a draft state at the moment.
The sections to be developed are in the checklist below:

Widget to select date range of the classifications of interest
Process the classifications (i.e. unnest the json classifications) to have a label/species per row
Widget to select the columns users want to download
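The unnesting step could be sketched as below, assuming the usual Zooniverse export shape (a JSON `annotations` string per classification row); the field names are assumptions:

```python
import json

def unnest(classifications):
    """Flatten classification rows to one (subject, label) pair per row."""
    rows = []
    for c in classifications:
        for annotation in json.loads(c["annotations"]):
            for value in annotation.get("value") or []:
                rows.append(
                    {"subject_id": c["subject_ids"], "label": value.get("choice")}
                )
    return rows

export = [{"subject_ids": 1, "annotations": '[{"value": [{"choice": "FISH"}]}]'}]
print(unnest(export))  # [{'subject_id': 1, 'label': 'FISH'}]
```

The resulting one-label-per-row table is straightforward to filter by date range or column selection before export.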

Tutorial 3 issue with producing sample clips


🐛 Bug

Seems to be something that has not been defined.

To Reproduce (REQUIRED)

pp.generate_zoo_clips(
    movie_name=pp.movie_selected,
    movie_path=pp.movie_path,
    is_example=True,
    use_gpu=gpu_available.result,
)
Output:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/ipywidgets/widgets/interaction.py:257, in interactive.update(self, *args)
    255     value = widget.get_interact_value()
    256     self.kwargs[widget._kwarg] = value
--> 257 self.result = self.f(**self.kwargs)
    258 show_inline_matplotlib_plots()
    259 if self.auto_display and self.result is not None:

File /usr/src/app/kso/kso_utils/kso_utils/widgets.py:1182, in n_random_clips(clip_length, n_clips)
   1179 def n_random_clips(clip_length, n_clips):
   1180     # Create a list of starting points for n number of clips
   1181     # movie_df is currently missing here
-> 1182     duration_movie = int(movie_df["duration"].values[0])
   1183     starting_clips = random.sample(range(0, duration_movie, clip_length), n_clips)
   1185     # Seave the outputs in a dictionary

NameError: name 'movie_df' is not defined

Number of modifications:
0

image

Expected behavior

The expected behaviour is to be able to extract sample clips from the video.

Fix testing GA workflow integration

The first version of testing has been implemented but needs to be improved by testing in the same environment and better managing the images that are created.

Add support for YOLOv8

Description:

YOLO has been upgraded, and we risk deprecations causing issues down the line if we do not keep up. YOLOv8 is now available for both the Ultralytics version of YOLO and the tracker submodule.

Notebook 5 / training models gives error: run() got unexpected keyword argument batch_size

This issue has already been solved; this is just a description of the error for documentation purposes, since the error is not very clear and it was hard to find its cause.

Description of the error
When training a model in Notebook 5, with

mlp.train_yolov5(
    exp_name.value,
    weights.artifact_path,
    epochs=epochs.value,
    batch_size=batch_size.value,
    img_size=(img_h.value, img_w.value),
)

which indirectly calls

yolov5.train()

you get the error from the image.

Image

This error complains about the batch_size argument, even though the printed code shows that the batch_size argument exists. And if you go into the val.py file of the yolov5 repository, the batch_size argument exists there too.

Traceback of the origin of the error and solution
This error has occurred since the commit where project.py was created (commit 7c0d287).
The error only occurs when yolov5.train(epochs=1) (or our code) is run after the MLProjectProcessor class has been created; without this class, the code runs properly.
In this class, t6_utils gets imported, which in turn does import yolov5_tracker.track as track.
That code appends some paths to sys.path. As a result, when validation.run() is called inside yolov5.train(), it does not run the val.py from yolov5 but the val.py from the tracker.

This val.py from the tracker is never imported anywhere in the repository, so it is simply unused. Therefore the solution is to delete val.py in the tracker repository, which makes it possible to run Notebook 5 / train models without errors.

Remaining issue
One remaining issue is that you cannot train 2 models in the same notebook session. For now this is mentioned in the notebook, and people are instructed to simply restart it.
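When debugging this kind of sys.path shadowing, importlib can show which file Python would actually load for a given module name (illustrated here with a stdlib module as a stand-in for val):

```python
# Diagnose module shadowing: which file would an `import` actually execute?
import importlib.util

spec = importlib.util.find_spec("json")  # substitute "val" in the affected session
print(spec.origin)  # the path of the module that would actually be imported
```

If the printed path points into the tracker rather than yolov5, the sys.path ordering is the culprit.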

Notebook 5 cannot read in the files from the ml-template-project correctly on a Windows computer

When you run Notebook 5 in Google Colab on a Windows computer, the file names of the images contain Sa╠êcken instead of säcken. This is because Windows decodes the file names during unzipping with CP437 instead of UTF-8, which Linux uses automatically. You can see the difference with the code below.

b'a\xcc\x88'.decode('CP437')

b'a\xcc\x88'.decode('utf-8')

As a result, no data at all is available for training the model. WandB prints a warning about this during training and evaluation, but does run.
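Such names can also be repaired after the fact by reversing the CP437 decoding; a hedged sketch:

```python
# Repair CP437-mojibake file names by re-encoding and decoding as UTF-8.
def fix_name(name):
    """Undo a CP437 mis-decode; return the name unchanged if it was correct."""
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # the name was already decoded correctly

mojibake = b"sa\xcc\x88cken".decode("cp437")  # the Sa╠êcken-style name
print(fix_name(mojibake) == b"sa\xcc\x88cken".decode("utf-8"))  # True
```

Note the repaired name uses a combining diaeresis (NFD form), so a normalisation pass may still be needed when comparing file names.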


Develop a point-based segmentation approach

Integrate an ML approach to segment areas of interest from the SGU point-based photo labels. Maybe using TagLab or CVAT?

Relevant literature:
General segmentation:
https://arxiv.org/pdf/2003.06148.pdf
https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-segment-anything-with-sam.ipynb

Segment Anything Meets Point Tracking:
Repo on a joint model of point tracking and Segmentation

Ocean-related:
On Improving the Training of Models for the Semantic Segmentation of Benthic Communities from Orthographic Imagery
Automatic Semantic Segmentation of Benthic Habitats Using Images from Towed Underwater Camera in a Complex Shallow Water Environment

@jannesgg @biomobst from our last conversation with SGU, it's clear to me that we should have different ML model approaches and human-in-the-loop annotation labelling (whether CVAT, TagLab, ...).

Improve CI pipeline build docker and test

As described in PR #235 for the Dockerfile and the CI pipeline, the following logic now applies:

  • Run the pipeline on "dev", "master", and PRs to "dev" and "master".
  • If any of the files related to the container have changed, rebuild it.
  • On a PR, the new image gets the tag of the current branch; on a push it gets the tag of what we push to (dev or master).
  • To fetch the correct image for the tests: if we are in a PR and the files changed, or we are on dev or master, fetch the current branch; otherwise, the PR target.
  • Run the tests unconditionally.

This means that the dev or master Docker image only gets updated on a push, independent of whether the tests pass or fail during that push. (That they will pass should be checked first in a PR.)

So this is how it is now. Things that could be optimized:

Update the workflow names

The names of the Zooniverse workflows in the KSO project are not very informative. We should rename them with the project name at the beginning (e.g. SGU_mussel_detection).
