ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.

License: GNU General Public License v3.0

Dockerfile 0.65% Python 99.35%
object-detection deep-learning marine-protected-areas citizen-science

kso's Introduction

KSO System

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.


KSO overview

The KSO system has been developed to:

  • move and process underwater footage and its associated data (e.g. location, date, sampling device),
  • make these data available to citizen scientists on Zooniverse for annotation, and
  • train and evaluate machine learning models (customised YOLOv5 or YOLOv8 models).

koster_info_diag

The system is built around a series of easy-to-use Jupyter Notebooks. Each notebook allows users to perform a specific task of the system (e.g. upload footage to the citizen science platform or analyse the classified data).

Users can run these notebooks via Google Colab (by clicking on the Colab links in the table below), locally or on a high-performance computing (HPC) environment.

Notebooks

Our notebooks are modular and grouped into four main task categories: Set up, Classify, Analyse, and Publish.

Task Notebook Description Try it!
Set up Check_metadata Check the format and contents of the footage and the sites, media and species CSV files Colab / Binder
Classify Upload_subjects_to_Zooniverse Prepare original footage and upload short clips to Zooniverse; extract frames of interest from the original footage and upload them to Zooniverse Colab / Binder
Classify Process_classifications Pull and process up-to-date classifications from Zooniverse Colab / Binder
Analyse Train_models Prepare the training and test data, set model parameters and train models Colab / Binder
Analyse Evaluate_models Use ecologically relevant metrics to test the models Colab / Binder
Publish Publish_models Publish the model to a public repository Colab / Binder
Publish Publish_observations Automatically classify new footage and export observations to GBIF Colab / Binder

Local Installation

Docker Installation

Requirements

Pull KSO Docker image

Bash
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev

Conda Installation

Requirements

Download this repository

Clone this repository using

git clone https://github.com/ocean-data-factory-sweden/kso.git

Prepare your system

Depending on your system (Windows/Linux/MacOS), you might need to install some extra tools. If this is the case, you will get a message about what you need to install in the next steps. For example, Microsoft Build Tools C++ with a version higher than 14.0 is required for Windows systems.

Set up the environment with Conda

  1. Open the Anaconda Prompt.
  2. Navigate to the folder where you cloned the repository (or unzipped the manually downloaded one), then go into the kso folder:
cd kso
  3. Create an Anaconda environment with Python 3.8, replacing <name env> with a name of your choice:
conda create -n <name env> python=3.8
  4. Activate the environment:
conda activate <name env>
  5. Specify your GPU details.

5a. Find the PyTorch installation command you need: navigate to the system options (example below) and select your device/platform details.

CUDA Requirements

5b. Add the recommended command to KSO's gpu_requirements_user.txt file.

  6. Install all the requirements:
pip install -r requirements.txt -r gpu_requirements_user.txt
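After installing the requirements, it can be useful to confirm that PyTorch actually sees your GPU. This is a minimal sketch (not part of the KSO notebooks) that degrades gracefully when torch is not installed:

```python
# Quick post-install check: does PyTorch see a CUDA device?
# Safe to run anywhere; it only imports torch if it is available.
import importlib.util

def cuda_status():
    """Return a short string describing PyTorch/CUDA availability."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    return "cuda available" if torch.cuda.is_available() else "cpu only"

print(cuda_status())
```

If this reports "cpu only" on a GPU machine, revisit step 5 and the command you added to gpu_requirements_user.txt.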

Cloudina

Cloudina is a hosted version of KSO (powered by JupyterHub) on NAISS Science Cloud. It allows users to scale and automate larger workflows using a powerful processing backend. This is currently an invitation-only service. To access the platform, please contact jurie.germishuys[at]combine.se.

The current portals are accessible as:

  1. Console (object storage) - storage
  2. Album (JupyterHub) - notebooks
  3. Vendor (MLFlow) - mlflow

Starting a new project

To start a new project you will need to:

  1. Create the initial information for the database: input the information about the underwater footage files, sites and species of interest. You can use a template of the CSV files and move the directory to the "db_starter" folder.
  2. Link your footage to the database: you will need files of underwater footage to run this system. You can download some samples and move them to "db_starter", or store your own files and specify their directory in the notebooks.

Please note that the format of the underwater media is standardised (typically .mp4 or .jpg) and that the associated metadata, captured in three CSV files ("movies", "sites" and "species"), should follow the Darwin Core (DwC) standard.
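As a sketch of what this kind of metadata check involves, the snippet below compares a CSV header against a set of required columns. The column names here are illustrative assumptions, not the exact fields of the KSO templates:

```python
import csv
import io

# Hypothetical required headers per CSV type; the real DwC-aligned fields
# depend on your project's template files.
REQUIRED = {
    "movies": {"filename", "site_id", "created_on"},
    "sites": {"site_id", "siteName", "decimalLatitude", "decimalLongitude"},
    "species": {"species_id", "scientificName"},
}

def missing_columns(csv_text, kind):
    """Return the required columns absent from the CSV header."""
    header = set(next(csv.reader(io.StringIO(csv_text))))
    return REQUIRED[kind] - header

# A sites CSV missing its coordinate columns:
print(missing_columns("site_id,siteName\n1,Koster", "sites"))
# → the latitude/longitude columns are reported as missing
```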

Developer instructions

If you would like to expand and improve the KSO capabilities, please follow the instructions above to set the project up on your local computer.

When you make changes, please create your branch on top of the current 'dev' branch. Before submitting a Pull Request, please:

  • Run Black on the code you have edited
black filename 
  • Clean up the commit history on your branch so that every commit represents one logical change (squash and edit commits so that the history is understandable to others).
  • For the commit messages, please follow the Conventional Commits guidelines (table below) to facilitate code sharing, and describe the logic behind the commit in the body of the message.

    Commit types

Commit Type Title Description Emoji
feat Features A new feature
fix Bug Fixes A bug fix 🐛
docs Documentation Documentation only changes 📚
style Styles Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc) 💎
refactor Code Refactoring A code change that neither fixes a bug nor adds a feature 📦
perf Performance Improvements A code change that improves performance 🚀
test Tests Adding missing tests or correcting existing tests 🚨
build Builds Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm) 🛠
ci Continuous Integrations Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs) ⚙️
chore Chores Other changes that don't modify src or test files ♻️
revert Reverts Reverts a previous commit 🗑
  • Rebase on top of dev (never merge; only use rebase).
  • Submit a Pull Request and assign at least 2 reviewers.

Citation

If you use this code or its models in your research, please cite:

Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021) An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548

Collaborations/Questions

You can find out more about the project at https://subsim.se.

We are always excited to collaborate and help other marine scientists. Please feel free to contact us (matthias.obst(at)marine.gu.se) with your questions.

Troubleshooting

If you experience issues importing panoptes_client on Windows, it is a known issue with the libmagic package. Pmason's suggestions on the Zooniverse Talk board can be useful for troubleshooting it.

kso's People

Contributors

dependabot[bot], diewertje11, jannesgg, pilarnavarro, victor-wildlife

kso's Issues

Dockerfile includes extra python packages but unknown what they do

While re-creating the CI pipeline to automatically test the notebooks, it was found that the master branch of kso points to commit a306499 of kso_utils, which is on branch origin/feat/pyav-backand.
The dev branch of kso points to the dev branch of kso_utils, commit f2ac787. (I believe these commits were made to try to fix the problem of extracting frames from movies that Emil had just before the summer holidays.)
The requirements file in kso_utils differs between these two commits. This creates the error shown in the image below when the notebook tests are run in a container based on the requirements in dev (commit f2ac787).

Since the tests did work when the container was built from commit a306499 and its requirements, these Python packages have been added to the Dockerfile. (Temporarily! We do not want them here; they should either be removed or moved to the requirements.)
They have not been added to the requirements in dev yet, since I do not know why these packages were added, what their function is, or whether we want to use them in the end.

So this issue needs to be resolved by finding out what these packages do and whether we actually use them. If we do, they should be added to the kso_utils requirements in dev and master/main. If we do not want them, we need to find out why the error in the image occurs and how to solve it.

Image

Notebook 4+8: Generalize workflow and species selection

The species and aggregation factors are selected in a different order in Notebooks 4 and 8, and both notebooks always display options that yield no annotations. The idea is to make the order the same in both notebooks and to filter the options first on whether annotations are available. This can be done on zoo_info_dict. I am working on this.

Google Colab package dependency error

While working on issue #191, Colab gives the following error during the installation of all the packages.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.11.0 which is incompatible. google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 1.4.0 which is incompatible.

This comes from the code where all requirements are installed in one line. The code to reproduce this error can be found in commit:
778aaadc18076834617aa53e2636db432723ce58

Since we currently only use google-colab to clear the output, and that still works, and since pip list shows our versions of the packages (8.11.0 and 1.4.0), we will ignore this error for now. But it is something to keep in mind if something does not work on Google Colab in the future.
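One way to confirm which versions actually ended up installed (the 8.11.0 and 1.4.0 mentioned above) is to query the package metadata directly; a small stdlib-only sketch:

```python
# Check what is really installed, independent of pip's resolver warnings.
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None
```

For example, `installed_version("ipython")` in the Colab session described above should report the version that `import IPython` will actually use.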

Rename main repo and update all links

We should come up with a new name for the repo, now that we only have one repo left, and then update the name everywhere:

  • update the name in the ReadMe (also in the links to all images)
  • update the name in the link of the repo / docker image (not koster-yolo4 anymore)
  • update the name in the links in the Dockerfile
  • update the name in the Jupyter notebooks
  • use ghcr instead of dockerhub for image builds

Dockerfile nvidia starting image

The Dockerfile (the one for both repositories combined, data-management and object-detection) currently first loads the NVIDIA CUDA devel Docker image in order to build ffmpeg from scratch.

Then it starts over from a new image, copies the final ffmpeg installation and builds up the rest of the environment. In theory this should be possible with the runtime image, which has the advantage of being smaller. However, when trying that, the Dockerfile could not get through the builder test on GitHub since it ran out of disk space. This is the error that occurred:

Image

This error was resolved by using the devel image a second time instead of the runtime image. However, we now end up with a larger final image (which is not a problem in itself), but it is not the neatest solution, so it is something to revisit in the future.

Improve table next to frame display

In the launch_viewer function, the table displayed next to the frames is not very useful. It should show the names of the actual labels instead of the colours.

Create requirements.txt file for 1 env (works on local, SNIC, Colab)

Currently there are multiple requirements files across the 3 different repos, plus 2 extra in yolov5 and yolov5_tracker, all of which we pip install on separate lines. As a result, pip cannot ensure that everything is compatible with everything else. Therefore all the requirements should be installed in one invocation instead.

On top of that, our 3 requirements files contradict each other. The goal is now to remove these contradictions and find the minimal combination of packages that makes everything work.

This should work with the same requirement files on Colab, SNIC and locally.

Add a project-specific "compress_video" option

While standardising the format of the movies, we should offer a project-specific option to compress the videos or not.

The "standarise_movie_format" function within "movies.utils" uses a hard-coded rule that compresses the videos unless the project is Spyfish Aotearoa.

Ideally, "projects_list.csv" should have a "compress_video" column set to True or False.
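The proposed column could be read with a small helper along these lines; the "Project_name" column and the CSV shape are assumptions for illustration:

```python
import csv
import io

# Sketch of reading the proposed per-project flag; column names are assumed.
def should_compress(projects_csv_text, project_name, default=True):
    """Return the project's compress_video flag, falling back to a default."""
    for row in csv.DictReader(io.StringIO(projects_csv_text)):
        if row.get("Project_name") == project_name:
            value = (row.get("compress_video") or "").strip().lower()
            return value == "true" if value else default
    return default

csv_text = "Project_name,compress_video\nSpyfish Aotearoa,False\nTemplate,True\n"
print(should_compress(csv_text, "Spyfish Aotearoa"))  # False
```

This keeps the current behaviour as the default while letting individual projects opt out, replacing the hard-coded project-name check.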

Combine dm and ml repos

  • Unification of Dockerfile for single image build
  • Unification of ReadME files for both repos
  • Transfer notebooks from dm repo to ml repo
  • Change the jupyter.sh files on SNIC, we now only need 1

Add a feature to extract occurrences for publishing them to GBIF/OBIS via ITP

Select the model of interest from Zenodo, run it on the movies you want, and publish the observations to OBIS.

We started the "format_to_gbif_occurence" function in tut#8. At the moment the function converts classifications from citizen scientists to the OBIS format, but we will need to add the functionality to process ML and expert classifications.

The Integrated Publishing Toolkit and the Python package to read and parse Darwin Core files might be useful.
Also worth keeping an eye on is the GBIF Python client; currently it only seems possible to download datasets, but maybe there is an option to upload?

Using Wildlife.ai's GBIF account, publish observations to OBIS.
Consider using the camera-trap data format and standards:
https://tdwg.github.io/camtrap-dp/

Using Docker Playground and following the IPT installation guidelines, you can temporarily run your own IPT to privately test the occurrence files.

Update movie selection method from AWS in choose_footage

The "choose_footage" function in widget_utils, used to choose movies from AWS, generates a temporary link for every available movie and then lets the user select the movie of interest. To make better use of our resources, we should first select the movies of interest and only then generate temporary HTTP links for those movies.
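The proposed change could be sketched as a helper that presigns only the selected keys. It accepts any client exposing boto3's generate_presigned_url signature; the helper itself is hypothetical, not code from widget_utils:

```python
# Sketch: generate temporary URLs only for the movies the user already picked,
# instead of presigning every movie in the bucket up front.
def presign_selected(s3_client, bucket, selected_keys, expires=3600):
    """Return {key: temporary URL} for the selected movies only."""
    return {
        key: s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires,
        )
        for key in selected_keys
    }
```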

Tutorial 3 upload clips issue

🐛 Bug


To Reproduce (REQUIRED)

Input:

pp.upload_zoo_subjects("clip")

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 pp.upload_zoo_subjects("clip")

File /usr/src/app/kso/kso_utils/kso_utils/project.py:692, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    684 """
    685 This function uploads clips or frames to Zooniverse, depending on the subject_type argument
    686 
   (...)
    689 :type subject_type: str
    690 """
    691 if subject_type == "clip":
--> 692     upload_df, sitename, created_on = zoo_utils.set_zoo_clip_metadata(
    693         project=self.project,
    694         generated_clipsdf=self.generated_clips,
    695         sitesdf=self.local_sites_csv,
    696         moviesdf=self.local_movies_csv,
    697     )
    698     zoo_utils.upload_clips_to_zooniverse(
    699         project=self.project,
    700         upload_to_zoo=upload_df,
    701         sitename=sitename,
    702         created_on=created_on,
    703     )
    704     # Clean up subjects after upload

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1303, in set_zoo_clip_metadata(project, generated_clipsdf, sitesdf, moviesdf)
   1301 # Combine site info to the generated_clips df
   1302 if "site_id" in generated_clipsdf.columns:
-> 1303     upload_to_zoo = generated_clipsdf.merge(sitesdf, on="site_id")
   1304     sitename = upload_to_zoo["#siteName"].unique()[0]
   1305 else:

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:9329, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   9310 @Substitution("")
   9311 @Appender(_merge_doc, indents=2)
   9312 def merge(
   (...)
   9325     validate: str | None = None,
   9326 ) -> DataFrame:
   9327     from pandas.core.reshape.merge import merge
-> 9329     return merge(
   9330         self,
   9331         right,
   9332         how=how,
   9333         on=on,
   9334         left_on=left_on,
   9335         right_on=right_on,
   9336         left_index=left_index,
   9337         right_index=right_index,
   9338         sort=sort,
   9339         suffixes=suffixes,
   9340         copy=copy,
   9341         indicator=indicator,
   9342         validate=validate,
   9343     )

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:107, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     90 @Substitution("\nleft : DataFrame or named Series")
     91 @Appender(_merge_doc, indents=0)
     92 def merge(
   (...)
    105     validate: str | None = None,
    106 ) -> DataFrame:
--> 107     op = _MergeOperation(
    108         left,
    109         right,
    110         how=how,
    111         on=on,
    112         left_on=left_on,
    113         right_on=right_on,
    114         left_index=left_index,
    115         right_index=right_index,
    116         sort=sort,
    117         suffixes=suffixes,
    118         copy=copy,
    119         indicator=indicator,
    120         validate=validate,
    121     )
    122     return op.get_result()

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:704, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    696 (
    697     self.left_join_keys,
    698     self.right_join_keys,
    699     self.join_names,
    700 ) = self._get_merge_keys()
    702 # validate the merge keys dtypes. We may need to coerce
    703 # to avoid incompatible dtypes
--> 704 self._maybe_coerce_merge_keys()
    706 # If argument passed to validate,
    707 # check if columns specified as unique
    708 # are in fact unique.
    709 if validate is not None:

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:1257, in _MergeOperation._maybe_coerce_merge_keys(self)
   1251     # unless we are merging non-string-like with string-like
   1252     elif (
   1253         inferred_left in string_types and inferred_right not in string_types
   1254     ) or (
   1255         inferred_right in string_types and inferred_left not in string_types
   1256     ):
-> 1257         raise ValueError(msg)
   1259 # datetimelikes must match exactly
   1260 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Expected behavior

Uploading of created clips, error might be due to large file size?
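The ValueError indicates that site_id is a string (object) column on one side of the merge and int64 on the other. A minimal reproduction with made-up rows, and one possible fix (casting the join key to a common dtype before merging):

```python
import pandas as pd

# Made-up rows reproducing the mismatch: strings on one side, ints on the other.
clips = pd.DataFrame({"site_id": ["1", "2"], "clip": ["a.mp4", "b.mp4"]})
sites = pd.DataFrame({"site_id": [1, 2], "#siteName": ["Site N", "Site S"]})

# Cast both join keys to the same dtype before merging.
clips["site_id"] = clips["site_id"].astype(str)
sites["site_id"] = sites["site_id"].astype(str)
merged = clips.merge(sites, on="site_id")
print(len(merged))  # → 2
```

So the likely culprit is the dtype of site_id in the sites CSV versus the generated clips, rather than file size.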

Environment

(environment screenshots)

Set up Tutorial 9 (Run ML on new footage)

Add workflow that runs the model over a selection of footage, and finally aggregates this by site and returns the maximum count for a given species within the given movies.

We should check that it works for the template project as well as for active projects (e.g. Spyfish)
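The aggregation step described above could be sketched as follows; the (site, species, count) detection shape is an assumption for illustration:

```python
from collections import defaultdict

# Sketch of the proposed aggregation: the maximum count of each species
# observed within any of a site's movies.
def max_count_per_site(detections):
    """detections: iterable of (site, species, count) tuples (assumed shape)."""
    best = defaultdict(int)
    for site, species, count in detections:
        best[(site, species)] = max(best[(site, species)], count)
    return dict(best)

dets = [("A", "cod", 3), ("A", "cod", 5), ("B", "cod", 1)]
print(max_count_per_site(dets))  # {('A', 'cod'): 5, ('B', 'cod'): 1}
```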

Improve uploading of frames from third parties to Zooniverse

In tutorial #4, we should improve the current approach to uploading your own frames (e.g. not retrieving them from clips classified by Zooniverse volunteers).
This will help other projects that have classified their own videos or collected underwater images.

Re-organise releases of stable versions

Currently (after PR #235) we use the master branch as the 'stable' version. That is also why a new Docker image is created every time we push to master. (The Docker image contains the code that users run on SNIC; it contains not only the requirements but also the actual code.) Because master serves as the stable version, we should not push changes to this branch too often.

However, in the future we could move towards tagging stable releases. In that case we would only need to build a new Docker image for each release. (It is desirable not to build and push a new Docker image too often, since it is quite big.) We could then push new changes to master more freely.

We should discuss what we think to be a good way forward.

For development, we can use the dev docker-image on SNIC and our own mounted clone of the code.

Notebook 5: locale.getpreferredencoding() gets changed during training, causing the notebook to be unable to train again or run the evaluation part

When you run Notebook 5 and request the preferred encoding at the beginning, or just before the cell where you run train.run(...), you get 'UTF-8' (using the code below).

import locale
locale.getpreferredencoding()

However, when you run the same thing after the training cell, it returns 'ANSI_X3.4-1968' (which is ASCII). So somewhere during the training performed by the YOLOv5 code, this default gets changed. This causes an error when reading the names in the train.txt or valid.txt file when you train again or run the validation (since, in the template project, these files contain Swedish letters).

Exception: train: Error loading data from /content/koster_yolov4/tutorials/ml-template-data/train.txt: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

This comes from line 470 in /content/koster_yolov4/yolov5/utils/dataloaders.py, where the text file is opened with open(). This open() call uses the default encoding, and ASCII cannot read the ä.

We have not located exactly where this change in locale is made. We could not find anything in the YOLOv5 code when searching with git grep for ANSI, locale, encoding, ASCII or coding; only the file utils/aws/mime.sh does something with ASCII, but we do not think that file gets used.

Possible solutions are to prevent this change (if we can locate where it is made), or to set the encoding back to the correct default each time. However, we have not yet found a command that can set it back. We have tried the following:

  • locale.setlocale(locale.LC_ALL, '') (returns 'en_US.UTF-8')
  • sys.getfilesystemencoding() (returns 'utf-8')
  • locale.getlocale() (returns ('en_US', 'UTF-8'))
  • result = _locale.nl_langinfo(_locale.CODESET) (result contains 'ANSI_X3.4-1968')
  • _locale.CODESET (returns 14)

So it seems there are 2 different encoding settings: a system-wide one that stays at UTF-8 and is not changed, and a locale-level one that does get changed. However, trying to change it back gives an error:

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968

The ways we have tried to set it back:

  • !chcp 65001
  • !vim /etc/default/locale
  • !echo $PYTHONIOENCODING
  • locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')

The code below seems to set it back, but it does not solve the issue when training/validating; it presumably just replaces the lookup with a fixed string.

import locale

def getpreferredencoding(do_setlocale=True):
    return "UTF-8"

locale.getpreferredencoding = getpreferredencoding

locale.getpreferredencoding()

To have the template project working for the workshop on 02-03-2023, we simply changed the names of the files so that they do not contain ä or other Swedish letters.
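A more robust fix (not applied upstream here) would be to read the split files with an explicit encoding, so the preferred-encoding drift no longer matters. A sketch:

```python
from pathlib import Path

def read_split(txt_path):
    """Read a train/valid split file with an explicit encoding, instead of the
    locale-dependent default that a bare open() falls back to."""
    return Path(txt_path).read_text(encoding="utf-8").splitlines()
```

With this, file names containing ä are read correctly even after the preferred encoding has drifted to ASCII.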

Notebook 5+6: Error while importing panoptes_client

When you try to run notebook 5 in Colab with a clean runtime, you get an error in the 2nd cell.

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/dist-packages/urllib3-1.24.3.dist-info/METADATA'

This comes from: import kso_utils.tutorials_utils as t_utils
In which it tries to import: from panoptes_client import Project

If you look in the Google Colab files, it indeed does not contain this file. However, it does contain a different version:
/usr/local/lib/python3.8/dist-packages/urllib3-1.26.14.dist-info

The strange thing is that when you run the command again (import kso_utils.tutorials_utils as t_utils), there is no problem and it just runs.
I do not understand why this is the case and I cannot find it online. Does anyone know?

I have 2 workarounds to prevent the error from occurring:

  1. Install from the requirements.txt file of the data-management repository. The error then does not occur, so the packages in that requirements list are probably in better agreement with each other. (This also raises the question: why are there 2 different repositories and 2 different requirements files? Or is this just for historical reasons?)
  2. Import panoptes_client in tutorials_utils with a try/except, since it can be imported on the second attempt:

try:
    from panoptes_client import Project
except:
    from panoptes_client import Project
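The retry can also be written generically. This is a hedged generalisation of workaround 2, not code from the repository:

```python
import importlib

def import_with_retry(module_name, attempts=2):
    """Retry an import; the failure reported above only occurs on the first try."""
    last_exc = None
    for _ in range(attempts):
        try:
            return importlib.import_module(module_name)
        except Exception as exc:  # the reported failure is a FileNotFoundError
            last_exc = exc
    raise last_exc
```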

Issues with panoptes import

I have successfully installed the KSO software from scratch following the guidelines in the readme. However, the notebooks get stuck importing the ProjectProcessor. I have dug into it and found that the problem occurs when trying to import panoptes_client.
I know I have had this problem before and fixed it by manually uninstalling and reinstalling different packages, but I can't find the solution in the readme.

Image

Enable users to download raw and aggregated Zooniverse classifications in tut#8

Researchers from Spyfish Aotearoa would like to download the processed (i.e. JSON-unnested) classifications from Zooniverse to analyse them in R.
This will be possible in tutorial https://github.com/ocean-data-factory-sweden/kso-data-management/pull/9, which is in a draft state at the moment.
The sections to be developed are in the checklist below:

Widget to select date range of the classifications of interest
Process the classifications (i.e. unnest the json classifications) to have a label/species per row
Widget to select the columns users want to download
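The unnesting step could be sketched as below, assuming the usual Zooniverse export shape (a JSON `annotations` string per classification row); the field names are assumptions:

```python
import json

def unnest(classifications):
    """Flatten classification rows to one (subject, label) pair per row."""
    rows = []
    for c in classifications:
        for annotation in json.loads(c["annotations"]):
            for value in annotation.get("value") or []:
                rows.append(
                    {"subject_id": c["subject_ids"], "label": value.get("choice")}
                )
    return rows

export = [{"subject_ids": 1, "annotations": '[{"value": [{"choice": "FISH"}]}]'}]
print(unnest(export))  # [{'subject_id': 1, 'label': 'FISH'}]
```

The resulting one-label-per-row table is straightforward to filter by date range or column selection before export.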

Tutorial 3 issue with producing sample clips


🐛 Bug

Seems to be something that has not been defined.

To Reproduce (REQUIRED)

pp.generate_zoo_clips(
    movie_name=pp.movie_selected,
    movie_path=pp.movie_path,
    is_example=True,
    use_gpu=gpu_available.result,
)
Output:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/ipywidgets/widgets/interaction.py:257, in interactive.update(self, *args)
    255     value = widget.get_interact_value()
    256     self.kwargs[widget._kwarg] = value
--> 257 self.result = self.f(**self.kwargs)
    258 show_inline_matplotlib_plots()
    259 if self.auto_display and self.result is not None:

File /usr/src/app/kso/kso_utils/kso_utils/widgets.py:1182, in n_random_clips(clip_length, n_clips)
   1179 def n_random_clips(clip_length, n_clips):
   1180     # Create a list of starting points for n number of clips
   1181     # movie_df is currently missing here
-> 1182     duration_movie = int(movie_df["duration"].values[0])
   1183     starting_clips = random.sample(range(0, duration_movie, clip_length), n_clips)
   1185     # Seave the outputs in a dictionary

NameError: name 'movie_df' is not defined

Number of modifications:
0

image

Expected behavior

The expected behaviour is to be able to extract sample clips from the video.

Fix testing GA workflow integration

The first version of testing has been implemented but needs to be improved by testing in the same environment and better managing the images that are created.

Add support for YOLOv8

Description:

YOLO has been upgraded, and we risk deprecations causing issues down the line if we do not keep up. YOLOv8 is now available for both the Ultralytics version of YOLO and the tracker submodule.

Notebook 5 / training models gives error: run() got unexpected keyword argument batch_size

This issue has already been solved; this is just a description of the error for documentation purposes, since the error is not very clear and it was hard to find its cause.

Description of the error
When training a model in Notebook 5, with

mlp.train_yolov5(
    exp_name.value,
    weights.artifact_path,
    epochs=epochs.value,
    batch_size=batch_size.value,
    img_size=(img_h.value, img_w.value),
)

which indirectly calls

yolov5.train()

you get the error from the image.

Image

This error complains about the batch_size argument, even though the printed code shows that the batch_size argument exists. And if you go into the val.py file of the yolov5 repository, the batch_size argument exists there too.

Traceback of the origin of the error and solution
This error has occurred since the commit where project.py was created (commit 7c0d287).
The error only occurs when yolov5.train(epochs=1) (or our code) is run after the MLProjectProcessor class has been created; without this class, the code runs properly.
In this class, t6_utils gets imported, which in turn does import yolov5_tracker.track as track.
That code appends some paths to sys.path. As a result, when validation.run() is called inside yolov5.train(), it does not run the val.py from yolov5 but the val.py from the tracker.

This val.py from the tracker is never imported anywhere in the repository, so it is simply unused. Therefore the solution is to delete val.py in the tracker repository, which makes it possible to run Notebook 5 / train models without errors.

Remaining issue
One remaining issue is that you cannot train 2 models in the same notebook session. For now this is mentioned in the notebook, and people are instructed to simply restart it.
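When debugging this kind of sys.path shadowing, importlib can show which file Python would actually load for a given module name (illustrated here with a stdlib module as a stand-in for val):

```python
# Diagnose module shadowing: which file would an `import` actually execute?
import importlib.util

spec = importlib.util.find_spec("json")  # substitute "val" in the affected session
print(spec.origin)  # the path of the module that would actually be imported
```

If the printed path points into the tracker rather than yolov5, the sys.path ordering is the culprit.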

Notebook 5 cannot read in the files from the ml-template-project correctly on a Windows computer

When you run Notebook 5 in Google Colab on a Windows computer, the file names of the images contain Sa╠êcken instead of säcken. This is because Windows decodes the file names during unzipping with CP437 instead of UTF-8, which Linux uses automatically. You can see the difference with the code below.

b'a\xcc\x88'.decode('CP437')

b'a\xcc\x88'.decode('utf-8')

As a result, no data at all is available for training the model. WandB prints a warning about this during training and evaluation, but does run.
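Such names can also be repaired after the fact by reversing the CP437 decoding; a hedged sketch:

```python
# Repair CP437-mojibake file names by re-encoding and decoding as UTF-8.
def fix_name(name):
    """Undo a CP437 mis-decode; return the name unchanged if it was correct."""
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # the name was already decoded correctly

mojibake = b"sa\xcc\x88cken".decode("cp437")  # the Sa╠êcken-style name
print(fix_name(mojibake) == b"sa\xcc\x88cken".decode("utf-8"))  # True
```

Note the repaired name uses a combining diaeresis (NFD form), so a normalisation pass may still be needed when comparing file names.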


Develop a point-based segmentation approach

Integrate an ML approach to segment areas of interest from the SGU point-based photo labels. Maybe using TagLab or CVAT?

Relevant literature:
General segmentation:
https://arxiv.org/pdf/2003.06148.pdf
https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-segment-anything-with-sam.ipynb

Segment Anything Meets Point Tracking:
Repo on a joint model of point tracking and Segmentation

Ocean-related:
On Improving the Training of Models for the Semantic Segmentation of Benthic Communities from Orthographic Imagery
Automatic Semantic Segmentation of Benthic Habitats Using Images from Towed Underwater Camera in a Complex Shallow Water Environment

@jannesgg @biomobst from our last conversation with SGU, it's clear to me that we should have different ML model approaches and human-in-the-loop annotation labelling (whether CVAT, TagLab, ...).

Improve CI pipeline build docker and test

As described in PR #235 for the Dockerfile and the CI pipeline, the following logic now applies:

  • Run the pipeline on "dev", "master", and PRs to "dev" and "master".
  • If any of the files related to the container have changed, rebuild it.
  • On a PR, the new image gets the tag of the current branch; on a push it gets the tag of what we push to (dev or master).
  • To fetch the correct image for the tests: if we are in a PR and the files changed, or we are on dev or master, fetch the current branch; otherwise, the PR target.
  • Run the tests unconditionally.

This means that the dev or master Docker image only gets updated on a push, independent of whether the tests pass or fail during that push. (That they will pass should be checked first in a PR.)

So this is how it is now. Things that could be optimized:

Update the workflow names

The names of the Zooniverse workflows in the KSO project are not very informative. We should rename them with the project name at the beginning (e.g. SGU_mussel_detection).
