whoigit / ifcb_classifier



Image classifier for the IFCB (Imaging FlowCytobot) plankton dataset

License: MIT License

Python 98.10% Shell 1.90%

ifcb_classifier's People

joefutrelle, sams-uk, san-soucie, tsgolden

ifcb_classifier's Issues

Metadata integration

Allow the model to incorporate ROI dimensions and other metadata into the classification process.

Trailing newline characters on filter strings read from a text file prevent proper matching

Hi @sbatchelder and @joefutrelle. I've been working to integrate ifcb_classifier into our IFCB processing pipeline here at Axiom. Overall, it's been great to work with, but I ran into a bug when trying to pull in filter strings from a text file.

When I ran something like python neuston_net.py RUN ... --clobber --filter my-filters.txt, I immediately got "RUN IS DONE", as if none of my filter strings had matched any of the bin file paths, even though the same filters matched when I passed them directly to --filter.

I tracked the issue down to trailing newline characters left by a call to f.readlines(). Each filter string read from the file still ended with its newline character (readlines() does not strip line terminators), so it could not match any bin file path. Changing that line to f.read().splitlines() got file-based filtering working again for me.

I noticed a similar pattern used in some places but not in others, so I assume the issue I ran into was just an oversight. I have a fix on the fork we're currently running here and would be happy to turn it into a pull request if you're open to contributions. If so, I can also fix the other occurrences of this readlines() pattern that I came across in the codebase.
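The difference between the two ways of reading the filter file can be shown in isolation. This is a minimal sketch, not the code in neuston_net.py: the helper names, the sample bin ID and path, and the substring-matching rule in any_filter_matches are all hypothetical, chosen only to reproduce the symptom described above.

```python
import io

def load_filters_readlines(f):
    # Pattern reported in the issue: readlines() keeps each trailing "\n".
    return f.readlines()

def load_filters_splitlines(f):
    # Fix reported in the issue: splitlines() drops the line terminators.
    return f.read().splitlines()

def any_filter_matches(filters, path):
    # ASSUMPTION: plain substring matching against the bin path, for illustration
    # only; the real matching logic in neuston_net.py may differ.
    return any(flt in path for flt in filters)

filter_file = "D20190101T000000_IFCB101\nD20190102\n"  # hypothetical filter file contents
bin_path = "/data/D20190101T000000_IFCB101.roi"       # hypothetical bin path

buggy = load_filters_readlines(io.StringIO(filter_file))
fixed = load_filters_splitlines(io.StringIO(filter_file))
print(buggy)                                # ['D20190101T000000_IFCB101\n', 'D20190102\n']
print(fixed)                                # ['D20190101T000000_IFCB101', 'D20190102']
print(any_filter_matches(buggy, bin_path))  # False
print(any_filter_matches(fixed, bin_path))  # True
```

One caveat with splitlines() under a substring rule like this one: a blank line in the filter file becomes an empty string, and an empty string is a substring of every path, so dropping empty entries (for example `[s for s in f.read().splitlines() if s.strip()]`) is a sensible extra guard.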

json file from --type img option has repeat entries

Following the wiki documentation, running a command of this form:
./neuston_net.py RUN run-data/YOUR_PNG_FOLDER training-output/PATH/TO/MODEL.ptl YOUR_RUN_ID --type img
produces JSON result files that appear to contain two copies of all the output data: there are twice as many records as expected, and the first and second halves of each file appear to be identical.
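A quick way to confirm the symptom on a given output file is to check whether its record list is one sequence repeated twice. This is a sketch under an assumption: the actual layout of the --type img JSON is not shown in the issue, so the list-of-records shape and the "pid"/"class" keys below are hypothetical.

```python
import json

def halves_identical(records):
    """True if `records` is one sequence repeated twice, the symptom described in the issue."""
    n = len(records)
    if n == 0 or n % 2:
        return False
    return records[: n // 2] == records[n // 2 :]

# Hypothetical output: a list of per-image records, duplicated as reported.
json_text = json.dumps([{"pid": "img_001.png", "class": "diatom"},
                        {"pid": "img_002.png", "class": "ciliate"}] * 2)
records = json.loads(json_text)
print(halves_identical(records))      # True
print(halves_identical(records[:3]))  # False
```

Keeping only `records[: n // 2]` would recover a single copy as a stopgap, but it does not address whatever writes the results twice.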

IfcbBinDataset does not stitch old-style bins

IfcbBinDataset uses a bin's images property to retrieve ROIs. Old-style bins (whose PIDs start with "IFCB") require InfilledImages to stitch and infill overlapping ROIs.
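The fix amounts to choosing the image source per bin. The sketch below shows only that branching, under stated assumptions: the helper names are hypothetical, and the stitcher is passed in as a callable so the example runs without pyifcb. In practice it would presumably be pyifcb's InfilledImages (believed to live in ifcb.data.stitching and to accept a bin), which should be verified against the pyifcb version in use.

```python
def is_old_style(pid: str) -> bool:
    # Per the issue, old-style bin pids start with "IFCB". ASSUMPTION: a pid may be
    # a full URL or path, so only its last component is checked.
    return pid.rsplit("/", 1)[-1].startswith("IFCB")

def roi_images(the_bin, pid, infill):
    """Return ROI images for a bin, stitching old-style bins.

    infill: hypothetical hook; a callable that wraps a bin and returns stitched,
    infilled images (e.g. pyifcb's InfilledImages, injected by the caller).
    """
    if is_old_style(pid):
        return infill(the_bin)
    return the_bin.images

class FakeBin:
    # Stand-in for a pyifcb bin, exposing only an `images` mapping.
    images = {1: "raw roi"}

def fake_infill(b):
    return {1: "stitched roi"}

print(roi_images(FakeBin(), "IFCB5_2012_028_081515", fake_infill))     # {1: 'stitched roi'}
print(roi_images(FakeBin(), "D20190101T000000_IFCB101", fake_infill))  # {1: 'raw roi'}
```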

SLURM Workflow Automation

Submitting SLURM jobs on HPC systems is not very streamlined, especially for end users who are not developers.
Implement a one-stop solution for submitting training and classification jobs on a SLURM-enabled system.
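One possible building block is a helper that renders a ready-to-submit batch script around a neuston_net.py command. This is a hypothetical sketch: make_sbatch_script and its resource defaults are placeholders, not part of the project, and real clusters usually also need partition or account directives. The #SBATCH options used (--job-name, --gres, --time, --output with the %j job-ID pattern) are standard sbatch options.

```python
import shlex

def make_sbatch_script(neuston_args, job_name, gpus=1, hours=24):
    """Render a SLURM batch script that runs neuston_net.py (hypothetical helper)."""
    cmd = ["python", "neuston_net.py", *neuston_args]
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",          # placeholder GPU request
        f"#SBATCH --time={hours}:00:00",       # placeholder wall-clock limit
        f"#SBATCH --output={job_name}-%j.out", # %j expands to the SLURM job ID
        " ".join(shlex.quote(a) for a in cmd), # quote arguments for the shell
    ]
    return "\n".join(lines) + "\n"

# RUN arguments taken from the wiki example quoted in another issue on this page.
script = make_sbatch_script(
    ["RUN", "run-data/YOUR_PNG_FOLDER", "training-output/PATH/TO/MODEL.ptl",
     "YOUR_RUN_ID", "--type", "img"],
    job_name="run-demo",
)
print(script)
```

Saving the output as run-demo.sbatch and submitting it with `sbatch run-demo.sbatch` would queue the job; a fuller tool could wrap both steps and offer presets for TRAIN and RUN.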

Transfer Learning

Transfer learning is the process of adapting a pre-trained model to new output targets without retraining the whole network.

In this project, it could look similar to the regular TRAIN subcommand, except that MODEL points to a previously trained .ptl model file:

neuston_net.py TRANSFER <optional_args> SRC MODEL TRAINING_ID
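The proposed command-line shape can be sketched with argparse. This is a hypothetical parser for the suggested TRANSFER subcommand only; it is not neuston_net.py's actual argument handling, and the optional arguments are omitted. The one extra rule it enforces, that MODEL must be a .ptl file, follows the proposal above.

```python
import argparse

def build_parser():
    # Hypothetical sketch of the proposed TRANSFER subcommand.
    parser = argparse.ArgumentParser(prog="neuston_net.py")
    sub = parser.add_subparsers(dest="cmd", required=True)
    transfer = sub.add_parser("TRANSFER", help="train a previously trained model on new targets")
    transfer.add_argument("SRC", help="directory of labeled training images")
    transfer.add_argument("MODEL", help="path to a previously trained .ptl model file")
    transfer.add_argument("TRAINING_ID", help="identifier for this training run")
    return parser

def parse_transfer(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.MODEL.endswith(".ptl"):
        parser.error("MODEL must point to a previously trained .ptl file")  # exits with status 2
    return args

args = parse_transfer(["TRANSFER", "training-data/new_classes",
                       "training-output/old_run/model.ptl", "transfer_run_01"])
print(args.cmd, args.MODEL, args.TRAINING_ID)
```

Reusing TRAIN's positional layout keeps the two subcommands interchangeable for users; only the meaning of MODEL changes.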

Selectively Train on multiple datasets

Labeled data may come from any number of datasets. To speed up training experiments, implement a feature that combines training datasets dynamically.

The feature should aggregate classes and images from any number of on-disk datasets and let the user specify which classes from which datasets to include.

Furthermore, to account for the --class-max flag, which truncates per-class sample sizes, this feature should be able to prioritize certain datasets over others when truncating.
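The selection and prioritization rules above can be sketched as a merge over per-class file lists. This is a minimal sketch under assumptions: combine_datasets and its input shapes are hypothetical, datasets are listed from highest to lowest priority, and the cap simply keeps the first images in order, whereas the real --class-max behavior may sample differently.

```python
from collections import defaultdict

def combine_datasets(datasets, include, class_max=None):
    """Merge per-class image lists from several datasets (hypothetical helper).

    datasets:  list of (name, {class_label: [image_path, ...]}), highest priority first
    include:   {dataset_name: set of class labels to take}; unlisted datasets contribute nothing
    class_max: optional cap on images per class; higher-priority datasets fill it first
    """
    combined = defaultdict(list)
    for name, classes in datasets:
        wanted = include.get(name, set())
        for label, paths in classes.items():
            if label not in wanted:
                continue
            # Remaining room under the cap (unbounded when class_max is None).
            room = len(paths) if class_max is None else class_max - len(combined[label])
            combined[label].extend(paths[:max(room, 0)])
    return dict(combined)

datasets = [
    ("lab_a", {"diatom": ["a1.png", "a2.png"], "ciliate": ["a3.png"]}),
    ("lab_b", {"diatom": ["b1.png", "b2.png", "b3.png"], "detritus": ["b4.png"]}),
]
include = {"lab_a": {"diatom", "ciliate"}, "lab_b": {"diatom", "detritus"}}
result = combine_datasets(datasets, include, class_max=3)
print(result)
# {'diatom': ['a1.png', 'a2.png', 'b1.png'], 'ciliate': ['a3.png'], 'detritus': ['b4.png']}
```

Because lab_a is listed first, both of its diatom images survive the cap of 3 and lab_b contributes only one more.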
