whoigit / ifcb_classifier
Image classifier for the IFCB plankton dataset
License: MIT License
Allow model to include roi dimensions and other metadata into the classification process
Hi @sbatchelder and @joefutrelle. I've been working to integrate ifcb_classifier into our IFCB processing pipeline here at Axiom. Overall, it's been great to work with, but I ran into a bug when trying to pull in filter strings from a text file.
The issue I was seeing was that when I ran something like python neuston_net.py RUN ... --clobber --filter my-filters.txt, I would just immediately get RUN IS DONE, as if none of my filter strings had matched any of the bin object filepaths, even though the same filters matched if I passed them directly to --filter.
I dug in and tracked the issue down to trailing newline characters resulting from this call to f.readlines(). Basically each newline-delimited filter string still had its newline character on the end of it (because readlines doesn't remove them), which prevented it from matching with any of the bin object filepaths. I changed that line to f.read().splitlines(), and that got file-based filtering working again for me.
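The difference between the two calls can be reproduced in isolation. The sketch below is only an illustration: the filter strings, the bin path, and the substring-style match are assumptions made for the example, not the classifier's actual matching code.

```python
import os
import tempfile

# Hypothetical filter file with one filter string per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("D20190102\nD20190103\n")
    filter_path = f.name

with open(filter_path) as f:
    with_newlines = f.readlines()      # each entry keeps its trailing "\n"

with open(filter_path) as f:
    stripped = f.read().splitlines()   # line endings are removed

os.remove(filter_path)

# Hypothetical bin path; substring matching is an assumption for this sketch.
bin_path = "data/D20190102T000000_IFCB010"

matches_before = any(s in bin_path for s in with_newlines)
matches_after = any(s in bin_path for s in stripped)

print(with_newlines)
print(stripped)
print(matches_before, matches_after)
```

With readlines() every filter string carries a newline that never appears in a file path, so nothing matches; with read().splitlines() the same strings match as expected.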
I noticed a similar pattern being used in some places, but not in others, so I'm assuming the issue I ran into was just an oversight. I have a fix on the fork we're currently running here and would be happy to turn that into a pull request if you're open to contributions. If so, I can also fix the other occurrences of this readlines pattern I ran across in the codebase.
Following the wiki documentation on use of this command format:
./neuston_net.py RUN run-data/YOUR_PNG_FOLDER training-output/PATH/TO/MODEL.ptl YOUR_RUN_ID --type img
produces json result files. The files appear to contain two copies of all the output data (e.g., there are twice as many records as expected and the first and second half of the files appear to be identical).
IfcbBinDataset uses a bin's images property to retrieve ROIs. Old-style bins (whose pids start "IFCB") require the use of InfilledImages to stitch and fill overlapping ROIs.
Submitting HPC SLURM jobs is not very streamlined, especially for non-dev end-users.
Implement a one-stop-shop solution for submitting training and classification jobs on a SLURM-enabled system.
Transfer Learning is the process of training a pre-existing model for new output targets without having to retrain the whole network.
In this project this could look similar to the regular TRAIN subcommand, but where MODEL points to a previously trained .ptl model file.
neuston_net.py TRANSFER <optional_args> SRC MODEL TRAINING_ID
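As a rough illustration of the proposed command shape, the sketch below parses a TRANSFER subcommand with argparse. It is a hypothetical stand-alone parser, not the existing neuston_net.py argument handling; the help texts and the example paths are assumptions, and optional arguments are omitted.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the proposed TRANSFER usage line."""
    parser = argparse.ArgumentParser(prog="neuston_net.py")
    subparsers = parser.add_subparsers(dest="cmd", required=True)

    transfer = subparsers.add_parser(
        "TRANSFER", help="continue training a previously trained model")
    transfer.add_argument("SRC", help="directory of labeled training images")
    transfer.add_argument("MODEL", help="path to a previously trained .ptl file")
    transfer.add_argument("TRAINING_ID", help="name for this training run")
    return parser


# Example invocation with placeholder values.
args = build_parser().parse_args([
    "TRANSFER",
    "training-data/NEW_DATASET",
    "training-output/PATH/TO/MODEL.ptl",
    "NEW_TRAINING_ID",
])
print(args.cmd, args.SRC, args.MODEL, args.TRAINING_ID)
```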
Labeled data may come from any number of datasets. To improve training experiment throughput, implement a feature by which training datasets can be dynamically combined.
The feature should support the aggregation of classes and images from any number of on-disk datasets and allow the user to specify which classes from which dataset should be included.
Furthermore, to account for the --class-max flag behavior, this feature should be able to prioritize certain datasets over others when truncating per-class sample sizes.
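One possible reading of this request is sketched below in plain Python. The function name combine_datasets, the in-memory dataset layout, and the "fill from higher-priority datasets first" rule are all assumptions made for illustration; they are not part of the existing codebase, and the class_max parameter only imitates the capping role of --class-max.

```python
from collections import defaultdict


def combine_datasets(datasets, include, class_max=None):
    """Merge per-class image lists from several datasets.

    datasets:  list of (name, {class_label: [image_paths]}) in priority order,
               highest priority first.
    include:   {dataset_name: set of class labels to take from that dataset}.
    class_max: optional cap on images per class; higher-priority datasets
               fill the cap first.
    """
    combined = defaultdict(list)
    for name, classes in datasets:
        wanted = include.get(name, set())
        for label, images in classes.items():
            if label not in wanted:
                continue
            if class_max is None:
                combined[label].extend(images)
                continue
            room = class_max - len(combined[label])
            if room > 0:
                combined[label].extend(images[:room])
    return dict(combined)


# Hypothetical datasets: "primary" outranks "secondary".
primary = ("primary", {"diatom": ["p1.png", "p2.png"], "ciliate": ["p3.png"]})
secondary = ("secondary", {"diatom": ["s1.png", "s2.png"], "detritus": ["s3.png"]})
include = {"primary": {"diatom", "ciliate"}, "secondary": {"diatom"}}

result = combine_datasets([primary, secondary], include, class_max=3)
print(result)
```

Here the diatom class is capped at three images, so both primary images are kept and only one secondary image is added; detritus is dropped because it was not requested from the secondary dataset.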