arampacha / clip-rsicd
License: Apache License 2.0
Documentation Feedback from evaluation
Hi,
I was able to fine-tune a CLIP model on a custom dataset with the provided training script. The outcome was a "config.json" as well as a "flax_model.msgpack" file. Is there a way to import my customised model into the provided demo?
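Assuming the output directory follows the standard Hugging Face layout, the fine-tuned checkpoint can be pointed at `from_pretrained` directly. A minimal sketch (the base-checkpoint name used for the processor is an assumption, on the premise that fine-tuning started from `openai/clip-vit-base-patch32` and did not change the preprocessing):

```python
import os


def find_checkpoint(out_dir):
    """Sanity-check that out_dir holds a Flax checkpoint before loading."""
    expected = {"config.json", "flax_model.msgpack"}
    missing = expected - set(os.listdir(out_dir))
    if missing:
        raise FileNotFoundError(f"missing {sorted(missing)} in {out_dir}")
    return out_dir


def load_finetuned(out_dir):
    # Imported lazily so the helper above stays dependency-free.
    from transformers import FlaxCLIPModel, CLIPProcessor

    model = FlaxCLIPModel.from_pretrained(find_checkpoint(out_dir))
    # The processor (tokenizer + image transforms) is unchanged by
    # fine-tuning, so it can be loaded from the base checkpoint
    # (assumption: fine-tuning started from this checkpoint).
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    return model, processor
```

With this, the demo's model-loading code can be pointed at the output directory instead of a hub model name.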
Thanks for your excellent work! This pretraining was done on the RSICD dataset; how can the code be used directly to generate captions for these RSICD images? Could you please give me some instructions? Thanks a lot again!
Hi,
For fine-tuning, I often remove the top layer, add a small model on top of the base model, freeze the base model, train the top model, and then unfreeze the base model and retrain the whole thing.
That way, the large gradients of an untrained model won't mess up the pre-trained model. Is that what is done here? How does the fine-tuning procedure work? I cannot see any additional layer or any freezing/unfreezing... It seems that the whole thing is trained from scratch?
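For reference, the two-phase procedure described above can be sketched as follows. This is a PyTorch sketch of the general technique, not this repo's training script (which is written in Flax); the tiny modules standing in for a pretrained base encoder and a small head are hypothetical:

```python
import torch.nn as nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


# Hypothetical stand-ins for a pretrained base encoder and a new head.
base = nn.Linear(512, 512)
head = nn.Linear(512, 30)

# Phase 1: freeze the base; the optimizer sees only the head's parameters,
# so the untrained head's large gradients cannot disturb the base weights.
set_trainable(base, False)
phase1_params = [p for p in head.parameters() if p.requires_grad]

# Phase 2: unfreeze everything and fine-tune end to end,
# typically at a much lower learning rate.
set_trainable(base, True)
phase2_params = [p for m in (base, head) for p in m.parameters() if p.requires_grad]
```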
Demo Feedback from evaluation
Create baseline evaluation using pre-trained CLIP model for held out subset of RSICD dataset. Code should be usable for evaluating fine-tuned model.
Dataset
Hi, I am planning to build a (scalable) vector search engine using this model. I need to understand a couple of things for it. Will you guys be helping me?
These are the current challenges I face:
1. gdrive/MyDrive/RSICD/RSICD_images.rar
2. The Drive link in RSICD_optimal is prompting "request access".

Construct a 30-way classification task using the RSICD test dataset. Labels of the dataset are provided by the class names in the image file names. Use the baseline CLIP model and our RSICD-CLIP models as image encoders, and measure the classification accuracy (per class, micro, macro) via standard metrics (precision, recall, f1-score).
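A sketch of the evaluation harness for this task. It assumes the RSICD test files are named with a leading class prefix (e.g. `airport_123.jpg`), per the naming convention described above, and that `y_pred` comes from whichever image encoder is under test:

```python
import os
import re

from sklearn.metrics import classification_report


def label_from_filename(path: str) -> str:
    # The class name is the alphabetic prefix of the file name
    # (assumption based on the naming convention described above).
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.match(r"[A-Za-z]+", stem).group(0)


# y_true from the file names, y_pred from the encoder under test; the
# report gives per-class precision/recall/f1 plus averaged summaries.
y_true = [label_from_filename(p) for p in ["airport_1.jpg", "beach_2.jpg"]]
y_pred = ["airport", "beach"]
print(classification_report(y_true, y_pred, digits=3))
```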
We want to measure the model's ability to generalize beyond the 30 classes it was trained on. The idea is to take an aerial image of a subject not covered by those 30 classes, measure our CLIP-RSICD model's performance on it, and compare against the baseline CLIP model. The evaluation metric can be similar to the one we used for our original evaluation, i.e. the rank of the synthetic caption containing the correct class, averaged across all test images.
FMoW may be a good source of aerial images that deal with classes outside the RSICD training set.
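The rank metric mentioned above could be computed along these lines (a sketch; `scores` stands for the image-to-caption similarity scores produced by whichever model is being evaluated):

```python
import numpy as np


def rank_of_correct(scores, correct_idx):
    """Rank (1 = best) of the correct synthetic caption among all candidates."""
    order = np.argsort(-np.asarray(scores))  # indices sorted by descending score
    return int(np.nonzero(order == correct_idx)[0][0]) + 1


def mean_rank(all_scores, correct_indices):
    # Average the rank across all test images, as in the original evaluation.
    return float(np.mean([rank_of_correct(s, i)
                          for s, i in zip(all_scores, correct_indices)]))
```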
Thank you for sharing this great project.
I would like to know the required PyTorch and CUDA versions.
1."Could you please tell me where the weights of the fine-tuned model are saved? I couldn't find this location. Even in the output of the out-dir, it's very confusing. I am a beginner, so I am not quite sure which file is the one saved after the fine-tuning."
2."Has anyone encountered an error saying 'unable to find input and val data' when using the run_clip_training_tv.sh training file? I have tried many methods but couldn't resolve this issue, though I believe that my file paths are set correctly."
@arampacha I am getting this error while executing ImageTextDataset
Please help
This was prompted by the query "avocado armchair" against the project demo during the HF community evaluation. The top result returned was this one:
So obviously, the demo returned the best match it found out of the corpus of aerial images. But the image looks kind of like an avocado, which is probably due to the CLIP-RSICD model "remembering" parameters it learned as a CLIP model (i.e. before being fine tuned with RSICD).
The suggested experiment is to score image-caption pairs from (say) MS-COCO with both the baseline CLIP model and the CLIP-RSICD models, and compare their performance using either the top-k metric or the classification accuracy metric (described in issue-29).
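For concreteness, the top-k metric could be sketched as follows (assumes one correct caption per image, with `all_scores[i]` holding image i's similarity to every candidate caption):

```python
def top_k_accuracy(all_scores, correct_indices, k=3):
    """Fraction of images whose correct caption ranks in the top k by score."""
    hits = 0
    for scores, correct in zip(all_scores, correct_indices):
        # Indices of the k highest-scoring candidate captions.
        top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        hits += correct in top
    return hits / len(all_scores)
```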
Hi @arampacha
I would like to thank you for such an amazing repo that we can all benefit from.
I am just wondering if you could provide the fine-tuning script/notebook for this project. Thanks!
I tried to run the demo using the instructions given in the markdown file in the demo-1 directory, as instructed by @sujitpal.
I did not understand the context of:
"In addition, we need to generate CLIP vectors for the image corpus using demo-image-encoder.ipynb."
so I skipped that step. I did the rest sequentially, and this is what I get:
SSH: Attempting to connect to worker 0...
bind [127.0.0.1]:8501: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 8501
Could not request local forwarding.
What is causing this, and how do I run the demo?
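The ssh error means something on your local machine is already listening on port 8501 (Streamlit's default port), so the local end of the tunnel cannot bind. Either stop that process, or forward a different local port, e.g. `ssh -L 8502:localhost:8501 ...` and browse to localhost:8502. A small illustrative helper to check whether a local port is free:

```python
import socket


def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if `port` can be bound locally (i.e. nothing is using it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```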
Hey @arampacha!
Many thanks for such a great repo!
Would it be possible to make the models available again for public use on HuggingFace?
Thank you in advance!
Hello,
I would like to encode an image and a text separately into vectors using clip-rsicd-v2 and save them in a nearest-neighbour library. What would be the best approach to encoding an image? I would like to replicate model.encode_image(image) from the original CLIP model. Also, process = CLIPProcessor.from_pretrained("flax-community/clip-rsicd-v2") does not work.
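A sketch of the counterpart to the original CLIP's `model.encode_image` / `encode_text` using the Hugging Face API, with the model loaded via `FlaxCLIPModel.from_pretrained("flax-community/clip-rsicd-v2")`. If the processor cannot be loaded from that repo, loading it from the base `openai/clip-vit-base-patch32` checkpoint is a plausible stand-in, on the assumption that fine-tuning does not change the preprocessing:

```python
import numpy as np


def l2_normalize(v):
    # Unit-norm vectors let a nearest-neighbour index use the inner
    # product as cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def encode_image(model, processor, pil_image):
    # model = FlaxCLIPModel.from_pretrained("flax-community/clip-rsicd-v2")
    # processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=pil_image, return_tensors="np")
    feats = model.get_image_features(pixel_values=inputs["pixel_values"])
    return l2_normalize(np.asarray(feats))


def encode_text(model, processor, text):
    inputs = processor(text=[text], return_tensors="np", padding=True)
    feats = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
    return l2_normalize(np.asarray(feats))
```

The normalized vectors can then be inserted into any nearest-neighbour index that supports inner-product search.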
Thanks
Hi, I noticed that the evaluation loss was much higher than the training loss in the training notebook. Could you please share the training logs for your best model(s)?