arampacha / clip-rsicd
License: Apache License 2.0
Documentation Feedback from evaluation
Hi,
I was able to fine-tune a CLIP model on a custom dataset with the provided training script. The outcome was a "config.json" as well as a "flax_model.msgpack" file. Is there a way to import my customised model into the provided demo?
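Assuming the output directory follows the standard Hugging Face layout, the fine-tuned checkpoint can be pointed at `from_pretrained` directly. A minimal sketch (the base-checkpoint name used for the processor is an assumption, on the premise that fine-tuning started from `openai/clip-vit-base-patch32` and did not change the preprocessing):

```python
import os


def find_checkpoint(out_dir):
    """Sanity-check that out_dir holds a Flax checkpoint before loading."""
    expected = {"config.json", "flax_model.msgpack"}
    missing = expected - set(os.listdir(out_dir))
    if missing:
        raise FileNotFoundError(f"missing {sorted(missing)} in {out_dir}")
    return out_dir


def load_finetuned(out_dir):
    # Imported lazily so the helper above stays dependency-free.
    from transformers import FlaxCLIPModel, CLIPProcessor

    model = FlaxCLIPModel.from_pretrained(find_checkpoint(out_dir))
    # The processor (tokenizer + image transforms) is unchanged by
    # fine-tuning, so it can be loaded from the base checkpoint
    # (assumption: fine-tuning started from this checkpoint).
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    return model, processor
```

With this, the demo's model-loading code can be pointed at the output directory instead of a hub model name.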
Thanks for your excellent work! This pretraining was done on the RSICD dataset; how can the code be used directly to generate captions for these RSICD images? Could you please give me some instructions? Thanks a lot again!
Hi,
For fine-tuning, I often remove the top layer, add a small model on top of the base model, freeze the base model, train the top model, and then unfreeze the base model and retrain the whole thing.
That way, the large gradients of an untrained model won't mess up the pre-trained model. Is that what is done here? How does the fine-tuning procedure work? I cannot see any additional layer or any freezing/unfreezing... It seems that the whole thing is trained from scratch?
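For reference, the two-phase procedure described above can be sketched as follows. This is a PyTorch sketch of the general technique, not this repo's training script (which is written in Flax); the tiny modules standing in for a pretrained base encoder and a small head are hypothetical:

```python
import torch.nn as nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


# Hypothetical stand-ins for a pretrained base encoder and a new head.
base = nn.Linear(512, 512)
head = nn.Linear(512, 30)

# Phase 1: freeze the base; the optimizer sees only the head's parameters,
# so the untrained head's large gradients cannot disturb the base weights.
set_trainable(base, False)
phase1_params = [p for p in head.parameters() if p.requires_grad]

# Phase 2: unfreeze everything and fine-tune end to end,
# typically at a much lower learning rate.
set_trainable(base, True)
phase2_params = [p for m in (base, head) for p in m.parameters() if p.requires_grad]
```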
Demo Feedback from evaluation
Create baseline evaluation using pre-trained CLIP model for held out subset of RSICD dataset. Code should be usable for evaluating fine-tuned model.
Dataset
Hi, I am planning to build a (scalable) vector search engine using this model. I need to understand a couple of things for it. Will you guys be helping me?
These are the current challenges I face:
1. gdrive/MyDrive/RSICD/RSICD_images.rar
2. The Drive link in RSICD_optimal is prompting "request access".

Construct a 30-way classification task using the RSICD test dataset. Labels of the dataset are provided by the class names in the image file names. Use the baseline CLIP model and our RSICD-CLIP models as image encoders, and measure the classification accuracy (per class, micro, macro) via standard metrics (precision, recall, f1-score).
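A sketch of the evaluation harness for this task. It assumes the RSICD test files are named with a leading class prefix (e.g. `airport_123.jpg`), per the naming convention described above, and that `y_pred` comes from whichever image encoder is under test:

```python
import os
import re

from sklearn.metrics import classification_report


def label_from_filename(path: str) -> str:
    # The class name is the alphabetic prefix of the file name
    # (assumption based on the naming convention described above).
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.match(r"[A-Za-z]+", stem).group(0)


# y_true from the file names, y_pred from the encoder under test; the
# report gives per-class precision/recall/f1 plus averaged summaries.
y_true = [label_from_filename(p) for p in ["airport_1.jpg", "beach_2.jpg"]]
y_pred = ["airport", "beach"]
print(classification_report(y_true, y_pred, digits=3))
```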
We want to measure the model's ability to generalize beyond the 30 classes it was trained on. The idea is to take an aerial image of a subject not covered by those 30 classes, measure our CLIP-RSICD model's performance on it, and compare against the baseline CLIP model. The evaluation metric can be similar to the one we used for our original evaluation, i.e. the rank of the synthetic caption containing the correct class, averaged across all test images.
FMoW may be a good source of aerial images that deal with classes outside the RSICD training set.
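The rank metric mentioned above could be computed along these lines (a sketch; `scores` stands for the image-to-caption similarity scores produced by whichever model is being evaluated):

```python
import numpy as np


def rank_of_correct(scores, correct_idx):
    """Rank (1 = best) of the correct synthetic caption among all candidates."""
    order = np.argsort(-np.asarray(scores))  # indices sorted by descending score
    return int(np.nonzero(order == correct_idx)[0][0]) + 1


def mean_rank(all_scores, correct_indices):
    # Average the rank across all test images, as in the original evaluation.
    return float(np.mean([rank_of_correct(s, i)
                          for s, i in zip(all_scores, correct_indices)]))
```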
Thank you for sharing this great project.
I would like to know the required PyTorch and CUDA versions.
1."Could you please tell me where the weights of the fine-tuned model are saved? I couldn't find this location. Even in the output of the out-dir, it's very confusing. I am a beginner, so I am not quite sure which file is the one saved after the fine-tuning."
2."Has anyone encountered an error saying 'unable to find input and val data' when using the run_clip_training_tv.sh training file? I have tried many methods but couldn't resolve this issue, though I believe that my file paths are set correctly."
@arampacha I am getting this error while executing ImageTextDataset
Please help
This was prompted by the query "avocado armchair" against the project demo during the HF community evaluation. The top result returned was this one:
So obviously, the demo returned the best match it found out of the corpus of aerial images. But the image looks kind of like an avocado, which is probably due to the CLIP-RSICD model "remembering" parameters it learned as a CLIP model (i.e. before being fine tuned with RSICD).
The suggested experiment is to score image-caption pairs from (say) MS-COCO with both the baseline CLIP model and the CLIP-RSICD models, and compare their performance using either the top-k metric or the classification accuracy metric (described in issue-29).
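For concreteness, the top-k metric could be sketched as follows (assumes one correct caption per image, with `all_scores[i]` holding image i's similarity to every candidate caption):

```python
def top_k_accuracy(all_scores, correct_indices, k=3):
    """Fraction of images whose correct caption ranks in the top k by score."""
    hits = 0
    for scores, correct in zip(all_scores, correct_indices):
        # Indices of the k highest-scoring candidate captions.
        top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        hits += correct in top
    return hits / len(all_scores)
```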
Hi @arampacha
I would like to thank you for such an amazing repo that we can all benefit from.
I am just wondering if you could provide the fine-tuning script/notebook for this project. Thanks!
I tried to run the demo using the instructions given in the markdown file in the demo-1 directory, as instructed by @sujitpal.
I did not understand the context of:
"In addition, we need to generate CLIP vectors for the image corpus using demo-image-encoder.ipynb."
so I skipped that step. I did the rest sequentially, and this is what I get:
SSH: Attempting to connect to worker 0...
bind [127.0.0.1]:8501: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 8501
Could not request local forwarding.
What is causing this, and how do I run the demo?
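The ssh error means something on your local machine is already listening on port 8501 (Streamlit's default port), so the local end of the tunnel cannot bind. Either stop that process, or forward a different local port, e.g. `ssh -L 8502:localhost:8501 ...` and browse to localhost:8502. A small illustrative helper to check whether a local port is free:

```python
import socket


def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if `port` can be bound locally (i.e. nothing is using it)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```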
Hey @arampacha!
Many thanks for such a great repo!
Would it be possible to make the models available again for public use on HuggingFace?
Thank you in advance!
Hello,
I would like to encode an image and a text separately into vectors using clip-rsicd-v2 and save them in a nearest-neighbour library. What would be the best approach to encoding an image? I would like to replicate model.encode_image(image) from the original CLIP model. Also, process = CLIPProcessor.from_pretrained("flax-community/clip-rsicd-v2") does not work.
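A sketch of the counterpart to the original CLIP's `model.encode_image` / `encode_text` using the Hugging Face API, with the model loaded via `FlaxCLIPModel.from_pretrained("flax-community/clip-rsicd-v2")`. If the processor cannot be loaded from that repo, loading it from the base `openai/clip-vit-base-patch32` checkpoint is a plausible stand-in, on the assumption that fine-tuning does not change the preprocessing:

```python
import numpy as np


def l2_normalize(v):
    # Unit-norm vectors let a nearest-neighbour index use the inner
    # product as cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def encode_image(model, processor, pil_image):
    # model = FlaxCLIPModel.from_pretrained("flax-community/clip-rsicd-v2")
    # processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=pil_image, return_tensors="np")
    feats = model.get_image_features(pixel_values=inputs["pixel_values"])
    return l2_normalize(np.asarray(feats))


def encode_text(model, processor, text):
    inputs = processor(text=[text], return_tensors="np", padding=True)
    feats = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
    return l2_normalize(np.asarray(feats))
```

The normalized vectors can then be inserted into any nearest-neighbour index that supports inner-product search.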
Thanks
Hi, I noticed that the evaluation loss was much higher than the training loss in the training notebook. Could you please share the training logs for your best model(s)?