
clip-rsicd's People

Contributors

arampacha, goutham794, ritog, sujitpal

clip-rsicd's Issues

Update README based on First evaluation feedback

Documentation Feedback from evaluation

  • The model card is still a bit sparse on details so it would have been nice to include more details about the dataset and the application of such model in this domain.
  • Social impact: CLIP has proved to be effective in a lot of different tasks so it would be interesting to see how such models can be used in a given demo and what applications it could enable.
  • It would be nice to include more details about the dataset and about how these different applications can be useful.

Load fine tuned model with customised data set in demo

Hi,

I was able to fine-tune a CLIP model on a custom dataset with the provided training script. The output was a "config.json" file and a "flax_model.msgpack" file. Is there a way to load my customised model in the provided demo?
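A minimal sketch of one way to load such a checkpoint, assuming both files sit in a single local directory. `FlaxCLIPModel.from_pretrained` accepts a local directory path; the helper names `missing_checkpoint_files` and `load_finetuned_clip` are hypothetical, and where the demo itself loads its model is not shown here, so the call site would need to be found and pointed at the local directory.

```python
from pathlib import Path

# The two files the training script is reported to produce.
REQUIRED_FILES = ("config.json", "flax_model.msgpack")


def missing_checkpoint_files(model_dir):
    """Return the required checkpoint files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_finetuned_clip(model_dir):
    """Load a fine-tuned Flax CLIP model from a local directory (hypothetical helper)."""
    missing = missing_checkpoint_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # Imported lazily so the file check above works without transformers installed.
    from transformers import FlaxCLIPModel

    return FlaxCLIPModel.from_pretrained(str(model_dir))
```

Calling `load_finetuned_clip("path/to/output_dir")` would then return a model object that can stand in for the one the demo loads by name.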

Remote sensing image captioning

Thanks for your excellent work! The pretraining was done on RSICD. How can the code be used directly to generate captions for RSICD images? Could you please give me some instructions? Thanks a lot again!

fine tuning procedure.

Hi,

For fine-tuning I often remove the top layer, add a small model on top of the base model, freeze the base model, train the top model, and then unfreeze the base model and retrain the whole thing.

That way, the large gradients of an untrained head won't mess up the pre-trained model. Is that what is done here? How does the fine-tuning procedure work? I cannot see any additional layer or any freezing/unfreezing. It seems that the whole model is trained at once?

Update Demo based on evaluation feedback

Demo Feedback from evaluation

  • For the image-to-image and feature-in-image tasks it would have been nice to let the users upload the images.
  • It would be nice to let the user upload images in the demo instead of using fixed images.

**Evaluation**

Create baseline evaluation using pre-trained CLIP model for held out subset of RSICD dataset. Code should be usable for evaluating fine-tuned model.

**Data**

  • Upload RSICD to TPU-vm storage
  • repack data into Dataset
  • check other datasets
  • prepare dataloaders

Scaling the search capability -PoC with Elastic search

Hi, I am planning to build a (scalable) vector search engine using this model. I need to understand a couple of things for it. Would you be able to help me?

These are the current challenges I face

  • Not able to find downloadable gdrive/MyDrive/RSICD/RSICD_images.rar. Drive link in RSICD_optimal is prompting "request access"

Classification task evaluation

Construct a 30-way classification task using the RSICD test dataset. Labels of the dataset are provided by the class names in image file names. Use the baseline CLIP model, and our RSICD-CLIP models as image encoders and measure the classification accuracy (per class, micro, macro) via standard metrics (precision, recall, f1-score).
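A sketch of how the metrics for this task could be computed without external libraries. The file naming scheme `<class>_<index>.jpg` is an assumption about how class names appear in the RSICD file names, and `label_from_filename` and `classification_report` are hypothetical helper names; the predictions would come from whichever image encoder is being evaluated.

```python
from collections import Counter


def label_from_filename(filename):
    """Extract the class label, assuming names like 'airport_12.jpg' (assumption)."""
    stem = filename.rsplit(".", 1)[0]
    return stem.rsplit("_", 1)[0]


def classification_report(y_true, y_pred):
    """Return (per_class, micro, macro), each as (precision, recall, f1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def prf(tp_n, fp_n, fn_n):
        precision = tp_n / (tp_n + fp_n) if tp_n + fp_n else 0.0
        recall = tp_n / (tp_n + fn_n) if tp_n + fn_n else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        raise ValueError("no labels given")
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in classes}
    # Macro: unweighted mean of the per-class scores.
    macro = tuple(sum(m[i] for m in per_class.values()) / len(classes) for i in range(3))
    # Micro: scores computed from the pooled counts.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, micro, macro
```

For single-label classification the micro precision, recall, and F1 all equal overall accuracy, so the macro figures are the ones that reveal weak classes.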

Measure generalization capabilities of CLIP-RSICD model

We want to measure the model's ability to generalize beyond the 30 classes it was trained with. Idea is to take an aerial image of a subject that is not covered by the 30 classes it trained on, and measure its performance against our CLIP-RSICD model, and compare against baseline CLIP model. Evaluation metric used can be similar to the one we used for our original evaluation, i.e. rank of synthetic caption containing the correct class, averaged across all test images.

FMoW may be a good source of aerial images that deal with classes outside the RSICD training set.
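The rank-based metric described above can be sketched as follows. This assumes that, for each test image, a similarity score has already been computed between the image and one synthetic caption per class; the function names and the dict-of-scores input format are hypothetical.

```python
def rank_of_correct_class(scores, correct_class):
    """Return the 1-based rank of the correct class (1 = highest similarity).

    scores maps each class name to the similarity between the image and
    that class's synthetic caption. Ties keep the dict's insertion order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(correct_class) + 1


def mean_rank(per_image_scores, correct_classes):
    """Average the rank of the correct class across all test images."""
    ranks = [
        rank_of_correct_class(scores, correct)
        for scores, correct in zip(per_image_scores, correct_classes)
    ]
    return sum(ranks) / len(ranks)
```

Running `mean_rank` once with scores from the baseline CLIP model and once with scores from the CLIP-RSICD model gives two directly comparable numbers, where lower is better.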

PyTorch version

Thank you for sharing this great project.

I would like to know which PyTorch and CUDA versions are required.

fine-tuned model

1. Could you please tell me where the weights of the fine-tuned model are saved? I couldn't find this location. Even in the output of the out-dir, it's very confusing. I am a beginner, so I am not quite sure which file is the one saved after the fine-tuning.
2. Has anyone encountered an error saying 'unable to find input and val data' when using the run_clip_training_tv.sh training file? I have tried many methods but couldn't resolve this issue, though I believe that my file paths are set correctly.

Measure model "forgetful-ness"

This was prompted by the query "avocado armchair" against the project demo during the HF community evaluation. The top result returned was this one:

So obviously, the demo returned the best match it found out of the corpus of aerial images. But the image looks kind of like an avocado, which is probably due to the CLIP-RSICD model "remembering" parameters it learned as a CLIP model (i.e. before being fine tuned with RSICD).

Experiment suggested is to measure the performance of image-caption pairs from (say) MS-COCO against the baseline CLIP model and the CLIP-RSICD models and measure their performance using either the top-k metric or classification accuracy metric (described in issue-29).
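A small sketch of the top-k variant of this experiment, assuming each model has already produced a ranked list of candidate captions per image (best first). `top_k_accuracy` and `forgetting_gap` are hypothetical names, and the gap is only one possible way to summarise "forgetful-ness".

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of items whose target appears among the first k ranked candidates."""
    hits = sum(
        1 for ranked, target in zip(ranked_predictions, targets) if target in ranked[:k]
    )
    return hits / len(targets)


def forgetting_gap(baseline_ranked, finetuned_ranked, targets, k):
    """Baseline minus fine-tuned top-k accuracy; positive means the fine-tuned model lost ground."""
    return top_k_accuracy(baseline_ranked, targets, k) - top_k_accuracy(
        finetuned_ranked, targets, k
    )
```

A gap near zero on general-domain pairs would suggest the fine-tuned model kept most of what the baseline knew.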

Finetuning Script

Hi @arampacha
I would like to thank you for such an amazing repo that we can all benefit from.

I am just wondering if you can provide the fine-tuning script/notebook for this project. Thanks.

**Training**

  • make dataloaders
  • prepare and run training script
  • add wandb tracking
  • profile training pipeline
  • tune hyperparameters

Loss curve anomalies

Thank you for sharing your work. May I ask why the loss on the validation dataset keeps rising? This doesn't look normal.

Cannot run streamlit demo locally

I tried to run the demo using the instructions given in the markdown file in the demo-1 directory, as instructed by @sujitpal.

I did not understand the context of this step:

In addition, we need to generate CLIP vectors for the image corpus using demo-image-encoder.ipynb.

So I did not follow it.

I did the rest sequentially. This is what I get:

SSH: Attempting to connect to worker 0...
bind [127.0.0.1]:8501: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 8501
Could not request local forwarding.

What is causing this? How do I run this demo?
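The messages above indicate that SSH could not set up local port forwarding because something on the local machine is already bound to 127.0.0.1:8501, possibly an earlier tunnel or a local Streamlit process. As a minimal sketch, the check below tests whether a local port is free and finds the next free one; the helper names are hypothetical, and the free port would then be used as the local side of the forward (for example `-L <free_port>:localhost:8501` with plain `ssh`).

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use".
            return False
    return True


def first_free_port(start=8501, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    print("8501 free:", port_is_free(8501))
    print("first free port from 8501:", first_free_port())
```

Alternatively, stopping the process that holds port 8501 frees it for the original command.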

Image encoding with clip-rsicd-v2 and the clip-rsicd-v2 processor doesn't work

Hello,

I would like to encode an image and a text separately into vectors using clip-rsicd-v2 and save them in a nearest-neighbor library. What is the best approach to encode an image? I would like to replicate model.encode_image(image) from the original CLIP model. Also, process = CLIPProcessor.from_pretrained("flax-community/clip-rsicd-v2") does not work.

Thanks
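One possible approach, sketched with the Hugging Face `transformers` Flax classes: `get_image_features` and `get_text_features` play the role of `encode_image` and `encode_text` from the original CLIP package. The helper names below are hypothetical. If the processor fails to load from the fine-tuned repository, loading it from the base checkpoint `openai/clip-vit-base-patch32` may be a workaround, on the assumption that fine-tuning kept the same preprocessing.

```python
import math

MODEL_NAME = "flax-community/clip-rsicd-v2"


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def load_clip(model_name=MODEL_NAME, processor_name=None):
    """Load model and processor once; processor_name lets you fall back to another repo."""
    # Imported lazily so l2_normalize is usable without transformers installed.
    from transformers import CLIPProcessor, FlaxCLIPModel

    model = FlaxCLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(processor_name or model_name)
    return model, processor


def encode_image(model, processor, image_path):
    """Return a unit-length image embedding as a list of floats."""
    import numpy as np
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="np")
    features = model.get_image_features(pixel_values=inputs["pixel_values"])
    return l2_normalize(np.asarray(features[0]).tolist())


def encode_text(model, processor, text):
    """Return a unit-length text embedding as a list of floats."""
    import numpy as np

    inputs = processor(text=[text], return_tensors="np", padding=True)
    features = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    return l2_normalize(np.asarray(features[0]).tolist())
```

The normalized vectors can be added to a nearest-neighbor index that uses inner product or cosine distance; image and text vectors share the same embedding space, so a text query can be searched against an image index.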

Training logs

Hi, I noticed that the evaluation loss was very high compared to the training loss in the training notebook. Can you please share the training logs for your best model(s)?
