wisdomikezogwo / quilt1m Goto Github PK

View Code? Open in Web Editor NEW

128.0 5.0 8.0 1.11 MB

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

Home Page: https://quilt1m.github.io/

License: MIT License

Python 100.00%

clip-model histopathology medical-dataset multimodal-datasets vlm

quilt1m's Issues

Visualizations of the results

Hello, thank you for your great work!

I see that in this repository, there are comparisons in terms of visualization. Could you provide the visualization tutorials or scipt on this? Thank you!

Access to the data

Hello,

Firstly, thank you for this. Amazing work!

Hello, I'm a PhD student and I have applied to get access to your dataset. I haven't received any reply yet, could you please give me access.

[email protected]

Best regards,
Markus Ekvall

How to access Quilt-Instruct?

Hey, thank you so much for making such an awesome resource available publicly. I was wondering where can I access Quilt-Instruct dataset that you discussed in your Quilt-Llava paper?

Error in data_utils.py

Hello!

After downloading the videos I get an error running

python -m main --base_dir ${BASE_DIR}

the stack trace is as follows;

Traceback (most recent call last):
  File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 166, in <module>
    main(args, data_df, recon_df, device, histo_models_dict, video_paths_dict)
  File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 68, in main
    rep_chunk_im_temp = save_frame_chunks_recon(video_path, stable_times, chunk_id,fps, height, width)
  File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/data_utils.py", line 108, in save_frame_chunks_recon
    clip_start_time, clip_end_time = start_end_time
TypeError: cannot unpack non-iterable int object

Some additional variables that can help in understanding what's happening

>>> stable_se_times 
(2, 17)

>>> start_end_time
2

basically, the assignment coming from start_end_time generates an error due to the line

clip_start_time, clip_end_time = start_end_time

Any clue on where this might come from?

Thanks!

Hi there, thanks for your excellent curation of dataset. I submitted a request through zenodo to your rescaled dataset some days ago and yet I didn't get approved. My request id was c83c8db6-a46a-4d52-90e9-02ee7a254ef5. Please check if it's convinient for you, thanks again

Downloaded Dataset Size

Hi!

What is the expected size of the dataset once downloaded? After processing by calling main.py?
Also approximately how long would each step of the process take?

Training Text

Amazing work! But I do not find the code for training CLIP model. I have a question, how to use the text for training CLIP? I have no idea to confirm from the file quilt_1M_lookup.csv. Thank you~

Video and Frames Download

Hi again :)

do you have any code you can share for downloading the videos?

Thank you so much! I appreciate your help on this!

please do not use BiomedCLIP for ARCH dataset

Dear Author,

The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.

I have noticed that the pubmed_set appears to overlap with BioMedCLip, which sources from PubMed Central.

In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.

The retrieval performance on the pubmed_set was as follows:
{15.7; 79.8; 94.4; 16.7; 78.9; 93.7}

Meanwhile, the retrieval performance on the books_set was:
{7.3; 49.2; 74.2; 8.2; 49.7; 73.2}

In contrast, the performance of QUILT-GPT/77 showed different results:

The retrieval performance on the pubmed_set was:
{1.8; 23.6; 46.0; 1.6; 23.4; 45.7}

The retrieval performance on the books_set was:
{1.8; 27.7; 52.8; 1.5; 23.4; 46.4}

From these results, it's clear that there isn't as significant a domain gap between the two datasets as there is with BiomedCLIP.

Missing Imports and Code Errors

Hello!

Thanks for the great resource!

I have been trying to run the data reconstruction but I kind of stumbled upon a couple of different errors (some are missing imports - e.g., nn from torch - one was a parenthesis that was not closed). There are also a couple of missing requirements (e.g., scikit-image) in the requirements file.

Would you mind taking a look? I have solved some of these and I am happy to send a PR in case but maybe you have an updated version of the code that runs out of the box.

Clarification on stable_times assignment in data/main.py

Hello, thanks for sharing your work.

I am currently working with your project and I have a question regarding a specific line of code in the data/main.py file. In the file, I noticed the following line:"stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id][0]". I wanted to confirm if this line should actually be:''stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id]''Could you provide some clarification on this?

Additionally, I'm curious about how the list_idle_im_se_t_tup variable is generated and how it ensures that its length matches the length of chunks. Could you please point me to that section of the code or provide some insights on how this synchronization is achieved?

I appreciate your time and assistance. Thank you in advance for your help!

Best regards

Zenodo dataset access

Thank you for your great work! I was trying to download the rescaled Quilt-1M from Zenodo but have not attained access. Could you please help in checking my request? The e-mail address should be [email protected]

Keyframe Extraction Code

Can you provide code to extract keyframes from a video? Thank you.

Load ViT-B/32

Hi, thank you for sharing your work.

Can you provide me some more details to load ViT-B/32 model?

Downstream tasks setting

First thansk for your impressive work on meidcal VLP coummunity!

From your paper, there are many downstream tasks in the benchmark to evlaute the VLP model, could you provide the pipeline or script to prepare the downstream dataset and evaluation?

Best Regards

Text Context Length

Dear authors,
Thanks for your great work. The maximum text context length for the CLIP text encoder is 77. However, the token length of several captions in quilt-1m is larger than 77. How can we utilize the CLIP text encoder to extract the caption features?

Reproducing zero-shot classification results

Hi, thank you very much for this great work on image-text contrastive training for histopathology and also publishing a valuable dataset.

I used the provided pre-trained QuiltNet along with given tokenizer to reproduce zero-shot classification results on NCT-CRC-HE-100k dataset. Used following commands,

import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
tokenizer = open_clip.get_tokenizer('hf-hub:wisdomik/QuiltNet-B-32')

I also used the class names and templates as given in the paper as follows,



nct_classnames = ["Adipose", "Debris", "Lymphocytes", "Mucus", "Smooth muscle", "Normal colon mucosa", "Cancer-associated stroma", "Colorectal adenocarcinoma epithelium"]


nct_template = [
    lambda c: f'a histopathology slide showing {c}.',
    lambda c: f'histopathology image of {c}.',
    lambda c: f'pathology tissue showing {c}.',
    lambda c: f'presence of {c} tissue on image.',
]

But I get a top1 accuracy lower than what's reported in the paper (59.56%), I get

zero shot metrics {'nct-zeroshot-val-top1': 0.28518236912136324, 'nct-zeroshot-val-top5': 0.7248697363418835}

I also tried training my own QuiltNet using Open_clip codebase from OpenAI, and the results were,

zero shot metrics {'nct-zeroshot-val-top1': 0.30728805599660086, 'nct-zeroshot-val-top5': 0.6808149026097458}

Could you kindly help me understand why I am. unable to reproduce the given numbers? I need to understand what I might be doing wrong.

Thank you.

Missing Images

Hi,

Thank you for creating this wonderful repository.

I've recived acess to the dataset through Zenodo and downloaded all files. There seems to be missign images. Out of the 10 packed .zip there are only 650K~ images (out of 1M).

Is this an issue or am I missing something?

Thank you again

Which preprocess should I use for linear probing ?

Hi, thank you for your work.

I am adapt your model to my dataset,

using preprocess_train for linear probing (only use the vision encoder); preprocess_val for testing.

_, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
Is it correct ?

Should I skip a projection layer of the vision model (the one maps features from 768 to 512), replace it by 768 -> num_class ?

Noisy text

Hello,
In your data.csv file. The noisy text and corrected text are identical.
I wonder, by any chance, if you will release the noisy text without cleaning it.
I really appreciate any help you can provide.

Issues with CSV files

Hi,
I am trying to recreate the QUILT dataset. I have a doubt regarding some of the columns in the CSV files that you have shared in the repo. Can you please highlight how you obtained the "stable_times" column in quilt_recon.csv?

Also, Were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm? Can you please elaborate on the generation of the quilt_data.csv file?

Thank you

How to fine-tune QuiltNet-B-32 model

Hello, thanks for sharing your work.

I want to fine-tune the QuiltNet-B-32 model to suit my downstream tasks. Can you provide a fine-tuning script? Or give an example of using QuiltNet-B-32?

Image-to-text generation

Can you please guide, How I can use quilt1 for image to text generation. Like I input an image, and it generates the text. Do I need to use LLaVA and BLIP like modes where I assign the weights of the quilt1m and use it for text description generation. As the API mentioned at the hugging face is only for zero short classification. and I could not find the Text retrieval code in GitHub repo. Moreover, I also tried blip, but got compatibility issues. Thanks.

How to access the corresponding texts for QUILT dataset? I got the access to images via ZENODO but could not find corresponding texts.

Zenodo dataset access

Superflous images?

Dear authors,

thanks so much for providing this resource! It seems to me that the following 4 files have no metadata (in quilt_1M_lookup.csv). Is this possible?

_b_M_sOb4ZI_image_0760643c-923b-4f1e-a5e4-8b2f9b3f2849.jpg
uytytgxGP2Y_image_1c51efef-1301-4f83-ad35-bbf92fb6f90a.jpg
7M7Ol5StU7U_image_b61a7317-b9b7-4d66-9158-828ba75bfb27.jpg
7M7Ol5StU7U_image_84954e04-5f71-46cd-aa20-8595596e4649.jpg

If the error is on my side I apologize for it, but using the data in my dataloader it complained that there were files without metadata, so I thought I'd give you the feedback.

Best,
Marc

Providing dataset

First of all, thank you for providing good paper and dataset.

I am wondering whether your team have a plan to provide quilt-1m dataset including additional dataset from twitter, PMC, etc.

Thank you!

Error on loading QuiltNet-B-16

Hi,
The error is occurred from below command:

from transformers import CLIPModel
model = CLIPModel.from_pretrained("wisdomik/QuiltNet-B-16", use_auth_token=None) .

The error msg is:

RuntimeError: Error(s) in loading state_dict for CLIPModel:
	size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([768, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
	size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([197, 768]) from checkpoint, the shape in current model is torch.Size([50, 768]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

Is there any issues with that? or my dev environment has something wrong?

wisdomikezogwo / quilt1m Goto Github PK

quilt1m's Issues

Recommend Projects

Recommend Topics

Recommend Org