wisdomikezogwo / quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Home Page: https://quilt1m.github.io/
License: MIT License
Hello, thank you for your great work!
I see that this repository includes visualization comparisons. Could you provide a tutorial or script for the visualizations? Thank you!
Hello,
Firstly, thank you for this. Amazing work!
Hello, I'm a PhD student and I have applied for access to your dataset. I haven't received any reply yet; could you please grant me access?
Best regards,
Markus Ekvall
Hey, thank you so much for making such an awesome resource publicly available. I was wondering where I can access the Quilt-Instruct dataset that you discussed in your Quilt-LLaVA paper?
Hello!
After downloading the videos I get an error running
python -m main --base_dir ${BASE_DIR}
the stack trace is as follows:
Traceback (most recent call last):
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/groups/jamesz/fede/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 166, in <module>
main(args, data_df, recon_df, device, histo_models_dict, video_paths_dict)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/main.py", line 68, in main
rep_chunk_im_temp = save_frame_chunks_recon(video_path, stable_times, chunk_id,fps, height, width)
File "/oak/stanford/groups/jamesz/pathtweets/quilt/quilt1m/data/data_utils.py", line 108, in save_frame_chunks_recon
clip_start_time, clip_end_time = start_end_time
TypeError: cannot unpack non-iterable int object
Some additional variables that can help in understanding what's happening
>>> stable_se_times
(2, 17)
>>> start_end_time
2
Basically, the assignment coming from start_end_time
generates an error on the line
clip_start_time, clip_end_time = start_end_time
because start_end_time is an int rather than a (start, end) tuple.
Any clue where this might come from?
Thanks!
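For what it's worth, a guard along these lines avoids the crash on my side (my own workaround, not the repository's intended fix; the helper name is hypothetical):

# Hypothetical workaround, not the repository's intended fix: skip entries
# where start_end_time is a bare int instead of a (start, end) tuple.
def unpack_start_end(start_end_time):
    """Return (clip_start_time, clip_end_time) or None if the entry is malformed."""
    if isinstance(start_end_time, (tuple, list)) and len(start_end_time) == 2:
        return start_end_time[0], start_end_time[1]
    # e.g. start_end_time == 2 while stable_se_times == (2, 17): the pair was
    # probably indexed one level too deep somewhere upstream.
    return None

unpack_start_end(2)        # -> None (reproduces the failing case above)
unpack_start_end((2, 17))  # -> (2, 17)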
Hi there, thanks for your excellent curation of the dataset. I submitted a request through Zenodo for your rescaled dataset some days ago but have not yet been approved. My request ID was c83c8db6-a46a-4d52-90e9-02ee7a254ef5. Please check when it is convenient for you; thanks again!
Hi!
Amazing work! However, I cannot find the code for training the CLIP model. How is the text used to train CLIP? I cannot work this out from the file quilt_1M_lookup.csv. Thank you~
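For what it's worth, this is roughly what I had in mind (a minimal sketch only: the column names "image_path" and "caption" are my guesses for quilt_1M_lookup.csv, the hyperparameters are placeholders, and the loss is the standard CLIP contrastive objective rather than the authors' exact training setup):

# Minimal sketch of CLIP-style contrastive training on Quilt-1M image-text pairs.
# Assumed (not confirmed by the repo): quilt_1M_lookup.csv has columns
# "image_path" and "caption".
import pandas as pd
import torch
import torch.nn.functional as F
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

df = pd.read_csv("quilt_1M_lookup.csv")
batch = df.sample(8)  # toy batch; a real run needs a DataLoader over the full CSV

images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in batch["image_path"]])
texts = tokenizer(batch["caption"].tolist())  # truncated to the 77-token context

image_features = F.normalize(model.encode_image(images), dim=-1)
text_features = F.normalize(model.encode_text(texts), dim=-1)

# Symmetric InfoNCE loss over the in-batch image-text pairs.
logits = model.logit_scale.exp() * image_features @ text_features.t()
labels = torch.arange(len(batch))
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
optimizer.step()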
Hi again :)
do you have any code you can share for downloading the videos?
Thank you so much! I appreciate your help on this!
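In case it helps while waiting, this is the kind of thing I ended up using (a rough sketch only: it assumes the videos are YouTube videos listed by ID in a hypothetical "video_id" column, and it relies on the third-party yt-dlp package rather than anything from this repository):

# Rough downloader sketch using the third-party yt-dlp package (not part of this
# repo). Assumes a CSV with a "video_id" column of YouTube IDs, which is a guess.
import pandas as pd
import yt_dlp

video_ids = pd.read_csv("quilt_videos.csv")["video_id"].unique()  # hypothetical file/column

ydl_opts = {
    "format": "bestvideo[ext=mp4]/best",
    "outtmpl": "videos/%(id)s.%(ext)s",  # save each video as <video_id>.mp4
    "ignoreerrors": True,                # skip private/removed videos
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([f"https://www.youtube.com/watch?v={vid}" for vid in video_ids])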
Dear Author,
The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.
I have noticed that the pubmed_set appears to overlap with BiomedCLIP's training data, which is sourced from PubMed Central.
In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.
The retrieval performance on the pubmed_set was as follows:
{15.7; 79.8; 94.4; 16.7; 78.9; 93.7}
Meanwhile, the retrieval performance on the books_set was:
{7.3; 49.2; 74.2; 8.2; 49.7; 73.2}
In contrast, the performance of QUILT-GPT/77 showed different results:
The retrieval performance on the pubmed_set was:
{1.8; 23.6; 46.0; 1.6; 23.4; 45.7}
The retrieval performance on the books_set was:
{1.8; 27.7; 52.8; 1.5; 23.4; 46.4}
From these results, QUILT-GPT/77 does not show as significant a domain gap between the two subsets as BiomedCLIP does.
Hello!
Thanks for the great resource!
I have been trying to run the data reconstruction but stumbled upon a couple of different errors (some are missing imports, e.g., nn from torch; one was an unclosed parenthesis). There are also a couple of missing requirements (e.g., scikit-image) in the requirements file.
Would you mind taking a look? I have solved some of these and I am happy to send a PR in case but maybe you have an updated version of the code that runs out of the box.
Hello, thanks for sharing your work.
I am currently working with your project and I have a question regarding a specific line of code in the data/main.py
file. In the file, I noticed the following line: "stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id][0]". I wanted to confirm whether this line should actually be "stable_times = list_idle_im_se_t_tup[chunk_id][chunk_id]". Could you provide some clarification on this?
Additionally, I'm curious about how the list_idle_im_se_t_tup variable is generated and how it ensures that its length matches the length of chunks. Could you please point me to that section of the code or provide some insights on how this synchronization is achieved?
I appreciate your time and assistance. Thank you in advance for your help!
Best regards
Thank you for your great work! I was trying to download the rescaled Quilt-1M from Zenodo but have not attained access. Could you please help in checking my request? The e-mail address should be [email protected]
Can you provide code to extract keyframes from a video? Thank you.
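Not from the authors, but a generic frame-differencing extractor along these lines can serve as a stopgap (a sketch only; it is not the paper's static-chunk detection algorithm, and the threshold is arbitrary):

# Generic keyframe extraction by frame differencing with OpenCV; this is NOT the
# repository's static-chunk detection algorithm, just a simple stand-in.
import cv2
import numpy as np

def extract_keyframes(video_path, out_dir, diff_threshold=30.0):
    cap = cv2.VideoCapture(video_path)
    prev_gray, saved = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Save a frame when it differs enough from the previously saved one.
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            cv2.imwrite(f"{out_dir}/keyframe_{saved:05d}.jpg", frame)
            prev_gray, saved = gray, saved + 1
    cap.release()
    return saved

# extract_keyframes("videos/example.mp4", "keyframes")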
Hi, thank you for sharing your work.
Can you provide some more details on how to load the ViT-B/32 model?
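For reference, the open_clip route used in the other issues here seems to be the way to load the ViT-B/32 weights (sketch below, assuming the wisdomik/QuiltNet-B-32 Hugging Face hub ID):

# Load QuiltNet ViT-B/32 from the Hugging Face hub via open_clip.
import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    "hf-hub:wisdomik/QuiltNet-B-32"
)
tokenizer = open_clip.get_tokenizer("hf-hub:wisdomik/QuiltNet-B-32")
model.eval()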
First, thanks for your impressive work for the medical VLP community!
Your paper evaluates the VLP model on many downstream tasks in the benchmark; could you provide the pipeline or scripts to prepare the downstream datasets and run the evaluation?
Best Regards
Dear authors,
Thanks for your great work. The maximum text context length for the CLIP text encoder is 77; however, the token length of several captions in Quilt-1M is larger than 77. How can we use the CLIP text encoder to extract the caption features?
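The two workarounds I am aware of are (a) letting the tokenizer truncate to 77 tokens and (b) splitting a long caption into chunks and averaging the text features; a sketch of both, assuming the QuiltNet-B-32 checkpoint (my own guess, not necessarily what was done in the paper):

# Two common workarounds for captions longer than CLIP's 77-token context
# (my own sketch, not the authors' protocol).
import torch.nn.functional as F
import open_clip

model, _, _ = open_clip.create_model_and_transforms("hf-hub:wisdomik/QuiltNet-B-32")
tokenizer = open_clip.get_tokenizer("hf-hub:wisdomik/QuiltNet-B-32")
caption = "a very long histopathology caption ..."  # placeholder

# (a) Truncation: the open_clip tokenizer clips the text to the 77-token context.
feat_truncated = model.encode_text(tokenizer([caption]))

# (b) Chunk and average: split into sentences, encode each, mean-pool the features.
chunks = [s.strip() for s in caption.split(".") if s.strip()]
chunk_feats = F.normalize(model.encode_text(tokenizer(chunks)), dim=-1)
feat_pooled = chunk_feats.mean(dim=0, keepdim=True)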
Hi, thank you very much for this great work on image-text contrastive training for histopathology and also publishing a valuable dataset.
I used the provided pre-trained QuiltNet along with the given tokenizer to reproduce the zero-shot classification results on the NCT-CRC-HE-100K dataset. I used the following commands:
import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
tokenizer = open_clip.get_tokenizer('hf-hub:wisdomik/QuiltNet-B-32')
I also used the class names and templates as given in the paper as follows,
nct_classnames = ["Adipose", "Debris", "Lymphocytes", "Mucus", "Smooth muscle", "Normal colon mucosa", "Cancer-associated stroma", "Colorectal adenocarcinoma epithelium"]
nct_template = [
lambda c: f'a histopathology slide showing {c}.',
lambda c: f'histopathology image of {c}.',
lambda c: f'pathology tissue showing {c}.',
lambda c: f'presence of {c} tissue on image.',
]
But I get a top-1 accuracy lower than what is reported in the paper (59.56%):
zero shot metrics {'nct-zeroshot-val-top1': 0.28518236912136324, 'nct-zeroshot-val-top5': 0.7248697363418835}
I also tried training my own QuiltNet using the open_clip codebase, and the results were:
zero shot metrics {'nct-zeroshot-val-top1': 0.30728805599660086, 'nct-zeroshot-val-top5': 0.6808149026097458}
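For completeness, the zero-shot pipeline I am running looks roughly like the sketch below (my own code, assuming the standard CLIP-style averaging of template embeddings per class; it may well differ from the authors' exact protocol):

# Zero-shot classification sketch: average the template embeddings per class to
# build a classifier, then score image features against it. My own pipeline,
# not necessarily the paper's exact evaluation code.
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:wisdomik/QuiltNet-B-32")
tokenizer = open_clip.get_tokenizer("hf-hub:wisdomik/QuiltNet-B-32")
model.eval()

with torch.no_grad():
    class_weights = []
    for name in nct_classnames:  # class names and templates as listed above
        texts = tokenizer([template(name) for template in nct_template])
        feats = F.normalize(model.encode_text(texts), dim=-1)
        class_weights.append(F.normalize(feats.mean(dim=0), dim=-1))
    classifier = torch.stack(class_weights, dim=1)  # [embed_dim, n_classes]

    # images: a batch of NCT-CRC patches transformed with `preprocess`
    # image_features = F.normalize(model.encode_image(images), dim=-1)
    # predictions = (image_features @ classifier).argmax(dim=-1)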
Could you kindly help me understand why I am unable to reproduce the reported numbers? I need to understand what I might be doing wrong.
Thank you.
Hi,
Thank you for creating this wonderful repository.
I've received access to the dataset through Zenodo and downloaded all of the files, but there seem to be missing images: out of the 10 packed .zip files there are only ~650K images (out of 1M).
Is this an issue, or am I missing something?
Thank you again
Hi, thank you for your work.
I am adapting your model to my dataset:
_, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:wisdomik/QuiltNet-B-32')
Is this correct?
Hello,
In your data.csv file, the noisy text and the corrected text are identical.
I wonder if, by any chance, you will release the noisy text without cleaning it.
I really appreciate any help you can provide.
Hi,
I am trying to recreate the QUILT dataset. I have a question regarding some of the columns in the CSV files that you have shared in the repo. Can you please explain how you obtained the "stable_times" column in quilt_recon.csv?
Also, were the images in the "image_path" column of quilt_data.csv extracted using the Static Video Chunk Detection Algorithm? Can you please elaborate on the generation of the quilt_data.csv file?
Thank you
Hello, thanks for sharing your work.
I want to fine-tune the QuiltNet-B-32 model to suit my downstream tasks. Can you provide a fine-tuning script? Or give an example of using QuiltNet-B-32?
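Not an official script, but a minimal linear-probe setup along these lines has worked for me as a starting point (a sketch only: the number of classes, learning rate, and data handling are placeholders for your downstream task):

# Linear-probe fine-tuning sketch (my own example, not an official script):
# freeze the QuiltNet-B-32 image encoder and train a linear classifier on top.
import torch
import torch.nn as nn
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:wisdomik/QuiltNet-B-32")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

num_classes = 8                       # placeholder for your downstream task
embed_dim = model.visual.output_dim   # 512 for the ViT-B/32 tower
head = nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def training_step(images, labels):
    # images: [B, 3, 224, 224] transformed with `preprocess`; labels: [B]
    with torch.no_grad():
        feats = model.encode_image(images)
    loss = criterion(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()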
Can you please guide me on how I can use Quilt-1M for image-to-text generation, i.e., I input an image and it generates a text description? Do I need to use LLaVA- or BLIP-like models, where I load the Quilt-1M weights and use them for text-description generation? The API mentioned on Hugging Face is only for zero-shot classification, and I could not find the text-retrieval code in the GitHub repo. Moreover, I also tried BLIP, but ran into compatibility issues. Thanks.
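On the text-retrieval part specifically, a CLIP-style model like QuiltNet can rank candidate captions for an image rather than generate free text; a sketch under that assumption (the caption pool and image path are placeholders):

# Image-to-text retrieval sketch (ranking, not generation): score a pool of
# candidate captions by cosine similarity to the image embedding. My own
# example, not code from this repository.
import torch
import torch.nn.functional as F
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:wisdomik/QuiltNet-B-32")
tokenizer = open_clip.get_tokenizer("hf-hub:wisdomik/QuiltNet-B-32")
model.eval()

candidate_captions = [  # placeholder caption pool
    "colorectal adenocarcinoma epithelium",
    "normal colon mucosa",
    "lymphocytes",
]

with torch.no_grad():
    image = preprocess(Image.open("example_patch.jpg").convert("RGB")).unsqueeze(0)
    image_feat = F.normalize(model.encode_image(image), dim=-1)
    text_feats = F.normalize(model.encode_text(tokenizer(candidate_captions)), dim=-1)
    ranking = (image_feat @ text_feats.t()).squeeze(0).argsort(descending=True)

print([candidate_captions[i] for i in ranking])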
Thank you for your great work! I was trying to download the rescaled Quilt-1M from Zenodo but have not attained access. Could you please help in checking my request? The e-mail address should be [email protected]
Dear authors,
thanks so much for providing this resource! It seems to me that the following 4 files have no metadata (in quilt_1M_lookup.csv). Is this possible?
_b_M_sOb4ZI_image_0760643c-923b-4f1e-a5e4-8b2f9b3f2849.jpg
uytytgxGP2Y_image_1c51efef-1301-4f83-ad35-bbf92fb6f90a.jpg
7M7Ol5StU7U_image_b61a7317-b9b7-4d66-9158-828ba75bfb27.jpg
7M7Ol5StU7U_image_84954e04-5f71-46cd-aa20-8595596e4649.jpg
If the error is on my side I apologize, but my dataloader complained that there were files without metadata, so I thought I'd give you the feedback.
Best,
Marc
First of all, thank you for providing a good paper and dataset.
I am wondering whether your team has plans to provide the Quilt-1M dataset including the additional data from Twitter, PMC, etc.
Thank you!
Hi,
The error occurs with the command below:
from transformers import CLIPModel
model = CLIPModel.from_pretrained("wisdomik/QuiltNet-B-16", use_auth_token=None)
The error message is:
RuntimeError: Error(s) in loading state_dict for CLIPModel:
size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([768, 3, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([197, 768]) from checkpoint, the shape in current model is torch.Size([50, 768]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
Is there anything wrong with this, or is something wrong with my dev environment?
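Not a fix from the authors, but loading the checkpoint through open_clip instead of transformers' CLIPModel sidesteps the config mismatch for me (a sketch, assuming the hf-hub ID for the B/16 variant behaves like the B-32 one used in the other issues):

# Workaround sketch (my own, not an official fix): load QuiltNet-B-16 via
# open_clip rather than transformers' CLIPModel.
import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    "hf-hub:wisdomik/QuiltNet-B-16"
)
tokenizer = open_clip.get_tokenizer("hf-hub:wisdomik/QuiltNet-B-16")
model.eval()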