
Comments (47)

yl-1993 avatar yl-1993 commented on May 18, 2024 1

@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features. (2) use the clustering methods provided in this repo to group face features.

yl-1993 avatar yl-1993 commented on May 18, 2024 1

@SharharZ The pretrained model has already been shared through Baidu Yun. Check out "Setup and get data" for more details.

SharharZ avatar SharharZ commented on May 18, 2024

@yl-1993 Thanks for your reply! Should I use generate_proposal.py to extract features and dsgcn/main.py to cluster? Could you provide your pretrained model on Baidu Yun? How many images does the code support? I may have around a million images.

yl-1993 avatar yl-1993 commented on May 18, 2024

@SharharZ Yes, you can follow the pipeline in scripts/pipeline.sh. As shown in our face clustering benchmark, it can handle at least 5M unlabeled face images.

SharharZ avatar SharharZ commented on May 18, 2024

@yl-1993 Thank you! I'm sorry, maybe I didn't describe it clearly. I mean the pretrained model of hfsoftmax. I analyzed the code and downloaded your data. I am not sure how to generate the .bin file and .npz file for my face image data. In other words, I extract 512-dimensional face features; how do I convert them into your file format?

yl-1993 avatar yl-1993 commented on May 18, 2024

@SharharZ I think you can store your features with np.save. More details can be found in extract.py. Besides, I will upload the pretrained face recognition model to Baidu Yun soon.
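
For reference, a minimal sketch of that step (the array and file names are placeholders; it assumes one 512-d embedding per face, stacked row-wise):
import numpy as np

# feats: one row per face image, e.g. shape (num_images, 512), dtype float32
feats = np.random.rand(1000, 512).astype(np.float32)  # stand-in for your real embeddings

# L2-normalize rows so cosine similarity reduces to a dot product (common for face features)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

np.save('my_feats.npy', feats)   # reload later with np.load('my_feats.npy')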

SharharZ avatar SharharZ commented on May 18, 2024

@yl-1993 thank you very much!

jxyecn avatar jxyecn commented on May 18, 2024

@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features. (2) use the clustering methods provided in this repo to group face features.

@yl-1993 There are several different pretrained models for extracting face features at the link you provided; which feature-extraction model matches the pretrained clustering model?

yl-1993 avatar yl-1993 commented on May 18, 2024

@SharharZ Pretrained models for feature extraction have been uploaded to BaiduYun. You can find the link in the hfsoftmax wiki.

yl-1993 avatar yl-1993 commented on May 18, 2024

@jxyecn For the pretrained clustering model, we use ResNet-50 as the feature extractor.

  • If you only want to try the clustering method, you can directly use the extracted features.
  • If you want to extract your own features and train the clustering model, you can choose any model as the feature extractor.

jxyecn avatar jxyecn commented on May 18, 2024

@jxyecn For the pretrained clustering model, we use ResNet-50 as the feature extractor.

  • If you only want to try the clustering method, you can directly use the extracted features.
  • If you want to extract your own features and train the clustering model, you can choose any model as the feature extractor.

@yl-1993 Thanks for the reply! But as I understand it, if a different model is used to extract the face features, the clustering model should need to be retrained, right? So I want to confirm which feature-extraction model matches the released pretrained clustering model.

yl-1993 avatar yl-1993 commented on May 18, 2024

@jxyecn Yes, that is why the reply above says that if you want to extract your own features and train your own clustering model, you can choose any feature-extraction model. Also, the weights of this ResNet-50 model are slightly different from those used for the pretrained clustering model; if you find this has a large impact, feel free to leave another comment under this issue.

engmubarak48 avatar engmubarak48 commented on May 18, 2024

@SharharZ Hi, you can (1) use pretrained face recognition models to extract face features. (2) use the clustering methods provided in this repo to group face features.

The question is how to use your main.py file. I want to provide extracted face features (face embeddings), but your config file seems to expect the training-related files. I suppose I should put the directory of the embeddings in the test path location (in the file "cfg_test_0.7_0.75.yaml"), but I can't figure out how this is going to work, since it also takes a training file path. Can you explain this part a bit?

yl-1993 avatar yl-1993 commented on May 18, 2024

@engmubarak48 Thanks for pointing this out. For the testing part, it will read the training file path but not use it. I will refine this part to make it clearer. Currently, I think you can set a dummy training path or simply set the training path to be the same as the testing path.

engmubarak48 avatar engmubarak48 commented on May 18, 2024

@yl-1993 Thanks for your quick reply. I would like to ask which part of your code extracts/generates the features of the images. I have read your generate_proposals.py file, and it seems to take .bin files. Do we have to extract the features on our own, or is there a file that extracts the features and saves them as a .bin file?
I was hoping there would be a file in your repo that uses the pretrained face recognition models and saves the extracted features in a format that the face clustering code will accept.

Thanks.

yl-1993 avatar yl-1993 commented on May 18, 2024

@engmubarak48 Since this repo focuses on the clustering framework, face recognition training and feature extraction are not included. You can check out hfsoftmax for the pretrained model and feature extraction. A similar discussion can be found in #4.

engmubarak48 avatar engmubarak48 commented on May 18, 2024

@engmubarak48 Thanks for pointing out. For the testing part, it will read the training file path but not use it. I will refine this part to make it more clear. Currently, I think you can set a dummy training path or simply set the training path the same as the testing path.

@yl-1993 Since the data is unlabeled, I can have only one file that consists of extracted features (assuming that I extract my features and save them as a .bin file). But in your test config file there is a path pointing to a .meta file (which, as I understand it, indicates the labels). What kind of labels are they, and why do we need them, since we are clustering unlabeled images?

Or is the .meta file used only for evaluation, so it can be removed if evaluation is not needed?

Dear @yl-1993 what I intend to do is the following.

  1. extract features of my unlabeled image data via your extract_feat.py
  2. then use your main.py script to cluster the embeddings.

Also, I realized your extract_feat.py in hfsoftmax reads images from a .bin file, so I think I should save my numpy image array into a .bin file too.

Could you please clarify, step by step, what format my data should be in and what needs to be filled in the config file, for both extract.py and main.py?

I would really appreciate it.

yl-1993 avatar yl-1993 commented on May 18, 2024

@engmubarak48

  1. You can use the FileListDataset which takes filelist and image prefix as input.
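# Note: FileListDataset and normalize come from the hfsoftmax repo (used in its extract_feat.py);
# transforms here is torchvision.transforms.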
val_dataset = FileListDataset(
    args.val_list, args.val_root,
    transforms.Compose([
        transforms.Resize(args.image_size),
        transforms.CenterCrop(args.input_size),
        transforms.ToTensor(),
        normalize,
    ]))
  2. For feature extraction, we don't need a config file. You can use the following command.
python extract_feat.py \
        --arch {} \
        --batch-size {} \
        --input-size {} \
        --feature-dim {} \
        --load-path {} \
        --val_list {} \
        --val_root {} \
        --output-path {}

engmubarak48 avatar engmubarak48 commented on May 18, 2024

Dear @yl-1993

The main question I asked is: what should I put in the .meta file if I don't have labels for the data? In your "cfg_test_0.7_0.75.yaml" config file, there is a path pointing to the file "part1_test.meta".

In general, I only want to cluster the images, put each cluster into a folder, and then check the clusters manually.

Thanks

yl-1993 avatar yl-1993 commented on May 18, 2024

@engmubarak48 Sorry for not fully understanding your question. As a quick fix, you can simply use a dummy meta for testing; it will not influence the clustering result. The meta file is currently used for measuring the difference between the predicted score and the ground-truth score, so it is only a reference value in the test phase. This is a good point. We will support an empty meta during inference soon.
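
For instance, a dummy meta could be written like this, assuming the meta format is simply one integer label per line, in the same order as the features (the count and file name are placeholders):
num_feats = 55                     # set this to the number of feature vectors you have
with open('dummy.meta', 'w') as f:
    for _ in range(num_feats):
        f.write('0\n')             # placeholder label; not used for the clustering itself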

yl-1993 avatar yl-1993 commented on May 18, 2024

@engmubarak48 #17 removes unnecessary inputs during inference. For now, you only need to feed features and proposals into the trained network.

felixfuu avatar felixfuu commented on May 18, 2024

Which of the models you provide can I use to extract face features, so that I can then use the clustering model you provide (pretrained_gcn_d.pth.tar) to process my own images?

yl-1993 avatar yl-1993 commented on May 18, 2024

@felixfuu You can use resnet50-softmax as the feature extractor. (It is a little different from the feature extractor used to train the clustering model. If there is a big performance drop, feel free to report it under this issue.)

engmubarak48 avatar engmubarak48 commented on May 18, 2024

@engmubarak48 #17 removes unnecessary inputs during inference. For now, you only need to feed features and proposals into the trained network.

Thanks, @yl-1993, I already made it work back when I was checking the performance. Do you have any further plans to improve the performance? I am working in this area (face clustering); let me know if you are planning further research in this area. We might exchange some ideas.

felixfuu avatar felixfuu commented on May 18, 2024

@felixfuu You can use resnet50-softmax as the feature extractor. (It is a little different from the feature extractor used to train the clustering model. If there is a big performance drop, feel free to report it under this issue.)

How do I make an annotation file (.meta) for new data?

yl-1993 avatar yl-1993 commented on May 18, 2024

@felixfuu For clustering, you only need to feed features and proposals into the trained network.

felixfuu avatar felixfuu commented on May 18, 2024

The result of my experiment is not very good. I used 940 faces (many sharing the same identities), and the clustering produced about 900 labels; almost every picture ends up with its own label. @yl-1993

felixfuu avatar felixfuu commented on May 18, 2024

@yl-1993 I use resnet50-softmax as the feature extractor and follow the pipeline in scripts/pipeline.sh. Is there an error in this process?

yl-1993 avatar yl-1993 commented on May 18, 2024

@felixfuu The overall procedure is correct. I think there are two ways to check your results. (1) Check the extracted features. You can use the scripts/generate_proposals.sh to generate cluster proposals, which can be regarded as the clustering results. You may reduce the k or maxsz for your data (940 instances). This step only depends on the extracted features and should yield reasonable results. (2) Check the pipeline. You can download the provided features and reproduce the result on ms1m.

felixfuu avatar felixfuu commented on May 18, 2024

@yl-1993 Following your suggestion, I visualized the cluster proposals, and the clustering result is not good, so the features are probably the reason. In my experiment, k = 20 and maxsz = 100.

yl-1993 avatar yl-1993 commented on May 18, 2024

@felixfuu k and maxsz are reasonable. To check the extracted features, you can pick a face pair with the same identity and another face pair with different identities, and compare the cosine similarity between the two pairs. Besides, as a reminder, the face images need to be aligned before being fed into the feature extraction network.
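
A quick sanity check along those lines could look like this (the feature file name and the indices of the chosen pairs are placeholders):
import numpy as np

feats = np.load('my_feats.npy')    # shape (num_images, feature_dim)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# indices of two images of the same person and two images of different people
print('same identity:     ', cosine(feats[0], feats[1]))
print('different identity:', cosine(feats[0], feats[500]))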

felixfuu avatar felixfuu commented on May 18, 2024

@yl-1993 Feature extraction should not be the problem; I also checked it with pairs (the cosine similarity is over 0.7 for a pair with the same identity and below 0.5 for a pair with different identities).

felixfuu avatar felixfuu commented on May 18, 2024

By the way, I used knn_hnsw. @yl-1993

MrHwc avatar MrHwc commented on May 18, 2024

I use train_cluster_det to train the clustering model; the nodes and edges are generated by generate_proposals. But I found that the clusters generated by generate_proposals are not accurate, and the resulting model performs poorly. Am I using train_cluster_det correctly?

yl-1993 avatar yl-1993 commented on May 18, 2024

@felixfuu @MrHwc It seems both of you are encountering problems with proposal generation. The basic rule is to reduce th if the number of clusters is too large. Since the algorithm finds connected components, a high threshold will lead to a large number of small clusters. You can post more details, e.g., images per cluster, and we can better diagnose the problem.
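
To see the effect of the threshold, a standalone toy version of the connected-components step could look like this (it uses scikit-learn and SciPy rather than the repo's own knn/proposal code, and the file name and k value are placeholders):
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

feats = np.load('my_feats.npy').astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

k = 20
dists, idxs = NearestNeighbors(n_neighbors=k, metric='cosine').fit(feats).kneighbors(feats)
sims = 1.0 - dists                      # cosine similarity to each of the k neighbors

for th in (0.5, 0.6, 0.7, 0.8):
    rows, cols = [], []
    for i in range(len(feats)):
        for j, s in zip(idxs[i], sims[i]):
            if i != j and s >= th:      # keep only edges above the similarity threshold
                rows.append(i)
                cols.append(j)
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(len(feats), len(feats)))
    n_comp, _ = connected_components(graph, directed=False)
    print('th=%.2f -> %d clusters (connected components)' % (th, n_comp))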

felixfuu avatar felixfuu commented on May 18, 2024

@yl-1993 I checked the proposals; the features are probably not robust enough, as there is no obvious gap between same-identity and different-identity pairs.

MrHwc avatar MrHwc commented on May 18, 2024

My training set has about 100,000 samples; each ID has at least 3 feature vectors and up to 381. With K = {30, 60, 80} and th = {0.5, 0.55, 0.6, 0.65}, the evaluation results are precision = 0.78, recall = 0.62, F-score = 0.69, and the number of predicted classes is 29171. I think this is unreasonable, since the ground truth is 4726.

yl-1993 avatar yl-1993 commented on May 18, 2024

@MrHwc There are several things that may help. (1) Have you checked the distribution of the generated clusters? Empirically, a large proportion of clusters may only have 2 images. (2) What are the results of the single proposals? For example, the result of K=80, th=0.6. If the clustering model is well trained, it will surpass the results of the single proposals. (3) Proposals with a low threshold help recall, and proposals with a high threshold may improve precision. Given your results, you can try adding proposals with a higher threshold, e.g., th=0.7.
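
Point (1), for example, can be checked with a few lines, assuming the predicted labels are stored as one integer per line (as with the pred_labels.txt mentioned elsewhere in this thread; the file name is otherwise a placeholder):
import numpy as np
from collections import Counter

labels = np.loadtxt('pred_labels.txt', dtype=int)   # one predicted label per image
cluster_sizes = Counter(labels.tolist())            # cluster id -> number of images
size_hist = Counter(cluster_sizes.values())         # cluster size -> number of such clusters

print('predicted clusters:', len(cluster_sizes))
for size in sorted(size_hist):
    print('clusters of size %d: %d' % (size, size_hist[size]))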

 avatar commented on May 18, 2024

@yl-1993 Hello, I ran into some problems when using your code and would appreciate your advice. I used the feature extraction code you provided to extract features for 55 images, then ran the clustering code, and the final pred_labels.txt contains 584013 rows. My understanding is that each row corresponds to the label of one image, but this far exceeds my number of images. Do I need to modify the program when using my own features? This number seems to correspond to the features you provided.

yl-1993 avatar yl-1993 commented on May 18, 2024

@luhengjie Hello, could you list the exact commands you ran? I suspect the default part1_test data is being used somewhere. Also, so that people with the same question can follow along, I will reply in English as well: When Hengjie uses the repo for his own features, the number of predicted results does not match the number of his features. I guess the problem may lie in using part1_test somewhere. We can identify the problem when more details are posted.

 avatar commented on May 18, 2024

@yl-1993 Thank you for your reply. The way I use your code is to replace the part1_test features with my own features and delete all the files in the label folder to avoid any influence. The last step is sh scripts/pipeline.sh.

yl-1993 avatar yl-1993 commented on May 18, 2024

@luhengjie Thanks. If you name it as part1_test, then you can add --is_rebuild in the script (https://github.com/yl-1993/learn-to-cluster/blob/master/scripts/pipeline.sh#L33) to rebuild the knn and proposals. If you use a different name, you may also need to modify the feature path in dsgcn/configs/cfg_test_0.7_0.75.yaml.

yl-1993 avatar yl-1993 commented on May 18, 2024

Hi all, PR #28 simplifies the pipeline of training and testing. To apply the pretrained model to your own unlabeled features, you only need to:

  1. edit the feat_path in the test config, e.g., dsgcn/configs/cfg_test_ms1m_20_prpsls.py (remove the label_path if you don't have it); a rough sketch follows after this list.
  2. run the test script (scripts/dsgcn/test_cluster_det.sh).
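
Illustratively, step 1 might boil down to something like the lines below inside the config; the exact layout of the repo's config files may differ, and the path is a placeholder:
# dsgcn/configs/cfg_test_ms1m_20_prpsls.py (excerpt, illustrative only)
feat_path = 'data/features/my_unlabeled_feats.bin'   # point this to your own features
# label_path = ...                                   # drop this line if you have no labels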

rose-jinyang avatar rose-jinyang commented on May 18, 2024

Hi
I have a question.
I made a .bin file of my entire image data to cluster.
But I found that the number of images is twice the number of labels in the script "hfsoftmax/utils.py".
(screenshot attached)
Could you explain this?
Thanks

rose-jinyang avatar rose-jinyang commented on May 18, 2024

Hi
I am going to make my own custom training dataset.
I extracted a 2048-dimensional embedding per face image and saved all the embeddings to a file using numpy.save.
How should I then make a meta file for the labels?
Should I store one label for each face embedding in the meta file?
Then the number of feature embeddings in the feature file would equal the number of labels in the meta file.
Is that right?

yl-1993 avatar yl-1993 commented on May 18, 2024

Hi @rose-jinyang

  • For the first question, this loader is mainly designed for processing the .bin file provided by ArcFace. It duplicates images for fast pair generation to evaluate face verification. I will make this clear in hfsoftmax/utils.py.
  • For the second question, it is basically correct, except that the embeddings are currently saved with np.tofile (see the sketch below). Related explanations on building a custom dataset will be added to the README.
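
A small sketch of that convention, assuming 2048-d float32 embeddings and one integer label per line in the meta file (array contents and file names are placeholders):
import numpy as np

embs = np.random.rand(10000, 2048).astype(np.float32)   # stand-in for your real embeddings
labels = np.zeros(len(embs), dtype=int)                  # stand-in identity labels

embs.tofile('my_train.bin')          # flat float32 binary; the reader must know the dimension

with open('my_train.meta', 'w') as f:
    for lb in labels:
        f.write('%d\n' % lb)         # one label per embedding, same order as the .bin rows

# reading back: reshape by the known feature dimension
feats = np.fromfile('my_train.bin', dtype=np.float32).reshape(-1, 2048)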

liupengcnu avatar liupengcnu commented on May 18, 2024

@SharharZ I think you can store your features with np.save. More details can be found in extract.py. Besides, I will upload the pretrained face recognition model to Baidu Yun soon.

Following the code in your extract.py, you save the extracted features as a .npy file rather than a binary .bin file. How can I save a .bin file instead?
