
could you do me a favor? about hashnet

thuml commented on May 20, 2024
could you do me a favor?

from hashnet.

Comments (2)

caozhangjie commented on May 20, 2024


yuewuqing2224 commented on May 20, 2024

I want to add some of my own thoughts on this.

The overall goal of image hashing is to map each image to a binary hash code so that similar images share similar hash codes and dissimilar images get different ones. Distance is measured as Hamming distance. Binary codes are usually much shorter than the number of classes. This makes retrieval fast, since the distance between binary codes can be computed efficiently with a bitwise XOR followed by a bit count, and shorter hash codes speed it up further. So there are two things people should care about: (1) image similarity and (2) hash code learning.
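To make the XOR point concrete, here is a minimal sketch of Hamming-distance retrieval. It assumes each hash code is packed into a plain Python integer; the function names and the 8-bit example codes are made up for illustration and are not taken from the HashNet code.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hash codes packed into ints."""
    # XOR leaves a 1 exactly where the two codes disagree; count those 1s.
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Hypothetical 8-bit codes, for illustration only.
codes = [0b10110010, 0b10110011, 0b01001101]
query = 0b10110010
ranking = rank_by_hamming(query, codes)
```

Real systems usually pack codes into machine words and use a hardware popcount instruction, but the idea is the same: one XOR and one bit count per comparison.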

Most papers nowadays are more concerned with improving retrieval speed and accuracy on commonly used datasets, which means that (2) is more relevant. (1) simply follows common practice in the field. If you look at those datasets, you will notice that they all come with class label information. Even if a paper does not use the labels directly and formulates the problem with a similarity matrix, whether two images are similar is still always computed from class labels. For instance, in ImageNet100 and CIFAR-10, two images are similar if they have the same class label. In NUS-WIDE and MS COCO, they are similar if they share at least one class label.
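As a rough sketch of how such a similarity matrix can be derived from labels, assume each image's labels are given as a Python set. A single-label dataset is then just the special case where every set has one element. The function name and the toy labels below are hypothetical.

```python
def similarity_matrix(label_sets: list) -> list:
    """Build a 0/1 matrix where entry (i, j) is 1 if images i and j
    share at least one class label, and 0 otherwise."""
    n = len(label_sets)
    return [
        [1 if label_sets[i] & label_sets[j] else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy labels, for illustration only: image 1 carries two labels.
labels = [{0}, {0, 2}, {1}]
sim = similarity_matrix(labels)
```

With single-element sets, "share at least one label" reduces to "same class label", which is the CIFAR-10 / ImageNet100 case described above.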

As a side note, the backbone used for image hashing is always some ImageNet-pretrained classification model. You never see people report training from scratch, because many models simply do not converge. So I would say that most image hashing models nowadays work by projecting a one-hot or multi-label vector into a much shorter binary hash code. This is different from classification, because softmax outputs, or the intermediate conv features used in face recognition, are all floating-point values, on which you usually take the max or compute the Euclidean distance. They cannot be mapped directly to binary codes without fine-tuning with an image hashing algorithm.

That being said, if you are interested in (1), you can check out papers like context embedding network or contextual visual similarity. If you don't care about it, feel free to use any class label information you want if it gives you good results. Many papers do use it to boost their performance, and some even use it exclusively.
