
Comments (6)

angpo commented on June 19, 2024

Hi,

As we mentioned in the main paper, we do not leverage multi-camera information during the test phase. Instead, as done by recent works, each example of the query and the gallery set is represented by eight frames sampled from the corresponding tracklet.
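As a concrete illustration, sampling eight frames from a tracklet is often done by splitting it into equal chunks and picking one frame per chunk, so the samples cover the whole sequence. This is only a sketch of that common strategy (the function name and the restricted-sampling details are hypothetical; the actual VKD sampler may differ):

```python
import numpy as np

def sample_frames(tracklet_len, num_frames=8, seed=None):
    """Split a tracklet into `num_frames` chunks and pick one frame index
    per chunk, so the samples span the whole tracklet.

    A common video re-id strategy; the actual VKD sampler may differ.
    """
    rng = np.random.default_rng(seed)
    if tracklet_len <= num_frames:
        # Short tracklet: cycle through its frames to reach the requested length.
        return np.resize(np.arange(tracklet_len), num_frames)
    chunks = np.array_split(np.arange(tracklet_len), num_frames)
    return np.array([rng.choice(chunk) for chunk in chunks])

indices = sample_frames(tracklet_len=120, num_frames=8, seed=0)
print(len(indices))  # 8
```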

So, Figure 3 refers to the procedure we adopt during training. Here, each example is obtained by sampling several frames depicting the same subject but from different cameras.

from vkd.

sunxia233 commented on June 19, 2024

Let me make my understanding as clear as possible. I know some multi-shot methods in person re-id that use a camera tracklet as a multi-query. My understanding is that you use the student network in the final test, following the single-shot method without using camera information, and that Figure 3 verifies the impact of multi-camera, multi-query input on testing. But you are saying that Figure 3 shows the input during training, not during testing. Does the Figure 3(d) result always use single-shot retrieval? If so, why not directly use the teacher network as the final test result?


angpo commented on June 19, 2024

I'm sorry, there was a misunderstanding before. I thought you meant another Figure (precisely Fig. 2, which depicts the procedure we adopted during training).

Yes, we always used the student net in the final test. To be clear, its input is "multi-shot" during training, while being single-shot and tracklet-based during evaluation. This was done to be fair and in line with the standard evaluation protocol.

In Figure 3(d) we compare the performance of the student net (orange lines) and the teacher net (blue lines), again assuming a single-shot scenario during evaluation. The figure shows that the way we trained the student net (namely, by distilling multi-camera information) leads to large performance improvements, and this holds across different architectures.

I hope this is clearer for you now (if not, please feel free to write again).


sunxia233 commented on June 19, 2024

Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the identity and the second represents the camera, so this is identity 1, camera 1. The test method for scheme A is to remove identity 1, camera 2 from the gallery and look for targets under different cameras. But for schemes B and C, I really don't know how to exclude cameras of the same identity in the gallery. For example, when a(1)(1), b(1)(2), c(1)(3), d(1)(4) all participate in the feature fusion, and they are all identity 1 from cameras 1, 2, 3, and 4, are all of identity 1's entries from cameras 1, 2, 3, and 4 removed from the gallery?


angpo commented on June 19, 2024

> Thank you for your patience. I understand the standard single-shot test protocol. For example, for a query image a(1)(1), the first index represents the identity and the second represents the camera, so this is identity 1, camera 1. The test method for scheme A is to remove identity 1, camera 2 from the gallery and look for targets under different cameras.

Yes, this is the one and only protocol we follow during evaluation, both for the teacher and the student. Even though the latter has been trained with multi-camera input, we switch to single-camera input at test time (i.e. a subset of frames from the same tracklet).
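For reference, the filtering step of the standard protocol described above (for each query, discard gallery entries that share both its identity and its camera) can be sketched as follows; the function and array names are hypothetical, not taken from the VKD codebase:

```python
import numpy as np

def valid_gallery_mask(q_pid, q_cam, g_pids, g_cams):
    """Standard re-id evaluation filter: for one query, exclude gallery
    entries sharing BOTH the query's identity and its camera. Entries of
    the same identity seen from other cameras stay in and count as matches."""
    g_pids = np.asarray(g_pids)
    g_cams = np.asarray(g_cams)
    return ~((g_pids == q_pid) & (g_cams == q_cam))

# Query: identity 1 seen by camera 1, i.e. a(1)(1) in the notation above.
# Gallery entries: (id, cam) = (1,1), (1,2), (2,1), (1,3).
mask = valid_gallery_mask(1, 1, g_pids=[1, 1, 2, 1], g_cams=[1, 2, 1, 3])
# mask[0] is False: same identity AND same camera as the query, so excluded.
```

Only the first entry is dropped; the same identity under cameras 2 and 3 remains in the gallery as a correct match.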

> But for schemes B and C, I really don't know how to exclude cameras of the same identity in the gallery,

What do you mean by "scheme B and C"?

> such as a(1)(1), b(1)(2), c(1)(3), d(1)(4) participate in the feature fusion, are all of identity 1's entries from cameras 1, 2, 3, and 4 removed from the gallery?

Let me remind you that our work deals with video re-identification. Therefore, we apply feature fusion to a single (video) example at a time, precisely to merge the representations of its several frames into a single one. So, we never apply feature fusion across different examples of the gallery set (nor of the query set).
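In other words, fusion happens within one tracklet only: the per-frame features of a single video example are merged (e.g. by temporal average pooling) into one vector before any matching. A minimal sketch of this idea, assuming frame features have already been extracted (the function name and feature dimension are illustrative, not from the VKD code):

```python
import numpy as np

def fuse_tracklet(frame_features):
    """Merge the representations of several frames of ONE tracklet into a
    single vector via temporal average pooling, then L2-normalise it.
    Different query/gallery examples are never fused together."""
    feats = np.asarray(frame_features, dtype=np.float64)  # (num_frames, dim)
    fused = feats.mean(axis=0)
    return fused / np.linalg.norm(fused)

# E.g. 8 sampled frames, each with a 2048-d feature vector.
frames = np.random.default_rng(0).normal(size=(8, 2048))
v = fuse_tracklet(frames)
print(v.shape)  # (2048,)
```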


sunxia233 commented on June 19, 2024

I'm very sorry: my webpage did not display the picture, and I did not look at the title of your paper carefully, which led me to think that you were the authors of Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification. So forget my question; it was a misunderstanding.

