Comments (11)

maxbbraun commented on July 27, 2024

Hi Michael!

Have a look at the model metadata in the release notes, particularly the outputTensorRepresentation. You can also look at the code for DetectionEngine, which parses this kind of tensor.

from thermal-face.

Michaelszeng commented on July 27, 2024

Thank you for the quick response!

I'm not too sure how to use the code in DetectionEngine. The outputTensorRepresentation in the release notes looks like what I need. Could you explain a little further what it means? I see "maxDetections": 500, but I'm not quite sure what that means. The output tensor simply contains float numbers, so I'm not sure how that relates to "bounding_boxes", "class_labels", "class_confidences", and "num_of_boxes", which are the labels under outputTensorRepresentation.

I hope I am not missing something simple--I am a student coder. I really appreciate the help!

maxbbraun commented on July 27, 2024

Sure! I was suggesting looking at DetectionEngine not necessarily to use that code but to see how they get the bounding boxes from the tensor. The model is one that supports detection as well as classification, so you can ignore the class label since it will always be the same. A quick reading of the code suggests that the coordinates of the bounding boxes are encoded as successive 4-tuples of floats where 1 means the full image width or height.
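As a rough illustration of the "successive 4-tuples" reading, and assuming the raw output arrives as one flat sequence of floats, the values can be regrouped four at a time. The function name and the sample values below are hypothetical and are not taken from DetectionEngine.

```python
import numpy as np


def group_boxes(flat_values):
    """Regroup a flat sequence of floats into rows of 4 box coordinates."""
    flat = np.asarray(flat_values, dtype=np.float32)
    if flat.size % 4 != 0:
        raise ValueError("expected a multiple of 4 values")
    return flat.reshape(-1, 4)


# Made-up values for two boxes, in relative units where 1.0 spans the full image.
example = group_boxes([0.1, 0.2, 0.5, 0.6, 0.0, 0.0, 1.0, 1.0])
print(example.shape)  # (2, 4)
```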

Michaelszeng commented on July 27, 2024

Thanks for the reply! Could you explain what you mean by "where 1 means the full image width or height"?

maxbbraun commented on July 27, 2024

It looked to me like the bounding box coordinates are relative to the image size, with [0, 0] being the top left and [1, 1] being the bottom right, so you'd have to translate them back into pixels by multiplying the x and y values by width and height, respectively.
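A minimal sketch of that translation, assuming the relative values lie in [0, 1] with the origin at the top left. The helper name `to_pixels` is made up for illustration.

```python
def to_pixels(x_rel, y_rel, width, height):
    """Map a relative (x, y) point in [0, 1] to integer pixel coordinates."""
    # x scales with the image width, y with the image height.
    return int(x_rel * width), int(y_rel * height)


# The centre-left quarter point of a 640x480 image.
print(to_pixels(0.5, 0.25, 640, 480))  # (320, 120)
```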

Michaelszeng commented on July 27, 2024

I see, thank you. Do you know why there are 500 sets of 4-tuples? I tried creating a bounding box using the first 4-tuple in the way you described, and it doesn't really seem correct. The bounding box doesn't bound my face most of the time--it floats around in space.

maxbbraun commented on July 27, 2024

Could you post the raw tensor output you're seeing?

Michaelszeng commented on July 27, 2024

Yes, thanks for the reply. I made a real-time version of the code that attempts to draw a bounding box on a live video feed. As I played with the code a bit more, it occurred to me that the bounding box being drawn seemed to have some relationship with the position of my face, just not the correct one.

I tried a few more combinations of (X, Y) coordinates from the raw tensor output data, and realized that the bounding box is correct if I use this combination:

    h, w, ch = image.shape
    y1 = int(results[0][0] * w)
    x1 = int(results[0][1] * h)
    y2 = int(results[0][2] * w)
    x2 = int(results[0][3] * h)
    cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 0)

I believe this means each row of the output array represents a detection (the 1st row is the highest confidence detection), and the 4 columns represent upper left corner Y, upper left corner X, lower right corner Y, lower right corner X, in that order.

That ended up being quite simple; I thought I had tried this already. Thank you for the help!
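The row layout described above (upper left Y, upper left X, lower right Y, lower right X) can be applied to all rows at once. This is only a sketch: it assumes, following the earlier reply, that Y values scale with the image height and X values with the image width, which gives the same result as multiplying by either dimension only when the frame is square. The helper name `boxes_to_pixels` is hypothetical.

```python
import numpy as np


def boxes_to_pixels(boxes, height, width):
    """Scale rows of [ymin, xmin, ymax, xmax] from relative units to pixels."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    # Assumption: y values scale with height, x values with width.
    scale = np.array([height, width, height, width], dtype=np.float32)
    return (boxes * scale).astype(int)


# One made-up detection on a 640x480 frame.
pixel_boxes = boxes_to_pixels([[0.25, 0.5, 0.75, 1.0]], height=480, width=640)
ymin, xmin, ymax, xmax = pixel_boxes[0]
```

Since `cv2.rectangle` takes its corner points as (x, y), the corners would then be passed as `(xmin, ymin)` and `(xmax, ymax)`.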

Michaelszeng commented on July 27, 2024

I do have one more question. The multi-person detection works really well. However, all 500 detections in the raw output tensor contain actual coordinates but no confidence levels. If, for example, there are 2 people in frame, the first 2 detections in the array very accurately bound the 2 people's faces, but the other 498 detections capture random background details. Is there any way to distinguish between detections of faces and "filler" detections of background details?

maxbbraun commented on July 27, 2024

This suggests that part of the output tensor contains the number of detections. I assume that only that many bounding boxes are valid and the rest are noise.
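If that assumption holds, keeping only the valid boxes is a matter of slicing off the leading rows. A minimal sketch, where `keep_valid` and its arguments are hypothetical names and the count is assumed to have been read from the model output already:

```python
import numpy as np


def keep_valid(boxes, num_boxes):
    """Keep only the first num_boxes rows; the rest are assumed to be filler."""
    boxes = np.asarray(boxes).reshape(-1, 4)
    # The count may arrive as a float (e.g. 2.0), so convert and clamp it.
    n = max(0, min(int(num_boxes), len(boxes)))
    return boxes[:n]


# 500 placeholder rows with a reported count of 2 leaves 2 rows.
print(keep_valid(np.zeros((500, 4)), 2.0).shape)  # (2, 4)
```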

Michaelszeng commented on July 27, 2024

Hi, thank you for the response! I see how it works with DetectionEngine. However, I'm using the TensorFlow Lite API; do you know how I can achieve the same thing with it?

For reference, this is my current code to run an image through the model and retrieve the output:
```
interpreter = tflite.Interpreter(model_path=args.model_file)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
output_shape = output_details[0]['shape']

height = input_shape[1]
width = input_shape[2]
img = Image.open(args.image).convert('RGB').resize((width, height))

input_data = np.expand_dims(img, axis=0)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]['index'])
```

output_data is an array with shape (1, 500, 4), so I'm not sure where to find the number of "candidates". Do you know how I could achieve the equivalent of num_candidates = raw_result[self._tensor_start_index[3]] using the TFLite API?

Thanks.
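One possible direction, offered as a sketch rather than a confirmed answer: `get_output_details()` returns a list with one entry per output tensor, and the code above reads only entry 0. If the model exposes the four outputs listed under outputTensorRepresentation, the remaining entries may hold the confidences and the box count. The order assumed below is a guess based on that list, and the helper names are hypothetical; checking each entry's `'name'` and `'shape'` in `output_details` would confirm the mapping.

```python
import numpy as np

# Assumed order, taken from the labels listed under outputTensorRepresentation.
OUTPUT_NAMES = ("bounding_boxes", "class_labels", "class_confidences", "num_of_boxes")


def read_outputs(interpreter):
    """Fetch every output tensor of an invoked interpreter, keyed by assumed name."""
    details = interpreter.get_output_details()
    if len(details) < len(OUTPUT_NAMES):
        raise ValueError(f"expected {len(OUTPUT_NAMES)} outputs, got {len(details)}")
    return {
        name: np.squeeze(interpreter.get_tensor(detail["index"]))
        for name, detail in zip(OUTPUT_NAMES, details)
    }


def valid_boxes(outputs):
    """Keep only the leading rows reported as real detections."""
    count = int(outputs["num_of_boxes"])
    return outputs["bounding_boxes"].reshape(-1, 4)[:count]
```

After `interpreter.invoke()`, calling `valid_boxes(read_outputs(interpreter))` would return only the rows the model reports as detections, provided the assumed output order is right.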
