Hi, I'm trying to perform inference using the uncompiled model <code


Inference with uncompiled tflite model, format of output

maxbbraun commented on July 27, 2024


from thermal-face.

Comments (11)

maxbbraun commented on July 27, 2024

Hi Michael!

Have a look at the model metadata in the release notes, particularly the outputTensorRepresentation. You can also look at the code for DetectionEngine, which parses this kind of tensor.


Michaelszeng commented on July 27, 2024

Thank you for the quick response!

I'm not too sure how to use the code in DetectionEngine. outputTensorRepresentation in the release notes looks like what I need. Could you explain a little further what it means? I see "maxDetections": 500, but I'm not quite sure what that means. The output tensor simply contains floats, so I'm not sure how it relates to "bounding_boxes", "class_labels", "class_confidences", and "num_of_boxes", which are the labels under outputTensorRepresentation.

I hope I am not missing something simple--I am a student coder. I really appreciate the help!


maxbbraun commented on July 27, 2024

Sure! I was suggesting looking at DetectionEngine not necessarily to use that code but to see how they get the bounding boxes from the tensor. The model is one that supports detection as well as classification, so you can ignore the class label since it will always be the same. A quick reading of the code suggests that the coordinates of the bounding boxes are encoded as successive 4-tuples of floats where 1 means the full image width or height.
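As a minimal sketch of the "successive 4-tuples" reading, assuming the raw result is a flat sequence of floats whose length is a multiple of 4 (`split_into_boxes` is a hypothetical helper name, not part of DetectionEngine):

```python
def split_into_boxes(flat_values):
    """Group a flat sequence of floats into successive 4-tuples, one per box."""
    if len(flat_values) % 4 != 0:
        raise ValueError("expected a multiple of 4 values")
    # Step through the sequence four values at a time.
    return [tuple(flat_values[i:i + 4]) for i in range(0, len(flat_values), 4)]
```

Each resulting tuple would then hold the four relative coordinates of one candidate box.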


Michaelszeng commented on July 27, 2024

Thanks for the reply! Could you explain what you mean by "where 1 means the full image width or height"?


maxbbraun commented on July 27, 2024

It looked to me like the bounding box coordinates are relative to the image size, with [0, 0] being the top left and [1, 1] being the bottom right, so you'd have to translate them back into pixels by multiplying the x and y values by width and height, respectively.
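The translation back into pixels might look roughly like this. It is only a sketch: `relative_to_pixels` is a hypothetical helper, and it assumes the relative values lie in [0, 1].

```python
def relative_to_pixels(x_rel, y_rel, width, height):
    """Convert a relative (x, y) point to integer pixel coordinates.

    (0, 0) maps to the top-left corner and (1, 1) to the bottom-right one.
    """
    return int(x_rel * width), int(y_rel * height)
```

For example, on a 640x480 image the relative point (0.5, 0.25) would land at pixel (320, 120).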


Michaelszeng commented on July 27, 2024

I see, thank you. Do you know why there are 500 sets of 4-tuples? I tried creating a bounding box using the first 4-tuple in the way you described, and it doesn't really seem correct. The bounding box doesn't bound my face most of the time--it floats around in space.


maxbbraun commented on July 27, 2024

Could you post the raw tensor output you're seeing?


Michaelszeng commented on July 27, 2024

Yes, thanks for the reply. I made a real-time version of the code that attempts to draw a bounding box on a live video feed. As I played with the code a bit more, it occurred to me that the bounding box being drawn seemed to have some relationship with the position of my face, just not the correct one.

I tried a few more combinations of (X, Y) coordinates from the raw tensor output data, and realized that the bounding box is correct if I use this combination:

h, w, ch = image.shape
# Each row is [y1, x1, y2, x2], normalized to [0, 1]: scale y by height and x by width.
y1 = int(results[0][0] * h)
x1 = int(results[0][1] * w)
y2 = int(results[0][2] * h)
x2 = int(results[0][3] * w)
cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 0)

I believe this means each row of the output array represents a detection (the 1st row is the highest confidence detection), and the 4 columns represent upper left corner Y, upper left corner X, lower right corner Y, lower right corner X, in that order.

That ended up being quite simple, I thought I had tried this already. Thank you for this help!
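The row layout described above can be sketched as a small helper. This is a minimal sketch, not code from the thread: `row_to_rectangle` is a hypothetical name, and it assumes each row is [y1, x1, y2, x2] normalized to [0, 1], so Y values are scaled by the frame height and X values by the frame width (which matters when the frame is not square).

```python
def row_to_rectangle(row, width, height):
    """Turn one output row [y1, x1, y2, x2] into two (x, y) pixel corners."""
    y1, x1, y2, x2 = row
    top_left = (int(x1 * width), int(y1 * height))
    bottom_right = (int(x2 * width), int(y2 * height))
    return top_left, bottom_right
```

The two returned corners are in the (x, y) order that `cv2.rectangle` takes for its `pt1` and `pt2` arguments.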


Michaelszeng commented on July 27, 2024

I do have one more question. The multi-person detection works really well. However, all 500 detections in the raw output tensor contain actual coordinates but no confidence levels. If, for example, there are 2 people in frame, the first 2 detections in the array very accurately bound the 2 people's faces, but the other 498 detections capture random background details. Is there any way to distinguish between detections of faces and "filler" detections of background details?


maxbbraun commented on July 27, 2024

This suggests that part of the output tensor contains the number of detections. I assume that only that many bounding boxes are valid and the rest are noise.
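If that assumption holds, discarding the noise is just a matter of truncating the list of boxes to the reported count. A minimal sketch (`keep_valid` is a hypothetical helper; the count is assumed to arrive as a float, as tensor values often do):

```python
def keep_valid(boxes, num_detections):
    """Keep only the first `num_detections` boxes; the rest are treated as noise."""
    # Clamp the count so a bad value can never index past the available boxes.
    count = max(0, min(int(num_detections), len(boxes)))
    return boxes[:count]
```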


Michaelszeng commented on July 27, 2024

Hi, thank you for the response! I see how it works with DetectionEngine. However, I'm using the TensorFlow Lite API; do you know how I can achieve the same thing with the TensorFlow Lite API?

For reference, this is my current code to run an image through the model and retrieve the output:
```
# Load the uncompiled model and allocate its tensors.
interpreter = tflite.Interpreter(model_path=args.model_file)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']
output_shape = output_details[0]['shape']

# Resize the image to the model's expected input size.
height = input_shape[1]
width = input_shape[2]
img = Image.open(args.image).convert('RGB').resize((width, height))

# Add a batch dimension, run inference, and read the first output tensor.
input_data = np.expand_dims(img, axis=0)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]['index'])
```

output_data is an array with shape (1, 500, 4), so I'm not sure where to find the number of "candidates". Do you know how I could achieve the equivalent of num_candidates = raw_result[self._tensor_start_index[3]] using the TFLite API?

Thanks.
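One possible approach, not confirmed in this thread: `get_output_details()` returns one entry per output tensor, so if the uncompiled model exposes the four tensors named under outputTensorRepresentation as separate outputs, the count would sit in another entry rather than inside the (1, 500, 4) array. The sketch below assumes that layout. `read_detections` is a hypothetical helper, and the default positions (boxes first, confidences third, count fourth) are assumptions that should be checked against each entry's `'name'` and `'shape'`.

```python
def read_detections(interpreter, boxes_pos=0, scores_pos=2, count_pos=3):
    """Return (box, score) pairs for the valid detections only.

    The *_pos defaults are assumptions about the output tensor order,
    not something the thread or the release notes confirm.
    """
    details = interpreter.get_output_details()
    # Drop the leading batch dimension from the boxes and scores tensors.
    boxes = interpreter.get_tensor(details[boxes_pos]['index'])[0]
    scores = interpreter.get_tensor(details[scores_pos]['index'])[0]
    # The count tensor is assumed to hold a single float.
    count = int(interpreter.get_tensor(details[count_pos]['index'])[0])
    count = max(0, min(count, len(boxes)))
    return [(list(boxes[i]), float(scores[i])) for i in range(count)]
```

It would be called after `interpreter.invoke()`. Printing `d['name']` and `d['shape']` for each `d` in `interpreter.get_output_details()` first is a cheap way to verify how many outputs the model actually has and in which order.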

