Method for face detection (pycozmo issue, open)

zayfod commented on August 26, 2024
Method for face detection

from pycozmo.

Comments (6)

gimait commented on August 26, 2024

Hey, I took a look at your code. I think it is fine, but we would need to work a bit to integrate it properly with Cozmo.

First, I think the image processing should be done in a separate thread. This means the tracking would need its own thread class, with methods to add new frames to the loop as they come from Cozmo. We should also think about whether we want to process all frames or only the ones we have time for (depending on the device, processing a frame might take some time).
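To make the threading idea concrete, here is a minimal sketch using only the standard library. It assumes the "only the frames we have time for" policy: a one-slot queue holds the newest frame and silently drops any frame that was not picked up in time. `FrameWorker`, `add_frame`, and the `process` callable are hypothetical names for illustration, not part of pycozmo.

```python
import queue
import threading


class FrameWorker(threading.Thread):
    """Hypothetical worker thread that processes only the most recent frame."""

    def __init__(self, process):
        super().__init__(daemon=True)
        self._process = process                  # callable applied to each frame
        self._frames = queue.Queue(maxsize=1)    # one slot: newest frame only
        self._stop_event = threading.Event()
        self.results = []

    def add_frame(self, frame):
        """Hand over a new frame without blocking the caller."""
        try:
            self._frames.get_nowait()            # drop a stale, unprocessed frame
        except queue.Empty:
            pass
        try:
            self._frames.put_nowait(frame)
        except queue.Full:
            pass                                 # another producer won the race

    def run(self):
        while not self._stop_event.is_set():
            try:
                frame = self._frames.get(timeout=0.1)
            except queue.Empty:
                continue                         # re-check the stop flag
            self.results.append(self._process(frame))

    def stop(self):
        self._stop_event.set()
```

A camera callback would call `add_frame()` for every incoming image; if processing is slower than the frame rate, intermediate frames are skipped rather than queued up.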

We should also have a few classes to define and store the properties of each visible object, and add the functionality to store and retrieve positions and other information about visible objects (and maybe those that went out of the field of view for a more advanced version).
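One possible shape for these classes, as a sketch only: a small record per object plus a registry that remembers when each object was last seen. All names here (`VisibleObject`, `ObjectRegistry`, the `timeout` parameter) are assumptions for illustration and do not exist in pycozmo.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VisibleObject:
    """Hypothetical record describing one detected object."""
    object_id: int
    category: str
    box: Tuple[int, int, int, int]               # x, y, width, height in pixels
    last_seen: float = field(default_factory=time.monotonic)


class ObjectRegistry:
    """Stores objects by id and reports which ones were seen recently."""

    def __init__(self, timeout: float = 2.0):
        self._objects: Dict[int, VisibleObject] = {}
        self._timeout = timeout                  # seconds before an object counts as out of view

    def update(self, obj: VisibleObject) -> None:
        self._objects[obj.object_id] = obj

    def get(self, object_id: int) -> Optional[VisibleObject]:
        # Also returns objects that have left the field of view.
        return self._objects.get(object_id)

    def visible(self, now: Optional[float] = None) -> List[VisibleObject]:
        now = time.monotonic() if now is None else now
        return [o for o in self._objects.values()
                if now - o.last_seen <= self._timeout]
```

Keeping stale entries in the registry, rather than deleting them, leaves room for the "more advanced version" that remembers objects outside the field of view.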

This should of course be independent of the implementation of YOLO or other image recognition methods, so we can easily update the algorithms we use or combine them (e.g. something like YOLO for faces and something else for the cubes).

You could start implementing this gradually and create a PR for each part. I am interested in working on this too (I was planning to work on cube recognition in the past), so I could start implementing some of the basic architecture if you'd like some help; then we can see how it goes as we add the code.
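A common way to get that independence is a small abstract interface that every detection backend implements. The sketch below is an assumption about how it could look; `Detection`, `Detector`, `CombinedDetector`, and `FixedDetector` are hypothetical names, and `FixedDetector` is only a stand-in where a real backend would run a model.

```python
from abc import ABC, abstractmethod
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]               # x, y, width, height in pixels
    score: float


class Detector(ABC):
    """Hypothetical interface every detection backend would implement."""

    @abstractmethod
    def detect(self, frame) -> List[Detection]:
        ...


class CombinedDetector(Detector):
    """Runs several detectors on the same frame and merges their results."""

    def __init__(self, *detectors: Detector):
        self._detectors = detectors

    def detect(self, frame) -> List[Detection]:
        results: List[Detection] = []
        for detector in self._detectors:
            results.extend(detector.detect(frame))
        return results


class FixedDetector(Detector):
    """Stand-in backend that always reports one detection with a fixed label."""

    def __init__(self, label: str):
        self._label = label

    def detect(self, frame) -> List[Detection]:
        return [Detection(self._label, (0, 0, 10, 10), 1.0)]
```

With this shape, swapping YOLO for another algorithm, or pairing a face detector with a separate cube detector, only changes which `Detector` instances are constructed.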

davinellulinvega commented on August 26, 2024

I think even before talking about threads, we should first split the MultiTracker class into (at least) three sub-classes, each corresponding to a different functionality: object tracking, object detection, and display. We might keep the MultiTracker class to more easily manage the tracker, object detection, and display, in one place. It would also contain the main loop.
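As a rough sketch of that split, the coordinator below only owns the loop and delegates to three components. The method names (`detect`, `update`, `show`) and the stub classes are assumptions made for illustration; they are not taken from the existing MultiTracker code, and the stub tracker does no real tracking.

```python
class MultiTracker:
    """Hypothetical coordinator: owns the main loop, delegates the actual work."""

    def __init__(self, detector, tracker, display):
        self.detector = detector
        self.tracker = tracker
        self.display = display

    def step(self, frame):
        detections = self.detector.detect(frame)         # object detection
        tracks = self.tracker.update(frame, detections)  # object tracking
        self.display.show(frame, tracks)                 # display
        return tracks

    def run(self, frames):
        for frame in frames:
            self.step(frame)


class StubDetector:
    """Stand-in that reports one face per frame."""

    def detect(self, frame):
        return [("face", (0, 0, 10, 10))]


class StubTracker:
    """Stand-in that assigns a fresh id to every detection."""

    def __init__(self):
        self._next_id = 0

    def update(self, frame, detections):
        tracks = []
        for label, box in detections:
            tracks.append((self._next_id, label, box))
            self._next_id += 1
        return tracks


class RecordingDisplay:
    """Stand-in that records what it was asked to show instead of drawing."""

    def __init__(self):
        self.shown = []

    def show(self, frame, tracks):
        self.shown.append((frame, tracks))
```

Because each component is passed in, any of the three can be replaced (or the display left out in a headless script) without touching the loop.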

I like your idea of creating different classes to define and store properties associated with each detected/tracked object. Makes it easier and more flexible to use from a script. So I am all for this. Might I suggest defining an interface so that other classes can handle those objects in a more generic manner?

At a more abstract level, you seem to imply that YOLO would only be used to detect faces. However, my intention was rather to have YOLO be THE algorithm/mechanism used for all sorts of object detection. At the moment it has only been trained to detect faces and hands. Using the Open Images dataset, though, it is possible to train the network to detect some 600 categories of objects. If I remember correctly, the roadmap for the pycozmo library includes pet detection. Well, that could be done with the same network. The same goes for the cubes, although detecting the cubes would take a bit more work, since we would have to take pictures of them in different settings and annotate all those pictures by hand. But it is doable.
So, to summarize: what I had in mind is to integrate the MultiTracker class into the pycozmo library and train its neural network to detect many categories of objects. Then the user or other classes within the library can simply specify what they want the network to detect, and voilà.

Finally, I am not against some help to implement this whole thing (that is the reason for opening this issue in the first place), especially if it will serve to detect different categories of object. But before that we should define what the final architecture will look like, so that we are not just coding in the dark.

gimait commented on August 26, 2024

> we should first split the MultiTracker class into (at least) three sub-classes, each corresponding to a different functionality: object tracking, object detection, and display.

I agree, the code should be split into separate classes for different functionality. I am not sure about what should go where, but I think you have a good idea of how it should be done, so you can go ahead.

> I think even before talking about threads, .... It would also contain the main loop.

I bring up threads because I am thinking about how this should be included in the pycozmo architecture. We currently have independent thread classes managing different tasks, and computer vision (which I believe will be the most computationally expensive task in the package) should definitely not block the main thread. This is because, in my opinion, any user of pycozmo should be able to import the package and start playing with the robot without needing to worry about the management of the tasks we are implementing in the package.

If you'd like to implement everything without worrying much about this, that's perfectly fine, you can do it in an example at first, and we'll find a way of including it in the package afterwards.

> Might I suggest defining an interface so that other classes can handle those objects in a more generic manner?

I'm not sure what you mean by this, but yes, we should be able to access these objects in a generic manner. The way I see to do this would be to include a list/dict/manager class for these objects in the Client, so they can be easily accessed from there.

> At a more abstract level, you seem to imply that Yolo would only be used to detect faces.

I think we can use YOLO for any item we can train it on. My only concern is how easy it would be to train this model to detect the cubes. I think it will be a pain to take enough pictures in enough environments to train the model.
This, as you probably know, is because the accuracy of CNNs such as YOLO depends heavily on the dataset, and in the case of the cubes, we would need to classify not only that it is a cube but also which face of the cube it is...

There are other use cases where YOLO might not be the right solution (e.g. detecting lines or points to improve the precision of the robot's localization).

> Finally, I am not against some help to implement this whole thing (that is the reason for opening this issue in the first place), especially if it will serve to detect different categories of object. But before that we should define what the final architecture will look like, so that we are not just coding in the dark.

As I said, you can go ahead and implement what you'd like, then create a PR and we'll take it from there. I don't know what the best way of doing this is, and you have a head start, so I think you should go ahead and implement it.

davinellulinvega commented on August 26, 2024

Very well then, I will start implementing all that and I will let you know where I might need some guidance to integrate my code into the pycozmo library. It might take some time, since I still have to find a job on the side, but I will try my best.

gimait commented on August 26, 2024

Sounds good! You don't need to hurry with this. I think the best thing about this library is how much we can learn from Cozmo and from developing robotic solutions. Hopefully, you'll have some good fun with it. Let me know if you want some help or have some questions (you can just drop me an email); I'd like to help you if I can.

Also, good luck with the job hunt!

davinellulinvega commented on August 26, 2024

Hello again,

I finally took some time off after getting my vaccine and worked on the face detection algorithm for pycozmo. Really sorry that it took a month to get there.
If you want to have a look, I created a pull request here: #49.

Hope you'll enjoy it.
