cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

Home Page: https://cvat.ai

License: MIT License

Python 38.69% HTML 0.48% JavaScript 13.99% Shell 0.14% Dockerfile 0.12% TypeScript 41.53% SCSS 1.64% Smarty 0.05% Open Policy Agent 0.87% Mustache 2.49% Jinja 0.01%
video-annotation computer-vision computer-vision-annotation deep-learning image-annotation annotation-tool annotation labeling labeling-tool image-labeling

cvat's Introduction

CVAT Platform

Start Annotating Now

Computer Vision Annotation Tool (CVAT)


CVAT is an interactive video and image annotation tool for computer vision. It is used by tens of thousands of users and companies around the world. Our mission is to help developers, companies, and organizations solve real problems using a data-centric AI approach.

Start using CVAT online: cvat.ai. You can use it for free, or subscribe to get unlimited data, organizations, auto-annotation, and Roboflow and HuggingFace integrations.

Or set CVAT up as a self-hosted solution: Self-hosted Installation Guide. We provide Enterprise support for self-hosted installations with premium features: SSO, LDAP, Roboflow and HuggingFace integrations, and advanced analytics (coming soon). We also offer training and dedicated support with a 24-hour SLA.

Quick start ⚡

Partners ❤️

CVAT is used by teams all over the world. The list below includes key companies that help us support the product or are an essential part of our ecosystem. If you use us, please drop us a line at [email protected].

  • Human Protocol uses CVAT as a way of adding annotation service to the Human Protocol.
  • FiftyOne is an open-source dataset curation and model analysis tool for visualizing, exploring, and improving computer vision datasets and models. It is tightly integrated with CVAT for annotation and label refinement.

Public datasets

ATLANTIS, an open-source dataset for semantic segmentation of waterbody images, developed by the iWERS group in the Department of Civil and Environmental Engineering at the University of South Carolina, uses CVAT.

For developing a semantic segmentation dataset using CVAT, see:

CVAT online: cvat.ai

This is an online version of CVAT. It's free, efficient, and easy to use.

cvat.ai runs the latest version of the tool. You can create up to 10 tasks there and upload up to 500 MB of data to annotate. Your data will only be visible to you and the people you assign to it.

For now, it does not have analytics features for managing and monitoring a data annotation team. It also does not allow exporting images, only the annotations.

We plan to enhance cvat.ai with new powerful features. Stay tuned!

Prebuilt Docker images 🐳

Prebuilt docker images are the easiest way to start using CVAT locally. They are available on Docker Hub.

The images have been downloaded more than 1M times so far.

Screencasts 🎦

Here are some screencasts showing how to use CVAT.

Computer Vision Annotation Course: we introduce our course series designed to help you annotate data faster and better using CVAT. This course is about CVAT deployment and integrations; it includes presentations and covers the following topics:

  • Speeding up your data annotation process: an introduction to CVAT and Datumaro. What problems CVAT and Datumaro solve, how they can speed up your model training process, and some resources you can use to learn more about them.
  • Deploying and using CVAT. Using the app online at app.cvat.ai; a local deployment; a containerized local deployment with Docker Compose (for regular use); and a local cluster deployment with Kubernetes (for enterprise users). A 2-minute tour of the interface, a breakdown of CVAT’s internals, and a demonstration of how to deploy CVAT using Docker Compose.

Product tour: in this course, we show how to use CVAT and help you get familiar with CVAT functionality and interfaces. This course does not cover integrations and is dedicated solely to CVAT. It covers the following topics:

  • Pipeline. In this video, we show how to use app.cvat.ai: how to sign up, upload your data, annotate it, and download it.

For feedback, please see Contact us

API

SDK
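
For quick orientation, here is a minimal sketch of creating a task with the cvat-sdk Python package. The host, credentials, labels, and file names are placeholders, and method signatures may differ between SDK versions, so treat this as illustrative rather than definitive:

```python
from cvat_sdk import make_client
from cvat_sdk.core.proxies.tasks import ResourceType

# Placeholders: host, credentials, labels and files are examples only.
with make_client(host="app.cvat.ai", credentials=("user", "password")) as client:
    task = client.tasks.create_from_data(
        spec={"name": "example task", "labels": [{"name": "car"}]},
        resource_type=ResourceType.LOCAL,
        resources=["image1.jpg", "image2.jpg"],
    )
    print(f"created task {task.id}")
```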

CLI

Supported annotation formats

CVAT supports multiple annotation formats. You can select the format after clicking the Upload annotation and Dump annotation buttons. The Datumaro dataset framework allows additional dataset transformations via its command-line tool and Python library.

For more information about the supported formats, see: Annotation Formats.
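
For example, a format conversion can be sketched with the Datumaro Python library as follows (paths are placeholders; the media-saving flag is named save_media in recent Datumaro versions and save_images in older ones):

```python
import datumaro as dm

# Load a dataset exported from CVAT and re-export it as MS COCO.
dataset = dm.Dataset.import_from("path/to/cvat_export", format="cvat")
dataset.export("path/to/coco_output", format="coco", save_media=True)
```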

| Annotation format | Import | Export |
| --- | --- | --- |
| CVAT for images | ✔️ | ✔️ |
| CVAT for a video | ✔️ | ✔️ |
| Datumaro | ✔️ | ✔️ |
| PASCAL VOC | ✔️ | ✔️ |
| Segmentation masks from PASCAL VOC | ✔️ | ✔️ |
| YOLO | ✔️ | ✔️ |
| MS COCO Object Detection | ✔️ | ✔️ |
| MS COCO Keypoints Detection | ✔️ | ✔️ |
| MOT | ✔️ | ✔️ |
| MOTS PNG | ✔️ | ✔️ |
| LabelMe 3.0 | ✔️ | ✔️ |
| ImageNet | ✔️ | ✔️ |
| CamVid | ✔️ | ✔️ |
| WIDER Face | ✔️ | ✔️ |
| VGGFace2 | ✔️ | ✔️ |
| Market-1501 | ✔️ | ✔️ |
| ICDAR13/15 | ✔️ | ✔️ |
| Open Images V6 | ✔️ | ✔️ |
| Cityscapes | ✔️ | ✔️ |
| KITTI | ✔️ | ✔️ |
| Kitti Raw Format | ✔️ | ✔️ |
| LFW | ✔️ | ✔️ |
| Supervisely Point Cloud Format | ✔️ | ✔️ |

Deep learning serverless functions for automatic labeling

CVAT supports automatic labeling, which can speed up the annotation process by up to 10x. Below is a list of the algorithms we support and the platforms they can run on (a request sketch follows the table):

| Name | Type | Framework | CPU | GPU |
| --- | --- | --- | --- | --- |
| Segment Anything | interactor | PyTorch | ✔️ | ✔️ |
| Deep Extreme Cut | interactor | OpenVINO | ✔️ | |
| Faster RCNN | detector | OpenVINO | ✔️ | |
| Mask RCNN | detector | OpenVINO | ✔️ | |
| YOLO v3 | detector | OpenVINO | ✔️ | |
| YOLO v7 | detector | ONNX | ✔️ | ✔️ |
| Object reidentification | reid | OpenVINO | ✔️ | |
| Semantic segmentation for ADAS | detector | OpenVINO | ✔️ | |
| Text detection v4 | detector | OpenVINO | ✔️ | |
| SiamMask | tracker | PyTorch | ✔️ | ✔️ |
| TransT | tracker | PyTorch | ✔️ | ✔️ |
| f-BRS | interactor | PyTorch | ✔️ | |
| HRNet | interactor | PyTorch | | ✔️ |
| Inside-Outside Guidance | interactor | PyTorch | ✔️ | |
| Faster RCNN | detector | TensorFlow | ✔️ | ✔️ |
| Mask RCNN | detector | TensorFlow | ✔️ | ✔️ |
| RetinaNet | detector | PyTorch | ✔️ | ✔️ |
| Face Detection | detector | OpenVINO | ✔️ | |
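
These models run as Nuclio serverless functions that CVAT calls over HTTP. As a rough illustration of the mechanism, a deployed detector can be invoked directly as sketched below; the port and the request/response schema here are assumptions for illustration, not a documented contract:

```python
import base64
import json
import urllib.request

# Hypothetical endpoint of a deployed Nuclio detector function;
# the port and payload schema are assumptions, not guaranteed.
FUNCTION_URL = "http://localhost:32768"

with open("frame.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("ascii")}

request = urllib.request.Request(
    FUNCTION_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    detections = json.load(response)  # e.g. a list of labeled shapes
print(detections)
```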

License

The code is released under the MIT License.

This software uses LGPL-licensed libraries from the FFmpeg project. The exact steps on how FFmpeg was configured and compiled can be found in the Dockerfile.

FFmpeg is an open-source framework licensed under LGPL and GPL. See https://www.ffmpeg.org/legal.html. You are solely responsible for determining if your use of FFmpeg requires any additional licenses. CVAT.ai Corporation is not responsible for obtaining any such licenses, nor liable for any licensing fees due in connection with your use of FFmpeg.

Contact us

Gitter to ask CVAT usage-related questions. Questions are typically answered quickly by the core team or the community, and you can also browse other common questions there.

Discord is another place to ask questions or discuss anything else related to CVAT.

LinkedIn for the company and work-related questions.

YouTube to see screencasts and tutorials about CVAT.

GitHub issues for feature requests or bug reports. If it's a bug, please add the steps to reproduce it.

The #cvat tag on Stack Overflow is one more way to ask questions and get our support.

[email protected] to reach out to us if you need commercial support.

Links

cvat's People

Contributors

activechoon, alexeyalexeevxperienceai, annapetrovicheva, arvfilippov, azhavoro, benhoff, bsekachev, cvat-bot[bot], dependabot-preview[bot], dependabot[bot], dmitriyoparin, dmitriysidnev, dvkruchinin, k1won, klakhov, manasars, marishka17, mdacoca, nmanovic, novda, pktiuk, pmazarovich, sizov-kirill, snyk-bot, speclad, tosmanov, vnishukov, yasakova-anastasia, zankevich, zhiltsov-max


cvat's Issues

Sort labels in alphabetical order

Currently, labels are sorted by primary key. If all labels were provided at task creation, this results in a semi-random label order, which makes it significantly harder to find the required label when working with a large number of them. One workaround is to add labels one by one, but that is not a pleasant process...

Remove all annotations inside a range of frames

It would be a very useful option to remove all annotations from one frame to another.
I want to re-annotate part of a video, and it is not a good idea to hunt for keyframes and turn them off, or to delete lines in the XML file and re-upload the annotation.

Feature request: add tracking

It'd be great to add a tracking mode e.g. see video here.

Specifically, if I enable tracking (per track) and there are no annotations later in the track, then attempt to track the last annotated box through all future frames. If the user moves to the next frame while one of these tracked boxes is displayed, that is treated as marking it 'good', and it gets added as a keyframe. Otherwise, the user can edit it manually.

There are probably more UI considerations, but this would provide a lot of value. My use case is tracking people heads, and the standard interpolation is less useful (but still much better than without!) due to heads 'bobbing' while walking etc. Feature tracking would likely solve this in many situations (aside from when a head is occluded etc.).

Aside from UI considerations, this is pretty easy to implement. It can even be done in the browser with opencv.js (and suitable performance, depending on device).
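
For illustration, a server-side Python equivalent of this idea with an off-the-shelf OpenCV tracker might look like the sketch below. The video path, frame number, and box are placeholders, and TrackerCSRT requires the opencv-contrib-python package:

```python
import cv2

# Placeholders: video path, last annotated frame, and box (x, y, w, h).
cap = cv2.VideoCapture("video.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, 100)  # seek to the last annotated frame
ok, frame = cap.read()

tracker = cv2.TrackerCSRT_create()       # requires opencv-contrib-python
tracker.init(frame, (50, 40, 120, 160))  # the last annotated box

while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of video
    ok, box = tracker.update(frame)
    if not ok:
        break  # target lost: hand control back to the annotator
    # `box` is the proposed annotation for this frame; accepting it
    # (moving to the next frame) would add it as a keyframe
```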

The login page at localhost:8080 can't be reached

I followed the Installation instructions, but after running the docker-compose up -d command, I get a "connection was reset" error in Chrome and don't see the login page.

The output of the docker-compose up -d command was:
Creating network "cvat_default" with the default driver
Creating cvat_db ... done
Creating cvat_redis ... done
Creating cvat ... done

And docker ps outputs:

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec26a5065b07 cvat "/usr/bin/supervisord" 25 minutes ago Up 25 minutes 0.0.0.0:8080->8080/tcp, 8443/tcp cvat

Improve documentation for overlap parameter

My understanding is that overlap just specifies how many frames overlap when splitting a video into segments. If that's correct, what's the purpose? Does it actually do anything for the user? (E.g. if I do the tracks on the first segment, are they copied across to the next segment in the overlapping region? This doesn't appear to be the case.) Put another way: should I just set overlap=0 in my video tagging tasks, to avoid having to manually resolve different taggings from each segment in the overlap?
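
For reference, here is the splitting behavior as I understand it; this is an illustrative sketch of my assumption, not CVAT's actual implementation:

```python
def segment_ranges(num_frames: int, segment_size: int, overlap: int):
    """Yield (first, last) frame indices for overlapping segments."""
    step = segment_size - overlap
    start = 0
    while start < num_frames:
        yield start, min(start + segment_size, num_frames) - 1
        start += step

# A 300-frame video, segment size 100, overlap 5:
print(list(segment_ranges(300, 100, 5)))
# [(0, 99), (95, 194), (190, 289), (285, 299)]
```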

Sorry if I've misunderstood something obvious.

PS - great tool!

Where does the shared server directory point to?

To create huge tasks, the documentation suggests choosing the Share option in the dialog box.
While trying to select the files, I see a modal popping up with the following path as its title:
//icv-cifs/icv_projects/cvat/data

However, I cannot navigate from there (nor can I find where this path points). The documentation also does not elaborate much on tasks with a large number of frames. Any advice?


Enable video stream access

Hi,

My workflow is such that I have thousands of frames per annotation task, which amounts to extensive disk space usage (e.g. a <15 MB video (~40 s VGA@30fps) results in ~2 GB of JPEGs).

Adding the ability for CVAT to work directly on a video stream would be a significant improvement, as it would allow the user to specify only a URL/path, with an optional download and local storage capability.

One way I see this to be done is by usage of OpenCV.js (https://docs.opencv.org/3.4/d5/d10/tutorial_js_root.html).
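
On the server side, the same idea can be sketched in Python with OpenCV. This is a minimal illustration, not CVAT code, and frame seeking is not frame-accurate for every codec:

```python
import cv2

def read_frame(video_path: str, frame_number: int):
    """Decode a single frame on demand instead of storing per-frame JPEGs."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"cannot decode frame {frame_number} of {video_path}")
    return frame
```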

I could invest time in this. Let me know of your thoughts.

Thanks

How to configure the environment?

Can you provide a tutorial for configuring the environment for this project? After installing docker and docker-compose, I don't know what to do next.

Register new users

Hi
Thanks for the great project. It is exactly what I was looking for. I was even able to run it on an AWS EC2 instance.
When a new user tries to register, they get:

Forbidden
Your account doesn't have access to this page. To proceed, please login with an account that has access or contact your admin.

I guess this is still not implemented. Am I right?

CVAT - AWS-Deployment guide

It would be nice if we had some docs explaining how to deploy this onto an AWS CUDA deep-learning machine.
Let me add this to the CVAT docs, or if anyone could build a CVAT AMI on AWS, that would be great. Most of the time, we use CVAT on AWS. I believe it would be helpful to other teams.

Support Pascal VOC Format

Hi, it would be nice to be able to export the annotations in Pascal VOC format. I couldn't find info about supported formats in the documentation; is this feature supported?

UI becomes slow after 300-400 annotations

I'm labeling large satellite images with hundreds to a few thousand objects of interest.

I noticed that after about 300-400 annotations, the UI slows down. It might take the program ~1 sec to become responsive again after creating a new bbox. After about 800-1000 annotations, it's nearly unusable -- adding an annotation might require ~5 seconds before it will register. For now, I'm just cropping my large images into smaller pieces as a workaround, but it'd be a lot nicer to add all annotations to a single large image (as raw satellite imagery often comes in fairly long strips). I'm using a 2017 MacBook pro to do the labeling.

I don't know enough about the backend to suggest a fix, but happy to answer questions if it's helpful.

Error: Failed to execute 'inverse' on 'SVGMatrix': The matrix is not invertible

Error: Failed to execute 'inverse' on 'SVGMatrix': The matrix is not invertible.
at translateSVGPos (https://cvat-icv.inn.intel.com/static/CACHE/js/33b452232897.js:9422:54)
at ShapeCreatorView. (https://cvat-icv.inn.intel.com/static/CACHE/js/33b452232897.js:7852:30)
at HTMLDivElement.dispatch (https://cvat-icv.inn.intel.com/static/CACHE/js/716e033f0bc5.js:24801:27)
at HTMLDivElement.elemData.handle (https://cvat-icv.inn.intel.com/static/CACHE/js/716e033f0bc5.js:24609:28)

Mark ignore regions and keyframes for an object

Thanks for the great annotation server.
It would be great to have an "uncertain" flag for annotations (like the existing occlusion flag). It means that, as a human, I can see and annotate the object, but it is OK if the detector does not detect it (the algorithm should not be penalized for that).

The login page at localhost:8080 returns Bad Request

I am using Ubuntu 16.04.

I followed the tutorial and installed it successfully, but on the second day, when I ran the docker containers, the page wouldn't open, showing a 400 status code (Bad Request). How can I fix this?

The output of the docker-compose up -d command was:
Creating network "cvat_default" with the default driver
Creating cvat_db ... done
Creating cvat_redis ... done
Creating cvat ... done

And the output of docker logs cvat:
logs.txt

Mechanical Turk Integration

Integration of CVAT with MTurk for deploying work as HITs would be very useful for such projects. This would require integrating the Turkic framework from VATIC with CVAT.
I would also like to contribute to your project. Please help me set up the development environment for this.

Navigation by frames may work incorrectly

Frame navigation may work incorrectly in the following scenario:

  1. Open any task in CVAT
  2. Resize the browser to a size smaller than the CVAT workspace
  3. Scroll the browser slider to the right
  4. Try to navigate with the player progress bar

The player will not react to progress bar navigation if the cursor is near the start of the progress bar. This unresponsive area grows as you scroll the browser slider further to the right.

Keypoint Annotation

I wanted to ask about the keypoint annotation feature you are working on now. Would it have a standard configuration/format, like keypoints for annotating human pose? Would it have the same interpolation feature as the current bounding boxes? Finally, when will the feature be released? Do you have a specific date in mind? Thank you

Video/Image loading status as on youtube

Another question and likely feature suggestion.

When I start a job, if I wait long enough, will all the frames be loaded into the browser?
Or, are they loaded on demand as I seek through the video?
Are they cached locally in memory?

I'm working with 4k video and the interface isn't that usable, at least for my current use model, until all frames have been loaded.

Based on the answer above, it would be great to have feedback as to whether the frames have all been loaded or, better, which frames have been loaded. What I've seen that works well is using a different color on the seek bar for frames that have been loaded.

If they are demand loaded, it would be nice to have a way to force it to load them all (as long as there's enough memory available).

How to keep track IDs?

After annotation is done and the annotation is uploaded, the IDs of the targets get mixed up.

How can I include the ID information of the calibration targets in the exported annotation file, and re-import the annotation file without the IDs getting confused?

Release notes?

Are there release notes available anywhere?
If not:

  • should they be added?
  • what's the best way to figure out what has changed? look through git logs?

Undo functionality

Have you considered undo functionality?
Seems like that would be a very useful feature.
Thanks.

Extend contributing.md

Hi,

Could you please describe or suggest development and testing steps?
In particular, how would one perform the edit-update-run (debug) cycle for both the server and client parts?

Running on AWS EC2

I was trying to run CVAT on AWS EC2 and hit an issue accessing CVAT from outside AWS: it was returning Bad Request (400) all the time. I found a solution: add the EC2 instance's public IP to ALLOWED_HOSTS in docker-compose.override.yml, as specified in the documentation. But it is not the nicest solution; every time the IP changes, I have to update that value. It would be great if someone with more AWS experience could provide a more elegant solution. Thanks

Re-id app to merge bboxes into tracks after TF annotation

Hi, great tool. For ground-truth annotation, there are often too many objects in every frame, and it would be tremendously tedious to annotate the track for every single object. Is there any pre-trained model, or a way to run a custom model, that can detect possibly identical objects, so that all I have to do is review and merge their tracks/IDs into one?

thanks

How to run it without docker?

It's tedious to install docker and configure the settings; is there any way to run it directly?
After installing a lot of missing libraries for Django, I ran into a problem:

ERRORS:
engine.Task: (auth.E005) The permission codenamed 'view_task' clashes with a builtin permission for model 'engine.Task'.

The script is:
sudo python3 manage.py createsuperuser

Video file name / url in output file

Hi,
First of all, thanks for the tool. It works great!

When I annotate video files, and for that purpose I create an annotation task per video, I cannot seem to find any reference to the original video name / path / url inside the task itself. Moreover, inside the output XML file generated after annotating, there are no references to that information at all. The only thing I can find is the url of the corresponding task, but I don't think I can extract from that url the information I'm looking for (i.e. the name of the video).

The only workaround I can think of is naming the annotation task after the video itself, and do the same for the xml file. However, I don't really like that solution. The ideal solution for me would be to have video file name inside the xml file.

Am I missing something? Please point me in the right direction.

Thank you very much

XML file metadata: labels are incomplete

The labeling schema doesn't make it into the output XML file.

As an example, I created a job with a 'labels' spec of:

person @select=type:white,blue,ref ball

and the dumped XML file is:

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.0</version>
  <meta>
    <task>
      <id>16</id>
      <name>test</name>
      <size>902</size>
      <mode>interpolation</mode>
      <overlap>5</overlap>
      <bugtracker></bugtracker>
      <created>2018-07-26 02:58:56.014598+03:00</created>
      <updated>2018-07-26 02:58:56.014613+03:00</updated>
      <labels>
        <label>
          <name>ball</name>
          <attributes>
          </attributes>
        </label>
      </labels>
      <segments>
        <segment>
          <id>24</id>
          <start>0</start>
          <stop>901</stop>
          <url>http://13.66.164.80/?id=24</url>
        </segment>
      </segments>
      <owner>
        <username>cvat</username>
        <email>[email protected]</email>
      </owner>
    </task>
    <dumped>2018-07-26 02:59:11.669206+03:00</dumped>
  </meta>
</annotations>

Note that most of the 'labels' information is missing. The only way I was able to confirm the 'labels' spec for an existing job was that it was stored in the browser history.

Thanks.

Could not create the task. ffmpy.FFRuntimeError

Built docker image from the latest sources. Created superuser. Getting an error on task creation:

Could not create the task. ffmpy.FFRuntimeError: ffmpeg -i /home/django/data/2/.upload/20170209T193000.000000Z.mp4 -start_number 0 -b:v 10000k -vsync 0 -an -y -q:v 16 /tmp/cvat-p9csbe_h.data/%d.jpg exited with status 1 STDOUT: STDERR:


I connected to the running cvat container with docker exec -it <container id> /bin/bash and pasted the command from the error message into the terminal. It fails because the cvat-* folder in /tmp doesn't exist.

Incorrect number of frames in video

I loaded a video with resolution 4096x2178; it has 1079 frames.
In the job statistics I see 959 frames, and the XML file shows the same number.

Using the same attribute for a label twice -> stuck

There is no warning when you use the same attribute multiple times. This can easily happen when copy-pasting.

Errors I've experienced when doing this:

  1. The job doesn't start up; instead you can only see the loading screen.
  2. Can't exit out of drawing the bounding box.

EDIT: Number two has more to do with large files (3.5 GB), I think. I will investigate further.

Also, the single input line provided makes it extremely uncomfortable to type or paste labels.

Greetings

Host Container on Docker Hub

Could you please connect this repository to Docker Hub? This way it would be possible to simply download the already-built container, since the build process is rather lengthy.

I encountered difficulties on the task configuration page.

I have created a new task. After I filled out the name and labels and selected the files, I submitted the page. But I have been waiting for two or three hours on a page that only says: "Successful Request! Creating...".
So I want to know how to configure the task; can you share your configuration?
My configuration is as follows:
Name: task 1
Labels: vehicle @select=type:undefined,car,truck,bus,train ~radio=quality:good,bad ~checkbox=parked:false
Select Files: 2.mp4
