insight-platform / Savant
Python Computer Vision & Video Analytics Framework With Batteries Included
Home Page: https://savant-ai.io
License: Apache License 2.0
DeepStream added a preprocessing GStreamer element and the NvDsPreProcessTensorMeta meta.
We can't use the GStreamer element because it doesn't provide access to primary objects, but we can try to use the meta directly.
Add tracing instrumentation to the project and support exporting traces to the Jaeger agent. We will measure the performance of ZMQ input / ZMQ output / batched operations. The purpose of tracing is to understand how long it takes to execute our code.
The client code must be implemented with OpenTelemetry and export traces to the Jaeger agent over UDP. UDP allows the application to keep operating even when the agent is unavailable.
```mermaid
graph TD
    LIB --> |HTTP or gRPC| COLLECTOR
    LIB["Jaeger Client (deprecated)"] --> |UDP| AGENT[Jaeger Agent]
    %% AGENT --> |HTTP/sampling| LIB
    AGENT --> |gRPC| COLLECTOR[Jaeger Collector]
    %% COLLECTOR --> |gRPC/sampling| AGENT
    SDK["OpenTelemetry SDK (recommended)"] --> |UDP| AGENT
    SDK --> |HTTP or gRPC| COLLECTOR
    COLLECTOR --> STORE[Storage]
    COLLECTOR --> |gRPC| PLUGIN[Storage Plugin]
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY --> |gRPC| PLUGIN
    UI[Jaeger UI] --> |HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            LIB
            SDK
        end
        AGENT
    end
```
To enable tracing between processes (adapters, framework), we need to add a tracing id to the AVRO schema and add tracing to the adapters and the framework. This will give us the capability to trace packets end-to-end:
Packet created (new trace) > handled by the adapters and the pipeline > ...
Distributed tracing: https://uptrace.dev/opentelemetry/distributed-tracing.html#what-to-instrument
The framework supports two options for tracing:
External: if the source doesn't trace the packet, the framework doesn't create a new trace by itself.
Internal: the framework creates traces by itself and supports configurable sampling.
Ensure that tracing doesn't influence performance significantly. Consider configuring sampling (Head-based sampling) on adapters to decrease the performance influence and amount of traces: https://uptrace.dev/opentelemetry/sampling.html
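A minimal sketch of the client-side setup described above, assuming the opentelemetry-sdk and opentelemetry-exporter-jaeger-thrift packages and the default agent UDP port 6831; the service name, span names, and the 10% sampling ratio are illustrative only.

```python
# Sketch: OpenTelemetry tracer exporting to a Jaeger Agent over UDP (Thrift compact),
# with head-based sampling to limit the performance impact.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
from opentelemetry.exporter.jaeger.thrift import JaegerExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "savant-module"}),
    sampler=ParentBased(root=TraceIdRatioBased(0.1)),  # keep ~10% of new traces
)
provider.add_span_processor(
    BatchSpanProcessor(
        JaegerExporter(agent_host_name="localhost", agent_port=6831)  # UDP export
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("savant.pipeline")
with tracer.start_as_current_span("zmq-input"):
    pass  # receive and deserialize a message here
```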
Replace the draw element artist with a new one based on OpenCV GpuMat.
The current implementation of the GigE adapter works with RGB images. To pass frames over a network, or locally at higher FPS, it's necessary to pack them into an efficient format like H264/HEVC or JPEG.
We need to develop an implementation that generates high-quality HEVC with a standard software-based encoder and supports encoding profiles.
After the implementation is complete, the adapter can send:
The user can configure the quality profile, bitrate, and IDR interval.
Add a sample demonstrating how to use the adapter with a fake GigE camera (docker compose), based on the mock gige_source.
Take a look at:
https://github.com/insight-platform/Savant/tree/develop/samples/multiple_rtsp
The 1st adapter uses raw frames; the 2nd adapter uses HEVC frames.
Allow setting the bitrate and quality profile for H264, H265, and other supported video codecs.
Sometimes, in custom functions (NvDsPyFuncPlugin plugins) or in rendering (NvDsDrawFunc), additional frame information is needed. This information is stored in the tag field of the GstFrameMeta structure, for example the location in the case of a video-source adapter.
In all cases, NvDsFrameMeta is passed as an argument. I propose extending NvDsFrameMeta to give the user access to the additional frame meta-information stored in METADATA_STORAGE as GstFrameMeta.
We have to finally establish the necessary set of socket types that is sufficient for all the tasks.
Prevent the ZeroMQ receive queue length from growing without bound; make the limit configurable with a parameter.
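A minimal pyzmq sketch of capping the receive queue, assuming the limit is exposed as a configurable parameter; the endpoint, socket type, and value 100 are examples. Note that ZeroMQ high-water marks count messages, not bytes.

```python
import zmq

def create_input_socket(endpoint: str, recv_queue_limit: int = 100) -> zmq.Socket:
    """Create a PULL socket whose receive queue is bounded by recv_queue_limit messages."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    # RCVHWM bounds how many messages ZeroMQ buffers on the receiving side;
    # once reached, the sending peer blocks or drops depending on its socket type.
    sock.setsockopt(zmq.RCVHWM, recv_queue_limit)
    sock.bind(endpoint)
    return sock

sock = create_input_socket("tcp://*:5555", recv_queue_limit=100)
```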
Hi, can you please share any ComplexModel output converter example Python script? In your code I have seen this configuration:
converter:
  module: module.face_detector_coverter
  class_name: FaceDetectorConverter
but I could not find the FaceDetectorConverter script. I'm using RetinaFace for nvinfer@complex_model and I have trouble with the converter section. If you have any converter, can you share it, please?
Savant containers now have a build stage. At this point, it will be more convenient to build the boost library for each supported platform.
Add the ability to use Similari trackers in Savant pipelines (only Sort for now).
Similari is now available on the PyPI repository:
pip install similari-trackers-rs==0.26.1
Introduce the parameter that limits the length of GST queues within the pipeline.
Make an efficient function (not Python) that activates and deactivates corresponding elements based on the presence/absence of certain tags.
Motivation: The feature is required to deploy a single pipeline that can handle different scenarios for various cameras.
E.g. We have two cams that we want to handle in the following way:
We can cope with the task in two different ways:
The filter must handle the following tag expressions:
* - any (the default; same as if not specified)
a=x
b=y
If a=x or b=y matches, then the element activates.
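A minimal sketch of the matching rule described above, written in Python for clarity (the final implementation is intended to be non-Python); the tag and filter representations are assumptions.

```python
# Sketch: decide whether an element should be activated for a frame, based on its tags.
# A filter is a list of expressions: "*" matches any frame, "a=x" matches frames
# carrying tag "a" with value "x". The element activates if any expression matches.
from typing import Dict, List

def element_active(frame_tags: Dict[str, str], filter_exprs: List[str]) -> bool:
    if not filter_exprs:          # no filter specified: same as "*"
        return True
    for expr in filter_exprs:
        if expr == "*":
            return True
        key, _, value = expr.partition("=")
        if frame_tags.get(key) == value:
            return True
    return False

assert element_active({"a": "x"}, ["a=x", "b=y"])      # a=x matches -> activate
assert not element_active({"a": "z"}, ["a=x", "b=y"])  # nothing matches -> skip
```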
This is an optimization feature that decreases the amount of traffic sent to the wrong destinations.
ZeroMQ's Pub/Sub implements a topic filtering feature. The subscriber can specify a topic prefix so that only matching topics are processed. The feature is only available for the pub and sub socket types, but ZeroMQ transparently handles other generally compatible socket types without topic filtering as well.
We have to support the feature based on the stream source_id concept. So, when a publisher of any kind (source adapter or framework) publishes messages to pub/sub, it must specify source_id as the topic_name.
When a subscriber of any kind (framework or sink adapter) is configured, it can optionally specify a topic prefix to filter out only matching topics. For the framework, the prefix may be specified in the YAML configuration; for a sink adapter, it may be specified as an argument, or, if the adapter is designed to support operation for only a single source_id, the topic filter must be set to that source_id.
Summary: create the corresponding tests.
From the ZeroMQ GitHub: zeromq/libzmq#3611
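A minimal pyzmq sketch of the convention described above: the publisher sends source_id as the first (topic) part of a multipart message, and the subscriber filters by prefix. Endpoints, ids, and payload layout are examples; the two halves run in separate processes.

```python
import zmq

ctx = zmq.Context.instance()

# --- Publisher process (source adapter or framework) ---
# The stream source_id travels as the topic, i.e. the first part of the multipart message.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")
pub.send_multipart([b"cam-1", b"<avro-metadata>", b"<frame-blob>"])

# --- Subscriber process (framework or sink adapter) ---
# Subscribes only to the matching topic prefix; other source_ids are dropped by ZeroMQ.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "cam-1")
topic, meta, frame = sub.recv_multipart()
```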
Provide a way to access batch elements as the OpenCV GpuMat class.
The idea of the function/plugin/node is to unify the simultaneous processing of frames with various aspect ratios to a single resolution without deformation. It can be used when the aspect ratio varies between sources, especially during image processing.
Configuration options include the following properties:
left_alignment: center|left|right
vertical_alignment: center|top|bottom
interpolation: whatever is supported by OpenCV-CUDA (choose a good default)
frame_roi: original-image | full-area
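A rough sketch of the scaling step behind these options, assuming an OpenCV build with CUDA support (cv2.cuda); alignment handling is simplified to a note, and the function name is an assumption.

```python
import cv2

def scale_preserving_aspect(gpu_src: cv2.cuda_GpuMat, dst_w: int, dst_h: int,
                            interpolation: int = cv2.INTER_LINEAR) -> cv2.cuda_GpuMat:
    """Scale a GpuMat so it fits into dst_w x dst_h without deformation."""
    src_w, src_h = gpu_src.size()          # GpuMat.size() returns (width, height)
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = int(src_w * scale), int(src_h * scale)
    resized = cv2.cuda.resize(gpu_src, (new_w, new_h), interpolation=interpolation)
    # The resized image is then placed onto a dst_w x dst_h canvas according to
    # the configured left_alignment / vertical_alignment (e.g. centered).
    return resized
```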
Add post-processors for TAO and YOLOv5/6/7/X models so that they can be used effectively with Savant.
While sending consecutive sources into a module, various problems may be observed.
Input: video source adapter + one video file (50 frames, 60 fps, 1920x1080, h264)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter
The 2nd source's processing stops early, with the module outputting only 1 frame. At the same time, the 1st source and the 3rd and later sources are processed in full, with 50 frames written by the sink adapter.
If the sources are started each with their own source-id, the sink adapter writes 50 / 1 / 50 frames into their respective directories.
If the sources are started with the same source-id each time, the sink adapter similarly writes 50 / 1 / 50 frames into a single directory, resulting in a cumulative frame count of 50 / 51 / 101 after the 1st, 2nd and 3rd runs respectively.
Input: video source adapter + one video file (1442 frames, 30 fps, 1280x720, h264)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter
Sources are started each with their own source-id.
1st source is processed in full, 1442 jpegs written by the sink adapter. 2nd source processing is not finished, module hangs, 1 jpeg written by the sink adapter.
Input: pictures source adapter + one image file (jpeg 1280x720)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter
Sources are started each with their own source-id, results OK, as expected.
Input: pictures source adapter + 1400 image files (jpeg 1280x720)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter
Sources are started each with their own source-id.
1st source is processed in full, 1400 jpegs written by the sink adapter. 2nd source processing is not finished, module hangs, 1 jpeg written by the sink adapter.
Input: video source adapter + one video file (8340 frames, 60 fps, 1920x1080, h264)
Module: simple, only zmq src + zmq sink
Output: jpeg frames, image-files sink adapter
Source is started 2 times with the same source-id.
Result: 8158 + 1 jpegs written by the sink adapter.
The problem appeared after updating to DS 6.2.
When it occupies a step in a pipeline, this element creates a GpuMat snapshot of a frame and its metadata.
The snapshot is replaced when the next tagged frame arrives.
The snapshot lives for a configured amount of time.
The element supports scaling to a specific resolution if configured.
The PyFunc API supports iterating over the snapshots (including access to their metadata). Snapshots of interest can be mixed into the current frame with the CUDA GpuMat API.
E.g.
Because the tags are dynamic, the source can manage how often snapshots are saved.
If the frame doesn't have the specified tag, it is not passed to the encoder. Similarly, if the frame doesn't have the specified tag (a tag separate from the encoding one), the draw element skips it.
The feature can lead to generating 'sparse' video streams that are only partially filled with frames (e.g. only key frames are tagged).
Why is it required?
We may need to debug and visualize only certain streams; for the others, we don't want to encode and draw, to save resources. To implement that, we may use a custom source/transit adapter which injects additional tags into the streams that must be encoded and visualized.
Similarly, a pyfunc may be utilized only when something useful occurs in the stream; it will inject the required tag, causing the frames to be drawn and encoded to the output. If there is no valuable information within the frame, it is not drawn or encoded.
If the user uses noscale, the resulting images go out as is. By default, noscale=true.
Add an option to run the code under profiling with: https://github.com/gaogaotiantian/viztracer
The feature allows monitoring the code for bottlenecks and GIL waits in various components.
By default, the profiling is disabled.
Use inline profiling:
https://github.com/gaogaotiantian/viztracer#inline
Profiling must be optionally enabled with an argument (e.g. --with-profiling), with a parameter in the config, or with the environment variable SAVANT_PROFILING=enable.
The periodicity of the profile dump must be controlled with an extra parameter --profiling-dump-interval=5 or with a parameter in the config (default 5).
When the profiling file is dumped, the name must be constructed using the timestamp: /tmp/profiling/savant-profiling-%{TS}.json.
To use the profiling dumps, the user maps the dump directory to a host directory and analyzes the profiling with the vizviewer program.
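A minimal sketch of the inline usage referenced above, assuming the viztracer package; the dump path follows the timestamped naming convention proposed here, and the interval handling is simplified.

```python
import os
import time
from viztracer import VizTracer

def dump_profile(interval_s: int = 5, out_dir: str = "/tmp/profiling") -> None:
    """Profile one interval of work and dump it as a timestamped JSON file."""
    os.makedirs(out_dir, exist_ok=True)
    tracer = VizTracer()
    tracer.start()
    time.sleep(interval_s)          # the pipeline would do real work here
    tracer.stop()
    tracer.save(output_file=os.path.join(out_dir, f"savant-profiling-{int(time.time())}.json"))

if os.environ.get("SAVANT_PROFILING") == "enable":
    dump_profile()
# The dumps are then inspected on the host with: vizviewer <file>.json
```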
Also support with the configuration directives:
Support the new DeepStream version 6.1.1 (see the release notes). Replace DS 6.1 base docker images with DS 6.1.1.
Support both oriented and axis-aligned boxes with IoU and Maha metrics. Use batch trackers.
Currently, the multimedia object is always serialized into AVRO. However, 0MQ supports a multipart message feature, which enables transferring the parts of an object within a single large message. It may be beneficial to our protocol because we can avoid packing large blobs into AVRO, spend less time within it, and maybe utilize zero-copy features (Python's memoryview).
Nevertheless, it is impossible to completely remove the current method (when multimedia data is packed into AVRO), because external transport systems may not support multipart messages (like Apache Kafka). So we have to implement it in a way that lets us choose how to transfer the data (and where to get the corresponding blobs). To enable that, the metadata (AVRO) object must include a descriptor specifying which type of packing is used. It can be:
External for 0MQ: type=0mq, location=null.
E.g. external for S3: type=s3, location=b"s3://somewhere.about/a/b/c/d".
When embedded, the multimedia data is packed/unpacked into the AVRO blob field; when external, it is packed externally (in the way that the transport protocol supports).
Currently, in all adapters and the framework, we may use external. However, the code logic must be implemented to be able to read multimedia data based on the AVRO specification (both embedded and external).
The external option may help to use optimized processing features like memoryview to exclude excessive blob copying, and copy=False in 0MQ send_multipart and recv_multipart.
The protocol must support the framework's key points (in and out). The key points must be abstract enough to encode various objects, not only persons. Each key point must include 2D coordinates and an id, which is domain-specific.
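A minimal sketch of such a key-point record; all names are assumed for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KeyPoint:
    """One key point of an arbitrary object: 2D coordinates plus a domain-specific id."""
    x: float
    y: float
    id: int          # meaning depends on the domain, e.g. 0 = nose for a person model

@dataclass
class KeyPointSet:
    object_id: int
    points: List[KeyPoint]

pose = KeyPointSet(object_id=7, points=[KeyPoint(100.5, 42.0, id=0), KeyPoint(96.0, 55.5, id=1)])
```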
The merge will help users to find out how to use various scenarios.
The Always-On RTSP sink implements a stable RTSP server that delivers streaming frames when they are present, or a stub picture when there is no incoming data.
The module handles only one source_id, which is specified at module launch. If the module is run as sub, topic filtering is used; when it is launched with another kind of socket, the filtering happens based on the metadata source_id.
The module supports publishing to a streaming media server like the Aler9 RTSP frontend; Aler9 must be delivered and launched in a separate container, like here: https://github.com/insight-platform/Fake-RTSP-Stream/blob/main/docker-compose.yml
A user specifies a stub picture in the form of a JPEG file. The resolution of the stub picture defines the resolution of the output RTSP stream. When there is no incoming data, the module places the current date/time in the user-defined format in the top-right corner (white on black). The date-time format is specified by the user with a command line argument.
The stub picture is displayed in two cases:
The module always delivers the output stream with a fixed FPS (default 30) and an encoding profile (default high) specified with command line arguments. Whether or not the incoming stream produces frames, the module generates the outgoing stream with those parameters.
NB: If the encoding profile doesn't work due to a DeepStream error, try to replace it with the bitrate configuration.
The module consists of two threads:
It looks like both pipelines may heavily reuse the current Savant codebase.
The stream image is positioned within the black-background stub image. There are two modes (scale-to-fit or crop-to-fit):
scale-to-fit: scale with interpolation and aspect preservation to fit the smallest dimension (width or height); to fit well, the image may be scaled up or down. This is necessary to avoid limiting incoming streams to specific resolutions.
crop-to-fit: crop to fit the canvas and place it in the central position.
The module handles the metadata in two ways: by source_id, and by dumping the metadata to standard output as formatted JSON structures.
Support a function that can dynamically change the top-level ROI for images according to the passed area parameters specific to a particular source.
The artist in the draw func erroneously receives normalized values for the bbox center coordinates, width, and height.
This is caused by NvDsBBox scaling writing its changes into the DeepStream meta (Savant/savant/deepstream/meta/bbox.py, lines 89 to 103 in 0ea7c2b), in combination with the order of probes on the output sink pad (Savant/savant/deepstream/pipeline.py, lines 692 to 694 in 0ea7c2b).
Proposed fix: move the draw func probe before the update_frame_meta probe.
Hi, my module.yml is below. My input video (mp4) size is 1944x2592. The problem is that the output JSON successfully finds persons and cars and also their bboxes, but there are no bboxes on the output image or video. I think the bbox locations need to be scaled up; however, I could not find any configuration for this. Can you please share how I can scale up the bbox locations so that I can see them on the output images or videos?
This is my module.yml:
name: ${oc.env:MODULE_NAME, 'deepstream_test2'}
parameters:
  output_frame: ${json:${oc.env:OUTPUT_FRAME, '{"codec":"jpeg"}'}}
  frame_width: 2592
  frame_height: 1944
pipeline:
  elements:
    - element: nvinfer@detector
      name: peoplenet
      model:
        format: etlt
        remote:
          url: s3://savant-data/models/peoplenet/peoplenet_pruned_v2.0.zip
          checksum_url: s3://savant-data/models/peoplenet/peoplenet_pruned_v2.0.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        model_file: resnet34_peoplenet_pruned.etlt # v2.0 Accuracy: 84.3 Size 20.9 MB
        input:
          layer_name: input_1
          shape: [3, 1944, 2592]
        output:
          layer_names: [output_bbox]
And here are some rows from the output JSON:
{"source_id": "104", "pts": 80000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": []}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 2}
{"source_id": "104", "pts": 120000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": [{"model_name": "Primary_Detector", "label": "Person", "object_id": 0, "bbox": {"xc": 0.31223976612091064, "yc": 0.27872899174690247, "width": 0.036283232271671295, "height": 0.1472644805908203, "angle": 0.0}, "confidence": 0.9695695638656616, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Person", "object_id": 1, "bbox": {"xc": 0.1602877974510193, "yc": 0.5068618655204773, "width": 0.055438458919525146, "height": 0.2115742415189743, "angle": 0.0}, "confidence": 0.899117648601532, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Car", "object_id": 2, "bbox": {"xc": 0.10423523187637329, "yc": 0.047047920525074005, "width": 0.0737493708729744, "height": 0.08167795836925507, "angle": 0.0}, "confidence": 0.7405556440353394, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}]}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 3}
{"source_id": "104", "pts": 160000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": [{"model_name": "Primary_Detector", "label": "Person", "object_id": 0, "bbox": {"xc": 0.31226062774658203, "yc": 0.2789035141468048, "width": 0.03635203838348389, "height": 0.1467110812664032, "angle": 0.0}, "confidence": 0.9607311487197876, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Person", "object_id": 1, "bbox": {"xc": 0.16161800920963287, "yc": 0.5075418949127197, "width": 0.056631267070770264, "height": 0.21095207333564758, "angle": 0.0}, "confidence": 0.8203519582748413, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Car", "object_id": 2, "bbox": {"xc": 0.10289463400840759, "yc": 0.04793255403637886, "width": 0.07455217093229294, "height": 0.08282533288002014, "angle": 0.0}, "confidence": 0.6844093799591064, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}]}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 4}
Update Savant to use DS 6.2
Run commands:
LOGLEVEL=DEBUG ./scripts/run_module.py samples/deepstream_test2/module.yml
./scripts/run_sink.py display
./scripts/run_source.py pictures black_bottles.jpg --source-id src-id-1
Result: the sink adapter fails with an error:
Traceback (most recent call last):
File "/opt/app/adapters/ds/gst_plugins/python/avro_video_player.py", line 174, in _add_branch
assert pad.link(branch.sink.get_static_pad('sink')) == Gst.PadLinkReturn.OK
File "/usr/lib/python3/dist-packages/gi/overrides/Gst.py", line 178, in link
raise LinkError(ret)
gi.overrides.Gst.LinkError: <enum GST_PAD_LINK_NOFORMAT of type Gst.PadLinkReturn>
Traceback (most recent call last):
File "/opt/app/savant/gstreamer/utils.py", line 112, in on_pad_event
return event_handler(pad, event, *data)
File "/opt/app/savant/gst_plugins/python/avro_video_decode_bin.py", line 337, in on_src_pad_eos
peer.send_event(Gst.Event.new_eos())
AttributeError: 'NoneType' object has no attribute 'send_event'
Notes:
The source_adapter-module link works, because another type of sink adapter gives a correct result (the image, video and JSON files are written correctly). The error appears only with the display sink parameter.
It is a concern, not a requirement; check it first.
The default selector (https://github.com/insight-platform/Savant/blob/develop/savant/selector/detector.py) is called whenever no selector is specified (I might be wrong about that). If it is, it must be replaced with efficient code that doesn't hold the GIL; otherwise, the inference plugin may block the execution of other parts because the GIL is held for extra time.
It will allow users to add a custom reaction to the creation of a new stream (adding a new source), the resetting of a stream, and end-of-stream events (e.g. to reset a tracker).
The pipeline represents up-to-date functions of the Savant framework. It is built around people-centric models (body detection, face detection).
The practical purpose of the pipeline is to demonstrate the following:
The number of people must be updated once in a second to decrease flickering. The 'green' icons represent people with faces; the 'blue' icons represent people without facial information. Object bounding boxes must have the same colors as human icons.
Every face is tracked with a stock Nvidia tracker, and the box is received from the tracker to decrease flickering. Object track id is displayed for the tracked faces.
Bodies are also tracked, and their boxes are drawn based on tracked boxes to reduce flickering. The logo is uploaded into GpuMat only once and applied to every frame.
The interactive demo demonstrates how it works; it will be presented in multiple in/out configurations: N RTSP cams and N Always-ON RTSP adapters.
The non-interactive demo demonstrates the performance; it will be presented in the configuration: N video files and N metadata files.
Output video: 1280x(720+180).
The upper padding, where the logo, icons, and numbers are placed, is 180px high.
The logo has a height of 120px.
The human icon (including paddings) has a height of 120px.
The font is Lato (if possible; if not, use Sans-Serif); letter height is 85px.
Spacing: logo to green icon: 200px; green icon to text: 60px; green icon to blue icon: 300px; blue icon to text: 60px.
At least on the pyfunc side, access to the frame object must be provided with a direct call rather than via iteration. Internally, it can still be implemented in the form of a loop with caching (?).
Linked (to do first): #43
In Savant, the standard Nvidia approach is used to draw graphics on a GPU-allocated image: one must map it to the CPU, draw the changes, and unmap it to apply the changes in GPU RAM. The approach requires two copies (to the CPU and back to the GPU), which is excessive.
Another approach is possible with a single copy from the CPU to GPU RAM. To implement it, Savant must provide an "overlay" operation that composites two images: a background one (in GPU memory) and a foreground one (a stencil, defined on the CPU and moved to the GPU).
The approach requires the foreground image to be defined in a colorspace that supports an alpha channel (transparency), i.e. RGBA. After the FG image is moved to GPU RAM, it is placed over the original background image with respect to the defined transparency.
The user can provide the following parameters:
To make it efficient, two features are required:
FG1: [<linear|catmull|nearest|noscale>, LX1,TY1, RX1, BY1], [BG1, BG2, BG3, ...., BGN], []
FG2: [<linear|catmull|nearest|noscale>, LX2,TY2, RX2, BY2], [BG1], [blur]
To optimize CPU-GPU transfers, the method may require clipping all the stencils on a single image (knapsack placement) with properly defined coordinates to avoid multiple transfer operations over the PCI-E bus.
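A small numpy sketch of the overlay operation itself (per-pixel alpha compositing of an RGBA stencil over a background region); in the real element the same math would run on GpuMat data after a single CPU-to-GPU upload. Array names and the sample stencil are illustrative.

```python
import numpy as np

def overlay_rgba(bg: np.ndarray, fg_rgba: np.ndarray, left: int, top: int) -> None:
    """Blend an RGBA foreground stencil over a 3-channel background in place."""
    h, w = fg_rgba.shape[:2]
    roi = bg[top:top + h, left:left + w].astype(np.float32)
    alpha = fg_rgba[:, :, 3:4].astype(np.float32) / 255.0          # per-pixel opacity
    blended = alpha * fg_rgba[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi
    bg[top:top + h, left:left + w] = blended.astype(np.uint8)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)                   # background (GpuMat in practice)
logo = np.full((120, 120, 4), (0, 255, 0, 128), dtype=np.uint8)    # semi-transparent stencil
overlay_rgba(frame, logo, left=20, top=20)
```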
From the user-side perspective, there are two different use-case models: X>0 (expires in X ms); ...
There is a memory leak in the drawbin element. The problem was found on Xavier NX. Run any module with a source adapter and use the htop tool to watch the RES column of the running module.
Consider refactoring the drawbin element to make it more lightweight (convert it to a simple pyfunc element, not a bin) and to remove location property support.
Implement a plugin capable of blurring the area within a box.
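A minimal pyfunc-style sketch of blurring a box region, shown on a CPU numpy frame for clarity; a production version would operate on the GpuMat directly to avoid the map/unmap copies discussed elsewhere in this document. The bbox format and kernel size are assumptions.

```python
import cv2
import numpy as np

def blur_box(frame: np.ndarray, left: int, top: int, width: int, height: int,
             ksize: int = 31) -> None:
    """Blur the area inside the given box in place (e.g. to anonymize a face)."""
    h, w = frame.shape[:2]
    x1, y1 = max(0, left), max(0, top)
    x2, y2 = min(w, left + width), min(h, top + height)
    if x2 > x1 and y2 > y1:
        frame[y1:y2, x1:x2] = cv2.GaussianBlur(frame[y1:y2, x1:x2], (ksize, ksize), 0)

frame = np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8)
blur_box(frame, left=100, top=50, width=200, height=200)
```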
Scale frames to the original resolution & aspect ratio before sending them to the pipeline output (should be configurable).
Create a plugin that allows the user to specify additional graphical space on the image, filled with blank black color, which is used to place temporary graphical objects used in secondary inference steps. The primary ROI must include the initial frame area, while the extended area must be used only for utility operations.
There must also be settings configuring a mirror operation: the plugin removes the excessive space.
E.g.
Muxer: WxH
(optional) canvas-size: {top, left}, Wx(H+N)
...
(optional) canvas-size: {top, left}, WxH
Demuxer: WxH
First of all, thanks for the amazing repo!
I am a beginner in DeepStream and currently learning Savant to convert a PyTorch-based multi-stream CV analytics pipeline to a DeepStream/Savant-based pipeline to improve runtime performance on Jetson devices. I want to implement custom functions such as line-crossing counting, similar to the sample video, and also want to stream outputs as an RTSP stream. Also, I want to upload the counted results to Azure CosmosDB or Firebase at intervals. I am struggling to integrate these functions into Savant and am kind of lost on where to start right now. Can you please guide me on this? Thanks in advance!
By default, normalize=false. The outgoing/incoming metadata must include a descriptor that helps to distinguish between [0,1]-normalized and regular coordinates.
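A small sketch of converting between [0,1]-normalized and absolute bbox coordinates based on such a descriptor; the field names mirror the JSON shown earlier in this document, and the function name is an assumption.

```python
def denormalize_bbox(bbox: dict, frame_width: int, frame_height: int) -> dict:
    """Convert a [0,1]-normalized bbox (xc, yc, width, height) to pixel coordinates."""
    return {
        "xc": bbox["xc"] * frame_width,
        "yc": bbox["yc"] * frame_height,
        "width": bbox["width"] * frame_width,
        "height": bbox["height"] * frame_height,
        "angle": bbox.get("angle", 0.0),
    }

norm = {"xc": 0.3122, "yc": 0.2787, "width": 0.0363, "height": 0.1473, "angle": 0.0}
print(denormalize_bbox(norm, 1280, 720))  # pixel-space bbox for a 1280x720 frame
```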
Add properties to Savant meta API that are helpful for batch tracking in PyFunc
DeepStream on Jetson devices has a hardware JPEG encoder, nvjpegenc. We need to use it rather than the CPU encoder (jpegenc).
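A hedged GStreamer sketch (launched from Python via Gst.parse_launch) of encoding with the hardware nvjpegenc instead of the CPU jpegenc; element availability and the exact caps depend on the Jetson/DeepStream installation, so treat the pipeline string as an assumption.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# Hardware JPEG path on Jetson: convert the buffer, then encode with nvjpegenc.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=1 ! nvvideoconvert ! nvjpegenc ! "
    "filesink location=/tmp/frame.jpg"
)
pipeline.set_state(Gst.State.PLAYING)
pipeline.get_bus().timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
)
pipeline.set_state(Gst.State.NULL)
```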
Make it possible to blur with the artist without a custom pyfunc code. Change the PeopleNet demo accordingly.
The drawfunc customization must be removed. The properties must be configured declaratively.
Write parameter validation tests for the draw_func block to ensure reasonable errors during initialization.
I'm using yolov5s.onnx as the model file for a detector. However, it gets an error about parsing bboxes. I've tried to use a converter but could not get it working. What is wrong here? How can I make Savant draw bboxes on the output with a YOLOv5 model?
name: mytest
parameters:
  output_frame: ${json:${oc.env:OUTPUT_FRAME, '{"codec":"jpeg"}'}}
  frame_width: 1920
  frame_height: 1080
pipeline:
  elements:
    - element: nvinfer@detector
      name: yolonew
      model:
        format: onnx
        model_file: yolov5s.onnx
        input:
          layer_name: images
          shape: [3, 640, 640]
          scale_factor: 0.0039215697906911373
          maintain_aspect_ratio: true
        batch_size: 1
        output:
          layer_names: [output0]
          converter:
            module: savant.converter.yolo_v4
            class_name: TensorToBBoxConverter
            num_detected_classes: 80
          objects:
            - class_id: 0
              label: Person
    - element: drawbin
      module: savant.deepstream.drawbin
      class_name: NvDsDrawBin
      element_type: detector
As far as I understand, currently, to access the properties of the frame object and other predefined or pre-configured objects, we always look for them in the Gst metadata to get their IDs. When we know the IDs, we can access them through storage. Maybe there is a way to know them beforehand (without cycling through the Gst metadata), like having a reverse lookup table to quickly access an object by type without parsing the Gst metadata.