insight-platform / savant Goto Github PK

Python Computer Vision & Video Analytics Framework With Batteries Included

License: Apache License 2.0

Makefile 0.49% Python 85.80% Shell 1.66% CMake 1.24% C 0.63% C++ 8.50% Cuda 1.68%

computer-vision deepstream edge-computing inference-engine machine-learning nvidia-deepstream-sdk deep-learning nvidia object-detection video

savant's People

Stargazers

Watchers

savant's Issues

Add support for input tensor meta

DS added preprocessing gst element and meta NvDsPreProcessTensorMeta.
We can't use the gst element because there isn't primary object access, but we can try to use meta directly.

Investigate the possibility to use this meta in Savant. Cases: 1) face alignment for classification and reidentification; 2) rotated bbox alignment for use as secondary model's input.
Add needed interfaces/bindings
Replace the current rotated bbox alignment implementation

Implement OpenTelemetry Instrumentation

Add tracing instrumentation to the project and support exporting metrics to the Jaeger agent. We will measure the performance of ZMQ input / ZMQ output / batched operations. The purpose of tracing is to understand how long it takes to evaluate our code.

The client code must be implemented with OpenTelemetry and use UDP to export metrics to Jaeger Agent. UDP enables seamless operation in cases when the agent doesn't function.

graph TD
    LIB --> |HTTP or gRPC| COLLECTOR
    LIB["Jaeger Client (deprecated)"] --> |UDP| AGENT[Jaeger Agent]
    %% AGENT --> |HTTP/sampling| LIB
    AGENT --> |gRPC| COLLECTOR[Jaeger Collector]
    %% COLLECTOR --> |gRPC/sampling| AGENT
    SDK["OpenTelemetry SDK (recommended)"] --> |UDP| AGENT
    SDK --> |HTTP or gRPC| COLLECTOR
    COLLECTOR --> STORE[Storage]
    COLLECTOR --> |gRPC| PLUGIN[Storage Plugin]
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY --> |gRPC| PLUGIN
    UI[Jaeger UI] --> |HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            LIB
            SDK
        end
        AGENT
    end

To enable tracing between processes (adapters, framework), we need to add a tracing id to the AVRO schema and add the tracing to adapters and into the framework. This will give us the capability to trace packets e2e:

Packed created (new trace) > handled during the adapters and pipeline > ...

Distributed tracing: https://uptrace.dev/opentelemetry/distributed-tracing.html#what-to-instrument

The framework supports two options for tracing:

external;
internal;

External: If a source doesn't trace the packet, the framework avoids creating a new trace by itself.

Internal: the framework creates traces by itself and supports configuring the sampling.

Ensure that tracing doesn't influence performance significantly. Consider configuring sampling (Head-based sampling) on adapters to decrease the performance influence and amount of traces: https://uptrace.dev/opentelemetry/sampling.html

Implement draw element artist using GpuMat OpenCV

Replace the draw element artist with a new one based on GpuMat OpenCV.

Implement HEVC encoding for GigE adapter

The current implementation of the GigE adapter works with RGB images. To pass frames over a network or locally on higher FPS, it's necessary to pack them into an efficient format like H264/HEVC or JPEG.

We need to develop the implementation that generates HQ HEVC with a standard software-based encoder. Support the encoding profiles.

After the implementation is complete, the adapter can send:

raw frames;
HEVC (CPU encoded).

The user can configure the quality profile, bitrate, and IDR interval.

Add a sample demonstrating how to use adapter with fake GigE (docker compose), on mock gige:
gige_source.

Take a look at:
https://github.com/insight-platform/Savant/tree/develop/samples/multiple_rtsp

1st adapter uses raw frames
2nd adapter uses hevc frames

Quality and Bitrate Configuration for Savant Output Frame

Allow setting bitrate and Quality profile for H264, H265, and other supported video codecs.

Access to more information about the frame

Sometimes, for processing in custom functions (plugins NvDsPyFuncPlugin) or in rendering (NvDsDrawFunc), additional frame information is needed. This information is stored in the tag field of the GstFrameMeta structure, for example the location in the case of a video-source adapter.
In all cases, NvDsFrameMeta is passed as an argument. I propose extending NvDsFrameMeta to give the user access to additional meta-information for frames stored in METADATA_STORAGE as GstFrameMeta.

Finalize Pub/Sub, Req/Rep, Dealer/Router configurations

We have to finally establish the necessary number of socket types that are sufficient for all the tasks.

Limit recv queue length

Prevent zeromq recv queue length from growth. Implement with the parameter.

complexModel Converter example

Hi, Can you please share any ComplexModel output converter example python script. In your code I have seen

converter:
module: module.face_detector_coverter
class_name: FaceDetectorConverter

this configuration but I could not found FaceDetectorConverter script. I'm using retinaface for nvinfer@complex_model and I have trouble in converter section. If you have any converter can your share it please?

Move SavantBoost library to the project

Savant containers now have a build stage. At this point, it will be more convenient to build the boost library for each supported platform.

Support Similari trackers for objects [SORT]

Add the ability to use Similari trackers in Savant pipelines (only sort for now).

Similari is now available on the PyPI repository:

pip install similari-trackers-rs==0.26.1

Limit GST queues from excessive growth

Introduce the parameter that limits the length of GST queues within the pipeline.

Tag-based Processing Activation Feature

Make an efficient function (not Python) that activates and deactivates corresponding elements based on the presence/absence of certain tags.

Motivation: The feature is required to deploy a single pipeline that can handle different scenarios for various cameras.

E.g. We have two cams that we want to handle in the following way:

cam1: cam motion, fire detection
cam2: cam motion, fire detection, human detection

We can cope with the task in two different ways:

deploy two pipelines and multiplex cameras accordingly; it introduces code duplication, more disk space required;
deploy a single pipeline and annotate streams with appropriate tags which enable/disable certain elements within the pipeline on per frame basis;

The filter must handle the following tags:

* - any (default, the same if not specified)

   - a=x
   - b=y

If a=x or b=y, then activates.

Pub/Sub topic filtering

This is an optimizing feature, that decreases the amount of traffic sent to wrong destinations.

Zero MQ's Pub/Sub implements a topic filtering feature. The Subscriber can specify the topic prefix which causes that the only matching topics are being processed. The feature is only available for pub or sub socket types, but ZeroMQ transparently handles other kinds of generally compatible sockets without topic filtering as well.

We have to support the feature based on stream source_id concept. So, when the publisher of any kind (source adapter or framework) publishes the messages to pub/sub it must specify source_id as topic_name.

When the subscriber of any kind (framework or sink adapter) is configured it can optionally specify topic prefix to filter out only matching topics. For the framework the prefix may be specified in a YaML configuration, for sink adapter it may be specified as an argument or if adapter is designed to support only the kind of operation for a single source_id, then topic filter must be set to source_id.

Summary:

multiplexed sinks/framework input: source_id prefix or source_id;
single stream sinks: source_id
publishers (source adapter, framework output): source_id > topic

To create corresponding tests.

From ZMQ GitHub: zeromq/libzmq#3611

Access to frames from PyFunc as OpenCV GpuMat

To provide to way to access batch elements as GpuMat OpenCV class.

Fit to rect/square plugin function

The idea of function/plugin/node is to unify the simultaneous processing of frames with various aspect ratios to a single resolution without deformation. It can be used when the aspect ratio can vary between sources, especially during image processing.

Configuration options include the following properties:

left_alignment: center|let|right
vertical_alignment: center|top|bottom
interpolation: what is supported by OpenCV-CUDA (choose default good)
frame_roi: original-image | full-area

Add TAO and YOLO models post processors

Add TAO and YOLOv5/6/7/X models post processors so that they can be used effectively with Savant.

Source processing early stop

While sending consecutive sources into a module, various problems may be observed.

Case 1

Input: video source adapter + one video file (50 frames, 60 fps, 1920x1080, h264)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter

2nd source's processing stops early, with module completing output of only 1 frame. At the same time 1st source and 3rd and later sources are processed in full, 50 frames written by the sink adapter.

In case sources are started each with their own source-id, sink adapter writes 50 / 1 / 50 frames into their respective directories.
In case sources are started using the same source-id each time, sink adapter similarly writes 50 / 1 / 50 frames into single directory, resulting in cumulative amount of frames of 50 \ 51 \ 101 after 1st, 2nd and 3rd runs respectively.

Case 2

Input: video source adapter + one video file (1442 frames, 30 fps, 1280x720, h264)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter

Sources are started each with their own source-id.
1st source is processed in full, 1442 jpegs written by the sink adapter. 2nd source processing is not finished, module hangs, 1 jpeg written by the sink adapter.

Case 3

Input: pictures source adapter + one image file (jpeg 1280x720)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter

Sources are started each with their own source-id, results OK, as expected.

Case 4

Input: pictures source adapter + 1400 image files (jpeg 1280x720)
Module: peoplenet_detector
Output: jpeg frames, image-files sink adapter

Sources are started each with their own source-id.
1st source is processed in full, 1400 jpegs written by the sink adapter. 2nd source processing is not finished, module hangs, 1 jpeg written by the sink adapter.

Case 5

Input: video source adapter + one video file (8340 frames, 60 fps, 1920x1080, h264)
Module: simple, only zmq src + zmq sink
Output: jpeg frames, image-files sink adapter

Source is started 2 times with the same source-id.
Result: 8158 + 1 jpegs written by the sink adapter.

Problem appeared after updating to DS6.2

Video MixIn Element

When it occupies a step in a pipeline, this element creates a GpuMat snapshot of a frame and its metadata.

The snapshot is replaced when the next tagged frame arrives.

The snapshot lives for a configured amount of time.

The element supports scaling to a specific resolution if configured.

The PyFunc API supports iterating over the snapshots (incl. access to their metadata). The snapshots of the interest can be mixed into the current frame with CUDA GpuMat API.

E.g.

Front cam/back cam mixin (every cam finds the snapshot of the other one and mixes it into the corner of the self-view as a miniature);
360 view for a car.

Because the tags are dynamic, the source can manage how often snapshots are saved.

Implement conditional encoding/drawing based on per-stream tags

If the frame doesn't have a specified tag, the frame doesn't pass to the encoder. Similarly, if the frame doesn't have a specified tag (other than the encoding tag), the draw element skips it.

The feature can lead to generating of 'sparse' video streams which are partially filled with frames (e.g. only key-frames are configured with tags).

Why it is required?

We may need to debug and visualize only certain streams; for the others, we don't want to encode and draw to save resources. To implement that, we may use a custom source/transit adapter, which injects additional tags into streams that must be encoded and visualized.

Similarly, only when something useful occurs in the stream a pyfunc may be utilized, which will inject a required tag causing the frames to be drawn and encoded to the output. If there is no valuable information within the frame, it is not drawn or encoded.

Add an option to avoid scaling the frames back at the end of the pipeline

If the user uses noscale the resulting images go as is. By default noscale=true.

Add python profiling to Savant

Add an option to run a code with profiling with: https://github.com/gaogaotiantian/viztracer

The feature allows monitoring the code for bottlenecks and awaiting for GIL in various components.

By default, the profiling is disabled.

Use inline profiling:
https://github.com/gaogaotiantian/viztracer#inline

The tracing must be optionally enabled with the use of arguments (e.g. --with-profiling or with a parameter in the config or with the environment variable SAVANT_PROFILING=enable.

The periodicity of the profile dump must be implemented with the extra parameter --profiling-dump-interval=5 or with or with a parameter in the config (default 5).

When the profiling file is dumped, the name must be constructed using the timestamp: /tmp/profiling/savant-profiling-%{TS}.json.

To use the profiling dumps, the user maps the dump directory to a host directory and analyzes the profiling with the vizviewer program.

Also support with the configuration directives:

Min Duration: https://viztracer.readthedocs.io/en/stable/filter.html#min-duration
Include Files: https://viztracer.readthedocs.io/en/stable/filter.html#include-files
Exclude Files: https://viztracer.readthedocs.io/en/stable/filter.html#exclude-files
Sparse Log: https://viztracer.readthedocs.io/en/stable/filter.html#log-sparse

Move to DS 6.1.1

Support new DeepStream version,6.1.1 see Release notes.. Replace DS 6.1 base docker images with DS 6.1.1.

Implement Similari SORT plugin

Support both oriented and axis-aligned boxes with IoU and Maha metrics. Use batch trackers.

Change transport protocol to optionally transfer the multimedia object outside of the AVRO message

Currently, the multimedia object is always serialized into AVRO. However, 0MQ supports a multipart message feature, which enables transferring the parts of the objects within a single large message. It may be beneficial to our protocol because we can avoid packing large blobs in AVRO, spend less time within it, and maybe utilize zero-copy features (python's memoryview).

Nevertheless, it is impossible to completely remove the method (when multimedia data is packed into AVRO), because external transport systems may not support multipart messages (like Apache Kafka). So we have to implement it in a way to be able to choose how to transfer (and where to get the corresponding blobs). To enable that, the metadata (AVRO) object must include the descriptor specifying which type of packing is used. It can be:

embedded;
external [type: String, descriptor: blob].

External for 0MQ: type=0mq, location=null.
E.g. external for S3: type=s3, location=b"s3://somewhere.about/a/b/c/d".

when embedded, the multimedia data is packed/unpacked into AVRO blob field; when external it is packed externally (in the way, that the transport protocol supports).

Currently, in all adapters and the framework, we may use external. However, the code logic must be implemented to be able to read multimedia data based on AVRO specification (both embedded and external).

The external option may help to use optimized processing features like memoryview to exclude excessive blob copying, and copy=False in 0mq send_multipart, recv_multipart.

Add keypoints to the protocol

The protocol must support the framework's key points (in and out). The key points must be abstract enough to encode various objects, not only persons. Each key point must include 2D coordinates and id, which is domain-specific.

Merge the test suite from the private repo into the public repo

The merge will help users to find out how to use various scenarios.

Always-On Low Latency Streaming Sink (RTSP)

Always-on RTSP sink implements stable RTSP server that delivers streaming frames when they present or stub picture when there is no incoming data.

General

The module handles only one source_id which is specified upon the module launch. If the module is run as sub, topic filtering is used, when it is launched with another kind of socket, the filtering happens based on metadata source_id.

The module supports publishing to streaming media server like Aler9 RTSP frontend, Aler9 must be delivered and launched within a separate container like here: https://github.com/insight-platform/Fake-RTSP-Stream/blob/main/docker-compose.yml

Stub Producer

A user specifies a stub picture in the form of a jpeg file. The resolution of the stub picture defines the resolution of the output RTSP stream. When there is no incoming data, the module places the current date/time in the user-format to the top-right corner (white on black). The format for date-time is specified by the user with command line argument.

The stub picture is displayed in two cases:

when the module is started and there are no incoming frames;
when the incoming frame stream doesn't update frames for X milliseconds (default: 1000, which is specified by the user with command line argument).

FPS and Quality

The module always delivers output stream with a fixed FPS (default 30) and a encoding profile (default high) specified with command line arguments. No matter either incoming stream produces frames or it is doesn't deliver anything, the module generates the outgoing stream with those parameters.

NB: If the encoding profile doesn't work due to a DeepStream error, try to replace it with the bitrate configuration.

Proposed Architecture

The module consists of two threads:

output thread, which generates black-background stub, optionally replaces it with either real-stream image OR stub picture (with date-time), and generates RTSP output (NVENC).
input thread, which decodes incoming frames (NVDEC), places them to GpuMat, which can be and updates the last-time counter.

It looks like the both pipelines may heavily reuse current Savant codebase.

Image Alignment

When a stream image is positioned within the black-background stub image. There are two modes (scale-to-fit or crop-to-fit).

scale-to-fit: scale with interpolation and aspect preservation fit the smallest dimension (width or hight). To fit well it must be scaled up or down. This is necessary to not limit incoming streams to specific resolutions;
crop-to-fit: crop to fit the canvas and place in the central position.

Metadata Processing

The module handles the metadata in two ways:

unconditional drop;
filtering the source_id and dumping the metadata to standard output as formatted JSON structures.

Per-source dynamic top-level ROI change

To support the function that can dynamically change top-level ROI for images according to the passed area parameters specific to a particular source.

Incorrect drawing of object boxes

Artist in draw func erroneously receives normalized values for bbox center coordinates, width and height.

This is caused by NvDsBBox scaling writing its changes into Deepstream meta

Savant/savant/deepstream/meta/bbox.py

Lines 89 to 103 in 0ea7c2b

    
               @left.setter 
        
               def left(self, value: float): 
        
                   self._nv_ds_bbox.left = value 
        
                   self._nv_ds_rect_meta.left = value 
        
               def scale(self, scale_x: float, scale_y: float): 
        
                   """Scales BBox. 
        
                   :param scale_x: The scaling factor applied along the x-axis. 
        
                   :param scale_y: The scaling factor applied along the y-axis. 
        
                   """ 
        
                   self.left *= scale_x 
        
                   self.top *= scale_y 
        
                   self.width *= scale_x 
        
                   self.height *= scale_y

in combination with the order of probes on output sink pad

Savant/savant/deepstream/pipeline.py

Lines 692 to 694 in 0ea7c2b

    
           sink_peer_pad.add_probe(Gst.PadProbeType.BUFFER, self.update_frame_meta) 
        
           if self._draw_func and self._output_frame_codec: 
        
               sink_peer_pad.add_probe(Gst.PadProbeType.BUFFER, self._draw_on_frame_probe)

Proposed fix: move draw func probe before update_frame_meta probe.

Output BBox not drawing

Hi, my module.yml is below, my video mp4 input size is 1944x2592, the problem is, output json successfully founds person and cars and also bbox'es of them, but there are no bbox on output image or video. I think the problem is bbox locations need to be scaled up, however I could not found any configuration about this. Can you please share how can I scale up bbox locations so that I can see them on the output image o videos.

This is my module.yml

name: ${oc.env:MODULE_NAME, 'deepstream_test2'}

parameters:
  output_frame: {json:${oc.env:OUTPUT_FRAME, '{"codec":"jpeg"}'}}
  frame_width: 2592
  frame_height: 1944

pipeline:
  elements:
    - element: nvinfer@detector
      name: peoplenet
      model:
        format: etlt
        remote:
          url: s3://savant-data/models/peoplenet/peoplenet_pruned_v2.0.zip
          checksum_url: s3://savant-data/models/peoplenet/peoplenet_pruned_v2.0.md5
          parameters:
            endpoint: https://eu-central-1.linodeobjects.com
        
        model_file: resnet34_peoplenet_pruned.etlt  # v2.0 Accuracy: 84.3 Size 20.9 MB

 
        input:
          layer_name: input_1
          shape: [3, 1944, 2592]
        output:
          layer_names: [output_bbox]

And here is some rows from output json

{"source_id": "104", "pts": 80000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": []}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 2}
{"source_id": "104", "pts": 120000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": [{"model_name": "Primary_Detector", "label": "Person", "object_id": 0, "bbox": {"xc": 0.31223976612091064, "yc": 0.27872899174690247, "width": 0.036283232271671295, "height": 0.1472644805908203, "angle": 0.0}, "confidence": 0.9695695638656616, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Person", "object_id": 1, "bbox": {"xc": 0.1602877974510193, "yc": 0.5068618655204773, "width": 0.055438458919525146, "height": 0.2115742415189743, "angle": 0.0}, "confidence": 0.899117648601532, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Car", "object_id": 2, "bbox": {"xc": 0.10423523187637329, "yc": 0.047047920525074005, "width": 0.0737493708729744, "height": 0.08167795836925507, "angle": 0.0}, "confidence": 0.7405556440353394, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}]}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 3}
{"source_id": "104", "pts": 160000000, "framerate": "25/1", "width": 1280, "height": 720, "dts": null, "duration": 40000000, "codec": "jpeg", "keyframe": true, "metadata": {"objects": [{"model_name": "Primary_Detector", "label": "Person", "object_id": 0, "bbox": {"xc": 0.31226062774658203, "yc": 0.2789035141468048, "width": 0.03635203838348389, "height": 0.1467110812664032, "angle": 0.0}, "confidence": 0.9607311487197876, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Person", "object_id": 1, "bbox": {"xc": 0.16161800920963287, "yc": 0.5075418949127197, "width": 0.056631267070770264, "height": 0.21095207333564758, "angle": 0.0}, "confidence": 0.8203519582748413, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}, {"model_name": "Primary_Detector", "label": "Car", "object_id": 2, "bbox": {"xc": 0.10289463400840759, "yc": 0.04793255403637886, "width": 0.07455217093229294, "height": 0.08282533288002014, "angle": 0.0}, "confidence": 0.6844093799591064, "attributes": [], "parent_model_name": null, "parent_label": null, "parent_object_id": null}]}, "tags": {"location": "/home/argegpu/softwareTeam/cv-pipeline-abdullah/cv-pipeline/InsightFace_Pytorch/data/InterProbe/inputs/videos/s15/s15.mp4"}, "schema": "VideoFrame", "frame_num": 4}
```
`

Review Dynamic Source Bin Look and Feel

Stale source bin lifecycle management (how it works, how it can be tuned/configured);
Batch size management;
Document it carefully for user's sake.

Move to DS 6.2

Update Savant to use DS 6.2

release notes

Sink adapter doesn't display video and pictures

Run commands:

LOGLEVEL=DEBUG ./scripts/run_module.py samples/deepstream_test2/module.yml
./scripts/run_sink.py display
./scripts/run_source.py pictures black_bottles.jpg --source-id src-id-1

Result: sink adapter failing with error:

Traceback (most recent call last):
  File "/opt/app/adapters/ds/gst_plugins/python/avro_video_player.py", line 174, in _add_branch
    assert pad.link(branch.sink.get_static_pad('sink')) == Gst.PadLinkReturn.OK
  File "/usr/lib/python3/dist-packages/gi/overrides/Gst.py", line 178, in link
    raise LinkError(ret)
gi.overrides.Gst.LinkError: <enum GST_PAD_LINK_NOFORMAT of type Gst.PadLinkReturn>
Traceback (most recent call last):
  File "/opt/app/savant/gstreamer/utils.py", line 112, in on_pad_event
    return event_handler(pad, event, *data)
  File "/opt/app/savant/gst_plugins/python/avro_video_decode_bin.py", line 337, in on_src_pad_eos
    peer.send_event(Gst.Event.new_eos())
AttributeError: 'NoneType' object has no attribute 'send_event'

black_bottles.tar.gz

Notes:

This problem is relevant for any type of source adapter.
Combination source_adapter-module works because another type of sink adapter gives correct result (image, video and json file are written aright). Display sink param only error.

Optimize the default selector with C++ code

It is a concern, not a requirement. To check it first.

The default selector (https://github.com/insight-platform/Savant/blob/develop/savant/selector/detector.py) is called whenever no selector is specified (it might be that I'm wrong). But if it is, it must be replaced with efficient code, which doesn't hold GIL because otherwise, the inference plugin may block the execution of other parts because GIL may be held extra time.

Add callbacks for stream events to the pipeline and components

It will allow users to add custom reaction to the creation of a new stream (adding a new source), resetting a stream, and end-of-stream events (e.g. to reset a tracker).

Add bindings for gst-nvevent.h (deepstream/lib/libnvdsgst_helper.so). There are some event helpers in savant/deepstream/utils.py, refactor code.
Add callback(s) for pyfunc and pipeline (NvDsPipeline) on stream start, reset, and end

0.2.0 Release Demo Pipeline

The pipeline represents up-to-date functions of the Savant framework. It is built around people-centric models (body detection, face detection).

The practical purpose of the pipeline is to demonstrate the following:

how a detection model is used (body, facial);
how tracking is used (standard Nvidia tracking);
how the overlay element works (draw boxes, text, blurring);
how PyFunc callbacks and OpenCV integration work (demo logo, display number of persons as infographic icons);
how to inject video streams into the framework for processing with files and RTSP fake streams;
how to receive the results from the framework with the Always-ON RTSP plugin;
how the pipeline manifest is organized;
how to deploy everything in a Compose for Jetson and x86 platforms;
measure the performance and include an FPS meter inside the pipeline;

The number of people must be updated once in a second to decrease flickering. The 'green' icons represent people with faces; the 'blue' icons represent people without facial information. Object bounding boxes must have the same colors as human icons.

Every face is tracked with a stock Nvidia tracker, and the box is received from the tracker to decrease flickering. Object track id is displayed for the tracked faces.

Bodies are also tracked, and their boxes are drawn based on tracked boxes to reduce flickering. The logo is uploaded into GpuMat only once and applied to every frame.

The interactive demo demonstrates how it works; it will be presented in multiple in/out configurations: N RTSP cams and N Always-ON RTSP adapters.

The non-interactive demo demonstrates the performance; it will be presented in the configuration: N video files and N metadata files.

Layout

Sizes

Output Video: 1280x(720+180)
The upper padding where the logo, icons, and numbers are situated has 180px of height.
The logo has a height of 120px;
The human icon (including paddings) has a height of 120px;
The font is Lato (if possible, if not, use Sans-Serif), letter height is 85px;

Measurements

Logo - Green: 200 px;
Green - text: 60px;
Green - Blue - 300px;
Blue - text: 60px;

Logo Image

Quick access to Frame object without iteration

At least on the pyfunc side, access to the frame object must be provided with a direct call rather than via iteration. Internally it still can be implemented in the form of a loop with caching (?).

Support the application of stencils defined in CPU RAM to GPU images

Linked (to do first): #43

In the Savant, the standard Nvidia approach is used to draw graphics on a GPU-defined image: one must map it to the CPU, draw changes, and unmap it to apply changes in GPU RAM. The approach requires two copies (to CPU and back to GPU) which is excessive.

Another approach is possible with a single copy from the CPU to the GPU RAM. To implement that, Savant must provide the "overlay" operation that allows the conjunction of two images (background one (GPU) and foreground one (stencil, defined in CPU to be moved to GPU)).

The approach requires the foreground image to be defined in the colorspace that supports alpha-channel (transparency) - RGBA. After the FG image is moved to GPU RAM, it is to be placed over the original background image with respect to the transparency defined.

The user can provide the following parameters:

left_X, top_Y to define the placement offset;
right_X, bottom_Y to define the scale (the module must be able to scale the image in the GPU (to be able to transfer smaller images and scale them later);
scale method (one of those that is supported by NVIDIA libs like https://docs.nvidia.com/vpi/algo_rescale.html) that is applied when stencil size varies from the placement defined (or no scale);
modifier method that supports preprocessing modifiers like 'blur' for the area where the stencil is applied ('blur', 'pixelize', and 'grayscale' are examples); the candidates may be found here

To make it efficient, two features are required:

to apply a single stencil to a whole batch;
to map multiple stencils to every BG image from the batch;

FG1: [<linear|catmull|nearest|noscale>, LX1,TY1, RX1, BY1],  [BG1, BG2, BG3, ...., BGN], []
FG2: [<linear|catmull|nearest|noscale>, LX2,TY2, RX2, BY2],  [BG1], [blur]

To optimize CPU-GPU transfers, the method may require clipping all the stencils on a single image (knapsack placement) with properly defined coordinates to avoid multiple transfer operations over the PCI-E bus.

From the user-side perspective, there are two different use-case models:

a user-defined function that constructs clips to be applied (most flexible configuration);
static images, defined in the config, which are read once (when the pipeline starts) and are always applied to the exact coordinates to the whole batch (if possible, they can be loaded to the GPU RAM once as well);
support cached transfers that are valid for X milliseconds without additional transfers (-1 - never expires, 0 - immediately expires, X>0 expires in X ms);
support named shared map transfers accessible by OS-level SHM descriptor (cached as well)

Fix memory leak when using a frame access on Jetson device

There is a memory leak in drawbin element. The problem was found on Xavier NX. Run any module with a source adapter and use htop tool to watch RES column of the running module.

Consider to refactor drawbin element, to make more lightweight (convert to simple pyfunc element, not a bin) and remove location property support

Implement Blur plugin

Implement the plugin that is capable of blurring the area within the box.

Keep the original frame resolution/aspect ratio on pipeline output

Scale frames to the original resolution & aspect ratio before sending them to the pipeline output (should be configurable).

Extend in-GPU image dimensions to add spare space for cropped and exogenous elements placement

Create a plugin that allows the user to specify additional graphical space on the image filled with blank black color and is used to place temporary graphical objects used in secondary inference steps. The primary ROI must include the initial frame area, while the extended area must be used only for utility operations.

There must also be settings configuring a mirror operation - the plugin removes excessive space.

E.g.

Muxer: WxH
(optional) canvas-size: {top, left}, Wx(H+N)
...
(optional) canvas-size:  {top, left}, WxH
Demuxer: WxH

RTSP stream as output and custom functions

First of all, thanks for the amazing repo!

I am a beginner in DeepStream and currently learning Savant to convert pytorch-based multi-streamsCV analytics pipeline to DeepStream/Savant based pipeline to improve runtime performance on Jetson devices. I want to implement custom functions such as line-cross counting similar to sample video and also want to stream outputs as RTSP stream. Also, I want to upload counted results to Azure CosmosDB or Firebase in interval. I am struggling to integrate these functions to Savant and kinda lost on where to start right now. Can you please guide me on this? Thanks in advance!

Add an option to avoid normalizing metadata coordinates to [0,1]

By default normalize=false. The outgoing/incoming metadata must include a descriptor that helps to distinguish between [0,1]-normalized and regular coordinates.

Extend frame and object meta API for batch tracking

Add properties to Savant meta API that are helpful for batch tracking in PyFunc

frame.source_id
frame.is_initial
object.uid

Use a hardware JPEG encoder when it's available

Deepstream on jetson devices has a hardware JPEG encoder - nvjpegenc. We need to use it rather than a CPU encoder (jpegenc).

Extend the artist to support bluring

Make it possible to blur with the artist without a custom pyfunc code. Change the PeopleNet demo accordingly.

The drawfunc customization must be removed. The properties must be configured declaratively.

Write parameters validation tests for draw_func block to ensure reasonable errors during the initialization.

Yolo BBox Error

I'm using yolov5s.onnx as model file for detector. However, It gets error about parsing bboxes. I've tried to use convertor but I could not achieve that. What is wrong here? How can I make savant to draw bbox on output with yolov5 model?

name: mytest

parameters:
  output_frame: ${json:${oc.env:OUTPUT_FRAME, '{"codec":"jpeg"}'}}
  frame_width: 1920
  frame_height: 1080

pipeline:
  elements:
    - element: nvinfer@detector
      name: yolonew
      model:
        format: onnx
        model_file: yolov5s.onnx
        input:
          layer_name: images
          shape: [3, 640, 640]
          scale_factor: 0.0039215697906911373
          maintain_aspect_ratio: true
        batch_size: 1
        
        output:
          layer_names: [output0]
          converter:
            module: savant.converter.yolo_v4
            class_name: TensorToBBoxConverter
          num_detected_classes: 80
          objects:
            - class_id: 0
              label: Person


    - element: drawbin
      module: savant.deepstream.drawbin
      class_name: NvDsDrawBin
      element_type: detector

Special reverse index for accessing custom metadata for top-level objects

As far as I understand, currently, to access the properties of a frame object and other predefined or pre-configured objects, we always look for them in the Gst metadata to get their IDs. When we know IDs, we may access them through storage. Maybe, there is a way to know them beforehand (without cycling through Gst Metadata). Like having a reverse lookup table to quickly access an object by type without parsing Gst metadata.

	@left.setter
	def left(self, value: float):
	self._nv_ds_bbox.left = value
	self._nv_ds_rect_meta.left = value

	def scale(self, scale_x: float, scale_y: float):
	"""Scales BBox.

	:param scale_x: The scaling factor applied along the x-axis.
	:param scale_y: The scaling factor applied along the y-axis.
	"""
	self.left *= scale_x
	self.top *= scale_y
	self.width *= scale_x
	self.height *= scale_y

	sink_peer_pad.add_probe(Gst.PadProbeType.BUFFER, self.update_frame_meta)
	if self._draw_func and self._output_frame_codec:
	sink_peer_pad.add_probe(Gst.PadProbeType.BUFFER, self._draw_on_frame_probe)