the-database / mpv-upscale-2x_animejanai
Real-time anime upscaling to 4K in mpv with Real-ESRGAN Compact models
License: Other
Since Plex for Windows is mpv-based, it should be possible to upscale with AnimeJaNai in Plex. The steps just need to be documented. Most likely setup will be similar to https://www.svp-team.com/wiki/SVP:Plex_Media_Player
Does this only work on NVIDIA GPUs?
When I close the player and then open it again, playback starts from the beginning instead of where it was before. The option save-position-on-quit is set to yes in the config, but it's not working properly.
First of all, thank you so much for making such a great program.
This time, I had difficulty setting up 2x_animejanai, including mpv. However, I found two issues, corrected them, and I think I have overcome them.
Computer used: 13600K & NVIDIA RTX 4070 Ti
My coding knowledge: close to zero
English proficiency: poor, but Google Translate handles it
The first time I tried Standard_V1_UltraCompact, there was loud fan noise and dropped frames.
After several rounds of trial and error, I overcame it by modifying a value in 2x_SharpLines.vpy:
core.num_threads = 14
I saw in Task Manager that my CPU has 14 cores and changed the value to match.
I think many people would see improvements by adjusting this to match their own computer's performance.
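That per-machine tuning can be automated instead of hardcoding a number. A hedged sketch in plain Python (the `reserve` parameter is my own illustration, not part of any AnimeJaNai script):

```python
import os

def pick_num_threads(reserve: int = 2) -> int:
    """Pick a VapourSynth thread count from the machine's CPU count,
    keeping a couple of cores free for mpv and the OS."""
    cores = os.cpu_count() or 4  # os.cpu_count() may return None
    return max(1, cores - reserve)

# In a .vpy script this value would feed core.num_threads:
#   core.num_threads = pick_num_threads()
```

Whether reserving cores helps at all depends on the rest of the pipeline, so treat the default as a starting point.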
I also noticed a huge overload when upscaling twice.
To summarize, I added an SHD_ENGINE as shown in the example below.
Running the Compact engine twice seemed quite heavy for the computer, so I stepped the engine down one tier for each pass.
The result was very satisfying, and I would be happy if it helps the developer.
Result:
CPU: 99% → 20-40%
GPU: 99% → 60-80%
import vapoursynth as vs
import os

SD_ENGINE_NAME = "2x_AnimeJaNai_Strong_V1_Compact_net_g_120000"
HD_ENGINE_NAME = "2x_AnimeJaNai_Strong_V1_UltraCompact_net_g_100000"
SHD_ENGINE_NAME = "2x_AnimeJaNai_Strong_V1_SuperUltraCompact_net_g_100000"

core = vs.core
core.num_threads = 14  # can influence ram usage

colorspace = "709"


def scaleTo1080(clip, w=1920, h=1080):
    if clip.width / clip.height > 16 / 9:
        prescalewidth = w
        prescaleheight = w * clip.height / clip.width
    else:
        prescalewidth = h * clip.width / clip.height
        prescaleheight = h
    return vs.core.resize.Bicubic(clip, width=prescalewidth, height=prescaleheight)


def upscale2x(clip):
    engine_name = SD_ENGINE_NAME if clip.height < 720 else HD_ENGINE_NAME
    # raw f-string (rf"") so backslashes in the Windows path stay literal
    return core.trt.Model(
        clip,
        engine_path=rf"C:\Program Files (x86)\mpv-lazy-20230404-vsCuda\mpv-lazy\vapoursynth64\plugins\vsmlrt-cuda\{engine_name}.engine",
        num_streams=4,
    )


def upscale4x(clip):
    engine_name = HD_ENGINE_NAME if clip.height < 720 else SHD_ENGINE_NAME
    return core.trt.Model(
        clip,
        engine_path=rf"C:\Program Files (x86)\mpv-lazy-20230404-vsCuda\mpv-lazy\vapoursynth64\plugins\vsmlrt-cuda\{engine_name}.engine",
        num_streams=4,
    )


clip = video_in
if clip.height < 720:
    colorspace = "170m"
clip = vs.core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s=colorspace)
if clip.height >= 720 or clip.width >= 1280:
    clip = scaleTo1080(clip)
clip = upscale2x(clip)
if clip.height < 2160 and clip.width < 3840:
    if clip.height > 1080 or clip.width > 1920:
        clip = scaleTo1080(clip)
    clip = upscale4x(clip)
clip = vs.core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s=colorspace)
clip.set_output()
This suggestion is from Lycoris2013. Low bitrate MPEG2 sources are very common among anime captured from Japanese broadcasts. The main 2x_AnimeJaNai models are intended to preserve details as much as possible so they cannot have heavy artifact correction which tends to also remove details. So a separate model could be trained specifically for these MPEG2 sources such as the image below.
This time, I've made optimizations after a lot of trial and error.
The amendments are as follows.
Upscaling twice, or with 60 fps videos above 1080p, almost unconditionally produced frame drops.
To find a solution, I experimented with the animejanai_v2.conf settings to find values that prevent frame drops while preserving the most quality.
In conclusion, frame drops were handled best with resize_height_before_first_2x=540, so I built the code around that.
scale_to_810: 1440p60 upscaling
scale_to_675: 1080p60 upscaling
scale_to_540: 809p30 to 540p30, upscale twice
scale_to_1080: 1079p to 810p upscaling
Compact / UltraCompact / SuperUltraCompact (Strong V1)
I used all three of these to find the best combination.
30 fps
2159p-810p >>> resize1080 + UltraCompact >>> 2160p
809p-540p >>> resize540 + SuperUltraCompact + SuperUltraCompact >>> 2160p
539p-1p >>> Compact + UltraCompact >>> 2156p-4p
60 fps
2159p-1081p >>> resize810 + SuperUltraCompact >>> 1620p
1080p-810p >>> resize675 + UltraCompact >>> 1350p
809p-540p >>> UltraCompact >>> 1618p-1082p
539p-1p >>> Compact >>> 1078p-2p
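The ladder above can be expressed as a small selection function. This is only an illustrative sketch of the tables: the 45 fps cutoff and thresholds mirror my description, and the engine names are shorthand, not the full model file names.

```python
def pick_pipeline(height: int, fps: float):
    """Return (pre_resize_height_or_None, list_of_engines) following the
    30/60 fps ladder described above (shorthand engine names)."""
    if fps >= 45:  # "60 frame" branch
        if height > 1080:
            return (810, ["SuperUltraCompact"])
        if height >= 810:
            return (675, ["UltraCompact"])
        if height >= 540:
            return (None, ["UltraCompact"])
        return (None, ["Compact"])
    # "30 frame" branch
    if height >= 810:
        return (1080, ["UltraCompact"])
    if height >= 540:
        return (540, ["SuperUltraCompact", "SuperUltraCompact"])
    return (None, ["Compact", "UltraCompact"])
```

For example, a 720p 24 fps source resolves to resize540 followed by two SuperUltraCompact passes, matching the second row of the 30 fps table.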
I am very pleased with this result.
I also tried rife_cuda.py, but I think it's going to be hard.
Below is the code I used.
I'll mark ###new### on the parts where I changed the code.
import vapoursynth as vs
import os
import subprocess
import logging
import configparser
import sys
from logging.handlers import RotatingFileHandler

sys.path.append(os.path.dirname(os.path.abspath(__file__)))

import rife_cuda
import animejanai_v2_config
# import gmfss_cuda

# trtexec num_streams
TOTAL_NUM_STREAMS = 4

core = vs.core
core.num_threads = 4  # can influence ram usage

plugin_path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                           r"..\..\vapoursynth64\plugins\vsmlrt-cuda")
model_path = os.path.join(plugin_path, r"..\models\animejanai")

formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
                              datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger('animejanai_v2')

config = {}


def init_logger():
    global logger
    logger.setLevel(logging.DEBUG)
    rfh = RotatingFileHandler(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'animejanai_v2.log'),
                              mode='a', maxBytes=1 * 1024 * 1024, backupCount=2, encoding=None, delay=0)
    rfh.setFormatter(formatter)
    rfh.setLevel(logging.DEBUG)
    logger.addHandler(rfh)
# model_type: HD or SD
# binding: 1 through 9
def find_model(model_type, binding):
    section_key = f'slot_{binding}'
    key = f'{model_type.lower()}_model'
    if section_key in config and key in config[section_key]:
        return config[section_key][key]
    return None


def create_engine(onnx_name):
    onnx_path = os.path.join(model_path, f"{onnx_name}.onnx")
    if not os.path.isfile(onnx_path):
        raise FileNotFoundError(onnx_path)
    engine_path = os.path.join(model_path, f"{onnx_name}.engine")
    subprocess.run([os.path.join(plugin_path, "trtexec"), "--fp16", f"--onnx={onnx_path}",
                    "--minShapes=input:1x3x8x8", "--optShapes=input:1x3x1080x1920",
                    "--maxShapes=input:1x3x1080x1920",
                    f"--saveEngine={engine_path}", "--tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT"],
                   cwd=plugin_path)
def scale_to_1080(clip, w=1920, h=1080):
    if clip.width / clip.height > 16 / 9:
        prescalewidth = w
        prescaleheight = w * clip.height / clip.width
    else:
        prescalewidth = h * clip.width / clip.height
        prescaleheight = h
    return vs.core.resize.Bicubic(clip, width=prescalewidth, height=prescaleheight)


###new###
def scale_to_810(clip, w=1440, h=810):
    if clip.width / clip.height > 16 / 9:
        prescalewidth = w
        prescaleheight = w * clip.height / clip.width
    else:
        prescalewidth = h * clip.width / clip.height
        prescaleheight = h
    return vs.core.resize.Bicubic(clip, width=prescalewidth, height=prescaleheight)


###new###
def scale_to_675(clip, w=1200, h=675):
    if clip.width / clip.height > 16 / 9:
        prescalewidth = w
        prescaleheight = w * clip.height / clip.width
    else:
        prescalewidth = h * clip.width / clip.height
        prescaleheight = h
    return vs.core.resize.Bicubic(clip, width=prescalewidth, height=prescaleheight)


###new###
def scale_to_540(clip, w=960, h=540):
    if clip.width / clip.height > 16 / 9:
        prescalewidth = w
        prescaleheight = w * clip.height / clip.width
    else:
        prescalewidth = h * clip.width / clip.height
        prescaleheight = h
    return vs.core.resize.Bicubic(clip, width=prescalewidth, height=prescaleheight)


###new###
def upscale2x(clip, sd_engine_name, hd_engine_name, shd_engine_name, num_streams):
    engine_name = None  # return the clip unchanged if no case below matches
    if clip.height == 675 or clip.width == 1200:
        engine_name = hd_engine_name
    else:
        if clip.height < 540 or clip.width < 960:
            engine_name = sd_engine_name
        if (clip.height == 1080 or clip.width == 1920) or ((clip.height > 540 and clip.width > 960) and (clip.height < 1080 or clip.width < 1920)):
            engine_name = hd_engine_name
        if (clip.height == 540 or clip.width == 960) or (clip.height == 810 or clip.width == 1440):
            engine_name = shd_engine_name
    if engine_name is None:
        return clip
    engine_path = os.path.join(model_path, f"{engine_name}.engine")
    message = f"upscale2x: scaling 2x from {clip.width}x{clip.height} with engine={engine_name}; num_streams={num_streams}"
    logger.debug(message)
    print(message)
    if not os.path.isfile(engine_path):
        create_engine(engine_name)
    return core.trt.Model(
        clip,
        engine_path=engine_path,
        num_streams=num_streams,
    )
###new###
def upscale22x(clip, hd_engine_name, shd_engine_name, num_streams):
    engine_name = None  # return the clip unchanged if no case below matches
    if clip.height == 1080 or clip.width == 1920:
        engine_name = shd_engine_name
    elif clip.height < 1080 or clip.width < 1920:
        engine_name = hd_engine_name
    if engine_name is None:
        return clip
    engine_path = os.path.join(model_path, f"{engine_name}.engine")
    message = f"upscale22x: scaling 2x from {clip.width}x{clip.height} with engine={engine_name}; num_streams={num_streams}"
    logger.debug(message)
    print(message)
    if not os.path.isfile(engine_path):
        create_engine(engine_name)
    return core.trt.Model(
        clip,
        engine_path=engine_path,
        num_streams=num_streams,
    )
def run_animejanai(clip, sd_engine_name, hd_engine_name, shd_engine_name, container_fps, resize_factor_before_first_2x,
                   resize_height_before_first_2x, resize_720_to_1080_before_first_2x, do_upscale,
                   resize_to_1080_before_second_2x, upscale_twice, use_rife):
    if do_upscale:
        colorspace = "709"
        colorlv = clip.get_frame(0).props._ColorRange
        fmt_in = clip.format.id
        if clip.height < 720 or clip.width < 1280:
            colorspace = "170m"
        if resize_height_before_first_2x != 0:
            resize_factor_before_first_2x = 1
        try:
            # try half precision first
            clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s=colorspace,
                                          width=clip.width / resize_factor_before_first_2x,
                                          height=clip.height / resize_factor_before_first_2x)
            clip = run_animejanai_upscale(clip, sd_engine_name, hd_engine_name, shd_engine_name, container_fps,
                                          resize_factor_before_first_2x, resize_height_before_first_2x,
                                          resize_720_to_1080_before_first_2x, do_upscale,
                                          resize_to_1080_before_second_2x, upscale_twice, use_rife,
                                          colorspace, colorlv, fmt_in)
        except:
            # fall back to single precision
            clip = vs.core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s=colorspace,
                                          width=clip.width / resize_factor_before_first_2x,
                                          height=clip.height / resize_factor_before_first_2x)
            clip = run_animejanai_upscale(clip, sd_engine_name, hd_engine_name, shd_engine_name, container_fps,
                                          resize_factor_before_first_2x, resize_height_before_first_2x,
                                          resize_720_to_1080_before_first_2x, do_upscale,
                                          resize_to_1080_before_second_2x, upscale_twice, use_rife,
                                          colorspace, colorlv, fmt_in)

    ###new###
    if use_rife:
        clip = rife_cuda.rife(clip, clip.width, clip.height, container_fps)

    clip.set_output()
###new###
def run_animejanai_upscale(clip, sd_engine_name, hd_engine_name, shd_engine_name, container_fps, resize_factor_before_first_2x,
                           resize_height_before_first_2x, resize_720_to_1080_before_first_2x, do_upscale,
                           resize_to_1080_before_second_2x, upscale_twice, use_rife, colorspace, colorlv, fmt_in):
    ###new###
    if resize_height_before_first_2x == 540:
        if (clip.height >= 540 or clip.width >= 960) and container_fps >= 45:
            if clip.height <= 1080 or clip.width <= 1920:
                clip = scale_to_675(clip)
            else:
                clip = scale_to_810(clip)
        else:
            if (clip.height >= 540 or clip.width >= 960) and clip.height < 810 and clip.width < 1440 and container_fps < 45:
                clip = scale_to_540(clip)
            if (clip.height >= 810 or clip.width >= 1440) and clip.height < 2160 and clip.width < 3840:
                clip = scale_to_1080(clip)

    # values other than 540 (and 0) caused errors at upscale2x/upscale22x
    if resize_height_before_first_2x != 540 and resize_height_before_first_2x != 0:
        clip = scale_to_1080(clip, resize_height_before_first_2x * 16 / 9, resize_height_before_first_2x)

    # pre-scale 720p or higher to 1080 > NO
    if resize_720_to_1080_before_first_2x:
        if (clip.height >= 810 or clip.width >= 1440) and clip.height < 1080 and clip.width < 1920:
            clip = scale_to_1080(clip)

    num_streams = TOTAL_NUM_STREAMS
    if upscale_twice and (clip.height <= 540 or clip.width <= 960) and container_fps < 45:
        num_streams //= 2  # integer division so num_streams stays an int

    # upscale 2x
    clip = upscale2x(clip, sd_engine_name, hd_engine_name, shd_engine_name, num_streams)

    # upscale 2x again if necessary
    if upscale_twice and (clip.height <= 1080 or clip.width <= 1920) and container_fps < 45:
        # downscale down to 1080 if first 2x went over 1080,
        # or scale up to 1080 if enabled >> NO
        if resize_to_1080_before_second_2x and (clip.height > 720 or clip.width > 1280):
            clip = scale_to_1080(clip)
        # upscale 2x again
        clip = upscale22x(clip, hd_engine_name, shd_engine_name, num_streams)

    fmt_out = fmt_in
    if fmt_in not in [vs.YUV410P8, vs.YUV411P8, vs.YUV420P8, vs.YUV422P8, vs.YUV444P8, vs.YUV420P10, vs.YUV422P10,
                      vs.YUV444P10]:
        fmt_out = vs.YUV420P10

    return vs.core.resize.Bicubic(clip, format=fmt_out, matrix_s=colorspace, range=1 if colorlv == 0 else None)
# keybinding: 1-9
def run_animejanai_with_keybinding(clip, container_fps, keybinding):
    sd_engine_name = find_model("SD", keybinding)
    hd_engine_name = find_model("HD", keybinding)
    shd_engine_name = find_model("SHD", keybinding)
    section_key = f'slot_{keybinding}'
    do_upscale = config[section_key].get('upscale_2x', True)
    upscale_twice = config[section_key].get('upscale_4x', True)
    use_rife = config[section_key].get('rife', True)
    resize_720_to_1080_before_first_2x = config[section_key].get('resize_720_to_1080_before_first_2x', True)
    resize_factor_before_first_2x = config[section_key].get('resize_factor_before_first_2x', 1)
    resize_height_before_first_2x = config[section_key].get('resize_height_before_first_2x', 0)
    resize_to_1080_before_second_2x = config[section_key].get('resize_to_1080_before_second_2x', True)
    if do_upscale:
        if sd_engine_name is None and hd_engine_name is None and shd_engine_name is None:
            raise FileNotFoundError(
                f"2x upscaling is enabled but no SD model and HD model defined for slot {keybinding}. Expected at least one of SD or HD model to be specified with sd_model or hd_model in animejanai.conf.")
    ###new###
    if (clip.height < 2160 or clip.width < 3840) and container_fps < 100:
        run_animejanai(clip, sd_engine_name, hd_engine_name, shd_engine_name, container_fps, resize_factor_before_first_2x,
                       resize_height_before_first_2x, resize_720_to_1080_before_first_2x, do_upscale,
                       resize_to_1080_before_second_2x, upscale_twice, use_rife)


def init():
    global config
    config = animejanai_v2_config.read_config()
    if config['global']['logging']:
        init_logger()


init()
import sys, os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
import animejanai_v2
animejanai_v2.run_animejanai_with_keybinding(video_in, container_fps, 1)
[slot_1]
SD_model=2x_AnimeJaNai_Strong_V1_Compact_net_g_120000
HD_model=2x_AnimeJaNai_Strong_V1_UltraCompact_net_g_100000
SHD_model=2x_AnimeJaNai_Strong_V1_SuperUltraCompact_net_g_100000
resize_factor_before_first_2x=1
resize_height_before_first_2x=540
resize_720_to_1080_before_first_2x=no
upscale_2x=yes
upscale_4x=yes
resize_to_1080_before_second_2x=no
rife=no
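One subtlety worth noting with .conf files like the one above: if they are read with Python's configparser, plain indexing returns raw strings, so a value like "no" is truthy unless parsed explicitly. The real animejanai_v2_config module may handle this differently; this is just a hedged sketch of reading one slot with the standard library:

```python
import configparser

# A fragment mirroring the slot_1 section above.
conf_text = """
[slot_1]
upscale_2x=yes
upscale_4x=yes
rife=no
resize_height_before_first_2x=540
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)
slot = parser["slot_1"]

# getboolean/getint parse the strings; plain indexing returns raw text.
use_rife = slot.getboolean("rife")                        # False
do_upscale = slot.getboolean("upscale_2x")                # True
resize_h = slot.getint("resize_height_before_first_2x")   # 540
raw_rife = slot["rife"]                                   # "no" (a truthy string!)
```

This is why a check like `if config[section].get('rife', True):` can behave unexpectedly when the stored value is the string "no".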
I'm collecting benchmarks for different hardware configurations on the wiki here: https://github.com/the-database/mpv-upscale-2x_animejanai/wiki/Benchmarks
If you would like to contribute your benchmarks, please try downloading the prerelease archive 1.0.1 here: https://github.com/the-database/mpv-upscale-2x_animejanai/releases/tag/1.0.1
To run the benchmark, extract the release archive and then run the bat file at mpv-upscale-2x_animejanai\portable_config\shaders\animejanai_v2_benchmark_all.bat. If running the bat just opens and closes the console window immediately, an error has occurred. If that happens, please open a command window, navigate to the mpv-upscale-2x_animejanai\portable_config\shaders directory, and then run this command: .\animejanai_v2_benchmark_all.bat. Any errors that occurred should be printed to the console; please share them here.
Please add:
It works fine on my Intel Arc A750.
Thank you for your hard work :)
I'm using rife-cuda in mpv_lazy and wanted to see if I could get AnimeJaNai working alongside it. I'm totally new to using these machine learning models. Is the only way to do it to merge the two desired ONNX models? It's easy to apply anime4k shaders with RIFE so I'm hoping there is a solution to get this working too.
After starting mpv or mpv.net and loading a video over URL, it shows a black screen for more than a minute.
On normal mpv, this does not happen; the video starts instantly.
Is it because it has to load the model? Any way to speed up the process?
(RTX 3080, i7 8700K @ 5 GHz, SSD, Win11)
Is there a way to run these on a phone? Like, loading in android mpv player, etc?
Hello, when I play a video with the upscale feature, all other system sounds are muted. I want to talk on Discord while using the program, but when I start the video, I stop hearing it. Is there any way to hear the other programs besides the audio track of the video I'm playing?
Will it be possible to do this (in real time) with an AMD card in the near future?
I sold my 1070 Ti and I'm going to get a 6800 XT + 13600K. I was also considering the RTX 4070, but 12 GB of VRAM and a 192-bit bus made me look at AMD.
animejanai_v2_encode.vpy doesn't work, needs investigation
V2 models are being developed to address some feedback provided for the V1 models:
The primary goal of the V2 models is to produce results that appear as if the source was originally produced in 4K while faithfully retaining the original look as much as possible. This will be tracked by downscaling the upscaled results to the native resolution of the anime. The results should be difficult to distinguish from the original anime source. Performing this test on the V1 models more easily reveals the above oversharpening, line darkening, and loss of detail.
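That downscale test can be sketched with plain NumPy. This is only an illustrative harness: the box-filter downscale and PSNR metric are stand-ins I chose, not necessarily what is used to track the V2 models.

```python
import numpy as np

def box_downscale_2x(img: np.ndarray) -> np.ndarray:
    """Downscale an HxW float image 2x with a simple box filter."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even dimensions
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

# A faithful 2x upscale, downscaled back to the native resolution, should be
# nearly indistinguishable from the untouched source (very high PSNR).
src = np.random.default_rng(0).random((540, 960))
upscaled = np.repeat(np.repeat(src, 2, axis=0), 2, axis=1)  # trivial 2x stand-in
roundtrip = box_downscale_2x(upscaled)
```

Oversharpening or line darkening by a real model would show up here as a measurably lower round-trip score against the source.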
It's expected that V2 will not require Soft/Standard/Strong variants; there will simply be three models: V2 Compact, V2 UltraCompact, and V2 SuperUltraCompact.
As with V1, V2 will not be intended for use on low quality sources with heavy artifacting, as those artifacts will be preserved and upscaled. But with less oversharpening, V2 should perform better than V1 on low quality sources.
V2 will undoubtedly lose some sharpness when directly compared to V1. V1 models will remain available for those that prefer the extra sharp look.
Screenshots of progress on the V2 models are included in the following comments. Please note those screenshots are not final and the released V2 model may produce different results.
Hi there,
Setting up animejanai v3 on a system with AMD Radeon graphics can be challenging, especially for a newbie.
I'd like to request a lite version of animejanai v3 implemented as a GLSL shader for mpv.
This would allow users to use animejanai easily.
Example,
https://github.com/igv/FSRCNN-TensorFlow/releases/tag/1.1
Thank you.
The Powershell script should be deprecated in favor of releasing prepackaged archives for simpler installation. Two archives should be supplied - one as a full standalone mpv archive like mpv_lazy, and another which can be dropped into any existing mpv installation. TensorRT engines should be generated on the fly.
I use 2x_AnimeJaNai_Strong_V1_SuperUltraCompact_net_g_100000 on my GTX 1060 to upscale SD material to Full HD.
It would be nice to have the normal and soft variant of the model. Do you plan to train and release them?
Instead of
set video_settings=hevc_nvenc -preset slow -profile:v main10 -b:v 50M
use
set video_settings=hevc_nvenc -preset p7 -profile:v main10 -b:v 50M
Also works for h264_nvenc.
You can read more about the more modern nvenc presets here:
https://developer.nvidia.com/blog/introducing-video-codec-sdk-10-presets/
When attempting to build an engine, TensorRT throws errors and fails (trtexec log below). Nothing fixed it until I tried to downgrade TRT. From mpv-janai v2.0.2, I copied the entire vsmlrt-cuda folder into v3.0's vapoursynth64/plugins folder, overwriting the old one.
Then v3 finally generated the engine and I tested it as working, but trtexec still complained that because of INT64, the result might be less accurate. Is that true? Should I be worried about that?
Error log before I downgraded TRT (regular janai v3)
&&&& RUNNING TensorRT.trtexec [TensorRT v9200] # C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\..\vapoursynth64\plugins\vsmlrt-cuda\trtexec --fp16 --onnx=C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --skipInference --infStreams=4 --builderOptimizationLevel=4 --saveEngine=C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.engine --tacticSources=-CUDNN,-CUBLAS,-CUBLAS_LT
[05/19/2024-14:15:37] [I] === Model Options ===
[05/19/2024-14:15:37] [I] Format: ONNX
[05/19/2024-14:15:37] [I] Model: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx
[05/19/2024-14:15:37] [I] Output:
[05/19/2024-14:15:37] [I] === Build Options ===
[05/19/2024-14:15:37] [I] Max batch: explicit batch
[05/19/2024-14:15:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/19/2024-14:15:37] [I] minTiming: 1
[05/19/2024-14:15:37] [I] avgTiming: 8
[05/19/2024-14:15:37] [I] Precision: FP32+FP16
[05/19/2024-14:15:37] [I] LayerPrecisions:
[05/19/2024-14:15:37] [I] Layer Device Types:
[05/19/2024-14:15:37] [I] Calibration:
[05/19/2024-14:15:37] [I] Refit: Disabled
[05/19/2024-14:15:37] [I] Weightless: Disabled
[05/19/2024-14:15:37] [I] Version Compatible: Disabled
[05/19/2024-14:15:37] [I] ONNX Native InstanceNorm: Disabled
[05/19/2024-14:15:37] [I] TensorRT runtime: full
[05/19/2024-14:15:37] [I] Lean DLL Path:
[05/19/2024-14:15:37] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/19/2024-14:15:37] [I] Exclude Lean Runtime: Disabled
[05/19/2024-14:15:37] [I] Sparsity: Disabled
[05/19/2024-14:15:37] [I] Safe mode: Disabled
[05/19/2024-14:15:37] [I] Build DLA standalone loadable: Disabled
[05/19/2024-14:15:37] [I] Allow GPU fallback for DLA: Disabled
[05/19/2024-14:15:37] [I] DirectIO mode: Disabled
[05/19/2024-14:15:37] [I] Restricted mode: Disabled
[05/19/2024-14:15:37] [I] Skip inference: Enabled
[05/19/2024-14:15:37] [I] Save engine: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.engine
[05/19/2024-14:15:37] [I] Load engine:
[05/19/2024-14:15:37] [I] Profiling verbosity: 0
[05/19/2024-14:15:37] [I] Tactic sources: cublas [OFF], cublasLt [OFF], cudnn [OFF],
[05/19/2024-14:15:37] [I] timingCacheMode: local
[05/19/2024-14:15:37] [I] timingCacheFile:
[05/19/2024-14:15:37] [I] Enable Compilation Cache: Enabled
[05/19/2024-14:15:37] [I] errorOnTimingCacheMiss: Disabled
[05/19/2024-14:15:37] [I] Heuristic: Disabled
[05/19/2024-14:15:37] [I] Preview Features: Use default preview flags.
[05/19/2024-14:15:37] [I] MaxAuxStreams: -1
[05/19/2024-14:15:37] [I] BuilderOptimizationLevel: 4
[05/19/2024-14:15:37] [I] Calibration Profile Index: 0
[05/19/2024-14:15:37] [I] Input(s)s format: fp32:CHW
[05/19/2024-14:15:37] [I] Output(s)s format: fp32:CHW
[05/19/2024-14:15:37] [I] Input build shape (profile 0): input=1x3x8x8+1x3x1080x1920+1x3x1080x1920
[05/19/2024-14:15:37] [I] Input calibration shapes: model
[05/19/2024-14:15:37] [I] === System Options ===
[05/19/2024-14:15:37] [I] Device: 0
[05/19/2024-14:15:37] [I] DLACore:
[05/19/2024-14:15:37] [I] Plugins:
[05/19/2024-14:15:37] [I] setPluginsToSerialize:
[05/19/2024-14:15:37] [I] dynamicPlugins:
[05/19/2024-14:15:37] [I] ignoreParsedPluginLibs: 0
[05/19/2024-14:15:37] [I]
[05/19/2024-14:15:37] [I] === Inference Options ===
[05/19/2024-14:15:37] [I] Batch: Explicit
[05/19/2024-14:15:37] [I] Input inference shape : input=1x3x1080x1920
[05/19/2024-14:15:37] [I] Iterations: 10
[05/19/2024-14:15:37] [I] Duration: 3s (+ 200ms warm up)
[05/19/2024-14:15:37] [I] Sleep time: 0ms
[05/19/2024-14:15:42] [I] Idle time: 0ms
[05/19/2024-14:15:42] [I] Inference Streams: 4
[05/19/2024-14:15:42] [I] ExposeDMA: Disabled
[05/19/2024-14:15:42] [I] Data transfers: Enabled
[05/19/2024-14:15:42] [I] Spin-wait: Disabled
[05/19/2024-14:15:42] [I] Multithreading: Disabled
[05/19/2024-14:15:42] [I] CUDA Graph: Disabled
[05/19/2024-14:15:42] [I] Separate profiling: Disabled
[05/19/2024-14:15:42] [I] Time Deserialize: Disabled
[05/19/2024-14:15:42] [I] Time Refit: Disabled
[05/19/2024-14:15:42] [I] NVTX verbosity: 0
[05/19/2024-14:15:42] [I] Persistent Cache Ratio: 0
[05/19/2024-14:15:42] [I] Optimization Profile Index: 0
[05/19/2024-14:15:42] [I] Inputs:
[05/19/2024-14:15:42] [I] === Reporting Options ===
[05/19/2024-14:15:42] [I] Verbose: Disabled
[05/19/2024-14:15:42] [I] Averages: 10 inferences
[05/19/2024-14:15:42] [I] Percentiles: 90,95,99
[05/19/2024-14:15:42] [I] Dump refittable layers:Disabled
[05/19/2024-14:15:42] [I] Dump output: Disabled
[05/19/2024-14:15:42] [I] Profile: Disabled
[05/19/2024-14:15:42] [I] Export timing to JSON file:
[05/19/2024-14:15:42] [I] Export output to JSON file:
[05/19/2024-14:15:42] [I] Export profile to JSON file:
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] === Device Information ===
[05/19/2024-14:15:42] [I] Available Devices:
[05/19/2024-14:15:42] [I] Device 0: "NVIDIA GeForce RTX 3080" UUID: GPU-44b0b0ec-4a4a-2291-c949-ad5f2d47ac82
[05/19/2024-14:15:42] [I] Selected Device: NVIDIA GeForce RTX 3080
[05/19/2024-14:15:42] [I] Selected Device ID: 0
[05/19/2024-14:15:42] [I] Selected Device UUID: GPU-44b0b0ec-4a4a-2291-c949-ad5f2d47ac82
[05/19/2024-14:15:42] [I] Compute Capability: 8.6
[05/19/2024-14:15:42] [I] SMs: 68
[05/19/2024-14:15:42] [I] Device Global Memory: 10239 MiB
[05/19/2024-14:15:42] [I] Shared Memory per SM: 100 KiB
[05/19/2024-14:15:42] [I] Memory Bus Width: 320 bits (ECC disabled)
[05/19/2024-14:15:42] [I] Application Compute Clock Rate: 1.8 GHz
[05/19/2024-14:15:42] [I] Application Memory Clock Rate: 9.501 GHz
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/19/2024-14:15:42] [I]
[05/19/2024-14:15:42] [I] TensorRT version: 9.2.0
[05/19/2024-14:15:42] [I] Loading standard plugins
[05/19/2024-14:15:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 8080, GPU 1166 (MiB)
[05/19/2024-14:15:50] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2726, GPU +312, now: CPU 11097, GPU 1478 (MiB)
[05/19/2024-14:15:50] [I] Start parsing network model.
[05/19/2024-14:15:50] [I] [TRT] ----------------------------------------------------------------
[05/19/2024-14:15:50] [I] [TRT] Input filename: C:\Users\heath\Documents\mpv-upscale-2x_animejanai-v3\animejanai\core\..\onnx\2x_AnimeJaNai_HD_V3_UltraCompact.onnx
[05/19/2024-14:15:50] [I] [TRT] ONNX IR version: 0.0.7
[05/19/2024-14:15:50] [I] [TRT] Opset version: 14
[05/19/2024-14:15:50] [I] [TRT] Producer name: pytorch
[05/19/2024-14:15:50] [I] [TRT] Producer version: 2.1.2
[05/19/2024-14:15:50] [I] [TRT] Domain:
[05/19/2024-14:15:50] [I] [TRT] Model version: 0
[05/19/2024-14:15:50] [I] [TRT] Doc string:
[05/19/2024-14:15:50] [I] [TRT] ----------------------------------------------------------------
[05/19/2024-14:15:50] [I] Finished parsing network model. Parse time: 0.0284306
[05/19/2024-14:15:50] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x8x8 OPT=1x3x1080x1920 MAX=1x3x1080x1920
[05/19/2024-14:15:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:15:53] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_lessZero = lt(/Conv_output_0', /PRelu_zero), name=/PRelu_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_1_lessZero = lt(/Conv_1_output_0', /PRelu_1_zero), name=/PRelu_1_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_1_lessZero = lt(/Conv_1_output_0', /PRelu_1_zero), name=/PRelu_1_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_2_lessZero = lt(/Conv_2_output_0', /PRelu_2_zero), name=/PRelu_2_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_2_lessZero = lt(/Conv_2_output_0', /PRelu_2_zero), name=/PRelu_2_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_3_lessZero = lt(/Conv_3_output_0', /PRelu_3_zero), name=/PRelu_3_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_3_lessZero = lt(/Conv_3_output_0', /PRelu_3_zero), name=/PRelu_3_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_4_lessZero = lt(/Conv_4_output_0', /PRelu_4_zero), name=/PRelu_4_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_4_lessZero = lt(/Conv_4_output_0', /PRelu_4_zero), name=/PRelu_4_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_5_lessZero = lt(/Conv_5_output_0', /PRelu_5_zero), name=/PRelu_5_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_5_lessZero = lt(/Conv_5_output_0', /PRelu_5_zero), name=/PRelu_5_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_6_lessZero = lt(/Conv_6_output_0', /PRelu_6_zero), name=/PRelu_6_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_6_lessZero = lt(/Conv_6_output_0', /PRelu_6_zero), name=/PRelu_6_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_7_lessZero = lt(/Conv_7_output_0', /PRelu_7_zero), name=/PRelu_7_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_7_lessZero = lt(/Conv_7_output_0', /PRelu_7_zero), name=/PRelu_7_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_8_lessZero = lt(/Conv_8_output_0', /PRelu_8_zero), name=/PRelu_8_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:01] [E] Error[9]: Skipping tactic 0x0000000000000000 due to exception [::0]
Error during shape inference of
/PRelu_8_lessZero = lt(/Conv_8_output_0', /PRelu_8_zero), name=/PRelu_8_less
Error is:
Input 0's element type (half) differs from input 1's element type (float).
[05/19/2024-14:16:10] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/19/2024-14:16:11] [I] [TRT] Total Host Persistent Memory: 56032
[05/19/2024-14:16:11] [I] [TRT] Total Device Persistent Memory: 0
[05/19/2024-14:16:11] [I] [TRT] Total Scratch Memory: 4608
[05/19/2024-14:16:11] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 40 steps to complete.
[05/19/2024-14:16:11] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.6522ms to assign 4 blocks to 40 nodes requiring 1111454208 bytes.
[05/19/2024-14:16:11] [I] [TRT] Total Activation Memory: 1111454208
[05/19/2024-14:16:11] [I] [TRT] Total Weights Memory: 670720
[05/19/2024-14:16:11] [I] [TRT] Engine generation completed in 21.705 seconds.
[05/19/2024-14:16:11] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
Works great, but can anything be done to improve seek performance?
Running a 4090 + i7 12700K.
Is filling the buffers the cause of the pause on seek? If so, can they be reduced?