crlandsc / torch-log-wmse

logWMSE, an audio quality metric & loss function with support for digital silence target. Useful for training and evaluating audio source separation systems.

License: Apache License 2.0

Python 100.00%
bss mss music ai audio audio-denoising audio-quality audio-quality-assessment audio-separation audio-source-separation

torch-log-wmse's Introduction

Hi there 👋 my name is Chris!

I am an audio machine learning engineer and researcher working on advancing audio AI/ML and spatial audio capabilities.

My recent work has focused on binaural externalization, audio waveform diffusion for generative audio, and audio source separation for music "demixing".

I also make music under the name 🎶After August.

Please reach out if you have any questions, or if you are interested in chatting about audio, music, AI/ML, spatial audio, or all of the above!

Follow my work, writing, and music on:

My Website | Medium | LinkedIn
YouTube | Spotify | Facebook | Instagram

torch-log-wmse's People

Contributors

crlandsc, iver56

Forkers

nomonosound

torch-log-wmse's Issues

Add an option to disable the frequency curve weighting

It would be great to have an option to easily disable the frequency curve weighting. When working on bass-heavy audio (bass/drums/kick stems), the weighting can lead to unexpected results: a high score even though there are obvious audible issues in the very low frequencies.

Returns nan when one of the entries in the batch is a digital silence triplet

Here is some code that reproduces the issue:

import torch
from torch_log_wmse import LogWMSE

loss_function = LogWMSE(audio_length=1, return_as_loss=True)
torch.manual_seed(0)
# Batch of two entries; entry 0 is a digital silence triplet (raw, est and gt are all zero).
raw = torch.randn(2, 1, 44100, dtype=torch.float32)
raw[0] = 0.0
est = torch.randn(2, 1, 44100, dtype=torch.float32)
est[0] = 0.0
gt = torch.randn(2, 1, 44100, dtype=torch.float32)
gt[0] = 0.0
# Each entry evaluated on its own gives a finite loss.
loss0 = loss_function(raw[0:1], est[0:1].unsqueeze(2), gt[0:1].unsqueeze(2))  # -73.6827
print(loss0)
assert not torch.isnan(loss0)
loss1 = loss_function(raw[1:2], est[1:2].unsqueeze(2), gt[1:2].unsqueeze(2))  # 2.7946
print(loss1)
assert not torch.isnan(loss1)
# Both entries evaluated together give nan, so this final assert fails.
loss_combined = loss_function(raw, est.unsqueeze(2), gt.unsqueeze(2))  # nan
print(loss_combined)
assert not torch.isnan(loss_combined)

It looks like this part of the code does not add EPS to the batch entry that is all digital zero:

if input_rms.sum() < ERROR_TOLERANCE_THRESHOLD:

That leads to the problematic division by zero here:

scaling_factor = 1 / input_rms
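
One possible direction, sketched here only as an illustration and not as the library's actual fix, is to apply the silence check to each batch entry separately instead of to the sum over the whole batch. The constant values, the shape of input_rms, and the helper name safe_scaling_factor below are all assumptions made for this example; only the names ERROR_TOLERANCE_THRESHOLD, EPS, input_rms and scaling_factor come from the snippets above.

```python
import torch

# Assumed values for illustration only; the library's real constants may differ.
ERROR_TOLERANCE_THRESHOLD = 1e-7
EPS = 1e-8


def safe_scaling_factor(input_rms: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: guard every batch entry on its own."""
    # Elementwise comparison, so a silent entry is caught even when
    # other entries in the same batch are loud.
    is_silent = input_rms < ERROR_TOLERANCE_THRESHOLD
    guarded_rms = torch.where(is_silent, input_rms + EPS, input_rms)
    return 1 / guarded_rms


# Assumed layout: one RMS value per batch entry; entry 0 is digital silence.
input_rms = torch.tensor([[0.0], [1.0]])

# The batch-wide check quoted in the issue is not triggered here,
# because the loud entry dominates the sum.
batch_check_triggers = bool(input_rms.sum() < ERROR_TOLERANCE_THRESHOLD)

naive = 1 / input_rms                  # inf for the silent entry
safe = safe_scaling_factor(input_rms)  # finite for every entry
```

With this kind of per-entry guard, the loud entry keeps its exact scaling factor, while the silent entry gets a large but finite one instead of inf, which is what later turns into nan.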
