adobe-research / deepafx-st

DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/

License: Other

Shell 4.63% Python 95.37%

afx ai audio audio-processing audio-production compressor deeplearning drc effects eq

deepafx-st's People


ishine njb maxmax2016 entn-at jamesbrownjr lvzhiqiang daitomanabe agangzz recreationdevelopers nateraw xzm2004260 shaun95 rogervaas yoyololicon janfschr worthlesspixels zhipingzhou dfrntl xiaozhuo12138 icodein yuan-manx dhockaday pysync jaedukseo sshyran wujian-sinemedia luckybian mengwangme ap1075 steelbin whiteasreal c00renut adventurecomputer knut0815 mikesol nactemha drscotthawley render-ai marcos-vm-1708 isabella232 kyrillosl csteinmetz1 buseoznurozkan

deepafx-st's Issues

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Hello!

When I run the script:

(deepafx-st) E:\CODE\DeepAFx-ST>python scripts/process.py -i "E:\CODE\DeepAFx-ST\audio files\raw\160 JAZZ DNB2_MdcL3.wav" -r "E:\CODE\DeepAFx-ST\audio files\target\05 NOT TiGHT.wav" -c "E:\CODE\DeepAFx-ST\checkpoints\style\jamendo\autodiff\lightning_logs\version_0\checkpoints\epoch=362-step=1210241-val-jamendo-autodiff.ckpt"

I encounter a runtime error:

Resampling to 24000 Hz...
Traceback (most recent call last):
  File "scripts/process.py", line 89, in <module>
    x_24000 = torch.tensor(resampy.resample(x.view(-1).numpy(), x_sr, 24000))
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Any ideas how I can resolve this?

System specs:
Windows 10
Anaconda Python
cmd.exe
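The error means x is not contiguous in memory, so .view(-1) cannot flatten it without a copy. One plausible cause, assumed here rather than confirmed for this setup, is an audio backend that loads a stereo file as (samples, channels) and then transposes it to (channels, samples). As the message suggests, .reshape(-1) (or .contiguous().view(-1)) flattens any layout. This sketch reproduces the failure with a synthetic tensor:

```python
import torch

# Assumption for illustration: a stereo clip loaded as (samples, channels)
# and then transposed to (channels, samples). The transpose is a view with
# swapped strides, so the result is not contiguous in memory.
frames_first = torch.randn(48000, 2)
x = frames_first.t()  # shape (2, 48000)

view_failed = False
try:
    x.view(-1)  # same call as in process.py; fails on this layout
except RuntimeError:
    view_failed = True

# Either of these flattens safely; both copy the data only when they must.
flat = x.reshape(-1)
flat2 = x.contiguous().view(-1)

print(view_failed, x.is_contiguous(), tuple(flat.shape))  # True False (96000,)
```

Flattening a (channels, samples) tensor places the channels end to end, so a stereo file becomes one signal twice as long. Swapping .view for .reshape fixes the crash but not that duplication.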

How can I process an entire audio file?

I want to process an entire audio file, but the code currently uses only the first five seconds of the input and the reference.

When I comment out the following lines in process.py, it returns the processed file, but the audio seems to appear more than once in the output:
x_24000 = x_24000[0:1, : 24000 * 5]
r_24000 = r_24000[0:1, : 24000 * 5]
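The repeated audio is most likely a channel problem rather than a cropping one: resampling x.view(-1) flattens a (channels, samples) tensor, so a stereo file becomes the left channel followed by the right channel, and the 5-second crop hid this by keeping only the start. A minimal sketch of one fix, assuming a single mono signal is what the model expects, is to collapse the channels before flattening. to_mono is a hypothetical helper; averaging is one choice, and keeping only x[0:1] (the first channel) is another.

```python
import numpy as np


def to_mono(x: np.ndarray) -> np.ndarray:
    """Average a (channels, samples) array down to shape (1, samples).

    Mono input keeps its (1, samples) shape and values.
    """
    if x.ndim != 2:
        raise ValueError("expected a (channels, samples) array")
    return x.mean(axis=0, keepdims=True)


# Two synthetic 5-sample channels standing in for a stereo file.
stereo = np.stack([np.linspace(-1, 1, 5), np.linspace(1, -1, 5)])  # (2, 5)
print(stereo.reshape(-1).shape)  # (10,): both channels end to end
print(to_mono(stereo).shape)     # (1, 5)
```

In process.py this would presumably be applied to x.numpy() just before the resampy.resample call, with the same change for r.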

Colab inference notebook

Hi @csteinmetz1,

Great work.
I'm hoping the inference notebook will be available on Google Colab soon.

How can I get the parameters of the EQ and compressor?

There is an EQ and a compressor in "system.processor".
How can I get their details?
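The issue doesn't show how system.processor exposes its settings, so the following is only a sketch under two assumptions: that the model predicts each effect parameter as a value normalized to [0, 1] (possibly the p returned alongside y_hat in process.py), and that each parameter maps linearly onto a known minimum and maximum. ParamSpec, denormalize, and every name and range in SPECS are hypothetical placeholders; the real ones would have to be read from the EQ and compressor definitions in the repository, and some (such as frequencies) may use a non-linear mapping.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical description of one effect parameter."""
    name: str
    minimum: float
    maximum: float
    units: str = ""


def denormalize(p: Sequence[float], specs: Sequence[ParamSpec]) -> Dict[str, float]:
    """Map normalized values in [0, 1] linearly onto each parameter's range."""
    if len(p) != len(specs):
        raise ValueError(f"got {len(p)} values for {len(specs)} parameters")
    return {
        s.name: s.minimum + float(v) * (s.maximum - s.minimum)
        for v, s in zip(p, specs)
    }


# Placeholder names and ranges, NOT taken from DeepAFx-ST.
SPECS = [
    ParamSpec("eq_low_shelf_gain", -24.0, 24.0, "dB"),
    ParamSpec("comp_threshold", -60.0, 0.0, "dB"),
    ParamSpec("comp_ratio", 1.0, 20.0),
]

print(denormalize([0.5, 0.25, 0.0], SPECS))
# {'eq_low_shelf_gain': 0.0, 'comp_threshold': -45.0, 'comp_ratio': 1.0}
```

If the predicted values come back as a 1-D tensor, .tolist() turns it into the plain list this function expects.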

forward() got an unexpected keyword argument 'time_it'

If I run process.py with the "--time" option, it crashes with "forward() got an unexpected keyword argument 'time_it'":

process.py -i examples/voice_raw.wav -r examples/voice_produced.wav --time -c checkpoints/style/libritts/tcn1/lightning_logs/version_1/checkpoints/epoch=367-step=1226911-val-libritts-tcn1.ckpt

Exception has occurred: TypeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
forward() got an unexpected keyword argument 'time_it'
File "C:\Users\xxx\anaconda3\envs\afx\Lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
File "D:\DeepAFx-ST-0.1.0\scripts\process.py", line 130, in <module>
    y_hat, p, e, encoder_time_sec, dsp_time_sec = system(
File "C:\Users\xxx\anaconda3\envs\afx\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
File "C:\Users\xxx\anaconda3\envs\afx\Lib\runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
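The traceback shows process.py passing time_it to system(...), whose forward() in this version does not accept it, so the --time path and the model code appear to be out of sync. One workaround, assuming only the total wall-clock time is needed, is to drop the time_it argument and time the call from outside. timed_call is a hypothetical helper, and it reports encoder and DSP time together rather than the separate encoder_time_sec and dsp_time_sec values.

```python
import time
from typing import Any, Callable, Optional, Tuple


def timed_call(
    fn: Callable[..., Any],
    *args: Any,
    sync: Optional[Callable[[], None]] = None,
    **kwargs: Any,
) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds).

    sync, if given, is called before starting and before stopping the clock.
    On a GPU, passing torch.cuda.synchronize makes the measurement wait for
    queued kernels instead of timing only the kernel launches.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()
    return result, time.perf_counter() - start


result, seconds = timed_call(sum, [1, 2, 3])
print(result, seconds >= 0.0)  # 6 True
```

In process.py the same helper would wrap the existing system(...) call with its original inputs, minus the time_it keyword.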

Is it possible to change the resampling rate from 24000 to 44100 Hz?

Is it possible to change the resampling rate from 24000 Hz to 44100 Hz? Ideally, I would like to get the full 20 Hz to 20 kHz bandwidth.
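The bandwidth limit follows from the Nyquist frequency: a signal sampled at rate f_s can only represent content up to f_s / 2. At 24000 Hz that is 12 kHz, while 44100 Hz reaches 22.05 kHz and so covers 20 Hz to 20 kHz. Since process.py resamples both files to 24000 Hz before inference, the released checkpoints presumably operate at that rate; changing the number in the script alone does not retrain the model, and whether the pretrained weights behave well on 44.1 kHz input is not established here.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2.0


for sr in (24000, 44100, 48000):
    print(f"{sr} Hz sampling -> content up to {nyquist_hz(sr) / 1000:.2f} kHz")
# 24000 Hz sampling -> content up to 12.00 kHz
# 44100 Hz sampling -> content up to 22.05 kHz
# 48000 Hz sampling -> content up to 24.00 kHz
```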

Google Colab available?

Is the Google Colab notebook available anywhere? The link on GitHub does not work :/ Thanks!

About the variable 'input_audio_corrupt' in lines 240-241: shouldn't it be target_audio_corrupt?

DeepAFx-ST/deepafx_st/data/dataset.py

Lines 229 to 246 in 49bd0c8

    
    # ------------------------ Target audio ----------------------
    # use the same augmented audio clip, add different random EQ and compressor

    target_audio_corrupt = input_audio_aug.clone()
    # apply frequency and dynamic range corrpution (expander)
    if self.freq_corrupt and torch.rand(1).sum() < 0.75:
        target_audio_corrupt = augmentations.frequency_corruption(
            [target_audio_corrupt], self.sample_rate
        )[0]

    # peak normalize again before passing through dynamic range compressor
    input_audio_corrupt /= input_audio_corrupt.abs().max()
    input_audio_corrupt *= 10 ** (-12.0 / 20)  # with min 3 dBFS headroom

    if self.drc_corrupt and torch.rand(1).sum() < 0.75:
        target_audio_corrupt = augmentations.dynamic_range_compression(
            [target_audio_corrupt], self.sample_rate
        )[0]
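The reporter's reading matches the snippet: the comment and the compressor step that follows both concern target_audio_corrupt, but lines 240-241 rescale input_audio_corrupt, so the target is never renormalized before compression. A sketch of the presumably intended step as a standalone NumPy function; peak_normalize is a hypothetical name, and the zero-peak guard is an addition the original lines do not have. On the dataset's torch tensors, the in-place equivalent would be target_audio_corrupt /= target_audio_corrupt.abs().max() followed by target_audio_corrupt *= 10 ** (-12.0 / 20).

```python
import numpy as np


def peak_normalize(audio: np.ndarray, peak_db: float = -12.0) -> np.ndarray:
    """Scale audio so its largest absolute sample sits at peak_db dBFS.

    Same arithmetic as the two normalization lines in dataset.py, written
    as a function so it can be applied to the target clip.
    """
    peak = np.abs(audio).max()
    if peak == 0:
        return audio.copy()  # all-zero clip: avoid dividing by zero
    return audio / peak * 10 ** (peak_db / 20)


clip = np.array([0.5, -2.0, 1.0])
print(round(float(np.abs(peak_normalize(clip)).max()), 3))  # 0.251, i.e. -12 dBFS
```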

[Improvement] Increased sample rate to 44100 and added the ability to process entire files.

I attempted to improve DeepAFx-ST. Here's what I did.

Download the zip from https://github.com/adobe-research/DeepAFx-ST and extract it.

Open Notepad++, press Ctrl+Shift+F, search for 24000, replace it with 44100, set the directory, and click Replace in Files.

At this point you can safely add the checkpoints and examples.

Edit scripts/process.py
Replace x_44100 = torch.tensor(resampy.resample(x.view(-1).numpy(), x_sr, 44100)) with x_44100 = torch.tensor(resampy.resample(x.reshape(-1).numpy(), x_sr, 44100))
Under x_44100 = x_44100.view(1, -1) insert x_44100 = x_44100[0:1, : x_44100.shape[-1] // 2]
Under x_44100 = x insert x_44100 = x_44100[0:1, : x_44100.shape[-1]]
Replace r_44100 = torch.tensor(resampy.resample(r.view(-1).numpy(), r_sr, 44100)) with r_44100 = torch.tensor(resampy.resample(r.reshape(-1).numpy(), r_sr, 44100))
Under r_44100 = r_44100.view(1, -1) insert r_44100 = r_44100[0:1, : r_44100.shape[-1] // 2]
Under r_44100 = r insert r_44100 = r_44100[0:1, : r_44100.shape[-1]]

Remove x_44100 = x_44100[0:1, : 44100 * 5]
Remove r_44100 = r_44100[0:1, : 44100 * 5]

Replace filename = os.path.basename(args.input).replace(".wav", "") with filename = os.path.splitext(os.path.basename(args.input))[0]
Remove reference = os.path.basename(args.reference).replace(".wav", "")
Replace out_filepath = os.path.join(dirname, f"{filename}_out_ref={reference}.wav") with out_filepath = os.path.join(dirname, f"{filename}_DeepAFx-ST.wav")
Remove in_filepath = os.path.join(dirname, f"{filename}_in.wav")
Remove torchaudio.save(in_filepath, x_44100.cpu().view(1, -1), 44100)

You should be good to go!

It's possible this approach broke some things unrelated to processing.
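The switch from .replace(".wav", "") to os.path.splitext is what gives non-WAV inputs a clean output name: replace only strips the literal ".wav" (and would strip it from the middle of a name too), while splitext drops whatever extension the file has. A small sketch of the same naming scheme; output_path is a hypothetical helper, and it assumes the output is written next to the input file (adjust if dirname in process.py points elsewhere).

```python
import os


def output_path(input_path: str, suffix: str = "_DeepAFx-ST") -> str:
    """Build '<dir>/<stem><suffix>.wav' next to the input file."""
    dirname = os.path.dirname(input_path)
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(dirname, f"{stem}{suffix}.wav")


print(os.path.basename(output_path("examples/voice_raw.flac")))  # voice_raw_DeepAFx-ST.wav
print("voice_raw.flac".replace(".wav", ""))                     # voice_raw.flac (unchanged)
```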
