
adobe-research / deepafx-st


DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/

License: Other

Shell 4.63% Python 95.37%
afx ai audio audio-processing audio-production compressor deeplearning drc effects eq

deepafx-st's People

Contributors

csteinmetz1

deepafx-st's Issues

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Hello!

When I run the script:

(deepafx-st) E:\CODE\DeepAFx-ST>python scripts/process.py -i "E:\CODE\DeepAFx-ST\audio files\raw\160 JAZZ DNB2_MdcL3.wav" -r "E:\CODE\DeepAFx-ST\audio files\target\05 NOT TiGHT.wav" -c "E:\CODE\DeepAFx-ST\checkpoints\style\jamendo\autodiff\lightning_logs\version_0\checkpoints\epoch=362-step=1210241-val-jamendo-autodiff.ckpt"

I encounter a runtime error:

Resampling to 24000 Hz...
Traceback (most recent call last):
  File "scripts/process.py", line 89, in <module>
    x_24000 = torch.tensor(resampy.resample(x.view(-1).numpy(), x_sr, 24000))
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Any ideas how I can resolve this?

System specs:
Windows 10
Anaconda Python
cmd.exe
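
The error message itself points at one workaround: `.view(-1)` only works when the tensor's memory can be reinterpreted without copying, whereas `.reshape(-1)` copies when it has to. A minimal sketch, assuming the loaded audio arrives as a non-contiguous `(channels, samples)` tensor; the transposed tensor below is a stand-in for the real file, not output from `process.py`:

```python
import torch

# Stand-in for loaded stereo audio: transposing a (samples, channels)
# buffer gives a (channels, samples) tensor that is NOT contiguous.
x = torch.arange(8, dtype=torch.float32).reshape(4, 2).t()
print(x.shape, x.is_contiguous())

# .view(-1) cannot flatten this layout without a copy, so it raises.
try:
    x.view(-1)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# .reshape(-1) falls back to a copy and succeeds.
flat = x.reshape(-1)

# .contiguous().view(-1) is an equivalent, more explicit form.
flat_explicit = x.contiguous().view(-1)
```

Applied to line 89 of `scripts/process.py`, this would mean calling `x.reshape(-1)` in place of `x.view(-1)`.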

How can I process an entire audio file?

I want to process an entire audio file, but the code currently uses only five seconds from the input and reference.

When I comment out the following lines in process.py, it returns a processed file, but the audio seems to appear more than once in the output:
x_24000 = x_24000[0:1, : 24000 * 5]
r_24000 = r_24000[0:1, : 24000 * 5]
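
One possible explanation, offered as a guess rather than a diagnosis: if the file has two channels and the script flattens it to one dimension before resampling, the flattened signal holds channel 0 followed by channel 1. The `[0:1, : 24000 * 5]` crop keeps only the first five seconds, which hides this; without the crop the whole doubled signal reaches the output. A small NumPy sketch with made-up sample values:

```python
import numpy as np

sample_rate = 24000
seconds = 2  # arbitrary short clip for illustration

# Made-up stereo audio, shape (channels, samples).
left = np.full(sample_rate * seconds, 0.1, dtype=np.float32)
right = np.full(sample_rate * seconds, -0.1, dtype=np.float32)
stereo = np.stack([left, right])

# Flattening in row-major order places channel 0 before channel 1.
flat = stereo.reshape(-1)
print(flat.shape[0] / sample_rate)  # duration in seconds after flattening

# Treating the flattened signal as one mono channel doubles the length.
as_mono = flat.reshape(1, -1)

# Averaging the channels first keeps the original duration.
downmix = stereo.mean(axis=0, keepdims=True)
```

If this is what is happening, reducing the input to a single channel before flattening should remove the repetition.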

forward() got an unexpected keyword argument 'time_it'

If I run process.py with the "--time" option, it crashes with "forward() got an unexpected keyword argument 'time_it'":

process.py -i examples/voice_raw.wav -r examples/voice_produced.wav --time -c checkpoints/style/libritts/tcn1/lightning_logs/version_1/checkpoints/epoch=367-step=1226911-val-libritts-tcn1.ckpt

Exception has occurred: TypeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
forward() got an unexpected keyword argument 'time_it'
File "C:\Users\xxx\anaconda3\envs\afx\Lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\DeepAFx-ST-0.1.0\scripts\process.py", line 130, in
y_hat, p, e, encoder_time_sec, dsp_time_sec = system(
File "C:\Users\xxx\anaconda3\envs\afx\Lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\xxx\anaconda3\envs\afx\Lib\runpy.py", line 194, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
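
The traceback shows the module call passing its keyword arguments straight to `forward()`, so the error means the `forward()` being called does not declare `time_it`. A minimal sketch of that failure, and of a defensive call that forwards only supported keywords; `System` and `call_with_supported_kwargs` are hypothetical stand-ins, not code from DeepAFx-ST:

```python
import inspect


class System:
    """Hypothetical stand-in for a model whose forward() has no time_it."""

    def forward(self, x, r):
        return x + r

    def __call__(self, *args, **kwargs):
        # Mirrors how keyword arguments are handed on to forward().
        return self.forward(*args, **kwargs)


def call_with_supported_kwargs(module, *args, **kwargs):
    """Drop keyword arguments that module.forward() does not accept."""
    params = inspect.signature(module.forward).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return module(*args, **kwargs)


system = System()

try:
    system(1, 2, time_it=True)
    raised = False
except TypeError as err:
    raised = True
    print(err)

result = call_with_supported_kwargs(system, 1, 2, time_it=True)
```

Dropping the flag only avoids the crash. The call site in the traceback still unpacks five return values, so timing output would need a `forward()` that actually returns them.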

Google Colab available?

Is the Google Colab notebook available anywhere? The link on GitHub does not work. Thanks!

About the variable 'input_audio_corrupt' in lines 240-241, isn't it target_audio_corrupt?

# ------------------------ Target audio ----------------------
# use the same augmented audio clip, add different random EQ and compressor
target_audio_corrupt = input_audio_aug.clone()
# apply frequency and dynamic range corruption (expander)
if self.freq_corrupt and torch.rand(1).sum() < 0.75:
    target_audio_corrupt = augmentations.frequency_corruption(
        [target_audio_corrupt], self.sample_rate
    )[0]
# peak normalize again before passing through dynamic range compressor
input_audio_corrupt /= input_audio_corrupt.abs().max()
input_audio_corrupt *= 10 ** (-12.0 / 20)  # with min 3 dBFS headroom
if self.drc_corrupt and torch.rand(1).sum() < 0.75:
    target_audio_corrupt = augmentations.dynamic_range_compression(
        [target_audio_corrupt], self.sample_rate
    )[0]
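
If the intent is what the issue suggests, the peak normalization would act on the target clip rather than the input clip. A minimal sketch of that reading; `peak_normalize` is a hypothetical helper and the array is made-up data, with NumPy standing in for the tensors used in the quoted code:

```python
import numpy as np


def peak_normalize(audio, peak_dbfs=-12.0):
    """Scale audio so its largest absolute sample sits at peak_dbfs."""
    peak = np.abs(audio).max()
    if peak == 0:
        return audio.copy()  # silent clip: nothing to scale
    return audio / peak * 10 ** (peak_dbfs / 20)


# Made-up target clip standing in for target_audio_corrupt.
target_audio_corrupt = np.array([0.5, -2.0, 1.0])
target_audio_corrupt = peak_normalize(target_audio_corrupt)
print(np.abs(target_audio_corrupt).max())
```

The -12 dB level is taken from the `10 ** (-12.0 / 20)` factor in the quoted lines.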

[Improvement] Increased sample rate to 44100 and added the ability to process entire files.

I attempted to improve DeepAFx-ST. Here's what I did.

Download the zip from https://github.com/adobe-research/DeepAFx-ST and extract it.

Open Notepad++, press CTRL+SHIFT+F, enter 24000 in the find field and 44100 in the replace field, set the directory to the extracted folder, and click Replace in Files.

At this point you can safely add the checkpoints and examples.

Edit scripts/process.py
Replace x_44100 = torch.tensor(resampy.resample(x.view(-1).numpy(), x_sr, 44100)) with x_44100 = torch.tensor(resampy.resample(x.reshape(-1).numpy(), x_sr, 44100))
Under x_44100 = x_44100.view(1, -1) insert x_44100 = x_44100[0:1, : x_44100.shape[-1] // 2]
Under x_44100 = x insert x_44100 = x_44100[0:1, : x_44100.shape[-1]]
Replace r_44100 = torch.tensor(resampy.resample(r.view(-1).numpy(), r_sr, 44100)) with r_44100 = torch.tensor(resampy.resample(r.reshape(-1).numpy(), r_sr, 44100))
Under r_44100 = r_44100.view(1, -1) insert r_44100 = r_44100[0:1, : r_44100.shape[-1] // 2]
Under r_44100 = r insert r_44100 = r_44100[0:1, : r_44100.shape[-1]]

Remove x_44100 = x_44100[0:1, : 44100 * 5]
Remove r_44100 = r_44100[0:1, : 44100 * 5]
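
The `// 2` in the steps above appears to assume a two-channel file whose channels were concatenated by flattening; on a mono file the same slice would discard the second half of the audio. One alternative, sketched here with NumPy and made-up data rather than taken from the repository, is to pick a channel before flattening so the channel count never enters the arithmetic. `first_channel` is a hypothetical helper:

```python
import numpy as np


def first_channel(audio):
    """Return channel 0 of a (channels, samples) array as a contiguous 1-D array."""
    if audio.ndim != 2:
        raise ValueError("expected an array shaped (channels, samples)")
    return np.ascontiguousarray(audio[0])


mono = np.arange(6, dtype=np.float32).reshape(1, 6)
stereo = np.arange(12, dtype=np.float32).reshape(6, 2).T  # non-contiguous view

mono_out = first_channel(mono)
stereo_out = first_channel(stereo)
print(mono_out.shape, stereo_out.shape)
```

Both inputs keep their full length, which the fixed halving does not guarantee.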

Replace filename = os.path.basename(args.input).replace(".wav", "") with filename = os.path.splitext(os.path.basename(args.input))[0]
Remove reference = os.path.basename(args.reference).replace(".wav", "")
Replace out_filepath = os.path.join(dirname, f"{filename}_out_ref={reference}.wav") with out_filepath = os.path.join(dirname, f"{filename}_DeepAFx-ST.wav")
Remove in_filepath = os.path.join(dirname, f"{filename}_in.wav")
Remove torchaudio.save(in_filepath, x_44100.cpu().view(1, -1), 44100)
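
The switch from `.replace(".wav", "")` to `os.path.splitext` matters for inputs whose extension is not a lowercase `.wav`. A short sketch with made-up file names:

```python
import os

paths = ["clips/voice_raw.wav", "clips/drums.WAV", "clips/take.wav.bak.flac"]

for path in paths:
    base = os.path.basename(path)
    by_replace = base.replace(".wav", "")    # removes every ".wav" substring
    by_splitext = os.path.splitext(base)[0]  # removes only the final extension
    print(f"{base!r}: replace -> {by_replace!r}, splitext -> {by_splitext!r}")

replace_names = [os.path.basename(p).replace(".wav", "") for p in paths]
splitext_names = [os.path.splitext(os.path.basename(p))[0] for p in paths]
```

With `replace`, an uppercase or non-WAV extension stays in the output name, and a `.wav` in the middle of the name is silently removed.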

You should be good to go!

It's possible that this approach may have broken some things not related to processing.
