Audio Style Transfer

Introduction

Style transfer has been applied successfully in the image domain, a well-known example being the rendering of an arbitrary input image in the style of a Van Gogh painting [1]. The aim of this project is to adapt the style transfer concept to the audio domain. Specifically, we aim to transfer the style of one audio clip (preferably a song), labeled the "style", to another audio clip, labeled the "content", and to synthesize a new audio clip that has the general characteristics of the "style" while remaining faithful to the "content". Pursuing this goal is also a step toward understanding the features of raw music audio signals, such as style, melody, rhythm, and tempo.

Solutions proposed in the literature include using multiple time-frequency representations [2], the short-time Fourier transform with the Griffin-Lim algorithm [3], and shallow convolutional networks [4]. We aim to implement some of these methods, use their results as baselines, and try to improve on those baselines by using different features, methods, and models. We want to contribute to this relatively new field of research and produce interesting results that may bring more attention to the subject.
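
To make the short-time Fourier transform and Griffin-Lim step more concrete, the following is a minimal sketch of how a waveform might be recovered from a magnitude spectrogram alone. It is not the implementation used in [3]; the function names (hann, dft, stft, istft, griffin_lim), the frame size of 32, and the hop of 8 are assumptions chosen for illustration, and a naive DFT from the Python standard library stands in for the FFT routines a real project would likely take from a numerical library.

```python
import cmath
import math
import random


def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame, inverse=False):
    # Naive O(n^2) discrete Fourier transform (illustrative stand-in for an FFT).
    n = len(frame)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def stft(signal, n_fft=32, hop=8):
    # Windowed DFT of overlapping frames; returns a list of complex spectra.
    win = hann(n_fft)
    return [dft([signal[start + i] * win[i] for i in range(n_fft)])
            for start in range(0, len(signal) - n_fft + 1, hop)]


def istft(frames, n_fft=32, hop=8):
    # Least-squares overlap-add inverse, normalized by the summed squared window.
    win = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for m, spec in enumerate(frames):
        frame = dft(spec, inverse=True)
        for i in range(n_fft):
            out[m * hop + i] += frame[i].real * win[i]
            norm[m * hop + i] += win[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitude, n_iter=20, n_fft=32, hop=8, seed=0):
    # Start from random phase, then alternate between the time domain and the
    # STFT domain, keeping the target magnitude and only updating the phase.
    rng = random.Random(seed)
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame]
            for frame in magnitude]
    for _ in range(n_iter):
        reproj = stft(istft(spec, n_fft, hop), n_fft, hop)
        spec = [[m * (c / abs(c) if abs(c) > 1e-12 else 1.0)
                 for m, c in zip(mag_frame, cur_frame)]
                for mag_frame, cur_frame in zip(magnitude, reproj)]
    return istft(spec, n_fft, hop)


# Usage: discard the phase of a test tone, then estimate a waveform from
# the magnitude spectrogram only.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(256)]
target_magnitude = [[abs(c) for c in frame] for frame in stft(tone)]
reconstructed = griffin_lim(target_magnitude)
```

In a style transfer setting, the magnitude spectrogram passed to such a routine would be the one produced by the transfer model rather than one computed from an existing recording, which is why a phase reconstruction step of this kind is needed before the result can be played back.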