
amen's Introduction


amen

A toolbox for algorithmic remixing, after Echo Nest Remix.

Platforms

Amen is developed on Ubuntu 14.04 and higher. OS X should be workable. Windows users should install Ubuntu.

Installation

Amen is pretty simple, but it stands on top of some complex stuff.

If you are on OS X, go on to the Anaconda step below. If you are on Linux, you'll need to do some apt-getting:

  • libsndfile: sudo apt-get install libsndfile1
  • libavtools: sudo apt-get update && sudo apt-get install libav-tools

You should install Anaconda (https://www.continuum.io/downloads), which will get you all of the dependencies.

Then, install via pip: pip install amen. That should be it!

(If you're a serious Python person, you can just get Amen from pip, without Anaconda - but that will require installing numpy, scipy, a fortran compiler, and so on.)

Testing the Installation

After installation is finished, open up a Python interpreter and run the following (or run it from a file):

from amen.utils import example_audio_file
from amen.audio import Audio
from amen.synthesize import synthesize

audio_file = example_audio_file()
audio = Audio(audio_file)

beats = audio.timings['beats']
beats.reverse()

out = synthesize(beats)
out.output('reversed.wav')

If all that works, you just need to play the resulting reversed.wav file, and you're on your way!

Examples

We've got a few other examples in the examples folder - most involve editing a file based on the audio features thereof. We'll try to add more as we go.

Documentation

You can read the docs at http://amen.readthedocs.io/en/latest! You can also build the docs locally, using Sphinx. Just run make within the docs directory.

Contributing

Welcome aboard! Please see CONTRIBUTING.md, or open an issue if things don't work right.

Thanks

Amen owes a very large debt to Echo Nest Remix. Contributors to that most esteemed library include:

  • Chris Angelico
  • Yannick Antoine
  • Adam Baratz
  • Ryan Berdeen
  • Dave DesRoches
  • Dan Foreman-Mackey
  • Tristan Jehan
  • Joshua Lifton
  • Adam Lindsay
  • Alison Mandel
  • Nicola Montecchio
  • Rob Ochshorn
  • Jason Sundram
  • Brian Whitman

amen's People

Contributors

blacker, bmcfee, curly-mo, mikeill, mkanespotify, tkell


amen's Issues

beat timing is incorrectly quantized

I was playing with the reverse.py example, and noticed that things were sounding ... strange.

Looking into the code, I noticed this, which is incorrect. The problem here is that fix_frames is intended for use with frame indices, which must be integer typed. When you call this after mapping back to the time representation, everything gets rounded to the nearest second.

I'll fix and PR.

Make 0.0.1

As per the comments in #86, we should get to this soon!

Feature object

Starting a new thread to expand on the feature object design that I vaguely started in #4.

In the previous jam, we had features like 'timbre' and 'pitch' connected to segments. I'd like to abstract out features from timing in amen, since different features may come at different sampling rates. Here, I'm distinguishing feature observations like 'pitch' and 'timbre' from time-index observations, like 'beat' and 'segment'.

To make this all work, I'm thinking of the following design. First, features are stored as pandas dataframes with a time-valued index. This gives us a few nice features right off the bat:

  • direct connection between a feature observation and its position in the audio signal
  • sane column-headings, eg, pitch['D#'] instead of pitch[4]
  • numpy-friendly operations (slicing, math)
  • sql-like operations (joining, merging)
  • time-friendly resampling

Then, we can define a Feature class which wraps the dataframe, and provides a few extra operations:

  • re-indexing, relative to a TimeSlice collection. This way, we can index a feature object (say, pitch_class) by any type of time-interval indices (eg, 'beats' or 'segments')
  • data-dependent interpolation logic. Re-indexing will necessarily involve some statistical summarization if the time points span multiple values. Numeric types may be summarized by different statistics (mean, median, max); categorical types (eg, chord labels) may be summarized by mode. This definition will be part of the corresponding derived Feature class
  • iteration over observations (rows) of the dataframe

At the end of the day, the old style of

>>> [beat.pitches for beat in track.beats]

would look more like

>>> track.features.pitch[beats]

with the added benefit that indexing the Feature object pitch will return a new Feature object (with time indexing and column headers), rather than a flat list.
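
To make this concrete, here is a rough, hypothetical sketch of the wrapper (the class shape, the at() method, the aggregation hook, and the assumption that TimeSlice objects expose time and duration are all illustrative, not a committed API):

import pandas as pd

class Feature(object):
    def __init__(self, data, aggregate=pd.DataFrame.mean):
        self.data = data              # a DataFrame with a time-valued index
        self.aggregate = aggregate    # summarization used when re-indexing

    def at(self, time_slices):
        # Re-index against a TimeSlice collection: summarize the observations
        # that fall within each slice, and return a new Feature.
        rows = [self.aggregate(self.data.loc[ts.time : ts.time + ts.duration])
                for ts in time_slices]
        index = [ts.time for ts in time_slices]
        return Feature(pd.DataFrame(rows, index=index), self.aggregate)

    def __iter__(self):
        # iterate over observations (rows) of the underlying dataframe
        return self.data.iterrows()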

Loading Audio

..brought to you by a frustrating day reinstalling my Ubuntu partition.

For the moment, I think we just worry about loading WAV and MP3 - Brian's said that librosa can deal with both of them.

I feel like we want to do analysis = amen.load('some_audio_file.wav'). Does this generate all analyses that are possible? Or do we do something like:

audio = amen.load('some_audio_file')
audio.get_analysis('pitches')
audio.get_time_slices('beats') # this needs a better name
audio['pitches'].at(audio.beats)
# do something with beats and pitch analysis

The former has the advantage of being simple - even people who don't know what they're looking for can get analysis data with a single line of code. The latter has the advantage of being faster, more modular, and more specific.

Or, maybe we generate some basic things when we do amen.load (beats, pitches, etc?) - but if you want, say, some black-magic timbre analysis, you can run `audio.get_analysis('black_magic_timbre')`.

Installation test bug: TypeError: only integer scalar arrays can be converted to a scalar index

When running the installation test script as described on the README I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/amen/synthesize.py", line 100, in synthesize
    truncated_array = sparse_array[:, 0:max_samples].toarray()
  File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
    return self._get_row_ranges(i, j)
  File "/Library/Python/2.7/site-packages/scipy/sparse/lil.py", line 320, in _get_row_ranges
    j_start, j_stop, j_stride = col_slice.indices(self.shape[1])
TypeError: only integer scalar arrays can be converted to a scalar index

I was able to trace this down to the max_samples array not being converted to an integer index when used in sparse_array[:, 0:max_samples].toarray(). I was able to fix the issue on my local system by changing this to sparse_array[:, 0:max_samples[0]].toarray().

Not sure if this is an error associated with my installation or with the code here. This is a fresh install of Anaconda on Python 2.7.
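
For reference, here's a minimal sketch of the failure mode and the local fix described above (the shapes are made up for illustration):

import numpy as np
from scipy.sparse import lil_matrix

sparse_array = lil_matrix((2, 1000))
max_samples = np.array([500])  # a one-element array, as in the traceback

# sparse_array[:, 0:max_samples]  # raises the TypeError above
truncated_array = sparse_array[:, 0:max_samples[0]].toarray()  # scalar index: works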

Performance Issues

...is my computer amazingly slow? How long should it take for librosa to analyze a five-minute WAV file?

I am getting something like 45 seconds to a minute to create an Audio object, and even longer for my apparently awful synthesis code to run. Has anyone had comparable experiences?

Per-TimeSlice timestretching / effects

Needed to emulate the Swinger! See #104 and #66 and #105 for current thinking and code around this.

It may be helpful to allow a per-TimeSlice timestretch without impacting the entire parent Audio object - and maybe this happens at synthesis time? Unsure yet.

Deformation architecture

How do we want/expect people to manipulate audio within amen?

The synthesize function is great for re-arranging a clip by timing, but doesn't give us a handle on how to do things like, say, vocal subtraction or time-stretching.

Do we want to provide an object interface for this kind of thing? Or just let folks hack functions themselves? Either way, I think we should not support/allow in-place modification of the audio buffers, since it would either trigger an (expensive) feature analysis or have inconsistent results.

For example, a time-stretcher might look something like:

import pyrubberband as pyrb
from amen.audio import Audio

def amen_time_stretch(audio, rate=1.0):
    y_stretch = pyrb.time_stretch(audio.raw_samples, audio.sample_rate, rate=rate)
    return Audio(raw_samples=y_stretch,
                 sample_rate=audio.sample_rate,
                 analysis_sample_rate=audio.analysis_sample_rate)

This is pretty simple, but it bothers me that you have to access the Audio object's internals directly and propagate them manually. Maybe that's the only way though?

More generally, I could imagine effects that return multiple clips (eg, source separation), so a consistent object interface might be tricky to pull off here.

version 0.0.0

pip install is failing to pick up new library changes because the version never changes; it is always 0.0.0.

To install, I cannot use pip install amen; I instead need to use pip install git+git://github.com/algorithmic-music-exploration/amen

Accelerating various ops

Just jotting this down before I forget.

librosa 0.5 will add dynamic time warping (not totally relevant for amen), and as a side-effect, optional numba jit compilation for certain methods.

This should make it much easier to accelerate certain bottleneck ops like zero-crossing alignment.

Video support?

I was kind of disappointed when I saw this as the successor to the Remix API, only to find it didn't have a feature that I wanted to try out. So far, importing an MP4 works and everything, all up until the export process. I'd love to see this feature implemented.

Feature: Segments

As per @Cortexelus on the timbre thread, it would be nice to have Echo Nest style segments as a Timing.

Will doing this make anything weird, in terms of how we handle Feature data with pandas? I don't think so.

Porting from Echonest

Thank you! We're porting a web app from Echonest that is basically a fork of P. Sobot's Forever.fm and a couple of features we require are:

  1. Crossfade (which was coming from Action/cAction)
  2. AudioData (which I grabbed from https://github.com/echonest/remix/blob/master/src/echonest/remix/audio.py, along with the ffmpeg wrapper it depended on)

We were using a few other Capsule functions as well and I'm wondering if there are any plans to incorporate any of these remix helpers into Amen.

If it does make sense to include AudioData (and even AudioStream) I can add them to my fork and submit a PR.

Default DSP Settings

Librosa has defaults of a 22050 sample rate, and a hop length of 512.

Brian commented that we may eventually use other feature generation / analysis tools that have other defaults, and that we should consider that.

I feel like we can worry about that when we get there, myself?
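
For reference, here's how those defaults show up in librosa calls (a minimal sketch; 'some_file.wav' is a placeholder):

import librosa

# librosa.load resamples to 22050 Hz unless told otherwise
y, sr = librosa.load('some_file.wav')                          # sr == 22050
y_native, sr_native = librosa.load('some_file.wav', sr=None)   # keep the file's native rate

# frame-based features default to a hop length of 512 samples
chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=512)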

Remix interface

[documenting offline conversation with @blacker ]

Some quick thoughts about how the interface for synthesizing waveforms should look.

  • We'll need some kind of audio container object that includes an audio buffer + metadata (sampling rate, # channels, duration, etc)
  • Synthesis happens via generators, eg something like this for a beat-reverser:
>>> def my_generator(track):
    start = track.duration
    for beat in track.beats[::-1]:
        start = start - beat.duration
        yield start, beat
>>> syn = synthesize(my_generator(track), duration=track.duration)
  • The generator returns an audio container corresponding to a particular sample, and a target position for the sample in the output buffer
  • The synthesize function iterates over the generator and adds samples into the output stream. It returns a new audio object (I guess, an audio container object itself). Stereo/resampling/zc-alignment are all handled within synthesize

This makes it easy to do concatenative synthesis (as above). You can also do additive mixing by having overlapping target times.

Code review process

Just a heads up:

I took the liberty of adding review ninja on here, in an effort to better keep track of what's already been reviewed via PR status.

If people hate it, we can shut it off.

Feature: Bars

@bmcfee had some thoughts about this - I feel like we said that we would not use librosa for this?

reversal example fails in the wild

$ python reverse.py ~/git/librosa/tests/data/test1_44100.wav
/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/compressed.py:739: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
Traceback (most recent call last):
  File "reverse.py", line 19, in <module>
    out = synthesize(beats)
  File "/home/bmcfee/git/amen/amen/synthesize.py", line 92, in synthesize
    sparse_array[1, right_start:right_end] += resampled_audio[1]
  File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 272, in __getitem__
    return self._get_row_slice(row, col)
  File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 353, in _get_row_slice
    row_slice = self._get_submatrix(i, cslice)
  File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 420, in _get_submatrix
    check_bounds(j0, j1, N)
  File "/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/scipy/sparse/csr.py", line 415, in check_bounds
    " %d <= %d" % (i0, num, i1, num, i0, i1))
IndexError: index out of bounds: 0 <= 52919990 <= 52920000, 0 <= 88360 <= 52920000, 52919990 <= 88360

Feature: Loudness

I am currently giving us amplitude by using librosa.feature.rmse. Please close if I am doing the right thing!
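
For reference, a minimal sketch of that call ('some_file.wav' is a placeholder; newer librosa releases rename rmse to rms):

import librosa

y, sr = librosa.load('some_file.wav')
amplitude = librosa.feature.rmse(y=y)  # shape (1, n_frames), one RMS value per frame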

Pip install for 0.0.1 fails

FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/2c/99n_4g3n0ml7d1gz40y77jc00000gn/T/pip-build-5nik37q_/amen/examples'

Ooops. Working on it.

Analysis object

Let's start scoping this thing out!

What functionality does the Analysis object need to provide, and what should the interface look like?

For now, let's not limit ourselves to compatibility with EN remix. Backwards compatibility can always be tacked on with a translation layer. I'm more interested in making sure the core is well designed and extensible in the right kinds of ways.

I'll start off a check-list of features it should expose, but the interface can come later.

[EDIT: 2015-06-06, restructured the feature list by type]

  • Features (time-indexed)
    • tempo
    • key
    • time signature
    • timbre (ie MFCC)
    • pitch class (ie chroma)
    • pitch (ie cqt)
    • loudness (log-RMSE)
  • Timings
    • onsets and/or segments/tatums
    • beat
    • bar
    • structural boundaries

Blue-sky feature wish list:

  • instrument activation
  • chord estimations
  • melody/f0 contours
  • high-level rhythmic analysis

Some general design principles:

  • all timing measurements should be aligned to the closest zero-crossings, though I'm not sure how to reconcile that with stereo-vs-mono. Anyone have thoughts on this?
  • feature extractors should be modular and pluggable. I could imagine wanting to refine a particular model's implementation without changing its interface. Extractors should therefore include semantic versioning in their metadata. I have a prototype of something like this in seymour, but it could be done much better.
  • In the interest of minimizing redundancy, it might be good to design so that any feature (pitch, pitch class, timbre, etc) can be sampled relative to any timing (track, beat, onset) with appropriate aggregation (mean, max, median, etc). I'm not sure what exactly this means for serialization of analysis objects.
  • As we've discussed offline, I'm planning to implement a suite of analysis modules that live on top of librosa, but in a separate package. Probably that package should be a dependency for amen once it stabilizes, since analysis is useful in broader contexts than remix.

Releasing 0.0.1?

How far out from this are we? We clearly don't have feature extraction comparable to the old remix, but is it worth announcing it and putting it on PyPI anyway?

Related to this is that the Monthly Music Hackathon for February is Automatic Music, so we could announce it as one of the talks.

Thoughts?

TimeSlice.get_audio()

In implementing a few remix hacks yesterday, I kept finding myself wanting to construct a new Audio object from a time slice. Currently, the only way to do this is to extract the waveform with get_samples() and then instantiate a new Audio object.

This is undesirable for a few reasons:

  1. It's clunky
  2. It triggers a full copy and re-analysis of the audio

1 is okay, but 2 is a deal breaker if you're extracting short clips (eg beats), which may be too short for certain analyses to make sense.

What do folks think about making a shortcut for this kind of operation that propagates features (and timings) from the source audio of a time slice? This way, we can also preserve things like beat timings within a sliced interval, which might come out differently if the interval is analyzed independently of the full track.

If we're careful about things, the audio buffer could also be shared between audio objects by slicing.
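
For context, the current round trip looks roughly like this (a sketch, reusing the constructor arguments shown elsewhere in these issues; the indexing into timings['beats'] is illustrative):

from amen.audio import Audio
from amen.utils import example_audio_file

audio = Audio(example_audio_file())
beat = audio.timings['beats'][0]

# the clunky path: pull out the samples, then re-analyze them as a brand-new Audio
samples, left_offsets, right_offsets = beat.get_samples()
clip = Audio(raw_samples=samples, sample_rate=audio.sample_rate)  # triggers a full analysis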

Iterating over features?

One of my favorite things about Remix was the ability to do for beat in beats, and so on.

We don't currently have an iterator over features, e.g. for amp in amplitudes.

Likewise, we don't have an easy way to get TimeSlices along with the features we want to use to manipulate them, unless we do something like:

amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in zip(amps, audio.timings['beats']):
    # do things to each beat based on feature

I feel like we should:
a) make the data in the dataframe of a feature iterable.
b) Allow a feature to reference its timings. Something like feature.with_time(), maybe?

To contrast:

amps = audio.features['amplitude'].at(audio.timings['beats'])
for feature, beat in amps.with_time():
    # do things to each beat based on feature

A problem with that is that feature objects that have not been resampled do not have any TimeSlices to reference.

Thoughts?

Feature: Onsets

I suspect we can get this with librosa.onset.onset_detect
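
If so, a minimal sketch would look something like this ('some_file.wav' is a placeholder):

import librosa

y, sr = librosa.load('some_file.wav')
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)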

Blue Sky Features

As per #4. Let's do these after we do all the other ones.

Blue-sky feature wish list:

  • instrument activation
  • chord estimations
  • melody/f0 contours
  • high-level rhythmic analysis

zero-crossing alignment walks off the end of the sample buffer

/home/bmcfee/git/amen/amen/synthesize.py in synthesize(inputs)
     66     for i, (time_slice, start_time) in enumerate(inputs):
     67         # if we have a mono file, we return stereo here.
---> 68         resampled_audio, left_offset, right_offset = time_slice.get_samples()
     69 
     70         # set the initial offset, so we don't miss the start of the array

/home/bmcfee/git/amen/amen/timing.py in get_samples(self)
     35         left_offsets, right_offsets = self._get_offsets(starting_sample,
     36                                                         ending_sample,
---> 37                                                         self.audio.num_channels)
     38 
     39         samples = self._offset_samples(starting_sample, ending_sample,

/home/bmcfee/git/amen/amen/timing.py in _get_offsets(self, starting_sample, ending_sample, num_channels)
     60                 ending_offset = 0
     61             else:
---> 62                 ending_crossing = zero_index[bisect_right(zero_index, ending_sample)]
     63                 ending_offset = ending_crossing - ending_sample
     64 

IndexError: index 355701 is out of bounds for axis 0 with size 355701

I think the problem here is that bisect_right(arr, x) can return len(arr) if x > arr[i] for all i. We can detect this case and fall back to bisect_left (or just set it to the last element).
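
A minimal sketch of that guard (the helper name is hypothetical):

from bisect import bisect_right

def ending_crossing_index(zero_index, ending_sample):
    # bisect_right can return len(zero_index) when ending_sample is past every
    # zero crossing; clamp to the last element in that case.
    i = bisect_right(zero_index, ending_sample)
    if i >= len(zero_index):
        i = len(zero_index) - 1
    return i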

Effects

This is related to #66, but is a bit different: if I want to put an effect (a delay or a compressor or a pitch shifter) on to a certain chunk of audio, how do I do that?

I think a decent answer is to build up effect chains that can then be applied to a given TimeSlice, and that are not applied until synthesize is called. So track-level deformations trigger a new analysis, but you can also just put an EQ on some signal without making a new Audio object.
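
One hypothetical shape for this, just to make the idea concrete (none of these names exist in amen today):

class EffectChain(object):
    """A deferred list of effects, applied to a TimeSlice's samples at synthesis time."""

    def __init__(self, *effects):
        # each effect is a callable: (samples, sample_rate) -> samples
        self.effects = effects

    def apply(self, samples, sample_rate):
        for effect in self.effects:
            samples = effect(samples, sample_rate)
        return samples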

TimeSlice.get_samples() minor error

I'm not sure this logic is entirely correct for mapping slice points to zero crossings.

The bisection search finds the insertion index i of a value v into a sorted list a, but does not tell you which of a[i-1], a[i] is closer to v.

For example, if a = [10, 20], both 11 and 19 have insertion index of 1, but the closest value positions are different.

This is easily fixable, and probably doesn't matter much in practice anyway.
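
For completeness, a minimal sketch of the fix (hypothetical helper name):

from bisect import bisect_left

def nearest_index(a, v):
    # return the index of the element of the sorted list a that is closest to v
    i = bisect_left(a, v)
    if i == 0:
        return 0
    if i == len(a):
        return len(a) - 1
    return i if (a[i] - v) < (v - a[i - 1]) else i - 1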

More Examples!

  • Combining two tracks
  • Chaining functions
  • Sorting
  • Echo Nest json

Feature: Tempo

We summon this when computing beats - we just need to add it to Audio, so we can do audio.tempo
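
For reference, librosa already hands back the tempo alongside the beat frames (a minimal sketch; 'some_file.wav' is a placeholder):

import librosa

y, sr = librosa.load('some_file.wav')
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)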

Feature: Key

I guess the parent Audio object should have a key?

Or should we be cute and make it computable per-TimeSlice?

Feature: Pitch Class

Started in #40. I think the only open question is what we should name the keys in the FeatureCollection.

Synthesize fails on mono tracks.

As per @bmcfee:

Ok, my original bad test case works now, but this one fails

→  python reverse.py  ~/data/CAL500/mp3/art_tatum-willow_weep_for_me.mp3 
Traceback (most recent call last):
  File "reverse.py", line 19, in <module>
    out = synthesize(beats)
  File "/home/bmcfee/git/amen/amen/synthesize.py", line 64, in synthesize
    resampled_audio, left_offsets, right_offsets = time_slice.get_samples()
  File "/home/bmcfee/git/amen/amen/time.py", line 32, in get_samples
    left_offsets, right_offsets = self._get_offsets(starting_sample, ending_sample)
  File "/home/bmcfee/git/amen/amen/time.py", line 46, in _get_offsets
    zero_crossings = librosa.zero_crossings(channel)
  File "/home/bmcfee/git/librosa/librosa/core/audio.py", line 526, in zero_crossings
    y[np.abs(y) <= threshold] = 0
TypeError: 'numpy.float32' object does not support item assignment

Can we expand the test fixtures here to have both stereo and mono examples?

(Yes, yes we can.)
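
For what it's worth, a minimal sketch of one way to normalize the mono case before the zero-crossing step (the buffer here is a stand-in):

import numpy as np
import librosa

raw_samples = np.zeros(1024, dtype=np.float32)   # stand-in for a mono buffer
samples = np.atleast_2d(raw_samples)             # (n,) becomes (1, n); stereo stays (2, n)
for channel in samples:
    crossings = librosa.zero_crossings(channel)  # channel is always a 1-d array here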

Package examples with installation

I am open to not doing this, but it feels sort of nice to give such things to people.

On the other hand, they'll be down in some awkward place, and everyone can just copy the example code from here.
