grrrr / nsgt
Non Stationary Gabor Transform (NSGT), Python implementation
Home Page: http://grrrr.org/research/software/nsgt
License: Artistic License 2.0
Observing c, created on line 108, I see it generates objects of dimension 1 x 66 (a 2-d array), so the function complains that index [2] is out of range. This error may well be due to my misunderstanding of the arguments.
$ ./spectrogram.py '/path/to/wav/audio.wav'
Traceback (most recent call last):
File "./spectrogram.py", line 112, in <module>
coefs = assemble_coeffs(c, ncoefs)
File "./spectrogram.py", line 27, in assemble_coeffs
out = np.empty((ncoefs,cq0.shape[1],cq0.shape[2]), dtype=cq0.dtype)
If it helps, here is info about the wav file being input:
$ soxi 'MASS/sargon-silenci (metal)/sargon-silenci_1-19_with_effects.wav'
Input File : 'MASS/sargon-silenci (metal)/sargon-silenci_1-19_with_effects.wav'
Channels : 2
Sample Rate : 44100
Precision : 24-bit
Duration : 00:00:18.00 = 793801 samples = 1350 CDDA sectors
File Size : 4.76M
Bit Rate : 2.12M
Sample Encoding: 24-bit Signed Integer PCM
Seeing as the default for --sr is 44100, I thought it would be fine to run with no additional args set.
I couldn't get PyFFTW3 installed on my machine, but pyFFTW is an alternative wrapper for FFTW that appears better maintained.
You can see a speed comparison of the two wrappers here; they're more or less equivalent:
mperrin/webbpsf#10
I'll submit a pull request in <1 week to get it working.
A very useful feature would be a standalone spectrogram function. For example:
nsgt = CQ_NSGT(lowf, fmax, bins, fs, length)
B = nsgt.forward(sig)
nsgt.spectrogram(B)
It could use matplotlib, or any common plotting library, on the back end. I had a difficult time working through spectrogram.py while trying to plot my numpy array of magnetic field data.
Another very useful function would be a coherence-gram.
Love your work!
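For what it's worth, a minimal sketch of what such a helper could look like; it assumes forward() yields one 1-D complex coefficient array per frequency band (the jagged, non-matrixform layout), zero-pads them into a rectangular dB-magnitude image, and leaves the actual plotting to matplotlib. This is a hypothetical helper, not part of the nsgt API:

```python
import numpy as np

def spectrogram(coefs, eps=1e-10):
    """Zero-pad jagged NSGT coefficients into a rectangular dB-magnitude matrix.

    `coefs`: list of 1-D complex arrays, one per frequency band, of varying
    lengths. (Hypothetical helper, not part of the nsgt API.)
    """
    n = max(len(c) for c in coefs)
    mag = np.zeros((len(coefs), n))
    for i, c in enumerate(coefs):
        mag[i, :len(c)] = np.abs(c)
    return 20 * np.log10(mag + eps)
```

Plotting would then be a one-liner, e.g. plt.imshow(spectrogram(B), origin='lower', aspect='auto').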
Hello,
The shape returned by nsgt.forward
is jagged (i.e. different lengths of nsgt coefficients per frame) - which makes sense, given the varying time-frequency resolution of the NSGT.
However, in MATLAB (whose implementation comes from the same NSGT paper: https://www.mathworks.com/help/wavelet/ref/cqt.html), the cfs matrix returned is rectangular.
Do you know if this is an implementation choice to pad zeros to the maximum length of nsgt coefficients? It's more convenient to be able to treat the CQT output like an STFT (rectangular matrix) and apply some operations on it (masking, filtering, etc.).
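One workaround, assuming forward() yields one 1-D complex array per band, is to pad to a rectangular matrix for STFT-style processing (masking, filtering) and truncate back to the jagged layout before calling backward(). A sketch (these helpers are hypothetical, not part of nsgt):

```python
import numpy as np

def to_matrix(coefs):
    """Pad jagged per-band coefficient arrays into a rectangular matrix.
    Returns the matrix plus the original lengths, needed to undo the padding."""
    n = max(len(c) for c in coefs)
    mat = np.zeros((len(coefs), n), dtype=complex)
    for i, c in enumerate(coefs):
        mat[i, :len(c)] = c
    return mat, [len(c) for c in coefs]

def to_jagged(mat, lengths):
    """Truncate each row back to its original length, restoring the layout
    that backward() expects."""
    return [mat[i, :n].copy() for i, n in enumerate(lengths)]
```

The round trip to_jagged(*to_matrix(coefs)) reproduces the original coefficients, so whatever is done to the matrix in between is what actually alters the signal.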
Hello
Thanks a lot for the nice package! I am looking for a lossless way to convert an audio signal to a spectrogram (on a mel scale) and back to an audio signal again. This seems to be a great solution.
However, after calling assemble_coeffs() to construct a spectrogram from the CQT slices, how do I transform the spectrogram back into CQT slices (and thus use backward() to get an audio signal again)? The overlap-add operation in assemble_coeffs() seems to be irreversible.
Sorry for the dumb question, but it will be great if you could help on this. Thanks!
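To see why a plain overlap-add cannot be undone on its own, here is a toy demonstration with made-up two-sample "slices" (not the real nsgt data layout): two different pairs of slices produce the identical summed output, so the individual slices are not recoverable from the assembled spectrogram alone. The robust route is to keep the original CQT slices around and call backward() on those.

```python
import numpy as np

def overlap_add(x, y, hop=1):
    # sum two overlapping segments, as an assemble-style overlap-add would
    out = np.zeros(len(x) + hop)
    out[:len(x)] += x
    out[hop:] += y
    return out

# two *different* pairs of slices...
a1, a2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
b1, b2 = np.array([1.0, 3.0]), np.array([2.0, 4.0])

# ...yield the very same overlap-added result: the sum is many-to-one
print(np.allclose(overlap_add(a1, a2), overlap_add(b1, b2)))  # True
```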
I haven't tried using NSGT in Python3, but would you be open to pull requests for any necessary changes, if any, to make it Py3 compatible?
Hi, my GitHub:
https://github.com/falseywinchnet/streamcleaner
Recently I was reading about Gabor filters and learned of your excellent constant-Q Gabor transform.
Not sure I am using it right, but, e.g.:
from nsgt import NSGT, OctScale
rate = 48000
test = data[0:rate]  # data: 1-D audio array loaded elsewhere
scl = OctScale(60.41, 22050, 48)
nsgt = NSGT(scl, fs=rate/2, Ls=rate, real=True, matrixform=True, reducedform=0)
# forward transform
c = nsgt.forward(test)
# inverse transform
s_r = nsgt.backward(c)
It complains about the Q factor being too high:
/usr/local/lib/python3.9/dist-packages/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 60.41,61.29,62.18,63.08,64.00,64.93,65.87,66.83,67.80,68.78,69.78,70.80,71.83,72.87,73.93,75.00,76.09,77.20,78.32,79.46,80.61,81.78,82.97,84.18,85.40,86.64,87.90,89.18,90.47,91.79,93.12,94.48,95.85,97.24,98.65,100.09,101.54,103.02,104.51,106.03,107.57,109.14,110.72,112.33,113.96,115.62,117.30,119.00,120.73,122.49,124.26,126.07,127.90,129.76,131.65,133.56,135.50,137.47
but it seems to reconstruct OK (max error 4.298e-16).
Please advise on settings for speech, with an emphasis on SSB (<4 kHz).
Something I noted, and was meaning to ask about, is what I will call (for lack of a better term) "transform mapping".
That is to say, different transforms which are all invertible will have different certainty in time and frequency, such that something you do in one will show up in the other.
For example, in the conventional STFT form (e.g. NFFT=512, hop=128), a typical speech waveform will have harmonics buried in the noise and indistinguishable from it.
However, if the audio is first transformed into the Gabor representation as described, then thresholded below the lowest harmonic (using a statistical approach I developed called atd), and then this is inverted and transformed back into the STFT, these tertiary harmonics now dominate the residual structure of the corresponding regions of the spectrogram, making them visible.
So each bin, or region of bins, in one representation maps (in a complex manner, because multiple convolutional steps are involved) to a bin, or region of bins, in another representation. If, for each bin in the STFT, you can identify the set of corresponding Gabor bins, take the maximum value over those bins, and map it back to the corresponding STFT bin, additional structure and dimensionality not typically apparent will manifest in the data set due to the emphasis applied (and as such may be useful for masking).
Could this, in turn, allow the use of perfect reconstruction (in the STFT domain) together with additional alternative transforms, to better signify the time-frequency localization of energy forms which are buried (convolutionally distributed) in the STFT form?
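The bin-region mapping with a max could be sketched roughly like this, under the simplifying assumption that both representations have already been reduced to magnitude matrices (bands x frames) and that a crude proportional index mapping stands in for the true (convolutional) correspondence:

```python
import numpy as np

def max_remap(gabor_mag, stft_shape):
    """For each STFT bin, take the max over the Gabor bins that map onto it.

    `gabor_mag` and the target are assumed to be magnitude matrices
    (bands x frames); the proportional mapping here is purely illustrative
    of the "bin region -> bin region" idea.
    """
    F, T = stft_shape
    Fg, Tg = gabor_mag.shape
    out = np.empty((F, T))
    for f in range(F):
        f0, f1 = f * Fg // F, max((f + 1) * Fg // F, f * Fg // F + 1)
        for t in range(T):
            t0, t1 = t * Tg // T, max((t + 1) * Tg // T, t * Tg // T + 1)
            out[f, t] = gabor_mag[f0:f1, t0:t1].max()
    return out
```

A faithful mapping would of course come from the actual filter-bank geometry rather than this proportional resampling.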
I am presently examining this scenario, but I would appreciate a little insight on how best to use nsgt for speech.
Additionally, I noted that when I applied my time-domain masking method (called fast_entropy) to the Gabor short-time transform, then inverted the remaining bins and once again applied the STFT, some of the dominant frequency components (the speech) which had been masked in the time domain were, due to the complex reconstructive properties, reconstructed into the waveform by the inversion, so the final product was improved further.
I will have to do more research on this.
I read somewhere that the Gabor transform uses a Gaussian function. I have developed an interesting alternative function which is not suitable for reconstruction, but which corresponds to maximal energy localization and minimal distortion in the complex domain. You might use it, e.g., to generate a mask of the same dimensions, and then apply that mask to the representation generated with an invertible window (perhaps using a synthesis window as well).
This window is a double inverted logit window, the code for which is :
https://github.com/falseywinchnet/streamcleaner/blob/master/realtime_interactive.py#L102
I would like to know if I can combine this with the Gabor transform.
It seems that the Gabor transform uses the Hann window:
https://github.com/grrrr/nsgt/blob/master/nsgt/nsgfwin.py
but does not yet (possibly?) make use of improved-reconstruction synthesis windows:
https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.transform.stft.html#pyroomacoustics.transform.stft.compute_synthesis_window
(see the logic there).
Also, does the Gabor transform suffer from the frequency instability mentioned in
https://dsp.stackexchange.com/questions/72588/synchrosqueezed-stft-phase-transform/72590#72590 ?
In terms of the practical ramifications or modifications necessary, DFT cisoid centering is applied simply by padding the input, then basically stacking each segment half backwards, then windowing with an ifftshift-ed window, and then, for the inverse, applying fftshift on each segment:
https://github.com/falseywinchnet/streamcleaner/blob/master/realtime.py#L350
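That recipe reads, in numpy terms, roughly like the following sketch (my paraphrase of the steps above, not the streamcleaner code itself):

```python
import numpy as np

def centered_fft(frame, window):
    """Analysis: window the segment, then ifftshift so the frame's center
    lands at index 0 (DFT "cisoid centering")."""
    return np.fft.rfft(np.fft.ifftshift(frame * window))

def centered_ifft(spectrum, n):
    """Synthesis: invert the FFT, then fftshift the segment back."""
    return np.fft.fftshift(np.fft.irfft(spectrum, n=n))
```

With a window of ones the round trip is exact; with a real analysis window you would additionally divide out (or overlap-add) the window as usual.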
best regards
Hello,
I'm wondering about the design choice that the bins argument means bins per octave:
So, ultimately OctScale(32.7, 22050, 12)
and MelScale(32.7, 22050, 12)
end up with 123 and 12 frequency bins respectively.
Is there a way we can have the OctScale use the bins as the actual n_bins output, and not bins_per_octave?
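As a stopgap, one can invert the convention by hand; a hypothetical helper (the exact bin count produced by OctScale may still differ by an endpoint or two):

```python
import math

def bins_per_octave_for_total(fmin, fmax, n_bins_total):
    """Pick the bins-per-octave value that yields roughly n_bins_total
    bins over the range [fmin, fmax]. (Hypothetical helper.)"""
    n_octaves = math.log2(fmax / fmin)
    return max(1, round(n_bins_total / n_octaves))
```

E.g. asking for ~113 bins between 32.7 Hz and 22050 Hz (about 9.4 octaves) gives 12 bins per octave.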
Hi!
I had to change the lambda expressions at lines 91 and 94 of nsgt/nsigtf.py (commit f00267f):
diff --git a/nsgt/nsigtf.py b/nsgt/nsigtf.py
index d441514..259f313 100644
--- a/nsgt/nsigtf.py
+++ b/nsgt/nsigtf.py
@@ -88,10 +88,15 @@ def nsigtf_sl(cseq, gd, wins, nn, Ls=None, real=False, reducedform=0, measurefft
         fftsymm = lambda c: np.hstack((c[0],c[-1:0:-1])).conj()
         if reducedform:
             # no coefficients for f=0 and f=fs/2
-            symm = lambda fc: chain(fc, map(fftsymm,fc[::-1]))
+            def symm(_fc):
+                fc = list(_fc)
+                return chain(fc, map(fftsymm, fc[::-1]))
+
             sl = lambda x: chain(x[reducedform:len(gd)//2+1-reducedform],x[len(gd)//2+reducedform:len(gd)+1-reducedform])
         else:
-            symm = lambda fc: chain(fc,map(fftsymm,fc[-2:0:-1]))
+            def symm(_fc):
+                fc = list(_fc)
+                return chain(fc, map(fftsymm, fc[-2:0:-1]))
             sl = lambda x: x
     else:
         ln = len(gd)
to make the tests work in the version that relies on numpy only, and nothing else.
Cheers,
-r
Let's say I have a signal and want to scale 2x along time (make it last twice as long).
Should I just zero-pad the input to double its length, take the NSGT coefficients, fill them in, and then invert? Or is there a smarter way to go about this?
Hi,
thanks for this repository!
I am trying to analyse a speech sample of 3s duration and 24kHz sampling rate but I'm a bit puzzled at the choice of some parameters.
First this is the classical mel spectrogram of the signal (estimated by librosa melspectrogram):
and this is the result of the cqt (librosa.cqt):
I'd basically like to have the same result as the cqt (I'd like to be able to separate the glottal pulses in the high frequencies), but with a mel scale and the possibility of inversion that comes with the nsgt framework. However, I cannot figure out the right choice of parameters that would give me results similar to the cqt display above, but on a mel scale. This is the best result that I could get
with the following arguments
python spectrogram.py sample.wav --sr 24000 --fmin=80 --fmax=10000 --bins=202 --real --scale=mel --reducedform=1 --fps=100 --plot --matrixform --trlen 200 --sllen 7200
It is kind of similar to the mel spectrogram above and does not have the fine temporal resolution that I am looking for in the high frequencies. I understand that the choice of sllen and trlen is crucial here, but what should guide the choice?
Also, I gather that matrixform is the parameter used to keep the same time division among frequency bins, by pooling the bins in the high frequencies (which have better time resolution in the nsgt)? Is it a parameter that I should disable to get better temporal resolution in the high frequencies, as in the cqt display above?
PS: I tried removing the matrixform option, and it yields an error anyway:
File "spectrogram.py", line 111, in <module>
coefs = assemble_coeffs(c, ncoefs)
File "spectrogram.py", line 27, in assemble_coeffs
out = np.empty((ncoefs, cq0.shape[1], cq0.shape[2]), dtype=cq0.dtype)
IndexError: tuple index out of range
Would be cool to upload nsgt 0.18 to PyPI. Right now it's still at 0.17, and pip3 install nsgt will give a version that doesn't work in Python 3. (It can of course be circumvented with pip3 install https://github.com/grrrr/nsgt/archive/0.18.zip.) Cheers!
Hi,
Many thanks for this great package. I'm using it for musical note analysis and thus I'm putting in the minimum frequency as midi2hz(21) and the maximum as midi2hz(108). 21 and 108 are the lowest and highest notes on a piano respectively, in midi format, and midi2hz converts midi values to Hertz.
My code is thus:
Ls = len(segmentWindow)
numBins = int(np.ceil(108 - 21))
scl = LogScale(midi2hz(21), midi2hz(108), numBins)
nsgt = NSGT(scl, samplingRate, Ls, real=True, matrixform=True)
I'm getting the following warning:
UserWarning: Q-factor too high for frequencies 27.50,29.10,30.78,32.57,34.46,36.46,38.58,40.82,43.19,45.69,48.35,51.15,54.12,57.26,60.59,64.10,67.82,71.76,75.92,80.33,84.99,89.93,95.15,100.67,106.51,112.69,119.23,126.15,133.48,141.22,149.42,158.09,167.27,176.98,187.25 warn("Q-factor too high for frequencies %s"%",".join("%.2f"%fi for fi in f[q >= qneeded]))
Thanks
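For reference, the midi2hz helper above presumably implements the standard MIDI-to-frequency conversion (A4 = MIDI note 69 = 440 Hz), which is consistent with 27.50 Hz being the first frequency in the warning:

```python
def midi2hz(m):
    """Standard MIDI note number -> frequency in Hz (equal temperament,
    A4 = MIDI 69 = 440 Hz). Assumed implementation of the helper above."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)
```

midi2hz(21) gives 27.5 Hz (piano A0) and midi2hz(108) about 4186 Hz (C8).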
I notice that the Mel scale construction leads to a variable Q per octave. Can we do something similar to create a VQ-NSGT, with a variable Q per octave?
It's mentioned in a few places: https://www.isca-speech.org/archive/interspeech_2015/papers/i15_2744.pdf
Hi, thanks for this amazing piece of software and the research!
I'm curious whether sliCQ can be used for streamed, real-time samples (e.g. from a microphone). It would be great if it can (and if there is an example); what setup should be used in a streamed scenario?
Thanks in advance!
Just installed. Trying out examples. Error produced as follows.
./spectrogram.py '/path/to/sound.wav'
Traceback (most recent call last):
File "./spectrogram.py", line 84, in <module>
scl = scale(args.fmin, args.fmax, args.bins, beyond=int(args.reducedform == 2))
TypeError: __init__() got an unexpected keyword argument 'beyond'
It proceeds fine when the arg is removed (creating the log scale object with the default beyond value). However, we then hit a warning and another error:
/home/james/anaconda2/envs/nolearn/lib/python2.7/site-packages/nsgt/nsgfwin_sl.py:113: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
g[kk-1][M[kk-1]//2-M[kk]//2:M[kk-1]//2+ceil(M[kk]/2.)] = hannwin(M[kk])
Traceback (most recent call last):
File "./spectrogram.py", line 102, in <module>
ncoefs = int(sf.frames*slicq.coef_factor)
AttributeError: NSGT_sliced instance has no attribute 'coef_factor'
I see that, whilst NSGT_sliced indeed doesn't have an attribute coef_factor, it does have a method of that name!
Hi,
When I run "python2.7 setup.py test", a number of test cases report failures. This is from git revision dfbceca running on Debian 7.
I would attach the full output as a text file, but github won't let me; it says "Attaching documents requires write permission to this repository. Try again with a PNG, GIF, or JPG.". Here are just a couple of snippets from the output instead:
test_oct (tests.cq_test.TestNSGT) ... /home/gson/sw/nsgt/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 52.15,53.82,55.54,57.32,59.15,61.05,63.00,65.02,67.10,69.25,71.46,73.75,76.11,78.55,81.06,83.66,86.33,89.10,91.95,94.89,97.93,101.06,104.30,107.64,111.08,114.64,118.31,122.09,126.00,130.03,134.20,138.49,142.92,147.50,152.22,157.09,162.12,167.31,172.67,178.19,183.90,189.78,195.86,202.13,208.60,215.27,222.16,229.27,236.61,244.19,252.00,260.07,268.39,276.98,285.85,295.00,304.44,314.19,324.24,334.62,345.33,356.39,367.79,379.57,391.71,404.25,417.19,430.55,444.33,458.55,473.23
warn("Q-factor too high for frequencies %s"%",".join("%.2f"%fi for fi in f[q >= qneeded]))
and
======================================================================
FAIL: gtest_oct (tests.slicq_test.TestNSGT_slices)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/gson/sw/nsgt/tests/slicq_test.py", line 53, in gtest_oct
self.runit(siglen, fmin, fmax, obins, sllen, trlen, real)
File "/home/gson/sw/nsgt/tests/slicq_test.py", line 18, in runit
self.assertTrue(close)
AssertionError: False is not true
For example, on Windows 10 / AMD64, running python -m unittest discover -s tests -p cq_test.py results in the tests taking something like 10-20 minutes to finish. Is this normal?
I get the following shape from the NSGT:
# 125 frequency bins
scl = MelScale(78, 22050, 125)
nsgt = NSGT_sliced(scl, 9216, 2304, 44100, real=True, matrixform=True, multichannel=True)
forward = np.asarray(list(nsgt.forward((audio.T,)))).astype(np.complex64)
# shape of forward
# T is number of frames in time - by what division? is it "sllen + (0 <= n <= trlen)"?
# I believe the hop size and/or trlen/transition area is expected to be variable to maintain perfect invertibility.
# 2 channels because audio is stereo
# second-last dimension is specified frequency bins+1, so 126
# the last shape is nsgt.coef_factor*sllen = 304 - what is this dimension?
T x (2 channels) x 126 x 304
In analogy to an STFT (which typically has a shape like I x F x T, for channels x frequency_bins x time_frames), what are the two frequency dimensions of the NSGT_sliced output?
Are the 126 frequency bins interpolated or duplicated to create a bigger vector of 304 values?
Hello, I just started playing around with this project but found it hard to get Scikit.audiolab working (it looks like it's been inactive since late 2010, doesn't have Python3 support, etc.).
Would you consider migrating the file I/O routines to something more actively maintained, like PySoundfile? I have a working copy of transform_audio.py with modifications that uses it, and I'm happy to submit pull requests for other call sites. Thanks!
Hi,
thanks for this awesome stuff.
I am wondering: is there some convenient way to get the hop size for a given transform (NSGT object)?
thanks
antoine
The paper uses the values 12, 24, 48, and 96 when describing the CQT-NSGT. What if we go higher? 200 bins per octave? 400? What defines these ranges? Is it related to the piano/keyboard standard octaves?