grrrr / nsgt
Non Stationary Gabor Transform (NSGT), Python implementation
Home Page: http://grrrr.org/research/software/nsgt
License: Artistic License 2.0
Observing c, created on line 108, I see it generates objects of dimension 1 x 66 (a 2-d array), so the function complains that index [2] is out of range. This error may well be due to my misunderstanding of the arguments.
$ ./spectrogram.py '/path/to/wav/audio.wav'
Traceback (most recent call last):
File "./spectrogram.py", line 112, in <module>
coefs = assemble_coeffs(c, ncoefs)
File "./spectrogram.py", line 27, in assemble_coeffs
out = np.empty((ncoefs,cq0.shape[1],cq0.shape[2]), dtype=cq0.dtype)
If it helps, here is info about the wav file being input:
$ soxi 'MASS/sargon-silenci (metal)/sargon-silenci_1-19_with_effects.wav'
Input File : 'MASS/sargon-silenci (metal)/sargon-silenci_1-19_with_effects.wav'
Channels : 2
Sample Rate : 44100
Precision : 24-bit
Duration : 00:00:18.00 = 793801 samples = 1350 CDDA sectors
File Size : 4.76M
Bit Rate : 2.12M
Sample Encoding: 24-bit Signed Integer PCM
Seeing as the default for --sr is 44100, I thought it would be fine to run with no additional args set.
I couldn't get PyFFTW3 installed on my machine, but pyFFTW is an alternative wrapper for FFTW that appears better maintained.
You can see a speed comparison of the two wrappers here; they're more or less equivalent:
mperrin/webbpsf#10
I'll submit a pull request in <1 week to get it working.
A very useful feature would be a standalone spectrogram function. For example:
nsgt = CQ_NSGT(lowf, fmax, bins, fs, length)
B = nsgt.forward(sig)
nsgt.spectrogram(B)
It could use matplotlib, or any common plotting library, on the back end. I had a difficult time working through spectrogram.py while trying to plot my numpy array of magnetic field data.
Another very useful function would be a coherence-gram.
Love your work!
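For what it's worth, a minimal sketch of what such a helper could look like; it assumes forward() yields one 1-D complex coefficient array per frequency band (the jagged, non-matrixform layout), zero-pads them into a rectangular dB-magnitude image, and leaves the actual plotting to matplotlib. This is a hypothetical helper, not part of the nsgt API:

```python
import numpy as np

def spectrogram(coefs, eps=1e-10):
    """Zero-pad jagged NSGT coefficients into a rectangular dB-magnitude matrix.

    `coefs`: list of 1-D complex arrays, one per frequency band, of varying
    lengths. (Hypothetical helper, not part of the nsgt API.)
    """
    n = max(len(c) for c in coefs)
    mag = np.zeros((len(coefs), n))
    for i, c in enumerate(coefs):
        mag[i, :len(c)] = np.abs(c)
    return 20 * np.log10(mag + eps)
```

Plotting would then be a one-liner, e.g. plt.imshow(spectrogram(B), origin='lower', aspect='auto').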
Hello,
The shape returned by nsgt.forward
is jagged (i.e. different lengths of nsgt coefficients per frame) - which makes sense, given the varying time-frequency resolution of the NSGT.
However, in MATLAB (whose implementation comes from the same NSGT paper: https://www.mathworks.com/help/wavelet/ref/cqt.html), the cfs matrix returned is rectangular.
Do you know if this is an implementation choice to pad zeros to the maximum length of nsgt coefficients? It's more convenient to be able to treat the CQT output like an STFT (rectangular matrix) and apply some operations on it (masking, filtering, etc.).
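One workaround, assuming forward() yields one 1-D complex array per band, is to pad to a rectangular matrix for STFT-style processing (masking, filtering) and truncate back to the jagged layout before calling backward(). A sketch (these helpers are hypothetical, not part of nsgt):

```python
import numpy as np

def to_matrix(coefs):
    """Pad jagged per-band coefficient arrays into a rectangular matrix.
    Returns the matrix plus the original lengths, needed to undo the padding."""
    n = max(len(c) for c in coefs)
    mat = np.zeros((len(coefs), n), dtype=complex)
    for i, c in enumerate(coefs):
        mat[i, :len(c)] = c
    return mat, [len(c) for c in coefs]

def to_jagged(mat, lengths):
    """Truncate each row back to its original length, restoring the layout
    that backward() expects."""
    return [mat[i, :n].copy() for i, n in enumerate(lengths)]
```

The round trip to_jagged(*to_matrix(coefs)) reproduces the original coefficients, so whatever is done to the matrix in between is what actually alters the signal.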
Hello
Thanks a lot for the nice package! I am looking for a lossless way to convert an audio signal to a spectrogram (on a mel scale) and back to an audio signal again. This seems to be a great solution.
However, after calling assemble_coeffs() to construct a spectrogram from the CQT slices, how do I transform the spectrogram back into CQT slices (and thus use backward() to get an audio signal again)? The overlap-add operation in assemble_coeffs() seems to be irreversible.
Sorry for the dumb question, but it will be great if you could help on this. Thanks!
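To see why a plain overlap-add cannot be undone on its own, here is a toy demonstration with made-up two-sample "slices" (not the real nsgt data layout): two different pairs of slices produce the identical summed output, so the individual slices are not recoverable from the assembled spectrogram alone. The robust route is to keep the original CQT slices around and call backward() on those.

```python
import numpy as np

def overlap_add(x, y, hop=1):
    # sum two overlapping segments, as an assemble-style overlap-add would
    out = np.zeros(len(x) + hop)
    out[:len(x)] += x
    out[hop:] += y
    return out

# two *different* pairs of slices...
a1, a2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
b1, b2 = np.array([1.0, 3.0]), np.array([2.0, 4.0])

# ...yield the very same overlap-added result: the sum is many-to-one
print(np.allclose(overlap_add(a1, a2), overlap_add(b1, b2)))  # True
```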
I haven't tried using NSGT in Python3, but would you be open to pull requests for any necessary changes, if any, to make it Py3 compatible?
Hi, my GitHub:
https://github.com/falseywinchnet/streamcleaner
Recently I was reading about Gabor filters and learned of your excellent constant-Q Gabor transform.
Not sure I am using it right, but, e.g.:
from nsgt import NSGT, OctScale
rate = 48000
test = data[0:rate]  # data: 1-D audio array loaded elsewhere
scl = OctScale(60.41, 22050, 48)
nsgt = NSGT(scl, fs=rate/2, Ls=rate, real=True, matrixform=True, reducedform=0)
# forward transform
c = nsgt.forward(test)
# inverse transform
s_r = nsgt.backward(c)
It complains about the Q factor being too high:
/usr/local/lib/python3.9/dist-packages/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 60.41,61.29,62.18,63.08,64.00,64.93,65.87,66.83,67.80,68.78,69.78,70.80,71.83,72.87,73.93,75.00,76.09,77.20,78.32,79.46,80.61,81.78,82.97,84.18,85.40,86.64,87.90,89.18,90.47,91.79,93.12,94.48,95.85,97.24,98.65,100.09,101.54,103.02,104.51,106.03,107.57,109.14,110.72,112.33,113.96,115.62,117.30,119.00,120.73,122.49,124.26,126.07,127.90,129.76,131.65,133.56,135.50,137.47
but it seems to reconstruct OK (max error 4.298e-16).
Please advise on settings for speech, with an emphasis on SSB (<4 kHz).
Something I noted, and was meaning to ask about, is what I will call (for lack of a better term) "transform mapping".
That is to say, different transforms which are all invertible will have different certainty in time and frequency, such that something you do in one will show up in the other.
For example, in the conventional STFT form (e.g. NFFT=512, hop=128), a typical speech waveform will have harmonics buried in the noise and indistinguishable from it.
However, if the audio is first transformed into the Gabor representation as described, then thresholded below the lowest harmonic (using a statistical approach I developed called atd), and then this is inverted and transformed back into the STFT, these tertiary harmonics now dominate the residual structure of the corresponding regions of the spectrogram, making them visible.
So each bin, or region of bins, in one representation maps (in a complex manner, because multiple convolutional steps are involved) to a bin, or region of bins, in another representation. If, for each bin in the STFT, you can identify the set of corresponding Gabor bins, take the maximum value over those bins, and map it back to the corresponding STFT bin, additional structure and dimensionality not typically apparent will manifest in the data set due to the emphasis applied (and as such may be useful for masking).
Could this, in turn, allow the use of perfect reconstruction (in the STFT domain) together with additional alternative transforms, to better signify the time-frequency localization of energy forms which are buried (convolutionally distributed) in the STFT form?
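The bin-region mapping with a max could be sketched roughly like this, under the simplifying assumption that both representations have already been reduced to magnitude matrices (bands x frames) and that a crude proportional index mapping stands in for the true (convolutional) correspondence:

```python
import numpy as np

def max_remap(gabor_mag, stft_shape):
    """For each STFT bin, take the max over the Gabor bins that map onto it.

    `gabor_mag` and the target are assumed to be magnitude matrices
    (bands x frames); the proportional mapping here is purely illustrative
    of the "bin region -> bin region" idea.
    """
    F, T = stft_shape
    Fg, Tg = gabor_mag.shape
    out = np.empty((F, T))
    for f in range(F):
        f0, f1 = f * Fg // F, max((f + 1) * Fg // F, f * Fg // F + 1)
        for t in range(T):
            t0, t1 = t * Tg // T, max((t + 1) * Tg // T, t * Tg // T + 1)
            out[f, t] = gabor_mag[f0:f1, t0:t1].max()
    return out
```

A faithful mapping would of course come from the actual filter-bank geometry rather than this proportional resampling.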
I am presently examining this scenario, but I would appreciate a little insight on how best to use nsgt for speech.
Additionally, I noted that when I applied my time-domain masking method (called fast_entropy) to the Gabor short-time transform, then inverted the remaining bins and once again applied the STFT, some of the dominant frequency components (the speech) which had been masked in the time domain were, due to the complex reconstructive properties, reconstructed into the waveform by the inversion, so the final product was improved further.
I will have to do more research on this.
I read somewhere that the Gabor transform uses a Gaussian function. I have developed an interesting alternative function which is not suitable for reconstruction, but which corresponds to maximal energy localization and minimal distortion in the complex domain. You might use it, e.g., to generate a mask of the same dimensions, and then apply that mask to the representation generated with an invertible window (perhaps using a synthesis window as well).
This window is a double inverted logit window, the code for which is :
https://github.com/falseywinchnet/streamcleaner/blob/master/realtime_interactive.py#L102
I would like to know if I can combine this with the Gabor transform.
It seems that the Gabor transform uses the Hann window:
https://github.com/grrrr/nsgt/blob/master/nsgt/nsgfwin.py
but does not yet (possibly?) make use of improved-reconstruction synthesis windows:
https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.transform.stft.html#pyroomacoustics.transform.stft.compute_synthesis_window
(see the logic there).
Also, does the Gabor transform suffer from the frequency instability mentioned in
https://dsp.stackexchange.com/questions/72588/synchrosqueezed-stft-phase-transform/72590#72590 ?
In terms of the practical ramifications or modifications necessary, DFT cisoid centering is applied simply by padding the input, then basically stacking each segment half backwards, then windowing with an ifftshift-ed window, and then, for the inverse, applying fftshift on each segment:
https://github.com/falseywinchnet/streamcleaner/blob/master/realtime.py#L350
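That recipe reads, in numpy terms, roughly like the following sketch (my paraphrase of the steps above, not the streamcleaner code itself):

```python
import numpy as np

def centered_fft(frame, window):
    """Analysis: window the segment, then ifftshift so the frame's center
    lands at index 0 (DFT "cisoid centering")."""
    return np.fft.rfft(np.fft.ifftshift(frame * window))

def centered_ifft(spectrum, n):
    """Synthesis: invert the FFT, then fftshift the segment back."""
    return np.fft.fftshift(np.fft.irfft(spectrum, n=n))
```

With a window of ones the round trip is exact; with a real analysis window you would additionally divide out (or overlap-add) the window as usual.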
best regards
Hello,
I'm wondering about the design choice that the bins argument means bins per octave:
So, ultimately OctScale(32.7, 22050, 12)
and MelScale(32.7, 22050, 12)
end up with 123 and 12 frequency bins respectively.
Is there a way we can have the OctScale use the bins as the actual n_bins output, and not bins_per_octave?
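As a stopgap, one can invert the convention by hand; a hypothetical helper (the exact bin count produced by OctScale may still differ by an endpoint or two):

```python
import math

def bins_per_octave_for_total(fmin, fmax, n_bins_total):
    """Pick the bins-per-octave value that yields roughly n_bins_total
    bins over the range [fmin, fmax]. (Hypothetical helper.)"""
    n_octaves = math.log2(fmax / fmin)
    return max(1, round(n_bins_total / n_octaves))
```

E.g. asking for ~113 bins between 32.7 Hz and 22050 Hz (about 9.4 octaves) gives 12 bins per octave.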
Hi!
I had to change the lambda expressions at lines 91 and 94 of nsgt/nsigtf.py (commit f00267f):
diff --git a/nsgt/nsigtf.py b/nsgt/nsigtf.py
index d441514..259f313 100644
--- a/nsgt/nsigtf.py
+++ b/nsgt/nsigtf.py
@@ -88,10 +88,15 @@ def nsigtf_sl(cseq, gd, wins, nn, Ls=None, real=False, reducedform=0, measurefft
         fftsymm = lambda c: np.hstack((c[0],c[-1:0:-1])).conj()
         if reducedform:
             # no coefficients for f=0 and f=fs/2
-            symm = lambda fc: chain(fc, map(fftsymm,fc[::-1]))
+            def symm(_fc):
+                fc = list(_fc)
+                return chain(fc, map(fftsymm, fc[::-1]))
+
             sl = lambda x: chain(x[reducedform:len(gd)//2+1-reducedform],x[len(gd)//2+reducedform:len(gd)+1-reducedform])
         else:
-            symm = lambda fc: chain(fc,map(fftsymm,fc[-2:0:-1]))
+            def symm(_fc):
+                fc = list(_fc)
+                return chain(fc, map(fftsymm, fc[-2:0:-1]))
             sl = lambda x: x
     else:
         ln = len(gd)
to make the tests work in the version that relies on numpy only, and nothing else.
Cheers,
-r
Let's say I have a signal and want to scale 2x along time (make it last twice as long).
Should I just zero-pad the input to double its length, take the NSGT coefficients, fill them in, and then invert? Or is there a smarter way to go about this?
Hi,
thanks for this repository!
I am trying to analyse a speech sample of 3s duration and 24kHz sampling rate but I'm a bit puzzled at the choice of some parameters.
First this is the classical mel spectrogram of the signal (estimated by librosa melspectrogram):
and this is the result of the cqt (librosa.cqt):
I'd basically like to have the same result as the cqt (I'd like to be able to separate the glottal pulses in the high frequencies), but with a mel scale and the possibility of inversion that comes with the nsgt framework. However, I cannot figure out the right choice of parameters that would give me results similar to the cqt display above, but on a mel scale. This is the best result that I could get
with the following arguments
python spectrogram.py sample.wav --sr 24000 --fmin=80 --fmax=10000 --bins=202 --real --scale=mel --reducedform=1 --fps=100 --plot --matrixform --trlen 200 --sllen 7200
It is kind of similar to the mel spectrogram above and does not have the fine temporal resolution that I am looking for in the high frequencies. I understand that the choice of sllen and trlen is crucial here, but what should guide the choice?
Also, I gather that matrixform is the parameter used to keep the same time division among frequency bins, by pooling the bins in the high frequencies (which have better time resolution in the nsgt)? Is it a parameter that I should disable to get better temporal resolution in the high frequencies, as in the cqt display above?
PS: I tried removing the matrixform option, and it yields an error anyway:
File "spectrogram.py", line 111, in <module>
coefs = assemble_coeffs(c, ncoefs)
File "spectrogram.py", line 27, in assemble_coeffs
out = np.empty((ncoefs, cq0.shape[1], cq0.shape[2]), dtype=cq0.dtype)
IndexError: tuple index out of range
Would be cool to upload nsgt 0.18 to PyPI. Right now it's still at 0.17, and pip3 install nsgt will give a version that doesn't work in Python 3. (It can of course be circumvented with pip3 install https://github.com/grrrr/nsgt/archive/0.18.zip.) Cheers!
Hi,
Many thanks for this great package. I'm using it for musical note analysis and thus I'm putting in the minimum frequency as midi2hz(21) and the maximum as midi2hz(108). 21 and 108 are the lowest and highest notes on a piano respectively, in midi format, and midi2hz converts midi values to Hertz.
My code is thus:
Ls = len(segmentWindow)
numBins = int(np.ceil(108 - 21))
scl = LogScale(midi2hz(21), midi2hz(108), numBins)
nsgt = NSGT(scl, samplingRate, Ls, real=True, matrixform=True)
I'm getting the following warning:
UserWarning: Q-factor too high for frequencies 27.50,29.10,30.78,32.57,34.46,36.46,38.58,40.82,43.19,45.69,48.35,51.15,54.12,57.26,60.59,64.10,67.82,71.76,75.92,80.33,84.99,89.93,95.15,100.67,106.51,112.69,119.23,126.15,133.48,141.22,149.42,158.09,167.27,176.98,187.25 warn("Q-factor too high for frequencies %s"%",".join("%.2f"%fi for fi in f[q >= qneeded]))
Thanks
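For reference, the midi2hz helper above presumably implements the standard MIDI-to-frequency conversion (A4 = MIDI note 69 = 440 Hz), which is consistent with 27.50 Hz being the first frequency in the warning:

```python
def midi2hz(m):
    """Standard MIDI note number -> frequency in Hz (equal temperament,
    A4 = MIDI 69 = 440 Hz). Assumed implementation of the helper above."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)
```

midi2hz(21) gives 27.5 Hz (piano A0) and midi2hz(108) about 4186 Hz (C8).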
I notice that the Mel scale construction leads to a variable Q per octave. Can we do something similar to create a VQ-NSGT, with a variable Q per octave?
It's mentioned in a few places: https://www.isca-speech.org/archive/interspeech_2015/papers/i15_2744.pdf
Hi, thanks for this amazing piece of software and the research!
I'm curious whether sliCQ can be used for streamed, real-time samples (e.g. from a microphone). It would be great if it can (and if there is an example); what setup should be used in a streamed scenario?
Thanks in advance!
Just installed. Trying out examples. Error produced as follows.
./spectrogram.py '/path/to/sound.wav'
Traceback (most recent call last):
File "./spectrogram.py", line 84, in <module>
scl = scale(args.fmin, args.fmax, args.bins, beyond=int(args.reducedform == 2))
TypeError: __init__() got an unexpected keyword argument 'beyond'
It proceeds fine when the arg is removed (creating the log scale object with the default beyond value). However, we then hit a warning and another error:
/home/james/anaconda2/envs/nolearn/lib/python2.7/site-packages/nsgt/nsgfwin_sl.py:113: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
g[kk-1][M[kk-1]//2-M[kk]//2:M[kk-1]//2+ceil(M[kk]/2.)] = hannwin(M[kk])
Traceback (most recent call last):
File "./spectrogram.py", line 102, in <module>
ncoefs = int(sf.frames*slicq.coef_factor)
AttributeError: NSGT_sliced instance has no attribute 'coef_factor'
I see that, whilst NSGT_sliced indeed doesn't have an attribute coef_factor, it does have a method of that name!
Hi,
When I run "python2.7 setup.py test", a number of test cases report failures. This is from git revision dfbceca running on Debian 7.
I would attach the full output as a text file, but github won't let me; it says "Attaching documents requires write permission to this repository. Try again with a PNG, GIF, or JPG.". Here are just a couple of snippets from the output instead:
test_oct (tests.cq_test.TestNSGT) ... /home/gson/sw/nsgt/nsgt/nsgfwin_sl.py:64: UserWarning: Q-factor too high for frequencies 52.15,53.82,55.54,57.32,59.15,61.05,63.00,65.02,67.10,69.25,71.46,73.75,76.11,78.55,81.06,83.66,86.33,89.10,91.95,94.89,97.93,101.06,104.30,107.64,111.08,114.64,118.31,122.09,126.00,130.03,134.20,138.49,142.92,147.50,152.22,157.09,162.12,167.31,172.67,178.19,183.90,189.78,195.86,202.13,208.60,215.27,222.16,229.27,236.61,244.19,252.00,260.07,268.39,276.98,285.85,295.00,304.44,314.19,324.24,334.62,345.33,356.39,367.79,379.57,391.71,404.25,417.19,430.55,444.33,458.55,473.23
warn("Q-factor too high for frequencies %s"%",".join("%.2f"%fi for fi in f[q >= qneeded]))
and
======================================================================
FAIL: gtest_oct (tests.slicq_test.TestNSGT_slices)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/gson/sw/nsgt/tests/slicq_test.py", line 53, in gtest_oct
self.runit(siglen, fmin, fmax, obins, sllen, trlen, real)
File "/home/gson/sw/nsgt/tests/slicq_test.py", line 18, in runit
self.assertTrue(close)
AssertionError: False is not true
For example, on Windows 10 / AMD64, running python -m unittest discover -s tests -p cq_test.py results in the tests taking something like 10-20 minutes to finish. Is this normal?
I get the following shape from the NSGT:
# 125 frequency bins
scl = MelScale(78, 22050, 125)
nsgt = NSGT_sliced(scl, 9216, 2304, 44100, real=True, matrixform=True, multichannel=True)
forward = np.asarray(list(nsgt.forward((audio.T,)))).astype(np.complex64)
# shape of forward
# T is number of frames in time - by what division? is it "sllen + (0 <= n <= trlen)"?
# I believe the hop size and/or trlen/transition area is expected to be variable to maintain perfect invertibility.
# 2 channels because audio is stereo
# second-last dimension is specified frequency bins+1, so 126
# the last shape is nsgt.coef_factor*sllen = 304 - what is this dimension?
T x (2 channels) x 126 x 304
In analogy to an STFT (which typically has a shape like I x F x T, for channels x frequency_bins x time_frames), what are the two frequency dimensions of the NSGT_sliced output?
Are the 126 frequency bins interpolated or duplicated to create a bigger vector of 304 values?
Hello, I just started playing around with this project but found it hard to get Scikit.audiolab working (it looks like it's been inactive since late 2010, doesn't have Python3 support, etc.).
Would you consider migrating the file I/O routines to something more actively maintained, like PySoundfile? I have a working copy of transform_audio.py with modifications that uses it, and I'm happy to submit pull requests for other call sites. Thanks!
Hi,
thanks for this awesome stuff.
I am wondering: is there some convenient way to get the hop size for a given transform (NSGT object)?
thanks
antoine
The paper uses the values 12, 24, 48, and 96 when describing the CQT-NSGT. What if we go higher? 200 bins per octave? 400? What defines these ranges? Is it related to the piano/keyboard standard octaves?