FRAM-RIR

Python implementation of FRAM-RIR, a fast, plug-and-play multi-channel room impulse response (RIR) simulation tool that does not require dedicated hardware acceleration platforms (e.g., GPUs).

Long paper

Interspeech'23 short paper

Update 2023/12/04

  • Added support for customizing microphone and source orientations
  • Fixed issues in image sampling that could cause suboptimal performance

Dependencies

  • numpy
  • torch (tested for v1.13 & v2.0.0)
  • torchaudio (tested for v0.11)

Usage

rir, early_rir = FRAM_RIR(mic_pos, sr, rt60, room_dim, src_pos, num_src=1, direct_range=(-6, 50), n_image=(512, 2049), src_pattern='omni', src_orientation_rad=None, mic_pattern='omni', mic_orientation_rad=None)

Parameters:

  • mic_pos: Microphone position(s) with respect to the room coordinate system, with shape [num_mic, 3] (meters). The room coordinate system must be defined in advance, with the constraint that the origin is on the floor (so the positive z axis points up).
  • sr: RIR sampling rate (Hz).
  • rt60: RT60 (second).
  • room_dim: Room size with shape [3] (meters).
  • src_pos: The source(s) position with respect to the room coordinate system, with shape [num_src, 3] (meters).
  • num_src: Number of sources. Default: 1.
  • direct_range: 2-element tuple, range of the early reflection window of the RIRs (in milliseconds, defined as the context around the direct-path signal). Default: (-6, 50).
  • n_image: 2-element tuple, minimum and maximum number of images to sample from. Default: (512, 2049).
  • src_pattern: Polar pattern for all of the sources. {"omni", "half_omni", "cardioid", "hyper_cardioid", "sub_cardioid", "bidirectional"}. Default: omni. See test_samples.py for examples.
  • src_orientation_rad: Array-like with shape [num_src, 2]. Orientation (in radians) of all sources, where the first column indicates azimuth and the second column indicates elevation, both with respect to the room coordinate system. None (default) is only valid for omnidirectional patterns; for other patterns with src_orientation_rad=None, a random orientation is applied to each source.
  • mic_pattern: Polar pattern for all of the receivers. {"omni", "half_omni", "cardioid", "hyper_cardioid", "sub_cardioid", "bidirectional"}. Default: omni. See test_samples.py for examples.
  • mic_orientation_rad: Array-like with shape [num_mic, 2]. Orientation (in radians) of all microphones, where the first column indicates azimuth and the second column indicates elevation, both with respect to the room coordinate system. None (default) is only valid for omnidirectional patterns; for other patterns with mic_orientation_rad=None, all microphones are assumed to point up (positive z axis), mimicking the scenario where the microphones are placed on a table.

Outputs:

  • rir: RIR filters for all mic-source pairs, with shape [num_mic, num_src, rir_length].
  • early_rir: Early reflection (including the direct path) RIR filters for all mic-source pairs, with shape [num_mic, num_src, rir_length].
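A minimal call following the parameter list above might look like the sketch below. The geometry (a two-microphone pair 5 cm apart and a single source, all inside an 8 m x 6 m x 3 m room) is illustrative only, and the import assumes FRAM-RIR.py is importable from the working directory; adapt both to your setup.

```python
import numpy as np

# Room geometry (meters): origin on the floor, positive z pointing up.
room_dim = [8.0, 6.0, 3.0]
sr = 16000      # RIR sampling rate (Hz)
rt60 = 0.5      # reverberation time (seconds)

# Two-microphone pair, 5 cm apart, 1.5 m above the floor (hypothetical geometry).
mic_pos = np.array([[3.975, 3.0, 1.5],
                    [4.025, 3.0, 1.5]])   # shape [num_mic=2, 3]

# One source about 2 m in front of the array.
src_pos = np.array([[4.0, 1.0, 1.5]])     # shape [num_src=1, 3]

# With FRAM-RIR.py on the path, the simulation call would be:
# rir, early_rir = FRAM_RIR(mic_pos, sr, rt60, room_dim, src_pos, num_src=1)
# Expected shapes: rir -> [2, 1, rir_length], early_rir -> [2, 1, rir_length]
```

The returned filters can then be convolved with dry source signals (e.g., via `scipy.signal.fftconvolve`) to produce reverberant multi-channel mixtures.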

Reference

If you use FRAM-RIR in your project, please consider citing the following papers.

@article{luo2023fast,
title={Fast Random Approximation of Multi-channel Room Impulse Response},
author={Luo, Yi and Gu, Rongzhi},
year={2023},
eprint={2304.08052},
archivePrefix={arXiv},
primaryClass={cs.SD}
}

@inproceedings{luo2023fra,
title={{FRA}-{RIR}: Fast Random Approximation of the Image-source Method},
author= {Luo, Yi and Yu, Jianwei},
year={2023},
booktitle={Proc. Interspeech},
pages={3884--3888}
}

Disclaimer

This is not an officially supported Tencent product.

Contributors

moplast, yluo42


Issues

Will the phase differences between signals received by a microphone array be taken into account?

Dear Luo,

Your work is of great help for fast RIR generation; previously, simulating RIRs usually took a large amount of time.

My question: when generating RIRs for a microphone array (linear or circular), if the audio is directly convolved with the corresponding RIRs, are the phase differences between the signals received by the different microphones taken into account?

Best regards.

Something in the code mismatches the paper.

In lines 85/86 of https://github.com/tencent-ailab/FRA-RIR/blob/main/FRA-RIR.py:
for i in range(nsource): rir[i][delta_idx[i]] += delta_decay[i]
Here delta_idx is treated as a sample index. However, as the paper describes:

the number of distant virtual sound sources increases as their distance increases

so the matrix delta_decay may contain different values that map to the same distance. The code above keeps only the last write per index, i.e., if delta_idx[i] == delta_idx[j], only one of the two contributions survives. I'm confused by this and want to know whether it is intended.
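The behavior described in this issue can be reproduced outside FRAM-RIR: with NumPy (and likewise with PyTorch tensors), fancy-indexed `+=` does not accumulate over duplicate indices, whereas the unbuffered `np.add.at` (or `torch.Tensor.index_add_`) does. A minimal sketch with made-up values:

```python
import numpy as np

rir = np.zeros(4)
delta_idx = np.array([1, 1, 3])          # two images map to the same sample index
delta_decay = np.array([0.5, 0.25, 0.1])

# Fancy-indexed in-place add: duplicate indices are NOT accumulated.
# Only one of the two contributions at index 1 survives (the last write wins).
rir[delta_idx] += delta_decay
# rir[1] is 0.25 here, not 0.75

# Unbuffered accumulation handles duplicate indices correctly.
rir2 = np.zeros(4)
np.add.at(rir2, delta_idx, delta_decay)
# rir2[1] is 0.75
```

Whether this matters for FRAM-RIR in practice depends on how often sampled images collide on the same sample index, which the maintainers would have to confirm.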

Line 75

reflect_pertub = torch.FloatTensor(nsource, image).uniform_(a, b) * dist_ratio.pow(tau)

In the paper, dist_ratio denotes a distance, but here it is dist / d_0.

Line 58: dist_range = [torch.linspace(1., velocity*T60/direct_dist[i]-1, image) for i in range(nsource)]
The range should be [1, c_0*T60/d_0], I guess.

Integrating fast RIR in Lhotse

I came across your paper on arXiv, and the method looks quite promising. I am interested in integrating this as a data transform in Lhotse (lhotse-speech/lhotse#787). However, my concern is that this code is distributed under a CC-BY-NC license, whereas Lhotse uses an Apache license. Of course, the integration would involve a significant rewrite, but the core implementation would be the same. I am not sure whether this would be an issue.

Anomalies in the simulated data

I simulated an 8-microphone linear array with 35 mm spacing and used the generated RIRs for data synthesis, but found no obvious time delay differences between the synthesized channels. Below are my call and configuration:

d = 0.035
mic_arch = [
    [0, -3.5*d, 0],
    [0, -2.5*d, 0],
    [0, -1.5*d, 0],
    [0, -0.5*d, 0],
    [0, 0.5*d, 0],
    [0, 1.5*d, 0],
    [0, 2.5*d, 0],
    [0, 3.5*d, 0],
]
rt60 = [0.4, 0.4]
room_dim = [8, 6, 3]
array_pos = [[2, 3, 1.5]]
src2_pos = [[4, 1, 1.5]]  # 45 degrees

array_pos = np.array(array_pos)
mic_pos = np.array(array_pos) + np.array(mic_arch)

rir, rir_direct = FRA_RIR_ADHOC(mic_arch, 16000, rt60, room_dim, array_pos, src2_pos, mic_pos, num_src=1)
soundfile.write('/mnt/data/disk_1/rir/rir_1_45.wav', rir[:, 0, :].transpose(1, 0), 16000)
soundfile.write('/mnt/data/disk_1/rir/rir_45_1_direct.wav', rir_direct[:, 0, :].transpose(1, 0), 16000)

Probably a trivial bug in a code comment

In line 255 of FRAM-RIR.py: # [nmic, nsrc, 1 + sr*2]
sr probably represents the sample rate, as in line 110: def FRA_RIR_ADHOC(mic_arch, sr=16000, rt60=None, room_dim=None, ...
Shouldn't the size of dist and reflect_ratio be # [nmic, nsrc, 1 + image] after torch.cat()?
I'm confused by this and want to know which is correct. I printed the sizes for verification.

Is multi-mic supported?

Hi, I came across your paper on arXiv, and the idea looks great. Is multi-mic simulation supported? If not, how can it be extended to multiple microphones (microphone array simulation)? Thank you in advance.
