FRAM-RIR

Python implementation of FRAM-RIR, a fast, plug-and-play multi-channel room impulse response (RIR) simulation tool that does not require dedicated hardware acceleration platforms (e.g., GPUs).

Long paper

Interspeech'23 short paper

Update 2023/12/04

  • Added support for customizing microphone and source orientations
  • Fixed issues in image sampling that could cause suboptimal performance

Dependencies

  • numpy
  • torch (tested for v1.13 & v2.0.0)
  • torchaudio (tested for v0.11)

Usage

rir, early_rir = FRAM_RIR(mic_pos, sr, rt60, room_dim, src_pos, num_src=1, direct_range=(-6, 50), n_image=(512, 2049), src_pattern='omni', src_orientation_rad=None, mic_pattern='omni', mic_orientation_rad=None)

Parameters:

  • mic_pos: Microphone position(s) with respect to the room coordinate system, with shape [num_mic, 3] (meters). The room coordinate system must be defined in advance, with the constraint that the origin is on the floor (so the positive z axis points up).
  • sr: RIR sampling rate (Hz).
  • rt60: RT60 (second).
  • room_dim: Room size with shape [3] (meters).
  • src_pos: The source(s) position with respect to the room coordinate system, with shape [num_src, 3] (meters).
  • num_src: Number of sources. Default: 1.
  • direct_range: 2-element tuple, range of the early reflection window of the RIRs (in milliseconds, defined as the context around the direct-path signal). Default: (-6, 50).
  • n_image: 2-element tuple, minimum and maximum number of images to sample from. Default: (512, 2049).
  • src_pattern: Polar pattern for all of the sources. {"omni", "half_omni", "cardioid", "hyper_cardioid", "sub_cardioid", "bidirectional"}. Default: omni. See test_samples.py for examples.
  • src_orientation_rad: Array-like with shape [num_src, 2]. Orientation (in radians) of all sources, where the first column indicates azimuth and the second column indicates elevation, both with respect to the room coordinate system. None (default) is only valid for omnidirectional patterns; for other patterns with src_orientation_rad=None, a random orientation is applied to each source.
  • mic_pattern: Polar pattern for all of the receivers. {"omni", "half_omni", "cardioid", "hyper_cardioid", "sub_cardioid", "bidirectional"}. Default: omni. See test_samples.py for examples.
  • mic_orientation_rad: Array-like with shape [num_mic, 2]. Orientation (in radians) of all microphones, where the first column indicates azimuth and the second column indicates elevation, both with respect to the room coordinate system. None (default) is only valid for omnidirectional patterns; for other patterns with mic_orientation_rad=None, all microphones are assumed to point up (positive z axis), mimicking the scenario where the microphones are placed on a table.

Outputs:

  • rir: RIR filters for all mic-source pairs, with shape [num_mic, num_src, rir_length].
  • early_rir: Early reflection (including the direct path) RIR filters for all mic-source pairs, with shape [num_mic, num_src, rir_length].
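A minimal call following the parameter list above might look like the sketch below. The geometry (a two-microphone pair 5 cm apart and a single source, all inside an 8 m x 6 m x 3 m room) is illustrative only, and the import assumes FRAM-RIR.py is importable from the working directory; adapt both to your setup.

```python
import numpy as np

# Room geometry (meters): origin on the floor, positive z pointing up.
room_dim = [8.0, 6.0, 3.0]
sr = 16000      # RIR sampling rate (Hz)
rt60 = 0.5      # reverberation time (seconds)

# Two-microphone pair, 5 cm apart, 1.5 m above the floor (hypothetical geometry).
mic_pos = np.array([[3.975, 3.0, 1.5],
                    [4.025, 3.0, 1.5]])   # shape [num_mic=2, 3]

# One source about 2 m in front of the array.
src_pos = np.array([[4.0, 1.0, 1.5]])     # shape [num_src=1, 3]

# With FRAM-RIR.py on the path, the simulation call would be:
# rir, early_rir = FRAM_RIR(mic_pos, sr, rt60, room_dim, src_pos, num_src=1)
# Expected shapes: rir -> [2, 1, rir_length], early_rir -> [2, 1, rir_length]
```

The returned filters can then be convolved with dry source signals (e.g., via `scipy.signal.fftconvolve`) to produce reverberant multi-channel mixtures.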

Reference

If you use FRAM-RIR in your project, please consider citing the following papers.

@article{luo2023fast,
title={Fast Random Approximation of Multi-channel Room Impulse Response},
author={Luo, Yi and Gu, Rongzhi},
year={2023},
eprint={2304.08052},
archivePrefix={arXiv},
primaryClass={cs.SD}
}

@inproceedings{luo2023fra,
title={{FRA}-{RIR}: Fast Random Approximation of the Image-source Method},
author= {Luo, Yi and Yu, Jianwei},
year={2023},
booktitle={Proc. Interspeech},
pages={3884--3888}
}

Disclaimer

This is not an officially supported Tencent product.

Contributors

moplast, yluo42


Issues

Will the phase differences between signals received by a microphone array be taken into account?

Dear Luo,

Your work is of great help for fast RIR generation; previously, simulating RIRs usually took a large amount of time.

My question: when generating RIRs for a microphone array (linear or circular), if the audio is directly convolved with the corresponding RIRs, are the phase differences between the signals received by the different microphones taken into account?

Best regards.

Something in the code mismatches the paper.

In lines 85/86 of https://github.com/tencent-ailab/FRA-RIR/blob/main/FRA-RIR.py:
for i in range(nsource): rir[i][delta_idx[i]] += delta_decay[i]
Here delta_idx is treated as a sample index. However, as the paper describes:

the number of distant virtual sound sources increases as their distance increases

so the matrix delta_decay may contain different values that map to the same distance. The code above keeps only the last write per index, i.e., if delta_idx[i] == delta_idx[j], only one of the two contributions survives. I'm confused by this and want to know whether it is intended.
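The behavior described in this issue can be reproduced outside FRAM-RIR: with NumPy (and likewise with PyTorch tensors), fancy-indexed `+=` does not accumulate over duplicate indices, whereas the unbuffered `np.add.at` (or `torch.Tensor.index_add_`) does. A minimal sketch with made-up values:

```python
import numpy as np

rir = np.zeros(4)
delta_idx = np.array([1, 1, 3])          # two images map to the same sample index
delta_decay = np.array([0.5, 0.25, 0.1])

# Fancy-indexed in-place add: duplicate indices are NOT accumulated.
# Only one of the two contributions at index 1 survives (the last write wins).
rir[delta_idx] += delta_decay
# rir[1] is 0.25 here, not 0.75

# Unbuffered accumulation handles duplicate indices correctly.
rir2 = np.zeros(4)
np.add.at(rir2, delta_idx, delta_decay)
# rir2[1] is 0.75
```

Whether this matters for FRAM-RIR in practice depends on how often sampled images collide on the same sample index, which the maintainers would have to confirm.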

Line 75

reflect_pertub = torch.FloatTensor(nsource, image).uniform_(a, b) * dist_ratio.pow(tau)

In the paper, dist_ratio denotes a distance, but here it is dist / d_0.

Line 58: dist_range = [torch.linspace(1., velocity*T60/direct_dist[i]-1, image) for i in range(nsource)]
The range should be [1, c_0*T60/d_0], I guess.

Integrating fast RIR in Lhotse

I came across your paper on arXiv, and the method looks quite promising. I am interested in integrating this as a data transform in Lhotse (lhotse-speech/lhotse#787). However, my concern is that this code is distributed under a CC-BY-NC license, whereas Lhotse uses an Apache license. Of course, the integration would involve a significant rewrite, but the core implementation would be the same. I am not sure whether this would be an issue.

Anomalies in the simulated data

I simulated an 8-microphone linear array with 35 mm spacing and used the generated RIRs for data synthesis, but found no obvious time delay differences between the synthesized channels. Below are my call and configuration:

d = 0.035
mic_arch = [
    [0, -3.5*d, 0],
    [0, -2.5*d, 0],
    [0, -1.5*d, 0],
    [0, -0.5*d, 0],
    [0, 0.5*d, 0],
    [0, 1.5*d, 0],
    [0, 2.5*d, 0],
    [0, 3.5*d, 0],
]
rt60 = [0.4, 0.4]
room_dim = [8, 6, 3]
array_pos = [[2, 3, 1.5]]
src2_pos = [[4, 1, 1.5]]  # 45 degrees

array_pos = np.array(array_pos)
mic_pos = np.array(array_pos) + np.array(mic_arch)

rir, rir_direct = FRA_RIR_ADHOC(mic_arch, 16000, rt60, room_dim, array_pos, src2_pos, mic_pos, num_src=1)
soundfile.write('/mnt/data/disk_1/rir/rir_1_45.wav', rir[:, 0, :].transpose(1, 0), 16000)
soundfile.write('/mnt/data/disk_1/rir/rir_45_1_direct.wav', rir_direct[:, 0, :].transpose(1, 0), 16000)

Probably a trivial bug in a code comment

In line 255 of FRAM-RIR.py: # [nmic, nsrc, 1 + sr*2]
sr probably represents the sample rate, as in line 110: def FRA_RIR_ADHOC(mic_arch, sr=16000, rt60=None, room_dim=None, ...
Shouldn't the size of dist and reflect_ratio be # [nmic, nsrc, 1 + image] after torch.cat()?
I'm confused by this and want to know which is correct. I printed the sizes for verification.

Is multi-mic supported?

Hi, I came across your paper on arXiv, and the idea looks great. Is multi-mic simulation supported? If not, how can it be extended to multiple microphones (microphone array simulation)? Thank you in advance.
