flutydeer / audio-slicer

This project forked from openvpi/audio-slicer

A simple GUI application that slices audio with silence detection

License: MIT License

Python 97.57% PowerShell 2.43%

pyside6 audio-processing gui qt6

audio-slicer's Introduction

Audio Slicer

A simple GUI application that slices audio with silence detection.

中文文档 (Chinese documentation)

Screenshots

The app also has a light theme.

Usage

Windows

Download and extract the latest release here.
Run "slicer-gui.exe".

macOS & Linux

Clone the repository.
Run the following command to install requirements:

pip install -r requirements.txt

Run the following command to launch GUI:

python slicer-gui.py

Add your audio files to the task list by clicking the "Add Audio Files..." button or by dragging and dropping them onto the window, then click the "Start" button and wait for it to finish. The progress bar does not show progress within an individual task, so it stays at 0% until the task finishes when there is only 1 task in the task list.

Algorithm

Silence detection

This application uses RMS (root mean square) to measure the quietness of the audio and detect silent parts. The RMS value of each frame (frame length set as hop size) is calculated, and all frames with an RMS below the threshold are regarded as silent frames.
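
A minimal sketch of the frame-wise RMS measurement described above, in plain Python. It assumes mono samples scaled to the range [-1, 1], non-overlapping frames, and a linear (not dB) threshold. The function names `frame_rms` and `silent_frames` are illustrative and are not taken from the project's code.

```python
import math


def frame_rms(samples, frame_length):
    """Return the RMS of each consecutive, non-overlapping frame."""
    values = []
    for start in range(0, len(samples), frame_length):
        frame = samples[start:start + frame_length]
        mean_square = sum(x * x for x in frame) / len(frame)
        values.append(math.sqrt(mean_square))
    return values


def silent_frames(samples, frame_length, threshold):
    """Flag each frame whose RMS falls below the linear threshold."""
    return [rms < threshold for rms in frame_rms(samples, frame_length)]


# Two loud frames followed by two near-silent frames.
signal = [0.5, -0.5, 0.5, -0.5, 0.001, -0.001, 0.001, -0.001]
print(silent_frames(signal, frame_length=2, threshold=0.01))
# [False, False, True, True]
```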

Audio slicing

Once the valid (sound) part has reached the minimum length since the last slice and a silent part longer than the minimum interval is detected, the audio is sliced at the frame(s) with the lowest RMS value within the silent area. Long silent parts may be deleted.
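
The slicing rule can be sketched roughly as follows. This is a simplified illustration, not the application's actual code: all lengths are counted in frames, only silent runs that are followed by more sound are considered, the clip length is measured from the previous cut to the start of the silent run, and the trimming of long silence is left out. The name `find_cut_frames` is hypothetical.

```python
def find_cut_frames(rms, threshold, min_length, min_interval):
    """Return the frame indices at which the audio would be cut."""
    cuts = []
    last_cut = 0
    run_start = None  # first frame of the current silent run, if any
    for i, value in enumerate(rms):
        if value < threshold:
            if run_start is None:
                run_start = i
            continue
        if run_start is None:
            continue
        # A silent run has just ended at frame i - 1.
        run = range(run_start, i)
        silence_long_enough = len(run) >= min_interval
        clip_long_enough = run_start - last_cut >= min_length
        if silence_long_enough and clip_long_enough:
            # Cut at the quietest frame inside the silent run.
            cut = min(run, key=lambda k: rms[k])
            cuts.append(cut)
            last_cut = cut
        run_start = None
    return cuts


# Frames 5-7 and 10-12 are silent; everything else is loud.
rms = [0.5] * 5 + [0.003, 0.001, 0.002] + [0.5] * 2 + [0.001] * 3 + [0.5]
print(find_cut_frames(rms, threshold=0.01, min_length=5, min_interval=2))
# [6]
```

In this example the second silent run is skipped because only 4 frames have passed since the cut at frame 6, which is below the minimum length of 5 frames.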

Parameters

Threshold

The RMS threshold presented in dB. Areas where all RMS values are below this threshold will be regarded as silence. Increase this value if your audio is noisy. Defaults to -40.
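
To relate the dB value to raw sample amplitudes, the usual conversion for amplitude ratios can be used. The sketch below assumes the threshold is expressed relative to a full-scale amplitude of 1.0; the helper names are illustrative.

```python
import math


def db_to_amplitude(db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_to_db(amplitude):
    """Convert a positive linear amplitude ratio to dB."""
    return 20 * math.log10(amplitude)


# The default threshold of -40 dB is about 1% of full scale.
print(db_to_amplitude(-40))
```

Under this convention, raising the threshold from -40 to -30 raises the linear cutoff from about 0.01 to about 0.032, so more of a noisy recording counts as silence.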

Minimum Length

The minimum length required for each sliced audio clip, presented in milliseconds. Defaults to 5000.

Minimum Interval

The minimum length for a silence part to be sliced, presented in milliseconds. Set this value smaller if your audio contains only short breaks. The smaller this value is, the more sliced audio clips this application is likely to generate. Note that this value must be smaller than min_length and larger than hop_size. Defaults to 300.

Hop Size

Length of each RMS frame, presented in milliseconds. Decreasing this value will increase the precision of slicing, but will slow down the process. Defaults to 10.
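
As a rough illustration of how the millisecond parameters relate to one another, the sketch below checks the ordering constraint noted under Minimum Interval and converts each length to a count of hop-sized frames. The function name `validate_and_convert` and the returned keys are assumptions made for this example, not part of the application.

```python
def validate_and_convert(min_length=5000, min_interval=300, hop_size=10):
    """Check hop_size < min_interval < min_length and convert ms to frames."""
    if not hop_size < min_interval < min_length:
        raise ValueError("expected hop_size < min_interval < min_length")
    return {
        "min_length_frames": round(min_length / hop_size),
        "min_interval_frames": round(min_interval / hop_size),
    }


# With the documented defaults:
print(validate_and_convert())
# {'min_length_frames': 500, 'min_interval_frames': 30}
```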

Maximum Silence Length

The maximum silence length kept around the sliced audio, presented in milliseconds. Adjust this value according to your needs. Note that setting this value does not mean that silence parts in the sliced audio have exactly the given length. The algorithm will search for the best position to slice, as described above. Defaults to 1000.

Performance

This application runs over 400x faster than real-time on an Intel i7 8750H CPU. Speed may vary according to your CPU and your disk.

audio-slicer's Issues

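Installation always fails on Mac

pywin32==306: some of the requirements need Python 3.9 while others need a version below 3.9, so the installation fails no matter what I try.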

mp3 support

ModuleNotFoundError: No module named 'qdarktheme'

Hi, I installed the requirements but I am getting an error on my MacBook Pro M1. Can you help, please?

No module named qdarktheme [OSX]

Traceback (most recent call last):
  File "/Users/alvarom2/Documents/RVC/AudioslicerGUI/audio-slicer/slicer-gui.py", line 4, in <module>
    import qdarktheme
ModuleNotFoundError: No module named 'qdarktheme'

I cloned the repo and ran the requirements installation, but it shows this error when launching.
How can I fix this?
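
One way to narrow this down is to ask the interpreter that launches the GUI which modules it can actually see. The sketch below is a generic diagnostic, not a confirmed fix for this issue. The module list is an assumption based on names that appear on this page (`qdarktheme` from the traceback, `PySide6` from the project topics, `numpy` from another report).

```python
import importlib.util
import sys


def find_missing(module_names):
    """Return the names that the running interpreter cannot import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Shows which Python installation is being checked.
    print("Interpreter:", sys.executable)
    print("Missing:", find_missing(["qdarktheme", "PySide6", "numpy"]))
```

The `qdarktheme` module is normally provided by the `pyqtdarktheme` package on PyPI. If it is reported missing even though the requirements were installed, `pip` may belong to a different Python installation than `python`; running `python -m pip install -r requirements.txt` installs into the interpreter that runs `slicer-gui.py`.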

Unhandled exception in script

I get this error when trying to run slicer-gui.exe:

Unhandled exception in script
Failed to execute script 'slicer-gui' due to unhandled exception:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.

I tried uninstalling and re-installing NumPy, and it did not fix the error.

can i obtain each segment's timestamps?

Thank you for your great work. However, considering the unique nature of the data I'm working with, the automated output isn't precisely tailored to my needs. I'm interested in learning how to extract timestamps from the processed audio segments so that I can manually refine and merge them as necessary. Could you please provide guidance on how to achieve this? Your help is greatly appreciated!

Does not work properly

[Proposal] Generate mapping file of slices and original audio files

Summary

As a video editor, I want to slice videos based on the audio part. I need to remove silence parts from my live playbacks. If this tool can produce a file that maps sliced audios to time spans of the original audio, I can use the mapping file to slice my videos to remove silence parts automatically with video processing tools (like ffmpeg). I can also use speech to text tools to filter audio slices, then filter spans of my videos based on the mapping file.

File format

The following file maps time spans of 2 original audios to 5 audio slices. The output path of the mapping file needs to be specified from GUI before slicing.

{
  "outputFolder": "C:\\Users\\UserName\\Videos\\PlaybackSlices",
  "tasks": [
    {
      "originalFile": "C:\\Users\\UserName\\Videos\\Playback202405030234.wav",
      "slices": [
        {
          "start": 0,
          "end": 1780,
          "file": "Playback202405030234_0.wav"
        },
        {
          "start": 1780,
          "end": 2460,
          "file": "Playback202405030234_1.wav"
        }
      ]
    },
    {
      "originalFile": "C:\\Users\\UserName\\Videos\\Playback202405040330.wav",
      "slices": [
        {
          "start": 0,
          "end": 2790,
          "file": "Playback202405040330_0.wav"
        },
        {
          "start": 3150,
          "end": 9460,
          "file": "Playback202405040330_1.wav"
        },
        {
          "start": 12800,
          "end": 14690,
          "file": "Playback202405040330_2.wav"
        }
      ]
    }
  ]
}
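
A minimal sketch of how such a mapping could be assembled and serialized, using only the field names shown in the proposal above. The helper `build_task` is hypothetical, and the `start`/`end` values are passed through unchanged because the proposal does not state their unit.

```python
import json
from pathlib import PureWindowsPath


def build_task(original_file, spans):
    """Describe one source file and its slices in the proposed layout."""
    stem = PureWindowsPath(original_file).stem
    return {
        "originalFile": original_file,
        "slices": [
            {"start": start, "end": end, "file": f"{stem}_{index}.wav"}
            for index, (start, end) in enumerate(spans)
        ],
    }


mapping = {
    "outputFolder": r"C:\Users\UserName\Videos\PlaybackSlices",
    "tasks": [
        build_task(
            r"C:\Users\UserName\Videos\Playback202405030234.wav",
            [(0, 1780), (1780, 2460)],
        ),
    ],
}
print(json.dumps(mapping, indent=2))
```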

Note

I'm not sure whether I can implement this feature by myself because I'm new to Python. If I manage to implement it, I'll open a pull request.

Unable to use

Can the Windows version no longer run?

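Could a command for batch processing be added?

I have many directories to process, and I am worried about making mistakes if I do it by hand.
I looked at slicer.py and slicer2.py, and both seem to operate on a single file.
So I wrote a Linux command:
find . -type f -name "*.wav" -exec python slicer.py {} \; -exec rm -f {} \;
This command processes the files and deletes the old ones.
However, it is very slow: the GUI can process more than 200 files in 1 second, while the command line needs 5 seconds to process 1 file.
I don't know whether this is because WSL is slow or because of Python.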
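
One possible factor (an assumption, not something measured here) is that the `find` command above starts a new Python interpreter and re-imports the audio libraries for every file. A sketch of an alternative that walks the directory tree inside a single process is shown below. `batch_process` and the `process_file` callable are hypothetical; `process_file` stands in for whatever code slices a single file.

```python
from pathlib import Path


def batch_process(root, process_file, pattern="*.wav", delete_original=False):
    """Apply process_file to every match under root; return the processed paths."""
    processed = []
    # sorted() collects the matches up front, so files written during
    # processing are not picked up by the same run.
    for path in sorted(Path(root).rglob(pattern)):
        process_file(path)
        if delete_original:
            path.unlink()
        processed.append(path)
    return processed


# Example call, with slice_one_file being your own single-file function:
# batch_process(".", slice_one_file, delete_original=True)
```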

issue when running python slicer-gui.py

I got this after running python slicer-gui.py:
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: minimalegl, offscreen, xcb, wayland, eglfs, wayland-egl, minimal, vnc, linuxfb, vkkhrdisplay.

Aborted (core dumped)

How to fix this?
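
The "could not connect to display" message usually means that Qt found no reachable graphical session, for example in an SSH session without X forwarding or in an environment without a desktop. The sketch below is a quick diagnostic under that assumption. It only checks whether the usual environment variables are set, not whether the display they point to is reachable, and the function name `has_display` is illustrative.

```python
import os


def has_display(env=None):
    """Return True if an X11 or Wayland display variable is set."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("A display variable is set; check that the display is reachable.")
    else:
        print("No DISPLAY or WAYLAND_DISPLAY set; the GUI cannot open a window.")
```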