I followed the readme steps before proceeding with the operation. After I finish e

Unable to run normally after clicking the button (whisper-clip issue, closed)

duwenlong2 commented on July 20, 2024

Unable to run normally after clicking the button

Comments (3)

gustavostz commented on July 20, 2024

Hey @duwenlong2 !

Thanks for bringing this to our attention. I've looked into the errors you've encountered, and here's what I've found:

PortAudioError: Error querying device -1: This error usually indicates a problem with the audio input device, such as the microphone not being detected or an issue with the audio drivers. I'd recommend checking if your microphone is properly connected and recognized by your system. You might also want to try updating your audio drivers or testing with a different microphone to see if that resolves the issue.
ValueError: need at least one array to concatenate: This error occurs when there's no audio data captured, likely due to the previous error. I've made a modification to the code to handle this situation more gracefully. Now, if no audio data is recorded, the application will print "No audio data recorded. Please check your audio input device." instead of throwing an error. This should help in diagnosing if the issue is related to the input device.

Please pull the latest changes from the repository and give it another try. If you see the message "No audio data recorded. Please check your audio input device.", it's likely an issue with your input device. Let me know if you have any further questions or if the issue persists.
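
The actual change in the repository is not shown in this thread. As a minimal sketch of the kind of guard described above, assuming the recorder collects NumPy blocks in a list (the function name `join_recorded_chunks` is hypothetical, not taken from whisper-clip):

```python
import numpy as np


def join_recorded_chunks(chunks):
    """Return all recorded audio as one array, or None if nothing was captured.

    `chunks` is assumed to be the list of NumPy blocks appended by the
    recording callback. np.concatenate raises
    "ValueError: need at least one array to concatenate" on an empty list,
    so the empty case is handled before calling it.
    """
    if not chunks:
        print("No audio data recorded. Please check your audio input device.")
        return None
    return np.concatenate(chunks, axis=0)
```

With this guard, an empty recording produces the diagnostic message and a `None` result that the caller can check, instead of an exception.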

duwenlong2 commented on July 20, 2024

Thank you.
After your reminder I checked my microphone, and there was indeed a problem with it. I was using an external microphone.
The button can be clicked now. I then debugged the following steps, but an audio file cannot be created.

I found that an audio file was passed into the model.
I want to implement the following feature:
I speak continuously, and Whisper gives me real-time feedback as text. When I finish a paragraph, it combines the previous continuous voice feedback to give me a complete sentence, which corrects the error recognition problem of continuous real-time voice.
Currently, though, I see that this interface accepts only a single submission. So my idea is to split the voice input into audio files of about 2 seconds and submit them segment by segment. Once the user goes silent, the previous segments are merged and submitted one more time to correct the earlier output. Until that correction, all segments are stored under one ID; the ID changes with the next sentence.
However, my Python is poor; I am skilled in C#. I want to use this Python library as part of an SDK and expose an interface that other external voice applications can call, so I currently cannot assess the amount of work involved. I want to develop this feature into a program that can be deployed on local computers, and open source it. Could you roughly estimate the workload? If it is not particularly large, I will learn Python and try to achieve this goal. If it is huge, I will give up on the idea.
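
The segment-and-merge scheme described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not code from whisper-clip: `transcribe` is a hypothetical callable standing in for a Whisper call, silence is detected with a simple RMS threshold, and the sample rate and threshold values are placeholders.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

SAMPLE_RATE = 16_000      # assumed sample rate in Hz
SEGMENT_SECONDS = 2.0     # segment length suggested in the comment
SILENCE_RMS = 0.01        # assumed threshold; needs tuning for a real microphone


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a segment (0.0 for an empty one)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class UtteranceBuffer:
    """Stores short segments under one utterance ID until silence arrives."""
    transcribe: Callable[[List[float]], str]  # hypothetical speech-to-text callable
    utterance_id: int = 0
    segments: List[List[float]] = field(default_factory=list)

    def push(self, segment: Sequence[float]) -> Optional[Tuple[str, int, str]]:
        """Feed one segment; return (kind, utterance_id, text) or None.

        A non-silent segment is stored and transcribed alone ("partial").
        A silent segment triggers one transcription of all stored segments
        merged together ("final"), then the ID advances for the next sentence.
        """
        if rms(segment) < SILENCE_RMS:
            if not self.segments:
                return None  # silence with nothing buffered
            merged = [s for seg in self.segments for s in seg]
            result = ("final", self.utterance_id, self.transcribe(merged))
            self.segments = []
            self.utterance_id += 1
            return result
        self.segments.append(list(segment))
        return ("partial", self.utterance_id, self.transcribe(list(segment)))
```

In a real program each call to `push` would receive about `SAMPLE_RATE * SEGMENT_SECONDS` samples from the microphone, and the "final" text would replace the "partial" texts that share its ID.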

gustavostz commented on July 20, 2024

Great! I'm glad the issue with the microphone has been resolved. I'll go ahead and close this issue since it's been solved.

Regarding your idea/project, it seems doable. If your goal is to implement real-time transcription, you can achieve this by sending audio segments to Whisper for processing and then concatenating the results. However, I'm not entirely sure about the need to break the audio into short segments for accuracy purposes (the "error recognition problem of continuous real-time voice" you mentioned). Whisper is capable of handling long audio files, and breaking them down into smaller segments might not necessarily improve accuracy. In fact, it could potentially reduce accuracy, as Whisper performs better with longer context (based on my experience, though I'm not entirely sure).

As for using C# or Python, it's possible to expose the API for use with other languages. However, working directly in Python might be more efficient, as it is the native language for Whisper, making it easier to configure and work with. That said, you can certainly use any language you prefer for your project. It will take some effort, especially if you take the approach of exposing an API, but I don't see it as a "big overload."
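
To give a rough idea of what "exposing an API" could look like, here is a minimal sketch using only the Python standard library. The `/transcribe` path, the JSON response shape, and `transcribe_bytes` are all assumptions made for illustration; `transcribe_bytes` is a placeholder where a real service would hand the audio to Whisper.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transcribe_bytes(audio: bytes) -> str:
    """Hypothetical placeholder; a real version would run Whisper on the audio."""
    return f"<{len(audio)} bytes received>"


class TranscribeHandler(BaseHTTPRequestHandler):
    """Accepts raw audio via POST /transcribe and answers with JSON."""

    def do_POST(self):
        if self.path != "/transcribe":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = json.dumps({"text": transcribe_bytes(audio)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def make_server(port: int = 8000) -> HTTPServer:
    """Create a server bound to localhost only; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), TranscribeHandler)

# To run it: make_server().serve_forever()
```

A C# client could then POST audio bytes to this endpoint with `HttpClient` and read the `text` field from the JSON response, so the Python side stays small.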

By the way, I found this project (https://github.com/sandrohanea/whisper.net) that runs Whisper in .NET, which might already do what you desire to connect to C#.

Overall, I believe your idea of real-time transcription using segmented audio is feasible, but I'm still not entirely clear about the "continuous voice feedback" and the "error recognition problem."

If you'd like to discuss your idea/project further, please start a thread in the Discussions section of the repository, as I'm closing this issue since it's been solved.
