Comments (3)

gustavostz commented on July 20, 2024

Hey @duwenlong2!

Thanks for bringing this to our attention. I've looked into the errors you've encountered, and here's what I've found:

  1. PortAudioError: Error querying device -1: This error usually indicates a problem with the audio input device, such as the microphone not being detected or an issue with the audio drivers. I'd recommend checking if your microphone is properly connected and recognized by your system. You might also want to try updating your audio drivers or testing with a different microphone to see if that resolves the issue.

  2. ValueError: need at least one array to concatenate: This error occurs when there's no audio data captured, likely due to the previous error. I've made a modification to the code to handle this situation more gracefully. Now, if no audio data is recorded, the application will print "No audio data recorded. Please check your audio input device." instead of throwing an error. This should help in diagnosing if the issue is related to the input device.
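A minimal sketch of the kind of guard described in item 2, assuming the recorder collects NumPy arrays in a list before joining them. The function name `merge_recorded_chunks` and the constant name are hypothetical and are not taken from the whisper-clip code; only the printed message comes from the comment above.

```python
import numpy as np

# Message quoted from the comment above; the constant name is an assumption.
NO_AUDIO_MESSAGE = "No audio data recorded. Please check your audio input device."


def merge_recorded_chunks(chunks):
    """Join recorded chunks into one array, or return None if nothing was captured.

    Hypothetical helper for illustration only.
    """
    # np.concatenate raises "ValueError: need at least one array to concatenate"
    # when it receives an empty sequence, so check for that case first.
    if not chunks:
        print(NO_AUDIO_MESSAGE)
        return None
    return np.concatenate(chunks, axis=0)
```

A caller would skip saving and transcribing whenever the function returns `None`.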

Please pull the latest changes from the repository and give it another try. If you see the message "No audio data recorded. Please check your audio input device.", it's likely an issue with your input device. Let me know if you have any further questions or if the issue persists.

from whisper-clip.

duwenlong2 commented on July 20, 2024

Thank you.
After you reminded me to check my microphone, I found that there was indeed a problem with it. I used an external microphone, and the button can be clicked now. While debugging, however, I could not create an audio file.

I did find that an audio file is passed into the model.
I want to implement the following feature:
While I speak continuously, Whisper gives me real-time feedback as text. When I finish a paragraph, it combines the preceding continuous audio and returns a complete sentence, which corrects the recognition errors made during the real-time pass.
Currently, though, I see that this interface accepts only a single submission. So my idea is to split the voice into audio files of about 2 seconds each, submit them segment by segment, and wait until the user falls silent before merging the preceding audio. The merged audio is then submitted once to correct the earlier results. Until that correction happens, all segments are stored under one ID; the ID changes for the next sentence.
However, my Python is weak; I am skilled in C#. I want to use this Python library as part of an SDK and expose an interface that other external programs can call for voice input, so I currently cannot assess the amount of work involved. I want to develop this feature into a program that can be deployed on local computers, and then open source it. Could you give a rough estimate of the workload? If it is not particularly large, I will learn Python and try to achieve this goal. If it is huge, I will give up on the idea.


gustavostz commented on July 20, 2024

Great! I'm glad the issue with the microphone has been resolved. I'll go ahead and close this issue since it's been solved.

Regarding your idea/project, it seems doable. If your goal is to implement real-time transcription, you can achieve this by sending audio segments to Whisper for processing and then concatenating the results. However, I'm not entirely sure about the need to break the audio into short segments for accuracy purposes (the "error recognition problem of continuous real-time voice" you mentioned). Whisper is capable of handling long audio files, and breaking them down into smaller segments might not necessarily improve accuracy. In fact, it could potentially reduce accuracy, as Whisper performs better with longer context (based on my experience, though I'm not entirely sure).
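A rough sketch of the segment-and-merge flow discussed here, under stated assumptions: segments arrive as lists of float samples, a segment counts as silence when its RMS level falls below a threshold, and `transcribe` is a caller-supplied function standing in for the actual Whisper call. The class name, the threshold value, and the result dictionary layout are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a segment; 0.0 for an empty segment."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SegmentedTranscriber:
    """Transcribes short segments as they arrive, then re-transcribes the
    merged audio once a silent segment marks the end of a sentence."""

    def __init__(self, transcribe: Callable[[List[float]], str],
                 silence_threshold: float = 0.01):
        self.transcribe = transcribe            # stand-in for the model call
        self.silence_threshold = silence_threshold  # assumed value, tune per mic
        self.pending: List[List[float]] = []    # segments of the current sentence
        self.partial_texts: List[str] = []      # per-segment transcripts
        self.sentence_id = 0                    # changes after each final result

    def feed(self, segment: Sequence[float]) -> Optional[Dict]:
        segment = list(segment)
        if rms(segment) < self.silence_threshold:
            return self._finalize()
        self.pending.append(segment)
        self.partial_texts.append(self.transcribe(segment))
        return {"id": self.sentence_id,
                "text": " ".join(self.partial_texts),
                "final": False}

    def _finalize(self) -> Optional[Dict]:
        if not self.pending:
            return None  # silence with nothing buffered: nothing to correct
        merged = [s for seg in self.pending for s in seg]
        result = {"id": self.sentence_id,
                  "text": self.transcribe(merged),  # one pass over the full audio
                  "final": True}
        self.pending.clear()
        self.partial_texts.clear()
        self.sentence_id += 1
        return result
```

Each call to `feed` returns a provisional result for the current sentence ID, and the first silent segment replaces it with a single transcript of the merged audio. Whether the provisional per-segment pass is worth its accuracy cost is exactly the open question raised above.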

As for using C# or Python, it's possible to expose the API for use with other languages. However, working directly in Python might be more efficient, as it is the native language for Whisper, making it easier to configure and work with. That said, you can certainly use any language you prefer for your project. It will take some effort, especially if you take the approach of exposing an API, but I don't see it as a "big overload."
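One way to expose the Python side to a C# client is a small local HTTP endpoint. The sketch below uses only the Python standard library; the `/transcribe` path, the JSON response shape, and the `transcribe_audio` placeholder are assumptions for illustration, not part of whisper-clip, and the placeholder would need to be replaced with a real call into Whisper.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transcribe_audio(audio_bytes: bytes) -> str:
    """Placeholder (hypothetical): swap in a real Whisper call here."""
    return f"<transcript of {len(audio_bytes)} bytes>"


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one endpoint is served; the path name is an assumption.
        if self.path != "/transcribe":
            self.send_error(404, "Unknown endpoint")
            return
        # Read the raw audio bytes sent as the request body.
        length = int(self.headers.get("Content-Length", 0))
        audio_bytes = self.rfile.read(length)
        body = json.dumps({"text": transcribe_audio(audio_bytes)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build a server bound to the local machine only."""
    return HTTPServer((host, port), TranscribeHandler)
```

Calling `make_server().serve_forever()` starts the endpoint, and a C# program could then POST audio bytes to `http://127.0.0.1:8000/transcribe` with `HttpClient` and read the `text` field from the JSON reply.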

By the way, I found this project (https://github.com/sandrohanea/whisper.net) that runs Whisper in .NET, which might already do what you desire to connect to C#.

Overall, I believe your idea of real-time transcription using segmented audio is feasible, but I'm still not entirely clear about the "continuous voice feedback" and the "error recognition problem."

If you'd like to discuss your idea/project further, please open a thread in the Discussions section of the repository, as I'm closing this issue now that it has been solved.

