microsoft / cognitive-speakerrecognition-python

Python SDK for the Microsoft Speaker Recognition API, part of Cognitive Services

Home Page: https://www.microsoft.com/cognitive-services/en-us/speaker-recognition-api

License: Other

Python 100.00%

cognitive-speakerrecognition-python's Introduction

Microsoft Speaker Recognition API: Python Sample

This repo contains Python samples (using Python 3) that demonstrate the use of the Microsoft Speaker Recognition API, an offering within Microsoft Cognitive Services, formerly known as Project Oxford.

Run the Sample

First, you must obtain a free Speaker Recognition API subscription key by following the instructions on our website.

The sample application covers four scenarios:

Create a user profile: python Identification\CreateProfile.py <subscription_key>
Print all user profiles: python Identification\PrintAllProfiles.py <subscription_key>
Enroll user profiles: python Identification\EnrollProfile.py <subscription_key> <profile_id> <enrollment_file_path>
Identify test files: python Identification\IdentifyFile.py <subscription_key> <identification_file_path> <profile_ids>...
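The commands above use Windows-style backslashes in the script paths. As a minimal sketch, the same command lines can be built portably from Python. `build_command` and the `SCRIPTS` table are hypothetical helpers (not part of the repo); the script names are the ones listed above, and the working directory is assumed to be the repo root.

```python
import os
import sys

# Script names taken from the four scenarios listed above.
SCRIPTS = {
    "create": "CreateProfile.py",
    "list": "PrintAllProfiles.py",
    "enroll": "EnrollProfile.py",
    "identify": "IdentifyFile.py",
}


def build_command(scenario, subscription_key, *args):
    """Return an argv list for one scenario, using the OS path separator."""
    script = os.path.join("Identification", SCRIPTS[scenario])
    # sys.executable is the interpreter running this code (a Python 3 one).
    return [sys.executable, script, subscription_key] + list(args)
```

The resulting list can be passed to `subprocess.run(build_command("list", key), check=True)`.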

Contributing

We welcome contributions. Feel free to file issues and pull requests on the repo and we'll address them as we can. Learn more about how you can help on our Contribution Rules & Guidelines.

You can reach out to us anytime with questions and suggestions using our communities below:

Support questions: StackOverflow
Feedback & feature requests: Cognitive Services UserVoice Forum

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

All Microsoft Cognitive Services SDKs and samples are licensed with the MIT License. For more details, see LICENSE.

Sample images are licensed separately, please refer to LICENSE-IMAGE.

cognitive-speakerrecognition-python's Issues

Error enrolling

When running EnrollProfile.py I'm getting the following error:

pi@raspberrypi:~/MS_spkr_recog $ python3 Identification/EnrollProfile.py ----------------------------------------- David "/home/pi/MS_spkr_recog/spkr_profile" true
ERROR:root:Error enrolling profile.
Traceback (most recent call last):
  File "Identification/EnrollProfile.py", line 70, in <module>
    enroll_profile(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "Identification/EnrollProfile.py", line 51, in enroll_profile
    force_short_audio.lower() == "true")
  File "/home/pi/MS_spkr_recog/Identification/IdentificationServiceHttpClientHelper.py", line 226, in enroll_profile
    with open(file_path, 'rb') as body:
IsADirectoryError: [Errno 21] Is a directory: '/home/pi/MS_spkr_recog/spkr_profile'

Is the error due to the way the command-line arguments are written, or is it something else? The dashes in the command line stand in for my subscription key.

Thank you,

David Stanley
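The traceback shows `open()` being handed a directory, so the third argument probably needs to be the path of an audio file inside `spkr_profile` rather than the folder itself. A small pre-check can make that failure easier to read. This is only a sketch: `check_enrollment_path` is a hypothetical helper, not part of the sample code. Separately, the README lists the second argument as `<profile_id>`, so `David` may also need to be replaced with an actual profile ID.

```python
import os


def check_enrollment_path(path):
    """Return path if it is a regular file; otherwise raise with a hint."""
    if os.path.isdir(path):
        # List WAV files in the directory so the caller can pick one.
        wavs = sorted(f for f in os.listdir(path) if f.lower().endswith(".wav"))
        raise ValueError(
            "%r is a directory; pass an audio file instead. "
            "WAV files found there: %s" % (path, ", ".join(wavs) or "none")
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```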

Guid should contain 32 digits with 4 dashes

When I call the identify_file function from IdentifyFile.py in the Identification folder, I get the following error:

Exception: Error identifying file: { "error": { "code": "BadRequest", "message": "Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)." } }

The script runs successfully from the terminal, but when I call the function directly with prefilled arguments I get the error above. This is the call in my Python file:

identify_file(subscriptionKey,audioFilePath,True, 'ea262cdd-2402-4418-b698-0272e8873dc3')

I thought maybe I had to convert the ID to a "Guid", which I did, but then I just got this error instead (.-.):

  File "IdentifyFile.py", line 43, in identify_file
    str(force_short_audio).lower() == "true")
  File "/Users/txt-19/Desktop/duckAi-py/Identification/IdentificationServiceHttpClientHelper.py", line 265, in identify_file
    if len(test_profile_ids) < 1:
TypeError: object of type 'UUID' has no len()
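The `len(test_profile_ids)` check in the traceback suggests the helper expects a collection of profile ID strings, so neither a bare string nor a `uuid.UUID` object matches. A sketch of a normalizer that accepts any of these forms follows; `as_profile_id_list` is a hypothetical helper, not part of the sample code.

```python
import uuid


def as_profile_id_list(profile_ids):
    """Return a list of dashed GUID strings from a str, UUID, or iterable."""
    if isinstance(profile_ids, (str, uuid.UUID)):
        profile_ids = [profile_ids]
    # uuid.UUID() raises ValueError on malformed input;
    # str() on a UUID yields the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx form.
    return [str(uuid.UUID(str(p))) for p in profile_ids]
```

With it, the call above would become `identify_file(subscriptionKey, audioFilePath, True, as_profile_id_list('ea262cdd-2402-4418-b698-0272e8873dc3'))`.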

Ruby API returns empty headers, no way to track operations

Hello, MS team. Sorry for filing a Ruby-related issue in the Python repo, but the Ruby repo does not seem to be available on GitHub.

I'm able to create profiles and make enrollment requests successfully, but the response headers contain empty strings instead of the operation ID, so I can't track the operation's current status. I know the call succeeded, because the Get All Profiles method returns the recently created profiles with Enrolled status. The Identification method (https://dev.projectoxford.ai/docs/services/563309b6778daf02acc0a508/operations/5645c523778daf217c292592) has the same issue: I send the request and receive only empty headers.

Could your team fix that for Ruby? Thanks. Also, as a suggestion, it would be useful to have a GET /operations API that returns the list of current operations for a subscription. Is that possible?

Error: message Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

When I run IdentifyFile on an enrolled profile, I get the error above.

Here is the code I used:
import IdentificationServiceHttpClientHelper
import sys

def identify_file(subscription_key, file_path, force_short_audio, profile_ids):
    """Identify an audio file on the server.

    Arguments:
    subscription_key -- the subscription key string
    file_path -- the audio file path for identification
    profile_ids -- an array of test profile IDs strings
    force_short_audio -- waive the recommended minimum audio limit needed for enrollment
    """
    helper = IdentificationServiceHttpClientHelper.IdentificationServiceHttpClientHelper(
        subscription_key)

    identification_response = helper.identify_file(
        file_path, profile_ids,
        force_short_audio.lower() == "true")

    print('Identified Speaker = {0}'.format(identification_response.get_identified_profile_id()))
    print('Confidence = {0}'.format(identification_response.get_confidence()))

if __name__ == "__main__":
    # if len(sys.argv) < 5:
    #     print('Usage: python IdentifyFile.py <subscription_key> <identification_file_path>'
    #           ' <profile_ids>...')
    #     print('\t<subscription_key> is the subscription key for the service')
    #     print('\t<identification_file_path> is the audio file path for identification')
    #     print('\t<force_short_audio> True/False waives the recommended minimum audio limit needed '
    #           'for enrollment')
    #     print('\t<profile_ids> the profile IDs for the profiles to identify the audio from.')
    #     sys.exit('Error: Incorrect Usage.')

    identify_file('702227e**************', 'ironman/fan.wav', 'True', '6f3d4f1b-3b70-439d-8c78-2abb15fbc791')
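One plausible explanation, under the assumption that the helper joins the profile IDs with commas before sending them (that code is not shown on this page): the docstring asks for an array of IDs, and a bare string is itself iterable, character by character. The snippet below only demonstrates that Python behavior; it does not call the service.

```python
profile_id = '6f3d4f1b-3b70-439d-8c78-2abb15fbc791'

# Joining a bare string iterates over its characters, so the result
# no longer looks like a GUID.
joined_from_string = ','.join(profile_id)

# Joining a one-element list leaves the GUID intact.
joined_from_list = ','.join([profile_id])

print(joined_from_string[:12])
print(joined_from_list)
```

If that assumption holds, passing `['6f3d4f1b-3b70-439d-8c78-2abb15fbc791']` (a list) as the last argument should avoid the error.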

Operation Error: SpeakerInvalid

ERROR:root:Error polling the operation status.
ERROR:root:Error identifying file.
Traceback (most recent call last):
  File "data_processor.py", line 239, in <module>
    azure_analysis(se_times, fs1, guid, profiles, speaker_profile)
  File "data_processor.py", line 174, in azure_analysis
    resp_id = identify_file(AZURE_KEY, output_name, 'true', guid)
  File "somepath/CSRP/Identification/IdentifyFile.py", line 50, in identify_file
    force_short_audio.lower() == "true")
  File "somepath/CSRP/Identification/IdentificationServiceHttpClientHelper.py", line 288, in identify_file
    self._poll_operation(operation_url))
  File "somepath/CSRP/Identification/IdentificationServiceHttpClientHelper.py", line 327, in _poll_operation
    operation_response[self._OPERATION_MESSAGE_FIELD_NAME])
Exception: Operation Error: SpeakerInvalid

I am getting the above error when I submit an identify_file request to the server.

I think I am seeing the following case:

Case 4 - failed
HTTP/1.1 200 Ok
Content-Type: application/json
{
  "status": "failed",
  "createdDateTime": "2015-09-30T01:28:23Z",
  "lastActionDateTime": "2015-09-30T01:35:23Z",
  "message": "Some failure info"
}

Can I get some more detailed reasons on what the SpeakerInvalid error message means and what are some things I can do to make the file I upload go through?
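This page does not say what SpeakerInvalid means. What can be done client-side is to surface the whole operation payload rather than only the message. A minimal sketch, assuming only the fields visible in the "Case 4" sample (`status`, `message`, and the two timestamps); `describe_operation` is a hypothetical helper, not part of the sample code.

```python
import json


def describe_operation(body):
    """Summarize an operation-status response body (a JSON string)."""
    op = json.loads(body)
    status = op.get("status")
    if status == "failed":
        # Include the timestamp so the failure can be quoted in a support request.
        return "failed at {0}: {1}".format(
            op.get("lastActionDateTime", "unknown time"),
            op.get("message", "no message provided"),
        )
    return str(status)
```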

mic/streaming

Is there any way to use the microphone to identify the enrolled speakers, i.e. with the audio input as a stream?

thanks!
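The sample scripts on this page all take a file path, so one workaround is to capture audio, save it as a WAV file, and pass that file to the identify script. Capturing from a microphone needs a third-party library that is not covered here. The sketch below only shows the second half: wrapping raw PCM bytes in a WAV container with the standard `wave` module. `pcm_to_wav_bytes` is a hypothetical helper, and the 16 kHz, 16-bit, mono defaults are assumptions to check against the service documentation.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=16000, sample_width=2, channels=1):
    """Wrap raw PCM frames in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to disk gives a file that can be used wherever the samples expect `<identification_file_path>`.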

No module named http.client

I get an error saying "No module named http.client". Sorry, I'm not really experienced with Python; where can I get the http.client module?

Full error after running the command "python CreateProfile.py (my subscription key)":
Traceback (most recent call last):
  File "CreateProfile.py", line 33, in <module>
    import IdentificationServiceHttpClientHelper
  File "/Users/Nabil/Desktop/Cognitive-SpeakerRecognition-Python/Identification/IdentificationServiceHttpClientHelper.py", line 33, in <module>
    import http.client
ImportError: No module named http.client
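`http.client` ships with Python 3's standard library; in Python 2 the equivalent module was named `httplib`. This ImportError therefore usually means that `python` resolved to a Python 2 interpreter, while the README states the samples use Python 3. A small guard along these lines makes the cause explicit; `python3_hint` is a hypothetical helper, not part of the samples.

```python
import sys


def python3_hint(version_info=sys.version_info):
    """Return None on Python 3+, or a hint string on older interpreters."""
    if version_info[0] >= 3:
        return None
    return ("These samples need Python 3 (they import http.client); "
            "try: python3 CreateProfile.py <subscription_key>")


hint = python3_hint()
if hint:
    sys.exit(hint)
```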

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

AN EASIER WAY TO GET STARTED

For anyone who tries this repo in the future, let me give you an easy way out. I spent a lot of time on Speaker Recognition, and the official docs say one thing while the samples given do another.
I will use the official REST API docs for the text-independent method.

ONCE AND FOR ALL, IF YOU WANT TO USE PYTHON, THIS IS THE EASIEST AND BEST WAY I FOUND:

CREATE PROFILE 👇

import http.client
import json

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY',
           'Content-type': 'application/json'}

# Request body: the locale of the new profile, sent as JSON.
json_data = json.dumps({'locale': 'en-us'})

conn.request(method='POST', url='/speaker/verification/v2.0/text-independent/profiles',
             body=json_data, headers=headers)
response = conn.getresponse().read().decode()
conn.close()

# Raw response body; note the profile ID in it for the enrollment call.
print(response)

ENROLL PROFILE 👇

import http.client

# ID of the profile created in the previous step.
profileId = 'YOUR_PROFILE_ID'

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

# Send the WAV file as the raw request body.
with open('YOUR_WAV_FILE', 'rb') as data:
    conn.request(method='POST',
                 url=f'/speaker/verification/v2.0/text-independent/profiles/{profileId}/enrollments?ignoreMinLength=true',
                 body=data, headers=headers)
    response = conn.getresponse().read().decode()

conn.close()
print(response)

LIST PROFILES 👇

import http.client

conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')

headers = {'Ocp-Apim-Subscription-Key': 'YOUR_KEY'}

# $top=10 asks for at most 10 profiles.
conn.request(method='GET', url='/speaker/verification/v2.0/text-independent/profiles?$top=10',
             headers=headers)
response = conn.getresponse().read().decode()
conn.close()
print(response)

And you can now make your own calls using this method and following the docs link I mentioned above.
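The three snippets above repeat the same connect, request, read, close pattern, so further calls can go through one small wrapper. This is a sketch under the same assumptions as the post (the `westus` host and the v2.0 text-independent base path); `call_api` and its `conn_factory` parameter are hypothetical names introduced here, the latter so the function can be exercised without network access.

```python
import http.client
import json

HOST = 'westus.api.cognitive.microsoft.com'
BASE = '/speaker/verification/v2.0/text-independent'


def call_api(method, path, key, body=None, json_body=None,
             conn_factory=http.client.HTTPSConnection):
    """Send one request and return (HTTP status, decoded response text)."""
    headers = {'Ocp-Apim-Subscription-Key': key}
    if json_body is not None:
        # JSON payloads are serialized here and labeled accordingly.
        body = json.dumps(json_body)
        headers['Content-type'] = 'application/json'
    conn = conn_factory(HOST)
    try:
        conn.request(method=method, url=BASE + path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        # Close the connection even if the request raised.
        conn.close()
```

For example, `call_api('POST', '/profiles', 'YOUR_KEY', json_body={'locale': 'en-us'})` mirrors the create-profile snippet, and returning the status code makes a Bad Request easier to spot than printing the body alone.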

Cheers and Good Luck ! 🍻

Getting Bad request error for enrolment - Speaker Recognition

From @rajagopal28 on May 17, 2016 8:4

Hi,
I've been trying to enroll a voice file for a created profile using the python API.
I was able to create a profile and list all profiles successfully. But when I try to enroll a voice (.wav) file containing a simple "hello world" phrase for the created profile, I get the error 'ERROR:root:Error enrolling profile.', and the trace shows 'Exception: Error enrolling profile: Bad Request'. I can attach the stack trace if needed. Can you help me get started with this?

Copied from original issue: microsoft/ProjectOxford-ClientSDK#66
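Other posts on this page mention a recommended minimum audio length for enrollment (the `force_short_audio` argument and the `ignoreMinLength` query parameter), and a "hello world" clip is only a second or two long. Whether that is what triggers this Bad Request is a guess, but the clip length is cheap to check with the standard `wave` module. `wav_duration_seconds` is a hypothetical helper; the actual minimum is not stated here and should be taken from the service documentation.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())
```

If `wave.open` raises `wave.Error`, the file is not plain PCM WAV, which is worth ruling out as well.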

How to change the predefined phrases ?

Hi,
I've been using the speaker recognition API. For enrollment we have to say some predefined phrases,
but what if I don't want to enroll my voice with the predefined phrases? What if I want to say "Abra Kadavra"? Can anyone please help me with this?

Please create a simple Speaker identification example. The documentation as well as the github are unorganized and confusing.

DEPENDENCIES LIST ?

My issue is quite simple :

WHERE IS THE LIST OF DEPENDENCIES THAT WE NEED TO RUN THIS ?

Please provide a requirements.txt so that we can at least use this demo. The project in its current state requires me to debug lots of dependency issues before I can actually run any script.

I expect better from a company like Microsoft.
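For what it is worth, the imports visible in the tracebacks and snippets on this page (`http.client`, `json`, `sys`) are all part of the Python 3 standard library; whether the repo needs anything beyond them is not shown here. A quick way to find out which modules an interpreter is missing, as a sketch with a hypothetical `missing_modules` helper:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules(["http.client", "json", "sys"]))
```

Running it under the interpreter you use for the samples shows at a glance whether the problem is a missing package or the wrong Python version.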