watson-developer-cloud / speech-to-text-nodejs

:microphone: Sample Node.js Application for the IBM Watson Speech to Text Service

Home Page: https://speech-to-text-demo.ng.bluemix.net

License: Apache License 2.0

JavaScript 94.59% CSS 4.75% Dockerfile 0.44% Shell 0.22%

speech-to-text-nodejs's Introduction

🎤 Speech to Text Demo

A Node.js sample application that shows some of the IBM Watson Speech to Text service features.


The Speech to Text service uses IBM's speech recognition capabilities to convert speech in multiple languages into text. The transcription of incoming audio is continuously sent back to the client with minimal delay, and it is corrected as more speech is heard. The service is accessed via a WebSocket interface; a REST HTTP interface is also available.
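
For orientation, here is a minimal sketch (not taken from this repo, and assuming the ibm-watson Node SDK with credentials in environment variables) of streaming a local file over the WebSocket interface and printing interim results as they are refined:

    // Hedged sketch: stream audio over the WebSocket interface with the ibm-watson SDK.
    const fs = require('fs');
    const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
    const { IamAuthenticator } = require('ibm-watson/auth');

    const speechToText = new SpeechToTextV1({
      authenticator: new IamAuthenticator({ apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY }),
      serviceUrl: process.env.SPEECH_TO_TEXT_URL,
    });

    const recognizeStream = speechToText.recognizeUsingWebSocket({
      contentType: 'audio/flac',
      interimResults: true, // interim hypotheses are corrected as more speech is heard
      objectMode: true,     // emit parsed result objects instead of raw transcript text
    });

    fs.createReadStream('audio.flac').pipe(recognizeStream); // 'audio.flac' is a placeholder file
    recognizeStream.on('data', (event) => console.log(JSON.stringify(event)));
    recognizeStream.on('error', (err) => console.error(err));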

You can view a demo of this app.

Prerequisites

  1. Sign up for an IBM Cloud account.
  2. Download the IBM Cloud CLI.
  3. Create an instance of the Speech to Text service and get your credentials:
    • Go to the Speech to Text page in the IBM Cloud Catalog.
    • Log in to your IBM Cloud account.
    • Click Create.
    • Click Show to view the service credentials.
    • Copy the apikey value.
    • Copy the url value.

Configuring the application

  1. In the application folder, copy the .env.example file and create a file called .env

    cp .env.example .env
    
  2. Open the .env file and add the service credentials that you obtained in the previous step.

    Example .env file that configures the apikey and url for a Speech to Text service instance hosted in the US East region:

    SPEECH_TO_TEXT_IAM_APIKEY=X4rbi8vwZmKpXfowaS3GAsA7vdy17Qh7km5D6EzKLHL2
    SPEECH_TO_TEXT_URL=https://api.us-east.speech-to-text.watson.cloud.ibm.com
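
    The server reads these variables at startup; a minimal sketch of how that typically looks with the dotenv package (illustrative, not this repo's exact code):

    // Hedged sketch: load and validate the .env credentials with dotenv.
    require('dotenv').config();

    const { SPEECH_TO_TEXT_IAM_APIKEY, SPEECH_TO_TEXT_URL } = process.env;
    if (!SPEECH_TO_TEXT_IAM_APIKEY || !SPEECH_TO_TEXT_URL) {
      throw new Error('Missing Speech to Text credentials in .env');
    }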
    

Running locally

  1. Install the dependencies

    npm install
    
  2. Run the application

    npm start
    
  3. View the application in a browser at localhost:3000

Deploying to IBM Cloud as a Cloud Foundry Application

  1. Login to IBM Cloud with the IBM Cloud CLI

    ibmcloud login
    
  2. Target a Cloud Foundry organization and space.

    ibmcloud target --cf
    
  3. Edit the manifest.yml file. Change the name field to something unique. For example, - name: my-app-name.

  4. Deploy the application

    ibmcloud app push
    
  5. View the application online at the app URL, for example: https://my-app-name.mybluemix.net

License

This sample code is licensed under Apache 2.0.

Contributing

See CONTRIBUTING.

Open Source @ IBM

Find more open source projects on the IBM GitHub page.

speech-to-text-nodejs's People

Contributors

aameek, andresfvilla, apaparazzi0329, arkwl, daniel-bolanos, dependabot[bot], dpopp07, ehdsouza, esbullington, germanattanasio, greenkeeperio-bot, iankit3, jeff-arn, jsstylos, kasaby, kevinkowa, kkeerthana, kognate, leibaogit, leonrch, lhuihui, lpatino10, madi-ji, mamoonraja, mediumtaj, mikemosca, nfriedly, ptitzler, sirspidey


speech-to-text-nodejs's Issues

build script is missing

The step npm run build in the README.md is failing because the script "build" doesn't exist in the package.json file. What should I execute in order to make the app run?

Play Sample doesn't work on Safari

Clicking Play Sample 1 (or 2) in Safari on my Mac does nothing. I understand that recording audio isn't supported on Safari, but Play Sample should work.

Using the library

Hello,

I was wondering whether the use of Bluemix is necessary to develop against the Speech to Text API.

If the answer is yes, is the free plan enough to use the API? I hope some of you with development experience can help me.

Bye

Redeploy the demo

Hi German & James, I've updated the navigation link, service icon, and favicon. Please help redeploy the demo :-)

It doesn't work on mobile phones

The microphone doesn't work when I use an iPad or Android device to test the demo. Could it be a browser problem? We hope not. 😱😱😱😱😱

strange audio capture behavior in firefox

In the STT demo, pressing the microphone button to speak works fine the first time. But if we press it to stop recording and then press it to start again, the STT output is very bad. It looks like something is wrong with audio capture the second time (and all subsequent attempts are bad too). This is not an issue in Chrome, where it seems to work fine every time.

Testing Chrome and Firefox side by side, if I start/stop recording in both Chrome and Firefox, this seems to make Firefox work better - seemingly suggesting Chrome may be "initializing" something in audio capture that Firefox is not?

developer experience checklist

  • make sure it scales; it could end up on the front page of Reddit
    • Mainly looking at page weight: shoot for 2-3 MB max and rendering in 5 seconds or less.
    • Test with dev tools throttling to 3G speeds and make sure things are still reasonable.
  • Add Google Analytics
  • blue-green deployment + travis (see this)
  • testing + travis (see this)
  • security.js (helmet + express-rate-limit) + CSRF (see the personality-insights and speech demos); a sketch follows this list
  • package.json should not specify node-engine so that Bluemix will always use the latest one.
  • Google reCAPTCHA support (make sure design takes this into account when designing a demo)
    • Talk to design to add it for existing demos.
  • ~~Update Travis to send emails when there is a tag release.~~ SDKs only
  • Bluemix deployment tracker and privacy notice
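
For the security item above, a minimal sketch (assumptions: an Express app using the helmet and express-rate-limit packages; not this repo's actual security.js):

// Hedged sketch: basic Express hardening with helmet and express-rate-limit.
const express = require('express');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');

const app = express();

app.use(helmet()); // sets common security-related HTTP headers

// Limit each IP to 100 requests per 15-minute window (values are illustrative).
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

app.listen(process.env.PORT || 3000);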

Smart_formatting

Hi,
I would like to add smart formatting to the Speech to Text service. Can anyone help me add this feature?
I am new to Bluemix, so please help me with this.
Thank you
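
A minimal sketch of how smart formatting could be enabled (an assumption based on the ibm-watson Node SDK, where the recognize parameter is exposed as smartFormatting; on the raw WebSocket interface the field is smart_formatting; not code from this repo):

// Hedged sketch: enable smart formatting on a recognize call with the ibm-watson SDK.
const fs = require('fs');
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY }),
  serviceUrl: process.env.SPEECH_TO_TEXT_URL,
});

speechToText.recognize({
  audio: fs.createReadStream('audio.flac'), // placeholder audio file
  contentType: 'audio/flac',
  smartFormatting: true,                    // formats dates, times, numbers, currency, etc.
})
  .then((response) => console.log(JSON.stringify(response.result, null, 2)))
  .catch((err) => console.error(err));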

No Logs in the Console

I ran the app locally. It works, but I only see the very first console.log in my console ('listening to port'). After this no other logs are printed out.

When I run node inspector and open the URL in Chrome, nothing comes up. Not sure whether that's related to the first problem.

Is it just me? Any idea how to fix this? I looked briefly in the code but couldn't find where the logs could have been turned off. Maybe in one of the dependencies? Both issues make it really hard to extend the app.

[Speech to Text]: Not working on mobile

Hello,
I have deployed the app in Bluemix. The URL https://<...>.mybluemix.net works perfectly on laptops/desktops. It also opens the initial page on mobile, but the buttons for the microphone, file open dialog, etc. appear one below another instead of side by side. The important issue is that the buttons don't work when pressed.
Is there another version created for mobile?
If not, please let me know how to run it successfully on mobile.

Watson Speech to Text Service Failed to Work on Safari 9.x

Hi Team,

I am working on an iOS web app. I found that this Node.js-based SDK doesn't work on Safari 9.x. After investigation, I've found the following issues:

  1. Safari: const declarations are not supported in strict mode.
  2. WebSocket fails with "Invalid UTF-8 sequence in header value" when the request or response contains any empty header, such as "Content-Type".

To work around these two issues, I made some modifications to the local JavaScript code:

  1. change "const" to "var"
  2. as for the blank header in the token request, I added a blank check before posting it:
if ($('meta[name="ct"]').attr('content').length > 0) {
    tokenRequest.setRequestHeader('csrf-token',$('meta[name="ct"]').attr('content'));
}

But for the blank header in the response from the Watson server, I can do nothing =(
Is there any way to fix this? Thanks!

Here's some reference for the second issue

  1. https://bugs.webkit.org/show_bug.cgi?id=139298
  2. https://bugs.chromium.org/p/chromium/issues/detail?id=380075

No/wrong error message on Safari

I realize this isn't supported on Safari, but when trying the demo on the Mac under Safari, clicking "Record Audio" does nothing (no error message), and when I then click on something else (like Play Sample), I get "Currently audio is being record...".

The "Record Audio" should probably be grayed out if not supported

STT : main text window scroll to show most recent text

At the moment, when text overflows in the main text window, users have to scroll down to see the most recent text. Could we keep the text window always scrolled to the bottom, showing the most recently recognized text? Thanks, Vaibhava
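
A minimal sketch of the requested behavior (standard DOM scrolling; 'resultsText' is a hypothetical element id used only for illustration):

// Hedged sketch: keep a transcript container scrolled to the newest text.
function scrollToBottom(el) {
  el.scrollTop = el.scrollHeight;
}

// Call after each transcript update, e.g.:
// scrollToBottom(document.getElementById('resultsText'));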

microphone capture from W540 laptops

W540 laptops come with factory microphone settings that completely break the STT service. Manually adjusting the microphone settings to disable all audio enhancements works, but people do not know about this. Can we control the microphone better from the demo application and make sure all enhancements are off?

File upload menu stop running file

Need to ensure that if a file loaded via the file upload menu is stopped prematurely, the app doesn't try to send a socket message after the socket is closed (which results in an error message).
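
A minimal sketch of such a guard (standard browser WebSocket API, not this repo's exact code):

// Hedged sketch: only send if the socket is still open.
function safeSend(socket, message) {
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(message);
  }
}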

Sample1 and Sample2 : transcription runs ahead of playback

This looks 'magical': we transcribe even before hearing the audio, which makes the whole thing look canned :-). Can we pace the audio for our samples so that transcription follows the audio playback? This is only for 'sample 1' and 'sample 2' (for all 6 models), not for general file upload where we do not know the sampling rate.

[speech-to-text] 1006 Connection dropped by remote peer.

I'm testing files of roughly 75 MB through the WebSocket route.
After weeks of working consistently, today I am getting the message:
1006 Connection dropped by remote peer.
This is occurring immediately, with no result messages.

About to send ./in/1861034_xxx_lp.f4v-part000.flac
connect. config is {"maxReceivedFrameSize":1048576,"maxReceivedMessageSize":8388608,"fragmentOutgoingMessages":true,"fragmentationThreshold":16384,"webSocketVersion":13,"assembleFragments":true,"disableNagleAlgorithm":true,"closeTimeout":5000,"tlsOptions":{}}
1006 Connection dropped by remote peer.

Is this issue something to report here, or to another ticketing system for the S2T service itself?

Show the playback when the url contains a debug=true or 1

We should hide/show the playback button if the url contains a specific query parameter like debug or playback

This will be really useful for users trying to test their microphone. They will be able to see what they are sending to the service.

I would suggest using debug since that will allow us to do more than just playback.

speech-to-text-demo.mybluemix.net?debug=true

@kasaby you have already implemented the playback functionality. Can you detect the query parameter and show/hide it?
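
A minimal sketch of the query-parameter check (standard browser APIs; 'playback-button' is a hypothetical element id used only for illustration):

// Hedged sketch: show the playback button only when ?debug=true or ?debug=1 is present.
const query = new URLSearchParams(window.location.search);
const debugEnabled = query.get('debug') === 'true' || query.get('debug') === '1';
document.getElementById('playback-button').hidden = !debugEnabled;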

Investigate out of memory crashes

A long-running app can crash due to an out-of-memory error (with 768 MB).

Should investigate whether this is due to heavy simultaneous usage, a very large input, an ongoing memory leak, or something else.

The URL query argument(s) ['X-WDC-PL-OPT-OUT'] are not allowed.

For me, this demo works neither when I build it myself nor when I try the official version hosted by you. I tried it with a Logitech USB headset (which correctly records sound in the OS, and Firefox 42.0 shares it), but then I get the error from the title. I have also tried both an .ogg file downloaded from the TTS demo (which btw works; the other demo, not using the file, same error) as well as the files from the Python STT demo, which also fail with the same error. Finally, since there is "opt out" in the message, I played with the option for whether Watson may learn from this session. However, whether I allow it or not, this demo still breaks.

Any suggestions how to fix this?
Thanks in advance.

Best,
Joe

STT DEMO -- Play Sample 2, Text Box typos for French broadband model--missing apostrophe in d' opinion, d'expression

Re: STT DEMO -- Typos in Text box of French broadband model; see "Play Sample 2"

When testing STT demo for French broadband model, via this link --> https://speech-to-text-demo-june20th.mybluemix.net/, Keerthana and I sighted this issue.

There are a few typos in the "Text" tab of the demo page, for "Play Sample 2".

Expected:
i) d'opinion
ii) d'expression

Actual text:
i) d opinion
ii) d expression

See details below.
Please let me know if you have any questions or comments.

Alexandra and Keerthana

Complete text below:
Tout individu a droit à la liberté d opinion et d expression. Ce qui implique le droit de ne pas être inquiété pour ses opinions et celui de chercher de recevoir et de répandre sans considérations de frontières les informations et les idées par quelque moyen d expression que ce soit.

Also, here's the JSON, for your reference:

{
"results": [
{
"word_alternatives": [
{
"start_time": 4.06,
"alternatives": [
{
"confidence": 0.9998,
"word": "ce"
}
],
"end_time": 4.27
},
{
"start_time": 4.27,
"alternatives": [
{
"confidence": 0.9995,
"word": "qui"
}
],
"end_time": 4.46
},
{
"start_time": 4.46,
"alternatives": [
{
"confidence": 0.9959,
"word": "implique"
},
{
"confidence": 0.0036,
"word": "applique"
}
],
"end_time": 4.92
},
{
"start_time": 4.92,
"alternatives": [
{
"confidence": 1,
"word": "le"
}
],
"end_time": 5.08
},
{
"start_time": 5.08,
"alternatives": [
{
"confidence": 0.9998,
"word": "droit"
}
],
"end_time": 5.4
},
{
"start_time": 5.4,
"alternatives": [
{
"confidence": 0.9998,
"word": "de"
}
],
"end_time": 5.53
},
{
"start_time": 5.53,
"alternatives": [
{
"confidence": 1,
"word": "ne"
}
],
"end_time": 5.64
},
{
"start_time": 5.64,
"alternatives": [
{
"confidence": 0.9998,
"word": "pas"
}
],
"end_time": 5.95
},
{
"start_time": 5.95,
"alternatives": [
{
"confidence": 0.9998,
"word": "être"
}
],
"end_time": 6.21
},
{
"start_time": 6.21,
"alternatives": [
{
"confidence": 0.9999,
"word": "inquiété"
}
],
"end_time": 6.71
},
{
"start_time": 6.71,
"alternatives": [
{
"confidence": 1,
"word": "pour"
}
],
"end_time": 7.01
},
{
"start_time": 7.01,
"alternatives": [
{
"confidence": 1,
"word": "ses"
}
],
"end_time": 7.2
},
{
"start_time": 7.2,
"alternatives": [
{
"confidence": 1,
"word": "opinions"
}
],
"end_time": 7.75
},
{
"start_time": 7.75,
"alternatives": [
{
"confidence": 1,
"word": "et"
}
],
"end_time": 7.78
},
{
"start_time": 7.78,
"alternatives": [
{
"confidence": 1,
"word": "celui"
}
],
"end_time": 8.19
},
{
"start_time": 8.19,
"alternatives": [
{
"confidence": 0.9999,
"word": "de"
}
],
"end_time": 8.33
},
{
"start_time": 8.33,
"alternatives": [
{
"confidence": 1,
"word": "chercher"
}
],
"end_time": 9.05
},
{
"start_time": 9.42,
"alternatives": [
{
"confidence": 1,
"word": "de"
}
],
"end_time": 9.59
},
{
"start_time": 9.59,
"alternatives": [
{
"confidence": 1,
"word": "recevoir"
}
],
"end_time": 10.52
},
{
"start_time": 10.55,
"alternatives": [
{
"confidence": 0.9999,
"word": "et"
}
],
"end_time": 10.64
},
{
"start_time": 10.64,
"alternatives": [
{
"confidence": 1,
"word": "de"
}
],
"end_time": 10.8
},
{
"start_time": 10.8,
"alternatives": [
{
"confidence": 0.9999,
"word": "répandre"
}
],
"end_time": 11.49
},
{
"start_time": 11.49,
"alternatives": [
{
"confidence": 1,
"word": "sans"
}
],
"end_time": 11.76
},
{
"start_time": 11.76,
"alternatives": [
{
"confidence": 1,
"word": "considérations"
}
],
"end_time": 12.8
},
{
"start_time": 12.8,
"alternatives": [
{
"confidence": 0.9999,
"word": "de"
}
],
"end_time": 12.93
},
{
"start_time": 12.93,
"alternatives": [
{
"confidence": 0.9999,
"word": "frontières"
}
],
"end_time": 13.7
},
{
"start_time": 13.77,
"alternatives": [
{
"confidence": 0.9999,
"word": "les"
}
],
"end_time": 14.4
},
{
"start_time": 14.4,
"alternatives": [
{
"confidence": 0.9999,
"word": "informations"
}
],
"end_time": 15.15
},
{
"start_time": 15.15,
"alternatives": [
{
"confidence": 1,
"word": "et"
}
],
"end_time": 15.19
},
{
"start_time": 15.19,
"alternatives": [
{
"confidence": 1,
"word": "les"
}
],
"end_time": 15.39
},
{
"start_time": 15.39,
"alternatives": [
{
"confidence": 1,
"word": "idées"
}
],
"end_time": 15.71
},
{
"start_time": 15.71,
"alternatives": [
{
"confidence": 0.9999,
"word": "par"
}
],
"end_time": 16.02
},
{
"start_time": 16.02,
"alternatives": [
{
"confidence": 0.9998,
"word": "quelque"
}
],
"end_time": 16.42
},
{
"start_time": 16.42,
"alternatives": [
{
"confidence": 1,
"word": "moyen"
}
],
"end_time": 16.78
},
{
"start_time": 16.78,
"alternatives": [
{
"confidence": 0.9999,
"word": "d"
}
],
"end_time": 16.9
},
{
"start_time": 16.9,
"alternatives": [
{
"confidence": 0.9999,
"word": "expression"
}
],
"end_time": 17.48
},
{
"start_time": 17.48,
"alternatives": [
{
"confidence": 1,
"word": "que"
}
],
"end_time": 17.64
},
{
"start_time": 17.64,
"alternatives": [
{
"confidence": 0.997,
"word": "ce"
},
{
"confidence": 0.003,
"word": "se"
}
],
"end_time": 17.8
},
{
"start_time": 17.8,
"alternatives": [
{
"confidence": 0.9894,
"word": "soit"
},
{
"confidence": 0.0076,
"word": "soient"
}
],
"end_time": 18.05
}
],
"keywords_result": {
"frontières": [
{
"normalized_text": "frontières",
"start_time": 12.93,
"confidence": 1,
"end_time": 13.7
}
],
"idées": [
{
"normalized_text": "idées",
"start_time": 15.39,
"confidence": 1,
"end_time": 15.71
}
]
},
"alternatives": [
{
"word_confidence": [
[
"ce",
0.9449882800711019
],
[
"qui",
0.9889126786825012
],
[
"implique",
0.7893408949680603
],
[
"le",
1
],
[
"droit",
1
],
[
"de",
1
],
[
"ne",
1
],
[
"pas",
0.9999999999999686
],
[
"être",
0.9999999999999686
],
[
"inquiété",
0.9999999999998692
],
[
"pour",
0.999999999999869
],
[
"ses",
0.9999999999999156
],
[
"opinions",
0.9999999999998309
],
[
"et",
0.9999999999997383
],
[
"celui",
0.9999999999998693
],
[
"de",
0.9999999999998689
],
[
"chercher",
0.9999999999998691
],
[
"de",
0.9999999999999346
],
[
"recevoir",
0.9999999999999342
],
[
"et",
0.9999999999999345
],
[
"de",
0.9999999999999344
],
[
"répandre",
0.9999999999999345
],
[
"sans",
0.9999999999999348
],
[
"considérations",
0.9999999999998692
],
[
"de",
0.999999999999869
],
[
"frontières",
0.999999999999874
],
[
"les",
0.9999999999997897
],
[
"informations",
0.9999999999997999
],
[
"et",
0.9999999999998037
],
[
"les",
0.9999999999997448
],
[
"idées",
0.9999999999997871
],
[
"par",
0.9999999999998365
],
[
"quelque",
0.9999999999998694
],
[
"moyen",
0.9999999999999178
],
[
"d",
0.8446927233799434
],
[
"expression",
0.9999999999998421
],
[
"que",
0.9999999999998566
],
[
"ce",
0.9835743381046492
],
[
"soit",
0.9999999999998803
]
],
"confidence": 0.99,
"transcript": "ce qui implique le droit de ne pas être inquiété pour ses opinions et celui de chercher de recevoir et de répandre sans considérations de frontières les informations et les idées par quelque moyen d expression que ce soit ",
"timestamps": [
[
"ce",
4.06,
4.27
],
[
"qui",
4.27,
4.46
],
[
"implique",
4.46,
4.92
],
[
"le",
4.92,
5.08
],
[
"droit",
5.08,
5.4
],
[
"de",
5.4,
5.53
],
[
"ne",
5.53,
5.64
],
[
"pas",
5.64,
5.95
],
[
"être",
5.95,
6.21
],
[
"inquiété",
6.21,
6.71
],
[
"pour",
6.71,
7.01
],
[
"ses",
7.01,
7.2
],
[
"opinions",
7.2,
7.75
],
[
"et",
7.75,
7.78
],
[
"celui",
7.78,
8.19
],
[
"de",
8.19,
8.33
],
[
"chercher",
8.33,
9.05
],
[
"de",
9.42,
9.59
],
[
"recevoir",
9.59,
10.52
],
[
"et",
10.55,
10.64
],
[
"de",
10.64,
10.8
],
[
"répandre",
10.8,
11.49
],
[
"sans",
11.49,
11.76
],
[
"considérations",
11.76,
12.8
],
[
"de",
12.8,
12.93
],
[
"frontières",
12.93,
13.7
],
[
"les",
13.77,
14.4
],
[
"informations",
14.4,
15.15
],
[
"et",
15.15,
15.19
],
[
"les",
15.19,
15.39
],
[
"idées",
15.39,
15.71
],
[
"par",
15.71,
16.02
],
[
"quelque",
16.02,
16.42
],
[
"moyen",
16.42,
16.78
],
[
"d",
16.78,
16.9
],
[
"expression",
16.9,
17.48
],
[
"que",
17.48,
17.64
],
[
"ce",
17.64,
17.8
],
[
"soit",
17.8,
18.05
]
]
},
{
"transcript": "ce qui implique le droit de ne pas être inquiété pour ses opinions et celui de chercher de recevoir et de répandre sans considérations de frontières les informations et les idées par quelque moyen d expression que ce soient "
},
{
"transcript": "ce qui applique le droit de ne pas être inquiété pour ses opinions et celui de chercher de recevoir et de répandre sans considérations de frontières les informations et les idées par quelque moyen d expression que ce soit "
}
],
"final": true
}
],
"result_index": 1

Split front-end code into a library

First off, this is an awesome demo and you guys did a great job on it.

That said, the front-end code is fairly complex and I think that at least part of it probably belongs in a library that we stick in bower and/or npm and then just reference it here.

That would ease developing new applications because it would be something that one could simply drop in, with a clear line between "what code I can reuse without thinking about it" and "what code I need to write for myself".

doubts / questions / suggest / request

I have Dragon NaturallySpeaking 12 Professional installed and I created a UNIVERSAL VOICE PROFILE; I have author rights to it. This is a recent video (sorry for the Spanish audio):

https://youtu.be/wvVl1eeMBRo?t=117

  1. What is the delay, and why? Is it related to broadband?
  2. Is it possible to have this offline?
  3. What is the accuracy?
  4. Is it possible to add new vocabulary?
  5. What is the maximum recording time?
    …

Thanks

Recognized Text Formatting

First, thank you so much for your work.

I have a local Node system all set up and working well on a Linux VM. I can't seem to figure out why the recognized text prior to the final result is displayed as one long string. Maybe it's because I'm tired, but it's not clear to me. I would like each word, regardless of final result, to be displayed separately.

Currently, before the final result, it looks like this:

Thequickbrownfox

I would like it to display

The quick brown fox

Regardless of whether this gets corrected/changed.

Thanks!

Sample Audio File not working

See question on Dev Works: https://developer.ibm.com/answers/questions/205005/speech-to-text-demo-play-sample-audio-not-working.html

Steps Taken:

  1. Click 'Deploy to Bluemix' Button in the README file
  2. Click through to deploy application in my own bluemix space
  3. Click route (http://speech-to-text-deploybutton.mybluemix.net/) then click any of the options (record audio, upload file, play sample 1, play sample 2) and there is no response.

There is most likely a step I'm forgetting here, but I appreciate any help you can provide.

MediaStream.stop() deprecated in Chrome

When running this sample on Chrome Version 45.0.2454.101 (64-bit) for Mac OS, I get the following message on the JavaScript console from file Microphone.js

'MediaStream.stop()' is deprecated and will be removed in M47, around November 2015. Please use 'MediaStream.active' instead.
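
The usual replacement is to stop each track individually rather than the whole stream; a minimal sketch (standard Media Capture API, not this repo's Microphone.js):

// Hedged sketch: stop every track of a MediaStream instead of calling the deprecated stream.stop().
function stopStream(stream) {
  stream.getTracks().forEach(function (track) {
    track.stop();
  });
}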

app.js config wrong?

The app.js config doesn't work for me.
I had to change it to:

var config = {
  version: 'v1',
  url: 'https://stream.watsonplatform.net/speech-to-text/api',
  username: '<username>',
  password: '<password>'
};

STT DEMO KWS -- Text normalization is needed for all models; all keywords are erroneously not spotted for all non-US English models

Re: STT DEMO KWS -- Text normalization is needed for all models; all keywords are erroneously not spotted for all non-US English models. Currently, all keywords are spotted for only US English models.

When testing STT demo, via this link --> https://speech-to-text-demo-june20th.mybluemix.net/, I sighted this issue.

For all models (except the US English broadband and narrowband models), when the same keyword is entered more than once with different casing (lowercase vs. uppercase, e.g., para, PARA) in the keyword spotting box, not all of these words are spotted.

For this example, where text is..
"PERO QUIERO PREGUNTARLE SI SOLO EXISTEN PRODUCTOS. PARA LA PIEL PARA TRATAR EL ACNE. OH HAY OTRA FORMA DE TRATARLO. "

Expected:

  • para (2)
    • Start: 5.89 End: 6.12 Confidence: 100.0%
    • Start: 6.77 End: 7.02 Confidence: 99.9%
  • PARA (2)
    • Start: 5.89 End: 6.12 Confidence: 100.0%
    • Start: 6.77 End: 7.02 Confidence: 99.9%

Actual:
para

  • PARA (2)
    • Start: 5.89 End: 6.12 Confidence: 100.0%
    • Start: 6.77 End: 7.02 Confidence: 99.9%

Please let me know if you have any questions or comments.

Alexandra

Speech to text recognize method

I am trying to create a RESTful service in Bluemix which will call the Speech to Text service, but I want to send a base64 string or byte array instead of a file. Could you please help me with that?
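
A minimal sketch of one way to do this (assuming the ibm-watson Node SDK, which accepts a Buffer as the audio parameter; names like transcribeBase64 are illustrative, not from this repo):

// Hedged sketch: decode a base64 string into a Buffer and pass it to recognize().
const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: process.env.SPEECH_TO_TEXT_IAM_APIKEY }),
  serviceUrl: process.env.SPEECH_TO_TEXT_URL,
});

// base64Audio would arrive in your REST request body; contentType must match its encoding.
function transcribeBase64(base64Audio, contentType) {
  return speechToText
    .recognize({
      audio: Buffer.from(base64Audio, 'base64'),
      contentType: contentType, // e.g., 'audio/wav' or 'audio/flac'
    })
    .then((response) => response.result);
}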

Installation error in ubuntu 32 bits

I have Ubuntu 14 with 2 GB of RAM but with a 32-bit architecture. I have changed:

sudo nano /home/notroot/speech-to-text-nodejs/server.js

var port = process.env.VCAP_APP_PORT || 3000;
to
var port = process.env.VCAP_APP_PORT || 3002;

When I run:

npm install

(error screenshot: errorspeech)

I have installed locally with

username: process.env.STT_USERNAME || '',
password: process.env.STT_PASSWORD || ''

For another instance, do I need to create another one?

Extra and confusing .js files in projects

It appears the public/flash and public/js folders are not necessary and should be removed. The microphone code seems to have been migrated to src/Microphone.js and related files. It took a while to determine that those extra folders weren't used anymore.

Thanks
