mozilla-extensions / firefox-voice

Firefox Voice is an experiment in a voice-controlled web user agent

License: Mozilla Public License 2.0

Shell 0.44% CSS 13.61% JavaScript 70.82% HTML 2.02% Python 0.51% Kotlin 12.09% SCSS 0.51%

firefox-voice's People

Contributors

Stargazers

Watchers

Forkers

harraton pdehaan patil2099 andrenatal lozinska khushilmistry miggs125 vrajjmehta jcsteh maloney8 maitrella forksbot jcambre shubhamchinda vanekcsi espertus kathyreid sudostatus200 brierjon abahmed tawawhite srinath-c oliviaruyinzhang michael-mml janvimahajan14 manasa2850 ettoolong yuanyu90221 keenwarrior shulammite-aso gangachatrvedi mohinderps reuben naima-shk ajinkabeer noi5e maverick-27 dave-ok saumyasinghal danielamormocea melvin2016 aemiej wolfitis prastutiupadhaya shreyajain25 jm-mendez xlisachan lilylme fleur101 arushivii timothyrlamora734 zuhairhassan farhatsharifh abrilanchondo akanksha1212 tonynguyen111997 muskangupta-iitr nuraynab amandhamija98 shreyaa-s-zz shaymaa91 sav22999 texas86 cscd01 liumichael noelymarques sabina-rohman pascalulor techboynagar 1756816642 lelouchb vandnakapoor19 devenlu simpcyclassy ikkyodufade annlinros gavinv albaalah msgpo awuorm rohanharikr brianrhea ra2003 saurabhnarhe dragomirstefanacatalina vicentejsp mvortizr pankajtanwarbanna ottah sirinartk ishakikani9117 mdlglobal-atlassian-net veecee424 surfndez lazybrain-research abhivaidya95 widlok snehal199 saranshbarua gwe-n

firefox-voice's Issues

Create log viewer

I added a logging system, but the log messages are very ad hoc. A general log viewer might be useful: one that shows both incidental logged information and essential information such as text input, parsed intents, and error messages.

This doesn't have any product value, but I think I may find it useful if only for my own work.

Replace starting chime sound

We should decide on another chime sound for when you start audio (replacing https://jcambre.github.io/vf/mic_open_chime.ogg)

Register intents instead of hardcoding them

Right now we have a bunch of case statements around intents. Instead, intents should register themselves. Ideally each registration would include any regexes, sample statements, and the handlers.
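A minimal sketch of what self-registration could look like. The names `registerIntent` and `matchIntent`, the shape of the registration object, and the sample `find` pattern are all hypothetical, not the project's actual API:

```javascript
// Hypothetical registry: each intent module calls registerIntent() once at load.
const intents = new Map();

function registerIntent({ name, patterns, examples = [], handler }) {
  if (intents.has(name)) {
    throw new Error(`Intent already registered: ${name}`);
  }
  intents.set(name, { name, patterns, examples, handler });
}

// Return the first intent whose pattern matches, with named groups as slots.
function matchIntent(utterance) {
  const text = utterance.trim();
  for (const intent of intents.values()) {
    for (const pattern of intent.patterns) {
      const match = pattern.exec(text);
      if (match) {
        return { intent, slots: { ...(match.groups || {}) } };
      }
    }
  }
  return null;
}

// Example registration (illustrative pattern only).
registerIntent({
  name: "find",
  patterns: [/^find (?:the )?(?<query>.+?)(?: tab)?$/i],
  examples: ["find the calendar tab"],
  handler: ({ query }) => `searching tabs for ${query}`,
});
```

With this shape the dispatcher no longer needs a case statement: it calls `matchIntent(text)` and then `result.intent.handler(result.slots)`.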

Card rendering needs better design

Seems the max height for a popup is 600px, so we need to work within those constraints even when a card has lots of text, an image, etc. CC @awallin

Fix text input styling

Right now, the text input field overlaps the header. We also need to fix the submit button so it behaves properly on hover.

Add text summarization intents

We could support intents based on text summaries of an article:

Read me a summary
Copy a summary of this

Presumably we'd just use some text summarization service

Create acceptance criteria for intent parsing

We need some tests we can run against our different intent parsing approaches (regex, Snips, remote, etc):

List of utterances
We should attempt to do STT on these utterances to look for frequent incorrect transcriptions
Mapping of utterance to intent name and slots
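One way the utterance-to-intent mapping could be checked is sketched below. It assumes every parser under test can be wrapped in a synchronous function returning `{ intent, slots }` or `null`; that interface, the sample cases, and the intent names are made up for illustration (a remote parser would need an async variant):

```javascript
// Each case maps an utterance to the intent name and slots we expect.
const cases = [
  { utterance: "pin this tab", intent: "tabs.pin", slots: {} },
  { utterance: "define osmosis", intent: "define", slots: { term: "osmosis" } },
];

// `parse` is any parser under test, wrapped to return { intent, slots } or null.
function runAcceptance(parse, testCases) {
  const failures = [];
  for (const expected of testCases) {
    const actual = parse(expected.utterance);
    // JSON comparison is sensitive to key order, which is fine for a sketch.
    const ok =
      actual !== null &&
      actual.intent === expected.intent &&
      JSON.stringify(actual.slots) === JSON.stringify(expected.slots);
    if (!ok) failures.push({ expected, actual });
  }
  return { total: testCases.length, failed: failures.length, failures };
}
```

The same case list could be fed to each parsing approach so the failure counts are directly comparable.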

[browser command] Add pin/unpin tab intents

We could add an intent to pin tabs:

Pin/unpin this tab

We could potentially support other tab movement intents:

Make this tab first/last

The action implementation is trivial.
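A rough sketch of the action, using `tabs.query`, `tabs.update`, and `tabs.move` from the WebExtension API. The function names are hypothetical, and `browser` is passed in as a parameter only so the functions can be tried with a stub:

```javascript
// `browser` is the WebExtension API object.
async function setPinned(browser, pinned) {
  const [tab] = await browser.tabs.query({ active: true, currentWindow: true });
  return browser.tabs.update(tab.id, { pinned });
}

async function moveActiveTab(browser, position) {
  const [tab] = await browser.tabs.query({ active: true, currentWindow: true });
  // An index of -1 places the tab at the end of the window.
  return browser.tabs.move(tab.id, { index: position === "first" ? 0 : -1 });
}
```

As far as I know, pinned tabs always stay ahead of unpinned ones, so "first" for an unpinned tab would land just after the pinned group.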

Fix auto-narrate in reader mode

Right now the read intent handler tries to enter reader mode and start narration, but we've had a problem actually getting it to start. Maybe something in https://github.com/mozilla/firefox-narrate-experiment would help.

Create label system

We might want a bunch of standard ones (e.g., a few of these – but not all!) We should decide how we want to do discussion, UX, etc.

Reimplement read intent

In #52 the read intent didn't get copied over (or maybe it's empty?). Also we need to make it work.

Onboarding needs to close mic

Right now it opens the mic and never closes it.

Remove tabContent.js

This file isn't used any longer.

Import log.js

I'd like to move log.js from screenshots or maybe personal-history-archive

Change from bodymovin to lottie

We use bodymovin for animations, but we want to use lottie instead. They both consume the same animations.

Add music intents

We want to add intents to play music:

Play/pause
Play music by [artist]
Play [genre]
Play best of [artist]
etc.

To do this we need to create a list of music services we want to support and detect which music service the user already uses. We may also want to allow setting the music service. Then we have to write music-service-specific code to interact with the individual players.

Vendor in webrtc_vad during build process

While the other files in vendor/ are copied as part of npm install, I wasn't able to figure out where webrtc_vad.(js,wasm) came from (or when I found some files they didn't match what we had).

Ideally we would minimize the JS file as we copy it in.

Move popup UI into its own module

We should have a popup/ui.js file, which handles the UI, but does not have non-UI internal logic. This includes moving some HTML into popup.html.

Auto-dismiss popup on applicable intents

The popup should call window.close() after giving feedback to the user for a reasonable amount of time
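Something like the following might be enough. The helper name and the 2500 ms default are placeholders, since "a reasonable amount of time" still needs to be decided:

```javascript
// Close the popup after `delayMs`. Returns a function that cancels the
// pending close, e.g. if the user starts interacting with the popup again.
function scheduleDismiss(popupWindow, delayMs = 2500) {
  const timer = setTimeout(() => popupWindow.close(), delayMs);
  return () => clearTimeout(timer);
}
```

In the popup this would be called as `scheduleDismiss(window)` once the feedback has been rendered.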

Remove background.js/browserAction code

The code in background.js all involves starting the experience, but that happens automatically now with default_popup in manifest.json.

Come up with some testing system

This might even be as simple as a script, but I feel like the surface area is somewhat unclear without some documentation or a process.

Technically Selenium testing should be possible, but ugh.

Create github milestones

I like to use milestones for prioritization and triage: issues with no milestone are untriaged, and there is one milestone for the next release, one for the backlog, and maybe an extra one for Stretch or something like that (especially for code-based issues that don't affect the product experience, but that I might want to do anyway).

Add note-taking intents

In this model you would first indicate where you are taking notes, then add things:

Take notes here (ideally a document in a tab with a focused element)
Add note [X]
Make a note of this tab/link

This has some relation to #77 and copy intents, except we'd immediately put the text into a specific destination. Integrating with different note-taking tools would take some effort (especially if we want, for instance, to be cursor-position-neutral), but it's not incredibly hard.

Setup CircleCI

We want a circleci task that builds and maybe lightly tests the project (all we have currently are eslint-style tests).

Keyboard shortcut doesn't work

When trying to use the keyboard shortcut (Command-.), getUserMedia never returns in the popup. This is similar to the behavior when the extension doesn't have media permission.

Microphone isn't acquired on first start

At least in testing (npm start) I'm frequently seeing a problem where the microphone isn't acquired within the 2 second limit. I haven't checked whether it would hang indefinitely. Just trying again seems to fix it.

One possible hacky fix: if it's an issue of warming up the mic and/or permission, we could open the onboarding tab on startup, and close it after the mic is acquired. That would be OK for onboarding generally.

Add Sentry error collection

In several products we use Sentry to collect unexpected exceptions (i.e., get field reports of bugs in our product). Maybe we should do that for this project?

Getting access to Sentry is pretty easy, but we have to add some collection to the extension, especially to collect errors that come from content scripts and some more unusual locations.

[browser command] Add save/download screenshot intent

It wouldn't be a huge amount of work to add some simple screenshot-related intents:

Download a screenshot of this
Copy a screenshot of this
Download a screenshot of the entire page
Copy a screenshot of the entire page

The first two implicitly take a screenshot of the viewport.

This would not use Firefox Screenshots, but would simply make the screenshot (which isn't very hard). Taking a screenshot of just a portion of the page would be out of scope.
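A sketch of the two viewport intents, assuming `tabs.captureVisibleTab`, `downloads.download`, and Firefox's `clipboard.setImageData` are available with the matching permissions. The function names are hypothetical and `browser` is passed in so the code can be tried with a stub. The data URL is converted to a blob URL because, as far as I know, Firefox refuses `data:` URLs in `downloads.download`:

```javascript
// Capture the visible part of the active tab as a PNG blob.
async function captureViewportBlob(browser) {
  // With no arguments this captures the current window in the default format (PNG).
  const dataUrl = await browser.tabs.captureVisibleTab();
  const response = await fetch(dataUrl);
  return response.blob();
}

// "Download a screenshot of this"
async function downloadViewport(browser, filename = "screenshot.png") {
  const blob = await captureViewportBlob(browser);
  const url = URL.createObjectURL(blob);
  return browser.downloads.download({ url, filename });
}

// "Copy a screenshot of this"
async function copyViewport(browser) {
  const blob = await captureViewportBlob(browser);
  return browser.clipboard.setImageData(await blob.arrayBuffer(), "png");
}
```

The full-page variants are not covered here; they would need the page dimensions in addition to the viewport.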

Include intent parser

We aren't sure yet if we have to use a remote-hosted intent parser, or if we might be able to construct a local wasm-based parser. This will be an ongoing experiment.

Use safe search

Any Google query should have Safe Search on.
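For example, search URLs could be built in one place that always adds Google's `safe=active` query parameter. The helper name is hypothetical:

```javascript
// Build a Google search URL with Safe Search forced on.
function googleSearchUrl(query) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", query);
  url.searchParams.set("safe", "active");
  return url.href;
}
```

Routing every search intent through a helper like this avoids having to remember the parameter at each call site.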

Fix find intent

The find intent doesn't seem to work right. It always falls back to a search for me, and I have a hard time constructing the right words to trigger the regex.

Add "next search result" intent

We could support an intent that lets you move through search results:

Next search result
Previous search result
What is the next search result?
Next/Previous

To implement this we will have to detect and save information about any searches in a tab, then detect and save the list of search results, and then match the current page against that list to determine what would come next. The implementation is fairly involved.
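The matching step could be sketched as below, assuming the search and its result URLs have already been detected and saved per tab. The data shape and function names are hypothetical, and exact URL matching is a simplification (redirects would break it):

```javascript
// Per-tab record of the last search: tabId -> { query, results: [url, ...] }
const searchesByTab = new Map();

function saveSearch(tabId, query, results) {
  searchesByTab.set(tabId, { query, results });
}

// direction is +1 for "next" and -1 for "previous".
function adjacentResult(tabId, currentUrl, direction = 1) {
  const search = searchesByTab.get(tabId);
  if (!search) return null;
  const index = search.results.indexOf(currentUrl);
  // If the current page is not a known result (e.g. still on the search
  // page), "next" means the first result and "previous" means nothing.
  const target = index === -1 ? (direction > 0 ? 0 : -1) : index + direction;
  return search.results[target] ?? null;
}
```

The intent handler would then navigate the tab to the returned URL, or report that there are no more results when it is `null`.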

Find a host for the built add-on

We want to deploy builds of the add-on (via #2) to some server. We need to be able to upload to this server from CircleCI. The URL isn't too important (it won't host any site, just the xpi and an update.xml file).

Some S3 location perhaps? Circle needs to be able to do the uploading.

Create code glossary

We should have a code glossary of the terms we are using in the codebase. Something simple and short in docs/

Add weather intents

Some ways to ask for weather:

What is the weather (in [city])
What’s the temperature (in [city])
Tell me the forecast (in / for [city])
[city] temperature
[city] weather
What’s it like outside?
How’s the weather / temperature?
Is it nice out
Is it (cold / warm / raining / drizzling / snowing / hailing / windy / chilly / sunny / cloudy / storming / stormy / thunderstorming) out
Check the weather (in [city])

In almost all cases Google returns an appropriate weather card for its search. Can we simply detect these and display that card in the popup? Google does not display things like 10 day forecast for Keene particularly well.

Use <input> in popup

Right now the text input seems to be a <span>. This is extra work and has accessibility problems. We should just use an input. All the styles can still be overridden so it can look like whatever (though it takes somewhat more work).

Allow other extensions to add intent handlers

We'd like other extensions to be able to extend the capabilities of this project.

An open issue: how do we extend the intent parser given these extensions?

Two options for extensions that could support Firefox Voice to demonstrate how this works:

Email Tabs: composes emails based on one or more tabs
Side View: opens a mobile view of a tab in the sidebar

(These are good options because I developed them and can make the changes.)

Add word definition intent

Simply:

define [term]

Ideally this would display in the popup (not as a new tab). A Google search usually produces a good result. DuckDuckGo cards don't seem to work well here.

Convert webrtc code to emit events

Right now the code in content.js, specifically around stm_start, does direct UI manipulation. Instead it should fire off some kind of events, and some other code will hook those up to UI changes (per #34).
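A small sketch of the idea using the standard `EventTarget`. The class name, the event names, and the `detail` payload are hypothetical:

```javascript
// Event carrying an arbitrary detail payload.
class RecorderEvent extends Event {
  constructor(type, detail = {}) {
    super(type);
    this.detail = detail;
  }
}

// The recorder only announces what happened; it never touches the DOM.
class Recorder extends EventTarget {
  start() {
    this.dispatchEvent(new RecorderEvent("recording-start"));
  }

  stop(transcript) {
    this.dispatchEvent(new RecorderEvent("recording-end", { transcript }));
  }
}
```

The UI module would subscribe with `recorder.addEventListener("recording-start", ...)` and make the visual changes there.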

Generate manifest.json from a template

We'll need to do some substitutions in manifest.json, so we should generate it. While I've used mustache in the past, I think ejs is even simpler.

The template should be rerendered every time npm start is run.

Separate out intent handlers

We should make an intents/ directory in the extension, and each intent should go in there. We should use one directory per "category" of intent (e.g., there is a play music and pause music, but they would both be in a music/ folder).

For now I think it can be as simple as, say, intents/find/find.js (I find a kajillion index.js files a bit hard to handle, so I'd rather clone the directory name as the main file).

Add copy/clipboard intents

We might want to support copying some text or pieces of a page:

Copy link
Copy title
Copy title and link
Copy markdown link (which includes title)
~~Copy article (Reader Mode view)~~
Copy screenshot
Copy selection
~~Copy main image~~

What we copy can be HTML or text, but unfortunately we can't have a smart choice of which based on paste (we can try to make the text paste work OK when we copy HTML, but it's limited).
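One possible approach is to put both an HTML and a plain-text flavor on the clipboard during a `copy` event, so the paste target picks whichever it supports. The helper names are hypothetical, and `doc` is the document, passed in so the code can be tried with a stub:

```javascript
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return text.replace(/[&<>"]/g, (ch) => entities[ch]);
}

// "Copy markdown link": escape brackets so the title stays inside the link text.
function markdownLink(title, url) {
  return `[${title.replace(/[\[\]]/g, "\\$&")}](${url})`;
}

// "Copy title and link" as HTML.
function htmlLink(title, url) {
  return `<a href="${escapeHtml(url)}">${escapeHtml(title)}</a>`;
}

// Write both flavors in a single copy operation.
function copyHtmlAndText(doc, html, text) {
  const onCopy = (event) => {
    event.clipboardData.setData("text/html", html);
    event.clipboardData.setData("text/plain", text);
    event.preventDefault();
  };
  doc.addEventListener("copy", onCopy);
  try {
    return doc.execCommand("copy");
  } finally {
    doc.removeEventListener("copy", onCopy);
  }
}
```

The plain-text flavor is where we get to choose what a text-only paste looks like, for example `title + " " + url`.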

Convert sendMessage from ports to one-off

Right now we set up ports in the add-on, but I don't think we need to. We can just use browser.runtime.sendMessage/onMessage, which I think will be slightly easier to manage.
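Roughly, the one-off version could look like this. The `type` field, the handler table, and both function names are hypothetical conventions, and `browser` is passed in so the code can be tried with a stub:

```javascript
// Background side: route one-off messages by their `type` field.
function listenForMessages(browser, handlers) {
  browser.runtime.onMessage.addListener((message, sender) => {
    const handler = handlers[message.type];
    if (!handler) {
      return undefined; // not ours; leave it for other listeners
    }
    // Returning a Promise sends its resolved value back as the response.
    return Promise.resolve(handler(message, sender));
  });
}

// Popup or content-script side: send a message and await the response.
function request(browser, type, payload = {}) {
  return browser.runtime.sendMessage({ type, ...payload });
}
```

Compared with ports there is no connection state to track: each `request()` call is an independent round trip.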

Add translation intents

We can translate both pages and specific text:

Translate this [to language]
Translate [X] [to language]

Both could use translate.google.com. Translating a specific word could happen in the popup.
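A sketch of building the two URLs. The query parameters (`sl`, `tl`, `u`, `text`, `op`) reflect how translate.google.com URLs commonly look, not a documented API, so treat them as an assumption to verify. The helper names are hypothetical, and mapping a spoken language name to a code such as `es` is left out:

```javascript
// "Translate this [to language]": translate the page at pageUrl.
function translatePageUrl(pageUrl, targetLang) {
  const url = new URL("https://translate.google.com/translate");
  url.searchParams.set("sl", "auto"); // let the service detect the source language
  url.searchParams.set("tl", targetLang);
  url.searchParams.set("u", pageUrl);
  return url.href;
}

// "Translate [X] [to language]": translate a word or phrase.
function translateTextUrl(text, targetLang) {
  const url = new URL("https://translate.google.com/");
  url.searchParams.set("sl", "auto");
  url.searchParams.set("tl", targetLang);
  url.searchParams.set("text", text);
  url.searchParams.set("op", "translate");
  return url.href;
}
```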

[browser command] Add previous tab intent

We could support:

Go back to previous tab
Show previous tabs

To actually do this we'd have to create our own record of active tabs, as the APIs don't reflect tab history very well. We might want to consider how long a user has to dwell on a tab for us to treat it as "active".
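Our own record could be as simple as the sketch below. The class name and the 2 second dwell threshold are placeholders, and the clock is injectable only so the logic can be tested:

```javascript
class TabHistory {
  constructor(minDwellMs = 2000, now = Date.now) {
    this.minDwellMs = minDwellMs;
    this.now = now;
    this.current = null; // { tabId, since }
    this.visited = []; // tab ids the user dwelled on, most recent last
  }

  // Call this from a tabs.onActivated listener.
  activate(tabId) {
    const time = this.now();
    const previous = this.current;
    if (
      previous &&
      previous.tabId !== tabId &&
      time - previous.since >= this.minDwellMs
    ) {
      // Only tabs the user stayed on long enough count as "active".
      this.visited = this.visited.filter((id) => id !== previous.tabId);
      this.visited.push(previous.tabId);
    }
    this.current = { tabId, since: time };
  }

  // Most recently dwelled-on tab other than the current one, or null.
  previous() {
    for (let i = this.visited.length - 1; i >= 0; i--) {
      if (!this.current || this.visited[i] !== this.current.tabId) {
        return this.visited[i];
      }
    }
    return null;
  }
}
```

"Go back to previous tab" would then activate `history.previous()` with `browser.tabs.update(id, { active: true })`. Closed tabs would also need to be dropped from `visited`, which this sketch does not handle.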

Collect telemetry data in extension

We should accept some messages in the background process that will be used to assemble and then submit the telemetry ping.

We should keep a pending payload, and allow intents or other components to add partial data. Then some final message/event will send the ping, and the payload will be reset.
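The pending-payload idea might look like this. The factory name, the method names, and the field names in the usage note are hypothetical, and `submit` stands in for whatever actually sends the ping:

```javascript
// Accumulate partial telemetry data, then submit and reset in one step.
function createPingBuilder(submit) {
  let pending = {};
  return {
    // Intents and other components contribute partial data.
    add(partial) {
      Object.assign(pending, partial);
    },
    // The final message/event sends the ping and resets the payload.
    send() {
      const payload = pending;
      pending = {};
      return submit(payload);
    },
  };
}
```

For example, the popup might call `add({ inputLength: 12 })`, an intent might call `add({ intent: "find" })`, and the end of the interaction would call `send()`.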

[browser command] Add move this to a new window intent

We can pop a tab into its own window

Open/move this in/to a new window

Implementation is trivial.
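For reference, a sketch using `windows.create` with a `tabId`, which moves the existing tab into the new window. The function name is hypothetical and `browser` is passed in so the code can be tried with a stub:

```javascript
// Pop the active tab out into its own window.
async function moveActiveTabToNewWindow(browser) {
  const [tab] = await browser.tabs.query({ active: true, currentWindow: true });
  return browser.windows.create({ tabId: tab.id });
}
```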

We can't interact with the web using natural voice commands

We should fix that. And figure out what it means along the way.

Native timer

Timer intents would support:

Start timer
Start timer for [X]
Pause timer
Stop timer
Change timer to [X]

We could implement a timer natively, or attempt some integration. Technically Google supports timers via search, but we'd have to leave the page open, and the interface isn't particularly nice.
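If we go native, the bookkeeping behind these intents could be sketched as follows. The class and method names are hypothetical, the clock is injectable only for testing, and actually notifying the user when the timer expires is not covered:

```javascript
class Timer {
  constructor(now = Date.now) {
    this.now = now;
    this.remainingMs = 0;
    this.endsAt = null; // null while paused or stopped
  }

  // "Start timer for [X]"
  start(durationMs) {
    this.remainingMs = durationMs;
    this.endsAt = null;
    this.resume();
  }

  resume() {
    if (this.endsAt === null) {
      this.endsAt = this.now() + this.remainingMs;
    }
  }

  // "Pause timer": remember how much time was left.
  pause() {
    if (this.endsAt !== null) {
      this.remainingMs = Math.max(0, this.endsAt - this.now());
      this.endsAt = null;
    }
  }

  // "Stop timer"
  stop() {
    this.remainingMs = 0;
    this.endsAt = null;
  }

  // "Change timer to [X]": keep the running/paused state.
  change(durationMs) {
    const running = this.endsAt !== null;
    this.endsAt = null;
    this.remainingMs = durationMs;
    if (running) this.resume();
  }

  remaining() {
    return this.endsAt === null
      ? this.remainingMs
      : Math.max(0, this.endsAt - this.now());
  }
}
```

Storing an end timestamp rather than counting ticks means the timer stays accurate even if the popup is closed and reopened, as long as the state is kept somewhere persistent.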

Create telemetry schema

We need to propose a schema and submit it to

We have some work in this Google doc

We need to create something like the work in this repository