mozilla-extensions / firefox-voice

Firefox Voice is an experiment in a voice-controlled web user agent

License: Mozilla Public License 2.0

Shell 0.44% CSS 13.61% JavaScript 70.82% HTML 2.02% Python 0.51% Kotlin 12.09% SCSS 0.51%

firefox-voice's People

Contributors

abhivaidya95, andrenatal, annlinros, awallin, clouserw, danielamormocea, dave-ok, espertus, fabricedesre, farhatsharifh, fleur101, gangachatrvedi, gwe-n, harraton, ianb, ikkyodufade, janvimahajan14, jcambre, jenniferharmon, johngruen, khushilmistry, lelouchb, maitrella, marwendoukh, melvin2016, mrstegeman, pdehaan, saucekode, simpcyclassy, xlisachan

firefox-voice's Issues

Create log viewer

I added a logging system, but the log messages are very ad hoc. I think it might be useful to have a general log viewer that shows us both incidental information that's logged and some essential information like text input, parsed intents, error messages, etc.

This doesn't have any product value, but I think I may find it useful if only for my own work.

Register intents instead of hardcoding them

Right now we have a bunch of case statements around intents. Instead, intents should register themselves. Ideally this would include any regexes, sample statements, and the handlers.
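One way this could look, as a minimal sketch: each intent hands a registry its regex, sample statements, and handler, and parsing becomes a loop over whatever is registered. The names here (`registerIntent`, `parseUtterance`, `runUtterance`, the `find.find` intent and its regex) are hypothetical and only illustrate the shape, not the project's actual API.

```javascript
// Hypothetical registry: names and shapes are illustrative only.
const intents = new Map();

// Each intent registers its own regex, sample statements, and handler.
function registerIntent({ name, match, examples = [], run }) {
  if (intents.has(name)) {
    throw new Error(`Intent already registered: ${name}`);
  }
  intents.set(name, { name, match, examples, run });
}

// Try each registered regex in registration order; return the first hit.
function parseUtterance(text) {
  const utterance = text.trim();
  for (const intent of intents.values()) {
    const result = intent.match.exec(utterance);
    if (result) {
      // Named capture groups become the intent's slots.
      return { name: intent.name, slots: { ...(result.groups || {}) } };
    }
  }
  return null;
}

// Parse the text and, if something matched, call that intent's handler.
function runUtterance(text) {
  const parsed = parseUtterance(text);
  if (!parsed) {
    return null;
  }
  return intents.get(parsed.name).run(parsed.slots);
}

// Example registration (the regex and handler are placeholders).
registerIntent({
  name: "find.find",
  match: /^find (?<query>.+)$/i,
  examples: ["find calendar"],
  run: (slots) => `searching open tabs for "${slots.query}"`,
});
```

With this shape the case statements go away: adding an intent means adding one `registerIntent` call next to its handler.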

Fix text input styling

Right now, the text input field overlaps the header. We also need to fix the submit button so it shows the proper behavior on hover.

Add text summarization intents

We could support intents based on text summaries of an article:

  • Read me a summary
  • Copy a summary of this

Presumably we'd just use some text summarization service.

Create acceptance criteria for intent parsing

We need some tests we can run against our different intent parsing approaches (regex, Snips, remote, etc):

  • List of utterances
  • We should attempt to do STT on these utterances to look for frequent incorrect transcriptions
  • Mapping of utterance to intent name and slots
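The utterance-to-intent mapping could be a plain data table that any parser is checked against. This is only a sketch: the case list, the intent names (`find.find`, `tabs.pin`), and the `toyParser` stand-in are all made up for illustration, and a parser is assumed to be a function from text to `{ name, slots }` or `null`.

```javascript
// Hypothetical acceptance cases: utterance -> expected intent name and slots.
const cases = [
  { utterance: "find calendar", name: "find.find", slots: { query: "calendar" } },
  { utterance: "pin this tab", name: "tabs.pin", slots: {} },
];

// Compare two flat slot objects by key and value.
function sameSlots(a, b) {
  const aKeys = Object.keys(a).sort();
  const bKeys = Object.keys(b).sort();
  return (
    aKeys.length === bKeys.length &&
    aKeys.every((key, i) => key === bKeys[i] && a[key] === b[key])
  );
}

// Run every case against a parser and collect the ones that fail.
function checkParser(parse, testCases) {
  const failures = [];
  for (const expected of testCases) {
    const actual = parse(expected.utterance);
    const ok =
      actual !== null &&
      actual.name === expected.name &&
      sameSlots(actual.slots, expected.slots);
    if (!ok) {
      failures.push({ utterance: expected.utterance, expected, actual });
    }
  }
  return failures;
}

// A toy regex parser standing in for any of the approaches under test.
function toyParser(text) {
  const find = /^find (?<query>.+)$/i.exec(text);
  if (find) {
    return { name: "find.find", slots: { query: find.groups.query } };
  }
  return null;
}
```

Running `checkParser(toyParser, cases)` would report the "pin this tab" case as a failure, since the toy parser only knows about find. The same table could be fed with STT output instead of the typed utterance to surface frequent mistranscriptions.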

[browser command] Add pin/unpin tab intents

We could add an intent to pin tabs:

  • Pin/unpin this tab

We could potentially support other tab movement intents:

  • Make this tab first/last

The action implementation is trivial.

Import log.js

I'd like to bring over log.js from Screenshots, or maybe from personal-history-archive.

Add music intents

We want to add intents to play music:

  • Play/pause
  • Play music by [artist]
  • Play [genre]
  • Play best of [artist]
  • etc.

To do this we need to create a list of music services we want to support and detect which music service the user already uses. We may also want to allow setting the music service. Then we have to write music-service-specific code to interact with the individual players.

Vendor in webrtc_vad during build process

While the other files in vendor/ are copied as part of npm install, I wasn't able to figure out where webrtc_vad.(js,wasm) came from (or when I found some files they didn't match what we had).

Ideally we would minify the JS file as we copy it in.

Move popup UI into its own module

We should have a popup/ui.js file, which handles the UI, but does not have non-UI internal logic. This includes moving some HTML into popup.html.

Come up with some testing system

This might even be as simple as a script, but I feel like the surface area is somewhat unclear without some documentation or a process.

Technically Selenium testing should be possible, but ugh.

Create github milestones

I like to use milestones for prioritization and triage: issues with no milestone are untriaged, and there is a milestone for the next release, one for backlog, and maybe one extra for Stretch or something like that (especially for code-based issues that don't affect the product experience, but that I might want to do anyway).

Add note-taking intents

In this model you would first indicate where you are taking notes, then add things:

  • Take notes here (ideally a document in a tab with a focused element)
  • Add note [X]
  • Make a note of this tab/link

This has some relation to #77 and copy intents, except we'd immediately put the text into a specific destination. Integrating with different note-taking tools would take some effort (especially if we want, for instance, to be cursor-position-neutral), but it's not incredibly hard.

Setup CircleCI

We want a CircleCI task that builds and maybe lightly tests the project (all we have currently are eslint-style tests).

Keyboard shortcut doesn't work

When trying to use the keyboard shortcut (Command-.), getUserMedia never returns in the popup. This is similar to the behavior when the extension doesn't have media permission.

Microphone isn't acquired on first start

At least in testing (npm start) I'm frequently seeing a problem where the microphone isn't acquired within the 2-second timeout. I haven't checked whether it stays broken indefinitely. Just trying again seems to fix it.

One possible hacky fix: if it's an issue of warming up the mic and/or permission, we could open the onboarding tab on startup, and close it after the mic is acquired. That would be OK for onboarding generally.

Add Sentry error collection

In several products we use Sentry to collect unexpected exceptions (i.e., get field reports of bugs in our product). Maybe we should do that for this project?

Getting access to Sentry is pretty easy, but we have to add some collection to the extension, especially to collect errors that come from content scripts and some more unusual locations.

[browser command] Add save/download screenshot intent

It wouldn't be a huge amount of work to add some simple screenshot-related intents:

  • Download a screenshot of this
  • Copy a screenshot of this
  • Download a screenshot of the entire page
  • Copy a screenshot of the entire page

The first two implicitly take a screenshot of the viewport.

This would not use Firefox Screenshots, but would simply make the screenshot (which isn't very hard). Taking a screenshot of just a portion of the page would be out of scope.

Include intent parser

We aren't sure yet if we have to use a remote-hosted intent parser, or if we might be able to construct a local wasm-based parser. This will be an ongoing experiment.

Fix find intent

The find intent doesn't seem to work right. It always falls back to a search for me, and I have a hard time constructing the right words to trigger the regex.

Add "next search result" intent

We could support an intent that lets you move through search results:

  • Next search result
  • Previous search result
  • What is the next search result?
  • Next/Previous

To implement this we will have to detect and save information about any searches in a tab, then detect and save the list of search results, and then match the current page against that list to determine what would come next. The implementation is fairly involved.
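The matching step at the end of that description could be sketched as below, assuming we already have the result URLs for a tab. `searchesByTab`, `saveSearchResults`, and `adjacentResult` are hypothetical names, and how the result list is scraped from the search page is left out.

```javascript
// Hypothetical per-tab record of the last search and its result URLs.
const searchesByTab = new Map();

function saveSearchResults(tabId, query, resultUrls) {
  searchesByTab.set(tabId, { query, resultUrls: [...resultUrls] });
}

// direction is +1 for "next" and -1 for "previous".
function adjacentResult(tabId, currentUrl, direction) {
  const search = searchesByTab.get(tabId);
  if (!search || search.resultUrls.length === 0) {
    return null;
  }
  const index = search.resultUrls.indexOf(currentUrl);
  // If the current page is not a known result (e.g. still on the results
  // page), "next" starts at the first result and "previous" has no answer.
  const target = index === -1 ? (direction > 0 ? 0 : -1) : index + direction;
  return search.resultUrls[target] ?? null;
}
```

Exact URL matching is the weak point of this sketch: redirects or tracking parameters would make the current page fail to match the saved list, so a real version would likely need some URL normalization.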

Find a host for the built add-on

We want to deploy builds of the add-on (via #2) to some server. We need to be able to upload to this server from CircleCI. The URL isn't too important (it won't host any site, just the xpi and an update.xml file).

Some S3 location perhaps? Circle needs to be able to do the uploading.

Create code glossary

We should have a code glossary of the terms we are using in the codebase. Something simple and short in docs/.

Add weather intents

Some ways to ask for weather:

  • What is the weather (in [city])
  • What’s the temperature (in [city])
  • Tell me the forecast (in / for [city])
  • [city] temperature
  • [city] weather
  • What’s it like outside?
  • How’s the weather / temperature?
  • Is it nice out
  • Is it (cold / warm / raining / drizzling / snowing / hailing / windy / chilly / sunny / cloudy / storming / stormy / thunderstorming) out
  • Check the weather (in [city])

In almost all cases Google returns an appropriate weather card for its search. Can we simply detect these and display that card in the popup? Google does not handle queries like "10 day forecast for Keene" particularly well.

Use <input> in popup

Right now the text input seems to be a <span>. This is extra work and has accessibility problems. We should just use an input. All the styles can still be overridden so it can look like whatever (though it takes somewhat more work).

Allow other extensions to add intent handlers

We'd like other extensions to be able to extend the capabilities of this project.

An open issue: how do we extend the intent parser given these extensions?

Two options for extensions that could support Firefox Voice to demonstrate how this works:

  • Email Tabs: composes emails based on one or more tabs
  • Side View: opens a mobile view of a tab in the sidebar

(These are good options because I developed them and can make the changes.)

Add word definition intent

Simply:

  • define [term]

Ideally this would display in the popup (not as a new tab). A Google search usually produces a good result. DuckDuckGo cards don't seem to work well here.

Convert webrtc code to emit events

Right now the code in content.js, specifically around stm_start, does direct UI manipulation. Instead it should fire off some kind of events, and some other code will hook those up to UI changes (per #34).

Generate manifest.json from a template

We'll need to do some substitutions in manifest.json, so we should generate it. While I've used mustache in the past, I think ejs is even simpler.

The template should be rerendered every time npm start is run.

Separate out intent handlers

We should make an intents/ directory in the extension, and each intent should go in there. We should use one directory per "category" of intent (e.g., there is a play music and pause music, but they would both be in a music/ folder).

For now I think it can be as simple as, say, intents/find/find.js (I find a kajillion index.js files a bit hard to handle, so I'd rather clone the directory name as the main file).

Add copy/clipboard intents

We might want to support copying some text or pieces of a page:

  • Copy link
  • Copy title
  • Copy title and link
  • Copy markdown link (which includes title)
  • Copy article (Reader Mode view)
  • Copy screenshot
  • Copy selection
  • Copy main image

What we copy can be HTML or text, but unfortunately we can't have a smart choice of which based on paste (we can try to make the text paste work OK when we copy HTML, but it's limited).
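For the text-only variants, the formatting itself is small. The sketch below covers "Copy markdown link" and "Copy title and link" for a tab-like `{ title, url }` object; the function names and the escaping choices are assumptions, and actually placing the result on the clipboard (presumably via something like `navigator.clipboard.writeText`) is not shown.

```javascript
// Build the text for "Copy markdown link" from a tab-like { title, url }.
function markdownLink({ title, url }) {
  // Escape backslashes and square brackets so the link text is not cut short.
  const safeTitle = (title || url).replace(/([\\\[\]])/g, "\\$1");
  // Parentheses in the URL would end the link destination, so encode them.
  const safeUrl = url.replace(/\(/g, "%28").replace(/\)/g, "%29");
  return `[${safeTitle}](${safeUrl})`;
}

// Build the text for "Copy title and link": title on one line, URL on the next.
function titleAndLink({ title, url }) {
  return `${title}\n${url}`;
}
```

For example, a tab titled `A [draft]` at `https://example.com/a_(b)` would be copied as `[A \[draft\]](https://example.com/a_%28b%29)`.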

Convert sendMessage from ports to one-off

Right now we set up ports in the add-on, but I don't think we need to. We can just use browser.runtime.sendMessage/onMessage, which I think will be slightly easier to manage.
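A rough sketch of what the one-off style might look like on the receiving side: a single listener that routes on a `type` field. The `type` convention, `onMessageType`, `routeMessage`, and the `getLogs` message are hypothetical; only `browser.runtime.onMessage` and `browser.runtime.sendMessage` come from the WebExtension API.

```javascript
// Hypothetical message router keyed on a `type` field.
const handlers = new Map();

function onMessageType(type, handler) {
  handlers.set(type, handler);
}

// Suitable as a browser.runtime.onMessage listener: returning a Promise
// sends its resolved value back to the caller of sendMessage.
function routeMessage(message, sender) {
  const handler = handlers.get(message.type);
  if (!handler) {
    // Returning undefined leaves the message for other listeners.
    return undefined;
  }
  return Promise.resolve(handler(message, sender));
}

// In the extension this would be wired up with:
//   browser.runtime.onMessage.addListener(routeMessage);
// and called from the popup or a content script with:
//   const reply = await browser.runtime.sendMessage({ type: "getLogs" });
```

Compared to ports there is no connection state to track; each call is a request and (optionally) a reply.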

Add translation intents

We can translate both pages and specific text:

  • Translate this [to language]
  • Translate [X] [to language]

Both could use translate.google.com. Translating a specific word could happen in the popup.

[browser command] Add previous tab intent

We could support:

  • Go back to previous tab
  • Show previous tabs

To actually do this we'd have to create our own record of active tabs, as the APIs don't reflect tab history very well. We might want to consider how long a user has to dwell on a tab for us to treat it as "active".
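Our own record of active tabs might look something like this sketch. `TabHistory` is a hypothetical class, the 2-second dwell threshold is an arbitrary placeholder, and the timestamps are passed in so the logic can be exercised without a browser; in the extension, `activate` and `remove` would presumably be driven by `tabs.onActivated` and `tabs.onRemoved` listeners.

```javascript
// Hypothetical tracker; the default dwell threshold is a placeholder value.
class TabHistory {
  constructor(minDwellMs = 2000) {
    this.minDwellMs = minDwellMs;
    this.history = []; // tab IDs the user dwelled on, most recent last
    this.current = null; // { tabId, since }
  }

  // Call when a tab becomes active.
  activate(tabId, now = Date.now()) {
    if (this.current && now - this.current.since >= this.minDwellMs) {
      // Only tabs the user dwelled on count as "previous" tabs.
      const leaving = this.current.tabId;
      this.history = this.history.filter((id) => id !== leaving);
      this.history.push(leaving);
    }
    this.current = { tabId, since: now };
  }

  // Call when a tab is closed so it is never offered as a destination.
  remove(tabId) {
    this.history = this.history.filter((id) => id !== tabId);
  }

  // Most recently dwelled-on tab other than the current one, or null.
  previousTab() {
    const candidates = this.history.filter(
      (id) => !this.current || id !== this.current.tabId
    );
    return candidates.length ? candidates[candidates.length - 1] : null;
  }
}
```

"Show previous tabs" could read the same `history` array in reverse order.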

Collect telemetry data in extension

We should accept some messages in the background process that will be used to assemble and then submit the telemetry ping.

We should keep a pending payload, and allow intents or other components to add partial data. Then some final message/event will send the ping, and the payload will be reset.
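The pending-payload idea could be sketched as follows. `TelemetryPing` and the field names in the usage note are hypothetical, and `submit` stands in for whatever actually sends the ping.

```javascript
// Hypothetical ping builder; `submit` is whatever function sends the ping.
class TelemetryPing {
  constructor(submit) {
    this.submit = submit;
    this.pending = {};
  }

  // Intents and other components contribute partial data as they run.
  add(partial) {
    Object.assign(this.pending, partial);
  }

  // Send the assembled payload, then reset for the next interaction.
  send() {
    const payload = this.pending;
    this.pending = {};
    this.submit(payload);
    return payload;
  }
}
```

In use, the background process might call `ping.add({ inputLength: 12 })` when text arrives, `ping.add({ intent: "find.find" })` once parsing succeeds, and `ping.send()` when the interaction ends.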

Native timer

Timer intents would support:

  • Start timer
  • Start timer for [X]
  • Pause timer
  • Stop timer
  • Change timer to [X]

We could implement a timer natively, or attempt some integration. Technically Google supports timers via search, but we'd have to leave the page open, and the interface isn't particularly nice.
