Comments (16)

youennf avatar youennf commented on August 26, 2024 1

Right, this is precisely why I want some experimentation, and results from it, before we define a new API or a new API model. This will allow us to make sure we are all comfortable implementing the new model, and to decide whether or not this is a whole new model relative to getUserMedia.

In general, I see constraints as a way to help the user agent pick the default selected devices presented to the user. That is all; constraints could maybe also rank the devices, but I think that would be counter-intuitive to users. It does not seem to make sense to offer camera and microphone selection if only a subset of devices is actually selectable by the user.

If a user wants to select a microphone and not a camera, the user should have that choice. In that sense, what constraints would end up defining is not Audio&Video but Audio|Video. Or maybe we should have an API for microphone and a separate API for camera?

I also do not really like the model of a prompt that would happen or not based on what the web page would provide as input. If a webpage says it wants the default camera and default microphone, why shouldn't the user be allowed to override this selection with different devices, or say no to camera but yes to microphone?

I guess there might be some latitude here for heuristics based on page capture history, but the whole story is not clear to me. And I am unclear as to whether the spec will provide guidelines/requirements in that area as well.

from mediacapture-extensions.

alvestrand avatar alvestrand commented on August 26, 2024 1

Exact constraints were added to the spec because participants thought that in some cases, for some apps, the app would prefer not to work at all rather than have to work beyond those requirements.
If the app wants the user to have the widest range of choice, the app should use ideal constraints.

My position at the moment is that we don't need to change this; removing required constraints now would only increase the uncertainty for developers, and have zero benefit for the users.
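To make the ideal/exact distinction concrete, here is a minimal sketch. The helper names and the 1280x720 values are made up for illustration, and `mediaDevices` is assumed to be `navigator.mediaDevices` or a stand-in with the same `getUserMedia()` method.

```javascript
// Hypothetical helper names; the resolution values are arbitrary examples.

// "ideal": the user agent treats the values as a preference and may
// still return any camera, so the user keeps the widest choice.
function preferHd() {
  return { video: { width: { ideal: 1280 }, height: { ideal: 720 } } };
}

// "exact": a required constraint. getUserMedia() rejects with an
// OverconstrainedError if no device can satisfy it.
function requireHd() {
  return { video: { width: { exact: 1280 }, height: { exact: 720 } } };
}

// Issue the request and report which required constraint failed, if any.
async function openCamera(mediaDevices, constraints) {
  try {
    return await mediaDevices.getUserMedia(constraints);
  } catch (err) {
    if (err.name === "OverconstrainedError") {
      console.warn(`No device satisfies "${err.constraint}"`);
      return null;
    }
    throw err; // e.g. NotAllowedError when the user says no
  }
}
```

An app that would rather not work at all than run below its requirements would call `openCamera(navigator.mediaDevices, requireHd())`; an app that wants the user to keep the widest choice would pass `preferHd()` instead.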

henbos avatar henbos commented on August 26, 2024

https://previews.123rf.com/images/capeman29/capeman291811/capeman29181100003/117260512-can-of-worms-cartoon-character.jpg

henbos avatar henbos commented on August 26, 2024

@jan-ivar and @youennf please share your thoughts

henbos avatar henbos commented on August 26, 2024

Also I saw that the other ones were labeled "April 2020 Interim" but it's March 30th, right?

jan-ivar avatar jan-ivar commented on August 26, 2024

I think this is a straw man. The sole difference is the tool to build a device picker no longer exposes labels. Everything remains fundamentally the same wrt user experience.

The app retains the same power it had before to limit choice. Want to build a picker with only choices the user doesn't want? Go ahead. You can do that today.

The fingerprint probing exposure from failed getUserMedia calls is exactly the same. We've said this is OK because tracking libraries won't risk a prompt.

I think I answered everything else in w3c/mediacapture-main#667 (comment)

jan-ivar avatar jan-ivar commented on August 26, 2024

Last point: I don't think a transition plan to a more limited API will succeed.

I think constraints are here to stay.

youennf avatar youennf commented on August 26, 2024

If we go with a device picker, we should design it for the best user experience, from scratch.
If we have to update the API to make sure the user experience is great, we should probably do it.
Once we have a good model, we should think of the transition plan.

> Last point: I don't think a transition plan to a more limited API will succeed.

It depends. You could always reduce the power of the old API, for instance by only selecting the default devices with the old API (after some deprecation time) and using constraints as ideal in that case.
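A minimal sketch of what "using constraints as ideal" could mean in practice: every required (`exact`) value is rewritten as a preference. `relaxToIdeal` and `relaxRequest` are hypothetical names, and dropping `min`, `max` and `advanced` is a simplifying assumption of this sketch rather than anything the thread specifies.

```javascript
// Hypothetical helper: rewrites required constraints as preferences.
function relaxToIdeal(trackConstraints) {
  if (typeof trackConstraints !== "object" || trackConstraints === null) {
    return trackConstraints; // `true` / `false` carry no constraints
  }
  const relaxed = {};
  for (const [name, value] of Object.entries(trackConstraints)) {
    if (name === "advanced") continue; // advanced sets are dropped in this sketch
    const isDictionary =
      typeof value === "object" && value !== null && !Array.isArray(value);
    if (!isDictionary) {
      relaxed[name] = value; // bare values already act as "ideal"
      continue;
    }
    const preferred = value.exact ?? value.ideal;
    if (preferred !== undefined) relaxed[name] = { ideal: preferred };
    // `min` / `max` without a preferred value are dropped entirely
  }
  return relaxed;
}

// Applies the relaxation to both halves of a getUserMedia() request.
function relaxRequest({ audio, video }) {
  return { audio: relaxToIdeal(audio), video: relaxToIdeal(video) };
}
```

With this, `{video: {deviceId: {exact: "abc"}}}` becomes `{video: {deviceId: {ideal: "abc"}}}`, so the request expresses a preference without excluding any device.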

Also, I doubt that changing how we handle constraints will be seen as a more limited API. I might be biased, but to me constraints are over-complex and underused. Simplifying constraints would be beneficial.

henbos avatar henbos commented on August 26, 2024

A powerful reason for a potential "user-chooses" API is that it works straight out of the box - no need to partially implement picking logic in the application - and that, if we get it right, we could guarantee consistent prompting behavior across browsers.

Besides, if the prompt is actually good enough to do its job, why would prompting be a problem? If the user is to choose, why would letting the user choose be a problem?

I would argue that there is a difference between an undesirable, superfluous yes/no re-prompt when the application has already done the choosing for you and a prompt that is actually selecting something that the user wants to select (and would otherwise select inside the application's picker logic).

All of my reasoning though is based on sometime in the future deprecating the old way. If we give up on that then making a suboptimal API is an option. But if it's optional, even far down the road, then we haven't really addressed the privacy concerns.

jan-ivar avatar jan-ivar commented on August 26, 2024

@youennf Let's not confuse API with user experience. Firefox has had a picker forever. Join us. 😉

@henbos I'd caution against oversimplifying the problem. While it may seem appealing to expose prompt methods wholesale to JS, they're often a bad idea (see permission.request() or roc's old blog).

There's a role for user agents to negotiate permission at an app's point of (media) access.

Thus the media access API design is still an abstraction separate from a user agent's prompting story, even when that API puts requirements on it.

I don't even agree "consistent prompting behavior across browsers" is a general goal. That's a common web developer ask, a different problem from what we're solving (end user privacy).

For instance, Firefox has not committed to removing its prompt on seeing "browser-chooses", and there's no spec language to force it, because that wasn't the goal.

The goal of "user-chooses" was to minimally guarantee a prompt only when the user's choices exceed what the app asks for, allowing apps to replace their "control setting"-type pickers with it.

Like today, the app remains in control of what it wants to ask.

> if the prompt is actually good enough to do its job, why would prompting be a problem?

Because users don't want to be prompted for their camera and microphone every time.

@henbos I don't want to dismiss criticisms that constraints were over-designed (they were), and while I'm glad "user-chooses" sparked an opportunity to think further, the two events seem largely orthogonal.

That's important to stress since @youennf seems to suggest UX has to come before API here, even though the working group agreed at the last interim to move ahead with w3c/mediacapture-main#667.

aboba avatar aboba commented on August 26, 2024

@henbos I think you have raised a lot of good questions. Thank you.

Overall, we are really talking about a very different model from the current Media Capture approach, much more like Screen Capture. Just as Screen Capture forced us to re-think the role of constraints, it seems to me that an "in-chrome" approach to Media Capture will require some new thinking.

Given that we are really talking about a new model, I am wondering whether the right way to handle this might be to create a new "Media Capture and Streams Version 2" work item, rather than trying to make all these changes to the existing Media Capture and Streams document before bringing it to PR.

This doesn't imply a new API, just that we use separate documents for the old approach and the new one, so that we can clearly document each one. Otherwise, I am concerned that we could confuse the reader, who will not be able to distinguish "old" from "new" approaches.

youennf avatar youennf commented on August 26, 2024

Maybe we are trying to fit too many things in a single getUserMedia method.

The main/sole use case for MediaDeviceInfo.label is a web-based device picker.
Given labels are only available after the page starts to capture, what we might actually want is a way to change the device being used for a given live capture track.
We could use a browser picker that could already be in use and invocable from browser UI, similarly to what Chrome is apparently implementing for getDisplayMedia.

If we attach this API to a MediaStreamTrack, we could add a new API or try extending applyConstraints, given it can potentially already be used to switch between user and environment cameras. We probably want user activation whenever changing the device.

This would also be conceptually consistent with what we are trying to do for speakers, where speaker authorisation is a simple user click and, in case the user wants to change the selected speaker, a new API would trigger a browser device picker.

As an extra bonus, no new track means no need to call replaceTrack, update MediaStreams, WebAudio nodes... This might simplify things for web developers.
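For comparison, this is roughly the bookkeeping a page does today when it switches cameras with a new track. It is a sketch under assumptions: `switchCamera` is a hypothetical name, `sender` is an existing `RTCRtpSender`, `stream` is a `MediaStream` used for local preview, and `mediaDevices` stands for `navigator.mediaDevices`.

```javascript
// Sketch of today's flow for switching camera with a brand-new track.
async function switchCamera(mediaDevices, deviceId, oldTrack, sender, stream) {
  const newStream = await mediaDevices.getUserMedia({
    video: { deviceId: { exact: deviceId } },
  });
  const [newTrack] = newStream.getVideoTracks();

  await sender.replaceTrack(newTrack); // keep sending on the same sender
  stream.removeTrack(oldTrack);        // update the local preview stream
  stream.addTrack(newTrack);
  oldTrack.stop();                     // release the previous camera
  return newTrack;
}
```

Any WebAudio nodes or additional streams built on the old track would need the same treatment, which is the overhead an in-place device change would avoid.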

One potential worry is the handling of cloned tracks. Maybe it would be the web app's job to update any cloned track with the newly selected device (no user prompt needed here; it could be done with applyConstraints).

jan-ivar avatar jan-ivar commented on August 26, 2024

We can do a lot of things, but I think we need to start with problems we want to solve.

A couple of red flags for me here: to me it's not the goal of this spec to express all these things, but to create a model within which user agents can experiment in a web compatible way, and JS can express its needs. That model is:

  1. User agents decide the units exposed as unique media input devices
  2. Apps express their constraints for a media input device it wants
  3. User agents pick device within those constraints
  4. A (source) device is shared as one or more tracks.
  5. "Once selected, the source of the MediaStreamTrack MUST NOT change."
  6. A track's label "MUST return the label of the object's corresponding source"
  7. Tracks can be cloned
  8. (Cloned) tracks may manipulate (output from) the source through applyConstraints (but not change source)
  9. Tracks can end (which may terminate a permission envelope)

> what we might actually want is a way to change the device being used for a given live capture track.

That's expressly forbidden in this model.

Also, that's just one use case (needing to replace a source with another). The general use case is adding a second source. The latter more general use case supports the former.

> applyConstraints, given it can potentially already be used to switch between user and environment

It cannot. applyConstraints cannot change the source of a track. Not unless the user agent exposes a single unique camera that returns:

console.log(track.getCapabilities().facingMode); // ["user", "environment"]

E.g. like a motorized pivot camera.
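A page can check for that case before calling applyConstraints. The sketch below uses hypothetical helper names and assumes `track` is a live video `MediaStreamTrack`; `getCapabilities()` is not implemented in every browser, so its absence is treated as "not supported".

```javascript
// Returns true only when the track's own source reports the requested
// facing mode, i.e. when applyConstraints() can honor it without the
// source changing.
function sourceSupportsFacingMode(track, mode) {
  if (typeof track.getCapabilities !== "function") return false;
  const modes = track.getCapabilities().facingMode ?? [];
  return modes.includes(mode);
}

// Flips the facing mode in place when the source allows it.
async function flipIfSameSource(track, mode) {
  if (!sourceSupportsFacingMode(track, mode)) {
    return false; // a different source is needed: call getUserMedia() again
  }
  await track.applyConstraints({ facingMode: { exact: mode } });
  return true;
}
```

On a typical phone that exposes front and back cameras as two devices, `flipIfSameSource(track, "environment")` resolves to `false` for the front camera's track, and the page has to request the other device.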

Sure, a user agent could in theory expose a bunch of virtual devices, all with the capability to mimic every other device of their kind, but it undermines the value of the model by doing so.

What you describe sounds like an entirely different model. That's fine, but I think I'm going to need a convincing problem we don't solve today, to justify spending time considering a new model at this point.

youennf avatar youennf commented on August 26, 2024

> We can do a lot of things, but I think we need to start with problems we want to solve.

The problem we want to solve here is removing label info either entirely or for devices not used by the given web page.

> Also, that's just one use case (needing to replace a source with another). The general use case is adding a second source. The latter more general use case supports the former.

I am not sure how common it is to add a second source of the same device type.
Can you be more specific about the use case?

Anyway, let's say we want that. The way we are doing this right now (and this proposal does not change anything) is for the web app to call getUserMedia a second time to pick a second device, potentially using enumerateDevices information to pass some specific constraints, like a deviceId.

It seems one underlying goal that you might have is to allow a user agent to only expose granted devices as part of enumerateDevices. This is a fine goal and maybe we can go there one day. We would probably still want to expose some information, for instance to let the page know that other devices can be used. This would need more effort, and user agents can already experiment with it by gradually exposing enumerateDevices information through the devicechange event.

> 1. User agents decide the units exposed as unique media input devices

OK

> 2. Apps express their constraints for a media input device it wants

This is the current model.
I would rephrase it to: "apps express the constraints for the media input data they want, not a media input device". Most apps do not care about a particular device as long as it provides the audio or video the user actually wants to use.

Some apps might want to select the same device as last time. Current API allows that. In general, I think the user agent is best suited to do that job.
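As an illustration of how the current API allows "same device as last time", a page can persist the `deviceId` it ended up with and pass it back as a preference. This is only a sketch: the storage key and function names are hypothetical, and `storage` is anything with `getItem`/`setItem`, such as `window.localStorage`.

```javascript
// Hypothetical storage key for the last camera the page captured from.
const LAST_CAMERA_KEY = "lastCameraId";

// Builds the getUserMedia() request, preferring the remembered camera.
function cameraConstraints(storage) {
  const lastId = storage.getItem(LAST_CAMERA_KEY);
  // `ideal` lets the user agent fall back if the device is gone.
  return { video: lastId ? { deviceId: { ideal: lastId } } : true };
}

// Records which camera a live track is actually using.
function rememberCamera(storage, track) {
  const { deviceId } = track.getSettings();
  if (deviceId) storage.setItem(LAST_CAMERA_KEY, deviceId);
}
```

Using `ideal` rather than `exact` keeps the request from failing when the remembered camera has been unplugged.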

> 3. User agents pick device within those constraints
> 4. A (source) device is shared as one or more tracks.
> 5. "Once selected, the source of the MediaStreamTrack MUST NOT change."

As long as the user gives consent to use the new device, I do not see any real issue here.
Can you be more specific about what will break here for the user? Or the web page, given the web page asked to change the source?

On the phone, several apps have a simple button to switch from the user facing camera to the environment facing camera. From the user point of view, the feed remains the same and is expected to go wherever it goes, only the source is changing.

> 6. A track's label "MUST return the label of the object's corresponding source"

Sounds fine. Aren't we somehow trying to deprecate label though?

> 7. Tracks can be cloned

Similar to 4 somehow, I do not see any difference between two cloned tracks and two getUserMedia tracks using the same underlying device.

> 8. (Cloned) tracks may manipulate (output from) the source through applyConstraints (but not change source)

'but not change source' seems like an artificial limitation.
As long as a web page is granted access to both devices and they have the same media type, I do not see any real issue in changing the underlying source, at least when the web page is aware of the change.

For a phone, a user agent can decide to expose one camera device, supporting both environment and user or two camera devices. Why shouldn't the user agent be allowed to have some UI allowing the user to switch cameras on the fly outside of the web page control?

Also, if the page is capturing with both devices, there is no difference between applyConstraints(use_the_other_source) and clone-the-other-source-track-then-applyConstraints-then-replace-track-wherever-needed.

> 9. Tracks can end (which may terminate a permission envelope)

OK.

> What you describe sounds like an entirely different model. That's fine, but I think I'm going to need a convincing problem we don't solve today, to justify spending time considering a new model at this point.

Maybe I am missing how this model differs from today's model. Can you be more explicit?

This change seems to me like an incremental change, which targets the issue of 'removing that offending label' or 'removing that device picker'.
In particular, this does not require existing web sites to change anything about their current flow to enter a call, grant the prompt... The only adoption needed is in the 'device picker' pane, which might be less crucial.

The other problem that it could solve is the fact that, apparently, user agents are not allowed to change the source of a capture track. I question this limitation.
A web page might actually want to opt-in to a behavior where the audio input source matches the audio output so that plugging in a headset would automatically start using the headset microphone if audio output goes to the headset speakers.
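Today a page can only approximate this behavior itself, by pairing devices through `groupId` and re-capturing. The sketch below shows that app-side approximation, not the opt-in being proposed; the function names are hypothetical, and it assumes the browser lists `audiooutput` devices in `enumerateDevices()`, which not all do.

```javascript
// Given the list from enumerateDevices(), find the microphone that shares
// a groupId (same physical device) with the given audio output.
function micForOutput(devices, outputDeviceId) {
  const output = devices.find(
    (d) => d.kind === "audiooutput" && d.deviceId === outputDeviceId
  );
  if (!output) return undefined;
  return devices.find(
    (d) => d.kind === "audioinput" && d.groupId === output.groupId
  );
}

// Opt-in wiring: re-check whenever the set of devices changes.
function followOutput(mediaDevices, getOutputId, onMicFound) {
  mediaDevices.addEventListener("devicechange", async () => {
    const devices = await mediaDevices.enumerateDevices();
    const mic = micForOutput(devices, getOutputId());
    if (mic) onMicFound(mic.deviceId);
  });
}
```

The page would still have to call getUserMedia with the returned `deviceId` and swap tracks, which is the step a user-agent-driven source change would remove.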

This change also has a potential good story for migrating web sites. First implement it, then sanitize labels for not granted devices to things like microphone1, camera2...
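To illustrate the sanitizing step, here is a small model of what a user agent might return from enumerateDevices. It is not page code and not spec text: `sanitizeLabels`, `GENERIC_NAMES`, the "speaker" name and the per-kind numbering scheme are all assumptions made for the example.

```javascript
// Generic names per device kind (assumed for this sketch).
const GENERIC_NAMES = {
  audioinput: "microphone",
  audiooutput: "speaker",
  videoinput: "camera",
};

// Replaces the label of every device the page has not been granted with a
// generic, numbered one ("microphone1", "camera2", ...).
function sanitizeLabels(devices, grantedIds) {
  const counters = {};
  return devices.map((d) => {
    counters[d.kind] = (counters[d.kind] ?? 0) + 1;
    const label = grantedIds.has(d.deviceId)
      ? d.label
      : `${GENERIC_NAMES[d.kind]}${counters[d.kind]}`;
    return { deviceId: d.deviceId, kind: d.kind, groupId: d.groupId, label };
  });
}
```

Devices are numbered by position within their kind, so a granted first camera keeps its real label while an ungranted second one becomes "camera2".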

jan-ivar avatar jan-ivar commented on August 26, 2024

So we have some experience with in-browser camera & mic pickers in Firefox to draw on.

> The problem we want to solve here is removing label info either entirely or for devices not used by the given web page.

I agree on the problem statement, but I see no reason to change our whole model over it. Conservatively if we look at how labels are used today, sites build rudimentary pickers. All they need to replace that effort is a tool to provoke an in-browser picker. This already almost works in Firefox by default, which is why I find it hard to comprehend what a leap this is for some:

await navigator.mediaDevices.getUserMedia({video: {deviceId: {exact: [...allOtherDeviceIds]}}});
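For readers wondering where `allOtherDeviceIds` would come from, one plausible way to build it is sketched here. The helper names are hypothetical, `track` is assumed to be the camera track the page is currently using, and `mediaDevices` stands for `navigator.mediaDevices`.

```javascript
// Collects the IDs of every camera except the one `track` is using.
async function otherCameraIds(mediaDevices, track) {
  const current = track.getSettings().deviceId;
  const devices = await mediaDevices.enumerateDevices();
  return devices
    .filter((d) => d.kind === "videoinput" && d.deviceId !== current)
    .map((d) => d.deviceId);
}

// Asks for any camera other than the current one.
async function requestAnotherCamera(mediaDevices, track) {
  const allOtherDeviceIds = await otherCameraIds(mediaDevices, track);
  if (allOtherDeviceIds.length === 0) return null; // nothing else to offer
  return mediaDevices.getUserMedia({
    video: { deviceId: { exact: allOtherDeviceIds } },
  });
}
```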

Admittedly, the above has API problems and UX problems, but we need to separate them:

The API problems:

  1. Won't prompt once you check ☑ Remember this decision. It's still a permission prompt.
  2. For web compat, we can't prompt if site already has permission to one of the choices.

The UX problems (which are orthogonal i.e. already exist today on second-device requests!):

  1. Canceling a second-device request is overly harsh on the site. Bug 1609578
  2. Our UX heavily biases toward a default choice, too many clicks to change device.
  3. Our (lack of) preview is biased toward initial prompt (where it might freak people out)
  4. We don't do a good job of simplifying our UX when there's just one choice.

If you work on (or predominantly use) a browser without per-device permission (one that doesn't tell you which device you're sharing), you'll be forgiven for thinking of these problems as intrinsically linked. They are not.

We solve the API problems with:

await navigator.mediaDevices.getUserMedia({video: true, semantics: "user-chooses"});

This would be enough to force all user agents to show a picker of all devices.

This seems to solve the API problem with no change to the model, with near-parity with all existing in-content device selection I've seen.

That wins in my book. Leave UX to user agents.

Now there are interesting UX-related corner cases here we can discuss in the interest of sharing, but I want to leave the pie in the sky first.

> allow a user agent to only expose granted devices as part of enumerateDevices.

We have a basket for that. Just like w3c/mediacapture-main#646 would prevent sites from optimizing out camera- and mic-launching buttons, removing info of other devices would prevent sites from optimizing out camera- and mic-changing buttons in their config panel(s).

The "interesting" UX-related corner cases I alluded to, include what to do e.g. when there's only one choice.

jan-ivar avatar jan-ivar commented on August 26, 2024

@henbos Specifically on removing required constraints, note that Chrome today implements info.getCapabilities() which gives the site capability information about all devices after gUM.

That API exists to allow a site to enforce its constraints while building a picker, or choosing another device outright. Most sites enforce some constraints.
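The pattern being described looks roughly like the sketch below, shown only to illustrate what sites do with that information. `camerasWithMinWidth` is a hypothetical name, `devices` is the list from enumerateDevices(), and `getCapabilities()` on device info objects is not available in every browser, so devices without it are skipped.

```javascript
// Keeps only cameras whose reported capabilities reach `minWidth`.
function camerasWithMinWidth(devices, minWidth) {
  return devices.filter((d) => {
    if (d.kind !== "videoinput") return false;
    if (typeof d.getCapabilities !== "function") return false;
    const { width } = d.getCapabilities();
    return width !== undefined && width.max >= minWidth;
  });
}
```

A site-built picker would then list only the devices this returns, for example `camerasWithMinWidth(devices, 1280)`.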

That API is also a trove of fingerprinting information.

Luckily, "user-chooses" provides feature-parity with this, without the massive information leak:

await getUserMedia({video: constraints, semantics: "user-chooses"});

So merging w3c/mediacapture-main#667 would let us retire info.getCapabilities() provided we leave constraints alone. 🎉
