
proposals's Issues

Add use case on “Content Filtering” in WebNN specifications

Proposal Name: Add use case on “Content Filtering” in WebNN specifications

Currently, there is a gap in web standards when it comes to supporting content blocking and filtering. With this proposal, we suggest adding content filtering as a use case in the WebNN specification.

Short description

(Moving the original PR from webNN repository to the more general proposals repo.)

We at eyeo have been working on machine learning (ML)-based content filtering and have pioneered the use of ML in ad filtering. As discussed in the issue previously, we highlighted that there is a gap in web standards when it comes to supporting content blocking and filtering. This hinders the implementation of solutions that uphold the W3C ethical principle that "People should be able to render web content as they want".

Hence we propose adding a use case on “Content Filtering” to the existing use cases: https://webmachinelearning.github.io/webnn/#usecases-application

Example use cases

ML-based content filtering can be applied to a number of use cases, such as intrusive ad filtering, user privacy protection, cyberbullying detection and avoidance, and a clean-page user experience for specially-abled users.

A rough idea or two about implementation

We propose adding a use case to the WebNN spec as follows:

Content Filtering

A user is cautious about her online privacy and wants to be protected from online trackers, malware, and any third parties present on the web pages she visits. The ML-based content filter [REF] identifies and blocks third-party content while allowing her to safely surf her favorite websites. Thus she is safer and more in control of her online experience [REF2].

[REF] Ad-blocking: A Study on Performance, Privacy and Counter-measures
[REF2] Point [2.12] of the W3C Ethical Web Principles

With this proposal, we submit the use case on "Content Filtering" for addition to the WebNN spec. Having this use case in the spec will allow the web community not only to implement ML-based content filtering but also to influence the shaping of web extensions and to discuss improvements to APIs such as webRequest and declarativeNetRequest.

Looking forward to next steps.

Please let me know if you have any questions or comments.

Thanks,
Humera
[email protected]

cc: @anssiko

Hybrid AI Exploration


Authors

  • Michael McCool
  • Geoff Gustafson
  • Sudeep Divakaran

Introduction

ML on the client supports many use cases better than server-based approaches, and with lower cost for the application provider. However, clients can vary significantly in capabilities. A hybrid approach that can flexibly shift work between server and client can support elasticity and avoid the problem of developers targeting only the weakest clients’ capabilities.

The overall goal of hybrid AI is to maximize the user experience in machine learning applications by providing the web developer the tools to manage the distribution of data and compute resources between servers and the client.

For example, ML models are large. This creates network cost, transfer time, and storage problems. As mentioned, client capabilities can vary. This creates adaptation, partitioning, and versioning problems. We would like to discuss potential solutions to these problems, such as shared caches, progressive model updates, and capability/requirements negotiation.
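To make the "progressive model updates" idea concrete, here is a minimal sketch, assuming a content-addressed shard manifest; all names (`loadModel`, `fetchShard`, the manifest shape) are hypothetical illustrations, not part of any proposed API. Because shards are keyed by content hash, a version update only transfers the shards that actually changed:

```javascript
// Hypothetical sketch: progressive model updates via shard-level caching.
// A model manifest lists shards by content hash; on a version update, only
// shards whose hashes changed need to be fetched again.
async function loadModel(manifest, cache, fetchShard) {
  const shards = [];
  for (const { hash } of manifest.shards) {
    let shard = cache.get(hash);          // reuse shards already on the client
    if (shard === undefined) {
      shard = await fetchShard(hash);     // transfer only what is missing
      cache.set(hash, shard);
    }
    shards.push(shard);
  }
  return shards;                          // assembling shards into a model is out of scope here
}
```

A shared, cross-origin cache in this role is exactly what the storage-API discussion referenced below explores.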

Requirements and Goals

For the end user, most of the existing WebNN use cases share common user requirements:

  • Enhance User Experience
    • Reduce load times
    • Meet latency targets for human interactions
  • Portability and Elasticity
    • Minimize compute, storage, and network transfer costs
    • Support clients of different capability levels, including older/newer clients
    • Adapt to varying resource availability
  • Data Privacy
    • User choice for location of data storage and computation
    • Video and audio streams are both high-bandwidth and generally private
    • Personal data (personally-identifiable information)
    • Confidential business information

Even though it is not a primary requirement, developer ease of use is a factor for adoption. An approach that lets a developer shift load between the server and the client using simple, consistent abstractions will allow hybrid AI applications to be developed faster than an approach with completely different programming models for each side.

Open Issues

Current implementations of hybrid AI applications (see User Research and References) have the following problems when targeting many of the WebNN use cases:

  • If the model runs on the server, then large amounts of (possibly private) data may need to be streamed to the server. This incurs a per-use latency.
  • If the model runs on the client, large models need to be downloaded, possibly multiple times in different contexts. This incurs a startup latency.
  • Users need control over private data, so the choice of whether or not a model needs to run on the client may have to be overridden by the user's preferences in some cases.
  • Clients vary in capabilities, so the developer does not know in advance how to split up the computational work.
  • Models are large and can consume significant storage on the client, which needs to be managed.
  • Applications may use multiple models that need to communicate with each other, but may each run on either the client or server.
  • Multiple applications may be present and need to share resources such as storage, memory, and compute. This may cause the actual capabilities of the client to vary over time.
  • It may be necessary to hide the exact capabilities of the client from the developer to avoid fingerprinting. However, the platform must be able to match models with client capabilities. An application may provide a choice of multiple models to support elasticity. See Performance Adaptation.
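The last point above — matching developer-supplied model variants to a client whose exact capabilities stay hidden — can be sketched as a simple selection function run by the platform rather than the page. Everything here (`selectVariant`, the variant and budget fields) is a hypothetical illustration under the assumption that the application declares per-variant resource requirements:

```javascript
// Hypothetical sketch: the platform (not the page) matches a set of
// developer-supplied model variants against the client's actual resources,
// returning only the chosen variant so exact capabilities stay hidden
// from the developer (limiting fingerprinting surface).
function selectVariant(variants, budget) {
  // Prefer the highest-quality variant that fits the client's budget.
  const fitting = variants
    .filter(v => v.memoryMB <= budget.memoryMB && v.minTops <= budget.tops)
    .sort((a, b) => b.quality - a.quality);
  return fitting.length > 0 ? fitting[0] : null;  // null: fall back to the server
}
```

Declaring several variants of the same model is one way to get the elasticity discussed under Performance Adaptation.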

Non-goals

  • Protecting proprietary models downloaded to the client from interception is a non-goal (but may be addressed in implementations or later work).
  • Automatic factoring of models is a non-goal. The developer needs to break models into pieces that are managed atomically, each of which runs on either the server or client.
  • Automatic optimization of models is a non-goal. The developer needs to consider how to minimize the size of models; however they may be able to use generic features expected on clients such as small data types.
  • Model training is a non-goal. The system will be focused on inference. However, fine tuning may be used in limited circumstances.
  • Complete modelling of the client’s capabilities is a non-goal.
  • Extreme scalability is a non-goal. While there may be multiple applications on the client there should be only a handful that a single user can use at once.
  • Extreme performance is a non-goal. While important, other goals such as portability and security, which are also important to the user experience, create trade-offs.
  • Managing models outside the web client is a non-goal. For example, internal platform models may be present and used to support system features but the proposed system would not manage them. However, access to those separately managed platform models may be useful.

User Research and References

  • Existing WebNN Use Cases - Set of agreed-upon use cases for WebML. Most of these have latency or privacy requirements and several require large models (e.g. language translation).
  • Storage APIs for caching and sharing large models across origins discussion in the WebML WG - Some previous discussion within the WebML WG on the problem of model caching. Includes discussion of experience with a prototype and highlights cross-site sharing as a key issue.
  • Google on Hybrid AI - A general discussion of hybrid AI based on using smaller models on the client and falling back to the server only when necessary. Mentions the need to cache models and the potential for splitting a model between client and server. Mentions potential privacy benefits of partitioned models and surveys several applications already using a hybrid approach.
  • Qualcomm - Getting Personal with On-Device AI - Describes several interesting personalized AI client use cases.
  • Moor Insights on AI PC - Describes the AI PC opportunity and lists several client AI applications, with a focus on Windows and Microsoft.
  • Priority of Constituencies - The general W3C framework for prioritizing requirements.
  • WNIG Cloud-Edge Coordination Use Cases - Two explicit ML use cases but others may have ML aspects, e.g. video editing.
    While these emphasize edge computing (offload from the client) several can also be interpreted as use cases simply needing additional performance on the client.

Operation-specific APIs


This is a proposal to define and implement a small number of standalone APIs for individual compute-intensive operations (such as 2D convolution and matrix multiplication) that are often the target of hardware acceleration. The APIs would be atomic and would not be tied to a graph or model-loader implementation. It would be up to JavaScript libraries or WASM modules to call into these low-level APIs.

Short description

Across many common machine learning models, there are a handful of compute-intensive operations that may account for 90-99% of inference time, based on the benchmarking done for WebNN. If these few operations were offered as standalone APIs, hardware acceleration could deliver much of the performance benefit with a small, simple API surface, without needing to define the many other instructions and graph topology needed for a higher-level API like a graph or model loader. As an added benefit, it ought to be faster to ship this handful of APIs.

JavaScript ML libraries would need to be updated to take advantage of the APIs, just as they can take advantage of WebGL today.

Example use cases

Image classification typically uses convolution and matrix multiplication. With hardware-accelerated versions of these two operations, the performance boost would be close to the optimum achievable with a complete graph or model execution API.

A rough idea or two about implementation

Perhaps the closest existing example is WebGL compute shaders, except that these operations would be much simpler.
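A minimal sketch of how a JS library might use such a standalone operation, assuming a hypothetical accelerated entry point (`navigator.ml.matmul` is an invented name, not a shipped or proposed API) with a portable pure-JS fallback:

```javascript
// Portable fallback: row-major matrix multiply, a (m x k) times b (k x n).
function matmulJS(a, b, m, k, n) {
  const out = new Float32Array(m * n);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let p = 0; p < k; p++) sum += a[i * k + p] * b[p * n + j];
      out[i * n + j] = sum;
    }
  }
  return out;
}

// Hypothetical wrapper: delegate to a standalone accelerated op if the
// platform exposes one, otherwise fall back to plain JS (or WASM).
async function matmul(a, b, m, k, n) {
  if (typeof navigator !== 'undefined' && navigator.ml && navigator.ml.matmul) {
    return navigator.ml.matmul(a, b, m, k, n);  // accelerated path (invented name)
  }
  return matmulJS(a, b, m, k, n);
}
```

The point of the sketch is the API shape: an atomic call on typed arrays, with no graph or model object involved.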

Supporting JAX-inspired WebML frameworks/libraries

I'm not qualified to write an actual proposal here, so this is just a placeholder issue for discussion about supporting JAX-inspired JS frameworks. I originally created an issue in the WebNN repo and was advised by @anssiko to create an issue here instead.

The thrust of the original issue was that JAX is becoming more popular, and that as the foundations of WebNN/WebML are built, it may be important to take its growing popularity into account so that highly performant "JAX.js"-style frameworks are possible in the future.

In the original issue I said:

IIRC, WebNN's initial focus is on inference for networks trained using non-web frameworks, which makes sense, but this question is more about the long-term trends, given the seeming possibility of JAX-like frameworks becoming the norm.

But I'd also like to add that gradients are required for the "guiding" done by models like VQGAN+CLIP. Guiding embeddings/latents/inputs via models like CLIP seems to be becoming more popular. It may end up being important for WebML to support this type of "inference", rather than just pure forward-pass inference, but maybe it's too early to say. Either way, allowing for this possibility when designing the foundations seems like a good idea.
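To make the gradient requirement concrete, here is a minimal reverse-mode differentiation sketch for scalar expressions — the kind of capability a JAX-style `grad` would need from the underlying platform. All names are illustrative, and the sketch is only correct when sharing occurs at leaf variables; a full implementation would propagate in topological order:

```javascript
// Minimal reverse-mode autodiff sketch for scalar expressions.
// Each node records how to push a gradient back to its inputs.
function value(x) { return { val: x, grad: 0, back: () => {} }; }

function mul(a, b) {
  const out = value(a.val * b.val);
  out.back = () => {
    a.grad += b.val * out.grad;  // d(ab)/da = b
    b.grad += a.val * out.grad;  // d(ab)/db = a
    a.back(); b.back();
  };
  return out;
}

function add(a, b) {
  const out = value(a.val + b.val);
  out.back = () => {
    a.grad += out.grad;          // d(a+b)/da = 1
    b.grad += out.grad;          // d(a+b)/db = 1
    a.back(); b.back();
  };
  return out;
}

// Seed the output gradient and propagate backwards.
function grad(out) { out.grad = 1; out.back(); }
```

For example, for f(x) = x·x + x at x = 3, back-propagation accumulates x.grad = 2x + 1 = 7, which is what CLIP-style guidance needs at scale (gradients with respect to inputs/latents, not weights).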

Links:

Data processing proposal

Proposal name

Data Processing API for Web

Short description

The need arises from the fact that deep learning models cannot work in isolation: data processing is needed for both the inputs and the outputs of a model.

Since we are drafting the web-dl spec, we should also pay attention to a standard data-processing spec. Furthermore, the data processing should be compatible with JavaScript syntax.

Example use cases

const [trainData, testData] = rawImgData
  .map(it => it.resize([224, 224]).blur())
  .shuffle()
  .splitTrainTest();

const tabularData = rawTabularData.head(10).shuffle();

A rough idea or two about implementation

We are currently working on datacook to implement some data-related processing methods based on tfjs & danfo. We have finished the API-level design here and are re-implementing some methods natively within browsers.
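A minimal sketch of the chainable shape the examples above imply, assuming an array-backed dataset; the method names mirror the proposal's examples, but the class and its internals are hypothetical, not datacook's actual implementation. Shuffle is seeded so pipelines are reproducible:

```javascript
// Hypothetical chainable dataset sketch in the spirit of the proposal.
class Dataset {
  constructor(rows) { this.rows = rows; }
  map(fn) { return new Dataset(this.rows.map(fn)); }
  head(n) { return new Dataset(this.rows.slice(0, n)); }
  shuffle(seed = 42) {
    // Deterministic Fisher-Yates using a simple LCG, so results reproduce.
    const rows = this.rows.slice();
    let s = seed >>> 0;
    for (let i = rows.length - 1; i > 0; i--) {
      s = (s * 1664525 + 1013904223) >>> 0;
      const j = s % (i + 1);
      [rows[i], rows[j]] = [rows[j], rows[i]];
    }
    return new Dataset(rows);
  }
  splitTrainTest(testRatio = 0.2) {
    const cut = Math.round(this.rows.length * (1 - testRatio));
    return [new Dataset(this.rows.slice(0, cut)), new Dataset(this.rows.slice(cut))];
  }
}
```

Each method returns a new Dataset, which is what makes the `head(10).shuffle()` chaining in the example work.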
