guess-js / guess

🔮 Libraries & tools for enabling Machine Learning driven user-experiences on the web

Home Page: https://guess-js.github.io/

License: MIT License

TypeScript 77.77% JavaScript 21.81% Smarty 0.42%
machine-learning performance web-performance prefetch prerender hacktoberfest

guess's Introduction

Guess.js (alpha)

Libraries and tools for enabling data-driven user-experiences on the web.

Quickstart

For Webpack users:

⚫ Data-driven bundling

Install and configure GuessPlugin - the Guess.js webpack plugin which automates as much of the setup process for you as possible.

Should you wish to try out the modules we offer individually, the packages directory contains three packages:

  • ga - a module for fetching structured data from the Google Analytics API to learn about user navigation patterns.
  • parser - a module providing JavaScript framework parsing. This powers the route-parsing capabilities implemented in the Guess webpack plugin.
  • webpack - a webpack plugin for setting up predictive fetching in your application. It consumes the ga and parser modules and offers a large number of options for configuring how predictive fetching should work in your application.

For non-Webpack users:

⚫ Data-driven loading

Our predictive-fetching for sites workflow provides a set of steps you can follow to integrate predictive fetching, using the Google Analytics API, into your site.

This repo uses Google Analytics data to determine which page a user is most likely to visit next from a given page. A client-side script (which you'll add to your application) sends a request to the server to get the URL of the page it should fetch, then prefetches this resource.

Learn More

What is Guess.js?

Guess.js provides libraries & tools to simplify predictive data-analytics driven approaches to improving user-experiences on the web. This data can be driven from any number of sources, including analytics or machine learning models. Guess.js aims to lower the friction of consuming and applying this thinking to all modern sites and apps, including building libraries & tools for popular workflows.

Predictive data-analytics thinking could be applied to sites in a number of contexts:

  • Predict the next page (or pages) a user is likely to visit and prefetch these pages, improving perceived page load performance and user happiness.
    • Page-level: Prerender/Prefetch the page which is most likely to be visited next
    • Bundle-level: Prefetch the bundles associated with the top N pages. On each page navigation, look at all the neighbors of the current page, sorted in descending order by the probability of being visited. Fetch assets (JavaScript chunks) for the top N pages, depending on the current connection effective type.
  • Predict the next piece of content (article, product, video) a user is likely to want to view and adjust or filter the user experience to account for this.
  • Predict the types of widgets an individual user is likely to interact with more (e.g. games) and use this data to tailor a more custom experience.
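
The bundle-level strategy in the list above can be sketched as follows. This is a minimal illustration rather than the Guess.js implementation: the `NavigationGraph` shape, the `pagesToPrefetch` helper and the per-connection budget in `PREFETCH_BUDGET` are all assumptions made for the example.

```typescript
// Hypothetical shape: for each page, the probability of each neighboring page
// being visited next.
type NavigationGraph = Record<string, Record<string, number>>;

// Effective connection types as reported by the Network Information API.
type EffectiveType = "slow-2g" | "2g" | "3g" | "4g";

// Assumed budget: how many pages (N) to prefetch for each connection type.
const PREFETCH_BUDGET: Record<EffectiveType, number> = {
  "slow-2g": 0,
  "2g": 1,
  "3g": 2,
  "4g": 3,
};

// Returns the top N neighbors of the current page, most probable first.
function pagesToPrefetch(
  graph: NavigationGraph,
  currentPage: string,
  effectiveType: EffectiveType
): string[] {
  const neighbors: Record<string, number> = graph[currentPage] ?? {};
  return Object.keys(neighbors)
    .sort((a, b) => neighbors[b] - neighbors[a])
    .slice(0, PREFETCH_BUDGET[effectiveType]);
}

const exampleGraph: NavigationGraph = {
  "/": { "/about": 0.2, "/products": 0.5, "/contact": 0.3 },
};

const onFastConnection = pagesToPrefetch(exampleGraph, "/", "4g");
const onSlowConnection = pagesToPrefetch(exampleGraph, "/", "2g");
console.log(onFastConnection, onSlowConnection);
```

In a real application, the returned pages would still need to be mapped to the JavaScript chunks that render them.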

By collaborating across different touch-points in the ecosystem where data-driven approaches could be easily applied, we hope to generalize common pieces of infrastructure to maximize their applicability in different tech stacks.

Problems we're looking to solve

  • Developers using <link rel=prefetch> for future navigations heavily rely on manually reading descriptive analytics to inform their decisions for what to prefetch.
  • These decisions are often made at a point in time and:
    • (1) are often not revisited as data trends change
    • (2) are very limited in how they are used. Implementations will often only prefetch content from a homepage or very small set of hero pages, but otherwise not do this for all of the possible entry points on a site. This can leave performance opportunities on the table.
    • (3) require some amount of confidence in the data driving prefetching decisions, which means developers may not be adopting prefetching out of worry that they will waste bandwidth. <link rel=prefetch> is currently used on 5% of total Chrome pageloads, but this could be higher.
  • Implementing predictive analytics is too complex for the average web developer.
    • Most developers are unfamiliar with how to leverage the Google Analytics API to determine the probability a page will be visited next. We lack:
    • (1) Page-level solution: a drop-in client-side solution for prefetching pages a user will likely visit
    • (2) Bundling-level solution: a set of plugins/tools that work with today’s JavaScript bundlers (e.g. webpack) to cluster and generate the bundles/chunks that a particular set of navigation paths could load more quickly if they were prefetched ahead of time.
  • Most developers are not yet familiar with how Machine Learning works. They are generally:
    • (1) Unsure how (and why) ML could be integrated into their existing (web) tech stacks
    • (2) Unsure what the value proposition of TensorFlow is or where solutions like the CloudML engine fit in. We have an opportunity to simplify the overhead associated with leveraging some of these solutions.
  • Best-in-class / low-friction approaches in this space are still slowly emerging and are not yet as accessible to web developers without ML or data-science backgrounds.
    • Machine Learning meets Cloud: Intelligent Prefetching by IIH Nordic
      • Tag Managers like Google Tag Manager can be used to decouple page content from the code tracking how the content is used. This allows web analysts to upgrade the tracking code in real-time with no site downtime. Tag managers allow a general solution for code injection and can be used to deploy intelligent prefetching. The advantages: analytics used to build the model comes from the tag manager. We can also send data live to the predictor without additional tracker overhead. After adding a few (of IIH Nordic’s) tags to a GTM install, a site can start to prefetch resources of the next pages and track load time saving opportunities.
      • IIH Nordic moved the predictive prefetching model to a web service the browser queries when a user visits a new page. The service responds to each request and takes advantage of Google Cloud, App Engine and Cloud ML. Their solution chooses the most accurate model; choices include a Markov model or, most often, a deep neural net in TensorFlow.
      • With user behavior changing over time, predictive models require updating (training) from time to time. Training a model involves collecting and transforming data and fitting the parameters of the model accordingly. IIH Nordic use Google Cloud to pull data from a customer’s analytics service into a private data bucket in BigQuery. They process this data, train and test predictive models, updating the prediction service seamlessly.
      • IIH Nordic suggest small/slow sites update their models monthly. Larger sites may need to retrain daily or even hourly for news websites.
      • The benefit of training ML models in the cloud is ease of scale as additional machines, GPUs and processors can be added as needed.
    • Machine Learning-Driven Bundling. The Future of JavaScript Tooling by Minko

Initial priority: Improved Performance through Data-driven Prefetching

The first large priority for Guess.js will be improving web performance through predictive prefetching of content.

By building a model of pages a user is likely to visit, given an arbitrary entry-page, a solution could calculate the likelihood a user will visit a given next page or set of pages and prefetch resources for them while the user is still viewing their current page. This has the possibility of improving page-load performance for subsequent page visits as there's a strong chance a page will already be in the user's cache.

Possible approaches to predictive fetching

In order to predict the next page a user is likely to visit, solutions could use the Google Analytics API. Google Analytics session data can be used to create a model to predict the most likely page a user is going to visit next on a site. The benefit of this session data is that it can evolve over time, so that if particular navigation paths change, the predictions can stay up to date too.

With the availability of this data, an engine could insert <link rel="[prerender/prefetch/preload]"> tags to speed up the load time for the next page request. In some tests, such as Mark Edmondson's Supercharging Page-Loads with R, this led to a 30% improvement in page load times. The approach Mark used in his research involved using GTM tags and machine-learning to train a model for page predictions. This is an idea Mark continued in Machine Learning meets the Cloud - Intelligent Prefetching.

While this approach is sound, the methodology used could be deemed a little complex. A simpler approach is to get accurate prediction data directly from the Google Analytics API. If you ran a report for the Page and Previous Page Path dimensions combined with the Pageviews and Exits metrics, this should provide enough data to wire up prefetches for the most popular pages.
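
As a rough sketch of this simpler approach, the rows of such a report can be turned into per-page transition probabilities. The `NavigationRow` shape and the `toTransitionProbabilities` helper are hypothetical, the `(entrance)` placeholder for sessions with no previous page is an assumption about how the report labels entrances, and the Exits metric is left out for brevity.

```typescript
// Hypothetical row shape for a report with the Previous Page Path and Page
// dimensions and the Pageviews metric.
interface NavigationRow {
  previousPagePath: string;
  pagePath: string;
  pageviews: number;
}

type TransitionTable = Record<string, Record<string, number>>;

function toTransitionProbabilities(rows: NavigationRow[]): TransitionTable {
  // First pass: sum pageviews for every (previous page, page) pair.
  const counts: TransitionTable = {};
  for (const row of rows) {
    // Assumed placeholder for entrances, which have no previous page.
    if (row.previousPagePath === "(entrance)") continue;
    const nextPages: Record<string, number> = counts[row.previousPagePath] ?? {};
    nextPages[row.pagePath] = (nextPages[row.pagePath] ?? 0) + row.pageviews;
    counts[row.previousPagePath] = nextPages;
  }

  // Second pass: divide each count by the total for its previous page.
  const probabilities: TransitionTable = {};
  for (const from of Object.keys(counts)) {
    const nextPages = counts[from];
    let total = 0;
    for (const to of Object.keys(nextPages)) total += nextPages[to];
    const normalized: Record<string, number> = {};
    for (const to of Object.keys(nextPages)) normalized[to] = nextPages[to] / total;
    probabilities[from] = normalized;
  }
  return probabilities;
}

const reportRows: NavigationRow[] = [
  { previousPagePath: "(entrance)", pagePath: "/", pageviews: 100 },
  { previousPagePath: "/", pagePath: "/about", pageviews: 30 },
  { previousPagePath: "/", pagePath: "/pricing", pageviews: 10 },
  { previousPagePath: "/about", pagePath: "/", pageviews: 5 },
];

const transitionTable = toTransitionProbabilities(reportRows);
console.log(transitionTable["/"]);
```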

Machine Learning for predictive fetching

ML could help improve the overall accuracy of a solution's predictions, but is not a necessity for an initial implementation. Predictive fetching could be accomplished by training a model on the pages users are likely to visit and improving on this model over time.

Deep neural networks are particularly good at teasing out the complexities that may lead to a user choosing one page over another, in particular, if we wanted to attempt a version of the solution that was catered to the pages an individual user might visit vs. the pages a "general/median" user might visit next. Fixed page sequences (prev, current, next) might be the easiest to begin dealing with initially. This means building a model that is unique to your set of documents.

Model updates tend to be done periodically, so one might set up a nightly/weekly job to refresh the model based on new user behaviour. This could be done in real time, but that is likely complex, so doing it periodically might be sufficient. One could imagine a generic model representing behavioural patterns for users on a site that can be driven by a trained status set, Google Analytics, or a custom description you plug in using a new layer into a router, giving the site the ability to predictively fetch future pages and improve page load performance.

Possible approaches to speculative prefetch

Speculative prefetch on page load

Speculative prefetch can prefetch pages likely to be navigated to on page load. This assumes the existence of knowledge about the probability a page will need a certain next page or set of pages, or a training model that can provide a data-driven approach to determining such probabilities.

Prefetching on page load can be accomplished in a number of ways: deferring to the UA to decide when to prefetch resources (e.g. at low priority with <link rel=prefetch>), prefetching during page idle time (via requestIdleCallback()), or prefetching at some other interval. No further interaction is required by the user.

Speculative prefetch when links come into the viewport

A page could speculatively begin prefetching content when links in the page are visible in the viewport, signifying that the user may have a higher chance of wanting to click on them.

This is an approach used by Gatsby (which uses React and React Router). Their specific implementation is as follows:

  • In browsers that support IntersectionObserver, whenever a <Link> component becomes visible, the link "votes" for the page it links to to be prefetched.
    • Votes are worth slightly fewer points each time, so links at the top of the page (e.g. the top nav) are prioritized over ones lower down.
    • If a page is linked to multiple times, its vote count goes higher.
    • The prefetcher takes the top page and starts prefetching resources.
  • It's restricted to prefetching one page at a time so as to reduce contention over bandwidth with on-page content (not a problem on fast networks). If a user visits a page and its resources haven't been fully downloaded, prefetching stops until the page is loaded to ensure the user waits as little time as possible.
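
The voting idea described above can be illustrated with a small sketch. This is not Gatsby's code: the `pickPageToPrefetch` helper and the `decay` factor of 0.9 are assumptions chosen for the example, and the input is simply the list of linked pages in the order their links became visible.

```typescript
// visibleLinks: target pages of the links that became visible, in order
// (top of the page first). A page may appear more than once.
function pickPageToPrefetch(
  visibleLinks: string[],
  decay: number = 0.9 // assumed factor: each later vote is worth slightly less
): string | undefined {
  const votes: Record<string, number> = {};
  let weight = 1;
  for (const page of visibleLinks) {
    votes[page] = (votes[page] ?? 0) + weight;
    weight *= decay;
  }

  // The page with the highest vote total is prefetched first.
  let best: string | undefined;
  for (const page of Object.keys(votes)) {
    if (best === undefined || votes[page] > votes[best]) best = page;
  }
  return best;
}

// "/b" is linked twice, so its combined votes outweigh the earlier "/a".
const votedPage = pickPageToPrefetch(["/a", "/b", "/c", "/b"]);
console.log(votedPage);
```

In a browser, the input list would typically be collected from IntersectionObserver callbacks as links scroll into view.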

Speculative prefetch on user interaction

A page could begin speculatively prefetching resources when a user indicates they are interested in some content. This can take many forms, including when a user chooses to hover over a link or some portion of UI that would navigate them to a separate page. The browser could begin fetching content for the link as soon as there was a clear indication of interest. This is an approach taken by JavaScript libraries such as InstantClick.

Risks

Data consumption

As with any mechanism for prefetching content ahead of time, this needs to be approached very carefully. A user on a restricted data-plan may not appreciate or benefit as much from pages being fetched ahead of time, in particular if they start to eat up their data. There are mechanisms a site/solution could take to be mindful of this concern, such as respecting the Save-Data header.
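
One way a server-side predictor might respect this preference is sketched below. The `clientPrefersReducedData` helper and the plain header map it takes are hypothetical; the only assumption about the protocol is that clients opting in send `Save-Data: on`.

```typescript
// Returns true when the request headers carry "Save-Data: on".
// Header names are matched case-insensitively.
function clientPrefersReducedData(
  headers: Record<string, string | undefined>
): boolean {
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "save-data") {
      return (headers[name] ?? "").trim().toLowerCase() === "on";
    }
  }
  return false;
}

// A predictor could skip returning prefetch candidates for such clients.
const skipPrefetch = clientPrefersReducedData({ "Save-Data": "on" });
console.log(skipPrefetch);
```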

Prefetching undesirable pages

Prefetching links to "logout" pages is likely undesirable. The same could be said of any pages that trigger an action on page-load (e.g. one-click purchase). Solutions may wish to include a blacklist of URLs which are never prefetched to increase the likelihood of a prefetched page being useful.
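
A minimal sketch of such a list of never-prefetched URLs follows. The patterns in `NEVER_PREFETCH` and the `filterPrefetchCandidates` helper are illustrative assumptions, not part of Guess.js.

```typescript
// Assumed example patterns for paths that trigger an action when loaded.
const NEVER_PREFETCH: RegExp[] = [
  /^\/logout([\/?]|$)/,
  /^\/checkout\/one-click/,
];

// Drops every candidate path that matches one of the exclusion patterns.
function filterPrefetchCandidates(
  paths: string[],
  excluded: RegExp[] = NEVER_PREFETCH
): string[] {
  return paths.filter((path) => !excluded.some((pattern) => pattern.test(path)));
}

const safeCandidates = filterPrefetchCandidates([
  "/about",
  "/logout",
  "/checkout/one-click?id=1",
  "/logout-help",
]);
console.log(safeCandidates);
```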

Web Standards

Future of rel=prerender

Some of the attempts to accomplish similar proposals in the past have relied on <link rel=prerender>. The Chrome team is currently exploring deprecating rel=prerender in favor of NoStatePrefetch - a lighter version of this mechanism that only prefetches to the HTTP cache but uses no other state of the web platform. A solution should factor in whether it will be relying on the replacement to rel=prerender or using prefetch/preload/other approaches.

There are two key differences between NoStatePrefetch and Prefetch:

  1. nostate-prefetch is a mechanism, and <link rel=prefetch> is an API. The nostate-prefetch can be requested by other entry points: omnibox prediction, custom tabs, <link rel=prerender>.

  2. The implementation is different: <link rel=prefetch> prefetches one resource, but nostate-prefetch on top of that runs the preload scanner on the resource (in a fresh new renderer), discovers subresources and prefetches them as well (without recursing into preload scanner).

Relevant Data Analytics

There are three primary types of data analytics worth being aware of in this problem space: descriptive, predictive and prescriptive. Each type is related and helps teams leverage different kinds of insight.

Descriptive - what has happened?

Descriptive analytics summarizes raw data and turns it into something interpretable by humans. It can look at past events, regardless of when the events have occurred. Descriptive analytics allow teams to learn from past behaviors and this can help them influence future outcomes. Descriptive analytics could determine what pages on a site users have previously viewed and what navigation paths they have taken from any given entry page.

Predictive - what will happen?

Predictive analytics “predicts” what can happen next. Predictive analytics helps us understand the future and gives teams actionable insights using data. It provides estimates of the likelihood of a future outcome being useful. It’s important to keep in mind that few algorithms can predict future events with complete accuracy, but we can use as many of the signals available to us as possible to help improve baseline accuracy. The foundation of predictive analytics is the probabilities we determine from data. Predictive analytics could predict the next page or set of pages a user is likely to visit given an arbitrary entry page.

Prescriptive - what should we do?

Prescriptive analytics enables prescribing different possible actions to guide towards a solution. Prescriptive analytics provides advice, attempting to quantify the impact future decisions may have to advise on possible outcomes before these decisions are made. Prescriptive analytics aims to not just predict what is going to happen but goes further; informing why it will happen and providing recommendations about actions that can take advantage of such predictions. Prescriptive analytics could predict the next page a user will visit, but also suggest actions such as informing you of ways you can customize their experience to take advantage of this knowledge.

Relevant Prediction Models

Markov Models

The key objective of a prediction model in the prefetching problem space is to identify the subsequent requests a user may need, given a specific page request. This allows a server or client to pre-fetch the next set of pages and attempt to ensure they are in a user’s cache before they directly navigate to the page. The idea is to reduce overall loading time. When this is implemented with care, this technique can reduce page access times and latency, improving the overall user experience.

Markov models have been widely used for researching and understanding stochastic (random probability distribution) processes [Ref, Ref]. They have been demonstrated to be well-suited for modeling and predicting a user’s browsing behavior. The input for these problems tends to be the sequence of web pages accessed by a user or set of users (site-wide) with the goal of building Markov models we can use to model and predict the pages a user will most likely access next. A Markov process has states representing accessed pages and edges representing transition probabilities between states which are computed from a given sequence in an analytics log. A trained Markov model can be used to predict the next state given a set of k previous states.
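
To make the description concrete, here is a minimal sketch of a first-order model (k = 1) trained from page sequences. The `trainFirstOrder` and `predictNextPage` names and the sample sessions are invented for the example.

```typescript
// States are pages; chain[from][to] is the probability of moving from -> to.
type MarkovChain = Record<string, Record<string, number>>;

function trainFirstOrder(sessions: string[][]): MarkovChain {
  // Count every observed transition between consecutive pages.
  const counts: MarkovChain = {};
  for (const session of sessions) {
    for (let i = 0; i + 1 < session.length; i++) {
      const from = session[i];
      const to = session[i + 1];
      const row: Record<string, number> = counts[from] ?? {};
      row[to] = (row[to] ?? 0) + 1;
      counts[from] = row;
    }
  }

  // Normalize the counts of each state into transition probabilities.
  const chain: MarkovChain = {};
  for (const from of Object.keys(counts)) {
    const row = counts[from];
    let total = 0;
    for (const to of Object.keys(row)) total += row[to];
    const probabilities: Record<string, number> = {};
    for (const to of Object.keys(row)) probabilities[to] = row[to] / total;
    chain[from] = probabilities;
  }
  return chain;
}

// Predicts the most probable next page, or undefined for an unseen state.
function predictNextPage(chain: MarkovChain, current: string): string | undefined {
  const row = chain[current];
  if (!row) return undefined;
  let best: string | undefined;
  for (const to of Object.keys(row)) {
    if (best === undefined || row[to] > row[best]) best = to;
  }
  return best;
}

const firstOrderChain = trainFirstOrder([
  ["/", "/products", "/cart"],
  ["/", "/products", "/about"],
  ["/", "/about"],
]);
console.log(predictNextPage(firstOrderChain, "/"));
```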

In some applications, first-order Markov models aren’t as accurate in predicting user browsing behaviors as these do not always look into the past to make a distinction between different patterns that have been observed. This is one reason higher-order models are often used. These higher-order models have limitations with state-space complexity, less broad coverage and sometimes reduced prediction accuracy.

All-Kth-Order Markov Model

One way [Ref] to overcome this problem is to train varying order Markov models, which we then use during the prediction phase. This was attempted in the All-Kth-Order Markov model proposed in this Ref. This can make state-space complexity worse, however. Another approach is to identify frequent access patterns (longest repeating subsequences) and use this set of sequences for predictions. Although this approach can have an order of magnitude reduction on state-space complexity, it can reduce prediction accuracy.
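
The varying-order idea can be sketched as follows: one table of counts is trained per order from 1 to K, and prediction uses the highest order whose context has been observed, falling back to lower orders otherwise. This is a simplified illustration, not the referenced paper's implementation; the helper names and the `CONTEXT_SEPARATOR` (assumed never to occur inside a page path) are inventions of the example.

```typescript
// countsByContext[context][next] = how often `next` followed `context`.
type ContextCounts = Record<string, Record<string, number>>;

// Assumed separator that does not occur inside page paths.
const CONTEXT_SEPARATOR = "|";

// Returns one model per order; models[k - 1] is the order-k model.
function trainAllKthOrder(sessions: string[][], maxOrder: number): ContextCounts[] {
  const models: ContextCounts[] = [];
  for (let k = 1; k <= maxOrder; k++) {
    const counts: ContextCounts = {};
    for (const session of sessions) {
      for (let i = k; i < session.length; i++) {
        const context = session.slice(i - k, i).join(CONTEXT_SEPARATOR);
        const next = session[i];
        const row: Record<string, number> = counts[context] ?? {};
        row[next] = (row[next] ?? 0) + 1;
        counts[context] = row;
      }
    }
    models.push(counts);
  }
  return models;
}

// Uses the highest-order model that has seen the recent history.
function predictWithHighestOrder(
  models: ContextCounts[],
  history: string[]
): string | undefined {
  for (let k = Math.min(models.length, history.length); k >= 1; k--) {
    const context = history.slice(history.length - k).join(CONTEXT_SEPARATOR);
    const row = models[k - 1][context];
    if (!row) continue; // unseen context: fall back to a lower order
    let best: string | undefined;
    for (const page of Object.keys(row)) {
      if (best === undefined || row[page] > row[best]) best = page;
    }
    return best;
  }
  return undefined;
}

const kthOrderModels = trainAllKthOrder(
  [
    ["/", "/blog", "/post-1"],
    ["/", "/blog", "/post-1"],
    ["/", "/blog", "/post-1"],
    ["/search", "/blog", "/post-2"],
    ["/search", "/blog", "/post-2"],
  ],
  2
);
console.log(predictWithHighestOrder(kthOrderModels, ["/search", "/blog"]));
```

In this sample data, a first-order model alone would always predict `/post-1` after `/blog`, while the second-order context distinguishes visitors arriving from `/search`. The cost is the extra state kept for every order, which is the state-space concern noted above.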

Selective Markov Models

Selective Markov models (SMM), which only store some states within the model, have also been proposed as a solution to state-space complexity tradeoffs. They begin with an All-Kth-Order Markov Model; a post-pruning approach is then used to prune states that are not expected to be accurate predictors. The result of this is a model which has the same prediction power as All-Kth-Order models with less space complexity and higher prediction accuracy. In Deshpande and Karypis, different criteria for pruning states in the model before prediction (frequency, confidence, error) are looked at.
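
Of the three pruning criteria, frequency is the simplest to illustrate: states that were observed too rarely are dropped from the model. The sketch below assumes a model stored as counts per context; the `StateCounts` shape, the `pruneByFrequency` helper and the threshold are hypothetical, and the confidence and error criteria are not shown.

```typescript
// statesByContext[context][next] = how often `next` followed `context`.
type StateCounts = Record<string, Record<string, number>>;

// Keeps only the contexts observed at least `minOccurrences` times.
function pruneByFrequency(model: StateCounts, minOccurrences: number): StateCounts {
  const pruned: StateCounts = {};
  for (const context of Object.keys(model)) {
    const row = model[context];
    let total = 0;
    for (const next of Object.keys(row)) total += row[next];
    if (total >= minOccurrences) pruned[context] = row;
  }
  return pruned;
}

const unprunedStates: StateCounts = {
  "/|/blog": { "/post-1": 3 },
  "/search|/blog": { "/post-2": 1 },
};

// The rarely seen "/search|/blog" state is removed.
const prunedStates = pruneByFrequency(unprunedStates, 2);
console.log(Object.keys(prunedStates));
```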

Semantic-pruned Selective Markov Models

In Mabroukeh and Ezeife, the performance of semantic-rich 1st and 2nd order Markov models was studied and compared with that of higher-order SMM and semantic-pruned SMM. They discovered that semantic-pruned SMM have a 16% smaller size than frequency-pruned SMM and provide nearly an equal accuracy.

Clustering

Observing navigation patterns can allow us to analyze user behavior. This approach requires access to user-session identification, clustering sessions into similar clusters and developing a model for prediction using current and earlier access patterns. Much of the previous work in this field has relied on clustering schemes like the K-means clustering technique with Euclidean distance for improving confidence of predictions. The drawbacks of K-means include the difficulty of deciding on the number of clusters and selecting the initial random centers; in addition, the order of page visits is not always considered. Kumar et al. investigated this, proposing a hierarchical clustering technique with a modified Levenshtein distance, pagerank using access time length, frequency and higher order Markov models for prediction.
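
To show why an edit distance suits navigation sessions, the sketch below computes the standard (unmodified) Levenshtein distance between two page sequences, treating each page as one symbol. It does not reproduce the modified distance proposed by Kumar et al.; the `sessionEditDistance` name and the sample sessions are invented for the example.

```typescript
// Minimum number of page insertions, deletions or substitutions needed to
// turn session `a` into session `b`. Unlike a Euclidean distance over visit
// counts, the result depends on the order in which pages were visited.
function sessionEditDistance(a: string[], b: string[]): number {
  // previous[j] = distance between the first i - 1 pages of a and first j of b.
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = previous[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = previous[j] + 1;
      const insertion = current[j - 1] + 1;
      current.push(Math.min(substitution, deletion, insertion));
    }
    previous = current;
  }
  return previous[b.length];
}

const similarSessions = sessionEditDistance(
  ["/", "/blog", "/post-1"],
  ["/", "/blog", "/post-2"]
);
const reorderedSessions = sessionEditDistance(["/", "/a"], ["/a", "/"]);
console.log(similarSessions, reorderedSessions);
```

The second pair visits the same pages in a different order and still has a non-zero distance, which a comparison based only on visit counts would miss.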

Research review

Many of the papers referenced in the following section are centered around the Markov model, association rules and clustering. Papers highlighting relevant work related to pattern discovery for evolving page prediction accuracy are our focus.

Uses first-order Markov models to model the sequence of web-pages requested by a user for predicting the next page they are likely to access. Markov chains allow the system to dynamically model URL access patterns observed in navigation logs based on previous state. A “personalized” Markov model is trained for each user and used to predict a user’s future sessions. In practice, it’s overly expensive to construct a unique model for each user and the cost of scaling this becomes more challenging when a site has a large user-base.

First paper to investigate Hidden Markov Models (HMM). Author collected web server logs, pruned the data and patched the paths users passed by. Based on HMM, author constructed a specific model for web browsing that predicts whether the users have the intention to purchase in real-time. Related measures, like speeding up the operation and their impact when in a purchasing mode are investigated.

Elli Voudigari [2010-2011] “A Framework for Web Page Rank Prediction”.

Proposes a framework to predict ranking positions of a page based on their previous rankings. Assuming a set of successive Top-K rankings, the author identifies predictors based on different methodologies. Prediction quality is quantified as the similarity between predicted and actual rankings. Exhaustive experiments were performed on a real-world large scale dataset for both global and query-based top-K rankings. A variety of existing similarity measures for comparing Top-K ranked lists are used, including a novel one captured in the paper.

Proposes using N-hop Markov models to predict the next web page users are likely to access. Pattern matches the user’s current access sequence with the user’s historical web access sequences to improve the prediction accuracy for prefetches.

Proposes dynamic clustering-based methods to increase Markov model accuracy in representing a collection of web navigation sessions. Uses a state cloning concept to duplicate states in a way separating in-links whose corresponding second-order probabilities diverge. The method proposed includes a clustering technique determining a way to assign in-links with similar second-order probabilities to the same clone.

Extends the use of a page-rank algorithm with numerous navigational attributes: size of the page, duration time of the page, duration of transition (two page visits sequentially), frequency of page and transition. Defines a Duration Based Rank (DPR) and Popularity Based Page Rank (PPR). Author looked at the popularity of transitions and pages using duration information, using it with page size and visit frequency. Using the popularity value of pages, this paper attempts to improve conventional page rank algorithms and model a next page prediction under a given Top-N value.

References

Team


Minko Gechev

Addy Osmani

Katie Hempenius

Kyle Mathews

guess's People

Contributors

addyosmani, alan-agius4, arcanis, chrisguttandin, danielruf, dharmin, dijs, irustm, kasperstorgaard, kevinfarrugia, khempenius, kyleamathews, lionralfs, maoberlehner, mgechev, mitkotschimev, pborreli, ra80533, renovate-bot, renovate[bot], sergeycherman, slavoroi, spencerb02, vprasanth, wardpeet

guess's Issues

Parcel Support

Could I use Guess with the Parcel bundler, or is it limited to webpack?

Implement Wikipedia + Gatsby demo

👜 Carried over from docs..

This demo aims to create a Wikipedia client using Gatsby.js (by Kyle Mathews). It will take advantage of the concepts outlined in Guess.js - primarily, data-driven prefetching of data using the Google Analytics API and machine learning - to improve the page-load performance of subsequent navigations.

The application will use offline-wikipedia as a reference implementation, treating the WikiMedia/Wikipedia API as an endpoint. Given how Gatsby works, we may wish/need to have a build-time available source for Wikipedia content; however, the decision on how best to structure the data in keeping with Gatsby’s idiomatic way of doing things is deferred to Kyle.

The core data-driven fetching logic for the application should be powered by Guess.js.

Gatsby Wikipedia will:

  • P0: Implement the same sets of views as offline-wikipedia.
  • P1: Implement a URL flag to enable/disable ML-based prefetching.
    • Offline Wikipedia exposes this type of URL-based configuration via a flags page too, however, we could do something simpler if needed. This is to let us do before/after testing
  • P0: Use its own responsive styling. We are open to using a Material Design inspired UI as was used in offline-wikipedia, or something slightly different if clean.
  • P2: Use Gatsby 2.0 if timelines allow.
  • P0: Work cross-browser (the latest versions of Chrome, Safari, Firefox, Edge)
    • Prefetch is available everywhere except Safari (https://caniuse.com/#search=prefetch). We may wish to attempt to implement a JS-based polyfill for the feature there using a feature detect.
  • P1: Aim to have a Lighthouse score of ~90+ on WebPageTest.org/easy #perfmatters
  • P0: Be documented in a project README (e.g. instructions on how to generate dev and prod builds or call any other commands necessary to use the implementation, plus basic architecture notes to help folks attempting to learn from the repo)
  • Have WebPageTest traces showing the before/after impact of using ML-based prefetching for the implementation

Advertising package

It looks like this project depends on https://github.com/feross/funding, a project intended to show advertising in a project's build log.

I'm not familiar with your project, I'm just looking through a list of projects that depend on feross/funding. It's my understanding that right now there aren't any advertising messages actually showing, but if you're uncomfortable with the possibility of advertisements in build logs that's something you might want to look into.

Repository structure

Probably good to decide on this as a team :)

Proposed top-level folders

  • /packages
  • /docs (for design docs, moving things from Google Docs over)
  • /infra (previously called /scripts in mlx)
  • /demos
  • CONTRIBUTING.md
  • README.md
  • LICENSE
  • .gitignore
  • {package, package-lock, yarn, other package manager files}

@mgechev wdyt? anything worth changing or adding here?

How to measure success?

What metric should I use to measure the success of adding prefetching to my static site? Thanks

[Docs] Type of sites we are targeting

A common question I get is what size/scale of site Guess.js is targeting. I think it's worth documenting this a little more clearly. My two cents:

  • Sites with nested pages (e.g. a minimum of three levels of nesting). Good examples: news and publisher sites, e-commerce, large SaaS sites.
  • Sites that have already been using Analytics tracking for a while (likely to have a rich set of data that can be used for predictive fetching).

The Angular parser does not properly recognize lazy routes

For example:

// app.ts
{
  path: 'foo',
  loadChildren: 'foo/foo.module#FooModule'
}

// foo.ts
{
  path: 'index',
  component: FooComponent
}

The parser will return two URLs, although there's only one:

foo - lazy (wrong)
foo/index - not lazy (wrong)

Guess parser does not work on Windows

The parser in the src/angular/index.ts file (and probably the other parsers) does path manipulations that are not cross-platform compatible.

This should be fixed so that the parsing works properly on Windows.

Cannot read property 'guess' of undefined

I am trying to use Guess with Laravel through a pre-defined routes file, as in the Nuxt example. The setup is:

  • guess-routes.json ex.
{
    "\/": {
        "\/login": 2,
        "\/about": 1
    },
    "\/about": {
        "\/contact-us": 1
    },
    "\/contact-us": {
        "\/": 1
    }
}
  • webpack
const {readFileSync} = require('fs')
const {GuessPlugin} = require('guess-webpack')

mix.webpackConfig({
    plugins: [
        new GuessPlugin({
            reportProvider() {
                return Promise.resolve(JSON.parse(readFileSync('./guess-routes.json')))
            }
        })
    ]
})
  • app.js
import {guess} from 'guess-webpack/api'
guess()
  • error
    demo

Building repo doesn't quite finish

npm run build

> [email protected] build /Users/kylemathews/programs/guess
> ts-node infra/build.ts

Hash: bf143cb0a0b4932eec72
Version: webpack 3.11.0
Time: 1743ms
                      Asset       Size  Chunks             Chunk Names
         ./dist/ga/index.js    17.7 kB       0  [emitted]  main
         dist/ga/index.d.ts   26 bytes          [emitted]
        dist/ga/src/ga.d.ts  304 bytes          [emitted]
    dist/ga/src/client.d.ts  324 bytes          [emitted]
dist/common/interfaces.d.ts  899 bytes          [emitted]
 dist/ga/src/normalize.d.ts  192 bytes          [emitted]
   [0] ./index.ts 205 bytes {0} [built]
   [1] ./src/ga.ts 5.16 kB {0} [built]
   [2] ./src/client.ts 7.29 kB {0} [built]
   [4] ./src/normalize.ts 1.61 kB {0} [built]
    + 1 hidden module

Hash: a15f1cf2b91a4d961071
Version: webpack 3.11.0
Time: 3694ms
                               Asset       Size  Chunks             Chunk Names
    dist/parser/src/react/index.d.ts  175 bytes          [emitted]
              ./dist/parser/index.js    20.8 kB       0  [emitted]  main
         dist/parser/src/parser.d.ts  126 bytes          [emitted]
  dist/parser/src/angular/index.d.ts  334 bytes          [emitted]
         dist/common/interfaces.d.ts  899 bytes          [emitted]
              dist/parser/index.d.ts  181 bytes          [emitted]
     dist/parser/src/react/base.d.ts  200 bytes          [emitted]
dist/parser/src/react/react-tsx.d.ts  133 bytes          [emitted]
dist/parser/src/react/react-jsx.d.ts  129 bytes          [emitted]
 dist/parser/src/detector/index.d.ts   26 bytes          [emitted]
dist/parser/src/detector/detect.d.ts  138 bytes          [emitted]
   [2] ./src/react/index.ts 355 bytes {0} [built]
   [4] ./src/angular/index.ts 5.15 kB {0} [built]
   [5] ../common/interfaces.ts 407 bytes {0} [built]
   [6] ./src/detector/index.ts 205 bytes {0} [built]
   [7] ./index.ts 466 bytes {0} [built]
   [8] ./src/parser.ts 1.47 kB {0} [built]
  [10] ./src/react/base.ts 4.16 kB {0} [built]
  [11] ./src/react/react-tsx.ts 1.2 kB {0} [built]
  [12] ./src/react/react-jsx.ts 757 bytes {0} [built]
  [13] ./src/detector/detect.ts 2.14 kB {0} [built]
    + 4 hidden modules

Error: Command failed: cd /Users/kylemathews/programs/guess/packages/webpack && rm -rf dist && webpack
    at checkExecSyncError (child_process.js:575:11)
    at Object.execSync (child_process.js:612:13)
    at _loop_1 (/Users/kylemathews/programs/guess/infra/build.ts:45:17)
    at build (/Users/kylemathews/programs/guess/infra/build.ts:62:13)
    at Object.<anonymous> (/Users/kylemathews/programs/guess/infra/build.ts:75:3)
    at Module._compile (internal/modules/cjs/loader.js:654:30)
    at Module.m._compile (/Users/kylemathews/programs/guess/node_modules/ts-node/src/index.ts:403:23)
    at Module._extensions..js (internal/modules/cjs/loader.js:665:10)
    at Object.require.extensions.(anonymous function) [as .ts] (/Users/kylemathews/programs/guess/node_modules/ts-node/src/index.ts:406:12)
    at Module.load (internal/modules/cjs/loader.js:566:32)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] build: `ts-node infra/build.ts`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/kylemathews/.npm/_logs/2018-05-05T23_45_56_625Z-debug.log

Cache the GA report

We should cache the GA report for a given time range so we can reuse it on frequent calls of the GuessPlugin.
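
A minimal sketch of what such a cache could look like, assuming reports are keyed by view ID and time range and expire after a fixed age. `ReportCache`, `Period`, and `getOrFetch` are hypothetical names for illustration, not the plugin's API.

```typescript
// Hypothetical types and names: an in-memory cache keyed by view ID and period.
interface Period {
  startDate: Date;
  endDate: Date;
}

interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

class ReportCache<T> {
  private entries: { [key: string]: CacheEntry<T> } = {};
  private maxAgeMs: number;
  private now: () => number;

  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Build a stable key from the view ID and the requested time range.
  static key(viewId: string, period: Period): string {
    return [viewId, period.startDate.getTime(), period.endDate.getTime()].join(':');
  }

  // Return the cached report if it is still fresh; otherwise call `fetch`
  // and remember its result.
  getOrFetch(viewId: string, period: Period, fetch: () => T): T {
    const key = ReportCache.key(viewId, period);
    const hit = this.entries[key];
    if (hit && this.now() - hit.storedAt < this.maxAgeMs) {
      return hit.value;
    }
    const value = fetch();
    this.entries[key] = { value: value, storedAt: this.now() };
    return value;
  }
}
```

If `fetch` returns a Promise, the cache stores the Promise itself, so repeated calls within the freshness window share one request. A persistent variant would write the entries to disk instead of keeping them in memory.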

Login with Google fails when using guess-webpack

I am trying to use guess-webpack with my Nuxt.js project. When I serve the project, my browser opens the Google login page. After login completes and redirects back to localhost:xxxxx (some random port), the page is empty. How can I fix this?

enable Travis CI

Currently Travis CI is not enabled but there is a config. Would be great to have automatic testing of every build / push and PR.

Ahead-of-Time bundle prefetching ✨

Currently, Guess.js analyzes the application, fetches data from Google Analytics, and builds a model. In the main chunk of the application, we introduce the following two pieces of code:

  • The model itself (could be a Markov chain, recurrent neural network, etc.)
  • Small runtime which queries the model on each page navigation

The drawback of this approach is that we need to ship the model to the browser and also introduce runtime overhead for querying it and tracking the user's navigation.

Since at build-time, we already have the mapping between routes & bundles, we can introduce a small piece of logic in each bundle which prefetches its neighbors that are likely to be visited.

This approach has the following benefits:

  • We can reduce the bundle size a lot! We no longer need to ship the entire model and a potential framework (e.g., TensorFlow) to query it.
  • We don't have any runtime overhead - we no longer need to query the model which on mobile can be inefficient for a deep neural network.
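
The build-time step described above can be sketched as follows, assuming the model is a table of transition probabilities and the route-to-bundle mapping is a plain object. `buildPrefetchManifest`, the table shapes, and the threshold parameter are assumptions for illustration, not the plugin's implementation.

```typescript
// Hypothetical shapes: a transition-probability table and a route-to-bundle map.
type TransitionTable = { [from: string]: { [to: string]: number } };
type RouteBundles = { [route: string]: string };

// For each route, list the bundles of the neighbors whose probability meets
// the threshold, most likely first. Each list could then be inlined into the
// corresponding bundle at build time.
function buildPrefetchManifest(
  transitions: TransitionTable,
  bundles: RouteBundles,
  threshold: number
): { [route: string]: string[] } {
  const manifest: { [route: string]: string[] } = {};
  for (const from of Object.keys(transitions)) {
    const neighbors = transitions[from];
    manifest[from] = Object.keys(neighbors)
      .filter(to => neighbors[to] >= threshold && bundles[to] !== undefined)
      .sort((a, b) => neighbors[b] - neighbors[a])
      .map(to => bundles[to]);
  }
  return manifest;
}
```

Because the lists are computed once during the build, the browser only needs the short list of bundle names for the current route rather than the whole model.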

Gatsby build fails after adding gatsby-plugin-guess-js

Summary

After adding 'gatsby-plugin-guess-js' builds for my Gatsby site started failing.

Code changes

// package.json

{
  // ...
  "dependencies": { 
    "gatsby-plugin-guess-js": "^1.1.0",
  },
}
// .env

GA_VIEW_ID=int
GA_SERVICE_ACCOUNT=...@....iam.gserviceaccount.com
GA_SERVICE_ACCOUNT_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
// gatsby-config.js

const dynamicPlugins = []
const startDate = new Date()
startDate.setMonth(startDate.getMonth() - 3)
if (
  process.env.GA_SERVICE_ACCOUNT
  && process.env.GA_SERVICE_ACCOUNT_KEY
  && process.env.GA_VIEW_ID
) {
  dynamicPlugins.push({
    resolve: 'gatsby-plugin-guess-js',
    options: {
      GAViewID: `${process.env.GA_VIEW_ID}`,
      jwt: {
        client_email: process.env.GA_SERVICE_ACCOUNT,
        private_key: process.env.GA_SERVICE_ACCOUNT_KEY,
      },
      period: {
        startDate,
        endDate: new Date(),
      },
    },
  })
}

module.exports = {
  plugins: [
    // ...
  ].concat(dynamicPlugins),
}

Error log

...

info bootstrap finished - 5.486 s

error (node:51068) DeprecationWarning: Tapable.plugin
success Building production JavaScript and CSS bundles
⠀
error Building static HTML failed for path
"/blog/another-post/"



  TypeError: undefined is not a function

  - Array.map

  - render-page.js:25330 Object../node_modules/gatsby-p
    lugin-guess-js/gatsby-ssr.js.exports.onRenderBody
    /Users/bartol/javascript/bartol.dev/public/render-p
    age.js:25330:39

  - render-page.js:232
    /Users/bartol/javascript/bartol.dev/public/render-p
    age.js:232:36

  - Array.map

  - render-page.js:227 ./.cache/api-runner-ssr.js.modul
    e.exports
    /Users/bartol/javascript/bartol.dev/public/render-p
    age.js:227:25

  - render-page.js:1016 Module.default
    /Users/bartol/javascript/bartol.dev/public/render-p
    age.js:1016:3

  - worker.js:35
    [bartol.dev]/[gatsby]/dist/utils/worker.js:35:36

  - debuggability.js:313 Promise._execute
    [bartol.dev]/[bluebird]/js/release/debuggability.js
    :313:9

  - promise.js:488 Promise._resolveFromExecutor
    [bartol.dev]/[bluebird]/js/release/promise.js:488:1
    8

  - promise.js:79 new Promise
    [bartol.dev]/[bluebird]/js/release/promise.js:79:10

  - worker.js:31
    [bartol.dev]/[gatsby]/dist/utils/worker.js:31:37

  - util.js:16 tryCatcher
    [bartol.dev]/[bluebird]/js/release/util.js:16:23

  - map.js:61 MappingPromiseArray._promiseFulfilled
    [bartol.dev]/[bluebird]/js/release/map.js:61:38

  - promise_array.js:114 MappingPromiseArray.PromiseArr
⠇ Building static HTML for pages
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Roadmap proposals

Roadmap/Ideas

Features / Ease of use

  • Improve Guess API #42
  • Re-introduce chunk clustering #6
  • Cache the OAuth2 access token #12

Code-sharing

  • Investigate better code-sharing between bundling/non-bundling solutions
  • Investigate code re-use with 3P providers (e.g IIH Nordic)

Machine Learning

  • Evaluate using neural networks vs current Markov chain approach
  • Evaluate using LSTM

Community

  • Micro-site
  • Codelab: Data-driven loading
  • Codelab: Data-driven bundling

Integrations

  • Parser: return all routes for an Angular app #44
  • Parser: support for Vue.js
  • Parser: support for Next.js

Any other ideas we want to consider? @mgechev @khempenius @KyleAMathews @MarkEdmondson1234

[guess-parser] has incomplete type definitions

Hi, thanks for publishing a new version of guess-parser to npm. It works great when used in JavaScript code. However, it can't be used as part of a TypeScript codebase. The type definition file dist/guess-parser/index.d.ts currently looks like this:

export { parseRoutes } from './src/parser';
export { detect } from './src/detector';
export { parseRoutes as parseAngularRoutes } from './src/angular';
export * from './src/react';
export * from './src/preact';

It refers to the ./src directory which is not part of the npm package. I hope it is just a matter of configuring the build process. Unfortunately I don't really know how your build process works.

Integrating Katie's predictiveFetching work

Hey folks. @khempenius has been working on a set of experimental solutions for the average site wishing to take advantage of predictive fetching over in https://github.com/guess-js/guess/tree/predictiveFetching/predictiveFetching.

I'd love to figure out this week how best to position this work inside the repo. I'm currently thinking that we keep packages as-is for reusable modules that we deploy to npm (or we could roll Katie's work into a new package entry in there). I was otherwise viewing Katie's work as a workflow (which we'll hopefully eventually figure out the convergence points between compared to the rest of the packages we have).

Perhaps this work would make sense in either:

  • \workflows\predictive-fetching-sites\
  • \tutorials\predictive-fetching-sites\
  • \demos\predictive-fetching-sites\
  • \experiments\predictive-fetching-sites\

There may also be a better framing for this that I'm missing. Would love your thoughts @mgechev and @khempenius. I think that once we've nailed this down we can make some minor adjustments to the branch and move it to a PR. It would be great if we could attempt to land that work before Tuesday.

Wdyt?

Use a logger with different logging levels

Currently, inside of the Guess.js webpack plugin there are a lot of statements in the form:

if (this._config.debug) {
  console.warn(...);
}

Instead of having this duplication of this._config.debug checks, it'd be better if we isolate the logic into a logger with different logging levels. We can either create a custom logger (no dependencies, smaller payload) or go with an existing one (code reuse).
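
A minimal sketch of the custom-logger option, with no dependencies. `Logger`, `LogLevel`, and the sink parameter are hypothetical names for illustration, not the plugin's actual implementation.

```typescript
// Hypothetical logger: messages below the configured level are dropped.
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'silent';

const LEVEL_ORDER: LogLevel[] = ['debug', 'info', 'warn', 'error', 'silent'];

type Sink = (level: LogLevel, message: string) => void;

// Default sink: forward to the matching console method.
const consoleSink: Sink = (level, message) => {
  if (level === 'error') {
    console.error(message);
  } else if (level === 'warn') {
    console.warn(message);
  } else {
    console.log(message);
  }
};

class Logger {
  private minLevel: LogLevel;
  private sink: Sink;

  constructor(minLevel: LogLevel, sink: Sink = consoleSink) {
    this.minLevel = minLevel;
    this.sink = sink;
  }

  // Emit the message only if its level is at or above the minimum level.
  private log(level: LogLevel, message: string): void {
    if (LEVEL_ORDER.indexOf(level) >= LEVEL_ORDER.indexOf(this.minLevel)) {
      this.sink(level, message);
    }
  }

  debug(message: string): void { this.log('debug', message); }
  info(message: string): void { this.log('info', message); }
  warn(message: string): void { this.log('warn', message); }
  error(message: string): void { this.log('error', message); }
}
```

The plugin could then create one logger from its configuration, for example with level 'debug' when the debug flag is set and 'warn' otherwise, and call `logger.debug(...)` without repeating the flag check at every call site.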

Unable to build the React project using the webpack Guess plugin

I am not able to build my React project using the webpack Guess plugin; the build fails with the error below:

95% emitting/Users/ajasingh/Work/CXE-UECP/node_modules/webpack-sources/lib/ConcatSource.js:42
source += typeof child === "string" ? child : child.source();
^

TypeError: child.source is not a function
at ConcatSource.source (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack-sources/lib/ConcatSource.js:42:56)
at Compiler.writeOut (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/lib/Compiler.js:338:26)
at Compiler.<anonymous> (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/lib/Compiler.js:328:20)
at /Users/ajasingh/Work/CXE-UECP/node_modules/webpack/node_modules/async/dist/async.js:3110:16
at eachOfArrayLike (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/node_modules/async/dist/async.js:1069:9)
at eachOf (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/node_modules/async/dist/async.js:1117:5)
at Object.eachLimit (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/node_modules/async/dist/async.js:3172:5)
at Compiler.emitFiles (/Users/ajasingh/Work/CXE-UECP/node_modules/webpack/lib/Compiler.js:317:20)
at /Users/ajasingh/Work/CXE-UECP/node_modules/mkdirp/index.js:30:20
at FSReqWrap.args [as oncomplete] (fs.js:140:20)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] build: cross-env NODE_ENV=production EQIX_APP=ecp EQIX_ENV=dev webpack --config internals/webpack/webpack.prod.babel.js --color -p --progress
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] build script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/ajasingh/.npm/_logs/2019-07-28T03_21_26_080Z-debug.log

Demo is broken

Visible in the console from https://guess-static-sites-demo.firebaseapp.com/

Access to XMLHttpRequest at 'https://guess-static-sites-demo.appspot.com/' from origin 'https://guess-static-sites-demo.firebaseapp.com' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

Trying to use Guess.js in Angular 6

I set up the Guess.js webpack plugin in Angular 6 using the @angular-builders/custom-webpack:browser builder.

config:

const { GuessPlugin } = require('guess-webpack');

module.exports = {
  plugins: [
    new GuessPlugin({
      GA: 'XXXXXX',
      runtime: {
        delegate: true,
      },
      mode: 'angular',
      layout: {
        tsconfigPath: 'src/tsconfig.app.json'
      },
      routeProvider: false
    })
  ]
};

I got this error:

node_modules/guess-webpack/dist/guess-webpack/main.js:1
(function (exports, require, module, __filename, __dirname) { !function(e,t){if("object"==typeof exports&&"object"==typeof module)module.exports=t(require("guess-parser"),require("go
ogleapis"),require("flat-cache"),require("guess-ga"),require("google-oauth2-node"),require("webpack"),require("memory-fs"),require("webpack-sources"),require("lodash.template"),requi
re("path"),require("fs"));else if("function"==typeof define&&define.amd)define(["guess-parser","googleapis","flat-cache","guess-ga","google-oauth2-node","webpack","memory-fs","webpac
k-sources","lodash.template","path","fs"],t);else{var r="object"==typeof exports?t(require("guess-parser"),require("googleapis"),require("flat-cache"),require("guess-ga"),require("go
ogle-oauth2-node"),require("webpack"),require("memory-fs"),require("webpack-sources"),require("lodash.template"),require("path"),require("fs")):t(e["guess-parser"],e.googleapis,e["flat-cache"],e["guess-ga

TypeError: Cannot read property 'source' of undefined

Microsite 📃

Now that we have guessjs.com, we might want to put together a microsite for the project 🤔

@mgechev and I like the idea of using Gatsby as our static site framework. Given we already have a reference implementation of using Guess.js with it thanks to @KyleAMathews, this seems like a reasonable direction? What do folks think?

Some things to discuss regardless of stack:

  • Goals of the site
  • Information architecture
  • How we present the bundling vs non-bundling solutions (building on @khempenius's work in README)
  • Having a clear/easy getting started experience
  • Keeping it easy to edit/contribute

Once we're further along with the direction, I'm happy to try finding an owner to work on the site.

SPFJS - GUESS

Hi @addyosmani @mgechev, this is a slightly unrelated topic, but I have been building a micro-frontend framework that fetches content and renders it into the page as fragments.

I saw SPF-JS from Google and found it very forward-looking for its time. I see Guess.js doing something similar at another level, and the two share some goals, one of them being prefetching content to improve the user experience.

With service workers and today's browser capabilities, what do you recommend for an architecture similar to SPF-JS sites?

The use case is a big web application that needs to render fragments from different teams on the same page, all server-side rendered if possible, with each fragment independent and routing shared between server and client.

I know this is a somewhat silly question, but this was the only place where I could reach both of you.

Model for predictive analytics

Current approach

At the moment, Guess.js uses a Markov chain in order to predict the next route the user will navigate to. We build the Markov chain by using a report fetched from Google Analytics (GA) where for each page path, we get the previous page path. The model has several advantages such as:

  • Doesn't require much runtime code. The prediction can be made with a simple O(1) lookup, so we don't have to ship a lot of code to the browser. On top of that, the matrix can be easily compressed, so even for large apps, its size will be reasonable.
  • Simplicity. We do not have to trace users' complex navigation patterns. Also, the model is quite easy to reason about
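
The approach above can be sketched as follows, assuming each report row carries a previous page path, a page path, and a visit count. The `NavigationRow` field names and both functions are assumptions for illustration, not the guess-ga API.

```typescript
// Hypothetical row shape: one report row per (previous page, page) pair.
interface NavigationRow {
  previousPagePath: string;
  pagePath: string;
  visits: number;
}

type MarkovChain = { [from: string]: { [to: string]: number } };

// Count transitions, then normalize each row so its probabilities sum to 1.
function buildMarkovChain(rows: NavigationRow[]): MarkovChain {
  const counts: MarkovChain = {};
  for (const row of rows) {
    const from = counts[row.previousPagePath] || (counts[row.previousPagePath] = {});
    from[row.pagePath] = (from[row.pagePath] || 0) + row.visits;
  }
  const chain: MarkovChain = {};
  for (const from of Object.keys(counts)) {
    let total = 0;
    for (const to of Object.keys(counts[from])) {
      total += counts[from][to];
    }
    chain[from] = {};
    for (const to of Object.keys(counts[from])) {
      chain[from][to] = total > 0 ? counts[from][to] / total : 0;
    }
  }
  return chain;
}

// The prediction is a constant-time lookup in the nested table.
function transitionProbability(chain: MarkovChain, from: string, to: string): number {
  const row = chain[from];
  return row && row[to] !== undefined ? row[to] : 0;
}
```

With rows for two visits from / to /login and one from / to /about, the lookup for / to /login yields 2/3. Only the nested table needs to reach the browser, which is why the runtime stays small.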

This approach has its own cons. We ignore a lot of potentially useful features such as:

  • More comprehensive navigation patterns including more than two pages visited in a session
  • navigator.language & navigator.platform
  • etc.

Improving accuracy

We're thinking of exploring a more advanced model using neural networks. We've been looking at LSTM using tensorflow.js. Currently, there are a few unknowns we need to research further, such as:

  • What'd be the most efficient way to extract longer navigation sequences from GA
  • What'd be the runtime that we should ship to the browser if we move to a predictive model using neural networks (not applicable to static websites)
  • How much the build-time will increase if we train the model while bundling the application (not applicable to static websites)
  • How to measure accuracy efficiently without violating users' privacy
  • Can we ship only part of the model to the browser without loading the entire tensorflow.js
  • What additional features to include in order to improve accuracy

Additional questions

The problem that we're solving looks quite similar to a recommender system and the path we've taken is collaborative filtering. Is it worth exploring content-based filtering or a mixture between the two?

[guess-parser] Return all routes of an Angular app

Hi @mgechev,

this is a follow up issue of #1 of @mlx/parser. I guess guess-parser is the successor of @mlx/parser.

I can confirm that parsing an Angular app which has a routing module generated by the CLI works now. However, when there are nested routes, guess-parser does not pick them up. I was able to track down the problem: the CLI extracts nested routes into their own modules as well, but guess-parser expects them to be defined in the root module of any lazily loaded module.

Since explaining the problem is a bit difficult I created a little example project. It is an app created by the Angular CLI. The README contains all the commands I used. It also contains a little script to invoke the parseAngularRoutes() function. It can be invoked by running node parse.js. I would expect it to return two routes (zero and zero/first) but it only returns the first.

Please let me know if you need more information to reproduce the bug.

Enhancement: Improve User Experience for React based Applications

As React is a fairly popular framework, it would be awesome to improve the initial user experience with Guess.js. In practice, this would mean that a router configuration file (either js or jsx) is read as static routes for Guess.js to parse, or alternatively that the user can specify a JSON-like configuration in their app. This refers to a conversation at JSConf EU about improving the impression of using the framework as a whole.

In addition, a method to run static apps once and generate a list of the routes used would be great, so that no programmatic interface is needed to run Guess.js.

GuessPlugin throws error with Angular 8 / Node 10

What

After upgrading the guess-angular demo project to Angular 8, running the build command ng build on Node v10.15.0 (npm v6.4.1) throws the error below. It looks like an issue with GuessPlugin.

> ng build

 95% emitting GuessPlugin(node:21777) UnhandledPromiseRejectionWarning: Error: Cannot find the entry point for /Users/kumaran/mydrive/personal/guess-angular/src/app/about/about.module.ts
    at d (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:3675)
    at y (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:4314)
    at /Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:6338
    at x (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:5293)
    at visitNodes (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16514:30)
    at Object.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16681:24)
    at NodeObject.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:120242:23)
    at x (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:4840)
    at visitNodes (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16514:30)
    at Object.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16694:21)
    at NodeObject.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:120242:23)
    at x (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:4840)
    at visitNodes (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16514:30)
    at Object.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:16681:24)
    at NodeObject.forEachChild (/Users/kumaran/mydrive/personal/guess-angular/node_modules/typescript/lib/typescript.js:120242:23)
    at x (/Users/kumaran/mydrive/personal/guess-angular/node_modules/guess-parser/dist/guess-parser/index.js:1:4840)
(node:21777) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:21777) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

How to Reproduce the Issue

Get the code from my forked repo and make sure you are using Node 10

  • Clone the repo and checkout the master branch

    git clone https://github.com/kumaran-is/guess-angular.git
    cd guess-angular
    npm install
    

Run the build and try to launch the application in http://localhost:4200

npm run build && http-server ./dist/guess-angular  -p 4200

Cannot build with guess api

When using:

  runtime: {
    delegate: true
  }

and

import { guess } from 'guess-webpack/api'

Webpack is throwing this:

Module not found: Error: Can't resolve 'guess-webpack/api' in '...'

Any ideas?

Gatsby plugin usage query (Getting a Webpack Error during build)

I am trying to get the gatsby plugin working in my project and during the build I get this error

WebpackError: ENOENT: no such file or directory, open '/home/travis/build/ERS-HCL/gatsby-demo-app/.cache/data.json

It would be great if you could let me know if I am missing something here

Here is the code change for including the gatsby plugin (on the same lines as your example) https://github.com/ERS-HCL/gatsby-demo-app/commit/43cadc9f0b1f6d2726fb485ab93429629465946f

The travis build error is here https://travis-ci.org/ERS-HCL/gatsby-demo-app/builds/379160574

Thanks ,
Tarun

Optional parsing for applications with pre-existing pre-fetching logic

Currently, the parser is the most complex module of Guess mostly because it requires a lot of knowledge about the project structure. Applications using frameworks with very dynamic route declaration, such as React, make the route extraction impossible without following strict conventions.

At the same time, the application itself often has the route-bundle mapping metadata available at runtime. An example of this is Gatsby, which uses its own pre-fetching mechanism. Such applications can use only the model generated by Guess even if the parser cannot properly extract the routing metadata.

After a chat with @KyleAMathews today, I extracted a function called score, which uses the Markov chain to rank a list of links by the probability that the user visits each of them from a given page.

Although this API is convenient, it will not be usable if the parser cannot produce the application's routes, because score uses the page graph of the application (i.e. the route declarations instead of the actual links), which requires routing metadata.

In order to make the score function available to applications which are not supported by the Guess parser, we can provide a flag which hints the GuessPlugin to skip parsing. In such case, the Markov chain will reflect the actual application links (i.e. it'll not aggregate links corresponding to the same route declaration).
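
A rough sketch of the ranking idea when the chain is built from actual links. The real signature of score is not shown here, so `scoreLinks` and its types are hypothetical stand-ins rather than the extracted function.

```typescript
// Hypothetical stand-in for the score function described above.
type LinkChain = { [from: string]: { [to: string]: number } };

interface ScoredLink {
  link: string;
  probability: number;
}

// Rank the given links by the probability of visiting them from `page`.
// Links that the chain does not know about get probability 0.
function scoreLinks(chain: LinkChain, page: string, links: string[]): ScoredLink[] {
  const row = chain[page] || {};
  return links
    .map(link => ({ link: link, probability: row[link] || 0 }))
    .sort((a, b) => b.probability - a.probability);
}
```

An application with its own pre-fetching mechanism could pass the links it finds on the current page and prefetch only the top-ranked ones.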

[question] In which way is ML / AI used for guess' predictions ?

I am trying to get a better understanding of the features of guess.
At what point on an implementation level is ML used? Is the GA page transition data used as an input for the model application? Does the ML happen on the server and is done on the Google Analytics layer or is it run in the client (after retrieving data from GA?) ?

Thank you very much for pointing me in the right direction

[RFC] Guess function API

Prior art

Currently, guess has the following signature:

interface GuessFn {
  (route: string, links?: string[]): Predictions;
}

interface Predictions {
  [route: Route]: Probability;
}

type Route = string;
type Probability = number;

The call guess(route) returns an object literal which as keys has URLs/routes likely to be visited next, and values the associated probabilities.

If as a second argument we pass a list of URLs/routes the function will return an object with keys from the intersection between the passed data & the known possible navigations.

A few problems with this API are:

  • links: string[] seems redundant. The consumer of guess can easily filter the links
  • The returned result does not adapt based on the connection's effective type

Proposal

I'm proposing the following API:

interface GuessFn {
  (params?: GuessFnParams): Predictions;
}

interface GuessFnParams {
  route?: Route;
  thresholds?: ConnectionEffectiveTypeThresholds;
  connection?: ConnectionEffectiveType;
}

type Route = string;
type Probability = number;
type ConnectionEffectiveType = '4g' | '3g' | '2g' | 'slow-2g';

interface ConnectionEffectiveTypeThresholds {
  '4g': Probability;
  '3g': Probability;
  '2g': Probability;
  'slow-2g': Probability;
}

interface Predictions {
  [route: Route]: Probability;
}

This way, guess could be invoked without any arguments (i.e. guess()). Such a call will:

  • Set params.route to window.location.pathname (since not specified)
  • Use the build-time value for params.thresholds (since not specified)
  • Use window.navigator.connection.effectiveType as value of params.connection (since not specified)
  • guess will after that query the model and filter the results by the respective connection and thresholds values and return an object literal subtype of Predictions
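
The defaulting and filtering steps above can be sketched as follows. To keep the example runnable outside a browser, the defaults come from a hypothetical `GuessEnvironment` object that stands in for window.location.pathname, window.navigator.connection.effectiveType, the build-time thresholds, and the model; `createGuess` is likewise an assumption, not part of the proposal.

```typescript
type Route = string;
type Probability = number;
type ConnectionEffectiveType = '4g' | '3g' | '2g' | 'slow-2g';

interface ConnectionEffectiveTypeThresholds {
  '4g': Probability;
  '3g': Probability;
  '2g': Probability;
  'slow-2g': Probability;
}

type Predictions = { [route: string]: Probability };

interface GuessFnParams {
  route?: Route;
  thresholds?: ConnectionEffectiveTypeThresholds;
  connection?: ConnectionEffectiveType;
}

// Hypothetical: supplies the values that the browser and the build would provide.
interface GuessEnvironment {
  currentRoute: Route;
  effectiveType: ConnectionEffectiveType;
  defaultThresholds: ConnectionEffectiveTypeThresholds;
  model: { [from: string]: Predictions };
}

function createGuess(env: GuessEnvironment): (params?: GuessFnParams) => Predictions {
  return function guess(params: GuessFnParams = {}): Predictions {
    // Fall back to the environment for every parameter that is not specified.
    const route = params.route !== undefined ? params.route : env.currentRoute;
    const thresholds = params.thresholds !== undefined ? params.thresholds : env.defaultThresholds;
    const connection = params.connection !== undefined ? params.connection : env.effectiveType;

    // Keep only the predictions that meet the threshold for this connection.
    const minProbability = thresholds[connection];
    const candidates = env.model[route] || {};
    const result: Predictions = {};
    for (const next of Object.keys(candidates)) {
      if (candidates[next] >= minProbability) {
        result[next] = candidates[next];
      }
    }
    return result;
  };
}
```

With a stricter threshold for slower connections, the same call returns fewer routes on 2g than on 4g, which is the adaptation the current signature lacks.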

Ideally, we'd want to have a method prefetch() which uses guess() internally and prefetches all the bundles associated with the returned routes. The problem here is that with this low-level API we don't have access to the route-bundle mapping. For Angular & React we resolve the mapping at build-time with guess-parser.

// cc @addyosmani @KyleAMathews

Re-introduce chunk clustering

Prior implementation

The initial chunk clustering implementation accepts a minimum number of chunks that need to be produced by the build process and goes through a graph clustering based on Tarjan's algorithm for finding strongly connected components.

This algorithm has a number of problems:

  • It treats chunks belonging to nested routes and chunks without a common ancestor the same way.
  • It terminates when the number of chunks in the application is at most the minimum number of chunks specified in the configuration. Configuring the minimum number of chunks the application should consist of is not a good heuristic for concatenating JavaScript files together.
  • The algorithm does not consider the chunk size but only the probability of two chunks being visited in the same session.

Suggested implementation

Combine only chunks of nested routes where the probability of two chunks being used together is high (threshold should be discussed).

In such case, if we have the following routes in the application:

/a/b/c
/b
/c
/d
/e/f
/e

We'll recommend that developers load everything lazily (i.e. even load /a, /a/b, and /a/b/c lazily).

In that case, the probability of /a, /a/b, and /a/b/c being visited in the same session is 1 (because we have only one route with root /a); for /e and /e/f the probability is n.

In this case, we'll cluster the chunks from /a, /a/b, and /a/b/c into a single file. Depending on the value of n, we may also combine /e and /e/f.
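
A rough sketch of the suggested grouping, assuming each route contributes one chunk per path prefix and that the probability for roots with several routes is supplied by the caller (the n above). `chunkChain`, `clusterNestedRoutes`, and the callback are hypothetical names, and the threshold is still open for discussion.

```typescript
// Hypothetical helper: '/a/b/c' -> ['/a', '/a/b', '/a/b/c'].
function chunkChain(route: string): string[] {
  const segments = route.split('/').filter(segment => segment.length > 0);
  const chain: string[] = [];
  let current = '';
  for (const segment of segments) {
    current += '/' + segment;
    chain.push(current);
  }
  return chain.length > 0 ? chain : ['/'];
}

// Group chunks by root segment. A root with a single declared route is always
// merged (probability 1); other roots are merged only if the supplied
// probability meets the threshold, otherwise their chunks stay separate.
function clusterNestedRoutes(
  routes: string[],
  togetherProbability: (root: string) => number,
  threshold: number
): string[][] {
  const groups: { [root: string]: { declared: number; chunks: string[] } } = {};
  const roots: string[] = [];
  for (const route of routes) {
    const chain = chunkChain(route);
    const root = chain[0];
    if (!groups[root]) {
      groups[root] = { declared: 0, chunks: [] };
      roots.push(root);
    }
    groups[root].declared++;
    for (const chunk of chain) {
      if (groups[root].chunks.indexOf(chunk) === -1) {
        groups[root].chunks.push(chunk);
      }
    }
  }
  const clusters: string[][] = [];
  for (const root of roots) {
    const group = groups[root];
    const chunks = group.chunks.slice().sort();
    const probability = group.declared === 1 ? 1 : togetherProbability(root);
    if (probability >= threshold) {
      clusters.push(chunks);
    } else {
      for (const chunk of chunks) {
        clusters.push([chunk]);
      }
    }
  }
  return clusters;
}
```

For the routes listed above, /a, /a/b, and /a/b/c always end up in one cluster, while /e and /e/f are combined only when n reaches the threshold. Like the proposal itself, the sketch ignores chunk size.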

When the user is at any page (e.g. /a/b/c) we'll prefetch the other bundles based on the Markov chain which we've built in the prefetch plugin.

Note: this algorithm does not solve the problem which occurs when combining large chunks together.

Feature Support for GuessJS in Nx Workspace

I have started to implement GuessJS in my Mono repo project but after going through the Guess source code it looks like it won't support the project structure as the path to the source folder is hardcoded in as a sibling to the package.json as per:

if (dd('@angular/cli') && exists(join('src', 'tsconfig.app.json'))) {
    return {
      type: ProjectType.AngularCLI,
      version: dd('@angular/cli'),
      details: {
        typescript: dd('typescript'),
        tsconfigPath: join(base, 'src', 'tsconfig.app.json'),
        sourceDir: 'src'
      }
    };
}

Maybe we can add a config option for srcPath in the GuessPluginConfig interface? That way, people whose folder structure does not place package.json and src next to each other can use GuessJS.

I can try and make the change and open a pull request.

Guess.js Webpack plugin

Goal: Lowering the friction for unlocking @MLX’s capabilities

Authors: Joint proposal from Addy and Minko

In Machine Learning: Data-driven Bundling, a set of packages was introduced for getting started with data-driven bundling (DDB). Together, these packages enabled a developer to leverage their analytics to generate data-driven bundle layouts and code for intelligently prefetching JavaScript chunks to improve page-load performance.

These modules accomplished the following tasks (from the blog post):

  • @mlx/ga - fetched structured information from the Google Analytics API. This provides a navigation graph that needs to be translated into a page graph. @mlx/ga can automatically do this if supplied with all the paths in the application.

  • @mlx/parser - extracts the routes of our application, finds the associated chunk entry points of the lazy-loaded routes and builds the routing bundle tree. Once the application is parsed with @mlx/parser, get an array with all the routes, their corresponding chunk entry points, and their parent chunk entry points.

  • @mlx/cluster - performs a clustering algorithm based on the data from Google Analytics and the bundle routing tree, from @mlx/parser.

  • @mlx/webpack - webpack plugins which use @mlx/parser and @mlx/cluster in order to produce data-driven bundle layout for the application, and generate code for data-driven pre-fetching.

Guess plugin

In order to lower the friction for using these packages, we propose a Webpack plugin that:

  • Allows a developer to start using data-driven bundling in < 5 minutes

  • Attempts to remove some of the manual work involved in intermediate steps

    • e.g generating a data.json file with Google Analytics navigation graph data needed for later stages. This could be generated behind the scenes for you and passed along to subsequent steps.
  • Reduces the configuration overhead for each package

    • Where possible we will provide sane defaults and simplify decision making. e.g in @mlx/parser, we could expose a way for developers to inform us which framework they are using so we can pick the correct package to extract routes.

API

An API sketch for this webpack plugin would be:

new GuessPlugin({
  // String. The View ID needed by @mlx/ga.
  // The first time the plugin loads, if a View ID is supplied we will attempt
  // to authenticate the user, taking them to a browser. We will then get a
  // session token and store it so that auth is not required in future. This
  // removes the need to set up a credentials.json file manually.
  GA: '8765456',

  // String. `mode` accepts one of three values: react, angular or gatsby.
  // For each `mode` supported, an @mlx/parser must be available which
  // can extract the routes of our application, find the associated chunk
  // entry points of the lazy-loaded routes and build the routing bundle tree.
  mode: 'react'
})
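To make the two proposed options concrete, here is a minimal sketch of how they could be typed and validated before the plugin does any work. Only `GA` and `mode` come from the proposal; the type names, `SUPPORTED_MODES`, `validateGuessOptions` and the error messages are assumptions made for illustration, not part of any published API.

```typescript
// Hypothetical types for the proposed options.
type GuessMode = 'react' | 'angular' | 'gatsby';

interface GuessPluginOptions {
  GA: string;      // Google Analytics View ID
  mode: GuessMode; // selects which route parser to use
}

const SUPPORTED_MODES: GuessMode[] = ['react', 'angular', 'gatsby'];

// Check untyped user input and return a well-typed options object,
// or throw with a message naming the offending option.
function validateGuessOptions(
  input: { GA?: unknown; mode?: unknown }
): GuessPluginOptions {
  const ga = input.GA;
  const mode = input.mode;
  if (typeof ga !== 'string' || ga.length === 0) {
    throw new Error('GuessPlugin: `GA` must be a non-empty View ID string');
  }
  const match = SUPPORTED_MODES.filter(m => m === mode)[0];
  if (match === undefined) {
    throw new Error(
      'GuessPlugin: `mode` must be one of ' + SUPPORTED_MODES.join(', ')
    );
  }
  return { GA: ga, mode: match };
}
```

Failing early on an unsupported `mode` keeps the error close to the configuration, rather than surfacing later when no matching parser can be found.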

Open questions:

  • Should we expose a way to pass more granular configuration options through to the other packages, such as the period (startDate and endDate) for GA?

    • This could be done in a few ways, e.g. { ga: {}, parser: {}, cluster: {} }
  • Or should we encourage users who need further configuration to use the other packages directly?

  • Are there any configuration options that are absolutely necessary for the above?
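One possible shape for the nested `{ ga: {}, parser: {}, cluster: {} }` option is sketched below, purely to make the trade-off easier to discuss. Everything here is hypothetical: the interface names, the use of `Date` for `startDate` and `endDate`, the `viewId` field, and the `splitOptions` helper are assumptions, and the contents of `parser` and `cluster` are left opaque because the proposal does not define them.

```typescript
// Hypothetical period type; the proposal only names startDate and endDate.
interface Period {
  startDate: Date;
  endDate: Date;
}

// Top-level shorthand (GA, mode) plus optional per-package overrides.
interface GranularOptions {
  GA: string;
  mode: string;
  ga?: { period?: Period };
  parser?: Record<string, unknown>;
  cluster?: Record<string, unknown>;
}

// What each underlying package would receive in this sketch.
interface PackageConfigs {
  ga: { viewId: string; period?: Period };
  parser: Record<string, unknown>;
  cluster: Record<string, unknown>;
}

// Fan the single plugin configuration out into per-package configurations.
function splitOptions(options: GranularOptions): PackageConfigs {
  const period = options.ga ? options.ga.period : undefined;
  if (period && period.startDate.getTime() > period.endDate.getTime()) {
    throw new Error('ga.period: startDate must not be after endDate');
  }
  return {
    ga: { viewId: options.GA, period: period },
    // `mode` is applied last so the top-level value always wins.
    parser: { ...(options.parser || {}), mode: options.mode },
    cluster: { ...(options.cluster || {}) }
  };
}
```

A layout like this keeps the two-option quickstart intact while leaving room for overrides, at the cost of the plugin having to track each package's option surface.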

Dependencies

In order to support a ‘gatsby’ mode, we will need a Gatsby equivalent of the @mlx/parser packages for React and Angular. This is something we can pursue as part of our collaboration with Gatsby.

API requires
