sirko-io / engine

Benefit from new browsers' technologies to speed up your site

License: GNU General Public License v3.0

Elixir 99.90% Shell 0.10%
progressive-web-app offline precaching-resources prediction

engine's People

Contributors

dnesteryuk

engine's Issues

Dynamic pages

Example:

http://localhost/article/1
http://localhost/article/2
http://localhost/article/3

There should be an option to exclude dynamic pages from tracking in order to avoid blowing up the DB with junk. Some dynamic pages may stay popular for a long time, but most of them won't.
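A minimal sketch of how such an option might work, assuming the exclusions are configured as a list of regular expressions. The module name `Sketch.PathFilter` and the `excluded?/2` function are hypothetical and not part of the engine:

```elixir
defmodule Sketch.PathFilter do
  # Hypothetical helper: returns true when the path matches any of the
  # configured exclusion patterns and therefore should not be tracked.
  def excluded?(path, patterns) when is_binary(path) and is_list(patterns) do
    Enum.any?(patterns, fn pattern -> Regex.match?(pattern, path) end)
  end
end

# Assumed configuration: skip every /article/<number> page.
patterns = [~r{^/article/\d+$}]

IO.inspect(Sketch.PathFilter.excluded?("/article/1", patterns)) # true
IO.inspect(Sketch.PathFilter.excluded?("/about", patterns))     # false
```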

A returning user with an expired session key

If a user returns with an expired session key, a new session should be started for the user. Currently, the engine keeps tracking transitions for the expired session key.

Research

A browser can expire the session key. Are we able to prolong the expiry of an already defined cookie? That could be used to push the expiry forward while the user keeps navigating the site.

A user with a removed session key

Context: A user stays on a page for a long time. Meanwhile, their session expires and the session key gets removed from the browser. Then they move to another page.
Expected result: A new session gets started and the transition between the referrer and the current page is created.
Actual result: The current page gets linked to the starting point, hence we lose valid data about the transition.

A login page is predicted for an authorized user

Context: There is a site which prevents authorized users from accessing the login page. If the authorized user tries to access the login page, that user gets redirected to the index page behind the login.

Problem: When the login page gets predicted for the authorized user, the browser pre-renders the index page. After logout, it results in showing the index page instead of the login.

This scenario depends on the actual implementation of a site. Still, it is a real case, and the engine should provide a way to configure such things in order to avoid prerendering the wrong pages.

Configurable session options

The following options should be configurable:

Sites with little traffic might increase stale_session_in (otherwise, they won't get enough data to make predictions). On the other hand, sites with huge traffic may decrease this value, which helps them avoid blowing up the DB.

Add support of a confidence threshold

Prerendering puts additional load on the backend of a site. Hence, customers of the engine may want the hint for browsers to be added only if there is high confidence that the current user will visit the predicted page.

For example, the engine sees the following pages which might be visited by the current user:

Path        Confidence
/about      10%
/projects   30%
/blog       20%
/video      20%
/contact    20%

If the confidence threshold is 20%, the projects page gets prerendered, but when the config value is 50%, nothing gets prerendered.

This way, customers can avoid unnecessary load.
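The threshold check could be sketched as follows. It assumes that only the single most likely page is hinted and that confidences are fractions between 0 and 1. `Sketch.Threshold` and `pick/2` are hypothetical names used for illustration only:

```elixir
defmodule Sketch.Threshold do
  # Returns {:ok, path} for the most likely page when its confidence
  # meets the threshold; otherwise returns :none.
  def pick([], _threshold), do: :none

  def pick(predictions, threshold) do
    {path, confidence} = Enum.max_by(predictions, fn {_path, conf} -> conf end)
    if confidence >= threshold, do: {:ok, path}, else: :none
  end
end

# The table above expressed as {path, confidence} tuples.
predictions = [
  {"/about", 0.10},
  {"/projects", 0.30},
  {"/blog", 0.20},
  {"/video", 0.20},
  {"/contact", 0.20}
]

IO.inspect(Sketch.Threshold.pick(predictions, 0.2)) # {:ok, "/projects"}
IO.inspect(Sketch.Threshold.pick(predictions, 0.5)) # :none
```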

Integrate Rollbar

Users of the engine should know when something goes wrong. Therefore, we need to integrate some tool which will inform them about errors. https://rollbar.com is a good candidate as it has a free plan which is ok for small projects.

Neo4j and fault tolerance

The engine's modules must be restarted when:

  • Neo4j goes down
  • Neo4j isn't accessible right away (this can happen when the engine is started through Docker Compose)

In both cases, the engine should retry to connect to Neo4j.
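A rough sketch of the retry loop, independent of any particular Neo4j driver. The `connect` argument is an assumed zero-arity function that returns `{:ok, conn}` or `{:error, reason}`; `Sketch.Retry` is a hypothetical module. In an OTP application, supervisor restart strategies would usually cover the "Neo4j goes down" case; this only illustrates retrying with a growing delay:

```elixir
defmodule Sketch.Retry do
  # Calls `connect` until it succeeds or the attempts run out.
  # The delay doubles after each failed attempt.
  def with_backoff(connect, attempts, delay_ms) when attempts > 0 do
    case connect.() do
      {:ok, conn} ->
        {:ok, conn}

      {:error, reason} when attempts == 1 ->
        # Last attempt failed: give up and report the reason.
        {:error, reason}

      {:error, _reason} ->
        Process.sleep(delay_ms)
        with_backoff(connect, attempts - 1, delay_ms * 2)
    end
  end
end

# Stand-in for a real connection function; it always fails here.
always_down = fn -> {:error, :unavailable} end
IO.inspect(Sketch.Retry.with_backoff(always_down, 3, 10)) # {:error, :unavailable}
```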

The referral parameter may not be important

According to this article, the value of optimization differs between groups of pages. The sirko client might only be applied to certain sections of the site.

The referral parameter was introduced to get a smooth path of a user's navigation. We tried to avoid missing transitions between pages. However, missing some of them might be acceptable; we have to check it.

Considering that the sirko client might only be added to certain pages, the referral becomes useless.

Do we really need the starting point?

It was added to get a smooth path, but if an active user types the URL of a page, there isn't a smooth path anymore: the page doesn't get linked to the starting point.

Probably, we can remove it.

Pages behind login

Scenario:

  1. A user is on the login page: http://localhost/login.
  2. The user logs in and gets forwarded to the internal page: http://localhost/index.

First of all, the index page is private, so it cannot be pre-rendered. Even if the index page is public, it may be personalized after the login, hence it must not be pre-rendered. Otherwise, the user will see stale content.

How should such a case be handled?

Consider exits in computing confidence

We use the confidence threshold to make sure that only pages which meet it get prefetched. But currently we exclude exits, thus the confidence computed for pages isn't accurate. For example, if the confidence threshold is 0.5 and there is a 0.8 probability that the user leaves the site from the current page, nothing must be prefetched.
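To illustrate the difference, here is a small sketch which assumes that transition counts from the current page are available as a map keyed by target path, with exits stored under an `:exit` key. That data shape and the `Sketch.Confidence` module are assumptions, not the engine's actual representation:

```elixir
defmodule Sketch.Confidence do
  # `transitions` maps a target (a path or :exit) to how many times
  # users moved there from the current page. Exits are counted in the
  # total but are not returned as a prefetch candidate.
  def compute(transitions) when map_size(transitions) > 0 do
    total = transitions |> Map.values() |> Enum.sum()

    transitions
    |> Map.delete(:exit)
    |> Map.new(fn {path, count} -> {path, count / total} end)
  end
end

# 8 of 10 users left the site from this page.
IO.inspect(Sketch.Confidence.compute(%{"/blog" => 1, "/about" => 1, :exit => 8}))
# %{"/about" => 0.1, "/blog" => 0.1}
```

Without the exits in the total, each of the two pages would get a confidence of 0.5 and would meet a 0.5 threshold; with the exits included, both drop to 0.1 and nothing is prefetched.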

Stale transitions

The structure of a site changes over time. Therefore, the engine should be able to:

  • get rid of transitions which don't take place anymore (For instance, the link between pages was removed.)
  • get rid of old pages (For instance, some pages were removed. It should be easy to do once the previous one is fulfilled.)

Transition relations without count

The transition relations linking pages and the exit point don't keep the count of transitions. This mistake leads to predicting the exit point instead of a correct page with a higher count.

Refactor the GET /predict action

Unfortunately, plugs get applied to all actions. Because of that, the Sirko.Plugs.Session got an additional option:

plug Sirko.Plugs.Session,
   on: "/predict"

in order to specify when it must be applied. In general, the logic from the Sirko.Plugs.Session and get "/predict" must be applied to the same request. Therefore, the code from the Sirko.Web and Sirko.Plugs.Session must be extracted to a new module (for instance, Sirko.Web.Predictor). This way, we will be able to decouple the logic from the plug library.

Outcome: Better structured code.

Create a command for computing the accuracy

There is currently no way to tell how well the model works for a particular site. So, we need a command (mix task) which will measure the accuracy of the prediction.

80% of the loaded sessions should be used for training the model and 20% for validating (the model mustn't access them).

RMSE might be used for measuring errors.
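The two building blocks could look roughly like this. It is a sketch under assumptions: sessions are an already shuffled list, and the values compared by RMSE (for example, a predicted confidence against 1.0 or 0.0 for "visited" or "not visited") are a guess about what would be measured. `Sketch.Accuracy` is a hypothetical module:

```elixir
defmodule Sketch.Accuracy do
  # Splits sessions into a training part (the first 80% by default)
  # and a validation part (the rest).
  def split(sessions, ratio \\ 0.8) do
    Enum.split(sessions, round(length(sessions) * ratio))
  end

  # Root-mean-square error between predicted and observed values.
  def rmse(predicted, observed)
      when length(predicted) == length(observed) and predicted != [] do
    squared =
      predicted
      |> Enum.zip(observed)
      |> Enum.map(fn {p, o} -> (p - o) * (p - o) end)

    :math.sqrt(Enum.sum(squared) / length(squared))
  end
end

{training, validation} = Sketch.Accuracy.split(Enum.to_list(1..10))
IO.inspect({length(training), length(validation)}) # {8, 2}
```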

The session relation keeps a count field

A user may come back to the same page a few times. It doesn't make any sense to create a new session relation between pages again and again. Therefore, a new count field has to be added to the session relation. The field will indicate how many times the user has visited a particular page during the session.

Active user goes directly to a new page

Context: An active user has typed the URL of an unknown page (the system doesn't know it yet). Then, using a link on the page, they go to the next page.
Expected result: The transition from the first page to the second one gets tracked.
Actual result: The transition isn't tracked.

Add support of subdomains

Currently, the engine works with a certain host, for example: demo.sirko.io. But, it should work well for all subdomains within a domain. Even when a user moves between different parts of a site hosted on different subdomains, we can add the hint for the browsers.
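One possible way to decide whether a host belongs to the configured domain is a suffix check. This is only a sketch; `Sketch.Host` and `within_domain?/2` are hypothetical, and the assumption is that the engine would be configured with a bare domain such as sirko.io:

```elixir
defmodule Sketch.Host do
  # True for the domain itself and for any of its subdomains.
  # The leading dot prevents "evilsirko.io" from matching "sirko.io".
  def within_domain?(host, domain) do
    host == domain or String.ends_with?(host, "." <> domain)
  end
end

IO.inspect(Sketch.Host.within_domain?("demo.sirko.io", "sirko.io")) # true
IO.inspect(Sketch.Host.within_domain?("evilsirko.io", "sirko.io"))  # false
```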

Import sessions from Google Analytics

GA has a nice API which can be used to export sessions. If we use those sessions, new customers will be able to get more accurate predictions earlier.

The engine should provide a simple shell command (script) to import data.

Short expired sessions

Expired sessions having only one transition

(:Page {start: true})-[:SESSION {key: "skey32"}]->(:Page {path: '/list'})-[:SESSION {key: "skey32"}]->(:Page {exit: true})

must be removed and excluded from increasing counts on transition relations. Short sessions don't add any value.
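As an in-memory illustration of the rule (the real cleanup would happen in Neo4j), assume each expired session is represented as the ordered list of visited paths, with the start and exit points left implicit. `Sketch.Sessions` is a hypothetical module:

```elixir
defmodule Sketch.Sessions do
  # Sessions with fewer than two pages contain no page-to-page
  # transition, so they are dropped before counts are increased.
  def drop_short(sessions) do
    Enum.reject(sessions, fn paths -> length(paths) < 2 end)
  end
end

sessions = [["/list"], ["/list", "/about"], []]
IO.inspect(Sketch.Sessions.drop_short(sessions)) # [["/list", "/about"]]
```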

Reading speed from service worker

Upon preliminary analysis, the reading speed of the service worker in sirko-io seems comparable to the network: low latency, but also low reading speed.

Data:
image

Total load time demo: 1.0 - 1.5s

Is that a bug or some IO blocking in the SW? How can that number be brought down to the 20-30 ms of normal read operations?

Gather statistics about made predictions

The client will send information about made predictions. The engine should store that info and there should be a simple command which can be called from the command-line in order to see info about correct predictions VS incorrect ones.

The idea of this task is to track how the model works for a particular site.

The engine gets released with the sirko js library

Mix supports archiving projects. It might be useful for releasing the engine and shipping it to the community. Besides the engine, releases should include the sirko js library.

For example, an instance of the app is hosted on http://sirko.example.org. To require the sirko js library, the client references it like http://sirko.example.org/sirko.js.

It will simplify the installation for clients who don't use npm or don't want to add a new dependency to their package.json.

Also, there is a distillery library which looks promising to manage releases. We need to evaluate both options and choose the best one.
