The engine from sirko-io

Dynamic pages

Example:

http://localhost/article/1
http://localhost/article/2
http://localhost/article/3

There should be an option to exclude dynamic pages from tracking so that the DB does not fill up with junk. Some dynamic pages stay popular for a long time, but most of them don't.
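
One possible shape for such an option is a list of path patterns that are skipped before anything is written to the DB. The sketch below is in Python for illustration only (the engine itself is written in Elixir); the `EXCLUDED_PATTERNS` setting and the `is_tracked` helper are hypothetical names, not existing engine options.

```python
from fnmatch import fnmatchcase

# Hypothetical setting: glob patterns for paths that should not be tracked.
EXCLUDED_PATTERNS = ["/article/*"]


def is_tracked(path, excluded_patterns=EXCLUDED_PATTERNS):
    """Return True when the path matches none of the excluded patterns."""
    return not any(fnmatchcase(path, pattern) for pattern in excluded_patterns)
```

With the pattern above, `/article/1` would be skipped while `/about` would still be tracked.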

Document how to set up the engine

development mode
production mode

No active prediction in demo

FYI - https://demo.sirko.io/home does not seem to generate predictions when navigating the demo site.

This is very likely an issue with the stats connected to the demo instance, not an issue in the source itself.

A returning user with an expired session key

If a user returns with an expired session key, a new session should be started for the user. Currently, the engine keeps tracking transitions for the expired session key.

Research

The session key can be expired by the browser. Are we able to prolong the expiry of an existing cookie? If so, the expiry could be pushed forward while the user keeps navigating the site.
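
In general, a cookie's expiry can be pushed forward by sending the same cookie again with a fresh `Max-Age`. The sketch below builds such a header value in Python for illustration only (the engine itself is written in Elixir); the cookie name `sirko_session`, the one-hour lifetime and the helper name are assumptions, not values taken from the engine.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_key, max_age_seconds=3600, name="sirko_session"):
    """Build a Set-Cookie value that (re)starts the cookie's lifetime.

    Sending this value on every tracked request gives a sliding expiry:
    the cookie only expires after max_age_seconds without any request.
    """
    cookie = SimpleCookie()
    cookie[name] = session_key
    cookie[name]["max-age"] = max_age_seconds
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()
```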

Add a mix task to clean up the DB

It will be useful for development and production when people want to start from scratch.

A user with a removed session key

Context: A user stays on a page for a long time. Meanwhile, their session expires and the session key is removed from the browser. Then they move to another page.
Expected result: A new session is started, and the transition between the referrer and the current page is created.
Actual result: The current page gets linked to the starting point, so we lose valid data about the transition.

A login page is predicted for an authorized user

Context: There is a site which prevents authorized users from accessing the login page. If the authorized user tries to access the login page, that user gets redirected to the index page behind the login.

Problem: When the login page gets predicted for the authorized user, the browser pre-renders the index page. After logout, it results in showing the index page instead of the login.

This scenario depends on the actual implementation of a site. Still, it is a real case, and the engine should provide a way to configure such things in order to avoid prerendering the wrong pages.

Configurable session options

The following options should be configurable:

inactive_session_in: the time after which a session is treated as inactive.
stale_session_in: the time after which a session is treated as stale.

Sites with little traffic might increase stale_session_in (otherwise, they won't get enough data to make predictions). On the other hand, sites with huge traffic may decrease this value, which will help them to avoid bloating the DB.
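
A minimal sketch of how the two options could classify a session, written in Python for illustration only (the engine itself is written in Elixir). It assumes that both durations are measured from the session's last request and that stale_session_in is the larger of the two; the `session_state` helper is a hypothetical name.

```python
from datetime import datetime, timedelta


def session_state(last_hit_at, now, inactive_session_in, stale_session_in):
    """Classify a session by how long ago its last request happened."""
    idle = now - last_hit_at
    if idle >= stale_session_in:
        return "stale"
    if idle >= inactive_session_in:
        return "inactive"
    return "active"
```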

Add support of a confidence threshold

Prerendering puts additional load on the backend of a site. Hence, customers of the engine may prefer that a hint for the browser is added only when there is high confidence that the current user will visit the predicted page.

For example, the engine sees the following pages which might be visited by the current user:

Path	Confidence
/about	10%
/projects	30%
/blog	20%
/video	20%
/contact	20%

If the confidence threshold is 20%, the projects page gets prerendered, but when the config value is 50%, nothing gets prerendered.

This way, customers can get rid of the harmful load.
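
The rule from the example can be sketched as follows. The code is Python for illustration only (the engine itself is written in Elixir), and `page_to_prerender` is a hypothetical helper, not part of the engine.

```python
def page_to_prerender(candidates, threshold):
    """Return the most likely path if its confidence meets the threshold.

    candidates maps a path to a confidence between 0 and 1.
    Returns None when there are no candidates or the best one is too weak.
    """
    if not candidates:
        return None
    path, confidence = max(candidates.items(), key=lambda item: item[1])
    return path if confidence >= threshold else None


candidates = {
    "/about": 0.10,
    "/projects": 0.30,
    "/blog": 0.20,
    "/video": 0.20,
    "/contact": 0.20,
}
```

With these numbers, a threshold of 0.2 selects `/projects`, and a threshold of 0.5 selects nothing.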

Integrate Rollbar

Users of the engine should know when something goes wrong. Therefore, we need to integrate some tool which will inform them about errors. https://rollbar.com is a good candidate as it has a free plan which is ok for small projects.

Neo4j and fault tolerance

The engine's modules must be restarted when:

Neo4j goes down
Neo4j isn't accessible right away (this can happen when the engine is started through Docker Compose)

In both cases, the engine should retry to connect to Neo4j.
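
The retry behaviour could look roughly like the sketch below: try to connect, wait a little longer after each failure, and give up after a fixed number of attempts. It is written in Python for illustration only (the engine itself is written in Elixir, where a supervisor would typically handle restarts); `connect_with_retry`, its parameters and the exponential backoff are assumptions.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    The wait doubles after every failed attempt (1s, 2s, 4s, ...).
    The last failure is re-raised so that the caller can react to it.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```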

Reject requests coming from unknown hosts

The app keeps the client url in the settings. We can use it to protect the app from harmful requests.
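
A minimal sketch of such a check, assuming the request carries an `Origin` or `Referer` header that can be compared with the configured client URL. It is written in Python for illustration only (the engine itself is written in Elixir); `is_known_origin` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_known_origin(header_value, client_url):
    """Return True when the header points to the configured client site.

    Scheme, host and port must all match; the path is ignored.
    """
    if not header_value:
        return False
    try:
        request = urlsplit(header_value)
        client = urlsplit(client_url)
        return (request.scheme, request.hostname, request.port) == (
            client.scheme,
            client.hostname,
            client.port,
        )
    except ValueError:
        # Malformed URLs (for example, a non-numeric port) are rejected.
        return False
```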

Transition relations with identical counts

When the Sirko.Predictor module has to choose between transitions with identical counts, the transition with the most recent occurrence date must be chosen.
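
The tie-break can be expressed as ordering by count first and by date second. The sketch below is Python for illustration only (Sirko.Predictor itself is Elixir and works on Neo4j relations); the `count` and `updated_at` field names are assumptions.

```python
from datetime import date


def choose_transition(transitions):
    """Pick the transition with the highest count; ties go to the freshest one."""
    return max(
        transitions,
        key=lambda transition: (transition["count"], transition["updated_at"]),
        default=None,
    )
```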

Add licence

Set up CI

The referral parameter may not be important

According to this article, the value of optimization differs between groups of pages. The sirko client might only be applied to certain sections of the site.

The referral parameter was introduced to get a smooth path of a user's navigation. We tried to avoid missing transitions between pages. However, it might be ok; we have to check it.

Considering the fact that the sirko client might only be added to certain pages, the referral becomes useless.

Log queries' execution

Currently, it isn't possible to tell which queries are slow.

Do we really need the starting point?

It was added to get a smooth path, but if the active user types the URL of a page, there isn't a smooth path anymore, because the page isn't linked to the starting point.

Probably, we can remove it.

Pages behind login

Scenario:

A user is on the login page: http://localhost/login.
The user logs in and gets forwarded to the internal page: http://localhost/index.

First of all, the index page is private, so it cannot be pre-rendered. Even if the index page is public, it may be personalized after the login, hence it must not be pre-rendered. Otherwise, the user will see stale content.

How should such a case be handled?

Consider exits in computing confidence

We use the confidence threshold to make sure that only pages which meet it get prefetched. However, we currently exclude exits, so the confidence computed for pages isn't accurate. For example, if the confidence threshold is 0.5 and there is a 0.8 probability that the user exits from the page, nothing must be prefetched.
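
The difference is only in the denominator: exits have to be counted among all outgoing transitions. A small sketch in Python for illustration only (the engine itself is written in Elixir); the `confidences` helper and its arguments are hypothetical.

```python
def confidences(page_counts, exit_count):
    """Compute a confidence per page, counting exits as possible outcomes.

    page_counts maps a path to the number of transitions to that page,
    exit_count is the number of transitions to the exit point.
    """
    total = sum(page_counts.values()) + exit_count
    if total == 0:
        return {}
    return {path: count / total for path, count in page_counts.items()}
```

For instance, with 2 transitions to `/about` and 8 exits, the confidence for `/about` is 0.2, not 1.0, so a threshold of 0.5 is not met.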

Speed up the method for expiring sessions

Elixir provides a simple and powerful mechanism to execute code in parallel: https://hexdocs.pm/gen_stage/Experimental.Flow.html#content. This library can be used to speed up the Sirko.Session.expire_all_inactive function.

Upgrade Distillery

Upgrade to Elixir 1.5.x

https://elixir-lang.org/blog/2017/07/25/elixir-v1-5-0-released/

Stale transitions

The structure of a site changes over time. Therefore, the engine should be able to:

get rid of transitions which don't take place anymore (For instance, the link between pages was removed.)
get rid of old pages (For instance, some pages were removed. It should be easy to do once the previous one is fulfilled.)

Transition relations without count

The transition relations linking pages and the exit point don't keep the count of transitions. This mistake leads to predicting the exit point instead of a correct page with a higher count.

Execute 2 processes in parallel

These 2 steps can be executed in parallel, which will reduce the response time.

Refactor the GET /predict action

Unfortunately, plugs get applied to all actions. Because of that, the Sirko.Plugs.Session got an additional option:

plug Sirko.Plugs.Session,
   on: "/predict"

in order to specify when it must be applied. In general, the logic from Sirko.Plugs.Session and get "/predict" must be applied to the same request. Therefore, the code from Sirko.Web and Sirko.Plugs.Session must be extracted to a new module (for instance, Sirko.Web.Predictor). This way, we will be able to decouple the logic from the plug library.

Outcome: Better structured code.

Create a command for computing the accuracy

Currently, there is no way to tell how well the model works for a particular site. So, we need a command (mix task) which will measure the accuracy of the prediction.

80% of the loaded sessions should be used for training the model and 20% for validating it (the model mustn't access the validation sessions during training).

RMSE might be used for measuring errors.
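
A rough sketch of the 80/20 evaluation, in Python for illustration only (the real command would be a mix task in Elixir). It assumes that a session is an ordered list of paths and that the model simply predicts the most frequent next page. Note that it reports a plain hit rate instead of RMSE; all function names are hypothetical.

```python
from collections import Counter, defaultdict


def split_sessions(sessions, train_share=0.8):
    """Split sessions into a training part and a validation part."""
    cut = int(len(sessions) * train_share)
    return sessions[:cut], sessions[cut:]


def train(sessions):
    """Count how often each page is followed by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


def accuracy(model, sessions):
    """Share of validation transitions whose next page was predicted correctly."""
    hits = total = 0
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            total += 1
            if current in model and model[current].most_common(1)[0][0] == following:
                hits += 1
    return hits / total if total else 0.0
```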

Add indexes to the DB

Neo4j supports indexes (http://neo4j.com/docs/developer-manual/current/cypher/schema/index/) which aren't used by the engine now. It makes sense to add the index to the path property of the page nodes.

The session relation keeps a count field

A user may come back to the same page a few times. It doesn't make any sense to create a new session relation between pages again and again. Therefore, a new count field has to be added to the session relation. The field will indicate how many times the user has visited a particular page during the session.

Update the README before releasing 0.1

add info about the fallback for Firefox, Safari
update paragraphs referring to old config options.

The assets added through Distillery aren't accessible to the plug after packaging

When the 'priv/static' directory is added to the root directory before packaging, it is accessible to the plug. However, it isn't accessible when the directory is added after packaging.

Store assets given by the client and return them along with the predicted page

The client will send a list of assets for each tracked page. That list of assets should be stored per page node. Once the prediction is made, the engine should return the list of assets of the predicted page along with the path.

Active user goes directly to a new page

Context: An active user has typed the URL of an unknown page (the system doesn't know it yet). Then, using a link on the page, they go to the next page.
Expected result: The transition from the first page to the second one gets tracked.
Actual result: The transition isn't tracked.

Add support of subdomains

Currently, the engine works with a certain host, for example: demo.sirko.io. But, it should work well for all subdomains within a domain. Even when a user moves between different parts of a site hosted on different subdomains, we can add the hint for the browsers.
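
The host check for subdomains could be as simple as a suffix comparison on the label boundary. A sketch in Python for illustration only (the engine itself is written in Elixir); `belongs_to_domain` is a hypothetical helper.

```python
def belongs_to_domain(host, domain):
    """Return True for the domain itself and for any of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # The leading dot prevents "notsirko.io" from matching "sirko.io".
    return host == domain or host.endswith("." + domain)
```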

Import sessions from Google Analytics

GA has a nice API which can be used to export sessions. If we use those sessions, new customers will be able to get more accurate predictions earlier.

The engine should provide a simple shell command (script) to import data.

Newer statistics have a higher impact on the prediction

The engine should compute trends for transitions and consider them during prediction (New popular transitions have higher weight than fading transitions).
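
One simple way to give newer transitions more weight is exponential decay: every occurrence counts less the older it is. This is only one candidate technique, sketched in Python for illustration (the engine itself is written in Elixir); the 30-day half-life and the `weighted_count` helper are assumptions.

```python
def weighted_count(ages_in_days, half_life_days=30.0):
    """Sum the occurrences of a transition, halving the weight every half-life.

    ages_in_days lists how many days ago each occurrence happened.
    """
    return sum(0.5 ** (age / half_life_days) for age in ages_in_days)
```

With this weighting, three occurrences from yesterday outweigh five occurrences from four months ago.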

Short expired sessions

Expired sessions having only one transition

(:Page {start: true})-[:SESSION {key: "skey32"}]->(:Page {path: '/list'})-[:SESSION {key: "skey32"}]->(:Page {exit: true})

must be removed and excluded from increasing counts on transition relations. Short sessions don't add any value.

Reading speed from service worker

Upon preliminary analysis, reading speed of the service worker in sirko-io seems comparable to network. Low latency, but also low reading speed.

Data:

Total load time demo: 1.0 - 1.5s

Is that a bug or some IO blocking in the SW? How can that number be brought down to 20-30ms, as for normal read operations?

Migrate to a bolt protocol

The bolt protocol should be faster.

Create a CONTRIBUTING.md file

Good resources to look at:

Tests clean all data from the DB

After launching tests all data from Neo4j gets removed. Tests must only remove data they create.

Write a doc describing how to install the app without Docker

It will be useful for users who don't use Docker.

Gather statistics about made predictions

The client will send information about made predictions. The engine should store that info and there should be a simple command which can be called from the command-line in order to see info about correct predictions VS incorrect ones.

The idea of this task is to track how the model works for a particular site.

A session expires earlier than it should

There is a case when an active session gets expired.

The engine gets released with the sirko js library

Mix supports a feature for archiving projects. It might be useful for releasing the engine and shipping it to the community. Besides the engine, releases should include the sirko js library.

For example, an instance of the app is hosted on http://sirko.example.org. To require the sirko js library, the client references it like http://sirko.example.org/sirko.js.

It will simplify the installation for clients who don't use Npm or don't want to add a new dependency to their package.json.

Also, there is the Distillery library, which looks promising for managing releases. We need to evaluate both options and choose the best one.
