sirko-io / engine
Benefit from new browsers' technologies to speed up your site
License: GNU General Public License v3.0
Example:
http://localhost/article/1
http://localhost/article/2
http://localhost/article/3
There should be an option to keep dynamic pages out of scope in order to avoid blowing up the DB with junk. Some dynamic pages may stay popular for a long time, but most of them won't.
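A minimal sketch of such an option, written in Python purely for illustration (the engine itself is written in Elixir). The `EXCLUDED_PATHS` setting and the `is_tracked` helper are hypothetical names, not part of the engine:

```python
from fnmatch import fnmatchcase

# Hypothetical setting: glob patterns for paths the engine should not track.
EXCLUDED_PATHS = ["/article/*"]


def is_tracked(path, excluded=EXCLUDED_PATHS):
    """Return True when the path matches none of the excluded patterns."""
    return not any(fnmatchcase(path, pattern) for pattern in excluded)
```

With this setting, `/article/1` would be skipped, while `/about` would still be tracked.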
FYI - https://demo.sirko.io/home does not seem to generate predictions when navigating the demo site.
This very likely is an issue with the stats connected to the demo instance, not an issue in the source itself.
If a user returns with an expired session key, a new session should be started for the user. Currently, the engine keeps tracking transitions for the expired session key.
The session key can be expired by a browser. Are we able to prolong an expiry for a defined cookie? It may be used to push the expiry if the user keeps navigating the site.
It will be useful for development and production when people want to start from scratch.
Context: A user stays on the page for a long time. Meantime, their session gets expired and session key gets removed from the browser. They move to another page.
Expected result: A new session gets started, the transition between the referrer and the current page is created.
Actual result: The current page gets linked to the starting point, hence we lose valid data about the transition.
Context: There is a site which prevents authorized users from accessing the login page. If the authorized user tries to access the login page, that user gets redirected to the index page behind the login.
Problem: When the login page gets predicted for the authorized user, the browser pre-renders the index page. After logout, it results in showing the index page instead of the login.
This scenario depends on an actual implementation of a site. But, it is a real case and the engine should provide a way to configure such things in order to avoid prerendering of wrong pages.
The following options should be configurable:
Sites with less traffic might increase stale_session_in (otherwise, they won't get enough data to make predictions). On the other hand, sites with huge traffic may decrease this value (it will help them avoid blowing up the DB).
Prerendering puts additional load on the backend of a site. Hence, customers of the engine may prefer to add a hint for browsers only if there is high confidence that the current user will visit the predicted page.
For example, the engine sees the following pages which might be visited by the current user:
| Path | Confidence |
|---|---|
| /about | 10% |
| /projects | 30% |
| /blog | 20% |
| /video | 20% |
| /contact | 20% |
If the confidence threshold is 20%, the projects page gets prerendered, but when the config value is 50%, nothing gets prerendered.
This way, customers can avoid the unnecessary load.
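The threshold check described above can be sketched as follows. This is an illustrative Python snippet, not the engine's Elixir code; the `page_to_prerender` function is a hypothetical name:

```python
def page_to_prerender(confidences, threshold):
    """Return the most likely path if its confidence meets the threshold, else None."""
    if not confidences:
        return None
    path, confidence = max(confidences.items(), key=lambda item: item[1])
    return path if confidence >= threshold else None


# The confidences from the table above, expressed as fractions.
candidates = {
    "/about": 0.10,
    "/projects": 0.30,
    "/blog": 0.20,
    "/video": 0.20,
    "/contact": 0.20,
}

print(page_to_prerender(candidates, 0.20))  # /projects
print(page_to_prerender(candidates, 0.50))  # None
```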
Users of the engine should know when something goes wrong. Therefore, we need to integrate some tool which will inform them about errors. https://rollbar.com is a good candidate as it has a free plan which is ok for small projects.
The engine's modules must be restarted when:
In both cases, the engine should retry to connect to Neo4j.
The app keeps the client url in the settings. We can use it to protect the app from harmful requests.
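One possible shape of such a check, sketched in Python for illustration only. It assumes the request carries a referrer (or origin) URL that can be compared with the configured client URL; `is_allowed_referrer` is a hypothetical helper, not an existing engine function:

```python
from urllib.parse import urlsplit


def is_allowed_referrer(referrer, client_url):
    """Accept a request only if its referrer has the same scheme and host as the client URL."""
    ref = urlsplit(referrer)
    client = urlsplit(client_url)
    return (ref.scheme, ref.netloc) == (client.scheme, client.netloc)
```

A header-based check like this only filters out casual misuse, since the headers can be forged by non-browser clients.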
When the Sirko.Predictor module has to choose between transitions with identical counts, the transition with the most recent occurrence date must be chosen.
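The tie-breaking rule can be sketched as a comparison on a (count, date) pair. The snippet below is illustrative Python, not the Sirko.Predictor code, and the `count` and `updated_at` field names are assumptions:

```python
from datetime import date


def pick_transition(transitions):
    """Pick the transition with the highest count; on a tie, the most recent one wins."""
    return max(transitions, key=lambda t: (t["count"], t["updated_at"]))


transitions = [
    {"path": "/blog", "count": 5, "updated_at": date(2017, 1, 10)},
    {"path": "/video", "count": 5, "updated_at": date(2017, 2, 3)},
    {"path": "/about", "count": 2, "updated_at": date(2017, 3, 1)},
]

print(pick_transition(transitions)["path"])  # /video
```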
According to this article, the value of optimization differs between groups of pages. The sirko client might only be applied to certain sections of the site.
The referral parameter was introduced to get a smooth path of a user's navigation. We tried to avoid missing transitions between pages. However, it might be OK to miss them; we have to check it.
Considering the fact that the sirko client might only be added to certain pages, the referral becomes useless.
Now it isn't possible to understand which queries are slow.
It was added to get a smooth path, but if the active user types the URL of a page, there isn't a smooth path anymore and the page doesn't get linked to the starting point.
Probably, we can remove it.
Scenario:
http://localhost/login
http://localhost/index
First of all, the index page is private, so it cannot be pre-rendered. Even if the index page is public, the page may be personalized after the login, hence it must not be pre-rendered. Otherwise, the user will see stale content.
How should such a case be handled?
We use the confidence threshold to be sure that only pages which meet it get prefetched. But currently we exclude exits, so the confidence computed for pages isn't accurate. For example, if the confidence threshold is 0.5 and there is a 0.8 probability that the user leaves the page, nothing must be prefetched.
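A small sketch of the intended computation, in illustrative Python rather than the engine's Elixir. It assumes confidence is a transition's count divided by the total count of all outgoing transitions, exits included; `confidences_with_exits` is a hypothetical name:

```python
def confidences_with_exits(page_counts, exit_count):
    """Compute per-page confidence with exits included in the denominator."""
    total = sum(page_counts.values()) + exit_count
    if total == 0:
        return {}
    return {path: count / total for path, count in page_counts.items()}


# 8 of 10 recorded visitors left the site, 2 moved on to /projects.
result = confidences_with_exits({"/projects": 2}, exit_count=8)
```

Here `/projects` ends up with a confidence of 0.2, which is below a 0.5 threshold, so nothing would be prefetched. Without the exit count in the denominator, the same page would get a confidence of 1.0.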
Elixir provides a very simple and powerful mechanism to execute code in parallel: https://hexdocs.pm/gen_stage/Experimental.Flow.html#content. This library can be used to improve the Sirko.Session.expire_all_inactive function.
The structure of a site changes over time. Therefore, the engine should be able to:
The transition relations linking pages and the exit point don't keep the count of transitions. This mistake leads to predicting the exit point instead of a correct page with a higher count.
These 2 steps can be executed in parallel, so we will reduce the response time.
Unfortunately, plugs get applied to all actions. Because of that, the Sirko.Plugs.Session got an additional option:
plug Sirko.Plugs.Session,
on: "/predict"
in order to specify when it must be applied. In general, the logic from Sirko.Plugs.Session and the get "/predict" route must be applied to the same request. Therefore, the code from Sirko.Web and Sirko.Plugs.Session should be extracted into a new module (for instance, Sirko.Web.Predictor). This way, we will be able to decouple the logic from the plug library.
Outcome: Better structured code.
There is currently no way to understand how well the model works for a particular site. So, we need a command (mix task) which will measure the accuracy of the prediction.
80% of the loaded sessions should be used for training the model and 20% for validating (the model mustn't access them).
RMSE might be used for measuring errors.
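The 80/20 split could look roughly like this. The snippet is illustrative Python, not a mix task; `split_sessions` is a hypothetical name, and the fixed seed is an assumption made so that repeated runs evaluate the same validation set:

```python
import random


def split_sessions(sessions, train_ratio=0.8, seed=0):
    """Shuffle sessions deterministically and split them into training and validation sets."""
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, validation = split_sessions(range(10))
```

The model would be built from `train` only, and its predictions would then be compared with the transitions recorded in `validation`.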
Neo4j supports indexes (http://neo4j.com/docs/developer-manual/current/cypher/schema/index/) which aren't used by the engine now. It makes sense to add the index to the path property of the page nodes.
A user may come back to the same page a few times. It doesn't make any sense to create a new session relation between the same pages again and again. Therefore, a new count field has to be added to the session relation. The field will indicate how many times the user has made that transition during the session.
When the 'priv/static' directory is added to the root directory before packaging, it is accessible to the plug. But, it isn't accessible when the directory is added after packaging.
The client will send a list of assets for each being tracked page. That list of assets should be stored per page node. Once the prediction is made, the engine should return a list of assets of the predicted page along with the path.
Context: An active user has typed the URL of an unknown page (the system doesn't know it yet). Then, using a link on the page, they go to the next page.
Expected result: The transition from the first page to the second one gets tracked.
Actual result: The transition isn't tracked.
Currently, the engine works with a certain host, for example: demo.sirko.io. But it should work well for all subdomains within a domain. Even when a user moves between different parts of a site hosted on different subdomains, we can add the hint for the browsers.
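A minimal sketch of a host check that accepts a domain and all of its subdomains. It is illustrative Python, and `same_site` is a hypothetical helper rather than an engine function:

```python
def same_site(host, domain):
    """Return True when host equals the domain or is one of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)
```

Matching on `"." + domain` rather than on the bare suffix keeps an unrelated host such as `evilsirko.io` from being treated as part of `sirko.io`.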
GA has a nice API which can be used to export sessions. If we use those sessions, new customers will be able to get more accurate predictions earlier.
The engine should provide a simple shell command (script) to import data.
The engine should compute trends for transitions and consider them during prediction (new popular transitions get a higher weight than fading transitions).
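One possible way to favor recent transitions is exponential decay of counts by age. This is only an assumption about how trends could be modeled, not a method the issue prescribes; the snippet is illustrative Python, and the `decayed_weight` function and its 30-day half-life are hypothetical:

```python
def decayed_weight(count, age_days, half_life_days=30.0):
    """Halve a transition's weight for every half-life that has passed since it occurred."""
    return count * 0.5 ** (age_days / half_life_days)


fresh = decayed_weight(10, age_days=0)     # 10.0
fading = decayed_weight(10, age_days=30)   # 5.0
```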
Expired sessions having only one transition
(:Page {start: true})-[:SESSION {key: "skey32"}]->(:Page {path: '/list'})-[:SESSION {key: "skey32"}]->(:Page {exit: true})
must be removed and excluded from increasing counts on transition relations. Short sessions don't add any value.
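The filtering step can be sketched as below. This is illustrative Python, not the engine's Cypher or Elixir code. It assumes a session is represented as the list of pages visited between the start and the exit point, and that a session is short when it contains a single page, as in the pattern above; `drop_short_sessions` is a hypothetical name:

```python
def drop_short_sessions(sessions):
    """Keep only sessions that visited at least two pages."""
    return {key: pages for key, pages in sessions.items() if len(pages) >= 2}


expired = {
    "skey32": ["/list"],
    "skey33": ["/list", "/about"],
}

print(drop_short_sessions(expired))  # {'skey33': ['/list', '/about']}
```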
Upon preliminary analysis, reading speed of the service worker in sirko-io seems comparable to network. Low latency, but also low reading speed.
Total load time demo: 1.0 - 1.5s
Is that a bug / some IO blocking in the SW ? How can that number be brought down to 20-30ms as normal read operations?
The bolt protocol should be faster.
After launching tests all data from Neo4j gets removed. Tests must only remove data they create.
It will be useful for users who don't use Docker.
The client will send information about made predictions. The engine should store that info and there should be a simple command which can be called from the command-line in order to see info about correct predictions VS incorrect ones.
The idea of this task is to track how the model works for a particular site.
There is a case when an active session gets expired.
Mix supports a feature for archiving projects. It might be useful for releasing the engine and shipping it to the community. Besides the engine, releases should include the sirko js library.
For example, an instance of the app is hosted on http://sirko.example.org. To require the sirko js library, the client references it like http://sirko.example.org/sirko.js.
It will simplify the installation for clients who don't use Npm or don't want to add a new dependency to their package.json.
Also, there is a distillery library which looks promising to manage releases. We need to evaluate both options and choose the best one.