Comments (7)
Hey @AndreasKleineberg, there's a `ttl` property you can set on features which will discard older events: https://docs.metarank.ai/reference/overview/feature-extractors#configuration. The default is 3 months, but you can set it to a smaller value, depending on your use case.
from metarank.
If I understand it correctly, the individual events (item/user/ranking/interaction) are persistently stored in Redis (or local storage). The individual events are merged and additionally stored as click-through events, correct? The mentioned `ttl` property makes sure that individual features (e.g. memory-hungry embeddings) are deleted after some time. But what happens if the underlying item can still show up in rankings? Is the feature then recalculated? Or does the item event have to be reloaded before the 3 months expire?
On a side note, I think you've got a really interesting project going, but it's never taken me this long to grasp the context of a project (especially in terms of production use). Nevertheless, good work!
@AndreasKleineberg Metarank does not store raw events at all. It only stores some derived feature values used for the ranking, so the original idea was to stick to a soft expiration logic with TTLs in Redis to purge old data.
For example:
- when you send an item event with a single field `price=100` (which is used as-is in the ranking as a scalar feature), it creates a Redis KV record `item/id1/price` = 100 with TTL=90 days, and then discards the original event.
- when you send a click event (and you track CTR, for example), Metarank updates the daily click counters for that item (by doing `HINCRBY` in Redis) and marks the item as clicked in the original ranking event. Then it also discards the event.
With this approach, we don't store all the raw events in Redis; we only update the feature values needed for the ranking.
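To make the two write paths above concrete, here is a minimal in-memory sketch. It is not Metarank's storage code: `FakeRedis` is a hypothetical stand-in that only mimics `SET` with an expiry and `HINCRBY`, and the `item/<id>/clicks` key and the date-string field are assumed names chosen for illustration. The point it shows is that only derived values survive, and the raw event is dropped after each update.

```python
import time
from collections import defaultdict

DAY = 24 * 3600
TTL = 90 * DAY  # the 90-day default mentioned in the thread


class FakeRedis:
    """Hypothetical in-memory stand-in for SET-with-expiry and HINCRBY.
    Hash TTLs are omitted to keep the sketch short."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.kv = {}                     # key -> (value, expires_at)
        self.hashes = defaultdict(dict)  # key -> {field: int}

    def set(self, key, value, ttl_seconds):
        self.kv[key] = (value, self.clock() + ttl_seconds)

    def get(self, key):
        entry = self.kv.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.kv[key]  # lazy expiry on read
            return None
        return value

    def hincrby(self, key, field, amount=1):
        h = self.hashes[key]
        h[field] = h.get(field, 0) + amount
        return h[field]


def on_item_event(store, item_id, fields):
    # Keep only the derived scalar values; the raw event itself is not stored.
    for name, value in fields.items():
        store.set(f"item/{item_id}/{name}", value, TTL)


def on_click_event(store, item_id, day):
    # Bump a per-day click counter for the item (hypothetical key layout).
    return store.hincrby(f"item/{item_id}/clicks", day)


# Usage with a controllable clock so expiry can be observed without waiting.
now = [0.0]
store = FakeRedis(clock=lambda: now[0])
on_item_event(store, "id1", {"price": 100})
on_click_event(store, "id1", "2023-05-01")
on_click_event(store, "id1", "2023-05-01")
price_today = store.get("item/id1/price")
clicks = dict(store.hashes["item/id1/clicks"])
now[0] += 91 * DAY
price_later = store.get("item/id1/price")
```

After 91 simulated days the `price` value is gone, while nothing about the original event was ever kept.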
But you're right, bulk removal of values is not implemented right now in the way you ask.
So I guess for your use case the simplest way to go is:
- set TTLs for all the features (AFAIK the default is 90 days; you may prefer a smaller value)
- when an item is removed, it won't appear in ranking events anymore (so it's never going to be presented to a customer), so its TTL will never be refreshed
- over time, all the inactive user/item features will eventually expire and be removed.
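The expiry behaviour described in the list above can be sketched as follows. This is an assumption-based illustration, not Metarank's code: `TtlStore` and `on_ranking_event` are hypothetical names, and the sketch assumes that every item shown in a ranking gets its feature TTL pushed forward, as the reply implies.

```python
DAY = 24 * 3600


class TtlStore:
    """Hypothetical key-value store with per-key expiry (times in seconds)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}  # key -> (value, expires_at)

    def put(self, key, value, now):
        self.data[key] = (value, now + self.ttl)

    def get(self, key, now):
        entry = self.data.get(key)
        if entry is None or now >= entry[1]:
            self.data.pop(key, None)  # expired or missing
            return None
        return entry[0]

    def touch(self, key, now):
        # Extend the expiry of a still-live key, similar in spirit to Redis EXPIRE.
        value = self.get(key, now)
        if value is not None:
            self.data[key] = (value, now + self.ttl)


def on_ranking_event(store, item_ids, now):
    # Every item actually shown in a ranking gets its feature TTL refreshed.
    for item_id in item_ids:
        store.touch(f"item/{item_id}/price", now)


store = TtlStore(ttl_seconds=90 * DAY)
store.put("item/a/price", 100, now=0)
store.put("item/b/price", 250, now=0)
# Item "b" was removed from the catalogue, so only "a" keeps appearing in rankings.
for day in range(1, 121):
    on_ranking_event(store, ["a"], now=day * DAY)
price_a = store.get("item/a/price", now=120 * DAY)
price_b = store.get("item/b/price", now=120 * DAY)
```

Item "a" stays alive because each ranking pushes its expiry forward; item "b" silently disappears after 90 days without any explicit delete.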
There is still a chance that you may want to explicitly nuke a significant part of the inventory (like off-boarding a large vendor from a marketplace), but since this use case is quite rare, we're still not sure it should be part of our roadmap.
Here is a fix for a bunch of issues with Redis TTL pass-through: #1114
> when the item is removed, then it won't appear in the ranking events (so it's never going to be presented to a customer), so the TTL will be never refreshed
Ah okay, I think that's the point I'd missed so far. So the features that I have set a TTL for are updated whenever they appear in a ranking event. That also means that if a product (item event) never appears in a ranking, it automatically drops out once the TTL expires.
One last question about this: What happens if a product was never shown in the rankings before its features expired, then the features expire and afterwards someone does look at the product (so it does show up in a ranking event). Is it then simply ignored? The features are then no longer available in Redis.
> One last question about this: What happens if a product was never shown in the rankings before its features expired, then the features expire and afterwards someone does look at the product (so it does show up in a ranking event). Is it then simply ignored? The features are then no longer available in Redis.
Then the feature value will be missing (NaN). All the supported backends (LightGBM, XGBoost, CatBoost) handle missing values natively and use this information during training as yet another signal. More details on how it works: https://datascience.stackexchange.com/questions/65956/how-do-gbm-algorithms-handle-missing-data
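As a rough illustration of why a missing value can still be a useful signal, here is a simplified sketch of the "learned default direction" idea used by XGBoost-style gradient boosting: for a fixed split, rows with a NaN feature are tried on both sides, and the side that gives the lower squared error wins. The function names are hypothetical, the loss is plain squared error rather than a gradient-based gain, and the exact strategy differs per library (CatBoost, for instance, has its own `nan_mode` handling), so treat this as a conceptual sketch only.

```python
import math


def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_default_direction(xs, ys, threshold):
    """For the split `x < threshold`, pick the child that rows with a missing
    (NaN) feature should be sent to, by comparing the total squared error."""
    left = [y for x, y in zip(xs, ys) if not math.isnan(x) and x < threshold]
    right = [y for x, y in zip(xs, ys) if not math.isnan(x) and x >= threshold]
    missing = [y for x, y in zip(xs, ys) if math.isnan(x)]

    cost_left = sse(left + missing) + sse(right)
    cost_right = sse(left) + sse(right + missing)
    return "left" if cost_left <= cost_right else "right"


nan = float("nan")
prices = [10, 20, 30, 200, 300, nan, nan]
# Case 1: items with an expired (missing) price behave like cheap items.
labels_like_cheap = [1.0, 1.0, 0.9, 0.1, 0.0, 1.0, 0.95]
# Case 2: items with a missing price behave like expensive items.
labels_like_pricey = [1.0, 1.0, 0.9, 0.1, 0.0, 0.05, 0.0]
dir_cheap = best_default_direction(prices, labels_like_cheap, threshold=100)
dir_pricey = best_default_direction(prices, labels_like_pricey, threshold=100)
```

The same NaN rows are routed differently depending on how they relate to the target, which is the sense in which "missing" becomes information the model learns from.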
Related Issues (20)
- Getting secondary ranking with Metarank
- Doc correction in cross-encorder
- Import LightGBM trained model to metarank
- Feature storage is not optimal for shared fields
- The "termfreq" command could detect an existing target file and abort instead of failing at the end
- The “sort” command should not erase the source file when “data” equals “out”
- Diversity feature throws an exception when the data is missing.
- The "import" command should be able to continue importing when the state file exists.
- Offline import should be able to wipe the state at the end
- doc mentions obsolete relative_number feature
- Add blocklist for feature names: values, models
- validate crashes when dataset has clicks with references to non-existing ranking events
- Make eval metrics configurable
- GCP Memstore redis: print a warning for lack of cache invalidation support
- warmup with synthetic traffic support
- doc: describe setup with file-based immutable store
- Memory leak when using Redis persistence
- : and / cannot be used in search queries using Redis persistence layer
- Kinesis client throws errors on expired iterator
- There seems to be a bug after training a model that is flushed to redis.