
Comments (15)

tschellenbach commented on May 20, 2024

Thread safety
The thread safety for aggregated and notification feeds depends on the Redis.lock functionality.
I can think of two ways this could go wrong:

  • Subclassing or other customized functionality that does not take the locking into account
  • Redis running out of memory (at which point locking stops working)

Merge max length
The problem is that we use a sorted set in redis. See http://redis.io/topics/data-types#sorted-sets
So the thing you sort by is also what's used for lookups. In python terms the dict looks more like: {time: value, time2: value2}
That means we can't simply look up the right aggregated activity; we need to loop through the activities. (Another reason is that you can also build aggregation which is more complicated than group key -> activity.) The sketch below shows what that scan looks like.
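For illustration, roughly what that scan looks like with redis-py; this is a minimal sketch, not feedly's actual code, and it assumes members are serialized as group:payload byte strings:

import redis

r = redis.StrictRedis()

def find_aggregated(feed_key, group, limit=1000):
    # ZREVRANGE walks members from newest to oldest score (time).
    # There is no secondary index on the group key, so every member
    # gets scanned and parsed until a match is found.
    # group must be bytes here, matching the serialized members.
    for member in r.zrevrange(feed_key, 0, limit - 1):
        member_group, _, payload = member.partition(b':')
        if member_group == group:
            return payload
    return None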

Solutions
Redis is really fast; I expect that even with 1000 items you can still have a reasonably performant system.

Alternatively, Cassandra supports secondary indexes (really hard to set up and maintain).

Maybe you could store your own index in Redis so you only need two quick lookups (though the 2 sequential lookups might be slower than just looping through 1000 rows; it depends a bit on the size of the row and the latency to redis). A sketch of that idea follows.
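A minimal sketch, assuming redis-py 3.x and illustrative key names: a hash maps each group key to its score, so a lookup becomes one HGET plus one ZRANGEBYSCORE instead of a full scan.

import redis

r = redis.StrictRedis()

def add_with_index(feed_key, index_key, group, member, score):
    # keep the hash index in step with the sorted set on every write
    r.zadd(feed_key, {member: score})   # redis-py 3.x zadd signature
    r.hset(index_key, group, score)

def find_via_index(feed_key, index_key, group):
    # lookup 1: group -> score, from the hash we maintain ourselves
    score = r.hget(index_key, group)
    if score is None:
        return None
    # lookup 2: score -> member, from the sorted set
    members = r.zrangebyscore(feed_key, score, score)
    return members[0] if members else None

The two round-trips can also be sent through a redis pipeline to cut the latency cost of the extra lookup.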

@tbarbugli more ideas?

from stream-framework.

intellisense commented on May 20, 2024

When I was testing it I was watching memory continuously. I have 14GiB of memory and the operation never exceeded that.
If I regenerate feeds from existing users' actions with CELERY_ALWAYS_EAGER=False, duplicate aggregated activities happen. But if I do it with CELERY_ALWAYS_EAGER=True (which obviously takes a lot of time to complete the regeneration), duplicates didn't happen. So that's why I thought redis.lock has some issues or may not be implemented properly.

As far as I know, feedly was not created with the idea that there should never be duplicate aggregated activity groups; otherwise merge_max_length would never have existed. So maybe somewhere something needs to be improved or looked into?

from stream-framework.

tbarbugli commented on May 20, 2024

@Thierry where is the locking done for the aggregated feed? I can't find it in feedly and I suspect it's not there at all (if it were there we would need a cassandra implementation, or at least some configuration for the lock storage, which sounds challenging :))

@intellisense how do you regenerate feeds?

The only problem I can think of with the redis lock (as we use it in the notification feed) is that it gets automatically released after 10 seconds; that's a way to prevent deadlocks.


intellisense commented on May 20, 2024

Re: How do you regenerate feeds?
Basically I am storing every user action in a generic database model called Action, provided by django-activity-stream. So regenerating a feed is just a matter of looping through the queryset, first creating an instance of feedly.activity.Activity from the action object and then adding it via feedly.add_user_activity(action.actor.id, activity); see the sketch below.

P.S. I am not spawning any extra celery tasks myself; the celery tasks are the ones feedly issues for add_operation.
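A minimal sketch of that regeneration loop. The verb class, the Activity arguments, and the verb registry (feedly.verbs.register) are assumptions based on the layout feedly later became in stream-framework:

from actstream.models import Action
from feedly.activity import Activity
from feedly.verbs import register
from feedly.verbs.base import Verb

class ActionVerb(Verb):
    # illustrative verb; real code would map each Action's verb string
    # to a registered feedly verb
    id = 99
    infinitive = 'act'

register(ActionVerb)

def regenerate_feeds(feed_manager):
    for action in Action.objects.all().iterator():
        activity = Activity(
            actor=action.actor.id,
            verb=ActionVerb,
            object=action.action_object.id,
            time=action.timestamp,
        )
        # add_user_activity writes the user's own feed and issues the
        # celery fanout tasks to followers
        feed_manager.add_user_activity(action.actor.id, activity)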


tschellenbach commented on May 20, 2024

I'm not sure we're talking about the same problem.

Problem scenario A:
Multiple updates happen on the same aggregated activity. Without locking this results in the activity being copied multiple times. So if you have the aggregated activity

start [1]

and you have multiple add operations running concurrently, you can end up with

start [1,2]
start [1,3]

instead of start [1,2,3].
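To make the race concrete, a minimal in-process sketch with threading.Lock standing in for redis.lock; two concurrent add_without_lock calls can both read [1] and each write back their own copy:

import threading

storage = {'start': [1]}   # group key -> list of activity ids
lock = threading.Lock()    # stand-in for the central redis.lock

def add_without_lock(activity_id):
    current = list(storage['start'])   # read
    current.append(activity_id)        # modify
    storage['start'] = current         # write; may clobber a concurrent write

def add_with_lock(activity_id):
    # serializing the read-modify-write is what prevents scenario A
    with lock:
        add_without_lock(activity_id)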

Problem scenario B (search depth)
If there is an aggregated activity past your search depth, you'll end up with 2 distinct aggregated activities with the same value for the group param; the sketch below shows the shape of that limit.
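This is not feedly's actual implementation, just the shape of the search-depth limit: only the newest merge_max_length aggregated activities are inspected for a matching group.

def find_existing_group(aggregated_activities, group, merge_max_length):
    # aggregated_activities is assumed newest-first; anything beyond
    # merge_max_length is never inspected, so a matching group further
    # back is missed and the caller creates a duplicate group
    for aggregated in aggregated_activities[:merge_max_length]:
        if aggregated.group == group:
            return aggregated
    return None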

PS
Tommaso, you are right: the aggregated feed locking seems to be specific to Fashiolista (indeed because of the trouble of combining Redis and Cassandra). Here's the Fashiolista example (you could also use a different locking mechanism than Redis, but you need some sort of central lock to prevent duplication of aggregated activities):

class AggregatedFeed(FashiolistaBaseFeed, CassandraAggregatedFeed):
    aggregator_class = FashiolistaAggregator
    timeline_cf_name = 'timeline_aggregated'
    key_format = 'feed:aggregated:%(user_id)s'
    lock_format = 'feed:aggregated:lock:%(user_id)s'
    max_length = 2400
    # only the newest 40 aggregated activities are searched when merging
    merge_max_length = 40

    def __init__(self, *args, **kwargs):
        CassandraAggregatedFeed.__init__(self, *args, **kwargs)
        # one lock key per user feed, so different feeds never contend
        self.lock_key = self.lock_format % {'user_id': self.user_id}
        self.redis = get_redis_connection()

    def add_many(self, *args, **kwargs):
        # serialize the read-modify-write on this feed; the timeout
        # releases the lock if a worker dies mid-write
        with self.redis.lock(self.lock_key, timeout=30):
            results = super(AggregatedFeed, self).add_many(*args, **kwargs)
            return results

    def remove_many(self, *args, **kwargs):
        with self.redis.lock(self.lock_key, timeout=30):
            results = super(AggregatedFeed, self).remove_many(*args, **kwargs)
            return results

    def trim(self, *args, **kwargs):
        # trimming is intentionally disabled for this feed
        pass


Thierry commented on May 20, 2024

Folks, you have the wrong 'Thierry' among the recipients.

Could you please remove my name from this thread?

Thanks in advance

tschellenbach commented on May 20, 2024

Hi, sorry about that, they meant to contact me.
That's quite spam-sensitive, though.


tschellenbach commented on May 20, 2024

I think only you can unsubscribe; there's a button at the top right.


intellisense commented on May 20, 2024

@tschellenbach I think it's Problem Scenario A. Using redis.lock should fix it. I will give it a try.


intellisense commented on May 20, 2024

Any chance of a deadlock in the above locking scheme? I can see 20-30% of the celery fanout tasks stuck forever while others succeeded. Using this locking technique looks similar to celery's synchronous subtasks problem (http://docs.celeryproject.org/en/latest/userguide/tasks.html#avoid-launching-synchronous-subtasks). Any suggestions?


tschellenbach commented on May 20, 2024

Hi!

The redis lock clears out after the timeout, so it would need to go wrong many times during a fanout for tasks to seemingly hang forever.

Since it locks per feed and it processes one feed at a time, deadlocks aren't possible.

It is possible that a lock is set but not cleared. If that happens, a task should hang for the timeout period.

What Celery broker are you using? We've had many issues running anything other than rabbitmq as a broker.
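If tasks still appear stuck waiting on the lock, one option is to bound the wait instead of blocking indefinitely; a minimal sketch, assuming a redis-py version that supports blocking_timeout on lock() (key name illustrative):

import redis

r = redis.StrictRedis()

# timeout: the lock auto-expires so a dead worker can't hold it forever
# blocking_timeout: give up after 5s instead of hanging the task
feed_lock = r.lock('feed:aggregated:lock:42', timeout=30, blocking_timeout=5)

if feed_lock.acquire():
    try:
        pass  # read, aggregate, and write back the feed here
    finally:
        feed_lock.release()
else:
    pass  # lock not acquired in time; retry the task later instead of blocking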

intellisense commented on May 20, 2024

I am using redis as a broker, and redis comes from AWS ElastiCache. The lock will be cleared automatically on exit from the python with statement.


tbarbugli commented on May 20, 2024

That's partially correct; killing the process that holds the lock will leave the lock acquired until the redis key expires.

Regarding the issue: since locks are per feed, if you have lots of updates on the same feed you will end up with a lot of processes waiting for the lock to become available, and quite some waste. If that's the case I suggest batching updates together to reduce lock contention and read overhead (batched updates are much more efficient than lots of single updates); see the sketch below.
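A minimal sketch of that batching idea; fanout_batched and get_feed are illustrative names, not feedly API:

from collections import defaultdict

def fanout_batched(updates, get_feed):
    # updates: iterable of (user_id, activity) pairs
    # get_feed: callable returning the aggregated feed for a user
    batches = defaultdict(list)
    for user_id, activity in updates:
        batches[user_id].append(activity)
    # one locked read-modify-write per feed instead of one per activity
    for user_id, activities in batches.items():
        get_feed(user_id).add_many(activities)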


intellisense commented on May 20, 2024

Hi, as written here: http://feedly.readthedocs.org/en/latest/_modules/feedly/feeds/aggregated_feed/base.html#AggregatedFeed.add_many

Adds many activities to the feed
Unfortunately we can't support the batch interface.
The writes depend on the reads.
Also subsequent writes will depend on these writes.
So no batching is possible at all.

So I guess there is no workaround to add multiple user activities together to the user's aggregated feed and then fan out to followers?


tbarbugli commented on May 20, 2024

Hi,
that's correct; it is possible to add multiple activities together to one aggregated feed, but unfortunately it is not possible to run add_many on many aggregated feeds in one batch (using the batch operator).

What backend storage are you using?
