Coder Social home page Coder Social logo

[Ask About Hard Delete Message] about chat HOT 20 CLOSED

tinode avatar tinode commented on May 22, 2024
[Ask About Hard Delete Message]

from chat.

Comments (20)

or-else avatar or-else commented on May 22, 2024 1

Hi Riandy,

Thank for looking at the pull request.

  1. Since in current version user could only delete messages by using range, essentially server will delete messages with seq id 1 until 300, one by one.

The user sends a delete request for a range {low: 1, hi: 301}. The rethinkdb adapter will receive the request to delete messages 1 <= seq < 301 and will delete the whole range at once: https://github.com/tinode/chat/blob/better-delete/server/db/rethinkdb/adapter.go#L979

The adapter will receive the ranges to delete just like they were sent by the client. It's up to the adapter to execute the delete strategy. For instance, all SQL databases can do an atomic DELETE FROM messages WHERE seq BETWEEN 1 AND 301. Mongo DB can do it with db.collection.deleteMany()

  1. But if user could also delete messages by using clear, it just need to set the clear value on User A's subscription to topic X.

Yes, I removed this option. That was the old soft-delete method. It was only good for "hide all messages with IDs below clear". It could not be used to delete individual messages by IDs, which is one of the most common use cases.

on DynamoDB which update operation is restricted to only 1 update per request, it is really troublesome.

Yes, I can see how this could be a problem.

What if instead we tell the user which items were actually deleted:
User asks to delete {low: 1, hi: 301} (or just {hi: 301}), DynamoDb adapter deletes 1 to 150 then fails. The adapter returns a "partial success", the server responds to the client with {ctrl code=207 params={what="del", delseq={low: 1, hi: 151}}}? This way the client understands that the operation was only partially successful.

Alternatively, since this is an unusual limitation specific to DynamoDB, you can solve it in the DynamoDB adapter by creating a separate table just for the clear or keeping a custom field like dynamo_db_clear on subscriptions in the existing schema. When reading the messages in MessageGetAll you can just lookup the clear value from that place. It will be a cheap lookup by the primary key. The adapter owns the storage schema. You can modify it for you case in any way you see fit. It's an optimization specific to the adapter. Adapter should own it.

At a high level there are only three ways of dealing with non-atomic backend operations:

  • Have client issue only such requests which can be performed atomically by the server.
  • Have server report partial success. If the operation executed partially, let the client know which parts were executed and which failed. Let the client clean up the mess. For instance, the client can just reissue the request.
  • Same as above but try to clean up the mess on the server.

from chat.

or-else avatar or-else commented on May 22, 2024

Hi Ryandy,

Yep, hard deleting the message right now cannot be detected by offline users. The right approach is probably not to delete the message but to delete the Content only. I.e. make hard deleting the same as soft-deleting + setting Content to null. What do you think?

from chat.

or-else avatar or-else commented on May 22, 2024

Actually it should be setting Content and Head to null, not just Content.

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

Hmm…

Suppose there is a grpABCDEF which has Last SeqId = 10 & having 3 members: userA, userB, & userC where userA is topic owner. Suppose all the messages has been delivered to all members devices.

In other side, the Tinode client on each user’s device purposely cached the retrieved messages so when user load the topic it could only fetch the latest messages.

Suppose userA hard-delete messages with SeqId = 5 & SeqId = 6 while userB is online & userC is offline. Since it is hard-delete operation, it will be announced to all members of topic.

Since userB is online, he will receive {pres} notification so the client could remove the cache. But since userC is offline, he won’t receive anything so the deleted messages would still appear on userC device while they shouldn't.

So your suggested solution will not work because client simply doesn’t reload the messages from the beginning every time user load the topic. So I think there should be some kind of notification to offline user about the hard-delete action so the client could remove the cache.

If I remember correctly, Tinode used to have notification feature on me topic. So why did this feature removed especially for announcing hard-delete action?

from chat.

or-else avatar or-else commented on May 22, 2024

It will work the same way it works for soft deleted messages.

Imagine that in your scenario userB accesses topic from two different devices (for instance, android phone and web) has soft-deleted the message seq=6 from the web while android was offline. The android will find out that the message is deleted by sending {sub topic= grpABCDEF get=data[ifmodifiedsince=timestamp_of_the_cache]}. Then it will get the list of messages updated after the timestamp, including the soft-deleted message 6, marked as deleted.

I propose to do the same for hard-deleted messages.

from chat.

or-else avatar or-else commented on May 22, 2024

I just checked the code and actually that's how it works right now for messages deleted by list: https://github.com/tinode/chat/blob/master/server/db/rethinkdb/adapter.go#L907
I'm going to add null-ing the Head too.

from chat.

or-else avatar or-else commented on May 22, 2024

Like this: fc51910

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

The android will find out that the message is deleted by sending {sub topic= grpABCDEF get=data[ifmodifiedsince=timestamp_of_the_cache]}

Ah I see, I forgot that Tinode now already support browse options based on time. In my adapter I forgot to implement it so I was still using since & before. Hahahaha…

Btw, I found bug on rethinkdb adapter though. If I send {sub topic=grpABCDEF get=data[after:timestamp_of_the_cache]} (see here), it will return following error:

topic: error loading topics  gorethink: Cannot order by index `Topic_SeqId` after calling BETWEEN on index `Topic_UpdatedAt`. in:
r.DB("tinode").Table("messages").Between(["grpxaV0xERsHTg", {$reql_type$="TIME", epoch_time=1.510110138515e+09, timezone="+00:00"}], ["grpxaV0xERsHTg", r.MaxVal()], index="Topic_UpdatedAt").OrderBy(index=r.Desc("Topic_SeqId")).Limit(24)

Also I think when we delete message, we should also update its UpdatedAt field. Otherwise we won’t be able to get it since the UpdatedAt value still the same like CreatedAt value. Notice that index on messages table is Topic_UpdatedAt not Topic_DeletedAt.

from chat.

or-else avatar or-else commented on May 22, 2024

I added the new time-based options exactly because since and before did not work for deleted messages. I did not want to break old code so I did not remove them.

I'll fix the bugs tomorrow. It should be easy.

from chat.

or-else avatar or-else commented on May 22, 2024

And yes, you are right about UpdatedAt. I'll fix that too.

from chat.

or-else avatar or-else commented on May 22, 2024

Here is another idea.

Message SeqId can be viewed as a mutation number. A topic has a set of messages. When the set is mutated (changed) by adding a message to it (someone sent {pub}), the SeqId is incremented by 1. Because messages are sent one by one, every message can be assigned a unique ID which is also a mutation index.

What if I do the same with deletions? Deletion is also a mutation after all. In addition to SeqID I will create another Id, say DelId. Any time someone soft- or hard-deletes messages the DelId is incremented by 1 and the delId is assigned to the deleted messages. Then I'll add what="del" (in addition to "desc", "sub" and "data") to {sub} and {get} messages and will report DelId to users. The server will report the list of deleted message IDs in another {meta} message.

I think this is a cleaner and better approach than dealing with UpdatedAt.

What do you think?

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

Hmm…, yeah, I think that’s better approach👍

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

Hmmm..., wait. After some thought, wouldn't it simpler especially from client perspective if we just use UpdatedAt approach?

If we use UpdatedAt approach, client just simply store the last message of the timestamp for next request. On the next request client could just use this stored timestamp to fetch the latest messages including deleted messages. For every messages retrieved, client then just need to check the DeletedAt field. If it is nil, append to cache, but if it's not then delete from cache.

But if we used DelID approach, client also need to keep track of last DelID so it doesn't need to retrieve full list of deleted message ids every time it make request to server.

On the server side, if use UpdatedAt approach we just need to change a bit our adapter & keep existing architecture. But if we use DelID approach, we might change a lot existing architecture, not only just the adapter.

What do you think?

from chat.

or-else avatar or-else commented on May 22, 2024

If we use UpdatedAt approach, client just simply store the last message of the timestamp for next request.

Suppose you want to browse messages back in time. How do you know that there are no more older messages? You would need to know the timestamp of the very first message. Or I would have to tell you somehow "there are no more messages". Right now it's not a problem because the first message has SeqId=1.

On the next request client could just use this stored timestamp to fetch the latest messages including deleted messages

Yes, that's how it supposed to work now. There is a problem though. Once connected, the client requests subscriptions from me. Each subscription has a seq. The client compares that to the stored value. If the one from me is greater than the stored value, that means there are new messages in that subscription. But there is no equivalent way to find out if messages were deleted. So I have to add either UpdatedAt or DelId to subscriptions.

But if we used DelID approach, client also need to keep track of last DelID so it doesn't need to retrieve full list of deleted message ids every time it make request to server.

Yes. Without it it would have to keep track of the UpdatedAt. It's equivalent.

I don't mind implementing UpdatedAt. It's just does not seem as clean, because for some things browsing happens by id, for some by timestamp. It would be nice to use one or the other for everything.

from chat.

or-else avatar or-else commented on May 22, 2024

Regardless of how it's implemented, by DelId or UpdatedAt, what do you think about adding specific command for getting deleted messages, i.e. {get what=del}. The server would respond to that with a single {meta} message instead of a bunch of empty {data} messages?

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

But there is no equivalent way to find out if messages were deleted.

Hoo…, I see…

Yes, you’re right, currently we need to subscribe to the topic first to know whether or not there is a deleted messages.

It's just does not seem as clean, because for some things browsing happens by id, for some by timestamp.

Hmm…, yeah you’re right. It is much cleaner to use DelId.

what do you think about adding specific command for getting deleted messages, i.e. {get what=del}

Yup, I think that’s a great idea.

Then that means in MessagesGetAll() we need to only return messages which hasn’t been deleted, right? Since it will be redundant.

from chat.

or-else avatar or-else commented on May 22, 2024

Then that means in MessagesGetAll()

Correct. Plus another, similar method will be needed to return IDs of deleted messages only.

from chat.

or-else avatar or-else commented on May 22, 2024

Please take a look #44

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

Hello, Gene

Yeah, I think it's great (y)

But I think it would be nice if user could also perform delete messages by using clear beside range though. For example imagine following scenario:

  1. User A have a topic named topic X which has large number of messages, for example 300.
  2. User A want to clear messages in topic X.
  3. Since in current version user could only delete messages by using range, essentially server will delete messages with seq id 1 until 300, one by one.
  4. But if user could also delete messages by using clear, it just need to set the clear value on User A's subscription to topic X.

I don't know whether or not step 3 on RethinkDB is atomic, but on DynamoDB which update operation is restricted to only 1 update per request, it is really troublesome. Because if the error happens in the middle of delete process (for example when server processing message with seq id 150), then the changes which have been made to the messages so far would not be able to be reverted.

So I think it would be nice if user could also delete messages by using clear.

What do you think, Gene?

from chat.

riandyrn avatar riandyrn commented on May 22, 2024

Thanks a lot for the suggestions, Gene👍

These are really awesome ideas😎👍

from chat.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.