Coder Social home page Coder Social logo

Inconsistent State in MVCC about hyrise-v1 HOT 14 CLOSED

hyrise avatar hyrise commented on July 16, 2024
Inconsistent State in MVCC

from hyrise-v1.

Comments (14)

bastih avatar bastih commented on July 16, 2024

Maybe we need to keep track of creation and expiration CIDs (which is for example what postgres mvcc does). (see p 58ff http://momjian.us/main/writings/pgsql/internalpics.pdf)

from hyrise-v1.

grundprinzip avatar grundprinzip commented on July 16, 2024

This is true. Anyone volunteers to replace the valid column with a creation vector and update the validation methods?

from hyrise-v1.

bastih avatar bastih commented on July 16, 2024

I'll take care of it.

from hyrise-v1.

mrks avatar mrks commented on July 16, 2024

The Postgres approach provides fewer transactional guarantees though: "The above assumes the default
read committed isolation level". Currently (ignoring the problem above), we guarantee serializable transactions. As far as I understand it, the Postgres approach would introduce lost updates, phantom reads, and non-repeatable reads.

from hyrise-v1.

grundprinzip avatar grundprinzip commented on July 16, 2024

Basically, we are not changing much: The CID becomes the end timestamp and we add a creation timestamp. The creation timestamp is written once together with the CID of the insertion commit. Validity rules remain as there are with the addition of the creation check.

A record with an end CID is not valid
A record with a creation CID is valid
A record with no CIDs and a TID is currently written
...

David has a state table for that I think...

from hyrise-v1.

bastih avatar bastih commented on July 16, 2024

do we actually need the valid bit then?

from hyrise-v1.

grundprinzip avatar grundprinzip commented on July 16, 2024

nope...

from hyrise-v1.

schwald avatar schwald commented on July 16, 2024

Ok, so if I get this right we would have the following for each row: TID, BeginCID, EndCID.
When writing the following states should occur (ignoring inconsistent states):

v.TID v.BeginCID v.EndCID LastCID Status
-- Inf Inf 12 Init
5 Inf Inf 12 Insert uncommitted
5 13 Inf 12 Insert commit in prog.
5 13 Inf 13 Insert committed
-- 13 Inf 13 Insert committed (cleaned up)
-- 13 14 13 delete commit in prog.
-- 13 14 14 delete committed

For the version visibility this results in:

v.TID == tx.TID tx.LastCID >= v.BeginCID tx.LastCID >= v.EndCID Keep? Comment
yes yes yes No Impossible
no yes yes No Past Delete
yes no yes No Impossible
(yes) yes no No Own Delete (delete list in TM)
no no yes No Impossible
yes no no Yes Own Insert
no yes no Yes Past Insert, Future Delete
no no no No Uncommitted Insert, Future Insert

Anything I missed?

from hyrise-v1.

mrks avatar mrks commented on July 16, 2024

We had a discussion about the implementation of deletes yesterday:

Having the delete list in the transaction manager might turn into a performance issue, depending on how many deletes we have. If ValidatePositions has to check for the existence of a deletion entry in the delete list, the best we can achieve O(log n).

The other idea would be to use some kind of delete lock, which could be implemented by reusing the TID. As you can see in the first table above, we do not need the TID field any more as soon as BeginCID is set. What we could do is to add a "delete uncommitted" state with the TID set to the TID of the deleting transaction and leaving the other fields untouched.

This might cause live locks when two transactions try to delete two rows in a different orders and always have to get aborted. However, this case does not violate the ACID criteria. Also, it can happen in other databases, like in MySQL.

from hyrise-v1.

grundprinzip avatar grundprinzip commented on July 16, 2024

Just my two cents here: validate positions will always preform a sequential scan over the deleted positions thus this is not o(LOG N). During the scan it will evaluate of it can commit.

However this does not solve the serialization of commits if they're executed in parallel as the is no possibility to mark a row as delete pending.

A way to mark rows as pending would be great and we can use the tid field for that.

If tid and begin cid is set it's a pending delete while a tid and no begin cid is a insert.

Open question : how to delete your own writes? How to handle abort and rollback of transactions correctly.

from hyrise-v1.

schwald avatar schwald commented on July 16, 2024

Regarding the decision if a transaction can commit or if it detects conflicts, we clearly need the list of deleted positions inside the TM.

However, I think the problem Markus refers to is a different one. If we do not write a TID in a deleted row and only remember the deleted position in a list in the TM, how does a transaction detect its own deletes. If it reads a table and has a list of positions, the ValidatePosScan has to detect all the own deletes.

If both lists were sorted, this could be done easily by iterating through the sorted lists in parallel. However, the delete list might not be in sort order but could be sorted before the ValidatePosScan. Can we assume that the list of positions that will be validated is sorted?

from hyrise-v1.

schwald avatar schwald commented on July 16, 2024

I think the issue of parallelizing commits should be a different discussion. See #138.

from hyrise-v1.

grundprinzip avatar grundprinzip commented on July 16, 2024

Regarding the sort order of the position lists, we cannot assume any ordering. So it's true we need something else and the proposed way of using the tid column is fine for me.

from hyrise-v1.

schwald avatar schwald commented on July 16, 2024

Ok, so let's assume we write the TID into a row when deleting the row in a transaction. So we have the following status combinations for a row:

v.TID v.BeginCID v.EndCID LastCID Status
-- Inf Inf 12 Init
5 Inf Inf 12 Insert uncommitted
5 13 Inf 12 Insert commit in prog.
5 13 Inf 13 Insert committed
-- 13 Inf 13 Insert committed (cleaned up)
6 13 Inf 13 delete uncommited
6 13 14 13 delete commit in prog.
6 13 14 14 delete committed

If we consider a set TID in a row as an exclusive lock, can the step between 'insert committed' and 'insert committed cleaned up' make any trouble? So we could have rows that are already committed but still locked. Do you see any problem here?

from hyrise-v1.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.