Comments (12)
Comment by agiardullo
Thursday Jun 18, 2015 at 03:39 GMT
-
Yes, I still need to add iterator support. A hacky workaround would be to temporary re-read keys that you iterate through until iterator support is present. (Fyi, we do not have plans to add gap-locking support right now.)
-
From discussions with Yoshinori, I was under the impression that we did not need to support partial rollback of a transaction. But this is do-able if needed. How critical is this support in the short-term?
-
Correct. We discussed not supporting shared locks in v1 and will likely add support for this in the future. The initial plan was to have lock in shared mode simply take an exclusive lock in the meantime.
from mysql-5.6.
Comment by spetrunia
Tuesday Jun 23, 2015 at 17:38 GMT
- From discussions with Yoshinori, I was under the impression that we did not need to support partial rollback of a transaction. But this is do-able if needed. How critical is this support in the short-term?
In order to have SQL semantics, one has to be able to rollback a failed statement. The statement may be a part of a transaction, in that case the statement must be rolled back, but the transaction must remain open (i.e. neither commit nor rollback).
If we only use Pessimistic Transaction API for locking, we can achieve correct behavior by simply not releasing failed statement's locks util the transaction has either committed or rolled back. This will limit concurrency but is a correct behavior.
If we use Pessimistic Transaction API to also hold transaction's changes, then we need to be able roll back: If a statement inside a transaction changes a few rows and then fails, these changes must not be visible by the rest of the transaction.
from mysql-5.6.
Comment by spetrunia
Tuesday Jul 14, 2015 at 10:45 GMT
I've read the new Pessimistic Transactions patch (posted at https://reviews.facebook.net/D40869 ). It looks like the patch provides everything that MyRocks needs.
from mysql-5.6.
Comment by spetrunia
Friday Jul 24, 2015 at 11:02 GMT
Getting close to getting something to work, found one thing that I forgot about.
In the current system, row lock waits are integrated into MySQL. SHOW PROCESSIST
shows waits as:
+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+
| Id | User | Host | db | Command | Time | State | Info | Rows examined | Rows sent | Tid |
+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+
| 1 | root | localhost | j8 | Sleep | 87 | | NULL | 0 | 0 | 9827 |
| 3 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 | 9907 |
| 4 | root | localhost | j8 | Query | 9 | Waiting for row lock | insert into t1 values (1,'one-one') | 0 | 0 | 9830 |
+----+------+-----------+------+---------+------+----------------------+-------------------------------------+---------------+-----------+------+
and one can use KILL QUERY
to immediately terminate the query that is waiting for the lock.
With my new code that uses pessimistic RocksDB trx API, we have state=Updating and the thread is not KILLable:
+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+
| Id | User | Host | db | Command | Time | State | Info | Rows examined | Rows sent | Tid |
+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+
| 18 | root | localhost | j5 | Query | 0 | cleaning up | show processlist | 0 | 0 | 5583 |
| 19 | root | localhost | j5 | Sleep | 19 | | NULL | 1 | 0 | 5584 |
| 20 | root | localhost | j5 | Query | 15 | updating | update test set value = 12 where id = 1 | 0 | 0 | 7194 |
+----+------+-----------+------+---------+------+-------------+-----------------------------------------+---------------+-----------+------+
In order to achieve state="Waiting for row lock" and KILLability, MyRocks does the following in storage/rocksdb/rdb_locks.cc
:
thd= current_thd;
thd_enter_cond(thd, &found_lock->cond, &found_lock->mutex,
&stage_waiting_on_row_lock, &old_stage);
...
res= mysql_cond_timedwait(&found_lock->cond, &found_lock->mutex,
&wait_timeout);
// check if there was SQL KILL command:
killed= thd_killed(thd);
thd_exit_cond(current_thd, &old_stage);
That is, we use MySQL's wrappers over pthread_cond_t and pthread_mutex_t, mysql's wait function, and also we inform MySQL that we start/finish waiting.
Possible ways out:
- Make "Pessimistic Transaction API" support all of the above (i.e. user-provided wait condition, mutex, and wait function)
- Forget about all this for now.
from mysql-5.6.
Comment by spetrunia
Friday Jul 24, 2015 at 18:13 GMT
Consider two cases:
Case 1.
trx1> txn1= txn_db->BeginTransaction(...);
trx2> txn2= txn_db->BeginTransaction(...);
trx1> txn1->Put($key, $value1); // This gets a lock
trx2> txn2->Put($key, $value2); // This waits for the lock
The last call times out and returns:
(gdb) p s
$52 = {code_ = rocksdb::Status::kBusy, state_ = 0x0}
I have made MyRocks to return ER_LOCK_WAIT_TIMEOUT to SQL layer in this case.
Case 2:
trx2> txn2= txn_db->BeginTransaction(...);
trx1> txn1= txn_db->BeginTransaction(...);
trx1> txn1->Put($key, $value1);
trx1> txn1->Commit();
trx2> txn2->Put($key, $value2);
The returned value is the same as in Case 1:
(gdb) print s
$54 = {code_ = rocksdb::Status::kBusy, state_ = 0x0}
But the situation is different. In the Case 1, it makes sense to wait more. In Case 2, increasing wait timeout or retrying won't help.
The return value is the same, though, so MyRocks returns ER_LOCK_TIMEOUT in Case 2, too. For the SQL user, the error message is misleading.
I am not sure how big of a problem this is.
from mysql-5.6.
Comment by agiardullo
Friday Jul 24, 2015 at 22:43 GMT
This is good feedback. About distinguishing error cases, I can change the api to return a different status in each case. I believe there are 4 interesting cases:
Case 1 above: Transaction timed out waiting to acquire a lock. In this case, it is possible that having a longer timeout could have succeeded (but we dont know since we never got the lock).
Case 2 above: Detected a write conflict.
Case 3: Transaction has an expiration time set and is expired.
Case 4: We don't have enough memtable history to determine whether there are any conflicts (User could then choose to tune max_write_buffer_number_to_maintain).
Should we have a different Status for each of these 4 cases? How about:
case1: Status::LockWaitTimeout (new status code)
case2: Status::Busy
case3: Status::TimedOut (there is already precedent in rockdsb for using TimedOut to mean 'expired').
case4: Status::TryAgain (new status code)
from mysql-5.6.
Comment by agiardullo
Friday Jul 24, 2015 at 22:49 GMT
Re MySql locking: I think we could come up with an api that lets you override what mutex/condvar is used to do locking. But I will need to look into mysql a bit and think about this some more.
from mysql-5.6.
Comment by agiardullo
Thursday Aug 06, 2015 at 03:10 GMT
Sergei, how about we chat about myrocks locking api? I just sent you a fb msg.
from mysql-5.6.
Comment by spetrunia
Tuesday Sep 08, 2015 at 10:50 GMT
Got an interesting problem while rebasing the patch over the current tree.
rocksdb.rocksdb test started to fail with an error like this:
MySQL [test]> create table t1 (
-> id int not null,
-> key1 int,
-> PRIMARY KEY (id),
-> index (key1) comment '$per_index_cf'
-> ) engine=rocksdb;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> insert into t1 values (1,1);
Query OK, 1 row affected (0.00 sec)
MySQL [test]> select * from t1;
+----+------+
| id | key1 |
+----+------+
| 1 | 1 |
+----+------+
1 row in set (0.00 sec)
MySQL [test]> alter table t1 add col2 int;
ERROR 1815 (HY000): Internal error: Operation failed. Try again.: Transaction
could not check for conflicts for opearation at SequenceNumber 9 as the
MemTable only contains changes newer than SequenceNumber 10. Increasing the
value of the max_write_buffer_number_to_maintain option could reduce the
frequency of this er
while the expected error was:
ERROR 42000: This version of MySQL doesn't yet support 'ALTER TABLE on table with per-index CF'
The new error is caused by this scenario:
- ALTER TABLE starts execution
- ha_rocksdb::external_lock() gets a snapshot with current sequence number (it's 9)
- The code in Sequence_generator::get_and_update_next_number
makes a write which increases current sequence number to 10 (NEW-WRITE) - A new column family is created for $per_index_cf. As part of that, a MemTable
is created. The MemTable holds changes that start from sequence_no=10. - We attempt to write a row into the new column family. The tranaction's
snapshot has SequenceNumber=9, while MemTable has data starting from
SequenceNumber=10.(SEQ-CMP) We get an error.
The problem is not observable on the current repository, because old-style
locking doesn't care about Sequence Numbers.
The problem was not observable before the rebase, because the write (NEW-WRITE) didn't
happen, and comparison (SEQ-CMP) compared two equal sequence numbers.
from mysql-5.6.
Comment by spetrunia
Tuesday Sep 08, 2015 at 12:17 GMT
- Tried to disallow creation of "temporary" tables (temporary here means the name is
#sql_nnnn
) with$per_index_cf
column families. There is no clear way to do this, it seems. - The problem only affects
ALTER TABLE
statements.CREATE TABLE ... SELECT
is not affected, because it creates the destination table before it starts the transaction.
from mysql-5.6.
Comment by yoshinorim
Tuesday Sep 08, 2015 at 17:28 GMT
I recently committed https://reviews.facebook.net/D45963 that increments index_id and commits into data dictionary at Sequence_generator::get_and_update_next_number(). The internal begin->commit happens per index creation. I discussed with Herman in the diff --
I think there are two ways -- one is using Transaction API and doing begin -> select for update -> update -> commit. the other is what Herman suggested -- begin->update->commit at get_next_number(). I think the latter is easier and fine. Performance is not a concern since get_next_number() is called at index creation (DDL) only.
Maybe it's better to switch to select for update -> update at Sequence_generator::get_and_update_next_number(), within a transaction created at ha_rocksdb::create_key_defs(), then commit altogether?
from mysql-5.6.
Comment by hermanlee
Tuesday Sep 08, 2015 at 17:51 GMT
The select for update on the sequence_number would block other create table requests until the transaction is committed? If we're performing a restore of multiple databases where tables are created in parallel, could one of the table creates fail due to timing out on the select for update?
from mysql-5.6.
Related Issues (20)
- How to achieve ~20% replication throughput improvement using Read Free Replication (RFR) feature
- Keep long-running MyRocks mtr tests in their own suite HOT 1
- MyRocks 8.0.28 has poor performance of primary key range query HOT 1
- create secondary index needs attention
- issue while install HOT 1
- Error during create secondary index
- Update the Build Steps page HOT 1
- MyRocks engine should respect WITH_UNIT_TESTS
- Allow users to manually set the number of block cache shards HOT 1
- Provide a counter to show pending compaction bytes for RocksDB
- MyRocks does commit step non-durably under server group 2PC protocol HOT 1
- alter talbe add index optimization
- Range lock support HOT 2
- Cached RocksDB transaction object accessed after delete by XA COMMIT
- optimize table has no effect on HIDDEN_PK table
- Rdb_iterator_base::next_with_direction: too many compares for eof check HOT 2
- Determin secondary index value emptiness by datadic
- undefined reference to `sgemm_' When compiling vector DB
- -DWITH_LZ4=system doesn't find the lz4 library on the fb-mysql-8.0.28 branch
- is_thd_db_read_only_by_name CPU overhead can be reduce
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mysql-5.6.