Comments (4)
Comment by yoshinorim
Tuesday Sep 22, 2015 at 16:46 GMT
Here are the basic design plans. Since a data dictionary format change is needed, it makes sense to add this feature before GA.
- DDL
  - Create a hidden auto_increment internal primary key.
- Dictionary
  - Add a flag indicating whether the hidden pk is used.
- Row format
  - PK: {key_id, hidden_primary_key, other columns}
  - SK: {key_id, sec_key, hidden_primary_key}
- Utility functions
  - has_primary_key()
- CF
  - The hidden pk is always stored in the default CF.
- Insert
  - Get the next value and assign it as the hidden pk value (auto_inc).
- Read
  - memcmp is always done by SK.
- User-defined unique indexes
  - I think we can simply disallow this; normally it is easy to convert a unique key into a primary key, and we don't have any table that has a unique key but no primary key.
- RBR slave
  - If the table has no PK, the slave will use the first applicable index to find the row to update. This is the same as InnoDB+RBR.
- RDBSE_KEYDEF member functions
  - RDBSE_KEYDEF::get_primary_key_tuple()
  - ha_rocksdb::get_row_by_rowid(): get by hidden pk
  - ha_rocksdb::convert_record_from_storage_format()
  - RDBSE_KEYDEF::unpack_record(): if the table uses a hidden pk, exclude the pk from packed_key, then unpack.
  - RDBSE_KEYDEF::pack_record()
from mysql-5.6.
Comment by jkedgar
Tuesday Sep 22, 2015 at 17:09 GMT
By using an auto-increment key you have to store the last (or next) value somewhere and protect it with a mutex when incrementing it, potentially causing a bottleneck in the code. What about using a 64-bit random value instead of an auto-incrementing column? There would be a very small chance of picking a value that is already in use, but if that occurred you would just get a new value during insert.
I'm not sure what our maximum number of rows in the database is, but even at 100 billion rows only about 0.0000005% of the possible 64-bit values would be used, so the chance of a duplicate stays very small.
The pros:
- no mutex during insert
- no special storage for next value
The cons:
- a 64-bit value per row (although this would also be necessary for the auto-increment method if you can't guarantee that the total number of rows inserted is less than about 4 billion).
- very rarely (almost never, but with a nonzero probability) the insert would fail because of a duplicate value and a new primary key would need to be generated randomly.
- testing the code that handled a duplicate primary key on insert would need special consideration (i.e. some way of forcing a duplicate value).
Comment by yoshinorim
Tuesday Sep 22, 2015 at 20:35 GMT
- Tables without primary keys are relatively rare, and they're not heavily accessed, so it's not worth spending much time improving their performance.
- In InnoDB, the next auto-inc value is kept in memory. At startup it is recalculated by reading the table -- max(auto_inc)+1. I think we can take the same approach in MyRocks. It uses no extra storage; it's less efficient at startup, but that's acceptable since performance is not critical here.
closed by https://reviews.facebook.net/D52839