This issue has been solved in the WordPress/sqlite-database-integration plugin, which is available in WordPress Plugin Directory. I'll leave it open here for posterity.
The problem
A lot of WordPress unit tests fail with the SQLite plugin is installed.
The root cause is the method of rewriting MySQL queries to SQLite queries: regular expressions and string lookups. It's hard to support the variety of SQL queries used in WordPress using these tools. I explored patching this plugin in WordPress Playground, but there's too many cases to cover โ I included a few examples of failing queries at the bottom of this description.
Proposed solution
Parsing MySQL queries and using contextual information would make this plugin much more bulletproof.
Some queries are hard to reason about with tools like preg_replace
or strpos
:
-- Valid SQL inside quotes
INSERT INTO wp_options VALUES ('ON DUPLICATE KEY UPDATE')
-- Four "datetime" occurrences, each with a different meaning:
CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )
-- Comma inside a string value derails one of the explode(",") calls
UPDATE wp_options SET option_value=','
In contrast, applying the SQL grammar would make rewriting these queries much easier:
// 'ON DUPLICATE KEY' can't be mistaken for SQL keyword:
[
'INSERT INTO' => [
'table' => 'wp_options',
'values' => ['ON DUPLICATE KEY UPDATE']
]
]
// It's clear how to treat each part of the query:
[
'CREATE TABLE' => [
'name' => 'test',
'fields' => [
'datetime' => [
'type' => 'datetime',
'notnull' => true
]
],
'keys' => [
'datetime' => ['datetime']
]
]
]
// There's no risk of mistaking the inserted "," for a field separator:
[
'UPDATE' => [
'table' => 'wp_options',
'set' => [
'option_value' => ','
]
]
]
How to parse SQL queries?
Full parsing may not be needed. Even the array representation from the examples above may be too much. Creating many objects, arrays, and function calls for each SQL query would make WordPress extremely slow. There's a faster alternative.
Consider WP_HTML_Tag_Processor, a new WordPress API that enables adding, removing, and updating HTML attributes. It never turns the input HTML into a tree. Instead, it consumes the input HTML string one tag and one attribute at a time, and uses the contextual information to build the output HTML:
$w = new WP_HTML_Tag_Processor( '<a href="#" class="button"></a>' );
$w->next_tag( 'a' );
$w->remove_attribute( 'href' );
echo $w->get_updated_html();
// Outputs: <a class="button"></a>
// No tree was constructed. Instead, the processor used the HTML parsing rules
// to find the start and end of href="#", and then excluded that attribute from
// the output HTML
Perhaps the same technique could be used to rewrite MySQL queries?
For example:
$mysql = "CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )";
$sqlite = '';
$at = 0;
$max = strlen($sql);
while($at < $max) {
$last_at = $at;
// Find the first character that's not " ", "\t", or "("
$at += strcspn($mysql, "\t (", $at);
// Find out what token we're looking at now:
$token = consume_token($mysql, $at);
if ( $token === 'CREATE TABLE' ) {
// If it's a CREATE TABLE query, append the correct SQLite
// command to the output string:
$sqlite .= 'CREATE TABLE ';
// Expect the table name to at the first non-whitespace character
$at += strcspn($mysql, " \t", $at);
$table_name_starts = $at;
// Skip backticks
if($query[$at] === "`") {
++$at;
}
// Expect the table name to consist of alnum
$at += strspn($mysql, "abcdefghijklmnopqrstuwxyzABCDEFGHIJKLMNOPQRSTUWXYZ0123456789", $at);
// Skip backticks
if($query[$at] === "`") {
++$at;
}
$table_name_ends = $at;
$table_name = substr($mysql, $table_name_starts, $table_name_ends - $table_name_starts );
if ( !$table_name ) {
return false; // Syntax error
}
// Append a quoted table name to the output query
$sqlite .= '"' . $table_name . '"';
// Find and skip the '('
$at += strcspn($mysql, '(', $at) + 1;
// Consume the field name and definition
// ...
} else if($token === 'SELECT' ) {
// Consume a SELECT query
} else if // ...
}
// Consume CREATE
Too low-level? Here's a higher-level idea:
preg_match(
'^(INSERT|CREATE|UPDATE|SELECT)\s+',
$query,
&$matches,
PREG_OFFSET_CAPTURE,
$at
);
Use preg_match
to capture the next token, but nothing else. The parser would still move forward a single token at a time and make decisions based on what it just captured. The idea is to never write a regexp for the entire query. It's more readable than the lower-level counterparts, still reasonably fast, and only treats one token at a time. Also โ if it works, it could be replaced by a lower-level implementation relatively easily.
While I am proposing a major change, I specifically don't mean a full rewrite. This plugin got a lot of things right:
- Data types are mapped
- Different types of queries are recognized and treated separately
- When 1<=>1 translation isn't possible, this plugin translates a MySQL query to a series of SQLite queries
The existing work is an excellent foundation and I only propose to take it to the next level.
Finally, thank you for maintaining this plugin @aaemnnosttv! It made WordPress Playground possible.
Failing queries
Here's a few examples of queries that didn't work for me at one point or another:
DELETE a, b FROM wp_options a, wp_options b
WHERE a.option_name LIKE '_transient_%'
AND a.option_name NOT LIKE '_transient_timeout_%'
AND b.option_name = CONCAT( '_transient_timeout_', SUBSTRING( a.option_name, 12 ) )
AND b.option_value < 1675613957
CREATE TABLE wp_wc_download_log (
download_log_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
timestamp datetime NOT NULL,
permission_id BIGINT UNSIGNED NOT NULL,
user_id BIGINT UNSIGNED NULL,
user_ip_address VARCHAR(100) NULL DEFAULT '',
PRIMARY KEY (download_log_id),
KEY permission_id (permission_id),
KEY timestamp (timestamp)
CREATE TABLE test ( timestamp datetime NOT NULL, KEY timestamp ( timestamp ) )
ALTER TABLE wp_wc_product_download_directories CHANGE COLUMN `url` url varchar(256) NOT NULL
ALTER TABLE wp_woocommerce_downloadable_product_permissions ADD PRIMARY KEY (`permission_id`)
cc @aristath @felixarntz @dmsnell @gziolo