Comments (8)
The errors from those methods should be repeated by anything that is called on the returned Stmt after prepare. As those errors are properly annotated with source, it should be safe to return them. I'll make a change to do that.
If you're seeing SQLITE_INTERRUPT, that almost certainly means the context associated with your connection has been cancelled.
from sqlite.
> If you're seeing SQLITE_INTERRUPT, that almost certainly means the context associated with your connection has been cancelled.
I made sure it wasn't that! See https://github.com/fasterthanlime/chao and this long-ass twitter thread.
tl;dr I was seeing a lot of SQLITE_INTERRUPT before the context was cancelled; I'm now pretty sure what I'm seeing is a duplicate of #17.
Could you add a printf here and see if you're hitting this code path? (Line 242 in 3f55bce.)
From what you're describing I don't think that's the problem, but after staring at it for a while I think that code path is problematic.
Ah I see from your twitter thread that the SQLITE_INTERRUPT is coming from the sqlite3_prepare call. Hmm.
One thought is these sqlite docs:
> Any new SQL statements that are started after the sqlite3_interrupt() call and before the running statement count reaches zero are interrupted as if they had been running prior to the sqlite3_interrupt() call.
This makes it sound like the problem is that your connection has an in-progress Stmt somewhere, which is keeping the underlying connection interrupted when you go to prep another statement.
One way to test this: when you go to reset the context on the connection, do it by calling dbpool.Put and then dbpool.Get again. The new code I just added will catch the problematic in-progress statement when you call Put and panic, which will tell you where it is.
(I should do the same thing when someone calls SetInterrupt.)
> Ah I see from your twitter thread that the SQLITE_INTERRUPT is coming from the sqlite3_prepare call. Hmm.
At least it did at some point! I've also seen SQLITE_INTERRUPT returned from _step(), from _reset(), from _clearbinding(), etc.
> Any new SQL statements that are started after the sqlite3_interrupt() call and before the running statement count reaches zero are interrupted as if they had been running prior to the sqlite3_interrupt() call.
Yep, that + the sqlite_stmt_busy docs made me think that that's what was happening - cycling through all cached statements and resetting them on dbpool.Get seems to have fixed that problem.
...but then I ran into other issues, like SetInterrupt hanging forever on a cancelCh receive (in the goroutine, not in cancelInterrupt), and other times I was just getting consistent SQLITE_LOCKED out of both connections of the pool. I don't think I can invest any more time in exploring this for now, sorry 😞
Update:
I've updated 🔥 https://github.com/fasterthanlime/chao so it:
- Has a Makefile that builds it without optimizations & inlining
  - This makes it easy to attach https://github.com/derekparker/delve to it
  - ...or to record it with https://github.com/mozilla/rr and replay it with delve
- Does not depend on any of my libraries (just go-xorm/builder)
What chao does
chao attempts to flirt with the limit of how many INSERT ON CONFLICT DO UPDATE SET statements (upserts) it can run in memory within a certain deadline.
The goal is to have part of the queries succeed and part of the queries be interrupted, hopefully at various stages:
- It'll never interrupt in Prepare() because all the statements are cached
- It sometimes gets interrupted in pool.Get() - but there's retry logic built in
- It sometimes gets interrupted in Stmt.Reset()
- It sometimes gets interrupted in Stmt.Step()
- etc.
It tries to use a small (but > 2) connection pool (so it's easy to investigate in dlv) and a small number of workers (so the logs aren't too cluttered), and varies the workload:
- If the query succeeds too early, it increases the number of updates
- If the query is interrupted, it decreases the number of updates
It runs several "rounds" of heavy querying, each separated by a few seconds of "recess".
After all rounds are done, it runs a very simple query on each connection of the pool, with a 2-second timeout. If this query succeeds, the connection is deemed "healthy".
Without the workaround
Without the workaround, a lot of queries get interrupted, because we intentionally try to push the workload until we can't complete it within the deadline anymore:
(-) numUpdates = 4605
0 ...- [0xc4200a2780] succeeded 97.696724ms early
(+) numUpdates = 5526
0 .... [0xc4200a2320] interrupted 172.563µs after deadline (173.172563ms duration)
(-) numUpdates = 4973
0 ...- [0xc4200a2780] succeeded 140.473827ms early
(+) numUpdates = 5967
0 .... [0xc4200a2640] interrupted 263.001µs after deadline (111.263001ms duration)
(-) numUpdates = 5370
0 .... [0xc4200a25a0] interrupted 316.853µs after deadline (136.316853ms duration)
(-) numUpdates = 4833
0 .... [0xc4200a2460] interrupted 387.727µs after deadline (199.387727ms duration)
(-) numUpdates = 4349
0 .... [0xc4200a2500] interrupted 264.914µs after deadline (180.264914ms duration)
(-) numUpdates = 3914
0 .... [0xc4200a26e0] interrupted 400.294µs after deadline (125.400294ms duration)
0 ...- [0xc4200a2780] succeeded 133.105419ms early
(+) numUpdates = 4696
0 .... [0xc4200a2190] interrupted 368.89µs after deadline (147.36889ms duration)
(-) numUpdates = 4226
0 .... [0xc4200a23c0] interrupted 364.055µs after deadline (125.364055ms duration)
(-) numUpdates = 3803
0 .... [0xc4200a2320] interrupted 438.773µs after deadline (129.438773ms duration)
0 .... [0xc4200a2280] interrupted 299.835µs after deadline (168.299835ms duration)
At the final connection health check, I'm usually seeing "1 / 10 healthy conns":
Now testing connections...
Testing connection 0...
[0] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 1...
[1] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 2...
[2] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 3...
[3] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 4...
[4] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 5...
Testing connection 6...
[6] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 7...
[7] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 8...
[8] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Testing connection 9...
[9] sqlite.Exec: Stmt.Step: SQLITE_INTERRUPT (SELECT * FROM thieves LIMIT 1)
Found 1 / 10 healthy conns
:(
With the workaround
The workaround is to just... not set a deadline, i.e. never cancel the connection's context.
Of course, that's not doing the same thing: it effectively robs us of the ability to cancel requests early (for my use case, that's ok).
As a result, all the queries succeed (the only failure condition is us not waiting on the pool long enough):
5 .... [0x0] pool timeout (105.102637ms / 105ms)
5 .... [0x0] pool timeout (105.156695ms / 105ms)
3 .... [0x0] pool timeout (129.141502ms / 129ms)
5 .... [0x0] pool timeout (161.115876ms / 161ms)
5 ...+ [0xc4200a2460] succeeded 223.406454ms late
0 ...+ [0xc4200a2280] succeeded 64.341857ms late
1 ...+ [0xc4200a2280] succeeded 120.187759ms late
7 ...+ [0xc4200a2460] succeeded 145.737485ms late
5 .... [0x0] pool timeout (122.159863ms / 122ms)
3 .... [0x0] pool timeout (143.103395ms / 143ms)
8 .... [0x0] pool timeout (176.092786ms / 176ms)
1 ...+ [0xc4200a25a0] succeeded 462.855671ms late
10 ...+ [0xc4200a2640] succeeded 206.411653ms late
1 ...+ [0xc4200a2640] succeeded 162.279826ms late
6 ...+ [0xc4200a2780] succeeded 213.400416ms late
0 ...+ [0xc4200a2780] succeeded 101.938924ms late
10 ...+ [0xc4200a2320] succeeded 228.328918ms late
7 ...+ [0xc4200a2500] succeeded 238.080748ms late
5 ...+ [0xc4200a2190] succeeded 242.975096ms late
7 ...+ [0xc4200a2500] succeeded 241.771349ms late
15 ...+ [0xc4200a2780] succeeded 116.676525ms late
9 ...+ [0xc4200a2500] succeeded 239.820994ms late
18 ...+ [0xc4200a2460] succeeded 201.173352ms late
1 ...+ [0xc4200a26e0] succeeded 187.169667ms late
1 ...+ [0xc4200a2320] succeeded 232.396621ms late
18 ...+ [0xc4200a2640] succeeded 226.160701ms late
8 ...+ [0xc4200a2190] succeeded 210.415338ms late
24 ...+ [0xc4200a2320] succeeded 136.373109ms late
9 ...+ [0xc4200a23c0] succeeded 153.115875ms late
1 ...+ [0xc4200a2280] succeeded 224.001045ms late
0 ...+ [0xc4200a2190] succeeded 101.808727ms late
10 ...+ [0xc4200a26e0] succeeded 273.496389ms late
1 ...+ [0xc4200a2320] succeeded 176.150518ms late
All done in 7.47433119s
Sleeping 2 seconds to let SQLite settle...
At the end, all connections are healthy:
Now testing connections...
Testing connection 0...
Testing connection 1...
Testing connection 2...
Testing connection 3...
Testing connection 4...
Testing connection 5...
Testing connection 6...
Testing connection 7...
Testing connection 8...
Testing connection 9...
Found 10 / 10 healthy conns
:)
What could cause SQLite to return SQLITE_INTERRUPT?
The main suspect (as far as I'm concerned) is this:
> Any new SQL statements that are started after the sqlite3_interrupt() call and before the running statement count reaches zero are interrupted as if they had been running prior to the sqlite3_interrupt() call.
What does SQLite consider a "running statement"?
The sqlite3_stmt_busy(S) interface returns true (non-zero) if the prepared statement S has been stepped at least once using sqlite3_step(S) but has neither run to completion (returned SQLITE_DONE from sqlite3_step(S)) nor been reset using sqlite3_reset(S). The sqlite3_stmt_busy(S) interface returns false if S is a NULL pointer. If S is not a NULL pointer and is not a pointer to a valid prepared statement object, then the behavior is undefined and probably undesirable.
This interface can be used in combination with sqlite3_next_stmt() to locate all prepared statements associated with a database connection that are in need of being reset. This can be used, for example, in diagnostic routines to search for prepared statements that are holding a transaction open.
What about the pool.Put check?
I don't think the check is strong enough. Consider this:
Lines 501 to 534 in 3f55bce
In the codepath where we get SQLITE_INTERRUPT from wait_for_unlock_notify:
- we return false for rowReturned
- lastHasRow is set to false
- ...even if a previous Step() call did return a row
The same thing happens (I think) when stmt.interrupted returns a non-nil error.
What doesn't fix this?
Things I've tried:
- Resetting all cached statements (conn.stmts) on Pool.Put
- Resetting all cached statements (conn.stmts) on Pool.Get
- defer stmt.Reset() in exec
  - ...although I suspect this doesn't work because stmt.Reset() never calls sqlite3_reset if the conn is interrupted by doneCh
What might fix this?
As I'm writing this (I haven't tried it yet), I suspect something along the lines of:
```go
defer func() {
	if rowReturned {
		stmt.lastHasRow = true
	}
}()
```
...might fix it. I'll try it now, but I did want to leave my findings somewhere.
How do I run chao myself?
- Make sure you have the latest crawshaw/sqlite in your $GOPATH
- Review the code to make sure I didn't sneak in any harmful stuff
  - Heck, run it in a VM
- Review the Makefile
  - It runs go get -v -x with gcflags to disable inlining and optimization
- Run make
- Run chao
You might want to run rr chao to capture an execution trace instead. Mozilla's rr is Linux-only afaict, and your distro probably has outdated packages for it:
- https://github.com/mozilla/rr
- Note that I've seen rr chao get stuck for seemingly no reason
- You won't be able to dlv attach to an instance of chao already being recorded by rr
Then:
- Study vars.go
- Enable the workaround
- Run make and chao again
- All connections should be healthy at the end
If you want to debug:
- Run dlv replay ~/.local/share/rr/chao-XXX if you captured an execution trace (recommended)
or:
- Enable facilitateDebugging in vars.go
- Rebuild (make)
  - chao will print a dlv command you can copy-paste, with the right PID
- Run the command (something like dlv attach 999)
- Weep whenever delve says it can't read part of the memory
I was able to debug with delve on both Windows 10 64-bit & Linux 64-bit, but I wasn't able to gather any surprising information.
I found what I believe to be the two root causes so I opened #20 and #21 which are more palatable.