
Comments (31)

simonw commented on June 23, 2024

I need reliable steps to reproduce, then I can bisect and figure out which exact version of Datasette introduced the problem.

I have a hunch that it relates to changes made to the datasette/database.py module, maybe one of these changes here: 0.61...0.63.1#diff-4e20309c969326a0008dc9237f6807f48d55783315fbfc1e7dfa480b550e16f9

simonw commented on June 23, 2024

The good news is that this bug is currently unlikely to affect most users, since named in-memory databases (created using datasette.add_memory_database("airtable_refs") - docs) are a pretty obscure feature, only available to plugins.
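For anyone unfamiliar with the feature, this is roughly how a named in-memory database gets created - a minimal sketch using the documented add_memory_database() API:

from datasette.app import Datasette

ds = Datasette()
# Named in-memory databases live for the lifetime of the Datasette
# instance and are shared by name across connections
db = ds.add_memory_database("airtable_refs")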

simonw commented on June 23, 2024

Just managed to get this exception trace:

    return await self.route_path(scope, receive, send, path)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/app.py", line 1354, in route_path
    response = await view(request, send)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 134, in view
    return await self.dispatch_request(request)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 91, in dispatch_request
    return await handler(request)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 361, in get
    response_or_template_contexts = await self.data(request, **data_kwargs)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 158, in data
    return await self._data_traced(request, default_labels, _next, _size)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 568, in _data_traced
    await gather(execute_facets(), execute_suggested_facets())
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 177, in _gather_parallel
    return await asyncio.gather(*args)
asyncio.exceptions.CancelledError
INFO:     127.0.0.1:64109 - "GET /airtable_refs/airtable_refs?_facet=table_name&table_name=Sessions HTTP/1.1" 500 Internal Server Error
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit
    t.join()
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

simonw commented on June 23, 2024

The bug shows up when I try to add a facet. I think it's caused by the parallel query execution I added to faceting at some point.

http://127.0.0.1:8045/airtable_refs/airtable_refs - no error
http://127.0.0.1:8045/airtable_refs/airtable_refs?_facet=table_name#facet-table_name - hangs the server

Crucial line in the traceback:

    await gather(execute_facets(), execute_suggested_facets())

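For context, gather here is a tiny helper in datasette/views/table.py that switches between parallel and sequential execution depending on the ?_noparallel=1 parameter - a sketch of the pattern, inferred from the _gather_parallel frame in the traceback above:

import asyncio


async def _gather_parallel(*coros):
    # Run the facet queries concurrently on the event loop
    return await asyncio.gather(*coros)


async def _gather_sequential(*coros):
    # The ?_noparallel=1 opt-out: await one coroutine at a time
    results = []
    for coro in coros:
        results.append(await coro)
    return results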

simonw commented on June 23, 2024

That line was added in 942411e, which first shipped in 0.62a0.

simonw commented on June 23, 2024

I wrote this test, but it passes:

import pytest

from datasette.app import Datasette


@pytest.mark.asyncio
async def test_facet_against_in_memory_database():
    ds = Datasette()
    db = ds.add_memory_database("mem")
    await db.execute_write("create table t (id integer primary key, name text)")
    await db.execute_write_many(
        "insert into t (name) values (?)", [["one"], ["one"], ["two"]]
    )
    response1 = await ds.client.get("/mem/t.json")
    assert response1.status_code == 200
    response2 = await ds.client.get("/mem/t.json?_facet=name")
    assert response2.status_code == 200
    assert response2.json() == {
        "ok": True,
        "next": None,
        "facet_results": {
            "results": {
                "name": {
                    "name": "name",
                    "type": "column",
                    "hideable": True,
                    "toggle_url": "/mem/t.json",
                    "results": [
                        {
                            "value": "one",
                            "label": "one",
                            "count": 2,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=one",
                            "selected": False,
                        },
                        {
                            "value": "two",
                            "label": "two",
                            "count": 1,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=two",
                            "selected": False,
                        },
                    ],
                    "truncated": False,
                }
            },
            "timed_out": [],
        },
        "rows": [
            {"id": 1, "name": "one"},
            {"id": 2, "name": "one"},
            {"id": 3, "name": "two"},
        ],
        "truncated": False,
    }
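(This test can be run on its own with something like pytest -k test_facet_against_in_memory_database.)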

simonw commented on June 23, 2024

Landing a version of that test anyway.

simonw commented on June 23, 2024

Maybe it's not related to faceting - I just got it on a hit to http://127.0.0.1:8045/airtable_refs/airtable_refs instead.

simonw commented on June 23, 2024

Sometimes it takes a few clicks for the bug to occur, but it does always seem to involve the in-memory database.

simonw commented on June 23, 2024

I managed to trigger it by loading http://127.0.0.1:8045/airtable_refs/airtable_refs - which worked - and then hitting refresh on that page a bunch of times until it hung.

simonw commented on June 23, 2024

I switched that particular implementation to using an on-disk database instead of an in-memory database and could no longer recreate the bug.

simonw commented on June 23, 2024

OK, I can trigger the bug like this:

datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs

Can I write a bash script that fails (and terminates the process) if it takes longer than X seconds?

simonw commented on June 23, 2024

This worked, including on macOS even though GPT-4 thought timeout would not work there: https://chat.openai.com/share/cc4628e9-5240-4f35-b640-16a9c178b315

#!/bin/bash

# Run the command with a timeout of 5 seconds
timeout 5s datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs

# Check the exit code from timeout
if [ $? -eq 124 ]; then
    echo "Error: Command timed out after 5 seconds."
    exit 1
fi

simonw commented on June 23, 2024

I'm now trying this test script:

#!/bin/bash

port=8064
# Start datasette server in the background and get its PID
datasette pottery2.db -p $port &
server_pid=$!

# Wait for a moment to ensure the server has time to start up
sleep 2

# Initialize counters and parameters
retry_count=0
max_retries=3
success_count=0
path="/airtable_refs/airtable_refs"

# Function to run curl with a timeout
function test_curl {
    # Run the curl command with a timeout of 3 seconds
    timeout 3s curl -s "http://localhost:${port}${path}" > /dev/null
    if [ $? -eq 0 ]; then
        # Curl was successful
        ((success_count++))
    fi
}

# Try three curl requests, retrying up to max_retries times
while [[ $retry_count -lt $max_retries ]]; do
    # Reset the success counter
    success_count=0

    # Run the curls (sequentially - the parallel version is commented out)
    echo "  Running curls"
    test_curl
    test_curl
    test_curl #  & test_curl & test_curl &

    # Wait for all curls to finish (only needed for the parallel version)
    #wait

    # Check the success count
    if [[ $success_count -eq 3 ]]; then
        # All curls succeeded, break out of the loop
        echo "  All curl succeeded"
        break
    fi

    ((retry_count++))
done

# Kill the datasette server
echo "Killing datasette server with PID $server_pid"
kill -9 $server_pid
sleep 2

# Print result
if [[ $success_count -eq 3 ]]; then
    echo "All three curls succeeded."
    exit 0
else
    echo "Error: Not all curls succeeded after $retry_count attempts."
    exit 1
fi

I run it like this:

git bisect reset
git bisect start
git bisect good 0.59.4
git bisect bad 1.0a6
git bisect run ../airtable-export/testit.sh
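git bisect run treats an exit code of 0 as "good", 125 as "skip" and anything else from 1-127 as "bad", which is why testit.sh exits 1 on failure and 0 on success.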

But... it's not having the desired result - I think because the bug is intermittent, so each time I run it the bisect blames a different commit.

simonw commented on June 23, 2024

Output while it is running looks like this:

running  '../airtable-export/testit.sh'
INFO:     Started server process [75649]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
  Running curls
  Running curls
  Running curls
Killing datasette server with PID 75649
../airtable-export/testit.sh: line 54: 75649 Killed: 9               datasette pottery2.db -p $port
Error: Not all curls succeeded after 3 attempts.
Bisecting: 155 revisions left to test after this (roughly 7 steps)
[247e460e08bf823142f7b84058fe44e43626787f] Update beautifulsoup4 requirement (#1703)
running  '../airtable-export/testit.sh'
INFO:     Started server process [75722]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
  Running curls
  Running curls
  Running curls
Killing datasette server with PID 75722
../airtable-export/testit.sh: line 54: 75722 Killed: 9               datasette pottery2.db -p $port
Error: Not all curls succeeded after 3 attempts.
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[3ef47a0896c7e63404a34e465b7160c80eaa571d] Link rel=alternate header for tables and rows
running  '../airtable-export/testit.sh'
INFO:     Started server process [75818]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
  Running curls

simonw commented on June 23, 2024

I tried it with a path of / and everything passed - so it's definitely the path of /airtable_refs/airtable_refs (an in-memory database created by an experimental branch of https://github.com/simonw/airtable-export) that triggers the problem.

simonw commented on June 23, 2024

I knocked it down to 1 retry just to see what happened.

simonw commented on June 23, 2024

Turned out I wasn't running the Datasette checkout from the current directory, so it was not testing what I intended.

Fixed that with pip install -e . in the datasette/ directory.

Now I'm seeing some passes, which look like this:

running  '../airtable-export/testit.sh'
INFO:     Started server process [77810]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
  Running curls
INFO:     127.0.0.1:59439 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
INFO:     127.0.0.1:59440 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
INFO:     127.0.0.1:59441 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
  All curl succeeded
Killing datasette server with PID 77810
../airtable-export/testit.sh: line 54: 77810 Killed: 9               datasette pottery2.db -p $port
All three curls succeeded.
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[7463b051cf8d7f856df5eba9f7aa944183ebabe5] Cosmetic tweaks after blacken-docs, refs #1718
running  '../airtable-export/testit.sh'
INFO:     Started server process [77826]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
  Running curls

simonw commented on June 23, 2024

OK it looks like it found it!

942411ef946e9a34a2094944d3423cddad27efd3 is the first bad commit
commit 942411ef946e9a34a2094944d3423cddad27efd3

Author: Simon Willison <[email protected]>
Date:   Tue Apr 26 15:48:56 2022 -0700

    Execute some TableView queries in parallel
    
    Use ?_noparallel=1 to opt out (undocumented, useful for benchmark comparisons)
    
    Refs #1723, #1715

 datasette/views/table.py | 93 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 67 insertions(+), 26 deletions(-)
bisect found first bad commit

942411e does look like the cause of this problem.

simonw commented on June 23, 2024

I also confirmed that http://127.0.0.1:8064/airtable_refs/airtable_refs?_noparallel=1 does not trigger the bug but http://127.0.0.1:8064/airtable_refs/airtable_refs does.

simonw commented on June 23, 2024

Now that I've confirmed that parallel query execution of the kind introduced in 942411e can cause hangs (presumably some kind of locking issue) against in-memory databases, some options:

  1. Disable parallel execution entirely and rip out the related code.
  2. Disable parallel execution by leaving that code in place but having it always behave as if _noparallel=1 was passed.
  3. Continue digging and try to find some way to avoid this problem.

The parallel execution work is something I was playing with last year in the hope of speeding up Datasette pages like the table page which need to execute a bunch of queries - one for each facet, plus one for each column to see if it should be suggested as a facet.

I wrote about this at the time here: https://simonwillison.net/2022/May/6/weeknotes/

My hope was that despite Python's GIL this optimization would still help, because the SQLite C module releases the GIL once it gets to SQLite.

But... that didn't hold up. It looked like enough work was happening in Python land, under the GIL, that the optimization didn't improve things.

Running the nogil fork of Python DID improve things though! I left the code in partly in the hope that the nogil fork would be accepted into Python core.

... which it now has! But it will still be a year or two before it fully lands: https://discuss.python.org/t/a-steering-council-notice-about-pep-703-making-the-global-interpreter-lock-optional-in-cpython/30474

So I'm not particularly concerned about dropping the parallel execution. If I do drop it, though, do I leave in the potentially complex code that relates to it?

simonw commented on June 23, 2024

Looking again at this code:

def connect(self, write=False):
    if self.memory_name:
        uri = "file:{}?mode=memory&cache=shared".format(self.memory_name)
        conn = sqlite3.connect(
            uri,
            uri=True,
            check_same_thread=False,
        )
        if not write:
            conn.execute("PRAGMA query_only=1")
        return conn
    if self.is_memory:
        return sqlite3.connect(":memory:", uri=True)
    # mode=ro or immutable=1?
    if self.is_mutable:
        qs = "?mode=ro"
        if self.ds.nolock:
            qs += "&nolock=1"
    else:
        qs = "?immutable=1"
    assert not (write and not self.is_mutable)
    if write:
        qs = ""
    if self.mode is not None:
        qs = f"?mode={self.mode}"
    conn = sqlite3.connect(
        f"file:{self.path}{qs}", uri=True, check_same_thread=False
    )
    self._all_file_connections.append(conn)
    return conn

check_same_thread=False really stands out here.

Python docs at https://docs.python.org/3/library/sqlite3.html

check_same_thread (bool) -- If True (default), ProgrammingError will be raised if the database connection is used by a thread other than the one that created it. If False, the connection may be accessed in multiple threads; write operations may need to be serialized by the user to avoid data corruption. See threadsafety for more information.

I think I'm playing with fire by allowing multiple threads to access the same connection without doing my own serialization of those requests.

I do serialize requests to the write connection - and in this particular case the bug isn't coming from write queries, it's coming from read queries - but perhaps SQLite has issues with threaded reads, too.
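If I did want to serialize reads myself, the standard pattern would be to guard the shared connection with a lock - a minimal sketch, not Datasette's actual implementation:

import sqlite3
import threading

conn = sqlite3.connect(
    "file:demo?mode=memory&cache=shared", uri=True, check_same_thread=False
)
lock = threading.Lock()


def execute(sql, params=None):
    # Only one thread may use the shared connection at any moment
    with lock:
        return conn.execute(sql, params or []).fetchall()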

simonw commented on June 23, 2024

Using SQLite In Multi-Threaded Applications

That indicates that there's a SQLite option for "Serialized" mode where it's safe to access anything SQLite provides from multiple threads, but as far as I can tell Python doesn't give you an option to turn that mode on or off for a connection - you can read sqlite3.threadsafety to see if that mode was compiled in or not, but not actually change it.

On my Mac sqlite3.threadsafety returns 1 which means https://docs.python.org/3/library/sqlite3.html#sqlite3.threadsafety "Multi-thread: In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads." - it would need to return 3 for that serialized mode.
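Checking that is a one-liner (0 = single-thread, 1 = multi-thread, 3 = serialized):

import sqlite3

print(sqlite3.threadsafety)  # 1 here - connections must not be shared between threads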

simonw commented on June 23, 2024

My current hunch is that SQLite gets unhappy if multiple threads access the same underlying C connection object - which sometimes happens with in-memory connections in Datasette, presumably because they are faster than file-backed databases.

I'm going to remove the asyncio.gather() code from the table view. I'll ship a 0.x release with that fix too.
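The shape of that fix is just awaiting the two coroutines in sequence instead of gathering them - a sketch, with stub coroutines standing in for the real query functions:

async def execute_facets():
    ...  # run the facet queries


async def execute_suggested_facets():
    ...  # run the facet-suggestion queries


async def data():
    # Before (parallel - can hang on a shared in-memory connection):
    #     await gather(execute_facets(), execute_suggested_facets())
    # After (sequential):
    await execute_facets()
    await execute_suggested_facets()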

simonw commented on June 23, 2024

The one other thing affected by this change is this documentation, which suggests a not-actually-safe pattern:

datasette/docs/internals.rst

Lines 1292 to 1321 in 6ed7908

This example uses the :ref:`register_routes() <plugin_register_routes>` plugin hook to add a page at ``/parallel-queries`` which executes two SQL queries in parallel using ``asyncio.gather()`` and returns their results.
.. code-block:: python

    import asyncio

    from datasette import hookimpl
    from datasette import tracer
    from datasette.utils.asgi import Response


    @hookimpl
    def register_routes():
        async def parallel_queries(datasette):
            db = datasette.get_database()
            with tracer.trace_child_tasks():
                one, two = await asyncio.gather(
                    db.execute("select 1"),
                    db.execute("select 2"),
                )
            return Response.json(
                {
                    "one": one.single_value(),
                    "two": two.single_value(),
                }
            )

        return [
            (r"/parallel-queries$", parallel_queries),
        ]
Adding ``?_trace=1`` will show that the trace covers both of those child tasks.

simonw commented on June 23, 2024

Added a note to that example in the documentation:

Note that running parallel SQL queries in this way has `been known to cause problems in the past <https://github.com/simonw/datasette/issues/2189>`__, so treat this example with caution.

yozlet commented on June 23, 2024

/me munches popcorn at a furious rate, utterly enthralled

simonw commented on June 23, 2024

I'm going to release this in 1.0a7, and I'll backport it to a 0.64.4 release too.

simonw commented on June 23, 2024

Release 0.64.4: https://docs.datasette.io/en/stable/changelog.html#v0-64-4

simonw commented on June 23, 2024

We're planning a breaking change in 1.0a7. Since that's a breaking change, I'm going to ship 1.0a7 right now with this fix, then ship the breaking change as 1.0a8 instead.

simonw commented on June 23, 2024

1.0a7 is out with this fix as well now: https://docs.datasette.io/en/1.0a7/changelog.html#a7-2023-09-21
