Coder Social home page Coder Social logo

Expose SQLite? about node HOT 40 CLOSED

benjamingr avatar benjamingr commented on September 27, 2024 28
Expose SQLite?

from node.

Comments (40)

JoshuaWise avatar JoshuaWise commented on September 27, 2024 12

I have a few thoughts on this.

Thoughts on an appropriate API

I designed better-sqlite3 to be easy to use for people who have never used SQLite's C API. I tried to capture the higher-level concepts into the simplest API I could think of, while still exposing most of what advanced users would want via options and utility methods. That said, I don't necessarily think that philosophy aligns with existing APIs within Node.js core. For example, node:fs is actually pretty hard to understand unless the user is already familiar with the C API of Unix/Linux-style filesystems (with the exception of a few high-level convenience methods like fs.readFile). Of course, for something built into the runtime, that low-level exposure makes sense. Therefore, I do think it would be appropriate for the API to (as much as possible) be a 1-to-1 mapping with SQLite's low-level C API, but with a few convenience methods built-in. I see no reason to depart from the method/methodSync naming convention used by Node.js's core libraries.

However, it should be noted that many asynchronous APIs would simply break or result in undefined behavior or unsafe memory accesses when combined with much of the behavior provided by SQLite. For example, SQLite allows you to register user-defined SQL functions in the host language (i.e., JavaScript in this case). These function would be invoked during certain SQL queries, but if those SQL queries were executed in a thread pool, it will result in unsafe cross-thread usage of JavaScript contexts (which is very, very bad). Similarly, transactions become prone to error and terrible performance when performed in an asynchronous manner, since transactions in SQLite lock the entire database. Therefore, any asynchronous API for SQLite in Node.js would need to be highly limited, exposing only the most basic functionality that SQLite offers.

To safely separate the synchronous and asynchronous worlds, I recommend exposing two different classes, SQLiteDatabaseAsync and SQLiteDatabaseSync, each of which represents a connection to a SQLite database, with SQLiteDatabaseSync providing access to the entire (or almost the entire) SQLite C API, while SQLiteDatabaseAsync would only provide basic functionality like making queries. The goal here is to make it impossible for a single database connection to use both async/non-blocking methods AND advanced SQLite APIs that would have detrimental effects when invoked from the libuv thread pool.

I believe an approach like this would align with the existing patterns/philosophy of Node.js's core libraries, and would satisfy the memory safety that is expected of the Node.js runtime. Sprinkle in a few high-level convenience methods, and let user-land do the rest.

Thoughts on SQLite versions

Perhaps most importantly (much more important than the API, in my opinion), is how to deal with the issue of SQLite versions. Besides the fact that the SQLite developers release new versions many times per year, SQLite is designed to be customized at compile-time, giving the user the power to enable certain advanced features, disable deprecated features that may harm performance or be unsafe, or even change the behavior of basic SQL expressions (like case-sensitivity, unicode, etc.). In better-sqlite3, it's very common for users to need a different build of SQLite than the one provided by default. I do provide a way for users to compile a custom version of SQLite, but the process is relatively painful. If Node.js exposed SQLite to users, I think it would be crucial to solve this need. Perhaps it can be solved with dynamic linking or something. Just be aware that you will inevitably run into this.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 10

I imagine this idea will get a lot of support and a lot of opposition. If someone else handles the consensus building aspect, I'd be happy to work on the implementation side.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 6

FWIW, SQLite has followed semver since version 3.9.0 in 2015, and the current major version is still 3:

The current value for X is 3, and the SQLite developers plan to support the current SQLite database file format, SQL syntax, and C interface through at least the year 2050. Hence, one can expect that all future versions of SQLite for the next several decades will begin with "3.".

from node.

asg017 avatar asg017 commented on September 27, 2024 6

One note: if you do decide to include SQLite in Node.js, please statically compile an up-to-date SQLite library instead of relying on the host's SQLite library!

This is a problem with Python's SQLite library and Bun's SQLite library: By default they use whatever SQLite version the OS has installed, which leads to headaches. Some default SQLite builds omit features like SQLite extensions, JSON, or math functions. Or they are stuck on old version of SQLite which lack window functions, JSONB, datetime utils, or STRICT tables. This is even documented in Bun's docs and Python's docs.

So if you statically compile a recent SQLite version, say 3.46, you'd have full control of what SQLite features are available across different platforms. And if someone wanted to use a different/more up-to-date SQLite version, they can always reach for better-sqlite3 or another 3rd party package. This is what we do for Datasette, which is in Python - we use the standard Python SQLite library and tell people to use the 3rd party pysqlite3 package if their builtin SQLite is too old.

from node.

GeoffreyBooth avatar GeoffreyBooth commented on September 27, 2024 4

For reference:

from node.

benjamingr avatar benjamingr commented on September 27, 2024 3

One suggestion from the peanut gallery: I’d suggest this API live in a node:sqlite package (like bun:sqlite) rather than node:util, just to avoid the “util is a dumping ground” trope.

This seems to be the consensus so far.

from node.

benjamingr avatar benjamingr commented on September 27, 2024 3

@cjihrig honestly since naming is very bikesheddy in this case I think you should just pick one. Here is an example color for our bike shed:

  • require("node:sqlite").sync.query(...) for synchronous queries
  • require("node:sqlite").promises.query(...) for asynchronous queries

But I think that's not one of the things that will prevent a PR from landing so I wouldn't worry too much about it.

from node.

H4ad avatar H4ad commented on September 27, 2024 2

We already see different opinions on the API. If there is no "one size fits all" API

Currently the different opinions are basically sync vs async.

The basic API could be something like this

class Database {
  // path can be :memory: or the file path
  constructor(path: string, options?: { readonly?: boolean, create?: boolean }) {}

  query(statement: string): Statement; // works like prepare, like bun is doing right know
  open(): void; // maybe we can just open automatically? and add a option to disable auto-open
  close(): void;
}

type AllResult = Array<any>;
type RunResult = void; // maybe boolean?
type GetResult = any;

class Statement {
  all(...params: any[]): Promise<AllResult>;
  run(...params: any[]): Promise<RunResult>;
  get(...params: any[]): Promise<GetResult>;

  // we can also introduce sync methods, like fs
  allSync(...params: any[]): AllResult;
  runSync(...params: any[]): RunResult;
  getSync(...params: any[]): GetResult;
}

Sync has advantages, and async too, so ship both and be happy.

In the first iteration, we can stick with the easiest/simplest implementation, which is probably the sync version.
So we can iterate and add the asynchronous versions and then add more methods (pluck, iterate, values, etc...)

EDIT: I didn't talk about transactions because it didn't even matter in this first iteration, is better to discuss this method when we have at least the basic methods, then we will have more knowledge on what could be the best API for exposing transactions.

If we make it possible for native addons to link against the SQLite dependency that's embedded in Node.js, then that should let third-party packages define whatever API they want without having to ship SQLite.

The point is that you don't need to use third-party packages to be able to use sqlite, but you can still do that even if we have our own API.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 2

I think it makes the most sense to do something similar to the fs module here and have synchronous and asynchronous (Promise based, no callbacks) APIs. For the majority of cases, synchronous APIs are good enough, but there are almost certainly cases where async will be preferred (and it is more Node-like). I have started writing some code for the DB connection already.

EDIT: Very early, but here is the branch.

from node.

benjamingr avatar benjamingr commented on September 27, 2024 2

@asg017 Node bundles 3.46.0 for web storage support already so presumably that's the one that will be used in this case.

from node.

mceachen avatar mceachen commented on September 27, 2024 2

Proper SQLite support within Node.js would be splendid!

A couple thoughts:

  1. I’d suggest not skipping a Transaction API in the first iteration: many use cases require ACID behavior.
  2. PRAGMA support is another possibly-surprising feature of SQLite that is a hard prerequisite for reasonable performance and stability for many use cases.
  3. A codebase that uses both sync and async calls concurrently could be prone to deadlock/busy timeouts, especially when mixing different transaction flavors (like BEGIN DEFERRED in the sync call, and BEGIN IMMEDIATE in the async call). A “solution” to this issue could simply be a warning in the documentation to not do that.

I like the idea of using better-sqlite3’s API wholecloth, as that could mean switching between the “native” node api, bun, and better-sqlite3 would be a simple import twiddle. Async API support would break this, of course. Edit: as @cjihrig mentions below, this already isn't the case: I think better-sqlite3 API's main value-add for this effort should be informative with respect to what SQLite functions should be considered. Also know that better-sqlite3's API is not a comprehensive set of sqlite3's API.

One suggestion from the peanut gallery: I’d suggest this API live in a node:sqlite package (like bun:sqlite) rather than node:util, just to avoid the “util is a dumping ground” trope.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 2

Another question regarding the API - if we were to start off following the better-sqlite3 API and wanted to add async support in the future, what would that look like? Node has generally had async APIs, and appended Sync to the name for synchronous equivalents. The better-sqlite3 API would sort of flip that naming pattern around.

from node.

punkish avatar punkish commented on September 27, 2024 2

it's very common for users to need a different build of SQLite than the one provided by default

I second this assertion of @JoshuaWise and think it is very important to build this capability in the node's integrated version of SQLite. I want the same (possibly customized) version of SQLite powering my node application as well as my CLI so there are no surprises when I am building my application or analyzing my data. For this reason itself, it is very important that not only can I customize my SQLite with the extensions I need, but do it in multiple locations, and as easily as possible because I am not a C programmer. Even the most obvious steps for a C programmer can be very confusing for me.

from node.

GeoffreyBooth avatar GeoffreyBooth commented on September 27, 2024 1

I think at the very least let's land the localStorage PR which would get SQLite into the codebase. After that, my concern would be semver: if we expose SQLite, then we can't update it any faster than our release cycle, unless we make some kind of exception for it.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 1

Any reason to put it in node:util?

Not particularly. A new core module is a bigger undertaking - particularly since the decision to require the node: scheme - so I thought it was at least worth raising the point. I'm fine either way.

from node.

cjihrig avatar cjihrig commented on September 27, 2024 1

I like the idea of using better-sqlite3’s API wholecloth, as that could mean switching between the “native” node api, bun, and better-sqlite3 would be a simple import twiddle.

I'm not trying to knock the better-sqlite3 API at all, but my understanding is that this is already not the case. The Bun docs mention that their API was inspired by better-sqlite3, but they already seem to have diverged from it some (and probably will continue to do so). That approach also would not offer any compatibility with other sqlite modules like Deno's or node-sqlite3 (which has roughly twice the weekly downloads according to npm).

from node.

tniessen avatar tniessen commented on September 27, 2024 1

If someone wanted something outside of that, they should use a 3rd party package like better-sqlite3. This would include:

  • A different SQLite version
  • Custom compilation arguments
  • A completely different SQLite library (ie sqlciper or libsql)

FWIW, the SQLite developers strongly recommend against having more than a single copy of SQLite per process, see How To Corrupt An SQLite Database File.

from node.

gabrielschulhof avatar gabrielschulhof commented on September 27, 2024

Shall we use the same interface as https://github.com/TryGhost/node-sqlite3?

from node.

tniessen avatar tniessen commented on September 27, 2024

@gabrielschulhof On the rare occasion that I voluntarily write JavaScript code, I personally prefer the better-sqlite3 API, which also tends to be significantly faster.

That being said, the non-blocking behavior of node-sqlite3 better matches the Node.js I/O model. If you have a slow storage medium or if you use a slow VFS, e.g., with a network resource as its backend, then the asynchronous node-sqlite3 API is a better choice. Similarly, applications that do not utilize SQLite transactions efficiently will spend a lot of time waiting for each change to be committed, which is slow even if the underlying storage is fast, so using synchronous APIs in those cases unnecessarily blocks the event loop. Applications that use well-designed transactions and that run on a very fast storage backend, on the other hand, will see a lot of unnecessary overhead from asynchronous APIs.

Perhaps Node.js could allow native addons to link against the bundled SQLite dependency instead and leave the API to third-party packages.

from node.

KhafraDev avatar KhafraDev commented on September 27, 2024

I would like to bring attention to esqlite by @mscdex, which outperform(s/ed) better-sqlite3 in a number of cases while being asynchronous. https://github.com/mscdex/esqlite

from node.

benjamingr avatar benjamingr commented on September 27, 2024

It looks like we have consensus around experimenting with this (i.e. prototyping a sqlite binding in core), just for further visibility I'll also ping the TSC in chat.


As for the API I personally really prefer a synchronous one over an asynchronous one since SQLite is built to run in-process. I think in the common use case this is significantly simpler and likely faster. That said, an asynchronous promise based API should also be fine.

Funnily enough 2 people asked me about this feature today - a good luck omen it's something users find interesting.

from node.

tniessen avatar tniessen commented on September 27, 2024

We already see different opinions on the API. If there is no "one size fits all" API, should we have one in core? If we make it possible for native addons to link against the SQLite dependency that's embedded in Node.js, then that should let third-party packages define whatever API they want without having to ship SQLite.

from node.

benjamingr avatar benjamingr commented on September 27, 2024

We already see different opinions on the API

We've had very little api discussion so far, I think it's fine to talk about stuff and see if we can reach consensus?

from node.

benjamingr avatar benjamingr commented on September 27, 2024

@tniessen looping back to your comment here: #53264 (comment) I tend to think the better-sqlite3 sync API is better for most cases and given SQLite's execution model and the motivation of users asking to add it to Node.js (not as a DB server replacement but for local configuration, at least that's what I've been hearing). I see your point though.

There is interesting discussion here: https://sqlite.org/forum/forumpost/7c90893579 as well.

I think it may be a good idea to reach out to the SQLite team and ask, so I went ahead and opened https://sqlite.org/forum/forumpost/96f9f78fc1

from node.

cjihrig avatar cjihrig commented on September 27, 2024

Another thing I think is worth discussing is whether or not it makes sense to expose this as its own module (node:sqlite) or as part of another module such as node:util.

from node.

asg017 avatar asg017 commented on September 27, 2024

Also, regarding sync vs async: the better-sqlite3 README says:

node-sqlite3 uses asynchronous APIs for tasks that are either CPU-bound or serialized. That's not only bad design, but it wastes tons of resources. It also causes mutex thrashing which has devastating effects on performance.

I've never benchmarked it myself, but this is the sole reason why I always use better-sqlite3 over node-sqlite in my projects.

I know a lot of Node.js developers would expect an async API (like node:fs/promises), but if they knew of potential performance problems, I'd imagine they'd let it slide.

from node.

glommer avatar glommer commented on September 27, 2024

Just to add my 2-cents, sync makes sense for simple queries. But analytical queries, things like computeing averages and sum can take a fair amount of time.

Given that some users will want to use this for things like dashboards, that can issue a bunch of aggregation queries, an async API would make more sense.

from node.

cjihrig avatar cjihrig commented on September 27, 2024

I agree with that point from better-sqlite3. A counterpoint is that sqlite might not be the only thing the application is doing. But if a sqlite call is blocking, then nothing else can happen. This is why both sync and async APIs should be provided.

from node.

benjamingr avatar benjamingr commented on September 27, 2024

This is also my conclusion based on https://sqlite.org/forum/forumpost/96f9f78fc1

from node.

benjamingr avatar benjamingr commented on September 27, 2024

I'll also escalate the questions to the TSC chat to make sure we have consensus.

from node.

legendecas avatar legendecas commented on September 27, 2024

Another thing I think is worth discussing is whether or not it makes sense to expose this as its own module (node:sqlite) or as part of another module such as node:util.

Any reason to put it in node:util?

from node.

tniessen avatar tniessen commented on September 27, 2024

This is also my conclusion based on https://sqlite.org/forum/forumpost/96f9f78fc1

FWIW, that discussion also raised this point:

it's our collective opinion that there neither can be, nor should be, One True Wrapper (...)
Having multiple wrappers is, in our considered opinions, healthy for the ecosystem and provides the most all-around flexibility when selecting which set of trade-offs (compared to the C API) a project wants to use.

This reinforces my opinion that we should make it easy for third-party packages to link against the bundled version of SQLite, assuming we are committing to shipping SQLite forever anyway.

from node.

tniessen avatar tniessen commented on September 27, 2024

With asynchronous APIs, I think it's important to very carefully design a good transaction interface. better-sqlite3 and Bun's SQLite API use wrapped functions for that, which is somewhat reasonable in a purely synchronous setting. node-sqlite3, on the other hand, relies on the user manually distinguishing between calls that may run in parallel and call that may not. Other bindings, such as Rusqlite have created much better interfaces in my opinion, but those don't tend to work that well in a messy language like JavaScript.

from node.

benjamingr avatar benjamingr commented on September 27, 2024

This reinforces my opinion that we should make it easy for third-party packages to link against the bundled version of SQLite, assuming we are committing to shipping SQLite forever anyway.

Yes and that sounds like a good idea. That does not preclude us from shipping our own wrapper as well. I think a major theme in recent Node features (the test runner, task runner, watch mode, util.parseArgs etc) isn't to be the best or have the richest functionality in those spaces - it's to provide sensible and safe defaults for most projects.

I think no matter what API we ship, userland will still provide alternatives and projects using sqlite heavily for more than configuration are likely to still use those wrappers. (Similarly to projects needing browser testing, advanced task runner features, rich argument parsing, more fs convenience methods etc).

So I am not that concerned about our API not being "feature complete"

Other bindings, such as Rusqlite have created much better interfaces in my opinion, but those don't tend to work that well in a messy language like JavaScript.

Assuming you had no messy language constraints whatsoever - what API would you pick?

I personally think we might want to stick with disposers (Symbol.dispose) and not the disposer pattern - but honestly even if we have very rudimentary transaction support and leave richer APIs to userland I'm fine with that.

from node.

mcollina avatar mcollina commented on September 27, 2024

I personally recommend using most of the amazing API of https://github.com/WiseLibs/better-sqlite3, i.e. use synchronous APIs. They generically offer better performance with SQLite on modern HW. I'm not opposed to have sync and async version of those.

cc @JoshuaWise @mceachen for visiblity.

from node.

asg017 avatar asg017 commented on September 27, 2024

I have a slightly different view: node:sqlite should be a no-nonsense "just works" SQLite package that application developers use when they want to read/write to a SQLite database. They would mostly care about:

  1. Having an up-to-date SQLite version
  2. Ability change a few settings (like PRAGMA and runtime configuration options)
  3. Using SQLite extensions to add new functionality (either as application defined functions or dynamically loadable extensions)

If someone wanted something outside of that, they should use a 3rd party package like better-sqlite3. This would include:

  • A different SQLite version
  • Custom compilation arguments
  • A completely different SQLite library (ie sqlciper or libsql)

This is kindof how Python works: the builtin sqlite3 library is a "just works" version of SQLite, while if someone wanted a customized build, they'd need to reach for a 3rd party package like pysqlite3.

That means that the SQLite library offered in node:sqlite needs to precisely define which compilation arguments to include and default settings, which is difficult. But I think offering an API that can dynamically handle multiple SQLite libraries would be more difficult

from node.

cjihrig avatar cjihrig commented on September 27, 2024

FWIW, the SQLite developers strongly recommend against having more than a single copy of SQLite per process

That seems like it would raise a more immediate question about WebStorage and its impact on code already using a userland sqlite module. How do Deno/Bun/Python etc. deal with that?

from node.

asg017 avatar asg017 commented on September 27, 2024

@tniessen ooh thanks for sharing, hadn't seen that before!

But in this case, if Node.js statically links a "golden" SQLite library for node:sqlite, would other dynamically linked SQLite libraries from 3rd party npm packages (better-sqlite3, libsql, sqlcipher, etc) be effected as well? It's hard to tell from that link, they only say "linked" but don't specify if this would happen in dynamically linked version of SQLite with a statically linked version.

My hunch is that there could be only 1 statically linked copy of SQLite, but other dynamically linked SQLite copies would be fine? Especially since this pattern is used in Python/Deno (which uses SQLite for localStorage + KV), but I'm no expert here

from node.

GeoffreyBooth avatar GeoffreyBooth commented on September 27, 2024

would other dynamically linked SQLite libraries from 3rd party npm packages (better-sqlite3, libsql, sqlcipher, etc) be effected as well?

I would think that Node’s internal SQLite would use different files on disk for storing the database than the files on disk used by these various packages? And if that’s the case, then I don’t see why they would interfere with each other.

from node.

punkish avatar punkish commented on September 27, 2024

right, as long as I continue to use the SQLite driver with which I created my db, I would not expect any problem. So, if I use node:sqlite to create my files, I continue to use it with those files. And, if I use a custom version of SQLite with, say, better-sqlite3, I would continue to happily use that combo. Using the correct driver with the files they created would be my responsibility

from node.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.