idbwrapper has outlived its usefulness to us, and has performance issues


swap out idbwrapper with straight indexeddb code about level-js HOT 12 CLOSED

level commented on May 10, 2024

swap out idbwrapper with straight indexeddb code

from level-js.

Comments (12)

commented on May 10, 2024

idb-blob-store chunks messages, so you can probably get even faster with raw idb


mcollina commented on May 10, 2024

I agree, and not just for speed. That would solve the issue with null
values as well.
On Sun, Dec 6, 2015 at 16:32, James Halliday wrote:

idb-blob-store chunks messages, so you can probably get even faster with
raw idb


timkuijsten commented on May 10, 2024

Guys,

I've been working on replacing idb-wrapper with native indexedDB. I do have a working implementation ready[1]. But before I open up a PR I'd like to discuss the approach I've taken which is to store JS types natively and only coerce in _get() or _iterator() if asBuffer is true. Otherwise, if asBuffer is false, always return the native type. Previously this behavior was available via options.raw.

This required only one patch in the tests of abstract-leveldown (which I will submit if you guys think this is the right approach).

Additional changes worth mentioning (see commits in [1]):

remove unused dependencies and bump the ones that will stay with us

open:

support multiple stores in the same database
implement createIfMissing and errorIfExists

destroy:

delete the given object store, and if no other object stores exist in the database, delete the database as well

batch:

save keys and values using native JS types, no coercion / encoding
support sync at the batch level, not per operation; only supported in Firefox with the "dom.indexedDB.experimental" pref set to true

iterator:

If keyAsBuffer or valueAsBuffer, try to coerce most types into a String. Drop redundant support for the raw option. Just pass keyAsBuffer = false and/or valueAsBuffer = false.
Make sure that each callback passed to _next() is called exactly once, and always in the next tick.
Support reopening the indexedDB cursor via the reopenOnTimeout option if it timed out before the end of the range was reached. Each cursor gets its own snapshot, so the option defaults to false.
Support either snapshot mode (default) or back pressure.

put:

save keys and values using native JS types, no coercion / encoding
with experimental support for sync (Firefox only)

get:

If asBuffer, try to coerce most types into a String. Drop redundant support for the raw option. Just pass asBuffer = false.

del:

with experimental support for sync (Firefox only)

[1] living in https://github.com/timkuijsten/level.js/tree/idbunwrapper
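
For the iterator item above that requires each callback passed to _next() to be called exactly once and always in the next tick, a minimal sketch of such a guard might look like this. The helper name onceAsync is hypothetical and not taken from the branch, and queueMicrotask is an assumed stand-in for whatever deferral the implementation actually uses.

```javascript
// Hypothetical helper (not from the idbunwrapper branch): wraps a callback so
// that it fires at most once and never synchronously.
function onceAsync(callback) {
  let called = false

  return function (...args) {
    if (called) return // ignore any second invocation
    called = true
    // Assumption: a microtask is an acceptable stand-in for "next tick".
    queueMicrotask(() => callback(...args))
  }
}

// Usage: even if the cursor handler fires twice, the consumer sees one call.
const done = onceAsync((err, key, value) => {
  console.log('next() result:', err, key, value)
})
done(null, 'a', 1)
done(null, 'b', 2) // dropped
```

Deferring every call keeps the consumer from having to handle a callback that sometimes runs synchronously and sometimes does not.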


timkuijsten commented on May 10, 2024

According to observations noted in #51 (comment) (aggressive IndexedDB cursor timeouts) I've reworked the iterator to reopen a cursor if the stream is not ended and the end of the cursor was not reached.


timkuijsten commented on May 10, 2024

Just realized that automatically reopening a new cursor on a transaction timeout breaks the default snapshot property of LevelDB and IndexedDB. I've added the reopenOnTimeout option, which defaults to false so that the default behavior mimics that of LevelDB (at the cost of getting a TransactionInactiveError if you're not calling next fast enough, which currently breaks some abstract tests).


vweevers commented on May 10, 2024

at the cost of getting a TransactionInactiveError if you're not calling next fast enough

It would be better to cache cursor results, so that it doesn't matter when next() is called, it can simply take a kv pair from the cache. This will make the behavior similar to leveldown and others, and not break streams (built on iterators) for example. I've got a sample implementation here.

I've added the reopenOnTimeout option

How about naming it snapshot, which defaults to true? If false, it keeps reading until the cache reaches a certain size (a highWaterMark like in streams). One can keep nexting until the cache is empty, after which a new cursor is created, and the cache refilled.
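
To make the proposal concrete, here is a rough synchronous sketch of the two modes. It is not the sample implementation mentioned above: createIterator and readPage are hypothetical names, and a sorted array of [key, value] pairs stands in for the object store, with readPage playing the role of opening a fresh IndexedDB cursor.

```javascript
// Hypothetical stand-in for opening a new cursor: returns up to `limit`
// entries whose keys are greater than `afterKey`, in key order.
function readPage(entries, afterKey, limit) {
  const page = []
  for (const [key, value] of entries) {
    if (afterKey !== undefined && key <= afterKey) continue
    page.push([key, value])
    if (page.length === limit) break
  }
  return page
}

function createIterator(entries, options = {}) {
  const snapshot = options.snapshot !== false // defaults to true
  const highWaterMark = options.highWaterMark || 16

  // Snapshot mode: read everything up front, so later writes are not seen.
  let cache = snapshot ? readPage(entries, undefined, Infinity) : []
  let ended = snapshot
  let lastKey

  return {
    next() {
      if (cache.length === 0 && !ended) {
        // Cache drained: "open a new cursor" after the last key we returned.
        cache = readPage(entries, lastKey, highWaterMark)
        if (cache.length < highWaterMark) ended = true
      }
      if (cache.length === 0) return undefined // end of range

      const pair = cache.shift()
      lastKey = pair[0]
      return pair
    }
  }
}
```

With snapshot left at its default, memory use grows with the size of the range; with snapshot set to false, at most highWaterMark entries are held, but entries written during iteration may show up in later pages.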


timkuijsten commented on May 10, 2024

It would be better to cache cursor results, so that it doesn't matter when next() is called, it can simply take a kv pair from the cache. This will make the behavior similar to leveldown and others, and not break streams (built on iterators) for example.

and

How about naming it snapshot, which defaults to true?

I like the idea in general, but I'm not sure what cache.push and shift for every item do to performance/GC. Furthermore, if a large database is iterated with a slow stream consumer then big memory spikes might occur.

... If false, it keeps reading until the cache reaches a certain size (a highWaterMark like in streams). One can keep nexting until the cache is empty, after which a new cursor is created, and the cache refilled.

Sounds good. Would be nice to have some feedback from one of the maintainers.


vweevers commented on May 10, 2024

if a large database is iterated with a slow stream consumer then big memory spikes might occur.

No way around that, sadly. One can get proper snapshotting or proper backpressure, but not both. All level.js can do is offer a choice.


max-mapper commented on May 10, 2024

Just wanted to chime in to say apologies I haven't been able to dive in to provide feedback on this yet, I think the work here is really awesome so far.

One question I have re: snapshots on top of IDB in general (not something I've thought about much until this point): Does creating a new cursor mean creating a snapshot based at a later point in time? Or can you create a cursor pointing at some old state, to at least get the same read state out of the db as the previous one?

Re: the caching proposal, I do think we should find a way to paper over the TransactionInactiveError issue so that users of e.g. leveldown backed by level.js won't have to know/worry about it. I think in general a default mode should try to prioritize compatibility with modules that will be running in node on leveldown, so we should try to make snapshot semantics match as much as possible. If people want to tune performance in the browser it should be an opt-in.


vweevers commented on May 10, 2024

Does creating a new cursor mean creating a snapshot based at a later point in time?

Yes. The implementation might not use snapshots, but effectively, for the purposes of this discussion, yes. It means creating a new readonly transaction, which guarantees:

As long as a "readonly" transaction is running, the data that the implementation returns [..] must remain constant. There are a number of ways that an implementation ensures this. The implementation can prevent any "readwrite" transaction [..] until the "readonly" transaction finishes. Or the implementation can allow the "readonly" transaction to see a snapshot of the contents of the object stores which is taken when the "readonly" transaction started.


timkuijsten commented on May 10, 2024

I've just pushed a fairly big update of the iterator.

I've replaced the reopenOnTimeout option with a snapshot option, that defaults to true as suggested by @vweevers. It either reads a snapshot without timeouts (at the cost of memory spikes) or, if you pass snapshot = false, it uses back pressure using default stream semantics and highWaterMarks (at the cost of not iterating over a snapshot of the database).

I've split out the indexeddb cursor iterator into a separate module so that it can be used as a nodejs stream. It hopefully makes the iterator in level.js a bit easier to read and understand.

p.s. I've also tried to upgrade to abstract-leveldown 2.6.0, but this introduces some problems with serializing object values and I was not able to get the new _serializeValue feature to work. Maybe someone can take a look at that? I'm a bit fuzzy on how asBuffer, valueEncoding, _serializeValue etc. should all work together. For now my earlier suggested patch should be used to get all tests working with abstract-leveldown 2.4.1.


vweevers commented on May 10, 2024

Regarding encoding and serialization:

level.js doesn't need to concern itself with encoding, only serialization.
The flow of writes is: user value -> encode -> serialize -> IDB. The serialization is meant to convert encoded values (usually strings or Buffers) to whatever type the underlying store (IDB) can handle. Because IDB can store values of almost any type (anything the structured clone algorithm supports), _serializeValue() should just return values as-is, possibly with special handling of null, depending on which patch lands in abstract-leveldown. There's no need to do anything with Buffers, because the browser will store those as Uint8Arrays (Buffer being a subclass of Uint8Array).
The flow of reads is: IDB -> deserialize -> decode -> user value. The way I see it, deserialization is an implicit step in abstract-leveldown, where asBuffer comes into play. If the encoding expects a Buffer, or if the user specified { asBuffer: true } in the get() or iterator options, then abstract-leveldown will pass { asBuffer: true } to level.js. In this case, level.js should convert typed arrays back to Buffers. Use typedarray-to-buffer for optimal performance.
Keys can be of type Array, String, Date or Number (except NaN). IndexedDB Second Edition also supports binary keys (ArrayBuffers or views thereof), though browser support isn't there yet. See here for an example of _serializeKey that handles all this.
Unlike with other *down adapters, the id encoding (short for identity, aliased as none) is very powerful with level.js, because one can store anything. We should however warn (in the readme) that with the id encoding, level.js returns the value as stored by IDB. asBuffer is false in this case. If Buffers went in, Uint8Arrays come out.
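
The points above could be sketched roughly as follows. This is not the code from the private fork: the function names are hypothetical, Buffer.from is used as a plain stand-in for typedarray-to-buffer, binary keys are left out, and the rejection of invalid Dates follows the IndexedDB key rules rather than anything stated in this thread. It assumes a Buffer global (Node, or a browser bundle that provides one).

```javascript
// Writes: IDB stores values natively, so serialization passes them through.
// (Any special handling of null is left out, since that depends on
// abstract-leveldown.)
function serializeValue(value) {
  return value
}

// Reads: convert only when the caller asked for a Buffer via asBuffer.
function deserializeValue(value, asBuffer) {
  if (!asBuffer) return value // identity: what IDB stored is what comes out
  if (Buffer.isBuffer(value)) return value
  if (ArrayBuffer.isView(value)) {
    // Typed arrays come back from IDB; wrap the same memory in a Buffer.
    return Buffer.from(value.buffer, value.byteOffset, value.byteLength)
  }
  if (value instanceof ArrayBuffer) return Buffer.from(value)
  return Buffer.from(String(value)) // assumption: stringify everything else
}

// Keys: Array, String, Date or Number (except NaN).
function isValidKey(key) {
  if (typeof key === 'string') return true
  if (typeof key === 'number') return !Number.isNaN(key)
  if (key instanceof Date) return !Number.isNaN(key.getTime())
  if (Array.isArray(key)) return key.every(isValidKey)
  return false
}
```

With asBuffer set to false, a Buffer that was written comes back as a Uint8Array, which is the behaviour the readme warning would need to describe.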

@timkuijsten seeing as I already implemented the above in a private fork, I can publish that if you want, but I can't make promises as to when.

One other note, with regards to removing IDBWrapper. I'm using level.js in production, and at one point it had to store 1.5 million entries. A bottleneck in IDBWrapper, depending on batch size, is that its batch function creates all the put and delete requests at once. I bypassed IDBWrapper and wrote a batch function that waits for a request to complete before making the next, resulting in a huge CPU drop:

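// Issue batch operations one at a time: the next request is only created
// after the previous one has succeeded.
// `ops`, `store` and `onRequestError` come from the enclosing batch function.
var index = 0

function next() {
  if (index === ops.length) return // every operation has completed

  var op = ops[index++]
  var req = op.type === 'del' ? store.delete(op.key) : store.put(op.value, op.key)

  req.onsuccess = next
  req.onerror = onRequestError
}

next() // start with the first operation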
