datalevin's Introduction

datalevin logo

Datalevin

🧘 Simple, fast and versatile Datalog database for everyone 💽



I love Datalog, why hasn't everyone used this already?

Datalevin is a simple durable Datalog database. Here's what a Datalog query looks like:

(d/q '[:find  ?name ?total
       :in    $ ?year
       :where [?sales :sales/year ?year]
              [?sales :sales/total ?total]
              [?sales :sales/customer ?customer]
              [?customer :customers/name ?name]]
      (d/db conn) 2024)

❓ Why

The rationale is to have a simple, fast and open source Datalog query engine running on durable storage.

It is our observation that many developers prefer the flavor of Datalog popularized by Datomic® over any flavor of SQL, once they get to use it. Perhaps it is because Datalog is more declarative and composable than SQL, e.g. the automatic implicit joins seem to be its killer feature. In addition, the recursive rules feature of Datalog makes it suitable for graph processing and deductive reasoning.
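
As an illustration, here is a minimal sketch of a recursive rule for transitive reachability; the :node/link attribute and the starting entity id are hypothetical:

;; ?b is reachable from ?a directly, or through some intermediate ?c
(def rules
  '[[(reachable ?a ?b)
     [?a :node/link ?b]]
    [(reachable ?a ?b)
     [?a :node/link ?c]
     (reachable ?c ?b)]])

;; find everything reachable from entity 1
(d/q '[:find ?b
       :in $ % ?a
       :where (reachable ?a ?b)]
     (d/db conn) rules 1)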

The feature set of Datomic® may be overkill for some use cases. One thing that may confuse casual users is its temporal features. To keep things simple and familiar, Datalevin behaves the same way as most other databases: when data are deleted, they are gone.

Datalevin started out as a port of the Datascript in-memory Datalog database to LMDB for persistence. We then added a cost-based query optimizer to enhance query performance.

Datalevin can be used as a library, embedded in applications to manage state, e.g. used like SQLite; or it can run in a networked client/server mode (default port is 8898) with full-fledged role-based access control (RBAC) on the server, e.g. used like Postgres.

Datalevin relies on the robust ACID transactional database features of LMDB. Designed for concurrent read-intensive workloads, LMDB also performs well in writing large values (> 2KB). Therefore, it is fine to store documents in Datalevin.

Datalevin can be used as a fast key-value store for EDN data. The native EDN data capability of Datalevin should be beneficial for Clojure programs.

Moreover, Datalevin has a built-in full-text search engine that has competitive search performance.
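
A minimal sketch of the standalone search API as we understand it (the documents are made up):

(def lmdb   (d/open-kv "/tmp/datalevin/search"))
(def engine (d/new-search-engine lmdb))

(d/add-doc engine 1 "The quick red fox jumped over the lazy red dogs.")
(d/add-doc engine 2 "Mary had a little lamb whose fleece was red as fire.")

(d/search engine "red fox")
;; => (1 2)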


As a Clojure library, Datalevin is simple to add as a dependency to your Clojure project. There are also several other installation options; please see the Installation documentation for details.

🎉 Usage

Datalevin aims to be a versatile database.

Use as a Datalog store

In addition to our API doc: since Datalevin has almost the same Datalog API as Datascript, which in turn has almost the same API as Datomic®, the abundant tutorials, guides, and learning sites available online for the Datomic® flavor of Datalog apply to Datalevin as well.

Here is a simple code example using Datalevin:

(require '[datalevin.core :as d])

;; Define an optional schema.
;; Note that pre-defined schema is optional, as Datalevin does schema-on-write.
;; However, attributes requiring special handling need to be defined in schema,
;; e.g. many cardinality, uniqueness constraint, reference type, and so on.
(def schema {:aka  {:db/cardinality :db.cardinality/many}
             ;; :db/valueType is optional, if unspecified, the attribute will be
             ;; treated as EDN blobs, and may not be optimal for range queries
             :name {:db/valueType :db.type/string
                    :db/unique    :db.unique/identity}})

;; Create DB on disk and connect to it (assumes write permission to create the given dir)
(def conn (d/get-conn "/tmp/datalevin/mydb" schema))
;; or if you have a Datalevin server running on myhost with default port 8898
;; (def conn (d/get-conn "dtlv://myname:mypasswd@myhost/mydb" schema))

;; Transact some data
;; Notice that :nation is not defined in schema, so it will be treated as an EDN blob
(d/transact! conn
             [{:name "Frege", :db/id -1, :nation "France", :aka ["foo" "fred"]}
              {:name "Peirce", :db/id -2, :nation "france"}
              {:name "De Morgan", :db/id -3, :nation "English"}])

;; Query the data
(d/q '[:find ?nation
       :in $ ?alias
       :where
       [?e :aka ?alias]
       [?e :nation ?nation]]
     (d/db conn)
     "fred")
;; => #{["France"]}

;; Retract the name attribute of an entity
(d/transact! conn [[:db/retract 1 :name "Frege"]])

;; Pull the entity, now the name is gone
(d/q '[:find (pull ?e [*])
       :in $ ?alias
       :where
       [?e :aka ?alias]]
     (d/db conn)
     "fred")
;; => ([{:db/id 1, :aka ["foo" "fred"], :nation "France"}])

;; Close DB connection
(d/close conn)

Use as a key-value store

Datalevin packages the underlying LMDB database as a convenient key-value store for EDN data.

(require '[datalevin.core :as d])
(import '[java.util Date])

;; Open a key value DB on disk and get the DB handle
(def db (d/open-kv "/tmp/datalevin/mykvdb"))
;; or if you have a Datalevin server running on myhost with default port 8898
;; (def db (d/open-kv "dtlv://myname:mypasswd@myhost/mykvdb"))

;; Define some table names (tables are called "dbi", i.e. sub-databases, in LMDB)
(def misc-table "misc-test-table")
(def date-table "date-test-table")

;; Open the tables
(d/open-dbi db misc-table)
(d/open-dbi db date-table)

;; Transact some data, a transaction can put data into multiple tables
;; Optionally, data type can be specified to help with range queries
(d/transact-kv
  db
  [[:put misc-table :datalevin "Hello, world!"]
   [:put misc-table 42 {:saying "So Long, and thanks for all the fish"
                        :source "The Hitchhiker's Guide to the Galaxy"}]
   [:put date-table #inst "1991-12-25" "USSR broke apart" :instant]
   [:put date-table #inst "1989-11-09" "The fall of the Berlin Wall" :instant]])

;; Get the value with the key
(d/get-value db misc-table :datalevin)
;; => "Hello, world!"
(d/get-value db misc-table 42)
;; => {:saying "So Long, and thanks for all the fish",
;;     :source "The Hitchhiker's Guide to the Galaxy"}


;; Range query, from unix epoch time to now
(d/get-range db date-table [:closed (Date. 0) (Date.)] :instant)
;; => [[#inst "1989-11-09T00:00:00.000-00:00" "The fall of the Berlin Wall"]
;;     [#inst "1991-12-25T00:00:00.000-00:00" "USSR broke apart"]]

;; This returns a PersistentVector, i.e. it reads all data into JVM memory
(d/get-range db misc-table [:all])
;; => [[42 {:saying "So Long, and thanks for all the fish",
;;          :source "The Hitchhiker's Guide to the Galaxy"}]
;;     [:datalevin "Hello, world!"]]

;; This allows you to iterate over all DB keys inside a transaction.
;; You can perform writes inside the transaction.
;; kv is of type https://www.javadoc.io/doc/org.lmdbjava/lmdbjava/latest/org/lmdbjava/CursorIterable.KeyVal.html
;; Avoid long-lived transactions: read transactions prevent reuse of pages freed
;; by newer write transactions, so the database can grow quickly.
;; Write transactions prevent other write transactions, since writes are serialized.
;; LMDB advice: http://www.lmdb.tech/doc/index.html
;; Conclusion: it's ok to have long transactions if using a single thread.
(d/visit db misc-table
         (fn [kv]
           (let [k (d/read-buffer (d/k kv) :data)]
             (when (= k 42)
               (d/transact-kv db [[:put misc-table 42 "Don't panic"]]))))
         [:all])

(d/get-range db misc-table [:all])
;; => [[42 "Don't panic"] [:datalevin "Hello, world!"]]

;; Delete some data
(d/transact-kv db [[:del misc-table 42]])

;; Now it's gone
(d/get-value db misc-table 42)
;; => nil

;; Close key value db
(d/close-kv db)

📗 Documentation

Please refer to the API documentation for more details. You may also consult online materials for Datascript or Datomic®, as the Datalog API is similar.

🚀 Status

Datalevin is extensively tested with property-based testing. It is also used in production at Juji.

Running the benchmark suite adopted from Datascript, which writes 100K random datoms under several conditions and runs several queries on them, on an Ubuntu Linux server with an Intel i7 3.6GHz CPU and a 1TB SSD drive, here is how it looks.

(benchmark charts: query benchmark, write benchmark)

In this benchmark, both Datomic and Datascript are running in in-memory mode, as they require another database for persistence. The init write condition, i.e. bulk loading prepared datoms, is not available in Datomic. Datalevin write here is configured with LMDB nosync mode to better match the in-memory conditions, i.e. the operating system is responsible for flushing data to disk.

In all benchmarked queries, Datalevin is the fastest among the three tested systems, as Datalevin has a cost-based query optimizer while Datascript and Datomic do not. Datalevin also has a caching layer for index access.

Writes are slower, as expected, since Datalevin does write to disk even though sync is not explicitly called, while the others are purely in-memory. The bulk loading speed is good: writing 100K datoms to disk takes less than 0.2 seconds; the same data can also be transacted with all the integrity checks, as a whole or five datoms at a time, in less than 1.5 seconds. Transacting one datom at a time takes longer, so batch transactions are preferable.
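
A minimal sketch of such batching, assuming tx-data is a hypothetical sequence of entity maps:

;; transact in chunks of 1000 instead of one entity at a time
(doseq [batch (partition-all 1000 tx-data)]
  (d/transact! conn (vec batch)))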

See here for a detailed analysis of the results.

🎂 Upgrade

Please read this for information regarding upgrading your existing database from older versions.

🌎 Roadmap

These are the tentative goals that we try to reach as soon as we can. We may adjust the priorities based on feedback.

  • 0.4.0 Native image and native command line tool. [Done 2021/02/27]
  • 0.5.0 Native networked server mode with role based access control. [Done 2021/09/06]
  • 0.6.0 As a search engine: full-text search across database. [Done 2022/03/10]
  • 0.7.0 Explicit transactions, lazy results loading, and results spill to disk when memory is low. [Done 2022/12/15]
  • 0.8.0 Long ids; composite tuples; enhanced search engine ingestion speed. [Done 2023/01/19]
  • 0.9.0 New Datalog query engine with improved performance. [Done 2024/03/09]
  • 1.0.0 New rule evaluation algorithm, incremental view maintenance, documentation in book form.
  • 1.1.0 Option to store data in compressed form.
  • 2.0.0 Dense numeric vector indexing and similarity search.
  • 2.1.0 Transaction log storage and access API.
  • 2.2.0 Read-only replicas for server.
  • 3.0.0 Automatic document indexing.
  • 3.1.0 Fully automatic schema migration on write.
  • 4.0.0 loom graph protocols and common graph algorithms.
  • 5.0.0 Distributed mode.

💾 Differences from Datascript

Datascript is developed by Nikita Prokopov and "is built totally from scratch and is not related by any means to" Datomic®. Datalevin started out as a port of Datascript to LMDB, but differs from Datascript in more significant ways than just data durability and running mode:

  • Datalevin has a cost-based query optimizer, so queries are truly declarative and clause ordering does not affect query performance.

  • Datalevin is not an immutable database, and there is no "database as a value" feature. Since history is not kept, transaction ids are not stored.

  • Datoms in a transaction are committed together as a batch, rather than being saved by with-datom one at a time.

  • ACID transactions and rollback are supported.

  • Lazy result sets and spilling to disk are supported.

  • Entity and transaction integer ids are 64 bits long, instead of 32 bits.

  • Respects :db/valueType. Currently, most Datomic® value types are supported, except uri. Values of the attributes that are not defined in the schema or have unspecified types are treated as EDN blobs, and are de/serialized with nippy.

  • In addition to composite tuples, Datalevin also supports heterogeneous and homogeneous tuples (see the schema sketch after this list).

  • Has a value leading index (VAE) for datoms with :db.type/ref type attribute; the attribute and value leading index (AVE) is enabled for all datoms, so there is no need to specify :db/index, similar to Datomic® Cloud. Does not have an AEV index, in order to save storage and improve write speed.

  • Stored transaction functions of :db/fn should be defined with inter-fn, because function serialization requires special care in order to support GraalVM native image (see the sketch after this list). The same applies to functions that need to be passed over the wire to the server or to babashka.

  • Attributes are stored in indices as integer ids, thus attributes in index access are returned in attribute creation order, not in lexicographic order (i.e. do not expect :b to come after :a). This is the same as Datomic®.

  • Has no features that are applicable only to in-memory DBs, such as DB as an immutable data structure, DB pretty print, etc.
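
A hedged schema sketch of the three tuple flavors mentioned above; the attribute names are hypothetical and the schema keys follow the Datomic® convention:

(def tuple-schema
  ;; composite tuple: maintained automatically from its component attributes
  {:item/sku+vendor {:db/valueType  :db.type/tuple
                     :db/tupleAttrs [:item/sku :item/vendor]
                     :db/unique     :db.unique/identity}
   ;; heterogeneous tuple: fixed length, one type per position
   :geo/point       {:db/valueType  :db.type/tuple
                     :db/tupleTypes [:db.type/double :db.type/double]}
   ;; homogeneous tuple: variable length, a single element type
   :path/nodes      {:db/valueType :db.type/tuple
                     :db/tupleType :db.type/long}})

And a minimal sketch of a stored function defined with inter-fn; the attribute and arithmetic are made up:

;; serializable function usable in :db/fn and over the wire
(def set-total
  (d/inter-fn [db eid price qty]
    [[:db/add eid :order/total (* price qty)]]))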

👶 Limitations

  • Attribute names have a length limitation: an attribute name cannot be more than 511 bytes long, due to LMDB key size limit.

  • Because keys are compared bitwise, for range queries to work as expected on an attribute, its :db/valueType should be specified (see the sketch after this list).

  • Floating point NaN cannot be stored.

  • Big integers do not go beyond the range of [-2^1015, 2^1015-1]; the unscaled value of a big decimal has the same limit.

  • The maximum individual value size is 2GB, limited by the maximum size of an off-heap byte buffer that can be allocated in the JVM.

  • The total data size of a Datalevin database has the same limit as LMDB's, e.g. 128TB on a modern 64-bit machine that implements 48-bit address spaces.

  • Currently supports Clojure on JVM 8 or above. Adding support for other Clojure-hosting runtimes is possible, since bindings for LMDB exist in almost all major languages and are available on most platforms.
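
A minimal sketch of the range-query point above: give the attribute a typed :db/valueType, here a long holding epoch milliseconds, so that bitwise key order agrees with numeric order (attribute names and values are made up):

(def conn (d/get-conn "/tmp/datalevin/events"
                      {:event/at-ms {:db/valueType :db.type/long}}))

(d/transact! conn [{:event/at-ms 1600000000000}
                   {:event/at-ms 1700000000000}])

;; numeric comparison predicates now behave as expected
(d/q '[:find ?e ?ms
       :in $ ?since
       :where
       [?e :event/at-ms ?ms]
       [(> ?ms ?since)]]
     (d/db conn) 1650000000000)
;; => #{[2 1700000000000]} (entity ids may differ)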

🛍️ Alternatives

If you are interested in using the dialect of Datalog pioneered by Datomic®, here are your current options:

  • If you need time travel and features backed by the authors of Clojure, you should use Datomic®.

  • If you need an in-memory store that has almost the same API as Datomic®, Datascript is for you.

  • If you need a graph database, you may try Asami.

  • If you need features such as bi-temporal graph queries, you may try XTDB.

  • If you need a durable store with some storage choices, you may try Datahike.

  • There was also Eva, a distributed store, but it is no longer in active development.

  • If you need a simple, fast and versatile durable store with a battle tested backend, give Datalevin a try.

🔃 Contact

We appreciate and welcome your contributions or suggestions. Please feel free to file issues or pull requests.

If commercial support is needed for Datalevin, talk to us.

You can talk to us in the #datalevin channel on Clojurians Slack.

License

Copyright © 2020-2024 Juji, Inc.

Licensed under Eclipse Public License (see LICENSE).


datalevin's Issues

More thaw failures

Running into an error similar to #13:

Execution error (ExceptionInfo) at datalevin.lmdb.LMDB/get_range (lmdb.clj:739).
Fail to get-range: "Thaw failed against type-id: 16"
clojure.lang.ExceptionInfo: Fail to get-range: "Thaw failed against type-id: 16" {:dbi "datalevin/eav", :k-range [:closed #object[datalevin.bits.Indexable 0x2f4befaf "datalevin.bits.Indexable@2f4befaf"] #object[datalevin.bits.Indexable 0x58644bae "datalevin.bits.Indexable@58644bae"]], :k-type :eavt, :v-type :long}

It only appears when I try to flatten a query, e.g.

this breaks

(d/q '[:find [(pull ?e [*]) ...] ; <--- difference
       :where [?e :block/refs _]]
  @conn)

but this

(d/q '[:find (pull ?e [*])
       :where [?e :block/refs _]]
     @conn)

works.

using 0.3.3

Indexes not sorted?

It looks like, unlike Datascript/Datomic, the indexes are not sorted in Datalevin. Example:

; in datalevin
(def index-test-conn (d/create-conn {} "temp/index-test"))

(d/transact! index-test-conn [{:edited 1598030089543} {:edited 2001} {:edited 1598030089543} {:edited 4}])
(d/datoms @index-test-conn :avet :edited)

=> [#datalevin/Datom[1 :edited 1598030089543 536870912 true]
 #datalevin/Datom[3 :edited 1598030089543 536870912 true]
 #datalevin/Datom[4 :edited 4 536870912 true]
 #datalevin/Datom[2 :edited 2001 536870912 true]]
;; not sorted by value
results in datascript are sorted by value:
(#datascript/Datom[4 :edited 4 536870913 true]
 #datascript/Datom[2 :edited 2001 536870913 true]
 #datascript/Datom[1 :edited 1598030089543 536870913 true]
 #datascript/Datom[3 :edited 1598030089543 536870913 true])

This is either a bug or not part of the feature set. However, the docstring of datalevin.core/datoms contains

; find N entities with lowest attr value (e.g. 10 earliest posts)
       (->> (datoms db :avet attr) (take N))

; find N entities with highest attr value (e.g. 10 latest posts)
 (->> (datoms db :avet attr) (reverse) (take N))

which should be changed in the latter case.

`dtlv` quote seems to be resolving to syntax-quote, not quote

Thanks for your continued work on datalevin - I integrated it into a clojure-y WM tool I'm working on today and it has been very useful, particularly the babashka pod integration - thanks much!

I hit a strange quirk while diving in - before going for the pod, I was trying to wrap dtlv usage via the shell, but it seems that some Datalog queries fail because of a quote/syntax-quote issue.

Here's a quick reproduction:

(def conn (get-conn "./somedb"))
(transact! conn [{:name "Datalevin"} {:some "value"} {:another "val"}])
(q (quote [:find ?e :where (or [?e :name "Datalevin"] [?e :some "value"])]) @conn)
(close conn)
❯ dtlv

  Datalevin (version: 0.4.28)

  Type (help) to see available functions. Clojure core functions are also available.
  Type (exit) to exit.

user> (def conn (get-conn "./somedb"))
#'user/conn
user> (transact! conn [{:name "Datalevin"} {:some "value"} {:another "val"}])
{:datoms-transacted 3}
user> (q (quote [:find ?e :where (or [?e :name "Datalevin"] [?e :some "value"])]) @conn)
Unknown rule 'clojure.core/or in (clojure.core/or [?e :name "Datalevin"] [?e :some "value"])
user>

The error shows clojure.core/or being attempted.

I ended up using the bb-pod and things work great that way, but I didn't want to leave this hanging.

Add server mode

Since my stated goal for Datalevin is a versatile database, why not make it a networked server as well, while retaining the embedded mode? There are precedents, e.g. neo4j, H2, eXtremeDB, etc. It is also more conventional in terms of database usage. All we need is to add a socket server and a client.

Compare error

It looks like datalevin walks collections and compares items one by one. When two items at the same index have different types, the comparator errors:

(def temp-conn (d/create-conn "data/datalevin/temp"))
(d/transact! temp-conn [{:foo [:a "a"]}
                        {:foo [:a 1]}])

=> Execution error (ClassCastException) at java.lang.String/compareTo (String.java:134).
class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')

FWIW, this only happens when the collections have the same length.

how can you connect to existing on-disk database?

@huahaiy

What is the best way to initiate a connection to a saved, on-disk database?

;; example from readme:  initiate DB/add data/close conn
;;
(require '[datalevin.core :as d])

(def conn (d/create-conn {} "/tmp/datalevin-test"))

(d/transact! conn
             [{:name "Frege", :db/id -1, :nation "France", :aka ["foo" "fred"]}
              {:name "Peirce", :db/id -2, :nation "france"}
              {:name "De Morgan", :db/id -3, :nation "English"}])

(d/close! conn)

;; how do I re-establish a DB connection to the existing DB?
;; reopen?
(d/open-db-conn "/tmp/datalevin-test")

Distributed mode

The current implementation is standalone; we will add a distributed mode to allow data replication across multiple nodes.

Design criteria:

Strong consistency. CP in terms of the CAP theorem: transactions have a consistent total order; support linearizable reads; support dynamic cluster membership. Pass Jepsen tests.

We choose CP because a fast-failing (unavailable) system is simpler to program around than a system that sometimes produces wrong results. Simplicity for users is the main design objective for Datalevin. All our design choices (Datalog, mutable DB, and CP) are consistent with this goal.

Implementation will use Raft consensus algorithm:

  • Any node can read/write.

  • Writes are in total order: a write first goes to the leader; the leader writes it to a transaction log, sends it to the followers, waits for quorum confirmation, then commits the write and reports success. Writes are unavailable if quorum cannot be reached.

  • User can choose one of the three read consistency levels:

  1. By default, a read goes to the leader to check whether this node has the latest data; the leader asks the followers to obtain quorum confirmation, then replies to the read requester. If confirmed, the reader reads from its local LMDB. This provides linearizable reads.

  2. If leader lease is enabled, the leader doesn't ask the followers and just replies. This level requires the nodes to have clock synchronization. It saves the quorum confirmation, so it is a slightly faster way of achieving linearizable reads.

  3. Optionally, if the user doesn't mind reading stale data, she can choose to bypass the leader and read from local LMDB directly. This has the same read speed as standalone mode, but the data may be outdated.

Raft is a much better solution than the designated transactor concept of Datomic. In Raft, the leader is elected, not fixed. With Raft, the same transaction total order is achieved without the cost and complexity of operating designated transactors. Even with a standby, transactors are still a single point of failure.

Also, Datomic doesn't seem to have a mechanism to ensure linearizable reads. "Database as a value" doesn't say the value is the latest version; it could well be an outdated version. The main supported storage backend of Datomic, DynamoDB, is AP only, so consistency is not guaranteed.

We will apply Raft globally in the cluster, which means that the cluster size is bounded. Since sharding should be handled at the application level, as the application has more context, the database should not automate sharding. So if unbounded scaling is needed, instead of "just adding more nodes", just run more clusters.

Problem with :db/valueType :db.type/string

Hi, thanks for datalevin!

One thing that took a while to find out: specifying a string attribute with :db/valueType :db.type/string seems to mess up string encoding somehow/somewhere. German umlauts, for example, break when the string type is specified. When NOT specifying the attribute beforehand at all, the encoding stays intact. I don't know if this is because a blob is used when a type has not been specified...

Hm ... can I help with this?

  • Axel

Incorporate Datascript changes after 0.18.13

There are quite a lot of changes due to Clojurists Together funding of Datascript's Summer of Bugs project. We need to integrate these changes after this flurry of activity settles down, mainly the composite tuple feature.

Transaction fails on larger byte-arrays

(def conn
  (d/create-conn
    "data/test-conn2"
    {:foo-bytes     {:db/valueType :db.type/bytes}}))

(d/transact! conn [{:foo-bytes (.getBytes "foooo")}])
;; => {:datoms-transacted 1}
(d/transact! conn [{:foo-bytes (.getBytes (str (range 1000)))}])
;; =>
;Execution error (ExceptionInfo) at datalevin.binding.java.LMDB/transact_kv (java.clj:329).
;Fail to transact to LMDB: "Error putting r/w value buffer of \"datalevin/giants\": \"_hash\""

lein test errors

➜  datalevin git:(master) lein version
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
Leiningen 2.9.4 on Java 16.0.1 OpenJDK 64-Bit Server VM
lein test
<boxed math warnings>
Boxed math warning, datalevin/test/core.cljc:42:20 - call: public static boolean clojure.lang.Numbers.isPos(java.lang.Object).
Boxed math warning, datalevin/test/db.cljc:32:24 - call: public static boolean clojure.lang.Numbers.gt(java.lang.Object,long).
Boxed math warning, datalevin/test/db.cljc:35:21 - call: public static java.lang.Number clojure.lang.Numbers.divide(java.lang.Object,long).
Boxed math warning, datalevin/test/query_fns.cljc:90:20 - call: public static boolean clojure.lang.Numbers.gt(java.lang.Object,long).
Syntax error (ClassNotFoundException) compiling new at (cheshire/factory.clj:57:11).
com.fasterxml.jackson.core.TSFBuilder

Full report at:
/var/folders/0l/rm9ggfzd5457q6c5_sy182m40000gn/T/clojure-2396174131371324047.edn
Tests failed.
➜  datalevin git:(master) cat /var/folders/0l/rm9ggfzd5457q6c5_sy182m40000gn/T/clojure-2396174131371324047.edn
{:clojure.main/message
 "Syntax error (ClassNotFoundException) compiling new at (cheshire/factory.clj:57:11).\ncom.fasterxml.jackson.core.TSFBuilder\n",
 :clojure.main/triage
 {:clojure.error/phase :compile-syntax-check,
  :clojure.error/line 57,
  :clojure.error/column 11,
  :clojure.error/source "factory.clj",
  :clojure.error/symbol new,
  :clojure.error/path "cheshire/factory.clj",
  :clojure.error/class java.lang.ClassNotFoundException,
  :clojure.error/cause "com.fasterxml.jackson.core.TSFBuilder"},
 :clojure.main/trace
 {:via
  [{:type clojure.lang.Compiler$CompilerException,
    :message
    "Syntax error compiling new at (cheshire/factory.clj:57:11).",
    :data
    {:clojure.error/phase :compile-syntax-check,
     :clojure.error/line 57,
     :clojure.error/column 11,
     :clojure.error/source "cheshire/factory.clj",
     :clojure.error/symbol new},
    :at [clojure.lang.Compiler analyzeSeq "Compiler.java" 7119]}
   {:type java.lang.NoClassDefFoundError,
    :message "com/fasterxml/jackson/core/TSFBuilder",
    :at [java.lang.Class getDeclaredConstructors0 "Class.java" -2]}
   {:type java.lang.ClassNotFoundException,
    :message "com.fasterxml.jackson.core.TSFBuilder",
    :at
    [jdk.internal.loader.BuiltinClassLoader
     loadClass
     "BuiltinClassLoader.java"
     602]}],
  :trace
  [[jdk.internal.loader.BuiltinClassLoader
    loadClass
    "BuiltinClassLoader.java"
    602]
   [jdk.internal.loader.ClassLoaders$AppClassLoader
    loadClass
    "ClassLoaders.java"
    178]
   [java.lang.ClassLoader loadClass "ClassLoader.java" 522]
   [java.lang.Class getDeclaredConstructors0 "Class.java" -2]
   [java.lang.Class privateGetDeclaredConstructors "Class.java" 3215]
   [java.lang.Class getConstructors "Class.java" 1957]
   [clojure.lang.Compiler$NewExpr <init> "Compiler.java" 2579]
   [clojure.lang.Compiler$NewExpr$Parser parse "Compiler.java" 2671]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7111]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7099]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler access$300 "Compiler.java" 38]
   [clojure.lang.Compiler$LetExpr$Parser parse "Compiler.java" 6388]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7111]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7099]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7099]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyze "Compiler.java" 6749]
   [clojure.lang.Compiler$BodyExpr$Parser parse "Compiler.java" 6124]
   [clojure.lang.Compiler$LetExpr$Parser parse "Compiler.java" 6440]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7111]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7099]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyze "Compiler.java" 6749]
   [clojure.lang.Compiler$BodyExpr$Parser parse "Compiler.java" 6124]
   [clojure.lang.Compiler$FnMethod parse "Compiler.java" 5471]
   [clojure.lang.Compiler$FnExpr parse "Compiler.java" 4033]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7109]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7099]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler access$300 "Compiler.java" 38]
   [clojure.lang.Compiler$DefExpr$Parser parse "Compiler.java" 596]
   [clojure.lang.Compiler analyzeSeq "Compiler.java" 7111]
   [clojure.lang.Compiler analyze "Compiler.java" 6793]
   [clojure.lang.Compiler analyze "Compiler.java" 6749]
   [clojure.lang.Compiler eval "Compiler.java" 7185]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [cheshire.core$eval28931$loading__6737__auto____28932
    invoke
    "core.clj"
    1]
   [cheshire.core$eval28931 invokeStatic "core.clj" 1]
   [cheshire.core$eval28931 invoke "core.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7170]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [babashka.pods.impl$eval28861$loading__6737__auto____28862
    invoke
    "impl.clj"
    1]
   [babashka.pods.impl$eval28861 invokeStatic "impl.clj" 1]
   [babashka.pods.impl$eval28861 invoke "impl.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7170]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [babashka.pods.jvm$eval28853$loading__6737__auto____28854
    invoke
    "jvm.clj"
    1]
   [babashka.pods.jvm$eval28853 invokeStatic "jvm.clj" 1]
   [babashka.pods.jvm$eval28853 invoke "jvm.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7170]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [babashka.pods$eval28847$loading__6737__auto____28848
    invoke
    "pods.clj"
    1]
   [babashka.pods$eval28847 invokeStatic "pods.clj" 1]
   [babashka.pods$eval28847 invoke "pods.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7170]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [pod.huahaiy.datalevin_test$eval28841$loading__6737__auto____28842
    invoke
    "datalevin_test.clj"
    1]
   [pod.huahaiy.datalevin_test$eval28841
    invokeStatic
    "datalevin_test.clj"
    1]
   [pod.huahaiy.datalevin_test$eval28841 invoke "datalevin_test.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7170]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.RT loadResourceScript "RT.java" 381]
   [clojure.lang.RT loadResourceScript "RT.java" 372]
   [clojure.lang.RT load "RT.java" 459]
   [clojure.lang.RT load "RT.java" 424]
   [clojure.core$load$fn__6856 invoke "core.clj" 6115]
   [clojure.core$load invokeStatic "core.clj" 6114]
   [clojure.core$load doInvoke "core.clj" 6098]
   [clojure.lang.RestFn invoke "RestFn.java" 408]
   [clojure.core$load_one invokeStatic "core.clj" 5897]
   [clojure.core$load_one invoke "core.clj" 5892]
   [clojure.core$load_lib$fn__6796 invoke "core.clj" 5937]
   [clojure.core$load_lib invokeStatic "core.clj" 5936]
   [clojure.core$load_lib doInvoke "core.clj" 5917]
   [clojure.lang.RestFn applyTo "RestFn.java" 142]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$load_libs invokeStatic "core.clj" 5974]
   [clojure.core$load_libs doInvoke "core.clj" 5958]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [clojure.core$require invokeStatic "core.clj" 5996]
   [clojure.core$require doInvoke "core.clj" 5996]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.core$apply invokeStatic "core.clj" 669]
   [user$eval233 invokeStatic "form-init5029336140147651408.clj" 1]
   [user$eval233 invoke "form-init5029336140147651408.clj" 1]
   [clojure.lang.Compiler eval "Compiler.java" 7181]
   [clojure.lang.Compiler eval "Compiler.java" 7171]
   [clojure.lang.Compiler load "Compiler.java" 7640]
   [clojure.lang.Compiler loadFile "Compiler.java" 7578]
   [clojure.main$load_script invokeStatic "main.clj" 475]
   [clojure.main$init_opt invokeStatic "main.clj" 477]
   [clojure.main$init_opt invoke "main.clj" 477]
   [clojure.main$initialize invokeStatic "main.clj" 508]
   [clojure.main$null_opt invokeStatic "main.clj" 542]
   [clojure.main$null_opt invoke "main.clj" 539]
   [clojure.main$main invokeStatic "main.clj" 664]
   [clojure.main$main doInvoke "main.clj" 616]
   [clojure.lang.RestFn applyTo "RestFn.java" 137]
   [clojure.lang.Var applyTo "Var.java" 705]
   [clojure.main main "main.java" 40]],
  :cause "com.fasterxml.jackson.core.TSFBuilder",
  :phase :compile-syntax-check}}

Dynamic library path.

So I'm testing on MacOS and I had to symlink the library for lmdb to do this:

ln -s ~/homebrew/Cellar/lmdb/lib/liblmdb.dylib /usr/local/opt/lmdb/lib/liblmdb.dylib

This is because Homebrew, as you can tell, is installed in a different location than it normally is.

Not sure how to fix it. Here's the original error.

$ dtlv help
dyld: Library not loaded: /usr/local/opt/lmdb/lib/liblmdb.dylib
  Referenced from: /Users/main/.local/bin/dtlv
  Reason: image not found
fish: 'dtlv help' terminated by signal SIGABRT (Abort)

Performance on Windows JDK 8 2-3x slower

Posting here for posterity, as referenced in the Reddit thread related to the benchmark fork with Datahike added, datalevinbench.

Results on Ubuntu with the same JDK were not reproduced, so it appears to be an lmdb-java Windows problem.
I attempted to see if the new lmdb-java native libs per lmdb-java #148 would have an impact, to no avail. Datalevin ran fine with lmdb-java 0.9.24-1 though (benchmarks completed).

jline integration in command line shell

Right now, to get an acceptable experience in dtlv, one has to use rlwrap, i.e. launch it like this: rlwrap dtlv. It would be nice to integrate the jline3 library to have a better experience natively.

benchmark errors

➜  bench git:(master) clj ./bench.clj
WARNING: When invoking clojure.main, use -M
version   	init
latest-datalevin 	WARNING: When invoking clojure.main, use -M
Syntax error (ClassNotFoundException) compiling at (datalevin/util.cljc:1:1).
org.graalvm.nativeimage.ImageInfo

Full report at:
/var/folders/0l/rm9ggfzd5457q6c5_sy182m40000gn/T/clojure-2933252024446546360.edn
Syntax error (ExceptionInfo) compiling at (./bench.clj:135:1).
ERROR

Full report at:
/var/folders/0l/rm9ggfzd5457q6c5_sy182m40000gn/T/clojure-12924256625911664920.edn

Transaction fails when replacing large byte-arrays

@huahaiy, I ran into the error from #54 again with a narrower use case. Can you reproduce the following:

(-> (d/empty-db nil {:id {:db/valueType :db.type/string
                          :db/unique :db.unique/identity}
                     :bytes {:db/valueType :db.type/bytes}})
    (d/db-with [{:id "foo"
                 :bytes (.getBytes (apply str (range 1000)))}])
    (d/db-with [{:id    "foo"
                 :bytes (.getBytes (apply str (range 1000)))}]))

;; Execution error (ExceptionInfo) at datalevin.binding.java.LMDB/transact_kv (java.clj:329).
;; Fail to transact to LMDB: "Error putting r/w key buffer of \"datalevin/giants\": nil"

When not transacting :id, the identity key, it works without error.

Remove dependency on LMDBJava

LMDBJava implements LMDB comparators in Java, which creates a lot of overhead, as each comparison calls into Java from C. Since our native LMDB wrapper has already implemented the same logic, we can remove this overhead by switching to our own implementation that uses our own native C comparator. This also removes some duplicated code, saving some maintenance burden. Another benefit is that we only need to implement the parts that we need. Unfortunately, our native wrapper only works in GraalVM, so we need to write this wrapper again for the regular JVM.

There's no point redoing the work that LMDBJava already does using Unsafe and reflection, as these are going to be removed from the JDK in the future. If we don't want to do wasted work, we will have to wait for Project Panama. At least, we can try the new memory access API that seems to have already landed in JDK 14; see the design and a blog post.

So for JDK versions prior to 14, we use LMDBJava; for 14 and later, we use the new memory access API. Eventually, the dependency on LMDBJava can be removed.

Support fulltext function

Similar to the fulltext function of Datomic, but working across multiple attributes. Use a Clojure-native search expression syntax instead of a Lucene search string, which is not very composable. Finally, allow fuzzy search, as typos are common.

Package binary

We should make the dtlv command line tool easy for users to install, including installing the dependency (LMDB). For macOS and Linux, we can use Homebrew; for Windows, we still need to figure out an approach. Also release Datalevin as a babashka pod.

Locking on primitives and/or Longs

I stumbled upon an issue while trying to compile this project with GraalVM native-image: https://github.com/borkdude/datalevin-native

Just sharing a conversation I had with Alex Miller on Slack about it.

alexmiller: You can't lock on a primitive long - it has to be an object
That code doesn't make sense
borkdude yeah, I figured the same. it comes from here:
https://github.com/juji-io/datalevin/blob/0588dde496b18ced06fed036db3f969c35c69ef8/src/datalevin/storage.clj#L226 
alexmiller yeah, that's bad
3:14 PM
borkdude so it does not work, but it doesn't fail in the JVM either?
3:25 PM
alexmiller my guess would be that in some cases the primitive long is getting up cast. that's the only way it could work (but locking on a Long value is still a bad idea). this code is just wrong.
3:27 PM
the Java numeric value objects can be cached around 0 and that means the scope of your Long sharing can escape your own code (similarly, you should never lock on a String instance which might be interned) (edited) 
3:28 PM
borkdude :thumbsup: I'll make an issue with this conversation in their repo
alexmiller it's best to make an explicit lock Object
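
A minimal sketch of the suggested fix, i.e. synchronizing on a dedicated lock object rather than on a primitive or boxed long (names are hypothetical):

(defonce ^:private write-lock (Object.))

(defn with-write-lock [f]
  ;; locking a plain Object is safe; locking a Long is not, since boxed
  ;; values near zero may be cached and shared beyond this code
  (locking write-lock
    (f)))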

Portable tempfile path

The current implementation implicitly expects "/tmp" to be the system temporary directory, which holds except on Windows. On Windows it's typically c:/users/joinr/AppData/local/Temp, I think. The JVM will tell you via (System/getProperty "java.io.tmpdir"). From #19.
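
A minimal sketch of a portable helper based on that property (the function name is hypothetical):

(require '[clojure.java.io :as io])

(defn tmp-path
  "Return a path under the platform's temporary directory."
  [& parts]
  (.getPath (apply io/file (System/getProperty "java.io.tmpdir") parts)))

(tmp-path "datalevin" "mydb")
;; => "/tmp/datalevin/mydb" on Linux; under AppData/Local/Temp on Windows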

Comparing dates in :where clause

I have a query to which I pass some Date, by which I would like to filter some date attribute. In Datomic I can use predicates like [(.after ?myattr ?dateparam)], but doing it here I get an error:
Execution error (ExceptionInfo) at datalevin.query/filter-by-pred (query.cljc:577). Unknown predicate '.after in [(.after ?tm ?d)]

Type hints in query don't change anything. Passing long value as a date gives:

Execution error (ClassCastException) at datalevin.query/-call-fn$fn (query.cljc:564). java.util.Date cannot be cast to java.lang.Number

So how could I filter dates within query?

Some Data Freeze/Thaw Issues

Hi @huahaiy ,

I love the idea of Datalevin and have been toying with the idea of generating DBs for scientific data that I could then distribute (e.g. sqlite for datalog). I have tried to use Datalevin on a download from the NPAtlas and in doing so have encountered a few errors/bugs that I am hoping you can help me resolve.

Errors Observed

I've posted a hopefully reproducible example as a gist. In this example, after I process and load the file into a DB, I run into a few errors during the query.

  1. Fail to get-value: "Thaw failed against type-id: 78" (when using a schema)
  2. Fail to get-value: "Thaw failed against type-id: 16" (using empty schema)
  3. Empty Set when expecting results (using empty schema)

Not all queries are broken, but when I query for :smiles I run into issues. These are strings with a bunch of special characters, and I wonder if there is some string-escaping happening somewhere during the freeze. I was able to nippy-freeze/thaw them directly, so this guess may be wrong, but it is my best one.

System Environment

I'm on OSX with Clojure 1.10.1, Java 8, and Datalevin 0.2.16

(System/getProperty "java.vm.version")
=> "25.192-b01"

Deserialize Exception thrown for some string

deps.edn

{:deps {datalevin {:mvn/version "0.2.13"}}}

code to reproduce

(require '[datalevin.core :as d])
(def conn (d/create-conn {:id {:db/valueType :db.type/long}
                          :text {:db/valueType :db.type/string}} 
                         "tmp"))
(d/transact! conn [{:text "[7/3, 15:36]"
                    :id 3}])
(d/q '[:find (pull ?e [*])
       :where 
       [?e :id 3]]
     @conn)

exception

Error printing return value (ExceptionInfo) at datalevin.lmdb.LMDB/get_value (lmdb.clj:680).
Fail to get-value: "Thaw failed against type-id: 78"

Updating an instant field on an entity throws

(def conn (d/create-conn "temp" {:foo/id {:db/unique :db.unique/identity
                                          :db/valueType :db.type/string}
                                 :foo/date {:db/valueType :db.type/instant}}))

;; works the first time
(d/transact! conn [{:foo/id "foo"
                    :foo/date (java.util.Date.)}])

;; run again
(d/transact! conn [{:foo/id "foo"
                    :foo/date (java.util.Date.)}])
;; throws:
Execution error (ExceptionInfo) at datalevin.lmdb.LMDB/transact (lmdb.clj:698).
Fail to transact to LMDB: "Error putting r/w key buffer of \"datalevin/eav\": \"class java.lang.Long cannot be cast to class java.util.Date (java.lang.Long and java.util.Date are in module java.base of loader 'bootstrap')\""

Bundle all necessary assets for native compilation

Right now there is still some manual work for downstream users to integrate into their native image compilation scripts. This could be simplified.

  1. reflect-config.json and other config options could be put in the jar. See https://www.graalvm.org/reference-manual/native-image/BuildConfiguration/

  2. pre-built native libs can also be put in the jar, along with the header files. Hopefully the native image compiler can locate them on the classpath. Need to do some experimentation.

Mixed variables in complex queries

If the number of assigned variables in the query is greater than 8, everything becomes mixed up. The likely culprit is a zipmap used somewhere in the code, which only keeps insertion order when the number of keys is <= 8. The bug does not exist in Datascript.

(require '[taoensso.encore :as e])

(let [data (->> (range 1000)
                (map (fn [i] {:a (e/uuid-str)
                              :b (e/uuid-str)
                              :c (e/uuid-str)
                              :d (e/uuid-str)
                              :e (rand-int 3)
                              :f (rand-int 3)
                              :g (rand-int 3)
                              :h (rand-int 3)})))
      conn (d/create-conn {:a {:db/valueType :db.type/string}
                           :b {:db/valueType :db.type/string}
                           :c {:db/valueType :db.type/string}
                           :d {:db/valueType :db.type/string}
                           :e {:db/valueType :db.type/double}
                           :f {:db/valueType :db.type/double}
                           :g {:db/valueType :db.type/double}
                           :h {:db/valueType :db.type/double}})
      db    (-> (d/empty-db)
                (d/db-with data))]
  (d/q '[:find     ?eid1 .
         :where
         [?eid1 :a ?a1]
         [?eid1 :b ?b1]
         [?eid1 :c ?c1]
         [?eid1 :d ?d1]
         [?eid1 :e ?e1]
         [?eid1 :f ?f1]
         [?eid1 :g ?g1]
         [?eid1 :h ?h1]
         [?eid2 :e ?e1]]
    db))
;; => "efff5a2a-6ee0-49fa-bd4e-7f5cc3a8bc4c"

Instead of the eid, the value of a random variable is returned.
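
For reference, a quick REPL sketch of the suspected zipmap behavior: small Clojure maps are array maps that keep insertion order, but they promote to unordered hash maps above 8 entries:

(keys (zipmap [:a :b :c :d :e :f :g :h] (range 8)))
;; => (:a :b :c :d :e :f :g :h)   ; array map, insertion order kept

(keys (zipmap [:a :b :c :d :e :f :g :h :i] (range 9)))
;; => some hash-map order, not insertion order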

Problem with a large db

When the db file reaches about 100 MB, an error occurs during transaction:

Execution error (ExceptionInfo) at datalevin.lmdb.LMDB/transact (lmdb.clj:665).
Fail to transact to LMDB: "\"BufferOverflow: trying to put 8 bytes while 7 remaining in the ByteBuffer.\""

EDIT:

The limit is not 100 MB. After deleting the db files and transacting the data again, it works.

Self-contained jar convenient for native image compilation

Right now, we do not bundle native libraries we need in the released jar.

This is not a problem for Datalevin on JVM, as LMDBJava already bundles LMDB binary and that's all we need.

However, for people who include Datalevin as a dependency in their application and want to compile their application into a native image, they have to do some manual work right now:

  • merge our reflect-config.json into theirs. I don't think this step can be easily avoided.
  • compile libdtlv.a. This step should be avoided. We should bundle this in the jar and put it on the classpath.
  • install LMDB on their platform. This step should also be avoided. We should bundle it and put on the classpath.

Also, these bundled binaries need to have platform-specific versions.

Limit the size of in-memory transaction cache

If the system is running for an extended time, the in-memory transaction cache may grow to be too big for the heap.

Need to come up with a policy to clean it up, e.g. a size limit (datom count) option set when opening the DB.

invalid reuse of reader locktable slot

Keep an old connection, use a new connection to transact, then close the db. Try a query on the old connection, and this error appears. Run the query a second time, and "transaction has not been reset" is thrown.

[Update] This is an expected behavior. See below.

New query engine

Since Asami is a few orders of magnitude faster than Datascript in query processing, we will switch to it for query processing, but retain the other Datascript APIs, e.g. pull, index access, etc.

The main algorithmic advantage is that the Asami engine works on 3 nested maps as the index:

{entity {attribute #{value}}}
{attribute {value #{entity}}}
{value {entity #{attribute}}} 

So Asami does a lot less work than Datascript in handling the query. It also has a query planner to do optimization.

My conversation with Asami's author, Paula Gearon @quoll, indicates to me that it is possible to implement the Asami query engine on top of a storage layer that is the same as what we currently have. She suggested implementing Asami's Graph protocol: https://github.com/threatgrid/asami/blob/main/src/asami/graph.cljc#L9

The only thing missing that would be non-trivial is the implementation of the resolve-triple function, which, "instead of returning all the matched full datoms, returns the bound values only", so we will get nested maps back to work with, instead of purely doing joins with datoms, which would be much slower.

We will get to this after raft.

[update] It turned out that Asami is not faster: it returns a lazy-seq, so it only seems fast. With a doall, it was no longer the fastest; Datalevin is actually currently the fastest. https://clojurians-log.clojureverse.org/datalog/2022-01-02/1641144801.084700

Indices

Hi! How are indices defined in Datalevin? I would like to create indices (composed of multiple fields) to have faster queries and am not sure how to do it. Thanks!

Making entities transactable

(This feature would mark a diversion from the datascript/datomic API.)

I've always wondered: why are entities not transactable? I find myself converting entities to maps all the time solely to transact them. This still causes problems when entities nest other entities. So here are a few simple ideas on how entities could be treated in transactions:

1. Entities could be treated as refs in transactions

(def schema
  {:user/friends #:db{:valueType   :db.type/ref
                      :cardinality :db.cardinality/many}})

(def ent (d/touch (d/entity @conn 1)))

ent ; eval
;; =>
{:db/id 1
 :user/email "[email protected]"
 :user/friends #{{:db/id 2} {:db/id 3}}} ; <-- nested entities

Now I convert it to a map

(def ent-map (into {:db/id (:db/id ent)} ent))
ent-map ; eval
;; =>
{:db/id 1
 :user/email "[email protected]"
 :user/friends #{{:db/id 2} {:db/id 3}}}
;; looks the same but nested entities (under :user/friends) have not been converted

I try to transact it

(d/transact! conn [(assoc ent-map :user/email "[email protected]")])
;; throws:
;; Execution error (ExceptionInfo) at datalevin.db/entid (db.cljc:385).
;; Expected number or lookup ref for entity id, got #:db{:id 2}

So I can either dissoc the :user/friends map-entry or convert contained entities to refs

(d/transact! conn [(-> ent-map
                       (dissoc :user/friends)
                       ;; OR (update :user/friends #(mapv :db/id %)) 
                       (assoc :user/email "[email protected]"))])

We could spare ourselves from this by treating entities as refs in transactions. The database already walks nested data structures to resolve refs, so why not resolve entities as refs as well?

2. Entities to return maps on update

datalevin.impl.entity/Entity implements clojure.lang.Associative, which currently only throws errors:

clojure.lang.Associative
       ;; some methods elided
       (empty [e]         (throw (UnsupportedOperationException.)))
       (assoc [e k v]     (throw (UnsupportedOperationException.)))
       (cons  [e [k v]]   (throw (UnsupportedOperationException.)))

Instead assoc could return a hashmap

(deftype Entity [db eid touched cache]
  ;; elided
  clojure.lang.Associative
  (assoc [e k v]
    (let [e-map (cond-> {:db/id eid}
                  ; add other kvals if touched
                  touched (into e))]
     (assoc e-map k v))))

This would also make update possible. Together, this means that the change of email to ent from above could look like this:

(d/transact! conn [(assoc ent :user/email "[email protected]")])

I would've already implemented this for my own projects, but unfortunately Clojure (unlike ClojureScript) doesn't allow overwriting a type's methods. To achieve this one would have to fork Datalevin and change the definition of datalevin.impl.entity/Entity, so I wanted to raise the issue here first and see what @huahaiy's thoughts are.

Handle migration in swap-attr

It should allow safe migrations that do not alter existing data, and refuse unsafe schema changes that are inconsistent with existing data.

Writing after re-opening a connection overwrites entities

Thanks for creating datalevin! I've enjoyed toying with it for the last few days, great work!

After closing and re-creating a connection, the db seems to start overwriting entity data, and fails to update :db.type/instant values. It looks like the recreated connection has lost its state and indexes.

I created a quick reproduction here.

The symptom was an error while transacting:

Unhandled clojure.lang.ExceptionInfo
   Fail to transact to LMDB: "\"Error putting r/w key buffer of datalevin/eav:
   java.lang.Long cannot be cast to java.util.Date\""
   {:txs
    [[:del
      "datalevin/eav"
      #object[datalevin.bits.Indexable 0x5b5ff66f "datalevin.bits.Indexable@5b5ff66f"]
      :eav] .... etc

The 'overwritten' entities are likely the result of the new connection re-using old entity ids.

Datalevin as a C Shared library

For other languages to use Datalevin, it would be ideal to make Datalevin a C shared library. However, it is inconvenient to use Datalevin if there is no C library that can handle EDN data. It is OK to treat query/transaction input as plain text, but it would be much more convenient if the results could be consumed by C easily. Unfortunately, there's no C library for EDN data. Another option is for Datalevin to output JSON results instead, with an inevitable loss of type information.

I think this problem does not just affect Datalevin, but also the whole Clojure ecosystem. The adoption of Clojure is hindered by the lack of interoperability with the native world, which is primarily built on top of C. An EDN C library would hugely alleviate this problem.

Datalevin for Nodejs/javascript

Datalevin is an amazing project. I want to use Datalog with Node.js. Is it possible to compile Datalevin to JavaScript with ClojureScript?

ClassCastException when transacting multiple bytearrays in one transaction

@huahaiy do you know what this could be?

(def conn
  (d/create-conn
    "data/test-conn"
    {:entity-things {:db/valueType   :db.type/ref
                     :db/cardinality :db.cardinality/many}
     :foo-bytes     {:db/valueType :db.type/bytes}}))

(d/transact! conn [{:foo-bytes (.getBytes "foooo")}])
; => {:datoms-transacted 1}

(d/transact! conn [{:entity-things
                    [{:foo-bytes (.getBytes "foooo")}]}])
;=> {:datoms-transacted 2}

(d/transact! conn [{:entity-things
                    [{:foo-bytes (.getBytes "foooo")}
                     {:foo-bytes (.getBytes "foooo")}]}])

; =>
; Execution error (ClassCastException) at (REPL:1).
; null
