Comments (7)
One of the things I'm curious to explore is making an implementation of https://github.com/Engelberg/ubergraph that uses something like datahike behind the scenes, so that you can create really large graphs that aren't limited by memory. My goal would be to make the durable aspect as invisible to the user as possible, so you can use it exactly as you would an in-memory data structure. That would mean that every single change would need to produce its own equally valid "snapshot". There would be no concept of a "current graph", just as there isn't one when you work with the graph in memory.
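A minimal sketch of those value semantics in Python (not ubergraph's API; the `Graph` class and its methods are hypothetical): every `add_edge` returns a new graph, older versions stay valid, and unchanged adjacency sets are shared between versions instead of copied. This sketch still copies the outer dict on each update, which a real persistent map (a HAMT or a B-tree variant) avoids; a durable version would keep the same interface and swap that dict for an on-disk persistent map.

```python
from types import MappingProxyType


class Graph:
    """Immutable directed graph: every update returns a new Graph value.

    Hypothetical sketch. Adjacency sets are frozensets, so versions can share
    them safely; only the outer dict is copied on each update.
    """

    __slots__ = ("_adj",)

    def __init__(self, adj=None):
        # Read-only view, so a snapshot cannot be mutated in place.
        self._adj = MappingProxyType(adj if adj is not None else {})

    def add_edge(self, src, dst):
        adj = dict(self._adj)  # shallow copy: untouched frozensets are shared
        adj[src] = adj.get(src, frozenset()) | {dst}
        adj.setdefault(dst, frozenset())
        return Graph(adj)

    def successors(self, node):
        return self._adj.get(node, frozenset())

    def nodes(self):
        return set(self._adj)


g1 = Graph().add_edge("a", "b")
g2 = g1.add_edge("c", "d")  # g1 is still a valid, unchanged snapshot
```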
Well, you are absolutely right. I deliberately used the atom as a simple way to linearize write operations through locking. datahike still has the `with` form, which should be applicable to the database value held in the connection atom, so you can work on that value in memory and afterwards either flush it manually or transact it, which would overwrite the datahike identity. There is no reason you could not also make these parallel versions durable, and they would even share structure in the store. Do you have specific requirements? I can provide a dedicated `flush-db` routine and allow concurrent forks in one store. Honestly, I tried to make it useful by providing a comfortable entry point, and I am very happy to figure out what people would like to do with it.
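To make the distinction concrete, here is a rough Python analogue, assuming a database value can be modelled as an immutable set of `(entity, attribute, value)` tuples. `Connection`, `with_tx` and `reset` are hypothetical names, not datahike's API: `transact` serializes writes behind a lock, much like swapping an atom, while `with_tx` produces a fork from any value without touching the connection, and `reset` installs such a fork as the new identity.

```python
import threading


def with_tx(db, adds=(), retracts=()):
    """Return a new db value; the input value and any connection are untouched."""
    return (db - frozenset(retracts)) | frozenset(adds)


class Connection:
    """Hypothetical connection: one mutable reference to an immutable db value."""

    def __init__(self, db=frozenset()):
        self._db = db
        self._lock = threading.Lock()
        self.log = []  # stand-in for the durable store

    @property
    def db(self):
        return self._db

    def transact(self, adds=(), retracts=()):
        with self._lock:  # one writer at a time: writes are linearized
            self._db = with_tx(self._db, adds, retracts)
            self.log.append(self._db)
            return self._db

    def reset(self, db):
        """Overwrite the connection's identity with a value built elsewhere."""
        with self._lock:
            self._db = db
            self.log.append(db)
            return db


conn = Connection()
conn.transact(adds=[(1, "name", "a")])
fork = with_tx(conn.db, adds=[(2, "name", "b")])  # conn.db is unchanged
```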
Since I have worked on CRDTs before, which are by their nature forkable and joinable, I am at the moment rather trying to expose the internals and explore different approaches myself. One idea I have is to add a conflict-resolution mechanism to attributes in the schema, so that they would be automatically joinable. This would not work for the whole database, of course.
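One way such schema-level conflict resolution could look, sketched in Python under the assumption that each fork is a flat `{(entity, attribute): value}` map. The `SCHEMA` table mapping attributes to join functions is hypothetical. The join functions should be commutative, associative and idempotent (here `max` and set union), so merging forks in any order gives the same result; attributes without one raise a conflict, which is why this cannot cover the whole database.

```python
# Hypothetical schema extension: attribute -> lattice join function.
SCHEMA = {
    "counter/max": max,         # grow-only maximum
    "tags": frozenset.union,    # grow-only set
}


class Conflict(Exception):
    """Both forks changed an attribute that has no join function."""


def join_dbs(left, right, schema):
    """Merge two forks given as {(entity, attribute): value} dicts."""
    merged = dict(left)
    for key, rval in right.items():
        if key not in merged:
            merged[key] = rval
            continue
        lval = merged[key]
        if lval == rval:
            continue
        join = schema.get(key[1])  # key[1] is the attribute
        if join is None:
            raise Conflict(key, lval, rval)
        merged[key] = join(lval, rval)
    return merged
```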
That sounds cool! I still have to incorporate my complex network analysis algorithms for loom. Yesterday I also thought about doing something like this for clara and factui. For instance, you can just use https://github.com/replikativ/datahike/blob/master/src/datahike/core.cljc#L204; this will give you in-memory forks, and I can provide a simple flush routine separate from the Datomic-like API.
Note that in effect this just means passing through the flush functionality of the hitchhiker-tree. I have not done anything fancy for datahike; it is just glue code between datascript and the hitchhiker-tree. My main point at the moment is to show how great these persistent durable data structures are for building complex state management systems that store and distribute state (read: distributed databases :) ) through simple recomposition of good in-memory libraries. The hitchhiker-tree might hit its limits when you want to write more than 10000 txs/sec, but I think Clojure has huge potential in this space beyond that as well.
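The in-memory-fork-plus-explicit-flush split can be sketched in Python with a path-copying binary tree standing in for the hitchhiker-tree (which is a considerably more elaborate B-tree-like structure). `assoc` and `flush` are hypothetical names. Because `flush` writes nodes into a content-addressed store keyed by a hash of their contents, flushing a fork only adds the nodes on the changed path; everything else is already there.

```python
import hashlib
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: object
    val: object
    left: Optional["Node"]
    right: Optional["Node"]


def assoc(node, key, val):
    """Path-copying insert: only nodes on the path to key are new objects."""
    if node is None:
        return Node(key, val, None, None)
    if key < node.key:
        return node._replace(left=assoc(node.left, key, val))
    if key > node.key:
        return node._replace(right=assoc(node.right, key, val))
    return node._replace(val=val)


def flush(node, store):
    """Write node and its subtree into a content-addressed store.

    Returns the node's id. Records already present are not stored twice,
    so forks that share subtrees share them in the store as well.
    """
    if node is None:
        return None
    left_id = flush(node.left, store)
    right_id = flush(node.right, store)
    record = repr((node.key, node.val, left_id, right_id))
    node_id = hashlib.sha256(record.encode()).hexdigest()
    store.setdefault(node_id, record)
    return node_id


base = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    base = assoc(base, k, str(k))
store = {}
flush(base, store)                     # 7 records
flush(assoc(base, 1, "one"), store)    # fork adds only the path 4 -> 2 -> 1
```

This sketch re-hashes every node on each flush; a real implementation would remember which nodes are already durable and skip those subtrees.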
So maybe it is easier to just put the `node-map`, and maybe the `attrs`, from here:
https://github.com/Engelberg/ubergraph/blob/master/src/ubergraph/core.clj#L124
into a hitchhiker-tree (which is just a sorted map on disk). It is persistent, so you can decide on your own when and how to make it durable. If you have questions about that, feel free to ping me.
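To illustrate keeping graph data in a sorted map, here is a Python sketch with a hypothetical key layout (not ubergraph's internal representation): each edge is stored twice, under `(src, "out", dst)` and `(dst, "in", src)`, so all edges of a node are contiguous and a range scan finds them. `SortedMap` is a copy-on-write stand-in for a hitchhiker-tree and does not share structure the way the real tree does.

```python
from bisect import bisect_left, insort


class SortedMap:
    """Copy-on-write sorted map: assoc returns a new map, the old one is kept.

    Stand-in only: it copies its contents on every update, whereas a
    hitchhiker-tree would share unchanged nodes between versions.
    """

    def __init__(self, keys=None, vals=None):
        self._keys = keys if keys is not None else []
        self._vals = vals if vals is not None else {}

    def assoc(self, key, val):
        keys = list(self._keys)
        if key not in self._vals:
            insort(keys, key)
        vals = dict(self._vals)
        vals[key] = val
        return SortedMap(keys, vals)

    def scan(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def add_edge(g, src, dst, attrs=None):
    """Store the edge under both endpoints so either side can be scanned."""
    attrs = dict(attrs or {})
    return g.assoc((src, "out", dst), attrs).assoc((dst, "in", src), attrs)


def successors(g, node):
    # (node, "out") sorts before every (node, "out", dst) key, and
    # (node, "out\x00") sorts after all of them.
    return [key[2] for key, _ in g.scan((node, "out"), (node, "out\x00"))]
```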
redis is one backend for the hitchhiker-tree. A year ago I ported the hitchhiker-tree to support core.async-based IO for cljs and supplied a konserve backend:
https://github.com/datacrypt-project/hitchhiker-tree/blob/master/src/hitchhiker/konserve.cljc
In datahike's case I just map the different URL schemes to konserve backends, but I think the redis backend for datahike might be interesting as well. In effect there are two backend abstraction layers, that of the hitchhiker-tree and that of konserve, which might be confusing. If you have any questions, I am happy to help with porting ubergraph. The hitchhiker-tree is solid, but using it in more projects would definitely help with things like garbage collection, performance improvements and serialization options.
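A rough Python sketch of the two layers, with all names hypothetical (this is neither the hitchhiker-tree's nor konserve's API): the tree layer only talks to a small key-value protocol, and a URI scheme picks the key-value backend. A redis-backed store would just be one more entry in `BACKENDS`.

```python
import json
import os
from typing import Optional, Protocol
from urllib.parse import urlparse


class KVStore(Protocol):
    """Lower layer: a minimal key-value protocol (hypothetical)."""

    def get(self, key: str) -> Optional[object]: ...
    def put(self, key: str, value: object) -> None: ...


class MemoryStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class FileStore:
    """One JSON file per key inside a directory."""

    def __init__(self, path):
        self._path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, key):
        return os.path.join(self._path, key + ".json")

    def get(self, key):
        try:
            with open(self._file(key)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def put(self, key, value):
        with open(self._file(key), "w") as f:
            json.dump(value, f)


# URI scheme -> store factory. A redis-backed store would be one more
# entry here (not implemented in this sketch).
BACKENDS = {
    "mem": lambda url: MemoryStore(),
    "file": lambda url: FileStore(url.path),
}


def connect_store(uri: str) -> KVStore:
    url = urlparse(uri)
    return BACKENDS[url.scheme](url)


class TreeStorage:
    """Upper layer: the tree reads and writes nodes only via KVStore."""

    def __init__(self, kv: KVStore):
        self.kv = kv

    def write_node(self, node_id, node):
        self.kv.put(node_id, node)

    def read_node(self, node_id):
        return self.kv.get(node_id)
```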
I will close this for now; feel free to reopen it if you have questions or suggestions on how to improve the current libraries for your use cases.
Related Issues (20)
- [Bug]: datahike.migrate has a problem with schema/double (which cbor converts to float)
- [Bug]: unneeded dependencies pulled
- Integrate HTTP server
- Improve connection handling
- [Bug]: NullPointerException trying to transact from CLI
- [Bug]: FileNotFoundException over resources/datahike-logo.txt
- [Bug]: Inconsistent treatment of invalid constant values
- [Bug]: Unable to use Datahike as a git library
- [Bug]: `pull-many` query with 3 attr-ids on a range of 500 entities takes ~2,900 ms
- Ability to disable `ensure-stored-config-consistency`
- chore: simplify state management
- [Bug]: :config-does-not-match-stored-db for file storage when using VPN
- [Bug]: Problem with bump org.babashka/tools-deps-native from 0.1.1 to 0.1.2 and `datahike-logo.txt` not being packaged into the jar
- [Bug]: Metadata is incorrect for various functions
- [Bug]: Changing cache size throws exception
- [Bug]: bind-by-fn computes wrong result if not all symbols have values
- [Bug]: Failure to transact value `:db/id` for attribute of type `:db.type/keyword`
- [Bug]: Pulled attributes are not correct when attribute-refs are used
- Make history, as-of and since composable
- [Bug]: Comparisons in queries fail when mixing values of different types