lostatc / acid-store
[UNMAINTAINED] A transactional and deduplicating virtual file system
License: Apache License 2.0
Hi, this is a bug report.
We want to open a repository if it exists and create it if not, which is the purpose of OpenMode::Create.
However, at least when using FileRepo and DirectoryConfig, opening the repository fails the second time because the implementation attempts to re-create the directory, which is not what was requested.
Repro case (main.rs):
use std::path::{ PathBuf };
use acid_store::repo::{ file::FileRepo, OpenOptions, OpenMode, };
use acid_store::store::{ DirectoryConfig };

fn main() {
    use std::env;
    let args: Vec<String> = env::args().collect();
    let config = DirectoryConfig{
        path: PathBuf::from(&args[1])
    };
    let _repo : FileRepo = OpenOptions::new()
        .mode(OpenMode::Create) // This seems to be ignored!
        .open(&config)
        .unwrap();
    let stdin = std::io::stdin();
    loop {
        let mut input = String::new();
        stdin.read_line(&mut input).expect("failed to read input");
        if input == "exit\n" {
            break;
        }
    }
}
Compile and run it twice:
cargo run /tmp/myrepo
cargo run /tmp/myrepo
Failure on any run with the same directory argument after the first call with that directory:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Store(File exists (os error 17))', src/main.rs:17:10
Expected behavior: no error, no attempt at re-creating the directory when we use OpenMode::Create, and a successful opening of the repository.
In open-options.rs we can locate this call, which shows that the mode is handled after attempting to access the store, instead of in the store.
let mut store = config.open()?; // HERE
match self.mode {
    OpenMode::Open => self.open_repo(store),
    OpenMode::Create => { // HANDLED HERE
Going into the call to config.open() then leads to this code:
fn open(&self) -> crate::Result<Self::Store> {
    // Create the blocks directory in the data store.
    create_dir_all(&self.path)
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?;
    create_dir(self.path.join(STORE_DIRECTORY))
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?; // FAILS HERE
    create_dir(self.path.join(STAGING_DIRECTORY))
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?;
    create_dir(self.path.join(type_path(BlockType::Data)))
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?;
    create_dir(self.path.join(type_path(BlockType::Lock)))
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?;
    create_dir(self.path.join(type_path(BlockType::Header)))
        .map_err(|error| crate::Error::Store(anyhow::Error::from(error)))?;
We can see that the option is not handled at the right level: directory creation is systematically "forced", whatever the requested mode.
We did not check whether the same issue appears with other kinds of stores. We intend to check with S3.
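For illustration, here is a minimal sketch of what "create if missing, otherwise reuse" could look like at the directory level, using only std::fs. It is not the acid-store implementation or a proposed patch: ensure_dir and ensure_layout are hypothetical helper names, and the subdirectory names passed in are placeholders rather than the crate's real STORE_DIRECTORY or STAGING_DIRECTORY values. The point is only that std::fs::create_dir reports an existing directory as ErrorKind::AlreadyExists, which can be treated as success.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: create `path` as a directory, treating
/// "already exists as a directory" as success.
/// Returns Ok(true) if the directory was created, Ok(false) if it was already there.
fn ensure_dir(path: &Path) -> io::Result<bool> {
    match fs::create_dir(path) {
        Ok(()) => Ok(true),
        // `create_dir` fails with AlreadyExists (os error 17 on Linux) on a second run.
        Err(error) if error.kind() == io::ErrorKind::AlreadyExists && path.is_dir() => Ok(false),
        Err(error) => Err(error),
    }
}

/// Hypothetical helper: make sure `root` and each of `subdirs` exist.
/// The subdirectory names are supplied by the caller; they are assumptions here.
fn ensure_layout(root: &Path, subdirs: &[&str]) -> io::Result<()> {
    // `create_dir_all` is already idempotent for the root directory.
    fs::create_dir_all(root)?;
    for name in subdirs {
        ensure_dir(&root.join(name))?;
    }
    Ok(())
}
```

Calling ensure_layout twice on the same root succeeds both times, which is the behavior we expected from OpenMode::Create in the repro above.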
This project experiences frequent breaking API changes and hasn't seen significant real-world usage.
From what I can see, the underlying principle has real-world usage in https://github.com/rustic-rs/rustic with regard to the Backend trait.
It would be nice if you could have a look and give some feedback (:
Cheers!
First of all, this is an amazing project.
I am very impressed with how many different back-ends are supported right now, and how readable most of the source code is!
I have one question about how acid-store works, with relation to other database/datastore-like systems that for instance use a 'log structured merge tree' implementation where incoming transactions are added to a write-ahead log (for consistency/durability) as well as an in-memory dictionary (for speed of lookups using recent data), and periodically merge (some or all of) this data into the blocks stored on the permanent back-end.
Is this similar to what acid-store does as well? Or does acid-store always immediately (when calling .commit()) perform an update to the particular blocks in the back-end that require a change?
Put differently: does acid-store perform any kind of amortization to make the average insert/update faster, or not?
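To make the comparison concrete, here is a toy sketch of the log-structured write path I have in mind. It says nothing about how acid-store actually works; ToyLsm and all of its fields are made up for this question, and the "log" and "segments" live in memory here, whereas a real system would keep them on disk.

```rust
use std::collections::BTreeMap;

/// Toy model of an LSM-style write path (hypothetical, for illustration only).
struct ToyLsm {
    /// Append-only log of pending writes (would be a file in a real system).
    wal: Vec<(String, String)>,
    /// Sorted in-memory table holding the most recent writes.
    memtable: BTreeMap<String, String>,
    /// Flushed, immutable tables; the newest one is last.
    segments: Vec<BTreeMap<String, String>>,
    /// Number of memtable entries that triggers a flush.
    flush_threshold: usize,
}

impl ToyLsm {
    fn new(flush_threshold: usize) -> Self {
        ToyLsm {
            wal: Vec::new(),
            memtable: BTreeMap::new(),
            segments: Vec::new(),
            flush_threshold,
        }
    }

    /// Record the write in the log first, then in the memtable.
    /// The expensive flush only happens once per `flush_threshold` entries,
    /// which is where the amortization comes from.
    fn put(&mut self, key: &str, value: &str) {
        self.wal.push((key.to_string(), value.to_string()));
        self.memtable.insert(key.to_string(), value.to_string());
        if self.memtable.len() >= self.flush_threshold {
            self.flush();
        }
    }

    /// Move the memtable into a new immutable segment and truncate the log.
    fn flush(&mut self) {
        let full = std::mem::take(&mut self.memtable);
        self.segments.push(full);
        self.wal.clear();
    }

    /// Look in the memtable first, then in segments from newest to oldest.
    fn get(&self, key: &str) -> Option<&String> {
        self.memtable
            .get(key)
            .or_else(|| self.segments.iter().rev().find_map(|segment| segment.get(key)))
    }
}
```

The alternative I am asking about would be a put (or commit) that rewrites the affected back-end blocks directly every time, with no batching step in between.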
First of all, thanks a lot for making this crate! I am very interested in trying it out, and am currently experimenting with it. This issue is a platform for communicating my findings and experiences in the hopes to provide you with yet another usecase which might one day be supported.
I am building a mining platform for crates.io which consists of 3 stages:
For the current implementation, we are talking about 6 * 215000 + 36000 objects, that are read often, and usually written in small batches, excluding the first run when we see all ~215k crate versions at once.
Eventually I would like to run 'criner' on a Raspberry Pi with 512MB of memory for all tasks that don't require running cargo/rustc. Thus I would prefer a DB which trades off speed for lower and predictable memory consumption. Sled clearly is optimized for speed, which is great, but it's something I can't even pay for on my current hardware, as it simply consumes too much memory during migrations and when there are too many objects.
The database of choice is Sled, but I ran into the following issues that make me seek out an alternative.
Despite sled generally being lightning fast, it costs a lot of memory to support it. Now I run into problems where the memory consumption is disproportional and so high that I feel uncomfortable proceeding with it. Database sizes tend to be large, even though that isn't my primary concern. It's the concern of not being able to interact with the data anymore that one spent days producing. For instance, migrating from one version to another once took 50GB in memory (and it's a wonder my MBPro did not die trying, but completed the herculean task).
SQLite files are small! This seems great, especially after seeing sled easily take 10GB
When using an ObjectRepository with a DirectoryBackend, the removal of obsolete objects was prohibitively slow. The runtime was dominated by IO calls.
It's surprisingly hard to 'open or create' a database.
I would love to disable encryption and compression with cargo features, they pull in around 100 additional crates to compile.