Comments (12)
I personally using memmap2
only for reading, so I cannot help you here. Maybe someone else would.
from memmap2-rs.
This does not answer your general question and I am not sure if file sizes prohibit this in your case, but editors usually do extra work to atomically replace the file being edited, i.e. instead of modifying foo.bin
, they would copy it to .foo.bin.swp
and modify only that file and finally when the user wants to save, rename .foo.bin.swp
to foo.bin
thereby atomically replacing one file by the other.
If you adopt this approach, you can map the .foo.bin.swp
directly and modify its contents which would not touch the original.
If you want to optimize the impact of copying foo.bin
to .foo.bin.swp
in the first place, you could do something like cp --reflink=auto foo.bin .foo.bin.swp
which shares the underlying file extents if possible thereby copying only metadata and data only on demand, i.e. if you modify the mapping and the page cache's write back wants to push the pages to disk.
from memmap2-rs.
I am not sure if file sizes prohibit this in your case
I want the hex editor to be usable for things like viewing and editing block devices, which could be hundreds of gigabytes large, so I'm not sure how well this approach would play with that.
you could do something like
cp --reflink=auto foo.bin .foo.bin.swp
I want my hex editor to work at least on windows and linux, so I don't want to use unix specific commands. Not sure if windows supports something similar to this.
from memmap2-rs.
In that case, I would suggest looking at it from the other way around: Mmap the files directly for reading and writing. But instead of writing pending modifications into the memory map, store those in separate data structures (a sort of overlay that is considered when displayed any modified file range) and write them into the memory map only when saving.
This is sort of like your option 2 but without closing and re-opening the memory map and keeping the buffers storing the modifications around until saving. Of course, this still means modifications will not be atomic from the PoV of other reads of the underlying file.
from memmap2-rs.
instead of writing pending modifications into the memory map, store those in separate data structures (a sort of overlay that is considered when displayed any modified file range) and write them into the memory map only when saving.
This is what I would like to avoid doing if possible, as it makes the implementation a lot more complex.
Copy-on-write mmap would allow my data model to remain &mut [u8]
, which is drastically more simple than having to do bookkeeping work on every access.
from memmap2-rs.
having to do bookkeeping work on every access.
Don't you have to do the book keeping anyway if you are going for option 2?
(I don't think option 1 is obviously sound. This was discussed repeatedly on URLO without reaching a consensus that the current memory map semantics yield anything which soundness can be based upon. But then again, since this is somewhat unchartered territory, soundness of option 2 is not really ensured either. Using memory maps is always mildly problematic with Rust ATM. (Even copy-on-write maps only protect other processes from seeing your writes, it does not protect your process from seeing concurrent modifications performed by other processes.))
from memmap2-rs.
Don't you have to do the book keeping anyway if you are going for option 2?
There is nothing to do when trying to access memory. The only thing that has to be done is keep track of what changed during editing operations.
However, if I'm not using a copy-on-write memory map, I have to, on every access, look up if that access is within an edited range, and if it is, find the edit buffer that corresponds to that location, and return a slice to that instead of the original memory.
What if the request spans both an edited range and a non-edited range? I can't return a contiguous slice that spans both my edit buffer and the original memory, unless I allocate and copy on every access.
from memmap2-rs.
Since you need detailed control of when writeback (user presses save) happens and care about soundness and portability, I think you should probably implement your own paging to keep the simplicity of not having to implement an overlay on read. (Meaning just read fixed size blocks from storage into memory to display and optionally modify them (and track this on a block level). If a configured memory limit is reached, write out clean blocks to make space. When the user saves write out all dirty blocks.)
from memmap2-rs.
I think you should probably implement your own paging to keep the simplicity of not having to implement an overlay on read.
I don't see how fixed size blocks would help with simplicity, as there could still be requests that span blocks. And there are certain APIs that expect slices, like memmem. If I wanted to do a string search with such an implementation, I couldn't just simply call memmem, I would have to call memmem for every page, plus search each page boundary for matches as well.
And this is just one example of how such an implementation complicates things.
I would rather let the operating system take care of all this hard work for me, as it was written by professionals who know what they are doing, and has received lots of testing over the years.
from memmap2-rs.
I did some preliminary testing, and this passes both on windows and linux:
#[test]
fn test_mmap_write() {
use std::fs::OpenOptions;
use std::io::{Seek, Write};
std::fs::write(
"testfile-small.txt",
include_bytes!("../testfile-small-in.txt"),
)
.unwrap();
let mut f = OpenOptions::new()
.read(true)
.write(true)
.open("testfile-small.txt")
.unwrap();
let mut mmap = unsafe { memmap2::MmapOptions::new().map_copy(&f).unwrap() };
assert_eq!(&*mmap, include_bytes!("../testfile-small-in.txt"));
mmap[7..=12].copy_from_slice(b"Edited");
assert_eq!(&*mmap, include_bytes!("../testfile-small-edited-in.txt"));
mmap.flush().unwrap();
f.seek(std::io::SeekFrom::Start(7)).unwrap();
f.write_all(&mmap[7..=12]).unwrap();
f.flush().unwrap();
let new = std::fs::read("testfile-small.txt").unwrap();
assert_eq!(new, include_bytes!("../testfile-small-edited-in.txt"));
mmap[70..=86].copy_from_slice(b"Changed you again");
assert_eq!(&*mmap, include_bytes!("../testfile-small-edited2-in.txt"));
mmap.flush().unwrap();
f.seek(std::io::SeekFrom::Start(70)).unwrap();
f.write_all(&mmap[70..=86]).unwrap();
f.flush().unwrap();
let new = std::fs::read("testfile-small.txt").unwrap();
assert_eq!(new, include_bytes!("../testfile-small-edited2-in.txt"));
}
testfile-small-in.txt:
Hello, World! I am a small test file.
I'm pretty nice as you can see.
Change me please.
Thank you!.
testfile-small-edited-in.txt:
Hello, Edited I am a small test file.
I'm pretty nice as you can see.
Change me please.
Thank you!.
testfile-small-edited2-in.txt:
Hello, Edited I am a small test file.
I'm pretty nice as you can see.
Changed you again
Thank you!.
This gives me some level of confidence that I could use this strategy.
I could expose it under an --unsafe-memmap
option, and warn users about potential risks, but the option would still be available for users who understand the risks.
Then if some smart person wants to contribute a smart paging algorithm for my hex editor, I'll accept their pull request. But until then, I'd like to have at least basic usability for huge files without greatly complicating the implementation.
from memmap2-rs.
I think you should probably implement your own paging to keep the simplicity of not having to implement an overlay on read.
I think my tone was wrong here. What I would like to have said is that if you ask about soundness - meaning defined behaviour no matter the conditions under which the code executed to me - and portability, then I fear that the semantics this crate (or rather the underlying system calls) can provide will be insufficient. The intersection between memory maps and Rust's take on memory safety is still not really defined as the discussions on URLO seem to indicate.
Testing a certain pattern before usage can of course provide evidence that it is practical, but it cannot answer the question of soundness. This is why I suggested to try to implement your own paging without using the kernel-provided memory maps, because soundness should be achievable there based on the currently established interactions between Rust and the kernel.
Concerning the pattern your test case employs, I think your option 2 is still preferable and it avoids aliasing the same memory via the memory map and via the file handle (which will eventually write into the same physical page in the kernel's page cache). An alternative might to build an API which ensures that your code can only either Deref
the Mmap
or accessing the underlying File
, but not both at the same time.
But finally, even option 2 needs to assume that your process is the only one mapping the underlying file to be safe from aliasing issues which is usually not the case for an interactively used editor.
from memmap2-rs.
I'm fine with assuming that I have exclusive access to the file. I'll document this invariant for the users when they opt into mmap. The default will still be reading everything into a Vec. mmap will be an option available in cases it's needed, like huge files or block devices, and the user should take precautions that they provide exclusive access.
As far as the question of soundness goes, I don't need it to be 100% theoretically sound, I just want to know if there is something obviously wrong with these approaches that is known to break things.
As for why I would rather use a copy on write memmap than implementing my own paging, the answer is greatly reduced implementation complexity. If I can rely on data being [u8], I can reuse all the algorithms and libraries that work on byte slices.
This works because mmap maps pages on page fault, so I (and the libraries I use) don't have to do anything special, just access bytes directly.
If I did my own paging, I would have to write significantly more complex algorithms, and I couldn't use those libraries that work on byte slices either. I would have to write my own versions that request pages explicitly instead of it happening on page fault.
Unless I can write my own page fault handler for doing this, but I get the feeling that is another can of worms that would lead to similar issues that mmap has.
from memmap2-rs.
Related Issues (20)
- Migrate to `safer_owning_ref` HOT 1
- why does memmap2::Mmap::map need to be unsafe? HOT 4
- Implement AsRawFd, IntoRawFd, AsRawHandle, IntoRawHandle and Into<Stdio> for MMap HOT 2
- Composite memory maps currently impossible HOT 7
- mmap'd anon pages fail in io_uring IOSQE_BUFFER_SELECT usage HOT 2
- advise_writes_unsafely_to_part_of_map test fails on powerpc64le-unknown-linux-gnu HOT 2
- error[E0433]: failed to resolve: could not find `Advice` in `memmap2` HOT 1
- PermissionDenied, message: "Access is denied." at remove file after read on Github Action WIndows HOT 2
- Doesn't compile for freebsd HOT 3
- v0.9.2 build failed for android target. HOT 3
- Support for `MS_INVALIDATE`? HOT 8
- Mac osx support HOT 1
- Support `mremap(2)` HOT 1
- Mapping beyond 4GB offset if broken on 32 bit glibc (Linux) HOT 3
- Support multiple advice HOT 2
- Contributing a migration from rustix to libc HOT 9
- MSRV and edition HOT 5
- Merging changes with memmapix HOT 24
- memmap2 `advise` is unsound HOT 8
- Interest is supporting mbind HOT 10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from memmap2-rs.