
umap's Introduction

UMAP v2.1.0


Umap is a library that provides an mmap()-like interface to a simple user-space page fault handler based on the Linux userfaultfd feature (available since Linux kernel 4.3). The intended use case is an application-specific buffer of pages cached from a large file, i.e., out-of-core execution using memory mapping.
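Every userfaultfd-based handler begins the same way. The sketch below shows that first step: it creates a userfaultfd descriptor and negotiates the API version with the UFFDIO_API ioctl. It uses only the raw Linux interface from <linux/userfaultfd.h>, not Umap's own API, and the function name probe_userfaultfd is hypothetical. A nonzero return value is the errno of the step that failed. For example, it is EPERM when a seccomp profile or the vm.unprivileged_userfaultfd sysctl forbids the call.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical probe: opens a userfaultfd descriptor and performs the
// UFFDIO_API handshake. Returns 0 on success, otherwise the errno of the
// failing step (e.g. EPERM if the syscall is blocked or not permitted).
int probe_userfaultfd() {
#ifdef SYS_userfaultfd
  long fd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
  if (fd < 0) return errno;

  struct uffdio_api api = {};
  api.api = UFFD_API;   // the only API version the kernel accepts
  api.features = 0;     // request no optional features
  int rc = ioctl(static_cast<int>(fd), UFFDIO_API, &api) == 0 ? 0 : errno;

  close(static_cast<int>(fd));
  return rc;
#else
  return ENOSYS;        // headers do not know the syscall at all
#endif
}
```

A real handler would continue by registering a memory range with UFFDIO_REGISTER and reading fault events from the descriptor. Those steps are omitted here.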

The src directory in the top level contains the source code for the library.

The tests directory contains various tests for the library, including a hello world program for userfaultfd based on code from the userfaultfd-hello-world project.

LLNL-CODE-733797

Quick Start

Building Umap is straightforward. From the root directory of the repository, run:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=<where you want the software> ..
make install

The default for cmake is to build a Debug version of the software. If you would like to build an optimized (-O3) version, simply run

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=<install-dir> ..

Docker Container

There is a Dockerfile included in the repository to make it easier to build a container image with a basic Umap development environment.

A few caveats: the runtime requirements of the Docker container are the same as those of the non-containerized Umap. It's also important to note that Umap checks the kernel headers present in the build-time environment to decide whether WP mode should be enabled. The included Dockerfile always builds Umap with WP support enabled, because the Ubuntu 22.04 kernel headers included in the container indicate that WP is supported, even if the host kernel doesn't actually support that feature. Umap returns an error at runtime if it's built with WP enabled but run on a kernel without WP support.

As a security measure, Docker uses a seccomp profile that restricts the syscalls an application running inside a container may make to a minimal subset. Umap relies on host kernel syscalls that Docker's default seccomp profile blocks; specifically, it relies on the userfaultfd syscall. See Docker's documentation on seccomp security profiles for more information about which syscalls the default profile blocks.

Example: Building and running the Umap Docker container

docker build -t umap .
docker run --security-opt seccomp=unconfined -it umap bash

Example: Running the Umap container with a seccomp whitelist

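It's also possible to run the container with a seccomp whitelist rather than disabling confinement entirely. First, create a userfaultfd.json file that allows the userfaultfd syscall. The rule below is a single entry. Docker expects a complete seccomp profile, so in practice this entry goes in the "syscalls" array of a copy of Docker's default profile: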

{"names": ["userfaultfd"], "action": "SCMP_ACT_ALLOW"}

When running the container:

docker run --security-opt seccomp=userfaultfd.json -it umap bash

Build Requirements

Building Umap requires a C++ compiler, CMake >= 3.5.1, and the Linux kernel headers.

Additionally, Umap automatically enables read/write mode at library compile time if the symbols defined in the installed kernel headers indicate that the kernel supports it. Some Linux distributions, such as Ubuntu 20.04.2, provide a 5.8 kernel that supports read/write mode but ship headers that don't define these symbols, so read/write mode is not enabled there.
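A compile-time probe can illustrate this kind of header-based detection. The sketch assumes that the UFFDIO_WRITEPROTECT ioctl definition is a reasonable signal. That definition was added to <linux/userfaultfd.h> together with write-protect support in Linux 5.7. Umap's actual CMake check may look at different symbols, and the function name below is hypothetical.

```cpp
#include <cassert>

// Include the userfaultfd header only if the build environment has it.
#if defined(__has_include)
#  if __has_include(<linux/userfaultfd.h>)
#    include <linux/userfaultfd.h>
#  endif
#endif

// Hypothetical compile-time check: true if the installed headers define the
// write-protect ioctl (assumed here to mean read/write mode can be built).
constexpr bool headers_define_uffd_wp() {
#ifdef UFFDIO_WRITEPROTECT
  return true;
#else
  return false;
#endif
}
```

The check runs at compile time, so it reflects the headers in the build environment, not the running kernel. This is why a container built against newer headers can enable WP mode on a host that lacks it.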

Read-only mode: Linux kernel >= 4.3

Read/write mode: Linux kernel >= 5.7 (>= 5.10 preferred)

Note: Some early mainline releases of Linux that included support for read/write mode (between 5.7 and 5.9) contain a known bug that causes an application to hang indefinitely when performing a write. It's recommended to update to a 5.10 kernel if this bug is encountered.
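A small classifier can summarize the version thresholds above. This sketch judges a kernel by its release string alone, as printed by uname -r. Distribution kernels sometimes backport features, so treat the result as a first guess, not a guarantee. The names UffdMode, KernelCheck, and classify_kernel are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

enum class UffdMode { Unsupported, ReadOnly, ReadWrite };

struct KernelCheck {
  UffdMode mode;
  bool write_hang_risk;  // mainline 5.7-5.9: known hang on write
};

// Hypothetical helper: maps a release string such as "5.8.0-28-generic"
// onto the requirements listed above (>= 4.3 read-only, >= 5.7 read/write).
KernelCheck classify_kernel(const std::string& release) {
  int major = 0, minor = 0;
  if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
    return {UffdMode::Unsupported, false};

  auto at_least = [&](int M, int m) {
    return major > M || (major == M && minor >= m);
  };
  if (at_least(5, 7))
    return {UffdMode::ReadWrite, !at_least(5, 10)};
  if (at_least(4, 3))
    return {UffdMode::ReadOnly, false};
  return {UffdMode::Unsupported, false};
}
```

For example, the 5.8.0-28-generic kernel reported in one of the issues below classifies as read/write capable. It also falls in the 5.7 to 5.9 range with the known write hang.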

Runtime Requirements

At runtime, Umap requires a kernel that supports the same features as it was built against. For example, running a version of Umap compiled against kernel 5.10 on a system with kernel 4.18 will result in runtime errors caused by write functionality not being present in the 4.18 kernel.

On Linux >= 5.4, the sysctl variable vm.unprivileged_userfaultfd needs to be set to 1 in order to use Umap in both read-only and read/write modes as a non-root user. The value of this variable may be determined by running sysctl vm.unprivileged_userfaultfd or cat /proc/sys/vm/unprivileged_userfaultfd.
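A program can check this setting before calling into Umap. The sketch below reads the sysctl file and returns std::nullopt when the file cannot be read or parsed. The requirement applies only to non-root users, so a caller running as root can skip the check. The function name is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: reports whether vm.unprivileged_userfaultfd allows
// non-root users to create userfaultfd descriptors (value 1), or nullopt if
// the file is missing or does not contain an integer.
std::optional<bool> unprivileged_uffd_enabled(
    const std::string& path = "/proc/sys/vm/unprivileged_userfaultfd") {
  std::ifstream in(path);
  int value = 0;
  if (!(in >> value)) return std::nullopt;
  return value == 1;
}
```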

Applications

Applications can be found at https://github.com/LLNL/umap-apps.

Documentation

Both user and code documentation are available.

If you have build problems, we have comprehensive build system documentation too!

Publications

Peng, I.B., Gokhale, M., Youssef, K., Iwabuchi, K. and Pearce, R., 2021. Enabling Scalable and Extensible Memory-mapped Datastores in Userspace. IEEE Transactions on Parallel and Distributed Systems. doi: 10.1109/TPDS.2021.3086302.

Peng, I.B., McFadden, M., Green, E., Iwabuchi, K., Wu, K., Li, D., Pearce, R. and Gokhale, M., 2019, November. UMap: Enabling application-driven optimizations for page management. In 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC) (pp. 71-78). IEEE.

License

Contact

umap's People

Contributors

asarkar-parsys, egreen77, ianlee1521, ibpeng, kiwabuchi, mayagokhale, mcfadden8, xiszishu


umap's Issues

Logging: missing timestamp when timestamp logging is activated

With UMAP_LOG_LEVEL=DEBUG set and UMAP_LOG_NO_TIMESTAMP_LEVEL unset, I expected debug logging with timestamps. The resulting logs, however, contain no timestamps:

1200130:1200130 [DEBUG][../src/umap/umap.cpp:30]: umap region_addr: 0, region_size: 68719476736, prot: 3, flags: 2, offset: 0
1200130:1200130 [DEBUG][../src/umap/RegionManager.cpp:232]: set_umap_page_size Adjusting page size from 0 to 1048576
1200130:1200130 [DEBUG][../src/umap/RegionManager.cpp:208]: set_max_pages_in_buffer Maximum pages in page buffer changed from 0 to 410 pages
1200130:1200130 [INFO][../src/umap/umap.cpp:138]: umap_ex region_addr: 0, region_size: 68719476736, prot: 3, flags: 2, offset: 0, store: 0, umap_psize: 1048576
1200130:1200130 [DEBUG][../src/umap/store/StoreFile.cpp:21]: StoreFile region: 0x7fc46b700000 rsize: 68719476736 alignsize: 1048576 fd: 3
1200130:1200130 [DEBUG][../src/umap/RegionManager.cpp:43]: addRegion No active regions, initializing engine
1200130:1200130 [DEBUG][../src/umap/Uffd.cpp:147]: Uffd
 maximum fault events: 256
            page size: 1048576
1200130:1200130 [DEBUG][../src/umap/WorkerPool.hpp:73]: start_thread_pool Starting Uffd Manager Pool of 1 threads
1200130:1200130 [DEBUG][../src/umap/WorkerPool.hpp:73]: start_thread_pool Starting Fill Workers Pool of 1 threads
1200130:1200130 [DEBUG][../src/umap/WorkerPool.hpp:73]: start_thread_pool Starting Evict Workers Pool of 1 threads
1200130:1200130 [DEBUG][../src/umap/WorkerPool.hpp:73]: start_thread_pool Starting Evict Manager Pool of 1 threads
1200130:1200130 [DEBUG][../src/umap/RegionManager.cpp:53]: addRegion region: 0x7fc46b700000 - 0x7fd46b700000, region_size: 68719476736, number of regions: 2
1200130:1200130 [DEBUG][../src/umap/Uffd.cpp:259]: register_region Registering 65536 pages from: 0x7fc46b700000 - 0x7fd46b6fffff
1200130:1200187 [DEBUG][../src/umap/Buffer.cpp:214]: process_page_event NEW: { 0x7fc46b700000, FILLING, DIRTY } From: { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  1, m_free_pages.size(): 409, m_busy_pages.size():  1 }
1200130:1200188 [DEBUG][../src/umap/FillWorkers.cpp:42]: FillWorker : { page_desc: { 0x7fc46b700000, FILLING, DIRTY }, type: NONE } { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  1, m_free_pages.size(): 409, m_busy_pages.size():  1 }
1200130:1200188 [DEBUG][../src/umap/store/StoreFile.cpp:30]: read_from_store pread(fd=3, buf=0x7fd857b00000, nb=1048576, off=0)
1200130:1200187 [DEBUG][../src/umap/Buffer.cpp:214]: process_page_event NEW: { 0x7fc46b900000, FILLING, DIRTY } From: { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  2, m_free_pages.size(): 408, m_busy_pages.size():  2 }
1200130:1200188 [DEBUG][../src/umap/FillWorkers.cpp:42]: FillWorker : { page_desc: { 0x7fc46b900000, FILLING, DIRTY }, type: NONE } { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  2, m_free_pages.size(): 408, m_busy_pages.size():  2 }
1200130:1200188 [DEBUG][../src/umap/store/StoreFile.cpp:30]: read_from_store pread(fd=3, buf=0x7fd857b00000, nb=1048576, off=2097152)
1200130:1200187 [DEBUG][../src/umap/Buffer.cpp:214]: process_page_event NEW: { 0x7fc46ba00000, FILLING, DIRTY } From: { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  3, m_free_pages.size(): 407, m_busy_pages.size():  3 }
1200130:1200188 [DEBUG][../src/umap/FillWorkers.cpp:42]: FillWorker : { page_desc: { 0x7fc46ba00000, FILLING, DIRTY }, type: NONE } { m_size: 410, m_waits_for_avail_pd: 0, m_present_pages.size():  3, m_free_pages.size(): 407, m_busy_pages.size():  3 }
1200130:1200188 [DEBUG][../src/umap/store/StoreFile.cpp:30]: read_from_store pread(fd=3, buf=0x7fd857b00000, nb=1048576, off=3145728)

If I read the code correctly, no timestamp is ever written into the log messages, even when m_log_timestamp is true. Is that right?

void Logger::logMessage( message::Level level,
                         const std::string& message,
                         const std::string& fileName,
                         int line ) noexcept
{
  if ( !logLevelEnabled( level ) )
    return; /* short-circuit */

  std::lock_guard<std::mutex> guard(g_logging_mutex);
  if (m_log_timestamp) {
    std::cout
      << getpid() << ":"
      << syscall(__NR_gettid) << " "
      << "[" << MessageLevelName[ level ] << "]"
      << "[" << fileName << ":" << line << "]:"
      << message
      << std::endl;
  }
  else {
    std::cout
      << message
      << std::endl;
  }
}
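One possible fix is to build the prefix in a helper that actually formats a timestamp when the flag is set. The helper format_log_line below is hypothetical and not part of Umap. It formats the time in UTC via the POSIX gmtime_r, so the output does not depend on the local time zone. It keeps the pid:tid and [LEVEL][file:line] fields that the existing code prints.

```cpp
#include <cassert>
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical helper: builds one log line, optionally prefixed with a
// "YYYY-MM-DD HH:MM:SS" UTC timestamp, followed by the pid:tid and
// [LEVEL][file:line] fields that Logger::logMessage already emits.
std::string format_log_line(bool with_timestamp, std::time_t when,
                            long pid, long tid, const std::string& level,
                            const std::string& file, int line,
                            const std::string& message) {
  std::ostringstream out;
  if (with_timestamp) {
    std::tm tm_utc{};
    gmtime_r(&when, &tm_utc);  // thread-safe variant of gmtime
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm_utc);
    out << buf << " ";
  }
  out << pid << ":" << tid << " [" << level << "][" << file << ":" << line
      << "]: " << message;
  return out.str();
}
```

Inside Logger::logMessage, both branches could then collapse into a single call such as format_log_line(m_log_timestamp, std::time(nullptr), getpid(), syscall(__NR_gettid), MessageLevelName[level], fileName, line, message). Unlike the current else branch, this variant always prints the pid:tid metadata.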

madvise MADV_REMOVE with Umap

I'd like to use madvise(2) with MADV_REMOVE flag to free up a given range of pages and its associated backing store.
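For reference, this sketch shows what MADV_REMOVE does on an ordinary MAP_SHARED mapping. It frees the pages and punches a hole in the backing store, so the range reads back as zeros. The backing file is a memfd (shmem), because shmem supports hole punching. The sketch does not show whether Umap-managed regions, which are serviced through userfaultfd, honor the same call; that is the open question of this issue. The function name is hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical demo of plain Linux behavior (not Umap): dirties two pages of
// a shared memfd mapping, removes the first with MADV_REMOVE, and returns
// true if that page now reads as zeros while the second keeps its data.
bool madv_remove_zeroes_page() {
  const long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  int fd = memfd_create("madv-remove-demo", 0);
  if (fd < 0) return false;
  if (ftruncate(fd, 2 * page) != 0) { close(fd); return false; }

  void* mem = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (mem == MAP_FAILED) { close(fd); return false; }
  auto* bytes = static_cast<unsigned char*>(mem);
  std::memset(bytes, 0xAB, 2 * page);  // dirty both pages

  // Free the first page together with its backing store.
  bool zeroed = madvise(bytes, page, MADV_REMOVE) == 0;
  for (long i = 0; zeroed && i < page; ++i) zeroed = (bytes[i] == 0);
  const bool second_intact = (bytes[page] == 0xAB);

  munmap(mem, 2 * page);
  close(fd);
  return zeroed && second_intact;
}
```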

Different page sizes for different regions

If I understand the 2019 paper correctly (Peng et al., UMap: Enabling application-driven optimizations for page management), umap supports different page sizes.

However, it looks to me as if the region manager (a singleton) has a single member, m_umap_page_size, that is used for all regions. Can you help me understand how to use different page sizes?
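The sketch below shows what per-region page sizes could involve. It keeps a page size for each region and aligns a faulting address to the page size of its own region. This is a hypothetical design, not Umap's RegionManager, which according to this issue uses a single m_umap_page_size. The names RegionTable and page_base are invented for illustration, and page sizes are assumed to be powers of two.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical per-region metadata: end address and that region's page size.
struct RegionInfo {
  std::uintptr_t end;
  std::uint64_t page_size;
};

class RegionTable {
 public:
  void add(std::uintptr_t start, std::uint64_t size, std::uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);  // power of two
    regions_[start] = RegionInfo{start + size, page_size};
  }

  // Start address of the region-specific page containing addr, if mapped.
  std::optional<std::uintptr_t> page_base(std::uintptr_t addr) const {
    auto it = regions_.upper_bound(addr);  // first region starting after addr
    if (it == regions_.begin()) return std::nullopt;
    --it;                                  // last region starting at or before addr
    if (addr >= it->second.end) return std::nullopt;
    std::uintptr_t offset = addr - it->first;
    return it->first + (offset & ~(it->second.page_size - 1));
  }

 private:
  std::map<std::uintptr_t, RegionInfo> regions_;  // keyed by region start
};
```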

umap() does not appear to handle residual pages for regions that are not block-aligned

umap() divides a given region into fixed sized blocks of size pages_per_block=min(pages_in_region, UMAP_PAGES_PER_BLOCK).

If the number of pages in the region is not a multiple of pages_per_block, the residual pages need to be added to the last block of the region and managed by the last worker thread.

While running umapsort and examining its statistics, we noticed that with a range of 2049 pages, only 2048 pages were handled (2048 faults and 2048 dirty evictions):

$ umapsort --initonly -p 2049 -b 10000000 -f /mnt/intel/test --directio -t 1
2049 pages, 8392704 bytes, 1 threads
initdata: 0x7f6a093d0000, 1049088
Init took 73.089566 us
2048 Faults
0 READ Faults
2048 WRITE Faults
0 WP Messages
2048 Dirty Evictions
0 Clean Evictions
0 SIGBUS Errors
0 Stuck WP Workarounds
0 Dropped Duplicates
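The intended partitioning can be sketched as follows. Each block holds min(pages_in_region, UMAP_PAGES_PER_BLOCK) pages, and the residual pages (pages_in_region % pages_per_block) are folded into the last block. The limit is passed as a parameter because the value of UMAP_PAGES_PER_BLOCK is not given here. The names partition_region and Block are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical block descriptor: first page index and page count.
struct Block {
  std::uint64_t first_page;
  std::uint64_t num_pages;
};

// Splits a region into blocks of min(pages_in_region, limit) pages and adds
// any residual pages to the last block so that every page has an owner.
std::vector<Block> partition_region(std::uint64_t pages_in_region,
                                    std::uint64_t pages_per_block_limit) {
  assert(pages_per_block_limit > 0);
  std::vector<Block> blocks;
  if (pages_in_region == 0) return blocks;

  const std::uint64_t per_block = std::min(pages_in_region, pages_per_block_limit);
  const std::uint64_t num_blocks = pages_in_region / per_block;  // at least 1
  for (std::uint64_t b = 0; b < num_blocks; ++b)
    blocks.push_back({b * per_block, per_block});

  blocks.back().num_pages += pages_in_region % per_block;  // residual pages
  return blocks;
}
```

With 2049 pages and a limit of 2048, this yields a single block of 2049 pages, so the last page is handled instead of dropped.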

Collect workaround frequency data for larger data sets

Using optimized builds of umapsort and churn, run tests with a 128 GB buffer and 300 GB of data.

It would also be useful to check whether the workaround frequency is > 0 when the data fits entirely in the buffer (buffer size = 300 GB, data size = 300 GB).
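To configure such runs, the buffer size must be converted to a page count. The sketch assumes that UMAP_BUFSIZE is expressed in pages, as the "Maximum pages in page buffer" debug line in an issue above suggests. It also reads GB as GiB. buffer_pages is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t GiB = 1ull << 30;

// Hypothetical helper: number of pages of page_bytes each that fit in a
// buffer of buffer_bytes (assumed to be the unit UMAP_BUFSIZE expects).
std::uint64_t buffer_pages(std::uint64_t buffer_bytes, std::uint64_t page_bytes) {
  assert(page_bytes > 0);
  return buffer_bytes / page_bytes;
}
```

A 128 GiB buffer is then 33554432 pages of 4 KiB, or 131072 pages of 1 MiB.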

Freeze in page_already_present

I am trying to run the psort example; however, the program freezes either when calling uunmap or, with UMAP_BUFSIZE=10, during initialization. I tracked it down to the following log line:

2699597:2699599 [DEBUG][../src/umap/Buffer.cpp:255]: page_already_present Waiting for state: (ANY), { 0x7f1418370000, LEAVING, DIRTY }

The pthread_cond_wait is never completed.

My kernel is: Linux nemea 5.8.0-28-generic #30-Ubuntu SMP Thu Nov 5 13:24:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Any idea where to look? I tried entirely new systems, but was still unable to run your example. Could this be an OS issue? What configuration is known to be good?
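One way to narrow down such a hang is to wait on the page descriptor with a predicate and a timeout. A missed wake-up then surfaces as a timeout that can be reported, instead of blocking forever in pthread_cond_wait. The sketch below uses std::condition_variable::wait_for. PageDesc, PageState, and wait_until_not_leaving are hypothetical stand-ins, not Umap's Buffer implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

enum class PageState { FREE, FILLING, PRESENT, LEAVING };

// Hypothetical page descriptor with a state guarded by a mutex.
struct PageDesc {
  std::mutex m;
  std::condition_variable cv;
  PageState state = PageState::LEAVING;

  // Every state change is broadcast so that no waiter can miss it.
  void set_state(PageState s) {
    { std::lock_guard<std::mutex> lk(m); state = s; }
    cv.notify_all();
  }

  // Returns false on timeout, so the caller can log the stuck descriptor.
  bool wait_until_not_leaving(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m);
    return cv.wait_for(lk, timeout, [&] { return state != PageState::LEAVING; });
  }
};
```

If a timed wait on a LEAVING page expires, logging the descriptor and the thread that last changed its state would show which code path failed to signal.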

Userfaultfd returns ENOMEM with > 900 active regions

We are seeing the following error when attempting to register more than 900 regions with userfaultfd:

Tue Aug 27 13:57:55 PDT 2019
./db_bench --benchmarks=fillseq --num=40000000 --threads=24
LevelDB:    version 1.22
Date:       Tue Aug 27 13:57:55 2019
CPU:        96 * AMD EPYC 7401 24-Core Processor
CPUCache:   512 KB
Values:     100 bytes each (50 bytes after compression)
Entries:    40000000
RawSize:    4425.0 MB (estimated)
FileSize:   2517.7 MB (estimated)
WARNING: Snappy compression is not enabled
------------------------------------------------
100869:100900 [ERROR][/home/martymcf/.sessions/altus/src/umap/src/umap/Uffd.cpp:266]: register_region ioctl(UFFDIO_REGISTER) failed: Cannot allocate memoryNumber of regions is: 950
terminate called after throwing an instance of 'Umap::Exception'
  what():  ! UMAP Exception [/home/martymcf/.sessions/altus/src/umap/src/umap/Uffd.cpp:266]:  register_region ioctl(UFFDIO_REGISTER) failed: Cannot allocate memoryNumber of regions is: 950
/home/martymcf/run_db_bench.sh: line 39: 100869 Aborted                 (core dumped) ./db_bench --benchmarks=fillseq --num=$blocks --threads=$threads

umap documentation has incorrect references to UMAP_PAGE_SIZE

Some of the umap documentation was causing confusion: it referred to UMAP_PAGE_SIZE, which is incorrect. The documentation (and code comments) needed to be updated to use the correct name of the environment variable, UMAP_PAGESIZE.
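Here is a minimal sketch of reading the variable under its correct name, UMAP_PAGESIZE. It falls back to the system page size when the variable is unset or invalid. The rule that the value must be a positive multiple of the system page size is an assumption for this sketch, and umap_pagesize_from_env is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns UMAP_PAGESIZE (note: no underscore between
// PAGE and SIZE) if it is a positive multiple of system_page, otherwise
// falls back to system_page. A misspelled UMAP_PAGE_SIZE is simply ignored.
long umap_pagesize_from_env(long system_page) {
  assert(system_page > 0);
  const char* raw = std::getenv("UMAP_PAGESIZE");
  if (raw == nullptr) return system_page;

  char* end = nullptr;
  long value = std::strtol(raw, &end, 10);
  if (end == raw || *end != '\0' || value <= 0 || value % system_page != 0)
    return system_page;
  return value;
}
```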

Data miscompares

This issue is being reproduced with the bugfix/data-miscompare branch of umap.

logfile.gz was generated by running umapsort as follows:

$ time umapsort -p 100000 -b 80000 -f /mnt/intel/sortfile -t 128 -u 1 >& /tmp/logfile
$ gzip /tmp/logfile

The machine configuration that this problem was reproduced on is:

$ uname -a
Linux behemoth-rhel7 4.20.0-rc6uffd-wp-merged-01-180264-g9bc88be70eb1 #1 SMP Fri Dec 21 09:09:08 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 47
Model name:            Intel(R) Xeon(R) CPU E7- 4850  @ 2.00GHz
Stepping:              2
CPU MHz:               1063.952
BogoMIPS:              3989.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              24576K
NUMA node0 CPU(s):     0-9,40-49
NUMA node1 CPU(s):     10-19,50-59
NUMA node2 CPU(s):     20-29,60-69
NUMA node3 CPU(s):     30-39,70-79

Line 698409 (0x7f007d96f000 0 NOT Present { 0 READ }) shows that we received a read fault notification for page 0x7f007d96f000 and that the page is not currently present. The umap buffer was full, so it needed to evict the page shown on line 698410 (0x7f008b7af000 0x7f00731de9b0 EVICT). The last activity logged for that page was at line 358812 (0x7f008b7af000 0 NOT Present { 0 READ }), where it was copied in.

Instrumentation was added to the eviction code to compare the SHA1 of the page currently in memory with the SHA1 taken when the page was originally read from the backing store. When the page differs but was never marked dirty (a page is marked dirty when we receive a WP and/or WRITE message from UFFD for it), the instrumentation logs a message, as seen on line 698411 (0x7f008b7af000 0x7f00731de9b0 Dirty page found that was not previously marked dirty!).

Note that there were no other events associated with this page, yet our SHA1 instrumentation indicated that the page had changed.
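The fingerprint check described above can be sketched as follows. The C++ standard library has no SHA1, so the sketch swaps in the 64-bit FNV-1a hash. FNV-1a is adequate for detecting accidental changes but is not a cryptographic hash. DirtyAudit and its methods are hypothetical and are not Umap's instrumentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// 64-bit FNV-1a: a cheap stand-in for the SHA1 used in the issue.
std::uint64_t fnv1a(const unsigned char* data, std::size_t len) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= 1099511628211ull;                    // FNV prime
  }
  return h;
}

// Hypothetical audit: fingerprint a page when it is filled from the store,
// then compare at eviction time.
class DirtyAudit {
 public:
  void on_fill(std::uintptr_t addr, const unsigned char* page, std::size_t len) {
    fingerprints_[addr] = fnv1a(page, len);
  }

  // True if the page changed although it was never marked dirty.
  bool unmarked_write(std::uintptr_t addr, const unsigned char* page,
                      std::size_t len, bool marked_dirty) const {
    auto it = fingerprints_.find(addr);
    if (it == fingerprints_.end() || marked_dirty) return false;
    return fnv1a(page, len) != it->second;
  }

 private:
  std::unordered_map<std::uintptr_t, std::uint64_t> fingerprints_;
};
```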

Enable CI from Bamboo

CI from Bamboo involves:

  • Getting a resource where the Bamboo agent can run (requires an experimental server)
  • Creating a script that the Bamboo agent can execute to build and run the tests
  • Creating a Bamboo plan that fetches the source code and executes the test script when branches are updated
