pmemstream's Introduction

pmemstream

⚠️ Discontinuation of the project

The pmemstream project will no longer be maintained by Intel.

  • Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
  • Intel no longer accepts patches to this project.
  • If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
  • You will find more information here.

Introduction

pmemstream is a logging data structure optimized for persistent memory.

This is experimental pre-release software and should not be used in production systems. APIs and file formats may change at any time without preserving backwards compatibility. All known issues and limitations are logged as GitHub issues.

Libpmemstream implements a pmem-optimized log data structure and provides stream-like access to the data. It presents a contiguous logical address space, divided into regions, with log entries of arbitrary sizes. We intend for this library to be a foundation for various, more complex higher-level solutions.

This library is a successor to libpmemlog. These two libraries are very similar in basic concept, but libpmemlog was developed in a straightforward manner and does not allow easy extensions.

For more information, including C API documentation, see pmem.io/pmemstream.
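
The snippet below is a minimal usage sketch assembled from the calls referenced on this page (see also the initial API proposal near the end); it is illustrative only and the exact signatures may differ between pmemstream releases. Error handling is omitted and `map` is assumed to be an already-created libpmem2 mapping.

#include <libpmem2.h>
#include <libpmemstream.h>
#include <stdio.h>

int basic_example(struct pmem2_map *map)
{
	struct pmemstream *stream;
	pmemstream_from_map(&stream, 4096 /* block size */, map);

	/* Allocate a region and append a single entry into it. */
	struct pmemstream_region region;
	pmemstream_region_allocate(stream, 4096, &region);

	const char data[] = "hello pmemstream";
	struct pmemstream_entry entry;
	pmemstream_append(stream, region, NULL, data, sizeof(data), &entry);

	/* Read everything back through an entry iterator. */
	struct pmemstream_entry_iterator *eiter;
	pmemstream_entry_iterator_new(&eiter, stream, region);
	while (pmemstream_entry_iterator_next(eiter, NULL, &entry) == 0) {
		const char *e = pmemstream_entry_data(stream, entry);
		printf("entry at offset %lu: %s\n", entry.offset, e);
	}
	pmemstream_entry_iterator_delete(&eiter);

	pmemstream_delete(&stream);
	return 0;
}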

[diagram: example pmemstream]

Build and install

The installation guide provides detailed instructions on how to build and install pmemstream from sources, build rpm and deb packages, and more.
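
Assuming the sources are already cloned, a typical out-of-source CMake flow looks roughly like this (the installation guide remains the authoritative reference for options, dependencies and packaging):

mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install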

Contact us

For more information about pmemstream, please:

pmemstream's People

Contributors

igchor, karolina002, kfilipek, lgtm-migrator, lukaszstolarczuk, nedved1, szadam, tszczyp, wlemkows

pmemstream's Issues

Future feature ideas

These are some of our ideas for the next features, not described in separate issues.

  • pass timestamp to iterator
  • global order entry iterator
  • streaming iterator
  • intent-based allocator with timestamped regions
  • expose pmemstream_async_wait_running
  • expose blocking/optimistic wait functions
  • config for pmemstream
  • error codes
  • fio integration

Add option to trigger pool prefaulting

FEAT: Add option to trigger pool prefaulting

Rationale

Right now, we are always clearing the unused part of the region on restart (to make sure that we don't have any trash which could be later interpreted as data). Because of this, all pages to which we will later write (inside append) are already prefaulted.

Once we get rid of clearing the regions, appends might trigger page faults. To get valid benchmarking results we should add an option to force prefaulting at open/create time.

Description

Similar to: https://pmem.io/pmdk/manpages/linux/master/libpmemobj/pmemobj_ctl_get.3/

API Changes

None. Add an environment variable to trigger page faults.
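
A minimal sketch of what forced prefaulting could look like, assuming a hypothetical PMEMSTREAM_PREFAULT environment variable (neither the variable name nor the hook point is part of the current API):

#include <stdint.h>
#include <stdlib.h>

/* Touch one byte per page of the mapped area so that all pages are faulted in
 * at open/create time instead of during the first append. */
static void prefault_if_requested(const void *addr, size_t len, size_t page_size)
{
	const char *env = getenv("PMEMSTREAM_PREFAULT"); /* hypothetical knob */
	if (!env || env[0] == '0')
		return;

	volatile const uint8_t *p = addr;
	for (size_t off = 0; off < len; off += page_size)
		(void)p[off];
}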

Failing tests with TSAN feature

ISSUE: Failing tests with TSAN feature

Environment Information

  • pmemstream version(s): cfa5d7c
  • PMDK (libpmem2) package version(s):
  • OS(es) version(s):
  • kernel version(s):
  • compiler, libraries, packaging and other related tools version(s): clang 13.0.1-2ubuntu2

and possibly:

  • ndctl version(s):

Please provide a reproduction of the bug:

Enabled TSAN in CMake and ctest -R "_none"

How often bug is revealed:

always

Actual behavior:

         40 - concurrent_async_wait_0_none (Failed)
         43 - concurrent_iterate_0_none (Failed)
         44 - concurrent_iterate_with_append_0_none (Failed)
         50 - publish_append_async_0_none (Failed)
         51 - region_runtime_initialize_0_none (Failed)
         68 - singly_linked_list_pmreorder_negative_none (Failed)
         81 - example-05_timestamp_based_order_0_none (Failed)

Expected behavior:

[image: Pika tests]

Details

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? (Yes, No)

Requested priority: Low

0.2.0 release feature list

  • Timestamp-based persistency
    • #202
    • Expose timestamps in API (return from append/publish)
    • Extend futures to 2 stages: committed, persisted
    • Expose pmemstream_async_wait_*(timestamp)
    • Add extra tests:
      • #229
      • #211
      • more than PMEMSTREAM_MAX_CONCURRENCY concurrent operations
      • multithreaded async_append/async_publish and wait_committed/wait_persisted
    • Forbid multiple concurrent appends to the same region (even from a single thread) for now. Later, we might add some kind of transparent transactions to allow this.
    • Implement and test recovery. One approach (on #183) stores max_valid_timestamp in each region on recovery. An alternative approach could be to have an additional run_id variable incremented on each restart. Each region would store the current run_id in its metadata on recovery. We would also have some log/map which would correlate each run_id with a maximum valid timestamp. #192
    • Handle exceeding MAX_CONCURRENCY on append
    • Document what data is visible to iterators (committed vs persistent)
  • Implement RocksDB-like API for iterators
  • pmemstream_region_usable_size()
  • manpages
  • use pmem2_async_memcpy instead of vdm_memcpy #163

FEAT: Multiple regions support

FEAT: Multiple regions support

Rationale

  • Increasing concurrency (there is no need for heavy-weight synchronization between appends to different regions)
  • Logical division of entries
  • Grouping objects of different sizes together (can improve performance for concurrent append - appending similarly sized entries will result in bigger overlap for memcpy)

Description

  • Persistent allocator
    • Store alloc/free actions on PMEM (which allows rebuilding the state) and do compaction regularly (see the sketch after this list)
  • Synchronization between iterators and region_free needed
    • Use delayed free / garbage collection?
  • Region discovery TBD (optional)
    • Named regions?
    • Root region?
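
A rough sketch of what the persisted alloc/free action log mentioned above could look like (names and layout are illustrative only, not the actual on-media format):

#include <stdint.h>

enum allocator_action_type { ACTION_ALLOCATE, ACTION_FREE };

struct allocator_action {
	uint64_t action_type;   /* enum allocator_action_type */
	uint64_t region_offset;
	uint64_t region_size;
};

/* Fixed-capacity log of actions stored on pmem; replayed on restart to rebuild
 * the allocator state and compacted regularly once it fills up. */
struct allocator_action_log {
	uint64_t length;
	struct allocator_action actions[];
};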

API Changes

No changes - region allocate and free are already supported.

Implementation details

Most of the code is ready to handle multiple regions. There are a few global variables which should be made per-region (moved to region_runtime), e.g. region_lock from region_runtimes_map.

First steps

  • trivial persistent allocator based on free list
  • no synchronization for iterators and region_free
  • no region discovery (only iterators)

Assertion fault in span_get_region_runtime

ISSUE:

assertion fault in span_get_region_runtime when passing invalid region offset

The bug reproduction:

struct pmemstream_region region = {.offset = ALIGN_DOWN(UINT64_MAX, sizeof(span_bytes))};
ret = pmemstream_reserve(stream, region, NULL, sizeof(data), &entry, &data_address);

How often bug is revealed:

always

Actual behavior:

 __GI___assert_fail (assertion=0x7ffff7f226f0 "span_get_type(span) == SPAN_REGION", 
span_get_region_runtime () at /PMDK/pmemstream/src/span.c:121
pmemstream_reserve () at /PMDK/pmemstream/src/libpmemstream.c:166
invalid_region_test () at /PMDK/pmemstream/tests/api_c/reserve_and_publish.c:85
main () at /PMDK/pmemstream/tests/api_c/reserve_and_publish.c:189

Segmentation fault in pmemstream_reserve

ISSUE:

segmentation fault in pmemstream_reserve (potentially also in pmemstream_publish) when passing NULL entry

The bug reproduction:

struct pmemstream_entry *entry = NULL;
ret = pmemstream_reserve(stream, region, NULL, sizeof(data), entry, &data_address);

How often bug is revealed:

always

Actual behavior:

Program received signal SIGSEGV, Segmentation fault.
reserved_entry->offset = offset;
pmemstream_reserve () at PMDK/pmemstream/src/libpmemstream.c:193
null_entry_test () at /PMDK/pmemstream/tests/api_c/reserve_and_publish.c:165
main () at /PMDK/pmemstream/tests/api_c/reserve_and_publish.c:192

Segmentation fault in pmemstream_entry_iterator_new

ISSUE:

segmentation fault in pmemstream_entry_iterator_new when passing NULL entry_iterator

The bug reproduction:

ret = pmemstream_entry_iterator_new(NULL, stream, region);

How often bug is revealed:

always

Actual behavior:

  Program received signal SIGSEGV, Segmentation fault.
  pmemstream_entry_iterator_new () at /pmemstream/src/iterator.c:92
  92		*iterator = iter;
  
  pmemstream_entry_iterator_new () at /pmemstream/src/iterator.c:92
  null_entry_iterator_test () at /pmemstream/tests/api_c/entry_iterator.c:112
  main () at /pmemstream/tests/api_c/entry_iterator.c:162

Recovery during recovery test

FEAT: pmreorder based test for power failure during recovery.

Rationale

The recovery process after a power failure involves writing to pmem, so we need to test whether it's possible to continue it after another power failure.

Implementation details

This may be done in a generic way:

  1. Run the test code (e.g. the multi_region_pmreorder test) under pmemcheck
  2. Run the pmreorder check with pmemcheck as the checker
  3. Run the pmreorder check for each generated storelog

Tricky part

With a little refactoring in pmreorder.py, it could easily be turned into a Python module to be used directly from Python. This would tremendously simplify tracking of storelogs.

Roughly tested POC - seems to work

./singly_linked_list_pmreorder create /dev/shm/sllp1

valgrind --tool=pmemcheck -q --log-stores=yes --print-summary=no --log-file=/dev/shm/logfile.storelog --log-stores-stacktraces=no --expect-fence-after-clflush=yes  ./singly_linked_list_pmreorder fill /dev/shm/sllp

pmreorder -l /dev/shm/logfile.storelog -o /dev/shm/x.pmreorder -r ReorderAccumulative -p "/usr/bin/valgrind \-\-tool=pmemcheck \-q \-\-log-stores=yes \-\-print-summary=no \-\-log-file=/dev/shm/logfile-$(cat /proc/sys/kernel/random/uuid).storelog \-\-log-stores-stacktraces=no \-\-expect-fence-after-clflush=yes  ./singly_linked_list_pmreorder check"

pmreorder -l /dev/shm/logfile-be119487-d4c9-43f1-aea5-e28733078988.storelog -o /dev/shm/x.pmreorder -r ReorderAccumulative -p "./singly_linked_list_pmreorder check" 

Standalone example(s) compilation

We have to add CMake support for compiling the standalone example(s) and update CI to build them, to confirm our packages are built properly.

Assertion fault in span_get_entry_runtime

ISSUE:

assertion fault in span_get_entry_runtime when passing invalid entry offset

The bug reproduction:

struct pmemstream_entry entry = {.offset = ALIGN_DOWN(UINT64_MAX, sizeof(span_bytes))};
entry_data = pmemstream_entry_data(stream, entry);

How often bug is revealed:

always

Actual behavior:

__GI___assert_fail ( assertion=0x7ffff7f226c8 "span_get_type(span) == SPAN_ENTRY", 
span_get_entry_runtime ()  at /PMDK/pmemstream/src/span.c:105
pmemstream_entry_data ()  at /PMDK/pmemstream/src/libpmemstream.c:136
invalid_entry_test () at /PMDK/pmemstream/tests/api_c/append_entry.c:129
main () at /Share/PMDK/pmemstream/tests/api_c/append_entry.c:152

Extra test for region allocate/free

I'm not sure if the current region_allocate clears all necessary metadata when taking a region from the free list.

Pseudocode:

r1 = allocate_region and append some data there
r2 = allocate_region and append some data there

reopen()

free_region(r1)
r3 = allocate_region()

if (testcase1) {
	check if data from r1 is not available
	check timestamps
} else if (testcase 2) {
	append some data to r3
	check if data from r1 is not available
	check timestamps
} else if (testcase3) {
	append some data to r2
	check if data from r1 is not available
	check timestamps
}

The above test cases are only for illustration. I think we could implement those tests as part of stateful testing.

test create_0_none fails

test create_0_none fails for RC_PARAMS="seed=4382463535112076010"

Environment Information

  • CI
  • pmemstream version(s): since 63de7d5

Please provide a reproduction of the bug:

cmake ..
make -j$(nproc)
export RC_PARAMS="seed=4382463535112076010"
ctest -R create_0_none

How often bug is revealed:

Always

Details

As 63de7d5 introduces new tests, the bug is probably older.

Output:

Signal: Aborted, backtrace:
0: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (test_sighandler+0x27) [0x5576a72a09e7]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f9f7444e24f]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0xcb) [0x7f9f7444e18b]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x12b) [0x7f9f7442d859]
4: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (UT_FATAL+0xc3) [0x5576a7286593]
5: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (_ZN12return_checkD2Ev+0x3b) [0x5576a7294f2b]
6: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (_ZZ4mainENKUlvE_clEv.isra.0.cold+0x1eb) [0x5576a7283cd3]
7: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (main+0x71) [0x5576a7286091]
8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf3) [0x7f9f7442f0b3]
9: /opt/workspace/pmemstream_v6/bisect_dir/tests/create (_start+0x2e) [0x5576a728628e]


-- Stderr:
Using configuration: seed=4382463535112076010

- verify if a single region of various sizes (>0) can be created
OK, passed 100 tests

- verify if a region_iterator finds the only region created
OK, passed 100 tests

- verify if a region of size > stream_size cannot be created
OK, passed 100 tests

- verify if a stream of various sizes can be created
OK, passed 100 tests

- verify if a stream of various block_sizes can be created
Falsifiable after 100 tests and 7 shrinks

unsigned long:
503187

/opt/workspace/pmemstream_v6/tests/common/stream_helpers.hpp:25:
RC_ASSERT(pmemstream_region_allocate(stream, region_size, &new_region) == 0)

Expands to:
-1 == 0
/opt/workspace/pmemstream_v6/tests/common/unittest.hpp:92 ~return_check - assertion failure: status, errormsg: 
Some of your RapidCheck properties had failures. To reproduce these, run with:
RC_PARAMS="reproduce=BgjdlJXamlHIpZGIhByc0JXZh1GIvZGI2Fmcp9WdzBiYs92Yr91cppXZzByYh5GIiVGIjJXZhRXZkh7cJf04xT1eZBavBXbr2b6FSHzdkJisMiOebemtu_tRACIgACYAAQCZAAAAH0ACPARESIB"

CMake Error at /opt/workspace/pmemstream_v6/tests/cmake/exec_functions.cmake:205 (message):
   /opt/workspace/pmemstream_v6/bisect_dir/tests/create /opt/workspace/pmemstream_v6/bisect_dir/tests/create_0_none/testfile failed: 134
Call Stack (most recent call first):
  /opt/workspace/pmemstream_v6/tests/cmake/exec_functions.cmake:246 (execute_common)
  /opt/workspace/pmemstream_v6/tests/cmake/run_default.cmake:8 (execute)

Arguments refactor

Simplify arguments passed to functions.

  1. pmemstream_span_create_empty(stream, &stream->data->spans[0], stream->usable_size - metadata_size);
    should be changed to:
    pmemstream_span_create_empty(stream, stream->usable_size - metadata_size);
  2. Change &span[0] -> span

Segmentation fault in pmemstream_offset_to_ptr

ISSUE:

segmentation fault in pmemstream_offset_to_ptr when passing NULL stream

The bug reproduction:

ret = pmemstream_entry_iterator_new(&eiter, NULL, region);

How often bug is revealed:

always

Actual behavior:

Program received signal SIGSEGV, Segmentation fault.
pmemstream_offset_to_ptr () at /PMDK/pmemstream/src/libpmemstream_internal.h:47
return (const uint8_t *)stream->data->spans + offset;
pmemstream_offset_to_ptr () at /Share/PMDK/pmemstream/src/libpmemstream_internal.h:47
span_offset_to_span_ptr () at /PMDK/pmemstream/src/span.c:14
span_get_region_runtime () at /PMDK/pmemstream/src/span.c:118
entry_iterator_initialize () at //PMDK/pmemstream/src/iterator.c:60
pmemstream_entry_iterator_new () at /PMDK/pmemstream/src/iterator.c:83
null_stream_test () at /PMDK/pmemstream/tests/api_c/entry_iterator.c:81
main () at /PMDK/pmemstream/tests/api_c/entry_iterator.c:160

FEAT: Transaction

FEAT: Transaction

Rationale

Allow grouping multiple operations (append/region_allocate/region_free) into a single atomic action.
Requires #78.

Description

API Changes

Additional functions:
pmemstream_tx_begin()
pmemstream_tx_commit()
pmemstream_tx_append()
pmemstream_tx_reserve()

pmemstream_tx_region_allocate()
pmemstream_tx_region_free()

Implementation details

Transaction implementation will be based on timestamps. Each transaction will be assigned a unique timestamp. Committed and persisted timestamps will be used to distinguish between committed and aborted/in-progress transactions.

First steps

Support only for tx_append and tx_reserve.
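
A usage sketch of the proposed API (pseudocode; the exact argument lists are not final and merely follow the functions listed above and the initial API proposal elsewhere on this page):

struct pmemstream_tx *tx;
pmemstream_tx_begin(&tx, stream);

struct pmemstream_entry e1, e2;
pmemstream_tx_append(tx, region, data1, size1, &e1);
pmemstream_tx_reserve(tx, region, size2, &e2);
/* memcpy the payload for e2 into the reserved space here */

/* all operations performed in this tx become visible atomically */
pmemstream_tx_commit(&tx);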

Transactions and concurrent append to a single region

We can use timestamps to implement cross-region transactions and (optionally) combine them with the concurrent append solution:

  • Instead of popcount, we use a timestamp in the entry,
  • A transaction commits (and increments the timestamp) only after all previous operations on all regions used by this tx have completed.

Crash inconsistency in region allocation

ISSUE: Crash inconsistency in region allocation

Environment Information

  • pmemstream version(s): 0.2.0
  • compiler, libraries, packaging and other related tools version(s): gcc, clang

Please provide a reproduction of the bug:

https://github.com/karczex/pmemstream/blob/recovery_on_recovery_test_2TC/tests/integrity/append_to_new_region_pmreorder.cpp

How often bug is revealed:

always

Details

The pmreorder test fails due to an slist invariant violation in the following scenario:

  1. Crash the stream (via pmreorder) during data insertion into regions (allocate regions first, then start appending data).
  2. After the next start of the application, allocate a new region and append data to it
  3. Allocate the next region

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? (Yes, No) Maybe

Requested priority: (Showstopper, High, Medium, Low) High

FEAT: Support for multithreaded region allocation

FEAT: Support for multithreading in region allocator

Rationale

Developers have to ensure thread safety themselves; it is safer to move this responsibility to the library.

Description

For now, the allocator design doesn't support multithreaded adding or removing of regions. Using iterators during allocate/free causes undefined behavior.
Affected API functions:
pmemstream_region_allocate
pmemstream_region_free
and all region iteration functions.

API Changes

There is no need to change API.

Implementation details

N/A

Assertion fault in region_runtime_clear_from_tail()

ISSUE:

assertion fault in region_runtime_clear_from_tail()

The bug reproduction:

struct pmemstream_region invalid_region = {.offset = ALIGN_DOWN(UINT64_MAX, sizeof(span_bytes))};
struct pmemstream_region_runtime *rtm = NULL;
ret = pmemstream_region_runtime_initialize(stream, invalid_region, &rtm);

How often bug is revealed:

always

Actual behavior:

 __GI___assert_fail ("region_runtime_get_state_acquire(region_runtime) == REGION_RUNTIME_STATE_DIRTY")
region_runtime_clear_from_tail () at pmemstream/src/region.c:238
region_runtime_initialize_clear_locked () at pmemstream/src/region.c:273
pmemstream_region_runtime_initialize () at pmemstream/src/libpmemstream.c:212
invalid_region_test () at pmemstream/tests/api_c/region_create.c:65
main () at pmemstream/tests/api_c/region_create.c:88

Optimization ideas

  • use pmem2_memcpy_async (#163) to enable nontemporal stores. We might need to add an extra threshold below which memcpy is performed using temporal stores.
  • process timestamps in batches of arbitrary size during commit/persist operations (to increase concurrency - currently, thread tries to acquire as many timestamps as possible which might result in other threads waiting instead of doing actual work) (#245)
  • properly use notifier in wait_committed/wait_persisted - this can be especially important for SPDK integration (?)

DSA specific:

  • use batching (once implemented in miniasync or use DML directly)
  • for small appends, use normal memcpy and a DML persist operation in the background (combine multiple persists into one in software)

Try running pmemcheck tests with "--mult-stores=yes"

This should provide better test coverage for rapidcheck tests where we reopen the pmemstream multiple times.
This should most likely be run on internal infrastructure as part of stress tests (due to the longer runtime).
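
For example, based on the pmemcheck invocation shown in the pmreorder POC elsewhere on this page (the test binary name here is a placeholder):

valgrind --tool=pmemcheck -q --mult-stores=yes --log-stores=yes --print-summary=no ./some_rapidcheck_reopen_test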

FEAT: Timestamps and async API

FEAT: Timestamps and async API

Rationale

  • Will be used to ensure data consistency instead of popcount: offloads some of the work to a background thread to speed up user threads
  • Foundation for Transactions (#77)

Description

  • Each entry stores a timestamp which describes when the entry was created (all entries created in one tx have the same timestamp)
  • On pmem, we store the latest timestamp which is known to be persistent (all entries with timestamps less than or equal to it are persisted).
  • In DRAM, we store the highest timestamp which is known to be committed.

Timestamps generation and data persistence

  • Data is stored using NONTEMPORAL stores (in pmemstream_append)
  • The timestamp is generated by incrementing a global runtime counter (either in pmemstream_append or on transaction commit) and storing it inside the entry.
  • The persisted timestamp is updated and persisted by a separate async function that scans in-progress regions. This function can be run in an arbitrary thread: https://github.com/pmem/pmemstream/blob/master/examples/04_basic_async/main.cpp

Example of pmemstream with 3 entries: [diagram]
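
A pseudocode sketch of how these timestamps could be consumed by a user thread (function names follow the API proposals elsewhere on this page; polling of the future is simplified):

struct pmemstream_entry entry;
pmemstream_async_append(vdm, region, data, size, &entry); /* returns quickly, data not yet committed */

uint64_t ts = pmemstream_entry_timestamp(entry);

/* Wait until every entry with timestamp <= ts is committed; persistence can be
 * awaited the same way with pmemstream_async_wait_persisted(). */
struct pmemstream_async_wait_fut fut = pmemstream_async_wait_committed(stream, ts);
poll_until_complete(&fut); /* placeholder for polling the miniasync future */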

Update README file

It'd be nice to extend the content of our README, proposed updated content:

  • extend the lib description,
  • differences to other, existing libraries/solutions,
  • add info about write ahead/redo log + perhaps some link,
  • mention pmemlog,
  • add an entry point to examples (extend their description? perhaps not in the top-level README),
  • add a fancy image/gif showing how pmemstream works,
  • value proposition, e.g.:
    • low latency writes,
    • ...but depends on medium capabilities,
    • low write amplification,
    • low SW overhead = little metadata per entry,
    • transparent usage of DSA/HW accelerators,
    • it's universal and easy to use,
    • does not require (extensive?) PMEM knowledge,
  • add a section about testing? rather perhaps write a new blog post about it,
  • ...

references:
https://pmem.io/blog/2022/01/introduction-to-pmemstream/
https://pmem.io/announcements/2022/pmemstream-v0-2-0-release/
manpages

Add data structure visualization tool

Data structure visualization tool as example

Rationale

It would be useful as a debug tool and could also be used as part of documentation generation.

Description

Output may be inspired by tree (the Linux tool):

.
├── stream
│   ├── region1
│   │   ├── entry1: offset; length; content
│   │   ├── entry2: offset; length; content
│   │   ├── entry3: offset; length; content
│   │   ├── entry4: offset; length; content
│   │   ├── entry5: offset; length; content
│   │   ├── entry6: offset; length; content
│   │   ├── entry7: offset; length; content
│   │   ├── entry8: offset; length; content
│   │   └── entry9: offset; length; content
│   ├── region2
│   │   ├── entry1: offset; length; content
│   │   ├── entry2: offset; length; content
│   │   └── entry3: offset; length; content
│   ├── region3
│   │   ├── entry1: offset; length; content
│   │   ├── entry2: offset; length; content
│   │   ├── entry3: offset; length; content
│   │   ├── entry4: offset; length; content
│   │   ├── entry5: offset; length; content
│   │   ├── entry6: offset; length; content
│   │   └── entry7: offset; length; content
│   └── region4
│       ├── entry1: offset; length; content
│       ├── entry12: offset; length; content
│       ├── entry2 : offset; length; content
│       ├── entry3 : offset; length; content
│       ├── entry4: offset; length; content
│       ├── entry5: offset; length; content
│       ├── entry6: offset; length; content
│       ├── entry7: offset; length; content
│       ├── entry8: offset; length; content
│       ├── entry9: offset; length; content
│       ├── entry10: offset; length; content
│       ├── entry11: offset; length; content
│       └── entry12: offset; length; content

4 regions, 32 entries

API Changes

None

Implementation details

Iterate over all regions and entries and print the hierarchy, length and content.
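
A sketch of the core loop, mirroring the iteration pattern from the initial API proposal further down this page (printing of lengths and content is elided):

void dump_stream(struct pmemstream *stream)
{
	struct pmemstream_region_iterator *riter;
	pmemstream_region_iterator_new(&riter, stream);

	struct pmemstream_region region;
	size_t region_no = 1;
	while (pmemstream_region_iterator_next(riter, &region) == 0) {
		printf("├── region%zu\n", region_no++);

		struct pmemstream_entry_iterator *eiter;
		pmemstream_entry_iterator_new(&eiter, stream, region);

		struct pmemstream_entry entry;
		size_t entry_no = 1;
		while (pmemstream_entry_iterator_next(eiter, NULL, &entry) == 0) {
			const void *data = pmemstream_entry_data(stream, entry);
			printf("│   ├── entry%zu: offset %lu\n", entry_no++, entry.offset);
			(void)data; /* length/content printing elided */
		}
		pmemstream_entry_iterator_delete(&eiter);
	}
	pmemstream_region_iterator_delete(&riter);
}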

some pmreorder tests are run with ASAN

ISSUE: pmreorder tests are run with ASAN

Environment Information

Please provide a reproduction of the bug:

CMAKE options:
CC=clang
CXX=clang++
BUILD_TESTS=ON
CMAKE_BUILD_TYPE=Debug
TESTS_PMREORDER=ON
TESTS_USE_FORCED_PMEM=ON
TESTS_USE_VALGRIND=ON
USE_ASAN=ON
USE_UBSAN=ON

How often bug is revealed:

always

Actual behavior:

-- Stderr:
==559733==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==559733==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==559733==This might be related to ELF_ET_DYN_BASE change in Linux 4.12.
==559733==See https://github.com/google/sanitizers/issues/856 for possible workarounds.

and finally:

The following tests FAILED:
         70 - singly_linked_list_pmreorder_0_none (Failed)
         71 - singly_linked_list_pmreorder_negative_none (Failed)
         72 - multi_region_pmreorder_0_none (Failed)
         76 - append_break_lazy_1_none (Failed)

Expected behavior:

Not-applicable tests should be skipped like the following:

    Start 63: timestamp_0_memcheck_SKIPPED_BECAUSE_SANITIZER_USED
2/3 Test #63: timestamp_0_memcheck_SKIPPED_BECAUSE_SANITIZER_USED ....   Passed    0.02 sec
    Start 64: timestamp_0_pmemcheck_SKIPPED_BECAUSE_SANITIZER_USED
3/3 Test #64: timestamp_0_pmemcheck_SKIPPED_BECAUSE_SANITIZER_USED ...   Passed    0.02 sec

Details

It is possibly enough to add an exclusion in:

  • tests/CMakeLists.txt:158
if(TESTS_PMREORDER)
  • tests/CMakeLists.txt:167
if(GDB AND DEBUG_BUILD)

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? Yes

Requested priority: Low

Initial API proposal

struct data_entry {
	uint64_t data;
};

struct pmem2_map *map = map_open(argv[1]);

if (map == NULL)
	return -1;

struct pmemstream *stream;
pmemstream_from_map(&stream, 4096, map);

struct pmemstream_tx *tx;
pmemstream_tx_new(&tx, stream);

struct pmemstream_region new_region;
pmemstream_tx_region_allocate(tx, stream, 4096, &new_region);

struct pmemstream_region_context *rcontext;
pmemstream_region_context_new(&rcontext, stream, new_region);

struct data_entry e;
e.data = 1;

struct pmemstream_entry new_entry;
pmemstream_tx_append(tx, rcontext, &e, sizeof(e), &new_entry);
pmemstream_tx_reserve(tx, rcontext, sizeof(e), &new_entry);
pmemstream_tx_commit(&tx);

pmemstream_append(rcontext, &e, sizeof(e), &new_entry);

struct data_entry *new_data_entry = pmemstream_entry_data(stream, new_entry);
printf("new_data_entry: %lu\n", new_data_entry->data);
pmemstream_region_context_delete(&rcontext);

pmemstream_tx_region_free(tx, new_region);
pmemstream_region_clear(stream, new_region);

size_t size = pmemstream_region_size(stream, new_region);

struct pmemstream_region_iterator *riter;
pmemstream_region_iterator_new(&riter, stream);
struct pmemstream_region region;

while (pmemstream_region_iterator_next(riter, &region) == 0) {
	struct pmemstream_entry entry;
	struct pmemstream_entry_iterator *eiter;
	pmemstream_entry_iterator_new(&eiter, stream, region);
	uint64_t last_entry_data;
	while (pmemstream_entry_iterator_next(eiter, NULL, &entry) == 0) {
		struct data_entry *d = pmemstream_entry_data(stream, entry);
		printf("data entry %lu: %lu in region %lu\n",
		       entry.offset, d->data, region.offset);
		last_entry_data = d->data;
	}
	pmemstream_entry_iterator_delete(&eiter);
}

pmemstream_delete(&stream);
pmem2_map_delete(&map);

Based on: https://github.com/pbalcer/nvml/blob/0149cf71d9377aab835516b8cd79cc9ddc601744/src/examples/libpmem2/pmemstream/pmemstream.h

FEAT: Concurrent append to a single region

FEAT: Concurrent append to a single region

Rationale

  • Provide better throughput by allowing multiple appends to be issued (from a single thread or multiple threads)
  • Support parallel workloads which cannot be divided into multiple regions

Description

To make sure pmemstream is contiguous (there are no holes in it), appends to a single region must be synchronized somehow. However, synchronization should not result in memcpy operations being serialized.

The idea is to synchronize appends just before returning (or marking the committed future, #76, as completed).
The code in the implementation details section explains it in more detail.

API Changes

None.

Implementation details

pmemstream_append(..., data, size) {
    // acquire an offset for this append (only this step is synchronized)
    uint64_t offset = acquire_offset(size);
    uint64_t popcount = popcount_memory(data, size);

    // write entry metadata (size, popcount) followed by the data itself;
    // memcpy of different appends can proceed concurrently
    *(stream->data + offset) = size;
    *(stream->data + offset + 8) = popcount;
    pmem2_memcpy(stream->data + offset + 16, data, size);

    // data at `offset` is published
    publish_offset(offset, size);

    // instead of calling this function directly, we could return a future here
    wait_for_prev_appends(offset);
} // pmemstream_append_finished

wait_for_prev_appends(offset) {
    // block until all appends at lower offsets have been published,
    // so the stream stays contiguous (no holes) up to `offset`
    uint64_t ready_offset;
    size_t size;
    do {
        size = consume_offset(&ready_offset);
    } while (ready_offset + size < offset);
}

Change iterators API

FEAT: New iterators (for both region and entry) API:

int pmemstream_region_iterator_is_valid(struct pmemstream_region_iterator *iterator);

void pmemstream_region_iterator_seek_first(struct pmemstream_region_iterator *iterator);

void pmemstream_region_iterator_next(struct pmemstream_region_iterator *iterator);

struct pmemstream_region pmemstream_region_iterator_get(struct pmemstream_region_iterator *iterator);



int pmemstream_entry_iterator_is_valid(struct pmemstream_entry_iterator *iterator);

void pmemstream_entry_iterator_seek_first(struct pmemstream_entry_iterator *iterator);

void pmemstream_entry_iterator_next(struct pmemstream_entry_iterator *iterator);

struct pmemstream_entry pmemstream_entry_iterator_get(struct pmemstream_entry_iterator *iterator);

Rationale

Such a change enables more intuitive usage of iterators:

  1. For-loop based iteration:
for (pmemstream_region_iterator_seek_first(riter); pmemstream_region_iterator_is_valid(riter) == 0;
     pmemstream_region_iterator_next(riter)) {
	struct pmemstream_region region = pmemstream_region_iterator_get(riter);
	/* Do something with region */
}
  2. Simpler access to regions/entries when the iterator is stored in a structure.
  3. Simpler signature for the next() functions.

FEAT: Multi-process support (single writer, multiple readers)

FEAT: Multi-process support

Rationale

With multi-process support (single writer, multiple readers) consumers can attach dynamically to pmemstream. Each consumer might use a different set of dependencies.

One interesting use case is using a separate consumer process for replication - the application would constantly iterate over the stream and send data to a replica (e.g. using RDMA).

Description

  • Single writer/producer process
  • Multiple reader/consumer processes (can use iterators)
  • All runtime data stored in shared memory
  • Handle runtime data lifetime

API Changes

An additional method for attaching to stream in read-only mode.

Implementation details

All runtime metadata (including region_runtime's) must be kept in shared memory.

First steps

  • A single writer process per entire stream. In the future we might consider adding support for a single writer process per region.
  • The writer process will be responsible for runtime data lifetime management. In the future we might consider refcounts.

FEAT: Async API

FEAT: Async API

Rationale

  • Allow the user to perform computation while waiting on append completion.
  • Allow the user to issue multiple concurrent appends to a single region and take advantage of DSA
  • Expose more states of the append operation to the user. Right now, if an append completes, the data can be treated as committed and persisted. With an async API we might expose the committed and persisted states separately (as futures).

Description

The async API would use the miniasync framework to expose committed and persisted futures.

API Changes

Expose futures in append and publish.

Example (pseudocode)

future = pmemstream_append();
// no guarantees here

wait(future, COMMITTED);
// data committed (can be safely read), possibly not persisted

wait(future, PERSISTED);
// data committed and persistent

Add get_usable_size() function

FEAT: Add get_usable_size() function

Rationale

Currently it's hard to guess a region size which fits into the stream.

Description

Expose a usable_size value (aligned to the block size) in the API.

API Changes

add get_max_region_size()

Add checksum to provide consistency for metadata

FEAT: Checksum for metadata consistency

Rationale

Let's imagine a case where metadata is corrupted in such a way that only the size or popcount is broken.
In such a case you can get a "correct" iterator with data exceeding the original data.

Description

  • To provide better consistency, a mechanism like a checksum for metadata memory blocks should be added.
  • Provide a mechanism that wouldn't need to check the checksum twice (iterator + data retrieval using pmemstream_entry_data).

API Changes

Just extend the metadata structure and add a function to calculate a checksum over a memory range, on both the read and write paths.

Implementation details

To be discussed later.
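
A rough illustration of the direction (the metadata layout and the checksum algorithm below are placeholders, not the actual on-media format):

#include <stddef.h>
#include <stdint.h>

/* Placeholder entry metadata: the checksum covers both the metadata fields and
 * the entry data, so a torn/corrupted size cannot yield a "valid" entry. */
struct entry_metadata {
	uint64_t size;
	uint64_t timestamp;
	uint64_t checksum;
};

/* Simple 64-bit FNV-1a as a stand-in for the real checksum. */
static uint64_t checksum64(const void *buf, size_t len, uint64_t seed)
{
	const uint8_t *p = buf;
	uint64_t h = seed ^ 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}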

[RC tests] timestamp order with multiple threads and multiple regions tests

FEAT: timestamp order with multiple threads and multiple regions tests

Description

Prerequisites:
(Prereq 1) Add a generator derived from pmemstream_test_base for this case:

  • Generates multiple regions
  • Appends entries to the regions synchronously from multiple threads

(Prereq 2) Add a generator derived from pmemstream_test_base for this case:

  • Generates multiple regions
  • Appends entries to the regions asynchronously from multiple threads

Cases for (prereq 1 and 2):
Case 1:
In every region, all timestamps increase.

Case 2:
Across the whole stream, all timestamps increase (duplicates are possible).

Case 3:
Add regions, remove a single region with entries, add an empty region and then add new entries, then check Case 2.

Implementation details

N/A

Add logging mechanism

We could add logging similar to e.g. libpmem2 (LOG macro and all its helpers):
https://github.com/pmem/pmdk/blob/master/src/libpmem2/libpmem2.c#L29

This could be used for logging, e.g., issues introduced by a user's improper behavior, like when the user:

  • calls pmemstream_reserve,
  • forgets to (or improperly) memcpy data into the reserved space,
  • calls pmemstream_publish,
  • calls pmemstream_entry_iterator_next and gets an assertion error:
    • src/iterator.c:146: pmemstream_entry_iterator_next: Assertion `validate_entry(iterator->stream, entry) == 0' failed.

This is exactly the place to report a broader message to the user, with proper information like: data corruption or improper append happened....
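
A minimal sketch of what such a macro could look like (the libpmem2 implementation linked above is more elaborate; the PMEMSTREAM_LOG_LEVEL variable, level handling and output destination here are simplified assumptions):

#include <stdio.h>
#include <stdlib.h>

/* Print only when the (hypothetical) PMEMSTREAM_LOG_LEVEL environment
 * variable is set to at least `level`. */
#define LOG(level, format, ...)                                               \
	do {                                                                  \
		const char *env_ = getenv("PMEMSTREAM_LOG_LEVEL");           \
		if (env_ && atoi(env_) >= (level))                            \
			fprintf(stderr, "%s:%d %s: " format "\n", __FILE__,  \
				__LINE__, __func__, ##__VA_ARGS__);          \
	} while (0)

/* Example use at the spot where the assert currently fires: */
/* LOG(1, "invalid entry at offset %lu - data corruption or improper append", entry.offset); */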

Simplify pmemstream API

FEAT: Simplify pmemstream API

Rationale

The fact that a particular entry is located in a particular region, which is part of a particular stream instance, should be reflected in the API.

Currently, functions in pmemstream unnecessarily take too many arguments, e.g.: int pmemstream_append(struct pmemstream *stream, struct pmemstream_region region, struct pmemstream_region_runtime *region_runtime, const void *data, size_t size, struct pmemstream_entry *new_entry), which is error-prone. It's super easy to pass a mismatched entry, region and region runtime.

Description

  • Make region and entry structures purely runtime.
  • Use pointers to regions and entries instead of offsets (allows removing a lot of offset_to_ptr calls).
  • Add serialization/deserialization functions, which would operate on offsets.

API Changes

Signature change

int pmemstream_region_allocate(struct pmemstream *stream, size_t size, struct pmemstream_region *region);

int pmemstream_region_free(struct pmemstream_region region);
size_t pmemstream_region_size( struct pmemstream_region region);


size_t pmemstream_region_usable_size(struct pmemstream_region region);


int pmemstream_reserve(struct pmemstream_region region, size_t size,
		       struct pmemstream_entry *reserved_entry);


int pmemstream_publish(struct pmemstream_entry reserved_entry);

int pmemstream_append(const struct pmemstream_region region, const void *data, size_t size,
		      struct pmemstream_entry *new_entry);

int pmemstream_async_publish(struct pmemstream_entry entry);

int pmemstream_async_append(struct vdm *vdm, const struct pmemstream_region region, const void *data, size_t size,
			    struct pmemstream_entry *new_entry);

uint64_t pmemstream_committed_timestamp(struct pmemstream *stream);

uint64_t pmemstream_persisted_timestamp(struct pmemstream *stream);

struct pmemstream_async_wait_fut pmemstream_async_wait_committed(struct pmemstream *stream, uint64_t timestamp);

struct pmemstream_async_wait_fut pmemstream_async_wait_persisted(struct pmemstream *stream, uint64_t timestamp);

const void *pmemstream_entry_data(struct pmemstream_entry entry);

size_t pmemstream_entry_size(struct pmemstream_entry entry);

uint64_t pmemstream_entry_timestamp(struct pmemstream_entry entry);

Functions to be removed

	pmemstream_region_runtime_initialize()

Functions to be added

These functions are needed to store offsets as separate metadata, not tied to an actual address:

	size_t pmemstream_region_offset(const struct pmemstream_region);
	size_t pmemstream_entry_offset(const struct pmemstream_entry);

	int pmemstream_region_from_offset(size_t offset, struct pmemstream_region *region); // checking whether the offset really points to a region requires reading metadata from pmem

	int pmemstream_entry_from_offset(size_t offset, struct pmemstream_entry *entry); // checking whether the offset really points to an entry requires reading metadata from pmem

Implementation details

Change the pmemstream_region and pmemstream_entry structures to be purely runtime and to operate on pointers instead of offsets:

/* Runtime structure */
struct pmemstream_region {
	struct pmemstream *stream;

	struct pmemstream_region_runtime *region_runtime;

	/* Pointer to region on pmem */
	void *data;
};

/* Runtime structure */
struct pmemstream_entry {

	struct pmemstream_region region;

	 /* raw pointer to entry on pmem */
	 void *data;

	/* possible optimization: store in runtime structure size and timestamp to minimize entry metadata
	 * reads from pmem */
	size_t size;
	uint64_t timestamp;
};

Change entry_iterator behavior

FEAT: Change entry_iterator behavior

Rationale

Right now, when pmemstream_entry_iterator_next is called and there are no more entries, the user-provided entry will be set to point past the last entry. This is problematic if a user wants to get the last entry from the region. They can't just do:

	struct pmemstream_entry last_entry;
	while (pmemstream_entry_iterator_next(eiter, nullptr, &last_entry) == 0) {
	}

Because after pmemstream_entry_iterator_next returns -1, last_entry will be invalid.

Change this behavior to not modify the offset when there are no more elements.

API Changes

None

Extend pmemstream customizability

FEAT: Extend pmemstream customizability

Rationale

There are multiple internal parameters that impact performance and are currently hardcoded. For example, MAX_CONCURRENCY or BATCH_SIZE for commit/persist processing. We should allow users to customize those values.

API Changes

A config structure plus a change in the pmemstream_open function to accept the config.
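
A sketch of what such a config could look like (struct members and function names below are hypothetical, not an agreed-upon API):

struct pmemstream_config {
	size_t max_concurrency; /* currently the hardcoded PMEMSTREAM_MAX_CONCURRENCY */
	size_t batch_size;      /* batch size for commit/persist processing */
};

int pmemstream_config_new(struct pmemstream_config **config);
int pmemstream_config_set_max_concurrency(struct pmemstream_config *config, size_t value);
int pmemstream_config_set_batch_size(struct pmemstream_config *config, size_t value);

/* The open/create entry point would accept the config in addition to the map. */
int pmemstream_from_map_with_config(struct pmemstream **stream, size_t block_size,
				    struct pmem2_map *map,
				    const struct pmemstream_config *config);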
