ligurio / unreliablefs

A FUSE-based fault injection filesystem.

Home Page: https://ligurio.github.io/unreliablefs/unreliablefs.1.html

License: MIT License

CMake 6.65% Roff 4.36% C 70.18% Python 18.08% Lua 0.73%
fault-injection filesystem fuse-filesystem software-testing software-testing-tools quality-assurance fault-injection-filesystem fuse chaos-engineering chaos-testing

unreliablefs's Introduction

UnreliableFS

UnreliableFS is a FUSE-based fault injection filesystem that allows changing fault injections at runtime using a simple configuration file.

Supported fault injections are:

  • errinj_errno - return an error value and set a random errno.
  • errinj_kill_caller - send SIGKILL to the process that invoked the file operation.
  • errinj_noop - replace the file operation with a no-op (similar to libeatmydata, but applicable to any file operation).
  • errinj_slowdown - slow down the invoked file operation.

Building

Prerequisites:

  • CentOS: dnf install -y gcc cmake fuse fuse-devel
  • Ubuntu: apt-get install -y gcc cmake fuse libfuse-dev
  • FreeBSD: pkg install gcc cmake fusefs-libs pkgconf
  • OpenBSD: pkg_add cmake
  • macOS: brew install --cask osxfuse

$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
$ cmake --build build --parallel

Using

$ mkdir /tmp/fs
$ unreliablefs /tmp/fs -basedir=/tmp -seed=1618680646
$ cat << EOF > /tmp/fs/unreliablefs.conf
[errinj_noop]
op_regexp = .*
path_regexp = .*
probability = 30
EOF
$ ls -la
$ umount /tmp/fs
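
The configuration step above can also be scripted. The following is a minimal sketch, assuming the INI-style layout shown in the example is sufficient; the helper name `render_config` is hypothetical and not part of unreliablefs.

```python
import configparser
import io


def render_config(sections):
    """Render {section: {key: value}} as INI text (hypothetical helper)."""
    # interpolation=None keeps '%' characters in regexps untouched.
    parser = configparser.ConfigParser(interpolation=None)
    for name, options in sections.items():
        # configparser stores every value as a string.
        parser[name] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = render_config({
    "errinj_noop": {"op_regexp": ".*", "path_regexp": ".*", "probability": 30},
})
print(config_text)
```

The resulting text could then be written to unreliablefs.conf inside the mounted directory, as the shell example does with a here-document.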

Documentation

See documentation in unreliablefs.1 and unreliablefs.conf.5.

License

MIT License, Copyright (c) 2020-2023, Sergey Bronnikov

BSD-3-Clause, Copyright (C) 2009-2020, Ben Hoyt

unreliablefs's Issues

Fault injection with bit flip

Add a fault injection that simulates something like a bit flip.
Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot.

Data degradation results from the gradual decay of storage media over the course of years or longer. Causes vary by medium:

  • Solid-state media, such as EPROMs, flash memory and other solid-state drives, store data using electrical charges, which can slowly leak away due to imperfect insulation. The chip itself is not affected by this, so reprogramming it approximately once per decade prevents decay. An undamaged copy of the master data is required for the reprogramming.
  • Magnetic media, such as hard disk drives, floppy disks and magnetic tapes, may experience data decay as bits lose their magnetic orientation. Periodic refreshing by rewriting the data can alleviate this problem. In warm/humid conditions these media, especially those poorly protected against ambient air, are prone to the physical decomposition of the storage medium.

see charybdefs 9

There’s a lot more to it than that, but this provides a basic idea of how the two storage types keep their data. Now let’s look at how they can lose it through bit rot. With hard drives, as mentioned above, saved bits can flip their magnetic polarity. If enough of them flip without being corrected, that can lead to bit rot. Solid-state drives, meanwhile, lose their data when the insulating layer degrades and the charged electrons leak out.

How long it takes to see bit rot in practice depends on a variety of issues. Hard drives have the potential to last with their data intact for decades even if powered down. SSDs, meanwhile, are said to lose their data within a few years in the same state. In fact, there are reports that, if they’re stored in an unusually hot location, the data on an SSD can be wiped out even faster.

Report #1 mentions vendors declaring a "Bit Error Rate of 10^-12 for their memory modules" and that "a observed error rate is 4 orders of magnitude lower than expected". For memory-related tasks at a rate of 8 GBps, this means a single bit flip may occur every minute (vendors' BER of 10^-12) or once in two days (BER of 10^-16).
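
The arithmetic behind these intervals can be checked with a short sketch. It assumes that "8 GBps" means 8 * 10^9 bytes per second and that the exponents are 10^-12 and 10^-16; both readings are assumptions, and the function name is illustrative.

```python
def seconds_between_bit_flips(bit_error_rate, bytes_per_second):
    """Mean time in seconds between single-bit errors at a transfer rate."""
    bits_per_second = bytes_per_second * 8
    return 1.0 / (bit_error_rate * bits_per_second)


RATE = 8e9  # assumption: "8 GBps" read as 8 * 10**9 bytes per second

for ber in (1e-12, 1e-16):
    seconds = seconds_between_bit_flips(ber, RATE)
    print(f"BER {ber:g}: one flip every {seconds:.1f} s ({seconds / 86400:.2f} days)")
```

Under these assumptions the lower rate works out to roughly 1.8 days, in line with "once in two days", while the vendor rate gives an interval of about 16 seconds, the same order of magnitude as the quoted "every minute".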

According to #2, there can be up to 25000-75000 one-bit FIT per Mbit (failures in time per billion hours), which is equal to 1-5 bit errors per hour for 8GB of RAM according to my napkin. Paper says the same: "mean correctable error rates of 2000–6000 per GB per year".
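
The napkin estimate can be reproduced as follows. This is a sketch that assumes 8 GB means 8 * 1024 * 8 Mbit and that one FIT is one failure per 10^9 device-hours; the function name is illustrative.

```python
def errors_per_hour(fit_per_mbit, ram_gib):
    """Expected single-bit errors per hour for a given amount of RAM."""
    mbit = ram_gib * 1024 * 8  # GiB -> MiB -> Mbit
    return fit_per_mbit * mbit / 1e9  # FIT counts failures per 10**9 hours


for fit in (25000, 75000):
    print(f"{fit} FIT/Mbit, 8 GB: {errors_per_hour(fit, 8):.2f} errors per hour")
```

With these inputs the result is roughly 1.6 to 4.9 errors per hour, consistent with the "1-5 bit errors per hour" figure above.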

The #3 report says, double bit flips "were deemed unlikely" but at ORNL's Cray XT5 they were observed "at a rate of one per day for 75,000+ DIMMs" even with ECC. And single-bit errors should be higher.

In a recently initiated effort, Schwarz et al. [28] have started to gather failure data at the Internet Archive, which they plan to use to study disk failure rates and bit rot rates and how they are affected by different environmental parameters. In their preliminary results, they report ARR values of 2-6% and note that the Internet Archive does not seem to see significant infant mortality. Both observations are in agreement with our findings.

https://stackoverflow.com/questions/24181878/how-to-random-flip-binary-bit-of-char-in-c-c
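
The core of such a fault injection is small. The sketch below flips exactly one randomly chosen bit in a buffer; it is an illustration of the idea rather than code from unreliablefs, and the function name `flip_random_bit` is hypothetical.

```python
import random


def flip_random_bit(data, rng=random):
    """Return a copy of data with exactly one randomly chosen bit inverted."""
    if not data:
        return bytes(data)
    corrupted = bytearray(data)
    # Pick a bit position across the whole buffer, then XOR that bit.
    position = rng.randrange(len(corrupted) * 8)
    corrupted[position // 8] ^= 1 << (position % 8)
    return bytes(corrupted)


original = b"unreliablefs"
print(original, "->", flip_random_bit(original))
```

Passing a seeded `random.Random` instance as `rng` would make the corruption reproducible, in the same spirit as the -seed option shown in the usage example.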

Log operations to a text file

Sometimes users wonder whether a fault injection actually happened. Logging can help to answer such questions.

Logging real operations in pass-through mode, without fault injections, makes it possible to analyze how often each type of operation occurs and to set fault injection probabilities that resemble real usage.

TODO: create a script that analyzes the operations log.

See also https://github.com/rflament/loggedfs
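
A script for the TODO above could start from something like the following sketch. The log layout is an assumption, since no log format exists yet: each line is taken to be `<timestamp> <operation> <path>`, and the function name is hypothetical.

```python
from collections import Counter


def operation_frequencies(lines):
    """Return {operation: share of all logged operations, in percent}."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:        # skip blank or malformed lines
            counts[fields[1]] += 1  # assumed layout: <timestamp> <op> <path>
    total = sum(counts.values())
    if not total:
        return {}
    return {op: 100.0 * n / total for op, n in counts.items()}


sample = [
    "2021-04-17T17:30:46 read /tmp/fs/a.xlog",
    "2021-04-17T17:30:47 read /tmp/fs/b.xlog",
    "2021-04-17T17:30:48 write /tmp/fs/a.xlog",
]
print(operation_frequencies(sample))
```

The resulting percentages could serve as a starting point for the probability values in the configuration file.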

Introduce configuration file

Possible formats:

The configuration file should have a number of sections, each of which describes:

  • field with type of fault injection:
    • errinj_remove (see #35)
    • errinj_truncate (see #34)
    • errinj_kill_caller - kill the process that runs the operation (see #28)
    • errinj_incomplete_write - incomplete writes (see #27)
    • errinj_corrupted_write - corrupted writes (see #8)
    • errinj_errno - random or fixed errno, plus a regex that describes which errors should be returned (see #6 #9)
    • errinj_noop - replace the operation with a no-op for those operations where it is possible (see #18)
    • errinj_delayed_op - delay operations (see #29)
    • errinj_clear_cache (see #57)
  • field with probability (0-100%)
  • field with a regexp for the filesystem path where the fault injection should happen

For example:

[errinj_errno]
path_regexp = *.xlog
operation_regexp =
probability = 60

[errinj_noop]
path_regexp = *.*
operation_regexp =
probability = 30
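
How a section like the ones above might be evaluated for a single file operation can be sketched as follows. This is not the unreliablefs implementation; the function name and the rule that an empty pattern matches everything are assumptions. The sketch uses Python regular expressions, which reject the glob-style `*.xlog`, so the pattern is written as `.*\.xlog$` here.

```python
import random
import re


def should_inject(section, operation, path, rng=random):
    """Decide whether a fault fires for one file operation (illustrative)."""
    # Assumption: an empty or missing pattern means "match everything".
    op_re = section.get("operation_regexp") or ".*"
    path_re = section.get("path_regexp") or ".*"
    if not re.search(op_re, operation) or not re.search(path_re, path):
        return False
    # probability is a percentage in the range 0-100.
    return rng.random() * 100 < float(section.get("probability", 0))


errinj_errno = {"path_regexp": r".*\.xlog$", "operation_regexp": "", "probability": 60}
print(should_inject(errinj_errno, "write", "/tmp/fs/00001.xlog"))
```

Drawing one random number per operation keeps the decision independent between calls, so over many operations about 60% of matching calls would be affected.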

test_setxattr() is broken

        os.setxattr(target, attr_name, attr_value)
        assert attr_name.decode("utf-8") in os.listxattr(target)
>       assert os.getxattr(target, attr_name) == attr_value
E       AssertionError: assert b'' == b'unreliablefs'
E         Right contains 12 more items, first extra item: 117
E         Full diff:
E         - b''
E         + b'unreliablefs'

tests/test_unreliablefs.py:497: AssertionError

test_create is broken on FreeBSD

    def test_create(setup_unreliablefs):
        mnt_dir, src_dir = setup_unreliablefs
        name = name_generator()
        fullname = pjoin(mnt_dir, name)
        with pytest.raises(OSError) as exc_info:
            os.stat(fullname)
        assert exc_info.value.errno == errno.ENOENT
        assert name not in os.listdir(mnt_dir)
        fd = os.open(fullname, os.O_CREAT | os.O_RDWR)
        os.close(fd)
>       assert name in os.listdir(mnt_dir)
E       AssertionError: assert 'testfile_6' in []
E        +  where [] = <built-in function listdir>('/tmp/pytest-of-root/pytest-0/test_create0/mnt')
E        +    where <built-in function listdir> = os.listdir
tests/test_unreliablefs.py:130: AssertionError

https://cirrus-ci.com/task/5170998354903040?command=test#L47

Bump minimal CMake version in FindFUSE.cmake

on macOS Catalina:

-- Detecting CXX compile features - done
CMake Deprecation Warning at cmake/FindFUSE.cmake:41 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  CMakeLists.txt:13 (find_package)

https://github.com/ligurio/unreliablefs/runs/1756590354

Update a README

It would be nice to have detailed descriptions of FUSE performance and of how to simulate real errors using unreliablefs.

Performance


Simulation of real errors

Footnotes

  1. https://docs.virtuozzo.com/virtuozzo_hybrid_server_7_installation_guide/preparing-for-installation/planning-storage-gui.html#planning-node-hardware-configurations

  2. https://www.postgresql.org/docs/current/wal-reliability.html

test_symlink() is broken

Traceback (most recent call last):                                                                                                                
  File "/home/sergeyb/sources/unreliablefs/tests/test_unreliablefs.py", line 110, in test_symlink                                                 
    os.symlink(target_path, link_path)                                                                                                            
OSError: [Errno 5] Input/output error: '/tmp/testfile_5' -> '/tmp/pytest-of-sergeyb/pytest-0/test_symlink0/mnt/testfile_4'  

see also: #11, #10

Report statistics about injected errors

Show a report on unmount that may look like this:

errinj_noop triggered 34 times
errinj_errno triggered 12 times

It could also include a detailed log with the date and time at which each error injection was triggered.
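
The counting part of such a report is straightforward. The sketch below is an illustration in Python rather than a proposal for the C implementation; the class name `InjectionStats` is hypothetical.

```python
from collections import Counter


class InjectionStats:
    """Counts triggered fault injections (hypothetical helper)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, injection):
        self._counts[injection] += 1

    def report(self):
        # Most frequently triggered injection first.
        return "\n".join(
            f"{name} triggered {count} times"
            for name, count in self._counts.most_common()
        )


stats = InjectionStats()
for _ in range(34):
    stats.record("errinj_noop")
for _ in range(12):
    stats.record("errinj_errno")
print(stats.report())
```

Calling `report()` from the unmount handler would produce the two lines shown in the example above.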

test_passthrough is broken on FreeBSD 12

setup_unreliablefs = ('/tmp/pytest-of-root/pytest-0/test_passthrough0/mnt', '/tmp/pytest-of-root/pytest-0/test_passthrough0/src')

    def test_passthrough(setup_unreliablefs):
        mnt_dir, src_dir = setup_unreliablefs
        name = name_generator()
        src_name = pjoin(src_dir, name)
        mnt_name = pjoin(src_dir, name)
        assert name not in os.listdir(src_dir)
        assert name not in os.listdir(mnt_dir)
        with open(src_name, 'w') as fh:
            fh.write('Hello, world')
        assert name in os.listdir(src_dir)
>       assert name in os.listdir(mnt_dir)
E       AssertionError: assert 'testfile_20' in []
E        +  where [] = <built-in function listdir>('/tmp/pytest-of-root/pytest-0/test_passthrough0/mnt')
E        +    where <built-in function listdir> = os.listdir

tests/test_unreliablefs.py:344: AssertionError

https://cirrus-ci.com/task/5170998354903040?command=test#L47

Allow specifying a set of errnos to select from

Allow setting a set of errnos rather than just a particular errno or a
random one from the entire set. Update the cookbook for random faults to
exclude any errnos passed via extra arguments.

My objective here is to be able to exclude a specific errno from
random injection. The Go runtime gets confused by EAGAINs, which cause
it to epoll_wait on the file descriptor. I'd like to exclude EAGAIN from
the set of injected errors for my use case.

charybdefs 24
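
The requested behaviour amounts to drawing from the full errno set minus an exclusion list. A minimal sketch of that selection, with a hypothetical function name and no relation to the actual option syntax:

```python
import errno
import random


def pick_errno(exclude=(), rng=random):
    """Pick a random errno known to this platform, skipping excluded ones."""
    candidates = sorted(set(errno.errorcode) - set(exclude))
    if not candidates:
        raise ValueError("every errno was excluded")
    return rng.choice(candidates)


# Never inject EAGAIN, as in the Go runtime use case described above.
code = pick_errno(exclude={errno.EAGAIN})
print(code, errno.errorcode[code])
```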

test_append is broken on FreeBSD


setup_unreliablefs = ('/tmp/pytest-of-root/pytest-0/test_append0/mnt', '/tmp/pytest-of-root/pytest-0/test_append0/src')
    def test_append(setup_unreliablefs):
        mnt_dir, src_dir = setup_unreliablefs
        name = name_generator()
        os_create(pjoin(src_dir, name))
        fullname = pjoin(mnt_dir, name)
        with os_open(fullname, os.O_WRONLY) as fd:
            os.write(fd, b'foo\n')
        with os_open(fullname, os.O_WRONLY|os.O_APPEND) as fd:
>           os.write(fd, b'bar\n')
E           OSError: [Errno 9] Bad file descriptor
tests/test_unreliablefs.py:188: OSError

https://cirrus-ci.com/task/5170998354903040?command=test#L66

test_seek is broken on FreeBSD

setup_unreliablefs = ('/tmp/pytest-of-root/pytest-0/test_seek0/mnt', '/tmp/pytest-of-root/pytest-0/test_seek0/src')

    def test_seek(setup_unreliablefs):
        mnt_dir, src_dir = setup_unreliablefs
        name = name_generator()
        os_create(pjoin(src_dir, name))
        fullname = pjoin(mnt_dir, name)
        with os_open(fullname, os.O_WRONLY) as fd:
            os.lseek(fd, 1, os.SEEK_SET)
>           os.write(fd, b'foobar\n')
E           OSError: [Errno 9] Bad file descriptor

tests/test_unreliablefs.py:200: OSError

https://cirrus-ci.com/task/5170998354903040?command=test#L47

test_open_unlink() is broken

Traceback (most recent call last):                                                                                                                
  File "/home/sergeyb/sources/unreliablefs/tests/test_unreliablefs.py", line 223, in test_open_unlink                                             
    assert fh.read() == data1+data2                                                                                                               
OSError: [Errno 9] Bad file descriptor 

Fault injection with fake storage capacity

There are many fraudulent USB sticks in circulation that report to have a high capacity (ex: 8GB) but are really only capable of storing a much smaller amount (ex: 1GB). Attempts to write on these devices will often result in unrelated files being overwritten. Any use of a fraudulent flash memory device can easily lead to database corruption, therefore. Internet searches such as "fake capacity usb" will turn up lots of disturbing information about this problem.

https://www.sqlite.org/howtocorrupt.html
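
One way to picture this failure mode is a device whose addresses wrap around once the real capacity is exceeded. The sketch below models that in memory; the wrap-around behaviour is an assumption for illustration (real fraudulent devices may behave differently), and the class name is hypothetical.

```python
import errno


class FakeCapacityDevice:
    """In-memory model of a device that advertises more space than it has."""

    def __init__(self, reported_size, real_size):
        self.reported_size = reported_size
        self._storage = bytearray(real_size)

    def write(self, offset, data):
        if offset + len(data) > self.reported_size:
            raise OSError(errno.ENOSPC, "No space left on device")
        for i, byte in enumerate(data):
            # Addresses beyond the real capacity wrap around and clobber
            # whatever was stored there earlier.
            self._storage[(offset + i) % len(self._storage)] = byte

    def read(self, offset, length):
        size = len(self._storage)
        return bytes(self._storage[(offset + i) % size] for i in range(length))


device = FakeCapacityDevice(reported_size=64, real_size=16)
device.write(0, b"first")
device.write(16, b"other")   # accepted, but lands on top of "first"
print(device.read(0, 5))
```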

Use realistic errno's

Right now unreliablefs uses any available errno, which is not realistic.
For example, with errors like "No space left on device" it should still be
possible to list the directory and change into it.
Perhaps every function should have a separate set of available errno's.

charybdefs 20
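
A per-operation table is one way to express this. The subsets below are illustrative picks from the errors POSIX documents for each call, not a reviewed list, and the names `REALISTIC_ERRNOS` and `pick_realistic_errno` are hypothetical.

```python
import errno
import random

# Illustrative subsets only; a real table would need review per platform.
REALISTIC_ERRNOS = {
    "open": (errno.EACCES, errno.ENOENT, errno.EMFILE, errno.ENOSPC, errno.EIO),
    "read": (errno.EIO, errno.EINTR, errno.EBADF),
    "write": (errno.EIO, errno.ENOSPC, errno.EFBIG, errno.EINTR),
    # No ENOSPC here: listing a directory still works on a full disk.
    "readdir": (errno.EBADF, errno.ENOENT),
}


def pick_realistic_errno(operation, rng=random):
    """Pick an errno that the given operation could plausibly return."""
    return rng.choice(REALISTIC_ERRNOS[operation])


code = pick_realistic_errno("write")
print(errno.errorcode[code])
```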

test_chown() is broken

Traceback (most recent call last):                                                                                                                
  File "/home/sergeyb/sources/unreliablefs/tests/test_unreliablefs.py", line 146, in test_chown                                                   
    os.chown(filename, uid_new, -1)                                                                                                               
PermissionError: [Errno 1] Operation not permitted: '/tmp/pytest-of-sergeyb/pytest-7/test_chown0/mnt/testfile_1'  

cannot use touch(1)

$ strace touch tmp/ddd
...
close(3)                                = 0                                                                                                       
utimensat(0, NULL, NULL, 0)             = -1 ENOSYS (Function not implemented)                                                                    
utimensat(AT_FDCWD, "tmp/ddd", NULL, 0) = -1 ENOSYS (Function not implemented)                                                                    
close(0)  
...
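
The same failure can be detected from Python without strace. This is a sketch of a possible check, assuming the path already exists; the function name is hypothetical and it is not part of the test suite.

```python
import errno
import os


def utime_supported(path):
    """Return False if updating timestamps on path fails with ENOSYS."""
    try:
        os.utime(path)  # sets atime and mtime to now, like touch(1) on an existing file
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            return False
        raise
    return True
```

Run against a file inside the mounted directory, a False result would correspond to the ENOSYS shown in the trace above.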

test_truncate_fd() is broken

Traceback (most recent call last):                                                                                                                
  File "/home/sergeyb/sources/unreliablefs/tests/test_unreliablefs.py", line 325, in test_truncate_fd                                             
    assert fh.read(size) == TEST_DATA                                                                                                             
  File "/usr/lib/python3.8/tempfile.py", line 613, in func_wrapper                                                                                
    return func(*args, **kwargs)                                                                                                                  
OSError: [Errno 9] Bad file descriptor        
