distributed-system-analysis / smallfile
distributed metadata-intensive workload generator for POSIX-like filesystems
License: Apache License 2.0
How would you restrict the maximum number of directories that can exist at any given time?
The logic seems to revolve around a maximum number of files, and when that maximum is 10k-20k the number of directories is far too large. The script spends a lot of time creating those directories first, so I would like to restrict the maximum number of dirs.
Interested in incorporating these options.
dirs_per_dir
maximum_dirs
Any suggestions?
This is new behavior (version : 3.1) that doesn't seem correct. I've never had to have the target directory mounted on the driver used to run smallfile tests on remote clients.
# ./smallfile_cli.py --host-set "gprfc001" --response-times N --same-dir N --stonewall N --top /mnt/cephfs/smf/ --network-sync-dir /smfnfs/sync/ --threads 1 --files 256 --files-per-dir 1000 --file-size 4 --file-size-distribution exponential --operation create
ERROR: you must ensure that shared directory /mnt/cephfs/smf is accessible from this host and every remote host in test
on baremetal RHEL 7.5 (3.10.0-862)
Can this test tool be used to test non-distributed filesystems, like ext4 or btrfs?
I noticed today when running huge tests that the pickle files output by each thread are enormous, and they cause smallfile_cli.py to blow up in size to many GB. These pickle files are used to return per-thread result data to the master process that initiated the test. The only explanation I have for this is that response times are saved in a python list and are thus put into the pickle file. But there is no need for this - if the user requests response time data, it should be saved in separate .csv files in network_shared/ and so there's no need to return it in the pickle file. If this is correct, then it should be a simple fix.
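The proposed fix — writing response times to per-thread .csv files in network_shared/ and dropping them from the pickled result — could be sketched like this (the function names and result layout here are hypothetical, not smallfile's actual API):

```python
import csv
import pickle


def save_rsp_times_csv(thread_id, rsp_times, shared_dir="network_shared"):
    # write response times to a per-thread CSV so they need not be pickled
    path = "%s/rsptimes_%s.csv" % (shared_dir, thread_id)
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for (op, t) in rsp_times:
            w.writerow([op, "%9.6f" % t])
    return path


def pickle_thread_result(result, rsp_times_attr="rsp_times"):
    # drop the (potentially huge) response-time list before pickling,
    # so the pickle file stays small no matter how many files were run
    slim = {k: v for k, v in result.items() if k != rsp_times_attr}
    return pickle.dumps(slim)
```

With this split, the master process still gets counters and status from the pickle, while response-time histograms are post-processed from the CSVs.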
I tried the latest master branch on a Windows client. It returned the error below:
adding time for Windows synchronization
Traceback (most recent call last):
File "C:\Users\administrator.GCHILD\Desktop\smallfile-master\smallfile-master\
smallfile_cli.py", line 288, in <module>
run_workload()
File "C:\Users\administrator.GCHILD\Desktop\smallfile-master\smallfile-master\
smallfile_cli.py", line 279, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "C:\Users\administrator.GCHILD\Desktop\smallfile-master\smallfile-master\
multi_thread_workload.py", line 121, in run_multi_thread_workload
sync_files.write_sync_file(sg, 'hi there')
File "C:\Users\administrator.GCHILD\Desktop\smallfile-master\smallfile-master\
sync_files.py", line 16, in write_sync_file
os.rename(fpath+notyet, fpath)
WindowsError: [Error 32] The process cannot access the file because it is being
used by another process
But the branch "smallfile-dir-share-wi-clnts" works perfectly on a single client.
Run launch_host_smf on the secondary node and start a create or mkdir operation on the main node; the test aborts on the main node.
Logs placed at https://drive.google.com/drive/folders/0B8kUR8TCDdh3Mk1rclVvNHBfaGc?usp=sharing
The Python argparse module automates a lot of what is in parse.py and elsewhere. Nick Dokos turned me on to it. Should we convert to it? How much code would it save? How much would it change existing syntax?
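A rough sketch of what a conversion might look like, keeping smallfile's Y/N boolean syntax (option names are taken from the CLI shown elsewhere on this page; the defaults and type handling are assumptions):

```python
import argparse


def boolean(s):
    # preserve smallfile-style Y/N booleans so existing syntax keeps working
    if s.lower() in ("y", "yes", "true", "1"):
        return True
    if s.lower() in ("n", "no", "false", "0"):
        return False
    raise argparse.ArgumentTypeError("expected Y or N, got %r" % s)


def parse(argv=None):
    p = argparse.ArgumentParser(description="small-file workload generator")
    p.add_argument("--operation", default="create")
    p.add_argument("--threads", type=int, default=2)
    p.add_argument("--files", type=int, default=200)
    p.add_argument("--file-size", type=int, default=64)
    p.add_argument("--files-per-dir", type=int, default=100)
    p.add_argument("--stonewall", type=boolean, default=True)
    p.add_argument("--top", default="/var/tmp/smf")
    return p.parse_args(argv)
```

argparse would also generate the usage message and type errors for free, which would have avoided the `usage` NameError reported in another issue below.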
I was trying to re-check whether python itself is becoming a bottleneck for smallfile, and specifically I wanted to see if pypy3 was better than python3 and could keep up with the "find" utility for reading directory trees. Initially pypy3 would not run because of a new dependency on PyYAML, but when I commented out the YAML parsing option in smallfile (just 2 lines), pypy3 worked fine.
So I created a tree containing 1 million 0-byte (empty) files using a single smallfile thread, and then tried doing the smallfile readdir operation both with pypy and with python3, and then compared it to the same result with the "find" utility. No cache-dropping was done, so that all the metadata could be memory-resident. While this may seem an unfair comparison, NVDIMM-N memories can provide low response times similar to this, and in addition cached-storage performance is something we need to measure. So the 3 commands were:
python3 ./smallfile_cli.py --threads 1 --file-size 0 --files 1048576 --operation readdir
pypy3 ./smallfile_cli.py --threads 1 --file-size 0 --files 1048576 --operation readdir
find /var/tmp/smf/file_srcdir/bene-laptop/thrd_00 -type f | wc -l
The results were:
test thousands of files/sec
---- ---------------------------
python3 160
pypy3 352
find 1000
Are all 3 benchmarks doing the same system calls? When I used strace to compare, smallfile was originally at a disadvantage because it was doing system calls to see if other threads had finished. Specifically it was looking for stonewall.tmp in the shared network directory every 100 files. This is not a big deal when doing actual file reads/writes, but for readdir this is a significant increase in the number of system calls. Here's what smallfile was doing per directory:
5693 openat(AT_FDCWD, "/var/tmp/smf/file_srcdir/bene-laptop/thrd_00/d_001/d_002/d_008", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 8
5693 fstat(8, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
5693 getdents(8, /* 112 entries */, 32768) = 5168
5693 getdents(8, /* 0 entries */, 32768) = 0
5693 close(8) = 0
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
5693 stat("/var/tmp/smf/network_shared/stonewall.tmp", 0x7ffc19b899e0) = -1 ENOENT (No such file or directory)
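The pattern above — a stat() of the stonewall file every so many completed files — can be sketched with a simple counter like this (class and parameter names are illustrative, not smallfile's actual code):

```python
import os


class StonewallChecker:
    # check for the stonewall file only once every check_interval files,
    # so most files cost zero extra system calls
    def __init__(self, path, check_interval=100):
        self.path = path
        self.check_interval = check_interval
        self.files_done = 0

    def should_stop(self):
        self.files_done += 1
        if self.files_done % self.check_interval != 0:
            return False  # no stat() on this iteration
        return os.path.exists(self.path)  # one stat() per interval
```

Raising the interval (or checking wall-clock time instead of file count) trades stonewall precision for fewer system calls, which matters most for metadata-only operations like readdir.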
By adding the parameter "--stonewall Y" we get rid of the excess stat() system calls and the sequence per directory then becomes optimal. Rerunning the tests with this parameter, we get:
test thousands of files/sec
---- ---------------------------
python3 172 (was 160)
pypy3 380 (was 352)
find 1000
Here's the system call pattern for the "find" command:
fcntl(9, F_DUPFD_CLOEXEC, 0) = 4
newfstatat(9, "d_009", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(9, "d_009", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 6
fstat(6, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl(6, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW|O_DIRECTORY)
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
newfstatat(9, "d_009", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
fcntl(6, F_DUPFD_CLOEXEC, 3) = 10
getdents(6, /* 102 entries */, 32768) = 4848
getdents(6, /* 0 entries */, 32768) = 0
close(6) = 0
So the find utility is actually making more system calls.
pypy3 and python3 are both using 100% CPU according to "top", which means they are using up a whole core and can't go any faster. Most of their time is spent in user space not system space. Yet even pypy3 is 1/3 the speed of "find".
So what conclusion do we draw from this? Perhaps smallfile and other utilities will need to be rewritten in a compiled language for greater speed, in order to keep up with modern storage hardware.
should not do this if any exception happens in a worker thread
The README has:
Copyright [2012] [Ben England]
Licensed under the Apache License, Version 2.0 (the "License"); you may not use files except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
It'd be great to add this as a LICENSE file in the repo, so it's parsed by GitHub.
The smallfile tool is being used, version-pinned, as a plugin for Arcaflow. We are currently using the GitHub tag v1.1 to pull the latest release, which we use for the histogram post-processing features. As GitHub tags can be unreliable, we would like to point to a release build instead, per arcalot/arcaflow-plugins-incubator#22.
Also see related issue #34
I am running a series of tests; the create operation, which is performed first, never reaches 100% of files processed and almost always ends at 87.5%. This happens no matter what number of files/threads/fsync/pause options I use, and of course causes subsequent operations to reach 62%, 80%, 25%, etc.
It seems to me there is a timeout somewhere that stops the operation no matter how many files I want to process.
I am running the test on 9 clients with 24 cores each, for instance:
files/thread : 200
threads : 2
record size (KB, 0 = maximum) : 0
file size (KB) : 64
file size distribution : fixed
files per dir : 100
dirs per dir : 10
threads share directories? : N
filename prefix : fsync_
filename suffix :
hash file number into dir.? : N
fsync after modify? : Y
pause between files (microsec) : 300
finish all requests? : Y
stonewall? : Y
measure response times? : N
verify read? : Y
verbose? : False
log to stderr? : False
ext.attr.size : 0
ext.attr.count : 0
permute host directories? : N
example output:
total threads = 16
total files = 2800
total IOPS = 2800
total data = 0.171 GiB
87.50% of requested files processed, minimum is 90.00
elapsed time = 3.961
files/sec = 706.826175
IOPS = 706.826175
MiB/sec = 44.176636
not enough total files processed, change test parameters
What is the meaning of this error?
ERROR: not enough total files processed, change test parameters
The error message "record size cannot exceed file size" is correct and informative but it looks like the usage statement has a slight problem (undefined).
Traceback (most recent call last):
File "./smallfile_cli.py", line 284, in
run_workload()
File "./smallfile_cli.py", line 265, in run_workload
params = parse.parse()
File "/root/smallfile/parse.py", line 192, in parse
usage('record size cannot exceed file size')
NameError: global name 'usage' is not defined
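A minimal fix would be to define the missing helper in parse.py (or call whatever error routine the module actually uses — this sketch is an assumption about what `usage` was meant to do):

```python
import sys


def usage(msg):
    # print the error plus a short usage summary, then exit non-zero
    print("ERROR: %s" % msg, file=sys.stderr)
    print("usage: smallfile_cli.py --operation op --files N [options]",
          file=sys.stderr)
    sys.exit(1)
```

With this defined, "record size cannot exceed file size" would print cleanly instead of raising a NameError.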
If I run "python smallfile.py" the unit test passes, but if I run "pypy smallfile.py" it fails with "NameError: name 'xattr' is not defined", but python*pyxattr packages are installed.
This feature makes it easier to integrate smallfile with various test harnesses and CIs, such as Ceph Teuthology and Jenkins pipelines. The proposed branch name is "yaml". This will be merged soon if no objections are heard.
Smallfile should define its own exception classes instead of throwing generic Exception. This allows more specific exception handling and easier debugging if there is a problem.
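One possible shape for such a hierarchy, using error messages that appear elsewhere on this page (the class names are suggestions, not existing smallfile code):

```python
class SmallfileError(Exception):
    """Base class for all smallfile-specific errors."""


class StartingGateTimeout(SmallfileError):
    """Raised when threads or hosts fail to reach the starting gate in time."""

    def __init__(self, kind, timeout):
        super().__init__(
            "%s did not reach starting gate within %d sec" % (kind, timeout))
        self.timeout = timeout


class NotEnoughFilesProcessed(SmallfileError):
    """Raised when fewer than the minimum fraction of files completed."""
```

Callers could then catch SmallfileError (or a specific subclass) instead of a bare Exception, and carry structured data like the timeout value for debugging.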
[root@localhost smallfile-master]# python smallfile_cli.py --top /nfs/test/ --operation create --host-set 172.16.149.42 --network-sync-dir /tmp/smallfile-master/test/
smallfile version 3.0
hosts in test : ['172.16.149.42']
top test directory(s) : ['/nfs/test']
operation : create
files/thread : 200
threads : 2
record size (KB, 0 = maximum) : 0
file size (KB) : 64
file size distribution : fixed
files per dir : 100
dirs per dir : 10
threads share directories? : N
filename prefix :
filename suffix :
hash file number into dir.? : N
fsync after modify? : N
pause between files (microsec) : 0
finish all requests? : Y
stonewall? : Y
measure response times? : N
verify read? : Y
verbose? : False
log to stderr? : False
ext.attr.size : 0
ext.attr.count : 0
permute host directories? : N
remote program directory : /tmp/smallfile-master
network thread sync. dir. : /tmp/smallfile-master/test/
Exception seen in thread 00 host 172.16.149.42 (tail /var/tmp/invoke_logs-00.log)
Exception seen in thread 01 host 172.16.149.42 (tail /var/tmp/invoke_logs-01.log)
Traceback (most recent call last):
File "/tmp/smallfile-master/smallfile_remote.py", line 48, in
run_workload()
File "/tmp/smallfile-master/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/tmp/smallfile-master/multi_thread_workload.py", line 107, in run_multi_thread_workload
% startup_timeout)
Exception: threads did not reach starting gate within 11 sec
thread ssh-thread:172.16.149.42:256:ssh -x -o StrictHostKeyChecking=no 172.16.149.42 "python /tmp/smallfile-master/smallfile_remote.py --network-sync-dir /tmp/smallfile-master/test/ --as-host 172.16.149.42" has died
Traceback (most recent call last):
File "smallfile_cli.py", line 280, in
run_workload()
File "smallfile_cli.py", line 270, in run_workload
return run_multi_host_workload(params)
File "smallfile_cli.py", line 182, in run_multi_host_workload
'within %d seconds' % host_timeout)
Exception: hosts did not reach starting gate within 17 seconds
For large tests we need threads to start slowly, so that all threads can get into the mix and one thread doesn't pull way ahead of the others. This is particularly important for tests with large numbers of threads (i.e. lots of client hosts). fio does this with its "ramp_time" and "startdelay" parameters. I would like something more automatic for smallfile, but we could start out this way.
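A simple per-thread stagger, similar in spirit to fio's startdelay, could be sketched as follows (the function and parameter names are hypothetical, not an existing smallfile feature):

```python
import time


def ramped_start_delay(thread_index, total_threads, ramp_time=5.0):
    # spread thread start times evenly across ramp_time seconds so no
    # single thread pulls far ahead before the others are running
    if total_threads <= 1:
        return 0.0
    return ramp_time * thread_index / (total_threads - 1)


def start_thread(thread_index, total_threads, workload, ramp_time=5.0):
    # sleep for this thread's share of the ramp, then run the workload
    time.sleep(ramped_start_delay(thread_index, total_threads, ramp_time))
    workload()
```

A more automatic variant might scale ramp_time with the thread count, but an explicit parameter would be a reasonable first step.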
self.remote_cmd = '%s %s "%s"' % \
    (self.ssh_prefix, self.remote_host, remote_cmd_in)
The command is not prefixed with 'python', so the ssh <remote_host> command does not get executed on the remote system. Changing it to:
self.remote_cmd = '%s %s "python %s"' % \
    (self.ssh_prefix, self.remote_host, remote_cmd_in)
fixed the issue for me.
While trying to do some I/O on a mount point, I got this:
sudo python smallfile_cli.py --operation create --threads 8 --file-size 1024 --files 2048 --top /mnt/mpoint/
version : 3.1
hosts in test : None
top test directory(s) : ['/mnt/cephfs']
operation : create
files/thread : 2048
threads : 8
record size (KB, 0 = maximum) : 0
file size (KB) : 1024
file size distribution : fixed
files per dir : 100
dirs per dir : 10
threads share directories? : N
filename prefix :
filename suffix :
hash file number into dir.? : N
fsync after modify? : N
pause between files (microsec) : 0
finish all requests? : Y
stonewall? : Y
measure response times? : N
verify read? : Y
verbose? : False
log to stderr? : False
ext.attr.size : 0
ext.attr.count : 0
Traceback (most recent call last):
File "smallfile_cli.py", line 279, in
run_workload()
File "smallfile_cli.py", line 264, in run_workload
params = parse.parse()
File "/home/cephuser/smallfile/parse.py", line 280, in parse
test_params.recalculate_timeouts()
AttributeError: smf_test_params instance has no attribute 'recalculate_timeouts'
Hi,
I used smallfile a couple of years ago to get metric on a glusterfs setup using up to 4 hosts-set, and it worked great!
Now I'm kind of replicating my metrics with a cephfs, but I'm not able to do any test with any other host than the localhost?
I mean, lets use this simple command:
./smallfile_cli.py --operation create --top $MY_MOUNTPOINT --host-set $MY_HOSTNAME
Changing the values in this command, I get this:
If $MY_HOSTNAME is only the localhost, it works.
If $MY_HOSTNAME contains any other host (e.g. c7), it fails with:
Traceback (most recent call last):
File "/imatge/imagen/smf-benchmark/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/imatge/imagen/smf-benchmark/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/imatge/imagen/smf-benchmark/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 7 seconds
ERROR: ssh thread for host c7 completed with status 256
pickle file /mnt/cephfs/smf/network_shared/c7_result.pickle not found
no pickled invokes read, so no results
Changing $MY_MOUNTPOINT returns the same results.
The ssh works fine without a password on any host.
We're using Debian-Jessie ssh.
Do you have any idea where the problem could be?
Hi,
I don't understand why I get the error below.
[root@host1 smallfile]# python smallfile_cli.py --top /mnt/gluster/gv0_ganesha_nfs/smallfile --host-set host1,host2,host3,host4,host6,host7 --threads 8 --file-size 4 --files 10 --response-times Y --operation create
version : 3.1
hosts in test : ['host1', 'host2', 'host3', 'host4', 'host6', 'host7']
top test directory(s) : ['/mnt/gluster/gv0_ganesha_nfs/smallfile']
operation : create
files/thread : 10
threads : 8
record size (KB, 0 = maximum) : 0
file size (KB) : 4
file size distribution : fixed
files per dir : 100
dirs per dir : 10
threads share directories? : N
filename prefix :
filename suffix :
hash file number into dir.? : N
fsync after modify? : N
pause between files (microsec) : 0
finish all requests? : Y
stonewall? : Y
measure response times? : Y
verify read? : Y
verbose? : False
log to stderr? : False
ext.attr.size : 0
ext.attr.count : 0
permute host directories? : N
remote program directory : /root/smallfile
network thread sync. dir. : /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared
starting all threads by creating starting gate file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/starting_gate.tmp
Traceback (most recent call last):
File "/root/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/root/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/root/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 11 seconds
Traceback (most recent call last):
File "/root/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/root/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/root/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 11 seconds
ERROR: ssh thread for host host2 completed with status 256
Traceback (most recent call last):
File "/root/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/root/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/root/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 11 seconds
ERROR: ssh thread for host host3 completed with status 256
Traceback (most recent call last):
File "/root/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/root/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/root/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 11 seconds
ERROR: ssh thread for host host4 completed with status 256
Traceback (most recent call last):
File "/root/smallfile/smallfile_remote.py", line 48, in <module>
run_workload()
File "/root/smallfile/smallfile_remote.py", line 39, in run_workload
return multi_thread_workload.run_multi_thread_workload(params)
File "/root/smallfile/multi_thread_workload.py", line 143, in run_multi_thread_workload
% prm.host_startup_timeout)
Exception: starting signal not seen within 11 seconds
ERROR: ssh thread for host host6 completed with status 256
ERROR: ssh thread for host host7 completed with status 256
pickle file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/host2_result.pickle not found
pickle file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/host3_result.pickle not found
pickle file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/host4_result.pickle not found
pickle file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/host6_result.pickle not found
pickle file /mnt/gluster/gv0_ganesha_nfs/smallfile/network_shared/host7_result.pickle not found
host = host1,thr = 00,elapsed = 1.292002,files = 10,records = 10,status = ok
host = host1,thr = 01,elapsed = 1.314542,files = 10,records = 10,status = ok
host = host1,thr = 02,elapsed = 1.298001,files = 10,records = 10,status = ok
host = host1,thr = 03,elapsed = 1.308788,files = 10,records = 10,status = ok
host = host1,thr = 04,elapsed = 1.302692,files = 10,records = 10,status = ok
host = host1,thr = 05,elapsed = 1.300747,files = 10,records = 10,status = ok
host = host1,thr = 06,elapsed = 1.293828,files = 10,records = 10,status = ok
host = host1,thr = 07,elapsed = 1.299923,files = 10,records = 10,status = ok
total threads = 8
total files = 80
total IOPS = 80
total data = 0.000 GiB
WARNING: failed to get some responses from workload generators
100.00% of requested files processed, minimum is 90.00
elapsed time = 1.315
files/sec = 60.857695
IOPS = 60.857695
MiB/sec = 0.237725
And FYI, I have edited the ssh_thread.py script as follows for passwordless authentication:
ssh_prefix = 'ssh -i /var/lib/glusterd/nfs/secret.pem -x -o StrictHostKeyChecking=no '