
holon's Introduction

A holon (Greek: ὅλον, holon neuter form of ὅλος, holos "whole") is something
that is simultaneously a whole and a part. The word was used by Arthur
Koestler in his book The Ghost in the Machine (1967, p. 48). The phrase to
hólon is Greek and precedes the Latin analogue universum, in the sense of
totality, a whole.[1]

holon's People

Contributors

akshatamahamuni, manishasalve, akshata0305, ashwini-8, vishkapare10, paroscale-shirisha, 00pauln00, ravan0407, nikhilreddy-2

Watchers

Kit Westneat

holon's Issues

Basic leader election recipe failure.

  • Start peers for basic leader election.
  • commit-idx and last-applied should be 0, but they are showing 1:

    - name: "Verify term value is same on all started peers"
      vars:
        stage: "stage0"
        raft_keys:
          - "/raft_root_entry/0/commit-idx"
          - "/raft_root_entry/0/last-applied"
          - "/raft_root_entry/0/last-applied-cumulative-crc"
          - "/raft_root_entry/0/newest-entry-crc"
          - "/raft_root_entry/0/term"
          - "/raft_root_entry/0/newest-entry-term"
      set_fact:
        raft_values="{{ lookup('niova_ctlrequest', 'lookup', nrunning_peers[item], raft_keys, wantlist=True) }}"
      failed_when: >
        (raft_values["/0/commit-idx"] != 0) or
        (raft_values["/0/last-applied"] != 0) or
        (raft_values["/0/last-applied-cumulative-crc"] != raft_values["/0/newest-entry-crc"]) or
        (raft_values["/0/term"] != raft_values["/0/newest-entry-term"])
      loop: "{{ range(0, nrunning_peers|length) | list }}"

Output:
TASK [Verify term value is same on all started peers] ********************************************************************************************************
failed: [localhost] (item=0) => {"ansible_facts": {"raft_values": {"/0/commit-idx": 1, "/0/last-applied": 1, "/0/last-applied-cumulative-crc": 2631123352, "/0/newest-entry-crc": 868987952, "/0/newest-entry-term": 524, "/0/term": 524}}, "ansible_loop_var": "item", "changed": false, "failed_when_result": true, "item": 0}
failed: [localhost] (item=1) => {"ansible_facts": {"raft_values": {"/0/commit-idx": 1, "/0/last-applied": 1, "/0/last-applied-cumulative-crc": 2631123352, "/0/newest-entry-crc": 868987952, "/0/newest-entry-term": 524, "/0/term": 524}}, "ansible_loop_var": "item", "changed": false, "failed_when_result": true, "item": 1}
failed: [localhost] (item=2) => {"ansible_facts": {"raft_values": {"/0/commit-idx": 1, "/0/last-applied": 1, "/0/last-applied-cumulative-crc": 2631123352, "/0/newest-entry-crc": 868987952, "/0/newest-entry-term": 524, "/0/term": 524}}, "ansible_loop_var": "item", "changed": false, "failed_when_result": true, "item": 2}

Auto package installs fail due to permissions error

Patch 55e546e seems to have introduced the following:

TASK [Install python modules if not installed previously] *****************************************************************************************************************
sockets
 is NOT installed
Installing : sockets

Collecting sockets
  Downloading https://files.pythonhosted.org/packages/cd/84/bd124c5d3c012de593c45c7f0208615c73493859bbd5389e1403e311d387/sockets-1.0.0-py3-none-any.whl
Installing collected packages: sockets
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.8'
Consider using the `--user` option or check the permissions.

pathlib
 is NOT installed
Installing : pathlib

Collecting pathlib
  Using cached https://files.pythonhosted.org/packages/ac/aa/9b065a76b9af472437a0059f77e8f962fe350438b927cb80184c32f075eb/pathlib-1.0.1.tar.gz
Installing collected packages: pathlib
    Running setup.py install for pathlib ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-6jgd4jsq/pathlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-6jgd4jsq/pathlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ic7gqvwr/install-record.txt --single-version-externally-managed --compile
         cwd: /tmp/pip-install-6jgd4jsq/pathlib/
    Complete output (9 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib
    copying pathlib.py -> build/lib
    running install_lib
    creating /usr/local/lib/python3.8
    error: could not create '/usr/local/lib/python3.8': Permission denied
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-6jgd4jsq/pathlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-6jgd4jsq/pathlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ic7gqvwr/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

Holon issues a deprecation warning

<module 'recipes.term_catch_up' from '/home/pauln/Code/holon/recipes/term_catch_up.py'>
./holon_framework.py:171: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn("%s" % r().name)

Add version number to ctl-interface output files

Here is an example output file in holon:

/tmp/holon_recipes_run/a246e55c-c6a9-11ea-bdad-90324b2d1e89/ctl-interface/a24a95c6-c6a9-11ea-81ab-90324b2d1e89/output/current_time.a24cb50e-c6a9-11ea-b900-90324b2d1e89

At this time, the implementation of the ctlrequest allows the programmer to reuse the object, which results in a new version of the file being written in place of the previous version. I believe it will become increasingly important to preserve all ctl-interface outputs. I'm proposing that the ctlrequest object contain a counter which increases each time an output file is created on its behalf, and that this counter be placed at the end of the filename. This way, at the end of a test, we'll have all of the output files, uniquely named and ordered.
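A minimal sketch of the proposed counter. The class name VersionedOutputPath and the zero-padded suffix format are assumptions made for illustration; neither exists in holon, and the real ctlrequest object would presumably own the counter itself.

```python
import itertools


class VersionedOutputPath:
    """Hypothetical helper: hands out a new output path on every use."""

    def __init__(self, base_path):
        self.base_path = base_path
        # Counter starts at 0 and increases once per generated path.
        self._counter = itertools.count()

    def next_path(self):
        # Zero-padding keeps lexical order equal to creation order.
        return "%s.%04d" % (self.base_path, next(self._counter))


# Example path is illustrative only.
paths = VersionedOutputPath("/tmp/output/current_time.a24cb50e")
first = paths.next_path()
second = paths.next_path()
```

With a suffix like this, a plain sorted directory listing shows the outputs of one request object in the order they were produced.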

Need a Holon option where post-run and process termination are skipped

Since Holon recipes are usually built from existing recipes, it would be useful for Holon to 'boot strap' the construction of new recipes. This could simply be done by requesting Holon to run the recipe which will be the parent for the new recipe and leaving the system intact. This should reduce the number of manual steps a developer is required to perform while implementing a new recipe.

User must be able to specify which raft backend to be used in recipe execution

We now have 2 raft backends: raft-server and the new backend, pumicedb-server-test. These backends should be compatible with the existing set of recipes, but they will diverge in coming recipes which rely on features specific to pumiceDB. We do want to maintain raft-server and raft-client, as these are basic implementations which can be useful for baseline verification purposes and for determining if problems are specific to pumiceDB or the underlying raft code.

Please enable the specification of these backends in both holon and ansible such that the recipe execution may be directed at the desired backend.

Holon log does not seem to capture raft / pumicedb server output

I was debugging a case where the recipe was failing due to an early exit of the pumice_db server process. I ran the server on the cmd line and saw this:

$ NIOVA_LOG_LEVEL=2 NIOVA_LOCAL_CTL_SVC_DIR=/tmp/holon_recipes_run/87fe704e-0f05-11eb-bdd5-90324b2d1e89/configs ./pumicedb-server-test -r 87fe704e-0f05-11eb-bdd5-90324b2d1e89 -u 880eaefa-0f05-11eb-9f8d-90324b2d1e89
<12545.916231748:warn:pumicedb-server:env_parse@211> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon_recipes_run/87fe704e-0f05-11eb-bdd5-90324b2d1e89/configs applied from environment
<12545.922479499:error:pumicedb-server:rsbr_setup@837> rocksdb_open(): Invalid argument: Direct I/O is not supported by the specified DB.
<12545.922567855:error:pumicedb-server:raft_server_instance_startup@3601> B et=0 ei=-1 ht=0 hs=0 ci=-1:-1 v=00000000-0000-0000-0000-000000000000 l= raft_server_backend_setup(): Transport endpoint is not connected
<12545.922584326:warn:pumicedb-server:raft_net_instance_startup@958> ri_startup_pre_net_bind_cb(): Transport endpoint is not connected
<12545.922591640:warn:pumicedb-server:ev_pipe_cleanup@83> Operation not permitted
<12545.922599324:warn:pumicedb-server:ev_pipe_cleanup@83> Operation not permitted
<12545.922604885:warn:pumicedb-server:ev_pipe_cleanup@83> Operation not permitted
<12545.922610245:warn:pumicedb-server:ev_pipe_cleanup@83> Operation not permitted

=================================================================
==35910==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 67 byte(s) in 1 object(s) allocated from:
    #0 0x7f278ed893a7 in strdup (/lib64/libasan.so.6+0x5a3a7)
    #1 0x7f278e86d572  (/usr/local/lib/librocksdb.so.6.12+0x1e0572)
    #2 0x7f278e87b448 in rocksdb_open_column_families (/usr/local/lib/librocksdb.so.6.12+0x1ee448)
    #3 0x56bb82 in rsbr_setup src/raft_server_backend_rocksdb.c:818
    #4 0x4f0550 in raft_server_backend_setup src/raft_server.c:863
    #5 0x54cb6f in raft_server_instance_startup src/raft_server.c:3598
    #6 0x467811 in raft_net_instance_startup src/raft_net.c:955
    #7 0x550560 in raft_server_instance_run src/raft_server.c:3708
    #8 0x5837ea in PmdbExec src/pumice_db.c:913
    #9 0x40a187 in main test/pumice_db_test_server.c:351
    #10 0x7f278e134041 in __libc_start_main (/lib64/libc.so.6+0x27041)

Direct leak of 67 byte(s) in 1 object(s) allocated from:
    #0 0x7f278ed893a7 in strdup (/lib64/libasan.so.6+0x5a3a7)
    #1 0x7f278e86d572  (/usr/local/lib/librocksdb.so.6.12+0x1e0572)
    #2 0x7f278e87b448 in rocksdb_open_column_families (/usr/local/lib/librocksdb.so.6.12+0x1ee448)
    #3 0x56b96c in rsbr_setup src/raft_server_backend_rocksdb.c:805
    #4 0x4f0550 in raft_server_backend_setup src/raft_server.c:863
    #5 0x54cb6f in raft_server_instance_startup src/raft_server.c:3598
    #6 0x467811 in raft_net_instance_startup src/raft_net.c:955
    #7 0x550560 in raft_server_instance_run src/raft_server.c:3708
    #8 0x5837ea in PmdbExec src/pumice_db.c:913
    #9 0x40a187 in main test/pumice_db_test_server.c:351
    #10 0x7f278e134041 in __libc_start_main (/lib64/libc.so.6+0x27041)

SUMMARY: AddressSanitizer: 134 byte(s) leaked in 2 allocation(s).

It turns out that /tmp is a tmpfs and RocksDB will not start there.

$ df /tmp
Filesystem     1K-blocks  Used Available Use% Mounted on
tmpfs            8042620  6776   8035844   1% /tmp

The issue is that the holon log didn't contain any of the above log lines:
87fe704e-0f05-11eb-bdd5-90324b2d1e89.log

It's possible that these lines were buffered and never flushed to the holon log file descriptor. In any case, it's critical that we capture such errors, otherwise, it will be almost impossible to debug the issues that Holon's recipes uncover.
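One way to rule out buffering inside holon is to hand the log file's descriptor directly to the child process, so the server's stdout and stderr reach the file without passing through Python. The sketch below assumes that approach; run_and_capture is a hypothetical name, and the child command is a stand-in for the server binary.

```python
import subprocess
import sys
import tempfile


def run_and_capture(cmd, log_path):
    """Run cmd with stdout and stderr appended to log_path; return exit code."""
    with open(log_path, "ab") as log_file:
        # stderr=STDOUT merges both streams into the same log file.
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
        return proc.wait()


# Stand-in child: writes an error to stderr and exits non-zero.
with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    demo_log = tmp.name
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('rocksdb_open(): failed\\n'); sys.exit(1)"]
exit_code = run_and_capture(child, demo_log)
with open(demo_log) as f:
    captured = f.read()
```

The child may still buffer its own stdout internally, so this only removes the parent side of the problem.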

Redundant or overly verbose information printed on the terminal at startup

pauln@localhost:~/Code/holon$ python3 ./holon.py -d -P /tmp term_catch_up
Holon Directory path: /tmp
Log file path /var/tmp/holon_ce57d1d0-abf9-11ea-a560-90324b2d1e89.log
Number of Servers: 5
Port no:6000
Client Port no:13000
Recipe: term_catch_up
The test root directory is: /tmp/ce57d1d0-abf9-11ea-a560-90324b2d1e89
The log file path is: /var/tmp/holon_ce57d1d0-abf9-11ea-a560-90324b2d1e89.log

This information is useful, but almost all of it should be placed in the log file only. It should be enough for holon to print only the log file path to the terminal:
log-file: /var/tmp/holon_ce57d1d0-abf9-11ea-a560-90324b2d1e89.log

The rest of these items can be printed directly to the log file.

dry-run option should not print out verbose recipe ancestry info

Please replace the following with the list of ancestors starting with this recipe's parent.

python3 holon_framework.py -d -s /tmp/holon -n /tmp/holon/inotify -i /tmp/holon/init/ -r term_ticker

...
Recipe: term_ticker
Basic Control interface recipe
1. To verify the idleness of the process.
2. Verify process can be activated by exiting the idleness.
3. Once process is active, verify it's timestamp progresses.

Basic Process control recipe
1. Pause and resume the server in a loop.
2. Make sure current time does not progress for the server during pause
and resume cycle.
3. Resume the process and verify time stamp progresses normally.

Term Ticker Recipe
1. Verify term increases in each iteration.
2. Restart the server process and make sure term value persists
across reboot.

The replacement should be a comma-delimited list:
Ancestors: Basic Process control, Basic Control interface

term_catch_up reports file mv failure but continues to proceed.

The console output below reports the error but the test proceeds. Is the recipe tolerant to this error or should holon abort?

mv: cannot move '/tmp/get_term.1786e3b2-a5c1-11ea-b638-90324b2d1e89' to '/tmp/holon/inotify/e1ba4abc-a5c0-11ea-93b7-90324b2d1e89/input/get_term.1786e3b2-a5c1-11ea-b638-90324b2d1e89': No such file or directory

pauln@localhost:~/Code/holon$ python3 holon_framework.py  -s /tmp/holon -n /tmp/holon/inotify -i /tmp/holon/init/ -r term_catch_up
Server conf path: /tmp/holon
Inotify path: /tmp/holon/inotify
Init directory path: /tmp/holon/init/
Log file path /tmp/holon_recipe.log
Number of Servers: 5
Port no:6000
Client Port no:13000
Recipe: term_catch_up
<module 'term_catch_up' from '/home/pauln/Code/holon/term_catch_up.py'>
holon_framework.py:148: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn("%s" % r().name)
<41974.519132237:warn:raft-server:env_parse@194> env-var NIOVA_CTL_INTERFACE_INIT_PATH value /tmp/holon/init/ applied from environment
<41974.519183895:warn:raft-server:env_parse@194> env-var NIOVA_INOTIFY_BASE_PATH value /tmp/holon/inotify applied from environment
<41974.519195436:warn:raft-server:env_parse@194> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon applied from environment
basic_ctl_int ========================== OK
basic_process_ctl ========================== OK
Term value before Restart of the server is 25
<42058.750329152:warn:raft-server:env_parse@194> env-var NIOVA_CTL_INTERFACE_INIT_PATH value /tmp/holon/init/ applied from environment
<42058.750438608:warn:raft-server:env_parse@194> env-var NIOVA_INOTIFY_BASE_PATH value /tmp/holon/inotify applied from environment
<42058.750446723:warn:raft-server:env_parse@194> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon applied from environment
term_ticker ========================== OK
mv: cannot move '/tmp/get_term.1786e3b2-a5c1-11ea-b638-90324b2d1e89' to '/tmp/holon/inotify/e1ba4abc-a5c0-11ea-93b7-90324b2d1e89/input/get_term.1786e3b2-a5c1-11ea-b638-90324b2d1e89': No such file or directory <---------------------------------
<42064.761149992:warn:raft-server:env_parse@194> env-var NIOVA_CTL_INTERFACE_INIT_PATH value /tmp/holon/init/ applied from environment
<42064.761187743:warn:raft-server:env_parse@194> env-var NIOVA_INOTIFY_BASE_PATH value /tmp/holon/inotify applied from environment
<42064.761199405:warn:raft-server:env_parse@194> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon applied from environment
term_catch_up ========================== OK

"Json output file is not getting generated on apply fault injection" in pmdb_client_request_timeout_modification_and_retry recipe.

{
		"name" : "raft_leader_may_be_deposed",
		"enabled" : false,
		"file" : "src/raft_server.c",
		"function" : "raft_leader_instance_is_fresh",
		"line_number" : 2694,
		"when_to_inject" : "every-time",
		"last_injected_at" : "Thu Jan 01 00:00:00 UTC 1970",
		"last_bypassed_at" : "Thu Jan 01 00:00:00 UTC 1970",
		"injection_count" : 0,
		"frequency_seconds" : 0,
		"num_remaining" : 0,
		"cond_exec_count" : 403
	},

ctl-interface path: /home/makshata/recipe_test/dac1d378-1795-11eb-971f-e74b1203205f/ctl-interface/dadd8500-1795-11eb-8b67-cb78f5868401/input/pmdb_client_request_timeout_modification_and_retry-enable_fault_injection_raft_leader_may_be_deposed.425bd10a-1796-11eb-9da4-5ba178f40fd

Holon does not exit when encountering a startup error from the raft-server

Server conf path: /tmp/holon
Inotify path: /tmp/holon/inotify
Init directory path: /tmp/holon/init/
Log file path /tmp/holon_recipe.log
Number of Servers: 5
Port no:6000
Client Port no:13000
Recipe: term_ticker
<module 'term_ticker' from '/home/pauln/Code/holon/term_ticker.py'>
holon_framework.py:148: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn("%s" % r().name)
<40089.980626457:warn:raft-server:env_parse@194> env-var NIOVA_CTL_INTERFACE_INIT_PATH value /tmp/holon/init/ applied from environment
<40089.980662274:warn:raft-server:env_parse@194> env-var NIOVA_INOTIFY_BASE_PATH value /tmp/holon/inotify applied from environment
<40089.980672784:warn:raft-server:env_parse@194> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon applied from environment
<40089.983842460:error:raft-server:udp_socket_bind@91> bind(): Address already in use
<40089.983877696:warn:raft-server:raft_net_instance_startup@545> raft_net_udp_sockets_bind(): Address already in use <------------------------
<40089.983905849:warn:raft-server:ev_pipe_cleanup@80> Operation not permitted
<40089.983912962:warn:raft-server:ev_pipe_cleanup@80> Operation not permitted

At this point, holon should exit, but it stays running.
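A sketch of one possible check, assuming holon launches the server with subprocess: wait a short grace period after launch and treat any exit within it as a startup failure. The function name start_or_fail and the grace period value are assumptions, not existing holon code.

```python
import subprocess
import sys


def start_or_fail(cmd, grace_seconds=1.0):
    """Start cmd; raise if it exits within grace_seconds, else return it."""
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: treat startup as successful.
        return proc
    raise RuntimeError("process exited during startup with code %d" % rc)


# Stand-in for a server that fails at bind() and exits immediately.
try:
    start_or_fail([sys.executable, "-c", "import sys; sys.exit(3)"],
                  grace_seconds=5)
    startup_failed = False
except RuntimeError:
    startup_failed = True
```

The caller can then abort the recipe chain instead of continuing against a server that never came up.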

Holon should auto-create the necessary directories

For ease of use, Holon should reduce the number of parameters and steps required by the user. The server-config-path does have a default, which is good. However, if the other directories are not present, then they should be created in /tmp or /var/tmp. The same goes for directories which hold the log and json outputs.

pauln@localhost:~/Code/holon$ python3 holon_framework.py -d
Server config path (/etc/holon/raftconf/) does not exist

pauln@localhost:~/Code/holon$ python3 holon_framework.py -d -s /tmp
Inotify path (/tmp/inotify/) does not exist

pauln@localhost:~/Code/holon$ python3 holon_framework.py -d -s /tmp -n /tmp
Init path (/tmp/init/) does not exist
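A minimal sketch of auto-creating the directories instead of failing. The helper name ensure_dirs is hypothetical, and the subdirectory names are taken from the error messages above.

```python
import os
import tempfile


def ensure_dirs(base_path, subdirs=("inotify", "init")):
    """Create base_path and each subdirectory if missing; return their paths."""
    created = []
    for name in subdirs:
        path = os.path.join(base_path, name)
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


# Demonstration under a throwaway base directory.
base = os.path.join(tempfile.mkdtemp(), "holon")
dirs = ensure_dirs(base)
```

Because os.makedirs also creates missing parents, the base directory itself does not need to exist beforehand.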

Ansible recipes should be in their own directory

Currently, we place all yaml files into the ansible/ directory. This makes it difficult to identify which yaml files are for recipes and which are for other purposes. Please place the recipe files into a folder such as ansible/recipes/.

Error reporting in genericcmd.py::move_file

The following error msg shows a generic numeric error (1) rather than the reason the mv failed (i.e. Directory does not exist, Permission denied).

2020-06-03 13:22:42,612 Move file /tmp/get_term.d51d2c18-a5be-11ea-af23-90324b2d1e89 to /tmp/holon/inotify/9f5502f4-a5be-11ea-8496-90324b2d1e89/input/get_term.d51d2c18-a5be-11ea-af23-90324b2d1e89 failed with error: 1

There are several solutions to this issue. The first is to grab the stderr from the process and print it in the error msg:

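import subprocess

def get_shell_script_output_using_communicate():
    # Capture both streams so stderr can be reported on failure.
    session = subprocess.Popen(['ls'], stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    stdout, stderr = session.communicate()
    # Judge failure by the exit status; the stderr text explains the reason.
    if session.returncode != 0:
        raise Exception("Error " + stderr.decode('utf-8'))
    return stdout.decode('utf-8')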

Second, shutil.move() can be used instead of a subprocess. In that case, os.strerror() can be used, as it is in the 'remove_file()' method.
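A sketch of the shutil.move() variant. It is not the existing genericcmd.py code; the return convention (0 on success, errno on failure) is an assumption made for the example.

```python
import logging
import os
import shutil
import tempfile


def move_file(src, dst):
    """Move src to dst; return 0 on success or the errno on failure."""
    try:
        shutil.move(src, dst)
    except OSError as e:
        # Prefer the errno text; fall back to the exception message.
        reason = os.strerror(e.errno) if e.errno else str(e)
        logging.error("Move file %s to %s failed with error: %s",
                      src, dst, reason)
        return e.errno if e.errno else -1
    return 0


# Demonstration: the first move targets a directory that does not exist.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "get_term.example")
open(src, "w").close()
rc_missing = move_file(src, os.path.join(workdir, "no_such_dir", "input",
                                         "get_term.example"))
rc_ok = move_file(src, os.path.join(workdir, "moved.example"))
```

The log line then carries a readable reason such as "No such file or directory" rather than a bare exit status.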

Ctl-interface input and output files should contain recipe name and possibly the stage / task name from Ansible

When debugging Issue #60 I can see that it would be helpful to know which recipe and stage issued the ctl-interface requests.

pauln@groot:/home/manisha/tmp/d15299fa-f29f-11ea-8e65-4350fcdb3d1c/ctl-interface/d1637568-f29f-11ea-b9e3-7f999d700967/input$ ls -lrt ../output/
total 156
-rw-r--r-- 1 manisha manisha 7094 Sep  9 13:24 get_all.d20f7070-f29f-11ea-9cfc-cb92e0da30b8
-rw-r--r-- 1 manisha manisha    3 Sep  9 13:24 idle_off.d21cf60a-f29f-11ea-ab73-832dacf0e98c
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d227d8ea-f29f-11ea-bebe-3fd1e0869705
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d35bb812-f29f-11ea-8431-cbb66d5fd717
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d48f21f6-f29f-11ea-9ac0-47aea23ce088
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d5c27e1a-f29f-11ea-bbe4-9b88b5355c00
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d6f5cfc6-f29f-11ea-b9e8-a75a89938f68
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.d8c3dff0-f29f-11ea-8a65-1bad357b36a3
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:24 current_time.dbc47c46-f29f-11ea-a669-af9f7da86fc9
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.dec44ea8-f29f-11ea-97dd-7f8c817ce0da
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e1c3ba26-f29f-11ea-89a6-47332fdca618
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e4c3b8d4-f29f-11ea-8282-cf2df8b32ed4
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e4cd58c6-f29f-11ea-bb14-3f8969d8050d
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e60105bc-f29f-11ea-9378-33c912b22c5a
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e734bd70-f29f-11ea-b2d6-970a30e7cd72
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e8681728-f29f-11ea-8e75-17c5635f919e
-rw-r--r-- 1 manisha manisha   76 Sep  9 13:25 current_time.e99bbdde-f29f-11ea-87cd-8b32daae5d37
-rw-r--r-- 1 manisha manisha 9928 Sep  9 13:25 get_all.eb6ad528-f29f-11ea-89de-8774099702d7
-rw-r--r-- 1 manisha manisha 9930 Sep  9 13:25 get_all.ed3a504a-f29f-11ea-ba6b-d70e93247854
-rw-r--r-- 1 manisha manisha 9931 Sep  9 13:25 get_all.ef09b244-f29f-11ea-8f87-932875a82cff
-rw-r--r-- 1 manisha manisha 9931 Sep  9 13:25 get_all.f0d87bb4-f29f-11ea-9e67-afa922edb12b
-rw-r--r-- 1 manisha manisha 9933 Sep  9 13:25 get_all.f2a9bbce-f29f-11ea-ba75-33389320c06e
-rw-r--r-- 1 manisha manisha 9933 Sep  9 13:25 get_all.f49a4b7e-f29f-11ea-840d-cfa38fd21fb1
-rw-r--r-- 1 manisha manisha    3 Sep  9 13:25 idle_on.d1f89562-f29f-11ea-8b8e-abc0480bfe31
-rw-r--r-- 1 manisha manisha 6979 Sep  9 13:25 get_all.f8115fcc-f29f-11ea-9a1e-871d8043069f

Stderr and Stdout of Raft should go into the holon log file

Now that holon has a log file, the output of the raft process should also go there.

<40450.348737059:warn:raft-server:env_parse@194> env-var NIOVA_CTL_INTERFACE_INIT_PATH value /tmp/holon/init/ applied from environment
<40450.348770713:warn:raft-server:env_parse@194> env-var NIOVA_INOTIFY_BASE_PATH value /tmp/holon/inotify applied from environment
<40450.348779409:warn:raft-server:env_parse@194> env-var NIOVA_LOCAL_CTL_SVC_DIR value /tmp/holon applied from environment

Running holon.py without arguments throws a python error

pauln@localhost:~/Code/holon$ python3 ./holon.py
Traceback (most recent call last):
  File "./holon.py", line 75, in <module>
    print(f"Holon directory (%s) does not exist" % server_conf_path)
NameError: name 'server_conf_path' is not defined

There are a few minor fixes needed here:

  1. Running holon with no arguments should cause the help msg to print
  2. server_conf_path should have some default value (/tmp or /var/tmp) so that python doesn't complain about accessing an undefined variable and so that the user doesn't have to specify -P if she doesn't wish to.
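Both fixes could be sketched with argparse as follows. This is an illustration rather than holon's actual argument handling: the dest name dir_path, the choice of /var/tmp as the default, and the positional recipe argument are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="holon.py")
    # A default means the variable is always defined, even without -P.
    parser.add_argument("-P", dest="dir_path", default="/var/tmp",
                        help="holon directory path (default: %(default)s)")
    parser.add_argument("recipe", nargs="?", help="name of the recipe to run")
    return parser


def parse_args(argv):
    parser = build_parser()
    if not argv:
        # No arguments at all: show the help message instead of failing.
        parser.print_help()
        return None
    return parser.parse_args(argv)


no_args = parse_args([])
args = parse_args(["term_catch_up"])
```

A caller would pass sys.argv[1:] and exit when parse_args() returns None.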

Basic leader election fails due to ignore_timer_events being true

2020-06-09 12:48:01,407 APPLY cmd=get_all ipath=/tmp//b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b197ada2-aa70-11ea-b3e8-90324b2d1e89/input/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89
2020-06-09 12:48:01,410 APPLY cmd=get_all ipath=/tmp//b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b1982c82-aa70-11ea-9545-90324b2d1e89/input/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89
2020-06-09 12:48:01,413 APPLY cmd=get_all ipath=/tmp//b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b1989cb2-aa70-11ea-adc4-90324b2d1e89/input/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89
2020-06-09 12:51:21,955 Commit idx is not 0 for peer 0
2020-06-09 12:51:21,956 Basic leader election recipe failed
2020-06-09 12:51:21,956 Error: Terminating recipe hierarchy execution
       "raft_root_entry" : [
                {
                        "raft-uuid" : "b19744c0-aa70-11ea-ac25-90324b2d1e89",
                        "peer-uuid" : "b1989cb2-aa70-11ea-adc4-90324b2d1e89",
                        "voted-for-uuid" : "00000000-0000-0000-0000-000000000000",
                        "leader-uuid" : "",
                        "state" : "follower",
                        "follower-reason" : "leader-already-present",
                        "client-requests" : "redirect-to-leader",
                        "term" : 0,
                        "commit-idx" : -1,
                        "last-applied" : -1,
                        "last-applied-cumulative-crc" : 0,
                        "newest-entry-idx" : -1,
                        "newest-entry-term" : 0,
                        "newest-entry-data-size" : 0,
                        "newest-entry-crc" : 0,
                        "dev-read-latency-usec" : {},
                        "dev-write-latency-usec" : {}
                }
        ],
        "raft_net_info" : {
                "ignore_timer_events" : true
        },
pauln@localhost:/tmp$ find b19744c0-aa70-11ea-ac25-90324b2d1e89|grep get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89 |grep output | xargs grep ignore_timer_events
b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b1989cb2-aa70-11ea-adc4-90324b2d1e89/output/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89:		"ignore_timer_events" : true
b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b1982c82-aa70-11ea-9545-90324b2d1e89/output/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89:		"ignore_timer_events" : true
b19744c0-aa70-11ea-ac25-90324b2d1e89/inotify/b197ada2-aa70-11ea-b3e8-90324b2d1e89/output/get_all.fb1a196a-aa70-11ea-ae3b-90324b2d1e89:		"ignore_timer_events" : true

holon.log

Dry Run should run through the recipe chain, logging the steps it would do without doing them

I missed this before. There are 2 problems here. First, this is code duplicated on line 166 - we always want to avoid this when possible! Secondly, now that we have -print-ancestry, it is not necessary for -dry-run to print the exact same message. However, -dry-run should go through the motions of executing the recipe chain without actually performing the steps. Instead the steps should be printed out in the log file. I realize this is a large change so for this branch, please address the 1st item and we'll save the second item for a later time.

Originally posted by @00pauln00 in #38

Debug verbosity should be optional

Currently, holon issues a lot of debugging msgs to the terminal. Such debug msgs should be fine within the log files, however, they should be optional for the terminal.

Here's just one example of the verbosity:

ok: [localhost] => {
    "msg": {
        "/0/commit-idx": -1,
        "/0/last-applied": -1,
        "/0/last-applied-cumulative-crc": 0,
        "/0/leader-uuid": "null",
        "/raft_net_info/ignore_timer_events": true
    }
}

TASK [Activate Raft timer thread] ***********************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": {
        "cmd": "ignore_timer_events@false",
        "error": 0,
        "input_fpath": "/var/tmp/holon_recipes_run/adcfe41c-0fc6-11eb-a033-90324b2d1e89/ctl-interface/ade147fc-0fc6-11eb-b653-90324b2d1e89/input/ignore_timer_eventsatfalse.af6d190c-0fc6-11eb-90d9-90324b2d1e89",
        "operation": "apply_cmd",
        "output_fpath": "/var/tmp/holon_recipes_run/adcfe41c-0fc6-11eb-a033-90324b2d1e89/ctl-interface/ade147fc-0fc6-11eb-b653-90324b2d1e89/output/ignore_timer_eventsatfalse.af6d190c-0fc6-11eb-90d9-90324b2d1e89",
        "peer_uuid": "ade147fc-0fc6-11eb-b653-90324b2d1e89",
        "where": "/raft_net_info/ignore_timer_events"
    }
}

Ansible does give the option of setting a verbosity value per debug msg:

- name: Display all variables/facts known for a host
  debug:
    var: hostvars[inventory_hostname]
    verbosity: 4

This verbosity value should be set on any log msg which exists purely for developer debugging.

holon leaves behind file and directory artifacts without specifying "-D"

Holon should clean up the base path if -D wasn't used.

pauln@localhost:/tmp/c77334e0-aa64-11ea-b8df-90324b2d1e89$ find .
.
./init
./inotify
./inotify/c7741770-aa64-11ea-ba87-90324b2d1e89
./inotify/c7741770-aa64-11ea-ba87-90324b2d1e89/output
./inotify/c7741770-aa64-11ea-ba87-90324b2d1e89/output/idle_on.c776dbfe-aa64-11ea-abfe-90324b2d1e89
./inotify/c7741770-aa64-11ea-ba87-90324b2d1e89/init
./inotify/c7741770-aa64-11ea-ba87-90324b2d1e89/input
./inotify/c77388be-aa64-11ea-8ada-90324b2d1e89
./inotify/c77388be-aa64-11ea-8ada-90324b2d1e89/output
./inotify/c77388be-aa64-11ea-8ada-90324b2d1e89/input
./inotify/c77388be-aa64-11ea-8ada-90324b2d1e89/init
./raftdb
./raftdb/c7741770-aa64-11ea-ba87-90324b2d1e89.raftdb
./raftdb/c77388be-aa64-11ea-8ada-90324b2d1e89.raftdb
./configs
./configs/c774fcda-aa64-11ea-b893-90324b2d1e89.peer
./configs/c774aea6-aa64-11ea-bc8a-90324b2d1e89.peer
./configs/c7746c98-aa64-11ea-b72d-90324b2d1e89.peer
./configs/c7741770-aa64-11ea-ba87-90324b2d1e89.peer
./configs/c77388be-aa64-11ea-8ada-90324b2d1e89.peer
./configs/c77334e0-aa64-11ea-b8df-90324b2d1e89.raft

Calling ctl_req_create_cmdfile_and_copy() in CtlRequest constructor is problematic

ctl_req_create_cmdfile_and_copy(self)

Intuitively speaking, the CtlRequest constructor should create a new object. Applying the cmdfile in the constructor context seems to be a scope violation. Furthermore, the return code of the ctl_req_create_cmdfile_and_copy() operation isn't checked when used inside the constructor and this could be considered a clear problem.

I think a slightly different approach should be taken, using private methods which allow for the separation of object creation and the application of the cmdfile. In the example below, a new method, called "Apply", has been added. This method lives inside the class so it can be called like: object.Apply() instead of using the procedural coding form of function(parameter): ctl_req_create_cmdfile_and_copy(curr_time_ctl).

@@ -68,8 +68,10 @@ class Recipe(HolonRecipeBase):
         Creating cmd file to get all the JSON output from the server.
         Will verify parameters from server JASON output to check the idleness
         '''
-        get_all_ctl = CtlRequest(inotifyobj, "get_all", peer_uuid, app_uuid)
-
+        get_all_ctl = CtlRequest(inotifyobj, "get_all", peer_uuid, app_uuid).Apply()
+        if get_all_ctl.Error() != 0:
+            logging.error("CtlRequest() error %d", get_all_ctl.Error())
+            recipe_failed = 1
 
         # append the get_all_ctl object into recipe's ctl_req list.
         self.recipe_ctl_req_obj_list.append(get_all_ctl)
@@ -59,12 +65,21 @@ class CtlRequest:
             self.input_fpath = inotifyobj.prepare_input_output_path(peer_uuid,
                                                                     cmd, True,
                                                                     app_uuid)
-            
+
         self.output_fpath = inotifyobj.prepare_input_output_path(peer_uuid,
                                                                 cmd, False,
                                                                 app_uuid)
+        self.error = 0
         # Copy the cmd file into input directory
-        ctl_req_create_cmdfile_and_copy(self)
+        #ctl_req_create_cmdfile_and_copy(self)
+
+    def Apply(self):
+        logging.warning("APPLY cmd=%s ipath=%s", self.cmd, self.input_fpath)
+        self.error = ctl_req_create_cmdfile_and_copy(self)
+        return self
+
+    def Error(self):
+        return self.error
 
     def delete_files(self):
         genericcmdobj = GenericCmds()

Need a new mode, similar to dry-run, which just prints the ancestry but does not generate configs or log

The dry-run mode is very useful because it generates the configs which would have been used in the test, which is good for debugging. However, if a user would merely like to print the ancestry of a recipe, then dry-run is too heavyweight.

Please add a mode --print-ancestry where holon just prints the ancestor list and then exits.

The cmd and output should look like:

$ ./holon.py --print-ancestry basic_leader_election
Ancestors: term_catch_up, term_ticker, basic_process_ctl, basic_ctl_int
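
A minimal sketch of how such a mode could work, assuming each recipe records the name of its parent. The `RECIPE_PARENTS` map and the `ancestors()` and `main()` helpers are hypothetical names introduced for illustration; the parent chain is inferred from the example output above and is not taken from the holon source.

```python
import argparse

# Hypothetical parent map: recipe name -> parent recipe name (None for the root).
# The chain is inferred from the example output; holon's real data may differ.
RECIPE_PARENTS = {
    "basic_ctl_int": None,
    "basic_process_ctl": "basic_ctl_int",
    "term_ticker": "basic_process_ctl",
    "term_catch_up": "term_ticker",
    "basic_leader_election": "term_catch_up",
}


def ancestors(recipe, parents=RECIPE_PARENTS):
    """Return the ancestors of `recipe`, nearest parent first."""
    chain = []
    parent = parents[recipe]
    while parent is not None:
        chain.append(parent)
        parent = parents[parent]
    return chain


def main(argv=None):
    parser = argparse.ArgumentParser(prog="holon.py")
    parser.add_argument("--print-ancestry", action="store_true",
                        help="print the recipe's ancestor list and exit")
    parser.add_argument("recipe")
    args = parser.parse_args(argv)

    if args.print_ancestry:
        # No configs or logs are generated on this path.
        print("Ancestors: " + ", ".join(ancestors(args.recipe)))
        return 0

    # Normal (or dry-run) execution would continue here.
    return 0
```

Because the ancestry branch returns before any setup work, it stays cheaper than dry-run by construction.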

Command line argument checking is needed

When I run holon with no recipe, the program aborts. Additionally, there's no option to print the set of recipes available to me. If the recipe is not provided, perhaps we should print a list of available recipes?

pauln@localhost:~/Code/holon$ python3 holon_framework.py -d -s /tmp/holon -n /tmp/holon/inotify -i /tmp/holon/init/
Traceback (most recent call last):
  File "holon_framework.py", line 87, in <module>
    if recipe_name == "":
NameError: name 'recipe_name' is not defined

When no recipe arg is provided, holon should print the help msg and that msg should show a template for the cmd line parameters such as:

holon [options] RECIPE_NAME
with the [options] listed below.

Here's a good example:

$ python prog.py --help
usage: prog.py [-h] [-v | -q] x y

calculate X to the power of Y

positional arguments:
  x              the base
  y              the exponent

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose
  -q, --quiet
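
One way to get this behavior from argparse is to declare the recipe as an optional positional argument and handle the "missing" and "unknown" cases separately. This is only a sketch: `AVAILABLE_RECIPES`, `build_parser()`, `parse_args()` and the `--dry-run` flag are assumptions made for illustration, not holon's actual interface.

```python
import argparse
import sys

# Hypothetical recipe list; holon would presumably discover these at runtime.
AVAILABLE_RECIPES = [
    "basic_process_ctl",
    "basic_ctl_int",
    "term_ticker",
    "basic_leader_election",
    "term_catch_up",
]


def build_parser():
    parser = argparse.ArgumentParser(
        prog="holon",
        usage="holon [options] RECIPE_NAME",
        description="Run a holon recipe.",
        epilog="available recipes: " + ", ".join(AVAILABLE_RECIPES),
    )
    # Assumed option, shown only so that the help text has something to list.
    parser.add_argument("--dry-run", action="store_true",
                        help="generate configs but do not run the recipe")
    # nargs="?" lets us detect a missing recipe ourselves and print full help.
    parser.add_argument("recipe", metavar="RECIPE_NAME", nargs="?",
                        help="name of the recipe to run")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.recipe is None:
        # Incorrect usage: show the full help message, then exit non-zero.
        parser.print_help(sys.stderr)
        raise SystemExit(2)
    if args.recipe not in AVAILABLE_RECIPES:
        # parser.error() prints the usage line plus this message and exits.
        parser.error("invalid recipe name: " + args.recipe)
    return args
```

Calling `parse_args(sys.argv[1:])` before any other setup avoids the NameError shown in the traceback, since the recipe name is always defined or the program has already exited.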

pmdb_client_error_demonstration2 recipe failure during leader parameter validation.

  1. Set leader election timeout to 30sec.
  2. Lower client request timeout to 3sec.
  3. Write object with valid seq0.
  4. Pause enough followers to break the quorum.
  5. Read until the request fails.
  6. Get leader values from the same leader which was there at the start of the recipe.

Result:
Leader state changes to "follower" and client-requests = redirect-to-leader

Expected result:
Leader remains leader and client-requests = "deny-may-be-deposed"

Install the python modules used by framework/recipes.

We should maintain, in a file, the list of Python modules that need to be installed to run the recipes.
There should be a routine in holon.py that iterates over this file and installs the Python modules before executing the actual recipes.
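
Such a routine might look roughly like the sketch below. It assumes a plain-text file with one module name per line (for example a `requirements.txt`, which is an assumed filename) and installs the modules with pip through the running interpreter. All function names here are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def parse_requirements(text):
    """Return the module names in `text`, skipping blank lines and # comments."""
    modules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            modules.append(line)
    return modules


def pip_install_command(modules):
    """Build the pip command for the interpreter that is running holon."""
    return [sys.executable, "-m", "pip", "install", *modules]


def install_requirements(path):
    """Install every module listed in the file at `path`; return the list."""
    modules = parse_requirements(Path(path).read_text())
    if modules:
        # check=True raises CalledProcessError if pip fails, so the
        # recipes never start with a partially installed environment.
        subprocess.run(pip_install_command(modules), check=True)
    return modules
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install tied to the same Python that will run the recipes.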

NIOVA processes managed by Holon should have their output labeled in the holon.log file

Below is an example where the log outputs from niova-raft processes are redirected into the holon log file. However, one cannot tell which server process has issued which log entries.

$ less /var/tmp/holon_39321f78-b709-11ea-8594-90324b2d1e89.log

...
<155683.363873462:debug:raft-server:udp_socket_send@250> 127.0.0.1:6000 rc=104 total=(104:104)
<155683.363890374:debug:raft-server:raft_server_send_msg@1150> AE_REQ t=38 lt=38 ci=0 pl=38:0 sz=0 hb=1 lcm=0 crc=458785512:458785512 3933caf8-b709-11ea-b4eb-90324b2d1e89 
<155683.363898258:debug:raft-server:udp_socket_recv@168> src=127.0.0.1:6001 nb=104 flags=0
<155683.363920330:debug:raft-server:raft_server_refresh_follower_prev_log_term@1742> L et=38 ei=0 ht=38 hs=16 ci=0:0 v=3933caf8-b709-11ea-b4eb-90324b2d1e89 l= peer=2 refresh=0:0 pti=38:1 ct=38 ccrc=458785512
<155683.363925980:debug:raft-server:raft_net_udp_cb@930> F et=38 ei=0 ht=38 hs=39 ci=0:0 v=3933caf8-b709-11ea-b4eb-90324b2d1e89 l=3933caf8-b709-11ea-b4eb-90324b2d1e89 fd=9 type=0 rc=104
<155683.363935228:debug:raft-server:raft_server_append_entry_sender@3092> csn@0x612000000f40 r 39342a98-b709-11ea-8e6b-90324b2d1e89 ref=2 store=/tmp/holon_recipes_run/39321f78-b709-11ea-8594-90324b2d1e89/raftdb/39342a98-b709-11ea-8e6b-90324b2d1e89.raftdb idx=2 pli=0 lt=38
<155683.363944595:debug:raft-server:raft_server_udp_peer_recv_handler@2623> AE_REQ t=38 lt=38 ci=0 pl=38:0 sz=0 hb=1 lcm=0 crc=458785512:458785512 3933caf8-b709-11ea-b4eb-90324b2d1e89 msg-size=(104) peer 127.0.0.1:6001
<155683.363970504:debug:raft-server:raft_server_process_append_entries_request@2266> AE_REQ t=38 lt=38 ci=0 pl=38:0 sz=0 hb=1 lcm=0 crc=458785512:458785512 3933caf8-b709-11ea-b4eb-90324b2d1e89 
<155683.363973199:debug:raft-server:udp_socket_send@250> 127.0.0.1:6002 rc=104 total=(104:104)
<155683.363986825:debug:raft-server:raft_server_send_msg@1150> AE_REQ t=38 lt=38 ci=0 pl=38:0 sz=0 hb=1 lcm=0 crc=458785512:458785512 3933caf8-b709-11ea-b4eb-90324b2d1e89 

Please use the internal holon index to prefix each niova log output line like so:
<niova-process-type>.<idx>

For raft this would look like raft.N, where 'N' is the peer idx number.
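
As a rough illustration, the labeling could be done by reading each child's output through a pipe and prefixing every line before it reaches the holon log. The helper names below are hypothetical, and the sketch assumes holon starts the NIOVA processes with `subprocess`, which the issue does not state.

```python
import subprocess
import threading


def label_line(proc_type, idx, line):
    """Prefix one log line with '<niova-process-type>.<idx>'."""
    return "{}.{} {}".format(proc_type, idx, line.rstrip("\n"))


def pump_output(proc, proc_type, idx, log_file):
    """Copy the child's output into log_file, labeling every line."""
    for line in proc.stdout:
        log_file.write(label_line(proc_type, idx, line) + "\n")
        log_file.flush()


def start_labeled(cmd, proc_type, idx, log_file):
    """Start `cmd` and label its stdout/stderr in the background."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    thread = threading.Thread(target=pump_output,
                              args=(proc, proc_type, idx, log_file),
                              daemon=True)
    thread.start()
    return proc, thread
```

With this in place, a line from peer 2 would appear in the log as `raft.2 <155683.363873462:debug:raft-server:...`, so entries from different servers can be told apart with a simple grep.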

Incorrect error msg when recipe is not specified on the cmd line

pauln@localhost:~/Code/holon$ python3 ./holon.py -P /tmp
Error: Invalid recipe name passed
Select from valid recipes:
basic_process_ctl
basic_ctl_int
term_ticker
basic_leader_election
term_catch_up

In this case, no recipe was passed at all and the help msg should be printed since the cmd line usage was incorrect.

Holon working directory uniqueness should be done for the user

When I run the below command lots of times, I see config files and inotify objects from the previous runs.

python3 holon_framework.py -s /tmp/holon -n /tmp/holon/inotify -i /tmp/holon/init/ -r term_catch_up

I realize this is partially my own fault for not specifying a -s option like /tmp/holon/$(uuid). However, I also realize that a simple mistake combined with laziness may leave 2 or more Holon instances running from the same directory. I believe we need to avoid this because it leads to unexplained behaviors which are difficult to diagnose.

The directory specified by -s should be considered a 'root' directory. Another directory for the current test, named after the UUID of raft or some other UUID, should hold the components which are currently held in /tmp/holon. This also makes cleanup easy: one can safely remove /tmp/holon/<UUID>/* without worrying about other instances which may be running from /tmp/holon.
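
A small sketch of the idea, assuming the per-run directory is simply named after a freshly generated UUID. `make_run_dir()` is a hypothetical helper, not an existing holon function.

```python
import os
import uuid


def make_run_dir(root, run_uuid=None):
    """Create and return a unique per-run directory under `root`.

    `root` is the directory given with -s. `run_uuid` may be supplied
    (for example the raft UUID); otherwise a new one is generated.
    """
    run_uuid = run_uuid or str(uuid.uuid4())
    run_dir = os.path.join(root, run_uuid)
    # exist_ok=False: fail loudly rather than share a directory with
    # another holon instance or a previous run.
    os.makedirs(run_dir, exist_ok=False)
    return run_dir
```

Config files, inotify directories and init directories would then all be created beneath the returned path, so two runs started with the same -s value never see each other's files.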

Please replace the 'shared_init' parameter with an enum

Please replace the 'shared_init' parameter with an enum (see this page for more details on enums:
https://docs.python.org/3/library/enum.html). The enum should have at least 3 members - one for each type of ctl-interface request entry:

  • shared-init
  • private-init
  • regular

The prepare_init_path() method should also take this enum (in place of shared_init) and perform the path initialization accordingly. This way the if stmt on line 62 can be removed.
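
The shape of that change might look like the sketch below. The enum name `CtlRequestType`, the signature of `prepare_init_path()` and the directory layout are all assumptions for illustration; only the three member kinds come from the request above.

```python
import os
from enum import Enum


class CtlRequestType(Enum):
    SHARED_INIT = "shared-init"
    PRIVATE_INIT = "private-init"
    REGULAR = "regular"


# Hypothetical layout: maps each request type to the path components that
# follow the base directory. The real holon layout may differ.
_PATH_PARTS = {
    CtlRequestType.SHARED_INIT: lambda peer_uuid: ("init",),
    CtlRequestType.PRIVATE_INIT: lambda peer_uuid: (peer_uuid, "init"),
    CtlRequestType.REGULAR: lambda peer_uuid: (peer_uuid, "input"),
}


def prepare_init_path(base_dir, peer_uuid, req_type):
    """Return the ctl-interface path for the given request type."""
    # CtlRequestType(...) accepts a member or its string value and raises
    # ValueError for anything else, such as a leftover boolean.
    parts = _PATH_PARTS[CtlRequestType(req_type)](peer_uuid)
    return os.path.join(base_dir, *parts)
```

Dispatching through a table keyed by the enum replaces the boolean branch, and adding a fourth request type later only requires a new member and a new table entry.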

Originally posted by @00pauln00 in https://github.com/00pauln00/holon/pull/26/files

Default values are not set prior to use

TASK [Check if ports are already in use] **********************************************************************************************************************************
fatal: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {'startup_check': 'port_check', 'srv_port': '{{ srv_port }}', 'client_port': '{{ client_port }}'}: 'srv_port' is undefined\n\nThe error appears to be in '/home/pauln/Code/holon/ansible/holon.yml': line 13, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n  - name: \"Check if ports are already in use\"\n    ^ here\n"}

This patch seems to address the issue:

diff --git a/ansible/holon.yml b/ansible/holon.yml
index e3c0063..a57a9b9 100644
--- a/ansible/holon.yml
+++ b/ansible/holon.yml
@@ -10,6 +10,19 @@
     shell: "/usr/bin/uuid"
     register: cluster_uuid
 
+  - name: "Default Values"
+    set_fact:
+      dir_path: "{{ dir_path | default('/tmp/holon_recipes_run') }}"
+      backend_type: "{{ backend_type | default('raft')}}"
+      srv_port: "{{ srv_port | default(6000) }}"
+      npeers: "{{ npeers | default(5) }}"
+      client_port: "{{ client_port | default(13000) }}"
+      disable_post_run: "{{ disable_post_run | default(false) | bool }}"
+    failed_when: >
+      ((srv_port | int) >= 65536) or
+      ((client_port | int) >= 65536) or
+      ((backend_type != "raft") and (backend_type != "pumicedb"))
+
   - name: "Check if ports are already in use"
     vars:
       config_params:
@@ -26,19 +39,6 @@
     set_fact:
       installation: "{{ lookup('holon_startup', config_params, wantlist=True) }}"
 
-  - name: "Default Values"
-    set_fact:
-      dir_path: "{{ dir_path | default('/tmp/holon_recipes_run') }}"
-      backend_type: "{{ backend_type | default('raft')}}"
-      srv_port: "{{ srv_port | default(6000) }}"
-      npeers: "{{ npeers | default(5) }}"
-      client_port: "{{ client_port | default(13000) }}"
-      disable_post_run: "{{ disable_post_run | default(false) | bool }}"
-    failed_when: >
-      ((srv_port | int) >= 65536) or
-      ((client_port | int) >= 65536) or
-      ((backend_type != "raft") and (backend_type != "pumicedb"))
-
   - name: "Prepare parameter to pass across recipes"
     set_fact:
        raft_param:

Need an option to print the recipe description

In an earlier issue, it was requested that the description info not be printed by the dry-run option. We definitely want to expose the descriptions to users; however, this should be done through an explicit cmd line option, --print-desc. Note that the request here is for a "long form" command line option, which is different from the short form type (-v, -P, etc.) used up to this point. The --print-desc option should also print the recipe's parent name.

Incorrect contents in the raft JSON output file.

This issue occurs randomly while running any recipe.
It reproduces even during basic_ctl_int, which is the root recipe.

After copying the get_all cmd to the server, the output file contains only
ctl_svc_nodes information and does not have the raft_root_entry info for the server.

pmdb_client_error_demonstration1.yml is running raft_[server|client] instead of pumicedb-[client|server]-test

It proceeds fine until it attempts to check the ctl-interface output of the client, which is incorrect since raft_client and pumicedb-client-test have different JSON objects.

Holon log:
server-ctl-interface-data.zip

Server ctl-interface data:
882071e6-0efe-11eb-8a2e-90324b2d1e89.log

Client ctl-interface data:
e23cfaf0-0efe-11eb-8122-90324b2d1e89.zip

TASK [Verify the parameters for client.] ******************************************************************************************************************************
fatal: [localhost]: FAILED! => {"ansible_facts": {"client_values": {"/0/last-request-ack": "null", "/0/last-request-sent": "null", "/0/leader-uuid": "null", "/0/leader-viable": "null", "/0/raft-uuid": "null", "/0/state": "null"}}, "changed": false, "failed_when_result": true}

PLAY RECAP ************************************************************************************************************************************************************
localhost                  : ok=179  changed=13   unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
