
Common components for SONiC switch state service

License: Other

Makefile 1.08% Shell 0.44% C++ 86.69% M4 1.29% Lua 1.15% Python 4.93% C 0.61% SWIG 1.25% Objective-C 2.10% Dockerfile 0.10% Starlark 0.28% Go 0.09%

sonic-swss-common's Introduction


SONiC - SWitch State Service Common Library - SWSS-COMMON

Description

The SWitch State Service (SWSS) common library provides libraries for database communications, netlink wrappers, and other functions needed by SWSS.

Getting Started

Build from Source

Checkout the source:

git clone --recursive https://github.com/sonic-net/sonic-swss-common

Install build dependencies:

sudo apt-get install make libtool m4 autoconf dh-exec debhelper cmake pkg-config \
                     libhiredis-dev libnl-3-dev libnl-genl-3-dev libnl-route-3-dev \
                     libnl-nf-3-dev swig3.0 libpython2.7-dev libpython3-dev \
                     libgtest-dev libgmock-dev libboost-dev

Build and Install Google Test and Mock from DEB source packages:

cd /usr/src/gtest && sudo cmake . && sudo make

You can compile and install from source using:

./autogen.sh
./configure
make && sudo make install

You can also build a debian package using:

./autogen.sh
./configure
dpkg-buildpackage -us -uc -b

Build with Google Test

  1. Rebuild with Google Test:
./autogen.sh
./configure --enable-debug 'CXXFLAGS=-O0 -g'
make clean
GCC_COLORS=1 make
  2. Enable keyspace notifications and start the Redis server if it is not already running:
sudo sed -i 's/notify-keyspace-events ""/notify-keyspace-events AKE/' /etc/redis/redis.conf
sudo service redis-server start
  3. Run the unit tests:
tests/tests

Need Help?

For general questions, setup help, or troubleshooting:

For bug reports or feature requests, please open an Issue.

Contribution guide

Please read the contributors guide for information about how to contribute.

All contributors must sign an Individual Contributor License Agreement (ICLA) before contributions can be accepted. This process is managed by the Linux Foundation - EasyCLA and automated via a GitHub bot. If the contributor has not yet signed a CLA, the bot will create a comment on the pull request containing a link to electronically sign the CLA.

GitHub Workflow

We're following basic GitHub Flow. If you have no idea what we're talking about, check out GitHub's official guide. Note that merge is only performed by the repository maintainer.

Guide for performing commits:

  • Isolate each commit to one component/bugfix/issue/feature
  • Use a standard commit message format:
[component/folder touched]: Description intent of your changes

[List of changes]

Signed-off-by: Your Name <your-email-address>

For example:

swss-common: Stabilize the ConsumerTable

* Fixing autoreconf
* Fixing unit-tests by adding checkers and initialize the DB before start
* Adding the ability to select from multiple channels
* Health-Monitor - The idea of the patch is that if something went wrong with the notification channel,
  we will have the option to know about it (Query the LLEN table length).

  Signed-off-by: [email protected]
  • Each developer should fork this repository and add the team as a Contributor
  • Push your changes to your private fork and do "pull-request" to this repository
  • Use a pull request to do code review
  • Use issues to keep track of what is going on

sonic-swss-common's People

Contributors

bocon13, daall, dzhangalibaba, eladraz, ganglyu, jimmyzhai, jipanyang, jleveque, judyjoseph, junchao-mellanox, kcudnik, lguohan, liuh-80, liushilongbuaa, marian-pritsak, mint570, oleksandrivantsiv, pavel-shirshov, prsunny, pterosaur, qiluo-msft, renukamanavalan, saiarcot895, sihuihan88, stcheng, stepanblyschak, stephenxs, wendani, xumia, zbud-msft


sonic-swss-common's Issues

DVS runtime error

When I used the following command to run a DVS test for branch 202205, I got a lot of Python exceptions.
sudo pytest -sv --imgname docker-sonic-vs-202205 --html=report.html --self-contained-html

The exceptions all look like:

test_drop_counters.py::TestDropCounters::test_deviceCapabilitiesTablePopulated remove extra link dummy
remove extra link Vlan100@Bridge
Exception ignored in: <bound method ApplDbValidator.__del__ of <conftest.ApplDbValidator object at 0x7f5e29cd1278>>
Traceback (most recent call last):
  File "/home/autotest/wayne/202205/tests/conftest.py", line 165, in __del__
    neighbors = self.get_keys(self.NEIGH_TABLE)
  File "/home/autotest/wayne/202205/tests/dvslib/dvs_database.py", line 115, in get_keys
    table = swsscommon.Table(self.db_connection, table_name)
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2515, in __init__
    this = _swsscommon.new_Table(*args)
RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address

After RedisReply::release() is called, RedisReply::~RedisReply() may dump core with redis version 3.2.11

RedisReply::release() sets m_reply = NULL.
RedisReply::~RedisReply() then calls freeReplyObject(m_reply) directly, without checking for NULL.

In redis before Wed Dec 21 12:11:56 2016, void freeReplyObject(void *reply) accesses r->type without any NULL check, so it dumps core when reply is NULL.

I have checked void freeReplyObject(void *reply) in redis version 3.2.11: it dumps core if reply is NULL. Redis version 4.0.6 does not.
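
A minimal sketch of the guard this report asks for, written against stand-in types so that it compiles without hiredis. FakeReply, legacyFreeReplyObject and GuardedReply are hypothetical names invented for illustration. They only mimic the shape of redisReply, an old freeReplyObject() and RedisReply, and are not the library's code.

```cpp
#include <cassert>

// Stand-in for hiredis' redisReply; only the field the sketch touches.
struct FakeReply
{
    int type;
};

// Counts how many replies were actually freed, for illustration only.
static int g_freedReplies = 0;

// Mimics an old freeReplyObject(): it reads r->type unconditionally, so a
// NULL argument would crash. The assert stands in for that crash.
void legacyFreeReplyObject(FakeReply *r)
{
    assert(r != nullptr);
    (void)r->type;
    ++g_freedReplies;
    delete r;
}

// Hypothetical owner class shaped like RedisReply.
class GuardedReply
{
public:
    explicit GuardedReply(FakeReply *reply) : m_reply(reply) {}
    GuardedReply(const GuardedReply &) = delete;
    GuardedReply &operator=(const GuardedReply &) = delete;

    // Hand ownership to the caller; the destructor must then free nothing.
    FakeReply *release()
    {
        FakeReply *reply = m_reply;
        m_reply = nullptr;
        return reply;
    }

    ~GuardedReply()
    {
        if (m_reply != nullptr) // the NULL check missing from the destructor
        {
            legacyFreeReplyObject(m_reply);
        }
    }

private:
    FakeReply *m_reply;
};
```

With the check in place, destroying an object after release() never reaches the free function, so the behavior no longer depends on whether the underlying free function tolerates NULL.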

defaultvalueprovider_ut fails on address sanitizer

Enable address sanitizer in unit test.
Test output:

[----------] 4 tests from DECORATOR
[ RUN ] DECORATOR.ChoiceAndLeaflistDefaultValue
AddressSanitizer:DEADLYSIGNAL

==13684==ERROR: AddressSanitizer: SEGV on unknown address 0x000000003232 (pc 0x7f6f2bbdead8 bp 0x7ffc8ec81c60 sp 0x7ffc8ec81408 T0)
==13684==The signal is caused by a READ memory access.
#0 0x7f6f2bbdead8 (/lib/x86_64-linux-gnu/libc.so.6+0x172ad8)
#1 0x562b574e620c in __interceptor_strlen.part.0 (/tmp/sonic/tests/.libs/tests_asan+0x52a20c)
#2 0x7f6f2bedd057 in std::__cxx11::basic_string<char, std::char_traits, std::allocator >::basic_string(char const*, std::allocator const&) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x136057)
#3 0x562b5795f135 in swss::JSon::buildJson[abi:cxx11](char const**) ../common/json.cpp:34
#4 0x562b57942c8b in swss::DefaultValueHelper::GetDefaultValueInfoForLeaflist(lys_node_leaflist*, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >) ../common/defaultvalueprovider.cpp:228
#5 0x562b579432a7 in swss::DefaultValueHelper::GetDefaultValueInfoabi:cxx11 ../common/defaultvalueprovider.cpp:264
#6 0x562b57943501 in swss::DefaultValueHelper::BuildTableDefaultValueMapping(lys_node*, std::map<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int>, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int> >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int> const, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > > > >&) ../common/defaultvalueprovider.cpp:293
#7 0x562b57943736 in swss::DefaultValueProvider::AppendTableInfoToMapping(lys_node*) ../common/defaultvalueprovider.cpp:306
#8 0x562b579454f1 in swss::DefaultValueProvider::LoadModule(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, ly_ctx*) ../common/defaultvalueprovider.cpp:487
#9 0x562b57944f46 in swss::DefaultValueProvider::Initialize(char const*) ../common/defaultvalueprovider.cpp:445
#10 0x562b577b5e0a in MockDefaultValueProvider::MockInitialize(char const*) /tmp/sonic/tests/defaultvalueprovider_ut.cpp:15
#11 0x562b577af783 in DECORATOR_ChoiceAndLeaflistDefaultValue_Test::TestBody() /tmp/sonic/tests/defaultvalueprovider_ut.cpp:22
#12 0x562b57a24ad6 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) (/tmp/sonic/tests/.libs/tests_asan+0xa68ad6)
#13 0x562b57a1ab4d in testing::Test::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5eb4d)
#14 0x562b57a1aca4 in testing::TestInfo::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5eca4)
#15 0x562b57a1b138 in testing::TestSuite::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5f138)
#16 0x562b57a1b781 in testing::internal::UnitTestImpl::RunAllTests() (/tmp/sonic/tests/.libs/tests_asan+0xa5f781)
#17 0x562b57a25046 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) (/tmp/sonic/tests/.libs/tests_asan+0xa69046)
#18 0x562b57a1ad67 in testing::UnitTest::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5ed67)
#19 0x562b578b4e4f in RUN_ALL_TESTS() /usr/include/gtest/gtest.h:2486
#20 0x562b578b4921 in main /tmp/sonic/tests/main.cpp:83
#21 0x7f6f2ba8fd09 in __libc_start_main ../csu/libc-start.c:308
#22 0x562b574cd319 in _start (/tmp/sonic/tests/.libs/tests_asan+0x511319)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x172ad8)
==13684==ABORTING

[Question] Why can't spaces be used when checking if a redis key exists?

When checking if a redis entry exists the key isn't allowed to have any spaces:
https://github.com/Azure/sonic-swss-common/blob/master/common/dbconnector.cpp#L609-L613

However, that restriction doesn't seem to be enforced when creating (i.e. SET) or reading keys (i.e. KEYS). I was able to create an entry like:

VRF_TABLE:"vrf 80"

Is there a technical reason for this? I guess we can use KEYS to check for existence, but the O(1) time complexity of exists is much nicer.

Continuous ProducerStateTable del/set call may leave stale FV in table data

One scenario:

Existing table data -- Key1 : f1/v1, f2/v2

Two ProducerStateTable operations on key1, del then set:

del --- key1

set --- key1: f1/v1, f3/v3

ProducerStateTable uses SADD to store key1, so the del and the set may be combined if the ConsumerStateTable is not quick enough to pick up the del before the set arrives. What is left in the redis DB will be:

key1: f1/v1, f2/v2, f3/v3

While the correct one should be key1: f1/v1, f3/v3

The merging of del and set may also trigger other unexpected behavior for users of ConsumerStateTable.
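
The scenario can be reproduced with a simplified in-memory model. This is a sketch under stated assumptions, not the library's implementation. applyMergeOnly and applyDelAware are hypothetical helpers: the first writes the pending fields over whatever is stored, and the second assumes the consumer is somehow told that a del was coalesced into the same pop.

```cpp
#include <map>
#include <string>

using FieldValues = std::map<std::string, std::string>;

// Merge-only consumer: writes the pending fields over the stored ones.
// A field that was removed by the coalesced del is never cleared.
FieldValues applyMergeOnly(FieldValues stored, const FieldValues &pending)
{
    for (const auto &fv : pending)
    {
        stored[fv.first] = fv.second;
    }
    return stored;
}

// Del-aware consumer (hypothetical): if a del was coalesced into this pop,
// clear the stored fields before applying the pending ones.
FieldValues applyDelAware(FieldValues stored, const FieldValues &pending, bool delSeen)
{
    if (delSeen)
    {
        stored.clear();
    }
    for (const auto &fv : pending)
    {
        stored[fv.first] = fv.second;
    }
    return stored;
}
```

Starting from key1: f1/v1, f2/v2 and applying the pending set f1/v1, f3/v3, the merge-only path ends with three fields (the stale f2 survives), while the del-aware path ends with f1 and f3 only.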

Both PFC_WD_DB and CONFIG_DB use db 4

Is this the intended design?

schema.h:
9 #define LOGLEVEL_DB 3
10: #define PFC_WD_DB 4

database.json
23: "CONFIG_DB": {
24 "db": 4
25 }

configdb-load.sh
16: echo -en "SELECT 4\nSET CONFIG_DB_INITIALIZED true" | redis-cli

ConfigDBPipeConnector_Native_get_config fails if CONFIG_DB_UPDATED string key exists in CONFIG_DB

summary

"CONFIG_DB_UPDATED" string key is inserted in CONFIG_DB after changing settings in RESTCONF. ( https://github.com/Azure/sonic-mgmt-common/blob/master/translib/db/db.go#L1421 )
If this key is present, ConfigDBPipeConnector_Native_get_config fails.

procedure for reproducing

  • environment
admin@sonic:~$ show version

SONiC Software Version: SONiC.202012.0-08307385
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 08307385
Build date: Wed Mar 10 03:10:26 UTC 2021
Built by: tetsuji@kf1-AF13sv001

Platform: x86_64-dellemc_s5248f_c3538-r0
HwSKU: DellEMC-S5248f-P-25G
ASIC: broadcom
ASIC Count: 1
...<snip>
  • update config with RESTCONF (an example)
$ curl -s -X POST -H 'Content-Type:application/yang-data+json' -H 'Accept:application/yang-data+json' --insecure  'https://<sonic ip address>/restconf/data/sonic-interface:sonic-interface/INTERFACE' -d '{"sonic-interface:INTERFACE_LIST": [{"portname": "Ethernet1"}], "sonic-interface:INTERFACE_IPADDR_LIST": [{"portname": "Ethernet1","ip_prefix": "172.30.128.2/31"}]}'
  • sonic-cfggen fails in _swsscommon.ConfigDBPipeConnector_Native_get_config(self)
admin@sonic:~$ sonic-cfggen -d
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 432, in <module>
    main()
  File "/usr/local/bin/sonic-cfggen", line 361, in main
    deep_update(data, FormatConverter.db_to_output(configdb.get_config()))
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2087, in get_config
    data = super(ConfigDBConnector, self).get_config()
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2117, in get_config
    return _swsscommon.ConfigDBPipeConnector_Native_get_config(self)
RuntimeError: Got unexpected result: Input/output error
  • Delete the string keys "CONFIG_DB_UPDATED_*"; sonic-cfggen then works again
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> KEYS CONFIG_*
1) "CONFIG_DB_UPDATED_INTERFACE"
2) "CONFIG_DB_INITIALIZED"
3) "CONFIG_DB_UPDATED"

127.0.0.1:6379[4]> TYPE CONFIG_DB_UPDATED_INTERFACE
string
127.0.0.1:6379[4]> HGETALL CONFIG_DB_UPDATED_INTERFACE
(error) WRONGTYPE Operation against a key holding the wrong kind of value
127.0.0.1:6379[4]> GET CONFIG_DB_UPDATED_INTERFACE
"1"

127.0.0.1:6379[4]> TYPE CONFIG_DB_UPDATED
string
127.0.0.1:6379[4]> HGETALL CONFIG_DB_UPDATED
(error) WRONGTYPE Operation against a key holding the wrong kind of value
127.0.0.1:6379[4]> GET CONFIG_DB_UPDATED
"1"

127.0.0.1:6379[4]> DEL CONFIG_DB_UPDATED_INTERFACE
(integer) 1
127.0.0.1:6379[4]> DEL CONFIG_DB_UPDATED
(integer) 1
127.0.0.1:6379[4]> KEYS CONFIG_*
1) "CONFIG_DB_INITIALIZED"
127.0.0.1:6379[4]> exit

admin@sonic:~$ sonic-cfggen -d --print-data | more
{
    "CRM": {
        "Config": {
            "acl_counter_high_threshold": "85",
...<snip>

swss loses netlink messages

When the kernel sends a large number of netlink messages, swss cannot transform them and save them into the DB fast enough.
Currently we can increase the netlink receive buffer, but that cannot prevent us from losing some messages when there is a spike of messages. We need better netlink message management code.

swss::exec return value and message has nothing to do with actual command return value

Consider the following log:

Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 ERR swss#intfmgrd: :- exec: /sbin/ip address add 10.106.116.191/31 dev PortChannel30: Success
Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 DEBUG swss#intfmgrd: :- exec: /sbin/ip address add 10.106.116.191/31 dev PortChannel30 : 
Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 ERR swss#intfmgrd: :- setIntfIp: Command '/sbin/ip address add 10.106.116.191/31 dev PortChannel30' failed with rc 512

exec prints ': Success' because that is what strerror(errno) returns; however, errno is set per process, so the errno of the calling process is not the errno of the failed command.
Also, the return code (512 in this case) is ambiguous, because it is outside the range of exit codes returned by shell commands.
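
One possible reading of the 512 is that it is a raw wait status, as returned by pclose() or system() on Linux, rather than an exit code. Whether swss::exec obtains its value that way is an assumption here. Under that assumption, a sketch of decoding the status looks like this; runCommand is a hypothetical helper, not the swss::exec implementation.

```cpp
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Convert a wait status (as returned by pclose() or system()) into the
// child's exit code. Returns -1 if the child did not exit normally.
int exitCodeFromWaitStatus(int status)
{
    if (WIFEXITED(status))
    {
        return WEXITSTATUS(status);
    }
    return -1;
}

// Hypothetical helper: run a shell command, capture its stdout, and
// return the decoded exit code (or -1 on failure to run or reap it).
int runCommand(const std::string &cmd, std::string &output)
{
    output.clear();
    FILE *pipe = popen(cmd.c_str(), "r");
    if (pipe == nullptr)
    {
        return -1; // only here does errno describe the failure
    }
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), pipe) != nullptr)
    {
        output += buffer;
    }
    int status = pclose(pipe);
    if (status == -1)
    {
        return -1;
    }
    return exitCodeFromWaitStatus(status);
}
```

On Linux, exitCodeFromWaitStatus(512) yields 2, which is an ordinary shell exit code. The child's own error text has to come from its output streams, since strerror(errno) in the parent says nothing about why the child failed.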

Can we remove unwanted tables entries from schema.h

Hi,

In sonic-buildimage/src/sonic-swss-common/common/schema.h,
there is a comment "/***** TO BE REMOVED *****/" under the APP_DB table names.
Are these not required at all? Or are they here for some historical reasons?

After the migration of interface-related details from APP_DB to CFG_DB, it seems a few tables are no longer required.

Can someone take up the task of cleaning up this file?
I am not expert enough to provide more input.

Snippet from common/schema.h

/***** TO BE REMOVED *****/

#define APP_TC_TO_QUEUE_MAP_TABLE_NAME "TC_TO_QUEUE_MAP_TABLE"
#define APP_SCHEDULER_TABLE_NAME "SCHEDULER_TABLE"
#define APP_DSCP_TO_TC_MAP_TABLE_NAME "DSCP_TO_TC_MAP_TABLE"
#define APP_QUEUE_TABLE_NAME "QUEUE_TABLE"
#define APP_PORT_QOS_MAP_TABLE_NAME "PORT_QOS_MAP_TABLE"
#define APP_WRED_PROFILE_TABLE_NAME "WRED_PROFILE_TABLE"
#define APP_TC_TO_PRIORITY_GROUP_MAP_NAME "TC_TO_PRIORITY_GROUP_MAP_TABLE"
#define APP_PFC_PRIORITY_TO_PRIORITY_GROUP_MAP_NAME "PFC_PRIORITY_TO_PRIORITY_GROUP_MAP_TABLE"
#define APP_PFC_PRIORITY_TO_QUEUE_MAP_NAME "MAP_PFC_PRIORITY_TO_QUEUE"

#define APP_BUFFER_POOL_TABLE_NAME "BUFFER_POOL_TABLE"
#define APP_BUFFER_PROFILE_TABLE_NAME "BUFFER_PROFILE_TABLE"
#define APP_BUFFER_QUEUE_TABLE_NAME "BUFFER_QUEUE_TABLE"
#define APP_BUFFER_PG_TABLE_NAME "BUFFER_PG_TABLE"
#define APP_BUFFER_PORT_INGRESS_PROFILE_LIST_NAME "BUFFER_PORT_INGRESS_PROFILE_LIST"
#define APP_BUFFER_PORT_EGRESS_PROFILE_LIST_NAME "BUFFER_PORT_EGRESS_PROFILE_LIST"

configDB table name definition is missing in schema.h

The sonic-cfggen python code hard coded a lot of configDB table names, but none of them have been specified in schema.h.

As the central place for storing key information about db and table names that all modules will refer to, schema.h should be kept up to date.

PR commit check failure: testslibsairedis test failed

For PR #852

Azure.sonic-swss-common Failing after 41m — Build #20240221.2 failed

[ RUN      ] libsairedis.ars
[       OK ] libsairedis.ars (1 ms)
[ RUN      ] libsairedis.ars_profile
[       OK ] libsairedis.ars_profile (0 ms)
[----------] 74 tests from libsairedis (11 ms total)

[----------] Global test environment tear-down
Assertion failed: pfd.revents & POLLIN (src/signaler.cpp:265)
/bin/bash: line 6: 24417 Aborted                 (core dumped) ${dir}$tst
FAIL: testslibsairedis
[==========] Running 79 tests from 16 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from Switch

sonic-db-cli socket option not working when using PING

Problem:
When running sonic-db-cli -s PING, the tool does not actually use the Unix socket option.

I debugged it and it's triggering this function:
src/sonic-swss-common/common/dbconnector.cpp
void RedisContext::initContext(const char *host, int port, const timeval *tv)
instead of the overload with the Unix socket signature:
void RedisContext::initContext(const char *path, const timeval *tv)

The root cause looks related to this PR:
https://github.com/sonic-net/sonic-swss-common/pull/607/files
file:sonic-db-cli/sonic-db-cli.cpp
lines 274 - 291

More details:
the usage of the tool shows in the example that this command should be supported
admin@mysetup:~$ sudo sonic-db-cli --help
usage: sonic-db-cli [-h] [-s] [-n NAMESPACE] db_or_op [cmd [cmd ...]]

SONiC DB CLI:

positional arguments:
db_or_op Database name Or Unary operation(only PING/SAVE/FLUSHALL supported)
cmd Command to execute in database

optional arguments:
-h, --help show this help message and exit
-s, --unixsocket Override use of tcp_port and use unixsocket
-n NAMESPACE, --namespace NAMESPACE
Namespace string to use asic0/asic1.../asicn

sudo needed for commands accesing a different namespace [-n], or using unixsocket connection [-s]

Example 1: sonic-db-cli -n asic0 CONFIG_DB keys *
Example 2: sonic-db-cli -n asic2 APPL_DB HGETALL VLAN_TABLE:Vlan10
Example 3: sonic-db-cli APPL_DB HGET VLAN_TABLE:Vlan10 mtu
Example 4: sonic-db-cli -n asic3 APPL_DB EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 k1 k2 v1 v2
Example 5: sonic-db-cli PING | sonic-db-cli -s PING
Example 6: sonic-db-cli SAVE | sonic-db-cli -s SAVE
Example 7: sonic-db-cli FLUSHALL | sonic-db-cli -s FLUSHALL

SubscriberStateTable::hasCachedData() return false with data

SubscriberStateTable::hasCachedData() returns false even when m_buffer has an entry.

swss::SubscriberStateTable tbl(m_configDb, "CONTAINER_FEATURE");
std::deque<KeyOpFieldsValuesTuple> entries;

SWSS_LOG_ERROR("Has cached data: %s", tbl.hasCachedData() ? "yes" : "no");
tbl.pops(entries);
for (auto& entry: entries) {
    string key = kfvKey(entry);
    SWSS_LOG_ERROR("Key read: %s", key.c_str());
}

When you run the above, the log statement prints "Has cached data: no", but the subsequent for loop prints one key, matching what is in the DB.

message queue in producertable/consumertable has lots of intermediate states

During the ECMP route convergence period, a route is added to the message queue every time its next-hop group (nhg) changes, e.g.,

    10.0.0.1/24 -> 1.1.1.1,2.2.2.2
    10.0.0.1/24 -> 1.1.1.1,2.2.2.2,3.3.3.3
    10.0.0.1/24 -> 1.1.1.1,2.2.2.2,3.3.3.3,4.4.4.4

All these messages should be consolidated into one message so that swss does not have to push intermediate states to the ASIC.

The current message queue does not consolidate messages with the same key. The idea is to use a set instead of a list to hold the message queue.

The producer's set(key, field, value) puts the state into the app DB and adds the key to the set using SADD. del(key) removes the key from the DB and also adds the key to the set.

The consumer uses SPOP to read a key from the set and HGETALL to get all the fields and values for that key. The SPOP and HGETALL must form one atomic operation, so we need a Lua script to achieve this.

pop.lua

local key = redis.call("SPOP", KEYS[1])
if not key then return end
local key2 = redis.call("HGETALL", key)
return {key, key2}

We also need to upgrade redis to 3.2.4.
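
The effect of the proposal can be illustrated with an in-memory model, assuming a std::map stands in for the app DB hashes and a std::set for the Redis set. CoalescingQueue and its methods are hypothetical names for this sketch and do not match the swss-common API.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// In-memory model of the set-based queue described above.
class CoalescingQueue
{
public:
    // Producer set(): store the latest state and mark the key pending (SADD).
    void set(const std::string &key, const std::string &nexthops)
    {
        m_state[key] = nexthops;
        m_pending.insert(key); // inserting an already pending key is a no-op
    }

    // Producer del(): drop the state, but still mark the key pending.
    void del(const std::string &key)
    {
        m_state.erase(key);
        m_pending.insert(key);
    }

    // Consumer pop, like pop.lua: take one pending key and read its current
    // state in a single step. Returns false when nothing is pending;
    // 'exists' is false when the key has been deleted.
    bool pop(std::string &key, std::string &nexthops, bool &exists)
    {
        if (m_pending.empty())
        {
            return false;
        }
        auto it = m_pending.begin(); // SPOP picks an arbitrary member; this takes the first
        key = *it;
        m_pending.erase(it);
        auto state = m_state.find(key);
        exists = (state != m_state.end());
        nexthops = exists ? state->second : std::string();
        return true;
    }

    std::size_t pendingCount() const { return m_pending.size(); }

private:
    std::map<std::string, std::string> m_state; // key -> latest value
    std::set<std::string> m_pending;            // keys waiting for the consumer
};
```

Three successive set() calls for 10.0.0.1/24 leave a single pending key, and the one pop() that follows returns only the final next-hop list, so the intermediate states never reach the consumer.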

SubscribeStateTable doesn't signal to Selector about new data

Sometimes select doesn't return the sst (SubscriberStateTable) even when the sst is ready.

I found this while analyzing the bgpcfgd issue "some bgp sessions are still being down, after config bgp startup all was used": bgpcfgd doesn't receive the state from the DB after it was updated by the config startup all utility.

make database config optional

This PR made the existence of the database config mandatory.

To preserve previous behavior, I think it's better to make the config optional.
Is it possible to change the implementation to use DEFAULT_UNIXSOCKET if the database config doesn't exist?

There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot.

There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot.
It happens when hitting this line: https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-host-services/scripts/hostcfgd#L1241

Example:

>>> from swsscommon import swsscommon
>>> c=swsscommon.ConfigDBConnector()
>>> c.subscribe('A', lambda a: None)
>>> c.connect()
>>> c.listen()


^C^C^C^C

@qiluo-msft This looks like a swss-common issue.

This wasn't seen for some reason during 500 warm boot tests on 202012, since warm reboot is executed early enough, before hostcfgd gets stuck in listen().

Originally posted by @stepanblyschak in sonic-net/sonic-buildimage#10510 (comment)

[Question] Why use HSET instead of HMSET in consumer_state_table_pops.lua?

https://github.com/Azure/sonic-swss-common/blob/11db9ba6403b92effd060bea33b4ca9812465f60/common/consumer_state_table_pops.lua#L18

When multiple fields need to be modified, this process is not atomic. I think it may cause timing problems.
For example, when adding lag, if the fields such as mtu, status, and speed are written a little slowly, a service that subscribes to the keyspace event may not get the values of the related fields.

So, why not use "HMSET"?

Thread race issue between notifications and SAI function calls

ERR orchagent: :- checkReplyType: Expected to get redis type 5 got type 3, err: NON-STRING-REPLY
ERR orchagent: :- checkReplyType: Expected to get redis type 3 got type 5, err: NON-STRING-REPLY
ERR orchagent: :- guard: RedisReply catches system_error: command: DEL FDB_TABLE:Vlan2:72-06-00-01-01-7E, reason: Wrong expected type of result: Input/output error
ERR orchagent: :- guard: RedisReply catches system_error: command: *4#015#012$5#015#012HMSET#015#012$21#015#012PORT_TABLE:Ethernet16#015#012$11#015#012oper_status#015#012$2#015#012up#015#012, reason: Wrong expected type of result: Input/output error
INFO supervisord: orchagent terminate called after throwing an instance of 'std::system_error'
INFO supervisord: orchagent   what():  Wrong expected type of result: Input/output error

According to the logs, it seems that the reply order is not correct which causes crash.

remove the rate limitation when swssloglevel to Debug mode

In debug mode, it is easy to trigger log drops due to the log rate limit. I have to modify the rate limit in rsyslog and related files. Is it possible to automatically remove the limit when swssloglevel is set to Debug?

CI failed due to UT tests failed(ZMQ related logs)

The CI is unstable and rarely passes the build stage, due to timeouts and the following logs:

[ RUN      ] BinarySerializer.serialize_overflow
 ERROR:- setData: There are not enough buffer for binary serializer to serialize,                             key count: 1, data length 15, buffer size: 50
[       OK ] BinarySerializer.serialize_overflow (0 ms)
[ RUN      ] BinarySerializer.deserialize_overflow
 ERROR:- deserializeBuffer: serialized value data was truncated,, value length: 11 increase buffer size: 113


***

[ RUN      ] ZmqConsumerStateTable.test

***

# a lot of logs. No new test is started

 DEBUG:> select: enter
 DEBUG:< select: exit
 DEBUG:> select: enter
##[error]The operation was canceled.

For example, for PR #777 only 2 of 8 CI runs passed the build stage(Once before ZMQ was merged)

String Equality tests in tests/main.cpp

Currently in tests/main.cpp, we are testing corner cases with a few SonicDBConfig functions, and we test that the string from the exception that is caught, matches an expected string.

https://github.com/sonic-net/sonic-swss-common/blob/master/tests/main.cpp#L71

However, this is not robust. A better practice would be to test which exception type is thrown, because exception strings can easily change, which breaks the test or forces changes in multiple locations.

https://github.com/sonic-net/sonic-swss-common/blob/master/common/dbconnector.cpp#L248

Here we see an out_of_range exception thrown; we can use EXPECT_THROW instead to test that the same exception is thrown in our UT.
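
In Google Test the check would be written as EXPECT_THROW(statement, std::out_of_range). To keep the example free of the gtest dependency, the sketch below shows the same idea in plain C++. lookupDbId is a hypothetical stand-in for a SonicDBConfig lookup and throwsType is an illustrative helper; neither exists in the repository.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a config lookup that throws on an unknown name.
int lookupDbId(const std::map<std::string, int> &dbs, const std::string &name)
{
    return dbs.at(name); // std::map::at throws std::out_of_range if missing
}

// Returns true only if fn() throws an exception of type Expected (or a
// type derived from it). The message text is never inspected.
template <typename Expected, typename Fn>
bool throwsType(Fn &&fn)
{
    try
    {
        fn();
    }
    catch (const Expected &)
    {
        return true;
    }
    catch (...)
    {
        return false;
    }
    return false; // nothing was thrown
}
```

A test built this way keeps passing when the wording of the exception message changes, and fails only if the exception type changes or no exception is thrown.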

Questions on selectable priorities support in select.cpp

  • It seems epoll does not sort events by fd. How do we guarantee that events of high-priority selectables are handled first?
        ret = ::epoll_wait(m_epoll_fd, events.data(), sz_selectables, timeout);
  • It seems that we need to call updateLastUsedTime on all used selectables. Why not update them in the for block? As written, only the first item in the set m_ready is updated, in the if block.
int Select::poll_descriptors(Selectable **c, unsigned int timeout)
{
    int sz_selectables = static_cast<int>(m_objects.size());
    std::vector<struct epoll_event> events(sz_selectables);
    int ret;

    do
    {
        ret = ::epoll_wait(m_epoll_fd, events.data(), sz_selectables, timeout);
    }
    while(ret == -1 && errno == EINTR); // Retry the select if the process was interrupted by a signal

    if (ret < 0)
        return Select::ERROR;

    for (int i = 0; i < ret; ++i)
    {
        int fd = events[i].data.fd;
        Selectable* sel = m_objects[fd];
        sel->readData();
        m_ready.insert(sel);  <---- why not update updateLastUsedTime here?
    }

    if (!m_ready.empty())
    {
        auto sel = *m_ready.begin();  <---- only update the first item

        *c = sel;

        m_ready.erase(sel);
        // we must update clock only when the selector out of the m_ready
        // otherwise we break invariant of the m_ready
        sel->updateLastUsedTime();

        if (sel->hasCachedData())
        {
            // reinsert Selectable back to the m_ready set, when there're more messages in the cache
            m_ready.insert(sel);
        }

        sel->updateAfterRead();

        return Select::OBJECT;
    }

    return Select::TIMEOUT;
}
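
One way to read the quoted code, assuming m_ready is an ordered set whose comparator looks at priority first and last-used time second (the "invariant of the m_ready" comment hints at this, but the comparator is not shown here): the order in which epoll reports events would not matter, because the set re-sorts on insert. The sketch below models that assumption with hypothetical types; FakeSelectable and ReadyOrder are not the library's definitions.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for swss::Selectable.
struct FakeSelectable
{
    int fd;
    int priority;      // assumption: larger value means more urgent
    long lastUsedTime; // smaller value means it has waited longer
};

// Assumed ordering of the ready set: highest priority first, then the
// least recently used, then fd as a tie-breaker.
struct ReadyOrder
{
    bool operator()(const FakeSelectable *a, const FakeSelectable *b) const
    {
        if (a->priority != b->priority)
        {
            return a->priority > b->priority;
        }
        if (a->lastUsedTime != b->lastUsedTime)
        {
            return a->lastUsedTime < b->lastUsedTime;
        }
        return a->fd < b->fd;
    }
};

// Insert the selectables in whatever order epoll reported them and return
// the fd that would be served first, or -1 if none is ready.
int firstServedFd(const std::vector<FakeSelectable *> &epollOrder)
{
    std::set<FakeSelectable *, ReadyOrder> ready(epollOrder.begin(), epollOrder.end());
    return ready.empty() ? -1 : (*ready.begin())->fd;
}
```

Under the same assumption, lastUsedTime is part of the sort key, so changing it while the object is still inside the set would corrupt the ordering. That would explain why the quoted code calls updateLastUsedTime() only after erasing the element, and only for the one element being returned.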

Failure to read netlink notifications

Hi,

I am occasionally seeing below error message. This happens either when I'm restarting all containers (swss, syncd, teamd) or when multiple vlans (around 500 to 1000 vlans) are being configured.

err teamsyncd: :- readData: netlink reports out of memory on reading a netlink socket. High possibility of a lost message

The swss common library currently sets the netlink socket read buffer size to 3 MB. Code is here.

/* Set socket buffer size to 3MB */
nl_socket_set_buffer_size(m_socket, 3145728, 0);

I tried increasing this to 16 MB, but it did not help.
Strangely, portsyncd also uses the same swss common library and I'm not seeing any issues there.

Any pointers on what could be causing this, or how this can be handled?

NetLink socket is in blocking mode

In some cases we have observed that, because the NLMSG_DONE flag was missing from a netlink message, the netlink process got stuck in the "static int recvmsgs(struct nl_sock *sk, struct nl_cb *cb)" function in libnl. Although no data is available for the nl_sock, it blocks in the recvmsg() system call.

Would it be better to change NetLink sockets to non-blocking mode, so that the application can handle erroneous netlink messages smoothly? "nl_socket_set_nonblocking(m_socket);"

(gdb) bt
#0 0x00007fcec4d3559d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fcec630483e in nl_recv (sk=0x2096090, nla=0x7fff2f2b2860, buf=0x7fff2f2b2990, creds=0x7fff2f2b2998) at /sonic/src/libnl3/libnl3/./lib/nl.c:703
#2 0x00007fcec63051a5 in recvmsgs (cb=, sk=) at /sonic/src/libnl3/libnl3/./lib/nl.c:849
#3 nl_recvmsgs_report (sk=0x2096090, cb=0x2094450) at /sonic/src/libnl3/libnl3/./lib/nl.c:1062
#4 0x00007fcec6305459 in nl_recvmsgs (sk=, cb=) at /sonic/src/libnl3/libnl3/./lib/nl.c:1086
#5 0x00007fcec5e797ca in swss::NetLink::readMe (this=) at netlink.cpp:106
#6 0x00007fcec5e6e542 in swss::Select::select (this=this@entry=0x7fff2f2b2bb0, c=c@entry=0x7fff2f2b2b88, fd=fd@entry=0x7fff2f2b2b84, timeout=timeout@entry=999999)
at select.cpp:92
#7 0x000000000040317f in main (argc=, argv=) at teamsyncd.cpp:45

Too many FD events on modifying PORT table in CONFIG_DB

Xcvrd listens to configuration changes here. Steps to reproduce:

  1. Set any field in PORT table of CONFIG_DB as:-
    redis-cli -n 4 hset "PORT|Ethernet0" "tx_power" "-13"

  2. Add a print here to output the 'fvp' values and you find more than 6 events are received by Xcvrd

Jul 22 20:05:09.335091 sonic WARNING pmon#xcvrd[204620]: $$$handle_port_update_even(): {'admin_status': 'down', 'alias': 'Ethernet1/1', 'index': '1', 'lanes': '1,2,3,4,5,6,7,8', 'laser_freq': '193100', 'speed': '400000', 'tx_power': '-13', 'description': '', 'oper_status': 'up', 'mtu': '9100'}
Jul 22 20:05:09.342073 sonic WARNING pmon#xcvrd[204620]: message repeated 6 times: [ $$$handle_port_update_even(): {'admin_status': 'down', 'alias': 'Ethernet1/1', 'index': '1', 'lanes': '1,2,3,4,5,6,7,8', 'laser_freq': '193100', 'speed': '400000', 'tx_power': '-13', 'description': '', 'oper_status': 'up', 'mtu': '9100'}]

Clearly, duplicate events are pushed to Xcvrd, which can impact the performance of datapath initialization, especially during the 400G link bring-up sequence.
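Until the source of the repeats is found, a consumer could filter them out by remembering the last field/value snapshot seen per key and ignoring events that change nothing. The sketch below shows that idea only. EventDeduplicator and FieldValues are hypothetical names; this is not how Xcvrd or swss-common currently behaves, and it does not address why the duplicates are generated.

```cpp
#include <map>
#include <string>

// Field/value snapshot of one table entry (what the log above prints as 'fvp').
using FieldValues = std::map<std::string, std::string>;

// Hypothetical helper: remembers the last snapshot seen per key and
// reports whether a newly received event actually changes anything.
class EventDeduplicator
{
public:
    // Returns true if `fields` differs from the previous snapshot for
    // `key` (or the key has not been seen yet) and records the snapshot.
    // Returns false for an exact repeat.
    bool isNew(const std::string &key, const FieldValues &fields)
    {
        auto it = m_last.find(key);
        if (it != m_last.end() && it->second == fields)
            return false;
        m_last[key] = fields;
        return true;
    }

    // Forget a key, e.g. when the entry is deleted from the table.
    void erase(const std::string &key) { m_last.erase(key); }

private:
    std::map<std::string, FieldValues> m_last;
};
```

A handler would call isNew("Ethernet0", fvp) first and skip further processing when it returns false, so the repeated events in the log above would be handled only once.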

Unit test hangs at SubscriberStateTable

Either instructions for enabling notify-keyspace-events should be provided, or the test case itself should enable keyspace notifications. Otherwise, people may struggle with this hang.

jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ ./tests --gtest_filter=SubscriberStateTable*
Running main() from gtest_main.cc
Note: Google Test filter = SubscriberStateTable*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from SubscriberStateTable
[ RUN ] SubscriberStateTable.set
650:M 16 Mar 23:43:21.450 * DB saved on disk

^C

jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ redis-cli config set notify-keyspace-events KEA
OK

jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ ./tests --gtest_filter=SubscriberStateTable*
Running main() from gtest_main.cc
Note: Google Test filter = SubscriberStateTable*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from SubscriberStateTable
[ RUN ] SubscriberStateTable.set
650:M 16 Mar 23:45:54.168 * DB saved on disk
[ OK ] SubscriberStateTable.set (46 ms)
[ RUN ] SubscriberStateTable.del
650:M 16 Mar 23:45:54.182 * DB saved on disk
[ OK ] SubscriberStateTable.del (14 ms)
[ RUN ] SubscriberStateTable.table_state
650:M 16 Mar 23:45:54.188 * DB saved on disk
++++++++++----------[ OK ] SubscriberStateTable.table_state (1140 ms)
[ RUN ] SubscriberStateTable.one_producer_multiple_subscriber
650:M 16 Mar 23:45:55.337 * DB saved on disk
Starting 64 subscribers on redis

Done.
[ OK ] SubscriberStateTable.one_producer_multiple_subscriber (5169 ms)
[----------] 4 tests from SubscriberStateTable (6369 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (6369 ms total)
[ PASSED ] 4 tests.

swssloglevel -p crashes

Description
swssloglevel -p crashes with the following coredump:
swssloglevel.1585549658.921.core.gz

[New LWP 820]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `swssloglevel -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005594c075c592 in std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=..., __os=...) at /usr/include/c++/6/bits/basic_string.h:5345
5345    /usr/include/c++/6/bits/basic_string.h: No such file or directory.
(gdb) bt
#0  0x00005594c075c592 in std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=..., __os=...) at /usr/include/c++/6/bits/basic_string.h:5345
#1  main (argc=<optimized out>, argv=<optimized out>) at loglevel.cpp:138

The related code

    if (print)
    {
        if (argc != 2)
        {
            exitWithUsage(EXIT_FAILURE, "-p option does not accept other options");
        }

        std::sort(keys.begin(), keys.end());
        for (const auto& key : keys)
        {
            const auto redis_key = std::string(key).append(":").append(key);
            auto level = redisClient.hget(redis_key, DAEMON_LOGLEVEL);
            std::cout << std::left << std::setw(30) << key << *level << std::endl;
            //^This line causes the error
        }
        return (EXIT_SUCCESS);
    }
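The dereference of level looks like the likely culprit: if hget returns a null pointer when the field is missing, *level is undefined behaviour. A minimal sketch of a guarded version follows, assuming hget returns a std::shared_ptr<std::string> that is null for a missing field. formatLogLevelRow is a hypothetical helper name, and the placeholder text is just one possible choice.

```cpp
#include <iomanip>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical helper: format one "component  level" row.
// `level` may be null when the hash field is missing, so print a
// placeholder instead of dereferencing it.
std::string formatLogLevelRow(const std::string &key,
                              const std::shared_ptr<std::string> &level)
{
    std::ostringstream out;
    out << std::left << std::setw(30) << key
        << (level ? *level : std::string("Unknown log level"));
    return out.str();
}
```

In the loop above, the print would then become std::cout << formatLogLevelRow(key, level) << std::endl; so a component without a stored level produces a readable row instead of a crash.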

Output of show version:

admin@r-tigris-04:~$ show version

SONiC Software Version: SONiC.HEAD.233-0b52be3e
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 0b52be3e
Build date: Tue Mar 24 07:50:08 UTC 2020
Built by: johnar@jenkins-worker-7

Platform: x86_64-mlnx_msn3800-r0
HwSKU: Mellanox-SN3800-D112C8
ASIC: mellanox
Serial Number: MT1925X00008
Uptime: 07:31:51 up 11 min,  1 user,  load average: 4.78, 4.42, 2.78

Docker images:
REPOSITORY                    TAG                 IMAGE ID            SIZE
docker-syncd-mlnx             HEAD.233-0b52be3e   b859341cd305        383MB
docker-syncd-mlnx             latest              b859341cd305        383MB
docker-teamd                  HEAD.233-0b52be3e   db902a800d3f        308MB
docker-teamd                  latest              db902a800d3f        308MB
docker-router-advertiser      HEAD.233-0b52be3e   ebad18d3ac3b        283MB
docker-router-advertiser      latest              ebad18d3ac3b        283MB
docker-platform-monitor       HEAD.233-0b52be3e   f9a3eab42069        628MB
docker-platform-monitor       latest              f9a3eab42069        628MB
docker-database               HEAD.233-0b52be3e   4f9cddcb8d89        283MB
docker-database               latest              4f9cddcb8d89        283MB
docker-orchagent              HEAD.233-0b52be3e   9cbc5b395350        326MB
docker-orchagent              latest              9cbc5b395350        326MB
docker-nat                    HEAD.233-0b52be3e   dc43ed837813        309MB
docker-nat                    latest              dc43ed837813        309MB
docker-sonic-telemetry        HEAD.233-0b52be3e   215e87e0755e        345MB
docker-sonic-telemetry        latest              215e87e0755e        345MB
docker-dhcp-relay             HEAD.233-0b52be3e   e5382e0a33e4        293MB
docker-dhcp-relay             latest              e5382e0a33e4        293MB
docker-sonic-mgmt-framework   HEAD.233-0b52be3e   a0473b8878af        420MB
docker-sonic-mgmt-framework   latest              a0473b8878af        420MB
docker-sflow                  HEAD.233-0b52be3e   ab92c2bddce9        308MB
docker-sflow                  latest              ab92c2bddce9        308MB
docker-lldp-sv2               HEAD.233-0b52be3e   2237927672e0        305MB
docker-lldp-sv2               latest              2237927672e0        305MB
docker-snmp-sv2               HEAD.233-0b52be3e   bcb58a78e025        340MB
docker-snmp-sv2               latest              bcb58a78e025        340MB
docker-fpm-frr                HEAD.233-0b52be3e   18544b0dedfa        328MB
docker-fpm-frr                latest              18544b0dedfa        328MB

"swssloglevel -l SAI_LOG_LEVEL_ERROR -s -c switch" or any SAI API component no longer works in Master branch

I noticed that there is no longer any way to set the log level for SAI in master-branch images.
I tried master.333 as well as master.339, and neither works.

admin@str-s6000-acs-8:/var/log/swss$ swssloglevel -p
_SAI_API_ACL Unknown log level
_SAI_API_BFD Unknown log level
_SAI_API_BMTOR Unknown log level
_SAI_API_BRIDGE Unknown log level
_SAI_API_BUFFER Unknown log level
_SAI_API_COUNTER Unknown log level
_SAI_API_DEBUG_COUNTER Unknown log level
_SAI_API_DTEL Unknown log level
_SAI_API_FDB Unknown log level
_SAI_API_HASH Unknown log level
_SAI_API_HOSTIF Unknown log level
_SAI_API_IPMC Unknown log level
_SAI_API_IPMC_GROUP Unknown log level
_SAI_API_ISOLATION_GROUP Unknown log level
_SAI_API_L2MC Unknown log level
_SAI_API_L2MC_GROUP Unknown log level
_SAI_API_LAG Unknown log level
_SAI_API_MACSEC Unknown log level
_SAI_API_MCAST_FDB Unknown log level
_SAI_API_MIRROR Unknown log level
_SAI_API_MPLS Unknown log level
_SAI_API_NAT Unknown log level
_SAI_API_NEIGHBOR Unknown log level
_SAI_API_NEXT_HOP Unknown log level
_SAI_API_NEXT_HOP_GROUP Unknown log level
_SAI_API_POLICER Unknown log level
_SAI_API_PORT Unknown log level
_SAI_API_QOS_MAP Unknown log level
_SAI_API_QUEUE Unknown log level
_SAI_API_ROUTE Unknown log level
_SAI_API_ROUTER_INTERFACE Unknown log level
_SAI_API_RPF_GROUP Unknown log level
_SAI_API_SAMPLEPACKET Unknown log level
_SAI_API_SCHEDULER Unknown log level
_SAI_API_SCHEDULER_GROUP Unknown log level
_SAI_API_SEGMENTROUTE Unknown log level
_SAI_API_STP Unknown log level
_SAI_API_SWITCH Unknown log level
_SAI_API_SYSTEM_PORT Unknown log level
_SAI_API_TAM Unknown log level
_SAI_API_TUNNEL Unknown log level
_SAI_API_UDF Unknown log level
_SAI_API_VIRTUAL_ROUTER Unknown log level
_SAI_API_VLAN Unknown log level
_SAI_API_WRED Unknown log level
buffermgrd NOTICE
fpmsyncd NOTICE
intfmgrd NOTICE
natmgrd NOTICE
natsyncd NOTICE
nbrmgrd NOTICE
neighsyncd NOTICE
orchagent NOTICE
portmgrd NOTICE
portsyncd NOTICE
syncd NOTICE
teammgrd NOTICE
teamsyncd NOTICE
tlm_teamd NOTICE
vlanmgrd NOTICE
vrfmgrd NOTICE
vxlanmgrd NOTICE
admin@str-s6000-acs-8:/var/log/swss$
admin@str-s6000-acs-8:/var/log/swss$ swssloglevel -l SAI_LOG_LEVEL_ERROR -s -c SWITCH
Component not present in DB

Usage: swssloglevel [OPTIONS]
SONiC logging severity level setting.

Options:
-h print this message
-l loglevel value
-c component name in DB for which loglevel is applied (provided with -l)
-a apply loglevel to all components (provided with -l)
-s apply loglevel for SAI api component (equivalent to adding prefix "SAI_API_" to component)
-p print components registered in DB for which setting can be applied

Examples:
swssloglevel -l NOTICE -c orchagent # set orchagent severity level to NOTICE
swssloglevel -l SAI_LOG_LEVEL_ERROR -s -c SWITCH # set SAI_API_SWITCH severity to ERROR
swssloglevel -l SAI_LOG_LEVEL_DEBUG -s -a # set all SAI_API_* severity to DEBUG
admin@str-s6000-acs-8:/var/log/swss$

admin@str-s6000-acs-8:~$ show vers

SONiC Software Version: SONiC.master.339-31baf381
Distribution: Debian 10.4
Kernel: 4.19.0-6-2-amd64
Build commit: 31baf38
Build date: Tue Jul 7 07:01:13 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-dell_s6000_s1220-r0
HwSKU: Force10-S6000
ASIC: broadcom
Serial Number: 1QBRX42
Uptime: 20:32:35 up 1:09, 1 user, load average: 1.12, 1.09, 1.12
