sonic-net / sonic-swss-common
Common components for SONiC switch state service
License: Other
hostcfgd blocks forever in listen() and does not react to SIGTERM, which delays warm boot.
The hang occurs at this line: https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-host-services/scripts/hostcfgd#L1241
Example:
>>> from swsscommon import swsscommon
>>> c=swsscommon.ConfigDBConnector()
>>> c.subscribe('A', lambda a: None)
>>> c.connect()
>>> c.listen()
^C^C^C^C
@qiluo-msft This looks like a swss-common issue.
This wasn't seen during the 500 warm boot tests on 202012 because warm reboot is executed early enough, before hostcfgd gets stuck in listen().
Originally posted by @stepanblyschak in sonic-net/sonic-buildimage#10510 (comment)
SubscriberStateTable::hasCachedData() returns false even when m_buffer has an entry.
swss::SubscriberStateTable tbl(m_configDb, "CONTAINER_FEATURE");
std::deque<KeyOpFieldsValuesTuple> entries;

SWSS_LOG_ERROR("Has cached data: %s", tbl.hasCachedData() ? "yes" : "no");
tbl.pops(entries);
for (auto& entry : entries)
{
    string key = kfvKey(entry);
    SWSS_LOG_ERROR("Key read: %s", key.c_str());
}
When the above runs, the log statement prints "Has cached data: no", yet the subsequent loop prints one key, matching the DB contents.
@lguohan @marian-pritsak @qiluo-msft
Is this a bug? https://github.com/Azure/sonic-swss-common/blob/master/common/logger.cpp#L231 - after the first iteration the loop always ends; what if there are more values?
It was originally introduced here https://github.com/Azure/sonic-swss-common/pull/81/files#diff-c38571a8d7ef954052d654e8076f65eb02e0ec433b6aacdc9ad0446f8dcae21bR143, but back then the loop had a continue statement.
Enable address sanitizer in unit test.
Test output:
==13684==ERROR: AddressSanitizer: SEGV on unknown address 0x000000003232 (pc 0x7f6f2bbdead8 bp 0x7ffc8ec81c60 sp 0x7ffc8ec81408 T0)
==13684==The signal is caused by a READ memory access.
#0 0x7f6f2bbdead8 (/lib/x86_64-linux-gnu/libc.so.6+0x172ad8)
#1 0x562b574e620c in __interceptor_strlen.part.0 (/tmp/sonic/tests/.libs/tests_asan+0x52a20c)
#2 0x7f6f2bedd057 in std::__cxx11::basic_string<char, std::char_traits, std::allocator >::basic_string(char const*, std::allocator const&) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x136057)
#3 0x562b5795f135 in swss::JSon::buildJson[abi:cxx11](char const**) ../common/json.cpp:34
#4 0x562b57942c8b in swss::DefaultValueHelper::GetDefaultValueInfoForLeaflist(lys_node_leaflist*, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >) ../common/defaultvalueprovider.cpp:228
#5 0x562b579432a7 in swss::DefaultValueHelper::GetDefaultValueInfo[abi:cxx11] ../common/defaultvalueprovider.cpp:264
#6 0x562b57943501 in swss::DefaultValueHelper::BuildTableDefaultValueMapping(lys_node*, std::map<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int>, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int> >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int> const, std::shared_ptr<std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > > > > > >&) ../common/defaultvalueprovider.cpp:293
#7 0x562b57943736 in swss::DefaultValueProvider::AppendTableInfoToMapping(lys_node*) ../common/defaultvalueprovider.cpp:306
#8 0x562b579454f1 in swss::DefaultValueProvider::LoadModule(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, ly_ctx*) ../common/defaultvalueprovider.cpp:487
#9 0x562b57944f46 in swss::DefaultValueProvider::Initialize(char const*) ../common/defaultvalueprovider.cpp:445
#10 0x562b577b5e0a in MockDefaultValueProvider::MockInitialize(char const*) /tmp/sonic/tests/defaultvalueprovider_ut.cpp:15
#11 0x562b577af783 in DECORATOR_ChoiceAndLeaflistDefaultValue_Test::TestBody() /tmp/sonic/tests/defaultvalueprovider_ut.cpp:22
#12 0x562b57a24ad6 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/tmp/sonic/tests/.libs/tests_asan+0xa68ad6)
#13 0x562b57a1ab4d in testing::Test::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5eb4d)
#14 0x562b57a1aca4 in testing::TestInfo::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5eca4)
#15 0x562b57a1b138 in testing::TestSuite::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5f138)
#16 0x562b57a1b781 in testing::internal::UnitTestImpl::RunAllTests() (/tmp/sonic/tests/.libs/tests_asan+0xa5f781)
#17 0x562b57a25046 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/tmp/sonic/tests/.libs/tests_asan+0xa69046)
#18 0x562b57a1ad67 in testing::UnitTest::Run() (/tmp/sonic/tests/.libs/tests_asan+0xa5ed67)
#19 0x562b578b4e4f in RUN_ALL_TESTS() /usr/include/gtest/gtest.h:2486
#20 0x562b578b4921 in main /tmp/sonic/tests/main.cpp:83
#21 0x7f6f2ba8fd09 in __libc_start_main ../csu/libc-start.c:308
#22 0x562b574cd319 in _start (/tmp/sonic/tests/.libs/tests_asan+0x511319)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x172ad8)
==13684==ABORTING
"CONFIG_DB_UPDATED" string key is inserted in CONFIG_DB after changing settings in RESTCONF. ( https://github.com/Azure/sonic-mgmt-common/blob/master/translib/db/db.go#L1421 )
If this key is present, ConfigDBPipeConnector_Native_get_config fails.
admin@sonic:~$ show version
SONiC Software Version: SONiC.202012.0-08307385
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 08307385
Build date: Wed Mar 10 03:10:26 UTC 2021
Built by: tetsuji@kf1-AF13sv001
Platform: x86_64-dellemc_s5248f_c3538-r0
HwSKU: DellEMC-S5248f-P-25G
ASIC: broadcom
ASIC Count: 1
...<snip>
$ curl -s -X POST -H 'Content-Type:application/yang-data+json' -H 'Accept:application/yang-data+json' --insecure 'https://<sonic ip address>/restconf/data/sonic-interface:sonic-interface/INTERFACE' -d '{"sonic-interface:INTERFACE_LIST": [{"portname": "Ethernet1"}], "sonic-interface:INTERFACE_IPADDR_LIST": [{"portname": "Ethernet1","ip_prefix": "172.30.128.2/31"}]}'
admin@sonic:~$ sonic-cfggen -d
Traceback (most recent call last):
File "/usr/local/bin/sonic-cfggen", line 432, in <module>
main()
File "/usr/local/bin/sonic-cfggen", line 361, in main
deep_update(data, FormatConverter.db_to_output(configdb.get_config()))
File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2087, in get_config
data = super(ConfigDBConnector, self).get_config()
File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2117, in get_config
return _swsscommon.ConfigDBPipeConnector_Native_get_config(self)
RuntimeError: Got unexpected result: Input/output error
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> KEYS CONFIG_*
1) "CONFIG_DB_UPDATED_INTERFACE"
2) "CONFIG_DB_INITIALIZED"
3) "CONFIG_DB_UPDATED"
127.0.0.1:6379[4]> TYPE CONFIG_DB_UPDATED_INTERFACE
string
127.0.0.1:6379[4]> HGETALL CONFIG_DB_UPDATED_INTERFACE
(error) WRONGTYPE Operation against a key holding the wrong kind of value
127.0.0.1:6379[4]> GET CONFIG_DB_UPDATED_INTERFACE
"1"
127.0.0.1:6379[4]> TYPE CONFIG_DB_UPDATED
string
127.0.0.1:6379[4]> HGETALL CONFIG_DB_UPDATED
(error) WRONGTYPE Operation against a key holding the wrong kind of value
127.0.0.1:6379[4]> GET CONFIG_DB_UPDATED
"1"
127.0.0.1:6379[4]> DEL CONFIG_DB_UPDATED_INTERFACE
(integer) 1
127.0.0.1:6379[4]> DEL CONFIG_DB_UPDATED
(integer) 1
127.0.0.1:6379[4]> KEYS CONFIG_*
1) "CONFIG_DB_INITIALIZED"
127.0.0.1:6379[4]> exit
admin@sonic:~$ sonic-cfggen -d --print-data | more
{
"CRM": {
"Config": {
"acl_counter_high_threshold": "85",
...<snip>
There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.
Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.
Regarding this comment https://github.com/Azure/sonic-swss-common/blob/master/common/logger.h#L81: why is this the case? A public API should not depend on call order. Either this method should be made private, or the issue should be fixed properly.
The sonic-cfggen Python code hard-codes many ConfigDB table names, but none of them are specified in schema.h.
As the central place for storing key information about db and table names that all modules will refer to, schema.h should be kept up to date.
When multiple fields need to be modified, the process is not atomic, which can cause timing problems.
For example, when adding a LAG, if fields such as mtu, status, and speed are written slightly later, a service subscribed to the keyspace event may not see the values of the related fields.
So why not use HMSET?
In RedisReply::release(), we set m_reply = NULL.
In RedisReply::~RedisReply(), we call freeReplyObject(m_reply) directly, without any NULL check.
In redis before Wed Dec 21 12:11:56 2016, freeReplyObject(void *reply) accessed r->type without any check, so it crashes when reply is NULL.
I have checked freeReplyObject in redis 3.2.11: it crashes if reply is NULL, while redis 4.0.6 does not.
ERR orchagent: :- checkReplyType: Expected to get redis type 5 got type 3, err: NON-STRING-REPLY
ERR orchagent: :- checkReplyType: Expected to get redis type 3 got type 5, err: NON-STRING-REPLY
ERR orchagent: :- guard: RedisReply catches system_error: command: DEL FDB_TABLE:Vlan2:72-06-00-01-01-7E, reason: Wrong expected type of result: Input/output error
ERR orchagent: :- guard: RedisReply catches system_error: command: *4#015#012$5#015#012HMSET#015#012$21#015#012PORT_TABLE:Ethernet16#015#012$11#015#012oper_status#015#012$2#015#012up#015#012, reason: Wrong expected type of result: Input/output error
INFO supervisord: orchagent terminate called after throwing an instance of 'std::system_error'
INFO supervisord: orchagent what(): Wrong expected type of result: Input/output error
According to the logs, it seems that the reply order is incorrect, which causes the crash.
sonic-buildimage can build sonic-swss-common for buster and bullseye, but the Azure pipeline for sonic-swss-common only supports buster.
When a large number of netlink messages arrives, swss is not able to transform and save all of them into the DB.
Currently we can increase the netlink receive buffer, but that cannot prevent losing some messages when there is a spike. We need better netlink message management code.
Is this the intended design?
schema.h:
#define LOGLEVEL_DB 3
#define PFC_WD_DB 4
database.json:
"CONFIG_DB": {
    "db": 4
}
configdb-load.sh:
echo -en "SELECT 4\nSET CONFIG_DB_INITIALIZED true" | redis-cli
In debug mode it is easy to trigger log drops due to the log rate limit. I have to modify the rate limit in rsyslog and related files manually. Would it be possible to remove the limit automatically when the swss log level is Debug?
When checking whether a Redis entry exists, the key isn't allowed to contain any spaces:
https://github.com/Azure/sonic-swss-common/blob/master/common/dbconnector.cpp#L609-L613
However, that restriction doesn't seem to be enforced when creating (i.e. SET) or reading keys (i.e. KEYS). I was able to create an entry like:
VRF_TABLE:"vrf 80"
Is there a technical reason for this? We could use KEYS to check for existence, but the O(1) time complexity of EXISTS is much nicer.
Currently in tests/main.cpp we test corner cases of a few SonicDBConfig functions by checking that the string of the caught exception matches an expected string.
https://github.com/sonic-net/sonic-swss-common/blob/master/tests/main.cpp#L71
However, this is not robust; better practice is to test which exception type is thrown, since exception strings can easily change, breaking the test or requiring updates in multiple places.
https://github.com/sonic-net/sonic-swss-common/blob/master/common/dbconnector.cpp#L248
Here an out_of_range exception is thrown; we can use EXPECT_THROW to assert in our UT that the same exception type is thrown.
Problem:
When using the tool as sonic-db-cli -s PING,
it does not actually use the unix socket option.
I debugged it: it calls this function in
src/sonic-swss-common/common/dbconnector.cpp:
void RedisContext::initContext(const char *host, int port, const timeval *tv)
instead of the overload with the unix socket signature:
void RedisContext::initContext(const char *path, const timeval *tv)
The root cause looks related to this PR:
https://github.com/sonic-net/sonic-swss-common/pull/607/files
file:sonic-db-cli/sonic-db-cli.cpp
lines 274 - 291
More details:
The tool's usage text shows in its examples that this command should be supported:
admin@mysetup:~$ sudo sonic-db-cli --help
usage: sonic-db-cli [-h] [-s] [-n NAMESPACE] db_or_op [cmd [cmd ...]]
SONiC DB CLI:
positional arguments:
db_or_op Database name Or Unary operation(only PING/SAVE/FLUSHALL supported)
cmd Command to execute in database
optional arguments:
-h, --help show this help message and exit
-s, --unixsocket Override use of tcp_port and use unixsocket
-n NAMESPACE, --namespace NAMESPACE
Namespace string to use asic0/asic1.../asicn
sudo needed for commands accesing a different namespace [-n], or using unixsocket connection [-s]
Example 1: sonic-db-cli -n asic0 CONFIG_DB keys *
Example 2: sonic-db-cli -n asic2 APPL_DB HGETALL VLAN_TABLE:Vlan10
Example 3: sonic-db-cli APPL_DB HGET VLAN_TABLE:Vlan10 mtu
Example 4: sonic-db-cli -n asic3 APPL_DB EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 k1 k2 v1 v2
Example 5: sonic-db-cli PING | sonic-db-cli -s PING
Example 6: sonic-db-cli SAVE | sonic-db-cli -s SAVE
Example 7: sonic-db-cli FLUSHALL | sonic-db-cli -s FLUSHALL
Hi,
In sonic-buildimage/src/sonic-swss-common/common/schema.h,
there is a comment "/***** TO BE REMOVED *****/" above some APP_DB table names.
Are these not required at all, or are they kept for historical reasons?
After the migration of interface-related details from APP_DB to CONFIG_DB, a few tables seem to be no longer required.
Can someone take up the task of cleaning up this file?
I am not expert enough to provide more input.
Snippet from common/schema.h
/***** TO BE REMOVED *****/
#define APP_TC_TO_QUEUE_MAP_TABLE_NAME "TC_TO_QUEUE_MAP_TABLE"
#define APP_SCHEDULER_TABLE_NAME "SCHEDULER_TABLE"
#define APP_DSCP_TO_TC_MAP_TABLE_NAME "DSCP_TO_TC_MAP_TABLE"
#define APP_QUEUE_TABLE_NAME "QUEUE_TABLE"
#define APP_PORT_QOS_MAP_TABLE_NAME "PORT_QOS_MAP_TABLE"
#define APP_WRED_PROFILE_TABLE_NAME "WRED_PROFILE_TABLE"
#define APP_TC_TO_PRIORITY_GROUP_MAP_NAME "TC_TO_PRIORITY_GROUP_MAP_TABLE"
#define APP_PFC_PRIORITY_TO_PRIORITY_GROUP_MAP_NAME "PFC_PRIORITY_TO_PRIORITY_GROUP_MAP_TABLE"
#define APP_PFC_PRIORITY_TO_QUEUE_MAP_NAME "MAP_PFC_PRIORITY_TO_QUEUE"
#define APP_BUFFER_POOL_TABLE_NAME "BUFFER_POOL_TABLE"
#define APP_BUFFER_PROFILE_TABLE_NAME "BUFFER_PROFILE_TABLE"
#define APP_BUFFER_QUEUE_TABLE_NAME "BUFFER_QUEUE_TABLE"
#define APP_BUFFER_PG_TABLE_NAME "BUFFER_PG_TABLE"
#define APP_BUFFER_PORT_INGRESS_PROFILE_LIST_NAME "BUFFER_PORT_INGRESS_PROFILE_LIST"
#define APP_BUFFER_PORT_EGRESS_PROFILE_LIST_NAME "BUFFER_PORT_EGRESS_PROFILE_LIST"
int Select::poll_descriptors(Selectable **c, unsigned int timeout)
{
    int sz_selectables = static_cast<int>(m_objects.size());
    std::vector<struct epoll_event> events(sz_selectables);
    int ret;

    do
    {
        ret = ::epoll_wait(m_epoll_fd, events.data(), sz_selectables, timeout);
    }
    while (ret == -1 && errno == EINTR); // Retry the select if the process was interrupted by a signal

    if (ret < 0)
        return Select::ERROR;

    for (int i = 0; i < ret; ++i)
    {
        int fd = events[i].data.fd;
        Selectable* sel = m_objects[fd];
        sel->readData();
        m_ready.insert(sel); // <---- why not call updateLastUsedTime() here?
    }

    if (!m_ready.empty())
    {
        auto sel = *m_ready.begin(); // <---- only the first item is updated
        *c = sel;
        m_ready.erase(sel);

        // we must update clock only when the selector out of the m_ready
        // otherwise we break invariant of the m_ready
        sel->updateLastUsedTime();
        if (sel->hasCachedData())
        {
            // reinsert Selectable back to the m_ready set, when there're more messages in the cache
            m_ready.insert(sel);
        }
        sel->updateAfterRead();
        return Select::OBJECT;
    }

    return Select::TIMEOUT;
}
Sorry, posted in the wrong repository. Closing.
When an application has subscribed for notifications and a large batch of notifications should theoretically be available, it appears as though the notifications "trickle in" to the application from the underlying Redis connection (~10 per second), rather than being received in a single, large batch.
This behavior was noticed while debugging an issue with caclmgd here: sonic-net/sonic-buildimage#5275
During an ECMP route convergence period, a route is added to the message queue repeatedly due to next-hop group changes, e.g.:
10.0.0.1/24 -> 1.1.1.1,2.2.2.2
10.0.0.1/24 -> 1.1.1.1,2.2.2.2,3.3.3.3
10.0.0.1/24 -> 1.1.1.1,2.2.2.2,3.3.3.3,4.4.4.4
All of these messages should be consolidated into one, so that swss does not have to program intermediate states into the ASIC.
The current message queue does not consolidate messages with the same key. The idea is to use a set instead of a list to hold the message queue.
The producer's set(key, field, value) will write the state into the app db and add the key to the set using SADD; del(key) will remove the key from the db and likewise add the key to the set.
The consumer will use SPOP to read a key from the set and HGETALL to fetch all fields and values for that key. SPOP and HGETALL must be one atomic operation, so we need a Lua script to achieve this:
pop.lua
local key = redis.call("SPOP", KEYS[1])
if not key then return end
local key2 = redis.call("HGETALL", key)
return {key, key2}
We also need to upgrade redis to 3.2.4.
When I used the following command to run a DVS test for branch 202205, I got a lot of Python exceptions.
sudo pytest -sv --imgname docker-sonic-vs-202205 --html=report.html --self-contained-html
The exceptions all look like:
test_drop_counters.py::TestDropCounters::test_deviceCapabilitiesTablePopulated remove extra link dummy
remove extra link Vlan100@Bridge
Exception ignored in: <bound method ApplDbValidator.__del__ of <conftest.ApplDbValidator object at 0x7f5e29cd1278>>
Traceback (most recent call last):
File "/home/autotest/wayne/202205/tests/conftest.py", line 165, in __del__
neighbors = self.get_keys(self.NEIGH_TABLE)
File "/home/autotest/wayne/202205/tests/dvslib/dvs_database.py", line 115, in get_keys
table = swsscommon.Table(self.db_connection, table_name)
File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2515, in __init__
this = _swsscommon.new_Table(*args)
RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
Consider the following log:
Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 ERR swss#intfmgrd: :- exec: /sbin/ip address add 10.106.116.191/31 dev PortChannel30: Success
Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 DEBUG swss#intfmgrd: :- exec: /sbin/ip address add 10.106.116.191/31 dev PortChannel30 :
Jan 16 09:57:09.359886 CYS05-0101-1105-14T1 ERR swss#intfmgrd: :- setIntfIp: Command '/sbin/ip address add 10.106.116.191/31 dev PortChannel30' failed with rc 512
exec prints ': Success' because that is what strerror(errno) returns;
however, errno is per-process, and this is not the errno from the failed command.
Also, the return code (512 in this case) is ambiguous, as it is not in the range of exit codes returned by shell commands (it is a raw wait status).
> SonicDBConfig
Future TODO: decouple DBConnector from SonicDBConfig. DBConnector only needs to know its own dbName/index, port, hostname, and unix socket; it does not need any other information such as namespace, containerName, separator, etc.
Originally posted by @qiluo-msft in #845 (comment)
When installing the required deps:
docker run --network host -it debian:bookworm bash
apt-get install make libtool m4 autoconf dh-exec debhelper cmake pkg-config \
libhiredis-dev libnl-3-dev libnl-genl-3-dev libnl-route-3-dev \
libnl-nf-3-dev swig3.0 libpython2.7-dev libpython3-dev \
libgtest-dev libgmock-dev libboost-dev
we get
E: Unable to locate package swig3.0
E: Couldn't find any package by glob 'swig3.0'
E: Couldn't find any package by regex 'swig3.0'
E: Unable to locate package libpython2.7-dev
E: Couldn't find any package by glob 'libpython2.7-dev'
E: Couldn't find any package by regex 'libpython2.7-dev'
Other things in the README may need updating as well; on bookworm, swig replaces swig3.0, and libpython2.7-dev no longer exists.
In some cases we have observed that, due to a missing NLMSG_DONE flag in a netlink message, netlink processing gets stuck in the "static int recvmsgs(struct nl_sock *sk, struct nl_cb *cb)" function in libnl. Although no data is available on the nl_sock, it blocks in the recvmsg() system call.
Would it be better to switch the netlink sockets to non-blocking mode, so the application can handle erroneous netlink messages gracefully? e.g. "nl_socket_set_nonblocking(m_socket);"
(gdb) bt
#0 0x00007fcec4d3559d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fcec630483e in nl_recv (sk=0x2096090, nla=0x7fff2f2b2860, buf=0x7fff2f2b2990, creds=0x7fff2f2b2998) at /sonic/src/libnl3/libnl3/./lib/nl.c:703
#2 0x00007fcec63051a5 in recvmsgs (cb=, sk=) at /sonic/src/libnl3/libnl3/./lib/nl.c:849
#3 nl_recvmsgs_report (sk=0x2096090, cb=0x2094450) at /sonic/src/libnl3/libnl3/./lib/nl.c:1062
#4 0x00007fcec6305459 in nl_recvmsgs (sk=, cb=) at /sonic/src/libnl3/libnl3/./lib/nl.c:1086
#5 0x00007fcec5e797ca in swss::NetLink::readMe (this=) at netlink.cpp:106
#6 0x00007fcec5e6e542 in swss::Select::select (this=this@entry=0x7fff2f2b2bb0, c=c@entry=0x7fff2f2b2b88, fd=fd@entry=0x7fff2f2b2b84, timeout=timeout@entry=999999)
at select.cpp:92
#7 0x000000000040317f in main (argc=, argv=) at teamsyncd.cpp:45
parseMacString() considers both 52:54:00:25::E9 and 52:54:00:25:E9 to be valid MAC addresses, although neither is well-formed.
Noticed that there is no way to set the log level for SAI any more in a master-branch-based image.
Tried master.333 as well as master.339; neither works.
admin@str-s6000-acs-8:/var/log/swss$ swssloglevel -p
_SAI_API_ACL Unknown log level
_SAI_API_BFD Unknown log level
_SAI_API_BMTOR Unknown log level
_SAI_API_BRIDGE Unknown log level
_SAI_API_BUFFER Unknown log level
_SAI_API_COUNTER Unknown log level
_SAI_API_DEBUG_COUNTER Unknown log level
_SAI_API_DTEL Unknown log level
_SAI_API_FDB Unknown log level
_SAI_API_HASH Unknown log level
_SAI_API_HOSTIF Unknown log level
_SAI_API_IPMC Unknown log level
_SAI_API_IPMC_GROUP Unknown log level
_SAI_API_ISOLATION_GROUP Unknown log level
_SAI_API_L2MC Unknown log level
_SAI_API_L2MC_GROUP Unknown log level
_SAI_API_LAG Unknown log level
_SAI_API_MACSEC Unknown log level
_SAI_API_MCAST_FDB Unknown log level
_SAI_API_MIRROR Unknown log level
_SAI_API_MPLS Unknown log level
_SAI_API_NAT Unknown log level
_SAI_API_NEIGHBOR Unknown log level
_SAI_API_NEXT_HOP Unknown log level
_SAI_API_NEXT_HOP_GROUP Unknown log level
_SAI_API_POLICER Unknown log level
_SAI_API_PORT Unknown log level
_SAI_API_QOS_MAP Unknown log level
_SAI_API_QUEUE Unknown log level
_SAI_API_ROUTE Unknown log level
_SAI_API_ROUTER_INTERFACE Unknown log level
_SAI_API_RPF_GROUP Unknown log level
_SAI_API_SAMPLEPACKET Unknown log level
_SAI_API_SCHEDULER Unknown log level
_SAI_API_SCHEDULER_GROUP Unknown log level
_SAI_API_SEGMENTROUTE Unknown log level
_SAI_API_STP Unknown log level
_SAI_API_SWITCH Unknown log level
_SAI_API_SYSTEM_PORT Unknown log level
_SAI_API_TAM Unknown log level
_SAI_API_TUNNEL Unknown log level
_SAI_API_UDF Unknown log level
_SAI_API_VIRTUAL_ROUTER Unknown log level
_SAI_API_VLAN Unknown log level
_SAI_API_WRED Unknown log level
buffermgrd NOTICE
fpmsyncd NOTICE
intfmgrd NOTICE
natmgrd NOTICE
natsyncd NOTICE
nbrmgrd NOTICE
neighsyncd NOTICE
orchagent NOTICE
portmgrd NOTICE
portsyncd NOTICE
syncd NOTICE
teammgrd NOTICE
teamsyncd NOTICE
tlm_teamd NOTICE
vlanmgrd NOTICE
vrfmgrd NOTICE
vxlanmgrd NOTICE
admin@str-s6000-acs-8:/var/log/swss$
admin@str-s6000-acs-8:/var/log/swss$ swssloglevel -l SAI_LOG_LEVEL_ERROR -s -c SWITCH
Component not present in DB
Usage: swssloglevel [OPTIONS]
SONiC logging severity level setting.
Options:
-h print this message
-l loglevel value
-c component name in DB for which loglevel is applied (provided with -l)
-a apply loglevel to all components (provided with -l)
-s apply loglevel for SAI api component (equivalent to adding prefix "SAI_API_" to component)
-p print components registered in DB for which setting can be applied
Examples:
swssloglevel -l NOTICE -c orchagent # set orchagent severity level to NOTICE
swssloglevel -l SAI_LOG_LEVEL_ERROR -s -c SWITCH # set SAI_API_SWITCH severity to ERROR
swssloglevel -l SAI_LOG_LEVEL_DEBUG -s -a # set all SAI_API_* severity to DEBUG
admin@str-s6000-acs-8:/var/log/swss$
admin@str-s6000-acs-8:~$ show vers
SONiC Software Version: SONiC.master.339-31baf381
Distribution: Debian 10.4
Kernel: 4.19.0-6-2-amd64
Build commit: 31baf38
Build date: Tue Jul 7 07:01:13 UTC 2020
Built by: johnar@jenkins-worker-8
Platform: x86_64-dell_s6000_s1220-r0
HwSKU: Force10-S6000
ASIC: broadcom
Serial Number: 1QBRX42
Uptime: 20:32:35 up 1:09, 1 user, load average: 1.12, 1.09, 1.12
Description
swssloglevel -p
crashes with the following coredump:
swssloglevel.1585549658.921.core.gz
[New LWP 820]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `swssloglevel -p'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005594c075c592 in std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=..., __os=...) at /usr/include/c++/6/bits/basic_string.h:5345
5345 /usr/include/c++/6/bits/basic_string.h: No such file or directory.
(gdb) bt
#0 0x00005594c075c592 in std::operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=..., __os=...) at /usr/include/c++/6/bits/basic_string.h:5345
#1 main (argc=<optimized out>, argv=<optimized out>) at loglevel.cpp:138
The related code:
if (print)
{
    if (argc != 2)
    {
        exitWithUsage(EXIT_FAILURE, "-p option does not accept other options");
    }

    std::sort(keys.begin(), keys.end());
    for (const auto& key : keys)
    {
        const auto redis_key = std::string(key).append(":").append(key);
        auto level = redisClient.hget(redis_key, DAEMON_LOGLEVEL);
        std::cout << std::left << std::setw(30) << key << *level << std::endl; // <-- this line causes the error
    }
    return (EXIT_SUCCESS);
}
Output of show version:
admin@r-tigris-04:~$ show version
SONiC Software Version: SONiC.HEAD.233-0b52be3e
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 0b52be3e
Build date: Tue Mar 24 07:50:08 UTC 2020
Built by: johnar@jenkins-worker-7
Platform: x86_64-mlnx_msn3800-r0
HwSKU: Mellanox-SN3800-D112C8
ASIC: mellanox
Serial Number: MT1925X00008
Uptime: 07:31:51 up 11 min, 1 user, load average: 4.78, 4.42, 2.78
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx HEAD.233-0b52be3e b859341cd305 383MB
docker-syncd-mlnx latest b859341cd305 383MB
docker-teamd HEAD.233-0b52be3e db902a800d3f 308MB
docker-teamd latest db902a800d3f 308MB
docker-router-advertiser HEAD.233-0b52be3e ebad18d3ac3b 283MB
docker-router-advertiser latest ebad18d3ac3b 283MB
docker-platform-monitor HEAD.233-0b52be3e f9a3eab42069 628MB
docker-platform-monitor latest f9a3eab42069 628MB
docker-database HEAD.233-0b52be3e 4f9cddcb8d89 283MB
docker-database latest 4f9cddcb8d89 283MB
docker-orchagent HEAD.233-0b52be3e 9cbc5b395350 326MB
docker-orchagent latest 9cbc5b395350 326MB
docker-nat HEAD.233-0b52be3e dc43ed837813 309MB
docker-nat latest dc43ed837813 309MB
docker-sonic-telemetry HEAD.233-0b52be3e 215e87e0755e 345MB
docker-sonic-telemetry latest 215e87e0755e 345MB
docker-dhcp-relay HEAD.233-0b52be3e e5382e0a33e4 293MB
docker-dhcp-relay latest e5382e0a33e4 293MB
docker-sonic-mgmt-framework HEAD.233-0b52be3e a0473b8878af 420MB
docker-sonic-mgmt-framework latest a0473b8878af 420MB
docker-sflow HEAD.233-0b52be3e ab92c2bddce9 308MB
docker-sflow latest ab92c2bddce9 308MB
docker-lldp-sv2 HEAD.233-0b52be3e 2237927672e0 305MB
docker-lldp-sv2 latest 2237927672e0 305MB
docker-snmp-sv2 HEAD.233-0b52be3e bcb58a78e025 340MB
docker-snmp-sv2 latest bcb58a78e025 340MB
docker-fpm-frr HEAD.233-0b52be3e 18544b0dedfa 328MB
docker-fpm-frr latest 18544b0dedfa 328MB
The CI is unstable and rarely passes the build stage, due to timeouts and logs like the following:
[ RUN ] BinarySerializer.serialize_overflow
ERROR:- setData: There are not enough buffer for binary serializer to serialize, key count: 1, data length 15, buffer size: 50
[ OK ] BinarySerializer.serialize_overflow (0 ms)
[ RUN ] BinarySerializer.deserialize_overflow
ERROR:- deserializeBuffer: serialized value data was truncated,, value length: 11 increase buffer size: 113
***
[ RUN ] ZmqConsumerStateTable.test
***
# a lot of logs. No new test is started
DEBUG:> select: enter
DEBUG:< select: exit
DEBUG:> select: enter
##[error]The operation was canceled.
For example, for PR #777 only 2 of 8 CI runs passed the build stage (one of those before ZMQ was merged).
Fixed by reconnecting when the exception happens.
I will create a new UT in swss-common for the DBInterface::_onetime_connect resource issue.
Originally posted by @liuh-80 in sonic-net/sonic-snmpagent#290 (comment)
Hi,
I am occasionally seeing the error message below. It happens either when I restart all containers (swss, syncd, teamd) or when many VLANs (around 500 to 1000) are being configured.
err teamsyncd: :- readData: netlink reports out of memory on reading a netlink socket. High possibility of a lost message
The swss-common library currently sets the netlink socket read buffer size to 3 MB:
/* Set socket buffer size to 3MB */
nl_socket_set_buffer_size(m_socket, 3145728, 0);
I tried increasing this to 16 MB, but it did not help.
Strangely, portsyncd uses the same swss-common library and I am not seeing any issues there.
Any pointers on what could be causing this, or how it can be handled?
Either instructions for enabling notify-keyspace-events should be provided, or the test case itself should enable keyspace notifications. Otherwise people may struggle with the tests hanging.
jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ ./tests --gtest_filter=SubscriberStateTable*
Running main() from gtest_main.cc
Note: Google Test filter = SubscriberStateTable*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from SubscriberStateTable
[ RUN ] SubscriberStateTable.set
650:M 16 Mar 23:43:21.450 * DB saved on disk
^C
jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ redis-cli config set notify-keyspace-events KEA
OK
jipan@ddedf238056c:/sonic/src/sonic-swss-common/tests$ ./tests --gtest_filter=SubscriberStateTable*
Running main() from gtest_main.cc
Note: Google Test filter = SubscriberStateTable*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from SubscriberStateTable
[ RUN ] SubscriberStateTable.set
650:M 16 Mar 23:45:54.168 * DB saved on disk
[ OK ] SubscriberStateTable.set (46 ms)
[ RUN ] SubscriberStateTable.del
650:M 16 Mar 23:45:54.182 * DB saved on disk
[ OK ] SubscriberStateTable.del (14 ms)
[ RUN ] SubscriberStateTable.table_state
650:M 16 Mar 23:45:54.188 * DB saved on disk
++++++++++----------[ OK ] SubscriberStateTable.table_state (1140 ms)
[ RUN ] SubscriberStateTable.one_producer_multiple_subscriber
650:M 16 Mar 23:45:55.337 * DB saved on disk
Starting 64 subscribers on redis
+----------------------------------------------------------------+ [... progress output from the 64 subscribers elided ...]
Done.
[ OK ] SubscriberStateTable.one_producer_multiple_subscriber (5169 ms)
[----------] 4 tests from SubscriberStateTable (6369 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (6369 ms total)
[ PASSED ] 4 tests.
Xcvrd listens to configuration changes here. Steps to reproduce:
Set any field in the PORT table of CONFIG_DB:
redis-cli -n 4 hset "PORT|Ethernet0" "tx_power" "-13"
Add a print here to output the 'fvp' values, and you will find that more than 6 events are received by xcvrd:
Jul 22 20:05:09.335091 sonic WARNING pmon#xcvrd[204620]: $$$handle_port_update_even(): {'admin_status': 'down', 'alias': 'Ethernet1/1', 'index': '1', 'lanes': '1,2,3,4,5,6,7,8', 'laser_freq': '193100', 'speed': '400000', 'tx_power': '-13', 'description': '', 'oper_status': 'up', 'mtu': '9100'}
Jul 22 20:05:09.342073 sonic WARNING pmon#xcvrd[204620]: message repeated 6 times: [ $$$handle_port_update_even(): {'admin_status': 'down', 'alias': 'Ethernet1/1', 'index': '1', 'lanes': '1,2,3,4,5,6,7,8', 'laser_freq': '193100', 'speed': '400000', 'tx_power': '-13', 'description': '', 'oper_status': 'up', 'mtu': '9100'}]
Clearly, duplicate events are pushed to xcvrd, which can impact the performance of datapath initialization, especially during the 400G link bring-up sequence.
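Until the root cause is fixed in SubscriberStateTable, a consumer-side workaround is to coalesce identical events before handling them. A hypothetical sketch (this is not xcvrd code; the event tuple shape is an assumption):

```python
# Coalesce a burst of identical (key, op, fields) events so the handler
# runs once per distinct event instead of once per duplicate notification.
def dedupe(events):
    seen = set()
    out = []
    for key, op, fields in events:
        sig = (key, op, tuple(sorted(fields.items())))
        if sig not in seen:
            seen.add(sig)
            out.append((key, op, fields))
    return out

# A burst of 6 identical SET notifications, as in the log above,
# collapses to a single event.
burst = [("Ethernet0", "SET", {"tx_power": "-13"})] * 6
deduped = dedupe(burst)
```

This only masks the symptom; the duplicate notifications themselves still cost Redis round-trips.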
This PR made the existence of the database config mandatory.
To preserve previous behavior, I think it's better to make the config optional.
Is it possible to change the implementation to use DEFAULT_UNIXSOCKET if the database config doesn't exist?
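The proposed fallback could be sketched as follows (a minimal illustration; the path constants and function name are assumptions, not the actual swsscommon values):

```python
import os

# Assumed locations, for illustration only.
DB_CONFIG_PATH = "/var/run/redis/sonic-db/database_config.json"
DEFAULT_UNIXSOCKET = "/var/run/redis/redis.sock"

def resolve_redis_target(config_path=DB_CONFIG_PATH):
    """Use the database config when present; otherwise fall back to the
    default unix socket, preserving the pre-PR behavior."""
    if os.path.exists(config_path):
        return ("db_config", config_path)
    return ("unixsocket", DEFAULT_UNIXSOCKET)
```

With this shape, hosts that never shipped a database config keep working unchanged, while hosts that have one get the new behavior.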
One scenario:
Existing table data -- key1: f1/v1, f2/v2
Two ProducerStateTable operations on key1, a del then a set:
del --- key1
set --- key1: f1/v1, f3/v3
ProducerStateTable uses SADD to store key1, so the del and set may be combined if the consumer is not quick enough to pick up the del before the set. What is left in the Redis DB will be:
key1: f1/v1, f2/v2, f3/v3
while the correct result should be key1: f1/v1, f3/v3.
The merge of the del and set may trigger other unexpected behavior for users of ConsumerStateTable.
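The race above can be simulated with a toy model (plain dicts standing in for the Redis hashes; this is not swss code). A consumer that only sees the final pending operation applies the set as a field merge and never performs the delete:

```python
# Existing table data.
db = {"key1": {"f1": "v1", "f2": "v2"}}

# Producer queues a DEL followed by a SET on the same key.
pending = [("DEL", "key1", {}),
           ("SET", "key1", {"f1": "v1", "f3": "v3"})]

# Coalesced consumer: the key appears once in the key set, so only the
# final SET is observed; its fields are merged into the existing hash
# and the DEL is lost.
op, key, fields = pending[-1]
db[key].update(fields)
# db["key1"] now holds f1/v1, f2/v2, f3/v3 -- the stale f2/v2 survives,
# matching the incorrect result described above.
```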
For PR #852
Azure.sonic-swss-common Failing after 41m — Build #20240221.2 failed
[ RUN ] libsairedis.ars
[ OK ] libsairedis.ars (1 ms)
[ RUN ] libsairedis.ars_profile
[ OK ] libsairedis.ars_profile (0 ms)
[----------] 74 tests from libsairedis (11 ms total)
[----------] Global test environment tear-down
Assertion failed: pfd.revents & POLLIN (src/signaler.cpp:265)
/bin/bash: line 6: 24417 Aborted (core dumped) ${dir}$tst
FAIL: testslibsairedis
[==========] Running 79 tests from 16 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from Switch
Using Docker on an x86 platform, we can build binaries compatible with ARM. It would be nice if the build instructions specified how to target a particular platform.