
Comments (10)

mrdeep1 commented on September 27, 2024

I don't see any issues with your solution. However, it is worth checking errno to see whether the failure is something like EAGAIN caused by a full set of PPP buffers. Perhaps you need to look at what errno contains on failure and treat only the genuinely fatal values as fatal.

If you could let me know the errno value on failure, that would be great.
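A minimal sketch of the errno check suggested above, assuming the send path reports failure as -1 with errno set. The helper `send_error_is_fatal()` is hypothetical, not part of libcoap, and which errno value lwIP/PPP actually reports when its buffers are full is an assumption to confirm on the target.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a failed datagram send should end the
 * session. Transient "try again" conditions (assumed here to include a full
 * set of PPP buffers surfacing as EAGAIN/EWOULDBLOCK) are not fatal; every
 * other errno value is treated as fatal. */
static bool send_error_is_fatal(int err) {
  /* if-chain rather than switch: EAGAIN and EWOULDBLOCK may share a value */
  if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) {
    return false; /* retry later, keep the session */
  }
  return true; /* e.g. ECONNRESET: give up on this session */
}
```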

from libcoap.

mrdeep1 commented on September 27, 2024

See #1446 and #1447, which should fix both of the issues you found and raised.


mrdeep1 commented on September 27, 2024

Thanks for your investigation and research into this. In the latest master of idf-extra-components, coap/port/coap_mbedtls.c (libcoap version 4.3.4), this issue (in 4.3.2) is fixed with:

```diff
diff --git a/src/coap_mbedtls.c b/src/coap_mbedtls.c
--- a/src/coap_mbedtls.c
+++ b/src/coap_mbedtls.c
@@ -220,10 +219,21 @@ coap_dgram_write(void *ctx, const unsigned char *send_buffer,

   if (c_session) {
     coap_mbedtls_env_t *m_env = (coap_mbedtls_env_t *)c_session->tls;
-    result = coap_session_send(c_session, send_buffer, send_buffer_length);
+
+    if (!coap_netif_available(c_session)
+#if COAP_SERVER_SUPPORT
+        && c_session->endpoint == NULL
+#endif /* COAP_SERVER_SUPPORT */
+                                      ) {
+      /* socket was closed on client due to error */
+      errno = ECONNRESET;
+      return -1;
+    }
+    result = (int)coap_netif_dgrm_write(c_session,
+                                        send_buffer, send_buffer_length);
     if (result != (ssize_t)send_buffer_length) {
-      coap_log_warn("coap_network_send failed (%zd != %zu)\n",
-               result, send_buffer_length);
+      coap_log_warn("coap_netif_dgrm_write failed (%zd != %zu)\n",
+                    result, send_buffer_length);
       result = 0;
     }
     else if (m_env) {
```

Note, however, that the coap_netif_*() functions were introduced in 4.3.2.
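The shape of the fix in the diff above (check the interface first, fail hard with ECONNRESET if it is gone, otherwise write and treat a short write as "nothing sent") can be sketched outside libcoap. Everything here is a hypothetical stand-in: `fake_session_t`, its `netif_available` flag and `raw_write()` only mimic the roles of `coap_netif_available()` and `coap_netif_dgrm_write()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical session type holding only what this sketch needs. */
typedef struct {
  bool netif_available; /* false once the socket was closed after an error */
  size_t bytes_sent;    /* total accepted by the fake transport */
  size_t max_write;     /* fake transport accepts at most this many bytes */
} fake_session_t;

/* Hypothetical transport write: accepts up to max_write bytes. */
static ssize_t raw_write(fake_session_t *s, const unsigned char *buf, size_t len) {
  (void)buf;
  size_t n = len < s->max_write ? len : s->max_write;
  s->bytes_sent += n;
  return (ssize_t)n;
}

/* Write callback in the style of the patched coap_dgram_write(). */
static int dgram_write(void *ctx, const unsigned char *buf, size_t len) {
  fake_session_t *s = (fake_session_t *)ctx;
  if (s == NULL)
    return -1;
  if (!s->netif_available) {
    /* socket was closed due to an error: report a hard failure */
    errno = ECONNRESET;
    return -1;
  }
  ssize_t result = raw_write(s, buf, len);
  if (result != (ssize_t)len) {
    /* short write: log it and report that nothing was sent */
    fprintf(stderr, "dgram write failed (%zd != %zu)\n", result, len);
    return 0;
  }
  return (int)result;
}
```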


boribosnjak commented on September 27, 2024

Thank you for your quick response.

I retested with the new libcoap version from the IDF components (4.3.4). Although the code looks different, the result is the same: at some point coap_io_process does not return.

Here is the updated log (I added a line that logs when coap_io_process is entered):

```
I (55300) CoAP_client: CoapClient::coap_start_psk_session
D (55300) RANDOM: getrandom(buf=0x3f811e8c, buflen=2, flags=0)
D (55300) RANDOM: getrandom returns 2
D (55300) RANDOM: getrandom(buf=0x3f811f30, buflen=4, flags=0)
D (55300) RANDOM: getrandom returns 4
I (55320) CoAP_client: created COAP session
I (55320) CoAP_client: start COAP request
I (55320) CoAP_client: CoapClient::doRequest, request pdu mid: 16018
I (55320) CoAP_client: CoapClient::doRequest, request pdu token: 01
D (55320) RANDOM: getrandom(buf=0x3ffbd248, buflen=1, flags=0)
D (55320) RANDOM: getrandom returns 1
I (55320) CoAP_client: coap_io_process entered
I (55790) CoAP_client: coap_io_process returned: 466, nextWait was: 1000
I (55790) CoAP_client: coap_io_process entered
I (57890) CoAP_client: coap_io_process returned: 2099, nextWait was: 1000
I (57920) [PPPOS CLIENT]: Disconnect requested.
W (57950) [PPPOS CLIENT]: status_cb: User interrupt (disconnected)
D (59080) [PPPOS CLIENT]: AT COMMAND: [AT..]
D (59090) [PPPOS CLIENT]: uart received total (6): ..OK..
D (59100) [PPPOS CLIENT]: AT RESPONSE: [..OK..]
I (59100) [PPPOS CLIENT]: Disconnected.
D (59100) [PPPOS CLIENT]: task pppos_client_ta stack-highwater: 1576
E (59100) [PPPOS CLIENT]: PPPoS TASK TERMINATED
I (59120) CoAP_client: coap_io_process entered
I (60120) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (60120) CoAP_client: coap_io_process entered
I (61120) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (61120) CoAP_client: coap_io_process entered
I (62120) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (62120) CoAP_client: coap_io_process entered
I (62890) CoAP_client: coap_io_process returned: 769, nextWait was: 1000
I (62890) CoAP_client: coap_io_process entered
I (63890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (63890) CoAP_client: coap_io_process entered
I (64890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (64890) CoAP_client: coap_io_process entered
I (65890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (65890) CoAP_client: coap_io_process entered
I (66890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (66890) CoAP_client: coap_io_process entered
I (67890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (67890) CoAP_client: coap_io_process entered
I (68890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (68890) CoAP_client: coap_io_process entered
I (69890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (69890) CoAP_client: coap_io_process entered
I (70890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (70890) CoAP_client: coap_io_process entered
I (71890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (71890) CoAP_client: coap_io_process entered
I (72890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (72890) CoAP_client: coap_io_process entered
I (73890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (73890) CoAP_client: coap_io_process entered
I (74890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (74890) CoAP_client: coap_io_process entered
I (75890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (75890) CoAP_client: coap_io_process entered
I (76890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (76890) CoAP_client: coap_io_process entered
I (77890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (77890) CoAP_client: coap_io_process entered
I (78890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (78890) CoAP_client: coap_io_process entered
I (79890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (79890) CoAP_client: coap_io_process entered
I (80890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (80890) CoAP_client: coap_io_process entered
I (81890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (81890) CoAP_client: coap_io_process entered
I (82890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (82890) CoAP_client: coap_io_process entered
I (83890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (83890) CoAP_client: coap_io_process entered
I (84890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (84890) CoAP_client: coap_io_process entered
I (85890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (85890) CoAP_client: coap_io_process entered
I (86890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (86890) CoAP_client: coap_io_process entered
I (87890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (87890) CoAP_client: coap_io_process entered
I (88890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (88890) CoAP_client: coap_io_process entered
I (89890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (89890) CoAP_client: coap_io_process entered
I (90890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (90890) CoAP_client: coap_io_process entered
I (91890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (91890) CoAP_client: coap_io_process entered
I (92890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (92890) CoAP_client: coap_io_process entered
I (93890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (93890) CoAP_client: coap_io_process entered
I (94890) CoAP_client: coap_io_process returned: 999, nextWait was: 1000
I (94890) CoAP_client: coap_io_process entered
(D) 2024-06-17 09:44:12. 1,220 | EMERGENCY | operator()() | TIMEOUT CHECK ... remaining: 780 seconds
(D) 2024-06-17 09:46:12. 0,871 | EMERGENCY | operator()() | TIMEOUT CHECK ... remaining: 660 seconds
```
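The log above comes from a polling loop around `coap_io_process()`, which in libcoap waits up to the given timeout in milliseconds and returns the time actually spent (or -1 on error). A sketch of such a loop with an overall deadline, so that a missing response ends the wait instead of looping forever. `io_step_fn`, `wait_for_response()` and the fake step are hypothetical; in an application the step would call `coap_io_process(ctx, next_wait)`. A deadline like this cannot help when a single call never returns, which is what the last `entered` line shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical step: waits up to timeout_ms, returns ms spent or -1 on error. */
typedef int (*io_step_fn)(void *ctx, uint32_t timeout_ms);

typedef enum { WAIT_RESPONSE, WAIT_TIMEOUT, WAIT_IO_ERROR } wait_result_t;

/* Poll until *done is set (by a response handler running inside the step),
 * the step reports an error, or total_ms of reported time has elapsed. */
static wait_result_t wait_for_response(io_step_fn step, void *ctx,
                                       const bool *done,
                                       uint32_t next_wait_ms, uint32_t total_ms) {
  uint32_t elapsed = 0;
  while (!*done) {
    if (elapsed >= total_ms)
      return WAIT_TIMEOUT;
    int spent = step(ctx, next_wait_ms);
    if (spent < 0)
      return WAIT_IO_ERROR;
    elapsed += (uint32_t)spent;
  }
  return WAIT_RESPONSE;
}

/* Fake step for demonstration: "receives" the response on call respond_on. */
typedef struct { int calls; int respond_on; bool done; } fake_ctx_t;

static int fake_step(void *ctx, uint32_t timeout_ms) {
  fake_ctx_t *f = (fake_ctx_t *)ctx;
  if (++f->calls == f->respond_on)
    f->done = true;
  return (int)timeout_ms - 1; /* mimics "returned: 999, nextWait was: 1000" */
}
```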

We think the session still appears to be available to the system. That is why we added a check on the return code of coap_session_send() to fix the issue:

```c
result = coap_session_send(c_session, send_buffer, send_buffer_length);
if (result >= 0) {
  /* ... rest of the existing result handling ... */
}
```

We could live with our quick fix, but we are not sure whether it could cause trouble in other situations (such as the retry mechanism in a poor-connection environment).

Do you see a risk in our solution?
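What the quick fix changes can be reduced to what the write callback returns for each send outcome. Without the `result >= 0` check, a negative result also satisfies `result != (ssize_t)send_buffer_length`, so it falls into the short-write branch and is reported as 0 rather than as an error. A sketch, assuming the send call reports failure with a negative value; `map_send_result()` is a hypothetical helper, not part of libcoap.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: the value a DTLS write callback hands back for a
 * given send result, following the idea of the quick fix. */
static int map_send_result(ssize_t result, size_t len) {
  if (result < 0) {
    /* send failed: pass the error on (errno untouched) instead of
     * masking it as "0 bytes written" */
    return -1;
  }
  if ((size_t)result != len) {
    /* partial datagram: keep the existing behaviour, report nothing sent */
    return 0;
  }
  return (int)result;
}
```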


mrdeep1 commented on September 27, 2024

I can't immediately see any issues. Any ICMP response errors are picked up in the packet receive logic.


boribosnjak commented on September 27, 2024

> I can't immediately see any issues. Any ICMP response errors are picked up in the packet receive logic.

Do you mean that you do not see any issues with our quick fix?


boribosnjak commented on September 27, 2024

I added a log line that outputs errno before entering coap_io_process. Using version 4.3.4 of the CoAP library, the output looks like this:

```
I (86263) CoAP_client: coap_io_process returned : 999 , nextWait was : 1000
I (86263) CoAP_client: coap_io_process entered
I (87263) CoAP_client: Error coap : No more processes
I (87263) CoAP_client: coap_io_process returned : 999 , nextWait was : 1000
I (87263) CoAP_client: coap_io_process entered
I (88263) CoAP_client: Error coap : No more processes
I (88263) CoAP_client: coap_io_process returned : 999 , nextWait was : 1000
I (88263) CoAP_client: coap_io_process entered
I (89263) CoAP_client: Error coap : No more processes
I (89263) CoAP_client: coap_io_process returned : 999 , nextWait was : 1000
I (89263) CoAP_client: coap_io_process entered
 (D) 2024-06-17 15:42:48.861,579 | EMERGENCY |       operator()() | TIMEOUT CHECK ... remaining: 780 seconds
 (D) 2024-06-17 15:44:48.862, 55 | EMERGENCY |       operator()() | TIMEOUT CHECK ... remaining: 660 seconds
 (D) 2024-06-17 15:46:48.861,183 | EMERGENCY |       operator()() | TIMEOUT CHECK ... remaining: 540 seconds
```
 (D) 2024-06-17 15:46:48.861,183 | EMERGENCY |       operator()() | TIMEOUT CHECK ... remaining: 540 seconds

After 340 seconds our application continues to run in a different thread, and the errno output looks like this:

```
(D) 2024-06-17 15:47:14. 92,735 | DN | send_data_and_wait() | response received
 (E) 2024-06-17 15:47:14. 98, 42 | DCS |    dcs_exception() | !!! Throw exception: timeout during call !!!!
 (E) 2024-06-17 15:47:14.101,554 | DCS | dcs_error_exception() | thrown exception: timeout during call with error number: timeout
I (389823) Application: Error connect : No such file or directory
```

After that timeout, the system tries to reconnect to the cellular network. Once the reconnect succeeds, the coap_client, which was still running in its own thread, returns from the loop:

```
I (513643) [PPPOS CLIENT]: status_cb: Connected
I (513643) [PPPOS CLIENT]:    ipaddr    = 10.218.230.144
D (513653) cellular: PPP connected
D (513653) cellular: PPP connection successful
 (D) 2024-06-17 15:49:18. 58,112 | DN | send_data_and_wait() | blocking execution and waiting for resonse
I (514663) CoAP_client: Error coap : No more processes
I (514663) CoAP_client: coap_io_process returned : 425399 , nextWait was : 1000
E (514663) CoAP_client: No response from server
I (514663) CoAP_client: Received event in Coap_event_handler : 0x0000
I (514663) CoAP_client: Received event in Coap_NACK_handler, reason : 3
I (514663) CoAP_client: Cleaned up
```

Part of the reconnect process is to shut down CoAP before trying to reconnect. With the new CoAP library (4.3.4) the system crashes. I guess we are not allowed to call coap_cleanup() twice? That was not a problem with 4.3.1.
Here is the backtrace:

```
0x400825b1: panic_abort at /esp-idf-v4.4.6/components/esp_system/panic.c:408
0x40098729: esp_system_abort at /esp-idf-v4.4.6/components/esp_system/esp_system.c:137
0x4009ced5: __assert_func at /esp-idf-v4.4.6/components/newlib/assert.c:85
0x400994ad: xQueueSemaphoreTake at /esp-idf-v4.4.6/components/freertos/queue.c:1549 (discriminator 1)
0x4008179a: pthread_mutex_lock_internal at /esp-idf-v4.4.6/components/pthread/pthread.c:620
0x400d3614: pthread_mutex_destroy at /esp-idf-v4.4.6/components/pthread/pthread.c:585
0x401968b8: coap_cleanup at /firmware/managed_components/espressif__coap/libcoap/src/coap_net.c:4048
```

Should I open a new ticket?


mrdeep1 commented on September 27, 2024

As per coap_cleanup(3), coap_cleanup() should be the last call. You are seeing a mutex getting destroyed twice.

If you want to clean up everything else before trying to start a new session, I suggest you use coap_free_context(3), which will do all of this for you.

Work could be done here to support multiple coap_cleanup() calls, but it would need to be thread safe and must not affect any other thread that is using CoAP.
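A sketch of the lifecycle described above: one library-wide init at boot, a context that is freed and recreated on every reconnect, and a single cleanup at final shutdown. The `lib_*` functions are hypothetical stand-ins for `coap_startup()`, `coap_new_context()`, `coap_free_context()` and `coap_cleanup()` so the example compiles on its own. The atomic flag only turns a second shutdown call into a no-op; it does not make cleanup safe while another thread is still using CoAP.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins that count calls, so the sketch is self-contained. */
typedef struct { int id; } lib_context_t;
static int startups, cleanups, contexts_live;
static void lib_startup(void) { startups++; }
static void lib_cleanup(void) { cleanups++; }
static lib_context_t *lib_new_context(void) {
  lib_context_t *c = malloc(sizeof *c);
  if (c)
    contexts_live++;
  return c;
}
static void lib_free_context(lib_context_t *c) {
  if (c) {
    contexts_live--;
    free(c);
  }
}

static atomic_bool lib_ready = false;

/* Once at boot. */
static void app_init(void) {
  if (!atomic_exchange(&lib_ready, true))
    lib_startup();
}

/* On every reconnect: drop the old context (and its sessions) and create a
 * new one; library-wide state is left alone. */
static lib_context_t *app_reconnect(lib_context_t *old) {
  lib_free_context(old);
  return lib_new_context();
}

/* Once at final shutdown; a repeated call is a no-op. */
static void app_shutdown(lib_context_t **ctx) {
  lib_free_context(*ctx);
  *ctx = NULL;
  if (atomic_exchange(&lib_ready, false))
    lib_cleanup();
}
```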


mrdeep1 commented on September 27, 2024

@boribosnjak These fixes are now in idf-extra-components. Can this issue now be closed?


boribosnjak commented on September 27, 2024

The fixes look good to me. Thank you very much!

