Comments (21)
I don't understand why I should switch off the interrupt. It must be enough that the processing in the interrupt is interrupted.
I have now tried this with just the mask and the error seems to be gone. The uptime is already more than 1 day.
ethernet_arch_lwip_gpio_mask();
const bool state = KNX_NETIF.isLinked();
ethernet_arch_lwip_gpio_unmask();
return state;
I had thought twice that it had hung up again. But a reconnect via showed me the running console with the corresponding uptime.
from arduino-pico.
For the sake of completeness: It's running :D Thank you.
from arduino-pico.
Please provide an MCVE so this can be reproduced elsewhere. IRQ mode is stable AFAIK, and I've personally had the AdvancedWebServer running for ~24hrs with 3 different browsers refreshing every couple of seconds w/o incident, and others have used it as well in their own testing (i.e. the ESP32 WIFI driver port recently added support for it).
from arduino-pico.
Also, what exact Ethernet device is being used? One way a core could get stuck in the IRQ handler is if the IRQ line never gets deasserted by the Ethernet adapter. If it's a shared line or there is noise, things could get stuck w/the IRQ asserted and the CPU continually calling the IRQ handler.
from arduino-pico.
i was expecting this answer. i don't know how to do it. i have a huge framework here. it seems to be nothing simple. i have already switched off all my own DMAs & interrupts.
but what i also noticed is that i keep losing the link sporadically.
the hardware is a w5500 -> https://github.com/OpenKNX/OpenKNX/wiki/REG1-Eth
from arduino-pico.
The problem also occurs with other hardware such as https://github.com/OpenKNX/OpenKNX/wiki/REG1-Base-IP
In addition, we monitor the loop runtime. Our loops do not run longer than 6ms. Nevertheless, I keep getting messages about runtimes of >100ms.
0d 00:19:16: Common: Warning: The loop took longer than usual (146 >= 100)
from arduino-pico.
I thought the core was hung in an IRQ loop, so how would the loop timer ever output anything?
... It looks as if it is only running in interrupt mode.
Again, w/o any code or way of reproducing there's not much we can do here other than guess.
Are you calling any raw LWIP functions in your code? If so, you need to protect those calls with the proper mutex or you could end up w/re-entrancy which will really mess up LWIP. The included libraries all have their calls protected (AFAIK!) so if you're just using WiFiClient or WiFiUDP then this probably isn't an issue.
Is there any way of seeing where your excess time is being spent in loop? I'm scratching my head here because if there was a deadlock somewhere in the LWIP code it would never advance. If there was a timeout it would be on the order of 5,000 milliseconds, not 10s-100s of ms.
from arduino-pico.
I thought the core was hung in an IRQ loop, so how would the loop timer ever output anything?
the loop warning appears regardless of the error. but only if the interrupt mode is active. therefore i suspect a connection.
Again, w/o any code or way of reproducing there's not much we can do here other than guess.
if i could do that, i would find the error myself and could write you what needs to be fixed :)
Are you calling any raw LWIP functions in your code?
i check whether the link is connected and if the connection is lost, i call the dhcp call. unfortunately, none of this is done by the system itself.
the network handling is implemented in this module:
https://github.com/OpenKNX/OFM-Network/blob/v1/src/NetworkModule.cpp
This is summarized here
I check every 500ms if link is active with
KNX_NETIF.isLinked();
When the state changes to true, I start DHCP to renew the DHCP address
netif_set_link_up(KNX_NETIF.getNetIf());
if (_useStaticIP)
    netif_set_ipaddr(KNX_NETIF.getNetIf(), _staticLocalIP);
else
    dhcp_network_changed_link_up(KNX_NETIF.getNetIf());
When the state changes to false, I remove the current address
netif_set_ipaddr(KNX_NETIF.getNetIf(), 0);
netif_set_link_down(KNX_NETIF.getNetIf());
in principle, i would prefer the stack to handle these basic functions. but so far we have always had to do it ourselves.
Is there any way of seeing where your excess time is being spent in loop?
Our calls in the loop are limited to max. 6ms. Then there is a return to main. in my opinion, the long time can only be caused by the interrupt.
from arduino-pico.
Those calls are not protected and you may end up re-entering LWIP which gives undefined behavior. Can't say it's causing your problem, but it's not safe in general.
It's simple to add the infra to protect them. Look at cores/lwip_wrap.cpp and the lib/platform_wrap.txt file. Using those templates it's pretty simple to add in the calls you're doing.
from arduino-pico.
you've already helped me with that :D
That means, if I temporarily leave out the handling, the error should be gone. If that turns out to be the case, you could think about making it interrupt-safe.
from arduino-pico.
Our calls in the loop are limited to max. 6ms. Then there is a return to main. in my opinion, the long time can only be caused by the interrupt.
Possibly, but the thing is the IRQ mode calls the exact same handler (LWIPIntfDev<template>::handlePackets) as the async_context one does. It just doesn't poll every 20ms, instead it reads as soon as the HW says there's a packet:
arduino-pico/libraries/lwIP_Ethernet/src/LwipIntfDev.h, lines 439 to 446 in f737be3
arduino-pico/libraries/lwIP_Ethernet/src/LwipEthernet.cpp, lines 173 to 188 in f737be3
The packet handlers will try and read up to 10 packets for all HW:
arduino-pico/libraries/lwIP_Ethernet/src/LwipIntfDev.h, lines 536 to 545 in f737be3
from arduino-pico.
you've already helped me with that :D
Actually, the handling may need to change since now you need to disable the GPIO interrupts as well as grab a mutex. See about adding an ethernet_arch_lwip_gpio_mask call before taking the mutex and an ethernet_arch_lwip_gpio_unmask call after releasing it. Again, I don't imagine that code gets called much so it may not be related to your issue here, but better safe than sorry...
from arduino-pico.
One other thing, thinking about it, are you getting a packet storm? It's 2024 so I hope you're not on a hub, but with the IRQ mode it may be possible if you send 100s of packets at high speed that the 10-packet-per-IRQ call would end up being called over and over.
In async_context mode, you will get at most 10 packets every polling period (20ms by default). If your HW gets 20 packets in 20ms, the HW will throw away half of them.
In IRQ mode, if after pulling 10 packets out of the HW the IRQ gets re-asserted, the IRQ will be called again almost immediately. You'd get all 20 packets (assuming LWIP buffers available) but spend 2x the processing time doing so, of course.
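The difference between the two modes can be sketched with a toy model. This is only an illustration of the arithmetic described above, not the arduino-pico implementation: the names `polled`, `irqDriven`, `Result` and `kMaxPacketsPerCall` are made up for this example, and the model assumes that in polled mode everything beyond the per-call limit is lost within one polling period.

```cpp
#include <algorithm>

// Hypothetical model, not code from arduino-pico.
constexpr int kMaxPacketsPerCall = 10;  // per-call limit described in this thread

struct Result {
    int handled;       // packets passed on to the stack
    int dropped;       // packets the hardware had to discard
    int handlerCalls;  // how often the packet handler ran
};

// Polled (async_context) mode: one handler call per polling period.
// Assumption: whatever exceeds the per-call limit is thrown away by the HW.
Result polled(int arrived) {
    const int handled = std::min(arrived, kMaxPacketsPerCall);
    return {handled, arrived - handled, 1};
}

// IRQ mode: the handler runs again for as long as the HW still holds packets,
// so nothing is dropped but the number of handler calls grows with the load.
Result irqDriven(int arrived) {
    Result r{0, 0, 0};
    while (arrived > 0) {
        const int n = std::min(arrived, kMaxPacketsPerCall);
        arrived -= n;
        r.handled += n;
        ++r.handlerCalls;
    }
    return r;
}
```

With 20 packets in one period, `polled(20)` handles 10 and drops 10 in a single call, while `irqDriven(20)` handles all 20 in two calls, which is the 2x processing time mentioned above.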
from arduino-pico.
I was planning to do it this way.
ethernet_arch_lwip_begin();
return KNX_NETIF.isLinked();
ethernet_arch_lwip_end();
Should I do ethernet_arch_lwip_begin + ethernet_arch_lwip_gpio_mask, or use the mask instead?
I hope you're not on a hub
no it was modern network equipment :D
from arduino-pico.
That's not going to call the gpio masking, so you'll want to add in the calls manually. (Also you'll not want to return before unlocking it. 😆 ) But, again, it seems like those calls will be very infrequently made.
from arduino-pico.
Unfortunately, removing the calls didn't help. The device still hung, although only after about 6-7 hours; it had never lasted that long before.
What I also noticed during the test is that our warning
0d 01:11:06: Common: Warning: The loop took longer than usual (133 >= 100)
0d 01:21:06: Common: Warning: The loop took longer than usual (132 >= 100)
0d 01:31:06: Common: Warning: The loop took longer than usual (140 >= 100)
is displayed exactly every 10 minutes. and by that I mean exactly 10m
from arduino-pico.
There are no built-in 600-second timers AFAIK, so I can't really help you there from the core side.
Off the top of my head, what is your DHCP lease lifetime? Could you be requesting and receiving a new (same) lease at 10m intervals?
I guess one thing I would have to say is that the polled networking might be okay for a soft-realtime system, but the IRQ one would not. As mentioned before, the IRQ will try and process all packets, meaning that if you get a packet storm the CPU usage is unbounded as every packet will at least be attempted to be read. For the polled/async_context version you're guaranteed no more than 10 packets per polling period, so there is a soft upper bound for the time spent doing it. Any more would be thrown away by the HW.
You could look at checking the packets being processed every loop. Look at the LwipIntfDev<>::packets{received,sent}() method.
You could also do some instrumentation on the IRQ side, tracking the delta rp2040.getCycleCount64() from IRQ start to finish.
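A minimal sketch of such instrumentation, assuming the current cycle count is passed in by the caller. `IrqTimingStats` and its members are hypothetical names invented for this example and are not part of arduino-pico; taking the counter value as a parameter keeps the helper testable off the device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of arduino-pico.
struct IrqTimingStats {
    uint64_t startCycles = 0;
    uint64_t maxCycles = 0;    // longest single handler run
    uint64_t totalCycles = 0;  // cycles spent in the handler overall
    uint32_t calls = 0;        // number of completed handler runs
    bool running = false;

    // Call first thing in the handler with the current cycle count.
    void begin(uint64_t now) {
        startCycles = now;
        running = true;
    }

    // Call last thing in the handler with the current cycle count.
    void end(uint64_t now) {
        assert(running && now >= startCycles);
        const uint64_t delta = now - startCycles;
        if (delta > maxCycles) {
            maxCycles = delta;
        }
        totalCycles += delta;
        ++calls;
        running = false;
    }
};
```

On the device, the value passed as `now` would presumably be `rp2040.getCycleCount64()`. Because the fields are written in interrupt context, reading them from `loop()` would need care, for example copying them while the GPIO interrupt is masked.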
from arduino-pico.
The lag doesn't really bother me. Our platform uses DMA and interrupts for time-critical work, so if the loop sometimes runs a few ms longer, it has no effect.
dhcp does not seem to be the reason. during the lag there are neither dhcp requests on the network nor do the leasetimes match.
currently the problem can be reproduced quite well by calling isLinked.
This one doesn't work.
ethernet_arch_lwip_begin();
const bool state = KNX_NETIF.isLinked();
ethernet_arch_lwip_end();
return state;
but if I understand correctly, I have to do this so that the interrupt is skipped.
ethernet_arch_lwip_gpio_mask();
const bool state = KNX_NETIF.isLinked();
ethernet_arch_lwip_gpio_unmask();
return state;
or should do both?
from arduino-pico.
You'll want to disable the GPIO interrupt then take the lwip mutex (ethernet_arch_lwip_gpio_mask(); ethernet_arch_lwip_begin()) and the reverse order when done. There may be a better way of centralizing these steps (i.e. move it silently into the isLinked method) if this does turn out to be the issue.
from arduino-pico.
In this case, it's not LWIP but SPI which you're protecting against re-entrancy.
arduino-pico/libraries/lwIP_w5500/src/utility/w5500.h, lines 86 to 91 in 3aaa132
That call in the function does SPI operations. If you get a packet-avail IRQ on the GPIO while that SPI is running you're going to start another SPI operation in the middle of an ongoing one. At best the internal SPI object state will be destroyed. At worst, it'll completely confuse the W5500 chip and you'd need a reset/power cycle to clear it up.
It's a bug in the w5500 driver, I would say. The mask call should be added before the lwip_mutex_grab call, and the unmask call after the lwip mutex-release call. I need to verify the other devices, too, now that a generic failure mode was found.
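The ordering can be sketched with a small scope guard. This is an illustration only: the four `ethernet_arch_lwip_*` functions below are stand-ins that just record the call order (on the device the real functions named in this thread would be used instead), and `EthernetGuard`, `fakeIsLinked` and `safeIsLinked` are hypothetical names for this example.

```cpp
#include <string>
#include <vector>

// Stand-ins that only record the call order; assumptions for this sketch.
static std::vector<std::string> calls;
void ethernet_arch_lwip_gpio_mask()   { calls.push_back("mask"); }
void ethernet_arch_lwip_begin()       { calls.push_back("begin"); }
void ethernet_arch_lwip_end()         { calls.push_back("end"); }
void ethernet_arch_lwip_gpio_unmask() { calls.push_back("unmask"); }

// Stand-in for KNX_NETIF.isLinked(), which performs SPI transfers on the W5500.
bool fakeIsLinked() {
    calls.push_back("isLinked");
    return true;
}

// Hypothetical RAII guard: mask the GPIO IRQ, then take the LWIP lock;
// on scope exit release the lock, then unmask, i.e. the reverse order.
class EthernetGuard {
public:
    EthernetGuard() {
        ethernet_arch_lwip_gpio_mask();
        ethernet_arch_lwip_begin();
    }
    ~EthernetGuard() {
        ethernet_arch_lwip_end();
        ethernet_arch_lwip_gpio_unmask();
    }
    EthernetGuard(const EthernetGuard&) = delete;
    EthernetGuard& operator=(const EthernetGuard&) = delete;
};

bool safeIsLinked() {
    EthernetGuard guard;
    return fakeIsLinked();  // the guard is released only after the value is read
}
```

A guard like this also avoids returning before the unlock call, since the destructor runs on every exit path.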
from arduino-pico.
Also in the ENC28J60 driver, but not in the W5100 (because there is no link register to read so it always returns true).
from arduino-pico.