uwcms / ipmc
University of Wisconsin ZYNQ IPMC
Some classes, mostly abstract drivers, do not declare their destructors as virtual.
This can cause serious memory leaks when objects get destroyed through a base-class pointer, and it needs fixing.
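A minimal sketch of the fix, assuming a hypothetical `AbstractDriver` base and `ExampleDriver` subclass (neither name is taken from the IPMC code base). With a virtual destructor in the base, deleting through a base pointer also runs the derived destructor, so resources owned by the derived class are released:

```cpp
#include <memory>

// Hypothetical abstract driver interface; not a class from the IPMC code base.
class AbstractDriver {
public:
    virtual ~AbstractDriver() = default; // virtual, so derived destructors run
    virtual int read() = 0;
};

// Hypothetical concrete driver that tracks a resource it owns.
class ExampleDriver : public AbstractDriver {
public:
    explicit ExampleDriver(int &live_buffers) : live_buffers_(live_buffers) {
        ++live_buffers_; // "allocate" the resource
    }
    ~ExampleDriver() override {
        --live_buffers_; // "release" the resource
    }
    int read() override { return 0; }

private:
    int &live_buffers_;
};

// Destroys a driver through a base-class pointer and returns how many
// resources are still alive afterwards.
int destroy_through_base() {
    int live_buffers = 0;
    {
        std::unique_ptr<AbstractDriver> drv =
            std::make_unique<ExampleDriver>(live_buffers);
    } // drv destroyed here via AbstractDriver*
    return live_buffers;
}
```

Without the `virtual` keyword on `~AbstractDriver()`, deleting through the base pointer is undefined behavior and the derived destructor is typically skipped.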
There might be a problem in the PSUART driver interrupt handler.
When receiving data, the buffer can be full, in which case the RX FIFO will not be emptied.
Also, in the recv() function the DMA range request might be too small, so not all bytes will be read from the RX FIFO. There is no code to take this into account and request another buffer if there is a rollover. I believe this is fine if the IRQ gets triggered again immediately afterwards, but it needs verification.
There are various flash related things in IPMC.cc which need to be moved to a better home. Please find them a new home and clean them up.
While running on a CDB with no UART cable attached, the PS UART driver triggers an assert. This can be easily reproduced by just unplugging the cable.
This could be the root cause of some crashes we have seen during startup.
The interrupt helper class initializes and enables interrupts before the constructors of its child classes (normally the drivers) have run.
Interrupts need to be enabled only after the drivers have had time to do their initial bring-up.
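One possible shape for this, sketched with hypothetical names (`InterruptHelper`, `ExampleDriver`, and `enableInterrupts()` are illustrative and not the actual IPMC classes): the base constructor only connects the handler, and the driver enables interrupts explicitly as the last step of its own constructor. A log vector stands in for the hardware so the ordering can be observed on a host:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the interrupt helper base class.
class InterruptHelper {
public:
    explicit InterruptHelper(std::vector<std::string> &log) : log_(log) {
        // Base constructors always run before derived constructors,
        // so only connect the handler here and leave it disabled.
        log_.push_back("helper: handler connected (disabled)");
    }
    virtual ~InterruptHelper() = default;

protected:
    // Called by the driver once its own bring-up is complete.
    void enableInterrupts() { log_.push_back("helper: interrupts enabled"); }

    std::vector<std::string> &log_;
};

// Hypothetical driver built on top of the helper.
class ExampleDriver : public InterruptHelper {
public:
    explicit ExampleDriver(std::vector<std::string> &log)
        : InterruptHelper(log) {
        log_.push_back("driver: hardware bring-up");
        enableInterrupts(); // last step of the derived constructor
    }
};

// Constructs a driver and returns the order in which the steps happened.
std::vector<std::string> construct_and_log() {
    std::vector<std::string> log;
    ExampleDriver drv(log);
    return log;
}
```

The trade-off is that every driver must remember to call the enable step; a further subclass of a driver would need the same care.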
Implement exception support while attaining the following goals:
networkd task starves other processes when waiting for a DHCP address, probably internal to lwIP. Investigation is required.
PID  Name      BasePrio  CurPrio  StackHW  CPU%  CPU
10   networkd  3         3        1716     61%   2604622
2    IDLE      0         0        156      38%   1622663
Console might become unresponsive.
> ipmc_image_erase A
This is a destructive operation! Image A is the current boot target. Refusing.
> ipmc_image_erase B
This is a destructive operation! Please run:
ipmc_image_erase B 153 [random number]
> ipmc_image_erase B 153
Erasing Image B
As the title says.
Maybe check out how other IPs do it, like the GPIO IP or UART IP.
In commit ba81059 a few functions were added to do validation of incoming bin files either via FTP or UART.
Only simple checks are performed at this stage and the following are recommended to be added:
A problem related to cache incoherency and DMA in the lwIP driver corrupts incoming packets.
To avoid problems until a solution is found, lwIP checksums were enabled in 16379ec.
A post was added to the Xilinx forums asking for help, with no solution so far:
https://forums.xilinx.com/t5/Embedded-Processor-System-Design/FreeRTOS-lwip-driver-cache-issues-in-2018-2/m-p/879300
This is observable via both UART and Telnet:
When the card boots or a Telnet connection is established, there is a bug where, before any command has been executed, the output of the auto-complete or of the first command is missing a newline. Only the first one is affected.
This can be reproduced by starting a Telnet session, typing 'eeprom.' and pressing Tab, for example (or using any other command that will auto-complete).
After executing one command the bug disappears.
Implement a PS_QSPI_Flash driver, able to read and write the QSPI flash.
The Xilinx xilsf library may be useful (it can be enabled in the BSP).
Concerns: If at all possible we need to stay in 3-byte addressing mode, or remain compatible with it (since a watchdog reboot could at any time return us to the bootrom), and ideally on page 0 (since expanding the flash to a larger model will make this relevant and the boot manager would reside at 0x00000000). If necessary, a watchdog pre-kill hook mechanism can be arranged to increase the likelihood of correct behavior, but this is obviously not strictly preferable.
Doxygen comments with ///! should be replaced with /// or //! instead. ///! is not proper doxygen format.
Automatically detect IPMC HW revision on boot (presumably GPIOs).
The LED controller does not currently support the IPMI blue LED blink modes.
The Long Blink shall be a cycle of 100 ms of off followed by 900 ms of illumination.
The Short Blink shall be a cycle of 100 ms of illumination followed by 900 ms of off.
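The two timings above can be expressed as a single function of elapsed time. This is only a sketch: `BlinkMode` and `blue_led_is_on()` are hypothetical names, and it assumes each 1000 ms cycle starts with its first-listed 100 ms phase.

```cpp
#include <cstdint>

// Hypothetical enum for the two IPMI blue LED blink modes.
enum class BlinkMode { Long, Short };

// Returns true if the LED should be lit elapsed_ms after the blink started.
bool blue_led_is_on(BlinkMode mode, uint32_t elapsed_ms) {
    const uint32_t phase = elapsed_ms % 1000; // 100 ms + 900 ms cycle
    switch (mode) {
    case BlinkMode::Long:
        return phase >= 100; // 100 ms off, then 900 ms on
    case BlinkMode::Short:
        return phase < 100;  // 100 ms on, then 900 ms off
    }
    return false;
}
```

A periodic timer or the existing LED controller tick could call such a function to decide the output level at each update.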
Now that we have #16 and #18, it is extremely important to consider that thrown exceptions will not call xSemaphoreGive(), xSemaphoreGiveRecursive(), or portEXIT_CRITICAL() as the stack unwinds, leaving the IPMC in a deadlock-imminent state.
Upgrade MutexLock to support non-immediate locks and add RecursiveMutexLock and CriticalSection to handle these cases as well. Then convert all existing uses of semaphores and critical sections to use these mechanisms so that the semaphores and critical sections are handled properly as the stack unwinds.
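The underlying idea is RAII: release the lock in a destructor so that stack unwinding does it automatically. The sketch below is host-runnable and deliberately avoids FreeRTOS; `FakeSemaphore` and `ScopedLock` are hypothetical stand-ins, and real code would call `xSemaphoreTake()` and `xSemaphoreGive()` where the comments indicate.

```cpp
#include <stdexcept>

// Stand-in for a FreeRTOS semaphore so the sketch runs on a host.
struct FakeSemaphore {
    int held = 0;
    void take() { ++held; } // real code: xSemaphoreTake(...)
    void give() { --held; } // real code: xSemaphoreGive(...)
};

// Hypothetical scoped lock: takes in the constructor, gives in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(FakeSemaphore &sem) : sem_(sem) { sem_.take(); }
    ~ScopedLock() { sem_.give(); }
    ScopedLock(const ScopedLock &) = delete;
    ScopedLock &operator=(const ScopedLock &) = delete;

private:
    FakeSemaphore &sem_;
};

// Takes the lock and then fails; the destructor still releases the lock.
void do_work(FakeSemaphore &sem) {
    ScopedLock lock(sem);
    throw std::runtime_error("driver failure");
}

// Returns the hold count after an exception has propagated out of do_work().
int held_after_exception() {
    FakeSemaphore sem;
    try {
        do_work(sem);
    } catch (const std::runtime_error &) {
        // Exception handled; the lock was released during unwinding.
    }
    return sem.held;
}
```

The same pattern would apply to recursive mutexes and critical sections, with the matching enter and exit calls in the constructor and destructor.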
Consider adding a stack trace when a task trips the watchdog timer.
There seems to be no built in functions / library to backtrace while providing a specific stack pointer, so very likely this would need to be implemented from scratch.
Printing the task's program counter, which would be easy, won't yield much info.
There is a major bug somewhere in the I2C driver where, after a few operations (unclear how many), the driver is no longer able to talk to the PL interface, which causes operations to time out.
It is unclear exactly what is happening, but this is likely related to the problem seen before where sending and receiving 1 byte sometimes causes the PL logic to lock up, with nothing able to recover it.
The UART console service occasionally outputs control sequence garbage. This would occur because for some reason the \x1b (ESC) characters starting escape sequences are dropped (in either direction, depending on the code sequence observed).
I suspect this is a problem in our interactions with the Xilinx UART driver, but have not been able to confirm or trace it. Hopefully the introduction of telnet using the same service core will provide some insight.
If, after a clean power-on boot (not a WDT restart), the card has a WDT reset within the first 20 to 30 seconds of operation, go to fallback.
lwIP has been updated in commit 3d53d01.
Some lines of the update have been commented out (tagged with 'TODO') because of incompatibilities between Vivado 2017.2 and 2017.4. These lines need to be uncommented when Vivado gets upgraded.
After lwip 2.0.2 update, interface up and link up/down seem to be called at odd times. Network down is never called. The interface is always up but the up callback gets called several times, likely when DHCP succeeds after a few seconds.
This has no effect on the overall operation of the IPMC but better handling of these messages would be appropriate.
Executing esm X will issue an ESM reset and the status messages will start to show up.
Multiple hw_servers trying to connect to the same XVC endpoint seem to corrupt lwip, likely because it keeps piling up connections.
Attempt to automatically reject all incoming connections if there is a connection established instead of backlogging them.
Also, improve status messages to inform the user of multiple hw_servers trying to connect to the same endpoint.
Now that #16 is complete, we are able to use exceptions in our code. Various things can now be upgraded to take advantage of this. The most obvious case is to review any configASSERTs in our code to determine if they would be better handled as catchable exceptions.
Unclear why this is happening, but sometimes when the push-button to reset the IPMC gets pressed, or after executing the 'restart' command, the Zynq seems to lock, which then requires a power cycle to solve since pressing the reset button again won't do anything.
When locked, the Red LED of the PHY also starts to blink very fast (~5Hz I would say), unclear as well why.
Needs further investigation.
read() and write() functions of RingBuffer are wrapped around a CriticalGuard, meaning interrupts will be disabled even for large reads and writes, which can cause performance issues.
Consider only guarding the necessary info to do the copy and then execute the copy itself outside the guard, for example. Alternatively, don't guard it and let the drivers handle interrupts locally.
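A rough sketch of the first option, under loud assumptions: `SketchRingBuffer` is a hypothetical class (not the project's RingBuffer), it assumes exactly one reader and one writer, and `std::mutex` stands in for the critical section so the example runs on a host. Only the index bookkeeping is guarded; the byte copy happens outside the guard.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical byte ring buffer; single reader and single writer assumed.
class SketchRingBuffer {
public:
    size_t write(const unsigned char *src, size_t len) {
        size_t start, count;
        {   // Guarded: snapshot the free space and the write position.
            std::lock_guard<std::mutex> guard(lock_);
            count = std::min(len, kSize - used_);
            start = head_;
        }
        // Unguarded: copy into the free region, wrapping if needed.
        const size_t first = std::min(count, kSize - start);
        std::memcpy(buf_ + start, src, first);
        std::memcpy(buf_, src + first, count - first);
        {   // Guarded: publish the new bytes to the reader.
            std::lock_guard<std::mutex> guard(lock_);
            head_ = (start + count) % kSize;
            used_ += count;
        }
        return count;
    }

    size_t read(unsigned char *dst, size_t len) {
        size_t start, count;
        {   // Guarded: snapshot the available bytes and the read position.
            std::lock_guard<std::mutex> guard(lock_);
            count = std::min(len, used_);
            start = tail_;
        }
        // Unguarded: copy out of the filled region, wrapping if needed.
        const size_t first = std::min(count, kSize - start);
        std::memcpy(dst, buf_ + start, first);
        std::memcpy(dst + first, buf_, count - first);
        {   // Guarded: hand the space back to the writer.
            std::lock_guard<std::mutex> guard(lock_);
            tail_ = (start + count) % kSize;
            used_ -= count;
        }
        return count;
    }

private:
    static constexpr size_t kSize = 8; // tiny on purpose, to exercise wrapping
    unsigned char buf_[kSize] = {};
    size_t head_ = 0, tail_ = 0, used_ = 0;
    std::mutex lock_;
};
```

With more than one reader or more than one writer this scheme is no longer safe, because two callers could snapshot the same region; that case would need a reservation step or the existing full guard.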
Either append the checksum to the end of the .bit file or add it to the user field after the header.
Some drivers might be making excessive use of disabling and re-enabling global interrupts for critical portions of the code. This can cause considerable slowdown if many drivers do this.
Consider auditing the drivers and changing the disable/enable from global interrupts to only the interrupts used by the driver in question.
Right now FTP authentication is hard-coded. Add the same method as Telnet and have it as a central authentication point. Check how the FTP is coded right now.
_prepare() technically doesn't take lambdas compatible with its other execution modes.