ipmc's People

Contributors

jtikalsky

ipmc's Issues

Possible issue with PSUART driver and receiver buffer

There may be a problem in the PSUART driver interrupt handler.
If the receive buffer is full when data arrives, the handler does not empty the RX FIFO.
Also, in the recv() function the DMA range requested might be too small, so not all bytes are read from the RX FIFO. There is no code to account for this and request another buffer if there is a rollover. I believe this is fine if the IRQ gets triggered again immediately afterwards, but it needs verification.

Clean up flash_* from IPMC.cc

There are various flash related things in IPMC.cc which need to be moved to a better home. Please find them a new home and clean them up.

Interrupt helper class enables interrupts before child class constructor

This can be the root cause of some crashes we have seen during startup.

The interrupt helper class initializes and enables interrupts before the constructors of its child classes (normally the drivers) have run.

Interrupts need to be enabled only after the drivers have had time to do their initial bring-up.
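
One possible way to structure this is sketched below, with the hardware replaced by plain member variables. The names `InterruptHelper`, `enable_interrupts()`, `simulate_irq()` and `FakeUart` are hypothetical and are not taken from the IPMC code base. The base class constructor enables nothing, and the derived driver enables interrupts as the last step of its own constructor.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the interrupt helper base class.
// It only records state; a real version would program the interrupt controller.
class InterruptHelper {
public:
    InterruptHelper() = default;             // deliberately enables nothing
    virtual ~InterruptHelper() = default;

    bool interrupts_enabled() const { return enabled_; }

    // Simulates the hardware raising the IRQ line (test helper only).
    void simulate_irq() {
        if (enabled_) handle_interrupt();
    }

protected:
    // To be called by the derived driver once its own bring-up is complete.
    void enable_interrupts() { enabled_ = true; }
    virtual void handle_interrupt() = 0;

private:
    bool enabled_ = false;
};

// Hypothetical driver: brings itself up first, then enables interrupts.
class FakeUart : public InterruptHelper {
public:
    FakeUart() {
        rx_buffer_ready_ = true;   // driver bring-up happens first
        enable_interrupts();       // only now may the handler run
    }
    int handled() const { return handled_; }

protected:
    void handle_interrupt() override {
        if (!rx_buffer_ready_) throw std::logic_error("IRQ before bring-up");
        ++handled_;
    }

private:
    bool rx_buffer_ready_ = false;
    int handled_ = 0;
};
```

With this layout, constructing a `FakeUart` and then calling `simulate_irq()` runs the handler exactly once, and the handler can never observe a half-constructed driver.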

Add exception support

Implement exception support while attaining the following goals:

  1. Prevent system wide locking if there are uncaught exceptions
  2. Attempt to have a stack trace or some sort of information about uncaught exceptions
  3. Have per-thread handling of exceptions

Investigate high CPU usage by lwip during autonegotiation

networkd task starves other processes when waiting for a DHCP address, probably internal to lwIP. Investigation is required.

PID Name             BasePrio CurPrio StackHW CPU% CPU
 10 networkd                3       3    1716  61% 2604622
  2 IDLE                    0       0     156  38% 1622663

The console might become unresponsive.

Add an IPMC Flash Image Erase command

> ipmc_image_erase A
This is a destructive operation! Image A is the current boot target. Refusing.
> ipmc_image_erase B
This is a destructive operation! Please run:
ipmc_image_erase B 153 [random number]
> ipmc_image_erase B 153
Erasing Image B
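
The decision logic implied by the transcript above could be sketched as follows. This is only an illustration: `EraseDecision`, `decide_image_erase()` and `kNoCode` are hypothetical names, and the sketch assumes the caller generates the random confirmation code and remembers it between invocations.

```cpp
// Possible outcomes of an erase request (hypothetical names).
enum class EraseDecision { RefusedBootTarget, NeedsConfirmation, Erase };

// Sentinel meaning "the user did not supply a confirmation code".
const int kNoCode = -1;

// Decide what to do with an "ipmc_image_erase <image> [code]" request.
//   image         - image the user asked to erase, e.g. 'A' or 'B'
//   boot_target   - image that is the current boot target
//   expected_code - random code previously shown to the user (assumed >= 0)
//   supplied_code - code typed by the user, or kNoCode if absent
EraseDecision decide_image_erase(char image, char boot_target,
                                 int expected_code, int supplied_code = kNoCode) {
    // Never erase the image we would boot from next.
    if (image == boot_target) return EraseDecision::RefusedBootTarget;

    // A missing or wrong code only results in a (new) confirmation prompt.
    if (supplied_code == kNoCode || supplied_code != expected_code)
        return EraseDecision::NeedsConfirmation;

    return EraseDecision::Erase;
}
```

For example, with `'A'` as the boot target, `decide_image_erase('B', 'A', 153, 153)` yields `EraseDecision::Erase`, while omitting the code yields `EraseDecision::NeedsConfirmation`.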

Add additional checks to binfile validation

In commit ba81059 a few functions were added to do validation of incoming bin files either via FTP or UART.

Only simple checks are performed at this stage and the following are recommended to be added:

  • md5 checks to partitions that have it (maybe refuse images that don't?)
  • bitfile/PL image CRC checks
  • Check proper order of partitions: FSBL -> PL -> PS
  • Verify partition size
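
The ordering and size checks from the list above could look roughly like the sketch below. `PartitionType`, `PartitionInfo` and `partitions_look_valid()` are hypothetical names, and the sketch assumes an image contains exactly one FSBL, one PL and one PS partition with offsets and lengths already parsed from the image header. The md5 and CRC checks are not covered.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one partition found in an image header.
enum class PartitionType { FSBL, PL, PS };

struct PartitionInfo {
    PartitionType type;
    std::uint32_t offset;   // byte offset of the partition inside the image
    std::uint32_t length;   // partition length in bytes
};

// Returns true if the partitions appear in the order FSBL -> PL -> PS and
// each one is non-empty and lies entirely inside the image.
bool partitions_look_valid(const std::vector<PartitionInfo>& parts,
                           std::uint32_t image_size) {
    static const PartitionType expected[] = {
        PartitionType::FSBL, PartitionType::PL, PartitionType::PS};

    if (parts.size() != 3) return false;  // assumption: exactly three partitions

    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (parts[i].type != expected[i]) return false;   // wrong order
        if (parts[i].length == 0) return false;           // empty partition
        if (parts[i].offset > image_size) return false;   // starts past the end
        // Written this way to avoid overflow in offset + length.
        if (parts[i].length > image_size - parts[i].offset) return false;
    }
    return true;
}
```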

Console new line issue right after start

This is observable via both UART and telnet:
When the card boots or a Telnet connection is established, and no command has been executed yet, the first auto-complete or the first command is missing a new line. Only the first one is affected.

To reproduce, start a telnet session, type 'eeprom.' and press tab (or use any other command that will auto-complete).
After executing one command the bug disappears.

Implement Flash Driver

Implement a PS_QSPI_Flash driver, able to read and write the QSPI flash.

The Xilinx xilsf library may be useful (it can be enabled in the BSP).

Concerns: If at all possible we need to stay in 3-byte addressing mode, or remain compatible with it (since a watchdog reboot could at any time return us to the bootrom), and ideally on page 0 (since expanding the flash to a larger model will make this relevant and the boot manager would reside at 0x00000000). If necessary a watchdog pre-kill hook mechanism can be arranged to increase the likelihood of correct behavior, but this is obviously not strictly preferable.

Fix doxygen comments (///!)

Doxygen comments with ///! should be replaced with /// or //! instead. ///! is not proper doxygen format.

Add IPMI blink modes to the LED Controller

The LED controller does not currently support the IPMI blue LED blink modes.

The Long Blink shall be a cycle of 100 ms of off followed by 900 ms of illumination.
The Short Blink shall be a cycle of 100 ms of illumination followed by 900 ms of off.
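
As a small illustration of the two timings, the sketch below computes whether the LED should be lit at a given point in time. `BlinkMode` and `led_is_lit()` are hypothetical names, and the sketch assumes each 1000 ms cycle starts with the first-named phase (off for Long Blink, on for Short Blink).

```cpp
#include <cstdint>

// Hypothetical names for the two blink modes described above.
enum class BlinkMode { Long, Short };

// Returns true if the LED should be lit `elapsed_ms` after the blink started.
// Both modes repeat with a period of 1000 ms.
bool led_is_lit(BlinkMode mode, std::uint32_t elapsed_ms) {
    const std::uint32_t phase = elapsed_ms % 1000;  // position inside the cycle
    if (mode == BlinkMode::Long) {
        return phase >= 100;   // 100 ms off, then 900 ms on
    }
    return phase < 100;        // 100 ms on, then 900 ms off
}
```

A periodic timer callback could call `led_is_lit()` with the elapsed tick time and drive the LED output accordingly.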

Improve all mutex and critical section handling to be exception safe

Now that we have #16 and #18, it is extremely important to consider that thrown exceptions will not call xSemaphoreGive(), xSemaphoreGiveRecursive(), or portEXIT_CRITICAL() as the stack unwinds, leaving the IPMC in a deadlock-imminent state.

Upgrade MutexLock to support non-immediate locks and add RecursiveMutexLock and CriticalSection to handle these cases as well. Then convert all existing uses of semaphores and critical sections to use these mechanisms so that the semaphores and critical sections are handled properly as the stack unwinds.
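
The general RAII idea can be sketched without FreeRTOS as shown below. `FakeMutex` is a hypothetical stand-in for a semaphore handle, and `ScopedLock` is an illustrative guard, not the project's existing MutexLock. A real guard would call xSemaphoreTake() in the constructor and xSemaphoreGive() in the destructor.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a FreeRTOS mutex; it only counts take/give calls.
struct FakeMutex {
    int depth = 0;
    void take() { ++depth; }
    void give() { --depth; }
};

// Illustrative RAII guard: takes the mutex on construction and gives it back
// on destruction, including when the scope is left by an exception.
class ScopedLock {
public:
    explicit ScopedLock(FakeMutex& m) : mutex_(m) { mutex_.take(); }
    ~ScopedLock() { mutex_.give(); }

    // Copying a guard would release the mutex twice, so forbid it.
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    FakeMutex& mutex_;
};

// Throws while holding the lock and reports whether the exception was caught.
bool guarded_operation_throws(FakeMutex& m) {
    try {
        ScopedLock lock(m);
        throw std::runtime_error("failure while holding the lock");
    } catch (const std::runtime_error&) {
        return true;   // the guard's destructor has already released the mutex
    }
    return false;
}
```

After `guarded_operation_throws()` returns, the mutex depth is back to zero even though the protected code threw, which is the property the conversion is meant to provide.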

Stack trace on WDT trip

Consider adding a stack trace when a task trips the watchdog timer.

There seems to be no built-in function or library that can backtrace from a given stack pointer, so this would very likely need to be implemented from scratch.

Printing the task's program counter, which would be easy, won't yield much info.

PL I2C driver read/send timeout

There is a major bug somewhere in the I2C driver where, after a few operations (unclear how many), the driver is no longer able to talk to the PL interface, which causes operations to time out.

It is unclear exactly what is happening, but this is likely related to the problem seen before where sending and receiving 1 byte sometimes causes the PL logic to lock up, with nothing able to recover it.

Console output character drops / garbage

The UART console service occasionally outputs control sequence garbage. This appears to occur because the \x1b (ESC) characters that start escape sequences are dropped for some reason (in either direction, depending on the code sequence observed).

I suspect this is a problem in our interactions with the Xilinx UART driver, but have not been able to confirm or trace it. Hopefully the introduction of telnet using the same service core will provide some insight.

Confusing status reports from lwip 2.0.2

After lwip 2.0.2 update, interface up and link up/down seem to be called at odd times. Network down is never called. The interface is always up but the up callback gets called several times, likely when DHCP succeeds after a few seconds.

This has no effect on the overall operation of the IPMC but better handling of these messages would be appropriate.

Executing esm X will issue an ESM reset and the status messages will start to show up.

Improve how XVC handles multiple connections

Multiple hw_servers trying to connect to the same XVC endpoint seem to corrupt lwip, likely because it keeps piling up connections.

Attempt to automatically reject all incoming connections if there is a connection established instead of backlogging them.

Also, improve status messages to inform the user of multiple hw_servers trying to connect to the same endpoint.

Convert relevant asserts to exceptions

Now that #16 is complete, we are able to use exceptions in our code. Various things can now be upgraded to take advantage of this. The most obvious case is to review any configASSERTs in our code to determine if they would be better handled as catchable exceptions.
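
As a minimal illustration of such a conversion, the sketch below replaces a bounds assertion with a catchable exception. `checked_index()` is a hypothetical helper, not a function from the IPMC code base, and which asserts are worth converting remains a case-by-case decision.

```cpp
#include <cstddef>
#include <stdexcept>

// Before (assert style): configASSERT(index < size);
// A failed configASSERT typically stops the system, depending on how the
// macro is defined in FreeRTOSConfig.h.
//
// After (exception style): the caller can catch the error and recover.
std::size_t checked_index(std::size_t index, std::size_t size) {
    if (index >= size) {
        throw std::out_of_range("index out of range");
    }
    return index;
}
```

A caller can wrap the call in `try { ... } catch (const std::out_of_range&) { ... }` and report the error on the console instead of halting.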

Resetting via console or reset button can lock Zynq

Unclear why this is happening, but sometimes when the push-button to reset the IPMC gets pressed, or after executing the 'restart' command, the Zynq seems to lock, which then requires a power cycle to solve since pressing the reset button again won't do anything.

When locked, the red LED of the PHY also starts to blink very fast (~5 Hz I would say); it is unclear why.

Needs further investigation.

RingBuffer read/write performance optimization

The read() and write() functions of RingBuffer are wrapped in a CriticalGuard, meaning interrupts are disabled for the whole operation, even for large reads and writes, which can cause performance issues.

Consider only guarding the necessary info to do the copy and then execute the copy itself outside the guard, for example. Alternatively, don't guard it and let the drivers handle interrupts locally.
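
The first option could be sketched as below: the indices are snapshotted under the guard, the copy runs outside it, and the result is committed under the guard again. `RingBufferSketch` and `FakeCriticalGuard` are hypothetical stand-ins, not the project's classes. The sketch assumes at most one reader and one writer at a time and a capacity greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CriticalGuard; a real one would disable
// interrupts in the constructor and restore them in the destructor.
struct FakeCriticalGuard {
    FakeCriticalGuard() {}
    ~FakeCriticalGuard() {}
};

// Sketch only: safe for a single reader and a single writer.
class RingBufferSketch {
public:
    explicit RingBufferSketch(std::size_t capacity) : data_(capacity) {}

    std::size_t write(const std::uint8_t* src, std::size_t len) {
        std::size_t start, count;
        {   // Short guarded section: decide where and how much to write.
            FakeCriticalGuard guard;
            start = (head_ + used_) % data_.size();
            count = std::min(len, data_.size() - used_);
        }
        // Copy outside the guard; a reader can only free space meanwhile.
        for (std::size_t i = 0; i < count; ++i)
            data_[(start + i) % data_.size()] = src[i];
        {   // Short guarded section: publish the new data.
            FakeCriticalGuard guard;
            used_ += count;
        }
        return count;
    }

    std::size_t read(std::uint8_t* dst, std::size_t len) {
        std::size_t start, count;
        {   // Short guarded section: decide where and how much to read.
            FakeCriticalGuard guard;
            start = head_;
            count = std::min(len, used_);
        }
        // Copy outside the guard; a writer can only append data meanwhile.
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = data_[(start + i) % data_.size()];
        {   // Short guarded section: release the consumed bytes.
            FakeCriticalGuard guard;
            head_ = (head_ + count) % data_.size();
            used_ -= count;
        }
        return count;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t head_ = 0;   // index of the oldest unread byte
    std::size_t used_ = 0;   // number of unread bytes
};
```

The time spent with interrupts disabled no longer depends on the transfer size, at the cost of entering the guard twice per call.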

Mitigate driver interrupt clogging

Some drivers might make excessive use of disabling and re-enabling global interrupts around critical portions of the code. This can cause considerable slowdown if many drivers do it.
Consider auditing the drivers and changing the disable/enable from global interrupts to only the interrupts used by the driver in question.
