deniz-eren / dev-can-linux

Porting of Linux CAN-bus drivers to QNX

License: GNU General Public License v2.0

Makefile 1.40% C 85.16% CMake 3.07% C++ 10.38%

can-bus driver qnx qnx7 advantech kvaser pcan peak sja1000

dev-can-linux's Issues

Rare race condition found impacting file close

Describe the bug

During an aggressive open/close scenario with active message reads/writes in progress, a rare race condition was found to lock up the driver, eventually leading to a crash.

The race condition is in void* rx_loop (void* arg) and affects the clean-up of blocked clients during file close.

To Reproduce

After running 'ctest' repeatedly, the issue is replicated:

ctest

Test project /root/userhome/Repos/outlook/dev-can-linux/build
    Start 1: ssh-driver-baud-tests
1/6 Test #1: ssh-driver-baud-tests ............   Passed    0.72 sec
    Start 2: ssh-driver-integrity-tests
2/6 Test #2: ssh-driver-integrity-tests .......   Passed    5.14 sec
    Start 3: ssh-driver-io-tests
3/6 Test #3: ssh-driver-io-tests ..............   Passed   43.11 sec
    Start 4: ssh-driver-raw-tests

DRIVER STUCK

Top shows CPU runaway:

CPU states: 82.0% user, 0.0% kernel
CPU  0 Idle: 16.8% 
CPU  1 Idle: 19.0% 
Memory: 4095M total, 3892M avail, page size 4K

    PID   TID PRI STATE    HH:MM:SS    CPU  COMMAND             
84344854     4  10 Run       0:00:05  47.33% dev-can-linux       
184333     2  10 Rcv       1:04:09  30.90% io-pkt-v6-hc        
84344854    34  10 Rcv       0:00:58   2.10% dev-can-linux       
155658     1  10 Rcv       0:23:29   0.79% devc-pty            
647187     1  10 SigW      0:12:40   0.68% sshd                
143369     4  10 Rcv       0:00:12   0.15% pipe                
84344854    39  10 Rply      0:00:00   0.04% dev-can-linux       
        1    15  10 Run       0:00:00   0.00% kernel              
88072222     1  10 Rply      0:00:00   0.00% top                 
1191959     1  10 SigW      0:00:01   0.00% sshd                

            Min        Max       Average 
CPU 0 idle:   16%        24%        19% 
CPU 1 idle:   16%        23%        19% 
Mem Avail:   3892MB      3895MB      3893MB  
Processes:    30         30         30    
Threads:     112        112        112

Checking pidin we see tests/driver/raw/driver-raw-tests is still running:

#pidin arg
    pid Arguments
84344854 dev-can-linux -Ex -vvv 
88043549 /root/userhome/Repos/outlook/dev-can-linux/build/tests/driver/raw/driver-raw-tests

Log of driver 'dev-can-linux -Ex -vvv' shows:

CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0
...
CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_: _RESMGR_NOREPLY
io_devctl -> id: 0

After CTRL-C shows:

Shutdown IRQ loop
Shutdown program
io_devctl -> id: 0
Shutting down adv_pci
Removing card
Removing device adv_pci-can0
io_devctl -> id: 0
io_devctl -> id: 0
io_devctl -> id: 0
io_devctl -> id: 0
io_devctl -> id: 0
unregister_netdev: adv_pci-can0
netif_stop_queue
free_irq; irq: 258
cancel_delayed_work_sync (3c52da2e10)
netif_tx exit: adv_pci-can0
timer_loop shutdown
io_devctl -> id: 0
io_close_ocb -> id: 0
can_ocb_free -> /dev/can0/rx0

Still stuck.

Further attempt to stop with 'kill -s SIGKILL ' shows:

Killed

If, while the driver is in the stuck state, we test with candump and cansend as follows, the driver core dumps when cansend is run.

candump -u0,rx0 
/dev/can0/rx0 TS: 162329081ms [EFF] 1234 [2] AB CD 00 00 00 00 00 00
devctl CAN_DEVCTL_RX_FRAME_RAW_BLOCK: No such process

cansend -u0,tx0 -m0x1234,1,0xABCD

The driver then prints the following logs during this stuck-state test:

io_devctl -> id: 0
io_open -> (id: 1, rcvid: 10)
can_ocb_calloc -> /dev/can0/tx0 (id: 1)
io_devctl -> id: 1
netif_stop_queue
CAN_DEVCTL_TX_FRAME_RAW; /dev/can0/tx0 TS: 0ms [EFF] 1234 [2] AB CD 00 00 00 00 00 00
netif_rx; adv_pci-can0 [EFF] 1234 [2] AB CD  0  0  0  0  0  0
netif_wake_queue
io_close_ocb -> id: 1
can_ocb_free -> /dev/can0/tx0
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_BLOCK; /dev/can0/rx0 TS: 162329081ms [EFF] 1234 [2] AB CD 00 00 00 00 00 00
io_devctl -> id: 0
CAN_DEVCTL_RX_FRAME_RAW_BLOCK; /dev/can0/rx0 TS: 162329081ms [EFF] 1234 [2] AB CD 00 00 00 00 00 00
io_devctl -> id: 0

Process 84344854 (dev-can-linux) terminated SIGSEGV code=1 fltno=11 ip=00000049c7d7f8a8(/opt/bin/[email protected]+0x000000000000deb3) mapaddr=00000000000118a8. ref=0000002228943ae8
Memory fault (core dumped)

Expected behavior

Should close file without incident.

Screenshots

Platform (please complete the following information)

Target QNX architecture: x86_64
CAN-bus hardware device: QEmu VM
Development environment: workspace
Version: QNX 7.1

Driver (please complete the following information)

Driver loaded: adv_pci
Branch: main
Version: 1.3.1 but old issue

Additional context

Update driver and harmonize with Linux kernel v6.6

Harmonize with Linux kernel v6.6; i.e. pull changes from Linux.

Support module parameters (impacts only driver f81601)

Currently only used by driver f81601.

See src/kernel/drivers/net/can/sja1000/f81601.c

static bool internal_clk = true;
module_param(internal_clk, bool, 0444);
MODULE_PARM_DESC(internal_clk, "Use internal clock, default true (24MHz)");

static unsigned int external_clk;
module_param(external_clk, uint, 0444);
MODULE_PARM_DESC(external_clk, "External clock when internal_clk disabled");

Until this feature is implemented, the f81601 driver therefore runs in a fixed configuration with the internal clock enabled.

Implement module_param() and MODULE_PARM_DESC(), which are currently stubs in src/kernel/include/linux/module.h
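
One possible shape for a user-space module_param() is sketched below. It is only an illustration under stated assumptions: param_register(), module_param_set(), the PARAM_* enum and the fixed-size table are hypothetical names that do not exist in the driver, and the constructor attribute is a GCC/Clang extension. Only the two f81601 declarations are taken from the source above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter registry; none of these names exist in the driver. */
enum param_type { PARAM_bool, PARAM_uint };

struct module_param_entry {
    const char *name;
    enum param_type type;
    void *var;
};

#define MAX_PARAMS 16
static struct module_param_entry param_table[MAX_PARAMS];
static int param_count;

static void param_register(const char *name, enum param_type type, void *var)
{
    if (param_count < MAX_PARAMS)
        param_table[param_count++] =
            (struct module_param_entry){ name, type, var };
}

/* Registers the variable before main() runs. 'perm' is accepted only for
 * source compatibility and ignored. The trailing struct declaration
 * swallows the ';' written after the macro invocation. */
#define module_param(name, type, perm)                                  \
    static void __attribute__((constructor)) param_ctor_##name(void)   \
    { param_register(#name, PARAM_##type, &name); }                     \
    struct param_semicolon_##name

/* Keeps the description string next to the parameter; unused here. */
#define MODULE_PARM_DESC(name, desc)                                    \
    static const char param_desc_##name[] __attribute__((unused)) = desc

/* Sets a registered parameter from text; returns 0 on success, -1 otherwise. */
static int module_param_set(const char *name, const char *value)
{
    for (int i = 0; i < param_count; i++) {
        struct module_param_entry *p = &param_table[i];
        if (strcmp(p->name, name) != 0)
            continue;
        char *end = NULL;
        unsigned long v = strtoul(value, &end, 0);
        if (end == value || *end != '\0')
            return -1;                      /* not a clean number */
        if (p->type == PARAM_bool)
            *(bool *)p->var = (v != 0);
        else
            *(unsigned int *)p->var = (unsigned int)v;
        return 0;
    }
    return -1;                              /* unknown parameter */
}

/* Declarations as they appear in f81601.c */
static bool internal_clk = true;
module_param(internal_clk, bool, 0444);
MODULE_PARM_DESC(internal_clk, "Use internal clock, default true (24MHz)");

static unsigned int external_clk;
module_param(external_clk, uint, 0444);
MODULE_PARM_DESC(external_clk, "External clock when internal_clk disabled");
```

With something like this in place, a command-line option could call module_param_set("internal_clk", "0") followed by module_param_set("external_clk", "16000000") before the driver probes the device.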

netif_stop_queue causes lockup during exit under some fault conditions

netif_stop_queue() causes a lockup during exit if an error CAN frame is received in netif_rx(), because netif_wake_queue() then never occurs.

A solution would be to refactor struct device_session to use the queue mutex and condvar, both to avoid duplicating semaphores and because the queue semaphores already exit gracefully.
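
A minimal sketch of the mutex/condvar idea, assuming a hypothetical struct tx_gate rather than the driver's actual struct device_session: a waiter blocks while the queue is stopped, and a shutdown flag releases every waiter even if no wake ever arrives.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical TX gate; the names are illustrative, not the driver's. */
struct tx_gate {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool stopped;    /* set by stop, cleared by wake */
    bool shutdown;   /* set once at exit; releases all waiters */
};

static void tx_gate_init(struct tx_gate *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->stopped = false;
    g->shutdown = false;
}

/* Equivalent in spirit to netif_stop_queue(): block further transmits. */
static void tx_gate_stop(struct tx_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->stopped = true;
    pthread_mutex_unlock(&g->mutex);
}

/* Equivalent in spirit to netif_wake_queue(): let transmits continue. */
static void tx_gate_wake(struct tx_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->stopped = false;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}

/* Called on exit: wakes every waiter regardless of the stopped flag. */
static void tx_gate_shutdown(struct tx_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->shutdown = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}

/* Blocks while the gate is stopped. Returns true if the caller may
 * transmit, false if the driver is shutting down. */
static bool tx_gate_wait(struct tx_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    while (g->stopped && !g->shutdown)
        pthread_cond_wait(&g->cond, &g->mutex);
    bool may_transmit = !g->shutdown;
    pthread_mutex_unlock(&g->mutex);
    return may_transmit;
}
```

The point of the shutdown flag is that a stop with no matching wake (the error-frame case described above) can no longer keep a thread blocked during exit.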

PCIe 0x11 (MSI-X) capability support

Multiple device IRQ support

Currently the implementation in src/include/interrupt.h and src/interrupt.c only supports a single IRQ being allocated to a device. If the OS indicates multiple IRQs, the driver exits with an error. In practice this scenario has not occurred in emulation or during real hardware testing; however, the implementation needs to be made more general.

The error handler in src/pci.c must be removed once this is implemented:

log_err("read multiple (%d) IRQs\n", nirq);

Support QNX 8.0

Test case driver-integrity-tests occasionally fails

Jenkins console shows the following when the failure happens:

QNX_HOST=/root/qnx710/host/linux/x86_64
QNX_TARGET=/root/qnx710/target/qnx7
MAKEFLAGS=-I/root/qnx710/target/qnx7/usr/include
+ docker exec --user root --workdir /root dev-env bash -c source .profile                     && /root/workspace/dev/.setup-profile.sh                     && cd /data/home/root/build_coverage                     && ctest --output-junit test_results.xml || (exit 0)
QNX_HOST=/root/qnx710/host/linux/x86_64
QNX_TARGET=/root/qnx710/target/qnx7
MAKEFLAGS=-I/root/qnx710/target/qnx7/usr/include
Test project /data/home/root/build_coverage
    Start 1: ssh-driver-baud-tests
1/6 Test #1: ssh-driver-baud-tests ............   Passed    6.44 sec
    Start 2: ssh-driver-integrity-tests

The case can be replicated when running VM target emulation with ctest from the host. The driver still appears to operate correctly when this happens; however, the test case itself hangs, most likely on a pthread_join().

Limits to the numbers of open/close (65535) by the driver ?

Describe the bug

We use version 1.2.0 of the driver with a PEAK-MiniPCIe CAN interface on QNX 7.1, running on an Arm64-based IMX8QM SOM.

We have a stress test where CAN messages are sent from outside every 10 ms and the program (called can-loop) simply sends them back after altering the MID. This program also runs on the 3 internal CAN interfaces of the iMX8QM, so we have 4 CAN interfaces running simultaneously, one of which uses the dev-can-linux driver.

We observed that after 65535 messages the dev-can-linux 1.2.0 driver stops answering. It is not related to the client program, as we have to kill the driver and relaunch it to get functionality back. Remark: the can-loop program closes and reopens the device (mailboxes) attached to the dev-can-linux driver (/dev/can3) between each message, because we had issues when leaving it open constantly.

To Reproduce

We launch the driver the following way:

'dev-can-linux -q -U3 -e 1c:08,0x05 -b id=3,freq=125k,btr0=0x07,btr1=0x14 &'

Then we launch our can-loop program.

Platform

Target QNX architecture: Arm64
CAN-bus hardware device: PCAN-miniPCIe card
Version: QNX 7.1

Driver

Driver loaded: peak_pci
Branch: main
Version: 1.2.0

More clear case

Experimental refactoring using Model Based Design

Release builds for armle-v7 platform

Support priority for transmitted messages

Add support for CAN_DEVCTL_SET_PRIO and CAN_DEVCTL_GET_PRIO, together with associated functionality.

Investigate termination GPIO feature

Investigate the termination GPIO feature present in the Linux Kernel source-code linux/can/dev.h and drivers/net/can/dev/dev.c for example.

Currently this functionality has not been integrated to the dev-can-linux driver. Decision needs to be made on whether or not this is a useful feature to support.

Add test platform and emulation support for armle-v7 and aarch64le architectures

Add test platform and emulation support for 32-bit and/or 64-bit versions of ARM (armle-v7 and aarch64le) architectures.

The current driver can be built for any architecture and will most likely work on any architecture. However at this point, we have only tested using x86_64 architecture in emulation (with QEmu) and with real hardware.

Currently only x86_64 is supported with the test platform image (tests/image/), emulation (tests/emulation/) and Jenkins CI pipelines (ci/jenkins/).

This task involves a number of updates:

Add new test platform image builds (tests/image-armle-v7/ and/or tests/image-aarch64le); or figure out a good way to configure the existing test platform image for different architectures (tests/image). The first option might be the most effective way to go. Perhaps we refactor the current tests/image to be tests/image-x86_64 for consistency.
Add new QEmu emulation configurations for new architectures (tests/emulation/).
Configure Jenkins CI pipelines to run all supported test architectures (ci/jenkins/).

Decouple baud rate and device RX/TX file descriptors configuration program options

Fulfil Github Community Standards items

QEmu support for MSI capable devices

Update driver and harmonize with Linux kernel v6.9

Harmonize with Linux v6.9

MSI-X Interrupt disposition configuration options

Slow memory leak

Slow memory leak in blocked_client linked list

Legacy MSI without Per Vector Masking (PVM) support

Initial experimental implementation of the Legacy PCI 0x05 (MSI) capability without support for Per Vector Masking (PVM)

Multithreaded IRQ handler processing

During the process of harmonizing with Linux Kernel version 6.6 (#52), it was discovered Linux has gone down the direction of supporting multiple threads to process IRQ handlers.

Currently this capability isn't seen as necessary; however, it would be a good enhancement.

When implementing this feature, the functions netif_tx_lock and netif_tx_unlock need to be implemented, and the current IRQ processing algorithms need to be refactored.

We may need to review our handling of spin_lock_irqsave and spin_unlock_irqrestore when implementing multithread IRQ support.

Internal loopback feature (Linux IFF_ECHO) should be made configurable (related to #56)

Describe the bug

Exact same context as #56.

If we do not open and close the mailboxes between each transfer, the driver seems to act as if it receives several input messages in bursts.

See screenshot

To Reproduce

We launch the driver the following way:

'dev-can-linux -q -U3 -e 1c:08,0x05 -b id=3,freq=125k,btr0=0x07,btr1=0x14 &'

Then we launch our can-loop program (see #56).

Platform

Target QNX architecture: Arm64
CAN-bus hardware device: PCAN-miniPCIe card
Version: QNX 7.1

Driver

Driver loaded: peak_pci
Branch: main
Version: 1.2.0

Code

void *fctThreadCan3( void *arg )
{
	CAN_DEF_ERREUR ret_fct = CAN_NODEFINE;
	CmdCan_t vCmdCan;

	memset(&vCmdCan.vTrame,0, sizeof(trameCan_t));
	vCmdCan.dev_gstCanId = fctMgrOpen("/dev/can3");
	if ( vCmdCan.dev_gstCanId == -1 )
	{
		printf(" fctMgrOpen /dev/can3  failed\n");
		exit(EXIT_FAILURE);
	}
	delay(1);

	printf("can3 - entering thread \n");
	while(1)
	{
	//	memset(&vCmdCan.vTrame,0, sizeof(trameCan_t));
	//	vCmdCan.dev_gstCanId = fctMgrOpen("/dev/can3");
	//	if ( vCmdCan.dev_gstCanId == -1 )
	//	{
	//		printf(" fctMgrOpen /dev/can3  failed\n");
	//		exit(EXIT_FAILURE);
	//	}
	//	delay(1);
		ret_fct = fctCmdRxD (vCmdCan.dev_gstCanId, &vCmdCan.vTrame, 1000000);
		if ( ret_fct == CAN_NOERR )
		{
			// Only answer if message ID bit 11 to 8 contain the can interface number (3) 
			if( ((vCmdCan.vTrame.idCan & 0x7FF) >> 8) == 3)
			{
				// Send back with incremented mid
				vCmdCan.vTrame.idCan++;
				// Wrap back to 0x300 when the low byte overflows (0x3FF + 1 -> 0x400)
				if( (vCmdCan.vTrame.idCan & 0xFF) == 0)
				{
					vCmdCan.vTrame.idCan = 0x300;
				}
				vCmdCan.vTrame.queue = CAN_LPQUEUE;
				fctSendCmdTxD(vCmdCan.dev_gstCanId, &vCmdCan.vTrame);
			}
		}
		delay(2);
		//fctMgrClose(vCmdCan.dev_gstCanId);
		//delay(1);	
	}
	// Not reached while the loop above runs forever
	fctMgrClose(vCmdCan.dev_gstCanId);
	delay(1);
	return NULL;
}

We need to perform the open/close for each loop iteration (commented parts) to have correct behaviour.

Open/close functions here:

#define MAX_CAN_DEVICES 4
#define CAN_DEVICE_PATH_MAX 16

struct canDevice
{
	int rx;
	int tx;
	bool open;
	int dev_id;
};

static struct canDevice canDevices[MAX_CAN_DEVICES] = {};

static int fctMgrOpenMailbox(const char *dev, const char *mailbox)
{
	int fd = -1;
	char mailboxPath[CAN_DEVICE_PATH_MAX];

	if (snprintf(mailboxPath, CAN_DEVICE_PATH_MAX, "%s/%s", dev, mailbox) >= CAN_DEVICE_PATH_MAX)
	{
		return fd;
	}

	if(mailbox[0]=='r')
		fd = open(mailboxPath, O_RDONLY);
	else
		fd = open(mailboxPath, O_WRONLY);

	return(fd);
}

int fctMgrOpen(const char * dev)
{
	int deviceindex = -1; // we need to find the CANx value from the /dev string.

	for (int i = 0; i < MAX_CAN_DEVICES; i++)
	{
		if (canDevices[i].open)
		{
			continue;
		}

		if((dev[8] >='0') &&  (dev[8] <='4'))
		{
			deviceindex = dev[8] -'0' ;
		}
	
		int rx = fctMgrOpenMailbox(dev, "rx0");
		if (rx == -1)
		{
			fprintf(stderr, "(open) %s rx0, %s\n", dev, strerror(errno));
			return -1;
		}

		int tx = -1;
		char tx_mailbox[] = "tx?";

		// depending if we are on Flexcan or PEAK, the tx mailbox is tx1 or tx0.
		//
		for (int tx_id = '0'; tx_id <= '1'; tx_id++)
		{
			tx_mailbox[2] = tx_id;
			tx = fctMgrOpenMailbox(dev, tx_mailbox);
			if (tx != -1)
			{
				break;
			}
		}

		if (tx == -1)
		{
			return -1;
		}

		canDevices[i] = (struct canDevice)
		{
			.rx = rx,
			.tx = tx,
			.dev_id = deviceindex,
			.open = true,
		};

		return i;
	}
	return -1;
}

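int fctMgrClose(int fd)
{
	// fd is an index into canDevices[], as returned by fctMgrOpen()
	if (fd >= 0 && fd < MAX_CAN_DEVICES && canDevices[fd].open)
	{
		canDevices[fd].open = false;
		close(canDevices[fd].rx);
		close(canDevices[fd].tx);
		return 0; // closed successfully
	}
	return -1; // invalid index or device not open
}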

PCIe 0x05 (MSI) and 0x11 (MSI-X) capability support

Add PCIe capability ID 0x05 (MSI) support

Issue discovered regarding cards with PCIe capability 0x05 (MSI). Looking at dev-can-linux -vvvvv output you can check if a device supports the 0x05 (MSI) capability:

read capability[0]: 1
read capability[1]: 5     <---- THIS ONE
read capability[2]: 10

When using PCIe cards, we found (on an equivalent Linux platform) that some issues arise when the MSI capability IRQs are available but the driver does not use them. From what we can tell, on rare occasions the IRQ event is received before the chipset has data available to read. The fix is believed to be for the driver to use capability 0x05 (MSI). The issue is very rare and can only be detected under heavy traffic testing conditions.

Handling read/write of I/O port address spaces could be problematic for non-x86 architectures.

Read and write functions in src/kernel/include/linux/io.h utilize a static memory region check (of address 0-0xFFFF) to determine when to use I/O port functions in*() and out*(). For addresses beyond this region, functions utilize memory address operations with the appropriate memory barriers.

Although not elegant, this works fine on the x86 platform; on non-x86 platforms, however, it could be problematic. For example, if on some architecture the PCI device BAR regions with addresses 0-0xFFFF come up as MEM (memory-mapped regions) instead of I/O port regions, the read and write functions will still use the static 0xFFFF threshold check and direct the operations to the in*() and out*() functions. These functions correspond to the x86 in* and out* assembler operations, which may not work on other architectures, depending on how QNX has implemented in*() and out*().

Nevertheless, a better implementation would be to track the PCI BAR I/O and MEM address blocks and use a more specific I/O threshold in the read and write functions. This still gives us a simple and fast check, and moreover a correct one. On the x86 platform the threshold can be more precise, and on non-x86 platforms the threshold would be 0, so all operations would be done as memory operations.

A further benefit of such an implementation is that the behaviour of the read and write functions would be determined by the src/pci.c functions, which are responsible for managing the PCI interface in the QNX implementation.
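
A minimal sketch of the tracked threshold, assuming hypothetical names (io_port_limit, track_bar(), addr_is_io_port()) that are not part of the driver: the PCI setup code would report each BAR, and the read and write functions would consult the resulting limit instead of a fixed 0xFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: end of the highest I/O port range claimed by any PCI BAR.
 * Stays 0 when no BAR is an I/O port region (e.g. on non-x86 targets). */
static uintptr_t io_port_limit = 0;

/* Would be called by the PCI setup code for every BAR it reads. */
static void track_bar(uintptr_t addr, uintptr_t size, bool is_io)
{
    if (is_io && size > 0 && addr + size > io_port_limit)
        io_port_limit = addr + size;
}

/* True if the access should use in*()/out*(), false for memory access. */
static bool addr_is_io_port(uintptr_t addr)
{
    return addr < io_port_limit;
}
```

With no I/O BAR reported the limit stays 0, so every access is treated as a memory operation, which is the non-x86 behaviour described above.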

USB CAN drivers

Issue when run on a computer without CAN-bus device

The driver spams "MsgReceivePulse error; No such process" if run on a computer that has no supported CAN-bus devices.

Core dump on start-up when getsubopt() is supplied invalid command-line input

To replicate:

dev-can-linux -u id

Outcome:

Process 92688406 (dev-can-linux) terminated SIGSEGV code=1 fltno=11 ip=0000002dec9a5fd7(/proc/boot/libc.so.5@_Stoint+0x0000000000000017) mapaddr=0000000000043fd7. ref=0000000000000000
Memory fault (core dumped)
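
The fault address ref=0 inside a libc string-to-integer routine suggests a NULL value string being converted, which is what getsubopt() yields for a suboption given without "=value". A minimal sketch of a guarded parse follows; parse_unit_subopts() and the single "id" token are hypothetical and do not reflect the driver's real option table.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

enum { OPT_ID = 0 };

/* Hypothetical token table with a single "id" suboption. */
static char *const unit_tokens[] = { "id", NULL };

/* Parses a suboption string such as "id=3".
 * Returns 0 and stores the id on success, -1 on any invalid input. */
static int parse_unit_subopts(const char *arg, long *id_out)
{
    char *copy = strdup(arg);      /* getsubopt() modifies its input */
    if (copy == NULL)
        return -1;

    char *cursor = copy;
    char *value = NULL;
    char *end = NULL;
    int rc = 0;
    int seen_id = 0;

    while (rc == 0 && *cursor != '\0') {
        switch (getsubopt(&cursor, unit_tokens, &value)) {
        case OPT_ID:
            /* "id" without "=value" leaves value NULL: reject it
             * instead of handing NULL to a conversion function. */
            if (value == NULL || *value == '\0') {
                rc = -1;
                break;
            }
            *id_out = strtol(value, &end, 0);
            if (*end != '\0')
                rc = -1;           /* trailing garbage, e.g. "id=abc" */
            else
                seen_id = 1;
            break;
        default:
            rc = -1;               /* unknown suboption */
            break;
        }
    }

    free(copy);
    return (rc == 0 && seen_id) ? 0 : -1;
}
```

Called with "id", the function returns -1 rather than dereferencing a NULL value; called with "id=3", it returns 0 and stores 3.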

Regression in the -i program option

The -i and -ii options both give detailed driver support details; however, -i is meant to report a shorter summary of the supported devices instead.

Refactor global variables and integrate to driver structures

Issue with transmission synchronisation functions

There seems to be a deadlock issue with the mutex and/or condition variable setup in netif_stop_queue(). The issue could also be related to how the Linux drivers call back into the synchronisation functions. It presents itself when a very high transmission rate is applied under unrealistic test conditions. The synchronisation around the transmission function netif_tx() is suspected to be at fault. The symptom is the loss of receive ability when the deadlock happens.

Refactor netif_tx() and the synchronisation interface functions netif_start_queue(), netif_queue_stopped(), netif_wake_queue() and netif_stop_queue(). The issue can be replicated, so the fix can be verified.

Support QNX Standard MID encoding

Hi,

We use the driver under QNX 7.1 on a Toradex IMX8QM SOM with a MiniPCIE PEAK CAN board.

One issue we had is that with standard MIDs the MID values were not correct for us; extended mode was fine, but standard mode was not.

On QNX the encoding of the MID through the standard devctl depends on whether or not the driver is using extended MIDs:

In standard 11-bit MIDs, bits 18–28 define the MID.
In extended 29-bit MIDs, bits 0–28 define the MID.

Check
http://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.utilities/topic/c/canctl.html
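
The two layouts above can be sketched as a pair of conversion helpers. The function and macro names are hypothetical; only the bit positions come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define STD_ID_MASK   0x7FFu        /* 11-bit standard identifier */
#define EXT_ID_MASK   0x1FFFFFFFu   /* 29-bit extended identifier */
#define STD_MID_SHIFT 18            /* standard MID occupies bits 18-28 */

/* Converts a plain CAN identifier into the QNX-style MID layout. */
static uint32_t can_id_to_mid(uint32_t can_id, bool extended)
{
    if (extended)
        return can_id & EXT_ID_MASK;
    return (can_id & STD_ID_MASK) << STD_MID_SHIFT;
}

/* Converts a QNX-style MID back into a plain CAN identifier. */
static uint32_t mid_to_can_id(uint32_t mid, bool extended)
{
    if (extended)
        return mid & EXT_ID_MASK;
    return (mid >> STD_MID_SHIFT) & STD_ID_MASK;
}
```

For example, the standard identifier 0x123 becomes the MID 0x048C0000, while an extended identifier such as 0x1234 is passed through unchanged.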

Besides this, to get it working on our board we had to use the event mode and also correct a PCI configuration space offset, if I recall correctly.

CAN FD support

Currently CAN FD functionality of the Linux Kernel source-code is not supported. If there is demand for this feature this enhancement should be considered.

Update driver and harmonize with Linux kernel v6.10

Harmonize with Linux v6.10

Together with this change, we will move to exact Linux Kernel version naming in config/HARMONIZED_LINUX_VERSION, which will be 6.10 rather than 610 under the current convention, to avoid confusion.

Hardware testing found issue regarding Error Passive State

Describe the bug

Hardware testing with heavy message flooding results in IRQs stopping when in Error Passive State

To Reproduce

To reproduce the behaviour, start the driver:

dev-can-linux -Ex -vvvvv

Then, from another console, run the following script to flood the driver with a heavy volume of CAN-bus messages:

#!/bin/bash
# Basic while loop
counter=1
while [ $counter -le 100000 ]
do
        echo -n test > /dev/can1/tx0
        echo -n test > /dev/can0/tx0
        ((counter++))
done
echo Test sequence complete

The driver soon gets overwhelmed with the messages and gets stuck in Error Passive State:

error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can0: controller problems: 8
netif_rx: adv_pci-can0: TX error counter; tx:60, rx:0
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can1: controller problems: 8
netif_rx: adv_pci-can1: TX error counter; tx:70, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can0: controller problems: 20
netif_rx: adv_pci-can0: TX error counter; tx:80, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can1: controller problems: 20
netif_rx: adv_pci-can1: TX error counter; tx:80, rx:0

Expected behaviour

According to the SJA1000 documentation, in Error Passive State the device should still receive messages and raise IRQs until the transmit error counter exceeds 255, at which point the chip should enter Bus-Off state. The chip never enters Bus-Off state; the last netif_rx message we receive says the chip is in Error Passive State, and then no further IRQs arrive.
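
For reference, the standard CAN fault-confinement thresholds can be sketched as a small classifier. The names are hypothetical and this is not the driver's or the Linux SJA1000 code; the warning limit of 96 is the SJA1000 default and is programmable on the chip. If the logged counters are printed in hexadecimal (an assumption), tx:60 and tx:80 correspond to 96 and 128, which would match the warning and passive transitions in the log.

```c
/* Error states in the order the log reports them. */
enum can_err_state {
    ERR_ACTIVE = 0,
    ERR_WARNING,
    ERR_PASSIVE,
    BUS_OFF
};

#define ERR_WARNING_LIMIT 96u    /* SJA1000 default warning limit */
#define ERR_PASSIVE_LIMIT 128u   /* either counter at or above this */
#define BUS_OFF_LIMIT     255u   /* transmit counter above this */

/* Classifies the controller state from the transmit (tec) and
 * receive (rec) error counters. Only the transmit counter can
 * drive the controller into Bus-Off. */
static enum can_err_state classify_error_state(unsigned tec, unsigned rec)
{
    if (tec > BUS_OFF_LIMIT)
        return BUS_OFF;
    if (tec >= ERR_PASSIVE_LIMIT || rec >= ERR_PASSIVE_LIMIT)
        return ERR_PASSIVE;
    if (tec >= ERR_WARNING_LIMIT || rec >= ERR_WARNING_LIMIT)
        return ERR_WARNING;
    return ERR_ACTIVE;
}
```

Under these rules a controller stuck at a transmit counter of 128 stays in Error Passive State indefinitely unless further transmit errors push the counter past 255.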

Screenshots

dev-can-linux v1.3.4
Harmonized with Linux Kernel version 69
dev-can-linux comes with ABSOLUTELY NO WARRANTY; for details use option `-w'.
This is free software, and you are welcome to redistribute it
under certain conditions; option `-c' for details.
warning: release versions allow at max -vv option.
driver start (version: 1.3.4)
Auto detected device (13fe:00d7) successfully: (driver "adv_pci")
initializing device 13fe:00d7
read ssvid: 13fe
read ssid: 00d7
read cs: 0, slot: 0, func: 0, devfn: 0
read capability[2]: 0x10
capability 0x10 (PCIe) already enabled
PCIe version: 1
read capability[1]: 0x05
nirq: 8
capability 0x05 (MSI) Per Vector Masking (PVM) not supported
capability 0x05 (MSI) enabled
read ba[0] MEM { addr: df302000, size: 800 }
read ba[1] MEM { addr: df301000, size: 80 }
read ba[2] MEM { addr: df300000, size: 80 }
read irq[0]: 266
read irq[1]: 267
read irq[2]: 268
read irq[3]: 269
read irq[4]: 270
read irq[5]: 271
read irq[6]: 272
read irq[7]: 273
ioremap [df302000] mapping to [53da9db000] successful
reg_base=53da9db000 irq=266
setting BTR0=0x01 BTR1=0x1c
ioremap [df302400] mapping to [53da9dc400] successful
reg_base=53da9dc400 irq=266
setting BTR0=0x01 BTR1=0x1c
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can0: controller problems: 8
netif_rx: adv_pci-can0: TX error counter; tx:60, rx:0
error warning interrupt
Controller changed from Error Active State (0) into Error Warning State (1).
netif_rx: adv_pci-can1: controller problems: 8
netif_rx: adv_pci-can1: TX error counter; tx:70, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can0: controller problems: 20
netif_rx: adv_pci-can0: TX error counter; tx:80, rx:0
error passive interrupt
Controller changed from Error Warning State (1) into Error Passive State (2).
netif_rx: adv_pci-can1: controller problems: 20
netif_rx: adv_pci-can1: TX error counter; tx:80, rx:0

Platform

Target QNX architecture: x86_64
CAN-bus hardware device: Advantech (13fe:00d7)
Development environment: workspace
Version: QNX 7.1

Driver

Driver loaded: adv_pci
Branch: main
Version: 1.3.4

Additional context

If the chip had entered Bus-Off state, the current implementation would have performed a chip restart, which is the expected behaviour. Of course the chip cannot handle the amount of data provided, but we would have expected it to progress to Bus-Off and then be restarted in line with the current restart_ms value of 50ms that the driver was started with (default option).

We tested with a special implementation via canctl that pokes the driver to read the chip registers and check whether it is in Bus-Off state; it was not. The chip also reported that the IRQ system was still ON. We were worried that the IRQ handler was missing an IRQ, but this test ruled that possibility out.

Other cases reported online suggest that others, with different hardware and software, have experienced the same behaviour in Error Passive State, with no recorded resolution.

Review device statistics mappings

Check mappings of CAN_DEVCTL_GET_STATS

Dropped packet statistics

Dropped packet statistics to be updated in queue management functions.

Error and debug info

Support for CAN_DEVCTL_ERROR, CAN_DEVCTL_DEBUG_INFO and CAN_DEVCTL_DEBUG_INFO2

Add masking of regular IRQ devices and fix IRQ clash issue

Provide a compile-time option to mask the IRQ at either the ISR or the pulse handler.

It remains an open question where the IRQ should be masked: at the ISR or at the pulse handler.

To facilitate experimentation we implement 2 new compile-time configurations, config/CONFIG_QNX_INTERRUPT_MASK_ISR and config/CONFIG_QNX_INTERRUPT_MASK_PULSE.

The reason we don't just use config/CONFIG_QNX_INTERRUPT_ATTACH_EVENT is that this function only masks regular IRQs and not MSI/MSI-X interrupt vectors.

Currently the default and recommended configurations are config/CONFIG_QNX_INTERRUPT_ATTACH with config/CONFIG_QNX_INTERRUPT_MASK_PULSE.
