udmabuf's Introduction

u-dma-buf(User space mappable DMA Buffer)

Overview

Introduction of u-dma-buf

u-dma-buf is a Linux device driver that allocates contiguous memory blocks in the kernel space as DMA buffers and makes them available from the user space. It is intended that these memory blocks are used as DMA buffers when a user application implements device driver in user space using UIO (User space I/O).

A DMA buffer allocated by u-dma-buf can be accessed from the user space by opening the device file (e.g. /dev/udmabuf0) and mapping to the user memory space, or using the read()/write() functions.

CPU cache for the allocated DMA buffer can be disabled by setting the O_SYNC flag when opening the device file. It is also possible to flush or invalidate CPU cache while retaining CPU cache enabled.

The physical address of a DMA buffer allocated by u-dma-buf can be obtained by reading /sys/class/u-dma-buf/udmabuf0/phys_addr.

The size of a DMA buffer and the device minor number can be specified when the device driver is loaded (e.g. when loaded via the insmod command). Some platforms allow to specify them in the device tree.

Architecture of u-dma-buf

Figure 1. Architecture


Supported platforms

  • OS : Linux Kernel Version 3.6 - 3.8, 3.18, 4.4, 4.8, 4.12, 4.14, 4.19, 5.0 - 5.10, 6.1 (the author tested on 3.18, 4.4, 4.8, 4.12, 4.14, 4.19, 5.4, 5.10, 6.1).
  • CPU: ARM Cortex-A9 (Xilinx ZYNQ / Altera CycloneV SoC)
  • CPU: ARM64 Cortex-A53 (Xilinx ZYNQ UltraScale+ MPSoC)
  • CPU: x86 (64-bit). Verification on this platform is still limited, and reports of results are welcome. In addition, the following features are currently restricted:
    • The CPU cache cannot be controlled with the O_SYNC flag. The CPU cache is always enabled.
    • Settings cannot be made via the device tree.

Note: udmabuf to u-dma-buf

Why u-dma-buf instead of udmabuf

The predecessor of u-dma-buf is udmabuf. The kernel module name has been changed from "udmabuf" to "u-dma-buf". The purpose of this is to avoid duplicate names because another kernel module with the same name as "udmabuf" has been added since Linux Kernel 5.x.

Changes from udmabuf to u-dma-buf

Category udmabuf u-dma-buf
module name udmabuf.ko u-dma-buf.ko
source file udmabuf.c u-dma-buf.c
sys class name /sys/class/udmabuf/ /sys/class/u-dma-buf/
DT compatible prop. "ikwzm,udmabuf-0.10.a" "ikwzm,u-dma-buf"

Usage

Compile

Makefile

This repository contains a Makefile. The Makefile has the following parameters:

Parameter Name Description Default Value
ARCH Architecture Name $(shell uname -m | sed -e s/arm.*/arm/ -e s/aarch64.*/arm64/)
KERNEL_SRC Kernel Source Directory /lib/modules/$(shell uname -r)/build

Cross Compile

If you have a cross-compilation environment for target system, you can compile with:

shell$ make ARCH=arm KERNEL_SRC=/home/fpga/src/linux-5.10.120-zynqmp-fpga-generic all

The ARCH variable specifies the architecture name.
The KERNEL_SRC variable specifies the Linux Kernel source code path.

Self Compile

If your target system is capable of self-compiling the Linux Kernel module, you can compile it with:

shell$ make all

You need the kernel source code in /lib/modules/$(shell uname -r)/build to compile.

Build in Linux Source Tree

u-dma-buf can also be built inside the Linux Kernel Source Tree.

Make directory in Linux Kernel Source Tree.

shell$ mkdir <linux-source-tree>/drivers/staging/u-dma-buf

Copy files to Linux Kernel Source Tree.

shell$ cp Kconfig Makefile u-dma-buf.c <linux-source-tree>/drivers/staging/u-dma-buf

Add u-dma-buf to Kconfig

shell$ diff <linux-source-tree>/drivers/staging/Kconfig
  :
+source "drivers/staging/u-dma-buf/Kconfig"
+

Add u-dma-buf to Makefile

shell$ diff <linux-source-tree>/drivers/staging/Makefile
  :
+obj-$(CONFIG_U_DMA_BUF) += u-dma-buf/

Set CONFIG_U_DMA_BUF

For make menuconfig, set the following:

Device Drivers --->
  Staging drivers --->
    <M> u-dma-buf(User space mappable DMA Buffer) --->

If you write it directly in defconfig:

shell$ diff <linux-source-tree>/arch/arm64/configs/xilinx_zynqmp_defconfig
   :
+CONFIG_U_DMA_BUF=m

Install

Installation with the insmod

Load the u-dma-buf kernel driver using insmod, passing the size of the DMA buffer as an argument as shown below. The driver creates the device and allocates a DMA buffer of the specified size. Up to 8 DMA buffers (udmabuf0/1/2/3/4/5/6/7) can be allocated using insmod.

zynq$ insmod u-dma-buf.ko udmabuf0=1048576
u-dma-buf udmabuf0: driver version = 4.5.2
u-dma-buf udmabuf0: major number   = 248
u-dma-buf udmabuf0: minor number   = 0
u-dma-buf udmabuf0: phys address   = 0x1e900000
u-dma-buf udmabuf0: buffer size    = 1048576
u-dma-buf u-dma-buf.0: driver installed.
zynq$ ls -la /dev/udmabuf0
crw------- 1 root root 248, 0 Dec  1 09:34 /dev/udmabuf0

In the above result, the device is read/write accessible only by root. If the permissions need to be changed when the kernel module is loaded, create /etc/udev/rules.d/99-u-dma-buf.rules with the following content.

SUBSYSTEM=="u-dma-buf", GROUP="root", MODE="0666"

The module can be uninstalled by the rmmod command.

zynq$ rmmod u-dma-buf
u-dma-buf u-dma-buf.0: driver removed.

Installation with the Debian package

For details, refer to the following URL.

Configuration via the module parameters

The u-dma-buf kernel module has the following module parameters:

Parameter Name Type Default Description
udmabuf0 ulong 0 u-dma-buf0 buffer size
udmabuf1 ulong 0 u-dma-buf1 buffer size
udmabuf2 ulong 0 u-dma-buf2 buffer size
udmabuf3 ulong 0 u-dma-buf3 buffer size
udmabuf4 ulong 0 u-dma-buf4 buffer size
udmabuf5 ulong 0 u-dma-buf5 buffer size
udmabuf6 ulong 0 u-dma-buf6 buffer size
udmabuf7 ulong 0 u-dma-buf7 buffer size
info_enable int 1 install/uninstall information enable
dma_mask_bit int 32 dma mask bit size
bind charp "" bind device name
quirk_mmap_mode int 2 or 3 quirk mmap mode(1:off,2:on,3:auto)

udmabuf[0-7]

This parameter specifies the size, in bytes, of the u-dma-buf to be created. Up to 8 u-dma-buf devices can be created with these parameters. The device names are udmabuf[0-7]. If a parameter is 0, the corresponding u-dma-buf is not created.

info_enable

This parameter specifies whether or not detailed information about when the u-dma-buf was created should be displayed.

dma_mask_bit

** Note: The value of dma-mask is system dependent. Make sure you are familiar with the meaning of dma-mask before setting. **

bind

This parameter specifies the parent device of the u-dma-buf. If this parameter is an empty string (default value), u-dma-buf is created as a new platform device. If a parent device name is specified for this parameter, u-dma-buf is created as its child device.

The format of the string specified in this parameter is "<bus>/<device-name>".

The <bus> is the bus name; currently pci is supported. The bus name can be omitted, in which case the platform bus is used.

The <device-name> specifies the name of the device under bus management.

For example, to designate "0000:00:15.0" under the pci bus as the parent device, do the following:

shell$ sudo insmod u-dma-buf.ko udmabuf0=0x10000 info_enable=3 bind="pci/0000:00:15.0" 
[13422.022482] u-dma-buf udmabuf0: driver version = 4.5.2
[13422.022483] u-dma-buf udmabuf0: major number   = 238
[13422.022483] u-dma-buf udmabuf0: minor number   = 0
[13422.022484] u-dma-buf udmabuf0: phys address   = 0x0000000070950000
[13422.022485] u-dma-buf udmabuf0: buffer size    = 65536
[13422.022485] u-dma-buf udmabuf0: dma device     = 0000:00:15.0
[13422.022486] u-dma-buf udmabuf0: dma bus        = pci
[13422.022486] u-dma-buf udmabuf0: dma coherent   = 1
[13422.022487] u-dma-buf udmabuf0: dma mask       = 0x00000000ffffffff
[13422.022487] u-dma-buf udmabuf0: iommu domain   = NONE
[13422.022487] u-dma-buf udmabuf0: quirk mmap     = 0
[13422.022488] u-dma-buf: udmabuf0 installed.
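
To make the "<bus>/<device-name>" format concrete, here is a minimal user-side sketch that splits such a string. It is not the driver's own parsing code; the function name parse_bind and its return convention are assumptions made for illustration. An empty string (the default, meaning "create a new platform device") is reported as an error here simply because there is nothing to split.

```c
#include <string.h>

/* Split a bind string of the form "<bus>/<device-name>" into two parts.
 * When the bus is omitted, "platform" is used, as described above.
 * Returns 0 on success, -1 if a part is empty or does not fit its buffer.
 * (Hypothetical helper for illustration, not part of u-dma-buf.) */
int parse_bind(const char *bind,
               char *bus, size_t bus_len,
               char *dev, size_t dev_len)
{
    const char *slash = strchr(bind, '/');
    const char *name  = (slash != NULL) ? slash + 1 : bind;
    size_t      blen  = (slash != NULL) ? (size_t)(slash - bind) : 0;

    if (*name == '\0' || strlen(name) >= dev_len)
        return -1;
    if (slash != NULL) {
        /* An explicit bus name was given before the slash. */
        if (blen == 0 || blen >= bus_len)
            return -1;
        memcpy(bus, bind, blen);
        bus[blen] = '\0';
    } else {
        /* No bus name: fall back to the platform bus. */
        if (strlen("platform") >= bus_len)
            return -1;
        strcpy(bus, "platform");
    }
    strcpy(dev, name);
    return 0;
}
```

For the example above, parse_bind("pci/0000:00:15.0", ...) yields the bus "pci" and the device name "0000:00:15.0".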

quirk_mmap_mode

This parameter specifies the default value of quirk-mmap-mode. quirk-mmap is described in detail below.
If this parameter is 1, quirk-mmap is prohibited.
If this parameter is 2, quirk-mmap is used.
If this parameter is 3, quirk-mmap is not used if the device has dma-coherent set to true, and quirk-mmap is used only if dma-coherent is false.

If the architecture is ARM or ARM64, this parameter defaults to 2.
If the architecture is other than the above, this parameter defaults to 3.
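
The three modes can be summarized as a small decision function. This is only a sketch of the rules stated above, not code taken from the driver; the enum and function names are assumptions, and treating unknown values as "off" is also an assumption.

```c
#include <stdbool.h>

/* Values of quirk_mmap_mode as listed above (names are illustrative). */
enum quirk_mmap_mode {
    QUIRK_MMAP_MODE_OFF  = 1,
    QUIRK_MMAP_MODE_ON   = 2,
    QUIRK_MMAP_MODE_AUTO = 3
};

/* Return true when quirk-mmap would be used for the given mode and
 * the dma-coherent state of the device. */
bool quirk_mmap_in_use(int mode, bool dma_coherent)
{
    switch (mode) {
    case QUIRK_MMAP_MODE_OFF:
        return false;
    case QUIRK_MMAP_MODE_ON:
        return true;
    case QUIRK_MMAP_MODE_AUTO:
        return !dma_coherent;   /* used only when not dma-coherent */
    default:
        return false;           /* assumption: unknown values mean off */
    }
}
```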

Configuration via the device tree file

In addition to the allocation via the insmod command and its arguments, DMA buffers can be allocated by specifying the size in the device tree file. When a device tree file contains an entry like the following, u-dma-buf will allocate buffers and create device drivers when loaded by insmod.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			device-name = "udmabuf0";
			minor-number = <0>;
			size = <0x00100000>;
		};

zynq$ insmod u-dma-buf.ko
u-dma-buf udmabuf0: driver version = 4.5.2
u-dma-buf udmabuf0: major number   = 248
u-dma-buf udmabuf0: minor number   = 0
u-dma-buf udmabuf0: phys address   = 0x1e900000
u-dma-buf udmabuf0: buffer size    = 1048576
u-dma-buf amba:udmabuf@0x00: driver installed.
zynq$ ls -la /dev/udmabuf0
crw------- 1 root root 248, 0 Dec  1 09:34 /dev/udmabuf0

The following properties can be set in the device tree.

  • compatible
  • size
  • minor-number
  • device-name
  • sync-mode
  • sync-always
  • sync-offset
  • sync-size
  • sync-direction
  • dma-coherent
  • dma-mask
  • quirk-mmap-off
  • quirk-mmap-on
  • quirk-mmap-auto
  • memory-region

compatible

The compatible property is used to set the corresponding device driver when loading u-dma-buf. The compatible property is mandatory. Be sure to specify compatible property as "ikwzm,u-dma-buf" (for u-dma-buf.ko) or "ikwzm,udmabuf-0.10.a" (for udmabuf.ko).

size

The size property is used to set the capacity of DMA buffer in bytes. The size property is mandatory.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
		};

If you want to specify a buffer size of 4GiB or more, specify a 64-bit value as follows. A 64-bit value is written as two 32-bit cells, the upper 32 bits first and the lower 32 bits second.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x01 0x00000000>;  // size = 0x1_0000_0000
		};
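
The arithmetic behind the two-cell form can be sketched as follows. The function name size_from_cells is an assumption for illustration; it only shows how the upper and lower cells combine into one 64-bit size.

```c
#include <stdint.h>

/* Combine the two 32-bit device tree cells <upper lower> into one
 * 64-bit size value. */
uint64_t size_from_cells(uint32_t upper, uint32_t lower)
{
    return ((uint64_t)upper << 32) | lower;
}
```

With the cells from the example, size_from_cells(0x01, 0x00000000) gives 0x1_0000_0000 (4GiB).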

minor-number

The minor-number property is used to set the minor number. The valid minor number range is 0 to 255. A minor number provided as an insmod argument has higher precedence; when a definition in the device tree uses a colliding number, creation of the device defined in the device tree will fail.

The minor-number property is optional. When the minor-number property is not specified, u-dma-buf automatically assigns an appropriate one.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			minor-number = <0>;
			size = <0x00100000>;
		};

device-name

The device-name property is used to set the name of device.

The device-name property is optional. The device name is determined as follows:

  1. If device-name property is specified, the value of device-name property is used.
  2. If device-name property is not present, and if minor-number property is specified, sprintf("udmabuf%d", minor-number) is used.
  3. If device-name property is not present, and if minor-number property is not present, the entry name of the device tree is used (udmabuf@0x00 in this example).

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			device-name = "udmabuf0";
			size = <0x00100000>;
		};
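
The three naming rules above can be written as a short function. This is a sketch of the rules as stated, not the driver's implementation; the function name, the use of NULL for an absent device-name, and a negative value for an absent minor-number are all assumptions.

```c
#include <stdio.h>

/* Resolve the device name following the three rules above.
 *   device_name: value of the device-name property, or NULL if absent
 *   minor:       value of minor-number, or a negative value if absent
 *   node_name:   entry name of the device tree node
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int resolve_device_name(const char *device_name, int minor,
                        const char *node_name,
                        char *out, size_t out_len)
{
    int n;

    if (device_name != NULL)
        n = snprintf(out, out_len, "%s", device_name);   /* rule 1 */
    else if (minor >= 0)
        n = snprintf(out, out_len, "udmabuf%d", minor);  /* rule 2 */
    else
        n = snprintf(out, out_len, "%s", node_name);     /* rule 3 */

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```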

sync-mode

The sync-mode property is used to configure the behavior when u-dma-buf is opened with the O_SYNC flag.

  • sync-mode=<1>: If O_SYNC is specified or sync-always property is specified, CPU cache is disabled. Otherwise CPU cache is enabled.
  • sync-mode=<2>: If O_SYNC is specified or sync-always property is specified, CPU cache is disabled, but the CPU uses write-combine when writing data to the DMA buffer, which improves performance by combining multiple write accesses. Otherwise CPU cache is enabled.
  • sync-mode=<3>: If O_SYNC is specified or sync-always property is specified, DMA coherency mode is used. Otherwise CPU cache is enabled.

The sync-mode property is optional. When the sync-mode property is not specified, sync-mode is set to <1>.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
			sync-mode = <2>;
		};

Details on O_SYNC and cache management will be described in the next section.

sync-always

If the sync-always property is specified, the operation specified by the sync-mode property is always performed when u-dma-buf is opened, regardless of whether O_SYNC is specified.

The sync-always property is optional.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
			sync-mode = <2>;
			sync-always;
		};

Details on O_SYNC and cache management will be described in the next section.

sync-offset

The sync-offset property is used to set the start of the buffer range when manually controlling the cache of u-dma-buf.

The sync-offset property is optional. When the sync-offset property is not specified, sync-offset is set to <0>.

Details on cache management will be described in the next section.

sync-size

The sync-size property is used to set the size of the buffer range when manually controlling the cache of u-dma-buf.

The sync-size property is optional. When the sync-size property is not specified, sync-size is set to <0>.

Details on cache management will be described in the next section.

sync-direction

The sync-direction property is used to set the direction of DMA when manually controlling the cache of u-dma-buf.

  • sync-direction=<0>: DMA_BIDIRECTIONAL
  • sync-direction=<1>: DMA_TO_DEVICE
  • sync-direction=<2>: DMA_FROM_DEVICE

The sync-direction property is optional. When the sync-direction property is not specified, sync-direction is set to <0>.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
			sync-offset = <0x00010000>;
			sync-size = <0x000F0000>;
			sync-direction = <2>;
		};

Details on cache management will be described in the next section.

dma-coherent

If the dma-coherent property is specified, it indicates that coherency between the DMA buffer and the CPU cache is guaranteed by hardware.

The dma-coherent property is optional. When the dma-coherent property is not specified, coherency between the DMA buffer and the CPU cache is treated as not guaranteed by hardware.

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
			dma-coherent;
		};

Details on cache management will be described in the next section.

dma-mask

** Note: The value of dma-mask is system dependent. Make sure you are familiar with the meaning of dma-mask before setting. **

		udmabuf@0x00 {
			compatible = "ikwzm,u-dma-buf";
			size = <0x00100000>;
			dma-mask = <64>;
		};

quirk-mmap-off

If the quirk-mmap-off property is specified, quirk-mmap is not used.

quirk-mmap-on

If the quirk-mmap-on property is specified, quirk-mmap is used.

quirk-mmap-auto

If the quirk-mmap-auto property is specified, quirk-mmap is not used if the device has dma-coherent set to true, and quirk-mmap is used only if dma-coherent is false.

memory-region

Linux can specify the reserved memory area in the device tree. The Linux kernel excludes normal memory allocation from the physical memory space specified by reserved-memory property. In order to access this reserved memory area, it is necessary to use a general-purpose memory access driver such as /dev/mem, or associate it with the device driver in the device tree.

The memory-region property associates a reserved memory area with u-dma-buf.

	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;
		image_buf0: image_buf@0 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x3C000000 0x04000000>; 
			label = "image_buf0";
		};
	};
	udmabuf@0 {
		compatible = "ikwzm,u-dma-buf";
		device-name = "udmabuf0";
		size = <0x04000000>; // 64MiB
		memory-region = <&image_buf0>;
	};

In this example, 64MiB from 0x3C000000 to 0x3FFFFFFF is reserved as "image_buf0". For "image_buf0", "shared-dma-pool" is specified in the compatible property, together with the reusable property. With these properties, the reserved memory area is managed by the CMA. Be careful about the alignment of the address and the size.

The above "image_buf0" is associated with "udmabuf@0" with memory-region property. With this association, "udmabuf@0" reserves physical memory from the CMA area specified by "image_buf0".

The memory-region property is optional. When the memory-region property is not specified, u-dma-buf allocates the DMA buffer from the CMA area allocated to the Linux kernel.

Configuration via the /dev/u-dma-buf-mgr

Since u-dma-buf v4.0, u-dma-buf devices can be created or deleted using u-dma-buf-mgr. See https://github.com/ikwzm/u-dma-buf-mgr for more information.

Device file

When u-dma-buf is loaded into the kernel, the following device files are created. <device-name> is a placeholder for the device name described in the previous section.

  • /dev/<device-name>
  • /sys/class/u-dma-buf/<device-name>/phys_addr
  • /sys/class/u-dma-buf/<device-name>/size
  • /sys/class/u-dma-buf/<device-name>/sync_mode
  • /sys/class/u-dma-buf/<device-name>/sync_offset
  • /sys/class/u-dma-buf/<device-name>/sync_size
  • /sys/class/u-dma-buf/<device-name>/sync_direction
  • /sys/class/u-dma-buf/<device-name>/sync_owner
  • /sys/class/u-dma-buf/<device-name>/sync_for_cpu
  • /sys/class/u-dma-buf/<device-name>/sync_for_device
  • /sys/class/u-dma-buf/<device-name>/dma_coherent

/dev/<device-name>

/dev/<device-name> is used to mmap() the DMA buffer into the user space or to access it via read()/write().

    if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
        buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        /* Do some read/write access to buf */
        close(fd);
    }

The device file can be directly read/written by specifying the device as the target of dd in the shell.

zynq$ dd if=/dev/urandom of=/dev/udmabuf0 bs=4096 count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 3.07516 s, 1.4 MB/s
zynq$ dd if=/dev/udmabuf4 of=random.bin
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB) copied, 0.173866 s, 24.1 MB/s

phys_addr

The physical address of a DMA buffer can be retrieved by reading /sys/class/u-dma-buf/<device-name>/phys_addr.

    char           attr[1024];
    unsigned long  phys_addr;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/phys_addr", O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        if (len > 0) {
            attr[len] = '\0'; /* read() does not NUL-terminate the data */
            sscanf(attr, "%lx", &phys_addr);
        }
        close(fd);
    }

size

The size of a DMA buffer can be retrieved by reading /sys/class/u-dma-buf/<device-name>/size.

    char           attr[1024];
    unsigned int   buf_size;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/size", O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        if (len > 0) {
            attr[len] = '\0'; /* read() does not NUL-terminate the data */
            sscanf(attr, "%u", &buf_size);
        }
        close(fd);
    }
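
Since phys_addr and size are read the same way, the two snippets above could be folded into one helper. The following is a minimal sketch, assuming the attribute text is either a "0x"-prefixed hexadecimal number (as phys_addr appears in the log output above) or a plain decimal number without leading zeros. The name read_attr_u64 is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Read one numeric attribute file into *value.
 * Base 0 lets strtoull() accept both "0x..." hexadecimal and decimal text
 * (note that a leading "0" without "x" would be parsed as octal).
 * Returns 0 on success, -1 on any failure. */
int read_attr_u64(const char *path, unsigned long long *value)
{
    char  text[64];
    char *end;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(text, sizeof(text), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    errno  = 0;
    *value = strtoull(text, &end, 0);
    return (errno == 0 && end != text) ? 0 : -1;
}
```

A call such as read_attr_u64("/sys/class/u-dma-buf/udmabuf0/phys_addr", &addr) would then replace the open()/read()/sscanf() sequence.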

sync_mode

The device file /sys/class/u-dma-buf/<device-name>/sync_mode is used to configure the behavior when u-dma-buf is opened with the O_SYNC flag.

    char           attr[1024];
    unsigned long  sync_mode = 2;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_mode", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_mode);
        write(fd, attr, strlen(attr));
        close(fd);
    }

Details on O_SYNC and cache management will be described in the next section.

sync_offset

The device file /sys/class/u-dma-buf/<device-name>/sync_offset is used to specify the start address of a memory block of which cache is manually managed.

    char           attr[1024];
    unsigned long  sync_offset = 0x00000000;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_offset", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_offset); /* or sprintf(attr, "0x%lx", sync_offset); */
        write(fd, attr, strlen(attr));
        close(fd);
    }

Details of manual cache management are described in the next section.

sync_size

The device file /sys/class/u-dma-buf/<device-name>/sync_size is used to specify the size of a memory block of which cache is manually managed.

    char           attr[1024];
    unsigned long  sync_size = 1024;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_size", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_size); /* or sprintf(attr, "0x%lx", sync_size); */
        write(fd, attr, strlen(attr));
        close(fd);
    }

Details of manual cache management are described in the next section.

sync_direction

The device file /sys/class/u-dma-buf/<device-name>/sync_direction is used to set the direction of DMA transfer to/from the DMA buffer of which cache is manually managed.

  • 0: sets DMA_BIDIRECTIONAL
  • 1: sets DMA_TO_DEVICE
  • 2: sets DMA_FROM_DEVICE

    char           attr[1024];
    unsigned long  sync_direction = 1;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_direction", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_direction);
        write(fd, attr, strlen(attr));
        close(fd);
    }

Details of manual cache management are described in the next section.

dma_coherent

The device file /sys/class/u-dma-buf/<device-name>/dma_coherent reports whether the coherency of the DMA buffer and the CPU cache is guaranteed by hardware. Whether hardware guarantees coherency is specified with the dma-coherent property in the device tree; this device file itself is read-only.

If this value is 1, the coherency of the DMA buffer and the CPU cache is guaranteed by hardware. If this value is 0, the coherency of the DMA buffer and the CPU cache cannot be guaranteed by hardware.

    char           attr[1024];
    int dma_coherent;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/dma_coherent", O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        if (len > 0) {
            attr[len] = '\0'; /* read() does not NUL-terminate the data */
            sscanf(attr, "%d", &dma_coherent);
        }
        close(fd);
    }

sync_owner

The device file /sys/class/u-dma-buf/<device-name>/sync_owner reports the owner of the memory block in the manual cache management mode. If this value is 1, the buffer is owned by the device. If this value is 0, the buffer is owned by the CPU.

    char           attr[1024];
    int sync_owner;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_owner", O_RDONLY)) != -1) {
        ssize_t len = read(fd, attr, sizeof(attr) - 1);
        if (len > 0) {
            attr[len] = '\0'; /* read() does not NUL-terminate the data */
            sscanf(attr, "%d", &sync_owner);
        }
        close(fd);
    }

Details of manual cache management are described in the next section.

sync_for_cpu

In the manual cache management mode, the CPU becomes the owner of the buffer when a non-zero value is written to the device file /sys/class/u-dma-buf/<device-name>/sync_for_cpu. This device file is write only.

When '1' is written to this device file and sync_direction is 2(=DMA_FROM_DEVICE) or 0(=DMA_BIDIRECTIONAL), the CPU cache for the range specified by sync_offset and sync_size is invalidated.

    char           attr[1024];
    unsigned long  sync_for_cpu = 1;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_for_cpu);
        write(fd, attr, strlen(attr));
        close(fd);
    }

The value written to this device file can include sync_offset, sync_size, and sync_direction.

    char           attr[1024];
    unsigned long  sync_offset    = 0;
    unsigned long  sync_size      = 0x10000;
    unsigned int   sync_direction = 0;
    unsigned long  sync_for_cpu   = 1;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_cpu", O_WRONLY)) != -1) {
        /* %08X expects unsigned int, so cast the unsigned long expressions */
        sprintf(attr, "0x%08X%08X", (unsigned int)(sync_offset & 0xFFFFFFFF), (unsigned int)((sync_size & 0xFFFFFFF0) | (sync_direction << 2) | sync_for_cpu));
        write(fd, attr, strlen(attr));
        close(fd);
    }

The sync_offset/sync_size/sync_direction specified by sync_for_cpu is temporary and does not affect the sync_offset or sync_size or sync_direction device files.
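
The layout of the combined value can be read off the format expression above: the upper 32 bits carry sync_offset, and the lower 32 bits carry sync_size with its low 4 bits cleared, sync_direction shifted left by 2, and the request flag in bit 0. The sketch below packs and formats such a value. The function names are assumptions for illustration, and the bit layout is inferred only from that expression, not from the driver source.

```c
#include <stdio.h>
#include <stdint.h>

/* Pack offset, size, and direction into one 64-bit value:
 *   bits 63..32: sync_offset
 *   bits 31..4 : sync_size (low 4 bits cleared)
 *   bits  3..2 : sync_direction
 *   bit   0    : request flag (always set here) */
uint64_t pack_sync_command(uint32_t sync_offset, uint32_t sync_size,
                           unsigned int sync_direction)
{
    uint32_t lower = (sync_size & 0xFFFFFFF0u)
                   | ((uint32_t)(sync_direction & 0x3u) << 2)
                   | 1u;
    return ((uint64_t)sync_offset << 32) | lower;
}

/* Format the packed value as the text written to the device file.
 * Returns 0 on success, -1 if `out` is too small. */
int format_sync_command(char *out, size_t out_len, uint64_t command)
{
    int n = snprintf(out, out_len, "0x%08X%08X",
                     (unsigned int)(command >> 32),
                     (unsigned int)(command & 0xFFFFFFFFu));
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Because the low 4 bits of sync_size are masked off, a size passed this way is effectively rounded down to a multiple of 16 bytes.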

Details of manual cache management are described in the next section.

sync_for_device

In the manual cache management mode, the device becomes the owner of the buffer when a non-zero value is written to the device file /sys/class/u-dma-buf/<device-name>/sync_for_device. This device file is write only.

When '1' is written to this device file and sync_direction is 1(=DMA_TO_DEVICE) or 0(=DMA_BIDIRECTIONAL), the CPU cache for the range specified by sync_offset and sync_size is flushed (i.e. the cached data, if any, is written back to DDR memory).

    char           attr[1024];
    unsigned long  sync_for_device = 1;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
        sprintf(attr, "%lu", sync_for_device);
        write(fd, attr, strlen(attr));
        close(fd);
    }

The value written to this device file can include sync_offset, sync_size, and sync_direction.

    char           attr[1024];
    unsigned long  sync_offset     = 0;
    unsigned long  sync_size       = 0x10000;
    unsigned int   sync_direction  = 0;
    unsigned long  sync_for_device = 1;
    if ((fd  = open("/sys/class/u-dma-buf/udmabuf0/sync_for_device", O_WRONLY)) != -1) {
        /* %08X expects unsigned int, so cast the unsigned long expressions */
        sprintf(attr, "0x%08X%08X", (unsigned int)(sync_offset & 0xFFFFFFFF), (unsigned int)((sync_size & 0xFFFFFFF0) | (sync_direction << 2) | sync_for_device));
        write(fd, attr, strlen(attr));
        close(fd);
    }

The sync_offset/sync_size/sync_direction specified by sync_for_device is temporary and does not affect the sync_offset or sync_size or sync_direction device files.

Details of manual cache management are described in the next section.

Coherency of data on DMA buffer and CPU cache

The CPU usually accesses a DMA buffer on the main memory through its cache, while hardware accelerator logic accesses the data stored in the DMA buffer on the main memory directly. In this situation, coherency between the data held in the CPU cache and the data on the main memory must be considered carefully.

When the coherency is maintained by hardware

When hardware assures the coherency, CPU cache can be turned on without additional treatment. For example, ZYNQ provides ACP (Accelerator Coherency Port), and the coherency is maintained by hardware as long as the accelerator accesses the main memory via this port.

In this case, accesses from CPU to the main memory can be fast by using CPU cache as usual. To enable CPU cache on the DMA buffer allocated by u-dma-buf, open u-dma-buf without specifying the O_SYNC flag.

    /* To enable CPU cache on the DMA buffer, */
    /* open u-dma-buf without specifying the `O_SYNC` flag. */
    if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
        buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        /* Read/write access to the buffer */
        close(fd);
    }

The manual management of cache, described in the following section, will not be necessary when hardware maintains the coherency.

If the dma-coherent property is specified in the device tree, it declares that coherency is guaranteed by hardware. In this case, the cache control described later in "2. Manual cache management with the CPU cache still being enabled" is not performed.

When hardware does not maintain the coherency

To maintain coherency of data between CPU and the main memory, another coherency mechanism is necessary. u-dma-buf supports two different ways of coherency maintenance; one is to disable CPU cache, and the other is to involve manual cache flush/invalidation with CPU cache being enabled.

1. Disabling CPU cache

To disable CPU cache of allocated DMA buffer, specify the O_SYNC flag when opening u-dma-buf.

    /* To disable CPU cache on the DMA buffer, */
    /* open u-dma-buf with the `O_SYNC` flag. */
    if ((fd  = open("/dev/udmabuf0", O_RDWR | O_SYNC)) != -1) {
        buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        /* Read/write access to the buffer */
        close(fd);
    }

As listed below, sync_mode can be used to configure the cache behavior when the O_SYNC flag is present in open():

  • sync_mode=0: CPU cache is enabled regardless of the O_SYNC flag presence.
  • sync_mode=1: If O_SYNC is specified, CPU cache is disabled. If O_SYNC is not specified, CPU cache is enabled.
  • sync_mode=2: If O_SYNC is specified, CPU cache is disabled, but the CPU uses write-combine when writing data to the DMA buffer, which improves performance by combining multiple write accesses. If O_SYNC is not specified, CPU cache is enabled.
  • sync_mode=3: If O_SYNC is specified, DMA coherency mode is used. If O_SYNC is not specified, CPU cache is enabled.
  • sync_mode=4: CPU cache is enabled regardless of the O_SYNC flag presence.
  • sync_mode=5: CPU cache is disabled regardless of the O_SYNC flag presence.
  • sync_mode=6: CPU uses write-combine to write data to DMA buffer regardless of O_SYNC presence.
  • sync_mode=7: DMA coherency mode is used regardless of O_SYNC presence.

As a practical example, the execution times of a sample program listed below were measured under several test conditions as presented in the table.

int check_buf(unsigned char* buf, unsigned int size)
{
    int m = 256;
    int n = 10;
    int i, k;
    int error_count = 0;
    while(--n > 0) {
      for(i = 0; i < size; i = i + m) {
        m = (i+256 < size) ? 256 : (size-i);
        for(k = 0; k < m; k++) {
          buf[i+k] = (k & 0xFF);
        }
        for(k = 0; k < m; k++) {
          if (buf[i+k] != (k & 0xFF)) {
            error_count++;
          }
        }
      }
    }
    return error_count;
}
int clear_buf(unsigned char* buf, unsigned int size)
{
    int n = 100;
    int error_count = 0;
    while(--n > 0) {
      memset((void*)buf, 0, size);
    }
    return error_count;
}

Table-1 The execution time of the sample program checkbuf

sync_mode  O_SYNC         DMA buffer size
                          1MByte      5MByte       10MByte
0          Not specified  0.437[sec]  2.171[sec]   4.340[sec]
0          Specified      0.437[sec]  2.171[sec]   4.340[sec]
1          Not specified  0.434[sec]  2.179[sec]   4.337[sec]
1          Specified      2.283[sec]  11.414[sec]  22.830[sec]
2          Not specified  0.434[sec]  2.169[sec]   4.337[sec]
2          Specified      1.616[sec]  8.262[sec]   16.562[sec]
3          Not specified  0.434[sec]  2.169[sec]   4.337[sec]
3          Specified      1.600[sec]  8.391[sec]   16.587[sec]
4          Not specified  0.437[sec]  2.171[sec]   4.337[sec]
4          Specified      0.437[sec]  2.171[sec]   4.337[sec]
5          Not specified  2.283[sec]  11.414[sec]  22.809[sec]
5          Specified      2.283[sec]  11.414[sec]  22.840[sec]
6          Not specified  1.655[sec]  8.391[sec]   16.587[sec]
6          Specified      1.655[sec]  8.391[sec]   16.587[sec]
7          Not specified  1.655[sec]  8.391[sec]   16.587[sec]
7          Specified      1.655[sec]  8.391[sec]   16.587[sec]

Table-2 The execution time of the sample program clearbuf

sync_mode  O_SYNC         DMA buffer size
                          1MByte      5MByte      10MByte
0          Not specified  0.067[sec]  0.359[sec]  0.713[sec]
0          Specified      0.067[sec]  0.362[sec]  0.716[sec]
1          Not specified  0.067[sec]  0.362[sec]  0.718[sec]
1          Specified      0.912[sec]  4.563[sec]  9.126[sec]
2          Not specified  0.068[sec]  0.360[sec]  0.721[sec]
2          Specified      0.063[sec]  0.310[sec]  0.620[sec]
3          Not specified  0.068[sec]  0.361[sec]  0.715[sec]
3          Specified      0.062[sec]  0.310[sec]  0.620[sec]
4          Not specified  0.068[sec]  0.360[sec]  0.718[sec]
4          Specified      0.067[sec]  0.360[sec]  0.710[sec]
5          Not specified  0.913[sec]  4.562[sec]  9.126[sec]
5          Specified      0.913[sec]  4.562[sec]  9.126[sec]
6          Not specified  0.062[sec]  0.310[sec]  0.618[sec]
6          Specified      0.062[sec]  0.310[sec]  0.619[sec]
7          Not specified  0.062[sec]  0.310[sec]  0.620[sec]
7          Specified      0.062[sec]  0.310[sec]  0.621[sec]

Note: on using O_SYNC flag on ARM64

For v2.1.1 or earlier, udmabuf used pgprot_writecombine() on ARM64 even when sync_mode=1 (noncached). The reason is that a bus error occurred in memset() in udmabuf_test.c when using pgprot_noncached().

However, as reported in #28, when using pgprot_writecombine() on ARM64, it was found that there was a problem with cache coherency.

Therefore, since v2.1.2, sync_mode=1 uses pgprot_noncached(). Cache coherency issues are very difficult to understand and to debug. Rather than risking a cache coherency problem, we decided that a bus error is easier to understand when it occurs.

Because of this change, pay attention to access alignment when using O_SYNC cache control on ARM64. You probably won't be able to use memset().

If a problem occurs, either have the hardware maintain cache coherency, or use the method described below, which manages the cache manually while the CPU cache remains enabled.

2. Manual cache management with the CPU cache still being enabled

As explained above, by opening u-dma-buf without specifying the O_SYNC flag, CPU cache can be left turned on. However, for ARM or ARM64, this is only possible if quirk-mmap is enabled. quirk-mmap will be discussed in detail later.

    /* To enable CPU cache on the DMA buffer, */
    /* open u-dma-buf without specifying the `O_SYNC` flag. */
    if ((fd  = open("/dev/udmabuf0", O_RDWR)) != -1) {
        buf = mmap(NULL, buf_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        /* Read/write access to the buffer */
        close(fd);
    }
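
For scripts, the same open-and-map sequence can be sketched in Python with the standard os and mmap modules. This is a minimal sketch, not part of u-dma-buf: the helper name map_buffer and its use_o_sync parameter are hypothetical, and the caller is assumed to supply the buffer size (for example, the value read from the size file of the device).

```python
import mmap
import os


def map_buffer(device_path, buf_size, use_o_sync=False):
    """Open a device file and map buf_size bytes of it (hypothetical helper).

    Leaving use_o_sync False mirrors the C example above (no O_SYNC flag).
    """
    flags = os.O_RDWR
    if use_o_sync:
        flags |= os.O_SYNC
    fd = os.open(device_path, flags)
    try:
        buf = mmap.mmap(fd, buf_size, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # Python's mmap keeps its own duplicate of the descriptor,
        # so the mapping stays usable after this close.
        os.close(fd)
    return buf
```

A caller would use it as buf = map_buffer('/dev/udmabuf0', buf_size), access buf like a bytearray, and call buf.close() when done.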

To manually manage cache coherency, users need to follow these steps:

  1. Specify a memory area shared between CPU and accelerator via sync_offset and sync_size device files. sync_offset accepts an offset from the start address of the allocated buffer in units of bytes. The size of the shared memory area should be set to sync_size in units of bytes.
  2. Data transfer direction should be set to sync_direction. If the accelerator performs only read accesses to the memory area, sync_direction should be set to 1(=DMA_TO_DEVICE); if it performs only write accesses, sync_direction should be set to 2(=DMA_FROM_DEVICE).
  3. If the accelerator reads and writes data from/to the memory area, sync_direction should be set to 0(=DMA_BIDIRECTIONAL).

Following the above configuration, sync_for_cpu and/or sync_for_device should be used to set the owner of the buffer specified by the above-mentioned offset and the size.

When the CPU accesses the buffer, '1' should be written to sync_for_cpu to set the CPU as the owner. Upon the write to sync_for_cpu, the CPU cache is invalidated if sync_direction is 2(=DMA_FROM_DEVICE) or 0(=DMA_BIDIRECTIONAL). Once the CPU becomes the owner of the buffer, the accelerator cannot access the buffer.

On the other hand, when the accelerator needs to access the buffer, '1' should be written to sync_for_device to change ownership of the buffer to the accelerator. Upon the write to sync_for_device, the CPU cache for the specified memory area is flushed, that is, cached data is written back to main memory.

However, if the dma-coherent property is specified in the device tree, the CPU cache is neither invalidated nor flushed.
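
The procedure above can be sketched in Python as a few helpers that write to the device files. This is a minimal sketch under stated assumptions: the helper names (write_attr, set_sync_area, sync_for_cpu, sync_for_device) and the class_path parameter are hypothetical; only the file names and the direction values come from the text above.

```python
import os

# sync_direction values as listed in the steps above
DMA_BIDIRECTIONAL = 0
DMA_TO_DEVICE = 1
DMA_FROM_DEVICE = 2


def write_attr(class_path, name, value):
    """Write one value to an attribute file under class_path (hypothetical helper)."""
    with open(os.path.join(class_path, name), 'w') as f:
        f.write(str(value))


def set_sync_area(class_path, offset, size, direction):
    """Steps 1 to 3: select the shared area (in bytes) and the transfer direction."""
    if direction not in (DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE):
        raise ValueError('unknown sync_direction: %r' % (direction,))
    write_attr(class_path, 'sync_offset', offset)
    write_attr(class_path, 'sync_size', size)
    write_attr(class_path, 'sync_direction', direction)


def sync_for_cpu(class_path):
    """Make the CPU the owner of the selected area."""
    write_attr(class_path, 'sync_for_cpu', 1)


def sync_for_device(class_path):
    """Make the accelerator the owner of the selected area."""
    write_attr(class_path, 'sync_for_device', 1)
```

For a buffer that the accelerator only writes, a caller might run set_sync_area('/sys/class/u-dma-buf/udmabuf0', 0, 4096, DMA_FROM_DEVICE), then sync_for_device(...) before starting the transfer and sync_for_cpu(...) before reading the result.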

Note: What is quirk-mmap?

The Linux Kernel mainline turns off caching when doing mmap() for architectures such as ARM and ARM64 where cache aliasing problems can occur.

However, u-dma-buf provides quirk-mmap to enable caching on these architectures in cases where cache aliasing does not cause problems. quirk-mmap is u-dma-buf's own mmap mechanism and does not use the dma_mmap_coherent() function provided by the dma-mapping API in the Linux kernel. This may cause problems in some cases, so please be careful when using it.

Example using u-dma-buf with Python

The programming language "Python" has an extension library called "NumPy". This section explains how to map the DMA buffer that u-dma-buf allocates in the kernel with NumPy's memmap, so that it can be operated on in the same way as an "ndarray".

Udmabuf Class

import numpy as np

class Udmabuf:
    """A simple u-dma-buf class"""
    def __init__(self, name):
        self.name           = name
        self.device_name    = '/dev/%s'                 % self.name
        self.class_path     = '/sys/class/u-dma-buf/%s' % self.name
        self.phys_addr      = self.get_value('phys_addr', 16)
        self.buf_size       = self.get_value('size')
        self.sync_offset    = None
        self.sync_size      = None
        self.sync_direction = None

    def memmap(self, dtype, shape):
        self.item_size = np.dtype(dtype).itemsize
        self.array     = np.memmap(self.device_name, dtype=dtype, mode='r+', shape=shape)
        return self.array

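    def get_value(self, name, radix=10):
        value = None
        # Read the first line of the attribute file and close it afterwards.
        with open(self.class_path + '/' + name) as f:
            for line in f:
                value = int(line, radix)
                break
        return value

    def set_value(self, name, value):
        # The with block closes the file, which flushes the write.
        with open(self.class_path + '/' + name, 'w') as f:
            f.write(str(value))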

    def set_sync_area(self, direction=None, offset=None, size=None):
        if offset is None:
            self.sync_offset    = self.get_value('sync_offset')
        else:
            self.set_value('sync_offset', offset)
            self.sync_offset    = offset
        if size   is None:
            self.sync_size      = self.get_value('sync_size')
        else:
            self.set_value('sync_size', size)
            self.sync_size      = size
        if direction is None:
            self.sync_direction = self.get_value('sync_direction')
        else:
            self.set_value('sync_direction', direction)
            self.sync_direction = direction

    def set_sync_to_device(self, offset=None, size=None):
        self.set_sync_area(1, offset, size)

    def set_sync_to_cpu(self, offset=None, size=None):
        self.set_sync_area(2, offset, size)

    def set_sync_to_bidirectional(self, offset=None, size=None):
        # sync_direction 0 is DMA_BIDIRECTIONAL (see the steps above).
        self.set_sync_area(0, offset, size)

    def sync_for_cpu(self):
        self.set_value('sync_for_cpu', 1)

    def sync_for_device(self):
        self.set_value('sync_for_device', 1)

udmabuf_test.py

from udmabuf import Udmabuf
import numpy as np
import time
def test_1(a):
    for i in range (0,9):
        a *= 0
        a += 0x31
if __name__ == '__main__':
    udmabuf      = Udmabuf('udmabuf0')
    test_dtype   = np.uint8
    test_size    = udmabuf.buf_size//(np.dtype(test_dtype).itemsize)
    udmabuf.memmap(dtype=test_dtype, shape=(test_size))
    comparison   = np.zeros(test_size, dtype=test_dtype)
    print ("test_size  : %d" % test_size)
    start        = time.time()
    test_1(udmabuf.array)
    elapsed_time = time.time() - start
    print ("udmabuf0   : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    start        = time.time()
    test_1(comparison)
    elapsed_time = time.time() - start
    print ("comparison : elapsed_time:{0}".format(elapsed_time) + "[sec]")
    if np.array_equal(udmabuf.array, comparison):
        print ("udmabuf0 == comparison : OK")
    else:
        print ("udmabuf0 != comparison : NG")

Execution result

Install u-dma-buf. In this example, an 8MiB DMA buffer is reserved as "udmabuf0".

zynq# insmod u-dma-buf.ko udmabuf0=8388608
[ 1183.911189] u-dma-buf udmabuf0: driver version = 4.5.2
[ 1183.921238] u-dma-buf udmabuf0: major number   = 240
[ 1183.931275] u-dma-buf udmabuf0: minor number   = 0
[ 1183.936063] u-dma-buf udmabuf0: phys address   = 0x0000000041600000
[ 1183.942328] u-dma-buf udmabuf0: buffer size    = 8388608
[ 1183.947641] u-dma-buf u-dma-buf.0: driver installed.

Executing the script in the previous section gives the following results.

zynq# python3 udmabuf_test.py
test_size  : 8388608
udmabuf0   : elapsed_time:0.11204075813293457[sec]
comparison : elapsed_time:0.11488151550292969[sec]
udmabuf0 == comparison : OK

The execution time for "udmabuf0" (the buffer area reserved in the kernel) and for the same operation on an ordinary ndarray (comparison) were almost the same. That is, the CPU cache appears to be effective for "udmabuf0" as well.

I confirmed the contents of "udmabuf0" after running this script.

zynq# dd if=/dev/udmabuf0 of=udmabuf0.bin bs=8388608
1+0 records in
1+0 records out
8388608 bytes (8.4 MB) copied, 0.151531 s, 55.4 MB/s
shell# 
shell# od -t x1 udmabuf0.bin
0000000 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
*
40000000

After executing the script, it was confirmed that the result of the execution remains in the buffer. Just to be sure, let's check that NumPy can read it.

zynq# python
Python 2.7.9 (default, Aug 13 2016, 17:56:53)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> a = np.memmap('/dev/udmabuf0', dtype=np.uint8, mode='r+', shape=(8388608))
>>> a
memmap([49, 49, 49, ..., 49, 49, 49], dtype=uint8)
>>> a.itemsize
1
>>> a.size
8388608
>>>
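
The od check can also be scripted. The sketch below uses only the Python standard library; the function name all_bytes_equal and the chunk size are hypothetical choices, not part of u-dma-buf. Note that od prints offsets in octal, so the final offset 40000000 corresponds to 8388608 bytes, the size of the buffer.

```python
def all_bytes_equal(path, value, chunk_size=1 << 20):
    """Return (ok, total): ok is True if every byte in the file equals value.

    value is an integer in the range 0..255; total is the number of bytes read.
    """
    total = 0
    ok = True
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # bytes.count() accepts an integer byte value.
            if chunk.count(value) != len(chunk):
                ok = False
    return ok, total


# od offsets are octal: 40000000 (octal) is the buffer size in bytes.
od_end_offset = int('40000000', 8)
```

Applied to the dump from the dd command, all_bytes_equal('udmabuf0.bin', 0x31) would be expected to return True together with a byte count equal to od_end_offset.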

udmabuf's People

Contributors

agamez, avpatel, bcattle, caryan, d3-jwatts, expipiplus1, fbezdeka, felixonmars, gpanders, ikwzm, luca-della-vedova, stefano-garzarella, tichkr, yuasatakayuki

udmabuf's Issues

Only able to make DMA transactions with v1.4.6 for AXI DMAC

Hi Ichiro,

Just wanted to let you know that for ADI's AXI DMAC IP I was only able to make DMA transactions using udmabuf 1.4.6. I tried it on a Zynq 7000 with Vivado 2019.1. I tried the u-dma-buf master branch and also version 1.4.8; with both of these my udmabuf didn't populate. Then I tried udmabuf 1.4.6 on the same HW and my DMA buffer began to fill with data.

Get a "cannot execute on arm64 due to bus error" error when running the test example

Hi,
I am trying to use your module on an Nvidia Xavier AGX board. Compiling and loading the module go without any error. But when I tried to run the example, I got the "cannot execute on arm64 due to bus error" message. Here is the complete output:

machine = aarch64
phys_addr=0xe6900000
size=1048576
check_buf()
sync_mode=0, O_SYNC=0, sync_mode=0, O_SYNC=1, sync_mode=1, O_SYNC=0, sync_mode=1, O_SYNC=1, sync_mode=2, O_SYNC=0, sync_mode=2, O_SYNC=1, sync_mode=3, O_SYNC=0, sync_mode=3, O_SYNC=1, sync_mode=4, O_SYNC=0, sync_mode=4, O_SYNC=1, sync_mode=5, O_SYNC=0, sync_mode=5, O_SYNC=1, sync_mode=6, O_SYNC=0, sync_mode=6, O_SYNC=1, sync_mode=7, O_SYNC=0, sync_mode=7, O_SYNC=1, clear_buf()
sync_mode=0, O_SYNC=0, sync_mode=0, O_SYNC=1, sync_mode=1, O_SYNC=0, sync_mode=1, O_SYNC=1, cannot execute on arm64 due to bus error.
sync_mode=2, O_SYNC=0, sync_mode=2, O_SYNC=1, sync_mode=3, O_SYNC=0, sync_mode=3, O_SYNC=1, sync_mode=4, O_SYNC=0, sync_mode=4, O_SYNC=1, sync_mode=5, O_SYNC=0, cannot execute on arm64 due to bus error.
sync_mode=5, O_SYNC=1, cannot execute on arm64 due to bus error.
sync_mode=6, O_SYNC=0, sync_mode=6, O_SYNC=1, sync_mode=7, O_SYNC=0, sync_mode=7, O_SYNC=1,

How can I fix this problem? My goal is to use udmabuf with uio in order to perform DMA transfers between the Xavier system memory and an NVMe SSD drive (connected through an M.2 Key M).

Thanks

dd command gets stuck

Hi, when I use the insmod command, I get
udmabuf udmabuf0: driver version = 1.3.2
udmabuf udmabuf0: major number = 244
udmabuf udmabuf0: minor number = 0
udmabuf udmabuf0: phys address = 0x3e500000
udmabuf udmabuf0: buffer size = 1048576
udmabuf udmabuf0: dma coherent = 0
udmabuf udmabuf.0: driver installed.

and I see the device under /dev ...
but when I then try the dd command, it gets stuck.
I am using
Linux peta64v2 4.14.0-xilinx-v2018.2 #4 SMP PREEMPT Sat Feb 9 22:53:26 PST 2019 armv7l GNU/Linux

Do I have to edit the device tree?

Usage on kernel 3.10 ?

I tried to compile this for kernel 3.10., and got some compile errors.
I looked at the list of supported kernels again, and discovered a gap in versions: 3.6 - 3.8, 3.18, ...
Is there a specific reason for omitting 3.10, i.e. will it definitely be a waste of time to try it with that kernel, or do you just not have a report of anyone succeeding with that kernel version and hence not listed it?
(if I get it to work, I will report back for another number to add to your list ;) May take a while though, I'm new to anything kernel related...)

EDIT:
Btw., one of the errors that compiling gives me is this:
udmabuf.c:304:43: error: implicit declaration of function ‘is_device_dma_coherent’ [-Werror=implicit-function-declaration] DEF_ATTR_SHOW(dma_coherent , "%d\n" , is_device_dma_coherent(this->dma_dev)
The other errors are like that, something undeclared, and then some warnings about types defaulting to int, e.g. of DEFINE_IDA.

How to make a DMA transfer?

What is the purpose of the driver?
How can it be used to actually transfer the buffer using DMA? Isn't it architecture dependent?
Why does it depend only on ARM?
For example, what about using a PPC chip with a PCIe device?

Ubuntu 16 Compile Error

I am trying to compile this for Petalinux 2019.2 using an Ubuntu 16.04 development environment on kernel version 4.15. Petalinux fails while compiling this module, with the error
include/linux/refcount.h:58:12: fatal error: asm/refcount.h: No such file or directory. I wouldn't expect that my version of the OS is too old to support this. Do you have any experience with this failure mode? I can't seem to find any references to an asm/refcount.h anywhere online. Any help would be appreciated.

Buffer for BRAM

Hi Ichiro,

If I want to use udmabuf with Vivado's BRAM IP core, do I need to set the bram cntrl node as a uio node in the device tree, or do I follow the memory region example below?

Thanks man

Test program failure on aarch64 system

We are attempting to use the udmabuf device on an aarch64 system. When running the udmabuf_test.c program, it will succeed the first time. The second time we run the program, the kernel will issue errors and the system will go down.

$ uname -a
Linux XXXXX 4.14.0-49.el7a.aarch64 #1 SMP Wed Mar 14 10:02:34 EDT 2018 aarch64 aarch64 aarch64 GNU/Linux

$ ./udmabuf_test.x 
phys_addr=0x81b00000
size=1048576
check_buf()
sync_mode=0, O_SYNC=1, time = 2.175054 sec
$ ./udmabuf_test.x 
phys_addr=0x81b00000
size=1048576
check_buf()
sync_mode=0, O_SYNC=1, time = 2.175091 sec

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:page:ffff7fe000006c80 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:21:11 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)

There are many similar error messages.

dmesg:

[Jul 1 16:03] udmabuf udmabuf0: driver version = 1.4.1
[  +0.005047] udmabuf udmabuf0: major number   = 239
[  +0.004886] udmabuf udmabuf0: minor number   = 0
[  +0.004692] udmabuf udmabuf0: phys address   = 0x0000000081b00000
[  +0.006178] udmabuf udmabuf0: buffer size    = 1048576
[  +0.005213] udmabuf udmabuf0: dma coherent   = 0
[  +0.004700] udmabuf udmabuf.0: driver installed.
[Jul 1 16:06] BUG: Bad page map in process udmabuf_test.x  pte:e8000081b10f4f pmd:bf67ac0003
[  +0.008352] page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0
[  +0.008352] flags: 0xfffff0000000014(referenced|dirty)
[  +0.005222] raw: 0fffff0000000014 0000000000000000 0000000000000000 00000001ffffff80

Message from syslogd@vulcan2 at Jul  1 16:06:17 ...
 kernel:page:ffff7fe000006c40 count:1 mapcount:-127 mapping:          (null) index:0x0

Message from syslogd@vulcan2 at Jul  1 16:06:17 ...
 kernel:flags: 0xfffff0000000014(referenced|dirty)
[  +0.007817] raw: ffff7fe000006f60 ffff7fe0000069e0 0000000000000000 0000000000000000
[  +0.007828] page dumped because: bad pte
[  +0.004009] addr:0000ffff89380000 vm_flags:000040fb anon_vma:          (null) mapping:ffff809f5141dde0 index:1
[  +0.010100] file:udmabuf0 fault:udmabuf_device_vma_fault [udmabuf] mmap:udmabuf_device_file_mmap [udmabuf] readpage:          (null)
[  +0.011998] CPU: 28 PID: 36133 Comm: udmabuf_test.x Kdump: loaded Tainted: G           OE  ------------   4.14.0-49.el7a.aarch64 #1
[  +0.011897] Hardware name: To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M., BIOS 0ACKQ214 01/12/2018
[  +0.011028] Call trace:
[  +0.002528] [<ffff000008088e6c>] dump_backtrace+0x0/0x23c
[  +0.005474] [<ffff0000080890cc>] show_stack+0x24/0x2c
[  +0.005131] [<ffff0000087f607c>] dump_stack+0x84/0xa8
[  +0.005131] [<ffff00000823a324>] print_bad_pte+0x16c/0x1f0
[  +0.005560] [<ffff00000823cd48>] unmap_page_range+0x4f0/0x708
[  +0.005820] [<ffff00000823d004>] unmap_single_vma+0xa4/0xf8
[  +0.005646] [<ffff00000823d3a8>] unmap_vmas+0x70/0xbc
[  +0.005127] [<ffff000008243740>] unmap_region+0xb0/0x120
[  +0.005385] [<ffff000008246080>] do_munmap+0x1e8/0x2f8
[  +0.005212] [<ffff0000082467f8>] vm_munmap+0x6c/0xb8
[  +0.005038] [<ffff000008246874>] SyS_munmap+0x30/0x40
[  +0.005125] Exception stack(0xffff00003020fec0 to 0xffff000030210000)
[  +0.006515] fec0: 0000ffff89370000 0000000000100000 0000000000000001 0000000000000000
[  +0.007904] fee0: 0000000000401160 0000000000000bd0 0000ffff894badd4 0000000000000000
[  +0.007902] ff00: 00000000000000d7 0000ffff895f15b0 00000000ffffffff 0000000000000000
[  +0.007903] ff20: 0000000000000018 00000003e8000000 000e6ed3c5384905 00009ea30d20eab6
[  +0.007903] ff40: 0000ffff8954ae00 0000000000420070 0000ffffc15f8b50 0000000000000000
[  +0.007903] ff60: 0000000000000000 00000000004008e0 0000000000000000 0000000000000000
[  +0.007902] ff80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  +0.007903] ffa0: 0000000000000000 0000ffffc15f8d80 0000000000400d48 0000ffffc15f8d80
[  +0.007903] ffc0: 0000ffff8954ae08 0000000080000000 0000ffff89370000 00000000000000d7
[  +0.007903] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  +0.007904] [<ffff00000808359c>] __sys_trace_return+0x0/0x4
[  +0.005658] Disabling lock debugging due to kernel taint

Not working for kernel 5.4.0

The kernel module won't compile with petalinux 2020.1 with kernel version 5.4.0.
It was working with petalinux 2019.2 and kernel version 4.19.x

I have created a new module with petalinux 2020.1 just like I did with 2019.2, but changed the module name from udmabuf to u-dma-buf because it was mentioned that since kernel 5.0 a udmabuf module already exists.

I will attach the log file from the petalinux output with udmabuf-2.1.5. and udmabuf-2.2.0-rc3.
udmabuf-2.1.5-log.txt
udmabuf-2.2.0-rc3-log.txt

Am I doing something wrong, or am I missing something?

Zynq Ultrascale+ (ARM Cortex A-53) Kernel 4.19 problem

There is a compilation problem when it is used with Petalinux 2019.1. This version includes kernel version 4.19. I want to use it with Zynq Ultrascale+ (ARM A-53) and the Linux 4.19 kernel.

Here is the compilation error.

| /home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work/plnx_zynqmp-xilinx-linux/udmabuf/1.0-r0/udmabuf.c: In function 'udmabuf_platform_driver_probe':
| /home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work/plnx_zynqmp-xilinx-linux/udmabuf/1.0-r0/udmabuf.c:1246:18: error: too few arguments to function 'of_dma_configure'

|          retval = of_dma_configure(&pdev->dev, pdev->dev.of_node);
|                   ^~~~~~~~~~~~~~~~
| In file included from /home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work/plnx_zynqmp-xilinx-linux/udmabuf/1.0-r0/udmabuf.c:44:
| /home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work-shared/plnx-zynqmp/kernel-source/include/linux/of_device.h:58:5: note: declared here
|  int of_dma_configure(struct device *dev,
|      ^~~~~~~~~~~~~~~~
| make[3]: *** [/home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work-shared/plnx-zynqmp/kernel-source/scripts/Makefile.build:312: /home/oguzhancik/petalinux-projects/zcu102-works/build/tmp/work/plnx_zynqmp-xilinx-linux/udmabuf/1.0-r0/udmabuf.o] Error 1
| ERROR: oe_runmake failed

I need to use it for my DMA design from user space; the buffer management is based on the udmabuf kernel module.

Leaking memory

When configuring a udmabuf device in the device tree and using reserved memory, repeatedly loading and unloading the kernel module seems to leak the memory. Here is the relevant portion of the device tree:

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	fpga_reserved1: buffer@20000000 {
		compatible = "shared-dma-pool";
		no-map;
		reg = <0x20000000 0x08000000>;
	};
};

udmabuf@0 {
	compatible = "ikwzm,udmabuf-0.10.a";
	device-name = "udmabuf0";
	minor-number = <0>;
	size = <0x00C80000>;
	memory-region = <&fpga_reserved1>;
};

Here is dmesg output after running insmod and rmmod repeatedly:

[   20.093466] udmabuf udmabuf@0: driver probe start.
[   20.098066] udmabuf udmabuf@0: assigned reserved memory node buffer@20000000
[   20.109583] udmabuf udmabuf0: major number   = 245
[   20.109597] udmabuf udmabuf0: minor number   = 0
[   20.109606] udmabuf udmabuf0: phys address   = 0x20000000
[   20.109614] udmabuf udmabuf0: buffer size    = 13107200
[   20.109621] udmabuf udmabuf0: dma coherent   = 0
[   20.109629] udmabuf udmabuf@0: driver installed.
[   75.176237] udmabuf udmabuf@0: driver remove start.
[   75.176505] udmabuf udmabuf@0: driver removed.
[   78.847090] udmabuf udmabuf@0: driver probe start.
[   78.847453] udmabuf udmabuf@0: assigned reserved memory node buffer@20000000
[   78.859316] udmabuf udmabuf0: major number   = 244
[   78.859329] udmabuf udmabuf0: minor number   = 0
[   78.859338] udmabuf udmabuf0: phys address   = 0x21000000
[   78.859346] udmabuf udmabuf0: buffer size    = 13107200
[   78.859353] udmabuf udmabuf0: dma coherent   = 0
[   78.859361] udmabuf udmabuf@0: driver installed.
[   82.806212] udmabuf udmabuf@0: driver remove start.
[   82.806493] udmabuf udmabuf@0: driver removed.
[   84.507019] udmabuf udmabuf@0: driver probe start.
[   84.507396] udmabuf udmabuf@0: assigned reserved memory node buffer@20000000
[   84.522188] udmabuf udmabuf0: major number   = 243
[   84.522200] udmabuf udmabuf0: minor number   = 0
[   84.522210] udmabuf udmabuf0: phys address   = 0x22000000
[   84.522218] udmabuf udmabuf0: buffer size    = 13107200
[   84.522225] udmabuf udmabuf0: dma coherent   = 0
[   84.522233] udmabuf udmabuf@0: driver installed.
[   86.336180] udmabuf udmabuf@0: driver remove start.
[   86.336450] udmabuf udmabuf@0: driver removed.

Notice how the physical memory address goes up with every load of the kernel module. This would continue until the assigned DMA pool is depleted.

I have tested this on a 4.9.0 kernel.
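
To put numbers on the log above: the reported physical address advances by 0x01000000 (16 MiB) per load, while each buffer is 13107200 bytes (12.5 MiB) and the reserved pool is 0x08000000 (128 MiB). Assuming the step stays constant, which the three loads shown suggest but do not prove, a short calculation gives the number of loads the pool could serve. The variable names are illustrative only.

```python
# Values taken from the device tree and the dmesg output above.
pool_size = 0x08000000       # reg = <0x20000000 0x08000000>
buffer_size = 0x00C80000     # size = <0x00C80000>, i.e. 13107200 bytes
phys_addrs = [0x20000000, 0x21000000, 0x22000000]

# Address advance between consecutive loads of the module.
steps = [b - a for a, b in zip(phys_addrs, phys_addrs[1:])]
step = steps[0]

# Assumption: every further load advances by the same step.
loads_until_depleted = pool_size // step

print(hex(step), loads_until_depleted)   # 0x1000000 8
```

Under that assumption the pool would be exhausted after eight loads.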

Cache coherency?

Hi,

I am trying to read/write to/from a PL BRAM with a CDMA on a ZynqMPSoc device.
I want to enable cache coherency between the PS and the PL, so I am using the HPC0 port.
I have set AxCache to 0b1111 and AxProt to 0b010 and the lpd_apu (0xFF41A040) to 0x3 to enable inner and outer shareability.

Using udmabuf with SYNC_MODE_WRITECOMBINE, I get the same performance as with no cache coherency: bandwidth to/from BRAM seems okay, but PS-to-PS memory bandwidth in the allocated region is not, since the region is not cacheable.

When I allocate the memory with mode 0 (i.e. neither noncached nor writecombine nor dmacoherent), I get very good PS-to-PS performance, the same as with malloc, but PS to/from PL performance plummets.

Is there a proper way to enable cache coherency (and snooping) with the udmabuf driver?

Thank you for your help.

Support for High DDR Memory?

Hi,

ZynqMP has two memory regions, low memory and high memory. E.g. on ZCU106, two memory regions exist:
[image: ZCU106 memory regions]

It seems like udmabuf always allocates the DMA buffer in low memory, but its size is limited compared to the high memory.
Is there any support to allocate a dma buffer in high DDR memory?

Thank you.

Optimization flags

I'm using petalinux on a zynq ultrascale+ (aarch64-linux-gnu-gcc)

I have a dummy script that initializes udma/dma and a custom FFT accelerator and sends data to the devices.
Without optimization flags everything works fine.
However, as soon as I optimize with "-O1", it sometimes finishes executing the script and sometimes stalls.
With "-O2" and above, it never finishes executing.

Has anyone experienced something similar? Is there something that optimization flags could be doing to cause this behavior?

Buffer from UDMABUF and V4L2_MEMORY_USERPTR

Hi!

I tried to pass a buffer taken from UDMABUF to V4L2 as a user buffer (as a workaround for this issue: https://forums.xilinx.com/t5/Embedded-Linux/V4l2-V4L2-MEMORY-USERPTR-contiguous-mapping-is-too-small-4096/td-p/825067) and got the error:

V4L2 ioctl error (22): Invalid argument

on VIDIOC_QBUF.

After some quick research, I got the following result:

Feb 27 02:58:18 X335002201 user.info kernel: [   70.931051] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931062] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931066] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931070] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931075] check_array_args: VIDIOC_QBUF
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931078] v4l_qbuf: check_fmt() ret=0
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931082] vb2_qbuf(): vb2_queue_or_prepare_buf() ret=0
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931083] vb2_core_qbuf()
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931098] get_vaddr_frames() flags=0x4fb
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931103] get_vaddr_frames(): follow_pfn fail: -22/-22
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931105] vb2_create_framevec(): ret=-22
Feb 27 02:58:18 X335002201 user.err kernel: [   70.931107] create framevec fail: -22
Feb 27 02:58:18 X335002201 user.warn kernel: [   70.931110] vb2_core_qbuf: __buf_prepare, ret=-22
Feb 27 02:58:18 X335002201 user.info kernel: [   70.931112] v4l_qbuf: ops->vidioc_qbuf() ret=-22

Looking into pfn_fail() shows that the nested call to __follow_pte_pmd() in mm/memory.c fails.

Is it possible to use buffers from UDMABUF as user buffers for V4L2?

udmabuf udmabuf@1: driver setup failed

Hello,
I get an error when loading udmabuf.ko on my zcu102 board. Can you please take a look at the error message?

system-user.dtsi device tree configuration.

    udmabuf@0 {
            compatible = "ikwzm,udmabuf-0.10.a";
            device-name = "udmabuf0";
            minor-number = <0>;
            size = <0x8000000>;
    };
    udmabuf@1 {
            compatible = "ikwzm,udmabuf-0.10.a";
            device-name = "udmabuf1";
            minor-number = <1>;
            size = <0x8000000>;
    };

sudo insmod udmabuf.ko

[ 675.848739] udmabuf udmabuf0: driver version = 1.4.2
[ 675.853703] udmabuf udmabuf0: major number = 241
[ 675.858491] udmabuf udmabuf0: minor number = 0
[ 675.863108] udmabuf udmabuf0: phys address = 0x000000006fd00000
[ 675.869201] udmabuf udmabuf0: buffer size = 134217728
[ 675.874511] udmabuf udmabuf0: dma coherent = 0
[ 675.879124] udmabuf udmabuf@0: driver installed.
[ 675.884167] udmabuf udmabuf@1: swiotlb buffer is full (sz: 134217728 bytes)
[ 675.891138] udmabuf udmabuf@1: swiotlb: coherent allocation failed, size=134217728
[ 675.898709] CPU: 3 PID: 3722 Comm: insmod Tainted: G O 4.19.0 #1
[ 675.906004] Hardware name: xlnx,zynqmp (DT)
[ 675.910171] Call trace:
[ 675.912609] dump_backtrace+0x0/0x148
[ 675.916259] show_stack+0x14/0x20
[ 675.919566] dump_stack+0x90/0xb4
[ 675.922865] swiotlb_alloc+0x160/0x168
[ 675.926606] __dma_alloc+0xa8/0x1e0
[ 675.930085] udmabuf_platform_driver_probe+0x4b4/0x950 [udmabuf]
[ 675.936078] platform_drv_probe+0x50/0xa0
[ 675.940078] really_probe+0x1c8/0x280
[ 675.943724] driver_probe_device+0x54/0xe8
[ 675.947803] __driver_attach+0xe4/0xe8
[ 675.951538] bus_for_each_dev+0x70/0xc0
[ 675.955364] driver_attach+0x20/0x28
[ 675.958924] bus_add_driver+0x1dc/0x208
[ 675.962743] driver_register+0x60/0x110
[ 675.966563] __platform_driver_register+0x44/0x50
[ 675.971255] udmabuf_module_init+0x214/0x1000 [udmabuf]
[ 675.976468] do_one_initcall+0x74/0x178
[ 675.980288] do_init_module+0x54/0x1c8
[ 675.984020] load_module+0x1b5c/0x20e0
[ 675.987753] __se_sys_finit_module+0xb8/0xc8
[ 675.992007] __arm64_sys_finit_module+0x18/0x20
[ 675.996522] el0_svc_common+0x84/0xd8
[ 676.000167] el0_svc_handler+0x68/0x80
[ 676.003899] el0_svc+0x8/0xc
[ 676.006787] dma_alloc_coherent() failed. return(0)
[ 676.011573] udmabuf udmabuf@1: driver setup failed. return=-12
[ 676.017759] udmabuf: probe of udmabuf@1 failed with error -12

"*** No rule to make target'prepare0', needed by'vdso_prepare'. Stop." when building for linux-xlnx(v2019.1)

shell$ export ARCH=arm64
shell$ export KERNEL_SRC_DIR=/home/work/ZynqMP-FPGA-Linux/linux-xlnx-v2019.2-zynqmp-fpga
shell$ make -C /home/work/ZynqMP-FPGA-Linux/linux-xlnx-v2019.2-zynqmp-fpga ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- M=/home/work/udmabuf obj-m=u-dma-buf.o u-dma-buf.ko
make[1]: Entering directory '/home/work/ZynqMP-FPGA-Linux/linux-xlnx-v2019.2-zynqmp-fpga'
make[1]: *** No rule to make target 'prepare0', needed by 'vdso_prepare'.  Stop.
make[1]: Leaving directory '/home/work/ZynqMP-FPGA-Linux/linux-xlnx-v2019.2-zynqmp-fpga'
Makefile:20: recipe for target 'all' failed
make: *** [all] Error 2

Java Interface & Coherency

Hi,

First, thank you so much for open-sourcing the code. It's super useful!

I created a Java interface to use the device driver and call a hardware accelerator. The Java interface is similar to the one that you pushed as an example for Python. Unfortunately, I have some coherency issues. I created the memory mapping in Java in the following way:

String path = "/dev/udmabuf" + String.valueOf(id);
RandomAccessFile file = new RandomAccessFile(path, "rw");
buffer = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, bytes);

I set sync_mode to 0 and sync_direction to 0. The accelerator does not support coherency. Thus, I used sync_for_cpu and sync_for_device to ensure coherency with the CPU. When I write data from the CPU, it seems to work properly: the accelerator reads the correct input data. When the accelerator writes, however, the CPU does not always read the data correctly. When the input and output data that I send to the accelerator are small, the CPU reads the correct values. When the input and output become bigger, the CPU reads old values for some reason. The size of the data is always less than the entire size of the UDMA buffer.

I am running on an ARM64 architecture (Xilinx ZCU102).

Do you have any idea what could be the issue?

[Question] Are memory barriers necessary when utilizing manual cache management from C/C++ code?

This whitepaper has a lot of good information for those trying to decide the right route to proceed when wanting to get data back and forth between CPU and PL. One of the things it mentions is the need for memory fences/barriers if you are going to be controlling cache coherency from SW rather than HW.

"In Linux, after each buffer is flushed or invalidated, global memory barrier should be inserted to guarantee no memory accesses are reordered. "

Is there any reason this shouldn't be a concern when using manual cache control with udmabuf? In other words, is there a possibility when using manual cache control with udmabuf, that the compiler might optimize the code in such a way that the read might be performed before the invalidate command makes it to the cache when reading a PL to CPU DMA?

In the whitepaper they give this as basically the deciding factor in ditching the HP AXI with SW coherency and instead utilizing HPC AXI with HW coherency through CCI. They say that HP is faster until you put in the necessary memory barriers.

Uncached memory access for Core 1

Hello ikwzm,

thank you for this very useful driver.
I'm using this driver to transfer data between PL and PS, and also from Core 1 to Core 0, on a Zynq 7000 device.
Core 0 is running Linux while Core 1 is running a bare-metal application in a sort of AMP mode.
Sync mode is set to 1, so the cache should be disabled. But data read through udmabuf on Core 0 seems wrong: some data is good, some data is zero. It seems that the cache is still enabled.
This is confirmed by running Xil_DCacheFlushRange in the bare-metal app for that specific address; after that the data seems good.

I know this isn't the primary use case for udmabuf. Can you point out how to flush and invalidate the cache from Core 0 (Linux/udmabuf), instead of flushing the cache from the bare-metal app after every write?

Best regards,
Darko
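
The overview states that the CPU cache for the buffer can be disabled by setting the O_SYNC flag when opening the device file. A minimal Python sketch of such an open-and-map step is shown here. The function name and the `uncached` parameter are hypothetical, and the sketch does not settle whether sync_mode = 1 alone is sufficient on a given driver version.

```python
import mmap
import os


def map_buffer(dev_path, length, uncached=True):
    """Map `length` bytes of the device file, optionally opened with O_SYNC."""
    flags = os.O_RDWR | (os.O_SYNC if uncached else 0)
    fd = os.open(dev_path, flags)
    try:
        # mmap duplicates the descriptor, so fd can be closed afterwards.
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)
```

For example, map_buffer("/dev/udmabuf0", size) would request an uncached mapping, while passing uncached=False leaves the choice to the driver defaults.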

data reset during u-dma-buf creation

Hi,
I would like to use u-dma-buf to transfer data between two processors on an UltraScale+ platform. The problem is that one processor starts running, and producing data, much earlier than the other processor, which runs the Linux OS and uses the u-dma-buf module.

So there is data in the specified memory, but it is reset during the boot of the Linux OS and the creation of the u-dma-buf. I am creating the u-dma-buf via a device tree entry nearly identical to the one in the documentation. If I remove the udmabuf node and leave only the reserved-memory node, the data remains in the buffer.

Is there a simple solution anybody can think of? I tried removing the reusable option, but this only results in a kernel panic.

Thanks

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;
	image_buf0: image_buf@0 {
		compatible = "shared-dma-pool";
		reusable;
		reg = <0x3C000000 0x04000000>; 
		label = "image_buf0";
	};
};
udmabuf@0 {
	compatible = "ikwzm,u-dma-buf";
	device-name = "udmabuf0";
	size = <0x04000000>; // 64MiB
	memory-region = <&image_buf0>;
};

Suggest use pfn instead of page in mmap page fault handler

When the dma_alloc_xxx API returns pages from the buddy allocator rather than from CMA (for example, when the CMA reserve is exhausted), only the first page is refcounted; the pages after it have a refcount of zero. This is usually not a problem, but when you map the pages one by one in the page fault handler, the munmap() call decreases the refcount of each page and frees any page that was not refcounted. The first page is OK, but all pages after it are freed by munmap(). This is a bug.

I suggest using a pfn mapping (via vmf_insert_pfn()) in that case, and setting the VM_PFNMAP flag in vma->vm_flags, so that munmap() does not try to manipulate the page structs.

mmap() offset parameter does nothing

As the title states, providing a nonzero offset parameter to mmap() does not change the offset of the mapped region. This appears to be because of the following line:

vma->vm_pgoff = 0;

When I remove this line, the offset parameter works as expected. I don't understand why it was there, maybe to fix another bug?
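
Until the offset is honoured by the driver, one user-space workaround is to map from offset 0 and select the window afterwards. The following Python sketch illustrates that idea only; the function name is hypothetical, and it maps offset + length bytes, so it assumes the buffer is at least that large.

```python
import mmap
import os


def map_window(dev_path, offset, length):
    """Return (mapping, view) where view covers [offset, offset + length)."""
    fd = os.open(dev_path, os.O_RDWR)
    try:
        # Always map from offset 0; the window is selected in user space.
        mapping = mmap.mmap(fd, offset + length, flags=mmap.MAP_SHARED,
                            prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)
    view = memoryview(mapping)[offset:offset + length]
    return mapping, view
```

The view must be released (view.release()) before the mapping is closed, otherwise Python refuses to close a mapping that still has exported buffers.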

CONFIG_ARM

Hi,

Great job on the driver.
Could you elaborate on CONFIG_ARM: where is it set, and when?

This actually brings me to the next questions:

  1. Why is dma_mmap_coherent used when CONFIG_ARM is not defined?
  2. No mapping is applied in udmabuf_driver_vma_fault, unlike other samples I have seen that use remap_pfn_range, e.g. https://github.com/claudioscordino/mmap_alloc/blob/master/mmap_alloc.c
  3. https://www.embeddedrelated.com/showthread/comp.arch.embedded/217578-1.php states that vma->vm_pgoff has to be set to zero before invoking dma_mmap_coherent. Why do you not reset this value?

Thanks,
Igal

udmabuf of_reserved_mem_device failed

Hello,

I'm trying to set up a udmabuf on a memory device inside the PL side of a ZynqMP at address 0x4_0000_0000. I'm using the Petalinux kernel version 4.19.0-xilinx-v2019.1 with a Debian Stretch rootfs. I'm running into an issue though, of_reserved_mem_device_init is failing, returning -22. Here is the section of my device tree for the reserved memory region and udmabuf:

amba_pl@0 {
                #address-cells = <0x2>;
                #size-cells = <0x2>;
                compatible = "simple-bus";
                ranges;

                reserved-memory {
                        #address-cells = <2>;
                        #size-cells = <2>;
                        ranges;
                        image_buf0: image_buf@0 {
                                compatible = "shared-dma-pool";
                                reusable;
                                reg = <0x4 0x0 0x0 0x400000>;
                                label = "image_buf0";
                        };
                };

                udmabuf@0 {
                        compatible = "ikwzm,udmabuf-0.10.a";
                        device-name = "udmabuf0";
                        size = <0x0 0x400000>;
                        memory-region = <&image_buf0>;
                };
        };

Here's what happens when I try to load udmabuf:

rocky@zynq:~/repos/udmabuf$ sudo insmod udmabuf.ko 
[ 1076.852053] udmabuf amba_pl@0:udmabuf@0: of_reserved_mem_device_init failed. return=-22
[ 1076.860210] udmabuf amba_pl@0:udmabuf@0: driver installed.
[ 1076.865730] udmabuf: probe of amba_pl@0:udmabuf@0 failed with error -22

I've attached the full kernel log as well. Can you help me find where it is going wrong?

dmesg.txt
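
With #address-cells and #size-cells both set to 2, each reg entry is four 32-bit cells: two for the address and two for the size. The Python sketch below (a hypothetical helper, not part of the driver or of any device tree tooling) decodes such a cell list so the values in the node can be checked against the intended address.

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Decode a flat list of 32-bit reg cells into (address, size) pairs."""
    def join(words):
        # Cells are big-endian in significance: the first cell is the high word.
        value = 0
        for word in words:
            value = (value << 32) | (word & 0xFFFFFFFF)
        return value

    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of address+size cells")
    return [
        (join(cells[i:i + address_cells]),
         join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]
```

Applied to the cells 0x4 0x0 0x0 0x400000 from the node above, this gives address 0x400000000 and size 0x400000 (4 MiB), so the reg value itself matches the stated address.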

Q: is it possible to use the same /dev/udmabuf0 device from multiple programs?

The scenario is simple: a shared memory implementation for the case when hardware blocks are used.

Application 1 opens /dev/udmabuf0, mmaps it, feeds the buffer into V4L2 and signals Application 2.

Application 2 opens /dev/udmabuf0, takes some data and feeds the buffer into... the VCU codec, for example.

In such a case, we can avoid an extra memory copy.
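
To make the scenario concrete, here is a Python sketch of a hand-off through a buffer that both applications have mapped. It assumes that both processes can map the same device, which is exactly the open question here, and it uses a hypothetical 4-byte length prefix as the framing. Signalling between the two applications is out of scope.

```python
import struct

# Hypothetical framing: a 4-byte little-endian length, then the payload.
HEADER = struct.Struct("<I")


def publish(buf, payload):
    """Producer side: copy `payload` into the shared buffer."""
    if HEADER.size + len(payload) > len(buf):
        raise ValueError("payload does not fit in the buffer")
    buf[HEADER.size:HEADER.size + len(payload)] = payload
    # Write the length last so a reader never sees a length without data.
    HEADER.pack_into(buf, 0, len(payload))


def consume(buf):
    """Consumer side: return the payload currently stored in the buffer."""
    (length,) = HEADER.unpack_from(buf, 0)
    return bytes(buf[HEADER.size:HEADER.size + length])
```

Here `buf` is any writable buffer object, such as an mmap of the device file. When the data is passed on to hardware rather than read by the CPU, only the physical address and length would need to be exchanged.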

udmabuf initiate transfer?

Hi,

This question may arise from a misconception, so please do tell me where I'm wrong.

Once I've loaded this module, I see that I can read/write /dev/udmabuf0 and thereby read/write the segment of memory associated with the DMA device. However, it seems that udmabuf won't launch the DMA transfer itself.

If this is the case, how am I supposed to trigger an RX or TX transfer from userspace so that it really occurs?

Thanks!
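
As the overview describes, u-dma-buf only provides the buffer; the code that drives the DMA engine (for example through UIO) needs the buffer's physical address. The Python sketch below reads that address and the size from sysfs. It assumes the phys_addr and size attributes are in one directory such as /sys/class/u-dma-buf/udmabuf0. The split into low and high 32-bit words is only an illustration of what a hypothetical DMA address register pair might need.

```python
from pathlib import Path


def read_buffer_info(sysfs_dir):
    """Read the physical address and size of a u-dma-buf buffer from sysfs."""
    base = Path(sysfs_dir)
    # Base 0 accepts both "0x..." hexadecimal and plain decimal strings.
    phys_addr = int((base / "phys_addr").read_text().strip(), 0)
    size = int((base / "size").read_text().strip(), 0)
    return {
        "phys_addr": phys_addr,
        "size": size,
        "addr_lo": phys_addr & 0xFFFFFFFF,  # low word for a 32-bit register
        "addr_hi": phys_addr >> 32,         # high word, 0 below 4 GiB
    }
```

Starting the transfer itself means writing these values into the DMA controller's registers, which depends on the controller and is not shown here.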

Can't set physical address in device tree for Zynq MPSoC

reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;
	rproc_0_reserved: rproc@3ed00000 {
		no-map;
		reg = <0x0 0x3ed00000 0x0 0x1000000>;
	};

	image_buf0: image_buf@0 {
		compatible = "shared-dma-pool";
		reusable;
		reg = <0x0 0x6fc00000 0x0 0x0100000>;
		alignment = <0x0 0x1000>;
		label = "image_buf0";
	};
};

udmabuf@0 {
	compatible = "ikwzm,udmabuf-0.10.a";
	device-name = "udmabuf0";
	size = <0x0 0x0100000>;
	memory-region = <&image_buf0>;
};
When the kernel boots I see:
[ 0.000000] Reserved memory: incorrect alignment of CMA region
[ 0.000000] cma: Reserved 256 MiB at 0x000000005fc00000
When I insmod the driver, I see that the physical address is different from what I set in the device tree:

[ 127.914295] udmabuf udmabuf@0: of_reserved_mem_device_init failed. return=-22
[ 127.921580] udmabuf udmabuf@0: driver installed.
[ 127.926204] udmabuf: probe of udmabuf@0 failed with error -22
[ 127.932553] udmabuf udmabuf.0: DMA mask not set
[ 127.937692] udmabuf udmabuf0: driver version = 1.4.5
[ 127.942653] udmabuf udmabuf0: major number = 240
[ 127.947439] udmabuf udmabuf0: minor number = 0
[ 127.952052] udmabuf udmabuf0: phys address = 0x000000005fd00000
[ 127.958140] udmabuf udmabuf0: buffer size = 1048576
[ 127.963274] udmabuf udmabuf0: dma device = udmabuf.0
[ 127.968582] udmabuf udmabuf0: dma coherent = 0
[ 127.973195] udmabuf udmabuf.0: driver installed.
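
The "incorrect alignment of CMA region" message suggests that the base or the size of the reusable region does not meet the kernel's CMA alignment. The Python sketch below checks both values against an assumed requirement of 4 MiB. That figure is an assumption (it is what I would expect for 4 KiB pages on kernels of this era) and should be checked against the actual kernel configuration. The helper name is hypothetical.

```python
# Assumed CMA alignment; the real value depends on the kernel configuration.
CMA_ALIGN = 4 * 1024 * 1024


def cma_alignment_problems(base, size, align=CMA_ALIGN):
    """Return a list of human-readable alignment problems (empty if none)."""
    problems = []
    if base % align:
        problems.append(f"base {base:#x} is not a multiple of {align:#x}")
    if size % align:
        problems.append(f"size {size:#x} is not a multiple of {align:#x}")
    return problems
```

Under that assumption, the base 0x6fc00000 passes but the size 0x0100000 (1 MiB) does not, which would be consistent with the boot message above.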

Kernel oops when syncing (and can't use reusable)

First, I wanted to thank you for this module; I find it very useful and exactly what I need for my project. That said, I'm seeing the exact same issue reported in #10 (kernel oops when syncing with reserved memory and no-map;). I tried using reusable; as you suggested, but I get an error when loading udmabuf.

device tree:

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	fb: fb@0 {
		compatible = "shared-dma-pool";
		reg = <0xd8000000 0x100000>;
		label = "fb";
		reusable;
	};
};

udmabuf@0 {
	compatible = "ikwzm,u-dma-buf";
	device-name = "fb";
	minor-number = <0>;
	memory-region = <&fb>;
	size = <0x100000>; // 1MiB
};

When doing modprobe u-dma-buf:

[   16.367839] u_dma_buf: loading out-of-tree module taints kernel.
[   16.370110] u-dma-buf udmabuf@0: of_reserved_mem_device_init failed. return=-22
[   16.370380] u-dma-buf udmabuf@0: driver installed.
[   16.370403] u-dma-buf: probe of udmabuf@0 failed with error -22

I also noticed that the memory is not actually reserved with reusable:

cat /proc/iomem
.....
d8000000-dfffffff : System RAM

It must be part of some memory pool. When using no-map I can see it's reserved:

cat /proc/iomem
....
d8100000-dfffffff : System RAM

I'm testing on ARM with kernel 4.19.49. Any suggestions to fix this? I need to use reserved memory so that the address is known, and I need to be able to invalidate the cache after a DMA write.

Can only transfer up to 15K of Source address

Hi Ichiro, I'm attempting to loop back 32K of data.

Although I fill my source buffer with 32768 bytes of data, my destination buffer only ends up with 15928 bytes, and the remainder of the buffer is padded with zeros.

My axi_dma has a range of 64K with a 32-bit data width.
My FIFO depth is 32K with a 4-byte data width.
My udmabuf size is 1048576 bytes.

Platform: zcu102 Ultrascale+. kernel 4.14

I've only been able to transfer 32K of data on a zedboard using kernel 4.0 and using the https://github.com/jeremytrimble/ezdma driver

Please help.

Zcu102 Kernel 4.14 petalinux problem by the udmabuf-master

There is a compilation problem when it is used with PetaLinux 2018.2, which includes kernel version 4.14. I want to use it with Zynq UltraScale+ (ARM Cortex-A53) and the Linux 4.14 kernel.

I copied udmabuf.c to /project-spec/meta-user/recipes-modules/udmabuf/files

and added the device tree in /project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi like this:
/ {
	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;
		image_buf0: image_buf@0 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x3C000000 0x04000000>;
			label = "image_buf0";
		};
	};
	udmabuf@0 {
		compatible = "ikwzm,udmabuf-0.10.a";
		device-name = "udmabuf0";
		size = <0x04000000>; // 64MiB
		memory-region = <&image_buf0>;
	};
};

It compiled successfully, but when I booted the zcu102 there was an error.

Here is the boot error.

Starting udev
[ 5.476400] [drm] Cannot find any crtc or sizes
[ 5.501104] udevd[1875]: starting version 3.2.2
[ 5.542714] udevd[1876]: starting eudev-3.2.2
[ 5.674402] udmabuf: loading out-of-tree module taints kernel.
[ 5.683052] udmabuf udmabuf@0: of_reserved_mem_device_init failed. return=-22
[ 5.700321] udmabuf: probe of udmabuf@0 failed with error -22

I may need some support for this. Thank you.

Kernel Oops when synching

I just started testing this project with an UltraScale+ processor running in ARM64 mode. I am trying to use this with a reserved-memory space. When I try to change the synchronization state, I receive a kernel oops. My device tree looks as follows:

  reserved-memory {
    #address-cells = <2>;
    #size-cells = <2>;
    ranges;

    dma_region: dmaregion@7bf00000 {
      compatible = "shared-dma-pool";
      reg = <0x0 0x7bf00000 0x0 0x4000000>;
      no-map;
    };
  };

  dma: dma@7bf00000 {
    compatible = "ikwzm,udmabuf-0.10.a";
    device-name = "dma_region";
    minor-number = <0>;
    memory-region = <&dma_region>;
    size = <0x4000000>;
  };

When I load the module on the target system, everything looks fine:

# modprobe udmabuf
[  178.772612] udmabuf dma@7bf00000: assigned reserved memory node dmaregion@7bf00000
[  178.798115] udmabuf dma_region: major number   = 246
[  178.803007] udmabuf dma_region: minor number   = 0
[  178.807784] udmabuf dma_region: phys address   = 0x000000007bf00000
[  178.814080] udmabuf dma_region: buffer size    = 67108864
[  178.819416] udmabuf dma_region: dma coherent   = 0
[  178.824215] udmabuf dma@7bf00000: driver installed.

# cd /sys/class/udmabuf/dma_region/
# ls
debug_vma        power            sync_for_cpu     sync_owner
dev              size             sync_for_device  sync_size
device           subsystem        sync_mode        uevent
phys_addr        sync_direction   sync_offset

# cat phys_addr
0x000000007bf00000

However, when I try to write sync_for_cpu or sync_for_device, I am receiving an oops:

# echo 1 > sync_for_device
[  490.115828] Unable to handle kernel paging request at virtual address ffffffc07bf00000
[  490.123730] Mem abort info:
[  490.126490]   Exception class = DABT (current EL), IL = 32 bits
[  490.132392]   SET = 0, FnV = 0
[  490.135432]   EA = 0, S1PTW = 0
[  490.138551] Data abort info:
[  490.141417]   ISV = 0, ISS = 0x00000147
[  490.145235]   CM = 1, WnR = 1
[  490.148182] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff8009c76000
[  490.154970] [ffffffc07bf00000] *pgd=000000007fff6003, *pud=000000007fff6003, *pmd=000000007fff5003, *pte=0000000000000000
[  490.165910] Internal error: Oops: 96000147 [#1] PREEMPT SMP
[  490.171444] Modules linked in: udmabuf(O) zdma_uio(O)
[  490.176482] CPU: 0 PID: 1451 Comm: sh Tainted: G           O    4.14.0-ultrazed #8
[  490.183946] Hardware name: xlnx,zynqmp (DT)
[  490.188108] task: ffffffc07865cd00 task.stack: ffffff800c210000
[  490.194019] PC is at __clean_dcache_area_poc+0x20/0x38
[  490.199135] LR is at __swiotlb_sync_single_for_device+0x50/0x60
[  490.205036] pc : [<ffffff800808e91c>] lr : [<ffffff800808d15c>] pstate: 80000145
[  490.212416] sp : ffffff800c213cf0
[  490.215709] x29: ffffff800c213cf0 x28: ffffffc07865cd00
[  490.221006] x27: ffffff8008441000 x26: 0000000000000040
[  490.226301] x25: 0000000000000124 x24: 0000000000000015
[  490.231596] x23: ffffff800c213eb8 x22: 0000000000000000
[  490.236891] x21: 0000000004000000 x20: ffffffc078090c10
[  490.242185] x19: 000000007bf00000 x18: 0000000000000844
[  490.247480] x17: 0000007fbb41a128 x16: ffffff80081450b8
[  490.252775] x15: 0000000000000008 x14: 0000007fbb36cdc0
[  490.258070] x13: 00000000004b5474 x12: 0101010101010101
[  490.263365] x11: 0000000000000000 x10: 0101010101010101
[  490.268660] x9 : ffffff800c213d58 x8 : ffffffc001573280
[  490.273954] x7 : 0000000004000000 x6 : 000000007bf00000
[  490.279249] x5 : ffffffc078090c10 x4 : 0000000000000001
[  490.284544] x3 : 000000000000003f x2 : 0000000000000040
[  490.289839] x1 : ffffffc07ff00000 x0 : ffffffc07bf00000
[  490.295135] Process sh (pid: 1451, stack limit = 0xffffff800c210000)
[  490.301472] Call trace:
[  490.303899] Exception stack(0xffffff800c213bb0 to 0xffffff800c213cf0)
[  490.310327] 3ba0:                                   ffffffc07bf00000 ffffffc07ff00000
[  490.318143] 3bc0: 0000000000000040 000000000000003f 0000000000000001 ffffffc078090c10
[  490.325955] 3be0: 000000007bf00000 0000000004000000 ffffffc001573280 ffffff800c213d58
[  490.333767] 3c00: 0101010101010101 0000000000000000 0101010101010101 00000000004b5474
[  490.341579] 3c20: 0000007fbb36cdc0 0000000000000008 ffffff80081450b8 0000007fbb41a128
[  490.349391] 3c40: 0000000000000844 000000007bf00000 ffffffc078090c10 0000000004000000
[  490.357203] 3c60: 0000000000000000 ffffff800c213eb8 0000000000000015 0000000000000124
[  490.365015] 3c80: 0000000000000040 ffffff8008441000 ffffffc07865cd00 ffffff800c213cf0
[  490.372827] 3ca0: ffffff800808d15c ffffff800c213cf0 ffffff800808e91c 0000000080000145
[  490.380639] 3cc0: ffffff800c213ce0 ffffff8008435668 0000008000000000 ffffff800823e998
[  490.388450] 3ce0: ffffff800c213cf0 ffffff800808e91c
[  490.393309] [<ffffff800808e91c>] __clean_dcache_area_poc+0x20/0x38
[  490.399480] [<ffffff8000448890>] udmabuf_set_sync_for_device+0xa0/0xe8 [udmabuf]
[  490.406854] [<ffffff8008294af8>] dev_attr_store+0x18/0x28
[  490.412232] [<ffffff800819cb78>] sysfs_kf_write+0x38/0x50
[  490.417612] [<ffffff800819bcc4>] kernfs_fop_write+0x11c/0x184
[  490.423343] [<ffffff8008144c4c>] __vfs_write+0x1c/0xf8
[  490.428462] [<ffffff8008144ee4>] vfs_write+0xac/0x160
[  490.433496] [<ffffff80081450fc>] SyS_write+0x44/0x88
[  490.438443] Exception stack(0xffffff800c213ec0 to 0xffffff800c214000)
[  490.444869] 3ec0: 0000000000000001 00000000004f9260 0000000000000002 0000007fbb4ad000
[  490.452684] 3ee0: 0000000000650031 0000000000000000 0080008080808080 7f7f7f7f7f7f7f7f
[  490.460496] 3f00: 0000000000000040 fffffffffffffff0 0101010101010101 0000000000000000
[  490.468308] 3f20: 0101010101010101 00000000004b5474 0000007fbb36cdc0 0000000000000008
[  490.476120] 3f40: 00000000004f35d8 0000007fbb41a128 0000000000000844 0000000000000001
[  490.483932] 3f60: 00000000004f9260 0000000000000002 00000000004f4000 00000000004f9260
[  490.491744] 3f80: 0000000000000020 00000000004f4000 0000000000000000 00000000004f5688
[  490.499556] 3fa0: 0000000000000000 0000007fdc3d2260 000000000040dcac 0000007fdc3d2260
[  490.507368] 3fc0: 0000007fbb41a150 0000000080000000 0000000000000001 0000000000000040
[  490.515180] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  490.522993] [<ffffff8008082c70>] el0_svc_naked+0x24/0x28
[  490.528285] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7a20)
[  490.534360] ---[ end trace 7646d5a8a18a8b56 ]---

Am I doing something fundamentally wrong?

will udmabuf be bottleneck

Hi, I want to use udmabuf in my AXI DMA userspace driver. According to the Xilinx PG021 AXI DMA document, AXI DMA throughput is about 300MB/s. In your README.md,
zynq# dd if=/dev/udmabuf0 of=udmabuf0.bin bs=8388608
gets about 55MB/s. Will udmabuf be the bottleneck of an AXI DMA userspace driver? Thank you.
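
Note that a dd figure covers both reading the device and writing the output file, so it does not isolate the buffer itself. The Python sketch below times only the copy out of a mapped buffer into process memory. The function name and the `repeats` parameter are hypothetical, and the result depends heavily on whether the mapping is cached.

```python
import mmap
import os
import time


def read_throughput_mb_s(dev_path, length, repeats=4):
    """Time copying `length` bytes out of a mapping, in MB/s (10^6 bytes)."""
    fd = os.open(dev_path, os.O_RDONLY)
    try:
        mapping = mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                            prot=mmap.PROT_READ)
    finally:
        os.close(fd)
    try:
        start = time.perf_counter()
        for _ in range(repeats):
            _copy = mapping[:]  # copies the whole buffer into a bytes object
        # Guard against a zero interval on very coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        mapping.close()
    return (length * repeats) / elapsed / 1e6
```

Comparing this number with and without O_SYNC at open time would show how much of the difference comes from caching rather than from the driver.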

Attribute sync_owner: write permission without 'store'

Declaring the sync_owner attribute with mode 0664 makes the kernel print a warning message:

  __ATTR(sync_owner     , 0664, udmabuf_show_sync_owner      , NULL                       ),

Changing the sync_owner mode from 0664 to 0444 fixes the issue:

  __ATTR(sync_owner     , 0444, udmabuf_show_sync_owner      , NULL                       ),
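
The warning text ("write permission without 'store'") indicates that the kernel objects to an attribute that has a write permission bit but no store callback. The Python sketch below mirrors that condition as I understand it; it is an illustration of the rule, not the kernel's code, and the function name is hypothetical.

```python
import stat

# Any of the user/group/other write bits.
S_IWUGO = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def warns_without_store(mode, has_store):
    """True if an attribute with this mode and no store callback would warn."""
    return bool(mode & S_IWUGO) and not has_store
```

With mode 0664 and a NULL store callback the condition is true; with mode 0444 it is false, which matches the proposed fix.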

Kernel Version (CentOS)

Linux localhost.localdomain 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Kernel Output

[52380.158706] ------------[ cut here ]------------
[52380.158710] WARNING: CPU: 13 PID: 8088 at drivers/base/core.c:634 device_create_file+0x8d/0xa0
[52380.158711] Attribute sync_owner: write permission without 'store'
[52380.158712] Modules linked in: u_dma_buf(OE+) vfio_pci vfio_iommu_type1 vfio tcp_lp fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod snd_hda_codec_hdmi vfat fat edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec irqbypass crc32_pclmul ghash_clmulni_intel snd_hda_core snd_hwdep snd_seq eeepc_wmi
[52380.158742]  aesni_intel asus_wmi sparse_keymap snd_seq_device lrw gf128mul glue_helper rfkill ablk_helper cryptd snd_pcm pcspkr i2c_piix4 snd_timer snd soundcore pinctrl_amd i2c_designware_platform i2c_designware_core pcc_cpufreq acpi_cpufreq ip_tables xfs libcrc32c nouveau video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci crct10dif_pclmul igb crct10dif_common libata crc32c_intel nvme mxm_wmi serio_raw ptp sdhci_acpi pps_core nvme_core iosf_mbi dca sdhci i2c_algo_bit drm_panel_orientation_quirks mmc_core wmi [last unloaded: u_dma_buf]
[52380.158768] CPU: 13 PID: 8088 Comm: insmod Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1062.el7.x86_64 #1
[52380.158770] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 0411 06/15/2019
[52380.158771] Call Trace:
[52380.158775]  [<ffffffff91379262>] dump_stack+0x19/0x1b
[52380.158779]  [<ffffffff90c9a878>] __warn+0xd8/0x100
[52380.158782]  [<ffffffff90c9a8ff>] warn_slowpath_fmt+0x5f/0x80
[52380.158786]  [<ffffffff910ae56d>] device_create_file+0x8d/0xa0
[52380.158788]  [<ffffffff910b0c13>] device_add+0x5f3/0x7b0
[52380.158792]  [<ffffffff910b0fe0>] device_create_groups_vargs+0xe0/0x110
[52380.158795]  [<ffffffff910b1081>] device_create+0x51/0x70
[52380.158799]  [<ffffffffc0c37891>] ? udmabuf_platform_driver_probe+0x211/0x850 [u_dma_buf]
[52380.158802]  [<ffffffffc0c37cf5>] udmabuf_platform_driver_probe+0x675/0x850 [u_dma_buf]
[52380.158805]  [<ffffffff910b65d7>] platform_drv_probe+0x47/0x120
[52380.158808]  [<ffffffff910b4205>] driver_probe_device+0xc5/0x3e0
[52380.158810]  [<ffffffff910b4520>] ? driver_probe_device+0x3e0/0x3e0
[52380.158812]  [<ffffffff910b4563>] __device_attach+0x43/0x50
[52380.158816]  [<ffffffff910b1e85>] bus_for_each_drv+0x75/0xc0
[52380.158819]  [<ffffffff910b4040>] device_attach+0x90/0xb0
[52380.158821]  [<ffffffff910b3268>] bus_probe_device+0x98/0xd0
[52380.158824]  [<ffffffff910b0b0f>] device_add+0x4ef/0x7b0
[52380.158827]  [<ffffffff910b6071>] platform_device_add+0xd1/0x2d0
[52380.158831]  [<ffffffffc0c36efd>] udmabuf_platform_device_create+0xdd/0x1c0 [u_dma_buf]
[52380.158835]  [<ffffffffc0d10000>] ? 0xffffffffc0d0ffff
[52380.158839]  [<ffffffffc0d102b3>] u_dma_buf_init+0x2b3/0x1000 [u_dma_buf]
[52380.158842]  [<ffffffff90c0210a>] do_one_initcall+0xba/0x240
[52380.158845]  [<ffffffff90d1c90a>] load_module+0x271a/0x2bb0
[52380.158849]  [<ffffffff90fad740>] ? ddebug_proc_write+0x100/0x100
[52380.158852]  [<ffffffff90d184a3>] ? copy_module_from_fd.isra.44+0x53/0x150
[52380.158855]  [<ffffffff90d1cf86>] SyS_finit_module+0xa6/0xd0
[52380.158858]  [<ffffffff9138bede>] system_call_fastpath+0x25/0x2a
[52380.158860] ---[ end trace f634d7728939c9c3 ]---
[52380.158960] u-dma-buf udmabuf1: driver version = 2.1.2
[52380.158962] u-dma-buf udmabuf1: major number   = 234
[52380.158965] u-dma-buf udmabuf1: minor number   = 1
[52380.158967] u-dma-buf udmabuf1: phys address   = 0x00000000c6400000
[52380.158969] u-dma-buf udmabuf1: buffer size    = 1048576
[52380.158971] u-dma-buf udmabuf1: dma device     = u-dma-buf.1
[52380.158973] u-dma-buf u-dma-buf.1: driver installed.

make error for zynq debian9 rootfs

Hello,

I am trying to build your udmabuf module, but I get errors.
I cannot find the 4.19.0-xilinx-v2019.1/build directory under /lib/modules.
How do I install or copy the 4.19.0-xilinx-v2019.1/build directory into the Debian 9 rootfs?

My test environment is PetaLinux 2019.1 with a Debian 9 rootfs.

root@zdfe-bxb:/home/kha/udmabuf-kmod-dpkg/udmabuf# make
make -C /lib/modules/4.19.0-xilinx-v2019.1/build ARCH=arm CROSS_COMPILE= M=/home/kha/udmabuf-kmod-dpkg/udmabuf modules
make[1]: *** /lib/modules/4.19.0-xilinx-v2019.1/build: No such file or directory. Stop.
Makefile:19: recipe for target 'all' failed
make: *** [all] Error 2

Thanks,
Kiman

4GiB udmabuf fails

We have had success using u-dma-buf with a 512MiB address space. Now we're trying to use the full 4GiB of our device and are running into an issue when we load the kernel module. Below is our dmesg output. We are running u-dma-buf version 3.2.2

test@zynq:~$ sudo insmod u-dma-buf2.ko
[sudo] password for test:
[   63.855991] u_dma_buf: loading out-of-tree module taints kernel.
[   63.863216] u-dma-buf udmabuf@400000000: assigned reserved memory node image_buf@400000000
[   64.946316] dma_alloc_coherent(size=4294967296) failed. return(0)
[   64.952415] u-dma-buf udmabuf@400000000: driver setup failed. return=-12
[   64.959420] u-dma-buf udmabuf@400000000: driver installed.
[   64.964920] u-dma-buf: probe of udmabuf@400000000 failed with error -12
test@zynq:~$ uname -a
Linux zynq.tezzaron.com 5.4.0-xilinx-v2020.1 #1 SMP Wed Aug 5 17:06:26 UTC 2020 aarch64 GNU/Linux

Here is my device tree fragment for petalinux:

/include/ "system-conf.dtsi"
/ {
    memory {
        device_type = "memory";
        /*
        Notes:
        first 512MB is normal PS DRAM
        next 4GB at HPM0 is for udmabuf
        next 1GB is PS DRAM
        Final 2GB is the rest of PS Ram
        */
        reg = <0x0 0x0 0x0 0x20000000 
                0x4 0x00000000 0x1 0x00000000
                0x0 0x40000000 0x0 0x40000000
                0x8 0x0 0x0 0x80000000
            >;
    };
    reserved-memory {
            #address-cells = <2>;
            #size-cells = <2>;
            ranges;
            image_buf0: image_buf@400000000 {
                    compatible = "shared-dma-pool";
                    reusable;
                    reg = <0x4 0x00000000 0x1 0x00000000>;
                    label = "image_buf0";
            };
   };


    udmabuf@400000000 {
            #size-cells = <2>;
            compatible = "ikwzm,u-dma-buf";
            device-name = "udmabuf0";
            size = <0x1 0x0>;
            dma-mask = <64>;
            memory-region = <&image_buf0>;
    };

};

Do you see where this could be going wrong? Admittedly I jumped from 512MiB straight to 4GiB. I'm going to try a 1024MiB buffer size and report back.

Thanks again!

EDIT: corrected the previous sizes I tried

How does this driver interact (if at all) with the ZynqMP DMA driver?

To start, I'd like to say that I appreciate your making this driver available. The detailed and thorough documentation you've provided is also much appreciated.

I'm trying to understand how your driver interacts with the Xilinx ZynqMP DMA driver (on an UltraScale+ RFSoC ZCU111 running PetaLinux).

On boot, there appears to be some number of DMA channels configured:

[    4.594638] xilinx-zynqmp-dma fd500000.dma: ZynqMP DMA driver Probe success
[    4.601751] xilinx-zynqmp-dma fd510000.dma: ZynqMP DMA driver Probe success
[    4.608853] xilinx-zynqmp-dma fd520000.dma: ZynqMP DMA driver Probe success
[    4.615965] xilinx-zynqmp-dma fd530000.dma: ZynqMP DMA driver Probe success
[    4.623071] xilinx-zynqmp-dma fd540000.dma: ZynqMP DMA driver Probe success
[    4.630197] xilinx-zynqmp-dma fd550000.dma: ZynqMP DMA driver Probe success
[    4.637309] xilinx-zynqmp-dma fd560000.dma: ZynqMP DMA driver Probe success
[    4.644422] xilinx-zynqmp-dma fd570000.dma: ZynqMP DMA driver Probe success
[    4.651591] xilinx-zynqmp-dma ffa80000.dma: ZynqMP DMA driver Probe success
[    4.658705] xilinx-zynqmp-dma ffa90000.dma: ZynqMP DMA driver Probe success
[    4.665816] xilinx-zynqmp-dma ffaa0000.dma: ZynqMP DMA driver Probe success
[    4.672924] xilinx-zynqmp-dma ffab0000.dma: ZynqMP DMA driver Probe success
[    4.680033] xilinx-zynqmp-dma ffac0000.dma: ZynqMP DMA driver Probe success
[    4.687143] xilinx-zynqmp-dma ffad0000.dma: ZynqMP DMA driver Probe success
[    4.694256] xilinx-zynqmp-dma ffae0000.dma: ZynqMP DMA driver Probe success
[    4.701369] xilinx-zynqmp-dma ffaf0000.dma: ZynqMP DMA driver Probe success

that are enumerated here

root@xilinx-zcu111-2019_2:/sys/class/dma# ls
dma0chan0   dma0chan2   dma0chan4   dma10chan0  dma12chan0  dma14chan0  dma16chan0  dma18chan0  dma2chan0   dma4chan0   dma6chan0   dma8chan0
dma0chan1   dma0chan3   dma0chan5   dma11chan0  dma13chan0  dma15chan0  dma17chan0  dma1chan0   dma3chan0   dma5chan0   dma7chan0   dma9chan0

Forgive me for asking what is probably an obvious/fundamental question... does your driver use these channels? If so, how do I use your driver to ensure that I use a particular channel?

[QUESTION] Performance compared to UIO?

I looked at this project because I'm trying to optimize a RAM-to-reserved-RAM transfer on a 7 series Zynq device.
It looks like you really did great work! It's explained very clearly and seems easy to use.
My question is the following: what is the performance compared to the uio driver?
Based on a benchmark I did, I could get 87.00 MBytes/s with the standard UIO implementation
(https://forums.xilinx.com/t5/Embedded-Linux/User-IO-perfmance-on-Zynq-UIO/m-p/1094560/highlight/false#M41587), while at the bottom of the help page I could see 55.4 MB/s using this driver.
Can you share some information about the system you used to get this data? Am I looking at the "right number"?

Thanks!!

Marco

Zynq UltraScale+ (ARM Cortex-A53)

On a Xilinx Zynq UltraScale+ (CPU ARM Cortex-A53), I am running the Linux kernel 4.9-xilinx-v2017.2.

I know that both OS and CPU are not listed as supported, but --- hopefully --- the porting is easy.

For cross-compiling, I modified the Makefile:
ARCH := arm64
CROSS_COMPILE ?= aarch64-linux-gnu-

When I insmod the kernel driver, I get the following error:
# insmod udmabuf.ko udmabuf0=1024
[ 8554.453499] dma_alloc_coherent() failed
[ 8554.457374] udmabuf: couldn't create udmabuf0 driver

Can I fix the problem?

RX BUFFER zero padded only on cold start

Hi all,

My design is a simple loopback from axi_dma mm2s to axi_dma s2mm, on a Xilinx ZynqMP UltraScale+ ZCU102.

I was wondering why the RX buffer is all zero only on a cold start (the first run of the userspace app).
On all subsequent runs the RX buffer equals the TX buffer.

I tried a FIFO in between the transaction, but got the same result.

thanks,

udmabuf with loop back example

Hi guys, I'm trying to use udmabuf on a simple loopback design. The device tree that I plan to use is below.

Do I need to define a memory region in the udmabuf nodes for the RX and TX DMA channels?

I don't understand how the uio device would do the DMA transactions and how udmabuf will help.

Does my device tree make sense at all?

/ {
    /* Loopback DMA setup */
    
    loopback_dma: axidma@40400000 {
        compatible = "generic-uio";
        #dma-cells = <1>;
        reg = < 0x40400000 0x10000 >;
        clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>; // fclk0 from clock controller
        clock-names = "s_axi_lite_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk", "m_axi_sg_aclk";
        xlnx,include-sg;

        loopback_dma_mm2s_chan: dma-channel@40400000 {
            compatible = "xlnx,axi-dma-mm2s-channel";
            interrupt-parent = <&intc>;
            interrupts = <0 31 4>; 
            xlnx,datawidth = <0x20>;        
            xlnx,sg-length-width = <14>;    

            xlnx,device-id = <0x1>;     
        };

        loopback_dma_s2mm_chan: dma-channel@40400030 {
            compatible = "xlnx,axi-dma-s2mm-channel";
            interrupt-parent = <&intc>;
            interrupts = <0 32 4>; 
            xlnx,datawidth = <0x20>;       
            xlnx,sg-length-width = <14>;    
            xlnx,device-id = <0x1>;     
        };
    };

    udmabuf@0x00 {
        compatible = "ikwzm,udmabuf-0.10.a";
        device-name = "udmabuf0";
        minor-number = <0>;
        size = <0x00100000>;
        sync-direction = <1>; //TX
    };
    udmabuf@0x01 {
        compatible = "ikwzm,udmabuf-0.10.a";
        device-name = "udmabuf1";
        minor-number = <0>;
        size = <0x00100000>;
        sync-direction = <2>; //RX
    };

};

Thanks for your time, 💯

Buffer size limit?

Hi
When I allocate a udma buffer of up to 4MB, I have no issues.

But when I allocated a udma buffer of 8MB...

insmod udmabuf.ko udmabuf0=8388608

and then ran

ls -la /dev/udmabuf0

I get the message: ls: cannot access /dev/udmabuf0: No such file or directory

How can I fix this?
Thanks

dmesg output:

[ 5050.855332] CPU: 1 PID: 5502 Comm: insmod Tainted: G W OE ------------ 3.10.0-693.el7.x86_64 #1
[ 5050.855333] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 5050.855334] ffff8800a057b990 000000002964dae8 ffff8800a057b940 ffffffff816a3d91
[ 5050.855337] ffff8800a057b980 ffffffff810879c8 000002782964dae8 ffffffffc06b8340
[ 5050.855339] ffff8801175c5400 ffff8800a0796410 ffff8801175c5400 ffff8800a0570780
[ 5050.855342] Call Trace:
[ 5050.855346] [] dump_stack+0x19/0x1b
[ 5050.855349] [] __warn+0xd8/0x100
[ 5050.855351] [] warn_slowpath_fmt+0x5f/0x80
[ 5050.855354] [] device_create_file+0x8d/0xa0
[ 5050.855356] [] device_add+0x5fb/0x7c0
[ 5050.855359] [] device_create_groups_vargs+0xe0/0x110
[ 5050.855361] [] device_create+0x51/0x70
[ 5050.855364] [] ? udmabuf_platform_driver_probe+0x1b5/0x7e0 [udmabuf]
[ 5050.855367] [] udmabuf_platform_driver_probe+0x5dd/0x7e0 [udmabuf]
[ 5050.855370] [] platform_drv_probe+0x42/0x110
[ 5050.855372] [] driver_probe_device+0xc2/0x3e0
[ 5050.855375] [] ? driver_probe_device+0x3e0/0x3e0
[ 5050.855377] [] __device_attach+0x3b/0x40
[ 5050.855379] [] bus_for_each_drv+0x6b/0xb0
[ 5050.855382] [] device_attach+0x90/0xb0
[ 5050.855384] [] bus_probe_device+0x98/0xc0
[ 5050.855386] [] device_add+0x4ff/0x7c0
[ 5050.855388] [] platform_device_add+0xd1/0x2d0
[ 5050.855391] [] udmabuf_static_device_create+0xbc/0xe9 [udmabuf]
[ 5050.855394] [] ? 0xffffffffc06bafff
[ 5050.855397] [] udmabuf_module_init+0x272/0x1000 [udmabuf]
[ 5050.855400] [] do_one_initcall+0xb8/0x230
[ 5050.855403] [] load_module+0x1f64/0x29e0
[ 5050.855406] [] ? ddebug_proc_write+0xf0/0xf0
[ 5050.855409] [] ? copy_module_from_fd.isra.42+0x53/0x150
[ 5050.855411] [] SyS_finit_module+0xa6/0xd0
[ 5050.855414] [] system_call_fastpath+0x16/0x1b
[ 5050.855416] ---[ end trace 8e98ba62cbd19414 ]---
[ 5050.855523] dma_alloc_coherent() failed. return(0)
[ 5050.855526] udmabuf udmabuf.0: driver setup failed. return=-12
[ 5050.855554] udmabuf udmabuf.0: driver installed.
[ 5050.855557] udmabuf: probe of udmabuf.0 failed with error -12
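
A note on reading the log above: `return=-12` is the kernel's negative errno for ENOMEM, and the line before it shows that `dma_alloc_coherent()` could not find a contiguous block. The sketch below decodes the code and compares the requested size with the largest block the page allocator can normally hand out. The page size (4 KiB) and `MAX_ORDER` (11) are assumptions, typical for an x86_64 kernel of this generation without CMA; they are not stated in the post, and the helper names are made up for illustration.

```python
import errno
import os

# Assumptions (not stated in the post): 4 KiB pages and MAX_ORDER = 11,
# common defaults for an x86_64 kernel of the 3.10 era without CMA.
PAGE_SIZE = 4096
MAX_ORDER = 11


def max_buddy_alloc_bytes(page_size: int = PAGE_SIZE, max_order: int = MAX_ORDER) -> int:
    """Largest physically contiguous block the page allocator can return.

    On kernels of this era the highest usable order is max_order - 1.
    """
    return page_size << (max_order - 1)


def describe_errno(ret: int) -> str:
    """Turn a negative kernel return code such as -12 into 'NAME: message'."""
    code = -ret
    return f"{errno.errorcode.get(code, 'UNKNOWN')}: {os.strerror(code)}"


limit = max_buddy_alloc_bytes()
requested = 8388608  # the value passed as udmabuf0=8388608 in the post

print(describe_errno(-12))
print(f"limit={limit} requested={requested} fits={requested <= limit}")
```

Under these assumptions the limit works out to 4 MiB, which would be consistent with 4 MB succeeding and 8 MB failing. Whether this is the actual cause on this VirtualBox guest is not confirmed by the log alone.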
