Coder Social home page Coder Social logo

dce's Introduction

DCE API and Library Version 1903
----------------------------------

This README describes how to build and run example applications of the dce.h
library.

Test program: dce-example
-------------------------
This is a highly modular example application that walks through many separate
tests that build up in complexity.

Test program: dce-api-perf-test
-------------------------------

This is a throughput measurement application of the dce.h library. It can
compress or decompress data with multiple formats and modes. e.g. zlib/raw
stateful/stateless. The number of threads and DPAA2 object resources (interfaces
to the DCE hardware accelerator) is flexible. It can be configured to run
multiple threads or a single thread. The number of threads can be specified per
DCE interface (DPDCEI). Please use the --help option to get a full list of
parameters.

Build instructions
------------------

You can cross compile or build natively.  For example,

```
  make CROSS_COMPILE=aarch64-linux-gnu-
```

Assuming cross compiler aarch64-linux-gnu-gcc is on your path.  Or

```
  make CROSS_COMPILE=' '
```

for a native build.

Preparing DPAA2 Objects
-----------------------

Boot Linux one the NXP ARM system and transfer the contents of this
directory to it if cross-compiling.

The DCE application and library depend on some DPAA2 objects being
created in a DPAA2 resource container (dprc).  This can be done
statically for a production system, but these instructions assume they
will be created dynamically using the "restool" command.  To run the
example application, first enter these commands to create the objects
and assign them to container dprc.2.  Notice that "sudo" runs the
commands as root.  The "sleep" is defensive.  These objects are
asynchronously hot plugged.  These commands print messages, including
to syslog.  The messages are not shown below.

```
  sudo restool dprc create dprc.1 --label="DCE test dprc"

  sudo restool dpio create
  sudo restool dpio create
  sudo restool dpio create

  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_COMPRESSION
  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_COMPRESSION
  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_COMPRESSION

  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_DECOMPRESSION
  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_DECOMPRESSION
  sudo restool dpdcei create --priority=1 --engine=DPDCEI_ENGINE_DECOMPRESSION

  sleep 1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpio.16 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpio.17 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpio.18 --plugged=1

  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.0 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.1 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.2 --plugged=1

  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.3 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.4 --plugged=1
  sudo restool dprc assign dprc.1 --child=dprc.2 --object=dpdcei.5 --plugged=1

```

You can use this last restool command to check that all objects were
created.  Here is an example (with output):

```
  sudo restool dprc show dprc.2
  dprc.2 contains 6 objects:
  object          label           plugged-state
  dpdcei.2                        plugged
  dpdcei.1                        plugged
  dpdcei.0                        plugged
  dpio.18                         plugged
  dpio.17                         plugged
  dpio.16                         plugged
```

Note the names of the objects (like dprc.2 and dpio.18).  They have to
match in the commands below.  Obviously, scripts could handle all
this, but these instructions are intentionally explicit.

Binding the DPAA2 Resource Container to VFIO
--------------------------------------------

VFIO is a standard Linux facility that allows hardware devices to be
directly assigned to user space while maintaining access control and
isolation.  The following commands bind dprc.2 to VFIO.  Keep in mind
that the name "dprc.2" must match.  The example shows the commands
being done literally as root (after sudo su) because it makes the
redirection clearer.

```
  sudo su
  echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
  echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
  echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind
  exit
```

At this point, you can run the example applications as often as you
like because its hardware resources have been allocated and assigned.

Running dce-example
-------------------
This application is easy to run. It contains input data required for the test,
but the Data Path objects must be supplied

```
  sudo bin/dce-example dprc.2 dpio.10 dpdcei.0 dpdcei.5
```

The output is
```
  SETUP Data Path Objects ***************************************************************
  Worker dce-example: Setup MC resources
  Worker dce-example: Open root dprc
  Worker dce-example: Open test dprc
  Worker dce-example: Setup vfio to allow HW devices to access virtual addresses
  Worker dce-example: Allocate virtual memory and map it to test dprc using vfio
  Worker dce-example: Setup QBman Software Portal
  Worker dce-example: Setup Decompression Compression Engine devices

  TESTS *********************************************************************************
  _________________________________________________________________________
  Test # 0: single_chunk_compress
  Worker dce-example: Setting up dpdcei_lane
  Worker dce-example: Setting up input data
  Worker dce-example: Setting up output buffer
  Worker dce-example: Sending operation on compression dpdcei
  Worker dce-example: Polling for result
  Worker dce-example: Received response with status STREAM_END
  Worker dce-example: Success
  _________________________________________________________________________
  _________________________________________________________________________
  Test # 1: single_chunk_decompress
  Worker dce-example: Setting up dpdcei_lane
  Worker dce-example: Compress data for decompression testing
  Worker dce-example: Setting up input data
  Worker dce-example: Setting up output buffer
  Worker dce-example: Sending operation on decompression dpdcei
  Worker dce-example: Polling for result
  Worker dce-example: Received response with status STREAM_END
  Worker dce-example: Checking if the decompressed data matches the original data
  Worker dce-example: Success
  _________________________________________________________________________
```

Running dce-api-perf-test stateless compression in asynchronous mode
--------------------------------------------------------------------

This is a high-performance use case in which blocks of data are compressed
independently.  In addition three threads are used to feed the DCE.

```
  # Put some data into input-file
  sudo bin/dce-api-perf-test --in=input-file --paradigm=stateless --time=2 --format=gzip --chunk-size=16384 \
   --resources dprc.2 dpio.16 dpdcei.0 threads 1 dpio.17 dpdcei.1 threads 1 dpio.18 dpdcei.2 threads 1
```

The output is

```
  Worker 2_eq: Done work. Waiting for 300 outstanding work requests before exit
  Worker 1_eq: Done work. Waiting for 297 outstanding work requests before exit
  Worker 0_eq: Done work. Waiting for 292 outstanding work requests before exit
  Worker 0_eq: tx_max = 1934 rx_max = 134 tx_min = 1863 rx_min = 62 tx_avg = 1920 rx_avg = 62 enqueuer got ahead 1  26 times. Interrupt count = 80359
  Worker 1_eq: tx_max = 1996 rx_max = 6 tx_min = 1976 rx_min = 0 tx_avg = 1987 rx_avg = 0 enqueuer got ahead 121 times. Interrupt count = 80360
  Worker 2_eq: tx_max = 1877 rx_max = 76 tx_min = 1794 rx_min = 4 tx_avg = 1794 rx_avg = 58 enqueuer got ahead 126 times. Interrupt count = 80369
  Took 2006397 us to process 10561885184 bytes, and output 4090115746 bytes. Cycles elapsed 50221964. Counter frequency is 25030920
  Throughput is 42112 Mbps
```

The measured compression rate is 42112 Mbps.  The compression ratio
achieved is

  10561885184./4090115746. = 2.58

Running stateful compression in synchronous mode and writing output to a file
------------------------------------------------------------------------------

This example is a lower-performance use case co-mingled with file I/O.
It uses three threads to feed DCE, compressing the data from
input-file three times.  The result does not show the performance
capabilities of DCE.  A different example application could.  But
this application allows one to see that gunzip can process what DCE
compresses.

```
  # Put some data into input-file.
  sudo ./bin/dce-api-perf-test --synchronous --in=input-file --out=output-file.gz --paradigm=stateful-recycle \
    --time=2 --format=gzip  --chunk-size=16384 \
   --resources dprc.2 dpio.16 dpdcei.0 threads 1 dpio.17 dpdcei.1 threads 1 dpio.18 dpdcei.2 threads 1
```

The output looks like this:

```
  Worker 2_eq: Done work. Waiting for 2 outstanding work requests before exit
  Worker 1_eq: Done work. Waiting for 0 outstanding work requests before exit
  Worker 0_eq: Worker 1_eq: tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 98482
  Worker 2_eq: Done work. Waiting for 0 outstanding work requests before exit
  Worker 0_eq: tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 98482
  tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 98482
  Took 1997255 us to process 1640031232 bytes, and output 616670144 bytes. Cycles elapsed 50008214. Counter frequency is 25038460
  Throughput is 6569 Mbps
```

You can use gunzip to decompress one of the files that was created and
see that it matches the input file.

```
  cp output-file.gz_0 foo.0.gz
  gunzip foo.0.gz
  sha1sum input-file foo.0
  284b3e13cc79662bc056af7e673fbdbebc682f0e  input-file
  284b3e13cc79662bc056af7e673fbdbebc682f0e  foo.0
```

Running stateful decompression in synchronous mode and writing output to a file
--------------------------------------------------------------------------------

This is an example of decompression per the stateful, synchronous use
case with file output.  Again, this is not a high-performance use case
due to the way the test application is constructed.

  cp input-file compress_input
  gzip compress_input
  sudo bin/dce-api-perf-test --in=compressed_in.gz --paradigm=stateful-recycle --format=gzip \
      --chunk-size=4096 --synchronous --out=uncomp_out \
      --resources dprc.2 dpio.16 dpdcei.3 threads 1 dpio.17 dpdcei.4 threads 1 dpio.18 dpdcei.5 threads 1

The output looks like this:

```
  Worker 1_eq: Done work. Waiting for 1 outstanding work requests before exit
  Worker 0_eq: Done work. Waiting for 0 outstanding work requests before exit
  Worker 0_eq: tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 89741
  Worker 2_eq: Done work. Waiting for 0 outstanding work requests before exit
  Worker 2_eq: tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 89744
  Worker 1_eq: tx_max = 0 rx_max = 0 tx_min = 4294967295 rx_min = 4294967295 tx_avg = 0 rx_avg = 0 enqueuer got ahead 0 times. Interrupt count = 89744
  Took 1997805 us to process 367468676 bytes, and output 1120840003 bytes. Cycles elapsed 50008070. Counter frequency is 25031500
  Throughput is 4488 Mbps
```

You can check that the output is correct.

```
  sha1sum input-file uncomp_out_*
  284b3e13cc79662bc056af7e673fbdbebc682f0e  input-file
  284b3e13cc79662bc056af7e673fbdbebc682f0e  uncomp_out_0
  284b3e13cc79662bc056af7e673fbdbebc682f0e  uncomp_out_1
  284b3e13cc79662bc056af7e673fbdbebc682f0e  uncomp_out_2
```

About the DCE API and libraries
-------------------------------

The DCE API provides a set of user-space APIs that simplify access to
DCE. The driver provides accessor objects called sessions which
maintain state in coordination with the hardware. The responses from
DCE can then be polled by the user. This DCE API is meant to be
entirely sufficient for a user to use DCE without having to read the
related HW documents. It is documented in file dce/dce.h.

Known issues
------------

Multiple DPDCEIs can be attached to a single DPIO. This use case is
supported but no scheduling fairness can be guaranteed in this
release. This use mode may result in the starvation of some DPDCEIs.

dce's People

Contributors

ahmedmpr avatar hemantagr avatar ting-liu avatar luozhenhua avatar

Stargazers

kenli avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.