Avoid kernel_recvmsg() (af_ktls issue, closed)

ktls commented on June 1, 2024
Avoid kernel_recvmsg()

from af_ktls.

Comments (3)

fridex commented on June 1, 2024

I think the best approach would be to:

  • extend tcp_read_sock() with support for the MSG_PEEK flag
  • introduce udp_read_sock() with MSG_PEEK support for UDP

Working directly with skbuffs is not ideal: there should be dedicated operations on UDP/TCP sockets that encapsulate this logic, so that they can also be reused in other parts of the kernel.
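For readers less familiar with MSG_PEEK, here is a minimal userspace sketch of its semantics using plain recv(). It only shows what "peek" means (data is copied but stays queued); it is not the in-kernel tcp_read_sock() change proposed above. The helper names peek_then_read() and demo_peek() are hypothetical, and an AF_UNIX socketpair is assumed as a stand-in for a TCP connection.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Userspace illustration only: recv() with MSG_PEEK copies queued data
 * without dequeuing it, so a later recv() sees the same bytes again.
 * peek_then_read() is a hypothetical helper, not part of af_ktls.
 * Returns 0 when the peeked and consumed bytes match, -1 otherwise.
 */
int peek_then_read(int fd, char *peeked, char *consumed, size_t len)
{
    ssize_t p = recv(fd, peeked, len, MSG_PEEK);  /* data stays queued */
    if (p <= 0)
        return -1;

    ssize_t c = recv(fd, consumed, (size_t)p, 0); /* data is dequeued */
    if (c != p)
        return -1;

    return memcmp(peeked, consumed, (size_t)p) == 0 ? 0 : -1;
}

/*
 * Queues msg on one end of an AF_UNIX socketpair (assumed stand-in for
 * a TCP connection) and runs the peek/consume check on the other end.
 */
int demo_peek(const char *msg)
{
    int sv[2];
    char a[64], b[64];
    size_t len = strlen(msg);
    int rc = -1;

    if (len == 0 || len > sizeof(a))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], msg, len, 0) == (ssize_t)len)
        rc = peek_then_read(sv[1], a, b, len);

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling demo_peek("record") should return 0 on Linux: the peeked bytes and the bytes consumed afterwards are identical, which is the behaviour a MSG_PEEK-aware read operation would need to preserve.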

fridex commented on June 1, 2024

I ran the "splice echo time" scenario for 2 seconds, which is a simple ping-pong with the server [1]:

splice(ksd, NULL, pipe, NULL, 1400, 0);
splice(pipe, NULL, ksd, NULL, 1400, 0);
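A self-contained sketch of one such echo round, for anyone who wants to try the pattern without the module loaded. It assumes Linux and uses an AF_UNIX stream socketpair in place of the kTLS socket (ksd); splice_echo_once() and demo_echo() are hypothetical names, not code from af_ktls-tool.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One echo round: move up to len bytes from sock into the pipe, then
 * from the pipe back to sock, without a userspace data buffer.
 * Returns the number of bytes echoed, 0 on EOF, or -1 on error.
 */
ssize_t splice_echo_once(int sock, int pipefd[2], size_t len)
{
    ssize_t in = splice(sock, NULL, pipefd[1], NULL, len, 0);
    if (in <= 0)
        return in;

    ssize_t left = in;
    while (left > 0) { /* drain the pipe fully, splice may be partial */
        ssize_t out = splice(pipefd[0], NULL, sock, NULL, (size_t)left, 0);
        if (out <= 0)
            return -1;
        left -= out;
    }
    return in;
}

/*
 * Sends msg from the "client" end, echoes it once on the "server" end
 * via splice, and reads the reply back into reply (at least cap bytes).
 * The AF_UNIX socketpair is an assumed stand-in for the kTLS socket.
 * Returns 0 on success, -1 on failure.
 */
int demo_echo(const char *msg, char *reply, size_t cap)
{
    int sv[2], p[2];
    size_t len = strlen(msg);
    int rc = -1;

    if (len == 0 || len > cap)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (send(sv[0], msg, len, 0) == (ssize_t)len &&
        splice_echo_once(sv[1], p, 1400) == (ssize_t)len &&
        recv(sv[0], reply, len, MSG_WAITALL) == (ssize_t)len)
        rc = 0;

    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Note that a pipe has separate read and write ends, so the sketch passes pipefd[1] when splicing into the pipe and pipefd[0] when splicing out of it; the two-line snippet above abbreviates both as "pipe".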

With MTU 1400, I get the following results:

  • 44.24% of total time spent in kernel_sendmsg()
    • 38.28% of total time spent in tcp_push (the actual sending)
    • 1.15% of total time spent allocating socket buffers (skb_stream_alloc_skb)
    • approx. 2% on copying from the kernel vector (copy_from_iter, memcpy_erms)
  • 33.14% of total time spent in tls_splice_read
    • 13.14% of total time spent in kernel_recvmsg
    • approx. 2% on copy and allocation (skb_copy_datagram_iter, copy_page_to_iter)

With MTU 16000, I get the following results:

  • 22.29% of total time spent in kernel_sendmsg()
    • 16.30% of total time spent in tcp_push (the actual sending)
    • 0.69% of total time spent allocating socket buffers (skb_stream_alloc_skb)
    • 3.03% on copying from the kernel vector (copy_from_iter, memcpy_erms)
  • 42.25% of total time spent in tls_splice_read
    • 9.02% of total time spent in kernel_recvmsg
    • 4.02% on copy and allocation (skb_copy_datagram_iter, copy_page_to_iter)

Ideally we could save:

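  • for 1400 MTU:
    • approx. 3.15% by avoiding kernel_sendmsg() (1.15% allocation + approx. 2% copy)
    • approx. 2% by avoiding kernel_recvmsg() (copy and allocation)
  • for 16000 MTU:
    • 3.72% by avoiding kernel_sendmsg() (0.69% allocation + 3.03% copy)
    • 4.02% by avoiding kernel_recvmsg() (copy and allocation)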

We also have to consider the additional logic within kernel_sendmsg() and kernel_recvmsg() (locking, ...). kernel_sendpage() and tcp_read_sock() (or udp_read_sock()) may use different logic, which could have a positive or negative impact as well.

perf reports that context switches are not expensive at all (0.30% of total).

related: https://github.com/fridex/af_ktls/issues/22

[1] https://github.com/fridex/af_ktls-tool/blob/master/action.c#L795

djwatson commented on June 1, 2024

fixed by #62
