Current implementation uses kernel_recvmsg() for rece

Avoid kernel_recvmsg() (af_ktls, closed)

ktls commented on June 1, 2024

Avoid kernel_recvmsg()

from af_ktls.

Comments (3)

fridex commented on June 1, 2024

I think the best approach would be to:

- extend tcp_read_sock() with a MSG_PEEK flag
- introduce udp_read_sock() with MSG_PEEK support for UDP

Using skbuffs directly is not nice: there should be appropriate operations on UDP/TCP sockets that encapsulate this logic, so that they can be reused in other parts of the kernel.

fridex commented on June 1, 2024

When I run the "splice echo time" scenario for 2 seconds (a simple ping-pong with the server [1]):

splice(ksd, NULL, pipe, NULL, 1400, 0);
splice(pipe, NULL, ksd, NULL, 1400, 0);
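
For readers who want to reproduce the shape of this loop without the af_ktls module, here is a minimal userspace sketch of one echo round. It is not the code from [1]: the helper names `splice_echo_once` and `splice_echo_demo` are hypothetical, and an AF_UNIX socketpair stands in for the TLS socket `ksd`, on the assumption that a Linux kernel with splice support for stream sockets is available.

```c
#define _GNU_SOURCE          /* splice() is a Linux-specific extension */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from sock into a temporary
 * pipe, then from the pipe back into sock, without copying the payload
 * through a userspace buffer. Returns the bytes echoed, or -1 on error.
 */
ssize_t splice_echo_once(int sock, size_t len)
{
    int p[2];
    ssize_t in, out, done = 0;

    assert(sock >= 0);
    if (pipe(p) < 0)
        return -1;

    /* socket -> pipe (write end) */
    in = splice(sock, NULL, p[1], NULL, len, 0);
    if (in < 0)
        done = -1;

    /* pipe (read end) -> socket; loop because splice may move less */
    while (in > 0 && done < in) {
        out = splice(p[0], NULL, sock, NULL, (size_t)(in - done), 0);
        if (out <= 0) {
            done = -1;
            break;
        }
        done += out;
    }

    close(p[0]);
    close(p[1]);
    return done;
}

/*
 * Hypothetical self-check: sv[1] plays the echo server, sv[0] the client.
 * Returns 0 if the client reads back exactly what it sent.
 */
int splice_echo_demo(void)
{
    int sv[2];
    char buf[8] = {0};
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    ok = write(sv[0], "ping", 4) == 4 &&
         splice_echo_once(sv[1], 1400) == 4 &&
         read(sv[0], buf, sizeof(buf)) == 4 &&
         memcmp(buf, "ping", 4) == 0;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Calling `splice_echo_demo()` performs a single round trip. The benchmark described above presumably repeats the two splice calls in a loop for the 2-second window, with 1400 as the per-call length.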

With MTU 1400:

I am getting the following results:

44.24% of total time spent in kernel_sendmsg()
- 38.28% of total time spent in tcp_push() (the actual sending)
- 1.15% of total time spent allocating socket buffers (sk_stream_alloc_skb)
- approx. 2% on copying from the kernel vector (copy_from_iter, memcpy_erms)
33.14% of total time spent in tls_splice_read
- 13.14% of total time spent in kernel_recvmsg()
- approx. 2% on copy and allocation (skb_copy_datagram_iter, copy_page_to_iter)

With MTU 16000:

I am getting the following results:

22.29% of total time spent in kernel_sendmsg()
- 16.30% of total time spent in tcp_push() (the actual sending)
- 0.69% of total time spent allocating socket buffers (sk_stream_alloc_skb)
- 3.03% on copying from the kernel vector (copy_from_iter, memcpy_erms)
42.25% of total time spent in tls_splice_read
- 9.02% of total time spent in kernel_recvmsg()
- 4.02% on copy and allocation (skb_copy_datagram_iter, copy_page_to_iter)

Ideally we could save:

for 1400 MTU:
- approx. 2% by avoiding kernel_recvmsg()
- approx. 3.15% by avoiding kernel_sendmsg()
for 16000 MTU:
- 3.72% by avoiding kernel_sendmsg()
- 4.02% by avoiding kernel_recvmsg()
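
These estimates appear to be the sums of the allocation and copy shares from the profiles above (for example 1.15% + 2% = 3.15% for kernel_sendmsg() at MTU 1400). A small sketch of that arithmetic, where the helper name `avoidable_share` is hypothetical and the inputs are the percentages quoted in this comment:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: add up the perf shares (in percent of total time)
 * that would disappear if the copy and allocation work were avoided.
 */
double avoidable_share(const double *shares, size_t n)
{
    double total = 0.0;

    assert(shares != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += shares[i];
    return total;
}
```

Passing `{1.15, 2.0}` gives the 3.15% estimate for kernel_sendmsg() at MTU 1400, and `{0.69, 3.03}` gives the 3.72% estimate at MTU 16000. The kernel_recvmsg() estimates are the single "copy and allocation" line from each profile.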

We have to consider the additional logic within kernel_sendmsg() and kernel_recvmsg() (locking, ...). kernel_sendpage() and tcp_read_sock() (or udp_read_sock()) may use different logic, which could have a positive or a negative impact as well.

perf reports that context switches are not expensive at all (0.30% of total).

[1] https://github.com/fridex/af_ktls-tool/blob/master/action.c#L795

djwatson commented on June 1, 2024

fixed by #62
