KDR

Data races are increasingly recognized as concurrency bugs, and they are difficult to reproduce and diagnose in parallel programs. The Linux kernel is a concurrency-intensive, large-scale software system that contains millions of lines of code contributed by more than ten thousand programmers. Its high thread-level parallelism and non-deterministic thread interleaving make it particularly prone to race conditions. We conducted a thorough investigation of data races reported in Linux kernels over the past 5 years. The investigation was performed by reviewing bug reports in the Kernel Bug Tracker and studying kernel source code revision logs from the Linux Kernel Organization. The results show that about 500 kernel data races were reported and fixed in those 5 years, and that the file system and drivers have a much higher percentage of race conditions than other modules. The distribution of races over years, modules and kernel versions is also reported according to our statistical results. We conducted a case-by-case study of selected data races and summed up 4 data race patterns. Our analysis results should be of interest to researchers and engineers who work on kernel data race detection and kernel development.



If this is useful for you, please cite our paper: Shi, Jianjun & Ji, Weixing & Wang, Yizhuo & Huang, Lifu & Guo, Yunkun & Shi, Feng. (2018). Linux Kernel Data Races in Recent 5 Years. Chinese Journal of Electronics. 27. 556-560. 10.1049/cje.2018.03.015.

References:

[1] Kernel.org git repositories
[2] The Linux Kernel Archives
| No. | Source | Module | Version | Commit id | Status |
|---|---|---|---|---|---|
| 1 | ChangeLog | IO | 3.10.8 | d50235b7bc3ee0a0427984d763ea7534149531b4 | Resolved |
| 2 | ChangeLog | Driver | 3.0.38 | 8265981bb439f3ecc5356fb877a6c2a6636ac88a | Resolved |
| 3 | ChangeLog | Driver | 3.0.59 | ce73ec6db47af84d1466402781ae0872a9e7873c | Resolved |
| 4 | ChangeLog | KVM | 3.10.60 | 2febc839133280d5a5e8e1179c94ea674489dae2 | Resolved |
| 5 | ChangeLog | KVM | 3.3.6 | 6dbf79e7164e9a86c1e466062c48498142ae6128 | Resolved |
| 6 | ChangeLog | IO | 3.10.23 | eb1c160b22655fd4ec44be732d6594fd1b1e44f4 | Resolved |
| 7 | ChangeLog | Driver | 3.1 | 9c921c22a7f33397a6774d7fa076db9b6a0fd669 | Resolved |
| 8 | ChangeLog | Driver | 3.1 | 7456caae37396fc1bc6f8e9461d07664b8c2f280 | Resolved |
| 9 | ChangeLog | Driver | 3.0.11 | 5dc2470c602da8851907ec18942cd876c3b4ecc1 | Resolved |
| 10 | ChangeLog | Driver | 3.0.17 | eea915bb0d1358755f151eaefb8208a2d5f3e10c | Resolved |
| 11 | ChangeLog | Driver | 3.0.41 | 44e4360fa3384850d65dd36fb4e6e5f2f112709b | Resolved |
| 12 | ChangeLog | Driver | 3.11.2 | 1eeeef153c02f5856ec109fa532eb5f31c39f85c | Resolved |
| 13 | ChangeLog | Driver | 3.10.34 | ef0899410ff630b2e75306da49996dbbfa318165 | Resolved |
| 14 | ChangeLog | Driver | 3.10.23 | b869ccfab1e324507fa3596e3e1308444fb68227 | Resolved |
| 15 | ChangeLog | Driver | 3.10.42 | 21f8aaee0c62708654988ce092838aa7df4d25d8 | Resolved |
| 16 | ChangeLog | Driver | 3.10.46 | d9e93c08d8d985e5ef89436ebc9f4aad7e31559f | Resolved |
| 17 | ChangeLog | File System | 3.10.39 | ec4cb1aa2b7bae18dd8164f2e9c7c51abcf61280 | Resolved |
| 18 | ChangeLog | File System | 3.2.45 | 794446c6946513c684d448205fbd76fa35f38b72 | Resolved |
| 19 | ChangeLog | File System | 3.10.66 | 06bed7d18c2c07b3e3eeadf4bd357f6e806618cc | Resolved |
| 20 | ChangeLog | File System | 3.0.88 | 1c327d962fc420aea046c16215a552710bde8231 | Resolved |
| 21 | ChangeLog | Process Management | 3.0.68 | 71b5707e119653039e6e95213f00479668c79b75 | Resolved |
| 22 | ChangeLog | Process Management | 3.12.14 | 532de3fc72adc2a6525c4d53c07bf81e1732083d | Resolved |
| 23 | ChangeLog | Process Management | 3.12.37 | c291ee622165cb2c8d4e7af63fffd499354a23be | Resolved |
| 24 | ChangeLog | Process Management | 3.14.41 | b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed | Resolved |
| 25 | ChangeLog | Memory Management | 3.12.36 | 91b57191cfd152c02ded0745250167d0263084f8 | Resolved |
| 26 | ChangeLog | File System | 3.10.33 | 1362f4ea20fa63688ba6026e586d9746ff13a846 | Resolved |
| 27 | ChangeLog | Driver | 3.7.2 | bd9eb7fbe69111ea0ff1f999ef4a5f26d223d1d5 | Resolved |
| 28 | ChangeLog | Process Management | 3.10.21 | a399b29dfbaaaf91162b2dc5a5875dd51bbfa2a1 | Resolved |
| 29 | ChangeLog | File System | 3.10.62 | c6c15e1ed303ffc47e696ea1c9a9df1761c1f603 | Resolved |
| 30 | ChangeLog | Others | 3.17.3 | 30a6b8031fe14031ab27c1fa3483cb9780e7f63c | Resolved |

#1

commit id d50235b7bc3ee0a0427984d763ea7534149531b4 kernel version 3.10.8
module IO date 2013/7/3
pattern use before initialization
description There's a race between elevator switching and normal I/O operation. Because struct elevator_queue and struct elevator_data are not allocated in one atomic operation, there is a chance of using a NULL ->elevator_data.
reproduce The following steps easily reproduce this bug: 1: dd if=/dev/sdb of=/dev/null 2: while true; do echo noop > scheduler; echo deadline > scheduler; done
interleaving

#2

commit id 8265981bb439f3ecc5356fb877a6c2a6636ac88a kernel version 3.0.38
module Driver date 2012/7/13
pattern access with improper synchronization
description Checking for adc->ts_pend already claimed should be done with the lock held.
reproduce
interleaving

#3

commit id ce73ec6db47af84d1466402781ae0872a9e7873c kernel version 3.0.59
module Driver date 2013/1/3
pattern access with improper synchronization
description The locking in update_vsyscall_tz() is not only unnecessary, because the vdso code copies the data unprotected in __kernel_gettimeofday(), but also introduces a hard-to-reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which causes a user space process to loop forever in vdso code.
reproduce
interleaving

#4

commit id 2febc839133280d5a5e8e1179c94ea674489dae2 kernel version 3.10.60
module KVM date 2014/10/24
pattern access without synchronization
description There's a race condition in the PIT emulation code in KVM. In __kvm_migrate_pit_timer the pit_timer object is accessed without synchronization. If the race condition occurs at the wrong time this can crash the host kernel.
reproduce
interleaving

#5

commit id 6dbf79e7164e9a86c1e466062c48498142ae6128 kernel version 3.3.6
module KVM date 2012/3/8
pattern access with improper synchronization
description While pages are being protected for dirty logging, other threads may also try to protect a page in mmu_sync_children() or kvm_mmu_get_page().
reproduce
interleaving

#6

commit id eb1c160b22655fd4ec44be732d6594fd1b1e44f4 kernel version 3.10.23
module IO date 2013/11/8
pattern access without synchronization
description A soft lockup happens at boot time on a system using dm multipath and udev rules to switch the scheduler.
reproduce
interleaving

#7

commit id 9c921c22a7f33397a6774d7fa076db9b6a0fd669 kernel version 3.1
module Driver date 2011/7/14
pattern access without synchronization
description Use battery->lock in sysfs_remove_battery() to make checking, removing, and clearing bat.dev atomic. This is necessary because sysfs_remove_battery() may be invoked concurrently from different paths.
reproduce
interleaving

#8

commit id 7456caae37396fc1bc6f8e9461d07664b8c2f280 kernel version 3.1
module Driver date 2011/7/20
pattern access without synchronization
description When a request is made, the card presence is checked and the request is queued. These two parts must be atomic with respect to card removal, or a card removal could be handled in between, and the new request wouldn't get cancelled until another card was inserted.
reproduce
interleaving

#9

commit id 5dc2470c602da8851907ec18942cd876c3b4ecc1 kernel version 3.0.11
module Driver date 2011/11/14
pattern access without synchronization
description There's a race between the USB disconnect handler and the TTY close handler which may cause the acm object to be freed while it's still being used.
reproduce
interleaving

#10

commit id eea915bb0d1358755f151eaefb8208a2d5f3e10c kernel version 3.0.17
module Driver date 2012/1/5
pattern access without synchronization
description It's caused by the fact that firmware_loading_store has a case 0 in its switch statement that reads and writes the fw_priv->fw pointer without the protection of the fw_lock mutex.
reproduce
interleaving

#11

commit id 44e4360fa3384850d65dd36fb4e6e5f2f112709b kernel version 3.0.41
module Driver date 2012/4/12
pattern access without synchronization
description /proc/sys/kernel/random/boot_id can be read concurrently by userspace processes. If two (or more) user-space processes concurrently read boot_id when sysctl_bootid is not yet assigned, a race can occur making boot_id differ between the reads.
reproduce
interleaving
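
The lazy-initialization race described in #11 can be illustrated in user space. The following is a minimal sketch, assuming a pthread mutex in place of the kernel's locking; `read_boot_id`, `cached_id` and `id_ready` are hypothetical names and not the kernel's code. The point is that the "not yet assigned" check and the assignment happen under one lock, so every reader sees the same value.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ID_LEN 17  /* 16 hex digits plus the terminating NUL */

/* Hypothetical user-space stand-ins for the lazily generated boot_id. */
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static char cached_id[ID_LEN];
static int id_ready;

/* Copy the id into out. The "is it assigned yet?" check and the
 * generation run under the same lock, so two concurrent first
 * readers cannot each generate a different id. */
void read_boot_id(char out[ID_LEN])
{
    pthread_mutex_lock(&id_lock);
    if (!id_ready) {
        unsigned hi = (unsigned)rand();
        unsigned lo = (unsigned)rand();
        snprintf(cached_id, sizeof cached_id, "%08x%08x", hi, lo);
        id_ready = 1;
    }
    memcpy(out, cached_id, ID_LEN);
    pthread_mutex_unlock(&id_lock);
}
```

Without the lock, two threads could both observe `id_ready == 0` and generate different ids, which is the symptom the description reports.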

#12

commit id 1eeeef153c02f5856ec109fa532eb5f31c39f85c kernel version 3.11.2
module Driver date 2013/8/30
pattern use after free
description A race condition triggers here, causing an oops later when walking pending_list after the firmware has been released.
reproduce
interleaving

#13

commit id ef0899410ff630b2e75306da49996dbbfa318165 kernel version 3.10.34
module Driver date 2013/11/8
pattern access without synchronization
description The patch did not convert the s390 dasd device driver which is the only device driver which also calls elevator_init(). So add the missing locking.
reproduce
interleaving

#14

commit id b869ccfab1e324507fa3596e3e1308444fb68227 kernel version 3.10.23
module Driver date 2013/11/14
pattern access without synchronization
description This patch fixes two race conditions between bond_store_updelay/downdelay and bond_store_miimon which could lead to division by zero: miimon can be set to 0 while updelay/downdelay is being set, so the zero check at the beginning is missed. The division by zero happens because updelay/downdelay are stored as new_value / bond->params.miimon. Use rtnl to synchronize with the miimon setting.
reproduce
interleaving
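
The check-then-divide race in #14 can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel patch: a pthread mutex plays the role of rtnl, and `struct params`, `store_miimon` and `store_updelay` are hypothetical names. The zero check and the division are performed under the same lock that the writer of `miimon` takes.

```c
#include <pthread.h>

/* Hypothetical stand-in for the bonding parameters. */
struct params {
    pthread_mutex_t lock;  /* plays the role of rtnl in the fix */
    int miimon;            /* another thread may set this to 0 */
    int updelay;
};

/* Writer: changes miimon under the lock. */
void store_miimon(struct params *p, int value)
{
    pthread_mutex_lock(&p->lock);
    p->miimon = value;
    pthread_mutex_unlock(&p->lock);
}

/* The zero check and the division share one critical section, so
 * miimon cannot become 0 between them. Returns 0 on success and
 * -1 if miimon is 0. */
int store_updelay(struct params *p, int new_value)
{
    int ret = 0;

    pthread_mutex_lock(&p->lock);
    if (p->miimon == 0)
        ret = -1;
    else
        p->updelay = new_value / p->miimon;
    pthread_mutex_unlock(&p->lock);
    return ret;
}
```

If the check were done before taking the lock, a concurrent `store_miimon(p, 0)` could slip in between the check and the division.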

#15

commit id 21f8aaee0c62708654988ce092838aa7df4d25d8 kernel version 3.10.42
module Driver date 2014/2/20
pattern access without synchronization
description We check tid->sched without a lock taken in ath_tx_aggr_sleep(). That is a race condition which can result in doing list_del(&tid->list) twice (the second time with a poisoned list node) and cause a crash.
reproduce
interleaving

#16

commit id d9e93c08d8d985e5ef89436ebc9f4aad7e31559f kernel version 3.10.46
module Driver date 2014/5/27
pattern access with improper synchronization
description We found a race between write and resume. usb_wwan_resume runs play_delayed() and spin_unlock, but intfdata->suspended is still not set to zero. At this time usb_wwan_write is called and anchors the urb to the delay list. Then resume keeps running, but the delayed urb has no chance to be committed until the next resume. If the next resume is far away, the tty will be blocked in tty_wait_until_sent during that time. The race can also lead to writes being reordered.
reproduce
interleaving

#17

commit id ec4cb1aa2b7bae18dd8164f2e9c7c51abcf61280 kernel version 3.10.39
module File System date 2014/4/7
pattern access with improper synchronization
description When heavily exercising xattr code the assertion that jbd2_journal_dirty_metadata() shouldn't return error was triggered.
reproduce
interleaving

#18

commit id 794446c6946513c684d448205fbd76fa35f38b72 kernel version 3.2.45
module File System date 2013/4/4
pattern use after free
description In order to demonstrate this issue one should mount ext4 with the mount -o discard option on an SSD disk. This makes the callback longer and the race window wider. To fix this, the transaction should be marked as finished only after callbacks have completed.
reproduce
interleaving

#19

commit id 06bed7d18c2c07b3e3eeadf4bd357f6e806618cc kernel version 3.10.66
module File System date 2014/4/7
pattern use before initialization
description This commit fixes a race whereby nlmclnt_init() first starts the lockd daemon, and then calls nlm_bind_host() with the expectation that nlmsvc_timeout has already been initialised. Unfortunately, there is no synchronisation between lockd() and lockd_up() to guarantee that this is the case.
reproduce
interleaving

#20

commit id 1c327d962fc420aea046c16215a552710bde8231 kernel version 3.0.88
module File System date 2013/7/11
pattern access without synchronization
description In nlmsvc_retry_blocked, the check that the list is non-empty and the fetch of the pointer to the first entry are not protected by any lock. This allows a rare race condition when there is only one entry on the list. A function such as nlmsvc_grant_callback() can be called, which will temporarily remove the entry from the list. Between the list_empty() and list_entry() calls, the list may become empty, causing an invalid pointer to be used as an nlm_block, leading to a possible crash.
reproduce
interleaving
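
The check-then-act race in #20 can be sketched in user space as follows. This is a simplified illustration, assuming a singly linked list and a pthread mutex; `struct block`, `push_block` and `pop_block` are hypothetical names rather than the lockd code. The emptiness test and the fetch of the first entry sit in one critical section, so the list cannot become empty between them.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list node; stands in for the kernel's nlm_block. */
struct block {
    struct block *next;
    int id;
};

struct block_list {
    pthread_mutex_t lock;
    struct block *head;
};

/* Insert at the head of the list under the lock. */
void push_block(struct block_list *l, struct block *b)
{
    pthread_mutex_lock(&l->lock);
    b->next = l->head;
    l->head = b;
    pthread_mutex_unlock(&l->lock);
}

/* Test for emptiness and detach the first entry in one critical
 * section. Returns NULL when the list is empty. */
struct block *pop_block(struct block_list *l)
{
    struct block *b;

    pthread_mutex_lock(&l->lock);
    b = l->head;
    if (b != NULL) {
        l->head = b->next;
        b->next = NULL;
    }
    pthread_mutex_unlock(&l->lock);
    return b;
}
```

Detaching the entry while the lock is held also means the caller owns the returned block, so a concurrent remover cannot invalidate the pointer afterwards.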

#21

commit id 71b5707e119653039e6e95213f00479668c79b75 kernel version 3.0.68
module Process Management date 2013/2/18
pattern use after free
description In cgroup_exit() put_css_set_taskexit() is called without any lock, which might lead to accessing a freed cgroup.
reproduce
interleaving

#22

commit id 532de3fc72adc2a6525c4d53c07bf81e1732083d kernel version 3.12.14
module Process Management date 2014/2/18
pattern access without synchronization
description Currently, there's nothing preventing cgroup_enable_task_cg_lists() from missing a PF_EXITING flag that has already been set and racing against cgroup_exit(). Depending on the timing, cgroup_exit() may finish with the task still linked on css_set, leading to list corruption. Fix it by grabbing siglock in cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be visible.
reproduce
interleaving

#23

commit id c291ee622165cb2c8d4e7af63fffd499354a23be kernel version 3.12.37
module Process Management date 2014/12/13
pattern use before initialization
description Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors there exists a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor.
reproduce
interleaving

#24

commit id b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed kernel version 3.14.41
module Process Management date 2015/4/17
pattern use after free
description ptrace_resume() is called when the tracee is still __TASK_TRACED. We set tracee->exit_code and then wake_up_state() changes tracee->state. If the tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T) wrongly looks like another report from tracee.
reproduce
interleaving

#25

commit id 91b57191cfd152c02ded0745250167d0263084f8 kernel version 3.12.36
module Memory Management date 2014/12/3
pattern access with improper synchronization
description In some android devices, there will be a "divide by zero" exception. vmpr->scanned could be zero before spin_lock(&vmpr->sr_lock).
reproduce
interleaving

#26

commit id 1362f4ea20fa63688ba6026e586d9746ff13a846 kernel version 3.10.33
module File System date 2014/2/20
pattern use after free
description Currently the last dqput() can race with dquot_scan_active(), causing it to call the callback for an already deactivated dquot.
reproduce
interleaving

#27

commit id bd9eb7fbe69111ea0ff1f999ef4a5f26d223d1d5 kernel version 3.7.2
module Driver date 2012/11/14
pattern use after free
description There is a race when two request_firmware() calls use the same firmware name.
reproduce
interleaving

#28

commit id a399b29dfbaaaf91162b2dc5a5875dd51bbfa2a1 kernel version 3.10.21
module Process Management date 2013/11/22
pattern use after free
description When IPC_RMID races with other shm operations there's potential for use-after-free of the shm object's associated file (shm_file).
reproduce
interleaving

#29

commit id c6c15e1ed303ffc47e696ea1c9a9df1761c1f603 kernel version 3.10.62
module File System date 2014/11/19
pattern access without synchronization
description The current code for nfsd41_cb_get_slot() and nfsd4_cb_done() has no locking to guarantee atomicity, and so allows for races.
reproduce
interleaving

#30

commit id 30a6b8031fe14031ab27c1fa3483cb9780e7f63c kernel version 3.17.3
module Others date 2014/10/26
pattern access with improper synchronization
description free_pi_state and exit_pi_state_list both clean up futex_pi_state's. exit_pi_state_list takes the hb lock first, and most callers of free_pi_state do too. requeue_pi doesn't, which means free_pi_state can free the pi_state out from under exit_pi_state_list.
reproduce
interleaving
