
ucsan's Introduction

ucsan

The User Concurrency Sanitizer (UCSAN) is a watchpoint-based data race detector. It uses the ThreadSanitizer instrumentation interface in Clang (v3.2+) and GCC (v4.8+) to determine whether an access to a variable is involved in a data race.

Currently, it checks only accesses to non-volatile variables. In other words, any variable that is accessed through thread-safe operations can be declared volatile to exclude it from data race checking.

Getting started

Build

Build this project with the following commands:

$ make                  # Generate static library: libucsan.a
$ make clean            # Delete generated files

Makefile Parameters

  • nr_cpu : number of CPUs.
  • nr_wp : number of watchpoint slots.
  • CC : compiler, gcc or clang.

How to use?

Compile with -fsanitize=thread, then link the static library libucsan.a into your project:

$ gcc -c main.c -fsanitize=thread
$ gcc -o main main.o -L/path/to/libucsan/ -lucsan -rdynamic -pthread

Alternatively, move libucsan.a into your project directory and link it directly:

$ gcc -c main.c -fsanitize=thread
$ gcc -o main main.o libucsan.a -rdynamic -pthread
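
A small program to try the detector on might look like the sketch below. It is not part of UCSAN; the names counter, worker, and run_racy_workers are made up for illustration. Two threads increment a plain long without any synchronization, which is the kind of non-volatile access that the ThreadSanitizer instrumentation exposes to the library.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared, non-volatile, unsynchronized: accesses to it are instrumented. */
static long counter;

static void *worker(void *arg)
{
	long iterations = *(long *)arg;

	for (long i = 0; i < iterations; i++)
		counter++; /* intentional data race, for demonstration only */
	return NULL;
}

/*
 * Runs two racing threads and returns the final count, which may be
 * lower than 2 * iterations because updates can be lost.
 * Returns -1 if a thread could not be created.
 */
long run_racy_workers(long iterations)
{
	pthread_t t1, t2;

	counter = 0;
	if (pthread_create(&t1, NULL, worker, &iterations) != 0)
		return -1;
	if (pthread_create(&t2, NULL, worker, &iterations) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return counter;
}
```

Call run_racy_workers() from your own main() in main.c, then build and link it with the commands above.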

ucsan's People

Contributors

lind026


ucsan's Issues

Define the structure of task and stack

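Below is what 子婷 and I worked out in our discussion. We need to confirm whether we have misunderstood anything, and we list the questions we still want to clarify.

task_struct (only accessed when a data race is encountered):

  1. int pid: the ID of the process in which the data race was detected
  2. int cpuid: the number of the CPU that is running the thread
  3. bool read_write_state: the current read/write state of the variable
  4. void* address: the memory address of the shared variable (the same shared variable always has the same address)
  5. size_t size: the size of the variable (the same variable always has the same size)

detect calls unify's set_task_info(task_struct t) to pass the task information for the variable involved in a data race to unify. If two tasks have the same address and size, they refer to the same shared variable.
unify can then look in task_container to check whether data for that shared variable has already been recorded. task_container is a two-dimensional array. For example: a data race on v1 is detected in process b, and task_container already holds other tasks that accessed v1, so the newly detected task is added to the array that records the tasks for v1.

That is our understanding of task.
So either detect is responsible for creating the task, or detect passes the five values a task needs and unify creates the task.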
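
The grouping rule described above (same address and size means same shared variable) could be sketched in C as follows. This is only an illustration under assumptions: the fixed limits MAX_VARS and MAX_TASKS, the layout of struct task_container, the helper same_variable(), and the extra container parameter on set_task_info() are hypothetical and not taken from the project.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS   8  /* hypothetical: distinct shared variables tracked */
#define MAX_TASKS 16  /* hypothetical: tasks recorded per variable */

struct task_struct {
	int pid;
	int cpuid;
	bool read_write_state;
	void *address;
	size_t size;
};

/* One row per shared variable; each row holds the tasks that raced on it. */
struct task_container {
	struct task_struct rows[MAX_VARS][MAX_TASKS];
	size_t count[MAX_VARS]; /* tasks stored in each row */
	size_t nr_rows;         /* rows in use */
};

/* Two tasks refer to the same variable when address and size both match. */
static bool same_variable(const struct task_struct *a,
			  const struct task_struct *b)
{
	return a->address == b->address && a->size == b->size;
}

/* Returns the row the task was stored in, or -1 if there is no room. */
int set_task_info(struct task_container *c, struct task_struct t)
{
	size_t row;

	for (row = 0; row < c->nr_rows; row++)
		if (same_variable(&c->rows[row][0], &t))
			break;

	if (row == c->nr_rows) { /* first report for this variable */
		if (c->nr_rows == MAX_VARS)
			return -1;
		c->nr_rows++;
		c->count[row] = 0;
	}
	if (c->count[row] == MAX_TASKS)
		return -1;
	c->rows[row][c->count[row]++] = t;
	return (int)row;
}
```

With this layout, two reports for v1 land in the same row, and a report for a different variable opens a new row.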

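Questions to resolve:

  1. Exactly what information from the stack is needed? We thought of storing the order of function calls and the current values of local variables.
  2. Does one shared variable need only one stack_info, or will there be several stack_info records with different contents, depending on the task?
  3. Does the content of the stack keep changing as the program runs, or is the full order of executed functions recorded? For example: when checking v1, its value is computed step by step from func A to B to C, and the data race is only detected in func C. Is the earlier computation history (passing through func A and B) then not recorded? Or, once a function has returned, can a later data race no longer capture the correct stack, because the stack information has changed?
  4. Should detect wait until it has finished checking all of the code and then call unify to write the file (void Unify::write_file()), writing the contents of task_container and stack_info to a txt file, and return to detect once the file is written? In other words, detect decides when unify writes the file.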
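
On the stack questions, one possible approach (an assumption, not something the project has decided here) is to snapshot the call chain at the moment a race is detected, using glibc's backtrace() from <execinfo.h>. The struct stack_info layout, the MAX_FRAMES limit, and the collect_stack_info() signature below are hypothetical.

```c
#include <execinfo.h>
#include <stddef.h>

#define MAX_FRAMES 32 /* hypothetical limit on recorded call depth */

struct stack_info {
	void *frames[MAX_FRAMES]; /* return addresses, innermost first */
	int depth;                /* number of valid entries in frames */
};

/* Snapshot the caller's call chain at the moment of detection. */
int collect_stack_info(struct stack_info *info)
{
	info->depth = backtrace(info->frames, MAX_FRAMES);
	return info->depth;
}
```

A snapshot like this contains only the functions that are still active when it is taken; functions that have already returned no longer appear. Because each task copies its own frames, a later change to the live stack does not affect an earlier snapshot. glibc's backtrace_symbols() can turn the addresses into printable strings, and linking with -rdynamic makes function names available to it.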

detect: Job list

Detect Function Example

Data-race detection in the Linux kernel

/*
 * KCSAN uses the same instrumentation that is emitted by supported compilers
 * for ThreadSanitizer (TSAN).
 *
 * When enabled, the compiler emits instrumentation calls (the functions
 * prefixed with "__tsan" below) for all loads and stores that it generated;
 * inline asm is not instrumented.
 *
 * Note that, not all supported compiler versions distinguish aligned/unaligned
 * accesses, but e.g. recent versions of Clang do. We simply alias the unaligned
 * version to the generic version, which can handle both.
 */

#define DEFINE_TSAN_READ_WRITE(size)                                           \
	void __tsan_read##size(void *ptr);                                     \
	void __tsan_read##size(void *ptr)                                      \
	{                                                                      \
		check_access(ptr, size, 0, _RET_IP_);                          \
	}                                                                      \
	EXPORT_SYMBOL(__tsan_read##size);                                      \
	void __tsan_unaligned_read##size(void *ptr)                            \
		__alias(__tsan_read##size);                                    \
	EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
	void __tsan_write##size(void *ptr);                                    \
	void __tsan_write##size(void *ptr)                                     \
	{                                                                      \
		check_access(ptr, size, KCSAN_ACCESS_WRITE, _RET_IP_);         \
	}                                                                      \
	EXPORT_SYMBOL(__tsan_write##size);                                     \
	void __tsan_unaligned_write##size(void *ptr)                           \
		__alias(__tsan_write##size);                                   \
	EXPORT_SYMBOL(__tsan_unaligned_write##size);                           \
	void __tsan_read_write##size(void *ptr);                               \
	void __tsan_read_write##size(void *ptr)                                \
	{                                                                      \
		check_access(ptr, size,                                        \
			     KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE,       \
			     _RET_IP_);                                        \
	}                                                                      \
	EXPORT_SYMBOL(__tsan_read_write##size);                                \
	void __tsan_unaligned_read_write##size(void *ptr)                      \
		__alias(__tsan_read_write##size);                              \
	EXPORT_SYMBOL(__tsan_unaligned_read_write##size)

DEFINE_TSAN_READ_WRITE(1);
DEFINE_TSAN_READ_WRITE(2);
DEFINE_TSAN_READ_WRITE(4);
DEFINE_TSAN_READ_WRITE(8);
DEFINE_TSAN_READ_WRITE(16);
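
For comparison, a user-space library could provide the same entry points without the kernel-only pieces (EXPORT_SYMBOL, __alias, _RET_IP_). The sketch below is not UCSAN's actual source: check_access() only records the last access so the hooks can be observed, ACCESS_WRITE is a hypothetical stand-in for KCSAN_ACCESS_WRITE, and __builtin_return_address(0) is the GCC/Clang builtin used here in place of _RET_IP_.

```c
#include <stddef.h>

#define ACCESS_WRITE 1 /* hypothetical flag, mirroring KCSAN_ACCESS_WRITE */

/* Last access seen; stands in for the real detection logic. */
static struct {
	void *ptr;
	size_t size;
	int type;
	void *ip;
} last_access;

static void check_access(void *ptr, size_t size, int type, void *ip)
{
	last_access.ptr = ptr;
	last_access.size = size;
	last_access.type = type;
	last_access.ip = ip; /* address of the instrumented access site */
}

#define DEFINE_TSAN_READ_WRITE(size)                                      \
	void __tsan_read##size(void *ptr)                                 \
	{                                                                 \
		check_access(ptr, size, 0, __builtin_return_address(0));  \
	}                                                                 \
	void __tsan_write##size(void *ptr)                                \
	{                                                                 \
		check_access(ptr, size, ACCESS_WRITE,                     \
			     __builtin_return_address(0));                \
	}                                                                 \
	void __tsan_write##size(void *ptr) /* redeclared so ';' may follow */

DEFINE_TSAN_READ_WRITE(1);
DEFINE_TSAN_READ_WRITE(2);
DEFINE_TSAN_READ_WRITE(4);
DEFINE_TSAN_READ_WRITE(8);
DEFINE_TSAN_READ_WRITE(16);
```

When a file is compiled with -fsanitize=thread, the compiler inserts calls to these functions before loads and stores. A complete runtime would likely also have to define the other entry points the instrumentation emits, such as __tsan_init, __tsan_func_entry, and __tsan_func_exit.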

Watchpoint

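enum watchpoint_state {
	is_none,
	is_write,
	is_read,
};

/* Hypothetical helper: returns the watchpoint slot that covers ptr. */
atomic_int *find_watchpoint(void *ptr);

void detect::check_access(void *ptr, size_t size, int read_write_state)
{
	atomic_int *watchpoint = find_watchpoint(ptr);
	int expected = is_none;

	/*
	 * Claim the slot in one atomic step. Checking for is_none and then
	 * storing in two separate steps would itself be a race.
	 */
	if (!atomic_compare_exchange_strong(watchpoint, &expected,
					    read_write_state)) {
		/* someone already created the watchpoint */
		/* report the data race to the unify subsystem */
		unify::report(ptr, size, read_write_state);
		return;
	}

	/* wait some time; DELAY_LOOPS is a hypothetical bound */
	for (int i = 0; i < DELAY_LOOPS; i++) {
		/* hypothetical check: did someone access this address? */
		if (concurrent_access_seen(watchpoint)) {
			/* data race */
			/* XXX: only one task will report the data race to the system */
			/* report the task info to the unify subsystem */
			unify::report(ptr, size, read_write_state);
			break;
		}
	}

	/* release the watchpoint; if no one accessed it, there was no data race */
	atomic_store(watchpoint, is_none);
}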

detection sys.: correction and alternative

correction:

  1. atomic_compare_exchange_strong(): the definition on cppreference differs from GCC's, so take care and fix the calls accordingly.
    e.g. insert_watchpoint func. and find_watchpoint func. in src/core.c
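
For reference, the C11 form from <stdatomic.h> takes a pointer to the expected value and overwrites that value with the current contents when the exchange fails. GCC's own builtins, such as __sync_bool_compare_and_swap(), take the old value by value instead, which may be the difference this item refers to. The sketch below shows the C11 form with a sequential search; it does not reproduce src/core.c, and NR_WP, the slot encoding (0 means free), and remove_watchpoint() are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_WP 4 /* hypothetical; the real count comes from nr_wp */

static _Atomic uintptr_t watchpoints[NR_WP]; /* 0 means "slot is free" */

/* Claim the first free slot for addr; returns its index, or -1 if full. */
int insert_watchpoint(void *addr)
{
	for (int i = 0; i < NR_WP; i++) {
		/* Reset on every iteration: a failed exchange overwrites it. */
		uintptr_t expected = 0;

		if (atomic_compare_exchange_strong(&watchpoints[i], &expected,
						   (uintptr_t)addr))
			return i;
	}
	return -1;
}

/* Sequential search; returns the slot watching addr, or -1 if none. */
int find_watchpoint(void *addr)
{
	for (int i = 0; i < NR_WP; i++)
		if (atomic_load(&watchpoints[i]) == (uintptr_t)addr)
			return i;
	return -1;
}

/* Mark a slot as free again. */
void remove_watchpoint(int slot)
{
	atomic_store(&watchpoints[slot], 0);
}
```

Forgetting to reset expected after a failed exchange is an easy mistake with this interface, because the variable then holds the address found in the occupied slot rather than 0.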

alternative:

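  1. check_access func.: use a mutex to handle the case where two processes operate on the same memory address at the same time and that address is not yet registered in watchpoints.
    problem: locking hurts performance
    trade-off: keep the original design without a mutex; if the case above occurs, ignore it

  2. How addresses are stored in watchpoints (array): compute the slot from the address mod PAGE_SIZE (i.e. the offset) to get direct access, so no sequential search is needed and lookup takes O(1) time.
    problem: if collisions are frequent, this works poorly
    trade-off: keep the original design; sequential search is enough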
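
The direct-access alternative in item 2 could look roughly like this. The constants UCSAN_PAGE_SIZE and NR_WP and the function watchpoint_slot() are hypothetical names chosen for the sketch; the page size is assumed to be 4096 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define UCSAN_PAGE_SIZE 4096u /* assumed page size */
#define NR_WP           64u   /* hypothetical number of watchpoint slots */

/* Map an address to a slot index from its offset within the page. */
size_t watchpoint_slot(const void *addr)
{
	uintptr_t offset = (uintptr_t)addr % UCSAN_PAGE_SIZE;

	return (size_t)(offset % NR_WP);
}
```

The collision problem is easy to see: addresses 0x1000 and 0x2000 have the same page offset and therefore map to the same slot, as do any two offsets that differ by a multiple of NR_WP.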

unify: Job list

stack information

task information

  • Process class
typedef int pid_t;

struct task_struct {
	int pid;               /* from OS */
	int cpuid;             /* from OS */
	bool read_write_state; /* read: true, write: false */
	void *address;         /* address of object */
	size_t size;           /* size of object, in bytes */
	/* link to other task_struct entries for the same watchpoint */
};

void unify::report(void *ptr, size_t size, int state)
{
	/* create the task info and fill it in */
	task_struct *task = create_task(ptr, size, state);
	stack_info *stack;

	/* add the task to task_container, which collects the tasks */
	/* task_container will probably be an array, i.e. the set of tasks */
	create_task_container(task_container, task);

	stack = collect_stack_info();

	report_to_system(together(stack, task_container));
}

Make C++ compatible with C

To make the C headers usable from C++ code, wrap the declarations like this:

/* header */

#ifdef __cplusplus
extern "C" {
#endif

void C_function(void);

#ifdef __cplusplus
}
#endif

More information, from Stack Overflow:

  • extern "C" is a linkage-specification
  • Every compiler is required to provide "C" linkage
  • A linkage specification shall occur only in namespace scope
  • See Richard's Comment: Only function names and variable names with external linkage have a language linkage
  • Two function types with distinct language linkages are distinct types even if otherwise identical
  • Linkage specs nest, inner one determines the final linkage
  • extern "C" is ignored for class members
  • At most one function with a particular name can have "C" linkage (regardless of namespace)
  • See Richard's comment: static inside extern "C" is valid; an entity so declared has internal linkage, and so does not have a language linkage
  • Linkage from C++ to objects defined in other languages and to objects defined in C++ from other languages is implementation-defined and language-dependent. Only where the object layout strategies of two language implementations are similar enough can such linkage be achieved

See also cppreference: Language linkage

Fix the bug in the GCC-specific compiler header

Currently, the preprocessor check on __GNUC__ keeps the compiler-specific header from working correctly.
It does not behave as intended, because it does not let us use the GCC-specific functions.
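
The issue does not say exactly what goes wrong. One common pitfall with such checks, which may or may not be the cause here, is that Clang also defines __GNUC__, so a test on __GNUC__ alone cannot tell the two compilers apart. A minimal sketch of an ordering that avoids this, with the hypothetical names COMPILER_NAME and compiler_name():

```c
/* Test __clang__ first: Clang defines __GNUC__ as well. */
#if defined(__clang__)
#define COMPILER_NAME "clang"
#elif defined(__GNUC__)
#define COMPILER_NAME "gcc"
#else
#define COMPILER_NAME "unknown"
#endif

/* Returns the name of the compiler this file was built with. */
const char *compiler_name(void)
{
	return COMPILER_NAME;
}
```

The same ordering applies when choosing which compiler-specific header to include.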

Unit test report format

I'm writing the unit test system.
Here is an example of the report output:

a - src/core.o
a - src/unify.o
a - lib/per_cpu.o
a - tests/test_watchpoint.o

 ------------------------------ unit test start ------------------------------ 

 ============================== detect ============================== 

 [1655276431041011] INFO: src/core.c:39:test_insert(): ptr=0x7ffc79773b1c size=4 type=1 watchpoint=0x5629900a2120
 [1655276431041143] INFO: src/core.c:52:test_insert(): watchpoint=0x5629900a2120 old=8007ffc79773b1c new=100000000000000
 [TEST SUCCESS] test_insert()

 detect subsystem test failed: 2 error(s) 
     [1655276431041011] INFO: src/core.c+37:test_insert(): ptr=0x7ffc79773b1c size=4 type=1 watchpoint=0x5629900a2120
     [1655276431041136] ERROR: src/core.c:46:test_insert(): insert(0x5629900a2120) != find(0x5629900a21e0)
     [1655276431041140] ERROR: src/core.c:50:test_insert(): inserted but not found, old=8007ffc79773b1c, new=100000000000000
     [1655276431041143] INFO: src/core.c+50:test_insert(): watchpoint=0x5629900a2120 old=8007ffc79773b1c new=100000000000000
     [TEST SUCCESS] test_insert()

 ============================== detect ============================== 

Please let me know if there is anything else you would like the report format to include.
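
The log lines in the report follow the pattern "[timestamp] LEVEL: file:line:function(): message". A formatter for that pattern could be sketched as below. It is an illustration only: format_log(), now_usec(), and LOG_INFO() are hypothetical names, and reading the timestamp as microseconds since the Unix epoch is an assumption based on the sample values.

```c
#include <stdarg.h>
#include <stdio.h>
#include <sys/time.h>

/* Microseconds since the Unix epoch (assumed meaning of the timestamp). */
static long long now_usec(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/*
 * Writes "[usec] LEVEL: file:line:func(): message" into buf.
 * Returns the length the full line needs, or a negative value on error.
 */
int format_log(char *buf, size_t len, long long usec, const char *level,
	       const char *file, int line, const char *func,
	       const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "[%lld] %s: %s:%d:%s(): ",
		     usec, level, file, line, func);
	if (n < 0 || (size_t)n >= len)
		return n; /* error, or the prefix alone filled the buffer */

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}

/* Convenience wrapper that fills in the time and the call site. */
#define LOG_INFO(buf, len, ...)                                       \
	format_log(buf, len, now_usec(), "INFO", __FILE__, __LINE__,  \
		   __func__, __VA_ARGS__)
```

Passing the timestamp as a parameter keeps format_log() deterministic, which makes the formatter itself easy to unit test.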
