srdja / Collections-C
A library of generic data structures for the C language.
Home Page: http://srdja.github.io/Collections-C
License: GNU Lesser General Public License v3.0
slist.c:744:20: warning: Access to field 'next' results in a dereference of a null pointer (loaded from variable 'flip')
flip->next = prev;
~~~~ ^
The questionable code is at https://github.com/srdja/Collections-C/blob/master/src/slist.c#L747
for (i = 0; i < list->size; i++) {
flip->next = prev;
prev = flip;
flip = next;
next = next ? next->next : NULL;
}
As I understand it, clang is worried about the loop's termination condition: on one hand, the list size is used as the loop bound; on the other hand, next is checked against NULL on each iteration, which suggests to the analyzer that the size may not yet be exhausted while flip is already NULL. Maybe it's worth changing the loop to iterate by following the next pointers until NULL, to make it clearer?
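As a hedged sketch of that suggestion, using a minimal stand-in node type (not the library's actual node definition): terminating on the node pointer itself, rather than on a counter, makes the non-NULL invariant visible to the analyzer.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the library's list node; illustrative only. */
struct node {
    int val;
    struct node *next;
};

/* Reverse by walking until the current node is NULL instead of
 * counting up to list->size, so `flip` is provably non-NULL at
 * every dereference. */
static struct node *reverse(struct node *head)
{
    struct node *prev = NULL;
    struct node *flip = head;

    while (flip) {
        struct node *next = flip->next;
        flip->next = prev;
        prev = flip;
        flip = next;
    }
    return prev;
}
```

The behavior is unchanged for a well-formed list; the only difference is that the exit condition and the dereference now use the same variable.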
I would like to point out that identifiers like "__HASHTABLE_H__" and "__TREESET_H_" do not conform to the naming rules of the C language standard: identifiers that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved for the implementation. Would you like to adjust your selection to unique, non-reserved names?
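For illustration, a guard spelling that stays outside the reserved namespace could look like this (the exact prefix is only a suggestion, not an existing project convention):

```c
/* No leading underscore, no double underscore anywhere. */
#ifndef COLLECTIONS_C_HASHTABLE_H
#define COLLECTIONS_C_HASHTABLE_H

/* ... declarations ... */

#endif /* COLLECTIONS_C_HASHTABLE_H */
```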
There should be a more convenient way to free data after it's removed from a container. I suggest adding a callback parameter to the remove-type functions, so that instead of:
void *e;
container_remove_at(c, 3, &e);
free(e);
we could write something like this:
container_remove_at(c, 3, NULL, free);
Nice to see you!
I've been reading the source code of Array and found that it only stores the address of each element. All of the test code adds automatic variables from the same scope to the array. If I add an automatic variable from a different scope and then access it from the original scope, the program crashes with a core dump, because the variable from the other scope has been destroyed.
array_get(ar, 0, (void*) &str);
Shouldn't it be:
array_get_at(ar, 0, (void*) &str);
Currently all errors are reported by returning an error code, which is in itself not a problem; however, if a function should also return a value, problems arise. One is that the error codes have to share a type with the return value, and second, the error codes limit the range of valid values that the function can return.
A better approach would be to reserve the function's return value for errors, while having the function's result returned through a parameter.
For example:
int foo_new(struct foo *f, int baz) {
    if (baz) {
        *f = (struct foo) {.bar = 3};
        return OK;
    } else {
        // something went wrong
        return ERR;
    }
}
This way the caller can check the return code to see if the value was set or not before using it.
If we did the inverse where the result is returned through function return and the error code set through a parameter like so:
struct foo foo_new(int baz, int *status) {
    if (baz) {
        *status = OK;
        return (struct foo) {.bar = 3};
    } else {
        // something went wrong
        *status = ERR;
        return ??? // What do we return?
    }
}
We wouldn't be able to handle failures properly.
The return-error / set-result approach may not be as ergonomic as the original way, but it allows for more flexibility in error reporting.
You should be able to perform a binary search on a sorted array.
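A sketch of what such a function could look like; buffer and size stand in for the Array's internals, array_binary_search and int_cmp are hypothetical names, and the comparator follows the qsort(3) convention.

```c
#include <assert.h>
#include <stddef.h>

/* Classic binary search over an array of element pointers.
 * Returns the index of a match, or (size_t)-1 if not found.
 * This is a proposed sketch, not an existing library function. */
static size_t array_binary_search(void **buffer, size_t size, void *key,
                                  int (*cmp)(const void *, const void *))
{
    size_t lo = 0, hi = size;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        int c = cmp(buffer[mid], key);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (size_t)-1;
}

/* qsort-style comparator for int elements. */
static int int_cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

The same comparator the caller used to sort the array would be passed in, so the search order and the sort order always agree.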
Ring buffers (or circular buffers) are pretty important data structures that are used in many embedded applications (keyboards, drivers, etc). They're pretty similar to queues and are often pretty easy to implement, as the data they store usually consists of relatively small integer values.
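A minimal sketch of the idea, with illustrative names only, using a fixed power-of-two capacity so the wrap-around is a cheap bit mask rather than a modulo:

```c
#include <assert.h>
#include <stddef.h>

#define RB_CAP 8 /* must be a power of two for the mask trick */

/* Fixed-capacity FIFO ring buffer; head is the next read slot,
 * tail the next write slot, count the number of stored items. */
struct ring {
    int buf[RB_CAP];
    size_t head, tail, count;
};

static int rb_push(struct ring *r, int v)
{
    if (r->count == RB_CAP)
        return 0; /* full */
    r->buf[r->tail] = v;
    r->tail = (r->tail + 1) & (RB_CAP - 1);
    r->count++;
    return 1;
}

static int rb_pop(struct ring *r, int *out)
{
    if (r->count == 0)
        return 0; /* empty */
    *out = r->buf[r->head];
    r->head = (r->head + 1) & (RB_CAP - 1);
    r->count--;
    return 1;
}
```

Both operations are O(1) and allocation-free after setup, which is exactly what makes the structure attractive in interrupt handlers and drivers.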
Collections with iterators should also have zip iterators.
ArrayZipIterator zip;
array_zip_iter_init(&zip, array1, array2);
void *e1, *e2;
while (array_zip_iter_next(&zip, &e1, &e2) != CC_ITER_END) {
...
}
Basically this gives you a simpler and more efficient API when you want to iterate over two collections in lockstep.
When using list_add_all(list1, list2), calling list_destroy_free on both lists will cause a SIGSEGV. The solution, as I see it, is that list_add_all needs to do a deep copy of the data.
Looking at list_add_all, it calls list_add, which in turn calls list_add_last. That function creates a new Node in the list, but the data is not copied from the input; only the pointers are copied. I suspect the element size will need to be known somehow at copy time.
Minimal scenario to reproduce the issue:
HashTable *ht;
hashtable_new(&ht);
hashtable_add(ht, "1", "1");
hashtable_add(ht, "2", "2");
hashtable_add(ht, "3", "3");
Array *arr;
hashtable_get_values(ht, &arr);
printf("Array size: %i\n", (int)array_size(arr));
array_remove_at(arr, 0, NULL);
printf("Array size: %i\n", (int)array_size(arr));
Expecting to see
Array size: 3
Array size: 2
Getting
Array size: 3
==19255== Invalid read of size 8
==19255== at 0x4C2DECE: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1021)
==19255== by 0x5E41A48: array_remove_at (in /.../reqs/lib/libcollectc.so)
==19255== ...
==19255== Address 0xb01b358 is 0 bytes after a block of size 24 alloc'd
==19255== at 0x4C29BC3: malloc (vg_replace_malloc.c:299)
==19255== by 0x5E414D4: array_new_conf (in /.../reqs/lib/libcollectc.so)
==19255== by 0x5E459FF: hashtable_get_values (in /.../reqs/lib/libcollectc.so)
==19255== ...
==19255==
Array size: 2
If I create the array manually using array_new, I don't get the same issue. Looking at the source code of array_remove_at, I can see there is a call to memmove (https://github.com/srdja/Collections-C/blob/master/src/array.c#L317), which I'm guessing calls memcpy internally, but unfortunately I have no guesses as to why that produces the error.
Any ideas what might be causing it and whether there is a way to work around it?
Hi,
The treetable data structure compares the pointers that are added as keys. The function used is cc_common_comp_ptr. Since it is possible to add any pointer, this data structure will often compare pointers that come from different objects. In general, this is also the function you use to check whether two pointers are equal, in the context of ArrayContains for example.
However, comparing unrelated pointers using < and > is undefined behavior. It is not guaranteed that if ptr1 < ptr2 returns 0, then ptr2 < ptr1 will not also return 0, even if the pointers are different. This might happen due to compiler optimisations.
See this description.
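One common workaround is to compare the pointers' integer representations instead; the uintptr_t conversion is implementation-defined, but on any single implementation it yields one consistent total order, which is all an ordered container needs. A sketch (ptr_cmp is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Comparator that avoids relational comparison of unrelated
 * pointers by comparing their uintptr_t representations. The
 * conversion is implementation-defined, but on a given
 * implementation it gives a consistent total order. */
static int ptr_cmp(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    return (x > y) - (x < y);
}
```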
In the file collections-c.h, vecor.h is referenced, but that file does not exist in the src directory.
valgrind shows many memory leaks when running list_test, and they do not disappear when calling list_destroy_free(). The function list_remove_all_free(), called by list_destroy_free(), looks suspicious, as it calls unlink_all(list, false) while the second parameter is supposed to be true, as I understand it.
However, after changing it to true, the binary starts failing on memory allocation:
(gdb) backtrace
#0 0x00007ffff7aaeb9d in _int_malloc () from /usr/lib/libc.so.6
#1 0x00007ffff7ab141a in calloc () from /usr/lib/libc.so.6
#2 0x0000000000403b89 in list_add_last (list=0x609010, element=0x6092d0) at ../src/list.c:190
#3 0x0000000000401d1e in list_1234 (l=l@entry=0x7fffffffe5f0) at list_test.c:1486
#4 0x00000000004028ad in test_list_remove_at () at list_test.c:959
#5 0x0000000000400728 in main (argc=<optimized out>, argv=<optimized out>) at list_test.c:77
Similar to
list_contains_value(List *list, void *element, int (*cmp) (const void*, const void*))
{
....
if (cmp(node->data, element) == 0)
e_count++;
....
}
Newbie here, so please bear with me, but I think I am missing something obvious.
What is the purpose of hashtable_hash_ptr @ https://github.com/srdja/Collections-C/blob/master/src/hashtable.c#L918 ? My understanding is that it takes a pointer value and computes a hash of it.
But since a pointer is 4-8 bytes on most systems, would it not be more convenient to just use the pointer value itself as the "hash" and save the hashing operations?
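For what it's worth, the pointer value is a valid hash, but allocator alignment tends to zero out its low bits, and a table that selects buckets with hash & (capacity - 1) would then use only a fraction of its buckets. A sketch contrasting the identity hash with a mixing step (the mixer shown is the MurmurHash3 finalizer, used here purely for illustration; it is not necessarily what the library uses):

```c
#include <assert.h>
#include <stdint.h>

/* Identity hash: cheap, but for 16-byte-aligned allocations the
 * low 4 bits are always zero, so bucket selection via
 * `hash & (capacity - 1)` clusters badly. */
static uint64_t hash_identity(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* MurmurHash3 64-bit finalizer: spreads entropy from the high
 * bits into the low bits that bucket indexing actually uses. */
static uint64_t hash_mixed(const void *p)
{
    uint64_t h = (uint64_t)(uintptr_t)p;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}
```

So the hashing step trades a few arithmetic operations for a much more uniform bucket distribution.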
Thanks.
This function does not remove the element that was last returned by deque_iter_next; it removes the next item.
Collections should have *_filter and *_filter_mut functions so that one may do something like this:
bool predicate(void *e) {
    int el = *((int*) e);
    if (el > 5) return true;
    return false;
}
...
// filters out all integers greater than 5 by mutating the array
array_filter_mut(array1, predicate);
// filters out all integers greater than 5 and stores the filter results in a new array
Array *new_array;
if (array_filter(array2, predicate, &new_array) != CC_OK)
...
After list_iter_init initializes the iterator, iter->last is assigned NULL.
Then, if list_iter_add is invoked, the line link_after(iter->last, new_node); will crash.
table->key_cmp(replace->key, key) == 0 --> fails since replace has a key of NULL and strcmp cannot handle NULL pointers.
Hi,
First of all, thank you for the cool implementations of the collections. I'm using the array implementation to implement a heap.
I have this function:
bool heap_insert(Heap *heap, void *element)
{
    size_t pos = array_size(heap->v) + 1;
    for (; pos > 1; pos = pos / 2)
        if (element > array_get(heap->v, pos / 2))
            if (!array_add_at(heap->v, array_get(heap->v, pos / 2), pos))
                return false;
    if (pos > array_capacity(heap->v))
        return array_add(heap->v, element);
    return array_add_at(heap->v, element, pos);
}
I'm getting a seg fault when array_add_at(heap->v, element, pos) is executed.
(gdb) p ar->capacity
$7 = 8
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00007fff8fb64522 in _platform_memmove$VARIANT$Nehalem () from /usr/lib/system/libsystem_platform.dylib
(gdb) bt
#0 0x00007fff8fb64522 in _platform_memmove$VARIANT$Nehalem () from /usr/lib/system/libsystem_platform.dylib
#1 0x00000001000016c1 in array_add_at (ar=0x100104a70, element=0x7fff5fbffb54, index=1) at ../src/array.c:184
#2 0x00000001000012b3 in heap_insert (heap=, element=) at ../src/heap.c:97
#3 0x0000000100000f69 in test_heap_insert () at heap_test.c:23
#4 main (argc=, argv=) at heap_test.c:12
(gdb) l
179 if (ar->size == ar->capacity && !expand_capacity(ar))
180 return false;
181
182 size_t shift = (ar->size - index) * sizeof(void*);
183
184 memmove(&(ar->buffer[index + 1]),
185 &(ar->buffer[index]),
186 shift);
187
188 ar->buffer[index] = element;
So, the question I have here is: what is the expected behavior of array_add_at()? I see the initial capacity is 8, the size is 0, and I'm trying to insert an element at position 1. It should go through, right? Somehow, the call to memmove() fails :(
In the function hashtable_new_conf in hashtable.c, I found:
table->buckets = conf->mem_calloc(table->capacity, sizeof(TableEntry));
This confuses me. The type of table->buckets is TableEntry**. Why not just use:
table->buckets = conf->mem_calloc(table->capacity, sizeof(TableEntry*));
I modified the former to the latter and ran the tests, and everything still works!
There are potential memory leaks in functions like the one below: if the caller provides 'out', they can free the element afterwards; but if not, the memory will leak.
enum cc_stat slist_iter_remove(SListIter *iter, void **out)
{
    if (!iter->current)
        return CC_ERR_VALUE_NOT_FOUND;

    void *e = unlink(iter->list, iter->current, iter->prev);
    iter->current = NULL;
    iter->index--;

    if (out)
        *out = e;

    return CC_OK;
}
It might be convenient for collections to have a fold/reduce function so that you may do something like this:
void fn(void *e1, void *e2, void *result)
{
    int el1 = *((int*)e1);
    int el2 = *((int*)e2);
    *((int*) result) = el1 + el2;
}
...
// accumulate the values of the array into result
int result;
array_reduce(array, fn, &result);
The computation of the number of blocks to move in array_remove is wrong. Whenever the array is full, and memory is moved, there is an extra 8 bytes that are moved inside the array's buffer.
A pull request is incoming to fix that.
The code shouldn't assume that memory allocation never fails.
When the size of the array is 0, the variable j will underflow and become very large. When the function is then called, a segmentation fault will be triggered.
What could be the possible reason for a HashTable iterator's while loop body executing even though hashtable_size is reported as 0?
It must be something I'm doing wrong, as a simplified test doesn't reproduce the issue: the while loop body is not executed on an empty HashTable inside the test.
Code structure is roughly like this:
struct MyStruct {
...
HashTable *my_table;
};
extern struct MyStruct **my_struct;
void init_data() {
my_struct = (struct MyStruct **) shm_malloc(sizeof(struct MyStruct *));
(*my_struct) = (struct MyStruct *) shm_malloc(sizeof(struct MyStruct));
memset(*my_struct, 0, sizeof(struct MyStruct));
...
(*my_struct)->my_table = NULL;
HashTableConf htc;
hashtable_conf_init(&htc);
htc.mem_alloc = &shm_malloc_func;
htc.mem_calloc = &shm_calloc_func;
htc.mem_free = &shm_free_func;
if (hashtable_new_conf(&htc, &((*my_struct)->my_table)) != CC_OK) {
LM_ERR("Failed to initialise hashtable!\n");
}
}
And the part of the code that iterates over the hashtable:
// Code that accesses the hashtable
HashTableIter hti;
hashtable_iter_init(&hti, (*my_struct)->my_table);
printf("table size = %i\n", (int) hashtable_size((*my_struct)->my_table));
printf("table capacity = %i\n", (int) hashtable_capacity((*my_struct)->my_table));
TableEntry *entry;
while (hashtable_iter_next(&hti, &entry) != CC_ITER_END) {
// do something
}
As mentioned, table size is reported as 0 and capacity as 16.
Does anyone have any idea what might be going wrong?
Issue repro steps
Add 4 numbers 1, 2, 3, 4 to a deque.
Apply deque_remove_first on the deque.
Get the first number using deque_get_first. (We get 2, as expected.)
But if we apply deque_get_buffer on the deque and read buff[0], we get 1.
Root cause
enum cc_stat deque_remove_first(Deque *deque, void **out)
{
    if (deque->size == 0)
        return CC_ERR_OUT_OF_RANGE;

    void *element = deque->buffer[deque->first];
    deque->first = (deque->first + 1) & (deque->capacity - 1);
    deque->size--;

    if (out)
        *out = element;

    return CC_OK;
}
In the above code, we only advance the first index by one step; we don't move the buffer data, which causes the confusion. deque_get_first reads the value indexed at first, not buff[0].
Suggestion for fix
We could return the buffer starting from first in the function deque_get_buffer:
const void* const *deque_get_buffer(Deque const * const deque)
{
return (const void* const*) (deque->buffer + deque->first);
}
Another way is to memmove the data in deque_remove_first itself.
Let me know your thoughts on this @srdja!
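A toy demonstration of the behavior described above (not the library's Deque, just a stand-in power-of-two circular buffer): removing the first element only advances the first index, so buffer[0] keeps its old value while buffer[first] holds the logical front.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 4 /* power of two, so wrap-around is a bit mask */

/* Stand-in circular buffer state after adding 1, 2, 3, 4. */
int buffer[CAP] = {1, 2, 3, 4};
size_t first = 0, size = 4;

/* Remove the logical front: only the index moves, not the data. */
static int remove_first(void)
{
    int e = buffer[first];
    first = (first + 1) & (CAP - 1);
    size--;
    return e;
}

/* The logical front lives at buffer[first], not buffer[0]. */
static int get_first(void)
{
    return buffer[first];
}
```

After one remove_first(), get_first() yields 2 while buffer[0] still contains 1, which is exactly the mismatch a caller of deque_get_buffer observes.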
Would you like to replace the double quotes around file names in include statements with angle brackets?
Collections should also have a *_contains function that looks for values instead of pointers.
bool equals(void *e1, void *e2) {
    int el1 = *((int*)e1);
    int el2 = *((int*)e2);
    if (el1 == el2) return true;
    return false;
}
...
int a = 10;
size_t count = array_contains(array, &a, equals);
I once worked with the bitset of C++; it's quite handy in competitive coding and is also useful for work related to OS studies. So I think it should be implemented.
The documentation gives some hints that the default hashtable and hashset implementations assume C strings as keys. However, the "official" documentation uses some defines that have been removed in the meantime, i.e. in 9b4c3cb. As of now, using the hashing data structures with anything but strings as keys is rather a PITA. There is also no test case that would show how much boilerplate code is required (in comparison to string keys).
I propose re-adding the lost comparison functions and defines, and adding new explicit functions that initialize hashsets and hashtables for use with pointers and strings (and maybe even a generic version, if it makes sense) respectively, e.g. hashtable_new_str and hashtable_new_ptr.
I think memory allocated to the buckets is unnecessarily large:
table->buckets = conf->mem_calloc(table->capacity, sizeof(TableEntry));
should be changed to:
table->buckets = conf->mem_calloc(table->capacity, sizeof(TableEntry *));
The same goes for the resize case.
Hash tables are an excellent data structure for amortized constant time operations. But, calculating the hash is not actually O(1). It's more like O(h), where h is the length of the hashed key.
With a ternary search tree, it's possible to get similar results to those obtained from a Hash Table without the need of resizing the table, or having extra unused entries as with the hash table.
Like the standard C headers do:
#ifdef __cplusplus
extern "C" {
#endif
Small typo at this line: NLLL should be NULL?
The declarations in the C header files should be wrapped in extern "C" for use from C++, shouldn't they?
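A conventional pattern, for illustration (the guard name and function are made up): the declarations compile unchanged under a C compiler, while a C++ compiler wraps them so the symbols keep C linkage.

```c
#ifndef COLLECTIONS_C_EXAMPLE_H
#define COLLECTIONS_C_EXAMPLE_H

/* __cplusplus is only defined by C++ compilers, so a C build
 * sees plain declarations while C++ gets C linkage. */
#ifdef __cplusplus
extern "C" {
#endif

void example_function(int x);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* COLLECTIONS_C_EXAMPLE_H */
```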
I would like to see a list which is basically a linked list, but with the ability to add callback functions for the following actions:
Similar to: ObservableCollections in C#
I would be willing to implement this.
What do you think? Would you like to see it implemented in the default linked list or in a separate structure?
hashtable.c line 185
enum cc_stat hashtable_add(HashTable *table, void *key, void *val)
{
...
return true;
...
}
I wonder which cc_stat value would be returned... (probably 1).
The function does not return the removed element though the description says it should:
/**
* Removes the last returned element by <code>treeset_iter_next()</code>
* function without invalidating the iterator and optionally sets the
* out parameter to the value of the removed element.
*
* @note This Function should only ever be called after a call to <code>
* treeset_iter_next()</code>.
*
* @param[in] iter The iterator on which this operation is performed
* @param[out] out Pointer to where the removed element is stored, or NULL
* if it is to be ignored
*
* @return CC_OK if the element was successfully removed, or
* CC_ERR_KEY_NOT_FOUND.
*/
void treeset_iter_remove(TreeSetIter *iter)
{
treetable_iter_remove(&(iter->i));
}
The same applies to
void treetable_iter_remove(TreeTableIter *iter)
{
remove_node(iter->table, iter->current);
}
What is the desired behavior?
Hello,
I think there is an issue with the ring buffer capacity. The buffer is allocated in the following way:
ringbuf->buf = rconf->mem_calloc(rconf->capacity, sizeof(Rbuf))
I am pretty sure it should be sizeof(uint64_t) instead. Right now, for a ring buffer of capacity 10, instead of allocating 80 bytes, it allocates 640 bytes!
It is also possible to call rbuf_peek(rbuf, 79) on a ring buffer of capacity 10.
If I am right, I'll happily submit a PR to fix this.
Does the library provide a vector data structure? By a vector, I mean a data structure similar to the C++/STL vector, i.e. a right-extensible array containing elements placed in contiguous storage that you can access by integer indexing.
The library provides a deque data structure; however, its elements (not their addresses) are not placed in a contiguous sequence of bytes.
EDIT The issue is solved since vectors are implemented in array.c.
EDIT 2
After checking more closely, it appears that an array object is not a true container: the array stores only a reference to your data element:
enum cc_stat array_add(Array *ar, void *element)
{
    if (ar->size >= ar->capacity) {
        enum cc_stat status = expand_capacity(ar);
        if (status != CC_OK)
            return status;
    }
    ar->buffer[ar->size] = element;
    ar->size++;
    return CC_OK;
}
And the hash table has the same problem. I see few use cases where this kind of implementation could be of interest.
Different statuses are returned in the same situation, namely when an iterator reaches the end of the array:
* @return CC_OK if a next element pair is returned, or CC_ITER_END if the end of one
* of the arrays has been reached.
*/
enum cc_stat array_zip_iter_next(ArrayZipIter *iter, void **out1, void **out2)
* @return CC_OK if the element was successfully removed, or CC_ERR_OUT_OF_RANGE.
*/
enum cc_stat array_zip_iter_remove(ArrayZipIter *iter, void **out1, void **out2)
Which status should be used to be consistent? I'd suggest using CC_ITER_END whenever an iterator is involved.
@srdja I am curious if you would entertain the idea of using cmake instead of make/autotools.
The advantage would be a little bit more modern build system, integration with testing frameworks, possibly building .rpm/.deb easier.
Speaking of testing frameworks, I was wondering why not use something like CppUTest instead of rolling your own.
I understand the portability aspect of autotools, but cmake is readily available on many platforms.
/**
* Returns the number of occurrences of the element within the specified array.
*
* @param[in] ar the array that is being searched
* @param[in] element the element that is being searched for
*
* @return the number of occurrences of the element
*/
size_t array_contains(Array *ar, void *element)
{
    size_t o = 0;
    size_t i;
    for (i = 0; i < ar->size; i++) {
        if (element == ar->buffer[i])   <------------ this line
            o++;
    }
    return o;
}
In array.c you are comparing only the addresses, not the actual data; you need a comparison function as an argument to array_contains.
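A hedged sketch of the suggested variant; buffer and size stand in for the Array's internals, array_contains_value is a proposed name, and the comparator follows the qsort(3) convention.

```c
#include <assert.h>
#include <stddef.h>

/* Counts elements that compare equal by value, not by address.
 * Proposed signature, not an existing library function. */
static size_t array_contains_value(void **buffer, size_t size,
                                   void *element,
                                   int (*cmp)(const void *, const void *))
{
    size_t o = 0;
    for (size_t i = 0; i < size; i++)
        if (cmp(buffer[i], element) == 0)
            o++;
    return o;
}

/* qsort-style comparator for int elements. */
static int int_cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Passing the comparator keeps the pointer-equality behavior available too: a comparator that compares the addresses themselves reproduces the old semantics.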
@srdja
This fixes an out-of-bounds lookup, which results in a segfault when popping/heapifying:
diff --git a/src/pqueue.c b/src/pqueue.c
index 0293922..875d40c 100644
--- a/src/pqueue.c
+++ b/src/pqueue.c
@@ -306,13 +306,13 @@ static void pqueue_heapify(PQueue *pq, size_t index)
size_t R = CC_RIGHT(index);
size_t tmp = index;
+ if (L >= pq->size || R >= pq->size)
+ return;
+
void *left = pq->buffer[L];
void *right = pq->buffer[R];
void *indexPtr = pq->buffer[index];
- if (L >= pq->size || R >= pq->size)
- return;
-
if (pq->cmp(indexPtr, left) < 0) {
indexPtr = left;
index = L;