php-ds / ext-ds Goto Github PK
View Code? Open in Web Editor NEWAn extension providing efficient data structures for PHP 7
Home Page: https://medium.com/p/9dda7af674cd
License: MIT License
An extension providing efficient data structures for PHP 7
Home Page: https://medium.com/p/9dda7af674cd
License: MIT License
The main reason for this is to be able to indicate an invalid key. For example, Map::find(mixed $value): mixed $key
would be ambiguous if either false
or null
indicates that a key could not be found. It doesn't make enough sense to have a null key, so we should just disallow it entirely and throw an exception when a null key is used.
Maybe also test these? Someone might be relying on grep. -__-
New (and awsome) data structures but still php, this should work 😋
<?php
class Foo {
public $bar;
public function __construct($bar) {
$this->bar = $bar;
}
}
$set = new \Ds\Set;
for ($i = 0; $i<=9; $i++) {
$set[] = new Foo($i);
}
foreach ($set as $key => $value) {
echo $value->bar;
}
echo PHP_EOL;
foreach ($set as $key => $value) {
echo $value->bar;
$set->remove($value);
}
echo PHP_EOL;
var_dump($set);
0123456789
01234
object(Ds\Set)#1 (5) {
[0]=>
object(class@anonymous)#7 (1) {
["bar"]=>
int(5)
}
[1]=>
object(class@anonymous)#8 (1) {
["bar"]=>
int(6)
}
[2]=>
object(class@anonymous)#9 (1) {
["bar"]=>
int(7)
}
[3]=>
object(class@anonymous)#10 (1) {
["bar"]=>
int(8)
}
[4]=>
object(class@anonymous)#11 (1) {
["bar"]=>
int(9)
}
}
Register package on packagist for installation via composer.
It can be used for highlights in IDE in dev environment.
It's currently possible to modify / update a collection during iteration. This leads to undefined behaviour, because the internal iterator is not aware of these modifications. There are three possible solutions here:
Could also just raise a warning instead of throwing an exception.
@krakjoe, curious to hear your thoughts. 💡
We currently allow allocation via allocate(int $capacity)
, but we don't check for overflow or have a maximum capacity. @nikic also mentioned that we should be using the safe_*
variants when multiplying sizeof
by a variable capacity.
Would be nice to show how some common tasks can be achieved with appropriate structures.
Iteration by reference is not supported, so this should take care of the following use case:
foreach ($collection as &$value) {
// Do something with $value
}
$collection->apply(function($value) {
// Return what value should become
});
@krakjoe pointed out that the tests are failing on Windows x64:
Fatal error: Return value of Ds\Deque::filter() must implement interface Ds\Sequence, null returned in Unknown on line 0
Warning: Uncaught Exception in C:\php-sdk\src\ext\ds\vendor\php-ds\tests\tests\Sequence\filter.php:58
There's a try/catch which should be catching the exception in the closure, and ignore the fact that the function returned null
.
Test case: testFilterCallbackThrowsException
public function testFilterCallbackThrowsException()
{
$instance = $this->getInstance([1, 2, 3]);
$filtered = null;
try {
$filtered = $instance->filter(function($value) {
throw new \Exception();
});
} catch (\Exception $e) {
$this->assertToArray([1, 2, 3], $instance);
$this->assertNull($filtered);
return;
}
$this->fail('Exception should have been caught');
}
Internal callback filter code: deque_filter_callback
void deque_filter_callback(Deque *deque, zval *obj, FCI_PARAMS)
{
if (DEQUE_IS_EMPTY(deque)) {
ZVAL_NEW_DEQUE(obj);
return;
} else {
zval param;
zval retval;
zval *src;
zval *ptr = ALLOC_ZVAL_BUFFER(deque->capacity);
zval *pos = ptr;
DEQUE_FOREACH(deque, src) {
ZVAL_COPY_VALUE(¶m, src);
fci.param_count = 1;
fci.params = ¶m;
fci.retval = &retval;
// Catch potential exceptions or other errors during comparison.
if (zend_call_function(&fci, &fci_cache) == FAILURE) {
efree(ptr);
ZVAL_UNDEF(obj);
return;
}
// Only push if the value is not falsey.
if (zend_is_true(&retval)) {
ZVAL_COPY(pos++, src);
}
}
DEQUE_FOREACH_END();
ZVAL_DEQUE(obj, deque_from_buffer(ptr, pos - ptr));
}
}
Whilst trying to calculate a set of mutations the xor operation hangs
I expect $removalChangeSet to contain only ['member']
I'm using DS 1.0.4 extension on OSX
require 'vendor/autoload.php';
$roles = new \Ds\Set(['guest', 'member']);
$requestedRemovals = new \Ds\Set(['member', 'nothing']);
$keepSet = $roles->diff($requestedRemovals);
$removalChangeSet = $roles->xor($keepSet);
var_dump($removalChangeSet->toArray());
What's the best way to do this?
I'm not sure what it would look like, but a PriorityQueue that could be reused, without cloning, may be useful.
Primary use case that I can think of would be priority based event aggregators. Cloning works, but has a significant cost for doing so.
each(callable): bool
stops on false, returns if finished
some([callable]): bool
whether any values pass the test or are true if no test
every([callable]): bool
whether all values pass the test or are true if no test
shuffle(): void
shuffles the set
partition([callable]): void
values that pass to the front, others to the back.
pluck(mixed key): Sequence
access each by key, creates a new set.
groupBy(mixed key): Map
groups values by given key.
swap(int, int)
isset(index)
unset(index)
These have been implemented for Sequence, but not the others.
Python example:
a = dict()
a['a'] = 3
a['b'] = 2
a['c'] = 1
print a
{'a': 3, 'c': 1, 'b': 2}
I had this implemented very very early when php-ds was still private, where the structures would echo something similar to JSON. Values that could not be converted to strings had default fallbacks like object(stdClass)#1
and resource(type)#1
.
I imagine the above equivalent would echo something like:
{'a': 3, 'b': 2, 'c': 1}
@krakjoe we can discuss things here. Current plan:
#include "[src/public/] ds/name.h"
#include "[src/private/] ds/common.h"
#include "[src/private/] ds/php_name.h"
#include "[src/private/] ds/ce/php_name.h"
#include "[src/private/] ds/handlers/php_name.h"
#include "[src/private/] ds/iterators/php_name.h"
ds_name_t
for the data structuresphp_ds_name_t
for the objectsInternal:
Z_{NAME}
Z_{NAME}_P
External:
ZVAL_{NAME}
RETURN_{NAME}
&ptr[index]
" over "ptr + index
".DS_
where appropriate.php_ds_
ds_
See #8
Currently the only documentation is the PHP include files (for IDE autocompletion etc). The plan is to create a complete * DocBook* as part of an official release.
See:
There is currently two ways to construct a structure:
$map = new \Ds\Map();
Advantages:
$map = ds::map();
Advantages:
use ds;
.(new Map())->reverse()
.Would it be better, for the sake of consistency, to keep to either one or the other? And if so, which?
I'm trying to separate the "internal" structures from all the zend stuff (zend_object, handlers, ce, etc), which has been smooth sailing so far. I came across a potential situation where this might not work as well as it sounds:
The case is with $vector->merge($iterable)
. We can take internal shortcuts when the $iterable
is a php-ds structure, eg. another php_ds_vector_t
. In order to access the internal buffer of that vector we have to follow this path: zval => zend_object => structure, ie. ((php_ds_vector_t*)(Z_OBJ(z)))->vector
. The problem is that the current scope doesn't have knowledge of php_ds_vector_t
, only ds_vector_t
.
The only solution I can think of is to create separate functions, one for each shortcut, and only accept the structures, rather than a generic zval, except for the case where the Z_OBJ_CE is not a php-ds structure (array or other iterable). This would work fine, effectively moving the logic of "is this a vector? is this a deque?" out of the internal structure and into the callee's scope.
So instead of calling ds_vector_t *ds_vector_merge(ds_vector_t *vector, zval *values)
I could use something like ds_vector_t *ds_vector_merge_vector(ds_vector_t *vector, ds_vector_t *values)
.
Is it possible to pass a void *
along with a typedef
to a function? That way the callee is responsible for extracting the structure, but the structure is responsible for the logic to merge it in. Would remove the need to have multiple x_merge_x
, x_merge_y
, x_merge_z
patterns.
This applies to all current and future variations of find
.
The best way I can think of doing this is to keep track of the number of active iterators on the structure, and check if this is > 0 when modifying the structure (during push, remove etc). This will incur a small amount of overhead.
The other question is what to do when that number is > 0?
If i run command
php -d extension=ds.so some_file.php
Then Ds extension works ok. But when i want add this line to php.ini:
extension=ds.so
Then php throw error:
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php/20151012/ds.so'
- /usr/lib/php/20151012/ds.so: undefined symbol: php_json_serializable_ce in Unknown on line 0
My PHP version:
PHP 7.0.3 on Ubuntu 14.04
The differences between Deque
and Vector
might be considered small enough to consider removing Vector
and Sequence
, and rename Deque
to Sequence.
Advantages of Deque
:
Advantages of Vector
:
Sequence
is also the only interface below Collection
.
The important question is: is it clear enough when you would use one over the other?
each(callable): bool
stops on false, returns if finishedany([callable]): bool
whether any values pass the test or are true if no testshuffle(): void
shuffles the mappartition([callable])
values that pass to the front, others to the back.pluck(mixed, mixed): Sequence|Map
access each by key, creates a new sequence.only(keys): Map
only keep certain keysgroupBy(mixed key): Map
groups values by given key.swap
update(key, callable)
equals
should receive the key as is, don't enforce type of class.This reduces the number of allocations and indirection.
Instead of
typedef struct _php_ds_map_t {
zend_object std;
ds_map_t *map;
} php_ds_map_t;
use
typedef struct _php_ds_map_t {
zend_object std;
ds_map_t map;
} php_ds_map_t;
It appears that allocated space is not being freed resulting in large memory usage. I believe entire object graphs are staying around when stored in sets.
I've tested these classes that all seem to have the problem:
Notes: Neither calling clear()
on the set nor the use of gc_collect_cycles()
frees any memory.
Example reproduction:
$original = memory_get_usage(true);
for ($i = 0; $i < 10000000; $i++) {
new \Ds\Set(["A", "B"]);
}
echo memory_get_usage(true) - $original, " additional bytes allocated\n";
Output:
79691776 additional bytes allocated
Environment:
Edit: 10x iteration count + output changes to showcase larger memory footprint
This would be an immutable, array access zval buffer which would serve as the groups in operations such as groupBy
. Instead of a Map<mixed, Sequence>
it would be Map<mixed, Tuple>
. It would be very easy to convert those tuples into sequences if you need the extra mutability and functionality.
$map->groupBy('key')->apply('sequence'); // assuming sequence is a function
$map->groupBy('key')->apply(function($key, $value) {
return new Sequence($value);
});
API I'm thinking of is basically:
__construct(iterable|array)
get(int)
toArray()
count()
clear()
??isEmpty()
??This is so that it's possible to call the array access functions directly, even though no one should ever need to do that.
Potentially an optionally weighted directed acyclic graph, implemented using a map of vectors internally for the adjacency list.
each(callable): bool
stops on false, returns if finished
some([callable]): bool
whether any values pass the test or are true if no test
shuffle(): void
shuffles the sequence
partition([callable]): void
values that pass to the front, others to the back.
pluck(mixed key): Sequence
access each by key, creates a new sequence.
groupBy(mixed key): Map
groups values by given key.
splice(int, int, iterable): Sequence
implementation of array_splice
swap(int, int): void
Swaps two values by index
All structures share the same global FCI, so if while sorting something you try to sort something else, things will break. The best way to handle this is to save the current FCI, use it, then restore it.
ext\ds\src\ds\ds_htable.c(830): warning C4018: '<=': signed/unsigned mismatch
ext\ds\src\ds\ds_htable.c(842): warning C4018: '<': signed/unsigned mismatch
ext\ds\src\ds\ds_htable.c(866): warning C4018: '<': signed/unsigned mismatch
Hello,
This is AWESOME! I have few questions:
What is scary about PHP extensions that you can have a project which requires upgrade of PHP and some of the extensions are not easy to recompile for upgraded version.
Here is the example: https://github.com/AOP-PHP/AOP this was AWESOME extension, pure awesomeness... However, every library that depends on this extension is scr***ed, unusable.
There are awesome extensions out there for PHP, like this one - but do have in mind that we are scared that things can go wrong as for AOP extension. Pure PHP implementation would ensure us that there is fail safe strategy.
Thanks for your time. Hope you find these suggestions and questions useful.
I don't see a use for knowing the current capacity of a structure.
I've run into a case where I have sets that contain hashable objects which themselves contain sets and hashable objects.1 The equals function on \Ds\Hashable
necessitates adding an equals()
method. To do this I ensure this I xor the two sets and check that the resulting set has no elements.2
However, it appears that this freezes when:
===
I've got a test case here. The only reason I didn't PR this one is it is seemingly fragile. I tried porting it over and it didn't fail. Not sure why that is.
class PauseTest extends PHPUnit_Framework_TestCase
{
/** @test */
public function doesnt_freeze_when_adding_items_to_a_set()
{
$a = new Container([new Box(0)]);
$b = new Container([new Box(0)]);
new Ds\Set([$a, $b]);
}
}
final class Container implements Ds\Hashable
{
public $items;
public function __construct($items) { $this->items = new Ds\Set($items); }
public function hash() { return 0; }
public function equals($obj) : bool { return $this->items->xor($obj->items)->count() === 0; }
}
class Box implements Ds\Hashable
{
private $id;
public function __construct($id) { $this->id = $id; }
public function equals($obj) : bool { return $this === $obj; }
public function hash() { return $this->id; }
}
Currently using iterators, could iterate on the internal structures directly.
We have to include the PHP json extension header in order to have Collection
extend JsonSerializable
. Should we include this header conditionally or specify json as a module dependency? The only reason to extend JsonSerializable
is so that we can support json_encode
, but a user who doesn't have the json extension enabled wouldn't be using json_encode
anyway.
Wondering what sort of support (both paid and unpaid) you see yourself doing for this repository, unless PHP accepts your extension as-is (or almost as-is).
The reason is because anyone grabbing this today may be taking a risk in doing so, because the PHP implementation does not perform as well as PHP's objects or arrays so it cannot act as an exact replacement, and it may be possible this extension stops working with any major release of PHP in the future like 7.1.
Like any library, there is a risk of deprecation in a large code base and normally such a thing does not get fixed for a while as it 'still works'. The problem is that with PHP extensions the code can simply stop working when a PHP upgrade is nearly required (when PHP 7.0.x gets deprecated) if the API becomes incompatible.
vovagp@debian:/tmp/ds$ php -v
PHP 7.0.10-1~dotdeb+8.1 (cli) ( NTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
with Zend OPcache v7.0.10-1~dotdeb+8.1, Copyright (c) 1999-2016, by Zend Technologies
vovagp@debian:/tmp/ds$ uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64 GNU/Linux
vovagp@debian:/tmp/ds$ cat test.php
<?php
function test() {
$m = new \Ds\Map();
for($i=0;$i<10;$i++) {
$k = "k$i";
$m->put($k, $i);
$m->get($k);
$m->remove($k);
}
}
test();
vovagp@debian:/tmp/ds$ php -r 'echo phpversion("ds");'
1.1.5
(gdb) run /tmp/ds/test.php
Starting program: /usr/bin/php /tmp/ds/test.php
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
ds_htable_lookup_bucket_by_hash (table=table@entry=0x7ffff5e62840, key=key@entry=0x7ffff5e13240, hash=hash@entry=5863493)
at /tmp/ds/1.1.5/extension-1.1.5/src/ds/ds_htable.c:349
349 if (DS_HTABLE_BUCKET_HASH(bucket) == hash && ds_htable_bucket_key_match(bucket, key)) {
(gdb) bt
#0 ds_htable_lookup_bucket_by_hash (table=table@entry=0x7ffff5e62840, key=key@entry=0x7ffff5e13240, hash=hash@entry=5863493)
at /tmp/ds/1.1.5/extension-1.1.5/src/ds/ds_htable.c:349
#1 0x00007fffe55b79df in ds_htable_lookup_or_next (table=0x7ffff5e62840, key=0x7ffff5e13240, bucket=bucket@entry=0x7fffffffa648)
at /tmp/ds/1.1.5/extension-1.1.5/src/ds/ds_htable.c:660
#2 0x00007fffe55b7a92 in ds_htable_put (table=<optimized out>, key=<optimized out>, value=0x7ffff5e13250)
at /tmp/ds/1.1.5/extension-1.1.5/src/ds/ds_htable.c:678
#3 0x00007fffe55b9a78 in ds_map_put (map=<optimized out>, key=<optimized out>, value=<optimized out>)
at /tmp/ds/1.1.5/extension-1.1.5/src/ds/ds_map.c:53
#4 0x00007fffe55c3d92 in zim_Map_put (execute_data=0x7ffff5e131e0, return_value=<optimized out>)
at /tmp/ds/1.1.5/extension-1.1.5/src/php/classes/php_map_ce.c:51
#5 0x00005555557a6a1a in dtrace_execute_internal ()
#6 0x000055555583b910 in ?? ()
#7 0x00005555557f6cab in execute_ex ()
#8 0x00005555557a68a8 in dtrace_execute_ex ()
#9 0x000055555583ba4d in ?? ()
#10 0x00005555557f6cab in execute_ex ()
#11 0x00005555557a68a8 in dtrace_execute_ex ()
#12 0x000055555584b597 in zend_execute ()
#13 0x00005555557b6d33 in zend_execute_scripts ()
#14 0x00005555557575d0 in php_execute_script ()
#15 0x000055555584d24a in ?? ()
#16 0x000055555563cfbd in main ()
(gdb)
Hi,
Is there a compeling reason to make all classes final? It makes it hard to do this;
class SomeSet extends Set
{
...
}
function someFunction(SomeSet $set) {
...
}
I know I can do composition over inheritance to create VO that use ds internally
class SomeSet
{
private $set;
function __construct($values)
{
$this->set = new Set($values);
}
... more methods exposing $this->set.
}
I've looked into class_alias but that only works with user provided classes
array_reverse
returns a reversed copy, but you can't reverse an array in place (without doing it manually). Something as common as reversing a sequence would benefit from the option of not creating a copy of the sequence.
Would be a heap using an optional comparator function (max heap by default). PriorityQueue
already uses a heap internally so we could extend that and create a dedicated implementation, potentially replacing PriorityQueue entirely.
See #4
Map
Sequence
Set
intersection
to intersect
difference
to diff
exclusive
to xor
union
to merge
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.