scriptfusion / porter
Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
License: GNU Lesser General Public License v3.0
If we allow CachingConnector to take a cache key parameter, then it can be used with existing or shared caches where the keys are not of the form CachingConnector::hash produces.
My use case for this is using existing ODM Mongo documents to cache values with the document ID being the cache key.
To maintain backward compatibility, the parameter should be optional and, when it is null, CachingConnector should fall back to generating the cache key using CachingConnector::hash.
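The proposal above might be sketched as follows. This is a minimal, self-contained illustration, not Porter's actual code: KeyedCachingConnector, its constructor, the in-memory array cache and the md5-based hash are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a caching connector; not Porter's actual class.
final class KeyedCachingConnector
{
    /** @var array<string, string> In-memory stand-in for a real cache pool. */
    private array $cache = [];

    /** @var callable Fetches fresh data for a source string. */
    private $fetcher;

    public function __construct(callable $fetcher)
    {
        $this->fetcher = $fetcher;
    }

    public function fetch(string $source, ?string $cacheKey = null): string
    {
        // Fall back to a generated key when the caller does not supply one.
        $key = $cacheKey ?? $this->hash($source);

        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = ($this->fetcher)($source);
        }

        return $this->cache[$key];
    }

    // Assumed hashing scheme, for illustration only.
    private function hash(string $source): string
    {
        return md5($source);
    }
}

$calls = 0;
$connector = new KeyedCachingConnector(function (string $source) use (&$calls): string {
    ++$calls;

    return "data from $source";
});

$first = $connector->fetch('http://example.com/a', 'document-42');
// Same explicit key: served from the cache even though the source differs.
$second = $connector->fetch('http://example.com/b', 'document-42');
// No key supplied: the generated key misses, so the fetcher runs again.
$third = $connector->fetch('http://example.com/b');
```

Because the key defaults to null, existing callers that pass only the source would keep their current behaviour.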
It was previously thought that directly integrating the async throttle with Porter was not needed because we can just throttle high level Porter import operations. However, this is false, for two reasons:
Each of these additional requests must be throttled independently to avoid triggering limits, whether it is a retry or the next resource in a sequence. For this to be possible, the throttle must be integrated into ImportConnector so it can throttle transparently without burdening the developer with additional calls or configuration.
A default throttle should be provided for async imports, but it should be possible to override it with a custom configuration or implementation via AsyncImportSpecification. Throttling will not be available for sync imports until such a time as the sync API converges with the async API internally.
The readme is written in a framework-agnostic way, as if one were to use Porter in isolation, which is a good default tone to take since it makes no assumptions. However, many people use Symfony, and it would be useful to describe what a Porter integration with Symfony should look like for those getting started in a Symfony environment.
Any thoughts on adding some type of rate limiter functionality, so as not to clobber the servers?
Type classes in the ScriptFUSION\Porter\Type namespace are no longer used and should be migrated to the phptype organization.
The following characters are reserved for future extensions and MUST NOT be supported by implementing libraries: {}()/@:
How to enable CachingConnector in Laravel?
public function handle() {
    app()->bind(HttpConnector::class, CachingConnector::class);
    app()->bind(EuropeanCentralBankProvider::class, EuropeanCentralBankProvider::class);
    $porter = new Porter(app());
    $specification = new ImportSpecification(new DailyForexRates());
    $specification->enableCache();
    $rates = $porter->import($specification);
    foreach ($rates as $rate) {
        echo "$rate[currency]: $rate[rate]\n";
    }
}
ScriptFUSION\Porter\Cache\CacheUnavailableException : Cannot cache: connector does not support caching.
Add examples either in the form of an FAQ or "cookbook" to demonstrate pattern solutions to common problems.
Scenarios:
It is possible to use Porter to import data we already have using static imports via StaticDataImportSpecification. This brings with it the same post-import benefits as importing data over a network and is especially useful in testing.
Having to wrap a connector in CachingConnector just to use caching is not as easy to use as if the cache just worked with any connector. Moreover, cache + connector is a violation of SRP. The cache should be refactored as a separate entity, apart from connectors.
Although we normally add a provider to the container by its class name and expect a single instance of each provider in the container, there are many valid use cases for adding the same provider multiple times. Document these use cases with examples and how-tos.
Often, we may operate multiple accounts with a given provider for various reasons. Examples:
The specification is cloned too late during import() because members of the specification are shared with other objects before cloning takes place, thus creating shared mutable state. The specification must be cloned before any of its members are shared.
Explicitly document the public methods of Porter, specifically import(), importOne(), the provider methods, including details about tagging, and all other public methods.
For example, when the server returns 404 Not Found or 412?
Travis occasionally fails to pass HttpConnectorTest with an error similar to the following.
There was 1 error:
1) ScriptFUSIONTest\Functional\Porter\Net\Http\HttpConnectorTest::testConnectionToLocalWebserver
ScriptFUSION\Retry\FailingTooHardException: Operation failed after 5 attempt(s).
/home/travis/build/ScriptFUSION/Porter/vendor/scriptfusion/retry/src/retry.php:29
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:96
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:34
Caused by
ScriptFUSION\Porter\Net\Http\HttpConnectionException: file_get_contents(http://[::1]:12345/test?baz=qux): failed to open stream: Connection refused
/home/travis/build/ScriptFUSION/Porter/src/Net/Http/HttpConnector.php:65
/home/travis/build/ScriptFUSION/Porter/src/Connector/CachingConnector.php:62
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:110
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:86
/home/travis/build/ScriptFUSION/Porter/vendor/scriptfusion/retry/src/retry.php:26
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:96
/home/travis/build/ScriptFUSION/Porter/test/Functional/Porter/Net/Http/HttpConnectorTest.php:34
This never used to be a problem, and thanks to the five retries it should have plenty of time to spin up the server. However, this is also the first test in the suite so it may have something to do with PHPUnit start-up time. We should consider moving slower tests to the end of the suite, and if that doesn't work, we'll have to increase the retry delay coefficient.
Instead of requiring consumers to guess whether to use import() or importOne(), resources that emit only one record should implement a new SingleRecord interface to clearly indicate that importOne() should be used, and which we can use to verify that the correct method has been called.
This provides a clear mechanism for data publishers to express intent and makes sense, because resources always know if they export one or multiple records, so they should have a way to express this.
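A marker interface of this kind could be checked roughly as sketched below. Only the names SingleRecord, import() and importOne() come from the proposal; RecordResource, Importer, LatestRate and the exception messages are hypothetical, and Porter's real signatures may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical resource contract, simplified for the example.
interface RecordResource
{
    public function fetch(): iterable;
}

// Marker interface: the resource promises to emit exactly one record.
interface SingleRecord
{
}

// Hypothetical importer that verifies the correct method was called.
final class Importer
{
    public function import(RecordResource $resource): iterable
    {
        if ($resource instanceof SingleRecord) {
            throw new LogicException('Resource emits a single record: use importOne() instead.');
        }

        return $resource->fetch();
    }

    public function importOne(RecordResource $resource): ?array
    {
        if (!$resource instanceof SingleRecord) {
            throw new LogicException('Resource may emit many records: use import() instead.');
        }

        // Return the first record only, or null when nothing was emitted.
        foreach ($resource->fetch() as $record) {
            return $record;
        }

        return null;
    }
}

final class LatestRate implements RecordResource, SingleRecord
{
    public function fetch(): iterable
    {
        yield ['currency' => 'USD', 'rate' => 1.1];
    }
}

$importer = new Importer();
$record = $importer->importOne(new LatestRate());

$message = null;
try {
    $importer->import(new LatestRate());
} catch (LogicException $exception) {
    $message = $exception->getMessage();
}
```

Whether a mismatch should throw, warn or be tolerated is a design choice the proposal leaves open; throwing is used here only to make the check visible.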
Since ImportSpecification creates the ExponentialBackoffExceptionHandler, the current retry delay is tied to the lifetime of the specification. That is, if an import fails five times and the same specification is used to import again, the next delay begins with the sixth attempt delay time instead of restarting from one.
Ideally the retry counter would restart at the beginning of a new import regardless of whether the specification is reused. However, this tends to be a low-impact bug because specifications are typically not reused. As a workaround, anyone encountering this issue can create a new specification for each import instead of reusing specifications.
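The effect can be reproduced in isolation with the sketch below. BackoffHandler, Specification and simulateImport are hypothetical stand-ins, and the 100 ms base delay is an arbitrary assumption; cloning the handler per import is shown as one possible remedy, not necessarily the fix Porter adopted.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a handler that tracks exponential back-off state.
final class BackoffHandler
{
    private int $attempts = 0;

    // Returns the delay in milliseconds before the next retry.
    public function nextDelay(): int
    {
        $delay = 100 * (2 ** $this->attempts);
        ++$this->attempts;

        return $delay;
    }
}

// Hypothetical specification that creates its handler once, at construction.
final class Specification
{
    private BackoffHandler $handler;

    public function __construct()
    {
        $this->handler = new BackoffHandler();
    }

    public function getHandler(): BackoffHandler
    {
        return $this->handler;
    }
}

// Simulates an import that fails $failures times and records each delay.
function simulateImport(BackoffHandler $handler, int $failures): array
{
    $delays = [];
    for ($i = 0; $i < $failures; ++$i) {
        $delays[] = $handler->nextDelay();
    }

    return $delays;
}

// Reusing the handler: the second import continues where the first stopped.
$shared = new Specification();
$sharedFirst = simulateImport($shared->getHandler(), 2);
$sharedSecond = simulateImport($shared->getHandler(), 2);

// Cloning the pristine handler per import: every import starts from scratch.
$cloned = new Specification();
$clonedFirst = simulateImport(clone $cloned->getHandler(), 2);
$clonedSecond = simulateImport(clone $cloned->getHandler(), 2);
```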
After rewriting a 4000-word manual for Porter v4 I didn't really feel like writing about FetchExceptionHandlers. This feature will seldom be required, and for those who do need it, if they can't figure it out for themselves, the docblocks in the file should probably suffice. Nevertheless, we should document the interface properly at some point.
It is currently planned to drop support for PHP 5 and target either 7.0 or 7.1 for Porter v5.
I just wanted to take Porter for a quick spin, created a new Symfony project and tried to require the Porter package, resulting in this error:
scriptfusion/porter 7.0.0 requires psr/cache ^1 -> found psr/cache[1.0.0, 1.0.1] but the package is fixed to 3.0.0 (lock file version)
Is an update feasible, ideally for psr/container as well?
Porter's notion of records is arrays, which are very flexible to pass between interfaces, but once data leaves Porter it is common for applications to want to work with objects instead. The job of a hydrator is to use array data to populate object fields. We should investigate the value of designing a hydrator interface and whether there are any existing hydration libraries fit for purpose.
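As a starting point for that investigation, a hydrator interface might look something like the sketch below. Every name here (Hydrator, PublicPropertyHydrator, Rate) is hypothetical, and the implementation deliberately handles only public properties.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: populates an object's fields from array data.
interface Hydrator
{
    public function hydrate(array $record, object $target): object;
}

// Naive implementation intended for objects with public properties only.
final class PublicPropertyHydrator implements Hydrator
{
    public function hydrate(array $record, object $target): object
    {
        foreach ($record as $field => $value) {
            $name = (string) $field;

            // Skip array keys that have no matching property on the target.
            if (property_exists($target, $name)) {
                $target->$name = $value;
            }
        }

        return $target;
    }
}

// Example target object, assumed for illustration.
final class Rate
{
    public string $currency = '';
    public float $rate = 0.0;
}

$rate = (new PublicPropertyHydrator())->hydrate(
    ['currency' => 'USD', 'rate' => 1.1, 'ignored' => true],
    new Rate()
);
```

A production-grade hydrator would also need to deal with private properties, constructors and type coercion, which is where an existing library may prove a better fit.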
Durability is provided for the $provider->fetch call, but Provider::fetch is declared to return Iterator, which is typically implemented using generators. Generators imply deferred code execution, which means that even if the generator throws an exception, it is not caught by the retry handler because execution has already left that code block.
This common case is not captured by PorterTest because it only tests that Provider::fetch throws an exception directly instead of the generator throwing an exception.
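The deferred execution described above can be demonstrated with plain PHP. In this sketch, fetchRecords is a hypothetical stand-in for a generator-based Provider::fetch, and each try block stands in for the code protected by the retry handler.

```php
<?php
declare(strict_types=1);

// Hypothetical generator-based fetch; nothing in the body runs until the
// generator is first advanced.
function fetchRecords(): Iterator
{
    throw new RuntimeException('Connection failed.');

    // Unreachable, but its presence makes this function a generator.
    yield ['id' => 1];
}

// 1. Calling the function only creates the generator, so nothing is thrown
//    inside the protected block.
$caughtInside = false;
try {
    $records = fetchRecords();
} catch (RuntimeException $exception) {
    $caughtInside = true;
}

// 2. The exception surfaces later, when the caller iterates the records,
//    which is outside the protected block.
$caughtOutside = false;
try {
    foreach ($records as $record) {
        // Never reached.
    }
} catch (RuntimeException $exception) {
    $caughtOutside = true;
}

// 3. Advancing the generator to its first yield inside the protected block
//    provokes the exception where it can be handled.
$caughtWhenForced = false;
try {
    $records = fetchRecords();
    $records->valid();
} catch (RuntimeException $exception) {
    $caughtWhenForced = true;
}
```

Note that forcing the first yield only covers failures that occur before the first record; exceptions thrown later during iteration would still escape the protected block.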
A recent high-concurrency import, which fails catastrophically when the target service is down, revealed through an integer overflow that state is somehow being shared across the default implementation of the recoverable exception handler.
A debugging session shows the handler is being cloned, and initialize() is called at least once, but somehow the series of delays keeps growing beyond the default five retries.
In case it matters, the specific resource implementation calls fetchAsync() 80 times, but each call should still be independent as the ImportConnector clones a new handler for each fetch*() call.
The introduction of a developer mode would allow an opinionated preset to be applied to Porter's feature set, in contrast to its defaults, enabling or disabling certain features or modifying default values to be more conducive to development work.
For example, developer mode may:
There's no point in implementing PSR-6 caching interfaces if the default caching implementation cannot be changed. However, due to some oversight, none of the first party connectors expose a method to change the cache implementation.
The only file not fully tested, and thus preventing 100% code coverage, is SoapConnector. Its analogue, HttpConnector, is tested by the functional test, HttpConnectorTest, which spawns a temporary HTTP server using php -S to test the connector. In a similar fashion, I suggest spawning a temporary SOAP server to test SoapConnector; however, I do not know the best way to do this.
A question posted to StackOverflow asking how to write a minimum valid WSDL has received no answers.
Hi,
When running Porter 3.*, retry 1.1.2 will be installed because of the following Composer requirement:
"scriptfusion/retry": "^1.1",
The retry lib works on 1.1.1 with porter, upgrading to 1.1.2 breaks stuff.
Specific lines in the retry lib that are triggered:
if ($result instanceof \Generator) {
    throw new \UnexpectedValueException('Cannot retry a Generator. You probably meant something else.');
}
Porter causes this because a generator is returned in Porter.php line 98:
function () use ($provider, $resource) {
    if (($records = $provider->fetch($resource)) instanceof \Iterator) {
        // Force generator to run until first yield to provoke an exception.
        $records->valid();
    }
    return $records; // <-- this breaks
},
This ticket is an open discussion about whether there is a good way to integrate formatters into the architecture. Data might flow through objects in the following order.
Connector → Formatter → ProviderResource
However, we need to understand what the interface for Formatter must be and how it integrates into the rest of the system in a meaningful and reusable way.
Mapper is currently a required dependency, but users who do not use mappings do not need to install it at all. In order to make Mapper a suggested dependency, care must be taken to ensure Porter works correctly when Mapper is unavailable, including tests to verify correct operation in this scenario.
Currently Porter believes resources should always want to return structured data as an array. However, there may be use cases where structured data is either unavailable or undesirable. I have yet to encounter any compelling cases but am very interested to hear about any.
If we open up the return type to be mixed, this would allow resources to return objects, which would solve #12. Allowing objects can be convenient for object-oriented applications, but if resources return objects as the de facto standard, this could be inefficient for applications that just want to work with raw data. However, mixed would even permit resources to return different types depending on some configuration parameter.
Forcing the array return type is nice because it feeds into the transformers subsystem, giving transformers a consistent type to work with. However, I'm willing to forgo the entire transformers system in a future version, or change it to only be available when the return type is array, or change it to work with any return type, as necessary. Ultimately, the consequences for the transformers system are not important because Porter's primary responsibility is fetching data reliably, not transforming it.
zend-uri serves its purpose adequately but it's a heavy library bringing in additional dependencies of its own. We may wish to investigate simpler alternatives for URL parsing, such as purl.
Performing many sub-imports sequentially is equivalent to queuing a series of I/O-bound operations whose total execution time is the sum of all imports' individual execution times. By running sub-requests concurrently, we reduce the total execution time to that of the longest-running sub-import only. For highly concurrent sub-imports this is a significant time saving.
At first glance, one would think tying options to a connector would create concurrency issues where two requests could set different options on the connector at the same time. Due to cloning, this is not an issue; however, the problems with ConnectorOptions reach further than just potential concurrency issues. Since connectors may be decorated, finding the options you need to modify often means traversing the stack of connectors from ImportConnector down, which is cumbersome and error-prone.
We cannot simply remove connector options and let implementations do as they please because the cache needs knowledge of the particular options exported by the connector in order to determine whether two requests are identical and thus the cache may be reused.
We propose changing the signature of fetch(string) to fetch(object), where object is some implementation-defined object that encapsulates both the original source string and the connector options. In this way, everything needed to define the request is passed through all connectors in the stack and can be inspected or modified as need be when it passes through. This also removes the need to clone the connector (and its options), which makes implementations much easier and cleaner.
This change would be a BC break, and moreover, the signature is less convenient than simply passing a string, which can be sufficient for HTTP GET requests and some others. We could consider supporting object|string; however, this complicates the interface and makes it more taxing to implement.
Rather than just fetch(object) where object is literally typed to object, which is unsupported in PHP 7.1 anyway, we should probably have a Source interface that specifies toArray and serializes all configurable options as an array, for use with caching.
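A Source along these lines might be sketched as follows. Only the names Source and toArray come from the proposal; HttpSource, its fields and the cacheKey helper are hypothetical, and the SHA-256 over JSON scheme is just one way a cache could compare two requests.

```php
<?php
declare(strict_types=1);

// Interface named in the proposal; the exact shape is an assumption.
interface Source
{
    // Serializes every option that distinguishes one request from another.
    public function toArray(): array;
}

// Hypothetical HTTP source bundling the source string with its options.
final class HttpSource implements Source
{
    private string $url;
    private string $method;
    private array $headers;

    public function __construct(string $url, string $method = 'GET', array $headers = [])
    {
        $this->url = $url;
        $this->method = $method;
        $this->headers = $headers;
    }

    public function toArray(): array
    {
        // Sort headers so that declaration order does not affect the key.
        $headers = $this->headers;
        ksort($headers);

        return ['url' => $this->url, 'method' => $this->method, 'headers' => $headers];
    }
}

// Hypothetical helper: derives a cache key from the serialized source.
function cacheKey(Source $source): string
{
    return hash('sha256', json_encode($source->toArray(), JSON_THROW_ON_ERROR));
}

$a = cacheKey(new HttpSource('http://example.com', 'GET', ['Accept' => 'text/xml', 'X-Id' => '1']));
$b = cacheKey(new HttpSource('http://example.com', 'GET', ['X-Id' => '1', 'Accept' => 'text/xml']));
$c = cacheKey(new HttpSource('http://example.com', 'POST'));
```

Here $a and $b describe the same request and so share a key, while $c differs in method and gets its own.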
A typical Porter factory might load many providers to support all use cases of an application, even though only a smaller subset may actually be used during one execution life-cycle. Therefore we would like a mechanism to lazy-load registered providers only when they are required.
One such mechanism may be a factory interface that looks similar to the following.
interface PorterProviderFactory
{
    public function getProviderClassName(): string;

    public function createProvider(): Provider;
}
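One way such a factory could drive lazy loading is sketched below. The interface repeats the one proposed above; the empty Provider stub, LazyProviderRegistry, ForexProvider and ForexProviderFactory are hypothetical names introduced only to make the example self-contained.

```php
<?php
declare(strict_types=1);

// Empty stand-in for Porter's Provider interface.
interface Provider
{
}

interface PorterProviderFactory
{
    public function getProviderClassName(): string;

    public function createProvider(): Provider;
}

// Hypothetical registry that creates each provider on first request only.
final class LazyProviderRegistry
{
    /** @var array<string, PorterProviderFactory> */
    private array $factories = [];

    /** @var array<string, Provider> */
    private array $providers = [];

    public function register(PorterProviderFactory $factory): void
    {
        // Registering stores the factory; no provider is created yet.
        $this->factories[$factory->getProviderClassName()] = $factory;
    }

    public function get(string $className): Provider
    {
        if (!isset($this->providers[$className])) {
            if (!isset($this->factories[$className])) {
                throw new OutOfBoundsException("No factory registered for $className.");
            }

            $this->providers[$className] = $this->factories[$className]->createProvider();
        }

        return $this->providers[$className];
    }
}

final class ForexProvider implements Provider
{
    public static int $instances = 0;

    public function __construct()
    {
        ++self::$instances;
    }
}

final class ForexProviderFactory implements PorterProviderFactory
{
    public function getProviderClassName(): string
    {
        return ForexProvider::class;
    }

    public function createProvider(): Provider
    {
        return new ForexProvider();
    }
}

$registry = new LazyProviderRegistry();
$registry->register(new ForexProviderFactory());
$instancesAfterRegister = ForexProvider::$instances;

$first = $registry->get(ForexProvider::class);
$second = $registry->get(ForexProvider::class);
```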