
apify-client-js's Introduction

Apify API client for JavaScript

apify-client is the official library for accessing the Apify API from your JavaScript applications. It runs both in Node.js and the browser, and provides useful features like automatic retries and convenience functions that improve the experience of using the Apify API.

Quick Start

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'MY-APIFY-TOKEN',
});

// Starts an actor and waits for it to finish.
const { defaultDatasetId } = await client.actor('john-doe/my-cool-actor').call();
// Fetches results from the actor's dataset.
const { items } = await client.dataset(defaultDatasetId).listItems();
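Note that the snippet above uses top-level await, which only works in an ES module or a REPL. In a plain CommonJS script you would wrap the calls in an async function, for example:

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'MY-APIFY-TOKEN',
});

(async () => {
    // Starts an actor and waits for it to finish.
    const { defaultDatasetId } = await client.actor('john-doe/my-cool-actor').call();
    // Fetches results from the actor's dataset.
    const { items } = await client.dataset(defaultDatasetId).listItems();
    console.log(items);
})();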

Features

Besides greatly simplifying the process of querying the Apify API, the client provides other useful features.

Automatic parsing and error handling

Based on the endpoint, the client automatically extracts the relevant data and returns it in the expected format. Date strings are automatically converted to Date objects. When the API responds with an error, the client throws an ApifyApiError, which wraps the plain JSON error returned by the API and enriches it with additional context for easier debugging.
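For example, an API error can be handled like this. This is only a sketch that assumes the client instance from the Quick Start; the non-existent actor name is used just to trigger an error, and the type and statusCode fields are examples of the extra context attached to the error.

try {
    // Starting an actor that does not exist makes the API respond with an error.
    await client.actor('john-doe/no-such-actor').start();
} catch (error) {
    // error is an ApifyApiError wrapping the plain JSON error returned by the API.
    console.error(error.message);
    console.error(error.type, error.statusCode);
}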

Retries with exponential backoff

Network communication sometimes fails; that's a given. The client will automatically retry requests that failed due to a network error, an internal error of the Apify API (HTTP 500+) or a rate limit error (HTTP 429). By default, it will retry up to 8 times. The first retry is attempted after ~500 ms, the second after ~1000 ms, and so on. You can configure these parameters using the maxRetries and minDelayBetweenRetriesMillis options of the ApifyClient constructor.
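For example, the retry behavior can be tuned like this (the values are arbitrary):

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'MY-APIFY-TOKEN',
    // Retry failed requests at most 4 times.
    maxRetries: 4,
    // First retry after ~250 ms, with exponentially longer delays afterwards.
    minDelayBetweenRetriesMillis: 250,
});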

Convenience functions and options

Some actions can't be performed by the API itself, such as indefinite waiting for an actor run to finish (because of network timeouts). The client provides convenient call() and waitForFinish() functions that do that. Key-value store records can be retrieved as objects, buffers or streams via the respective options, and dataset items can be fetched as individual objects or as serialized data. We plan to add better stream support and async iterators.
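A short sketch of how these conveniences fit together, assuming the client instance from the Quick Start; the buffer and stream options of getRecord() are the respective options mentioned above, so check the API reference for the exact signatures:

// Starts the actor and waits indefinitely for the run to finish.
const run = await client.actor('john-doe/my-cool-actor').call();

const storeClient = client.keyValueStore(run.defaultKeyValueStoreId);
// Fetches the OUTPUT record parsed into an object (the default behavior).
const output = await storeClient.getRecord('OUTPUT');
// Fetches the same record as a raw Buffer or as a readable stream instead.
const outputBuffer = await storeClient.getRecord('OUTPUT', { buffer: true });
const outputStream = await storeClient.getRecord('OUTPUT', { stream: true });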

Usage concepts

The ApifyClient interface follows a generic pattern that is applicable to all of its components. By calling individual methods of ApifyClient, specific clients that target individual API resources are created. There are two types of these clients: a client for managing a single resource, and a client for managing a collection of resources.

const { ApifyClient } = require('apify-client');
const apifyClient = new ApifyClient({ token: 'my-token' });

// Collection clients do not require a parameter.
const actorCollectionClient = apifyClient.actors();
// Creates an actor with the name: my-actor.
const myActor = await actorCollectionClient.create({ name: 'my-actor' });
// Lists all of your actors.
const { items } = await actorCollectionClient.list();
// Collection clients do not require a parameter.
const datasetCollectionClient = apifyClient.datasets();
// Gets (or creates, if it doesn't exist) a dataset with the name of my-dataset.
const myDataset = await datasetCollectionClient.getOrCreate('my-dataset');
// Resource clients accept an ID of the resource.
const actorClient = apifyClient.actor('john-doe/my-actor');
// Fetches the john-doe/my-actor object from the API.
const existingActor = await actorClient.get();
// Starts the run of john-doe/my-actor and returns the Run object.
const myActorRun = await actorClient.start();
// Resource clients accept an ID of the resource.
const datasetClient = apifyClient.dataset('john-doe/my-dataset');
// Appends items to the end of john-doe/my-dataset.
await datasetClient.pushItems([{ foo: 1 }, { bar: 2 }]);

The ID of the resource can be either the ID of the resource itself, or a combination of your username/resource-name.

This is really all you need to remember, because all resource clients follow the pattern you see above.

Nested clients

Sometimes clients return other clients. That's to simplify working with nested collections, such as runs of a given actor.

const actorClient = apifyClient.actor('john-doe/hello-world');
const runsClient = actorClient.runs();
// Lists the last 10 runs of the john-doe/hello-world actor.
const { items } = await runsClient.list({ limit: 10, desc: true });

// Selects the last run of the john-doe/hello-world actor that finished
// with a SUCCEEDED status.
const lastSucceededRunClient = actorClient.lastRun({ status: 'SUCCEEDED' });
// Fetches items from the run's dataset.
const { items: lastRunItems } = await lastSucceededRunClient.dataset().listItems();

The quick access to the dataset and other storages directly from the run client can currently only be used with the lastRun() method, but the feature will be available for all runs in the future.

Pagination

Most methods named list or listSomething return a Promise.<PaginationList>. There are some exceptions though, like listKeys or listHead, which paginate differently. The results you're looking for are always stored under items, and you can use the limit property to get only a subset of results. Other properties are also available, depending on the method.
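A minimal sketch of paging through all items of a dataset using those properties, assuming the client instance from the Quick Start (the page size is arbitrary):

const datasetClient = client.dataset('john-doe/my-dataset');
const pageSize = 1000;
let offset = 0;
let total;

do {
    // Each page is a PaginationList with items, count, offset, limit and total.
    const page = await datasetClient.listItems({ offset, limit: pageSize });
    page.items.forEach((item) => console.log(item));
    total = page.total;
    offset += page.count;
} while (offset < total);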

API Reference

All public classes, methods and their parameters can be inspected in the API reference.

apify-client-js's People

Contributors

0xjgv, andreybykov, b4nan, barjin, drobnikj, fnesveda, foxt451, gippy, github-actions[bot], honzakirchner, honzaturon, jancurn, jbartadev, jirimoravcik, jkuzz, lubos-turek, m-murasovs, metalwarrior665, mnmkng, monkey-denky, mtrunkat, mvolfik, novotnyj, omikader, petrpatek, renovate[bot], tobice, valekjo, vladfrangu, webrdaniel


apify-client-js's Issues

Rate limit errors should be logged in a nicer way

Currently, users see this in the log:

2019-11-15T10:10:44.402Z WARNING: Retry failed 5 times and will be repeated in 20089ms {"originalError":"API request failed on retry number 6",
 "errorDetails":{"url":"http://172.31.56.216:8010/v2/request-queues/YYY/requests",
"method":"POST",
"qs":{"forefront":false,"token":"*********","clientKey":"XXX"},"hasBody":true,"statusCode":429,"iteration":6}}

This looks a bit scary for simple API rate limiting. We should display the error in a nicer way.

Actor run: Empty body causes infinite runs

Tested code:

const ApifyClient = require('apify-client');

const apifyClient = new ApifyClient({
    token: '<token>',
});

(async () => {
    const runOpts = {
        actId: 'drobnikj/rakuten-com',
        waitForFinish: 2,
        body: Buffer.from('', 'utf8'),
        contentType: 'application/json'
    };
    console.log(runOpts)
    const run = await apifyClient.acts.runAct(runOpts);
    console.log(run)
})();

The actor run looks like it failed and the client retries it, but this results in infinitely repeated actor runs, because each retried request actually starts a run successfully.

The output from logs looks like:

{"level":"WARNING","msg":"Retry failed 4 times and will be repeated in 10629ms","originalError":"API request failed on retry number 5","errorDetails":{"url":"https://api.apify.com/v2/acts/drobnikj~rakuten-com/runs","method":"POST","qs":{"waitForFinish":2,"token":"<token>"},"hasBody":true,"error":"Error: Argument error, options.body.","iteration":5}}

Improve error logging

For example, the rate limit exceeded error is displayed like this, which doesn't tell the user anything. We should at least show the message from the API server, and show the full query string instead of qs=[object Object].

2020-02-08T20:00:08.348Z ERROR: The function passed to Apify.main() threw an exception: (error details: type=request-failed, url=http://172.31.57.222:8010/v2/request-queues/XYZ/requests/XYZ, method=GET, qs=[object Object], hasBody=false, error=undefined, statusCode=429, iteration=9)
2020-02-08T20:00:08.350Z   ApifyClientError: API request failed on retry number 9
2020-02-08T20:00:08.351Z     at makeRequest (/usr/src/app/node_modules/apify-client/build/utils.js:187:31)
2020-02-08T20:00:08.352Z     at runMicrotasks (<anonymous>)
2020-02-08T20:00:08.353Z     at processTicksAndRejections (internal/process/task_queues.js:94:5)
2020-02-08T20:00:08.354Z     at async exports.retryWithExpBackoff (/usr/src/app/node_modules/apify-shared/exponential_backoff.js:44:20)
2020-02-08T20:00:08.355Z     at async RequestQueue.getRequest (/usr/src/app/node_modules/apify/build/request_queue.js:331:17)
2020-02-08T20:00:08.356Z     at async RequestQueue.fetchNextRequest (/usr/src/app/node_modules/apify/build/request_queue.js:376:17)
2020-02-08T20:00:08.358Z     at async Promise.all (index 0)
2020-02-08T20:00:08.359Z     at async BasicCrawler._runTaskFunction (/usr/src/app/node_modules/apify/build/crawlers/basic_crawler.js:385:28)
2020-02-08T20:00:08.366Z     at async AutoscaledPool._maybeRunTask (/usr/src/app/node_modules/apify/build/autoscaling/autoscaled_pool.js:463:7)
2020-02-08T20:00:08.591Z npm ERR! code ELIFECYCLE
2020-02-08T20:00:08.592Z npm ERR! errno 91
2020-02-08T20:00:08.594Z npm ERR! [email protected] start: `node src/main.js`
2020-02-08T20:00:08.597Z npm ERR! Exit status 91
2020-02-08T20:00:08.598Z npm ERR! 
2020-02-08T20:00:08.599Z npm ERR! Failed at the [email protected] start script.
2020-02-08T20:00:08.600Z npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
2020-02-08T20:00:08.601Z 
2020-02-08T20:00:08.603Z npm ERR! A complete log of this run can be found in:
2020-02-08T20:00:08.604Z npm ERR!     /root/.npm/_logs/2020-02-08T20_00_08_591Z-debug.log

Add Storage size to dataset API

The web UI on the dataset page provides the "Storage size" but this does not seem to be exposed in the public API.

Please add. Thanks

Improve the documentation

The documentation is in a terrible state; it needs to be greatly improved.

Descriptions of functions and params can be taken straight from the API docs, and each function should link to the corresponding API endpoint. Once we have JSDoc for all functions, we need to write a better intro.

Convert dates from string

Dates in JSON are encoded as strings. It would be great if the client converted these strings to Date objects automatically. We can create a generic transformation function which will look for all properties ending with "At" (e.g. "createdAt", "modifiedAt") and convert the string value (which must be in ISO format) to a Date.
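A minimal sketch of such a transformation function (this is only an illustration of the proposed behavior, not the client's actual code):

// Recursively converts every property ending with "At" from an ISO string to a Date.
function parseDateFields(input) {
    if (Array.isArray(input)) return input.map(parseDateFields);
    if (!input || typeof input !== 'object') return input;
    return Object.fromEntries(Object.entries(input).map(([key, value]) => {
        if (key.endsWith('At') && typeof value === 'string') return [key, new Date(value)];
        return [key, parseDateFields(value)];
    }));
}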

Binary data returned as String from keyValueStores.getRecord()

When I save binary data (a PNG image) using keyValueStores.putRecord() and then read it back using keyValueStores.getRecord(), the data is returned as a String rather than a Buffer.

I'm using the following code in my act:

await Apify.setValue('OUTPUT', buffer, { contentType: 'image/png' });
const val = await Apify.getValue('OUTPUT'); // returns string!

IMHO getRecord() should only convert the data to a string for text/* content types, convert it to a JS object for the application/json content type, and return a raw Buffer for all other content types. And if you specify the useRawBody: true option, it should return a Buffer for everything.
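A rough sketch of the proposed parsing rules (not the actual implementation):

// Illustration of the content-type handling proposed above for getRecord().
function parseRecordBody(body, contentType, useRawBody = false) {
    if (useRawBody) return body; // raw Buffer for everything
    if (/^application\/json/.test(contentType)) return JSON.parse(body.toString('utf8'));
    if (/^text\//.test(contentType)) return body.toString('utf8');
    return body; // any other content type stays a raw Buffer
}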

Intellisense does not work

Hey Guys,
I have tried both v0.6 and v0.10, but I don't get IntelliSense working for the library. I am using VS Code v1.47.3.

Any idea how I can get it working?

The client property on the Apify SDK also points to the apify-client library, and hence even that is not working.

Code coverage generation not working

$ npm run test-cov
...
No coverage information was collected, exit without writing coverage information

It has not worked since the beginning, so it's not a bug we introduced.

Honestly, it's kind of weird:

the test-cov script is npm run build && babel-node node_modules/isparta/bin/isparta cover --report html --report text node_modules/.bin/_mocha, which does not work.

However, if I switch node_modules/.bin/_mocha for my globally installed _mocha, it works perfectly. The version is the same, so I reckon it has something to do with relative paths.

Support parsing of more content-types to string

The Apify SDK now uses a full list of content-type headers when running locally. We should review the list and add more stringifiable types to the parseBody() function. Supposedly, all text/* types should be stringifiable, but we should double-check first.

Cover all crawler endpoints

Client still does not cover these endpoints:

  1. Create crawler
  2. Get crawler settings
  3. Update crawler settings
  4. Delete crawler
  5. Stop execution
  6. Get list of executions
  7. Get last execution
  8. Get last execution results

+ proper unit tests

Convert count, offset, limit, total attributes in PaginationList to Numbers

When I call e.g.:
const results = await Apify.client.crawlers.getListOfExecutions({ crawlerId, limit, offset, desc: 1 });
it returns a PaginationList with the count, offset, limit and total attributes as Strings instead of Numbers:

{
  items: [],
  count: "0",
  offset: "0",
  limit: "1000",
  total: "0"
}

Maybe we can parse it to Numbers.
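A possible fix could coerce the pagination metadata before returning it; a sketch (not the actual client code):

// Converts the string pagination attributes of a PaginationList to Numbers.
const normalizePaginationList = (list) => ({
    ...list,
    count: Number(list.count),
    offset: Number(list.offset),
    limit: Number(list.limit),
    total: Number(list.total),
});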

datasets.getItems() doesn't use exponential backoff on invalid JSON

IMHO this call should be retried too, similarly to the 500 errors from the server.

2018-12-20T01:15:10.899Z SyntaxError: Unexpected end of JSON input
2018-12-20T01:15:10.901Z     at JSON.parse (<anonymous>)
2018-12-20T01:15:10.902Z     at exports.parseBody (/usr/src/app/node_modules/apify-client/build/utils.js:247:25)
2018-12-20T01:15:10.903Z     at parseResponse (/usr/src/app/node_modules/apify-client/build/datasets.js:322:79)
2018-12-20T01:15:10.905Z     at <anonymous>
2018-12-20T01:15:10.907Z     at process._tickCallback (internal/process/next_tick.js:189:7)

getRecord() should auto-parse JSON

getRecord() should automatically parse body if it has JSON content type. The resulting values should look like:

{
  body: Object, // Parsed JSON object, or null (for an unsupported content type)
  rawBody: String, // or Buffer
  contentType: String
}

Use type-is library and */json matcher for content type.
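A possible sketch using the type-is package; the matcher strings below ('json' and '+json') are illustrative and may differ from the */json matcher suggested above:

const typeis = require('type-is');

// Returns true for application/json as well as suffixed types such as application/ld+json.
const isJsonContentType = (contentType) => Boolean(typeis.is(contentType, ['json', '+json']));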

Typescript definitions

I'm struggling to use this library in Typescript, a .d.ts file would be brilliant (but sorry, I don't have the chops to do it myself).

Documentation is missing definition of returned objects

The generated documentation correctly shows the return types of each function, but those return types are not defined anywhere, so they do not show up in the documentation and it's impossible to know what each function returns.

Escape/validate URL parameters

For example, a request to a key-value store with an invalid record key (/v2/key-value-stores/FqL6uryckkkdaXEmr/records/•) causes the error TypeError: Request path contains unescaped characters.
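A simple way to avoid this is to URL-encode the record key before building the request path, roughly:

const storeId = 'FqL6uryckkkdaXEmr';
const recordKey = '•';
// encodeURIComponent makes the key safe to include in the request path.
const path = `/v2/key-value-stores/${storeId}/records/${encodeURIComponent(recordKey)}`;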

Exponential backoff

It would be nice if the client API supported an exponential backoff algorithm with a random wait, to automatically handle rate limiting of the server.
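A minimal sketch of exponential backoff with random jitter (the retry count and base delay are arbitrary):

// Retries an async operation with exponentially growing, randomly jittered delays.
async function retryWithExpBackoff(operation, { maxRetries = 8, baseDelayMillis = 500 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation();
        } catch (error) {
            if (attempt >= maxRetries) throw error;
            const delayMillis = baseDelayMillis * 2 ** attempt * (1 + Math.random());
            await new Promise((resolve) => setTimeout(resolve, delayMillis));
        }
    }
}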

Improve npm publish workflow

  • npm publish should only be allowed when all changes are committed and pushed, to ensure tags in git correspond to published versions
  • when running npm publish from non-master branch, the package should be published with "dev" tag, in order to enable testing of development versions
  • please do the same for apifier-sdk-js package

Improve stack traces of errors

The stack traces currently don't help with debugging, because the package uses Promise chains instead of async/await. For example, the error below doesn't say where in the code the problem occurred. I think we can safely move to async/await, or at least have two builds - one for browsers and one for Node.js.

2019-02-15T15:46:18.433Z ERROR: PhantomCrawler: Unhandled exception (error details: type=invalid-value, statusCode=400)
2019-02-15T15:46:18.435Z   ApifyClientError: Invalid value provided: id is not allowed by the schema
2019-02-15T15:46:18.437Z     at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-15T15:46:18.439Z     at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-15T15:46:18.440Z     at process._tickCallback (internal/process/next_tick.js:68:7)

CORS error in FF using apify-client

Hey guys, I'm getting a CORS error in FF and IE, but not in Chrome. How do I start to resolve this? Thanks.

I’m using the apify-client

I'm passing the user-id and the dataset-id.
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://api.apify.com/v2/datasets/[task id]/items. (Reason: missing token ‘user-agent’ in CORS header ‘Access-Control-Allow-Headers’ from CORS preflight channel).

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://api.apify.com/v2/datasets/[task id]/items. (Reason: CORS request did not succeed).

Doesn't correctly catch empty task Id for task webhooks

If taskId is '', it throws the wrong error:

[ApifyClientError] ApifyClientError: We have bad news: there is no API endpoint at this URL. Did you specify it correctly? (statusCode=404, url="https://api.apify.com/v2/actor-tasks//webhooks", method="GET")

The parameter check for an empty task ID is missing.

crawlers.getExecutionResults() doesn't return Buffer in items field

crawlers.getExecutionResults() doesn't return a Buffer, but a string, which you cannot use, for example, as an Excel attachment.

Example

const ApifyClient = require('apify-client');

// Configuration
const apifyClient = new ApifyClient();

const executionId = 'JdisLYATZZ6puqJXq';

(async () => {
    const getResultsOpts = { executionId, attachment: 1, format: 'xlsx' };
    const results = await apifyClient.crawlers.getExecutionResults(getResultsOpts);
    console.log(typeof results.items) // -> string 
    console.log(Buffer.isBuffer(results.items)) // -> false
})();

Better logging of errors on missed exponential backoff

For example, on RequestQueue throttling errors we get the following message in the log, which doesn't actually say anything about what happened. These errors are incorrectly reported by the server as 502, which will be fixed in an upcoming deploy. But still, we should show more info.

2018-08-11T01:55:13.304Z ApifyError: Server request failed with 9 tries.
2018-08-11T01:55:13.306Z     at Request._request2.default.(anonymous function) [as _callback] (/home/myuser/node_modules/apify-client/build/utils.js:142:35)
2018-08-11T01:55:13.308Z     at Request.self.callback (/home/myuser/node_modules/request/request.js:185:22)
2018-08-11T01:55:13.311Z     at emitTwo (events.js:126:13)
2018-08-11T01:55:13.313Z     at Request.emit (events.js:214:7)
2018-08-11T01:55:13.316Z     at Request.<anonymous> (/home/myuser/node_modules/request/request.js:1157:10)
2018-08-11T01:55:13.318Z     at emitOne (events.js:116:13)
2018-08-11T01:55:13.320Z     at Request.emit (events.js:211:7)
2018-08-11T01:55:13.322Z     at IncomingMessage.<anonymous> (/home/myuser/node_modules/request/request.js:1079:12)
2018-08-11T01:55:13.324Z     at Object.onceWrapper (events.js:313:30)
2018-08-11T01:55:13.326Z     at emitNone (events.js:111:20)
2018-08-11T01:55:13.328Z     at IncomingMessage.emit (events.js:208:7)
2018-08-11T01:55:13.330Z     at endReadableNT (_stream_readable.js:1064:12)
2018-08-11T01:55:13.333Z     at _combinedTickCallback (internal/process/next_tick.js:138:11)
2018-08-11T01:55:13.339Z     at process._tickCallback (internal/process/next_tick.js:180:9)
