therealpadster / diffbot-api-node
Diffbot-API-Node is a Promise-based library to use the Diffbot REST APIs
License: MIT License
Certain Diffbot endpoints support plain text post:
Possibly add support for this in the future.
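A rough sketch of what a plain text post could look like, if it helps scope this. It assumes the endpoint accepts a POST with a `Content-Type: text/plain` body while still taking `token` and `url` as query params; check that against the current Diffbot docs. `buildTextPostRequest` and its option names are hypothetical and not part of this library. It only builds a request description, so it can be handed to whatever HTTP client the library already uses.

```javascript
// Hypothetical helper, not part of diffbot-api-node.
// Assumption: the endpoint accepts the content as the POST body, with
// token and url still passed in the query string.
function buildTextPostRequest({ token, api = 'article', url, body, contentType = 'text/plain' }) {
  if (!token) throw new Error('missing token');
  if (!url) throw new Error('missing url');
  if (typeof body !== 'string' || body.length === 0) throw new Error('missing body');

  // URLSearchParams takes care of percent-encoding the page URL.
  const query = new URLSearchParams({ token, url });

  return {
    url: `https://api.diffbot.com/v3/${api}?${query.toString()}`,
    method: 'POST',
    headers: { 'Content-Type': contentType },
    body,
  };
}

const request = buildTextPostRequest({
  token: 'YOUR_TOKEN',
  url: 'https://example.com/my-article',
  body: 'Plain text of the article goes here.',
});
console.log(request.method, request.url);
```

Making `contentType` a parameter means the same helper could cover other body types without a second code path.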
Hi!
Firstly, thanks for your work on this project!
Today I used patch-package to patch diffbot-api-node for the project I'm working on.
Paging can be passed as query param to both the analyze & article endpoints, but this client doesn't expose it as an option for analyze.
Here is the diff that solved my problem:
diff --git a/node_modules/diffbot-api-node/src/diffbot.js b/node_modules/diffbot-api-node/src/diffbot.js
index ca19e37..b1bbd80 100644
--- a/node_modules/diffbot-api-node/src/diffbot.js
+++ b/node_modules/diffbot-api-node/src/diffbot.js
@@ -19,6 +19,7 @@ class Diffbot {
* @param {string} [options.mode] By default the Analyze API will fully extract all pages that match an existing Automatic API -- articles, products or image pages. Set mode to a specific page-type (e.g., mode=article) to extract content only from that specific page-type. All other pages will simply return the default Analyze fields.
* @param {string} [options.fallback] Force any non-extracted pages (those with a type of "other") through a specific API. For example, to route all "other" pages through the Article API, pass &fallback=article. Pages that utilize this functionality will return a fallbackType field at the top-level of the response and a originalType field within each extracted object, both of which will indicate the fallback API used.
* @param {string[]} [options.fields] Specify optional fields to be returned from any fully-extracted pages, e.g.: &fields=querystring,links. See available fields within each API's individual documentation pages.
+ * @param {boolean} [options.paging] Pass paging=false to disable automatic concatenation of multiple-page articles. (By default, Diffbot will concatenate up to 20 pages of a single article.)
* @param {boolean} [options.discussion] Pass discussion=false to disable automatic extraction of comments or reviews from pages identified as articles or products. This will not affect pages identified as discussions.
* @param {number} [options.timeout] Sets a value in milliseconds to wait for the retrieval/fetch of content from the requested URL. The default timeout for the third-party response is 30 seconds (30000).
* @param {string} [options.callback] Use for jsonp requests. Needed for cross-domain ajax.
@@ -43,6 +44,9 @@ class Diffbot {
if (options.fields)
diffbot_url += `&fields=${options.fields.join(',')}`;
+ if (options.paging != undefined)
+ diffbot_url += `&paging=${options.paging}`;
+
if (options.discussion != undefined)
diffbot_url += `&discussion=${options.discussion}`;
This issue body was partially generated by patch-package.
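For anyone skimming, here is the behaviour of the patch as a standalone sketch. `appendAnalyzeOptions` is a hypothetical stand-in for the relevant part of the analyze method, and the base URL is only an example. The point is that the loose `!= undefined` check keeps an explicit `false` while skipping both `undefined` and `null`.

```javascript
// Hypothetical stand-in for the option handling touched by the patch;
// the real method in src/diffbot.js builds the rest of the URL itself.
function appendAnalyzeOptions(diffbot_url, options = {}) {
  if (options.fields)
    diffbot_url += `&fields=${options.fields.join(',')}`;

  // Loose inequality: skips undefined and null, but keeps false.
  if (options.paging != undefined)
    diffbot_url += `&paging=${options.paging}`;

  if (options.discussion != undefined)
    diffbot_url += `&discussion=${options.discussion}`;

  return diffbot_url;
}

// Example base URL (assumed shape, token and url already appended).
const base = 'https://api.diffbot.com/v3/analyze?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com';

const withPagingOff = appendAnalyzeOptions(base, { paging: false });
const untouched = appendAnalyzeOptions(base, {});
console.log(withPagingOff);
console.log(untouched);
```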
Looks like Diffbot has a new Event API for scheduled events. Maybe add support for that.
Certain Diffbot endpoints support html post:
Possibly add support for this in the future.
There are some new ones since I last updated them. proxy and proxyAuth seem to be supported on everything, for example.
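Since these two apply everywhere, one shared helper might be simpler than repeating the checks in every method. A minimal sketch, assuming both are plain query params; `appendProxyOptions` is a hypothetical name, and the value formats in the example (host:port, username:password) are assumptions to confirm against the docs.

```javascript
// Hypothetical shared helper, not part of diffbot-api-node.
// Assumption: proxy and proxyAuth are passed as ordinary query params.
function appendProxyOptions(diffbot_url, options = {}) {
  // Values can contain ':' or other reserved characters, so encode them.
  if (options.proxy)
    diffbot_url += `&proxy=${encodeURIComponent(options.proxy)}`;

  if (options.proxyAuth)
    diffbot_url += `&proxyAuth=${encodeURIComponent(options.proxyAuth)}`;

  return diffbot_url;
}

// Example values only; 203.0.113.10 is a documentation-range address.
const proxied = appendProxyOptions('https://api.diffbot.com/v3/article?token=YOUR_TOKEN', {
  proxy: '203.0.113.10:8080',
  proxyAuth: 'user:secret',
});
console.log(proxied);
```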
Not a huge priority since it's currently in beta, but I should be able to add what's currently there.
There are a lot of Crawlbot API params. Add support for them.
The docs pages I was using, at diffbot.com/docs, seem to be outdated compared to the new ones on docs.diffbot.com. Video no longer says beta, there's a new Events (beta) API, and a new Accounts API.
I might want to check if anything in the bigger API has changed, and remove the (beta) label in my readme.
It also might be a good idea to see if any of the other APIs are different in any way.
When I added Event API support, I noticed the docs mentioned custom JS. It looks like the Video API also supports it, and likely others. Look into this and integrate better.
https://docs.diffbot.com/docs/en/api-video#custom-http-headers-and-javascript
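One way this could be integrated is a single helper that turns user options into request headers, shared by every API method. This is only a sketch: the `X-Forward-` prefix and the `X-Evaluate` header name are my reading of the docs page linked above and are kept as overridable defaults because they are assumptions. `buildCustomHeaders` is a hypothetical name.

```javascript
// Hypothetical helper, not part of diffbot-api-node.
// Assumptions: custom headers are forwarded with a prefix, and custom JS
// is sent as a string in its own header. Both names are overridable.
function buildCustomHeaders({
  headers = {},
  customJS,
  forwardPrefix = 'X-Forward-', // assumed prefix
  evaluateHeader = 'X-Evaluate', // assumed header name
} = {}) {
  const out = {};

  for (const [name, value] of Object.entries(headers))
    out[`${forwardPrefix}${name}`] = value;

  if (customJS) {
    // Header values cannot contain newlines, so fold the script onto one
    // line. This breaks scripts that use // line comments.
    out[evaluateHeader] = customJS.replace(/\s*\n\s*/g, ' ').trim();
  }

  return out;
}

const customHeaders = buildCustomHeaders({
  headers: { 'User-Agent': 'my-crawler/1.0' },
  customJS: `function() {
    start();
    end();
  }`,
});
console.log(customHeaders);
```

Keeping this in one place would let each module opt in without duplicating the header logic.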
Have a section for each module that outlines what params are supported. They're mostly the same as the ones from the official Diffbot docs, but some differ, like boolean fields, and not all fields are supported for each module.
Accounts API for retrieving Diffbot account info