A simple Haskell client for the Diffbot API.
The easiest way to install the package and its dependencies is to use the `cabal` command line tool. The Cabal-Install page explains how to use `cabal`.
To install the package, enter the following commands:

    $ git clone https://github.com/tymmym/diffbot.git
    $ cd diffbot
    diffbot $ cabal install
You can also generate the library documentation from the annotated source code using Haddock:

    diffbot $ cabal haddock

Alternatively, you can read it online.
Diffbot uses computer vision, natural language processing and machine learning to automatically recognize and structure specific page-types.
To use the Automatic API, call the `diffbot` function with the following arguments:

| Argument | Description |
|---|---|
| token | Developer token |
| url | URL to process |
| request | API settings |

Here is a full example that sends the default request to the Article API:

    import Diffbot

    main = do
        let token = "11111111111111111111111111111111"
            url = "http://blog.diffbot.com/diffbots-new-product-api-teaches-robots-to-shop-online/"
        resp <- diffbot token url defArticle
        print resp
This code prints information about the primary article content on the submitted page:

    Just fromList [("author",String "John Davi"),("title",String "Diffbot\8217s New Product API Teaches Robots to Shop Online"),...
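
The printed value above is wrapped in `Just`, which suggests the response is a `Maybe` around an aeson `Object`. As a minimal sketch under that assumption, the result can be handled by pattern matching. The helper `summarise` and the hand-built `sample` are hypothetical names used only for illustration, and the reading of `Nothing` as "no usable response" is an assumption rather than documented behaviour:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson (Object, Value (..), object, (.=))

-- | Summarise a response of the shape printed above.
-- Assumption: 'Nothing' means the request produced no usable JSON object.
summarise :: Maybe Object -> String
summarise Nothing    = "no response"
summarise (Just obj) = "response with " ++ show (length obj) ++ " field(s)"

-- Hand-built stand-in for a response; this is not real API output.
sample :: Maybe Object
sample = case object ["author" .= ("John Davi" :: String)] of
    Object o -> Just o
    _        -> Nothing

main :: IO ()
main = do
    putStrLn (summarise Nothing)
    putStrLn (summarise sample)
```

In a real program, `summarise` would be applied to the value bound to `resp` instead of `sample`.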
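You can extract values from the response with a parser, using `parse`, `parseEither` or, as in this example, `parseMaybe` from the aeson package:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson (Object, (.:))
    import Data.Aeson.Types (parseMaybe)

    -- Build "title, by author" from the response, or Nothing if a field is missing.
    getInfo :: Object -> Maybe String
    getInfo resp = flip parseMaybe resp $ \obj -> do
        author <- obj .: "author"
        title <- obj .: "title"
        return $ title ++ ", by " ++ author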
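
When it matters why extraction failed, `parseEither` can be used in place of `parseMaybe`. The following is a minimal sketch that depends only on aeson. `getInfoEither` and `sample` are hypothetical names, the sample value is hand-built rather than real API output, and the function takes a `Value` (checked with `withObject`) instead of an `Object` so that the example stays self-contained:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson (Value, object, withObject, (.:), (.=))
import Data.Aeson.Types (parseEither)

-- | Extract "title, by author", or return aeson's error message on failure.
getInfoEither :: Value -> Either String String
getInfoEither = parseEither $ withObject "article" $ \obj -> do
    author <- obj .: "author"
    title  <- obj .: "title"
    return $ title ++ ", by " ++ author

-- Hand-built stand-in for an Article API response; not real API output.
sample :: Value
sample = object
    [ "author" .= ("John Davi" :: String)
    , "title"  .= ("Example title" :: String)
    ]

main :: IO ()
main = do
    -- Both fields present: yields a Right value.
    print (getInfoEither sample)
    -- "author" is missing: yields a Left with a description of the failure.
    print (getInfoEither (object ["title" .= ("No author here" :: String)]))
```

The exact wording of the `Left` message depends on the aeson version, so it is best treated as diagnostic text rather than matched against.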
You can use the same `diffbot` function to send requests to the other Automatic APIs (Frontpage, Product, Image and Page Classifier), e.g.:

    diffbot token url . setTimeout 15000 $ defFrontPage { frontPageAll = True }
You can also easily create requests to your Custom API: just implement an instance of the `Request` class. See the Article API sources for reference.
Crawlbot allows you to apply either Automatic APIs or your own Custom API to intelligently extract an entire site.
To create a new crawl, use the `crawlbot` function:

    import Diffbot
    import Diffbot.Crawlbot

    main = do
        let token = "11111111111111111111111111111111"
            crawl = defaultCrawl "sampleDiffbotCrawl" ["http://blog.diffbot.com"]
        resp <- crawlbot token $ Create crawl
        print resp
You can also view, pause, restart or delete crawls.
Please consult the library documentation for additional information.
-Initial commit by Tim Tych-