
graby's Introduction






Graby helps you extract article content from web pages.

Why this fork?

Full-Text RSS works great as a standalone application, but encapsulating it in your own library is a mess. You end up needing this kind of ugly thing:

$article = 'http://www.bbc.com/news/entertainment-arts-32547474';
$request = 'http://example.org/full-text-rss/makefulltextfeed.php?format=json&url='.urlencode($article);
$result  = @file_get_contents($request);

Also, if you want to understand how things work internally, the code is really hard to read. And finally, there are no tests at all.

That's why I made this fork:

  1. Easier to integrate (using Composer)
  2. Fully tested
  3. (hopefully) better to understand
  4. A bit more decoupled

How to use it

Note: these instructions are for the development version of Graby, whose API is incompatible with the stable version. Please check out the README in the 2.x branch for usage instructions for the stable version.

Requirements

  • PHP >= 7.4
  • Tidy & cURL extensions enabled
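A quick way to check these requirements on a server is a small standalone script. This is a minimal sketch using only PHP built-ins (PHP_VERSION_ID, extension_loaded); the function name missingGrabyRequirements is hypothetical and not part of Graby.

```php
<?php
// Hypothetical helper (not part of Graby): lists which of the documented
// requirements (PHP >= 7.4, Tidy and cURL extensions) are not met.
function missingGrabyRequirements(): array
{
    $missing = [];

    // PHP_VERSION_ID is 70400 for PHP 7.4.0
    if (PHP_VERSION_ID < 70400) {
        $missing[] = 'PHP >= 7.4 (found ' . PHP_VERSION . ')';
    }

    // extension_loaded() takes the extension name in lowercase
    foreach (['tidy', 'curl'] as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = $extension . ' extension';
        }
    }

    return $missing;
}

$missing = missingGrabyRequirements();
echo $missing === [] ? "All requirements met\n" : 'Missing: ' . implode(', ', $missing) . "\n";
```

A missing cURL extension is the kind of problem that otherwise only shows up at fetch time, as a "Call to undefined function ... curl_init()" fatal error.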

Installation

Add the lib using Composer:

composer require 'j0k3r/graby dev-master' php-http/guzzle7-adapter

Why php-http/guzzle7-adapter? Because Graby is decoupled from any HTTP client implementation, thanks to HTTPlug (see the list of client implementations).

Graby is tested & should work great with:

  • Guzzle 7 (using php-http/guzzle7-adapter)
  • Guzzle 5 (using php-http/guzzle5-adapter)
  • cURL (using php-http/curl-client)

Note: if you want to use Guzzle 6, use Graby 2 (support was dropped in v3 because of dependency conflicts, which do not occur with Guzzle 5 🤷)

Retrieve content from a URL

Use the class to retrieve content:

use Graby\Graby;

$article = 'http://www.bbc.com/news/entertainment-arts-32547474';

$graby = new Graby();
$result = $graby->fetchContent($article);

var_dump($result->getResponse()->getStatus()); // 200
var_dump($result->getHtml()); // "[Fetched and readable content…]"
var_dump($result->getTitle()); // "Ben E King: R&B legend dies at 76"
var_dump($result->getLanguage()); // "en-GB"
var_dump($result->getDate()); // "2015-05-01T16:24:37+01:00"
var_dump($result->getAuthors()); // ["BBC News"]
var_dump((string) $result->getResponse()->getEffectiveUri()); // "http://www.bbc.com/news/entertainment-arts-32547474"
var_dump($result->getImage()); // "https://ichef-1.bbci.co.uk/news/720/media/images/82709000/jpg/_82709878_146366806.jpg"
var_dump($result->getSummary()); // "Ben E King received an award from the Songwriters Hall of Fame in …"
var_dump($result->getIsNativeAd()); // false
var_dump($result->getResponse()->getHeaders()); /*
[
  'server' => ['Apache'],
  'content-type' => ['text/html; charset=utf-8'],
  'x-news-data-centre' => ['cwwtf'],
  'content-language' => ['en'],
  'x-pal-host' => ['pal074.back.live.cwwtf.local:80'],
  'x-news-cache-id' => ['13648'],
  'content-length' => ['157341'],
  'date' => ['Sat, 29 Apr 2017 07:35:39 GMT'],
  'connection' => ['keep-alive'],
  'cache-control' => ['private, max-age=60, stale-while-revalidate'],
  'x-cache-action' => ['MISS'],
  'x-cache-age' => ['0'],
  'x-lb-nocache' => ['true'],
  'vary' => ['X-CDN,X-BBC-Edge-Cache,Accept-Encoding'],
]
*/

If an error occurs when fetching the URL, Graby won't throw an exception but will return information about the error (at least the status code):

var_dump($result->getResponse()->getStatus()); // 404
var_dump($result->getHtml()); // "[unable to retrieve full-text content]"
var_dump($result->getTitle()); // "BBC - 404: Not Found"
var_dump($result->getLanguage()); // "en-GB"
var_dump($result->getDate()); // null
var_dump($result->getAuthors()); // []
var_dump((string) $result->getResponse()->getEffectiveUri()); // "http://www.bbc.co.uk/404"
var_dump($result->getImage()); // null
var_dump($result->getSummary()); // "[unable to retrieve full-text content]"
var_dump($result->getIsNativeAd()); // false
var_dump($result->getResponse()->getHeaders()); // […]
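Because failures are reported in the result rather than thrown, calling code has to inspect the result itself. The helper below is a minimal sketch, not part of Graby: the name isFailedExtraction is hypothetical, and treating any status >= 400, or an HTML body equal to the default error_message ('[unable to retrieve full-text content]'), as a failure is an assumption about what your application considers an error.

```php
<?php
// Default value of Graby's 'error_message' option (see the full configuration).
const GRABY_DEFAULT_ERROR_MESSAGE = '[unable to retrieve full-text content]';

// Hypothetical helper (not part of Graby's API): decides whether a fetch
// should be treated as failed, given the status code and extracted HTML.
function isFailedExtraction(int $status, string $html, string $errorMessage = GRABY_DEFAULT_ERROR_MESSAGE): bool
{
    // Any 4xx/5xx status means the page itself could not be retrieved.
    if ($status >= 400) {
        return true;
    }

    // The page was fetched, but extraction may still have fallen back
    // to the configured error message.
    return trim($html) === $errorMessage;
}
```

With a real result you would call it as isFailedExtraction($result->getResponse()->getStatus(), $result->getHtml()), assuming getStatus() returns an int; pass your own value as the third argument if you changed error_message in the configuration.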

The date is returned as it appears in the content. If it is not null, we recommend parsing it with date_parse (this is what Graby uses to validate that the date is correct).
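A minimal sketch of that check, assuming "usable" means date_parse() reports no errors and finds a real calendar date. date_parse() returns false for year, month or day when they are missing, and checkdate() rejects impossible dates such as February 30 (which date_parse() only flags as a warning). The function name isUsableDate is hypothetical and this is not Graby's internal code.

```php
<?php
// Hypothetical helper (not Graby's internal code): accepts a date string only
// when date_parse() reports no errors and finds a valid year/month/day.
function isUsableDate(?string $date): bool
{
    if ($date === null || trim($date) === '') {
        return false;
    }

    $parsed = date_parse($date);

    if ($parsed['error_count'] > 0
        || $parsed['year'] === false
        || $parsed['month'] === false
        || $parsed['day'] === false) {
        return false;
    }

    // Reject dates that parse but do not exist, e.g. 2015-02-30.
    return checkdate($parsed['month'], $parsed['day'], $parsed['year']);
}

var_dump(isUsableDate('2015-05-01T16:24:37+01:00')); // bool(true)
var_dump(isUsableDate(null)); // bool(false)
```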

Retrieve content from a prefetched page

If you want to extract content from a page you fetched outside of Graby, you can call setContentAsPrefetched() before calling fetchContent(), e.g.:

use Graby\Graby;

$article = 'http://www.bbc.com/news/entertainment-arts-32547474';

$input = '<html>[...]</html>';

$graby = new Graby();
$graby->setContentAsPrefetched($input);
$result = $graby->fetchContent($article);

Cleanup content

Since version 1.9.0, you can also send HTML content to be cleaned up the same way Graby cleans content retrieved from a URL. The URL is still needed, for example to convert relative links to absolute ones.

use Graby\Graby;

$article = 'http://www.bbc.com/news/entertainment-arts-32547474';
// use your own way to retrieve html or to provide html
$html = '<html>[...]</html>';

$graby = new Graby();
$result = $graby->cleanupHtml($html, $article);

Use custom handler & formatter to see output log

You can use them to display Graby's output log to the end user. They are intended to be used in a Symfony project with Monolog.

Define the Graby handler service (somewhere in a services.yml file):

services:
    # ...
    graby.log_handler:
        class: Graby\Monolog\Handler\GrabyHandler

Then define the Monolog handler in your app/config/config.yml:

monolog:
    handlers:
        graby:
            type: service
            id: graby.log_handler
            # use "debug" to get a lot of data (like HTML at each step), otherwise "info" is fine
            level: debug
            channels: ['graby']

You can then retrieve logs from graby in your controller using:

$logs = $this->get('monolog.handler.graby')->getRecords();
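Graby's log messages use PSR-3 style placeholders such as {url}, with the actual values stored in the record context (e.g. "Fetching url: {url}" in the logs quoted in the issues below). If you render records yourself, the placeholders need to be interpolated. This is a minimal sketch of the interpolation described in PSR-3, assuming each record is an array with 'message' and 'context' keys (as in Monolog 1.x/2.x; Monolog 3 uses LogRecord objects, and the exact shape returned by GrabyHandler may differ). The name interpolateRecord is hypothetical.

```php
<?php
// Hypothetical helper: replaces {key} placeholders in a record's message
// with the matching values from its context, PSR-3 style.
function interpolateRecord(array $record): string
{
    $replacements = [];

    foreach ($record['context'] ?? [] as $key => $value) {
        if ($value === null || is_scalar($value)
            || (is_object($value) && method_exists($value, '__toString'))) {
            $replacements['{' . $key . '}'] = (string) $value;
        } else {
            // Arrays and non-stringable objects are rendered as JSON.
            $replacements['{' . $key . '}'] = json_encode($value) ?: '';
        }
    }

    return strtr($record['message'], $replacements);
}
```

Under the same assumption, array_map('interpolateRecord', $logs) would give a list of readable lines.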

Timeout configuration

If you need to define a timeout, you must create the Http\Client\HttpClient manually, configure it and inject it into Graby\Graby.

  • For Guzzle 5:

    use Graby\Graby;
    use GuzzleHttp\Client as GuzzleClient;
    use Http\Adapter\Guzzle5\Client as GuzzleAdapter;
    $guzzle = new GuzzleClient([
        'defaults' => [
            'timeout' => 2,
        ]
    ]);
    $graby = new Graby([], new GuzzleAdapter($guzzle));
  • For Guzzle 7:

    use Graby\Graby;
    use GuzzleHttp\Client as GuzzleClient;
    use Http\Adapter\Guzzle7\Client as GuzzleAdapter;
    
    $guzzle = new GuzzleClient([
        'timeout' => 2,
    ]);
    $graby = new Graby([], new GuzzleAdapter($guzzle));

Full configuration

This is the fully documented configuration, which is also the default one.

$graby = new Graby([
    // Enable or disable debugging.
    // This will only generate log information in a file (log/graby.log)
    'debug' => false,
    // use 'debug' value if you want more data (HTML at each step for example) to be dumped in a different file (log/html.log)
    'log_level' => 'info',
    // If enabled, relative URLs found in the extracted content are automatically rewritten as absolute URLs.
    'rewrite_relative_urls' => true,
    // If enabled, we will try to follow single page links (e.g. print view) on multi-page articles.
    // Currently this only happens for sites where single_page_link has been defined
    // in a site config file.
    'singlepage' => true,
    // If enabled, we will try to follow next page links on multi-page articles.
    // Currently this only happens for sites where next_page_link has been defined
    // in a site config file.
    'multipage' => true,
    // Error message when content extraction fails
    'error_message' => '[unable to retrieve full-text content]',
    // Default title used when no title can be extracted
    'error_message_title' => 'No title found',
    // List of URLs (or parts of a URL) which will be accepted.
    // If the list is empty, all URLs (except those specified in the blocked list below)
    // will be permitted.
    // Example: array('example.com', 'anothersite.org');
    'allowed_urls' => [],
    // List of URLs (or parts of a URL) which will not be accepted.
    // Note: this list is ignored if allowed_urls is not empty
    'blocked_urls' => [],
    // If enabled, we'll pass retrieved HTML content through htmLawed with
    // safe flag on and style attributes denied, see
    // http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm#s3.6
    // Note: if enabled this will also remove certain elements you may want to preserve, such as iframes.
    'xss_filter' => true,
    // Here you can define different actions based on the Content-Type header returned by the server.
    // MIME type as key, action as value.
    // Valid actions:
    // * 'exclude' - exclude this item from the result
    // * 'link' - create HTML link to the item
    'content_type_exc' => [
       'application/zip' => ['action' => 'link', 'name' => 'ZIP'],
       'application/pdf' => ['action' => 'link', 'name' => 'PDF'],
       'image' => ['action' => 'link', 'name' => 'Image'],
       'audio' => ['action' => 'link', 'name' => 'Audio'],
       'video' => ['action' => 'link', 'name' => 'Video'],
       'text/plain' => ['action' => 'link', 'name' => 'Plain text'],
    ],
    // How links in the content are handled
    // Valid values:
    // * preserve: nothing is done
    // * footnotes: convert links as footnotes
    // * remove: remove all links
    'content_links' => 'preserve',
    'http_client' => [
        // User-Agent used to fetch content
        'ua_browser' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2',
        // default referer when fetching content
        'default_referer' => 'http://www.google.co.uk/url?sa=t&source=web&cd=1',
        // Currently allows simple string replace of URLs.
        // Useful for rewriting certain URLs to point to a single page or HTML view.
        // Although using the single_page_link site config instruction is the preferred way to do this, sometimes, as
        // with Google Docs URLs, it's not possible.
        'rewrite_url' => [
            'docs.google.com' => ['/Doc?' => '/View?'],
            'tnr.com' => ['tnr.com/article/' => 'tnr.com/print/article/'],
            '.m.wikipedia.org' => ['.m.wikipedia.org' => '.wikipedia.org'],
            'm.vanityfair.com' => ['m.vanityfair.com' => 'www.vanityfair.com'],
        ],
        // Prevent certain file/mime types
        // HTTP responses which match these content types will
        // be returned without body.
        'header_only_types' => [
           'image',
           'audio',
           'video',
        ],
        // URLs ending with one of these extensions will
        // prompt Humble HTTP Agent to send a HEAD request first
        // to see if the returned content type matches header_only_types.
        'header_only_clues' => ['mp3', 'zip', 'exe', 'gif', 'gzip', 'gz', 'jpeg', 'jpg', 'mpg', 'mpeg', 'png', 'ppt', 'mov'],
        // User Agent strings - mapping domain names
        'user_agents' => [],
        // AJAX triggers to search for.
        // for AJAX sites, e.g. Blogger with its dynamic views templates.
        'ajax_triggers' => [
            "<meta name='fragment' content='!'",
            '<meta name="fragment" content="!"',
            "<meta content='!' name='fragment'",
            '<meta content="!" name="fragment"',
        ],
        // Number of redirects allowed before we assume the request won't complete
        'max_redirect' => 10,
    ],
    'extractor' => [
        'default_parser' => 'libxml',
        // key is fingerprint (fragment to find in HTML)
        // value is host name to use for site config lookup if fingerprint matches
        // \s* matches any whitespace, INCLUDING new lines
        'fingerprints' => [
            '/\<meta\s*content=([\'"])blogger([\'"])\s*name=([\'"])generator([\'"])/i' => 'fingerprint.blogspot.com',
            '/\<meta\s*name=([\'"])generator([\'"])\s*content=([\'"])Blogger([\'"])/i' => 'fingerprint.blogspot.com',
            '/\<meta\s*name=([\'"])generator([\'"])\s*content=([\'"])WordPress/i' => 'fingerprint.wordpress.com',
        ],
        'config_builder' => [
            // Directory path to the site config folder WITHOUT trailing slash
            'site_config' => [],
            'hostname_regex' => '/^(([a-zA-Z0-9-]*[a-zA-Z0-9])\.)*([A-Za-z0-9-]*[A-Za-z0-9])$/',
        ],
        'readability' => [
            // filters might be like array('regex' => 'replace with')
            // for example, to remove script content: array('!<script[^>]*>(.*?)</script>!is' => '')
            'pre_filters' => [],
            'post_filters' => [],
        ],
        'src_lazy_load_attributes' => [
            'data-src',
            'data-lazy-src',
            'data-original',
            'data-sources',
            'data-hi-res-src',
        ],
        // these JSON-LD types will be ignored
        'json_ld_ignore_types' => ['Organization', 'WebSite', 'Person', 'VideoGame'],
    ],
]);
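The allowed_urls / blocked_urls comments above describe a precedence rule that is easy to get backwards: a non-empty allow list makes the block list irrelevant. The function below is a hypothetical sketch of those documented semantics, not Graby's own implementation; the name isUrlPermitted and the case-insensitive substring match are assumptions.

```php
<?php
// Hypothetical sketch (not Graby's implementation) of the documented
// allowed_urls / blocked_urls rules. Entries are URLs or parts of a URL,
// matched here as case-insensitive substrings (an assumption).
function isUrlPermitted(string $url, array $allowedUrls, array $blockedUrls): bool
{
    // A non-empty allow list wins: blocked_urls is ignored entirely.
    if ($allowedUrls !== []) {
        foreach ($allowedUrls as $fragment) {
            if (stripos($url, $fragment) !== false) {
                return true;
            }
        }

        return false;
    }

    // Empty allow list: everything is permitted unless it matches a blocked entry.
    foreach ($blockedUrls as $fragment) {
        if (stripos($url, $fragment) !== false) {
            return false;
        }
    }

    return true;
}
```

For example, with 'allowed_urls' => ['example.com', 'anothersite.org'], only URLs containing one of those fragments pass, whatever blocked_urls contains.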

Credits

graby's People

Contributors

aaa2000, caneco, girishpanchal30, gitter-badger, holgerausb, j0k3r, jtojnar, kdecherf, nicosomb, phiamo, shtrom, simounet, tacman, tcitworld, techexo, vendin, zyuhel


graby's Issues

Need for a tool to quickly test site configs

Hello,

I do not know where to post this, because it concerns graby, graby-site-config and wallabag.
I was wondering if there was a way to have a small "standalone" version of graby that would read the config files without caching anything and return the content.

Basically, I am trying to help fivefilters (and thus graby-site-config) write new config files, but doing it with wallabag running on a not-so-powerful server is really painful. Each time I make a change in the configuration files, I have to clear wallabag's cache, which takes quite long (1-2 minutes on a Cubietruck!), then delete the article and submit it to wallabag again. The whole process can take a few minutes, even when the issue was just a missing comma :(

Unfortunately, I am hopeless at coding anything in PHP... Ideally, this would be a PHP file that, without any cache, reads just one config file (or a specified config file) on the fly and, given a URL, displays the content without any stylesheet (thus quickly showing what the titles, paragraphs and so on are).

Thanks in advance, and do not hesitate to ask further details if needed.

Regards

Attempted to call function "curl_init" from namespace "Graby\Ring\Client".

From @metasystem on November 10, 2015 16:3

Hi,
I just installed wallabag and have an issue on Debian Jessie:

INFO - Matched route "new_entry".
DEBUG - Read existing security token from the session.
DEBUG - SELECT t0.username AS username_1, t0.username_canonical AS username_canonical_2, t0.email AS email_3, t0.email_canonical AS email_canonical_4, t0.enabled AS enabled_5, t0.salt AS salt_6, t0.password AS password_7, t0.last_login AS last_login_8, t0.locked AS locked_9, t0.expired AS expired_10, t0.expires_at AS expires_at_11, t0.confirmation_token AS confirmation_token_12, t0.password_requested_at AS password_requested_at_13, t0.roles AS roles_14, t0.credentials_expired AS credentials_expired_15, t0.credentials_expire_at AS credentials_expire_at_16, t0.id AS id_17, t0.name AS name_18, t0.created_at AS created_at_19, t0.updated_at AS updated_at_20, t0.authCode AS authCode_21, t0.twoFactorAuthentication AS twoFactorAuthentication_22, t0.trusted AS trusted_23, t24.id AS id_25, t24.theme AS theme_26, t24.items_per_page AS items_per_page_27, t24.language AS language_28, t24.rss_token AS rss_token_29, t24.rss_limit AS rss_limit_30, t24.user_id AS user_id_31 FROM wallabaguser t0 LEFT JOIN wallabagconfig t24 ON t24.user_id = t0.id WHERE t0.id = ?
DEBUG - User was reloaded from a user provider.
DEBUG - Notified event "kernel.request" to listener "Nelmio\CorsBundle\EventListener\CorsListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\DebugHandlersListener::configure".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\ProfilerListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\DumpListener::configure".
DEBUG - Notified event "kernel.request" to listener "Symfony\Bundle\FrameworkBundle\EventListener\SessionListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\FragmentListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\RouterListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Wallabag\CoreBundle\EventListener\LocaleListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\LocaleListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "FOS\RestBundle\EventListener\BodyListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\TranslatorListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\Security\Http\Firewall::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Bundle\AsseticBundle\EventListener\RequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Nelmio\ApiDocBundle\EventListener\RequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Liip\ThemeBundle\EventListener\ThemeRequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Scheb\TwoFactorBundle\Security\TwoFactor\EventListener\RequestListener::onCoreRequest".
DEBUG - Notified event "kernel.controller" to listener "FOS\RestBundle\EventListener\ParamFetcherListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Symfony\Bundle\FrameworkBundle\DataCollector\RouterDataCollector::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Symfony\Component\HttpKernel\DataCollector\RequestDataCollector::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\ControllerListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\ParamConverterListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\HttpCacheListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\SecurityListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "FOS\RestBundle\EventListener\ViewResponseListener::onKernelController".
DEBUG - Graby is ready to fetch
DEBUG - Fetching url: {url}
DEBUG - Trying using method "{method}" on url "{url}"
CRITICAL - Fatal Error: Call to undefined function Graby\Ring\Client\curl_init()
CRITICAL - Uncaught PHP Exception Symfony\Component\Debug\Exception\UndefinedFunctionException: "Attempted to call function "curl_init" from namespace "Graby\Ring\Client"." at /root/wallabag/vendor/j0k3r/graby/src/Ring/Client/SafeCurlHandler.php line 49
DEBUG - Notified event "kernel.request" to listener "Nelmio\CorsBundle\EventListener\CorsListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\DebugHandlersListener::configure".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\ProfilerListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\DumpListener::configure".
DEBUG - Notified event "kernel.request" to listener "Symfony\Bundle\FrameworkBundle\EventListener\SessionListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\FragmentListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\RouterListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Wallabag\CoreBundle\EventListener\LocaleListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\LocaleListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "FOS\RestBundle\EventListener\BodyListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\HttpKernel\EventListener\TranslatorListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Component\Security\Http\Firewall::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Symfony\Bundle\AsseticBundle\EventListener\RequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Nelmio\ApiDocBundle\EventListener\RequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Liip\ThemeBundle\EventListener\ThemeRequestListener::onKernelRequest".
DEBUG - Notified event "kernel.request" to listener "Scheb\TwoFactorBundle\Security\TwoFactor\EventListener\RequestListener::onCoreRequest".
DEBUG - Notified event "kernel.controller" to listener "FOS\RestBundle\EventListener\ParamFetcherListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Symfony\Bundle\FrameworkBundle\DataCollector\RouterDataCollector::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Symfony\Component\HttpKernel\DataCollector\RequestDataCollector::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\ControllerListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\ParamConverterListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\HttpCacheListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "Sensio\Bundle\FrameworkExtraBundle\EventListener\SecurityListener::onKernelController".
DEBUG - Notified event "kernel.controller" to listener "FOS\RestBundle\EventListener\ViewResponseListener::onKernelController".
INFO - Defining the initRuntime() method in the "form" extension is deprecated. Use the needs_environment option to get the Twig_Environment instance in filters, functions, or tests; or explicitly implement Twig_Extension_InitRuntimeInterface if needed (not recommended).
INFO - Defining the getGlobals() method in the "assetic" extension is deprecated without explicitly implementing Twig_Extension_GlobalsInterface.

Installed with pdo_mysql

Copied from original issue: wallabag/wallabag#1511

Can't extract content from edition.cnn.com article. Meta refresh tags were not replaced

grabby.php:

<?php
use Graby\Graby;

require(__DIR__ . '/vendor/autoload.php');
require(__DIR__ . '/src/Graby.php');

$url = 'http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html';
$graby = new Graby(['debug' => true]);
$result = $graby->fetchContent($url);
print_r($result);

Command

php ./grabby.php

returns

Array
(
    [status] => 310
    [html] => [unable to retrieve full-text content]
    [title] => No title found
    [language] =>
    [date] =>
    [authors] => Array
        (
        )

    [url] => http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html
    [content_type] =>
    [open_graph] => Array
        (
        )

    [native_ad] =>
    [all_headers] => Array
        (
        )

    [summary] => [unable to retrieve full-text content]
)

graby.log

[2018-02-25 12:51:35] graby.DEBUG: Graby is ready to fetch [] []
[2018-02-25 12:51:35] graby.DEBUG: . looking for site config for {host} in primary folder {"host":"edition.cnn.com"} []
[2018-02-25 12:51:35] graby.DEBUG: ... found site config {host} {"host":"edition.cnn.com.txt"} []
[2018-02-25 12:51:35] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-25 12:51:35] graby.DEBUG: . looking for site config for {host} in primary folder {"host":"global"} []
[2018-02-25 12:51:35] graby.DEBUG: ... found site config {host} {"host":"global.txt"} []
[2018-02-25 12:51:35] graby.DEBUG: Cached site config with key: {key} {"key":"edition.cnn.com"} []
[2018-02-25 12:51:35] graby.DEBUG: . looking for site config for {host} in primary folder {"host":"global"} []
[2018-02-25 12:51:35] graby.DEBUG: ... found site config {host} {"host":"global.txt"} []
[2018-02-25 12:51:35] graby.DEBUG: Appending site config settings from global.txt [] []
[2018-02-25 12:51:35] graby.DEBUG: Cached site config with key: {key} {"key":"global"} []
[2018-02-25 12:51:35] graby.DEBUG: Cached site config with key: {key} {"key":"edition.cnn.com.merged"} []
[2018-02-25 12:51:35] graby.DEBUG: Fetching url: {url} {"url":"http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html"} []
[2018-02-25 12:51:35] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html"} []
[2018-02-25 12:51:35] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html"} []
[2018-02-25 12:51:35] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html"} []
[2018-02-25 12:51:36] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.67.1/static/unsupp.html [] []
[2018-02-25 12:51:36] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:36] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:36] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:37] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/ [] []
[2018-02-25 12:51:37] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:37] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:37] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:37] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.67.1/static/unsupp.html [] []
[2018-02-25 12:51:37] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:37] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:37] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:38] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/ [] []
[2018-02-25 12:51:38] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:38] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:38] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:38] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.67.1/static/unsupp.html [] []
[2018-02-25 12:51:38] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:38] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:38] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:39] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/ [] []
[2018-02-25 12:51:39] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:39] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:39] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:39] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.67.1/static/unsupp.html [] []
[2018-02-25 12:51:39] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:39] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:39] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:40] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/ [] []
[2018-02-25 12:51:40] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:40] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:40] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:40] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/2.67.1/static/unsupp.html [] []
[2018-02-25 12:51:40] graby.DEBUG: Trying using method "{method}" on url "{url}" {"method":"get","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:40] graby.DEBUG: Use default user-agent "{user-agent}" for url "{url}" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:40] graby.DEBUG: Use default referer "{referer}" for url "{url}" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://edition.cnn.com/2.67.1/static/unsupp.html"} []
[2018-02-25 12:51:41] graby.DEBUG: Meta refresh redirect found (http-equiv="refresh"), new URL: https://edition.cnn.com/ [] []
[2018-02-25 12:51:41] graby.DEBUG: Endless redirect: 11 on "{url}" {"url":"https://edition.cnn.com/"} []
[2018-02-25 12:51:41] graby.DEBUG: Opengraph data: {ogData} {"ogData":[]} []
[2018-02-25 12:51:41] graby.DEBUG: Looking for site config files to see if single page link exists [] []
[2018-02-25 12:51:41] graby.DEBUG: Returning cached and merged site config for {host} {"host":"edition.cnn.com"} []
[2018-02-25 12:51:41] graby.DEBUG: No "single_page_link" config found [] []
[2018-02-25 12:51:41] graby.DEBUG: Attempting to extract content [] []
[2018-02-25 12:51:41] graby.DEBUG: Returning cached and merged site config for {host} {"host":"edition.cnn.com"} []
[2018-02-25 12:51:41] graby.DEBUG: Strings replaced: {count} (find_string and/or replace_string) {"count":0} []
[2018-02-25 12:51:41] graby.DEBUG: Attempting to parse HTML with {parser} {"parser":"libxml"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for title {"pattern":"//meta[@property=\"og:title\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for date {"pattern":"//meta[@property=\"article:published_time\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for language {"pattern":"//html[@lang]/@lang"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for language {"pattern":"//meta[@name=\"DC.language\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {string} to strip element {"string":"highlights"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for body (content length: {content_length}) {"pattern":"//section[contains(@class, 'body-text')]","content_length":232} []
[2018-02-25 12:51:41] graby.DEBUG: Using Readability [] []
[2018-02-25 12:51:41] graby.DEBUG: Detected title: {title} {"title":""} []
[2018-02-25 12:51:41] graby.DEBUG: Trying again without tidy [] []
[2018-02-25 12:51:41] graby.DEBUG: Strings replaced: {count} (find_string and/or replace_string) {"count":0} []
[2018-02-25 12:51:41] graby.DEBUG: Attempting to parse HTML with {parser} {"parser":"libxml"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for title {"pattern":"//meta[@property=\"og:title\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for date {"pattern":"//meta[@property=\"article:published_time\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for language {"pattern":"//html[@lang]/@lang"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for language {"pattern":"//meta[@name=\"DC.language\"]/@content"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {string} to strip element {"string":"highlights"} []
[2018-02-25 12:51:41] graby.DEBUG: Trying {pattern} for body (content length: {content_length}) {"pattern":"//section[contains(@class, 'body-text')]","content_length":154} []
[2018-02-25 12:51:41] graby.DEBUG: Using Readability [] []
[2018-02-25 12:51:41] graby.DEBUG: Detected title: {title} {"title":""} []
[2018-02-25 12:51:41] graby.DEBUG: Success ? {is_success} {"is_success":false} []
[2018-02-25 12:51:41] graby.DEBUG: Extract failed [] []

edition.cnn.com.txt:

body: //section[contains(@class, 'body-text')]

strip_id_or_class: highlights

# Avoid redirecting to 'unsupported browser' page
find_string: <meta http-equiv="refresh"
replace_string: <meta norefresh

test_url: http://edition.cnn.com/2012/05/13/us/new-york-police-policy/index.html
test_contains: this discriminatory and ineffective practice

test_url: http://rss.cnn.com/rss/edition.rss
test_url: http://rss.cnn.com/rss/edition_technology.rss

The other websites I checked work correctly.
The issue seems to be caused by the IE conditional comments (<!--[if IE]> tags) not being stripped from the page and the <meta refresh> tags not being replaced.
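The log order suggests why the site config's find_string does not help here. The meta refresh redirects are followed while the page is fetched (until "Endless redirect: 11"), and the string replacement only runs later, during extraction ("Strings replaced: 0"). Below is a minimal sketch of the pre-processing suggested above. It assumes the offending refresh tag sits inside an IE conditional comment, which the log does not show, and strips those comments from the raw HTML before any meta refresh detection. The function name is hypothetical and not part of Graby.

```php
<?php
// Hypothetical helper (not part of Graby): remove Internet Explorer conditional
// comments such as <!--[if lt IE 9]> ... <![endif]--> from raw HTML, so a
// <meta http-equiv="refresh"> hidden inside them is never seen.
function stripIeConditionalComments(string $html): string
{
    // Non-greedy ".*?" removes each conditional block on its own;
    // "s" lets "." cross newlines, "i" ignores case in "if"/"endif".
    return (string) preg_replace('/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', '', $html);
}

$page = '<head><!--[if lt IE 9]><meta http-equiv="refresh" content="0;url=/unsupp.html"><![endif]--><title>CNN</title></head>';
echo stripIeConditionalComments($page); // <head><title>CNN</title></head>
```

Note that this pattern also removes "downlevel-revealed" blocks (<!--[if !IE]><!--> ... <!--<![endif]-->), which non-IE browsers do render. A real fix would need to decide how to treat those.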

Uncaught PHP Exception Exception: Url is not valid

Hi,

In wallabag v2.0.0, I see the following error when importing a JSON file:

[2016-04-06 23:52:33] request.CRITICAL: Uncaught PHP Exception Exception: "Url "http://www.pro-linux.de/news/1/23430/linus-torvalds-über-das-internet-der-dinge.html" is not valid." at wallabag/vendor/j0k3r/graby/src/Graby.php line 388 {"exception":"[object] (Exception(code: 0): Url \"http://www.pro-linux.de/news/1/23430/linus-torvalds-über-das-internet-der-dinge.html\" is not valid. at wallabag/vendor/j0k3r/graby/src/Graby.php:388)"} []

I have no idea why this URL is not valid. Do you?

Edit: maybe because of the umlaut "ü"?
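One likely explanation assumes the check at Graby.php line 388 relies on PHP's filter_var() with FILTER_VALIDATE_URL. That filter only accepts ASCII characters, so a raw "ü" in the path makes validation fail. A minimal sketch of a workaround percent-encodes each path segment before validating. encodeUrlPath() is a hypothetical helper, not part of Graby, and it does not convert internationalized host names.

```php
<?php
// Hypothetical helper (not part of Graby): percent-encode every path segment so
// that non-ASCII characters such as "ü" become ASCII octets (%C3%BC).
function encodeUrlPath(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    // Decode first so already-encoded segments are not double-encoded.
    $segments = array_map(
        fn (string $segment): string => rawurlencode(rawurldecode($segment)),
        explode('/', $parts['path'] ?? '')
    );

    $auth = '';
    if (isset($parts['user'])) {
        $auth = $parts['user'] . (isset($parts['pass']) ? ':' . $parts['pass'] : '') . '@';
    }

    return $parts['scheme'] . '://' . $auth . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . implode('/', $segments)
        . (isset($parts['query']) ? '?' . $parts['query'] : '')
        . (isset($parts['fragment']) ? '#' . $parts['fragment'] : '');
}

$url = 'http://www.pro-linux.de/news/1/23430/linus-torvalds-über-das-internet-der-dinge.html';
var_dump(filter_var($url, FILTER_VALIDATE_URL));                // bool(false)
var_dump(filter_var(encodeUrlPath($url), FILTER_VALIDATE_URL)); // the encoded URL
```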

Is there a way to strip all inline styles?

Hi,

I'm trying to figure out whether there is a way to strip all inline styles. I'm debating whether to fork and add this functionality here, or to post-process the HTML after running it through Graby.

Joe
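Here is a minimal sketch of the post-processing option. It runs only PHP's DOM extension on the HTML that Graby returns. stripInlineStyles() is a hypothetical helper, not a Graby setting. It removes every style attribute and leaves all other markup untouched.

```php
<?php
// Hypothetical post-processing helper (not part of Graby): remove every
// inline style="..." attribute from an HTML fragment.
function stripInlineStyles(string $html): string
{
    $doc = new DOMDocument();
    // Real-world HTML triggers libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    // The meta tag makes libxml read the fragment as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath results are a snapshot, so removing attributes while looping is safe.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@style') as $attr) {
        $attr->ownerElement->removeAttribute('style');
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo stripInlineStyles('<p style="color:red">Hi <span style="font-weight:bold">there</span></p>');
```

Post-processing keeps Graby untouched, at the cost of parsing the HTML a second time.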

linuxjournal.com multi-page fetches only first page

I am posting this issue here because I think it is a bug in Graby. The site config appears to be valid.

When adding http://www.linuxjournal.com/content/papas-got-brand-new-nas to wallabag, content fetching takes a very long time. Adding this single page makes wallabag's prod.log grow by 2.4 MB.

It only happens with multi-page articles on linuxjournal.

What I observe:

  • wallabag takes several minutes to fetch this article
  • after fetching finishes, only the first page of the article is in wallabag
  • the article in wallabag ends with the sentence: This article appears to continue on subsequent pages which we could not extract

I have attached the log: prod.txt

From a quick first look at the log, Graby seems to have a problem with the URL of the next page, because it contains an unusual character (maybe the comma?).

Use background-image as content

Hi there,
One site I read uses background-image on links to display images. In wallabag, the content is empty. Is there a way to fix that? I told them not to do this, but you know how it goes…

<a href="/image.jpg" style="background-image: url('/image.jpg');"><span></span></a>
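One possible workaround is sketched below as a hypothetical pre-processing step, not an existing Graby feature. For each element whose style declares a background-image, it appends an <img> pointing at the same URL, so the image survives even when inline styles are dropped. The regular expression only handles the simple url(...) form shown above.

```php
<?php
// Hypothetical pre-processing helper (not part of Graby): make images that are
// only declared as CSS background-image visible as real <img> elements.
function backgroundImagesToImg(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[contains(@style, "background-image")]') as $element) {
        // Capture the URL inside url(...), with or without quotes.
        $pattern = '/background-image\s*:\s*url\(\s*([\'"]?)(.*?)\1\s*\)/i';
        if (preg_match($pattern, $element->getAttribute('style'), $m) === 1 && $m[2] !== '') {
            $img = $doc->createElement('img');
            $img->setAttribute('src', $m[2]);
            $element->appendChild($img);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo backgroundImagesToImg(
    '<a href="/image.jpg" style="background-image: url(\'/image.jpg\');"><span></span></a>'
);
```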

Can't install via composer

Output:

~/workspace $ composer require j0k3r/graby
Using version ^1.12 for j0k3r/graby
./composer.json has been created
Loading composer repositories with package information
Updating dependencies (including require-dev)
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - Installation request for j0k3r/graby ^1.12 -> satisfiable by j0k3r/graby[1.12.0].
    - j0k3r/graby 1.12.0 requires htmlawed/htmlawed dev-master -> satisfiable by htmlawed/htmlawed[dev-master] but these conflict with your requirements or minimum-stability.


Installation failed, deleting ./composer.json

Switch to newer HTMLawed

The htmlawed/htmlawed Composer package is not compatible with PHP 7.2, and its maintainer is unresponsive. Since PHP 7.2 has now been released, I forked the package. I can make you a co-owner of the repo if you wish.

Tests\Graby\GrabyFunctionalTest::testDate is failing with incorrect date

On master:

2) Tests\Graby\GrabyFunctionalTest::testDate with data set #1 ('https://www.reddit.com/r/Linu...guide/', '2013-05-30T16:01:58+00:00')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-2013-05-30T16:01:58+00:00
+2013-05-30T16:10:50+00:00

The actual date seems to be that of the first comment below the main content.

Problem with escaped fragment when fetching some websites

Hello!

I come from wallabag, which uses this project.

I have a problem retrieving content from a web page, because Graby automatically appends ?_escaped_fragment_= to the URL for AJAX crawling purposes.
That's a problem because the website in question returns a 404 error when it detects this escaped fragment, probably to avoid being fetched by robots.

Still, the content seems to be accessible without the fragment.

One solution would be to fetch the URL again without the escaped fragment when a 404 error is returned.

Here is the website; you can test it with and without the escaped fragment:
https://dzone.com/
https://dzone.com/?_escaped_fragment_=

Thank you!

Antonin
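The fallback proposed above can be sketched as follows. Both functions are hypothetical and not part of Graby. The fetcher is passed in as a callable returning [status, body], so the sketch does not assume any particular HTTP client.

```php
<?php
// Hypothetical helper (not part of Graby): remove the "_escaped_fragment_"
// parameter from a URL's query string, keeping every other parameter.
function withoutEscapedFragment(string $url): string
{
    $queryStart = strpos($url, '?');
    if ($queryStart === false) {
        return $url;
    }
    $hashStart = strpos($url, '#', $queryStart);
    $base = substr($url, 0, $queryStart);
    $query = $hashStart === false
        ? substr($url, $queryStart + 1)
        : substr($url, $queryStart + 1, $hashStart - $queryStart - 1);
    $hash = $hashStart === false ? '' : substr($url, $hashStart);

    // Keep every non-empty "key=value" pair whose key is not _escaped_fragment_.
    $kept = array_filter(
        explode('&', (string) $query),
        fn (string $pair): bool => $pair !== '' && explode('=', $pair, 2)[0] !== '_escaped_fragment_'
    );

    return $base . ($kept === [] ? '' : '?' . implode('&', $kept)) . $hash;
}

// Hypothetical fallback: retry without the escaped fragment when the site answers 404.
// $fetch is any callable taking a URL and returning [int $status, string $body].
function fetchWithFallback(string $url, callable $fetch): array
{
    [$status, $body] = $fetch($url);
    $retryUrl = withoutEscapedFragment($url);
    if ($status === 404 && $retryUrl !== $url) {
        [$status, $body] = $fetch($retryUrl);
    }

    return [$status, $body];
}
```

Retrying only on 404 keeps the escaped-fragment behaviour for AJAX sites that rely on it, and costs one extra request for sites like the one above.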

Parser: add support for list's start attribute

From @pVesian on June 7, 2017 8:14

Issue details

In HTML, ordered lists can have a "start" attribute that sets the number the list starts from, instead of the default (1).

Environment

Wallabag.it & f43.me

Steps to reproduce/test case

Store this article: http://www.timothysykes.com/blog/10-things-know-short-selling/
Scroll to "“Called out” or “Buy in”": it is point 4 in the article, but point 1 in the stored article. Inspecting the HTML shows that the parser removes the "start" attribute.

Thanks

Copied from original issue: wallabag/wallabag#3185

Links get removed

For this URL: https://www.washingtonpost.com/world/national-security/trump-to-meet-russian-foreign-minister-at-the-white-house-as-moscows-alleged-election-interference-is-back-in-spotlight/2017/05/10/c6717e4c-34f3-11e7-b412-62beef8121f7_story.html

The byline at the start of the content is converted to:

<h3>By  and ,</h3>

When the original content is:

<span class="pb-byline" itemprop="author" itemscope="" itemtype="http://schema.org/Person">
    By
    <a href="https://www.washingtonpost.com/people/carol-morello/">
        <span itemprop="name">Carol Morello</span>
    </a> 
    and 
    <a href="https://www.washingtonpost.com/people/greg-miller/">
        <span itemprop="name">Greg Miller</span>
    </a>
</span>

Allow setting custom logger

The default log file is located in the Composer vendor directory, which should not even be writable. For greater flexibility, it should be possible to set a custom logger, similar to what php-readability allows.

Couldn't fetch Readability\JSLikeHTMLElement

In wallabag, when I try to add this URL http://www.journaldugamer.com/tests/rencontre-ils-bossaient-sur-une-exclu-kinect-qui-ne-sortira-jamais/, I get this error in my logs:

[2016-12-06 22:43:06] app.ERROR: Error while saving an entry {"exception":"[object] (Symfony\\Component\\Debug\\Exception\\ContextErrorException(code: 0): Warning: DOMDocument::importNode(): Couldn't fetch Readability\\JSLikeHTMLElement at /var/www/wallabag/vendor/j0k3r/graby/src/Graby.php:297)","entry":"[object] (Wallabag\\CoreBundle\\Entity\\Entry: {})"} []

Set default title if empty

In wallabag/wallabag#1632, Graby seems to return an empty string for the title. This case should be covered by a test, and a default title should be used instead.

array(8) {
  ["status"]=>
  int(500)
  ["html"]=>
  string(38) "[unable to retrieve full-text content]"
  ["title"]=>
  string(0) ""
  ["language"]=>
  NULL
  ["url"]=>
  string(86) "https://sulek.fr/index.php?article60/configuration-ipv6-pour-une-dedibox-sous-centos-7"
  ["content_type"]=>
  string(0) ""
  ["open_graph"]=>
  array(0) {
  }
  ["summary"]=>
  string(38) "[unable to retrieve full-text content]"
}
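The sketch below shows one such default. The issue does not say what the default should be, so this assumes the URL's host name is an acceptable fallback title, with a fixed label as the last resort. titleOrDefault() is hypothetical and not part of Graby.

```php
<?php
// Hypothetical helper (not part of Graby): use the extracted title when it is
// non-empty, otherwise fall back to the URL's host, then to a fixed label.
function titleOrDefault(?string $title, string $url, string $label = 'Untitled'): string
{
    if ($title !== null && trim($title) !== '') {
        return trim($title);
    }

    $host = parse_url($url, PHP_URL_HOST);

    return is_string($host) && $host !== '' ? $host : $label;
}

echo titleOrDefault('', 'https://sulek.fr/index.php?article60/configuration-ipv6-pour-une-dedibox-sous-centos-7');
// sulek.fr
```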

Add support for httplug

Instead of relying on Guzzle 5 and locking dependencies to that version (see #8), it would be better to add support for HTTPlug, so that multiple Guzzle versions (or even other HTTP libraries) can be supported.

XPath used twice doesn't work

For the mobile.twitter.com configuration, I want to do this:

title: (//div[contains(@class, 'TweetDetail-text') or contains(@class, 'tweet-text')])[1]
author: (//*[contains(@class, 'UserNames-displayName') or contains(@class, 'fullname')])[1]
body: (//div[contains(@class, 'TweetDetail-text') or contains(@class, 'tweet-text')])[1]
date: (//div[contains(@class, 'TweetDetail-timeAndGeo') or contains(@class, 'metadata')])[1]

I use the same XPath for title and body.
But only the title is extracted correctly.
The body is wrong.

Do you have any idea why?

Add ability to send HTTP headers like user-agent or referer

Hi,

I would love to see Graby able to send additional HTTP headers, as configured in some ftr-site-config recipes.

Currently this is not supported:

// NOT YET USED

Some site configs want to send a custom user-agent: https://github.com/fivefilters/ftr-site-config/search?utf8=%E2%9C%93&q=user-agent

Another example is wallabag/wallabag#2150, where the website thinks we are Internet Explorer and we get caught in an endless redirect loop.

Support for Guzzle 6?

My installation seems to be failing, I think because I have Guzzle 6:

    - Installation request for j0k3r/graby ^1.10 -> satisfiable by j0k3r/graby[1.10.0].
    - j0k3r/graby 1.10.0 requires guzzlehttp/guzzle ^5.2.0 -> satisfiable by guzzlehttp/guzzle[5.2.0, 5.3.0, 5.3.1, 5.3.x-dev] but these conflict with your requirements or minimum-stability.
