dotandimet / mojo-useragent-role-queued

A role for Mojo::UserAgent that processes non-blocking requests in a rate-limiting queue.

License: Other

Perl 100.00%
mojo mojolicious perl perl5 user-agent web-crawler

mojo-useragent-role-queued's Introduction

NAME

Mojo::UserAgent::Role::Queued - A role to process non-blocking requests in a rate-limiting queue.

SYNOPSIS

   use Mojo::UserAgent;

   my $ua = Mojo::UserAgent->new->with_roles('+Queued');
   $ua->max_redirects(3);
   $ua->max_active(5); # process up to 5 requests at a time
   for my $url (@big_list_of_urls) {
     $ua->get($url, sub {
       my ($ua, $tx) = @_;
       if (! $tx->error) {
         say "Page at $url is titled: ",
           $tx->res->dom->at('title')->text;
       }
     });
   }
   Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

   # works with promises, too:
   my @p = map {
     $ua->get_p($_)->then(sub { pop->res->dom->at('title')->text })
       ->catch(sub { say "Error: ", @_ })
   } @big_list_of_urls;
   Mojo::Promise->all(@p)->wait;

DESCRIPTION

Mojo::UserAgent::Role::Queued manages all non-blocking requests made through Mojo::UserAgent in a queue to limit the number of simultaneous requests.

Mojo::UserAgent can make multiple concurrent non-blocking HTTP requests using Mojo's event loop, but because there is only a single process handling all of them, you must take care to limit the number of simultaneous requests you make.

This issue is discussed at http://blogs.perl.org/users/stas/2013/01/web-scraping-with-modern-perl-part-1.html and in Joel Berger's answer at http://stackoverflow.com/questions/15152633/perl-mojo-and-json-for-simultaneous-requests.

Mojo::UserAgent::Role::Queued tries to generalize the practice of managing a large number of requests using a queue, by embedding the queue inside Mojo::UserAgent itself.
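
The queueing practice described above can be sketched in plain Perl. This is an illustration of the general technique only, not the module's actual implementation: the TinyQueue package, its on_empty callback and the simulated "event loop" are all hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical illustration only: not the code of Mojo::UserAgent::Role::Queued.
package TinyQueue;

sub new {
    my ($class, %args) = @_;
    return bless {
        max_active => $args{max_active} // 4,
        active     => 0,     # jobs currently in flight
        peak       => 0,     # highest value "active" ever reached
        jobs       => [],    # jobs waiting for a free slot
        on_empty   => $args{on_empty} // sub { },
    }, $class;
}

# Add a job (a coderef that receives a "done" callback) and try to start it.
sub enqueue {
    my ($self, $job) = @_;
    push @{ $self->{jobs} }, $job;
    $self->_drain;
    return $self;
}

# Start waiting jobs until max_active of them are in flight.
sub _drain {
    my $self = shift;
    while ($self->{active} < $self->{max_active} and @{ $self->{jobs} }) {
        my $job = shift @{ $self->{jobs} };
        $self->{active}++;
        $self->{peak} = $self->{active} if $self->{active} > $self->{peak};
        $job->(sub {
            $self->{active}--;
            $self->_drain;
            $self->{on_empty}->() if !$self->{active} and !@{ $self->{jobs} };
        });
    }
}

package main;

my @pending;          # "done" callbacks of jobs that are in flight
my $finished = 0;
my $emptied  = 0;
my $queue = TinyQueue->new(max_active => 2, on_empty => sub { $emptied++ });

# Each job pretends to be a non-blocking request: it parks its "done" callback.
$queue->enqueue(sub { my $done = shift; push @pending, $done }) for 1 .. 5;

# Simulate the event loop completing the requests one at a time.
while (my $done = shift @pending) {
    $finished++;
    $done->();
}

printf "finished=%d peak=%d emptied=%d\n", $finished, $queue->{peak}, $emptied;
```

With five jobs and max_active set to 2, the sketch never has more than two jobs in flight, and a finished job immediately frees a slot for the next waiting one.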

ATTRIBUTES

Mojo::UserAgent::Role::Queued has the following attributes:

max_active

$ua->max_active(5);  # execute no more than 5 transactions at a time.
print "Execute no more than ", $ua->max_active, " concurrent transactions";

Parameter controlling the maximum number of transactions that can be active at the same time.

EVENTS

Mojo::UserAgent::Role::Queued adds the following event to those emitted by Mojo::UserAgent:

queue_empty

$ua->on(queue_empty => sub { my ($ua) = @_; .... })

Emitted when the queue has been emptied of all pending jobs. In previous releases, this event was called stop_queue (this is a breaking change).
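
A minimal sketch of subscribing to this event, using only the calls shown in this document plus standard Mojolicious ones. The URLs are placeholders, and the number of times the event fires during a run is not something this sketch assumes; it simply counts and reports it.

```perl
use Mojo::Base -strict;    # enables strict, warnings and say
use Mojo::UserAgent;
use Mojo::Promise;

my $ua = Mojo::UserAgent->new->with_roles('+Queued');
$ua->max_active(2);

# Count how often the queue reports that it has run dry.
my $drained = 0;
$ua->on(queue_empty => sub {
  my ($ua) = @_;
  $drained++;
  say "queue_empty fired (max_active is ", $ua->max_active, ")";
});

# Placeholder URLs (an assumption for this example); substitute your own list.
my @urls = map {"https://example.com/?page=$_"} 1 .. 4;

my @p = map {
  my $url = $_;
  $ua->get_p($url)
    ->then(sub { say "$url: ", shift->res->code // 'no response' })
    ->catch(sub { say "$url failed: ", shift })
} @urls;

# Run the event loop until every promise has settled.
Mojo::Promise->all(@p)->wait;
say "queue_empty was emitted $drained time(s)";
```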

LICENSE AND COPYRIGHT

This software is Copyright (c) 2017-2019 by Dotan Dimet [email protected].

This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License version 2.0.

AUTHOR

Dotan Dimet [email protected]

mojo-useragent-role-queued's People

Contributors

dotandimet, grinnz, vtyldum

Forkers

grinnz tyldum

mojo-useragent-role-queued's Issues

Tests fail (with newest Mojolicious?)

On some of my smokers the test suite fails:

#   Failed test 'Non-blocking skips queue'
#   at t/01-compose.t line 28.
#          got: ''
#     expected: 'Hello World'
Can't use string ("4e9b91cceae8bccd2472ef4e69546e43") as a subroutine ref while "strict refs" in use at /home/cpansand/.cpan/build/2018062419/Mojo-UserAgent-Role-Queued-0.04-nBDzjJ/blib/lib/Mojo/UserAgent/Role/Queued.pm line 38.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 35 just after 2.
t/01-compose.t .. 
Dubious, test returned 35 (wstat 8960, 0x2300)
Failed 1/2 subtests 

This seems to happen with the latest version of Mojolicious. Statistical analysis:

****************************************************************
Regression 'mod:Mojolicious'
****************************************************************
Name           	       Theta	      StdErr	 T-stat
[0='const']    	      1.0000	      0.0000	75757728634694000.00
[1='eq_7.57']  	      0.0000	      0.0000	   0.00
[2='eq_7.58']  	      0.0000	      0.0000	   0.00
[3='eq_7.59']  	      0.0000	      0.0000	   0.00
[4='eq_7.61']  	      0.0000	      0.0000	   0.00
[5='eq_7.67']  	      0.0000	      0.0000	   0.00
[6='eq_7.68']  	     -0.0000	      0.0000	  -3.18
[7='eq_7.71']  	      0.0000	      0.0000	   0.00
[8='eq_7.75']  	      0.0000	      0.0000	   4.21
[9='eq_7.81']  	      0.0000	      0.0000	   0.00
[10='eq_7.83'] 	      0.0000	      0.0000	   0.00
[11='eq_7.84'] 	     -0.0000	      0.0000	  -4.21
[12='eq_7.85'] 	     -1.0000	      0.0000	-66443923423544144.00

R^2= 1.000, N= 108, K= 13
****************************************************************

Off by one in parallel

There seems to be one more active connection than the number specified in ->max_active().

This is minor and I might get to send you a PR, but right now I just had to document it as I hurry on in my project ;)

For most people this would simply mean that the UA keeps opening and closing one additional connection if max_active is set equal to max_connections.

Neat tshark command to summarize this when testing:
tshark -z endpoints,tcp,ip.addr==$serverip

Role seems to turn the UserAgent into a singleton

The following example code shows how the UserAgent's behaviour changes with the role.
There should be two failed requests, but I only get one (version 1.10):

#!/usr/bin/env perl
use v5.22;
use DDP;
use Mojo::UserAgent;
use Mojo::URL;

package testbase;
use DDP;
use Mojo::Base -base, -signatures;

has ua => sub { Mojo::UserAgent->new; };
sub new {
  my $self = shift->SUPER::new(@_);
  return $self;
}
sub poll($self) {
  $self->ua->get(
    "https://untrusted-root.badssl.com/" => sub {
      my ($ua, $tx) = @_; 
      p $tx->res->error if $tx->res->error;
    }); 
}

package testbase::insecure;
use Mojo::Base 'testbase';

package testbase::secure;
use Mojo::Base 'testbase';


package main;

my $insecure = testbase::insecure->new(ua => Mojo::UserAgent->new->insecure(1));
my $secure = testbase::secure->new(ua => Mojo::UserAgent->new);


my $insecure_q = testbase::insecure->new(ua => Mojo::UserAgent->new->with_roles('+Queued')->insecure(1));
my $secure_q = testbase::secure->new(ua => Mojo::UserAgent->new->with_roles('+Queued'));

say "There should be two failures below:";

# Works. $insecure works, $secure fails as expected.
$insecure->poll;
$secure->poll;

# Both these succeed, one should fail
$insecure_q->poll;
$secure_q->poll;

Mojo::IOLoop->start;

Yields one instead of two:

There should be two failures below:
\ {
    message   "SSL connect attempt failed error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
"
}

Errors when running with EV backend

Reported by tyldis via irc:

I wanted to use it, but even the example in the synopsis isn't working for me. https://pastebin.com/8veqfYB9
My short testing seems to indicate that issues arise as the queue grows past max_active
This was on Mojo 7.90-7.94

The pastebin output looks like this:

Page at https://mojolicious.io/ is titled: Welcome To mojolicious.io! - mojolicious.io
Page at https://mojolicious.io/ is titled: Welcome To mojolicious.io! - mojolicious.io
Page at https://mojolicious.io/ is titled: Welcome To mojolicious.io! - mojolicious.io
Page at https://mojolicious.io/ is titled: Welcome To mojolicious.io! - mojolicious.io
Page at https://mojolicious.io/ is titled: Welcome To mojolicious.io! - mojolicious.io
Use of uninitialized value $ref_to_this_sub in method lookup at /home/vidar/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/Mojo/UserAgent/Role/Queued.pm line 22.
Mojo::Reactor::EV: I/O watcher failed: Can't locate object method "" via package "Mojo::UserAgent__WITH__Mojo::UserAgent::Role::Queued" at /home/vidar/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/Mojo/UserAgent/Role/Queued.pm line 22.
Use of uninitialized value $ref_to_this_sub in method lookup at /home/vidar/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/Mojo/UserAgent/Role/Queued.pm line 22.
Mojo::Reactor::EV: I/O watcher failed: Can't locate object method "" via package "Mojo::UserAgent__WITH__Mojo::UserAgent::Role::Queued" at /home/vidar/perl5/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/Mojo/UserAgent/Role/Queued.pm line 22.

Possible memory leak

What I am doing

In my code I have five different Mojo::UserAgent objects with this role. I am polling some devices that have different limits on maximum concurrent requests, hence five objects rather than a single one.

Observation

  • CPU usage steadily increases over time with role applied
    (I also see memory growing a lot faster, but need to debug more to be sure I'm not causing it)

Debug process

I used Devel::MAT and dumped after running an hour.
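
For readers who want to reproduce this step, a heap dump can be written from inside the running program with Devel::MAT::Dumper. This is a minimal sketch, not the reporter's actual setup: the output path is a hypothetical choice, and in a long-running poller the dump call would more likely sit in a timer or signal handler than at top level.

```perl
use strict;
use warnings;
use Devel::MAT::Dumper;

# Hypothetical output path; any writable location works.
my $path = "/tmp/queued-$$.pmat";

# Write a snapshot of the interpreter's heap to the file.
Devel::MAT::Dumper::dump($path);

print "heap dump written to $path\n";
```

The resulting file can then be opened with the pmat command-line tool, which gives the interactive prompt seen in the session below.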

pmat [more]> largest
HASH(24798) at 0x55eb7bf5bda0=strtab: 1.5 MiB
ARRAY(96519,!REAL) at 0x55eb7fbbf618: 754.1 KiB
ARRAY(30588,!REAL) at 0x55eb7fbfdbc0: 239.0 KiB
ARRAY(30583) at 0x55eb7fbd5038: 239.0 KiB
SCALAR(PV) at 0x55eb7d14b9c0: 199.2 KiB

pmat [more]> identify 0x55eb7fbbf618
ARRAY(96519,!REAL) at 0x55eb7fbbf618 is:
└─the backrefs list of STASH(12) at 0x55eb7fbbf3a8, which is:
  └─the symbol '%Mojo::UserAgent::Role::Queued::'


pmat [more]> elems 0x55eb7fbbf618
  [0]  GLOB(*) at 0x55eb7fbbf600
  [1]  GLOB(&*) at 0x55eb7fbbfb10
  [2]  GLOB(*) at 0x55eb7fbbf6f0
  [3]  GLOB(&*) at 0x55eb7fbbf708
  [4]  GLOB(&*) at 0x55eb7fbbfba0
  [5]  GLOB(&*) at 0x55eb7fbbf918
  [6]  GLOB(&*) at 0x55eb7fbbf9c0
  [7]  GLOB(&*) at 0x55eb7fbbfa68
  [8]  GLOB($*) at 0x55eb7fbbf630
  [9]  GLOB(%*) at 0x55eb7fbc7fd0
  [10] GLOB(*) at 0x55eb7fbbf570
  [11] CODE(PP) at 0x55eb7fbbfc00
  [12] CODE(PP) at 0x55eb7fbbf5b8
  [13] CODE(PP,P) at 0x55eb7fbc8d38
  [14] CODE(PP,P) at 0x55eb7fbcaa78
  [15] CODE(PP,P) at 0x55eb7fbcaad8
  [16] CODE(PP) at 0x55eb7fbc7ee0
  [17] GLOB(*) at 0x55eb7fbc8048
  [18] GLOB(&*) at 0x55eb7fbc8c18
  [19] GLOB(*) at 0x55eb7fbca880
  [20] GLOB(&*) at 0x55eb7fbca898
  [21] CODE(PP,C) at 0x55eb7fbd4de0
  [22] CODE(PP,C) at 0x55eb7fed5568
  [23] CODE(PP,C) at 0x55eb7fbd4f60
  [24] CODE(PP,C) at 0x55eb7fc0c8f0
  [25] CODE(PP,C) at 0x55eb7ff33448
  [26] CODE(PP,C) at 0x55eb7fc0ca40
  [27] CODE(PP,C) at 0x55eb7fc12b00
  [28] CODE(PP,C) at 0x55eb7fff9970
  [29] CODE(PP,C) at 0x55eb7fc12c50
  [30] CODE(PP,C) at 0x55eb7fefd1a0
  [31] CODE(PP,C) at 0x55eb7fe22a10
  [32] CODE(PP,C) at 0x55eb80001d48
  [33] CODE(PP,C) at 0x55eb7fe23778
  [34] CODE(PP,C) at 0x55eb80091b18
  [35] CODE(PP,C) at 0x55eb7fe29830
  [36] CODE(PP,C) at 0x55eb7ff4b8f8
  [37] CODE(PP,C) at 0x55eb7fe31f88
  [38] CODE(PP,C) at 0x55eb7ff49eb0
  [39] CODE(PP,C) at 0x55eb7fe34400
  [40] CODE(PP,C) at 0x55eb7fe96780
  [41] CODE(PP,C) at 0x55eb7fe34a78
  [42] CODE(PP,C) at 0x55eb7ff10308
  [43] CODE(PP,C) at 0x55eb7fe3a3a8
  [44] CODE(PP,C) at 0x55eb7fed2fb8
  [45] CODE(PP,C) at 0x55eb7fe3c7d0
  [46] CODE(PP,C) at 0x55eb80094c08
  [47] CODE(PP,C) at 0x55eb7fe3ce48
  [48] CODE(PP,C) at 0x55eb7ff52ba8
  [49] CODE(PP,C) at 0x55eb7fe425e8
  ... (96469 more)

Poking at some of these shows that they all look like this:

pmat [more]> show 0x55eb80091b18
CODE(PP,C) at 0x55eb80091b18 with refcount 1
  size 128 bytes
  named as &Mojo::UserAgent::Role::Queued::__ANON__
  no hekname
  stash=STASH(12) at 0x55eb7fbbf3a8
  glob=GLOB(*) at 0x55eb7fbbf570
  location=/opt/xxx/perl5/lib/perl5/Mojo/UserAgent/Role/Queued.pm line 20
  no scope
  no padlist
  no padnames_av
  pad[0]=PAD(3) at 0x55eb7fdebba0

This array seems to grow indefinitely. While it is not large enough to really leak memory, the CPU cycles needed to traverse it seem to increase linearly.

Looking at
https://github.com/dotandimet/Mojo-UserAgent-Role-Queued/blob/master/lib/Mojo/UserAgent/Role/Queued.pm#L20 I have no obvious suggestions.

Caveat: I have no idea what I'm doing here :)
