cpan-testers-www-reports's People

Contributors

barbie

cpan-testers-www-reports's Issues

Per-author RSS feeds broken

Thanks for running the infrastructure for the helpful CPAN Testers service!

My author's page works as expected (if JavaScript is enabled), and the orange YAML link at the top right also seems to work.

But the other two orange links at the top right (RSS and RSS (No PASSes)) don't work properly. Both return an HTML page showing only the site's framework with no content, rather than an RSS feed.

I've looked through this repo's code, but it seems to be either a misconfiguration of Labyrinth or a bug in Labyrinth itself. At least to me, there is no obvious place in this repo where this issue could be hiding.

crash viewing cpantester report

Some of the detailed CPAN Testers reports from the list below are failing:

For example, these are failing:

Web user interface error message:

  • Can't call method "resource" on an undefined value at /home/cpantesters/perl5/lib/perl5/CPAN/Testers/Web/Controller/Legacy.pm line 249.

I'm not sure whether this project is the right place to file this bug; sorry if it isn't.

reports-summary.cgi is unable to take data for latest versions

The reports-summary.cgi program seems unable to retrieve data for the latest versions of a few distributions (I suspect this applies to every distribution, but I'm not sure). For example, take a look at the following report in JSON format for Data::Money.

The elements in the report are not properly populated. I don't know what causes this (I can't see the raw data in the database), but it probably has something to do with the SQL query at line 355 or the skipping at lines 369-372.

I'd guess this also has a bad effect on meta::cpan, because the "Testers" line on the left does not show the useful PASS/FAIL/NA count. E.g. compare Template::Perlish 1.21

template-perlish-1.21

against Template::Perlish 1.50

template-perlish-1.50

Do we still need static pages?

Do we need to keep the static pages any more?

For example: http://static.cpantesters.org/author/B/BARBIE.html

When I first took over, several people complained that they had to have the static pages because they refused to turn on JavaScript in the browser. I get that some were hypersensitive to being tracked, but pretty much every major website requires JavaScript now, and Facebook/Google/etc. are far more clued up about tracking even static pages, so keeping the static pages seems a bit of a waste. Dropping them would also save the time spent writing the file to disk for every distro/author update.

For many authors and distros these pages are huge, and the amount of scrolling needed to reach the information you're looking for is excessive. My page above is currently 20MB and takes at least 33 seconds to load.

In the last couple of days the only things accessing these pages have been bots.

Deprecating these pages would reduce disk writes and storage, as well as marginally improving the builder performance.

Does anyone know of anyone who might have a problem with these pages not being around any more? As part of the proposed dropping of these pages, the whole static.cpantesters.org site can be dropped, meaning one less site for bots to hit.

Any thoughts, @preaction & @glasswalk3r?

Recent releases missing

Today I have noticed that some recent releases are missing from the default listings on www.cpantesters.org.

E.g. for Path::Tiny, compare http://matrix.cpantesters.org/?dist=Path-Tiny+0.122 against http://www.cpantesters.org/distro/P/Path-Tiny.html - the most recent release on the latter is 0.112. However, if I then change the Availability selector in the preferences (left menu) from On CPAN ONLY to All distributions, version 0.122 magically appears.

It seems that whatever is determining which releases are on CPAN is missing the newer ones.

Distro Building is taking too long

The Problem:

The builder for DistroPages can take a long time to run for a single distribution.

Adding some progress-tracking lines to the code gave me the following example (based on the Version 1 code below):

2017/05/30 09:16:20 .. .. starting PadWalker
2017/05/30 09:16:20 .. .. getting records for PadWalker
2017/05/30 09:16:20 .. .. retrieved records for PadWalker 
2017/05/30 09:16:20 .. .. loading JSON data for PadWalker
2017/05/30 09:16:21 .. .. loaded JSON data for PadWalker
2017/05/30 09:16:24 .. .. starting data update for PadWalker
2017/05/30 09:16:24 .. .. summary data update complete for PadWalker
2017/05/30 09:16:24 .. .. version data update complete for PadWalker
2017/05/30 09:27:27 .. .. OS data update complete for PadWalker
2017/05/30 09:37:00 .. .. Pass Stats data update complete for PadWalker
2017/05/30 09:37:01 .. .. memory data update complete for PadWalker
2017/05/30 09:37:01 .. .. summary data stored for PadWalker
2017/05/30 09:37:01 .. .. building static pages for PadWalker
2017/05/30 09:37:02 .. .. Dynamic HTML page written for PadWalker
2017/05/30 09:37:04 .. .. JS page written for PadWalker
2017/05/30 09:37:04 .. .. JSON page written for PadWalker
2017/05/30 09:37:04 .. .. Static HTML page written for PadWalker
2017/05/30 09:37:06 .. .. removing page_request entries for PadWalker

As you can see, the sticking points are loading the OS data and the Pass Stats data. These steps correspond to the Version 1 code below.


Version 1 Code: (taken from Labyrinth/Plugin/CPAN/Builder.pm, lines 736-752)

        my ($stats,$oses);
        @rows = $dbi->GetQuery('hash','GetDistrosPass',{dist=>$dist});
        for(@rows) {
            my ($osname,$code) = $cpan->OSName($_->{osname});
            $stats->{$_->{perl}}{$code}{count} = $_->{count};
            $oses->{$code} = $osname;
        }
        $progress->( ".. .. OS data update complete for $name" ) if(defined $progress);

        # distribution PASS stats
        my @stats = $dbi->GetQuery('hash','GetStatsPass',{dist=>$dist});
        for(@stats) {
            my ($osname,$code) = $cpan->OSName($_->{osname});
            $stats->{$_->{perl}}{$code}{version} = $_->{version}
                if(!$stats->{$_->{perl}}->{$code} || _versioncmp($_->{version},$stats->{$_->{perl}}->{$code}{version}));
        }
        $progress->( ".. .. Pass Stats data update complete for $name" ) if(defined $progress);

These use the following 2 queries:

GetDistrosPass=SELECT perl, osname, count(*) AS count FROM cpanstats.cpanstats WHERE dist IN ('$dist') AND state = 'pass' GROUP BY perl, osname
GetStatsPass=SELECT perl, osname, version FROM cpanstats.cpanstats WHERE dist IN ('$dist') AND state='pass'

The GROUP BY is particularly painful, as the two fields 'perl' and 'osname' are not indexed.

However, the two requests are essentially requesting similar information in two subtly different ways. So a first optimisation was to simplify the request and use a single loop to store the data. This resulted in Version 2 code below.
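
The single-loop idea can be illustrated without a database. The following is only a minimal sketch under stated assumptions: the sample rows are invented, `summarise` and `newer_version` are hypothetical names, and the core `version` module stands in for the project's own `_versioncmp`, whose exact semantics are not shown here.

```perl
use strict;
use warnings;
use version;

# Hypothetical sample rows standing in for the GetStatsPass result set.
my @rows = (
    { perl => '5.24.1', osname => 'linux',   version => '0.40' },
    { perl => '5.24.1', osname => 'linux',   version => '0.41' },
    { perl => '5.24.1', osname => 'freebsd', version => '0.41' },
    { perl => '5.26.0', osname => 'linux',   version => '0.39' },
);

# Stand-in for _versioncmp(): true when $new is strictly newer than $old.
sub newer_version {
    my ($new, $old) = @_;
    return version->parse($new) > version->parse($old);
}

# One pass over the rows: count PASS reports and track the newest
# passing version for each perl/OS pair.
sub summarise {
    my @rows = @_;
    my %stats;
    for my $row (@rows) {
        my $cell = $stats{ $row->{perl} }{ $row->{osname} } //= { count => 0 };
        $cell->{count}++;
        $cell->{version} = $row->{version}
            if !defined $cell->{version}
            || newer_version($row->{version}, $cell->{version});
    }
    return \%stats;
}

my $stats = summarise(@rows);
for my $perl (sort keys %$stats) {
    for my $os (sort keys %{ $stats->{$perl} }) {
        printf "%s %s: %d passes, newest %s\n",
            $perl, $os, @{ $stats->{$perl}{$os} }{qw(count version)};
    }
}
```

Counting in the loop replaces the separate GROUP BY query, at the cost of pulling every PASS row back to the client.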


Version 2 Code:

        # retrieve perl/os stats
        my ($stats,$oses);
        my @stats = $dbi->GetQuery('hash','GetStatsPass',{dist=>$dist});
        for(@stats) {
            my ($osname,$code) = $cpan->OSName($_->{osname});
            $stats->{$_->{perl}}{$code}{version} = $_->{version}
                if(!$stats->{$_->{perl}}->{$code} || _versioncmp($_->{version},$stats->{$_->{perl}}->{$code}{version}));

            $stats->{$_->{perl}}{$code}{count}++;
            $oses->{$code} = $osname;
        }

This did improve things, but not by as big a margin as I would have hoped. Some queries for new or recent distributions were quick anyway, so this needed assessing with older distributions. One such example was 'Math-BigInt'. Looking at the potential result set, we can see this would return 84,956 rows, which took 21 minutes to process!

mysql> explain SELECT perl, osname, version FROM cpanstats.cpanstats WHERE dist IN ('Math-BigInt') AND state='pass';
+----+-------------+-----------+------------+------+-------------------------------------+----------------------+---------+-------------+-------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys                       | key                  | key_len | ref         | rows  | filtered | Extra |
+----+-------------+-----------+------------+------+-------------------------------------+----------------------+---------+-------------+-------+----------+-------+
|  1 | SIMPLE      | cpanstats | NULL       | ref  | distvers,state,cpanstats_dist_state | cpanstats_dist_state | 293     | const,const | 84956 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+-------------------------------------+----------------------+---------+-------------+-------+----------+-------+

Other areas of CPAN Testers, particularly the CPAN Statistics website, face similarly large dataset lookups, and store summary information to avoid overloading the database and to speed up processing.

Likewise, this area could benefit from summary storage: updating only with those records added since the last dist build would mean a smaller dataset is processed every time, for all distributions.

As such, I would like to propose a new table to be used, with new queries and a Version 3 code block to replace the blocks used above.


New Table:

CREATE TABLE stats_store (
  `dist`    varchar(255) NOT NULL,
  `version` varchar(255) DEFAULT NULL,
  `perl`    varchar(255) DEFAULT NULL,
  `oscode`  varchar(255) DEFAULT NULL,
  `counter` int(10) unsigned NOT NULL,
  `lastid`  int(10) unsigned NOT NULL,
  KEY `dist` (`dist`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

New Queries:

GetStatsStore=SELECT * FROM cpanstats.stats_store WHERE dist=?
DelStatsStore=DELETE FROM cpanstats.stats_store WHERE dist=?
SetStatsStore=INSERT INTO cpanstats.stats_store (dist,perl,oscode,version,counter,lastid) VALUES (?,?,?,?,?,?)
GetStatsPass2=SELECT id, perl, osname, version FROM cpanstats.cpanstats WHERE dist IN ('$dist') AND state='pass' AND id>?

Version 3 Code:

        # retrieve perl/os stats
        my ($stats,$oses);
        my $lastid = 0;
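        # dist=? is a bind placeholder, so the name is passed positionally,
        # using the same key as DelStatsStore/SetStatsStore below
        my @rows = $dbi->GetQuery('hash','GetStatsStore',$name);
        for(@rows) {
            # assumption: OSName() also resolves a stored OS code to its display name
            my ($osname) = $cpan->OSName($_->{oscode});
            $stats->{$_->{perl}}{$_->{oscode}}{version} = $_->{version};
            $stats->{$_->{perl}}{$_->{oscode}}{count}   = $_->{counter};
            $oses->{$_->{oscode}} = $osname;
            $lastid = $_->{lastid} if($lastid < $_->{lastid});
        }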

        my @stats = $dbi->GetQuery('hash','GetStatsPass2',{dist=>$dist},$lastid);
        for(@stats) {
            my ($osname,$code) = $cpan->OSName($_->{osname});
            $stats->{$_->{perl}}{$code}{version} = $_->{version}
                if(!$stats->{$_->{perl}}->{$code} || _versioncmp($_->{version},$stats->{$_->{perl}}->{$code}{version}));

            $stats->{$_->{perl}}{$code}{count}++;
            $oses->{$code} = $osname;
            $lastid = $_->{id} if($lastid < $_->{id});
        }

        $dbi->DoQuery('DelStatsStore',$name);
        for my $perl (keys %$stats) {
            for my $code (keys %{$stats->{$perl}}) {
                $dbi->DoQuery('SetStatsStore',$name,$perl,$code,$stats->{$perl}{$code}{version},$stats->{$perl}{$code}{count},$lastid);
            }
        }

Note that none of this is live, and I'm not able to test with large datasets, so this may not be as big a benefit as it might appear on paper. However, I would be interested to hear thoughts about this, as to whether this would be worth tackling.

It would mean updating just Builder.pm for the code, and phrasebook.ini for the queries. As such I should be able to do a small release of CPAN-Testers-WWW-Reports. However, if the current builder processes are being rewritten, a note should be added there to capture this change.
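
The incremental approach proposed in Version 3 can be tried out without a database. What follows is a rough sketch only, under stated assumptions: the report rows are invented, `update_summary` is a hypothetical name, a plain hash stands in for the stats_store table, and the core `version` module stands in for `_versioncmp`.

```perl
use strict;
use warnings;
use version;

# Hypothetical in-memory stand-in for the cpanstats PASS rows of one dist.
my @reports = (
    { id => 1, perl => '5.24.1', oscode => 'linux', version => '0.40' },
    { id => 2, perl => '5.24.1', oscode => 'linux', version => '0.41' },
    { id => 3, perl => '5.26.0', oscode => 'linux', version => '0.41' },
);

# Fold only the reports with id > stored lastid into the stored summary.
# Returns the number of rows actually processed.
sub update_summary {
    my ($store, $reports) = @_;
    my $lastid = $store->{lastid} // 0;
    my $seen   = 0;
    for my $r (grep { $_->{id} > $lastid } @$reports) {
        my $cell = $store->{stats}{ $r->{perl} }{ $r->{oscode} } //= { count => 0 };
        $cell->{count}++;
        $cell->{version} = $r->{version}
            if !defined $cell->{version}
            || version->parse($r->{version}) > version->parse($cell->{version});
        $store->{lastid} = $r->{id} if $r->{id} > ($store->{lastid} // 0);
        $seen++;
    }
    return $seen;
}

my %store;                                        # plays the role of stats_store
my $first = update_summary(\%store, \@reports);   # processes all three rows

# A new report arrives; the next build only has to look at that one row.
push @reports, { id => 4, perl => '5.26.0', oscode => 'linux', version => '0.42' };
my $second = update_summary(\%store, \@reports);  # processes only id 4

print "first run: $first rows, second run: $second rows, lastid: $store{lastid}\n";
```

The saving comes from the second call: however many rows the distribution has accumulated, only those newer than the stored lastid are read and merged.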
