Comments (4)

josepharhar commented on June 23, 2024

The fact that these aren't sorted also makes me worried that the sites listed aren't even "top sites" with regard to anything at all.

from chromium-dashboard.

chrishtr commented on June 23, 2024

@tunetheweb

tunetheweb commented on June 23, 2024

So it's a bit complicated. But it is a sample of the URLs from the top sites using this feature.

First up, the rankings available in the HTTP Archive are based on the CrUX coarse rank magnitude. This means we only get groupings like top 1,000, then top 5,000, then top 10,000, then top 50,000, then top 100,000, etc. So we do not have a precise "ranking" of 1, 2, 3, 4, etc.
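
To make the coarse grouping concrete, here is a minimal Python sketch. The bucket boundaries are the ones named above; the `to_bucket` helper and the precise ranks passed to it are hypothetical, since precise ranks are exactly what this data does not have.

```python
from bisect import bisect_left

# Bucket boundaries named in the comment above; the real list continues further.
BUCKETS = [1_000, 5_000, 10_000, 50_000, 100_000]


def to_bucket(precise_rank: int) -> int:
    """Map a hypothetical precise rank to the smallest bucket that contains it."""
    i = bisect_left(BUCKETS, precise_rank)
    if i == len(BUCKETS):
        raise ValueError("rank is outside the buckets listed in this sketch")
    return BUCKETS[i]


# Sites ranked 1 and 1,000 become indistinguishable once bucketed.
examples = {rank: to_bucket(rank) for rank in (1, 1_000, 1_001, 7_500, 50_001)}
```

The point of the sketch is only that everything inside one bucket shares the same rank value, so there is no ordering to recover within a bucket.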

We take the top 100 URLs, ordered by rank and then URL, for mobile and again for desktop, so we have a maximum of 200 URLs if they are all distinct (often the same sites appear in both). This limit of 100 for each is mainly there so the lists can be precomputed to keep the dashboard reasonably fast. Importantly, this list is now just URLs and no longer contains the rank, as it's stored as a simple array of URLs.

Then we combine these lists and report the result in alphabetical order.
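
A rough Python sketch of that combining step follows. It is an illustration only, not the dashboard's actual code: the `combine_samples` function, its `limit` parameter, and the sample URLs are all assumptions made for the example.

```python
def combine_samples(mobile: list[str], desktop: list[str], limit: int = 100) -> list[str]:
    """Merge the per-client samples, drop duplicates, and sort alphabetically."""
    # Each input is assumed to be ordered by (rank, url) already,
    # so slicing keeps the highest-ranked entries for that client.
    merged = set(mobile[:limit]) | set(desktop[:limit])
    # The rank is no longer available at this point, so the only
    # stable order left is alphabetical.
    return sorted(merged)


mobile = ["https://z.com", "https://a.com", "https://b.com"]
desktop = ["https://z.com", "https://b.com", "https://c.com"]
combined = combine_samples(mobile, desktop)
```

With these inputs the two lists share two URLs, so the combined list has four entries rather than six, which is why the real list is "up to" 200 URLs.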

What does all this mean? Let's take the following usage as an example:

rank url
1,000 https://z.com
10,000 https://a.com
10,000 https://b.com
50,000 https://c.com
50,000 https://d.com
50,000 https://e.com
100,000 https://f.com

Then let's say we only took the top 4 sites instead of 200, for simplicity. It's already ordered by rank and url so we would take the following:

rank url
1,000 https://z.com
10,000 https://a.com
10,000 https://b.com
50,000 https://c.com

(Note that we include all of the top 1,000, all of the top 10,000 and only a bit of the top 50,000.)

And then we present them in the following order (as we no longer have the rank to sort by):

url
https://a.com
https://b.com
https://c.com
https://z.com

This means we ARE broadly giving a sample of the URLs ordered by rank (note that we didn't include f.com, for example, as it was lower ranked), but there's definitely some nuance here as it's no longer in strict rank order in the end. It is STILL the most popular 200-ish URLs we have for that feature, just not exactly in rank order anymore.
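
The simplified example can be reproduced with a short Python sketch. The data is the seven-row usage table from this comment; the variable names and the sample size of 4 are just for illustration.

```python
# (rank bucket, url) pairs from the example usage table.
usage = [
    (1_000, "https://z.com"),
    (10_000, "https://a.com"),
    (10_000, "https://b.com"),
    (50_000, "https://c.com"),
    (50_000, "https://d.com"),
    (50_000, "https://e.com"),
    (100_000, "https://f.com"),
]

SAMPLE_SIZE = 4  # 100 per client in the real setup

# Tuples sort by rank first, then url, matching "ORDER BY rank, url".
top = sorted(usage)[:SAMPLE_SIZE]

# Only the URLs are stored, so the rank is lost here.
urls_only = [url for _, url in top]

# With no rank left to sort by, the list is shown alphabetically.
displayed = sorted(urls_only)
```

Here `displayed` starts with a.com and ends with z.com, even though z.com was the highest-ranked site, while d.com, e.com and f.com never make the sample.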

I've updated the text to "Sample URLs of the most popular sites using this feature ordered alphabetically" in some vague way to try to explain this but not sure if it makes it any clearer!

If you want the actual rank order, then you can run the following SQL:

#standardSQL
SELECT DISTINCT yyyymmdd, feature, id, rank, url
FROM `httparchive.blink_features.features`
WHERE (feature = 'SelectParserDroppedTag' OR id = '4844')
AND yyyymmdd = (SELECT MAX(yyyymmdd) FROM `httparchive.blink_features.features`)
ORDER BY yyyymmdd DESC, rank, url
LIMIT 200;

In fact, if you remove the limit, you'll get all 493 URLs as shown in this sheet in rank and then url order. Note that even then it is still in the coarse rank order.

yyyymmdd feature id rank url
2024-03-01 SelectParserDroppedTag 4844 50,000 https://www.loewe.com/
2024-03-01 SelectParserDroppedTag 4844 100,000 https://billetterie.rclens.fr/
2024-03-01 SelectParserDroppedTag 4844 100,000 https://onlinesbi.sbi/
2024-03-01 SelectParserDroppedTag 4844 100,000 https://www.buybestgear.com/
2024-03-01 SelectParserDroppedTag 4844 500,000 https://m.maccosmetics.com.mx/
2024-03-01 SelectParserDroppedTag 4844 500,000 https://m.maccosmetics.es/
... ... ... ... ...

Unfortunately, it's not possible to run this SQL in the Dashboard, as it's very slow to run for all features (which is how Data Studio works). Hence we go for the rather convoluted route, which still gives broadly the same data, but sometimes not in the exact order expected.

However, you'll also note the top 200 URLs in the sheet are also what was provided by chromestatus. Just not in the same order.

Historically, this used to be a completely random sampling of URLs, which was not that useful at all. Now it's at least the most popular URLs but, yes, the ordering within that is still a little messy.

With a little more effort we could have a new table with the top 200 URLs and the rank column to make this all more obvious. But it would still be the same URL list, and without the fine-grained ranking that you may be looking for.
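
As a sketch of what keeping the rank column would allow, the Python below groups URLs by rank bucket. No such table exists yet, so the row shape and the `group_by_bucket` helper are assumptions; the rows themselves are the first six from the query output above.

```python
from itertools import groupby

# (rank bucket, url) rows, as a hypothetical table with a rank column might hold them.
rows = [
    (100_000, "https://onlinesbi.sbi/"),
    (500_000, "https://m.maccosmetics.es/"),
    (50_000, "https://www.loewe.com/"),
    (100_000, "https://www.buybestgear.com/"),
    (500_000, "https://m.maccosmetics.com.mx/"),
    (100_000, "https://billetterie.rclens.fr/"),
]


def group_by_bucket(rows: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Group URLs under their rank bucket, most popular bucket first."""
    ordered = sorted(rows)  # groupby needs its input sorted by the grouping key
    return {
        rank: [url for _, url in group]
        for rank, group in groupby(ordered, key=lambda row: row[0])
    }


grouped = group_by_bucket(rows)
```

Within each bucket the URLs are still only in alphabetical order, which matches the point above: the rank column makes the buckets visible but does not add any finer ranking.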

mfreed7 commented on June 23, 2024

Thanks for the very detailed explanation of what's going on with this list! I feel a lot better about the quality of the results.

Having said that, I do think it'd be very useful to either a) keep the list ordered by rank "bucket" so that the top-1000 results are at the top, or at least b) add a rank column so we could do that ourselves. While a fine-grained 1,2,3 ranking would be the best, we can still extract a lot of value from top-1000 vs. top-50000.

How much of a project would it be to do one of those things?
