COUNTER-Robots

Official list of user agents that are regarded as robots/spiders by Project COUNTER

Purpose of this list

The growing use of web crawler robots has the potential to inflate usage statistics. Only genuine, user-driven usage should be reported in usage statistics. This list can be used to determine whether a request was made by a bot or by a genuine user.

Pull requests

This list is open for extension, and anyone can open a pull request with additions to it. When proposing a new bot, explain why you think it is a bot. A short description and, if available, a URL identifying where the crawler comes from are also appreciated.

Versioning

Merging changes into the master branch automatically triggers a new official release: at the end of each day on which changes were made to master, a tag with that date is created automatically. Versions and changes can be viewed in CHANGES.md, which is updated and curated manually.

Format of the list

The list is available in JSON format so that extra data can be added to each entry, such as the date added, a description, or a URL. Only the pattern field is obligatory. Dates should be in the format YYYY-MM-DD. The folder named generated contains the list in text format, a simplified format with one bot on each line. This file should NOT be edited, as it is generated from the list in JSON format. The derived list is updated automatically with each new release of the JSON source list.
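The two format rules above (an obligatory pattern field and YYYY-MM-DD dates) can be checked mechanically before proposing an addition. The following is a minimal sketch, not part of the project: the function name entry_problems and the date field name last_changed are assumptions made for illustration, so adjust them to whatever field names the JSON file actually uses.

```python
import re
from datetime import datetime

# Strict shape check: four digits, two digits, two digits.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def entry_problems(entry, date_fields=("last_changed",)):
    """Return a list of problems found in one list entry (a dict).

    `date_fields` names the keys expected to hold dates. The default
    "last_changed" is a hypothetical field name used for illustration.
    """
    problems = []

    # Only the pattern field is obligatory.
    pattern = entry.get("pattern")
    if not isinstance(pattern, str) or not pattern:
        problems.append("missing or empty 'pattern'")

    # Date fields are optional, but must be YYYY-MM-DD when present.
    for field in date_fields:
        value = entry.get(field)
        if value is None:
            continue
        if not isinstance(value, str) or not DATE_RE.fullmatch(value):
            problems.append(f"'{field}' is not in YYYY-MM-DD format")
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append(f"'{field}' is not a valid calendar date")

    return problems
```

An entry containing only a pattern yields an empty list, while an entry with a date such as 08/08/2017 is reported as badly formatted.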

A script called convert_to_text is included; it converts the list to a text file with one bot on every line. Run the script in the same folder as the COUNTER_Robots_list.json file and it will generate a COUNTER_Robots_list.txt file. You need jq for this: https://github.com/stedolan/jq.
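For environments where jq is not available, the same conversion can be sketched in Python. This is not the bundled convert_to_text script, only an illustration of the idea. It assumes the JSON file is a top-level array of objects that each carry a pattern field; the function names are hypothetical.

```python
import json


def patterns_from_json(json_text):
    """Extract the pattern of every entry, in file order.

    Assumes the JSON document is an array of objects, each with a
    "pattern" key (the only obligatory field).
    """
    entries = json.loads(json_text)
    return [entry["pattern"] for entry in entries]


def convert_to_text(json_path, text_path):
    """Write one pattern per line to text_path."""
    with open(json_path, encoding="utf-8") as src:
        patterns = patterns_from_json(src.read())
    with open(text_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(patterns) + "\n")
```

Calling convert_to_text("COUNTER_Robots_list.json", "COUNTER_Robots_list.txt") in the folder that holds the JSON file would produce the text file. The copy in the generated folder should still come from the project's own release process.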

Case insensitivity of pattern-matching

When matching against the patterns in this list, we strongly advise you to use case-insensitive matching, as certain web crawler robots appear with several different capitalizations. For example, if we want to match 'bot', we can be fairly certain that 'BoT' should be matched as well. Matching each variant case by case would inflate the number of patterns; case-insensitive matching reduces the number of patterns to take into consideration.
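A minimal sketch of case-insensitive matching in Python follows. It assumes each pattern is treated as a regular expression searched anywhere in the User-Agent string. The helper names and the two sample patterns are illustrative only; a real deployment would load the patterns from the list.

```python
import re


def compile_patterns(patterns):
    """Compile each pattern once, ignoring case."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_robot(user_agent, compiled_patterns):
    """Return True if any pattern matches somewhere in the user agent."""
    return any(rx.search(user_agent) for rx in compiled_patterns)


# Illustrative patterns only, not the official list.
sample = compile_patterns(["bot", "spider"])
```

With these sample patterns, is_robot("ExampleBoT/1.0", sample) is True even though the list only contains the lowercase 'bot', while is_robot("Mozilla/5.0 (Windows NT 10.0)", sample) is False.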

Contributors

davidatmire, atmire-github, jonas-atmire, alanorth, bram-atmire, tomdesair, philipvis, mrabro, trianglepb, ajnyga, ctgraham, heydevhey, jmvezic, jnugent
