
qurbat / blocked-hosts


A periodically updated list of websites known to be blocked in India

License: Creative Commons Zero v1.0 Universal

Shell 83.30% Python 16.70%
censorship hostnames blocklists url-blocklist


blocked-hosts


This repository houses a periodically updated list of websites (first-level domains only) that are known to be blocked on the ACT Fibernet network. A current list of hostnames blocked by ACT Fibernet can be found here. Historic results are available in the output directory.

Update 25-06-2023: A list of websites blocked by Hathway Broadband can be found here. The DNS resolvers for Hathway Broadband have also been added to the resources folder. Support for checking websites blocked by Hathway Broadband will be added soon.

Note: The list(s) published here are not fully representative of all hostnames that might be blocked by ACT Fibernet at a given time.

| Date of test | Total hosts | Removed since last test | Added since last test |
|---|---|---|---|
| January 8, 2024 | 13,782 | - | 278 hosts added |
| January 7, 2024 | 13,504 | - | 737 hosts added |
| January 2, 2024 | 12,767 | - | 2455 hosts added |
| December 31, 2023 | 10,312 | - | 4226 hosts added |
| June 3, 2023 | 6,086 | 32 hosts removed | 137 hosts added |
| February 13, 2023 | 5,981 | - | 547 hosts added |
| January 26, 2023 | 5,434 | 96 hosts removed | 128 hosts added |
| May 26, 2022 | 5,402 | 87 hosts removed | 33 hosts added |
| November 22, 2021 | 5,456 | 1 host removed | 1176 hosts added |
| July 28, 2021 | 4,281 | - | 231 hosts added |
| June 8, 2021 | 4,050 | - | 555 hosts added |
| April 16, 2021 | 3,495 | 179 hosts removed | 76 hosts added |
| March 28, 2021 | 3,593 | - | 3593 hosts added |

Method

The primary web censorship technique employed by ACT Fibernet is poisoning the DNS A record for the root domain of a blocked host.

tencent.com. 0 IN A 202.83.21.14
qq.com. 0 IN A 202.83.21.14
ucweb.com. 0 IN A 202.83.21.14

The poisoned A record entry has been documented to consistently point to only a few IP addresses. This characteristic enables fingerprinting blocked hostnames.
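
The fingerprinting idea can be illustrated with a minimal Python sketch. It is not the repository's own code: the function name `blocked_hosts`, the `POISON_IPS` set, and the `example.org` record are assumptions made for illustration, and only the address 202.83.21.14 comes from the sample records above. The sketch reads dig-style answer lines and keeps the hostnames whose A record points at a known poisoned address.

```python
# Hypothetical set of poisoned addresses. Only 202.83.21.14 appears in the
# sample records above; other ISPs would need their own values.
POISON_IPS = {"202.83.21.14"}


def blocked_hosts(lines, poison_ips=POISON_IPS):
    """Yield hostnames whose A record points at a known poisoned address."""
    for line in lines:
        fields = line.split()
        # Expect five fields: name, TTL, class, record type, address.
        if len(fields) != 5 or fields[2] != "IN" or fields[3] != "A":
            continue
        name, address = fields[0], fields[4]
        if address in poison_ips:
            # Drop the trailing root dot and normalise case.
            yield name.rstrip(".").lower()


sample = [
    "tencent.com. 0 IN A 202.83.21.14",
    "qq.com. 0 IN A 202.83.21.14",
    "example.org. 300 IN A 93.184.216.34",  # illustrative unblocked record
]
print(list(blocked_hosts(sample)))  # ['tencent.com', 'qq.com']
```

Keeping the poisoned addresses in a single set means that adapting the check to another network only requires changing that set.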

Data

As a uniform list of suitable hostnames was not readily available, several publicly available domain name lists were collated and used as input. The collated list was further modified to exclude subdomains and duplicate entries.

  1. Top 1 million from Alexa

  2. Top 10 million from DomCop

  3. Collections released by Domains Project

  4. List from How India Censors the Web

  5. List from Citizen Lab's repository

Installation

The install.sh script can be used to install the tldextract package using pip, and to download, compile, and install the massdns binary from source.

Note: python3 must already be installed on your system.

Usage

./run.sh <input_list.txt>

If you intend to run the script using the network of an Internet service provider other than ACT Fibernet, you will have to modify the variable defined on line 4 for identifying a blocked host.

The run.sh script uses massdns to query a large number of hostnames quickly; the responses are then used to extrapolate blocked hostnames. The apex.py script extracts root-level hostnames from the results with the help of the tldextract package. The list of root-level hostnames is then de-duplicated and saved to disk.
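
The extraction and de-duplication step might look roughly like the following Python sketch. It is not the contents of apex.py: to stay dependency-free it swaps tldextract for a tiny hard-coded suffix set, so `MULTI_LABEL_SUFFIXES`, `apex`, and `unique_apexes` are hypothetical names, and the suffix set is deliberately incomplete. Real input needs the full Public Suffix List that tldextract consults.

```python
# Tiny stand-in for the Public Suffix List (hypothetical and incomplete).
MULTI_LABEL_SUFFIXES = {"co.in", "co.uk", "com.au"}


def apex(hostname):
    """Return the root-level (registrable) domain of a hostname, or None."""
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2:
        return None
    # Keep three labels when the last two form a known multi-label suffix.
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    if len(labels) < keep:
        return None  # the hostname is itself a bare public suffix
    return ".".join(labels[-keep:])


def unique_apexes(hostnames):
    """De-duplicate root-level domains while preserving first-seen order."""
    seen = set()
    result = []
    for host in hostnames:
        root = apex(host)
        if root and root not in seen:
            seen.add(root)
            result.append(root)
    return result


hosts = ["www.qq.com.", "mail.qq.com.", "news.example.co.in.", "ucweb.com."]
print(unique_apexes(hosts))  # ['qq.com', 'example.co.in', 'ucweb.com']
```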

Notes

This repository builds on the paper How India Censors the Web authored by Kushagra Singh, Gurshabad Grover, and Varun Bansal. The primary intention behind this repository is to introduce some amount of transparency to the otherwise opaque processes associated with web censorship in India.

The captn3m0/airtel-blocked-hosts repository provides a similar list of hostnames known to be blocked on the Airtel Broadband network.

Contributors

bhvsh, qurbat


blocked-hosts's Issues

Expired domain names in the list

I ran massdns against the list of blocked hosts using Google's Public DNS (8.8.8.8) and found that 21.66% of them returned an NXDOMAIN response, which suggests that those domain names have expired.


This presents a unique problem. While these domain names are no longer active, they continue to remain on ACT Fibernet's block list. If someone were to re-register one of these domain names tomorrow, it would still probably continue to remain blocked. What should be done in this situation? Should this project only document currently active domain names? Or should it document even expired domain names that continue to remain on ACT Fibernet's block list?

Provide input.txt file

The input.txt file (Combined list of hosts to check) is not committed in the repo, and makes it hard to reproduce this work for other providers.

Update repository badges through GitHub Actions

Badges for the repository are currently updated manually every time an update is made to the compiled_block_list.txt file. Manually updating the README file after every change to the block list, however, is a part of the commit workflow that GitHub Actions could handle quite reliably.

This issue is meant to document and track progress for any change(s) that may be required for this.
