Coder Social home page Coder Social logo

satellite's Introduction

Satellite: Mapping the Internet's Stars

Satellite is an open platform for measuring the accessibility and presence of websites on a regular basis. It consists of a zmap-based probing engine for both HTTP and DNS requests, an aggregation pipeline for data processing, and a web interface for interactive visualization.

Through this process we attempt to answer the high level question: How much of the global view of a remote domain can we understand from a single measurement machine?

We release the data we have collected using this platform at scans.io.

How does Satellite Work?

The top level functions of satellite are documented in the fullrun.sh file, which is generally scheduled as a cron. job. This script manages the following process:

  • Determine the list of domains of interest. Currently based off of the Alexa top 10,000 list.
  • Determine the list of servers of interest. Uses zmap to scan all ipv4 addresses on port 53, and a filtering process to extract a diverse and stable set of servers to use for subsequent interactions.
  • Query servers for domains of interest. A managed set of zmap runs. HTTP scans use a direct zmap query, while for efficiency the DNS scans use the custom udp_multi zmap probing module.
  • Aggregate results into more managable forms. Our initial aggregation converts from the raw packets received from remote hosts to the list of acceptable IPs seen from different remote machines. This produces a file that is much more amenable to further analysis.
  • File management: Archiving retrieved data, performing backups, keeping a clean workspace.

Installation

Contribution guidelines

  • Attempt to pass jslint as specified by the brackets editor.
  • Comments should go in a branch or fork and then be pull requested for review.

Contact

Files

  • asn_aggregation Contains scripts for compressing raw scans to ASN or Country level aggregates.

  • cluster_correlation Contains scripts for clustering domains based on similar DNS responses.

  • dns Contains scripts around dns specific packet generation & processing

  • favicon Contains scripts for fetching favicon information to test if an IP serves a domain.

  • http Contains scripts for testing whether IPs have servers on ports 443 and 80.

  • interference Contains scripts for detecting anomalies in ASN level behavior.

  • util Contains general-purpose utility scripts for working with zmap.

  • runs Contains raw data

  • temp Contains downloaded files used during scanning.

  • fullrun.sh Does a full satellite run!

  • generateStudyMetadata.js Creates the .study file needed by scans.io, and uploads files

  • package.json Lists node dependencies needed by the code.

Changelog

This attempts to keep track of the evolving Satellite code base, and explain major changes as they occur.

  • 06/28/2015 Initial aggregation is performed in parallel by multiple cores, reducing time to ~10 hours.

satellite's People

Contributors

scrivener avatar sidberg avatar willscott avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.