This project forked from digitalmethodsinitiative/4cat

4CAT: Capture and Analysis Toolkit

DOI: 10.5281/zenodo.4742623 · License: MPL 2.0 · Requires Python 3.8

[Screenshot: 4CAT's 'Create Dataset' interface]

[Screenshot: a network visualisation of a dataset in 4CAT]

4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to make the capture and analysis of data from these platforms accessible to people through a web interface, without requiring any programming or web scraping skills. Our target audience is researchers, students and journalists interested in using Digital Methods in their work.

In 4CAT, you create a dataset from a given platform according to a given set of parameters; the result of this (usually a CSV file containing matching items) can then be downloaded or analysed further with a suite of analytical 'processors', which range from simple frequency charts to more advanced analyses such as the generation and visualisation of word embedding models.
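To give a feel for what a simple frequency analysis over a downloaded dataset involves, here is a minimal standalone sketch in Python. It does not use 4CAT's own code or API. The function name `word_frequencies`, the column name `body` and the sample rows are assumptions made for illustration only; the columns in an actual export depend on the data source.

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_file, text_column="body", top_n=3):
    """Count the most frequent lowercase words in one column of a CSV file.

    Hypothetical helper for illustration; not part of 4CAT.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Treat a missing or empty column as an empty string.
        text = row.get(text_column) or ""
        # Naive whitespace tokenisation; real analyses usually need more care.
        counts.update(text.lower().split())
    return counts.most_common(top_n)


# Tiny in-memory stand-in for a downloaded dataset (made-up rows).
sample = io.StringIO(
    "id,body\n"
    "1,cats are great\n"
    "2,cats and dogs\n"
    "3,dogs are loud\n"
)
top_words = word_frequencies(sample)
print(top_words)
```

To run the same count on a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`, adjusting `text_column` to a column that exists in that file.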

4CAT ships with a (growing) number of data sources corresponding to popular platforms, and you can add further data sources using 4CAT's Python API. The following data sources are currently actively supported:

  • 4chan
  • 8kun
  • Bitchute
  • Parler
  • Reddit
  • Telegram
  • Twitter API (Academic Track, full-archive search)

The following platforms are supported through other tools, from which you can import data into 4CAT for analysis:

A number of other platforms have built-in support that is untested or that requires, for example, special API access. You can view the full list of data sources in the GitHub repository.

Install

You can install 4CAT locally or on a server via Docker or manually. The usual

docker-compose up

will work, but detailed and alternative installation instructions are available in our wiki. Currently 4chan, 8chan, and 8kun require additional steps; please see the wiki.

Please check our issues and create one if you experience any problems (pull requests are also very welcome).

Components

4CAT consists of several components, each in a separate folder:

  • backend: A standalone daemon that collects and processes data, as queued via the tool's web interface or API.
  • webtool: A Flask app that provides a web front-end for searching and analysing the stored data.
  • common: Assets and libraries.
  • datasources: Data source definitions. Each definition is a set of configuration options, database definitions and Python scripts for processing the data from that source. If you want to set up your own data sources, refer to the wiki.
  • processors: A collection of data processing scripts that can plug into 4CAT and manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.

Credits & License

4CAT was created by OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.

4CAT development is supported by the Dutch PDI-SSH foundation through the CAT4SMR project.

4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.

Contributors

stijn-uva, sal-uva, stijnstijn, dale-wahl, xmacex, dependabot[bot], pgr-me, guidoajansen, kuchosauronad0
