
neeharika-sonowal / myanimelist-dataset-crawler


This project forked from debasish-dutta/myanimelist-dataset-crawler


This repo contains the Python web crawler that scrapes anime data and user data for use in analysis.

License: GNU General Public License v3.0

Python 100.00%

myanimelist-dataset-crawler's Introduction

Anime-User dataset crawler

This repo contains the files used to create a dataset of the anime list and the user data from the MyAnimeList (MAL) website. It is written in Python and scrapes the data with Jikan, an unofficial MAL API, together with BeautifulSoup4.

Get anime list:

For the anime list file, you only need a range of mal_id values to go through, from 1 up to a limit you choose.

Edit the input passed to the function call in the main function of the file.
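As a rough illustration of walking the mal_id range, here is a minimal sketch. It is not the repo's actual code: the names `crawl_anime_range` and `fake_fetch` are hypothetical, and the fetcher is passed in as a parameter so the loop can run offline. A real fetcher would query Jikan for each ID and return None when an ID has no anime.

```python
import time
from typing import Callable, Iterator, Optional


def crawl_anime_range(
    fetch: Callable[[int], Optional[dict]],
    limit: int,
    delay_seconds: float = 0.0,
) -> Iterator[dict]:
    """Yield one record per mal_id in 1..limit, skipping IDs with no data."""
    for mal_id in range(1, limit + 1):
        record = fetch(mal_id)
        if record is not None:
            yield record
        # Optional pause between requests to stay within API rate limits.
        if delay_seconds:
            time.sleep(delay_seconds)


def fake_fetch(mal_id: int) -> Optional[dict]:
    """Hypothetical stand-in fetcher: pretends that even IDs do not exist."""
    if mal_id % 2 == 0:
        return None
    return {"animeID": mal_id, "name": f"Anime {mal_id}"}


if __name__ == "__main__":
    for row in crawl_anime_range(fake_fetch, limit=5):
        print(row)
```

Passing the fetcher as an argument keeps the range logic separate from the network call, which makes it easy to try the loop on a small limit before running a full crawl.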

Column metadata:

  • animeID: ID of the anime, as in the anime URL https://myanimelist.net/anime/ID
  • name: title of the anime
  • premiered: premiere date; default format is (season year)
  • genre: list of genres
  • type: type of anime (for example TV, Movie)
  • episodes: number of episodes
  • studios: list of studios
  • source: source of the anime (for example original, manga, game)
  • scored: score of the anime
  • scoredBy: number of members who scored the anime
  • members: number of members who added the anime to their list
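The columns above could be written to a CSV file along the following lines. This is a sketch under assumptions, not the repo's code: `write_anime_csv` is a hypothetical helper, the list-valued columns (genre, studios) are joined with "|" as one possible encoding, and the sample row is placeholder data rather than a real MAL entry.

```python
import csv
import io
from typing import Iterable, TextIO

# Column names taken from the metadata list above.
ANIME_COLUMNS = [
    "animeID", "name", "premiered", "genre", "type", "episodes",
    "studios", "source", "scored", "scoredBy", "members",
]


def write_anime_csv(rows: Iterable[dict], out: TextIO) -> None:
    """Write anime records as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=ANIME_COLUMNS)
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # Assumption: list columns are stored as pipe-separated strings.
        for key in ("genre", "studios"):
            flat[key] = "|".join(row.get(key, []))
        writer.writerow(flat)


if __name__ == "__main__":
    # Placeholder record for illustration only.
    sample = {
        "animeID": 1, "name": "Example Title", "premiered": "Spring 2000",
        "genre": ["Action", "Drama"], "type": "TV", "episodes": 12,
        "studios": ["Studio A"], "source": "original",
        "scored": 7.5, "scoredBy": 100, "members": 200,
    }
    buffer = io.StringIO()
    write_anime_csv([sample], buffer)
    print(buffer.getvalue())
```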

Get user data:

To collect the user data, use the script below. It uses the Jikan API to fetch the data and BS4 to get usernames directly from the MAL website, since a username is needed to request a user's data.

Column metadata:

  • user_id: ID of the user
  • username: username of the user
  • gender: gender of the user
  • birthday: birthday of the user
  • location: location of the user
  • joined: date the user joined
  • days_watched: days spent watching
  • mean_score: mean score given by the user
  • watching: total anime currently being watched
  • completed: total anime completed
  • on_hold: total anime on hold
  • dropped: total anime dropped
  • plan_to_watch: total anime planned to watch
  • total_entries: total anime entries
  • rewatched: total anime rewatched
  • episodes_watched: total episodes watched

Syntax:

python getUser.py UserList.txt user.csv
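The format of UserList.txt is not spelled out here. Assuming it holds one username per line, a sketch for loading it while dropping blank lines and duplicates might look like this. `load_usernames` is a hypothetical helper, not a function from the repo.

```python
from typing import Iterable, List


def load_usernames(lines: Iterable[str]) -> List[str]:
    """Return unique, non-empty usernames in first-seen order."""
    seen = set()
    usernames = []
    for line in lines:
        name = line.strip()
        if not name or name in seen:
            continue
        seen.add(name)
        usernames.append(name)
    return usernames


if __name__ == "__main__":
    # Placeholder usernames; an open file object can be passed the same way.
    sample = ["user_a\n", "\n", "user_b\n", "user_a\n"]
    print(load_usernames(sample))
```

Removing duplicates before crawling avoids requesting the same profile twice when a user appears in several forum posts or clubs.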

How to create User List from forum post:

For this you need the topic ID. Go to MAL -> Community -> Forums -> Select a forum

For example, the ID of each of the following forum links is shown after the arrow:

https://myanimelist.net/forum/?topicid=1699126 -> 1699126

https://myanimelist.net/forum/?topicid=1696289 -> 1696289

After getting the topic ID, you can use the createUserListFromPost script (getUserFromPost.py).

Syntax:
python getUserFromPost.py topicID UserList.txt

How to create User List from club:

For this you need the club ID. Go to MAL -> Community -> Clubs -> Select a club

For example, the ID of each of the following club links is shown after the arrow:

https://myanimelist.net/clubs.php?cid=72250 -> 72250

https://myanimelist.net/clubs.php?cid=32683 -> 32683
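In both the forum and club links, the ID is a query-string parameter (topicid for forum topics, cid for clubs), so it can also be pulled out programmatically. The following is a small sketch using only the standard library. `extract_id` is a hypothetical helper and is not part of the repo's scripts.

```python
from urllib.parse import parse_qs, urlparse


def extract_id(url: str, param: str) -> int:
    """Return the numeric value of a query parameter such as topicid or cid."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    if not values or not values[0].isdigit():
        raise ValueError(f"no numeric '{param}' parameter in {url!r}")
    return int(values[0])


if __name__ == "__main__":
    print(extract_id("https://myanimelist.net/forum/?topicid=1699126", "topicid"))
    print(extract_id("https://myanimelist.net/clubs.php?cid=72250", "cid"))
```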

After getting the club ID, you can use the createUserListFromClub script (getUserFromClub.py).

Syntax:
python getUserFromClub.py clubID UserList.txt

myanimelist-dataset-crawler's People

Contributors

debasish-dutta, neeharika-sonowal
