Coder Social home page Coder Social logo

youtube-scraper's Introduction

Background:

This youtube scraper takes a keyword and extracts the youtube URLs related to it. The data point it collects are:

Column Description
video_id id of the Youtube video
video_url url of the Youtube video
published_at date when the video was published
channel_id id of the channel that posted the video
channel_title name of the channel
title title of the video
view_count number of views of the Youtube video
like_count number of likes of the Youtube video
comment_count number of comments of the Youtube video
duration duration of the video

File Structure

PROJECT: youtube-scraper
|
│   .gitignore
│   README.md
│   requirements.txt
│
└───scraper
        main.py -> script to run to do the scraping
        merger.py -> script the merge the output CSV's and remove duplicate videos
        scraper.py -> contains the YoutubeVideoScraper class
        config.py -> this is where you'll put your YOUTUBE API KEY

Setup

  • Open a terminal and clone this repository
  • Once done, cd to the folder
  • In the terminal, create a Python virtual environment using:
Windows
> python -m virtualenv .venv
  • Activate the environment by typing in your terminal
Windows
> .venv\Scripts\activate.bat
  • Install dependencies by typing in your terminal
> python -m pip install -r requirements.txt

Usage

Scraping


  1. cd to scraper folder
  2. Input your API key in config.py
YOUTUBE_API_KEY = ''
  1. Input the word that you want to search for in main.py, as well as the export path. (You can also specify the MINUTES_LIMIT AND SEARCH_LIMIT if you want.)
from scraper import YoutubeVideoScraper
from config import YOUTUBE_API_KEY

API_KEY = YOUTUBE_API_KEY
KEYWORD = ''
EXPORT_PATH = ''
MINUTES_LIMIT = 
SEARCH_LIMIT = 
  1. Open your terminal, run python main.py

Merge


If you want to merge all the exported CSVs that you've scraped, you can run the merge.py file.

It consolidates everything, and remove videos that are duplicated from similar keywords or search word.

  1. Just input the file path where you've exported the data
import pandas as pd 
import os 

FILE_PATH = ''
  1. In your terminal, run python merger.py

youtube-scraper's People

Contributors

vincevertulfo avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.