Coder Social home page Coder Social logo

maws_script's Introduction

IEEE Slack Member Webscraper

This repository contains Python scripts that automate Slack member management by scraping email queries from the Cal Directory https://www.berkeley.edu/directory. The resulting .csv file will contain emails of people who no longer attend Berkeley, meaning these emails can be deleted from the Slack. Written and maintained by the Technical Operations committee of IEEE@berkeley.

System Requirements

I recommend using a virtual environment to run these scripts. More information can be found at: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/.

Python Version

Python 3.9.1

Necessary Packages

pandas, time, requests, sys, bs4

csv_merger.py

This script is meant for merging emails from different .csv files together. It can be used to get rid of duplicate emails and outputs a .csv file with all unique emails from the .csv files provided as input.

Usage: python3 webscraper_script.py filename_1 colname_1 filename_2 colname_2 ...

filename_x: the xth .csv file to merge
colname_x: the name of the column that contains email addresses for the xth .csv file

Output: Result will be put in all_emails.csv after script completes

directory_scraper.py

This script is meant for the actual webscraping. It scrapes email queries from the Cal Directory, keeping track of whether the email was present in the directory or not. In order to avoid getting a 403 forbidden error for requesting too quickly, the script waits 5 seconds between each query. So, leave it on in the background while you do something else.

Usage: Usage: python3 webscraper_script.py filename colname

filename: the name of the .csv file containing the emails to scrape
colname: the name of the column that contains email addresses for the .csv file

Output: Result will be put in graduated_members.csv after script completes

maws_script's People

Contributors

edwinlim0919 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.