dsc-scraping-concerts-lab-online-ds-ft-090919's Introduction

Scraping Concerts - Lab

Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!

In this lab, you'll practice your scraping skills on an online music magazine and events website called Resident Advisor.

Objectives

You will be able to:

  • Create a full scraping pipeline that involves traversing over many pages of a website, dealing with errors and storing data

View the Website

For this lab, you'll be scraping the https://ra.co website. For reproducibility, we will use an Internet Archive Wayback Machine snapshot of the events listings for the week of March 30, 2019, rather than the live site.
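Wayback Machine snapshot URLs follow the pattern `https://web.archive.org/web/<timestamp>/<original URL>`, where the timestamp is a 14-digit `YYYYMMDDhhmmss` capture time. The capture time can differ from the dates the archived page itself displays. The following is a minimal sketch of building and splitting such a URL; the helper names `wayback_url` and `split_wayback_url` are hypothetical and introduced only for illustration.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url, captured_at):
    """Build a Wayback Machine snapshot URL for a given capture time."""
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def split_wayback_url(snapshot_url):
    """Return (capture datetime, original URL) for a snapshot URL."""
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine snapshot URL")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    timestamp, original_url = rest.split("/", 1)
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


snapshot = wayback_url(
    "https://ra.co/events/us/newyork?week=2019-03-30",
    datetime(2021, 3, 26, 22, 59, 33),
)
print(snapshot)
```

Splitting the snapshot URL again recovers both the capture time and the original ra.co address, which can help when checking which archived copy a request actually hit.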

Start by navigating to the events page (the URL stored in EVENTS_PAGE_URL below) in your browser.

Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

Write a Function to Scrape all of the Events on the Given Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date, and Number_of_Attendees.

Start by importing the relevant libraries, making a request to the relevant URL, and exploring the contents of the response with BeautifulSoup. Then fill in the scrape_events function with the relevant code.
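As a starting point for the exploration step, here is a minimal sketch of requesting a page and handing the response to BeautifulSoup, with a basic check for HTTP errors. The helper name `get_soup` and its `get` parameter are assumptions made for this example (the parameter exists only so the demonstration can run offline with a stand-in response); they are not part of the lab's starter code.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url, get=requests.get, timeout=30):
    """Request a page and parse it, raising on HTTP error statuses."""
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    return BeautifulSoup(response.text, "html.parser")


# Offline demonstration with a stand-in response, so no network is needed.
class FakeResponse:
    text = "<html><body><h1>Events</h1><p>Sat, 30 Mar</p></body></html>"

    def raise_for_status(self):
        pass


def fake_get(url, timeout=None):
    return FakeResponse()


soup = get_soup("https://example.com/events", get=fake_get)
print(soup.find("h1").get_text())
print(soup.prettify()[:200])  # a quick way to explore the structure
```

Against the real site you would call `get_soup(EVENTS_PAGE_URL)` and then compare the printed HTML with what the browser's inspector shows.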

# Relevant imports
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

EVENTS_PAGE_URL = "https://web.archive.org/web/20210326225933/https://ra.co/events/us/newyork?week=2019-03-30"

# Exploration: making the request and parsing the response
# Find the container with event listings in it
# Find a list of events by date within that container
# Extract the date (e.g. Sat, 30 Mar) from one of those containers
# Extract the name, venue, and number of attendees from one of the
# events within that container
# Loop over all of the event entries, extract this information
# from each, and assemble a dataframe
# Bring it all together in a function that makes the request, gets the
# list of entries from the response, loops over that list to extract the
# name, venue, date, and number of attendees for each event, and returns
# that list of events as a dataframe

def scrape_events(events_page_url):
    # Your code here
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

# Test out your function
scrape_events(EVENTS_PAGE_URL)
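The loop-and-assemble pattern described in the comments above can be sketched on a small HTML fragment. Everything about the markup here is hypothetical: the tag names, the class names (`day`, `event`, `title`, `venue`, `attending`), and the sample events are made up for illustration and will not match the real page, so the selectors must be replaced with whatever the inspector shows. The helper names `text_or_none` and `parse_events` are also assumptions.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup and made-up events, for illustration only.
SAMPLE_HTML = """
<div id="listing">
  <section class="day">
    <h3>Sat, 30 Mar</h3>
    <article class="event">
      <a class="title">Warehouse Night</a>
      <span class="venue">Example Hall</span>
      <span class="attending">120</span>
    </article>
    <article class="event">
      <a class="title">Rooftop Session</a>
      <span class="venue">Sample Terrace</span>
    </article>
  </section>
</div>
"""

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def text_or_none(parent, name, class_name):
    """Return the stripped text of the first matching tag, or None."""
    tag = parent.find(name, class_=class_name)
    return tag.get_text(strip=True) if tag else None


def parse_events(html):
    """Turn one page of (hypothetical) listings into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for day in soup.find_all("section", class_="day"):
        date = day.find("h3").get_text(strip=True)
        for event in day.find_all("article", class_="event"):
            attending = text_or_none(event, "span", "attending")
            rows.append({
                "Event_Name": text_or_none(event, "a", "title"),
                "Venue": text_or_none(event, "span", "venue"),
                "Event_Date": date,
                # Some entries may lack an attendee count; keep them as missing.
                "Number_of_Attendees": int(attending) if attending else None,
            })
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_events(SAMPLE_HTML)
print(df)
```

Returning `None` for a missing field, rather than raising, is one way to deal with entries that omit the attendee count; the row is kept and the gap shows up as a missing value in the DataFrame.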

Write a Function to Retrieve the URL for the Next Page

As you scroll down, there should be a button labeled "Next Week" that will take you to the next page of events. Write code to find that button and extract the URL from it.

This is a relative path, so make sure you add https://web.archive.org to the front to get the URL.
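One way to add the prefix is `urllib.parse.urljoin` from the standard library, which also leaves an already-absolute link untouched. A minimal sketch follows; the relative path is a hypothetical value shaped like what the link's `href` might contain, not one copied from the page.

```python
from urllib.parse import urljoin

WAYBACK_ROOT = "https://web.archive.org"

# Hypothetical relative path, shaped like the href a "Next Week" link might hold.
relative_path = "/web/20210326225933/https://ra.co/events/us/newyork?week=2019-04-06"

next_page_url = urljoin(WAYBACK_ROOT, relative_path)
print(next_page_url)

# An href that is already absolute is returned unchanged.
already_absolute = urljoin(WAYBACK_ROOT, "https://web.archive.org/web/2019/")
print(already_absolute)
```

For a root-relative path like this one, the result is the same as plain string concatenation of the root and the path.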

# Find the button, find the relative path, create the URL for the current `soup`
# Fill in this function to take in the current page's URL and return the
# next page's URL
def next_page(url):
    # Your code here
    return next_page_url

# Test out your function
next_page(EVENTS_PAGE_URL)

Scrape the Next 500 Events

In other words, repeatedly call scrape_events and next_page until you have assembled a dataframe with at least 500 rows.
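The control flow of that loop can be sketched independently of the scraping details. In this sketch `collect_events`, its `max_pages` safety cap, and the two stand-in functions are assumptions made for illustration; in the lab you would pass your own `scrape_events` and `next_page` instead of the fakes.

```python
import pandas as pd


def collect_events(start_url, scrape_page, get_next_url, min_rows=500, max_pages=50):
    """Follow next-page links, stacking page results until min_rows is reached."""
    frames = []
    total_rows = 0
    url = start_url
    for _ in range(max_pages):  # hard cap so a broken link cannot loop forever
        page_df = scrape_page(url)
        frames.append(page_df)
        total_rows += len(page_df)
        if total_rows >= min_rows:
            break
        url = get_next_url(url)
    return pd.concat(frames, ignore_index=True)


# Offline stand-ins for scrape_events and next_page: each fake page has 120 rows.
def fake_scrape(url):
    return pd.DataFrame({"Event_Name": [f"{url}-event-{i}" for i in range(120)]})


def fake_next(url):
    page_number = int(url.rsplit("-", 1)[1])
    return f"page-{page_number + 1}"


events = collect_events("page-1", fake_scrape, fake_next)
print(len(events))
```

Collecting the per-page DataFrames in a list and concatenating once at the end avoids repeatedly copying a growing DataFrame inside the loop.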

Display the data sorted by the number of attendees, greatest to least.
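Scraped counts usually arrive as text, so sorting them directly would order them alphabetically rather than numerically. The sketch below converts the column to numbers first and then sorts in descending order. The rows are made up, and the thousands separator in one value is an assumption about how a count might be formatted, not something confirmed from the page.

```python
import pandas as pd

# Made-up rows; counts start out as strings, and one is missing.
df = pd.DataFrame({
    "Event_Name": ["A", "B", "C"],
    "Number_of_Attendees": ["85", "1,204", None],
})

# Strip any thousands separators, then convert; unparseable values become NaN.
df["Number_of_Attendees"] = pd.to_numeric(
    df["Number_of_Attendees"].str.replace(",", "", regex=False),
    errors="coerce",
)

ranked = df.sort_values("Number_of_Attendees", ascending=False, na_position="last")
print(ranked)
```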

We recommend adding a brief time.sleep call between requests.get calls to avoid rate limiting.
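A minimal sketch of spacing out requests is shown here. The helper name `fetch_all` and its `sleep` parameter are hypothetical; the parameter defaults to `time.sleep` and is only swapped out so the demonstration runs instantly and offline.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause needed before the first request
        results.append(fetch(url))
    return results


# Offline demonstration: record the pauses instead of actually waiting.
pauses = []
pages = fetch_all(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: url.upper(),
    delay_seconds=0.5,
    sleep=pauses.append,
)
print(pages)
print(pauses)
```

In the real pipeline, `fetch` would be a function that wraps `requests.get`, and the default `time.sleep` would provide the actual pause.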

# Your code here

Summary

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!

Contributors: mathymitchell, mas16, hoffm386
