CLscraper

A simple script that will send you emails when a craigslist search query gets new results, i.e. when something new is posted that matches your search criteria.

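The program uses the Beautiful Soup package. To install it, run the following (easy_install is deprecated, so on a current Python install use pip install beautifulsoup4 instead):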

sudo easy_install beautifulsoup4

Additionally, this program uses Gmail to send email messages. You must set up your Gmail account to work with "less secure apps".

You can also make a new Gmail account for this if your main one has two-factor authentication or you generally want stronger security.

Usage

To use the program, first go to craigslist and search for whatever you are looking for, then replace the placeholder URLs in CLscraper.py with the URLs of the searches of interest.

For instance, if you want an apartment in Pacific Beach, San Diego with at least 4 bedrooms and costing at most $3500 per month, the craigslist search URL will be

https://sandiego.craigslist.org/search/apa?sort=date&availabilityMode=0&hasPic=1&max_price=3500&min_bedrooms=4&query=pacific%20beach

The same search in La Jolla, CA would be

https://sandiego.craigslist.org/search/apa?sort=date&availabilityMode=0&hasPic=1&max_price=3500&min_bedrooms=4&query=la%20jolla

In this example, replace

urls = ['SEARCH_URL_1_HERE','SEARCH_URL_2_HERE','SEARCH_URL_3_HERE','SEARCH_URL_4_HERE']

with

urls = ['https://sandiego.craigslist.org/search/apa?sort=date&availabilityMode=0&hasPic=1&max_price=3500&min_bedrooms=4&query=pacific%20beach', 'https://sandiego.craigslist.org/search/apa?sort=date&availabilityMode=0&hasPic=1&max_price=3500&min_bedrooms=4&query=la%20jolla']
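The two example URLs differ only in their query value, so a list like this could also be generated instead of pasted. The following is a minimal sketch, assuming Python 3; build_search_url is a hypothetical helper that is not part of CLscraper.py, and the parameter names are simply copied from the example URLs above.

```python
from urllib.parse import urlencode, quote


def build_search_url(city, query, max_price, min_bedrooms):
    """Build an apartment search URL shaped like the examples above.

    Hypothetical helper for illustration; not part of CLscraper.py.
    """
    params = {
        "sort": "date",            # newest first, as the README recommends
        "availabilityMode": 0,
        "hasPic": 1,
        "max_price": max_price,
        "min_bedrooms": min_bedrooms,
        "query": query,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return ("https://%s.craigslist.org/search/apa?" % city
            + urlencode(params, quote_via=quote))


urls = [build_search_url("sandiego", q, 3500, 4)
        for q in ("pacific beach", "la jolla")]
```

Pasting URLs copied from the browser works just as well; generating them only helps when many searches share the same filters.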

Note: A search with many results will not load on a single page. To work around this, make sure the results are sorted by 'newest'; as of May 2018 this appears in the URL as the token sort=date. A page holds 120 results by default, so as long as fewer than 120 new posts appear between searches (the interval set by SLEEP_TIME), you should still catch them all. In general, this means it's better to use many specific searches than one broad search.

You can use any number of searches for any number of different things. By default, the code is configured to send email updates using a Gmail account. If you have a Gmail account, you can simply replace the lines
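To see why the page size matters, here is a rough sketch of how new posts can be detected between two searches. It assumes each search yields the listing links from the first results page, newest first; find_new_links is a hypothetical name and may not match how CLscraper.py does it.

```python
def find_new_links(seen, current_links):
    """Return links not seen in earlier searches, keeping page order.

    `seen` is a set that is updated in place so the next call
    only reports links that appeared after this one.
    """
    new_links = []
    for link in current_links:
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links


seen = set()
first = find_new_links(seen, ["post-1", "post-2"])             # everything is new
second = find_new_links(seen, ["post-3", "post-1", "post-2"])  # only post-3 is new
```

Only links on the fetched page are ever compared, so any post that has already been pushed past the first page by newer ones before the next search is never reported.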

username='GMAIL_USER' #gmail username
password='GMAIL_PASSWORD' #gmail password

with your login credentials and replace

fromaddr = '[email protected]'

with your Gmail address. Then set the recipient addresses to a list of the email addresses that should get the updates. You can make this list as long as you want. It's also fine to just email yourself, but this needs to stay a list.
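For orientation, sending a notification through Gmail with these settings might look roughly like the sketch below. It assumes Python 3 and Gmail's SMTP endpoint smtp.gmail.com on port 587; build_message and send_via_gmail are hypothetical names, not functions from CLscraper.py, and the addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_message(fromaddr, toaddrs, subject, body):
    """Assemble one email addressed to every recipient in the list."""
    msg = EmailMessage()
    msg["From"] = fromaddr
    msg["To"] = ", ".join(toaddrs)   # toaddrs must be a list, even for one address
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, username, password):
    """Log in to Gmail over STARTTLS and send the message."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


msg = build_message("me@example.com", ["me@example.com"],
                    "New craigslist listing", "https://example.org/listing")
# send_via_gmail(msg, "GMAIL_USER", "GMAIL_PASSWORD")  # needs real credentials
```

Joining the recipients with ", ".join is the reason the setting has to stay a list: a bare string would be split into single characters.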

The last two configurables are SLEEP_TIME and CHECK_OLD_LISTINGS. SLEEP_TIME is the number of seconds between searches, and CHECK_OLD_LISTINGS tries to avoid sending updates for reposted listings.

This program must stay alive to work properly, so you will need a machine that can stay on and connected to the internet. Run with
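As an illustration of how these two settings could interact, the sketch below polls at a fixed interval and skips listings whose title and price were already seen under a different link. This is an assumption about what counts as a repost, not the script's actual logic; listing_key, filter_reposts, and poll are hypothetical names, and the SLEEP_TIME value is only an example.

```python
import time

SLEEP_TIME = 600            # seconds between searches (example value)
CHECK_OLD_LISTINGS = True   # skip listings that look like reposts


def listing_key(title, price):
    """Normalize case and whitespace so a reposted ad maps to the same key."""
    return (" ".join(title.lower().split()), price)


def filter_reposts(listings, seen_keys):
    """Keep (title, price, link) tuples whose key has not been seen before."""
    fresh = []
    for title, price, link in listings:
        key = listing_key(title, price)
        if CHECK_OLD_LISTINGS and key in seen_keys:
            continue
        seen_keys.add(key)
        fresh.append((title, price, link))
    return fresh


def poll(fetch, notify, rounds, sleep=time.sleep):
    """Call fetch() `rounds` times, notifying about fresh listings each time."""
    seen_keys = set()
    for i in range(rounds):
        fresh = filter_reposts(fetch(), seen_keys)
        if fresh:
            notify(fresh)
        if i < rounds - 1:
            sleep(SLEEP_TIME)
```

Here fetch stands in for whatever downloads and parses the search pages, and notify for the email step; a long-running script would loop indefinitely rather than for a fixed number of rounds.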

python CLscraper.py

Contributors

bth5032
