
driscoll42 / ebaymarketanalyzer


Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

Python 100.00%
ebay scraping-websites python webscraping

ebaymarketanalyzer's Introduction

eBay Market Analyzer

Formerly eBay Sold Price Scraper

This code is free for use and I encourage others to use it for their projects. If you do I would love to see how you used it, shoot me an email or message if you're willing to share. Further, feel free to open up new issues for defects or new features. I can't promise to get to all of them, but I can try.

Formerly this program would scrape eBay automatically and compile statistics. However, eBay has since added CAPTCHAs to its site, which I will not attempt to break with proxies or automated solving. It is still very easy to get the XML manually; this program will then read through the XML, get the item details, and compile statistics for you. The steps are as follows:

1. Search eBay for whatever you are searching for
2. Make an XML folder in the same directory as this code
3. Inside that folder save the XML. I found it easiest to use Firefox => View Page Source => Copy into Notepad++ => Save (file name does not matter)
4. Run the script "run_manual_xml.py" where the parameter passed in has the same name as the folder
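The manual steps above boil down to reading every file saved in that folder. A minimal sketch of the reading step, assuming the pages were saved as text files; `load_saved_pages` is a hypothetical helper for illustration, not a function from this repository:

```python
from pathlib import Path
from typing import Dict


def load_saved_pages(folder: str) -> Dict[str, str]:
    """Return {file name: page source} for every file saved in `folder`."""
    pages = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # File names do not matter, so every regular file is read.
            pages[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return pages
```

Each returned string would then be handed to whatever parser extracts the listings; that part is not shown here.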

This program is built to scrape all sold item data from eBay for any particular item. It saves the data to an Excel file and creates a scatter plot of the sold prices by date, along with the median line and a trendline. Further, if you enter the MSRP, it will plot a line for that and for the break-even prices of scalpers (particularly relevant when this was written, during the PS5, Zen 3, and Xbox Series X launches).

Note: If you need to do commercial research, make actual business decisions, etc. off of eBay data, I highly encourage you to use eBay's Terapeak instead. It goes back further in time, has more detail, is faster, is officially supported, and, as of mid-April, is free to use.

The code was used in a series of articles I wrote in late 2020 to early 2021:

Examples:

[Figures: PS5 Example; PS5 Rolling Average Example]

Install Instructions

  • Create an Anaconda environment with Python 3.8
  • Install packages in environment.yml or requirements.txt

How to Run

  • By default the program is set up to allow for easy scraping of CPUs, GPUs, Consoles, and Motherboards
  • There are a number of examples in run.py, see below for details on the main class and functions

ebay_search Parameters

  • query: str - The query you want to search on eBay, e.g. 'RTX 3080'
  • e_vars: EbayVariables - An instance of EbayVariables class that gets passed into the function.
  • query_exceptions: List - A list of exclusions to add to the query (e.g. ['pics', 'photos', 'paper']), these all get appended when eBay is searched
  • msrp: float - The MSRP of the item, useful when plotting to get an idea of what the price should be normally
  • min_price: float - The minimum price of the query you want to search on eBay for
  • max_price: float - The maximum price of the query you want to search on eBay for
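As a rough illustration of how `query` and `query_exceptions` combine, the sketch below appends each exclusion as a minus term, matching the '5950X -image -jpeg ...' form quoted in the issues further down. `build_query` is a hypothetical name; how the program itself assembles the string is not shown in this README.

```python
from typing import Iterable


def build_query(query: str, query_exceptions: Iterable[str] = ()) -> str:
    """Append each exclusion to the query as a '-term' token."""
    exclusions = [f"-{term.strip()}" for term in query_exceptions if term.strip()]
    return " ".join([query.strip(), *exclusions])


print(build_query("RTX 3080", ["pics", "photos", "paper"]))
# prints: RTX 3080 -pics -photos -paper
```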

EbayVariables Class Parameters

General Parameters

  • run_cached: bool - default=False: If True, does not get new data from eBay and just runs the plots/analysis on the saved xlsx files. Most useful if you want to get the data once and then run the plots using a different min date (e.g. for all time and then for post-launch only)
  • sleep_len: float - default=5: How long to wait between url calls. This is to prevent DoSing eBay's servers and having your connection killed

Plotting Parameters

  • show_plots: bool - default=False: Whether to display plots as the code runs, always saves to a directory regardless
  • main_plot: bool - default=False: Whether to show the Sales Plot as the code runs, always saves to a directory regardless. If show_plots is False this is False
  • profit_plot: bool - default=False: Whether to show the Cumulative Profit plot as the code runs, always saves to a directory regardless. If show_plots is False this is False
  • trend_type: str - default='linear': What kind of Trendline to plot on the Sales Plot. Allowed values are "linear", "poly", "roll", or "none"
    • linear - Creates a Linear Regression trendline
    • poly - Creates a polynomial best fit line
    • roll - Creates a rolling average of the best fit line
    • none - Does not plot any trendline
  • trend_param: List[int] - default=[14]
    • linear - This should be a list with a single value, e.g. [14], how many days in the future it should project the trendline. If 0 it will not project at all.
    • poly - This should be a list with two values, e.g. [2, 14]. The first parameter is the degree of the polynomial, the second how many days in the future to project. The degree should be >=1 and the days should be >=0
    • roll - This should be a list with a single value, e.g. [7]. This is how many days to use for the rolling average
    • none - Does not matter what is in this field.
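The shape rules for `trend_param` can be written down as a small check. This is only a sketch of the rules listed above; `check_trend` is a hypothetical helper, and the lower bounds for `linear` (days >= 0) and `roll` (window >= 1) are my assumptions rather than documented limits.

```python
from typing import List


def check_trend(trend_type: str, trend_param: List[int]) -> None:
    """Raise ValueError if trend_param does not fit trend_type."""
    if trend_type == "none":
        return  # trend_param is ignored for "none"
    if trend_type == "linear":
        # One value: days to project into the future (0 = no projection).
        ok = len(trend_param) == 1 and trend_param[0] >= 0
    elif trend_type == "poly":
        # Two values: polynomial degree, then days to project.
        ok = len(trend_param) == 2 and trend_param[0] >= 1 and trend_param[1] >= 0
    elif trend_type == "roll":
        # One value: rolling window in days (assumed to be at least 1).
        ok = len(trend_param) == 1 and trend_param[0] >= 1
    else:
        raise ValueError(f"unknown trend_type: {trend_type!r}")
    if not ok:
        raise ValueError(f"trend_param {trend_param} is not valid for {trend_type!r}")
```

For example, `check_trend("poly", [2, 14])` passes silently, while `check_trend("poly", [14])` raises.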

Search Parameters

  • sacat: int - default=0: Can filter down to a specific category on eBay (For example, video game consoles = 139971)

Rate Parameters

  • tax_rate: float - default=0.0625 - The tax rate to use when calculating profits
  • store_rate: float - default=0.04 - The rate to use for eBay stores when calculating profits
  • non_store_rate: float - default=0.1 - The rate to use for non-stores when calculating profits
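To show how these rates could feed a break-even line, here is one simple model: the reseller pays MSRP plus tax, and eBay keeps a percentage of the sale price. The formula is an assumption for illustration (it ignores shipping and any other fees) and may not match the calculation the program actually performs; `break_even_price` is a hypothetical name.

```python
def break_even_price(msrp: float, tax_rate: float = 0.0625, fee_rate: float = 0.1) -> float:
    """Sale price at which proceeds after fees equal MSRP plus tax (assumed model)."""
    cost = msrp * (1 + tax_rate)   # what the reseller paid at retail
    return cost / (1 - fee_rate)   # price whose net proceeds cover that cost


non_store = break_even_price(500.0, fee_rate=0.1)   # non_store_rate default
store = break_even_price(500.0, fee_rate=0.04)      # store_rate default
print(round(non_store, 2), round(store, 2))
# prints: 590.28 553.39
```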

Data Scraping Parameters

  • country: str - default='USA': Allows for searching of different countries, currently only supports 'USA' and 'UK'
  • ccode: str - default='$': What currency code to use when making plots
  • days_before: int - default=30: How far back in time to search listings. Ends the search at current date - days_before. Note: eBay only makes sold data public for 90 days, so there's no point in making this greater than 90
  • feedback: bool - default=False: Gets the seller feedback for each sold item. WARNING: This explodes run times as the code needs to call the url of every single item. In testing, the 5950X extract takes 8 seconds with this False and 40 minutes with it True the first time. This is forced True if quantity_hist is True, as there is no extra work to get the feedback
  • quantity_hist: bool - default=False: Gets the full sold history of a multi-item listing. WARNING: This explodes run times
  • desc_ignore_list: List[str] - default=[]: If populated, will check the sub_description field on eBay for keywords and if they exist, set ignore=1.

Misc. Parameters

  • extra_title_text: str - default='': Extra text to add to the file name and plot titles
  • brand_list: List[str] - default=[]: If populated, will search for brands in the list in the title and populate a column with the brand found. This is case insensitive.
  • model_list: List[str] - default=[]: If populated, will search for models in the list in the title and populate a column with the model found. This is case insensitive.
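The case-insensitive title matching described for `brand_list` and `model_list` might look like the sketch below. `find_in_title` is a hypothetical helper; it uses plain substring matching and returns the first hit, which is an assumption about behaviour the README does not spell out.

```python
from typing import List, Optional


def find_in_title(title: str, candidates: List[str]) -> Optional[str]:
    """Return the first candidate found in the title, ignoring case."""
    lowered = title.lower()
    for candidate in candidates:
        if candidate.lower() in lowered:
            return candidate
    return None


print(find_in_title("evga GeForce RTX 3080 FTW3 Ultra", ["ASUS", "EVGA", "MSI"]))
# prints: EVGA
```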

Debugging Parameters

  • debug: bool - default=False: If True prints out values found as the program finds them
  • verbose: bool - default=False: If True prints out a number of exception statements, useful for debugging code issues. If you encounter a problem with the code it is VERY helpful if you set this to True, rerun it, and attach the output

median_plotting Parameters

TO DO

ebay_seller_plot Parameters

TO DO

brand_plot Parameters

TO DO

FAQ

  • This is awesome! But it takes forever to run, what can I do to make it faster?

    • There are a number of variables you can set to speed up the program
    1. query - Obviously make this as specific as possible
    2. query_exceptions - Any minus conditions (e.g. if you're searching for 3070s but don't want EVGA, add EVGA to query_exceptions to filter those out)
    3. min_price/max_price - Set the minimum and maximum prices you want the program to search between
    4. sacat - If you want to search an item, choose the most specific category on eBay. For example, if you want to search for 3080s they fall under:
    5. quantity_hist - If you don't need to capture every single sale on eBay, and just need it mostly accurate, set quantity_hist=False. This setting controls whether the program goes into a listing that has multiple sales and gets all of those sales. Doing so requires more query calls to eBay and takes longer. Most sales are not multi-listings, so this normally will not result in a large difference, but it depends on each item
    6. feedback - If you don't care about getting the seller feedback, knowing if the seller is a store, or the city, state, and country of the seller, this can be False. Note that if quantity_hist is True, this value doesn't matter, as getting to the sale history requires going to the item page, which provides this info
    7. sleep_len - This is a sleep timer added to the code to reduce load on eBay's servers. If this is too low, eBay will terminate your connection. Also, if it is too low, eBay will start giving CAPTCHAs, but only on the multi-listing sales history page. If you have quantity_hist = False this can definitely be set lower. However, it's also just polite to not have this too low.

The quantity_hist and feedback settings are the two which will most dramatically improve your run times, but they also reduce the amount of data you get. All depends on what data you need or don't need.

Release History

  • 0.1.0
    • The first proper release
  • 0.5.0
    • Added a number of performance enhancements and fixes to ensure the correct data is scraped

ebaymarketanalyzer's People

Contributors

driscoll42, maximo1491


ebaymarketanalyzer's Issues

Beautify the Graphs

Is your feature request related to a problem? Please describe.
Let's be real, the graphs are "eh" in terms of prettiness. They work, but they could look far nicer.

Describe the solution you'd like
Don't know yet, I need to research data visualizations. If anyone has suggestions I'm very open, this is not a strong suit of mine.

Describe alternatives you've considered
Leave them be.

Additional context
None.

Add Integration Tests

Is your feature request related to a problem? Please describe.
These might not be as required, but add in Integration testing

Implement Contributing Guidelines

Is your feature request related to a problem? Please describe.
To make the project more sustainable and clear, implementing GitHub Best Practices

Describe the solution you'd like
Create Contributing Guidelines

Describe alternatives you've considered
None really

Additional context
Will likely need to Google examples

Move the filter search conditions from the main query into a parameter

Is your feature request related to a problem? Please describe.
Right now when searching, the main query is forced to be similar to '5950X -image -jpeg -img -picture -pic -jpg' to remove false positives. This makes the final Excel outputs take the form "5950X -image -jpeg -img -picture -pic -jpg.xlsx", which is cumbersome.

Describe the solution you'd like
It would be more legible and flexible to make the filters a parameter like ['image', 'jpeg', 'img', 'picture', 'pic', 'jpg'] to make the final outputs cleaner and to make it easier to add more filters to runs without needing to update each one. Functionality should still be allowed to directly add the -condition to the query if needed. For example RX 6800 -XT would be important to keep in the main query itself.

Describe alternatives you've considered
Use as currently working.

Additional context
None.

Group Images/Spreadsheets by Family

Is your feature request related to a problem? Please describe.
The Images and Spreadsheets folders get very cumbersome once you start graphing a large number of queries.

Describe the solution you'd like
Group all the Images by the family (e.g. Zen+, RTX 30 Series, etc...). This would require a new parameter to the program and to refactor the individual runs under a single family


Use the min_date functionality to improve the binary search

Is your feature request related to a problem? Please describe.
Right now when a search is done, the code keeps splitting an eBay search into chunks until they are 800 items or fewer. This is because eBay only lets you go back 800 items at a time. To get around this the code changes the min/max price until there are 800 items or fewer. This can result in inefficiencies as it may scan over already scraped items.

Describe the solution you'd like
Since the items are sorted by date ended, if we use the min_date parameter and the most recent sold date already scraped, we can split if there are fewer than 800 items, or on page 4 of the ebay search the min date is >= most recent sold date - min_Date
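For context, the price-splitting behaviour this issue describes (the current approach, not the proposed min_date improvement) can be sketched as a recursive halving of the price range. `count_items` is a hypothetical stand-in for whatever call reports how many results eBay returns for a price range, and prices are kept in whole cents here so that ranges never overlap.

```python
from typing import Callable, List, Tuple

PriceRange = Tuple[int, int]  # inclusive bounds, in cents


def split_price_ranges(count_items: Callable[[int, int], int],
                       low: int, high: int, limit: int = 800) -> List[PriceRange]:
    """Halve [low, high] until every piece reports at most `limit` items."""
    if count_items(low, high) <= limit or low >= high:
        return [(low, high)]  # small enough, or cannot be split further
    mid = (low + high) // 2
    return (split_price_ranges(count_items, low, mid, limit)
            + split_price_ranges(count_items, mid + 1, high, limit))
```

Each returned range can then be scraped on its own. A single price point with more than `limit` items cannot be split by price at all, which is one reason a date-based split is attractive.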

Make it easier to call the appropriate plot functions


Move eBay_plot out of ebay_scrape


Run code and automate pylint for PEP8 standards

Is your feature request related to a problem? Please describe.
The code is not up to PEP8 standards, this should be fixed.

Describe the solution you'd like
Set up PyCharm to run it on each commit, and when I set up GitHub Actions and the pull request template, make sure it's required.

Create a Readthedocs site

Is your feature request related to a problem? Please describe.
Currently the documentation is limited and scattered

Describe the solution you'd like
Create a readthedocs site to document everything better

Add Pull Request Template

Is your feature request related to a problem? Please describe.
To make the project more sustainable and clear, implementing GitHub Best Practices

Describe the solution you'd like
Create a Pull Request Template

Describe alternatives you've considered
None really

Additional context
Will likely need to Google examples

KeyError was raised

ASUS Dark Hero -image -jpeg -img -picture -pic -jpg
Traceback (most recent call last):

File "/Users/caleb/opt/anaconda3/envs/Ebayscraper/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)

File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item

File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Ignore'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

File "/Users/caleb/Desktop/Ebay Scrapper/main.py", line 1074, in <module>
df_darkhero = ebay_search('ASUS Dark Hero -image -jpeg -img -picture -pic -jpg', http, 399, 400, 1000,

File "/Users/caleb/Desktop/Ebay Scrapper/main.py", line 618, in ebay_search
df = df[df['Ignore'] == 0]

File "/Users/caleb/opt/anaconda3/envs/Ebayscraper/lib/python3.8/site-packages/pandas/core/frame.py", line 2899, in __getitem__
indexer = self.columns.get_loc(key)

File "/Users/caleb/opt/anaconda3/envs/Ebayscraper/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
raise KeyError(key) from err

KeyError: 'Ignore'

Add a requirements.txt

Describe the solution you'd like
I have an environment.yml, but some users would probably prefer a requirements.txt

Add in Coverage Testing


Add in Mutation Testing


Hide Total Sold Line + trendline option

Is your feature request related to a problem? Please describe.
The Total sold listings can make the plot look a bit dirty, and the trendline can be absolutely useless at times.

Describe the solution you'd like
Add an option to suppress those lines

Describe alternatives you've considered
None.

Additional context
None.

Add type hinting


Dashed lines overlapping

Describe the bug
Think I found the issue. mymodel and x end up with duplicate values -- so the line should be on (X,Y) pairs, but it ends up with one pair per card sold on a given date. Here's debug data for 41 RX 6800 cards sold last week:

(Found by Jarred Walton of Tom's Hardware)

To Reproduce
Create any plot with a dashed line

Expected behavior
Should have clear dashes

Screenshots
Bad:
image

Good:
image

Desktop (please complete the following information):

  • OS: Windows

Apparent fix is

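res = sorted(set(x))  # unique x values, in ascending order
mymodel = list(map(myfunc, res))  # one fitted y value per unique x

(The first line removes duplicate x values; sorting keeps the points in left-to-right order, since a set has no defined order. Plot res against mymodel.)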

Maybe also change the linestyle of the plot

Add in Functional Tests

Is your feature request related to a problem? Please describe.
Create Functional Tests to verify the code on commit

Make setting to show or not show plots

Is your feature request related to a problem? Please describe.
When the script is set to run as a batch job, because of the plt.show() function calls it makes a ton of popups on the screen of the plots. The plots don't need to be displayed, just saved.

Describe the solution you'd like
Add another option to ebay_scrape.py to control whether the plots are shown, so that every line that was:

plt.show()

Have:

if show_plots: plt.show()

Save Log output to file instead of just printing

Is your feature request related to a problem? Please describe.
Currently the verbose/debug output gets printed to the python terminal. In case you run again, want to debug later, etc... it gets overwritten

Describe the solution you'd like
Save the verbose to a log file for easier reference and debugging.
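One way to do this is with the standard library's logging module. The sketch below is an assumption about how it could be wired up, not what the project does; `make_file_logger` and the logger name are made up for illustration.

```python
import logging


def make_file_logger(log_path: str, verbose: bool = False) -> logging.Logger:
    """Return a logger that appends to log_path; DEBUG level when verbose."""
    logger = logging.getLogger(f"ebay_run.{log_path}")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeat calls
        handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Because the file is opened in append mode, a second run adds to the log rather than overwriting the first run's output.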

Assign the query to the item name

Is your feature request related to a problem? Please describe.
Currently to make the ebay_seller_plot the code requires, for example:

df_5600X = df_5600x.assign(item='5600X')
df_5800X = df_5800x.assign(item='5800X')
df_5900X = df_5900x.assign(item='5900X')
df_5950X = df_5950x.assign(item='5950X')

Describe the solution you'd like
The text is the same as the query and should just be part of the process of creating the df that the query, with anything "-XT" for instance removed, assigned as the item val. This should be after the df is saved to an excel file though.
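Deriving the item name from the query could be as simple as dropping the minus terms. A minimal sketch, with `item_name_from_query` as a hypothetical helper name:

```python
def item_name_from_query(query: str) -> str:
    """Drop '-term' exclusion tokens and return the remaining words."""
    kept = [token for token in query.split() if not token.startswith("-")]
    return " ".join(kept)


print(item_name_from_query("RX 6800 -XT -image -jpeg"))
# prints: RX 6800
```

The result could then be passed to `df.assign(item=...)` after the Excel file is written, as the issue suggests.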

Allow for MSRP to change over time

Is your feature request related to a problem? Please describe.
MSRPs can change over time for various reasons, and currently the code cannot handle this. It assumes the MSRP is constant over time. This is an issue, for instance, when the tariffs went into effect on graphics cards on 1/1/21 and the code calculates profits as if it were the old price.

Describe the solution you'd like
The user should be able to set the MSRP to change on a certain date and the code will use the new price in future calculations.
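A date-keyed MSRP schedule is one possible shape for this. The sketch below looks up the price in effect on a given sale date; `msrp_on` and the schedule format are assumptions for illustration, and the prices are invented (only the 1 January 2021 change date comes from the issue).

```python
from bisect import bisect_right
from datetime import date
from typing import List, Tuple


def msrp_on(sold: date, schedule: List[Tuple[date, float]]) -> float:
    """Return the MSRP in effect on `sold`.

    `schedule` is a list of (effective_from, msrp) pairs sorted by date;
    the first entry must start on or before the earliest sale.
    """
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, sold) - 1
    if index < 0:
        raise ValueError(f"no MSRP defined for {sold}")
    return schedule[index][1]


# Hypothetical prices; only the 1 January 2021 change date comes from the issue.
schedule = [(date(2020, 9, 17), 699.0), (date(2021, 1, 1), 769.0)]
print(msrp_on(date(2020, 12, 31), schedule), msrp_on(date(2021, 1, 1), schedule))
# prints: 699.0 769.0
```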

Pull out repeated variables to a class for more pythonic code

Is your feature request related to a problem? Please describe.
There are a number of variables which get passed down a chain of function calls, ebay_search => ebay_scrape => get_quantity_hist, etc... It just makes the code messy and often it's easy to miss one if adding a new variable.

Describe the solution you'd like
Make a Dataclass (https://www.geeksforgeeks.org/understanding-python-dataclasses/) to store all the variables and just call that instead
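A dataclass along these lines might look like the sketch below. It is illustrative only: `RunSettings` and `describe_run` are made-up names, the field list is a subset, and the defaults are copied from the parameter lists in the README above rather than from the actual class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunSettings:
    """Illustrative container; defaults copied from the README parameter lists."""
    run_cached: bool = False
    sleep_len: float = 5
    show_plots: bool = False
    trend_type: str = "linear"
    trend_param: List[int] = field(default_factory=lambda: [14])
    sacat: int = 0
    tax_rate: float = 0.0625
    store_rate: float = 0.04
    non_store_rate: float = 0.1
    country: str = "USA"
    ccode: str = "$"
    days_before: int = 30
    feedback: bool = False
    quantity_hist: bool = False
    debug: bool = False
    verbose: bool = False


def describe_run(query: str, settings: RunSettings) -> str:
    """Stand-in for a function that would otherwise take a dozen arguments."""
    return f"{query}: last {settings.days_before} days, {settings.country}"


print(describe_run("RTX 3080", RunSettings(days_before=90, country="UK")))
# prints: RTX 3080: last 90 days, UK
```

Adding a new setting then means adding one field, instead of threading a new argument through every function in the call chain.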

Check Data Integrity before saving and output errors

Is your feature request related to a problem? Please describe.
Sometimes the program is forced to put in a '' or "None" value in a field because it can't find that data.

Describe the solution you'd like
When that happens, the program should do a data integrity check for those kinds of values and output a warning to the user to have them go in and check it.
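Such a check could scan the scraped rows for the placeholder values before saving. A minimal sketch over plain dictionaries; `find_missing_fields` and the column names in the example are hypothetical:

```python
from typing import Dict, List

PLACEHOLDERS = {"", "None"}  # values the issue says get written when data is missing


def find_missing_fields(rows: List[Dict[str, object]]) -> List[str]:
    """Return one warning per field that holds a placeholder value."""
    warnings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if value is None or str(value).strip() in PLACEHOLDERS:
                warnings.append(f"row {index}: '{column}' is missing")
    return warnings


rows = [
    {"Title": "RTX 3080", "Price": 899.0, "City": ""},
    {"Title": "None", "Price": 0, "City": "Austin"},
]
for warning in find_missing_fields(rows):
    print(warning)
```

A price of 0 is not flagged here, since only empty and "None" values are treated as missing.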

Add filter for eBay Descriptions

Is your feature request related to a problem? Please describe.
The descriptions on eBay often contain things worth filtering on; this is typically where someone would list "Parts Only".

Describe the solution you'd like
Add a filter to filter those out.
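This matches the `desc_ignore_list` parameter documented in the README, which sets ignore=1 when a keyword appears in the sub_description field. A sketch of that check over plain dictionaries; `flag_ignored` is a hypothetical name, and matching case-insensitively is my assumption:

```python
from typing import Dict, List


def flag_ignored(items: List[Dict[str, object]], desc_ignore_list: List[str]) -> None:
    """Set item['ignore'] = 1 when its sub_description contains a keyword."""
    keywords = [keyword.lower() for keyword in desc_ignore_list]
    for item in items:
        text = str(item.get("sub_description", "")).lower()
        if any(keyword in text for keyword in keywords):
            item["ignore"] = 1
        else:
            item.setdefault("ignore", 0)  # keep any flag that was already set


items = [
    {"title": "RTX 3080", "sub_description": "For Parts Only, no returns"},
    {"title": "RTX 3080", "sub_description": "Brand new"},
]
flag_ignored(items, ["parts only"])
print([item["ignore"] for item in items])
# prints: [1, 0]
```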
