
pypartpicker's People

Contributors

bogpan, luc-wallace, lukadd16, robbrazier, thefakequake, thefroh


pypartpicker's Issues

Issue scraping prices

Searching for prices using the following code

from pypartpicker import Scraper

# creates the scraper object
pcpp = Scraper()
# returns a list of Part objects we can iterate through
print ("Including entries without prices")
parts = pcpp.part_search("AMD Ryzen", region="uk")

# iterates through every part object
for part in parts:
    # prints the name and price of the part
    print(part.name)
    print(part.price)  

print("Excluding entries without prices")
for part in parts:
    if part.price is not None:
        # prints the name and price of the part
        print(part.name)
        print(part.price)

Yields the following results:

[screenshot of the script's output]

But upon checking the results for the same search on PCPartPicker, you can see that many of these CPUs do have prices:

[screenshot of the PCPartPicker search results]

Am I missing something, or are the prices not being scraped correctly?

Question about scraping parts

Is there a way to search/scrape by product type? For example, just scraping all CPU parts. Thanks. Awesome app :)

User (Saved) Parts Lists Not Supported

On to a more code-oriented issue now...

Looking at the codebase, it seems that your regex, along with other logic in the parts-list-related functions, only checks for URLs that contain /list/. However, this does not account for saved parts lists, which use the format /user/myusername/saved/id/ (unless you choose to edit the list, in which case PCPP converts it to a normal /list/ URL). This leads to a plain Exception being thrown (which you might want to change to a ValueError, or to a custom exception class called IllegalArgumentException that inherits from ValueError), even though in theory the underlying code could support such a list.

Is there any reason why saved parts lists couldn't be supported? If not, I'd like to know whether the team was already aware of this "bug" and when we can hope to see it fixed (which I can help with if needed).
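For illustration, here is a minimal sketch of a pattern that would accept both URL shapes described above. This is not the library's actual regex, just an assumption about the two formats:

import re

# Hypothetical pattern (not the library's actual regex): accepts anonymous
# lists (/list/<id>) as well as saved lists (/user/<username>/saved/<id>),
# with an optional region subdomain such as uk.pcpartpicker.com.
LIST_URL_PATTERN = re.compile(
    r"https?://(?:[a-z]{2}\.)?pcpartpicker\.com"
    r"/(?:list|user/[\w-]+/saved)/\w+/?"
)

print(bool(LIST_URL_PATTERN.match("https://pcpartpicker.com/list/abc123")))            # True
print(bool(LIST_URL_PATTERN.match("https://uk.pcpartpicker.com/user/me/saved/xyz/")))  # True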

Questions about the Project

Given that this project is in a public repository and is "open source", this implies (at least to me) that you are willing to accept contributions from the community, whether from people who use your software or people who happen to stumble upon it. That said, there is no publicly available roadmap listing planned features, nor are there any indicators of what prospective contributors could help out with (and contributions don't always need to take the form of code, either).

If I'm wrong in concluding that this project is open to contributions, then you might want to rethink the decision to make this repository public or at least make it obvious in the README that contributions aren't being accepted.

However, if my observation is correct, then I have a few suggestions as to how you can make this project more inviting and organized:

  • Create a CONTRIBUTING.md file that outlines how and in what capacity someone can contribute to this project. This file is also the place to mention any idiosyncrasies of your development process and any specific guidelines a code contributor should follow (e.g. lint your code before committing).
  • Create GitHub templates for issues, bug reports, pull requests and feature requests. This ensures that a consistent format is used in all future issues/PRs and lets you specify what information you want from the person opening the issue or PR (e.g. describe the bug you are experiencing, list the steps to reproduce it, etc.).
  • (A bit off topic, but) add useful GitHub Actions workflows to the repository; things like linting Python code when changes are pushed to the master branch, or automatically generating releases, are all possible.
  • Create a Project Board and/or add a section to the README that lists some short-term/long-term goals you have for this project.

If you agree with some of these suggestions, then I'd be more than happy to help you implement them - just let me know.

Scrape only 1st page

Hi, is it possible to scrape more than 20 products?

I suppose that right now the code can scrape only the first page of the search results, right?
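For reference, part_search accepts a limit keyword (see also the "Why is part_search limited to 20 items?" issue further down this page). A minimal sketch, assuming your installed version supports it:

from pypartpicker import Scraper

pcpp = Scraper()

# Ask for more than one page of results. Note that, per the issue below,
# limits above 20 currently fall back to 20, so this may still be capped.
parts = pcpp.part_search("AMD Ryzen", region="uk", limit=40)
print(len(parts))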

Issue scraping AttributeError: 'NoneType' object has no attribute 'get_text'

The script was working fine until last week, but now it generates "AttributeError: 'NoneType' object has no attribute 'get_text'" every time:

from pypartpicker import Scraper
from time import sleep

# returns a list of Part objects we can iterate through
# the region is set to "uk" so that we get prices in GBP
pcpp = Scraper(headers={
    "cookie": "cf_clearance=XXXXXXXXXXXXXXXXXXXXXXXX; domain=.pcpartpicker.com; HttpOnly; Secure;",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
})
parts = pcpp.part_search("i3", region="uk")

# iterates through the parts
for part in parts:
    # checks that the part has a price and that it is lower than £110
    if part.price is not None and float(part.price.strip("£")) <= 110:
        print(f"I found a valid product: {part.name}")
        print(f"Here is the link: {part.url}")
        # gets the product object for the part
        product = pcpp.fetch_product(part.url)
        print(product.specs)
        # makes sure the product has reviews
        if product.reviews is not None:
            # gets the first review
            review = product.reviews[0]
            print(f"Posted by {review.author}: {review.content}")
            print(f"They rated this product {review.rating}/5.")
        else:
            print("There are no reviews on this product!")

    # slows down the program so as not to spam PCPartPicker and potentially get IP banned
    sleep(3)


Concerns About Project License

This project's license has a "YOUR NAME" field in it rather than an actual name, which tells me the license was treated as an afterthought rather than a requirement. Either that, or you were hesitant to put your legal name in.

If the latter is true, then know that putting your legal name in a software license isn't a requirement. A pseudonym (e.g. your GitHub username QuaKe8782) is sufficient, so long as you are aware that in a court of law you may need to be able to prove your identity (which in theory is possible with things like public PGP keys).

I'm clearly not a lawyer, and you have every right to ignore the points brought up in this GitHub issue, but it's still a valid concern (one that should be addressed sooner rather than later).

NoneType object has no attribute get_text apparently

I tried reinstalling with pip, but nothing changed.

Input:
from pypartpicker import Scraper
Scraper().part_search("i5")
Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/lib/python3.10/site-packages/pypartpicker/scraper.py", line 232, in part_search
    soup = self.__make_soup(f"{search_link}&page={i + 1}")
  File "/home/username/.local/lib/python3.10/site-packages/pypartpicker/scraper.py", line 94, in __make_soup
    if "Verification" in soup.find(class_="pageTitle").get_text():
AttributeError: 'NoneType' object has no attribute 'get_text'
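This is most likely the same Cloudflare verification failure reported elsewhere on this page: the scraper receives the challenge page instead of the search results, so the .pageTitle lookup returns None. A sketch of the cookie workaround from the "Issue scraping AttributeError" report above (the cf_clearance value is a placeholder you'd copy from your own browser session; it expires periodically):

from pypartpicker import Scraper

# Pass a cf_clearance cookie and a browser user-agent, as in the headers
# example from the issue above. The cookie value here is a placeholder.
pcpp = Scraper(headers={
    "cookie": "cf_clearance=XXXXXXXXXXXXXXXXXXXXXXXX",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
})
parts = pcpp.part_search("i5")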

Product doesn't properly inherit the Part's image url

from pypartpicker import Scraper
pcpp = Scraper()

cpus = pcpp.part_search("AMD Ryzen 7 5800X")

# Properly prints: https://cdna.pcpartpicker.com/static/forever/images/product/9b4cefb2e43f2c358f3a97a31e1be90b.256p.jpg
print(cpus[0].image)

# Turning the Part into a Product
product = pcpp.fetch_product(cpus[0].url)

# Prints None
print(product.image)

When I run this code, the first print statement returns the proper link, but the second one returns None; this has happened for every part/product combination I've tested. It's not too big an issue, since you can at least get the image from the Part, but you might want to add a note about this to your README.
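Until this is fixed, a minimal workaround sketch is to fall back to the Part's image whenever the Product's image is None:

from pypartpicker import Scraper

pcpp = Scraper()
cpus = pcpp.part_search("AMD Ryzen 7 5800X")

part = cpus[0]
product = pcpp.fetch_product(part.url)

# Fall back to the Part's image, which does get scraped correctly.
image = product.image or part.image
print(image)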

Why is part_search limited to 20 items?

When searching for parts and trying to set the limit higher than 20, it defaults to 20. Why? Is this a PCPP TOS limitation? What happens if the max limit is increased?

v1.9.0 seems to contain changes of v1.8.0

Apologies, I only just got around to pulling my changes :(
I tried pulling a few times just to make sure it wasn't a caching issue, but v1.9.0 on PyPI still seems to have the same contents as v1.8.0; none of my changes are present.

I also made a couple of README changes for the new documentation, which don't appear in the description of https://pypi.org/project/pypartpicker/1.9.0/ either.

Would you be able to try re-releasing, or maybe do a 1.9.1?

Fetching list fails because of Cloudflare check

I'm developing a Discord bot and used this library to post an embed with the details of a PCPP list whenever the link for one is posted by a user. Here's my code:

import discord
from discord.ext import commands
from pypartpicker import Scraper, get_list_links


class ListEmbedCog(commands.Cog):  # cog name is illustrative; the listener is the actual code
    @commands.Cog.listener()
    async def on_message(self, message):
        if len(get_list_links(message.content)) >= 1:
            pcpp = Scraper()
            link = get_list_links(message.content)[0]
            # renamed from "list" to avoid shadowing the builtin
            parts_list = pcpp.fetch_list(link)

            description = ""
            for part in parts_list.parts:
                description += f"**{part.type}:** {part.name} **({part.price})**\n"
            description += f"\n**Estimated Wattage:** {parts_list.wattage}\n**Price:** {parts_list.total}"

            embed = discord.Embed(title="PCPartPicker List", url=link, description=description, color=0x00a0a0)
            await message.channel.send(embed=embed)

It works perfectly for me; here's the result:

[screenshot: pcpp.png, the resulting embed]

The only problem is that the bot is hosted by my friend @Philipp-spec in Germany, and to view PCPartPicker lists from there he needs to go through a Cloudflare check first. As a result, the bot gives this error whenever it tries to scrape a page:

Ignoring exception in on_message
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/discord/client.py", line 343, in _run_event
    await coro(*args, **kwargs)
  File "/home/tsc/TSCBot-py/cogs/listeners.py", line 10, in on_message
    list = pcpp.fetch_list(link)
  File "/usr/lib/python3.9/site-packages/pypartpicker/scraper.py", line 106, in fetch_list
    soup = self.__make_soup(list_url)
  File "/usr/lib/python3.9/site-packages/pypartpicker/scraper.py", line 84, in __make_soup
    if "Verification" in soup.find(class_="pageTitle").get_text():
AttributeError: 'NoneType' object has no attribute 'get_text'

Is there any way to fix this? The whole point of the Cloudflare check is to make sure you're not a bot…
The best way to reproduce this is with a VPN, though sometimes it doesn't give you the check.
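Until the Cloudflare problem itself is solved, the bot can at least fail gracefully rather than raise. A minimal sketch, with a hypothetical safe_fetch_list helper that catches the AttributeError the scraper currently raises when it receives the challenge page:

from pypartpicker import Scraper

def safe_fetch_list(link):
    # Hypothetical helper: returns the parsed list, or None when the
    # scrape hits the Cloudflare challenge page (which currently
    # surfaces as an AttributeError inside __make_soup).
    pcpp = Scraper()
    try:
        return pcpp.fetch_list(link)
    except AttributeError:
        return None

The on_message listener above could then check for None and reply with a friendly error message instead of crashing.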

soup.find() returning 'None', resulting in AttributeError: 'NoneType' object has no attribute 'get_text'

Traceback (most recent call last):
  File "/workspaces/pcpartpickertest/index.py", line 7, in <module>
    parts = pcpp.part_search("i7")
  File "/home/codespace/.python/current/lib/python3.10/site-packages/pypartpicker/scraper.py", line 232, in part_search
    soup = self.__make_soup(f"{search_link}&page={i + 1}")
  File "/home/codespace/.python/current/lib/python3.10/site-packages/pypartpicker/scraper.py", line 95, in __make_soup
    if "Verification" in soup.find(class_="pageTitle").get_text():
AttributeError: 'NoneType' object has no attribute 'get_text'

The error is returned by this example code:

from pypartpicker import Scraper

pcpp = Scraper()
parts = pcpp.part_search("i7")


for part in parts:
    print(part.name)

first_product_url = parts[0].url
product = pcpp.fetch_product(first_product_url)
print(product.specs)

The traceback tells us that the error occurs at scraper.py, line 95, in __make_soup.

The key is in the statement:

if "Verification" in soup.find(class_="pageTitle").get_text():

The call to soup.find should return an object on which get_text can be called. But if soup.find does not find a matching element, it returns None, so there is no object to call get_text on. This results in the error message AttributeError: 'NoneType' object has no attribute 'get_text'.

This may be a problem with the scraper.py code or with BeautifulSoup itself.
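A sketch of the defensive check described above (not the library's actual fix): confirm that soup.find() found the element before calling get_text() on it.

from bs4 import BeautifulSoup

def is_verification_page(html: str) -> bool:
    # Guard against soup.find() returning None before calling get_text().
    soup = BeautifulSoup(html, "html.parser")
    page_title = soup.find(class_="pageTitle")
    return page_title is not None and "Verification" in page_title.get_text()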

key error on fetch_product

Hi,

I have a KeyError on fetch_product:

Traceback (most recent call last):
  File "/Users/admin/projects/pypartpicker/test.py", line 20, in <module>
    product = pcpp.fetch_product(first_product_url)
  File "/Users/admin/opt/anaconda3/lib/python3.9/site-packages/pypartpicker/scraper.py", line 409, in fetch_product
    + review.find(class_="userDetails__userName")["href"],
  File "/Users/admin/opt/anaconda3/lib/python3.9/site-packages/bs4/element.py", line 1519, in __getitem__
    return self.attrs[key]
KeyError: 'href'
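For illustration, a sketch of how the failing line could guard against a missing href: BeautifulSoup's Tag.get() returns None instead of raising KeyError when an attribute is absent. The review_author_url helper is hypothetical, not the library's actual code:

from typing import Optional
from bs4.element import Tag

def review_author_url(review: Tag) -> Optional[str]:
    # Hypothetical helper: Tag.get() returns None rather than raising
    # KeyError when the href attribute is absent (e.g. a reviewer whose
    # username isn't rendered as a link).
    user_link = review.find(class_="userDetails__userName")
    if user_link is None:
        return None
    href = user_link.get("href")
    return "https://pcpartpicker.com" + href if href else None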

Incorporate a Versioning Standard

Consider using a versioning standard, so as to make clear to your user base whether a new release merely addresses bug fixes, adds new features, or introduces breaking changes.

I personally suggest using the Semantic Versioning (SemVer) specification as it is widely used by the open source community and thus can be easily understood by any and all who may want to use/are already using your software. Being a specification it has certain set rules, which is a good thing as it leaves no opportunity for ambiguity.

The full spec lives at https://semver.org (and I suggest you read it), but in short, SemVer specifies version numbers with three numerical components, MAJOR.MINOR.PATCH (e.g. 1.2.1).

  • Increment the MAJOR version when you make incompatible (breaking) API changes.
  • Increment the MINOR version when you add functionality in a backwards-compatible manner.
  • Increment the PATCH version when you make backwards-compatible bug fixes.

Take the latest release, v1.7, for example. This release bumped the package version from 1.6 to 1.7 as a result of commit e4f899e. Looking into the nature of this commit reveals it was simply a bug fix, yet in terms of Semantic Versioning you bumped the MINOR version (under SemVer, a fix-only release would have been 1.6.1). This gave me a false impression as to what the new release was addressing (and is also what motivated me to open this issue). The point I'm trying to make is that adherence to a standard like SemVer matters for packages and software that depend on pypartpicker, since a version number alone can tell a lot. In production, I may be more inclined to update to releases that only bump PATCH rather than ones that bump MINOR or MAJOR, since I know PATCHes won't break my code.

This last bit is out of scope for this issue, but in tandem with SemVer I also like to use the Release Please GitHub Action, which automatically generates GitHub releases depending on what types of commits you make... but that's a discussion for another day (one which I'd be happy to have).
