This project uses BeautifulSoup and Selenium to extract product information (name, original price, and current price) from Daraz, a popular online shopping platform in Nepal.
- BeautifulSoup
- Selenium
- lxml
- Install Python 3 on your system if you don't have it already.
- Clone this repository to your local system.
- Open terminal in the project folder.
- Run the following commands to install the required packages:
- pip install beautifulsoup4
- pip install lxml
- pip install selenium
- Run the script from the terminal: python main.py
- Enter the URL of the product you want to scrape from Daraz's official website.
- Enter your budget range to get a list of the products that fall within it.
- First, the script creates an instance of the "FirefoxOptions" class and sets the browser to run in headless mode by adding the "--headless" argument to the options.
- Then, the script starts a Firefox web driver with those options and loads the website.
- After the website is loaded, the script waits for the dynamic content to load.
- The HTML source code is then extracted and parsed using BeautifulSoup.
- The required product information is extracted by finding specific HTML elements and their classes/ids.
- The extracted data is then printed in a formatted manner.
- Finally, the web driver is closed.
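
The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of main.py: the CSS selectors, the function names (fetch_page_source, parse_products, parse_price, filter_by_budget), and the use of WebDriverWait for the waiting step are all assumptions made for this example. Daraz's real class names are not listed here and change over time, so inspect the page and replace the placeholder selectors before relying on it.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical selectors: placeholders only, not Daraz's real class names.
CARD_SELECTOR = "div.product-card"
NAME_SELECTOR = ".product-name"
ORIGINAL_PRICE_SELECTOR = ".original-price"
CURRENT_PRICE_SELECTOR = ".current-price"


def fetch_page_source(url, timeout=15):
    """Load url in headless Firefox and return the rendered HTML."""
    # Selenium is imported here so the parsing helpers below can be used
    # without a browser or driver installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Assumption: the dynamic content is ready once a product card exists.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, CARD_SELECTOR))
        )
        return driver.page_source
    finally:
        # Close the browser even if loading or waiting fails.
        driver.quit()


def parse_price(text):
    """Turn a price string such as 'Rs. 1,299' into a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_products(html, parser="lxml"):
    """Extract name, original price, and current price from each card."""
    soup = BeautifulSoup(html, parser)
    products = []
    for card in soup.select(CARD_SELECTOR):
        name = card.select_one(NAME_SELECTOR)
        original = card.select_one(ORIGINAL_PRICE_SELECTOR)
        current = card.select_one(CURRENT_PRICE_SELECTOR)
        products.append({
            "name": name.get_text(strip=True) if name else None,
            "original_price": parse_price(original.get_text()) if original else None,
            "current_price": parse_price(current.get_text()) if current else None,
        })
    return products


def filter_by_budget(products, low, high):
    """Keep products whose current price lies within [low, high]."""
    return [
        p for p in products
        if p["current_price"] is not None and low <= p["current_price"] <= high
    ]


if __name__ == "__main__":
    url = input("Daraz URL: ")
    low = float(input("Minimum budget: "))
    high = float(input("Maximum budget: "))
    html = fetch_page_source(url)
    for p in filter_by_budget(parse_products(html), low, high):
        print(f"{p['name']}: Rs. {p['current_price']:,.0f} (was {p['original_price']})")
```

Keeping the browser step separate from the parsing step means the parsing and budget logic can be tried on a saved HTML file without launching Firefox.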
This project is just a basic example of web scraping with BeautifulSoup and Selenium. The code can be further optimized and extended to scrape more information and to perform more complex operations.