Universal Web Scraping Agent using LLMs

This project demonstrates a universal web scraping agent that uses the Firecrawl framework to scrape data from the web and utilizes a large language model (LLM) to format the data into a JSON response. This project was demonstrated during a peer learning session to illustrate web scraping techniques and data extraction.

Introduction
Features
Installation
Usage
Project Structure
Requirements
Contributions
Contact

Introduction

The universal web scraping agent is designed to scrape data from any web page using the Firecrawl framework and format the scraped data using a large language model (LLM) provided by Groq. The output is a JSON object containing the structured data extracted from the web page.

Features

Scrapes data from any web page using the Firecrawl framework. Formats the scraped data into a structured JSON object using Groq's LLM. Saves the raw scraped data in markdown format. Saves the formatted data in both JSON and Excel formats.

Installation

To get started with this project, clone the repository and install the required dependencies.

git clone https://github.com/badrinarayanan17/Scrape-It-With-LLM.git
cd Scrape-It-With-LLM
pip install -r requirements.txt

Usage

Create a .env file in the project directory and add your API keys for Firecrawl and Groq.

FIRECRAWL_API_KEY=your_firecrawl_api_key

GROQ_API_KEY=your_groq_api_key
Run the script to scrape data from a specified URL and format it.

python app.py
The raw and formatted data will be saved in the output folder.

Project Structure

Scrape-It-With-LLM/

├── app.py

├── requirements.txt

├── .env

├── output/

│ ├── rawData_.md

│ ├── sorted_data_.json

│ ├── sorted_data_.xlsx

└── README.md

Requirements

firecrawl-py
pandas
openpyxl
py-dotenv
groq

Contributions

Contributions are welcome! Please open an issue or submit a pull request with your changes.

Contact

Name: BadriNarayanan S

Email: [email protected]

LinkedIn: BadriNarayanan S

badrinarayanan17 / scrape-it-with-llm Goto Github PK

scrape-it-with-llm's Introduction

Universal Web Scraping Agent using LLMs

Contents

Introduction

Features

Installation

Usage

Project Structure

Requirements

Contributions

Contact

scrape-it-with-llm's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent