
me-shweta / scrape-ml


This project forked from recode-hive/scrape-ml


To generate new data for the Semi-supervised-sequence-learning-Project, we have written a Python script that fetches 📊 data from the IMDb website 🌐 and converts it into text files 💻.

Home Page: https://scrape-review-analysis.streamlit.app/

License: MIT License

Python 0.15% HTML 3.40% Jupyter Notebook 96.46%

scrape-ml's Introduction

🎬 IMDb Movie Review Scraping 📊

Scraping movie reviews ✏️ using the Python programming language 💻.

🔍 Welcome to the IMDb Movie Review Scraper project! 🌟 This project gathers a new dataset by automatically extracting movie reviews from IMDb for analysis and research. The dataset is intended to support natural language processing tasks such as sentiment analysis and recommendation systems. Reviews are collected with web scraping tools such as Beautiful Soup, then preprocessed and structured into CSV files suitable for downstream analysis, including Support Vector Machine classification. 📈
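
As a rough illustration of the extraction step, the sketch below parses review blocks out of an HTML string with Beautiful Soup. The sample markup, the CSS class names (`review`, `title`, `rating`, `text`) and the function name `parse_reviews` are assumptions made for this example. IMDb's real page structure is different and changes over time, so the selectors used in the notebook remain the reference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded review page.
SAMPLE_HTML = """
<div class="review">
  <a class="title">Great film</a>
  <span class="rating">9</span>
  <div class="text">Loved every   minute of it.</div>
</div>
<div class="review">
  <a class="title">Not for me</a>
  <div class="text">Too long and too slow.</div>
</div>
"""


def parse_reviews(html):
    """Return one dict per review block found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.select("div.review"):
        title = block.select_one(".title")
        rating = block.select_one(".rating")
        text = block.select_one(".text")
        reviews.append({
            "title": title.get_text(strip=True) if title else "",
            # Some reviews carry no rating, so keep None in that case.
            "rating": int(rating.get_text(strip=True)) if rating else None,
            # Collapse runs of whitespace left over from the markup.
            "text": " ".join(text.get_text().split()) if text else "",
        })
    return reviews


reviews = parse_reviews(SAMPLE_HTML)
```

Working on an inline string keeps the example free of network access. In a real run the HTML would come from a downloaded page instead.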

Features

Semi-supervised-sequence-learning-Project: the replication work is done in this repository, and further analysis requires creating new data.

  1. Scraping Movie Reviews 🕵️‍♂️
  • Movie_review_imdb_scrapping.ipynb - The script fetches user reviews from IMDb, providing access to a diverse range of opinions and feedback for different movies. It utilizes BeautifulSoup, a powerful Python library for web scraping, to extract data from IMDb's web pages efficiently and accurately. 🎥🔎
  2. Customizable Scraper 🛠️
  • rename_files.ipynb - Users can customize the scraper to target specific time periods, ratings, and other parameters, enabling focused data collection based on their requirements. This flexibility allows researchers, analysts, and enthusiasts to tailor the scraping process to their specific needs. 🎯🔧
  3. CSV Output 📁
  • convert_texts_to_csv.ipynb - The scraped data is saved into a CSV file, allowing for easy import into data analysis software or further processing. The CSV format ensures compatibility with a wide range of tools and platforms, making it convenient to incorporate the scraped data into various workflows and projects. 💾💼
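
The text-to-CSV step can be sketched with the standard library alone. This is not the code from convert_texts_to_csv.ipynb: the function name `texts_to_csv`, the column names, and the one-review-per-`.txt`-file layout are assumptions made for illustration, and the standard `csv` module is used here in place of pandas.

```python
import csv
import tempfile
from pathlib import Path


def texts_to_csv(text_dir, csv_path):
    """Write one CSV row per .txt file in text_dir; return the row count."""
    files = sorted(Path(text_dir).glob("*.txt"))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_name", "review"])  # assumed column names
        for path in files:
            review = path.read_text(encoding="utf-8").strip()
            writer.writerow([path.name, review])
    return len(files)


# Small demo in a temporary directory so no existing files are touched.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "review_1.txt").write_text(
        "A moving, well-acted drama.\n", encoding="utf-8")
    (tmp_path / "review_2.txt").write_text(
        'Plot holes, "twists", and noise.\n', encoding="utf-8")
    out_file = tmp_path / "reviews.csv"
    row_count = texts_to_csv(tmp_path, out_file)
    # Read the file back to confirm the text survived the round trip.
    with open(out_file, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
```

The `csv` writer quotes commas and quotation marks inside review text, so each review stays in a single field. A file written this way can then be loaded with `pandas.read_csv` for analysis.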

Getting Started

Dependencies

Make sure you have the following dependencies installed:

  • Python 3.x
  • BeautifulSoup (install using pip install beautifulsoup4)
  • Pandas (install using pip install pandas)
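
A quick way to confirm that these packages are importable before running the notebooks is sketched below. The helper name `missing_modules` is hypothetical. Note that the beautifulsoup4 package is imported under the module name `bs4`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# beautifulsoup4 is imported as "bs4"; pandas is imported as "pandas".
missing = missing_modules(["bs4", "pandas"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```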

Installation

  1. Fork the Semi-supervised-sequence-learning-Project repository. Follow GitHub's instructions on how to fork a repository.

  2. Clone your fork to your local machine over SSH:

    git clone git@github.com:your-username/Semi-supervised-sequence-learning-Project.git

  3. Alternatively, clone your fork over HTTPS:

    git clone https://github.com/your-username/Semi-supervised-sequence-learning-Project.git

Usage

  • Customize the scraper settings in the scraper.py file as per your requirements. This includes specifying the time period, ratings, and any other parameters you want to filter by.

  • Run the scraper.py script:

    python scraper.py

  • The scraped data will be saved into a CSV file named data.csv in the data_scrapped directory.
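
The settings inside scraper.py are not shown in this README, so the following is only a sketch of how filtering by time period and rating might look. `ScraperSettings`, `keep_review`, and all field names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ScraperSettings:
    """Hypothetical filter settings; None means 'no limit'."""
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    min_rating: Optional[int] = None
    max_rating: Optional[int] = None


def keep_review(review, settings):
    """Return True if the review dict passes every configured filter."""
    posted, rating = review["date"], review["rating"]
    if settings.start_date and posted < settings.start_date:
        return False
    if settings.end_date and posted > settings.end_date:
        return False
    if settings.min_rating is not None and rating < settings.min_rating:
        return False
    if settings.max_rating is not None and rating > settings.max_rating:
        return False
    return True


# Keep only reviews posted since 2020 with a rating of 7 or higher.
settings = ScraperSettings(start_date=date(2020, 1, 1), min_rating=7)
sample_reviews = [
    {"date": date(2019, 6, 1), "rating": 9, "text": "Older review."},
    {"date": date(2021, 3, 15), "rating": 8, "text": "Recent and positive."},
    {"date": date(2022, 8, 2), "rating": 3, "text": "Recent but negative."},
]
selected = [r for r in sample_reviews if keep_review(r, settings)]
```

Grouping the filters in one small object means a run can be changed by editing a single line, and leaving a field as `None` disables that filter.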

Contribution

🎉Contributions are welcome! If you have any suggestions for improvements or new features, please feel free to submit a pull request. Your contributions help make this project better for everyone. 🚀

Final Dataset

🔬 The final dataset of scraped IMDb movie reviews is available through the Drive link. It can be used for analysis, research, or any other purpose you require. 📦

Support

🤝For any issues regarding the scraper, feel free to open an issue on GitHub. We'll be happy to assist you with any problems or inquiries you may have. 🛠️

scrape-ml's People

Contributors

sanjay-kv, revanth1718, asymtode712, sanikaahadap, devidutta-learn, yashwe0, iabn0rma1, vikranth3140, litesh1123, kairveeehh, sujanrupu, soubeer, saksh8, merciajeno, amroodh, abhay182005dat, psyuktha, aryan1165, varsagupta, suhanipaliwal, somya2115, sanjanabankar, saitejaswanibikkasani, poojaharihar03, pooja8748, mahimaramireddy, gss0c24, ismokedata, cheshta17, ankitmodanwall
