
Iron Hack Final Project: Book Recommender

This project is a book recommender: given an input book title, it outputs three recommended titles along with their respective authors, genres, and book covers. The data was gathered from two Kaggle datasets and by web scraping with Selenium and BeautifulSoup. Here is the link to my Google Slides presentation.

Motivation

I decided to explore this topic because, before the Iron Hack Data Analytics Bootcamp, my friends and I were organising a book club, and during the bootcamp I didn't have time to enjoy those books. Now that the bootcamp is over, I would very much like to restart the book club and get recommendations based on the titles and genres of books we previously liked.

Build Status

The code in the Jupyter Notebook is divided into the following parts:

  • Web Scraping: data was scraped from the Barnes & Noble website (link below) using Selenium; fields such as title, author, year_published, isbn, image_link, genre, description, publisher, page_count, and rating were pulled and organized into a DataFrame (a hedged sketch of this step follows this list). The other two datasets were pulled from the Kaggle website (links below).
  • Data Cleaning and EDA: the individual book databases were cleaned separately and then concatenated into one dataset with the columns title, author, year_published, isbn, image_link, genre, description, publisher, page_count, and rating.
  • Analysis/Recommender: a CountVectorizer-based model was used to analyse, cluster, and recommend books based on a user-inputted book title (a sketch of the similarity step appears under Code Examples).
  • Note: a large part of the notebook consists of repeated code.
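
As a rough illustration of the scraping step, here is a minimal sketch in the spirit of the notebook. The listing URL and CSS selectors are placeholders/assumptions (the actual Barnes & Noble page structure and the notebook's selectors may differ), and the notebook may use the older webdriver.Chrome(ChromeDriverManager().install()) call rather than the Selenium 4 Service style shown here. Additional fields (isbn, genre, rating, and so on) were pulled the same way.

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from webdriver_manager.chrome import ChromeDriverManager

    # Start a Chrome session managed by webdriver_manager
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get('https://www.barnesandnoble.com/b/books/_/N-29Z8q8')  # placeholder listing URL

    # Collect basic fields from each product card (selectors are assumptions)
    books = []
    for card in driver.find_elements(By.CSS_SELECTOR, 'div.product-shelf-info'):
        books.append({
            'title': card.find_element(By.CSS_SELECTOR, 'h3 a').text,
            'author': card.find_element(By.CSS_SELECTOR, '.product-shelf-author').text,
        })
    driver.quit()

    scraped_df = pd.DataFrame(books)
    print(scraped_df.head())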

Code Style

Python 3, written in a Jupyter Notebook.

Screenshots

(Three screenshots of the recommender output, taken 2023-02-03.)

Tech/Framework used

I used Python and the following libraries to execute the code:

    import time
    import requests
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import imageio
    from bs4 import BeautifulSoup as bs4
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.feature_extraction.text import CountVectorizer
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from webdriver_manager.chrome import ChromeDriverManager

Features

The features that were the main focus for the book recommender are: title, author, rating, genre, description, isbn, publisher, year_published, page_count, and image_link.

Code Examples

Example of building the combined text column that feeds the recommender:

    %%time
    # Drop columns not used for the text-based recommendation
    bookdata = final_data.drop(['isbn', 'year_published', 'page_count', 'description'], axis=1)

    # Combine the remaining text columns into one string per book
    bookdata['data'] = bookdata[bookdata.columns[1:]].apply(
        lambda x: ' '.join(x.dropna().astype(str)),
        axis=1
    )
    print(bookdata['data'].head())
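
To show how the combined 'data' column could feed the recommender, here is a minimal, hedged sketch of the CountVectorizer / cosine-similarity step. It assumes the bookdata frame from the example above; the exact vectorizer settings and variable names in the notebook may differ. The resulting DataFrame df (one similarity column per title, plus a 'title' column) is what the recommender in the Tests section queries with nlargest.

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Turn the combined text column into a bag-of-words matrix
    vectorizer = CountVectorizer(stop_words='english')
    counts = vectorizer.fit_transform(bookdata['data'])

    # Pairwise cosine similarity between every pair of books
    similarity = cosine_similarity(counts)

    # One column per book title, so df.nlargest(4, input_title) returns the
    # rows (books) most similar to the input title
    df = pd.DataFrame(similarity, columns=bookdata['title'])
    df['title'] = bookdata['title'].values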

Installation

I already had Jupyter Notebook installed through Anaconda Navigator, but for the web scraping I had to run "pip install selenium" and "pip install webdriver-manager".

API reference

I didn't use an API for this project, as the data was gathered by web scraping with Selenium.

Tests

As previously mentioned, I used CountVectorizer for my recommender/model.

Here is an example of the code used to run the recommender:

    %%time
    try:
        # Pick the 4 most similar titles to the input book and drop the input itself
        recommendations = pd.DataFrame(df.nlargest(4, input_title)['title'])
        recommendations = recommendations[recommendations['title'] != input_title]
        a = recommendations.index.values.tolist()
        print('Here are some fun recommendations for you:')
        # Show the recommended titles alongside their author and genre
        display(pd.concat([recommendations, final_data.iloc[a][['author', 'genre']]], axis=1))
        # Fetch and display the book covers from their image links
        b = bookdata.iloc[a]['image_link'].values.tolist()
        for i in b:
            plt.imshow(imageio.imread(i))
            plt.show()
    except:
        print("Sorry, I don't have any book recommendations. You should go for a walk instead!")

How to use?

I recommend reading this README before opening the .ipynb file. The notebook makes clear how and why certain values were kept or dropped during data cleaning. For the analysis, the data was concatenated and then fed to the recommender to generate book suggestions. The whole process, from scraping to recommending, is broken into three phases: Phase 1: Web Scraping, Phase 2: Data Cleaning and EDA, and Phase 3: Recommender.

Contribute

You can contribute by opening the "Readme" file, clicking "Edit", and leaving a comment at the bottom of the document with your GitHub link and suggestions. Here is the link to my GitHub repository.

Credits

I used the following websites for datasets:

License

Used Python 3 License.
