
jayantkatia / upmob-api


:timer_clock: :mag: A scheduler scrapes 91mobiles.com for information about upcoming devices, which is then made accessible through API endpoints

Home Page: https://upmob.koreacentral.cloudapp.azure.com/

License: MIT License

Makefile 9.22% Go 89.38% Shell 1.40%
api go golang open-source upmob upmob-api

upmob-api's People

Contributors

jayantkatia

upmob-api's Issues

Scrape more information by navigating to each product's details page

Current scenario

Only minimal information is being scraped from the list pages.

Proposal

I propose scraping more information by navigating to each product's details page.

Please note

  • #1 completely changes the way we scrape data, so this enhancement must be compatible with it.
  • #3 proposes introducing a new field, scrape_timestamp, and also changes the way records are stored in the database.
  • This will involve a change to the database table structure.

Instead of deleting and adding all records, update them using scrape_timestamp

Current implementation

All previous records are deleted and newly scraped records are inserted into the table in a database transaction.

Alternate solution

Scrape product_name and last_updated from the product details page.


If a record exists with the same last_updated and product_name
  then update its scrape_timestamp field with the current timestamp
else
  scrape all the remaining information and store the record in the database

Benefits of this approach

  1. Keeps previous (legacy) records in the database, which can be deleted after X days using scrape_timestamp.
  2. Results can be sorted by scrape_timestamp.

Please note

  • This revamp must be timed in accordance with #1 & #2 and must be compatible with them.

Use static scraping libraries instead of a headless browser to make it more performant

Current implementation

chromedp, which uses a headless browser to scrape information.

Better solution

Use static scraping libraries and make the same network calls that the website makes internally (passing the same query params and request headers).

Example:

curl --location --request GET 'https://www.91mobiles.com/template/category_finder/finder_ajax.php?show_next=1&ord=0.17428677812670812&excludeId=&hash=&search=&hidFrmSubFlag=1&page=1&category=mobile&unique_sort=ga_views&gaCategory=Upcoming+Mobiles+Price+List+in+India-filter&requestType=1&showPagination=1&listType=list&listType_v3=list&listType_v1=list&listType_v2=list&listType_v4=list&listType_v5=list&listType_v6=list&page_type=upcoming&finderRuleUrl=&selMobSort=ga_views&hdnCategory=mobile&user_search=&url_feat_rule=upcoming-mobiles-in-india&buygaCat=upcoming-mob&amount=0%3B200000&sCatName=mobile&price_range_apply=1&tr_fl%5B%5D=mob_market_status_filter.marketstatus_filter%3Aupcoming&tr_fl%5B%5D=mob_market_status_filter.marketstatus_filter%3Arumoured' \
--header 'x-requested-with: XMLHttpRequest' \
--header 'user-agent: Mozilla/5.0'
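
The same request could be built in Go with the standard net/http package, roughly as sketched below. NewFinderRequest is a hypothetical helper, and only a handful of the query parameters from the curl command are set here for brevity. Whether the endpoint accepts a reduced parameter set has not been verified, so a real implementation would probably need to pass the full list.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// finderURL is the endpoint taken from the curl example.
const finderURL = "https://www.91mobiles.com/template/category_finder/finder_ajax.php"

// NewFinderRequest builds a GET request for one page of the upcoming-mobiles
// listing. Only a subset of the curl example's query parameters is set; the
// remaining ones would be added the same way.
func NewFinderRequest(page int) (*http.Request, error) {
	q := url.Values{}
	q.Set("page", strconv.Itoa(page))
	q.Set("category", "mobile")
	q.Set("page_type", "upcoming")
	q.Set("requestType", "1")
	q.Set("listType", "list")

	req, err := http.NewRequest(http.MethodGet, finderURL+"?"+q.Encode(), nil)
	if err != nil {
		return nil, err
	}
	// Same headers as the curl example.
	req.Header.Set("X-Requested-With", "XMLHttpRequest")
	req.Header.Set("User-Agent", "Mozilla/5.0")
	return req, nil
}

func main() {
	req, err := NewFinderRequest(1)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	// Print the request instead of sending it, so the sketch runs offline.
	// Sending it would be: resp, err := http.DefaultClient.Do(req)
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("X-Requested-With:", req.Header.Get("X-Requested-With"))
}
```

The response body could then be handed to a static HTML parser, which avoids starting a browser for every scrape.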

Massive overhaul

Since this will lead to massive changes, I propose doing the work on a separate branch.
