Comments (13)

anilabhadatta commented on May 27, 2024

@abhinavm24 ScrapeAllTopicUrl is not implemented yet.
You need to manually get the URL of the first topic of a course and put it in the course URLs text file.
To resume, put in the URL of the topic you want to resume from.
Refer to the screenshot in the README for the recommended settings.
I will look into your URLs and check whether I get the same errors.
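For the resume step, here is a minimal sketch of the idea, assuming the scraper already holds the course's ordered list of topic URLs and simply skips every topic before the resume URL. The function `topics_to_scrape` is hypothetical and is not taken from the scraper's code.

```python
from typing import List


def topics_to_scrape(topic_urls: List[str], resume_url: str) -> List[str]:
    """Return the topics from resume_url onward (hypothetical helper).

    If resume_url is not in the list, fall back to the whole course.
    """
    try:
        start = topic_urls.index(resume_url)
    except ValueError:
        # Resume URL not found: start again from the first topic.
        return list(topic_urls)
    return topic_urls[start:]
```

Because this sketch uses an exact string match, the resume URL has to be written exactly as it appears in the topic list.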

from educative.io_scraper.

anilabhadatta commented on May 27, 2024

> I know this is openly available and you're not making any money either. I'm willing to debug it myself too; I just need some help understanding the setup.
>
> ApiUtility - Getting Course Collections JSON from URL: https://educative.io/api/collection/10370001/4941429335392256?work_type=collection
>
> I see that most of the info is already available at that link.

Yes, that is where the quiz- and code-related info is scraped from.

anilabhadatta commented on May 27, 2024

> Can I resume from any URL in this log? Is there nothing special about the URL passed in the course list file?
>
> 2023-10-05 10:07:14,635 - DEBUG - ExtensionScraper - Course Topic URLs: ['https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/course-structure-for-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/what-is-a-system-design-interview',

Yes, you can resume from any URL.
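Since any URL in the `Course Topic URLs` log line works as a resume point, a small helper can pull that list out of a saved log so one entry can be copied into the course URLs text file. This is a sketch that assumes the DEBUG line has the format shown above (`timestamp - LEVEL - Component - Course Topic URLs: [...]`) and that the whole list is printed on one untruncated line as a Python list literal; `extract_topic_urls` is a hypothetical name.

```python
import ast
import re
from typing import List

# Matches the list literal that ends a "Course Topic URLs:" log line.
# Assumes the full list is on a single line (a truncated line won't match).
_TOPIC_URLS = re.compile(r"Course Topic URLs:\s*(\[.*\])\s*$")


def extract_topic_urls(log_line: str) -> List[str]:
    """Return the topic URLs logged on one DEBUG line, or [] if absent."""
    match = _TOPIC_URLS.search(log_line)
    if match is None:
        return []
    # literal_eval parses the list literal without executing any code.
    urls = ast.literal_eval(match.group(1))
    return [u for u in urls if isinstance(u, str)]
```

Using `ast.literal_eval` rather than `eval` keeps the parsing safe even if a log file contains unexpected text.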

anilabhadatta commented on May 27, 2024

@abhinavm24 The latest update should fix the issues. I skipped the part that adds `showContent=true` to textFile URLs and auto-finds the course URL.
I also fixed the logic for expanding all sections.
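For reference, a query flag such as `showContent=true` can be set on a topic URL with the standard library without clobbering any other parameters. This is a minimal sketch that assumes `showContent` is an ordinary query parameter; `with_show_content` is a hypothetical helper, not the scraper's own code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_show_content(url: str) -> str:
    """Return url with showContent=true set, keeping other query params."""
    parts = urlsplit(url)
    # Drop any existing showContent value so the flag appears exactly once.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "showContent"
    ]
    query.append(("showContent", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```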

abhinavm24 commented on May 27, 2024

I know this is openly available and you're not making any money either. I'm willing to debug it myself too; I just need some help understanding the setup.

ApiUtility - Getting Course Collections JSON from URL: https://educative.io/api/collection/10370001/4941429335392256?work_type=collection

I see that most of the info is already available at that link.

abhinavm24 commented on May 27, 2024

Can I resume from any URL in this log? Is there nothing special about the URL passed in the course list file?

2023-10-05 10:07:14,635 - DEBUG - ExtensionScraper - Course Topic URLs: ['https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/course-structure-for-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/what-is-a-system-design-interview',

abhinavm24 commented on May 27, 2024

I tried resuming from the first link and it fails with:
```
2023-10-05 10:26:29,208 - INFO - ExtensionScraper - ExtensionScraper initiated...
2023-10-05 10:26:29,208 - INFO - ExtensionScraper - Started Scraping from Text File URL: https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design
2023-10-05 10:26:29,208 - INFO - BrowserUtility - Loading Browser...
2023-10-05 10:26:30,626 - INFO - BrowserUtility - Browser Initiated
2023-10-05 10:26:32,854 - INFO - ApiUtility - Course Type Selector: a[href*='/courses/']
2023-10-05 10:26:32,868 - INFO - ApiUtility - Getting Next Data
2023-10-05 10:26:32,886 - INFO - ApiUtility - Getting Course Topic URLs List from URL: https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design
2023-10-05 10:26:34,639 - DEBUG - SeleniumBasicUtility - Expanding all sections function
2023-10-05 10:26:44,837 - ERROR - StartScraper - start: 20: ExtensionScraper:start: 47: ExtensionScraper:scrapeCourse: 61: ApiUtility:getCourseTopicUrlsList: 107: SeleniumBasicUtility:expandAllSections: 33: Message:
Stacktrace:
0 chromedriver 0x0000000104832d98 chromedriver + 4337048
1 chromedriver 0x000000010482ae14 chromedriver + 4304404
2 chromedriver 0x0000000104457a5c chromedriver + 293468
3 chromedriver 0x000000010449cd50 chromedriver + 576848
4 chromedriver 0x00000001044d7908 chromedriver + 817416
5 chromedriver 0x0000000104490a5c chromedriver + 526940
6 chromedriver 0x0000000104491908 chromedriver + 530696
7 chromedriver 0x00000001047f8de4 chromedriver + 4099556
8 chromedriver 0x00000001047fd2a0 chromedriver + 4117152
9 chromedriver 0x000000010480352c chromedriver + 4142380
10 chromedriver 0x00000001047fdda0 chromedriver + 4119968
11 chromedriver 0x00000001047d5a74 chromedriver + 3955316
12 chromedriver 0x000000010481aa48 chromedriver + 4237896
13 chromedriver 0x000000010481abc4 chromedriver + 4238276
14 chromedriver 0x000000010482aa8c chromedriver + 4303500
15 libsystem_pthread.dylib 0x0000000183b93034 _pthread_start + 136
16 libsystem_pthread.dylib 0x0000000183b8de3c thread_start + 8

2023-10-05 10:26:44,837 - DEBUG - StartScraper - Exiting Scraper...
```

abhinavm24 commented on May 27, 2024

I'm able to resume from https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/what-is-a-system-design-interview

abhinavm24 commented on May 27, 2024

I tried resuming from https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/design-of-a-distributed-logging-service?showContent=true
and it starts from the first section instead.
So I guess it's not using the URL as the starting page for scraping.
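If the scraper matched the resume URL against the course's topic list by exact string comparison (an assumption; the thread does not say how the match is done), a trailing `?showContent=true` would make the lookup fail, which could explain the behaviour described here. Below is a sketch of comparing topic URLs by scheme, host and path only; `normalize_topic_url` and `same_topic` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_topic_url(url: str) -> str:
    """Reduce a topic URL to scheme://host/path, ignoring query and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Scheme and host are case-insensitive; the path is left as-is.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def same_topic(a: str, b: str) -> bool:
    """True when two URLs point at the same topic page."""
    return normalize_topic_url(a) == normalize_topic_url(b)
```

With this comparison, a resume URL copied from the browser matches its topic-list entry whether or not it carries the query string.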

One last question: is scrapeallurl just logging the URLs, or saving them somewhere?

abhinavm24 commented on May 27, 2024

Thanks for the quick fix!

abhinavm24 commented on May 27, 2024

@anilabhadatta I have some suggestions/PRs for this project. How can I reach out to you? Discord?

anilabhadatta commented on May 27, 2024

@abhinavm24 Discord (partially active): white_wolf_1999

anilabhadatta commented on May 27, 2024

@abhinavm24 Telegram (active): white_wolf_1999
