Comments (13)
@abhinavm24 ScrapeAllTopicUrl is not implemented yet.
You need to manually get the first topic URL of a course and put it in the course URLs text file.
To resume, put in the URL from which you want to resume.
Refer to the picture in the README for the recommended settings.
I will look into your URLs and see whether I run into similar errors with them.
from educative.io_scraper.
I know this is openly available and you're not making any money either. I'm willing to debug it myself too; I just need some help understanding the setup.
ApiUtility - Getting Course Collections JSON from URL: https://educative.io/api/collection/10370001/4941429335392256?work_type=collection
I see that most of the info is already available in the link.
Yes, that is where the quiz and code-related info is scraped from.
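For anyone poking at the endpoint from the log line above, here is a small sketch that splits such a URL into its parts using only the standard library. The function name `split_collection_url` is hypothetical and not part of educative.io_scraper, and the meaning of the two numeric path segments is not stated in this thread, so the code returns them without naming them.

```python
from urllib.parse import urlsplit, parse_qs

def split_collection_url(url: str) -> dict:
    """Split a collection API URL into host, numeric path IDs and query parameters."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Keep only the purely numeric segments; what each ID means is not assumed here.
    ids = [s for s in segments if s.isdigit()]
    # parse_qs returns lists of values; keep the first value of each parameter.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "ids": ids, "query": query}

info = split_collection_url(
    "https://educative.io/api/collection/10370001/4941429335392256?work_type=collection"
)
print(info)
```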
from educative.io_scraper.
Can I resume from any URL in this log? Is there nothing special about the URL passed in the course list file?
2023-10-05 10:07:14,635 - DEBUG - ExtensionScraper - Course Topic URLs: ['https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/course-structure-for-modern-system-design', 'https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/what-is-a-system-design-interview',
Yes, you can resume from any URL.
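A minimal sketch of what resuming from a URL could look like, assuming the scraper holds an ordered list of topic URLs like the one in the log above. The helper name `topics_from` is hypothetical and not taken from educative.io_scraper; the real resume logic may differ.

```python
def topics_from(topic_urls: list[str], resume_url: str) -> list[str]:
    """Return topic_urls starting at resume_url, or the whole list if it is absent."""
    if resume_url in topic_urls:
        return topic_urls[topic_urls.index(resume_url):]
    return list(topic_urls)

base = "https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers"
topic_urls = [
    f"{base}/introduction-to-modern-system-design",
    f"{base}/course-structure-for-modern-system-design",
    f"{base}/what-is-a-system-design-interview",
]

# Resume from the third topic; the earlier topics are skipped.
remaining = topics_from(topic_urls, f"{base}/what-is-a-system-design-interview")
print(remaining)
```

Falling back to the full list when the URL is not found is a design choice made for this sketch only.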
from educative.io_scraper.
@abhinavm24 The latest update should fix the issues. I skipped the part that adds showContent=true to textFile URLs and the auto-find Course Url part.
I also fixed the logic for expanding all sections.
from educative.io_scraper.
I tried resuming from the first link and it fails with:
` 2023-10-05 10:26:29,208 - INFO - ExtensionScraper - ExtensionScraper initiated...
2023-10-05 10:26:29,208 - INFO - ExtensionScraper - Started Scraping from Text File URL: https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design
2023-10-05 10:26:29,208 - INFO - BrowserUtility - Loading Browser...
2023-10-05 10:26:30,626 - INFO - BrowserUtility - Browser Initiated
2023-10-05 10:26:32,854 - INFO - ApiUtility - Course Type Selector: a[href*='/courses/']
2023-10-05 10:26:32,868 - INFO - ApiUtility - Getting Next Data
2023-10-05 10:26:32,886 - INFO - ApiUtility - Getting Course Topic URLs List from URL: https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/introduction-to-modern-system-design
2023-10-05 10:26:34,639 - DEBUG - SeleniumBasicUtility - Expanding all sections function
2023-10-05 10:26:44,837 - ERROR - StartScraper - start: 20: ExtensionScraper:start: 47: ExtensionScraper:scrapeCourse: 61: ApiUtility:getCourseTopicUrlsList: 107: SeleniumBasicUtility:expandAllSections: 33: Message:
Stacktrace:
0 chromedriver 0x0000000104832d98 chromedriver + 4337048
1 chromedriver 0x000000010482ae14 chromedriver + 4304404
2 chromedriver 0x0000000104457a5c chromedriver + 293468
3 chromedriver 0x000000010449cd50 chromedriver + 576848
4 chromedriver 0x00000001044d7908 chromedriver + 817416
5 chromedriver 0x0000000104490a5c chromedriver + 526940
6 chromedriver 0x0000000104491908 chromedriver + 530696
7 chromedriver 0x00000001047f8de4 chromedriver + 4099556
8 chromedriver 0x00000001047fd2a0 chromedriver + 4117152
9 chromedriver 0x000000010480352c chromedriver + 4142380
10 chromedriver 0x00000001047fdda0 chromedriver + 4119968
11 chromedriver 0x00000001047d5a74 chromedriver + 3955316
12 chromedriver 0x000000010481aa48 chromedriver + 4237896
13 chromedriver 0x000000010481abc4 chromedriver + 4238276
14 chromedriver 0x000000010482aa8c chromedriver + 4303500
15 libsystem_pthread.dylib 0x0000000183b93034 _pthread_start + 136
16 libsystem_pthread.dylib 0x0000000183b8de3c thread_start + 8
2023-10-05 10:26:44,837 - DEBUG - StartScraper - Exiting Scraper...`
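The ERROR line in the log above packs the whole call chain into one string. Below is a small sketch for splitting it into steps, assuming each `name: number:` pair is a function (optionally prefixed by its class) followed by a line number. That reading is inferred from the log text, not from the scraper's source, and `parse_chain` is a hypothetical helper.

```python
import re

# One step looks like "Class:function: 47:" or "function: 20:".
STEP = re.compile(r"(\w+(?::\w+)?): (\d+):")

def parse_chain(message: str) -> list[tuple[str, int]]:
    """Extract (name, number) pairs from an error chain string."""
    return [(name, int(number)) for name, number in STEP.findall(message)]

line = (
    "start: 20: ExtensionScraper:start: 47: ExtensionScraper:scrapeCourse: 61: "
    "ApiUtility:getCourseTopicUrlsList: 107: "
    "SeleniumBasicUtility:expandAllSections: 33: Message:"
)
chain = parse_chain(line)
for name, number in chain:
    print(f"{name} (line {number})")
```

Read this way, the last step before the empty `Message:` is `SeleniumBasicUtility:expandAllSections`, which matches the "Expanding all sections function" DEBUG entry just before the error.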
from educative.io_scraper.
I'm able to resume from https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/what-is-a-system-design-interview
from educative.io_scraper.
I tried resuming from https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers/design-of-a-distributed-logging-service?showContent=true
and it starts from the first section itself.
So I guess it's not treating the URL as the initial page for scraping.
One last question: is scrapeallurl just logging the URLs, or is it saving them somewhere?
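One possible explanation, which is only a guess and not confirmed anywhere in this thread, is that a URL carrying `?showContent=true` does not match the stored topic URLs character for character. A sketch of a comparison that ignores the query string, with `same_topic` being a hypothetical helper:

```python
from urllib.parse import urlsplit

def same_topic(url_a: str, url_b: str) -> bool:
    """Compare two topic URLs by host and path, ignoring query string and fragment."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.netloc, a.path.rstrip("/")) == (b.netloc, b.path.rstrip("/"))

base = "https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers"
plain = f"{base}/design-of-a-distributed-logging-service"
with_query = f"{plain}?showContent=true"

print(plain == with_query)            # exact string comparison
print(same_topic(plain, with_query))  # comparison by host and path only
```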
from educative.io_scraper.
Thanks for the quick fix.
from educative.io_scraper.
@anilabhadatta I have some suggestions/PRs for this project. How can I reach out to you? Discord?
from educative.io_scraper.
@abhinavm24 Discord (partially active): white_wolf_1999
from educative.io_scraper.
@abhinavm24 Telegram (active): white_wolf_1999
from educative.io_scraper.