khan-dl's Introduction

khan-dl

A Python script to download courses from Khan Academy using youtube-dl and beautifulsoup4.

Installation

Once yt-dlp is updated, you can install khan-dl from PyPI. Until then, install the latest version directly from GitHub:

pip install git+https://github.com/rand-net/khan-dl

Usage

$ khan-dl -h

usage: khan-dl [-h] [-i] [-c COURSE_URL]

optional arguments:
  -h, --help            show this help message and exit
  -i, --interactive     Enter Interactive Course Selection Mode
  -c COURSE_URL, --course_url COURSE_URL
                        Enter Course URL
  -a, --all             Download all Courses from all Domains
  • You can download courses interactively on a prompt, which will list all course domains and their respective courses available with tab completion.
$ khan-dl -i
 _  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|



Domain: Math
Selected Domain: math

Downloading Courses...

Course: Early math
Selected Course: Early math
Course URL: https://www.khanacademy.org/math/early-math

Generating Path Slugs.....


Collecting Youtube IDs: 100.0% [========================================================================================================================================>]   4/  4 eta [00:00]
Downloading Videos:   0.0% [>                                                                                                                                          ]   0/ 75 eta [?:??:??]
  • Download a specific course.
$  khan-dl -c "https://www.khanacademy.org/math/early-math"
 _  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|


Looking up https://www.khanacademy.org/math/early-math...
Course URL: https://www.khanacademy.org/math/early-math

Generating Path Slugs...

Collecting Youtube IDs: 100.0% [========================================================================================================================================>]   4/  4 eta [00:00]
Downloading Videos:   0.0% [>                                                                                                                                          ]   0/ 75 eta [?:??:??]
  • Download all courses on traditional subjects like Math, Science, Computing, Humanities, Economics-Finance-Domain.
$ khan-dl -a

 _  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|


Downloading all Courses from all Domains...
Selected Domain:  math

Downloading Courses...

Selected Domain:  science

Downloading Courses...

Selected Domain:  computing

Downloading Courses...

Selected Domain:  humanities

Downloading Courses...

Selected Domain:  economics-finance-domain

Downloading Courses...

Selected Domain:  ela

Downloading Courses...


Course URL: https://www.khanacademy.org/math/early-math

Generating Path Slugs...


Collecting Youtube IDs: 100.0% [========================================================================================================================================>]   4/  4 eta [00:00]
Downloading Videos:   0.0% [>                                                                                                                                          ]   0/ 75 eta [?:??:??]
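
To fetch several specific courses in one go, the documented `-c` flag can be driven from a small wrapper. The following is only a sketch, not part of khan-dl: the function names `build_khan_dl_commands` and `run_all` are hypothetical, and it assumes the `khan-dl` executable is on your PATH.

```python
import shlex
import subprocess


def build_khan_dl_commands(course_urls):
    """Return one argv list per course URL, using the documented -c flag."""
    return [["khan-dl", "-c", url] for url in course_urls]


def run_all(course_urls, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Returns a dict mapping each course URL to the exit code of its run
    (empty when dry_run is True).
    """
    exit_codes = {}
    for argv in build_khan_dl_commands(course_urls):
        print(shlex.join(argv))
        if not dry_run:
            # One failing course does not stop the remaining downloads.
            exit_codes[argv[-1]] = subprocess.run(argv).returncode
    return exit_codes


if __name__ == "__main__":
    run_all(["https://www.khanacademy.org/math/early-math"])
```

Running each course as a separate process means a crash in one course leaves the others unaffected, at the cost of printing the banner once per course.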

Other solutions

Khan Academy is also available for offline usage through these Open Source projects:

khan-dl's People

Contributors

baas-hans, rahimnathwani, rand-net

khan-dl's Issues

AttributeError

khan-dl -c https://www.khanacademy.org/economics-finance-domain/microeconomics
 _  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|


Looking up https://www.khanacademy.org/economics-finance-domain/microeconomics...
Course URL: https://www.khanacademy.org/economics-finance-domain/microeconomics
Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/bin/khan-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/khan_dl/__init__.py", line 65, in main
    khan_down.download_course_given(selected_course_url)
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/khan_dl/khan_dl.py", line 424, in download_course_given
    self.get_course_title()
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/khan_dl/khan_dl.py", line 188, in get_course_title
    self.course_title = course_title.text.replace(" ", "_")
                        ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text'
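
The traceback shows that `course_title` was `None` when `.text` was read in `get_course_title`, which is what BeautifulSoup's `find` returns when no element matches. A guard along the following lines could avoid the crash. This is a sketch under assumptions, not the project's code: the helper name `course_title_or_fallback` and the idea of falling back to the last URL path segment are hypothetical.

```python
from urllib.parse import urlparse


def course_title_or_fallback(title_element, course_url):
    """Return a filesystem-friendly course title.

    title_element is whatever the HTML lookup returned (None when nothing
    matched). If it is None, fall back to the last path segment of the URL.
    """
    if title_element is not None:
        return title_element.text.strip().replace(" ", "_")
    slug = urlparse(course_url).path.rstrip("/").split("/")[-1]
    return slug or "unknown_course"


if __name__ == "__main__":
    url = "https://www.khanacademy.org/economics-finance-domain/microeconomics"
    print(course_title_or_fallback(None, url))
```

A fallback keeps the download going, but a `None` title usually means the page layout changed, so the selector itself probably needs updating as well.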

Cannot download Physics library: Unsupported URL

I installed khan-dl in the Arch Linux Docker image, if that helps. When I try to download the Physics library, it collects 19 of 20 YouTube IDs but fails on the last one. Here is the error:

Domain: Science
Selected Domain: science

Downloading Courses...

Course: Physics library
Selected Course: Physics library
Course URL: https://www.khanacademy.org/science/physics

Generating Path Slugs...

Collecting Youtube IDs:  95.0% [================================================>   ]  19/ 20 eta [00:00]
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1395, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1465, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 642, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/generic.py", line 4031, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://www.khanacademy.org/science/cosmology-and-astronomy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/khan_dl/khan_dl.py", line 334, in get_course_youtube_ids
    info_dict = ydl.extract_info(unit_url, download=False)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1386, in extract_info
    return self.__extract_info(url, self.get_info_extractor(ie_key), download, extra_info, process)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1413, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 936, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 879, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: Unsupported URL: https://www.khanacademy.org/science/cosmology-and-astronomy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1395, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1465, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 642, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/generic.py", line 4031, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://www.khanacademy.org/science/cosmology-and-astronomy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/khan-dl", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/khan_dl/__init__.py", line 57, in main
    khan_down.download_course_interactive()
  File "/usr/lib/python3.10/site-packages/khan_dl/khan_dl.py", line 417, in download_course_interactive
    self.get_course_youtube_ids()
  File "/usr/lib/python3.10/site-packages/khan_dl/khan_dl.py", line 343, in get_course_youtube_ids
    info_dict = ydl.extract_info(
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1386, in extract_info
    return self.__extract_info(url, self.get_info_extractor(ie_key), download, extra_info, process)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 1413, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 936, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/yt_dlp/YoutubeDL.py", line 879, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: Unsupported URL: https://www.khanacademy.org/science/cosmology-and-astronomy
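
The traceback suggests that a single unit URL is not recognized by yt-dlp and that the resulting `DownloadError` aborts the whole run. One possible workaround is to skip the failing unit and carry on with the rest. The sketch below is hypothetical and not taken from khan-dl: `collect_entries` is an invented name, and `extract` stands in for a call such as `ydl.extract_info(unit_url, download=False)` seen in the traceback.

```python
def collect_entries(unit_urls, extract):
    """Call extract(url) for each unit URL, skipping units that raise.

    Returns (results, skipped), where skipped is a list of
    (url, error message) pairs for later inspection.
    """
    results, skipped = [], []
    for url in unit_urls:
        try:
            results.append(extract(url))
        except Exception as err:
            # Assumption: with yt-dlp this would be yt_dlp.utils.DownloadError;
            # a broad except is used here only to keep the sketch stdlib-only.
            skipped.append((url, str(err)))
    return results, skipped


if __name__ == "__main__":
    def fake_extract(url):
        # Stand-in extractor that rejects one URL, mimicking the report.
        if url.endswith("cosmology-and-astronomy"):
            raise ValueError("Unsupported URL: " + url)
        return {"url": url}

    ok, bad = collect_entries(
        [
            "https://www.khanacademy.org/science/physics/one-dimensional-motion",
            "https://www.khanacademy.org/science/cosmology-and-astronomy",
        ],
        fake_extract,
    )
    print(len(ok), "collected;", len(bad), "skipped")
```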

Error when downloading all of Khan

After running khan-dl --all twice, I receive the Python error AttributeError: 'Khan_DL' object has no attribute 'video_topics_list'. Both times, it errored out on the video with ID "8AdcPD50aTQ".

Download Error When Downloading Any Course/Domain

I'm trying to download some economics courses but get the following error every time. I tried downloading the complete library, downloading courses by URL, and downloading interactively, all with the same result.
I tried Python 3.7 and 3.11 and still get the same error.

Domain: Economics-Finance-Domain
Selected Domain: economics-finance-domain

Downloading Courses...

Course: Macroeconomics
Selected Course: Macroeconomics
Course URL: https://www.khanacademy.org/economics-finance-domain/macroeconomics

Generating Path Slugs...

Collecting Youtube IDs:   0.0% [>                                                       ]   0/  8 eta [?:??:??]
Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 4052, in urlopen
    return self._request_director.send(req)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/common.py", line 114, in send
    response = handler.send(request)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/_helper.py", line 204, in wrapper
    return func(self, *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/common.py", line 325, in send
    return self._send(request)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/_requests.py", line 341, in _send
    raise HTTPError(res, redirect_loop=max_redirects_exceeded)
yt_dlp.networking.exceptions.HTTPError: HTTP Error 400: Bad Request

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 847, in _request_webpage
    return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 4074, in urlopen
    raise _CompatHTTPError(e) from e
yt_dlp.networking.exceptions._CompatHTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1567, in wrapper
    return func(self, *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1702, in __extract_info
    ie_result = ie.extract(url)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/khanacademy.py", line 39, in _real_extract
    'countryCode': 'US',
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 1069, in download_content
    res = getattr(self, download_handle.__name__)(url_or_request, video_id, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 1035, in download_handle
    data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 903, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 860, in _request_webpage
    raise ExtractorError(errmsg, cause=err)
yt_dlp.utils.ExtractorError: [khanacademy:unit] economics-finance-domain/macroeconomics/macro-basic-economics-concepts: Unable to download JSON metadata: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/khan_dl/khan_dl.py", line 348, in get_course_youtube_ids
    info_dict = ydl.extract_info(unit_url, download=False)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1556, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1585, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1045, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 984, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [khanacademy:unit] economics-finance-domain/macroeconomics/macro-basic-economics-concepts: Unable to download JSON metadata: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 4052, in urlopen
    return self._request_director.send(req)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/common.py", line 114, in send
    response = handler.send(request)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/_helper.py", line 204, in wrapper
    return func(self, *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/common.py", line 325, in send
    return self._send(request)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/networking/_requests.py", line 341, in _send
    raise HTTPError(res, redirect_loop=max_redirects_exceeded)
yt_dlp.networking.exceptions.HTTPError: HTTP Error 400: Bad Request

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 847, in _request_webpage
    return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 4074, in urlopen
    raise _CompatHTTPError(e) from e
yt_dlp.networking.exceptions._CompatHTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1567, in wrapper
    return func(self, *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1702, in __extract_info
    ie_result = ie.extract(url)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/khanacademy.py", line 39, in _real_extract
    'countryCode': 'US',
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 1069, in download_content
    res = getattr(self, download_handle.__name__)(url_or_request, video_id, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 1035, in download_handle
    data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 903, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/extractor/common.py", line 860, in _request_webpage
    raise ExtractorError(errmsg, cause=err)
yt_dlp.utils.ExtractorError: [khanacademy:unit] economics-finance-domain/macroeconomics/macro-basic-economics-concepts: Unable to download JSON metadata: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mohamed/.pyenv/versions/3.7.12/bin/khan-dl", line 8, in <module>
    sys.exit(main())
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/khan_dl/__init__.py", line 57, in main
    khan_down.download_course_interactive()
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/khan_dl/khan_dl.py", line 456, in download_course_interactive
    self.get_course_youtube_ids()
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/khan_dl/khan_dl.py", line 358, in get_course_youtube_ids
    unit_url, download=False, process=False
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1556, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1585, in wrapper
    self.report_error(str(e), e.format_traceback())
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 1045, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/home/mohamed/.pyenv/versions/3.7.12/lib/python3.7/site-packages/yt_dlp/YoutubeDL.py", line 984, in trouble
    raise DownloadError(message, exc_info)
yt_dlp.utils.DownloadError: ERROR: [khanacademy:unit] economics-finance-domain/macroeconomics/macro-basic-economics-concepts: Unable to download JSON metadata: HTTP Error 400: Bad Request (caused by <HTTPError 400: Bad Request>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

AttributeError when trying to download a specific lecture of a course

@rand-net, I tried the fix suggested in issue 14 (pip install khan-dl -U). Thanks for that; it worked fine on the second run.

However, when I tried to download a specific lecture of the course, I got the following error: AttributeError: 'NoneType' object has no attribute 'text'

More details:
$ khan-dl -c https://www.khanacademy.org/math/multivariable-calculus/thinking-about-multivariable-function/introduction-to-multivariable-calculus/v/multivariable-functions
 _  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|

Looking up https://www.khanacademy.org/math/multivariable-calculus/thinking-about-multivariable-function/introduction-to-multivariable-calculus/v/multivariable-functions .....
Generating Path Slugs.....
Traceback (most recent call last):
  File "/home/krishna/.local/bin/khan-dl", line 7, in <module>
    from khan_dl.__init__ import main
  File "/home/krishna/.local/lib/python3.8/site-packages/khan_dl/__init__.py", line 50, in <module>
    khan_down.generate_unit_slugs()
  File "/home/krishna/.local/lib/python3.8/site-packages/khan_dl/khan_downloader.py", line 26, in generate_unit_slugs
    course_title = self.course_root_page_html.find(
AttributeError: 'NoneType' object has no attribute 'text'

A small question: is this library intended to work with individual lectures as well, or only with entire courses?

Doesn't work for "non-traditional" subjects, e.g. LSAT

Hello, I tried this for the LSAT with several variations of the URL, but it seems the scraper hasn't been set up to handle this.

At some point I'd like to open a PR for this but figured I'd let you know in the meantime.

stacktrace:

https://www.khanacademy.org/test-prep/lsat

Looking up https://www.khanacademy.org/test-prep/lsat .....
Generating Path Slugs..... 
Traceback (most recent call last):
  File "./khan-dl.py", line 40, in <module>
    khan_down.generate_unit_slugs()
  File "/~/repos/khan-dl/khan_downloader.py", line 25, in generate_unit_slugs
    course_title = self.course_root_page_html.find(
AttributeError: 'NoneType' object has no attribute 'text'

Youtube-dl: CERTIFICATE_VERIFY_FAILED

Tried running the sample script to download AP Physics and got the following error.

  File "khan-dl.py", line 45, in <module>
    khan_down.download_videos()
  File "/Users/bryantvergara/Developer/khan-dl/khan_downloader.py", line 115, in download_videos
    ydl.download([youtube_url])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 2055, in download
    res = self.extract_info(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 799, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    self.report_error(compat_str(e), e.format_traceback())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 628, in report_error
    self.trouble(error_message, tb)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 598, in trouble
    raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: Unable to download webpage: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)> (caused by URLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))
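
"unable to get local issuer certificate" generally means Python could not find a CA bundle to validate the server's certificate, so the problem may lie in the Python installation rather than in khan-dl. A quick way to see where this interpreter looks for certificates is sketched below; the helper name `describe_ca_paths` is hypothetical, and the snippet only diagnoses, it does not fix anything.

```python
import os
import ssl


def describe_ca_paths():
    """Report the CA file and directory locations Python's ssl module uses."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }


if __name__ == "__main__":
    for name, location in describe_ca_paths().items():
        # A location of None, or a path that does not exist, is a hint
        # that no usable CA bundle is installed for this interpreter.
        exists = bool(location) and os.path.exists(location)
        print(f"{name}: {location} (exists: {exists})")
```

The paths in the traceback point to a python.org framework build on macOS; those installers typically ship an "Install Certificates.command" script in the Python application folder, and running it is a commonly reported remedy for this error.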

Missing dependency

Installing with pip install -U khan-dl and then running the khan-dl command raises the exception:

ModuleNotFoundError: No module named 'yt_dlp'

A possible fix would be to include yt-dlp in requirements.txt or add python -m pip install -U yt-dlp in the installation steps.
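
Until the dependency is declared, a quick check can confirm which modules are missing before running khan-dl. This is a minimal sketch: the helper name `missing_modules` is hypothetical, and the module list is an assumption based on the error above (`yt_dlp`) and the README's mention of beautifulsoup4 (imported as `bs4`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(["yt_dlp", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```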

HTTP Error 410: API removed

I'm unable to download the algebra course with a fresh install of khan-dl through pip.

ubuntu@optiplex9020:/mnt/g/khan-dl-downloads         2022-03-13 13:52:36
$ khan-dl -c https://www.khanacademy.org/math/algebra
_  __ _   _     _     _   _         ____   _
| |/ /| | | |   / \   | \ | |       |  _ \ | |
| ' / | |_| |  / _ \  |  \| | _____ | | | || |
| . \ |  _  | / ___ \ | |\  ||_____|| |_| || |___
|_|\_\|_| |_|/_/   \_\|_| \_|       |____/ |_____|


Looking up https://www.khanacademy.org/math/algebra...
Course URL: https://www.khanacademy.org/math/algebra

Generating Path Slugs...

Collecting Youtube IDs:   0.0% [>                                                                                                        ]   0/ 15 eta [?:??:??]
Traceback (most recent call last):
 File "/usr/lib/python3/dist-packages/youtube_dl/extractor/common.py", line 627, in _request_webpage
   return self._downloader.urlopen(url_or_request)
 File "/usr/lib/python3/dist-packages/youtube_dl/YoutubeDL.py", line 2238, in urlopen
   return self._opener.open(req, timeout=self._socket_timeout)
 File "/usr/lib/python3.8/urllib/request.py", line 531, in open
   response = meth(req, response)
 File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
   response = self.parent.error(
 File "/usr/lib/python3.8/urllib/request.py", line 569, in error
   return self._call_chain(*args)
 File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
   result = func(*args)
 File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
   raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 410: API removed. Please see https://github.com/Khan/khan-api for details.

ubuntu@optiplex9020:/mnt/g/khan-dl-downloads         2022-03-13 13:52:39
$ pip --version
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
ubuntu@optiplex9020:/mnt/g/khan-dl-downloads         2022-03-13 13:52:43
$ pip list | egrep 'khan|youtube'
khan-dl                1.2.1
youtube-dl             2020.3.24

Not Downloading all Parts of Linear Algebra

I was trying to download Linear Algebra with python khan-dl.py -c https://www.khanacademy.org/math/linear-algebra but it only downloaded 0_Vectors and spaces. I looked on Khan Academy and it should include Matrix Transformations and Alternate Coordinate Systems.

I thought this was a problem with the -c flag so I also tried the interactive download as well. I ran python khan-dl -i, Math, Linear algebra, and I got the same result.

Any ideas on what's going on?

Can't install on Windows 10

C:\>pip3 install khan-dl -U
Collecting khan-dl
  Downloading khan_dl-1.0.1-py3-none-any.whl (7.3 kB)
Collecting prompt-toolkit
  Downloading prompt_toolkit-3.0.18-py3-none-any.whl (367 kB)
     |████████████████████████████████| 367 kB 1.7 MB/s
Collecting art
  Downloading art-5.2-py2.py3-none-any.whl (571 kB)
     |████████████████████████████████| 571 kB 6.8 MB/s
Collecting lxml
  Downloading lxml-4.6.3-cp39-cp39-win_amd64.whl (3.5 MB)
     |████████████████████████████████| 3.5 MB 3.2 MB/s
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
     |████████████████████████████████| 115 kB 6.4 MB/s
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 1.3 MB/s
Collecting python-Levenshtein
  Downloading python-Levenshtein-0.12.2.tar.gz (50 kB)
     |████████████████████████████████| 50 kB 1.0 MB/s
Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Collecting youtube-dl
  Downloading youtube_dl-2021.5.16-py2.py3-none-any.whl (1.9 MB)
     |████████████████████████████████| 1.9 MB 6.8 MB/s
Collecting soupsieve>1.2
  Downloading soupsieve-2.2.1-py3-none-any.whl (33 kB)
Collecting wcwidth
  Downloading wcwidth-0.2.5-py2.py3-none-any.whl (30 kB)
Requirement already satisfied: setuptools in c:\python\lib\site-packages (from python-Levenshtein->khan-dl) (56.0.0)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 6.4 MB/s
Collecting certifi>=2017.4.17
  Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 6.4 MB/s
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 3.3 MB/s
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 1.1 MB/s
Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
Installing collected packages: wcwidth, urllib3, soupsieve, idna, chardet, certifi, youtube-dl, requests, python-Levenshtein, prompt-toolkit, lxml, fuzzywuzzy, beautifulsoup4, art, khan-dl
    Running setup.py install for python-Levenshtein ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\python\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\testo101\\AppData\\Local\\Temp\\pip-install-49n0b8sx\\python-levenshtein_8a3ec7b1023b4f0891e0621324c6a695\\setup.py'"'"'; __file__='"'"'C:\\Users\\testo101\\AppData\\Local\\Temp\\pip-install-49n0b8sx\\python-levenshtein_8a3ec7b1023b4f0891e0621324c6a695\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\testo101\AppData\Local\Temp\pip-record-p5q2lzjt\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\python\Include\python-Levenshtein'
         cwd: C:\Users\testo101\AppData\Local\Temp\pip-install-49n0b8sx\python-levenshtein_8a3ec7b1023b4f0891e0621324c6a695\
    Complete output (28 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.9
    creating build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\StringMatcher.py -> build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\__init__.py -> build\lib.win-amd64-3.9\Levenshtein
    running egg_info
    writing python_Levenshtein.egg-info\PKG-INFO
    writing dependency_links to python_Levenshtein.egg-info\dependency_links.txt
    writing entry points to python_Levenshtein.egg-info\entry_points.txt
    writing namespace_packages to python_Levenshtein.egg-info\namespace_packages.txt
    writing requirements to python_Levenshtein.egg-info\requires.txt
    writing top-level names to python_Levenshtein.egg-info\top_level.txt
    adding license file 'COPYING' (matched pattern 'COPYING*')
    reading manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no previously-included files matching '*pyc' found anywhere in distribution
    warning: no previously-included files matching '*so' found anywhere in distribution
    warning: no previously-included files matching '.project' found anywhere in distribution
    warning: no previously-included files matching '.pydevproject' found anywhere in distribution
    writing manifest file 'python_Levenshtein.egg-info\SOURCES.txt'
    copying Levenshtein\_levenshtein.c -> build\lib.win-amd64-3.9\Levenshtein
    copying Levenshtein\_levenshtein.h -> build\lib.win-amd64-3.9\Levenshtein
    running build_ext
    building 'Levenshtein._levenshtein' extension
    error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\python\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\testo101\\AppData\\Local\\Temp\\pip-install-49n0b8sx\\python-levenshtein_8a3ec7b1023b4f0891e0621324c6a695\\setup.py'"'"'; __file__='"'"'C:\\Users\\testo101\\AppData\\Local\\Temp\\pip-install-49n0b8sx\\python-levenshtein_8a3ec7b1023b4f0891e0621324c6a695\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\testo101\AppData\Local\Temp\pip-record-p5q2lzjt\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\python\Include\python-Levenshtein' Check the logs for full command output.


SSL error

Hi - I'm getting this error:

  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 598, in trouble
    raise DownloadError(message, exc_info)
youtube_dl.utils.DownloadError: ERROR: Unable to download API page: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)> (caused by URLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
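
The "unable to get local issuer certificate" message generally means Python could not find a CA bundle to verify the server certificate against. The path in the traceback suggests a python.org framework build on macOS, where running the "Install Certificates.command" script shipped in the Python application folder is the commonly suggested fix; whether that applies here is an assumption. As a rough diagnostic sketch (the helper name describe_ca_paths is hypothetical and not part of khan-dl or youtube-dl), the standard library can report where it looks for certificates:

```python
import os
import ssl


def describe_ca_paths():
    """Report where this Python build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        # A missing file or directory is one possible cause of
        # CERTIFICATE_VERIFY_FAILED; it is not the only one.
        "cafile_exists": bool(paths.cafile) and os.path.exists(paths.cafile),
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
    }


if __name__ == "__main__":
    for key, value in describe_ca_paths().items():
        print(f"{key}: {value}")
```

If both "exists" values come back False, the interpreter probably has no usable CA store, which would be consistent with the error above.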

Few videos are saved with wrong file name

  • Each unit webpage (e.g. https://www.khanacademy.org/math/cc-1st-grade-math/cc-1st-place-value) currently contains more YouTube IDs than there are lessons.

  • To resolve the wrong file name issue, the lesson titles from the webpage and the YouTube video titles have to be matched against one another.

  • In matching those titles, some of the lesson titles from the webpage are quite divergent from their respective YouTube titles.

  • E.g., webpage_lesson_title = "Addition and subtraction word problems:gorillas", youtube_id_titles = ["Exercising gorillas", "Comparison word problems: marbles"]. The latter YouTube title is more similar, but it is an entirely different video.
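
One way to avoid picking a wrong but superficially similar title is to accept a match only above a similarity threshold and leave the lesson unmatched otherwise. The sketch below uses difflib from the standard library rather than whatever matcher khan-dl uses; the function names and the 0.8 threshold are illustrative assumptions, not the project's code.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case a title and collapse colons and repeated whitespace."""
    return " ".join(title.lower().replace(":", " ").split())


def best_match(lesson_title, youtube_titles, threshold=0.8):
    """Return the most similar YouTube title, or None if none is close enough."""
    target = normalize(lesson_title)
    scored = [
        (SequenceMatcher(None, target, normalize(candidate)).ratio(), candidate)
        for candidate in youtube_titles
    ]
    if not scored:
        return None
    score, title = max(scored)
    # Rejecting weak matches avoids naming a file after a different video.
    return title if score >= threshold else None


gorillas = best_match(
    "Addition and subtraction word problems:gorillas",
    ["Exercising gorillas", "Comparison word problems: marbles"],
)
```

Here gorillas is None: neither candidate reaches the threshold, so the lesson can be flagged for a fallback (for example, keeping the YouTube title) instead of being saved under the wrong name.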

Files not saving

I am using Ubuntu with WSL.
I tried to download the files from multiple sources, but no files were saved. To be sure this wasn't an issue with Ubuntu, I also used Windows PowerShell and Command Prompt. There was no error message, but no folder or files appeared in the directory I was using.

Many sub folders are skipped and Videos are saved with wrong file name in wrong folder

Please check with the following command. Many sub folders are skipped and videos are saved with the wrong file name.
khan-dl -c https://www.khanacademy.org/math/cc-1st-grade-math

Tried to debug the problem.
for unit_sub_heads, unit_sub_head_body in zip(
    self.unit_page_html.find_all(
        attrs={"data-test-id": "lesson-card-link"}
    ),
    self.unit_page_html.find_all("div", class_="_1o51yl6"),
):

In the code above, self.unit_page_html.find_all("div", class_="_1o51yl6") does not find all the videos. Please fix this problem; I tried but couldn't. I need this script as soon as possible, and it would be very helpful if you could solve this issue.
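
One possible explanation for the skipped sub folders, not confirmed against the khan-dl source: zip() stops at the shorter of its inputs, so if the second find_all call returns fewer elements than the first, the remaining lesson cards are dropped without any error. The class name _1o51yl6 also looks auto-generated and may change when the site is rebuilt, which is likewise an assumption. A minimal sketch of a pairing helper (pair_strict is a hypothetical name) that fails loudly instead of truncating:

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking the exhausted side


def pair_strict(heads, bodies):
    """Pair two sequences element by element, raising if their lengths differ."""
    pairs = []
    for index, (head, body) in enumerate(
        zip_longest(heads, bodies, fillvalue=_MISSING)
    ):
        if head is _MISSING or body is _MISSING:
            raise ValueError(
                f"length mismatch at index {index}: "
                "lesson headings and lesson bodies do not line up"
            )
        pairs.append((head, body))
    return pairs
```

Swapping this in for the plain zip() call would at least show whether the two find_all results have different lengths on the affected course page.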

Option download all (--all) fails

Using the command-line option --all fails after downloading roughly 2.2 GB of courses, with the following exception (raised in youtube-dl):

youtube_dl.utils.DownloadError: ERROR: Error in output template: unsupported format character 'm' (0x6d) at index 123 (encoding: 'UTF-8')

A workaround is not clear at this time.
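
The message has the form of Python's own %-formatting error, and youtube-dl expands output templates with %-style formatting, so a literal percent sign has to be written as %%. A plausible cause, which is an assumption and has not been checked against the khan-dl source, is a course or lesson title containing a percent sign being placed into the output template unescaped. The sketch below reproduces that class of error and shows the escaping; escape_outtmpl, build_template and the sample title are hypothetical.

```python
def escape_outtmpl(text):
    """Escape literal percent signs so %-style formatting leaves them alone."""
    return text.replace("%", "%%")


def build_template(folder_name):
    # folder_name stands in for a scraped title; the field names follow
    # youtube-dl's %(title)s / %(ext)s template syntax.
    return escape_outtmpl(folder_name) + "/%(title)s.%(ext)s"


fields = {"title": "Intro", "ext": "mp4"}

# Escaped: the percent sign in the folder name survives formatting.
safe_path = build_template("50% more") % fields

# Unescaped: "% m" is parsed as a format specifier and raises ValueError.
try:
    ("50% more" + "/%(title)s.%(ext)s") % fields
    unescaped_error = None
except ValueError as exc:
    unescaped_error = str(exc)
```

If this is the cause, escaping any scraped text before it is joined into the template should let --all continue past the affected course.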
