c0d3d3v / moodle-downloader

A Moodle Crawler that downloads course content from Moodle (eg. lecture pdfs)

License: GNU General Public License v3.0

Python 100.00%
moodle moodle-crawler crawler dhbw donwnloader moodle-downloader downloads download crawl content

moodle-downloader's Issues

A suggestion for the Moodle crawler

So far, the links are saved in a file called external-links.log, and that file sits in a folder named after the link description.
I would like (an option for) the .log file to get a meaningful name instead, ideally the same name as the folder.

Then the file would not need its own folder either.

Where is startup?

I don't understand the part "Put watch -n 3600 python moodleCrawler.py in startup to fetch the files every hour"

Is it the Windows startup folder? If so, how do you put a command in there?

Thank you
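
For context: `watch` is a Linux command-line tool and is not part of Windows, so the README line cannot be pasted into the Windows startup folder as-is. One portable alternative is a small Python wrapper that reruns the crawler at a fixed interval. The sketch below is an assumption, not something the project ships; the function name `run_periodically` and its parameters are made up for illustration.

```python
import subprocess
import time


def run_periodically(command, interval_seconds, iterations=None,
                     runner=subprocess.call, sleep=time.sleep):
    """Run `command` repeatedly, pausing `interval_seconds` between runs.

    iterations=None loops forever (similar to `watch`); a number stops
    after that many runs. Returns the exit codes that were collected.
    `runner` and `sleep` are injectable so the loop can be tested.
    """
    exit_codes = []
    count = 0
    while iterations is None or count < iterations:
        exit_codes.append(runner(command))
        count += 1
        # Only sleep if another run will follow.
        if iterations is None or count < iterations:
            sleep(interval_seconds)
    return exit_codes
```

Calling `run_periodically([sys.executable, "moodleCrawler.py"], 3600)` from a script of your own would approximate `watch -n 3600 python moodleCrawler.py`; that script could then be registered with whatever autostart mechanism your system offers.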

Can login but no download (redirect login)

Base URL: https://studium.umontreal.ca
The site appears to use a token: Firefox always asks whether it should update the saved password for "f5-sso-token".

Error in the terminal:

02:36:11 Moodle Crawler started working.
02:36:11 Try to copy logintoken!
02:36:11 Download has started.
02:36:11 Download complete.                                       
Traceback (most recent call last):
  File "moodleCrawler.py", line 1470, in <module>
    if inputLogintoken is None or inputLogintoken[0] is None:
IndexError: list index out of range
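
The traceback suggests that `inputLogintoken` is an empty list here: `inputLogintoken[0]` raises `IndexError` before the `is None` comparison can run. A minimal sketch of a guard that treats "no match" and "empty match" the same way is shown below. The helper name `first_logintoken` is hypothetical, and how the crawler actually fills the list is not shown in this report.

```python
def first_logintoken(matches):
    """Return the first login token, or None if nothing usable was found.

    `matches` stands in for the list the crawler calls inputLogintoken
    (assumption: it is None or a possibly empty list).
    """
    if not matches:  # covers both None and an empty list
        return None
    return matches[0]
```

With such a guard the script could report "no logintoken found on the login page" instead of crashing, which would also make redirects to an external single sign-on page easier to diagnose.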

My config.ini is set for only one course:

`[dirs]
root_dir = "/home/usuario/Escritorio/mo"

[auth]
username =  ""   
password = ""
authurl = "https://studium.umontreal.ca/login/index.php"
baseurl = "https://studium.umontreal.ca"
useauthstate = false

uselogintoken = true
reloginonfile = false

[crawl]
allcourses = false
forum = false
wiki = false
history = true
maxdepth = 9
loglevel = 5
externallinks = false
crawlcourseslink = "course/index.php"
findduplicates = true
findallduplicates = true
deleteduplicates = false
informationaboutduplicates = true
downloadcoursepages = true
dontcrawl = "mp4,mkv,mp3"

onlycrawlcourses = "128531"
dontcrawlcourses = ""
antirecrusion = true

[other]
colors = true
notifications = false`

Can't log in with valid credentials

I'm trying to get this script to work with this site from Spain, https://aules.edu.gva.es/moodle/login/index.php, but I'm not having any luck even though I have a valid username and password that I can use from the browser.
I'm getting "Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data."

Some Ideas:

  • Optimize recursion!
  • Make the crawler multithreaded (one thread for every course)
  • JSON config instead of this buggy config
  • Notify via email if something new got downloaded
  • Configuration guide
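
As a rough illustration of the multithreading idea, the per-course work could be handed to a thread pool. This is only a sketch under assumptions: `crawl_course` is a hypothetical placeholder for the real per-course crawl, courses are assumed to be `(id, name)` pairs, and `concurrent.futures` requires Python 3.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_course(course):
    """Hypothetical placeholder for the per-course crawl; returns a summary."""
    course_id, course_name = course
    return "crawled %s (%s)" % (course_name, course_id)


def crawl_all(courses, max_workers=4):
    """Crawl every course concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `courses` in its results.
        return list(pool.map(crawl_course, courses))
```

A bounded pool is used here rather than literally one thread per course, so that a student enrolled in many courses does not open dozens of simultaneous connections to the Moodle server.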

move config ini

I think it could be helpful to move config.ini to config.ini.sample and add config.ini to .gitignore.

What do you think?
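
If the file were renamed like this, the script could create a local config.ini from the sample on first run. The following is a minimal sketch of that idea, not existing project code; the function name `ensure_config` and its default paths are assumptions.

```python
import os
import shutil


def ensure_config(config_path="config.ini", sample_path="config.ini.sample"):
    """Create config_path from sample_path on first run.

    Returns True if the sample was copied, False if a config already existed.
    """
    if os.path.exists(config_path):
        return False
    shutil.copyfile(sample_path, config_path)
    return True
```

This keeps personal credentials out of version control while new users still get a working template.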

Unable to find courses

terminal output with loglevel 5:

`22:11:05 Moodle Crawler started working.
22:11:05 Try to copy logintoken!
22:11:05 Download has started.
22:11:05 No Content-Length available.
22:11:05 Downloaded 33324 bytes
22:11:05 Download complete.
22:11:05 Warning: Found 2 forms!
22:11:05 Logintoken: deletedtokenforthispost
22:11:05 Try to login...
22:11:06 Download has started.
22:11:06 No Content-Length available.
22:11:06 Downloaded 81924 bytes
22:11:06 Downloaded 147104 bytes
22:11:06 Download complete.
22:11:06 Logged in!
22:11:06 Searching Courses...
22:11:07 Download has started.
22:11:07 No Content-Length available.
22:11:07 Downloaded 81924 bytes
22:11:07 Downloaded 163848 bytes
22:11:07 Downloaded 192558 bytes
22:11:07 Download complete.
22:11:07 No link to this course was found!
22:11:07 Full page:

22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 No link to this course was found!
22:11:07 Full page:
22:11:07 Unable to find courses
22:11:07 Full page:
22:11:07 Scanning directory "XXXXXXXXXXXXX"....
22:11:07 Finding potential dupes...
22:11:07 Found 0 sets of potential dupes...
22:11:07 Scanning for real dupes...
22:11:07 Update Complete`

terminal output with loglevel 0:

22:09:17 Moodle Crawler started working.
22:09:17 Try to copy logintoken!
22:09:17 Warning: Found 2 forms!
22:09:17 Logintoken: XXXXX
22:09:19 Unable to find courses
22:09:19 Update Complete

One thing to note is that my university recently updated Moodle. Maybe the problem is related to that update. Is there a fix?

edit: sorry the insert code function is all messed up. I can screenshot it for you if you provide an email address.

Recursive download bug

Some files (.pdf) are each stored in a new folder of their own. Sometimes such a folder, with its file inside, is saved inside the folder that was downloaded just before it.

As an example here is one directory of one course:

On Moodle:
[screenshot "github2" not preserved]

On disk after download:
.
└── HMI-Ü-6
    ├── HMI-Ü-7-►
    │   ├── HMI-Ü-8-►
    │   │   ├── HMI-Ü-9-►
    │   │   │   ├── HMI-Ü-10-►
    │   │   │   │   ├── HMI-Ü-11-►
    │   │   │   │   │   ├── HMI-Ü-12-►
    │   │   │   │   │   │   └── HMI_Ü_12.pdf
    │   │   │   │   │   └── HMI_Ü_11.pdf
    │   │   │   │   └── HMI_Ü_10.pdf
    │   │   │   └── HMI_Ü_9.pdf
    │   │   └── HMI_Ü_8.pdf
    │   └── HMI_Ü_7.pdf
    └── HMI_Ü_6.pdf

Unable to log in

moodle instance (baseurl): https://moodle.bulme.at
authurl: https://moodle.bulme.at/login/index.php

python2 moodleCrawler.py 
09:11:01 Moodle Crawler started working.
09:11:01 Try to login...
09:11:01 Download has started.
09:11:01 No Content-Length available.
09:11:01 Downloaded 29405 bytes
09:11:01 Download complete.
09:11:01 Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data.

I should add that my password contains special characters, but this shouldn't be a problem, right?

config value types inside README

config value types should be specified in the README.md, for example
history [boolean]: If a history file should be used
instead of
history : If a history file should be used
or something like that
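
To illustrate why documenting the types matters, here is a small sketch of reading typed values with Python 3's `configparser`. The keys `history`, `maxdepth` and `dontcrawl` are taken from the config.ini shown in another issue; the function `read_crawl_options` and the way quotes are stripped are assumptions and may not match how the crawler parses its config.

```python
import configparser

SAMPLE = """
[crawl]
history = true
maxdepth = 9
dontcrawl = "mp4,mkv,mp3"
"""


def read_crawl_options(text):
    """Parse the [crawl] section and return values with explicit types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    crawl = parser["crawl"]
    return {
        "history": crawl.getboolean("history"),    # boolean
        "maxdepth": crawl.getint("maxdepth"),      # integer
        # comma-separated list; surrounding quotes removed first
        "dontcrawl": crawl.get("dontcrawl").strip('"').split(","),
    }
```

A README entry such as `maxdepth [integer]` would tell users directly which of these conversions applies to each key.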

Invalid Syntax

File "moodleCrawler.py", line 39
except Exception, e:
^
SyntaxError: invalid syntax

Python 3.5.1 on Windows 10
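
For reference, `except Exception, e:` is Python 2 syntax and is rejected by Python 3, which is why the script fails at line 39 before it runs at all. The form `except Exception as e:` is accepted by Python 2.6+ and Python 3. The snippet below only demonstrates the syntax; `parse_port` is an invented example and not part of the crawler.

```python
def parse_port(text):
    """Convert text to an int, returning None if it cannot be parsed."""
    try:
        return int(text)
    except Exception as e:  # "except Exception, e:" would be a SyntaxError on Python 3
        print("could not parse %r: %s" % (text, e))
        return None
```

Changing that one line is unlikely to be enough, since a script written for Python 2 usually has other incompatibilities; running it with a Python 2 interpreter avoids the error as reported.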

sre_constants.error: bad character range

Hi,

I always get the following error:

13:17:22 Finished course: 'Informatik-für-Telematiker-(WS19-20)(Janine-Breßler)'
13:17:22 Check course: 'Ortung-und-Navigation-für-Telematik-Dienste-ONTD-WS19-20' ID: 15066
Traceback (most recent call last):
  File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 1706, in <module>
    crawlMoodlePage(course[1], course[0], current_dir, mainpageURL + "my/")
  File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 1363, in crawlMoodlePage
    PageLinkContent, responsePageLink, pagelink)
  File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 440, in saveFile
    file_name, fileName, filetype, pathtoSearch)
  File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 704, in searchfordumpsSpecific
    fileName + '*' + filetype)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/fnmatch.py", line 56, in filter
    _cache[pat] = re_pat = re.compile(res)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
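
The last frames show `fnmatch.filter` being called with `fileName + '*' + filetype`. `fnmatch` treats `[...]` in a pattern as a character class, so a file name that happens to contain square brackets can produce an invalid class. Which file triggered the error here is not visible in the log. A possible workaround is to escape the file name before building the pattern. The sketch below uses `glob.escape`, which exists only in Python 3.4 and later (the traceback above is from Python 2.7, where it would have to be written by hand); `find_similar_files` is a hypothetical stand-in for the crawler's own function.

```python
import fnmatch
import glob


def find_similar_files(names, file_name, filetype):
    """Return entries of `names` that start with file_name and end with filetype.

    file_name and filetype are matched literally: glob.escape neutralises
    the wildcard characters *, ? and [ before the pattern is built.
    """
    pattern = glob.escape(file_name) + "*" + glob.escape(filetype)
    return fnmatch.filter(names, pattern)
```

For example, with a file name such as `Blatt [z-a] 1` the escaped pattern matches the brackets as ordinary characters instead of interpreting `z-a` as a range.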

can't login anymore

The problem has persisted for a while now. I didn't tamper with the config file or anything; it just stopped working overnight.

terminal output:
18:53:58 Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data.
