c0d3d3v / moodle-downloader
A Moodle Crawler that downloads course content from Moodle (e.g. lecture PDFs)
License: GNU General Public License v3.0
Until now, the links are saved in a file named external-links.log, and that file sits in a folder named after the link description.
I would like (an option for) the .log file to get a meaningful name, ideally the same name as the folder.
Then the file would not need a folder of its own either.
I don't understand the part "Put watch -n 3600 python moodleCrawler.py in startup to fetch the files every hour"
Is it the Windows startup folder? If so, how do you put a command in there?
Thank you
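`watch` is a Unix utility and is not part of a stock Windows installation, so one portable alternative is a small Python loop that re-runs the crawler every hour. The following is only a minimal sketch, not part of the project: `run_periodically` and its parameters are hypothetical names, and the usage comment assumes `moodleCrawler.py` is in the current working directory.

```python
import subprocess
import sys
import time


def run_periodically(command, interval_seconds, max_runs=None,
                     runner=subprocess.call, sleep=time.sleep):
    """Run `command` repeatedly, waiting `interval_seconds` between runs.

    `max_runs=None` loops forever. `runner` and `sleep` are injectable
    so the loop can be exercised without starting a process or waiting.
    Returns the exit codes of all runs (only reachable with `max_runs`).
    """
    exit_codes = []
    runs = 0
    while max_runs is None or runs < max_runs:
        exit_codes.append(runner(command))
        runs += 1
        # Do not sleep after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return exit_codes


# Roughly equivalent to `watch -n 3600 python moodleCrawler.py`:
# run_periodically([sys.executable, "moodleCrawler.py"], 3600)
```

A script like this could be started from a shortcut in the Windows startup folder or from Task Scheduler; which of those fits best depends on the setup.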
Base URL: https://studium.umontreal.ca
The site appears to use a token, as Firefox always asks whether it should update the saved password for "f5-sso-token".
Error in the terminal:
02:36:11 Moodle Crawler started working.
02:36:11 Try to copy logintoken!
02:36:11 Download has started.
02:36:11 Download complete.
Traceback (most recent call last):
File "moodleCrawler.py", line 1470, in <module>
if inputLogintoken is None or inputLogintoken[0] is None:
IndexError: list index out of range
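The traceback shows `inputLogintoken[0]` being evaluated on an empty list, which is what happens when no login token is found on the login page. A hedged sketch of a check that tolerates that case follows. `find_logintoken` and the regular expression are illustrative assumptions, not the crawler's actual code, and the sketch assumes the login form carries a hidden input named `logintoken`.

```python
import re

# Assumed markup: <input type="hidden" name="logintoken" value="...">
LOGINTOKEN_PATTERN = re.compile(
    r'<input[^>]*name="logintoken"[^>]*value="([^"]*)"')


def find_logintoken(login_page_html):
    """Return the first login token on the page, or None if there is none."""
    matches = LOGINTOKEN_PATTERN.findall(login_page_html)
    if not matches:
        # An empty list must be handled before indexing with [0].
        return None
    # Treat an empty value attribute the same as a missing token.
    return matches[0] or None
```

With a helper like this, the caller can test the result against `None` once instead of indexing into a list that may be empty.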
My config.ini is set for only one course:
`[dirs]
root_dir = "/home/usuario/Escritorio/mo"
[auth]
username = ""
password = ""
authurl = "https://studium.umontreal.ca/login/index.php"
baseurl = "https://studium.umontreal.ca"
useauthstate = false
uselogintoken = true
reloginonfile = false
[crawl]
allcourses = false
forum = false
wiki = false
history = true
maxdepth = 9
loglevel = 5
externallinks = false
crawlcourseslink = "course/index.php"
findduplicates = true
findallduplicates = true
deleteduplicates = false
informationaboutduplicates = true
downloadcoursepages = true
dontcrawl = "mp4,mkv,mp3"
onlycrawlcourses = "128531"
dontcrawlcourses = ""
antirecrusion = true
[other]
colors = true
notifications = false`
see log:
moodlecrawler.lol.log
I'm trying to get this script to work with this site from Spain: https://aules.edu.gva.es/moodle/login/index.php
I'm not having any luck, even though I have a valid username and password that I can use from the browser.
I'm getting: "Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data."
Suggestion: re-download the HTML file only when there are important changes, or ignore HTML files entirely.
I think it could be helpful to rename config.ini to config.ini.sample and add config.ini to .gitignore.
What do you think?
terminal output with loglevel 5:
22:11:05 Moodle Crawler started working.
22:11:05 Try to copy logintoken!
22:11:05 Download has started.
22:11:05 No Content-Length available.
22:11:05 Downloaded 33324 bytes
22:11:05 Download complete.
22:11:05 Warning: Found 2 forms!
22:11:05 Logintoken: deletedtokenforthispost
22:11:05 Try to login...
22:11:06 Download has started.
22:11:06 No Content-Length available.
22:11:06 Downloaded 81924 bytes
22:11:06 Downloaded 147104 bytes
22:11:06 Download complete.
22:11:06 Logged in!
22:11:06 Searching Courses...
22:11:07 Download has started.
22:11:07 No Content-Length available.
22:11:07 Downloaded 81924 bytes
22:11:07 Downloaded 163848 bytes
22:11:07 Downloaded 192558 bytes
22:11:07 Download complete.
22:11:07 No link to this course was found!
22:11:07 Full page:
Laufende
terminal output with loglevel 0:
22:09:17 Moodle Crawler started working.
22:09:17 Try to copy logintoken!
22:09:17 Warning: Found 2 forms!
22:09:17 Logintoken: XXXXX
22:09:19 Unable to find courses
22:09:19 Update Complete
One thing to note is that my university recently updated Moodle. Maybe the problem is related to that update. Is there a fix?
Edit: sorry, the insert-code function is all messed up. I can send you a screenshot if you provide an email address.
Some files (.pdf) are each stored in a new folder of their own. Sometimes such a folder, with the file in it, is saved inside the folder that was downloaded just before it.
As an example, here is one directory of one course.
On disk after download:
.
└── HMI-Ü-6
    ├── HMI-Ü-7-►
    │   ├── HMI-Ü-8-►
    │   │   ├── HMI-Ü-9-►
    │   │   │   ├── HMI-Ü-10-►
    │   │   │   │   ├── HMI-Ü-11-►
    │   │   │   │   │   ├── HMI-Ü-12-►
    │   │   │   │   │   │   └── HMI_Ü_12.pdf
    │   │   │   │   │   └── HMI_Ü_11.pdf
    │   │   │   │   └── HMI_Ü_10.pdf
    │   │   │   └── HMI_Ü_9.pdf
    │   │   └── HMI_Ü_8.pdf
    │   └── HMI_Ü_7.pdf
    └── HMI_Ü_6.pdf
moodle instance (baseurl): https://moodle.bulme.at
authurl: https://moodle.bulme.at/login/index.php
python2 moodleCrawler.py
09:11:01 Moodle Crawler started working.
09:11:01 Try to login...
09:11:01 Download has started.
09:11:01 No Content-Length available.
09:11:01 Downloaded 29405 bytes
09:11:01 Download complete.
09:11:01 Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data.
I should add that my password contains special characters, but this shouldn't be a problem, right?
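Whether special characters matter depends on how the password is read and sent, which the output above does not show. One known pitfall, assuming the config file is read with Python's standard `configparser` and its default interpolation (an assumption about the crawler, not something confirmed here), is a `%` character in a value. The sketch below uses a made-up password and a hypothetical `read_password` helper to show the difference.

```python
import configparser

# Hypothetical config content; the password is invented for illustration.
SAMPLE_CONFIG = '[auth]\npassword = "pa%ss&word"\n'


def read_password(config_text, interpolate):
    """Read auth.password, with or without '%' interpolation."""
    if interpolate:
        parser = configparser.ConfigParser()  # default BasicInterpolation
    else:
        parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Strip the surrounding quotes used in the config file.
    return parser.get("auth", "password").strip('"')


# With interpolation disabled, the value is returned unchanged.
raw_password = read_password(SAMPLE_CONFIG, interpolate=False)

# With the default interpolation, a lone '%' makes the lookup fail.
try:
    read_password(SAMPLE_CONFIG, interpolate=True)
    interpolation_failed = False
except configparser.InterpolationSyntaxError:
    interpolation_failed = True
```

If the password contains no `%`, this particular pitfall does not apply and the login failure has another cause.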
Config value types should be specified in README.md, for example
history [boolean]: If a history file should be used
instead of
history : If a history file should be used
or something like that
File "moodleCrawler.py", line 39
except Exception, e:
^
SyntaxError: invalid syntax
Python 3.5.1 on Windows 10
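`except Exception, e:` is Python 2-only syntax, so Python 3.5.1 rejects the file before running any of it; the traceback suggests the script was written for Python 2. For reference, the spelling below is accepted by both Python 2.6+ and Python 3. `describe_failure` is a hypothetical helper used only to illustrate the syntax.

```python
def describe_failure(action):
    """Call `action` and return a short message if it raises, else None."""
    try:
        action()
    except Exception as e:  # Python 2-only spelling: except Exception, e:
        return "{}: {}".format(type(e).__name__, e)
    return None
```

Changing this one line is unlikely to be enough to run the whole script on Python 3; running it with a Python 2 interpreter is the more direct workaround.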
Hi,
I always get the following error:
13:17:22 Finished course: 'Informatik-für-Telematiker-(WS19-20)(Janine-Breßler)'
13:17:22 Check course: 'Ortung-und-Navigation-für-Telematik-Dienste-ONTD-WS19-20' ID: 15066
Traceback (most recent call last):
File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 1706, in <module>
crawlMoodlePage(course[1], course[0], current_dir, mainpageURL + "my/")
File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 1363, in crawlMoodlePage
PageLinkContent, responsePageLink, pagelink)
File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 440, in saveFile
file_name, fileName, filetype, pathtoSearch)
File "/Users/romankobosil/Documents/Telematik/Moodle-Downloader/src/moodleCrawler.py", line 704, in searchfordumpsSpecific
fileName + '*' + filetype)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/fnmatch.py", line 56, in filter
_cache[pat] = re_pat = re.compile(res)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 194, in compile
return _compile(pattern, flags)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range
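The traceback ends in `fnmatch.filter` with the pattern `fileName + '*' + filetype`, so the file name itself is interpreted as a glob pattern. A likely cause, though not confirmed by the log, is a file name containing square brackets that form an invalid character range. A minimal sketch of escaping the name first is shown below; `find_existing_copies` is a hypothetical stand-in for the crawler's duplicate search, and `glob.escape` requires Python 3.4 or newer, so it is not available on the Python 2.7 used here.

```python
import fnmatch
import glob


def find_existing_copies(names, file_name, filetype):
    """Return the entries of `names` that start with `file_name` and end
    with `filetype`, treating `file_name` as literal text.
    """
    # glob.escape neutralises '*', '?' and '[' so they match themselves.
    pattern = glob.escape(file_name) + "*" + filetype
    return fnmatch.filter(names, pattern)


# Example call with made-up file names:
# find_existing_copies(["Notes [z-a]_1.pdf", "other.pdf"], "Notes [z-a]", ".pdf")
```

The wildcard between the name and the extension is kept on purpose, so numbered copies of the same file are still found.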
The problem has persisted for a while now. I didn't tamper with the config file or anything; it just stopped working overnight.
Terminal output:
18:53:58 Cannot connect to moodle or Moodle has changed. Crawler is not logged in. Check your login data.