This project forked from thu-keg/mooccubex

A large-scale knowledge repository for adaptive learning, learning analytics, and knowledge discovery in MOOCs, hosted by THU KEG.

License: GNU General Public License v3.0

Shell 48.08% Python 51.92%

MOOCCubeX

Paper | Chinese version

MOOCCubeX is maintained by the Knowledge Engineering Group of Tsinghua University and supported by XuetangX, one of the largest MOOC websites in China. The repository consists of 4,216 courses, 230,263 videos, 358,265 exercises, 637,572 fine-grained concepts and over 296 million raw behavioral records from 3,330,294 students, and is intended to support research on adaptive learning in MOOCs.

We summarize the contributions of MOOCCubeX as follows:

  • High Coverage: MOOCCubeX collects diverse MOOC resources and external educational resources, as well as records of students' learning, exercising and discussion.
  • Large-scale: MOOCCubeX is larger than other open-access educational data repositories, so it can support the exploration of deep models with high data requirements.
  • Concept-centric: Heterogeneous data are organized around fine-grained concepts, which makes resources more relevant and easier to represent, find and model.

News !!

  • The prerequisite relations for mathematics, psychology and computer science have been refined !!
  • Our paper has been submitted to the CIKM 2021 resource track !!
  • The MOOCCubeX Dataset Builder toolkit has been updated !!
  • Our paper has been accepted by the CIKM 2021 resource track !!

Repository Framework

The framework of MOOCCubeX is shown below.

[Figure: MOOCCubeX framework]

The data of MOOCCubeX are organized around a large fine-grained concept graph. The resources of MOOCCubeX are listed in the tables below.

The course resources (see course.md for more details).

| Course Resource Type | Description | Download | Size |
|---|---|---|---|
| Course Info | Course video and exercise organization. | entities/course.json | 43M |
| Video | Video name and captions. | entities/video.json | 580M |
| Exercise | A group of problems of the course. | relations/exercise-problem.txt | 129M |
| Problem | Practice problems of a group of exercises. | entities/problem.json | 1.2G |
| School | School information. | entities/school.json | 613K |
| Teacher | Teacher information. | entities/teacher.json | 8.7M |
| Field/Discipline | The fields a course belongs to. Annotated by humans. | relations/course-field.json | 62K |

The student behavioral data (see user.md for more details).

| Student Behavior Type | Description | Download | Size |
|---|---|---|---|
| Student Profile | User id, school, course register order, etc. | entities/user.json | 770M |
| Video Watching | The playback speed and time jumps of users watching videos. | relations/user-video.json | 3.0G |
| Exercising | Users doing exercise problems. | relations/user-problem.json | 21G |
| Comment | Users' comments on a video or an exercise. | entities/comment.json | 2.1G |
| Reply | Users' replies to comments of other users. | entities/reply.json | 50M |
| Xiaomu | User interaction with Xiaomu (the QA bot of XuetangX). | relations/user-xiaomu.json | 9.7M |
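
Several of the behavioral files are too large to load into memory at once (for example, relations/user-problem.json is 21G), so it can help to process them as a stream. The following is a minimal sketch, assuming the file stores one JSON object per line; that layout and the field name `user_id` are assumptions made for illustration and should be checked against user.md.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Count records per value of `field`, reading one JSON object per line.

    Assumes a JSON Lines layout; blank lines and records that lack the
    field are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if field in record:
            counts[record[field]] += 1
    return counts


# Hypothetical usage (the field name "user_id" is an assumption):
# with open("relations/user-problem.json", encoding="utf-8") as handle:
#     attempts_per_user = count_by_field(handle, "user_id")
```

Because the function accepts any iterable of lines, an open file handle can be passed directly and only one record is held in memory at a time.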

The fine-grained concepts, their links to other types of MOOC resources, and the external resources are described in concept.md.

| Concept and Links | Description | Download | Size |
|---|---|---|---|
| Concept | Concepts extracted from video captions. | entities/concept.json | 156M |
| Concept-prerequisite | Prediction and human annotation of prerequisites of Psychology, Math and Computer Science. | prerequisites/psy.json, prerequisites/math.json, prerequisites/cs.json | 87M, 59M, 133M |
| Concept-course | Linked concepts of a course. | relations/concept-course.txt | 19M |
| Concept-video | Linked concepts of a video. | relations/concept-video.txt | 39M |
| Concept-problem | Linked concepts of a problem. | relations/concept-problem.txt | 1.3M |
| Concept-comment | Linked concepts of a comment. | relations/concept-comment.txt | 1.2M |
| Concept-others | Linked concepts of other resources. | relations/concept-other.txt | 19M |
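
The relation files above link concepts to other resources. As a rough illustration of how such a file could be turned into a lookup table, the sketch below assumes each line holds two tab-separated ids. The separator and the column order are assumptions, not something this README specifies, so they are exposed as parameters; see concept.md for the actual layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def index_relations(
    lines: Iterable[str],
    key_col: int = 0,
    value_col: int = 1,
    sep: str = "\t",
) -> Dict[str, Set[str]]:
    """Group the ids in `value_col` by the id found in `key_col`.

    Lines with too few columns (including blank lines) are skipped.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    needed = max(key_col, value_col) + 1
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) < needed:
            continue
        index[parts[key_col]].add(parts[value_col])
    return dict(index)


# Hypothetical usage (column order is an assumption):
# with open("relations/concept-video.txt", encoding="utf-8") as handle:
#     concepts_by_video = index_relations(handle, key_col=1, value_col=0)
```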

Toolkit

We provide two toolkits for convenient use of the data. They are built for refining our data or for building your own (DIY) applications on top of it.

  • MOOCCubeX Dataset Builder

    | Name | Description | Usage Example |
    |---|---|---|
    | download_dataset.sh | Download the full dataset. | ./scripts/download_dataset.sh |
    | count.sh | Count the number of courses/videos/... | ./scripts/count.sh |
    | user_freq_histgram.py | Plot usage frequency of videos/problems/... (Figure 4 in paper) | python3 scripts/user_freq_histgram.py |
    | concept_course.py | The script used to generate relations/concept-course.txt. | python scripts/concept_course.py |
    | concept_finder.sh | Find ccids related to the given concept. | ./scripts/concept_finder.sh K_晶体三极管组态放大器_电子科学与技术 |
    | course_info_finder.sh | Find information on courses whose name contains the given string. | ./scripts/course_info_finder.sh 数据结构 |
    | video_viewed_by_user_and_course.sh | Get all resource_ids of videos viewed by a given user in a given course. | ./scripts/video_viewed_by_user_and_course.sh U_94015 C_1824928 |
    | problems_by_user.sh | Get all problems attempted by a given user. | ./scripts/problems_by_user.sh U_10000835 |
    | concepts_of_video.sh | Get all concepts of a given video. | ./scripts/concepts_of_video.sh V_479945 |
    | who_replied.sh | Get all other users who replied to a given user's comments. | ./scripts/who_replied.sh U_10006544 |

Some of the tools above depend on jq or on Python packages such as matplotlib and tqdm.
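
Before running the scripts, it may be useful to check that these dependencies are available. The helper below is a small sketch that is not part of the toolkit; the function name `missing_dependencies` is hypothetical. It relies only on the standard-library calls `shutil.which` and `importlib.util.find_spec`.

```python
import importlib.util
import shutil
from typing import List, Sequence


def missing_dependencies(
    commands: Sequence[str] = ("jq",),
    modules: Sequence[str] = ("matplotlib", "tqdm"),
) -> List[str]:
    """Return the commands not found on PATH and the modules not importable."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [name for name in commands if shutil.which(name) is None]
    # find_spec returns None when a top-level module cannot be located.
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    absent = missing_dependencies()
    if absent:
        print("Missing dependencies: " + ", ".join(absent))
    else:
        print("All dependencies found.")
```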

Hints and Features

MOOCCubeX has some notable statistical features in its concepts and behavioral data.

  • MOOCCubeX contains more fine-grained concepts than the previous version, MOOCCube.
  • Video-watching behavior follows a long-tailed distribution, while exercising behavior follows a normal distribution.
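
One rough way to inspect how concentrated a behavior is (a long tail means a small fraction of items receives most of the interactions) is to compute the share of interactions covered by the most popular items. The sketch below is an illustrative measure only, not the method used for the paper's figures; the function name `top_share` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_share(item_ids: Iterable[Hashable], top_fraction: float = 0.1) -> float:
    """Fraction of all interactions that fall on the most popular items.

    `item_ids` holds one id per interaction (e.g. one video id per view).
    At least one item is always included in the "top" group.
    """
    counts = sorted(Counter(item_ids).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)
```

A value close to 1.0 for a small `top_fraction` suggests a heavily skewed, long-tailed usage pattern, while a value close to `top_fraction` itself suggests roughly even usage.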

[Figure: Plots]

Reference

@inproceedings{yu2021mooccubex,
  title={{MOOCCubeX}: A Large Knowledge-centered Repository for Adaptive Learning in {MOOCs}},
  author={Yu, Jifan and Wang, Yuquan and Zhong, Qingyang and Luo, Gan and Mao, Yiming and Sun, Kai and Feng, Wenzheng and Xu, Wei and Cao, Shulin and Zeng, Kaisheng and others},
  booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  pages={4643--4652},
  year={2021}
}
