
CS224n-2023-solution

This project is the newest solution set for CS224n: Stanford NLP (assignment code implementations). Ever since my sophomore year, when I first encountered the concept of artificial intelligence, NLP has felt more perplexing to me than other subjects. Back then, Andrew Ng's lecture on the simplest form of spam email classification (Naive Bayes) was already challenging for me, since I had not yet finished studying probability theory, and I have regarded NLP with a certain awe ever since. Now that I am a junior and wanted to study NLP systematically, I chose Stanford's CS224N after completing CS231N and CS229. Personally, I am not especially fond of this course: only the assignments are enjoyable, while the rest of the content often feels hard to follow logically, possibly due to my own limitations.

1. Prerequisites

  • This course, in my opinion, isn't very beginner-friendly. The explanations of foundational concepts often feel obvious to those who already understand them and incomprehensible to those who don't. I therefore strongly recommend building a certain foundation before attempting this course, so that you can get a sense of accomplishment from it.
  • Python: all assignments are done in numpy and PyTorch, so you must be proficient in the language.
  • Basics of deep learning: refer to CS231N, or read "Dive into Deep Learning" by Mu Li (also known as d2l).
  • Basics of machine learning: take a look at CS229 or "Pattern Recognition and Machine Learning," or at least have a foundation in probability theory. (Language models are expressed in terms of probabilities, so not understanding probability may leave you questioning your life choices.)

2. Course Structure

In my opinion, this course isn't aimed at beginners. The pace is fast: the very first lecture introduces word vectors as the central theme of NLP, and the first eight lectures cover word vectors, word2vec/GloVe, dependency parsing, RNNs, NMT, and Transformers. This path and volume of content are hard to follow without some foundational knowledge; the teachers at my school I discussed it with also consider the course difficult to teach. The worst part of the experience is the syntactic parsing section: much of the linguistic terminology is hard to understand, and the course also covers traditional statistical NLP models such as the transition-based parser. Only when I worked on the assignments did I truly understand how these translate into classification tasks, and what it means for an RNN to learn contextual dependencies.
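
To make that reduction concrete, here is a minimal sketch of the arc-standard transition system behind Assignment 3. The class name and details are my own toy version, not the course's starter code; the point is that parsing becomes a sequence of ordinary three-way classification decisions.

```python
# A toy arc-standard transition system: at each step, a classifier looks at
# the current stack/buffer state and picks one of three transitions.
class PartialParse:
    def __init__(self, sentence):
        self.stack = ["ROOT"]          # partially processed words
        self.buffer = list(sentence)   # words not yet processed
        self.dependencies = []         # (head, dependent) arcs found so far

    def step(self, transition):
        if transition == "S":          # SHIFT: move the next word onto the stack
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":       # LEFT-ARC: second-from-top depends on top
            dependent = self.stack.pop(-2)
            self.dependencies.append((self.stack[-1], dependent))
        elif transition == "RA":       # RIGHT-ARC: top depends on second-from-top
            dependent = self.stack.pop(-1)
            self.dependencies.append((self.stack[-1], dependent))

# In the assignment, an MLP over embeddings of the top stack/buffer words
# predicts "S", "LA", or "RA" from the current state at every step.
```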

The subsequent content, covering pre-training, question answering, Prompt & RLHF, and NLG, is still quite good and references numerous cutting-edge papers. To be honest, though, the slides are not easy to comprehend on their own and rely heavily on the lecturer's explanation (unfortunately, the CS224N team's teaching is average, and there are no updated videos), so many details have to be filled in by reading the papers. Personally, I find these slides closer to technical reports, demanding a high degree of self-study from students. Still, understanding foundational concepts such as zero/one/few-shot learning, in-context learning, and instruction finetuning somewhat bridges the gap between newcomers like me and the frontier of the field, and helps in grasping what researchers are currently working on (even if I might not understand it again after a month of relaxation). It also helps to understand the different dataset formats for various NLP tasks and how to convert them into classification tasks. Finally, somewhat restlessly, I went through the original papers on GPT-1 through GPT-4, the BERT series, and T5, and got a general sense of why they are so popular. The course also touches on topics like multimodality and Tree Recursive NNs, but I'm personally not interested in them (since not practicing what I learn makes me uncomfortable).

3. Assignment Structure

There are five assignments plus a default final project; the first three assignments are relatively simple, while the last two require considerable effort.

  • Assignment 1: Using word vectors. This part is straightforward: you build word vectors and implement a co-occurrence matrix algorithm (a minimal sketch follows this list). It takes about two to three hours to complete.
  • Assignment 2: This part involves deriving formulas, similar in style to the problem sets in CS229; understanding the slides requires working through the derivations step by step. After that, you implement word2vec in numpy (see the loss-and-gradient sketch after this list), which is relatively easy if you've studied CS231N.
  • Assignment 3: Dependency parsing. Initially I didn't understand this part well, but after working through the written section I grasped the concepts, then used PyTorch to implement a neural dependency parser. In essence, it's a classification task with a bit of data-structure thinking (see the transition-system sketch in Section 2), and the difficulty isn't high.
  • Assignment 4: Neural machine translation. This assignment has you reimplement embeddings, a 1-D CNN, a bidirectional LSTM, and attention in PyTorch (the attention step is sketched after this list). Here the course's shortcomings become apparent: there are too few test cases, and code that passes them can still crash halfway through execution. The logic is also quite complex, and it took me almost a week of on-and-off work to complete.
  • Assignment 5: This assignment mainly walks you through the formulas of attention. It also provides an example that helped me understand what "extracting different subspace information" means in multi-head attention (sketched after this list). The coding portion is relatively small, focusing on fine-tuning a GPT model.
  • Project: The default project has you reimplement a BERT model (without using the Hugging Face library) and perform sentiment classification on the Stanford Sentiment Treebank (SST) and an IMDb movie-review dataset (CFIMDB). It is slightly more extensive than Assignment 4. Personally, I don't think reproducing BERT is too difficult, but on CFIMDB I hit an issue where the final output shape didn't match the label shape (see the classification-head sketch after this list). After several debugging attempts I gave up, since I wasn't actually enrolled in the course. The project is still worth working on.
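
As promised in the Assignment 1 bullet, here is a minimal numpy sketch of building a co-occurrence matrix. The toy corpus and window size are my own choices, not the assignment's; the assignment then reduces M with truncated SVD to obtain dense word vectors.

```python
import numpy as np

# Toy corpus and window size are illustrative, not the assignment's.
corpus = [["all", "that", "glitters", "is", "not", "gold"],
          ["all", "is", "well", "that", "ends", "well"]]
window = 2

words = sorted({w for sent in corpus for w in sent})
word2ind = {w: i for i, w in enumerate(words)}
M = np.zeros((len(words), len(words)))

for sent in corpus:
    for i, center in enumerate(sent):
        # every word within `window` positions of the center is a context word
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                M[word2ind[center], word2ind[sent[j]]] += 1
```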
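For Assignment 2, the derivation you work through is essentially the gradient of the naive-softmax loss. Here is a minimal numpy sketch, using my own function name and shapes rather than the assignment's exact signatures.

```python
import numpy as np

# J = -log P(o | c), where P(o | c) = softmax(U v_c)[o].
def naive_softmax_loss_and_grads(center_vec, outside_idx, outside_vecs):
    # center_vec: (d,) embedding v_c;  outside_vecs: (V, d) matrix U
    scores = outside_vecs @ center_vec            # (V,) dot products u_w . v_c
    y_hat = np.exp(scores - scores.max())
    y_hat /= y_hat.sum()                          # softmax probabilities
    loss = -np.log(y_hat[outside_idx])

    delta = y_hat.copy()
    delta[outside_idx] -= 1.0                     # y_hat - y (one-hot true word)
    grad_center = outside_vecs.T @ delta          # dJ/dv_c = U^T (y_hat - y)
    grad_outside = np.outer(delta, center_vec)    # dJ/dU  = (y_hat - y) v_c^T
    return loss, grad_center, grad_outside
```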
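For Assignment 4, the heart of the decoder is a multiplicative attention step. Below is a minimal PyTorch sketch with my own names and shapes; the real assignment additionally handles padding masks, character-level 1-D CNN embeddings, and a combined-output projection.

```python
import torch

def attention_step(dec_hidden, enc_hiddens, enc_hiddens_proj):
    # dec_hidden:       (b, h)            current decoder hidden state
    # enc_hiddens:      (b, src_len, 2h)  bidirectional encoder states
    # enc_hiddens_proj: (b, src_len, h)   encoder states projected down to h dims
    e_t = torch.bmm(enc_hiddens_proj, dec_hidden.unsqueeze(2)).squeeze(2)  # (b, src_len) scores
    alpha_t = torch.softmax(e_t, dim=1)                                    # attention weights
    a_t = torch.bmm(alpha_t.unsqueeze(1), enc_hiddens).squeeze(1)          # (b, 2h) context vector
    return a_t, alpha_t
```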
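For Assignment 5's point about subspaces: splitting the model dimension across heads means each head attends within its own d_model/n_head-dimensional slice. A minimal sketch with toy dimensions of my own choosing (a real block would first apply learned Q/K/V projections):

```python
import torch

b, t, d_model, n_head = 2, 5, 64, 8   # toy sizes, not the assignment's
head_dim = d_model // n_head

q = torch.randn(b, t, d_model)
k = torch.randn(b, t, d_model)
v = torch.randn(b, t, d_model)

# reshape so each head attends within its own d_model/n_head subspace
def split_heads(x):
    return x.view(b, t, n_head, head_dim).transpose(1, 2)  # (b, n_head, t, head_dim)

qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
scores = qh @ kh.transpose(-2, -1) / head_dim ** 0.5       # (b, n_head, t, t)
attn = torch.softmax(scores, dim=-1)                       # per-head attention weights
out = (attn @ vh).transpose(1, 2).reshape(b, t, d_model)   # concatenate heads back
```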
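Finally, for the project's shape issue: the sentiment head should produce logits of shape (batch, num_classes) while the labels are a (batch,) vector of class indices before cross-entropy is applied. This is a minimal sketch of my own, not the project's starter code (SST uses 5 classes, CFIMDB 2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentimentHead(nn.Module):
    """Classification head on top of BERT's pooled [CLS] representation."""
    def __init__(self, hidden_size=768, num_classes=5):  # num_classes=2 for CFIMDB
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, cls_hidden, labels=None):
        logits = self.classifier(cls_hidden)        # (batch, num_classes)
        if labels is None:
            return logits
        loss = F.cross_entropy(logits, labels)      # labels: (batch,) int64 class ids
        return loss, logits
```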

4. Summary and Recommendations

In summary, this is a technical-report-style course: it gives you a direction in the field of NLP and leaves the rest for you to explore and learn on your own. I recommend completing all the assignments and reading the slides for the first ten lectures within three weeks, then spending one to two weeks on the project and on supplementary reading matched to your research interests.
