Course Code: AI 829
Course Name: Natural Language Processing
Course Instructor: Professor Srinath Srinivasa
Course Pre-Requisites: Mathematics for Machine Learning, Discrete Mathematics, Data Structures and Algorithms
This repository contains all the materials, resources, tutorials, etc. delivered during the NLP course at the International Institute of Information Technology (IIIT) Bangalore in 2024.
- History of language and linguistics
- Language paradigms
- Language and thought
- Mould and cloak hypotheses
- Linguistic determinism
- Linguistic relativism
- NLP fundamentals
- History of NLP
- NLP and Symbolic Logic
- Statistical NLP
- Neural NLP
- Architecture of LLMs
- Foundation Models and Transfer Learning
- Fine-tuning LLMs
- Retrieval Augmented Generation
- Distributional Semantics
- Relevance models
- Regular expressions
- Stems, lemmas and morphological forms
- Keyphrase extraction
- Phrase identification models (CAP, PMI, N-grams)
- Spelling variants and spelling mistake corrections
- Phonetic hashing
- Semantic hashing and word embeddings
- Tutorials on lexical processing
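As a taste of the lexical-processing topics above, the sketch below scores a candidate phrase with pointwise mutual information (PMI), one of the phrase-identification measures listed. The toy corpus and the resulting numbers are illustrative only.

```python
import math
from collections import Counter

# Tiny whitespace-tokenized toy corpus (illustrative, not a real dataset)
corpus = ("natural language processing enables machines to process "
          "natural language . statistical natural language models "
          "learn from text .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def pmi(w1, w2):
    # PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
    p_joint = bigrams[(w1, w2)] / (N - 1)
    p1, p2 = unigrams[w1] / N, unigrams[w2] / N
    return math.log2(p_joint / (p1 * p2))

print(round(pmi("natural", "language"), 2))  # high PMI: a likely phrase
```

A high PMI indicates that the two words co-occur far more often than chance, which is why "natural language" surfaces as a phrase while arbitrary adjacent pairs do not.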
- Shallow parsing and POS tagging
- HMMs and Viterbi heuristic
- Introduction to CFGs and Parsing
- Ambiguity, left recursion and probabilistic parsing
- Long range dependencies and coreference resolution
- Free word-order languages
- Dependency parsing
- Tutorials on syntactic processing
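To illustrate the HMM and Viterbi material above, here is a minimal Viterbi decoder for POS tagging. All probabilities are hand-set toy values for a three-tag universe, not estimated from a corpus.

```python
# Toy HMM POS tagger: Viterbi decoding over hand-set probabilities.
states = ["DET", "NOUN", "VERB"]
start = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans = {
    "DET":  {"DET": 0.05, "NOUN": 0.9,  "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3,  "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1},
}
emit = {
    "DET":  {"the": 0.9, "dog": 0.0, "barks": 0.0},
    "NOUN": {"the": 0.0, "dog": 0.8, "barks": 0.2},
    "VERB": {"the": 0.0, "dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    # v[t][s] = probability of the best tag path ending in state s at time t
    v = [{s: start[s] * emit[s][words[0]] for s in states}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, p = max(((r, v[-1][r] * trans[r][s]) for r in states),
                          key=lambda x: x[1])
            col[s], ptr[s] = p * emit[s][w], prev
        v.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final state
    best = max(states, key=lambda s: v[-1][s])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

Dynamic programming over the trellis keeps decoding linear in sentence length, instead of enumerating all tag sequences.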
- Conceptual modeling fundamentals
- Word sense disambiguation
- Named entity recognition
- Spectral models for latent semantics (LSA, PLSA, PCA, word and document embeddings)
- Topic modeling
- Masked Language Model (MLM)
- Discourse and conversation modeling
- Tutorials on semantic processing
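The spectral models listed above (LSA and friends) can be sketched with a truncated SVD of a term-document count matrix. The five terms and four documents below are a made-up toy example; the point is only that related terms end up close in the latent space.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, cols = documents);
# counts are illustrative.
terms = ["ship", "boat", "ocean", "tree", "wood"]
X = np.array([
    [2, 0, 1, 0],   # ship
    [1, 0, 1, 0],   # boat
    [1, 1, 1, 0],   # ocean
    [0, 1, 0, 2],   # tree
    [0, 1, 0, 1],   # wood
], dtype=float)

# LSA: keep only the top-k singular directions of the SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # term vectors in the k-dim latent space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Related terms sit closer in the latent space than unrelated ones
print(cos(term_vecs[0], term_vecs[1]))  # ship vs boat (similar)
print(cos(term_vecs[0], term_vecs[3]))  # ship vs tree (dissimilar)
```

The same decomposition underlies document embeddings: columns of `Vt` give document coordinates in the latent space, and cosine similarity there captures topical relatedness even when documents share no surface terms.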
- Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
- Dan Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Pearson 2014.
- Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O’Reilly.
- PyTorch resources
- NLTK
- Stanford CoreNLP
- Apache OpenNLP
- SpaCy
- AllenNLP
- Gensim
- TextBlob
- NLP Architect
- NLLB from Meta
- AI4Bharat IndicNLP
- IndicNLP python library
- iNLTK (Indic NLTK library)
- Indic NLP Library
- BhashaIndia
- Bhashini
- IndicNLP Resources from School of Sanskrit and Indic Studies at JNU
- Linguistic Data Consortium for Indic Languages, CIIL, Mysore
The set of rubrics for grading a mandate contribution includes the following:
- Relevance: The contribution should be relevant to the current mandate and should contribute to the overall collective knowledge of the class pertaining to this mandate.
- Originality: Mandate contributions should be original knowledge-creation exercises. Plagiarism is strictly forbidden; contributions with plagiarised content automatically receive an F grade.
- Specificity: Mandate contributions that address a specific problem, or make specific points with the required rigour, are graded higher than contributions that make very general “newspaper-style” statements.
- Synthesis: Mandate contributions that synthesize knowledge from multiple sources and bring out the contributor’s own constructed knowledge are rated higher than contributions that simply report on an existing paper or result.
- Impact: Mandate contributions are also rated for their impact on the rest of the class, based on the quality of responses they generate from other members of the class.