A study group on recent LLM papers. Always held in the afternoon. Topics: LLM, NLG, Dialogue, Reinforcement Learning, Distillation, Efficiency, Sentence Similarity, Multiple Tasks, Multimodal, Stable Diffusion, TTS, Text-To-Video, All-To-All, etc.
- No English.
- No foreigners.
- At least 2 papers per week.
- 10 or more if you can manage it.
- Up to 20 minutes reading the paper on the spot.
- Up to 40 minutes of discussion.
- Once the session reaches one hour, you are free to leave.
- Keep it casual.
- Paste in the ModuLabs (모두연) rules.
- Everyone here is impressive, so ask plenty of questions.
- Share often.
- Recognize that each person excels at something different.
- Stay humble, keep trying, do well.
- No going dark (don't disappear without notice).
2023-02-16 11:30 ~ 12:45 염기웅, 강수진, 고현웅
- GPT Understands, Too
- P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
- Do Prompt-Based Models Really Understand the Meaning of their Prompts?
2023-02-18 7:30 ~ 8:30 염기웅, 박상준, 강수진
2023-02-19 11:30 ~ 12:30 염기웅, 박상준, 강수진, 김찬란
2023-02-22 7:30 ~ 8:30 염기웅, 박상준, 강수진, 고현웅, 이현제
Candidate papers, code, lectures, etc. for future sessions.
- Improving language models by retrieving from trillions of tokens
- FLAN: Finetuned Language Models Are Zero-Shot Learners
- T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- Exploring the Benefits of Training Expert Language Models over Instruction Tuning
- Unsupervised Imputation of Non-ignorably Missing Data Using Importance-Weighted Autoencoders
- The Power of Scale for Parameter-Efficient Prompt Tuning
- Constitutional AI: Harmlessness from AI Feedback
- Deep reinforcement learning from human preferences
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Large Language Models with Controllable Working Memory
- Do Prompt-Based Models Really Understand the Meaning of their Prompts?
- Muse: Text-To-Image Generation via Masked Generative Transformers
- Structure and Content-Guided Video Synthesis with Diffusion Models
- Generative Pretraining from Pixels
- A hunt for the Snark: Annotator Diversity in Data Practices
- Accurate global machine learning force fields for molecules with hundreds of atoms
- Algorithms with More Granular Differential Privacy Guarantees
- Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types
- Are we cobblers without shoes? Making Computer Science data FAIR
- Code Generation for In-Place Stencils
- Creating, Calibrating, and Validating Large-Scale Microscopic Traffic Simulation
- Increasing Impact of Mobile Health Programs: SAHELI for Maternal and Child Care
- Designing Responsible AI: Adaptations of UX Practice to Meet Responsible AI Challenges
- Developer Productivity for Humans: A Human-Centered Approach to Developer Productivity
- Development of a Machine Learning Model for Sonographic Assessment of Gestational Age
- Drug Design on Quantum Computers
- Estimates of broadband upwelling irradiance from GOES-16 ABI
- Flake Aware Culprit Finding
- Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation
- Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation
- High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs
- Infrastructuring Care: How Trans and Non-Binary People Meet Health and Well-Being Needs through Technology
- KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals
- Learning to Bid in Contextual First Price Auctions
- Machine Learning for Healthcare: A Bibliometric Study of Contributions from Africa
- Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health
- Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program
- Recitation-Augmented Language Models
- RL4ReAl: Reinforcement Learning for Register Allocation
- Quantum Simulation of Exact Electron Dynamics can be more Efficient than Classical Mean-Field Methods
- Propeller: A Profile Guided, Relinking Optimizer for Warehouse-Scale Applications
- Large Language Models are Zero-Shot Reasoners
- DistilKoBiLSTM
- Donut 🍩 : Document Understanding Transformer
- The RWKV Language Model (and my LM tricks)
- Improving Language Models by Retrieving from Trillions of Tokens | NLP Journal Club
- ECMLPKDD2021: WuDao: Pretrain the World, Keynote speaker talk by Jie Tang
- StrictlyVC in conversation with Sam Altman, part two (OpenAI)
- Are Bigger Language Models Better? | DeepMind Gopher and RETRO
- Deepmind: Improving language models by retrieving from trillions of tokens
- Deepmind: Building safer dialogue agents
- Deepmind: Competitive programming with AlphaCode
- Deepmind: Mastering Stratego, the classic game of imperfect information
- Deepmind: DeepMind’s latest research at NeurIPS 2022
- Deepmind: Building interactive agents in video game worlds
- Deepmind: Discovering novel algorithms with AlphaTensor
- Deepmind: AlphaFold reveals the structure of the protein universe
- Deepmind: Tackling multiple tasks with a single visual language model
- Deepmind: Exploring the beauty of pure mathematics in novel ways
- Deepmind: Nowcasting the next hour of rain
- Deepmind: Putting the power of AlphaFold into the world’s hands
- Google Research: Deciphering clinical abbreviations with privacy protecting ML
- Google Research: Google Research, 2022 & beyond: Language, vision and generative models
- Google Research: Google Research, 2022 & beyond: Responsible AI
- Google Research: Learning with queried hints
- Google Research: Open Source Vizier: Towards reliable and flexible hyperparameter and blackbox optimization
- Google Research: Google Research, 2022 & beyond: ML & computer systems
- Google Research: Real-time tracking of wildfire boundaries using satellite imagery
- Google Research: Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median
- Google Research: Chimane-Mosetén
- Google Research: Differentially Private All-Pairs Shortest Path Distances: Improved Algorithms and Lower Bounds
- Google Research: Differentially Private Fair Division
- Google Research: DiffQG: Generating Questions on Paired Sentences
- Google Research: Assessment of Security Defense of Native Programs Against Software Faults
- Google Research: Adaptive mixing of auxiliary losses in supervised learning
- OpenAI: Multimodal Neurons in Artificial Neural Networks
- OpenAI: DALL·E: Creating Images from Text
- OpenAI: CLIP: Connecting Text and Images
- OpenAI: Image GPT
- OpenAI: Jukebox
- OpenAI: Solving Rubik’s Cube with a Robot Hand
- OpenAI: MuseNet
- OpenAI: Emergent Tool Use from Multi-Agent Interaction
- CS224U: Natural Language Understanding
- Gen-1: The Next Step Forward for Generative AI
- DIFF-SVC FOR VOCAL SYNTH USERS
- Chat GPT detector by ZeroGPT: detect OpenAI text
- 염기웅: I'm the one who gathered everyone here, and I'm writing a book titled 프로메우스와 바드의 꿈 (The Dream of Prometheus and Bard). I have interest and experience in LLM, Dialogue, and Distillation! I'm also interested in model compression, MLOps, serving, and multimodal multi-task models. [email protected] https://github.com/gyunggyung
- 강수진:
- 고현웅:
- 박상준:
- 김찬란:
- 이현제: I research natural language processing at Samsung SDS. I'm interested in instruction finetuning and sentence representation. [email protected]
- 김기현:
Recommendations from Google Scholar
[PDF] Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking D Chen, K Qian, Z Yu - arXiv preprint arXiv:2302.05932, 2023 Prompt-based methods with large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks. These models improve even further with the addition of a few labeled in-context exemplars to guide output …
[PDF] Decoupling the Skeleton Parsing and Schema Linking for Text-to-SQL H Li, J Zhang, C Li, H Chen - arXiv preprint arXiv:2302.05965, 2023 One of the recent best attempts at Text-to-SQL is the pre-trained language model. Due to the structural property of the SQL queries, the seq2seq model takes the responsibility of parsing both the schema items (ie, tables and columns) and the …
[PDF] What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories O Sainz, OL de Lacalle, E Agirre, G Rigau - arXiv preprint arXiv:2302.03353, 2023 Language Models are the core for almost any Natural Language Processing system nowadays. One of their particularities is their contextualized representations, a game changer feature when a disambiguation between word senses is necessary. In this …
[PDF] The Wisdom of Hindsight Makes Language Models Better Instruction Followers T Zhang, F Liu, J Wong, P Abbeel, JE Gonzalez - arXiv preprint arXiv:2302.05206, 2023 Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The so-called algorithm, Reinforcement Learning with Human Feedback (RLHF) demonstrates impressive …
[PDF] Task-Specific Skill Localization in Fine-tuned Language Models A Panigrahi, N Saunshi, H Zhao, S Arora - arXiv preprint arXiv:2302.06600, 2023 Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-shot settings. Thus fine-tuning allows the model to quickly pick up task-specific skills, but there has been limited study of where these newly-learnt skills …
[PDF] Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues C Li, P Huber, W Xiao, M Amblard, C Braud, G Carenini - arXiv preprint arXiv …, 2023 Discourse processing suffers from data sparsity, especially for dialogues. As a result, we explore approaches to build discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs). We investigate …
[PDF] LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization K Krishna, E Bransom, B Kuehl, M Iyyer, P Dasigi… - arXiv preprint arXiv …, 2023 While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of …
[PDF] Prompting Large Language Model for Machine Translation: A Case Study B Zhang, B Haddow, A Birch - arXiv preprint arXiv:2301.07069, 2023 Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study …
[PDF] Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering X Ho, AKD Nguyen, S Sugawara, A Aizawa - arXiv preprint arXiv:2302.05963, 2023 To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets. However, it remains an open question as to how effective …
[PDF] Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information YT Lin, A Papangelis, S Kim, S Lee, D Hazarika… - arXiv preprint arXiv …, 2023 This work focuses on in-context data augmentation for intent detection. Having found that augmentation via in-context prompting of large pre-trained language models (PLMs) alone does not improve performance, we introduce a novel approach based …
[PDF] Knowledge is a Region in Weight Space for Fine-tuned Language Models A Gueta, E Venezian, C Raffel, N Slonim, Y Katz… - arXiv preprint arXiv …, 2023 Research on neural networks has largely focused on understanding a single model trained on a single dataset. However, relatively little is known about the relationships between different models, especially those trained or tested on different datasets. We …
[PDF] MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization P Manakul, A Liusie, MJF Gales - arXiv preprint arXiv:2301.12307, 2023 State-of-the-art summarization systems can generate highly fluent summaries. These summaries, however, may contain factual inconsistencies and/or information not present in the source. Hence, an important component of assessing the quality of …
[PDF] Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning Methods H Cho, C Park, J Kim, HJ Kim, KM Yoo, S Lee - arXiv preprint arXiv:2301.11660, 2023 As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive …
[PDF] Progressive Prompts: Continual Learning for Language Models A Razdaibiedina, Y Mao, R Hou, M Khabsa, M Lewis… - arXiv preprint arXiv …, 2023 We introduce Progressive Prompts - a simple and efficient approach for continual learning in language models. Our method allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task …
[PDF] Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection C Wang, Y Lu, Y Mu, Y Hu, T Xiao, J Zhu - arXiv preprint arXiv:2302.00444, 2023 Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use of them …
Multi-stage transfer learning with BERTology-based language models for question answering system in vietnamese K Van Nguyen, PNT Do, ND Nguyen, AGT Nguyen… - International Journal of …, 2023 With the fast growth of information science and engineering, a large number of textual data generated are valuable for natural language processing and its applications. Particularly, finding correct answers to natural language questions or …
[PDF] Debiased Fine-Tuning for Vision-language Models by Prompt Regularization B Zhu, Y Niu, S Lee, M Hur, H Zhang - arXiv preprint arXiv:2301.12429, 2023 We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream task, dubbed Prompt Regularization (ProReg). Different from traditional fine-tuning which easily overfits to the downstream task data, ProReg uses …
[PDF] Understanding Finetuning for Factual Knowledge Extraction from Language Models M Kazemi, S Mittal, D Ramachandran - arXiv preprint arXiv:2301.11293, 2023 Language models (LMs) pretrained on large corpora of text from the web have been observed to contain large amounts of various types of knowledge about the world. This observation has led to a new and exciting paradigm in knowledge graph …
[PDF] Few-Shot Table-to-Text Generation with Prompt Planning and Knowledge Memorization Z Guo, M Yan, J Qi, J Zhou, Z He, Z Lin, G Zheng… - arXiv preprint arXiv …, 2023 Pre-trained language models (PLM) have achieved remarkable advancement in table-to-text generation tasks. However, the lack of labeled domain-specific knowledge and the topology gap between tabular data and text make it difficult for …
[PDF] Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning MD Ma, JY Kao, S Gao, A Gupta, D Jin, T Chung… - arXiv preprint arXiv …, 2023 Dialogue state tracking (DST) is an important step in dialogue management to keep track of users' beliefs. Existing works fine-tune all language model (LM) parameters to tackle the DST task, which requires significant data and computing resources for …
Grounded language-image pre-training LH Li, P Zhang, H Zhang, J Yang, C Li, Y Zhong, L Wang, L Yuan, L Zhang, JN … Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Vinvl: Revisiting visual representations in vision-language models P Zhang, X Li, X Hu, J Yang, L Zhang, L Wang, Y Choi, J Gao Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Unified vision-language pre-training for image captioning and vqa L Zhou, H Palangi, L Zhang, H Hu, J Corso, J Gao Proceedings of the AAAI conference on artificial intelligence, 2020