I'm interested in Large-scale Engineering, Data Engineering, Representation Learning, Multi-modal Understanding, Training Optimization, and Data Curation.
🛠️ LLM Data Engineer (now) - 42dot
🔍 Research Intern - Kakaobrain
🌿 Research Intern @kakaobrain
🇺🇸 UI Developer Intern - Wavity
🇰🇷 Bachelor's degree in Computer Science Engineering at Sogang University (2012 - 2019)
🇰🇷 Master's degree in Computer Science Engineering at Sogang University (2020 - 2022)
🥈 2020 Korea Health Datathon, 2nd Prize (Binary Classification on Breast Cancer Pathology Images)
🥇 2020 Naver AI Rush Challenge, 1st Prize in 3 Areas (Auto Tagging on Naver Shopping Images, Mood Classification on Music, Genre Classification on Japanese Music)
📚 coyo-700M Dataset: A large-scale dataset aimed at enhancing data curation and multi-modal understanding, publicly released for the research community. Check it out here: coyo-700M.
✍️ ViT Alignment Blog Post on Hugging Face: Based on the coyo-700M dataset, this blog post discusses the reproduction of Vision Transformer (ViT) models. Read the blog post: vit-align.