KO-FINE-TUNING_DATAGEN 📋


  • This research was carried out by the open-source LLM research consortium of (주)마커 and (주)미디어그룹사람과숲.

  • We wish prosperity for the Korean open-source community and progress for the field of artificial intelligence.

  • This repo is a repository for generating fine-tuning data from a corpus in a self-supervised learning (SSL) manner.

  • The Korean LLM ecosystem has a severe shortage of high-quality fine-tuning datasets.

  • Data can be generated with GPT-4 or Gemini, but their current licenses state that using data generated by these models to train competing models is a license violation.

  • To address this, we created this GitHub repo to share an efficient methodology for building license-free datasets.

  • Copyright discussions are ongoing around the existing Hugging Face library, so for datasets that do not apply the SSL methodology, we share a methodology for generating them from AI-Hub corpora.


Methodology 📕

  • 1. Multi question
    • This data is built so that the model understands the various questions it is given and answers each of them.

    • Answering a sequence of questions is a very demanding task; with this in mind, the data was generated so that the model can also handle multi-turn tasks inside systems such as RAG in the future.

<Instruction>
Answer the questions based on the given information. If you do not know the answer, do not make one up; just say that you do not know.

In 1839, Wagner first read Goethe's Faust, and his heart was...

Questions:
1. What did Wagner intend to write after reading Goethe's Faust?
2. How far had Wagner written the symphony before he stopped?
3. Which piece influenced Wagner when he wrote the Faust Overture?
4. Which book did Wagner intend to use as the subject of a symphony in 1839?
5. Which Beethoven piece influenced the D minor key of the Faust Overture?
6. In what year did Wagner first read Faust?
7. Where did Wagner first compose the symphony?
8. Where was Wagner's first movement premiered?

<Answer>
1. A symphony
2. The first movement
3. Beethoven's Symphony No. 9
4. Faust
5. The Choral Symphony
6. 1839
7. Paris
8. Dresden
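
The multi-question format above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `build_multi_question_sample`, the dictionary keys, and the English instruction string (a translation of the prompt in the example) are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: an English rendering of the instruction used in the example above.
INSTRUCTION = (
    "Answer the questions based on the given information. "
    "If you do not know the answer, do not make one up; just say that you do not know."
)


def build_multi_question_sample(
    context: str, qa_pairs: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Merge several (question, answer) pairs about one context into one sample."""
    # Number questions and answers identically so they can be matched up.
    questions = "\n".join(f"{i}. {q}" for i, (q, _) in enumerate(qa_pairs, start=1))
    answers = "\n".join(f"{i}. {a}" for i, (_, a) in enumerate(qa_pairs, start=1))
    instruction = f"{INSTRUCTION}\n\n{context}\n\nQuestions:\n{questions}"
    return {"instruction": instruction, "answer": answers}


sample = build_multi_question_sample(
    "In 1839, Wagner first read Goethe's Faust...",
    [
        ("In what year did Wagner first read Faust?", "1839"),
        ("Where did Wagner first compose the symphony?", "Paris"),
    ],
)
print(sample["instruction"])
print(sample["answer"])
```

Grouping every question about the same passage into one sample is what turns ordinary single-question reading-comprehension data into the multi-question task.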
  • 2. Summary & Instruction-Answer
    • This data is for generating an appropriate title from a given passage and summarizing its content.

    • The format is designed so that the model understands the context of the given information and responds appropriately to the request.

<Instruction>
Generate an appropriate title for the given passage and summarize its content.

Passage: When a dispute arose in the labor union of the plaintiff's company, the plaintiff neglected normal duties under the pretext of union activities, ...

<Answer>
Title: Revocation of the review decision on relief from unfair labor practices
If the plaintiff took the lead in impairing the company's work efficiency and violated the company's work-related directions and orders, the resulting disciplinary dismissal should be regarded as the exercise of the employer's own legitimate disciplinary authority to maintain internal order.
  • 3. Original sentence inference
    • This data is designed for generating the original passage from a given summary.

    • It is designed to improve the model's inference ability.

<Instruction>
Based on the given title and summary, infer and generate the passage as it was before being summarized.

Title: A basic study for establishing a crisis management system for seafood supply and demand
Summary: The diverse and ... that arise in modern society...

<Answer>
The crises facing today's state, compared with those of traditional society, not only in scale but also...
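
Items 2 and 3 can both be derived from the same corpus record, once in each direction. The sketch below assumes a hypothetical record layout with `passage`, `title`, and `summary` fields; the function name and the English instruction strings are likewise illustrative, not taken from the repository.

```python
from typing import Dict, List

# Assumption: English renderings of the two instructions shown in the examples.
SUMMARIZE = "Generate an appropriate title for the given passage and summarize its content."
RECONSTRUCT = (
    "Based on the given title and summary, infer and generate the passage "
    "as it was before being summarized."
)


def build_summary_samples(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn one (passage, title, summary) record into two training samples."""
    # Forward direction: passage -> title + summary.
    forward = {
        "instruction": f"{SUMMARIZE}\n\nPassage: {record['passage']}",
        "answer": f"Title: {record['title']}\n{record['summary']}",
    }
    # Reverse direction: title + summary -> original passage.
    reverse = {
        "instruction": (
            f"{RECONSTRUCT}\n\nTitle: {record['title']}\nSummary: {record['summary']}"
        ),
        "answer": record["passage"],
    }
    return [forward, reverse]


record = {
    "passage": "A long source passage would go here.",
    "title": "An example title",
    "summary": "A short summary of the passage.",
}
samples = build_summary_samples(record)
for s in samples:
    print(s["instruction"], "->", s["answer"], sep="\n")
```

Because the title and summary already exist in the corpus, neither direction needs labels produced by another model.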
  • 4. Sentence order inference
    • This data is for generating a proper passage from given sentences or words.

    • It is a dataset for improving the model's generation ability by having it produce a proper passage from the given sentences or words.

<Instruction>
You are given sentences listed in arbitrary order. Use them to infer the original arrangement and reconstruct the content.

Sentences in arbitrary order: ['I am', 'a genius', 'but', 'a fool', 'at the same time']

<Answer>
I am a genius. But at the same time, a fool.
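
A shuffling step like the one above could look roughly like this. It is a sketch under assumptions: the units are whole sentences that already carry their punctuation, and the function name, seed parameter, and English instruction string are hypothetical.

```python
import random
from typing import Dict, List

# Assumption: an English rendering of the instruction used in the example above.
INSTRUCTION = (
    "You are given sentences listed in arbitrary order. "
    "Use them to infer the original arrangement and reconstruct the content."
)


def build_order_sample(units: List[str], seed: int = 0) -> Dict[str, str]:
    """Shuffle the units for the prompt and keep the original order as the answer."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    shuffled = list(units)
    # Reshuffle until the order actually changes, when a different order is possible.
    while len(set(units)) > 1 and shuffled == list(units):
        rng.shuffle(shuffled)
    return {
        "instruction": f"{INSTRUCTION}\n\nSentences in arbitrary order: {shuffled}",
        "answer": " ".join(units),
    }


units = ["It rained all day.", "The river rose.", "The bridge was closed."]
order_sample = build_order_sample(units, seed=1)
print(order_sample["instruction"])
print(order_sample["answer"])
```

The original text serves as the label, so the only processing needed is the shuffle itself.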
  • 5. Last sentence prediction
    • This data is for generating the last sentence of a given paragraph.

    • It is designed to improve both the model's understanding of context and its generation ability.

<Instruction>
Generate a sentence that naturally follows the given text.

Text: ...At the 'Ministry of Culture Revolutionary History Museum' (Kim Jong-il Hall) at the Korean Art Film Studio, which I recently visited, there was a sign stating that over the 40 years from the mid-1960s to the 2000s Kim Jong-il gave guidance in the culture and arts sector 11,890 times, of which 1,770 were so-called 'on-the-spot guidance' visits made in person to cultural and arts institutions.

<Answer>
This plainly shows that North Korean theater cannot exist apart from the keywords of Kim Jong-il and the Juche idea.
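
Holding out the final sentence of a paragraph might be sketched as below. The regular-expression splitter is a deliberately naive assumption (a dedicated Korean sentence splitter would suit real corpora better), and the function name and English instruction string are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: an English rendering of the instruction used in the example above.
INSTRUCTION = "Generate a sentence that naturally follows the given text."


def build_last_sentence_sample(paragraph: str) -> Optional[Dict[str, str]]:
    """Use all but the last sentence as the prompt and the last sentence as the answer."""
    # Naive split: break on whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    if len(sentences) < 2:
        return None  # nothing to hold out
    context = " ".join(sentences[:-1])
    return {
        "instruction": f"{INSTRUCTION}\n\nText: {context}",
        "answer": sentences[-1],
    }


last_sample = build_last_sentence_sample(
    "It rained all day. The river rose. The bridge was closed."
)
print(last_sample)
```

Paragraphs with fewer than two sentences are skipped, since they leave no context to condition on.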
  • 6. Mask Prediction
    • An LLM is an auto-regressive language generation model built using only the Transformer decoder.

    • The masked multi-head attention block, which exists only in the Transformer decoder, makes the model generate tokens using only past and present states without referring to future information; structurally, this is why today's LLMs are pretrained with the CLM (causal language modeling) objective.

    • BERT, by contrast, uses only the Transformer encoder and learns bidirectionally, so it was designed to be pretrained with the MLM (masked language modeling) objective in order to understand linguistic context.

    • Taking insight from this mechanism, this data masks random words in a sentence and has the model predict them, with the aim of building the model's contextual understanding and inference ability.

<Instruction>
Generate an appropriate word to fill <MASK> in the given sentence.

Dokdo is <MASK>.

<Answer>
our land
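
The masking step can be illustrated with a short sketch. It assumes whitespace tokenization and masks exactly one word per sentence; the function name, the seed parameter, and the English instruction string are hypothetical, and punctuation attached to a word is masked along with it.

```python
import random
from typing import Dict, Optional

MASK = "<MASK>"
# Assumption: an English rendering of the instruction used in the example above.
INSTRUCTION = f"Generate an appropriate word to fill {MASK} in the given sentence."


def build_mask_sample(sentence: str, seed: int = 0) -> Optional[Dict[str, str]]:
    """Replace one randomly chosen word with <MASK> and keep it as the answer."""
    words = sentence.split()  # whitespace tokenization (an assumption)
    if len(words) < 2:
        return None  # masking the only word would leave no context
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    idx = rng.randrange(len(words))
    answer = words[idx]
    words[idx] = MASK
    return {
        "instruction": f"{INSTRUCTION}\n\n{' '.join(words)}",
        "answer": answer,
    }


original = "The model predicts the hidden word"
mask_sample = build_mask_sample(original, seed=3)
print(mask_sample["instruction"])
print(mask_sample["answer"])
```

Unlike MLM pretraining, the masked word here is produced as ordinary generated text, so the sample stays compatible with a decoder-only model.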

How to Use?

    1. First, run KoCommercial-Dataset.ipynb from our repo to produce part of the fine-tuning dataset and clean the corpus.
    2. Then run 일반상식문장생성데이터.ipynb and 논문자료요약.ipynb.

Ref

  • While researching LLMs, we saw the efforts of so many open-source developers and researchers, and that pushed us to work and research hard as well.

  • We would therefore like to contribute in this way, to help, even a little, with the growth of the ecosystem and, beyond that, with raising the overall level.

  • Once again, thank you. Feel free to take the code without asking (just please leave a reference 🥲).


Acknowledgement

  • This work was researched for academic purposes by the consortium of (주)마커 and (주)미디어그룹사람과숲 and is released under the MIT License.

  • This model was supported by the Artificial Intelligence Industrial Convergence Cluster Development Project, funded by the Ministry of Science and ICT (MSIT, Korea) and Gwangju Metropolitan City.

  • We thank NIA and AI-Hub for providing the source data.

  • We thank the Korean open-source developers and researchers who have worked to advance Korea's LLM ecosystem.

Contributors

dopeornope-lee
