Please copy-paste your code and config (formatted with ``` ```) into this thread.
examples.yml (the four labels mean: lab test item, unit, result value, and reference range):
-
  text: 前白蛋白(PA) 302.65 mg/L 180-400
  entities:
    实验室检查的指标:
      - 前白蛋白(PA)
    实验室检查的单位:
      - mg/L
    实验室检查的结果数值:
      - 302.65
    实验室检查的范围值:
      - 180-400
-
  text: 谷氨酰转肽酶(GGT) 17 IU/L 10-60
  entities:
    实验室检查的指标:
      - 谷氨酰转肽酶(GGT)
    实验室检查的单位:
      - IU/L
    实验室检查的结果数值:
      - 17
    实验室检查的范围值:
      - 10-60
fewshot.cfg:
[paths]
examples = null
[nlp]
lang = "zh"
pipeline = ["llm_ner"]
[components]
[components.llm_ner]
factory = "llm"
[components.llm_ner.task]
@llm_tasks = "spacy.NER.v2"
labels = 实验室检查的指标,实验室检查的单位,实验室检查的结果数值,实验室检查的范围值
[components.llm_ner.task.examples]
@misc = "spacy.FewShotReader.v1"
path = ${paths.examples}
[components.llm_ner.model]
@llm_models = "spacy.GPT-3-5.v2"
name = "gpt-3.5-turbo"
config = {"temperature": 0.0}
zeroshot.cfg:
[nlp]
lang = "zh"
pipeline = ["llm_ner"]
[components]
[components.llm_ner]
factory = "llm"
[components.llm_ner.task]
@llm_tasks = "spacy.NER.v2"
labels = 实验室检查的指标,实验室检查的单位,实验室检查的结果数值,实验室检查的范围值
[components.llm_ner.model]
@llm_models = "spacy.GPT-3-5.v2"
name = "gpt-3.5-turbo"
config = {"temperature": 0.0}
run_pipeline.py
import os
from pathlib import Path
from typing import Optional

import typer
from wasabi import msg

from spacy_llm.util import assemble

Arg = typer.Argument
Opt = typer.Option


def run_pipeline(
    # fmt: off
    text: str = Arg("", help="Text to perform named entity recognition on."),
    config_path: Path = Arg(..., help="Path to the configuration file to use."),
    examples_path: Optional[Path] = Arg(None, help="Path to the examples file to use (few-shot only)."),
    verbose: bool = Opt(False, "--verbose", "-v", help="Show extra information."),
    # fmt: on
):
    # Abort early if the OpenAI key needed by spacy.GPT-3-5.v2 is missing
    if not os.getenv("OPENAI_API_KEY", None):
        msg.fail(
            "OPENAI_API_KEY env variable was not found. "
            "Set it by running 'export OPENAI_API_KEY=...' and try again.",
            exits=1,
        )

    msg.text(f"Loading config from {config_path}", show=verbose)
    # Only the few-shot config needs paths.examples filled in
    nlp = assemble(
        config_path,
        overrides={}
        if examples_path is None
        else {"paths.examples": str(examples_path)},
    )
    doc = nlp(text)
    msg.text(f"Entities: {[(ent.text, ent.label_, ent.start, ent.end) for ent in doc.ents]}")


if __name__ == "__main__":
    typer.run(run_pipeline)
!python run_pipeline.py
"**医学科学院 阜外医院 检验报告单 姓名: 贾全喜 性别:男 年龄: 55岁 门诊:0066000117992 样品号: 科别: 门诊 床号: 诊断: 标本种类:血清 送检项目: 0265 生化全套 项 目 结果 单位 参考值 项 目 结果 单位 1 前白蛋白(PA) 302.65 mg/L 180-400 参考值 2 *总蛋白(TP) 69.9 19*尿酸(URIC) 542.06 umol/L 1 148.8-416.5 g/L 65-85 20 *肌酸激酶(CK) IU/L 0-200 16 3 *白蛋白(溴甲酚绿法)(ALB) 41.6 g/L 40-55 21 肌酸激酶同工酶(CKMB-Mass) 2.06 ng/nL 0-5 4 *丙氨酸氨基转移酶(ALT) 22 IU/L 9-50 22*乳酸脱氢酶(LDH) 149 IU/L 0-250 5 *天门冬氨酸氨基转移酶(AST) 24 IU/L 15-40 23 淀粉酶(AMY) 100 U/L 0-220 6 *碱性磷酸酶(ALP) 85 45-125 24 脂蛋白(a)(Lp(a)) 827.42 ng/L ↑ 10-300 1/0I 7 *谷氨酰转肽酶(GGT) 17 IU/L 10-60 25 超敏C反应蛋白(HSCRP) 1.28 mg/L 0.00-3.00 8 *总胆红素(TBi1) 16.94 umo1/L 5.1-19 26 同型半胱氨酸(HCY) 8.31 umol/L 6-15 9 直接胆红素(DBil) 4.34 μmol/L 0-6.8 27 游离脂肪酸(FFA) 0.65 mmol/L t 0.1-0.6 10*钾(K) 4.41 mmol/L 3.5-5.3 28*甘油三酯(TG) 0.94 mmol/L 0.38-1.76 11*钠(NA) 141.69 mmol/1 137-147 29*总胆固醇(CHOL) 3.38 mmol/L 13.64-5.98 12*氯(CL) 101.89mmol/L 99-110 30*高密度脂蛋白胆固醇(HDL-C) 1.10 mmol/L 0.7-1.59 13二氧化碳(C02) 32.65 mmol/L ↑21.0-31.0 31*低密度脂蛋白胆固醇(LDL-C) 1.86 mmol/L 一般人群<3.37 14*葡萄糖(GLU) 5.01 mmol/L 3.58-6.05 高危人群<2.59 15*磷(P) 0.95 mmol/L ↓0.97-1.50 极高危人群<2.00 16*钙(CA) 2.40 mmol/L 2.2-2.75 32 小密低密度脂蛋白(sdLDL) 0.55 mmol/L 0.23-1.39 17*肌酐(苦味酸法)(CREA) 89.54 umol/L 44-133 33 载脂蛋白A1(apoA1) 1.05 g/L 11.1-1.8 18*尿素氮(BUN) 5.45 mmol/L 2.86-7.90 34 载脂蛋白B(apoB) 0.67 g/L 0.5-1.2 极高危人群:急性冠脉综合征(ACS)或冠心病/缺血性脑卒中/周围动脉硬化合并糖尿病。 申请日期:2021.08.18 采样时间:2021.08.19 09:08 接收时间:2021.08.19 10:08 报告时间:2021.08.19 11:43 申请医师:李子煦 检验者: 邢跃雷 审核者: 苏保满 备 注: 此报告仅对送检样本负责。 *标记项目为北京市三级医院互认项目 实验诊断中心生化 电话: 88398271"
./zeroshot.cfg
The complete code is included in the attachment.
Hello, I have found a problem. The pipeline only works when I run it zero-shot; as soon as I use few-shot examples, it reports an error. Why does providing few-shot examples cause an error?
It's difficult to diagnose why few-shot prompting yields worse results here. I recommend debugging with one example at a time and looking at the raw output received from the model (see here on how to do that).
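One way to look at that raw output, sketched under assumptions: spacy-llm's `llm` factory has a `save_io` setting (added as `save_io = true` under `[components.llm_ner]`), which stores the prompt and the model's raw response in `doc.user_data["llm_io"]`, keyed by component name. The helper `show_llm_io` below is hypothetical and only prints whatever is stored there; the exact shape of the stored values may differ between spacy-llm versions.

```python
def show_llm_io(doc, component_name="llm_ner"):
    """Print the prompt sent to the LLM and its raw response for one component.

    Assumes the component was built with save_io = true, so that spacy-llm
    stored {"prompt": ..., "response": ...} under
    doc.user_data["llm_io"][component_name]. Returns that dict, or None.
    """
    io = doc.user_data.get("llm_io", {}).get(component_name)
    if io is None:
        print(f"No saved prompt/response for {component_name!r}; is save_io enabled?")
        return None
    print("=== PROMPT ===")
    print(io["prompt"])
    print("=== RESPONSE ===")
    print(io["response"])
    return io
```

For example, calling `show_llm_io(doc)` right after `doc = nlp(text)` in run_pipeline.py prints both sides of the exchange, so the zero-shot and few-shot prompts (and the responses the parser has to handle) can be compared directly.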
The fact that you get no entities at all when you include few-shot examples indicates that the LLM might have trouble understanding those examples, or that the output it produces when those examples are included is incoherent and cannot be parsed. Either way, the best way forward is to look closely at what both the prompt and the response look like as you add one example at a time.
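Adding one example at a time can be scripted. The sketch below copies the two examples from examples.yml into Python and writes each one to its own JSON file. The helper name and output directory are hypothetical, and it assumes `spacy.FewShotReader.v1` chooses its parser from the file suffix and accepts `.json` files. The entity values are kept as strings on purpose: unquoted `302.65` and `17` in a YAML file are parsed as numbers, though whether that matters to the reader is not established in this thread.

```python
import json
from pathlib import Path

# The two few-shot examples from examples.yml, copied as-is,
# with the numeric entity values kept as strings.
EXAMPLES = [
    {
        "text": "前白蛋白(PA) 302.65 mg/L 180-400",
        "entities": {
            "实验室检查的指标": ["前白蛋白(PA)"],
            "实验室检查的单位": ["mg/L"],
            "实验室检查的结果数值": ["302.65"],
            "实验室检查的范围值": ["180-400"],
        },
    },
    {
        "text": "谷氨酰转肽酶(GGT) 17 IU/L 10-60",
        "entities": {
            "实验室检查的指标": ["谷氨酰转肽酶(GGT)"],
            "实验室检查的单位": ["IU/L"],
            "实验室检查的结果数值": ["17"],
            "实验室检查的范围值": ["10-60"],
        },
    },
]


def write_single_example_files(examples, out_dir):
    """Write each example to its own JSON file (a one-item list); return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, example in enumerate(examples):
        path = out_dir / f"example_{i}.json"
        # ensure_ascii=False keeps the Chinese labels readable in the file
        path.write_text(json.dumps([example], ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

Each generated file can then be passed as the third argument of run_pipeline.py together with fewshot.cfg, e.g. `python run_pipeline.py "<text>" ./fewshot.cfg single_examples/example_0.json`, and the prompt/response inspected for that single example before adding the next.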
Thank you very much for your answers. It would be even better if a Chinese model could be added later on.