Comments (17)

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024 1

I ran the following code and got no error:

from langchain_huggingface import HuggingFacePipeline
from langchain_text_splitters.character import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers.llm import LLMGraphTransformer

# Load the model through the Hugging Face text-generation pipeline
llm = HuggingFacePipeline.from_model_id(model_id='baichuan-inc/Baichuan2-7B-Chat', task="text-generation")

# Load the input file and split it into overlapping chunks
loader = TextLoader('doc.txt')
documents = loader.load()  # + docx_documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

graph = Neo4jGraph(url='bolt://localhost:7687', database='neo4j', username='neo4j', password='')

# Extract graph documents (nodes and relationships) from the chunks
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(texts)

The contents of doc.txt were as follows:

Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.

Can you double-check your input file, or possibly share a sample? I suspect the input might be causing the problem.

This is one part of my document samples:

卷一 建  置-->  第一章 沿  革-->    第一节 隶 属-->      三、中华人民共和国成立后的隶属

三、中华人民共和国成立后的隶属
1949年9月9日,晋江县人民政府成立,归属福建省人民政府第五行政督察专员公署(治设泉州城区)。1950年4月,属泉州行政督察专员公署。1950年9月,属晋江区行政督察专员公署。1955年4月,属晋江专区专员公署。1967年6月,由晋江专区军事管制委员会管辖。1968年9月,属晋江专区革命委员会。1971年7月,属晋江地区革命委员会。1980年1月,属晋江地区行政公署。1986年1月至1988年12月,属泉州市。




卷一 建  置-->  第一章 沿  革-->    第二节 境域析变

第二节 境域析变
晋江县位于福建省东南沿海,晋江下游。东经118°24′~118°46′,北纬24°30′~24°54′。东濒**海峡,西接南安县,南与金门隔海相望,北邻鲤城区。南北长42公里,东西宽37公里。总面积809.24平方公里。建县时境域包括今惠安县、鲤城区。宋代兼辖澎湖岛。后几经析变形成今晋江县域。




卷一 建  置-->  第一章 沿  革-->    第二节 境域析变-->      一、开元建县

一、开元建县
据清道光《晋江县志》记载,晋江之名“以晋南渡时,衣冠避此者多沿江而居,故名”。西晋末年,北方士族为避兵燹,纷纷南迁,部分在今晋江两岸定居,劳动生息。晋江之名,即始于此。建县时遂以江名命县。唐初,晋江一带属南安县地。景云二年(711),改武荣州为泉州(即今泉州),属闽州都督府。州治无县,刺史冯仁知以此为由,呈请置县。于是在唐开元六年(718),析南安县东南部设置新县,即为晋江县之始建。县治在今鲤城区内,州县同城而治。晋江县唐建县境域示意图晋江县当代境域析变示意图

Running this code also produces the same error:

from langchain_core.documents import Document

doc = Document(page_content="Elon Musk is suing OpenAI")
graph_documents = llm_transformer.convert_to_graph_documents([doc])

Maybe my package versions are incorrect?
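
One way to rule out a version mismatch is to list the installed versions of the packages imported in this thread. The following is only a sketch: the package list is an assumption derived from the import statements above, and `installed_versions` is a hypothetical helper name, not part of LangChain.

```python
from importlib import metadata

# Distribution names assumed from the imports used in this thread.
PACKAGES = [
    "langchain",
    "langchain-community",
    "langchain-experimental",
    "langchain-huggingface",
    "langchain-text-splitters",
    "transformers",
]


def installed_versions(packages=PACKAGES):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output alongside the traceback makes it easier for others to compare environments.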

from langchain.

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024

This code produces the same error:
doc = Document(page_content="Elon Musk is suing OpenAI")
graph_documents = llm_transformer.convert_to_graph_documents([doc])

keenborder786 avatar keenborder786 commented on August 24, 2024

Which LLM are you using?

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024

Which LLM are you using?

Baichuan2-7B-Chat. Thank you for your reply.

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024

Which LLM are you using?

My code for loading the LLM is as follows:

# load_type and quantization_config are defined elsewhere in my script (not shown here)
model_path = '../../Models/Baichuan2-7B-Chat'
model = HuggingFacePipeline.from_model_id(
    model_id=model_path,
    task="text-generation",
    model_kwargs={
        "torch_dtype": load_type,
        "low_cpu_mem_usage": True,
        "temperature": 0.2,
        "max_length": 1000,
        "device_map": "auto",
        "repetition_penalty": 1.1,
        "trust_remote_code": True,
        "quantization_config": quantization_config,
    },
)

keenborder786 avatar keenborder786 commented on August 24, 2024

Okay, I am checking it and will get back to you.

keenborder786 avatar keenborder786 commented on August 24, 2024

I ran the following code and got no error:

from langchain_huggingface import HuggingFacePipeline
from langchain_text_splitters.character import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers.llm import LLMGraphTransformer

# Load the model through the Hugging Face text-generation pipeline
llm = HuggingFacePipeline.from_model_id(model_id='baichuan-inc/Baichuan2-7B-Chat', task="text-generation")

# Load the input file and split it into overlapping chunks
loader = TextLoader('doc.txt')
documents = loader.load()  # + docx_documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

graph = Neo4jGraph(url='bolt://localhost:7687', database='neo4j', username='neo4j', password='')

# Extract graph documents (nodes and relationships) from the chunks
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(texts)

The contents of doc.txt were as follows:

Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.

Can you double-check your input file, or possibly share a sample? I suspect the input might be causing the problem.

drahmad89 avatar drahmad89 commented on August 24, 2024

I have the same issue.

li-hhhh avatar li-hhhh commented on August 24, 2024

I ran into the same problem too. How did you solve it in the end? Thanks.

dumanting avatar dumanting commented on August 24, 2024

SatSadhu avatar SatSadhu commented on August 24, 2024

Has anyone been able to solve this error?

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024

I have not solved this error yet.

VJJJJJJ1 avatar VJJJJJJ1 commented on August 24, 2024

@keenborder786 Could my sample be causing the problem? This has confused me for a long time.

SatSadhu avatar SatSadhu commented on August 24, 2024

I was able to put together a workaround for this problem.
I edited 'llm.py'; in your case (@VJJJJJJ1) the path is /root/miniconda3/envs/rag/lib/python3.10/site-packages/langchain_experimental/graph_transformers/llm.py

At line 714, I renamed the function 'process_response' to 'process_response_old' and then added a new 'process_response' function, shown below:

def process_response(self, document: Document) -> GraphDocument: 
   """
   Processes a single document, transforming it into a graph document using
   an LLM based on the model's schema and constraints.
   """
   text = document.page_content
   raw_schema = self.chain.invoke({"input": text})
   if self._function_call:
       raw_schema = cast(Dict[Any, Any], raw_schema)
       nodes, relationships = _convert_to_graph_document(raw_schema)
   else:
       nodes_set = set()
       relationships = []
       if not isinstance(raw_schema, str):
           raw_schema = raw_schema.content
       parsed_json = self.json_repair.loads(raw_schema)
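       print(parsed_json)  # debug: shows the shape the model actually returned

       # Expects a dict with the fields under 'properties'; only one
       # head/relation/tail triple is read, so each document yields at most one relationship
       properties = parsed_json.get('properties', {})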

       # Handle different types of values: strings and lists
       def get_first_element(value):
           return value[0] if isinstance(value, list) else value

       head = get_first_element(properties.get('head', 'Unknown'))
       head_type = get_first_element(properties.get('head_type', 'Unknown'))
       tail = get_first_element(properties.get('tail', 'Unknown'))
       tail_type = get_first_element(properties.get('tail_type', 'Unknown'))
       relation = get_first_element(properties.get('relation', 'Unknown'))

       # Nodes need to be deduplicated using a set
       nodes_set.add((head, head_type))
       nodes_set.add((tail, tail_type))

       source_node = Node(id=head, type=head_type)
       target_node = Node(id=tail, type=tail_type)
       relationships.append(
           Relationship(
               source=source_node, target=target_node, type=relation
           )
       )
       
       # Create nodes list
       nodes = [Node(id=el[0], type=el[1]) for el in list(nodes_set)]

   # Strict mode filtering
   if self.strict_mode and (self.allowed_nodes or self.allowed_relationships):
       if self.allowed_nodes:
           lower_allowed_nodes = [el.lower() for el in self.allowed_nodes]
           nodes = [
               node for node in nodes if node.type.lower() in lower_allowed_nodes
           ]
           relationships = [
               rel
               for rel in relationships
               if rel.source.type.lower() in lower_allowed_nodes
               and rel.target.type.lower() in lower_allowed_nodes
           ]
       if self.allowed_relationships:
           relationships = [
               rel
               for rel in relationships
               if rel.type.lower()
               in [el.lower() for el in self.allowed_relationships]
           ]

   return GraphDocument(nodes=nodes, relationships=relationships, source=document)

I hope this helps! It worked for me.
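
For anyone who would rather not edit files in site-packages, the shape handling in the patch above can be tried out on its own first. The sketch below is not LangChain code: `normalize_relations` and `REQUIRED_KEYS` are hypothetical names, and the input shapes it accepts (a list of dicts, a single dict, or fields nested under 'properties') are assumptions based on the outputs discussed in this thread. Unlike the patch, it drops incomplete entries instead of filling them with 'Unknown'.

```python
from typing import Any, Dict, List

# Keys the extraction step reads from each relationship (hypothetical constant).
REQUIRED_KEYS = ("head", "head_type", "relation", "tail", "tail_type")


def _first(value: Any) -> Any:
    """Return the first element of a non-empty list, otherwise the value itself."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def normalize_relations(parsed: Any) -> List[Dict[str, str]]:
    """Coerce parsed model output into a list of complete relationship dicts."""
    if isinstance(parsed, dict):
        parsed = [parsed]  # a single object instead of a list
    if not isinstance(parsed, list):
        return []  # strings, numbers, None: nothing usable

    relations = []
    for item in parsed:
        if not isinstance(item, dict):
            continue  # e.g. a nested list, which is what triggers the TypeError
        # Some outputs nest the fields under "properties", as handled in the patch.
        fields = item.get("properties", item)
        if not isinstance(fields, dict):
            continue
        relation = {key: _first(fields.get(key)) for key in REQUIRED_KEYS}
        # Keep only entries where every field is a non-empty string.
        if all(isinstance(value, str) and value for value in relation.values()):
            relations.append(relation)
    return relations
```

Running a few captured model outputs through a function like this shows quickly whether the model is returning relationships in a usable shape at all.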

ingmars1709 avatar ingmars1709 commented on August 24, 2024

I had the same error when using llama2:

nodes_set.add((rel["head"], rel["head_type"]))
TypeError: list indices must be integers or slices, not str

When I changed the model name to 'llama3', I got correct results with the original implementation of process_response, using the following setup:

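# Imports (assumed; adjust to the packages you have installed)
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_experimental.graph_transformers import LLMGraphTransformer

loader = TextLoader('doc.txt')
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

llm = ChatOllama(model="llama3")  # make sure the llama3 model is running

llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(texts)

print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")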
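
The TypeError quoted above can be reproduced without any model, which helps explain why switching models changes the outcome. The sketch below assumes the failing line runs inside a loop over the parsed JSON, with each element bound to `rel`. The malformed shape is a hypothetical example of what a weaker model might return, and `collect_nodes` is an illustrative name.

```python
import json

# Shape the extraction code expects: a list of relationship dicts.
expected = json.loads(
    '[{"head": "Marie Curie", "head_type": "Person", '
    '"relation": "SPOUSE", "tail": "Pierre Curie", "tail_type": "Person"}]'
)

# Hypothetical malformed shape: a list of lists instead of a list of dicts.
malformed = json.loads(
    '[["Marie Curie", "Person", "SPOUSE", "Pierre Curie", "Person"]]'
)


def collect_nodes(parsed):
    """Mimic the failing line: index each element with string keys."""
    nodes = set()
    for rel in parsed:
        nodes.add((rel["head"], rel["head_type"]))
        nodes.add((rel["tail"], rel["tail_type"]))
    return nodes


print(collect_nodes(expected))

try:
    collect_nodes(malformed)
except TypeError as exc:
    # Indexing a list with a string key raises the error seen in the traceback.
    print(f"TypeError: {exc}")
```

If this is what happens, the problem lies in the shape of the model's JSON output rather than in the input document.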

from langchain.

Otnielush avatar Otnielush commented on August 24, 2024

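This happens when information is extracted from LLM output that still contains the input prompt (for me that was the default behaviour).
It can be fixed by:

from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline  # `model` and `tokenizer` must already be loaded
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=2000, return_full_text=True)
llm = HuggingFacePipeline(pipeline=pipe)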
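
To illustrate why an echoed prompt matters, here is a small sketch that is independent of LangChain and transformers. It does not show what HuggingFacePipeline does internally. `strip_echoed_prompt` is a hypothetical helper, and the prompt and completion strings are made up for the example.

```python
import json


def strip_echoed_prompt(prompt: str, generated: str) -> str:
    """Drop the prompt from the start of the generated text if it was echoed back."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


# Made-up prompt and completion, for illustration only.
prompt = "Extract relations as a JSON list.\nText: Elon Musk is suing OpenAI\nJSON: "
completion = (
    '[{"head": "Elon Musk", "head_type": "Person", "relation": "SUING", '
    '"tail": "OpenAI", "tail_type": "Organization"}]'
)
full_text = prompt + completion  # what a pipeline returns when it echoes the prompt

# Parsing the full text fails because it starts with the prompt, not with JSON.
try:
    json.loads(full_text)
except json.JSONDecodeError as exc:
    print(f"Could not parse full text: {exc}")

# After removing the echoed prompt, only the completion is parsed.
relations = json.loads(strip_echoed_prompt(prompt, full_text))
print(relations[0]["head"], relations[0]["relation"], relations[0]["tail"])
```

Whichever setting is used, the point is that the text handed to the JSON parser should contain only the model's completion.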
