f-period / regex_triples_extractor

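This code extracts attribute triples from text according to user-defined regular-expression rules. I use it for one step of knowledge-graph construction. Usage is described in the README; discussion and corrections are welcome!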


regex_triples_extractor's Introduction

regex_triples_extractor User Guide

This project uses regular expressions to extract information from text or to annotate it.

Development environment: Windows 10, Python 3.7.9

I. User-defined file formats

template.json

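This file defines the template rules for each attribute. You can modify the templates or add your own, subject to the following rules:
1. The JSON content must follow the format {'attribute': 'regular expression for that attribute'}; data that does not match this format will raise an error.
2. One attribute may map to several regular expressions.
3. Duplicate rules (the same attribute paired with the same regular expression) are not allowed and will raise an error.
4. Every regular expression must be valid; otherwise an error is raised when the expression is compiled.

source_data.json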

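This file holds the data to be processed:
1. The JSON content must follow the format {'entity name': 'text'}; data that does not match this format will raise an error.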
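The rules for template.json and source_data.json can be checked before a run. The following is a minimal sketch, not the project's own code: the function names `compile_rules` and `check_source` are hypothetical, and the rules are assumed to have already been loaded into a list of (attribute, pattern) pairs, since the exact JSON layout used for an attribute with several patterns is not specified here.

```python
import re


def compile_rules(rules):
    """Validate and compile (attribute, pattern) pairs.

    Mirrors the four template rules: each entry is a pair of strings,
    an attribute may appear with several patterns, exact duplicates are
    rejected, and every pattern must compile.
    """
    seen = set()
    compiled = []
    for entry in rules:
        if (not isinstance(entry, (list, tuple)) or len(entry) != 2
                or not all(isinstance(part, str) for part in entry)):
            raise ValueError(f"malformed rule: {entry!r}")
        attribute, pattern = entry
        if (attribute, pattern) in seen:
            raise ValueError(f"duplicate rule: {entry!r}")
        seen.add((attribute, pattern))
        try:
            compiled.append((attribute, re.compile(pattern)))
        except re.error as exc:
            raise ValueError(f"invalid regex for {attribute!r}: {exc}") from exc
    return compiled


def check_source(data):
    """Validate the assumed source shape: a dict of entity name -> text."""
    if not isinstance(data, dict):
        raise ValueError("source data must be a JSON object")
    for entity, text in data.items():
        if not isinstance(entity, str) or not isinstance(text, str):
            raise ValueError(f"malformed entry for entity {entity!r}")
    return data
```

Failing early with a message that names the offending rule makes a broken template easier to locate than an error raised in the middle of extraction.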

II. Usage

Open a command line in the project directory and enter:

python main.py EXTRACT

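The default template.json in the project is a template defined for the person attributes of a Baidu Baike knowledge graph, and source_data.json is a small sample drawn from our data source, included for testing and demonstration.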
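To illustrate the idea behind the extraction step, here is a minimal sketch of applying attribute patterns to entity texts. It is not the logic of main.py: the function name `extract_triples` is hypothetical, and taking the first capture group as the attribute value (falling back to the whole match when a pattern has no group) is an assumption made for this example.

```python
import re


def extract_triples(rules, source):
    """Apply every (attribute, pattern) rule to every entity text.

    rules:  iterable of (attribute, regex string) pairs
    source: dict mapping entity name -> text
    Returns a list of (entity, attribute, value) tuples.
    """
    triples = []
    for entity, text in source.items():
        for attribute, pattern in rules:
            for match in re.finditer(pattern, text):
                # Assumption: group 1 holds the value; use the whole match
                # when the pattern defines no capture group.
                value = match.group(1) if match.re.groups else match.group(0)
                if value:
                    triples.append((entity, attribute, value))
    return triples


example_rules = [("birthplace", r"born in ([A-Z][a-z]+)")]
example_source = {"Ada Lovelace": "Ada Lovelace was born in London."}
print(extract_triples(example_rules, example_source))
# [('Ada Lovelace', 'birthplace', 'London')]
```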

Feedback on our regex template definitions is welcome!

III. Output file

extracted_triple.txt

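The extracted triples are written to this text file in the form entity;;;;attribute;;;;attribute value, one triple per line, with four semicolons ";;;;" as the separator within a line.

Note that every run overwrites the output of the previous run, so save any results you want to keep before running again.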
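The line format of extracted_triple.txt can be produced and read back as sketched below. The helper names are hypothetical and UTF-8 encoding is an assumption; only the ";;;;" separator and the one-triple-per-line layout come from the description of the output file.

```python
SEP = ";;;;"


def format_triple(entity, attribute, value):
    """Render one triple as entity;;;;attribute;;;;value."""
    return SEP.join((entity, attribute, value))


def parse_triple(line):
    """Split one output line back into (entity, attribute, value)."""
    parts = line.rstrip("\n").split(SEP)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    return tuple(parts)


def write_triples(path, triples):
    """Write triples one per line; mode "w" replaces any existing file."""
    with open(path, "w", encoding="utf-8") as handle:
        for triple in triples:
            handle.write(format_triple(*triple) + "\n")


def read_triples(path):
    """Read a file written by write_triples, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [parse_triple(line) for line in handle if line.strip()]
```

A value that itself contains ";;;;" would not survive this round trip, so `parse_triple` raises an error in that case instead of guessing where the fields split.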
