sentsplit

This project implements sentence splitting for Chinese text. The split_sentence function is defined in sentsplit.py.
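The README does not say how sentsplit finds sentence boundaries. A minimal sketch, assuming a boundary after sentence-final punctuation (full-width 。!? or ASCII !?, plus any closing quotes) and treating each non-empty line as a paragraph, could look like this. naive_split is a hypothetical name, not part of sentsplit; it returns one list per paragraph so that later steps never mix sentences across paragraphs.

```python
import re
from typing import List, Tuple

# Assumed boundary rule for this sketch: a sentence is a run of text ending in
# full-width or ASCII sentence-final punctuation (。!?!?), plus any closing quotes.
_SENTENCE = re.compile(r'[^。!?!?]+[。!?!?]*[”’"]*')


def naive_split(text: str) -> List[List[Tuple[int, str]]]:
    """Hypothetical splitter: returns one list per paragraph (non-empty line),
    each holding (offset, sentence) pairs with offsets into the original text."""
    paragraphs = []
    for para in re.finditer(r'[^\n]+', text):
        pieces = []
        for m in _SENTENCE.finditer(para.group()):
            if m.group().strip():  # skip whitespace-only fragments
                pieces.append((para.start() + m.start(), m.group()))
        if pieces:
            paragraphs.append(pieces)
    return paragraphs


text = '今天天气很好。我们去公园吧!\n\n你来吗?'
for paragraph in naive_split(text):
    print(paragraph)
```

Offsets are computed from match positions rather than by re-searching the text, so repeated sentences still get distinct, correct offsets.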

def split_sentence(
    text: str,
    min_length: int = 32,
    max_length: int = 256,
    return_loc: bool = False
)
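  • text: The Chinese text to split into sentences.

  • min_length: When a sentence within a paragraph is shorter than min_length, short sentences within that paragraph are merged. Sentences from different paragraphs are never merged.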

  • max_length: When a sentence within a paragraph is longer than max_length, it is split into shorter pieces at commas (',').

  • return_loc: Whether to also return the character offset of each sentence in the original text.
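The min_length and max_length rules above are described only in outline. The following is one plausible reading, not sentsplit's actual code: it works on the (offset, sentence) pairs of a single paragraph, appends a sentence to the one before it while that one is still shorter than min_length, and splits over-long sentences at commas (assumed here to be full-width ',' or ASCII ','). merge_short, split_long and postprocess are hypothetical names.

```python
import re
from typing import List, Tuple

Piece = Tuple[int, str]  # (character offset in the original text, sentence)

# Clauses ending in one or more commas (full-width or ASCII), or a final clause
# with no comma; together the matches cover the whole sentence with no gaps.
_CLAUSE = re.compile(r'[^,,]*[,,]+|[^,,]+')


def merge_short(pieces: List[Piece], min_length: int) -> List[Piece]:
    """Append each sentence to the previous one while the previous one is
    shorter than min_length. Assumes the pieces come from one paragraph and
    are contiguous in the original text, so concatenation keeps offsets valid."""
    merged: List[Piece] = []
    for offset, sent in pieces:
        if merged and len(merged[-1][1]) < min_length:
            prev_offset, prev_sent = merged[-1]
            merged[-1] = (prev_offset, prev_sent + sent)
        else:
            merged.append((offset, sent))
    return merged


def split_long(pieces: List[Piece], max_length: int) -> List[Piece]:
    """Split sentences longer than max_length at commas, greedily packing
    clauses into chunks of at most max_length characters. A single clause
    longer than max_length is kept whole."""
    result: List[Piece] = []
    for offset, sent in pieces:
        if len(sent) <= max_length:
            result.append((offset, sent))
            continue
        chunk_start, chunk = offset, ''
        for clause in _CLAUSE.findall(sent):
            if chunk and len(chunk) + len(clause) > max_length:
                result.append((chunk_start, chunk))
                chunk_start, chunk = chunk_start + len(chunk), ''
            chunk += clause
        if chunk:
            result.append((chunk_start, chunk))
    return result


def postprocess(pieces: List[Piece], min_length: int = 32, max_length: int = 256) -> List[Piece]:
    """Hypothetical pipeline for one paragraph: merge short sentences, then split long ones."""
    return split_long(merge_short(pieces, min_length), max_length)
```

Merging runs first so that a merged sentence that grows past max_length is still cut back. Each chunk's offset is derived from the running length of the chunks before it, which keeps offsets usable for a return_loc-style result.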

Example:

from sentsplit import split_sentence

text = '春节假期结束,许多人踏上了归途,奔向自己的工作岗位,告别时刻,总是满满的不舍和牵挂,即将返程,后备箱里必然塞的满满的,有各种家乡的特产,有妈妈亲手制作的各种吃食,也就又开启了“后备箱大赛”,前两天我看到一段视频,在浙江嘉兴,一位女子返程时,后备箱被塞的满满的,还有四只妈妈养的鸭子,由于车内空间有限,放不下,只好将鸭子挂在车尾,以免过多占用后备箱空间。\n\n返程时,每个人的后备箱里都塞满了家乡的味道和父母的牵挂,父母把最好的东西给儿女带上,这是家乡的味道,更是一种情感的寄托,是一份沉垫垫的来自父母的爱。使我们在离家的路上能感受到父母的爱和牵挂,挂在车尾的鸭子,显示出浓浓的母爱和期盼,写满了满满的爱和牵挂。我们一定要铭记这份爱意,把家人时刻放在心里。今年返程时,你的后备箱里都装了些啥呢?'
sents, locs = split_sentence(text, min_length=32, max_length=256, return_loc=True)
for p, sent in zip(locs, sents):
    # the returned offset p points at the first character of each sentence
    assert text[p] == sent[0]
    print('{}\t{}'.format(p, sent))

Output:

0	春节假期结束,许多人踏上了归途,奔向自己的工作岗位,告别时刻,总是满满的不舍和牵挂,即将返程,后备箱里必然塞的满满的,有各种家乡的特产,有妈妈亲手制作的各种吃食,也就又开启了“后备箱大赛”,前两天我看到一段视频,在浙江嘉兴,一位女子返程时,后备箱被塞的满满的,还有四只妈妈养的鸭子,由于车内空间有限,放不下,只好将鸭子挂在车尾,以免过多占用后备箱空间。
178	返程时,每个人的后备箱里都塞满了家乡的味道和父母的牵挂,父母把最好的东西给儿女带上,这是家乡的味道,更是一种情感的寄托,是一份沉垫垫的来自父母的爱。
252	使我们在离家的路上能感受到父母的爱和牵挂,挂在车尾的鸭子,显示出浓浓的母爱和期盼,写满了满满的爱和牵挂。
304	我们一定要铭记这份爱意,把家人时刻放在心里。今年返程时,你的后备箱里都装了些啥呢?
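The example's assert only compares the first character of each sentence. A stricter check can use str.startswith with a start offset, assuming every returned sentence is an exact, contiguous substring of the input (the output above suggests this, but the README does not state it). to_spans is a hypothetical helper, not part of sentsplit.

```python
from typing import List, Tuple


def to_spans(text: str, sents: List[str], locs: List[int]) -> List[Tuple[int, int]]:
    """Convert (sentence, start offset) pairs into (start, end) spans, checking
    that each sentence occurs verbatim at its offset and that spans do not overlap."""
    spans = []
    prev_end = 0
    for sent, start in zip(sents, locs):
        # Stricter than comparing only the first character.
        if not text.startswith(sent, start):
            raise ValueError(f'sentence not found at offset {start}: {sent!r}')
        if start < prev_end:
            raise ValueError(f'overlapping sentence at offset {start}')
        end = start + len(sent)
        spans.append((start, end))
        prev_end = end
    return spans
```

After split_sentence(text, return_loc=True), calling to_spans(text, sents, locs) would yield (start, end) pairs that can be used directly as text[start:end].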