Comments (17)

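Werneror commented on July 16, 2024

That's a good idea, but how would we implement it? Judging only by whether the lines have equal length is probably not reliable. For example, "安能摧眉折腰事权贵,使我不得开心颜。" has two halves of unequal length (nine and seven characters), yet it is shi (诗), not ci (词).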
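
To make the objection concrete, here is a minimal Python sketch of a naive regularity check. The helper names `line_lengths` and `looks_regular` are invented for illustration and are not part of the repository; the punctuation set is also an assumption.

```python
import re

def line_lengths(text):
    """Split on common Chinese punctuation and return the length of each clause."""
    clauses = [c for c in re.split(r"[,。!?;、]", text) if c]
    return [len(c) for c in clauses]

def looks_regular(text):
    """Naive heuristic: True when every clause has the same length."""
    return len(set(line_lengths(text))) == 1

couplet = "安能摧眉折腰事权贵,使我不得开心颜。"
print(line_lengths(couplet))   # [9, 7]
print(looks_regular(couplet))  # False, although the couplet comes from a shi poem
```

A heuristic like this would label the couplet as irregular, which is exactly the misclassification described above.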

from poetry.

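zhaokuohaha commented on July 16, 2024

I think that if we really want to do this, the only option is to match each piece against the line-length pattern of every cipai (词牌, ci tune name). Just thinking about it, that seems pretty hard 😂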

zhaokuohaha commented on July 16, 2024

As a rough approach, could we simply use the cipai/qupai (词牌/曲牌, the tune names of ci and qu) to tell them apart?

Werneror commented on July 16, 2024

Worth a try. Where can we find a reasonably complete list of cipai/qupai?

zhaokuohaha commented on July 16, 2024

Hmm... I don't know. We'll probably have to scrape it from somewhere:

http://www.shicimingju.com/cipai/index.html

https://www.gushiwen.org/shiwen/cipai/

(Though it's hard to say whether these lists are complete either...)

Werneror commented on July 16, 2024

We could add a "词曲牌名" (tune name) field to every poem: when a cipai or qupai matches, fill in its name; when nothing matches, fill in "未知" (unknown).
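
A minimal sketch of that idea in Python, assuming each poem is a dict with a "title" key (the record layout is an assumption, not the repository's actual schema). The matching rule is passed in as `find_tune`, a hypothetical callback, since the thread has not settled on one at this point.

```python
def annotate(poems, find_tune):
    """Return copies of the poem records with a "词曲牌名" field added."""
    annotated = []
    for poem in poems:
        tune = find_tune(poem["title"])
        # Fall back to "未知" (unknown) when no tune name matched.
        annotated.append({**poem, "词曲牌名": tune if tune else "未知"})
    return annotated

# Placeholder matcher for illustration only: exact title match against a tiny list.
known = {"如梦令", "浣溪沙"}
poems = [{"title": "如梦令"}, {"title": "静夜思"}]
result = annotate(poems, lambda title: title if title in known else None)
print(result)
```

Keeping the matcher separate from the annotation step means the rule can be swapped later without touching the code that writes the field.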

zhaokuohaha commented on July 16, 2024

So should our first step be to load this data into a database? 😂

Werneror commented on July 16, 2024

I do have it in a database; the data here was exported from it 😂

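zhaokuohaha commented on July 16, 2024

One issue came to mind: if we classify by cipai, a simple contains check probably won't do. I skimmed the Song ci portion and found titles in the following formats:

"好事近 其二 待月不至"
"如梦令"
"红娘子/连理枝"

So should we match a title against one of these formats? (I don't know whether there are other formats.)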
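
One way to read the three formats above, sketched in Python: split the title on "/" and take the first space-separated token of each part as a candidate tune name. The function name `candidate_names` is invented here, and treating "/" as a separator between alternative tune names is an assumption.

```python
def candidate_names(title):
    """Return the leading token of each '/'-separated part of the title."""
    return [part.split(" ")[0] for part in title.split("/")]

titles = ["好事近 其二 待月不至", "如梦令", "红娘子/连理枝"]
for title in titles:
    print(title, "->", candidate_names(title))
# 好事近 其二 待月不至 -> ['好事近']
# 如梦令 -> ['如梦令']
# 红娘子/连理枝 -> ['红娘子', '连理枝']
```

The candidates could then be compared for exact equality with a tune-name list, which is stricter than a contains check because only the start of each part is considered.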

Werneror commented on July 16, 2024

It looks like the title alone is enough to decide, using startsWith, e.g.:

"好事近 其二 待月不至".startsWith("好事近")

zhaokuohaha commented on July 16, 2024

My original idea was to try all three patterns and return true as soon as one of them matches, roughly like this:

def check(title, cipai):
    # True when the title is exactly the cipai, or starts with it
    # followed by a space or a slash.
    return (title == cipai
            or title.startswith(cipai + ' ')
            or title.startswith(cipai + '/'))

Werneror commented on July 16, 2024

I originally wanted to extract the leading run of consecutive Chinese characters from the title and compare that string with the cipai/qupai names. The code:

>>> import re
>>> matchObj = re.match('[\u4e00-\u9fa5]+', "红娘子/连理枝")
>>> if matchObj:
...   print ("matchObj.group() : ", matchObj.group())
... 
matchObj.group() :  红娘子

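But then I saw titles like "好事近二首", where the leading run of Chinese characters is the whole title rather than the tune name, so this approach doesn't work. I'm back to considering startsWith.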
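
The failure is easy to reproduce. In this small Python sketch, `leading_cjk` is just a wrapper name chosen here for the regular expression shown in the session above.

```python
import re

def leading_cjk(title):
    """Return the leading run of Chinese characters in the title, or None."""
    match = re.match(r"[\u4e00-\u9fa5]+", title)
    return match.group() if match else None

print(leading_cjk("红娘子/连理枝"))         # 红娘子
print(leading_cjk("好事近二首"))            # 好事近二首 (not a tune name)
print("好事近二首".startswith("好事近"))    # True
```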

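On closer inspection, though, some cipai/qupai names are prefixes of other cipai/qupai names, e.g.:

梁州令 <----> 梁州令叠韵
惜花春 <----> 惜花春起早慢
虞美人 <----> 虞美人令
愁倚阑 <----> 愁倚阑干令

On top of that, a single cipai/qupai may have several aliases. The cipai 《鹊桥仙》, for example, is also known as 《鹊桥仙令》, 《金风玉露相逢曲》, 《广寒秋》, and so on, which complicates things further.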
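
One common way to cope with both problems, offered here only as a sketch and not as what the project does, is to prefer the longest matching name and to map aliases to a canonical name. The `ALIASES` table is hypothetical: it is hand-written from the examples above, and treating 梁州令叠韵 and 虞美人令 as tunes distinct from their shorter prefixes is an assumption.

```python
# Hypothetical table: every known name (canonical or alias) -> canonical tune name.
ALIASES = {
    "鹊桥仙": "鹊桥仙",
    "鹊桥仙令": "鹊桥仙",
    "金风玉露相逢曲": "鹊桥仙",
    "广寒秋": "鹊桥仙",
    "虞美人": "虞美人",
    "虞美人令": "虞美人令",
    "梁州令": "梁州令",
    "梁州令叠韵": "梁州令叠韵",
}

def find_tune(title, aliases=ALIASES):
    """Return the canonical name for the longest known name that prefixes the title."""
    matches = [name for name in aliases if title.startswith(name)]
    if not matches:
        return None
    return aliases[max(matches, key=len)]

print(find_tune("梁州令叠韵"))      # 梁州令叠韵, not 梁州令
print(find_tune("金风玉露相逢曲"))  # 鹊桥仙
print(find_tune("静夜思"))          # None
```

Longest-match resolves the prefix pairs listed above, but it can still over-match when the text after a short tune name happens to continue into a longer one.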

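zhaokuohaha commented on July 16, 2024

I checked cipai_2: it includes 梁州令, 梁州令叠韵, 鹊桥仙, 鹊桥仙令, and 广寒秋, but I didn't see 金风玉露相逢曲. So the logic above shouldn't be a big problem; at worst a small number of cipai won't be recognized.

As for the matching patterns, the ones I described above are mainly meant to guard against a shi poem whose title happens to begin with a cipai name, for example something like 惜花春去, being classified as ci.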
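
The trade-off between the two rules can be seen in a short Python sketch. The function name `starts_with_tune` is made up for this example; it requires the tune name to be followed by a space, a slash, or the end of the title.

```python
def starts_with_tune(title, tune):
    """True when the title is the tune name, or the name followed by ' ' or '/'."""
    return title == tune or title.startswith((tune + " ", tune + "/"))

# A plain prefix test accepts the hypothetical shi title 惜花春去 ...
print("惜花春去".startswith("惜花春"))                 # True
# ... while the delimiter-aware test rejects it.
print(starts_with_tune("惜花春去", "惜花春"))          # False
print(starts_with_tune("好事近 其二 待月不至", "好事近"))  # True
# The stricter rule also rejects titles such as 好事近二首.
print(starts_with_tune("好事近二首", "好事近"))        # False
```

So the delimiter rule reduces false positives at the cost of missing ci titles that append text directly to the tune name.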

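Werneror commented on July 16, 2024

It turns out that a single cipai can correspond to many different forms. 《浣溪沙》, for example, has five:

  • Double stanza (双调), 42 characters; first stanza: three lines, three level-tone rhymes; second stanza: three lines, two level-tone rhymes
  • Double stanza, 42 characters; each stanza: three lines, two level-tone rhymes
  • Double stanza, 44 characters; first stanza: three lines, three level-tone rhymes; second stanza: five lines, two level-tone rhymes
  • Double stanza, 46 characters; first stanza: five lines, three level-tone rhymes; second stanza: five lines, two level-tone rhymes
  • Double stanza, 42 characters; each stanza: three lines, three oblique-tone rhymes

Reference: https://sou-yun.com/QueryCiTune.aspx?id=96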
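
As a rough illustration of what format-based checking would involve, the sketch below stores only the total character counts of the five forms listed above (42, 44, and 46) and tests a text against them. `FORM_LENGTHS`, `count_chars`, and `length_fits` are hypothetical names, and line counts and rhyme schemes are deliberately ignored.

```python
import re

# Total character counts taken from the five 浣溪沙 forms listed above.
FORM_LENGTHS = {"浣溪沙": {42, 44, 46}}

def count_chars(text):
    """Count Chinese characters, ignoring punctuation and whitespace."""
    return len(re.findall(r"[\u4e00-\u9fa5]", text))

def length_fits(tune, text, forms=FORM_LENGTHS):
    """True when the text's character count matches some known form of the tune."""
    return count_chars(text) in forms.get(tune, set())

sample = ("一曲新词酒一杯,去年天气旧亭台。夕阳西下几时回?"
          "无可奈何花落去,似曾相识燕归来。小园香径独徘徊。")
print(count_chars(sample))            # 42
print(length_fits("浣溪沙", sample))  # True
```

A length match is at best a necessary condition: any six-line, seven-character poem would also total 42 characters, so this check alone cannot separate shi from ci.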

zhaokuohaha commented on July 16, 2024

Right, so classifying by character-count format really is quite hard.

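xiu-ze commented on July 16, 2024

I noticed that the cipai_2 file contains a cipai named "九日". But many shi poems are titled simply 《九日》, and those cannot be counted as ci. At the same time, many ci titles consist of nothing but the cipai, such as 《浣溪沙》.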
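
One possible way to handle such names, sketched here purely as an assumption, is to keep a hand-maintained set of ambiguous tune names and refuse to decide from the title alone when it hits that set. `AMBIGUOUS_NAMES` and `classify_by_title` are hypothetical names and do not exist in the repository.

```python
# Hand-maintained set of tune names that are also common shi titles (assumption).
AMBIGUOUS_NAMES = {"九日"}

def classify_by_title(title, tune_names, ambiguous=AMBIGUOUS_NAMES):
    """Return 'ci', 'unknown', or 'other' using the title alone."""
    if title in ambiguous:
        return "unknown"  # the title alone cannot settle it
    if title in tune_names:
        return "ci"
    return "other"

tunes = {"九日", "浣溪沙"}
print(classify_by_title("九日", tunes))    # unknown
print(classify_by_title("浣溪沙", tunes))  # ci
print(classify_by_title("静夜思", tunes))  # other
```

This only postpones the decision for ambiguous titles; some other signal, such as the text itself, would still be needed to resolve them.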

Werneror commented on July 16, 2024

Five years have passed and we still haven't found a suitable method. Closing this issue for now.
