
Comments (9)

Zhang-JiaBin commented on June 2, 2024

Hello, when you compute the loss, is it the loss of a whole batch or of a single sentence? After switching datasets my training loss is very large: I switched to a dataset of 220,000 lines, and after training for a while the loss was in the tens of thousands. Do I need to pre-train word embeddings myself, or are the embeddings trained jointly during training?

from chinsesner-pytorch.

yanwii commented on June 2, 2024

As long as you replace the training data following the current format, you can train your own model.


Zhang-JiaBin commented on June 2, 2024

Also, when I train on your data the loss starts out at only around 10, but on my own data the loss is huge.


shenhui12 commented on June 2, 2024

Hi, does this code use BERT to generate word embeddings?


CherylZou commented on June 2, 2024

> Also, when I train on your data the loss starts out at only around 10, but on my own data the loss is huge.

I ran into this too; the loss stays around 5. Did you manage to solve this problem?


SuperBruceJia commented on June 2, 2024

> Hello, when you compute the loss, is it the loss of a whole batch or of a single sentence? After switching datasets my training loss is very large: I switched to a dataset of 220,000 lines, and after training for a while the loss was in the tens of thousands. Do I need to pre-train word embeddings myself, or are the embeddings trained jointly during training?

You can average when computing the loss, for example:

def neg_log_likelihood(self, sentences, tags, length):
    """
    Negative Log-Likelihood (NLL) loss -> -(real path score - total score)
    :param sentences: padded batch of character-ID sequences
    :param tags: padded batch of gold tag sequences
    :param length: true (unpadded) length of each sentence
    :return: NLL loss averaged over the total number of tokens
    """
    self.batch_size = sentences.size(0)
    # Emission scores (one tag_size vector per token) from the linear layer
    logits = self.prediction(sentences)
    real_path_score = torch.zeros(1)
    total_score = torch.zeros(1)
    all_length = torch.zeros(1)
    for logit, tag, leng in zip(logits, tags, length):
        # Drop the padding positions
        logit = logit[:leng]
        tag = tag[:leng]
        # Score of the gold tag path
        real_path_score += self.real_path_score(logit, tag)
        # Log-sum-exp score over all possible tag paths
        total_score += self.total_score(logit, tag)
        # Accumulate the total number of tokens
        all_length += leng
    # Averaging by token count keeps the loss scale
    # independent of dataset size
    return (total_score - real_path_score) / all_length
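To see why averaging helps, here is a toy illustration with made-up numbers (not from the repo): a summed loss grows with the number of sentences, so a 220,000-line dataset naturally produces a huge total, while dividing by the total token count keeps the value on a comparable scale regardless of dataset size.

```python
import torch

# Hypothetical per-sentence losses: each sentence contributes ~2.5
small_set = torch.full((10,), 2.5)       # 10 sentences
large_set = torch.full((10000,), 2.5)    # 10,000 sentences

# Summed loss scales with the number of sentences
summed_small = small_set.sum().item()    # 25.0
summed_large = large_set.sum().item()    # 25000.0

# Averaging by total token count (assume 20 tokens per sentence)
avg_small = small_set.sum().item() / (10 * 20)      # 0.125
avg_large = large_set.sum().item() / (10000 * 20)   # 0.125

print(summed_small, summed_large)  # 25.0 25000.0
print(avg_small, avg_large)        # 0.125 0.125
```

The averaged values match, which is why a per-token mean is easier to compare across datasets and runs.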


SuperBruceJia commented on June 2, 2024

> Hi, does this code use BERT to generate word embeddings?

This code uses character-level embeddings; no word embeddings are used. The character embeddings are initialized with

    self.word_embeddings = nn.Embedding(num_embeddings=vocab_size, embedding_dim=self.embedding_dim)

and are then trained jointly as the network learns.
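A minimal standalone sketch of such a character-level embedding layer (vocab_size and embedding_dim below are illustrative values, not the repo's settings): the lookup table is randomly initialized, and its weight matrix receives gradients like any other layer, so the character vectors are learned together with the rest of the network.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 100, 16  # illustrative sizes
word_embeddings = nn.Embedding(num_embeddings=vocab_size,
                               embedding_dim=embedding_dim)

# A batch of 2 sentences, each 5 character IDs long
char_ids = torch.randint(0, vocab_size, (2, 5))
vectors = word_embeddings(char_ids)
print(vectors.shape)  # torch.Size([2, 5, 16])

# The embedding table is a trainable parameter, so it is
# updated by the optimizer during training
print(word_embeddings.weight.requires_grad)  # True
```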


SuperBruceJia commented on June 2, 2024

> Also, when I train on your data the loss starts out at only around 10, but on my own data the loss is huge.

> I ran into this too; the loss stays around 5. Did you manage to solve this problem?

The problem I am running into now is that the loss does not decrease stably: in the later stages it fluctuates up and down, and even lowering the learning rate does not stabilize it. Could this be caused by the data?


YijianLiu commented on June 2, 2024

Can it be trained by directly substituting an English dataset?

