
Out of GPU memory about flat-lattice-transformer (9 comments, OPEN)

leesureman commented on August 22, 2024

Running out of GPU memory.


Comments (9)

LeeSureman commented on August 22, 2024

GPU memory usage should depend mainly on the longest sentence in the training data. With 10 GB of memory, sentences up to length 200 are generally supported.
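Self-attention stores a score for every token pair, so memory grows quadratically with sequence length, which is why the longest sentence, not the dataset size, dominates usage. A minimal back-of-envelope sketch, assuming 6 layers, 6 heads, and fp32 activations (the head count and dtype are illustrative assumptions, not taken from the repo):

```python
# Rough estimate of memory for the attention maps alone; FLAT's relative
# position encodings add further per-token-pair tensors on top of this.
def attention_map_gb(seq_len, batch_size, n_layers=6, n_heads=6, bytes_per_float=4):
    # Each head holds a (seq_len x seq_len) score matrix per example.
    n_scores = batch_size * n_layers * n_heads * seq_len ** 2
    return n_scores * bytes_per_float / 1024 ** 3

print(attention_map_gb(200, 10))  # ~0.05 GB
print(attention_map_gb(700, 10))  # ~0.66 GB: 3.5x the length, ~12x the memory
```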


nlp4whp commented on August 22, 2024

> Does it really use that much GPU memory? With 80k training examples, 16 GB is not enough.

Yes. With max_len=100 and batch_size=32 it takes about 9 GB; a 6-layer Transformer with 6 × 100-dim heads is as heavy as BERT :)
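As a rough sanity check on that comparison: with model dim 600 (6 heads × 100), the parameter count is far below BERT-base, so the comparison presumably refers to runtime memory rather than parameters. A sketch using the standard 4·d² attention plus 8·d² feed-forward estimate (the FFN multiplier is an assumption):

```python
def transformer_params(d_model=600, n_layers=6, ffn_mult=4):
    attn = 4 * d_model ** 2            # Q, K, V and output projections
    ffn = 2 * ffn_mult * d_model ** 2  # the two feed-forward matrices
    return n_layers * (attn + ffn)

print(f"{transformer_params() / 1e6:.0f}M")  # ~26M params vs ~110M for BERT-base
```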


zelin-x commented on August 22, 2024

How do you set batch_size in the source code? No matter how much I reduce batch_size, it still runs out of GPU memory.


LeeSureman commented on August 22, 2024

What is the maximum original-sentence length in your data?


zelin-x commented on August 22, 2024

Around 700.


zelin-x commented on August 22, 2024

parser.add_argument('--train_clip', default=True, help='whether to clip the train char length to 200')

I have already set this parameter to True so that sentences are clipped to 200 characters, but I still run out of GPU memory.
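For reference, a sketch of what such clipping might look like; the data structure (a list of dicts with aligned 'chars' and 'labels' fields) is a hypothetical stand-in, not the repo's actual code. Note also that in a flat lattice the matched lexicon words are appended after the characters, so the model's actual input can be longer than 200 even after clipping the characters.

```python
MAX_CHARS = 200

def clip_train(examples, max_chars=MAX_CHARS):
    # Truncate over-long sentences rather than dropping them, keeping the
    # per-character labels aligned with the truncated character sequence.
    clipped = []
    for ex in examples:
        clipped.append({
            'chars': ex['chars'][:max_chars],
            'labels': ex['labels'][:max_chars],
        })
    return clipped

examples = [{'chars': list('字' * 700), 'labels': ['O'] * 700}]
print(len(clip_train(examples)[0]['chars']))  # 200
```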


LeeSureman commented on August 22, 2024

On my 1080 Ti, with length 200, a batch size of 10 works.


nlp4whp commented on August 22, 2024

> How do you set batch_size in the source code?

It's here:

parser.add_argument('--batch', default=10, type=int)
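A minimal usage sketch for overriding that default from the command line (the script name is hypothetical, not the repo's):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch', default=10, type=int)  # the flag quoted above
args = parser.parse_args()
print(args.batch)

# e.g.: python train.py --batch 4   -> prints 4
```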


WMT123 commented on August 22, 2024

To the author: does the 200 here refer to the total length of words + characters?
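For context on what counts toward the length: in a flat lattice, the input sequence is the characters plus every matched lexicon word, so the combined length can be computed as below. The naive substring matcher is for illustration only; the repo uses its own lexicon and matching.

```python
# Effective flat-lattice length = #characters + #matched lexicon words.
def lattice_length(sentence, lexicon):
    chars = list(sentence)
    words = [w for w in lexicon
             for i in range(len(sentence))
             if sentence.startswith(w, i)]
    return len(chars) + len(words)

lexicon = {'南京', '南京市', '市长', '长江', '长江大桥', '大桥'}
print(lattice_length('南京市长江大桥', lexicon))  # 7 chars + 6 words = 13
```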

