interactive_e2e_speech_recognition's Introduction

Introduction

This repo is my notebook for learning about end-to-end speech recognition. Many projects work on end-to-end speech recognition, but they are either hard to install or hard to reproduce, i.e. not simple enough for a beginner. This project is intended for beginners who know the basics of speech recognition and want a handy 101 tutorial on end-to-end SR.

If you have any questions, feel free to file an issue!

List (to be expanded)

  • CTC: an interactive CTC trainer using YesNo data.
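
For readers new to CTC, here is a minimal sketch of the standard collapsing rule that maps a per-frame label path to an output sequence: merge consecutive repeats, then drop blanks. It is not taken from this repo's trainer. The function name `ctc_collapse`, the blank index `0`, and the label IDs (1 for "yes", 2 for "no") are assumptions made up for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path: merge consecutive repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical frame-level path; 1 = "yes", 2 = "no" (IDs are illustrative only)
path = [0, 1, 1, 0, 0, 2, 2, 2, 0, 1, 0, 1]
print(ctc_collapse(path))  # [1, 2, 1, 1]
```

Note that the blank between the last two `1` frames is what lets the same label appear twice in a row in the output. Without it, the repeats would be merged into one.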

Credits

This project is inspired by:

interactive_e2e_speech_recognition's Issues

Loss descends unusually on reproduction

Thanks for your code!
But when I run it, the loss decreases unusually slowly, so recognition accuracy is also low. In your results, the loss drops rapidly.
I haven't modified the code.

My environment is:

  • PyTorch 1.4
  • torchaudio 0.4

The loss log:

epoch 1: avg_loss 6.371933863713191
epoch 2: avg_loss 4.70939903992873
epoch 3: avg_loss 4.671203154783982
epoch 4: avg_loss 3.8700154561262865
epoch 5: avg_loss 3.468390097984901
epoch 6: avg_loss 3.273801638529851
epoch 7: avg_loss 3.261995315551758
epoch 8: avg_loss 3.233139918400691
epoch 9: avg_loss 3.1941104118640604
epoch 10: avg_loss 3.1285105301783633
epoch 11: avg_loss 3.14404559135437
epoch 12: avg_loss 3.095609554877648
epoch 13: avg_loss 3.068654280442458
epoch 14: avg_loss 3.057704448699951
epoch 15: avg_loss 3.0675538136408877
epoch 16: avg_loss 2.992294329863328
epoch 17: avg_loss 2.9800828236799974
epoch 18: avg_loss 2.9441260741307187
epoch 19: avg_loss 2.894284596809974
epoch 20: avg_loss 2.936523822637705
epoch 21: avg_loss 2.946532964706421
epoch 22: avg_loss 2.9159597066732554
epoch 23: avg_loss 2.8778565480158877
epoch 24: avg_loss 2.901293919636653
epoch 25: avg_loss 2.832904577255249
epoch 26: avg_loss 2.8665461356823263
epoch 27: avg_loss 2.870034859730647
epoch 28: avg_loss 2.8736569698040304
epoch 29: avg_loss 2.8818020820617676
epoch 30: avg_loss 2.940280767587515
epoch 31: avg_loss 2.8641006396367
epoch 32: avg_loss 2.755228482759916
epoch 33: avg_loss 2.7636474646054783
epoch 34: avg_loss 2.7970426632807803
epoch 35: avg_loss 2.8201447266798754
epoch 36: avg_loss 2.827310708852915
epoch 37: avg_loss 2.851408389898447
epoch 38: avg_loss 2.7742223739624023
epoch 39: avg_loss 2.797891195003803
epoch 40: avg_loss 2.7421194406656118
epoch 41: avg_loss 2.7057653023646426
epoch 42: avg_loss 2.82857293349046
epoch 43: avg_loss 2.75337945497953
epoch 44: avg_loss 2.8449047345381517
epoch 45: avg_loss 2.8148957582620473
epoch 46: avg_loss 2.7799972112362203
epoch 47: avg_loss 2.779747247695923
epoch 48: avg_loss 2.7723453778486986
epoch 49: avg_loss 2.7705439971043515
epoch 50: avg_loss 2.7763983469742994
epoch 51: avg_loss 2.716023793587318
epoch 52: avg_loss 2.7847886635706973
epoch 53: avg_loss 2.7497686789585996
epoch 54: avg_loss 2.7348319567166843
epoch 55: avg_loss 2.804599248445951
epoch 56: avg_loss 2.80349848820613
epoch 57: avg_loss 2.757105882351215
epoch 58: avg_loss 2.805212516051072
epoch 59: avg_loss 2.72385520201463
epoch 60: avg_loss 2.7668622640463023
epoch 61: avg_loss 2.702036655866183
epoch 62: avg_loss 2.8305543019221377
epoch 63: avg_loss 2.778572999514066
epoch 64: avg_loss 2.8520910923297587
epoch 65: avg_loss 2.7859388498159556
epoch 66: avg_loss 2.728653302559486
epoch 67: avg_loss 2.8242092774464536
epoch 68: avg_loss 2.773068721477802
epoch 69: avg_loss 2.7353643820835996
epoch 70: avg_loss 2.821800415332501
epoch 71: avg_loss 2.808179286810068
epoch 72: avg_loss 2.75307664504418
epoch 73: avg_loss 2.8359962976895847
epoch 74: avg_loss 2.736109733581543
epoch 75: avg_loss 2.8306180330423207
epoch 76: avg_loss 2.742473767353938
epoch 77: avg_loss 2.7637857107015757
epoch 78: avg_loss 2.7592280828035793
epoch 79: avg_loss 2.7490466007819543
epoch 80: avg_loss 2.828728043116056
epoch 81: avg_loss 2.917681437272292
epoch 82: avg_loss 2.8417651928388157
epoch 83: avg_loss 2.7903012128976674
epoch 84: avg_loss 2.6843167268312893
epoch 85: avg_loss 2.790263982919546
epoch 86: avg_loss 2.7775218303386984
epoch 87: avg_loss 2.7608486322256236
epoch 88: avg_loss 2.7123066095205455
epoch 89: avg_loss 2.7112955496861386
epoch 90: avg_loss 2.725948663858267
epoch 91: avg_loss 2.724459904890794
epoch 92: avg_loss 2.6767702469458947
epoch 93: avg_loss 2.7425847236926737
epoch 94: avg_loss 2.760955755527203
epoch 95: avg_loss 2.7504168106959415
epoch 96: avg_loss 2.6886834914867697
epoch 97: avg_loss 2.8154220581054688
epoch 98: avg_loss 2.736657527776865
epoch 99: avg_loss 2.725706320542556

Have you come across this phenomenon?
Any help will be appreciated!
