
Comments (4)

colbec commented on May 25, 2024

Make sure you are using the right site for HTK (http://htk.eng.cam.ac.uk/). It carries a note about the 3.5 beta released in December 2015, so your source seems to be out of date.

I run openSUSE Leap 42.1 64-bit and have no problem compiling HTK. You will need to post the exact error you see.

There are several tools for building language models in ARPA format and for converting between formats. Their output tends to differ in minor ways, so the right choice depends on what you are looking for. Try a Google search for "language model generator".

VoxForge is occasionally slow, but right now it is fine for me. I selected a 4 MB file from the downloads section and it completed in 16 seconds on my slow connection.

from julius.

palles77 commented on May 25, 2024

I agree with colbec. HTK is dated in some places, but there is a beta version available. I have been using HTK for years now for both language and acoustic modelling. The best way is to follow the HTK tutorials provided on VoxForge for acoustic modelling, and the HTK tutorials for language modelling.

Be aware that creating a decent acoustic model is a non-trivial process, and you need to consider how much effort you are prepared to put into it. I have a few UK English models from my own past experiments, but their quality is not the best (around 25% word error rate, WER).


pdtwonotes commented on May 25, 2024

Since colbec reported that VoxForge downloads worked, on a hunch I created a VPN tunnel out of my local area and tried again. I was able to download the English model in just a few seconds. So something is wrong with my local ISP.

At first glance the VoxForge model appeared to work, and it included quite a large pronunciation dictionary. Unfortunately, the hmmdef file is missing many of the triphones that the dictionary uses.
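
A gap like this can be checked mechanically. The following is only a rough sketch, not part of any Julius or HTK tool: it assumes HTK-style triphone names of the form `L-C+R`, expands each pronunciation word-internally only (no cross-word context), and takes the set of triphone names the model knows as a plain Python set. The function names and the toy data are hypothetical.

```python
def word_internal_triphones(phones):
    """Expand a phone sequence into HTK-style names (L-C+R), word-internal only."""
    names = []
    for i, phone in enumerate(phones):
        # The first phone has no left context, the last has no right context.
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(left + phone + right)
    return names


def missing_triphones(lexicon, known):
    """Map each word to the triphone names that are absent from `known`."""
    report = {}
    for word, phones in lexicon.items():
        absent = [t for t in word_internal_triphones(phones) if t not in known]
        if absent:
            report[word] = absent
    return report


# Hypothetical toy data: two dictionary entries and the names the model defines.
lexicon = {"CAT": ["k", "ae", "t"], "AT": ["ae", "t"]}
known = {"k+ae", "k-ae+t", "ae-t"}
print(missing_triphones(lexicon, known))
```

In practice the lexicon would be read from the pronunciation dictionary and `known` from whatever list of model names ships with the acoustic model; how those files are parsed is left out here because it depends on their exact layout.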


colbec commented on May 25, 2024

One of the downsides of a phone-based model is that the number of possible triphones is of the order of N^3; in English, with about 40 phones, this means 40^3 or 64,000 triphone candidates. It is really hard to exercise them all, or even just the most commonly used ones, unless you are working with a very large audio database. Trying to cover a wide variety of voices makes this harder still. Sometimes you can meet your requirements for additional words by substituting phones that the model is aware of.
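
To make the arithmetic concrete, here is a small sketch that enumerates every candidate triphone for a phone set and reports what fraction of them a given collection of observed triphones covers. The phone labels (`p0` .. `p39`), the `L-C+R` naming, and the helper names are assumptions for illustration only.

```python
from itertools import product


def triphone_candidates(phones):
    """All L-C+R combinations over a phone set: N**3 names for N distinct phones."""
    return {f"{l}-{c}+{r}" for l, c, r in product(phones, repeat=3)}


def coverage(seen, phones):
    """Fraction of candidate triphones that appear in `seen`."""
    candidates = triphone_candidates(phones)
    return len(seen & candidates) / len(candidates)


# Hypothetical 40-phone inventory, roughly the size of an English phone set.
phones = [f"p{i}" for i in range(40)]
print(len(triphone_candidates(phones)))  # 40**3 candidates

# Names outside the inventory (such as "x-y+z") do not count toward coverage.
seen = {"p0-p1+p2", "p3-p4+p5", "x-y+z"}
print(coverage(seen, phones))
```

Many of the enumerated combinations never occur in real speech, so the count is an upper bound on what a model needs rather than a target.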

