The speech_yolo from mlspeech

inference with 1 wav without target

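import argparse

import torch

# VGG and InferLoader come from the project's own modules (their import lines are not shown here)

parser = argparse.ArgumentParser(
    description='Inference for Speech Commands Recognition')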
parser.add_argument('--model', default='gcommand_toy_example/models/model.pth',
                    help='path to the model')

parser.add_argument('--test_path', default='path/to/test',
                    help='path to the folder containing the .wav files to classify')

# feature extraction options
parser.add_argument('--max_len', type=int, default=101,
                    help='number of spectrogram frames each input is padded or truncated to')
parser.add_argument('--window_size', type=float, default=.02,
                    help='window size for the stft (seconds)')
parser.add_argument('--window_stride', type=float, default=.01,
                    help='window stride for the stft (seconds)')
parser.add_argument('--window_type', default='hamming',
                    help='window type for the stft')
# BooleanOptionalAction (Python 3.9+) accepts --normalize / --no-normalize
parser.add_argument('--normalize', action=argparse.BooleanOptionalAction, default=True,
                    help='whether or not to normalize the spectrogram')
parser.add_argument('--save_folder', type=str,  default='gcommand_pretraining_model/',
                    help='path to save the final model')
parser.add_argument('--class_num', type=int,  default=38,
                    help='number of classes to classify')
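# a plain default=True would turn '--cuda False' into the truthy string 'False'
parser.add_argument('--cuda', action=argparse.BooleanOptionalAction, default=True,
                    help='enable CUDA (use --no-cuda to run on the CPU)')
parser.add_argument('--batch_size', type=int, default=16,
                    metavar='N', help='batch size for inference')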

args = parser.parse_args()
print(args)

# load onto the CPU first; the model is moved to the GPU below if --cuda is set
checkpoint = torch.load(args.model, map_location='cpu')

model = VGG("VGG11", args.class_num)
model.load_state_dict(checkpoint['net'])

# loading data: test_path holds the .wav files directly, e.g.
#   root/1234.wav
#   root/qwer.wav

test_dataset = InferLoader(args.test_path, window_size=args.window_size,
                           window_stride=args.window_stride, window_type=args.window_type,
                           normalize=args.normalize, max_len=args.max_len)
# shuffle=False keeps the batches in the dataset's file order,
# so predictions can be matched back to file names
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.batch_size,
                                          shuffle=False, num_workers=20,
                                          pin_memory=args.cuda)

if args.cuda:
    print('Using CUDA with {0} GPUs'.format(torch.cuda.device_count()))
    model = torch.nn.DataParallel(model).cuda()
model.eval()

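# desired output, one predicted label per file:
#   1234.wav - like
#   qwer.wav - two
pred = []
with torch.no_grad():
    for data in test_loader:
        if args.cuda:
            data = data.cuda()
        output = model(data)
        # index of the highest score in each row = predicted class index
        pred.extend(output.argmax(dim=1).cpu().tolist())
print(f'Predicted class indices: {pred}')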

Can you please tell me how to get a concrete answer: which class does a single recording without a target label belong to?

Thanks!
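One possible way to get that answer, sketched under assumptions: run the model over every batch (not only the first), collect each row of output scores, take the index of the largest score in each row, and translate that index into a class name. The sketch assumes (1) the DataLoader is created with `shuffle=False` and `file_names` lists the .wav files in the same order InferLoader yields them, and (2) class index i is the i-th sorted sub-folder name of the training directory, a common convention in folder-per-class loaders that should be verified against the loader actually used for training. `class_names_from_dir`, `softmax` and `predict_labels` are hypothetical helpers, not part of speech_yolo; they work on plain lists such as those returned by `output.cpu().tolist()`.

```python
import math
import os
from typing import Dict, List, Sequence, Tuple


def class_names_from_dir(train_dir: str) -> List[str]:
    """Hypothetical helper: class index i -> i-th sorted sub-folder name.

    Assumption: the training loader built its label map from the sorted
    sub-folder names of the training directory.
    """
    return sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d))
    )


def softmax(row: Sequence[float]) -> List[float]:
    """Turn one row of scores into probabilities that sum to 1."""
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(file_names: Sequence[str],
                   scores: Sequence[Sequence[float]],
                   class_names: Sequence[str]) -> Dict[str, Tuple[str, float]]:
    """Map each file to (predicted class name, probability).

    Assumption: scores[i] belongs to file_names[i], e.g. rows gathered with
    output.cpu().tolist() from a DataLoader created with shuffle=False.
    """
    if len(file_names) != len(scores):
        raise ValueError("need exactly one row of scores per file")
    result = {}
    for name, row in zip(file_names, scores):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        result[name] = (class_names[best], probs[best])
    return result
```

For example, collecting `all_scores.extend(output.cpu().tolist())` inside the inference loop and then calling `predict_labels(['1234.wav', 'qwer.wav'], all_scores, class_names_from_dir('path/to/train'))` (the training path is a placeholder) would give one label per file. The argmax is the same whether the model outputs raw scores or log-probabilities; if it ends in `log_softmax`, the `softmax` step simply recovers the original probabilities.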

the learning rate has not changed

error running test code

I have tried to run the test code using the model you uploaded, but I keep getting the following error:

Traceback (most recent call last):
  File "F:\Download\speech_yolo-master\test_yolo.py", line 53, in <module>
    model, acc, epoch = load_model(model)
  File "F:\Download\speech_yolo-master\model_speech_yolo.py", line 112, in load_model
    arc_type = checkpoint['arc']
KeyError: 'arc'
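The traceback shows that the loaded checkpoint has no `'arc'` entry, while `load_model` in model_speech_yolo.py (line 112) expects one, which suggests the uploaded file was saved in a different format than the test script assumes. A quick way to see what the file actually contains is to load it and inspect its keys. The sketch below is a hypothetical diagnostic, not part of the repository: it takes the object returned by `torch.load(path, map_location='cpu')`, reports missing keys, and flags, by a simple heuristic, a file that looks like a bare `state_dict` without metadata. Only `'arc'` is known to be required (from the traceback); any other keys `load_model` reads would have to be added to `REQUIRED_KEYS`.

```python
from collections.abc import Mapping
from typing import Any, List, Sequence, Tuple

# Keys the loader reads. Only 'arc' is confirmed by the traceback
# (model_speech_yolo.py, line 112); extend the tuple if load_model reads more.
REQUIRED_KEYS: Tuple[str, ...] = ("arc",)


def looks_like_bare_state_dict(checkpoint: Mapping) -> bool:
    """Heuristic: a bare state_dict maps dotted parameter names such as
    'conv1.weight' to tensors and carries no metadata entries."""
    return bool(checkpoint) and all(
        isinstance(key, str) and "." in key for key in checkpoint
    )


def check_checkpoint(checkpoint: Any,
                     required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return human-readable problems with a loaded checkpoint (empty if none)."""
    if not isinstance(checkpoint, Mapping):
        # e.g. torch.save(model) pickles the whole module instead of a dict
        return ["checkpoint is a %s, not a dict" % type(checkpoint).__name__]
    problems = []
    if looks_like_bare_state_dict(checkpoint):
        problems.append("looks like a bare state_dict with no metadata keys")
    missing = [key for key in required if key not in checkpoint]
    if missing:
        present = sorted(str(key) for key in checkpoint)
        problems.append("missing keys %s; present keys: %s" % (missing, present))
    return problems
```

Typical use: `checkpoint = torch.load(path, map_location='cpu')`, then print each entry of `check_checkpoint(checkpoint)`. If `'arc'` is reported missing, the architecture name has to come from somewhere else (for example, passed in explicitly) before the weights can be loaded.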

speech_yolo's Issues

SpeechYolo pretrained model link is not available.

Dear authors
