bayko / aws-transcribe-multiple-jobs-s3


Script to enqueue multiple transcription jobs for AWS Transcribe using an S3 bucket as the source

Shell 100.00%
aws transcribe s3 s3-bucket transcription aws-cli transcribe-audio-files multiple

aws-transcribe-multiple-jobs-s3's Introduction

AWS Transcribe - Multiple Job Queue

This script creates multiple AWS Transcribe jobs from video or audio files stored in an S3 bucket.

Note

  • AWS account credentials must be configured on your local terminal (instructions found here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
  • jq binary must be installed on your machine (installation instructions found here: https://stedolan.github.io/jq/download/)
  • Script automatically ignores any files in the bucket not matching the provided media type
  • All files in target folder that match the provided media type will be submitted to AWS Transcribe
  • If you wish to use a custom vocabulary you must have already created it inside the AWS Transcribe console
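
The first two notes can be checked before the script runs. The following is a minimal sketch, not part of the original script: `missing_tools` is a hypothetical helper name, and it only confirms that the `aws` and `jq` commands are on the PATH, not that credentials are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print the names of any
# required commands that cannot be found on PATH, separated by spaces.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Drop the leading space before printing.
  echo "${missing# }"
}

# Warn (without exiting) if either prerequisite is absent.
missing="$(missing_tools aws jq)"
if [ -n "$missing" ]; then
  echo "Missing required tools: $missing" >&2
fi
```

To confirm that credentials are actually configured, running `aws sts get-caller-identity` by hand is one option; it is left out of the sketch because it makes a network call.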

Adjust parameters to your environment

# S3 Bucket Name
BUCKET_NAME="example-bucket"

# Folder in bucket containing files
PARENT_FOLDER="video-files"

# Optional sub-folder - leave value as "" if none
SUB_FOLDER=""

# Format for the media files being transcribed
MEDIA_FORMAT="mp4"

# Language for Transcription
LANGUAGE_CODE="en-US"

# Optional name of custom vocabulary - leave value as "" if none
CUSTOM_VOCABULARY=""

# Desired AWS region to create transcribe jobs in
AWS_REGION="us-east-1"
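
As an illustration of how these parameters might fit together, here is a rough sketch of building the S3 key prefix and filtering keys by media format. It is an assumption about the approach, not the actual contents of queue-transcribe.sh: the helper names `build_prefix`, `matches_format`, and `list_media_keys` are hypothetical.

```shell
#!/usr/bin/env bash
# Example values copied from the parameter block.
BUCKET_NAME="example-bucket"
PARENT_FOLDER="video-files"
SUB_FOLDER=""
MEDIA_FORMAT="mp4"
AWS_REGION="us-east-1"

# Hypothetical helper: join the parent folder and the optional sub-folder
# into an S3 key prefix that ends with a slash.
build_prefix() {
  local parent="$1" sub="$2"
  if [ -n "$sub" ]; then
    printf '%s/%s/\n' "$parent" "$sub"
  else
    printf '%s/\n' "$parent"
  fi
}

# Hypothetical helper: succeed only when the key ends in ".<format>".
matches_format() {
  local key="$1" format="$2"
  case "$key" in
    *."$format") return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print matching media keys, one per line.
# Needs the aws CLI and jq; defined here but not invoked.
list_media_keys() {
  local prefix key
  prefix="$(build_prefix "$PARENT_FOLDER" "$SUB_FOLDER")"
  aws s3api list-objects-v2 \
    --bucket "$BUCKET_NAME" \
    --prefix "$prefix" \
    --region "$AWS_REGION" \
    --output json |
    jq -r '.Contents[]?.Key' |
    while IFS= read -r key; do
      if matches_format "$key" "$MEDIA_FORMAT"; then
        printf '%s\n' "$key"
      fi
    done
}

build_prefix "$PARENT_FOLDER" "$SUB_FOLDER"   # prints: video-files/
```

Keys that do not end in the configured extension are skipped, which mirrors the note that files not matching the media type are ignored.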

After setting the parameters in the file, make it executable and run it from the command line:

~>:  chmod +x queue-transcribe.sh
~>:  ./queue-transcribe.sh

Once the script completes, you can check the progress of your transcription jobs in the AWS Transcribe console.
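
Job progress can also be checked from the terminal. The sketch below wraps the AWS CLI's `get-transcription-job` command; `job_status` is a hypothetical helper name, and the job name in the usage note is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status of one transcription job
# (for example IN_PROGRESS, COMPLETED, or FAILED).
# Arguments: job name, AWS region.
job_status() {
  local job_name="$1" region="$2"
  aws transcribe get-transcription-job \
    --transcription-job-name "$job_name" \
    --region "$region" \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text
}
```

For instance, `job_status "transcribe-example" "us-east-1"` would query a job named transcribe-example in us-east-1. The region should match the AWS_REGION value used when the job was created.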

aws-transcribe-multiple-jobs-s3's Issues

The code just creates a JSON job object for each file in the S3 bucket. It never submits the job to AWS Transcribe for execution

Hi,

The script runs without errors and prints a JSON object for each file, but the jobs do not show up in the AWS Transcribe console and there is no transcription output. It just prints the job object to the terminal without executing the job. Why?

How to fix this?

************************************************************
Detected Object: audio/zuora_ceo_subscription_for_success_mad_money.mp3
Key matches media format of mp3... creating transcribe job
{
    "TranscriptionJob": {
        "TranscriptionJobName": "transcribe-zuora_ceo_subscription_for_success_mad_money",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "en-US",
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "https://ceos-videos.s3.amazonaws.com/audio/zuora_ceo_subscription_for_success_mad_money.mp3"
        },
        "StartTime": 1615314064.554,
        "CreationTime": 1615314064.532,
        "Settings": {
            "ShowSpeakerLabels": true,
            "MaxSpeakerLabels": 5
        }
    }
}
************************************************************
Detected Object: audio/zuora_ceo_transforming_the_subscription_economy_mad_money_cnbc.mp3
Key matches media format of mp3... creating transcribe job
{
    "TranscriptionJob": {
        "TranscriptionJobName": "transcribe-zuora_ceo_transforming_the_subscription_economy_mad_money_cnbc",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "en-US",
        "MediaFormat": "mp3",
        "Media": {
            "MediaFileUri": "https://ceos-videos.s3.amazonaws.com/audio/zuora_ceo_transforming_the_subscription_economy_mad_money_cnbc.mp3"
        },
        "StartTime": 1615314067.194,
        "CreationTime": 1615314067.168,
        "Settings": {
            "ShowSpeakerLabels": true,
            "MaxSpeakerLabels": 5
        }
    }
}
Please verify job status within AWS Transcribe Console
https://console.aws.amazon.com/transcribe/
$ aws transcribe list-transcription-jobs
{
    "TranscriptionJobSummaries": []
}
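
One possible explanation, offered as a guess rather than a confirmed diagnosis: the output above reports "TranscriptionJobStatus": "IN_PROGRESS" with a StartTime, which suggests the jobs were accepted. `aws transcribe list-transcription-jobs` without `--region` queries the CLI's default region, which may differ from the AWS_REGION set in the script, and the console likewise shows only the currently selected region. A small sketch for listing jobs in an explicit region (`list_jobs` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list job names and statuses in a given region,
# one job per line, as tab-separated text.
# Argument: AWS region (should match AWS_REGION in queue-transcribe.sh).
list_jobs() {
  local region="$1"
  aws transcribe list-transcription-jobs \
    --region "$region" \
    --query 'TranscriptionJobSummaries[].[TranscriptionJobName,TranscriptionJobStatus]' \
    --output text
}
```

Calling `list_jobs "us-east-1"` (or whichever region the script was configured with) should show the jobs if the region mismatch is indeed the cause.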


Not creating unique job names

Hi,
The script does not create unique names for different jobs, so each job is rejected.

Error:

************************************************************
Detected Object: audio/23andme_ceo_empowering_health_mad_money_cnbc.mp3
audio/23andme_ceo_empowering_health_mad_money_cnbc.mp3
audio/23andme_ceo_empowering_health_mad_money_cnbc.mp3
Key matches media format of mp3... creating transcribe job

An error occurred (ConflictException) when calling the StartTranscriptionJob operation: The requested job name already exists. Use a different job name.
************************************************************
Detected Object: audio/2u_ceo_buying_trilogy_and_doubling_our_reach_mad_money_cnbc.mp3
audio/2u_ceo_buying_trilogy_and_doubling_our_reach_mad_money_cnbc.mp3
audio/2u_ceo_buying_trilogy_and_doubling_our_reach_mad_money_cnbc.mp3
Key matches media format of mp3... creating transcribe job

An error occurred (ConflictException) when calling the StartTranscriptionJob operation: The requested job name already exists. Use a different job name.
************************************************************
Detected Object: audio/3m_ceo_addressing_softness_in_china_autos_and_electronics_mad_money_cnbc.mp3
audio/3m_ceo_addressing_softness_in_china_autos_and_electronics_mad_money_cnbc.mp3
audio/3m_ceo_addressing_softness_in_china_autos_and_electronics_mad_money_cnbc.mp3
Key matches media format of mp3... creating transcribe job

An error occurred (ConflictException) when calling the StartTranscriptionJob operation: The requested job name already exists. Use a different job name.
************************************************************
Detected Object: audio/3m_ceo_inge_thulin_secrets_to_relevance_mad_money_cnbc.mp3
audio/3m_ceo_inge_thulin_secrets_to_relevance_mad_money_cnbc.mp3
audio/3m_ceo_inge_thulin_secrets_to_relevance_mad_money_cnbc.mp3
Key matches media format of mp3... creating transcribe job

Can you please suggest edits?

Thanks
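
A possible workaround, sketched under assumptions: judging by the job names in the first issue's output, the name appears to be "transcribe-" plus the file name without its extension, so running the script a second time over the same files would reuse names that already exist. Appending a per-run suffix such as a timestamp avoids the clash. `make_job_name` below is a hypothetical helper, not code from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a job name from an S3 key plus a suffix.
# Arguments: S3 object key, suffix (e.g. a timestamp).
make_job_name() {
  local key="$1" suffix="$2"
  local base="${key##*/}"   # strip the folder path
  base="${base%.*}"         # strip the file extension
  # Replace anything outside the allowed set [0-9a-zA-Z._-] with "_".
  base="$(printf '%s' "$base" | tr -c '0-9a-zA-Z._-' '_')"
  printf 'transcribe-%s-%s\n' "$base" "$suffix"
}

# Example: a timestamp suffix makes the name differ between runs.
make_job_name "audio/23andme_ceo_empowering_health_mad_money_cnbc.mp3" "$(date +%Y%m%d%H%M%S)"
```

The resulting value would be passed as `--transcription-job-name` to `aws transcribe start-transcription-job`. Job names are limited to 200 characters, so very long file names may need truncating. Alternatively, existing jobs can be removed with `aws transcribe delete-transcription-job --transcription-job-name <name>` before rerunning.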
