I’m looking for input on designing and implementing an audio transcription pipeline with Amazon Transcribe, EC2, S3, and Lambda. I’m a computer science student in my last full-time semester. I’m working on my technical writing project and hope to use it when I interview with tech companies in May. The paper I’m writing has a survey component for gathering input from industry professionals. Any and all input is welcome.
I’m trying to plan an audio transcription pipeline for use in a call center to transcribe call audio and perform natural language analysis.
I am planning to deploy a Django or Flask (Python) application to an EC2 instance, with a front end written in React. This would serve as an interface for uploading audio to S3. I’d like the S3 upload to trigger an AWS Lambda function that starts an Amazon Transcribe job on the new audio object. A second Lambda function would then trigger when the transcription job finishes and send the transcript to another S3 bucket. How can I make this design better?
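To make the first Lambda step more concrete, here is a rough sketch of what I have in mind, not a finished implementation. It assumes the function is wired to an S3 "object created" notification, that uploaded keys end in a supported audio extension such as .wav or .mp3, and that the calls are in US English. The helper name build_job_request is just my own placeholder, and the optional transcribe argument is only there so the handler can be exercised without real AWS credentials.

```python
import re
from urllib.parse import unquote_plus


def build_job_request(event):
    """Turn an S3 'object created' event into StartTranscriptionJob arguments."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])

    # Job names may only contain letters, digits, '.', '_' and '-'.
    # Assumption: keys are unique, so the sanitized key is a unique job name.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:200]

    # Assumption: the file extension matches the actual media format.
    media_format = key.rsplit(".", 1)[-1].lower()

    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format,
        "LanguageCode": "en-US",  # assumption: US English calls
    }


def lambda_handler(event, context, transcribe=None):
    """Lambda entry point: start a transcription job for the uploaded object."""
    if transcribe is None:
        # Imported lazily so the helper above can be tested without boto3.
        import boto3

        transcribe = boto3.client("transcribe")

    request = build_job_request(event)
    transcribe.start_transcription_job(**request)
    return {"job": request["TranscriptionJobName"]}
```

The second Lambda would follow the same pattern, reading the finished job and copying the transcript to the output bucket. I have left it out here because I am less sure how that trigger should be set up.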
Am I doing too much with the Python application? Are there any AWS services that would make this portion obsolete? How would you go about implementing this idea with AWS services? This is the first AWS project that I’ve planned on my own, and it is very much a learning experience for me. Thank you all for your input.