Voice recognition software design

Job Description

I would like to make a program that does the following things:
1. Be hosted on a web server, with a UI that lets the user enter the URL of a video (YouTube only is acceptable, but supporting other sites is preferred).
2. Download that video to the server and transcribe its speech using voice recognition.
3. Use the text generated from the video's speech to index the video, making it searchable by keyword (my suggested tools: .NET (Microsoft), youtube-dl and ffmpeg).
4. Present the finished product (the indexed video) on a dynamically generated page. When the user searches for a keyword, the page marks each point on the timeline where the keyword is mentioned and shows a thumbnail pop-out of the video at that point.
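
To make steps 3 and 4 concrete, here is a minimal sketch of the keyword index behind the timeline markers. It is written in Python only for illustration; the same logic ports directly to .NET. It assumes the speech recognizer returns the transcript as timestamped segments of (start time in seconds, text). All names (build_index, search, timeline_positions, thumbnail_command) are hypothetical. The index is a plain inverted index from each word to the sorted start times of the segments that mention it. A marker's position on the timeline is that time as a percentage of the video's duration, and the pop-out thumbnail can be grabbed with ffmpeg by seeking to that time and writing a single frame.

```python
import re
from collections import defaultdict

# Assumed recognizer output: (start_seconds, text) per spoken segment.
Segment = tuple[float, str]


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits and apostrophes,
    # so "Video," and "video" index as the same word.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: list[Segment]) -> dict[str, list[float]]:
    # Inverted index: word -> sorted start times of segments containing it.
    index: dict[str, list[float]] = defaultdict(list)
    for start, text in segments:
        for word in set(tokenize(text)):  # record each word once per segment
            index[word].append(start)
    for times in index.values():
        times.sort()
    return dict(index)


def search(index: dict[str, list[float]], keyword: str) -> list[float]:
    # Case-insensitive single-word lookup; unknown words give no hits.
    return index.get(keyword.strip().lower(), [])


def timeline_positions(times: list[float], duration: float) -> list[float]:
    # Marker offsets as a percentage of the timeline's width.
    return [round(100.0 * t / duration, 2) for t in times]


def thumbnail_command(video_path: str, t: float, out_path: str) -> list[str]:
    # ffmpeg arguments: seek to t seconds, then write exactly one video frame.
    return ["ffmpeg", "-ss", f"{t:.3f}", "-i", video_path,
            "-frames:v", "1", out_path]


if __name__ == "__main__":
    segments = [
        (0.0, "Welcome to the video."),
        (12.5, "Today we talk about ffmpeg."),
        (47.0, "More on ffmpeg and the video index."),
    ]
    index = build_index(segments)
    hits = search(index, "FFmpeg")
    print(hits, timeline_positions(hits, duration=200.0))
    print(thumbnail_command("talk.mp4", hits[0], "thumb.jpg"))
```

Running the example prints `[12.5, 47.0] [6.25, 23.5]`, so the page would place markers at 6.25% and 23.5% of the timeline. The command list can be passed to `subprocess.run` on the server. Searching for phrases, or handling stemming and misrecognized words, would need more than this single-word lookup.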

The desktop user interface should look something like the attachment. Once this is developed, mobile integration will be the next project.

