I want to develop an app that recognizes what a user says and converts it to text. We can choose the speech-to-text library together. If possible, we also want to store the audio along with the transcribed text.
The text will be sent to a server. The app will also receive text from the server, which is to be converted to speech and played through the phone's speaker.
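
To make the intended flow concrete, here is a minimal, library-agnostic sketch. It is written in Python purely for illustration; the real app would use whatever language and speech library we settle on for the phone platform. Every name in it (`SpeechToText`, `TextToSpeech`, `VoicePipeline`, `send_to_server`, `play`, and the `Echo*` stand-ins) is hypothetical and not taken from any real library. It also assumes that audio arrives as raw bytes and that sending and receiving text are independent steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol, Tuple


class SpeechToText(Protocol):
    """Hypothetical interface for whichever speech-to-text library is chosen."""
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface for whichever text-to-speech engine is chosen."""
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    stt: SpeechToText
    tts: TextToSpeech
    send_to_server: Callable[[str], None]  # assumed transport, e.g. HTTP or WebSocket
    play: Callable[[bytes], None]          # assumed hook into the phone's speaker
    stored: List[Tuple[bytes, str]] = field(default_factory=list)

    def handle_utterance(self, audio: bytes) -> str:
        """Outgoing path: audio -> text, keep both, send the text to the server."""
        text = self.stt.transcribe(audio)
        self.stored.append((audio, text))  # store the audio alongside its text
        self.send_to_server(text)
        return text

    def handle_server_text(self, text: str) -> None:
        """Incoming path: text from the server -> speech -> speaker."""
        self.play(self.tts.synthesize(text))


# Stand-ins so the sketch runs without any real speech engine.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretends the audio bytes are already text


class EchoTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")   # pretends the text bytes are audio
```

Keeping the recognizer and synthesizer behind small interfaces like these would let us swap libraries later without touching the server or storage code.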