You can use this code as a base for real-time transcription of a phone call using the Google Speech-to-Text API.
An audio stream is sent over a websocket connection to your server and then relayed to the Google streaming interface. Speech recognition is performed and the resulting text is printed to the console.
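The relay described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names `buildStreamingRequest` and `relayFrame` are hypothetical, and it assumes the call audio arrives as 16-bit linear PCM at 16 kHz and that the websocket also carries occasional non-audio text frames that should not be forwarded.

```javascript
// Sketch only: helper names are hypothetical, not taken from this project.

// Build the configuration for a Google streaming recognition session.
// Assumption: call audio is 16-bit linear PCM sampled at 16 kHz.
function buildStreamingRequest(languageCode = 'en-US') {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode,
    },
    interimResults: false, // only report final transcripts
  };
}

// Forward one websocket frame to the recognition stream.
// Binary frames are treated as audio; text frames (metadata) are skipped.
// `sink` is any object with a write(chunk) method, such as a writable stream.
function relayFrame(frame, isBinary, sink) {
  if (!isBinary) {
    return false;
  }
  sink.write(frame);
  return true;
}
```

In a real server, `sink` would be the writable stream returned by the Google client's streaming call, and each recognition result received from it would be logged to the console.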
You will need to set up a Google Cloud project and service account. Once these steps are completed, you will have a downloaded JSON credentials file that is used to set up the rest of the project. You will need this file before using the Deploy to Heroku button. If you plan to run this locally, make sure the file is saved in the project folder.
In order to run this on Heroku, you will need to gather the following information:
API_KEY- This is the API key from your Nexmo Account.
API_SECRET- This is the API secret from your Nexmo Account.
GOOGLE_CLIENT_EMAIL- You can find this in the
GOOGLE_PRIVATE_KEY- You can find this in the
-----BEGIN PRIVATE KEY-----\nXXXXXXXXX\n-----END PRIVATE KEY-----\n
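The private key above is written with literal `\n` sequences rather than real line breaks. If the value is stored that way in an environment variable, code that consumes it usually has to convert those sequences back into newlines. The sketch below shows one way to do that; the helper name `normalizePrivateKey` is hypothetical, and whether this app performs the conversion itself is an assumption.

```javascript
// Sketch only: `normalizePrivateKey` is a hypothetical helper.
// Converts literal backslash-n sequences (as stored in a config var)
// into real newline characters, which PEM-formatted keys require.
function normalizePrivateKey(raw) {
  return raw.replace(/\\n/g, '\n');
}

// Example value in the same shape as the placeholder shown above.
const storedKey =
  '-----BEGIN PRIVATE KEY-----\\nXXXXXXXXX\\n-----END PRIVATE KEY-----\\n';
const pemKey = normalizePrivateKey(storedKey);
```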
Deploying to Heroku will create a new Nexmo application and phone number to test with. View the logs to see the transcription response from the service. You can do this in the Heroku dashboard, or with the Heroku CLI using heroku logs -t.
You will need to create a new Nexmo application in order to work with this app:
Install the CLI by following these instructions. Then create a new Nexmo application that also sets up the answer_url and event_url for the app running locally on your machine.
nexmo app:create google-speech-to-text http://<your_hostname>/ncco http://<your_hostname>/event
This will return an application ID. Make a note of it.
If you don't have a number already in place, you will need to buy one. This can also be achieved using the CLI by running this command:
Finally, link your new number to the application you created by running:
nexmo link:app YOUR_NUMBER YOUR_APPLICATION_ID
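For context, the answer URL registered above (/ncco) is expected to return an NCCO that tells Nexmo to connect the call audio to your server over a websocket. A minimal sketch of such an NCCO follows. The `buildNcco` function and the `/socket` path are hypothetical, and the audio format is an assumption; the app's real route and settings may differ.

```javascript
// Sketch only: `buildNcco` and the `/socket` path are hypothetical.
// Returns an NCCO that connects the inbound call to a websocket endpoint.
function buildNcco(hostname) {
  return [
    {
      action: 'connect',
      endpoint: [
        {
          type: 'websocket',
          uri: `ws://${hostname}/socket`, // assumed websocket route on your server
          'content-type': 'audio/l16;rate=16000', // assumed: 16-bit PCM at 16 kHz
        },
      ],
    },
  ];
}

// The /ncco handler would send this array back as JSON.
const nccoJson = JSON.stringify(buildNcco('example.com'));
```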
To run this on your machine you'll need an up-to-date version of Node.
Start by installing the dependencies with:
Then copy the .env.example file to a new file called .env:
cp .env.example .env
Edit the .env file to add in your application ID and the location of the credentials file from Google.
GOOGLE_APPLICATION_CREDENTIALS=./google_creds.json
APP_ID="12345678-aaaa-bbbb-4321-1234567890ab"
LANG_CODE="en-US"
If you are not working in en-US, change LANG_CODE to any of the other supported languages listed in the Google Speech to Text API documentation.
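As an illustration of how these settings might be consumed, the sketch below reads the three variables from an environment object, fails early if a required one is missing, and falls back to en-US when LANG_CODE is unset. The `loadConfig` function, the fallback behavior, and the choice of which variables are required are all assumptions for the example, not the app's documented behavior.

```javascript
// Sketch only: `loadConfig` is a hypothetical helper.
// Reads settings from an environment-like object (defaults to process.env).
function loadConfig(env = process.env) {
  // Assumption: these two settings are mandatory for a local run.
  const required = ['GOOGLE_APPLICATION_CREDENTIALS', 'APP_ID'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(', ')}`);
  }
  return {
    credentialsPath: env.GOOGLE_APPLICATION_CREDENTIALS,
    appId: env.APP_ID,
    langCode: env.LANG_CODE || 'en-US', // assumed default language
  };
}
```

Calling `loadConfig()` once at startup surfaces a missing APP_ID or credentials path immediately instead of on the first incoming call.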
To run the app using Docker run the following command in your terminal:
This will create a new image with all the dependencies and run it at http://localhost:3000.