This document contains recommendations on how to provide speech data to the various SoapBox speech solutions. Following these guidelines improves efficiency and accuracy and helps keep response times from the service reasonable. A SoapBox speech solution works best when the data sent to the service is within the parameters described in this document.

For optimal results:

  • Capture audio with a sampling rate of 16,000 Hz or higher.

  • Use a lossless codec to record and transmit audio. WAV is recommended.

  • Position the microphone as close to the user as possible, particularly when background noise is present. Our technology is designed to ignore background voices and noise without additional noise-canceling, but a close microphone still gives the best results.

  • If you are capturing audio from more than one person, and each person is recorded on a separate channel, send each channel separately to get the best recognition results. If all speakers are mixed in a single-channel recording, send the recording as is.

If possible, avoid:

  • Lower sampling rates, which may reduce accuracy. Also avoid re-sampling the audio.

  • Lossy codecs. Audio should be recorded as WAV.

  • Excessive background noise and echoes, which may reduce accuracy, especially if a lossy codec is also used.

  • Multiple people talking at the same time or at different volumes. Their speech may be interpreted as background noise and ignored.
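
The recommendation to send each speaker's channel separately can be sketched with Python's standard `wave` module. This is a minimal illustration, assuming uncompressed PCM WAV input; the function name `split_channels` is hypothetical and is not part of any SoapBox API.

```python
import io
import wave
from typing import List


def split_channels(wav_bytes: bytes) -> List[bytes]:
    """Split a multi-channel PCM WAV into one mono WAV per channel.

    Assumption: the input is uncompressed PCM, where the samples of all
    channels are interleaved frame by frame.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = sample_width * n_channels
    outputs = []
    for channel in range(n_channels):
        # Collect this channel's sample from every interleaved frame.
        mono = bytearray()
        for start in range(channel * sample_width, len(frames), frame_size):
            mono += frames[start:start + sample_width]

        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(sample_width)
            dst.setframerate(frame_rate)  # keep the original rate, no re-sampling
            dst.writeframes(bytes(mono))
        outputs.append(buffer.getvalue())
    return outputs
```

Each element of the returned list is a complete mono WAV file that keeps the original sampling rate and sample width, so it can be sent to the service as its own recording.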

Sampling rate

Set the sampling rate of the audio source to 16,000 Hz.
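
Before sending a recording, it can help to confirm what sampling rate it actually has. The sketch below reads the header of a WAV file with Python's standard `wave` module and compares it with the 16,000 Hz value named above. The function name `describe_wav` and the returned keys are hypothetical, chosen only for illustration.

```python
import io
import wave

TARGET_RATE_HZ = 16000  # sampling rate named in this document


def describe_wav(wav_bytes: bytes) -> dict:
    """Report the basic format of a PCM WAV and whether it is 16,000 Hz."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "is_target_rate": rate == TARGET_RATE_HZ,
        }
```

If `is_target_rate` is false, the better fix is usually to change the capture settings of the audio source, since this document advises against re-sampling.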

Frame size

Streaming recognition recognizes live audio as it is captured from a microphone or other audio source. The audio stream is split into frames and sent in consecutive StreamingRecognizeRequest messages. Any frame size is acceptable. Larger frames are more efficient, but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.
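
To make the 100-millisecond recommendation concrete, the sketch below computes the frame size in bytes and splits a raw audio buffer into frames. It assumes 16-bit mono PCM at 16,000 Hz, which is an assumption of this example and not something the service is stated to require. The helper names are hypothetical, and building the actual StreamingRecognizeRequest messages is deliberately left out.

```python
from typing import Iterator


def frame_size_bytes(sample_rate_hz: int = 16000,
                     frame_ms: int = 100,
                     sample_width_bytes: int = 2,  # assumption: 16-bit samples
                     channels: int = 1) -> int:    # assumption: mono audio
    """Return the number of bytes in one frame of raw PCM audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * sample_width_bytes * channels


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive frames; the last one may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# 16,000 samples/s * 0.1 s * 2 bytes = 3200 bytes per frame
print(frame_size_bytes())  # prints 3200
```

Each yielded frame would become the audio payload of one streaming request. Raising `frame_ms` means fewer, larger messages at the cost of added latency, which is the tradeoff described above.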

Audio pre-processing

Although the service is designed to handle noisy audio, it is best to provide audio that is as clean as possible by using a good-quality, well-positioned microphone.

For best results:

  • Position the microphone as close as possible to the person who is speaking, particularly when background noise is present.

  • Avoid audio clipping.

  • Listen to some sample audio. It should sound clear, without distortion or unexpected noise.
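
Listening to samples is the most reliable check, but clipping can also be estimated numerically. The sketch below is one possible approach, assuming raw 16-bit little-endian PCM samples: it counts how many samples sit at the minimum or maximum representable value. The function name `clipped_fraction` is hypothetical, and what fraction counts as a problem is left to the reader.

```python
import struct


def clipped_fraction(pcm16: bytes) -> float:
    """Return the fraction of 16-bit samples at full scale.

    Assumption: the input is raw little-endian signed 16-bit PCM
    (no WAV header). A trailing odd byte, if any, is ignored.
    """
    usable = len(pcm16) - (len(pcm16) % 2)
    total = usable // 2
    if total == 0:
        return 0.0
    clipped = sum(
        1
        for (sample,) in struct.iter_unpack("<h", pcm16[:usable])
        if sample in (32767, -32768)  # full-scale values of a 16-bit sample
    )
    return clipped / total
```

A result that is clearly above zero suggests the input gain is too high or the microphone is too close to a loud source. A few isolated full-scale samples can occur in normal audio.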
