updated the stt streaming code to track and calculate message latency #6
We've needed to update our STT latency scripts for a while. This code is meant to do that. It tracks two types of STT latency: interim result latency and EOT latency.
Interim result latency is measured for `interim_result=true` messages or (for Flux) for Update messages. The code performs the typical `audio cursor - transcript cursor` calculation.

How this script measures EOT latency:
When an EOT message is received (speech_final, is_final, UtteranceEnd, EndOfTurn, or EagerEndOfTurn), the calculation is simple: find the prior interim_result / Update message and subtract the `received` times. The result is the amount of wall-clock time it took to receive an EOT signal after the prior interim result was received; put another way, it is how long the EOT took to trigger after Deepgram finished processing the most recent non-EOT message.

There are better ways to calculate EOT latency, but all of them (to my knowledge) require ground truth timestamps and careful labeling of the audio. Since we don't have that information, I believe the current calculation is a reasonable approximation.
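
For reviewers who want the two calculations in one place, here is a minimal sketch of both. It is not the code in this PR: the `Message` record, its field names (`kind`, `received`, `audio_cursor`, `transcript_cursor`), and the `"interim"` / `"eot"` labels are hypothetical stand-ins for however the script actually stores each message.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record shape; these field names are assumptions for illustration.
@dataclass
class Message:
    kind: str                       # "interim" (interim result / Update) or "eot"
    received: float                 # wall-clock arrival time, in seconds
    audio_cursor: float = 0.0       # seconds of audio sent when the message arrived
    transcript_cursor: float = 0.0  # seconds of audio the transcript covers


def interim_latency(msg: Message) -> float:
    """Interim result latency: audio cursor minus transcript cursor."""
    return msg.audio_cursor - msg.transcript_cursor


def eot_latencies(messages: List[Message]) -> List[float]:
    """EOT latency: EOT arrival time minus the prior interim arrival time."""
    latencies: List[float] = []
    last_interim: Optional[Message] = None
    for msg in messages:
        if msg.kind == "interim":
            last_interim = msg
        elif msg.kind == "eot" and last_interim is not None:
            latencies.append(msg.received - last_interim.received)
    return latencies
```

In this sketch, an EOT message that arrives before any interim message is skipped, since there is no prior `received` time to subtract; whether the real script does the same is an assumption.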
Below is a snippet of the output from the `print_transcript.py` script for a Flux transcript:

And below are the same sections for a Nova transcript:
You'll notice that the format of the messages/transcript is similar for Flux and Nova, and that the summarized latency data at the end reflects the significant improvements that Flux provides, particularly for EOT latency.