Reference - gRPC

To analyze your audio, four different programming languages are supported:


The file represents an object that can run inference on audio from an audio file. An audio file is any source of audio data whose duration is known at runtime. The server waits for the whole file to be received before it begins inferencing, and all inferenced data is returned in a single payload. A file can be, for instance, an mp3 file stored locally or a WAV file accessible from a URL. Currently, WAV, FLAC, mp3, Ogg, and mp4 files are supported. If you want to use another file encoding format, let us know so that we can prioritize it in our internal roadmap.
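As a sketch of how the file workflow described above might look from client code (the class and method names here are illustrative assumptions, not the actual SDK API):

```python
# Illustrative sketch only: "File" and "FileResult" are assumed names,
# not the real SDK API. It mirrors the flow described above: the whole
# file is sent, and a single result payload comes back.
class FileResult:
    """Single result payload returned once the whole file is processed."""
    def __init__(self, events):
        self._events = events

    def allEvents(self):
        return self._events


class File:
    """An audio source whose duration is known at runtime."""
    SUPPORTED_FORMATS = {"wav", "flac", "mp3", "ogg", "mp4"}

    def __init__(self, path):
        ext = path.rsplit(".", 1)[-1].lower()
        if ext not in self.SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {ext}")
        self.path = path

    def inference(self):
        # In the real client this would upload the whole file, wait for the
        # server, and return one payload; here it returns an empty result.
        return FileResult(events=[])
```

The key contract is that `inference()` blocks until the server has seen the entire file, then returns exactly one result.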


The stream represents an object that can run inference on audio from an audio stream. An audio stream is any source of audio data whose duration is not known when inferencing begins. Because the duration is not known, the server inferences the audio in real time. The minimum audio duration is 1 second, so 1 second of audio is required before the first result is given; after that, one result is given for every additional 0.5 seconds of audio. The stream can be stopped at any moment during inferencing. A stream can be, for instance, audio data coming from a microphone or from a web radio.

For now, the only format supported for streaming is a raw data stream (PCM stream). Raw data must be a mono-channel audio stream, and its format (int32 / float32 / int64 / float64) must be given to describe the raw audio data. For best performance, we recommend a sampling rate of 22050 Hz and data represented as float32; any other sampling rates and data types are fine and will also work out of the box. If you want to use another encoding format, let us know so that we can prioritize it in our internal roadmap.

Inferencing a stream returns multiple results, and a callback is called for each result. A stream object is created using a StreamBuilder object, following the builder pattern.
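A minimal sketch of the builder pattern and the result cadence described above (1 second of audio before the first result, then one result per additional 0.5 seconds). All names besides StreamBuilder are assumptions, not the real SDK API:

```python
# Illustrative sketch: "withSamplingRate", "withDataType", "build" and
# "push" are assumed names. It models only the timing rule: the first
# result needs 1 s of audio, then one result per extra 0.5 s.
class Stream:
    def __init__(self, sampling_rate, data_type, on_result):
        self.sampling_rate = sampling_rate
        self.data_type = data_type
        self.on_result = on_result          # callback, called per result
        self._buffered_seconds = 0.0
        self._results_emitted = 0

    def push(self, seconds_of_audio):
        """Feed a chunk of audio, firing the callback when results are due."""
        self._buffered_seconds += seconds_of_audio
        # First result after 1.0 s, then one per additional 0.5 s.
        while self._buffered_seconds >= 1.0 + 0.5 * self._results_emitted:
            self._results_emitted += 1
            self.on_result({"result_index": self._results_emitted})


class StreamBuilder:
    def __init__(self):
        self._rate = 22050        # recommended default
        self._dtype = "float32"   # recommended default

    def withSamplingRate(self, rate):
        self._rate = rate
        return self

    def withDataType(self, dtype):
        self._dtype = dtype
        return self

    def build(self, on_result):
        return Stream(self._rate, self._dtype, on_result)
```

Because every builder method returns the builder itself, configuration chains fluently: `StreamBuilder().withSamplingRate(22050).withDataType("float32").build(callback)`.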


The result is a class returned by both the file and stream classes. A stream returns multiple results by calling a callback function; a file returns only one result. A Result object has the following methods (case can change depending on the programming language):

service(): returns the service name, e.g. "Human Interaction", "Emergency", or "Human Status"
allEvents(): returns all events
detectedEvents(): returns all events that match the "filter" function defined below
detectedEventTiming(): groups the events that match the "filter" function and shows the segments of time in which they were detected
detectedTags(): returns only the "tag" of each event that matches the "filter" function
toJSON(): returns a raw JSON object containing the service name and an array of events
withFilter(filter): uses a filter function, which takes an event as input and returns a boolean. An event is "detected" if the filter function returns true for that event
useDefaultFilter(): the default filter considers all events as detected, so by default allEvents() and detectedEvents() return the same result
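A minimal sketch of how the methods above fit together, in assumed Python-style naming (the real class is provided per language, so casing and signatures may differ):

```python
# Illustrative sketch of the Result semantics described above; not the
# real SDK class. Events are modeled as plain dicts for brevity.
import json

class Result:
    def __init__(self, service, events):
        self._service = service
        self._events = events
        self.useDefaultFilter()

    def service(self):
        return self._service

    def allEvents(self):
        return self._events

    def withFilter(self, filter_fn):
        # filter_fn takes an event and returns a boolean.
        self._filter = filter_fn
        return self

    def useDefaultFilter(self):
        # Default filter: every event counts as detected.
        self._filter = lambda event: True
        return self

    def detectedEvents(self):
        return [e for e in self._events if self._filter(e)]

    def detectedTags(self):
        return [e["tag"] for e in self.detectedEvents()]

    def toJSON(self):
        return json.dumps({"service": self._service, "events": self._events})
```

With the default filter, allEvents() and detectedEvents() return the same list; installing a custom filter only narrows what the detected* methods report.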
Note that if you are inferencing a stream, multiple results will be returned. By default, calling allEvents() only returns the newly inferenced events. It is possible to keep track of previous events of the stream: to do so, call the withMaxEventsHistorySize method on the StreamBuilder object. Its default value is 0, and increasing it allows the stream to "remember" previous events.
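One way to picture the history-size behavior (this is a guess at the semantics, with assumed names, not the SDK's implementation): with a size of 0 each result carries only the newly inferenced events, while a larger size also carries up to that many remembered earlier events.

```python
# Sketch of the assumed event-history behavior behind
# withMaxEventsHistorySize; names and semantics are illustrative.
from collections import deque

class EventHistory:
    def __init__(self, max_history_size=0):
        # maxlen bounds how many past events are "remembered".
        self._history = (deque(maxlen=max_history_size)
                         if max_history_size > 0 else None)

    def next_result_events(self, new_events):
        """Events that allEvents() would return for the next result."""
        if self._history is None:
            return list(new_events)           # default: new events only
        combined = list(self._history) + list(new_events)
        self._history.extend(new_events)      # remember for later results
        return combined
```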


An event contains the following data (case can change depending on the language):

tag: a string describing the event
startTime: the time at which the event starts (the origin of time is the beginning of the audio)
endTime: the time at which the event ends (the origin of time is the beginning of the audio)
probability: a value between 0 and 1; the higher the probability, the more likely it is that the event happened
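The event structure above can be sketched as a small data class (assumed Python-style field names and an example tag; real field casing varies by language):

```python
# Illustrative sketch of the event fields described above.
from dataclasses import dataclass

@dataclass
class Event:
    tag: str            # string describing the event ("siren" is a made-up example)
    start_time: float   # seconds from the beginning of the audio
    end_time: float     # seconds from the beginning of the audio
    probability: float  # between 0 and 1; higher means more likely
```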

For more details about this data, please contact our team at