Reference - gRPC
To analyze your audio, four programming languages are supported:
- Python: https://github.com/cochlearai/sense-python
- Node.js: https://github.com/cochlearai/sense-nodejs
- Android: https://github.com/cochlearai/sense-java
- Dart: https://github.com/cochlearai/sense-dart
The file represents an object that can run inference on audio from an audio file. An audio file is any source of audio data whose duration is known at runtime. The server waits for the whole file to be received before it begins inferencing, and all inferenced data is returned in a single payload. A file can be, for instance, an mp3 file stored locally or a WAV file accessible from a URL. Currently, WAV, FLAC, mp3, Ogg, and mp4 files are supported. If you want to use another file encoding format, let us know at firstname.lastname@example.org so that we can prioritize it in our internal roadmap. A file object is created using a FileBuilder object, following the builder pattern.
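To make the builder pattern concrete, here is a minimal sketch of how a FileBuilder-style object could work. This is an illustration only: the class and method names (`with_data_type`, `with_reader`, `build`) are assumptions modeled on the pattern described above, not the actual sense SDK API.

```python
# Illustrative sketch of the builder pattern; not the real sense SDK API.

class File:
    """Represents a fully-received audio file ready for one-shot inference."""
    def __init__(self, data_type, reader):
        self.data_type = data_type
        self.reader = reader


class FileBuilder:
    """Accumulates configuration, then builds an immutable File object."""
    def __init__(self):
        self._data_type = None
        self._reader = None

    def with_data_type(self, data_type):
        self._data_type = data_type
        return self  # returning self enables method chaining

    def with_reader(self, reader):
        self._reader = reader
        return self

    def build(self):
        # The builder validates its configuration before producing the object.
        if self._data_type is None or self._reader is None:
            raise ValueError("data type and reader are both required")
        return File(self._data_type, self._reader)
```

A file would then be created with a chain such as `FileBuilder().with_data_type("mp3").with_reader(some_reader).build()`; the builder keeps the constructed `File` object simple and fully configured.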
The stream represents an object that can run inference on audio from an audio stream. An audio stream is any source of audio data whose duration is not known when inferencing begins. Because the duration is not known, the server inferences the audio in real time. The minimum audio duration is 1 second, so 1 second of audio is required before the first result is given. After that, one result is given for every 0.5 seconds of audio. The stream can be stopped at any moment during inferencing. A stream can be, for instance, audio data coming from a microphone or from a web radio.

For now, the only format supported for streaming is a raw data stream (PCM stream). Raw data must be a mono-channel audio stream, and its format (int32 / float32 / int64 / float64) must be given to describe the raw audio data. For best performance, we recommend a sampling rate of 22050 Hz and data represented as float32; any other sampling rates and data types will also work out of the box. Inferencing a stream returns multiple results, and a callback is called for each result. If you want to use another streaming format, let us know at email@example.com so that we can prioritize it in our internal roadmap. A stream object is created using a StreamBuilder object, following the builder pattern.
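The result cadence described above (first result after 1 second of audio, then one per 0.5 seconds) can be sketched as a small helper. The function name and defaults are illustrative, not SDK constants; the default values simply encode the documented timing.

```python
def result_times(audio_seconds, first_at=1.0, every=0.5):
    """Return the audio timestamps (in seconds) at which stream results
    would be emitted, per the documented cadence: the first result needs
    1 second of audio, and one result follows every additional 0.5 s."""
    times = []
    t = first_at
    while t <= audio_seconds + 1e-9:  # small epsilon guards float drift
        times.append(round(t, 3))
        t += every
    return times
```

For example, a 2-second stream would yield results at 1.0 s, 1.5 s, and 2.0 s, while a stream shorter than 1 second produces no results at all, matching the minimum-duration rule above.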
The result is a class returned by both the file and stream classes. A stream returns multiple results by calling a callback function; a file returns only one result. A Result object has the following methods (case can change depending on the programming language):
- service(): returns the service name, for instance “Human Interaction”, “Emergency”, or “Human Status”
- allEvents(): returns all events
- detectedEvents(): returns all events that match the “filter” function defined below
- detectedEventTiming(): groups events that match the “filter” function and shows the segments of time in which they were detected
- detectedTags(): returns only the “tag” of the event that matches the “filter” function
- toJSON(): returns a raw JSON object containing service name and an array of events
- withFilter(filter): uses a filter function, which takes an event as an input and returns a boolean. An event will be “detected” if the filter function returns true for that event
- useDefaultFilter(): the default filter is to consider all events as detected. So by default, allEvents() and detectedEvents() will return the same result
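The methods above can be sketched in a minimal, self-contained Result class. The method names follow the documentation (converted to Python snake_case), but the implementation and the event representation (plain dicts) are assumptions for illustration, not the actual SDK internals.

```python
class Result:
    """Minimal sketch of the Result interface described above."""

    def __init__(self, service, events):
        self._service = service
        self._events = events
        # Default filter: consider every event as detected.
        self._filter = lambda event: True

    def service(self):
        return self._service

    def all_events(self):
        return list(self._events)

    def with_filter(self, filter_fn):
        # filter_fn takes an event and returns a boolean.
        self._filter = filter_fn
        return self

    def use_default_filter(self):
        self._filter = lambda event: True
        return self

    def detected_events(self):
        return [e for e in self._events if self._filter(e)]

    def detected_tags(self):
        return [e["tag"] for e in self.detected_events()]

    def to_json(self):
        return {"service": self._service, "events": list(self._events)}
```

For example, `result.with_filter(lambda e: e["probability"] > 0.5).detected_tags()` would keep only high-probability tags, while the default filter makes `all_events()` and `detected_events()` return the same list.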
Note that if you are inferencing a stream, multiple results will be returned. By default, calling allEvents() returns only the newly inferenced result. It is possible to keep track of previous events of the stream: to do so, call the withMaxEventsHistorySize method on the StreamBuilder object. Its default value is 0, and increasing it allows the stream to “remember” previous events.
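The history-size behaviour can be sketched with a bounded buffer: with size 0 (the default) only the newest result's events are visible, while a larger size also keeps that many previous results. The class and method names here are assumptions used for illustration only.

```python
from collections import deque

class EventHistory:
    """Sketch of withMaxEventsHistorySize semantics (names are assumed)."""

    def __init__(self, max_history_size=0):
        # deque(maxlen=N) silently drops the oldest batch beyond N.
        self._previous = deque(maxlen=max_history_size)
        self._latest = []

    def push(self, events):
        """Record the events of a newly inferenced result."""
        if self._latest:
            self._previous.append(self._latest)
        self._latest = events

    def all_events(self):
        """Remembered previous events (oldest first), then the newest."""
        return [e for batch in self._previous for e in batch] + self._latest
```

With the default size of 0, each new result replaces the previous one; with a size of 2, the two most recent past results are retained alongside the newest.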
An event contains the following data: (case can change depending on the language)
- tag: a string describing the event. A list of tags is available [here]
- startTime: the time at which the event starts (the origin of time is the beginning of the audio)
- endTime: the time at which the event ends (measured from the same origin, the beginning of the audio)
- probability: a value between 0 and 1. The higher the probability, the higher the likelihood that the event happened. For more details about this data, please contact our team at firstname.lastname@example.org
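Putting the fields above together, an event might look like the following dict, here paired with a typical probability-threshold filter of the kind passed to withFilter. The exact wire representation, the field casing, and the tag value are assumptions for illustration.

```python
# Hypothetical event payload matching the fields listed above.
event = {
    "tag": "Gunshot",      # a string describing the event
    "start_time": 3.0,     # seconds from the beginning of the audio
    "end_time": 4.0,       # seconds from the same origin
    "probability": 0.87,   # likelihood, between 0 and 1
}

def confident(event, threshold=0.5):
    """A typical filter: detect only events above a probability threshold."""
    return event["probability"] >= threshold
```

Passing a function like `confident` as the filter would make detectedEvents() and detectedTags() return only the events the model is reasonably sure about.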