Libraries

To analyze your audio, four different programming languages are supported:

You can find all our clients and more detailed documentation on GitHub:

All clients are built the same way and share four important classes:

File

File represents an object that can run inference on audio coming from an audio file.

An audio file is any source of audio data whose duration is known at runtime. The server waits for the whole file to be received before starting inference, and all results are returned in a single payload.

A file can be, for instance, an MP3 file stored locally, a WAV file accessible from a URL, etc.

So far, WAV, FLAC, MP3, Ogg, and MP4 are supported.

If you want to use another file encoding format, let us know at support@cochl.ai so that we can prioritize it in our internal roadmap.

A file object is created using a FileBuilder object, following the builder pattern.
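As an illustration of the builder pattern mentioned above, a FileBuilder might be used as sketched below. All class, method, and parameter names here (with_api_key, with_file_path, with_format, build) are assumptions for illustration only, not the actual client API:

```python
# Illustrative sketch of the builder pattern for creating a file object.
# Names are assumptions, not the real Cochl client API.

class File:
    def __init__(self, api_key, path, fmt):
        self.api_key = api_key
        self.path = path
        self.format = fmt


class FileBuilder:
    def __init__(self):
        self._api_key = None
        self._path = None
        self._format = "wav"

    def with_api_key(self, key):
        self._api_key = key
        return self  # returning self enables method chaining

    def with_file_path(self, path):
        self._path = path
        return self

    def with_format(self, fmt):
        # Only the formats listed in the documentation are accepted.
        if fmt not in ("wav", "flac", "mp3", "ogg", "mp4"):
            raise ValueError(f"unsupported format: {fmt}")
        self._format = fmt
        return self

    def build(self):
        if self._api_key is None or self._path is None:
            raise ValueError("api key and file path are required")
        return File(self._api_key, self._path, self._format)


audio_file = (FileBuilder()
              .with_api_key("YOUR_API_KEY")
              .with_file_path("siren.wav")
              .with_format("wav")
              .build())
```

Each `with_…` call returns the builder itself, which is what makes the chained-call style of the builder pattern possible.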

Stream

Stream represents an object that can run inference on audio coming from an audio stream.

An audio stream is any source of audio data whose duration is not known when inference begins. Because the duration is not known, the server runs inference on the audio as it arrives. The client's minimum audio duration is 1 second, so 1 second of audio is required before the first result is given. After that, one result is given for every 0.5 seconds of audio.
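The timing described above (first result after 1 second, then one every 0.5 seconds) means the result timestamps for a stream of known length can be computed as follows; this is a small illustration of the schedule, not part of the client library:

```python
def result_times(duration_s):
    """Return the audio timestamps (in seconds) at which stream results
    are emitted: the first after 1 s, then one every 0.5 s."""
    times = []
    t = 1.0
    while t <= duration_s:
        times.append(t)
        t += 0.5  # one result per additional 0.5 s of audio
    return times

# A 3-second stream yields results at 1.0, 1.5, 2.0, 2.5 and 3.0 s.
```

So an n-second stream produces roughly 2(n - 1) + 1 results, and anything shorter than 1 second produces none.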

A stream can be stopped at any moment of inferencing.

A stream can be, for instance, audio data coming from a microphone, audio data coming from a web radio, etc.

For now, the only format supported for streaming is a raw data stream (PCM stream). The raw data sent has to be a mono-channel audio stream, and its format (int32 / float32 / int64 / float64) has to be provided to describe the raw audio data.

For best performance, we recommend a sampling rate of 22050 Hz and data represented as float32.

Any other sampling rate or data type is fine and will also work out of the box.
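To make "raw mono float32 PCM at 22050 Hz" concrete, the snippet below generates one second of a 440 Hz sine wave and packs it into float32 bytes, the kind of buffer a stream would send. The little-endian byte order is an assumption for illustration; check your client's documentation for the exact layout it expects:

```python
import math
import struct

SAMPLE_RATE = 22050  # recommended sampling rate
DURATION_S = 1.0
FREQ_HZ = 440.0

# One second of a mono 440 Hz sine wave as float samples in [-1.0, 1.0].
samples = [
    math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE)
    for n in range(int(SAMPLE_RATE * DURATION_S))
]

# Pack as little-endian float32: 4 bytes per sample, single channel.
pcm_bytes = struct.pack(f"<{len(samples)}f", *samples)
```

One second of mono float32 audio at 22050 Hz is therefore 22050 × 4 = 88200 bytes.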

Multiple results are returned when inferencing a stream; a callback is called for each result.

If you want to use another streaming format, let us know at support@cochl.ai so that we can prioritize it in our internal roadmap.

A stream object is created using a StreamBuilder object, following the builder pattern.
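A StreamBuilder could be used along the lines of the sketch below, configuring the sampling rate, the raw data type, and the per-result callback. All names here (with_sampling_rate, with_data_type, on_result, push, stop) are assumptions for illustration, not the actual client API:

```python
# Illustrative sketch of a stream built via the builder pattern.
# Names are assumptions, not the real Cochl client API.

class Stream:
    def __init__(self, sampling_rate, data_type, callback):
        self.sampling_rate = sampling_rate
        self.data_type = data_type
        self._callback = callback
        self._stopped = False

    def push(self, result):
        # In a real client, raw PCM would be sent to the server and the
        # callback invoked once per inferenced result.
        if not self._stopped:
            self._callback(result)

    def stop(self):
        # A stream can be stopped at any moment of inferencing.
        self._stopped = True


class StreamBuilder:
    def __init__(self):
        self._rate = 22050       # recommended default
        self._dtype = "float32"  # recommended default
        self._callback = None

    def with_sampling_rate(self, rate):
        self._rate = rate
        return self

    def with_data_type(self, dtype):
        if dtype not in ("int32", "float32", "int64", "float64"):
            raise ValueError(f"unsupported data type: {dtype}")
        self._dtype = dtype
        return self

    def on_result(self, callback):
        self._callback = callback
        return self

    def build(self):
        return Stream(self._rate, self._dtype, self._callback)


received = []
stream = (StreamBuilder()
          .with_sampling_rate(22050)
          .with_data_type("float32")
          .on_result(received.append)
          .build())
stream.push("result-1")
stream.stop()
stream.push("result-2")  # ignored: the stream was stopped
```

The callback-per-result design fits streaming naturally: results arrive as audio does, so there is no single return value to wait for.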

Result

Result is a class returned by both the File and Stream classes.

A stream returns multiple results by calling a callback function; a file returns only one result.

A Result object has the following methods (casing can change depending on the programming language):

  • service(): returns the service name: “human-interaction”, “emergency”, or “human-status”, for instance
  • allEvents(): returns all events
  • detectedEvents(): returns all events that match the “filter” function defined below
  • detectedEventTiming(): groups events that match the “filter” function and shows the time segments in which they were detected
  • detectedTags(): returns only the “tag” of the events that match the “filter” function
  • toJSON(): returns a raw JSON object containing the service name and an array of events
  • withFilter(filter): sets a filter function: that function takes an event as input and returns a boolean. An event is “detected” if the filter function returns true for it
  • useDefaultFilter(): the default filter considers all events as detected, so by default allEvents() and detectedEvents() return the same result

Note that if you are inferencing a stream, multiple results are returned. By default, calling allEvents() returns only the newly inferenced events. It’s possible to keep track of previous events of the stream: to do so, call the withMaxEventsHistorySize method on the StreamBuilder object. Its default value is 0, and increasing it allows the stream to “remember” previous events.
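A minimal sketch of the Result semantics described above: the default filter accepts every event, and setting a filter changes what detectedEvents() and detectedTags() return. The method names mirror the list above (snake_cased); the implementation and event shape are assumptions for illustration:

```python
# Minimal sketch of Result behavior; not the actual client code.

class Result:
    def __init__(self, service, events):
        self._service = service
        self._events = events
        self._filter = lambda event: True  # default filter: everything is detected

    def service(self):
        return self._service

    def all_events(self):
        return list(self._events)

    def with_filter(self, filter_fn):
        self._filter = filter_fn

    def use_default_filter(self):
        self._filter = lambda event: True

    def detected_events(self):
        return [e for e in self._events if self._filter(e)]

    def detected_tags(self):
        return [e["tag"] for e in self._events if self._filter(e)]


events = [
    {"tag": "siren", "probability": 0.9},
    {"tag": "speech", "probability": 0.3},
]
result = Result("emergency", events)

# With the default filter, all_events() and detected_events() agree.
assert result.detected_events() == result.all_events()

# With a custom filter, only matching events are "detected".
result.with_filter(lambda e: e["probability"] >= 0.5)
```

After setting the probability filter, detected_tags() returns only the tags of events that pass it.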

Event

An event contains the following data (casing can change depending on the language):

  • tag: a string describing the event. The list of tags is available here
  • startTime: the time at which the event starts (the origin of time is the beginning of the audio)
  • endTime: the time at which the event ends (the origin of time is the beginning of the audio)
  • probability: a value between 0 and 1. The higher the probability, the more likely the event happened. If you want to understand this value more precisely, please contact our team at support@cochlear.ai.