Getting Started

1. Prepare Python Environment


Cochl.Sense Cloud API can be easily integrated into any Python application using the Cochl library. The library supports Python versions 3.8 or higher. Please make sure you’re using a compatible version of Python. First, create a Python virtual environment.

macOS / Linux:

python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install --upgrade cochl

git clone https://github.com/cochlearai/cochl-sense-py.git
cd cochl-sense-py/samples

Windows:

python -m venv venv
.\venv\Scripts\activate
pip install --upgrade pip
pip install --upgrade cochl

git clone https://github.com/cochlearai/cochl-sense-py.git
cd cochl-sense-py/samples
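To confirm that the active interpreter meets the 3.8+ requirement and that the cochl package is installed, you can run a quick check like the one below inside the virtual environment. This is a minimal sketch that uses only the standard library; `check_environment` is a hypothetical helper name, and it only reads installed-package metadata without importing or calling the Cochl library. Note that `importlib.metadata` itself requires Python 3.8, so on an older interpreter the import fails, which is also a sign that you need to upgrade.

```python
import sys
from importlib import metadata

# Minimum Python version stated in this guide.
MIN_PYTHON = (3, 8)


def check_environment(min_python=MIN_PYTHON):
    """Hypothetical helper: return (python_ok, cochl_version).

    cochl_version is the installed distribution's version string,
    or None if the `cochl` package is not installed.
    """
    python_ok = sys.version_info[:2] >= tuple(min_python)
    try:
        cochl_version = metadata.version("cochl")
    except metadata.PackageNotFoundError:
        cochl_version = None
    return python_ok, cochl_version


if __name__ == "__main__":
    ok, version = check_environment()
    print(f"Python {sys.version.split()[0]}:", "OK" if ok else "too old (need 3.8+)")
    print("cochl:", version or "not installed")
```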

2. File Sample


  • File samples can also be found here.
  • Supported file formats for the Cochl.Sense Cloud API: MP3, WAV, and OGG.

If a file is not in a supported format, you must convert it manually. See “Convert to Supported File Formats” under Additional Notes below for details.
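Before uploading, you may want to catch unsupported files early. The sketch below is a hypothetical pre-upload check (`is_supported_format` is not part of the cochl library) that compares the file extension against the formats listed above. It only looks at the extension and does not inspect the file contents, so a mislabeled file would still pass.

```python
from pathlib import Path

# Formats accepted by the Cochl.Sense Cloud API, as listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg"}


def is_supported_format(path):
    """Hypothetical pre-upload check based only on the file extension."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["your_file.wav", "clip.OGG", "sample.mp4"]:
    print(name, "->", "supported" if is_supported_format(name) else "convert first")
```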

The following minimal setup is enough to upload a file and get predictions. Replace “YOUR_API_PROJECT_KEY” with your API project key.

import cochl.sense as sense

client = sense.FileClient("YOUR_API_PROJECT_KEY")

results = client.predict("your_file.wav")
print(results.to_dict())  # get results as a dict

# {
#     'session_id': 'df1637ab-5478-455c-bff8-c7b90ff215c2',
#     'window_results': [
#         {
#             'start_time': 0.0,
#             'end_time': 1.0,
#             'sound_tags': [
#                 {'name': 'Gunshot', 'probability': 0.578891396522522},
#                 {'name': 'Gunshot_single', 'probability': 0.578891396522522},
#             ],
#         },
#         {
#             'start_time': 0.5,
#             'end_time': 1.5,
#             'sound_tags': [
#                 {'name': 'Others', 'probability': 0.0}
#             ],
#         },
#         {
#             'start_time': 1.0,
#             'end_time': 2.0,
#             'sound_tags': [
#                 {'name': 'Others', 'probability': 0.0}
#             ],
#         },
#     ]
# }
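Once you have the dictionary from `results.to_dict()`, you can post-process it with plain Python. The sketch below assumes the layout shown in the sample output above (`window_results`, each with `start_time`, `end_time`, and a `sound_tags` list of `name`/`probability` entries) and collects every tag at or above a threshold. `detections` is a hypothetical helper and the 0.5 default threshold is arbitrary; the data is an abbreviated copy of the sample output.

```python
def detections(result_dict, min_probability=0.5):
    """Return (start_time, end_time, name, probability) tuples for every
    sound tag at or above min_probability.

    Assumes the dict layout shown in the sample output above.
    """
    found = []
    for window in result_dict["window_results"]:
        for tag in window["sound_tags"]:
            if tag["probability"] >= min_probability:
                found.append(
                    (window["start_time"], window["end_time"], tag["name"], tag["probability"])
                )
    return found


# Abbreviated copy of the sample output above; in practice use results.to_dict().
sample = {
    "session_id": "df1637ab-5478-455c-bff8-c7b90ff215c2",
    "window_results": [
        {"start_time": 0.0, "end_time": 1.0, "sound_tags": [
            {"name": "Gunshot", "probability": 0.578891396522522},
            {"name": "Gunshot_single", "probability": 0.578891396522522},
        ]},
        {"start_time": 0.5, "end_time": 1.5, "sound_tags": [{"name": "Others", "probability": 0.0}]},
        {"start_time": 1.0, "end_time": 2.0, "sound_tags": [{"name": "Others", "probability": 0.0}]},
    ],
}

for start, end, name, prob in detections(sample):
    print(f"{start:.1f}-{end:.1f}s {name} ({prob:.2f})")
```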

You can adjust the custom settings (window hop, sensitivity control, etc.) as shown below.

import cochl.sense as sense

api_config = sense.APIConfig(
    window_hop=sense.WindowHop.HOP_1s,
    sensitivity=sense.SensitivityConfig(
        default=sense.SensitivityScale.LOW,
        by_tags={
            "Baby_cry": sense.SensitivityScale.VERY_LOW,
            "Gunshot":  sense.SensitivityScale.HIGH,
        },
    ),
)

client = sense.FileClient(
    "YOUR_API_PROJECT_KEY",
    api_config=api_config,
)

results = client.predict("your_file.wav")
print(results.to_dict())  # get results as a dict
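The window hop controls how far each analysis window advances. In the sample output from the default configuration earlier, consecutive 1-second windows start 0.5 s apart; with `sense.WindowHop.HOP_1s` they would presumably start 1 s apart instead. The stdlib-only sketch below lists the window boundaries a clip would produce for a given hop. It is an illustration under stated assumptions (1-second windows, and only windows that end within the clip are listed); `window_bounds` is a hypothetical name, and the sketch does not describe how the service handles a final partial window.

```python
def window_bounds(duration_s, hop_s=0.5, window_s=1.0):
    """List (start, end) pairs for fixed-length windows advanced by hop_s.

    Assumptions for illustration: windows are 1 s long (as in the sample
    output) and only windows that end within the clip are listed.
    """
    if hop_s <= 0 or window_s <= 0:
        raise ValueError("hop_s and window_s must be positive")
    bounds = []
    k = 0
    while True:
        # Multiply instead of accumulating to avoid floating-point drift.
        start = round(k * hop_s, 6)
        end = round(start + window_s, 6)
        if end > duration_s + 1e-9:
            break
        bounds.append((start, end))
        k += 1
    return bounds


print(window_bounds(2.0, hop_s=0.5))  # default 0.5 s hop
print(window_bounds(2.0, hop_s=1.0))  # hop comparable to WindowHop.HOP_1s
```

A smaller hop produces more overlapping windows (and therefore more results per second of audio), while a 1 s hop yields non-overlapping windows.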

The file prediction results can be displayed in a summarized format.

# print(results.to_dict())  # get results as a dict

print(results.to_summarized_result(
    interval_margin=2,
    by_tags={"Baby_cry": 5, "Gunshot": 3}
))  # get results in a simplified format

# At 0.0-1.0s, [Baby_cry] was detected

For more details about the custom settings mentioned above, please refer to the Advanced configurations section.

3. Stream Sample


Raw PCM audio stream data can be processed as shown below, which gives an overview of how to use the Cochl.Sense Cloud API with streaming data. Replace “YOUR_API_PROJECT_KEY” with your API project key.

import cochl.sense as sense

# when audio is sampled at 22,050 Hz and each sample is a 32-bit little-endian float (f32le)
SENSE_DATA_TYPE = sense.AudioDataType.F32
SENSE_ENDIAN = sense.AudioEndian.LITTLE
SAMPLE_RATE = 22050

audio_type = sense.StreamAudioType(
    data_type=SENSE_DATA_TYPE,
    endian=SENSE_ENDIAN,
    sample_rate=SAMPLE_RATE,
)
client = sense.StreamClient(
    "YOUR_API_PROJECT_KEY",
    audio_type=audio_type,
)

# put `bytes` data into the StreamBuffer;
# pop() returns an audio window that is ready for prediction
buffer = client.get_buffer()
your_audio_stream_data = ...  # `bytes` type data
buffer.put(your_audio_stream_data)
if buffer.is_ready():
    audio_window = buffer.pop()
    result = client.predict(audio_window)
    print(result)


# {'start_time': 0.0, 'end_time': 1.0, 'sound_tags': []}
# {'start_time': 0.5, 'end_time': 1.5, 'sound_tags': []}
# {'start_time': 1.0, 'end_time': 2.0, 'sound_tags': [{'name': 'Speech', 'probability': 0.18024994432926178}, {'name': 'Male_speech', 'probability': 0.18024994432926178}]}
# {'start_time': 1.5, 'end_time': 2.5, 'sound_tags': []}
# {'start_time': 2.0, 'end_time': 3.0, 'sound_tags': [{'name': 'Clap', 'probability': 0.8431069254875183}]}
# {'start_time': 2.5, 'end_time': 3.5, 'sound_tags': [{'name': 'Clap', 'probability': 0.6679767370223999}]}
# ...
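The buffer's job is to turn an arbitrary stream of bytes into fixed-size, overlapping windows. At 22,050 Hz with 4-byte f32 samples, one second of mono audio is 22,050 × 4 = 88,200 bytes. The sketch below is a hypothetical stand-in (`WindowBuffer` is not the cochl `StreamBuffer`) that mimics the `put` / `is_ready` / `pop` pattern above, assuming mono audio, 1-second windows, and a 0.5 s hop as in the sample output. It also shows how to produce raw f32le bytes with the standard `struct` module, using a generated test tone instead of a real audio source.

```python
import math
import struct

SAMPLE_RATE = 22050   # matches the example above
BYTES_PER_SAMPLE = 4  # one f32 sample is 4 bytes


def sine_f32le(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Generate a test tone as raw mono f32le bytes ('<' = little-endian)."""
    n = int(seconds * sample_rate)
    samples = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    return struct.pack(f"<{n}f", *samples)


class WindowBuffer:
    """Hypothetical stand-in for the buffer returned by client.get_buffer().

    It collects raw bytes and releases window_s-long windows, advancing by
    hop_s each time. This is NOT the cochl implementation.
    """

    def __init__(self, sample_rate=SAMPLE_RATE, window_s=1.0, hop_s=0.5):
        self.window_bytes = int(sample_rate * window_s) * BYTES_PER_SAMPLE
        self.hop_bytes = int(sample_rate * hop_s) * BYTES_PER_SAMPLE
        self._data = bytearray()

    def put(self, chunk):
        self._data.extend(chunk)

    def is_ready(self):
        return len(self._data) >= self.window_bytes

    def pop(self):
        window = bytes(self._data[: self.window_bytes])
        del self._data[: self.hop_bytes]  # keep the overlap for the next window
        return window


buffer = WindowBuffer()
# Feed 2 s of audio in roughly 0.25 s chunks, as a live source might.
chunk_bytes = int(0.25 * SAMPLE_RATE) * BYTES_PER_SAMPLE
audio = sine_f32le(440.0, 2.0)
windows = []
for offset in range(0, len(audio), chunk_bytes):
    buffer.put(audio[offset : offset + chunk_bytes])
    while buffer.is_ready():
        windows.append(buffer.pop())  # with the real client: client.predict(window)

print(len(windows), "windows of", len(windows[0]), "bytes")
```

Two seconds of audio yield three windows here (0-1 s, 0.5-1.5 s, 1-2 s), matching the spacing of the sample output; leftover bytes stay in the buffer until more audio arrives.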

The ‘stream_sample.py’ file in the samples directory shows a more detailed example using PyAudio. This sample requires PyAudio to be installed:

macOS:

brew install portaudio
pip install pyaudio

Linux:

sudo apt install portaudio19-dev
pip install pyaudio

Windows:

pip install pyaudio

NOTE: Stream results do not support the summarized format, because they are produced in real time.

4. Check Usage


You can review your usage on the Cochl.Sense Dashboard.


5. Additional Notes


(1) Convert to Supported File Formats (WAV, MP3, OGG)

Pydub is an easy way to convert audio files into supported formats (WAV, MP3, and OGG). First, install Pydub by following the instructions in this link. Then, write a Python script to convert your file into a supported format, as shown below.

from pydub import AudioSegment

# load the MP4 file, then export its audio as MP3
mp4_version = AudioSegment.from_file("sample.mp4", "mp4")
mp4_version.export("sample.mp3", format="mp3")

For more details on Pydub, please refer to this link.
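Pydub can read and write WAV in pure Python but needs ffmpeg or libav for formats such as MP3 and MP4. If your audio is already raw 16-bit PCM (for example, samples from a device or another library), Python's standard-library `wave` module can wrap it in a WAV file with no extra dependency. This is a swapped-in alternative to Pydub, sketched under the assumption of signed 16-bit integer samples; `pcm16_to_wav_bytes` is a hypothetical helper name, and the tone is generated test data.

```python
import io
import math
import struct
import wave


def pcm16_to_wav_bytes(samples, sample_rate=22050, channels=1):
    """Wrap signed 16-bit PCM samples in a WAV container and return the bytes.

    `samples` is a flat sequence of ints in [-32768, 32767]; for stereo,
    interleave left/right samples and pass channels=2.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


# One second of a 440 Hz tone at about half amplitude.
tone = [int(16000 * math.sin(2 * math.pi * 440 * i / 22050)) for i in range(22050)]
wav_bytes = pcm16_to_wav_bytes(tone)
# To save it for upload: open("tone.wav", "wb").write(wav_bytes)
print(len(wav_bytes), "bytes of WAV data")
```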

(2) Result Summary

You can summarize the file prediction results by aggregating consecutive windows; the summary reports the time span of each detected tag. The interval margin is a parameter that treats unrecognized windows between detections of a tag as part of the recognized ones, and it applies to all sound tags. To specify a different interval margin for specific sound tags, use the ‘by_tags’ option.

print(results.to_summarized_result(
    interval_margin=2,
    by_tags={"Baby_cry": 5, "Gunshot": 3}
))

# At 0.0-1.0s, [Baby_cry] was detected
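To make the interval margin concrete, here is a minimal sketch of the aggregation described above: the windows in which a tag appears are merged when the gap between them is no larger than the margin, with per-tag overrides taken from a `by_tags` dict. This is an illustration, not the library's `to_summarized_result` implementation. It assumes the margin is measured in seconds of gap (the library's unit may differ), skips the ‘Others’ tag seen in the sample output, and uses made-up input windows; `summarize` is a hypothetical name.

```python
from collections import defaultdict


def summarize(window_results, interval_margin=0.0, by_tags=None):
    """Merge the windows where each tag was detected into continuous spans.

    Two detections of the same tag are merged when the gap between them is
    at most the margin (assumed here to be in seconds). `by_tags` overrides
    the margin for specific tags.
    """
    by_tags = by_tags or {}
    spans = defaultdict(list)
    for window in window_results:
        for tag in window["sound_tags"]:
            if tag["name"] == "Others":  # assumption: 'Others' means no detection
                continue
            spans[tag["name"]].append((window["start_time"], window["end_time"]))

    lines = []
    for name in sorted(spans):
        margin = by_tags.get(name, interval_margin)
        merged = []
        for start, end in sorted(spans[name]):
            if merged and start <= merged[-1][1] + margin:
                merged[-1][1] = max(merged[-1][1], end)  # extend the current span
            else:
                merged.append([start, end])  # start a new span
        lines += [f"At {s:.1f}-{e:.1f}s, [{name}] was detected" for s, e in merged]
    return lines


# Hypothetical input: Gunshot detected in two windows with an undetected gap between them.
windows = [
    {"start_time": 0.0, "end_time": 1.0, "sound_tags": [{"name": "Gunshot", "probability": 0.58}]},
    {"start_time": 0.5, "end_time": 1.5, "sound_tags": [{"name": "Others", "probability": 0.0}]},
    {"start_time": 1.0, "end_time": 2.0, "sound_tags": [{"name": "Others", "probability": 0.0}]},
    {"start_time": 1.5, "end_time": 2.5, "sound_tags": [{"name": "Gunshot", "probability": 0.71}]},
]

print(summarize(windows))                          # two separate Gunshot spans
print(summarize(windows, by_tags={"Gunshot": 1}))  # gap bridged into one span
```

With no margin, the 0.5 s gap keeps the two detections apart; a margin of 1 for Gunshot bridges the gap into a single 0.0-2.5 s span, which is the effect the ‘by_tags’ option provides per tag.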