Speaker Recognition

Speaker recognition identifies a speaker from their voice characteristics. Once you create voice tags, Cochl.Sense detects and distinguishes the tagged speakers, so each voice is labeled during transcription. Cochl.Sense also converts speech to text with speaker identification and timestamps, which is especially useful for meetings and interviews. You can export the transcription and its details as a single JSON file.

Step 1: Create a new project


Click the New project button. Enter your project name and select Cloud API as the project type. Click Create Project to add your project.

NOTE: The speaker recognition feature is only available for Cloud API projects.

Create project

Step 2: Create a voice tag


After creating your project, click on it to view the details. Navigate to the Speaker Recognition tab and you will see two options: Record and Upload.

  • Record: Create a new voice tag by recording audio.
  • Upload: Create a new voice tag by uploading existing audio files.

Speaker recognition

Click the Record button to open the recording pop-up. Enter a name for your voice tag and click the red record button to begin recording. Follow the prompts to read several sentences aloud. Once finished, click the Create tag button to save your voice tag.

To use existing recordings instead, click the Upload button and enter a name for your voice tag. Drag and drop, or browse to select, 5 to 20 audio files. Once the files are uploaded, click the Create tag button to complete the process.

Create voice tag - uploading
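
Because the upload option expects between 5 and 20 audio files, it can help to check a folder before dragging it into the browser. The following is a minimal sketch of such a check. The function name `collect_voice_tag_files` and the list of file extensions are illustrative assumptions, not part of Cochl.Sense; only the 5-to-20 limit comes from the step above.

```python
from pathlib import Path

# Assumption: these extensions are illustrative only; check which audio
# formats your Cochl.Sense project actually accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}

# Limits stated in the upload step above.
MIN_FILES, MAX_FILES = 5, 20


def collect_voice_tag_files(folder):
    """Return the sorted audio files in `folder`.

    Raises ValueError if the number of audio files is outside 5 to 20.
    """
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if not MIN_FILES <= len(files) <= MAX_FILES:
        raise ValueError(
            f"Found {len(files)} audio files; "
            f"the upload expects {MIN_FILES} to {MAX_FILES}."
        )
    return files
```

Calling `collect_voice_tag_files("my_recordings")` (a hypothetical folder name) either returns the list of files to upload or reports how many were found, so you can add or remove recordings before creating the tag.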

Step 3: Check the results


Go to the Upload tab. Click the Cochl.Sense API + Conversation Transcription button. Drag and drop or browse to upload your audio file.

Select conversation transcription

Cochl.Sense performs speaker recognition and returns a transcription with speaker IDs, timestamps, and other details.

To export the result as a JSON file, click the View JSON button in the top-right corner.

You can download the result in three formats: a combined version, the Cochl.Sense API result only, or the transcription only.

JSON output
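
Once you have downloaded the transcription, a short script can regroup it by speaker. The sketch below is illustrative only: the field names `speaker`, `start`, `end`, and `text`, and the flat list layout, are hypothetical and are not the documented Cochl.Sense JSON schema. Inspect your exported file and adjust the key names to match.

```python
import json
from collections import defaultdict


def group_text_by_speaker(segments, speaker_key="speaker", text_key="text"):
    """Collect utterances per speaker, in the order they appear."""
    grouped = defaultdict(list)
    for segment in segments:
        grouped[segment[speaker_key]].append(segment[text_key])
    return dict(grouped)


# Hypothetical sample standing in for the transcription part of an export.
# The real export may use different keys and nesting.
sample = json.loads("""
[
  {"speaker": "Alice", "start": 0.0, "end": 2.1, "text": "Shall we begin?"},
  {"speaker": "Bob", "start": 2.3, "end": 4.0, "text": "Yes, go ahead."},
  {"speaker": "Alice", "start": 4.2, "end": 6.5, "text": "First item: the budget."}
]
""")

for speaker, lines in group_text_by_speaker(sample).items():
    print(f"{speaker}: {' '.join(lines)}")
```

The key names are passed as parameters so the same function can be pointed at whatever fields the actual export uses, for example after loading it with `json.load`.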