Diarization #130

ggerganov · 2022-11-08T18:31:02Z

Some unsuccessful experiments with audio embedding clustering

Tried applying fuzzy C-means clustering to:

embeddings after the initial convolution in the encoder
self KV embeddings from each encoder layer
KQV embeddings from each encoder layer
embeddings from the last encoder layer
cross KV embeddings of each decoder layer
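
For readers unfamiliar with the method, here is a minimal sketch of fuzzy C-means in plain Python. It is not the code from this branch. The function name `fuzzy_c_means` and the parameters `m` (fuzziness exponent), `n_iter`, `tol` and `seed` are illustrative assumptions. Each input point stands in for one embedding vector (for example, one audio frame).

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative fuzzy C-means: returns (centers, memberships).

    points      -- list of equal-length lists of floats (one per embedding)
    memberships -- one row per point, one column per cluster, rows sum to 1
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points weighted by membership^m.
        centers = []
        for c in range(n_clusters):
            w = [u[i][c] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / wsum
                 for d in range(dim)]
            )

        # Memberships are proportional to distance^(-2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[c][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for c in range(n_clusters)
            ]
            inv = [dd ** (-2.0 / (m - 1.0)) for dd in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            max_change = max(
                max_change,
                max(abs(a - b) for a, b in zip(new_row, u[i])),
            )
            u[i] = new_row

        # Stop once no membership moved by more than tol.
        if max_change < tol:
            break

    return centers, u
```

For diarization, one would call this with `n_clusters` set to the expected number of speakers and take the highest-membership cluster per frame as a tentative speaker label. The soft memberships also indicate how ambiguous each frame is.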

Instead of clustering the full embedding dimensions, first reduce dimensionality using SVD:

decompose the embeddings E = USV
compute singular vectors U' = US
project E on U' and take the top few coordinates
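
One way to read these steps, assuming the rows of E are the per-frame embedding vectors (n frames, d dimensions) and that V in the decomposition above denotes the already-transposed factor (written V^T in the more common convention below):

```latex
\begin{aligned}
E &= U S V^{\top}, \qquad E \in \mathbb{R}^{n \times d} \\
E V &= U S V^{\top} V = U S \\
Z_k &= E V_k = U_k S_k \in \mathbb{R}^{n \times k}
\end{aligned}
```

Under that reading, U' = US already holds the coordinates of every frame in the basis of right singular vectors, so "taking the top few coordinates" amounts to keeping the first k columns of US (the symbols Z_k, U_k, S_k, V_k and k are notation introduced here, not in the original description). Clustering then runs on the n × k matrix Z_k instead of the full n × d embeddings.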

diarization : some unsuccessful experiments with audio embd clustering

c2f5be7

ggerganov force-pushed the diarization branch from 643bb17 to c2f5be7 Compare February 18, 2023 10:17

ggerganov added 3 commits February 18, 2023 18:36

diarization : more unsuccessful clustering experiments

d5d7769

diarization : try to cluster embeddings from last encoder layer

d11f359

diarization : try conv and self-attention embeddings

ec44ad0

ggerganov force-pushed the diarization branch from 657741b to ec44ad0 Compare February 19, 2023 11:00

akashmjn mentioned this pull request Jun 30, 2023

whisper : support speaker segmentation (local diarization) of mono audio via tinydiarize #1058

Merged

7 tasks
