Speaker Identification | Challenges and Solutions

March 07, 2021 - By SIFS India

The main challenges in speaker recognition arise from variation across the different categories of phonation and acoustic features.

The issue that comes up most often in speaker recognition research is channel mismatch.

Channel mismatch is not limited to differences in recording devices; it also covers network quality and characteristics, noise, and the speaker's health and stress-related conditions.

Another issue is ensuring that the training data covers the different kinds of phonation well enough, so that performance holds up when a speaker uses any of them.

Handling several speakers whose speech may overlap is another of the most important problems in this field.

Different Styles of Phonation

The different types of phonation are:

Unvoiced Phonation

There are two types of unvoiced phonation. Zero (nil) phonation corresponds to zero energy. Breath phonation occurs when a turbulent air stream passes through relaxed vocal folds. By contrast, when the vocal folds vibrate at regular intervals, they excite resonances in the upper part of the vocal tract; this is called normal voiced sound.

Voiced Phonation

There are two types of voiced phonation. Laryngealization is produced when the arytenoid cartilages hold the rear part of the vocal folds steady, leaving the front part free to vibrate. Falsetto is produced by unnaturally tightening the vocal folds to reach an artificially high pitch.
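
The distinction between zero phonation, aperiodic (breath-like) phonation, and periodic voiced phonation can be illustrated with a rough frame classifier. The sketch below is only an illustration, not a method described in this article: the function names, the 8 kHz sample rate, the 60-400 Hz pitch search range, and both thresholds are assumptions chosen for the example. It labels a frame "nil" when its energy is close to zero, "voiced" when the normalized autocorrelation shows a strong periodic peak, and "unvoiced" otherwise.

```python
import math
import random


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def periodicity(frame, min_lag, max_lag):
    """Largest autocorrelation value over the candidate pitch lags,
    normalized by the total frame energy (0 means no periodicity)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def classify_frame(frame, sample_rate=8000,
                   silence_energy=1e-6, voiced_threshold=0.5):
    """Label a frame as 'nil', 'voiced', or 'unvoiced'.
    The thresholds are illustrative assumptions, not tuned values."""
    if frame_energy(frame) < silence_energy:
        return "nil"
    min_lag = sample_rate // 400  # shortest pitch period searched (400 Hz)
    max_lag = sample_rate // 60   # longest pitch period searched (60 Hz)
    if periodicity(frame, min_lag, max_lag) >= voiced_threshold:
        return "voiced"
    return "unvoiced"


# Synthetic 50 ms frames standing in for real recordings.
sample_rate = 8000
silence = [0.0] * 400
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in range(400)]

for name, frame in [("silence", silence), ("tone", tone), ("noise", noise)]:
    print(name, classify_frame(frame, sample_rate))
```

A classifier this simple cannot separate breath phonation from whispered speech, since both are aperiodic; it only shows why periodicity is the cue that distinguishes voiced from unvoiced sound.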

Whispered Phonation

Whispered phonation is produced when the speaker articulates as if producing voiced speech, except that the vocal folds are more relaxed, which creates a more turbulent air stream than in voiced speech. The vocal folds are not, however, relaxed enough to produce an unvoiced utterance.

In general, whispered speech data is very scarce. In a few languages, however, such as Amerindian languages and some ancient languages, whispered sounds exist that function independently of their voiced counterparts.

The vocal characteristics of the speaker are affected by the partial relaxation of the vocal folds. It is not easy to identify a speaker without sufficient whispered speech data.

Speaker Identification Challenges

Speech in Stress-Related Conditions: Stress significantly affects phonation, and a speaker's utterances change in many ways under its influence. Stress changes the relative importance of specific frequency bands, making it MFCC

Extracting the Useful Source From Multiple Sources of Speech: The multiple-source problem has been addressed for the case where the sources are semi-stationary. The main objective is to extract the useful source from several audio sources while reducing interference from the others. Localizing the speakers is the most significant step in this method.

Some researchers adopt an HMM-based method to separate the acoustic transfer function, which makes it easier to separate the sources in mixed data recorded with a single microphone.

Channel Mismatch: Channel mismatch is a significant issue in speaker identification. Early approaches to this challenge concentrated on feature normalization. Many researchers have used normalization methods, which provide good coverage.
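
The article does not name a specific normalization method. One widely used choice is cepstral mean and variance normalization (CMVN), sketched below as an assumption rather than as the approach the cited researchers used. The idea is that a fixed linear channel shows up approximately as a constant offset added to every cepstral feature vector, so subtracting the per-utterance mean removes it. The function name `cmvn` and the toy feature values are hypothetical.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance
    across the frames of one utterance.
    `frames` is a list of equal-length feature vectors (lists of floats)."""
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_dims)
    ]
    # eps guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(n_dims)]
        for f in frames
    ]


# Toy 2-dimensional "cepstral" features for one utterance (made-up values).
clean = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]

# The same utterance through a different channel, modeled here as a
# constant offset on every frame (an idealized assumption).
channel_offset = [10.0, -3.0]
shifted = [[x + o for x, o in zip(f, channel_offset)] for f in clean]

norm_clean = cmvn(clean)
norm_shifted = cmvn(shifted)

# After normalization the two versions should agree up to rounding error.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(norm_clean, norm_shifted)
    for a, b in zip(fa, fb)
)
print("largest difference after CMVN:", max_diff)
```

Real channels also add noise and time-varying effects that a per-utterance mean and variance cannot remove, so this kind of normalization reduces mismatch rather than eliminating it.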

Some researchers have also studied low-tech spoofing attempts. In one type, a far-field microphone is used to record the victim's voice, and the recording is then played back. In the other, segments of short recordings are spliced together, which is useful against text-dependent speaker recognition systems.

Intentional Cheating: Disguise is also a challenge in speaker identification, but it does not have much effect on speaker identification systems.

Quality of the Speech Data: Using good-quality microphones improves the results obtained from voice samples.

Length of the Phonation Sample: It is hypothesized that the length of the sample affects speaker identification performance.
