Monday, September 25, 2017

New answer on Quora: What is the need for speech signal processing?

A signal carries information, and speech carries a great deal of it.
A spoken sentence tells you much more than you can infer from reading the same sentence as text.
Attributes of speech include:
  1. Core information: What the speaker intends to convey (message).
  2. Gender: Female voices typically have a higher pitch (fundamental frequency) than male voices.
  3. Age: The voice deepens and crackles as a person ages.
  4. Timbre: Everyone has a unique voice, almost as unique as a fingerprint.
  5. Emotion: Angst, laughter, weeping, crying, etc.
  6. Intensity: Whisper, talk, shout, scream.
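Two of these attributes, pitch (behind the gender difference) and intensity, can be measured directly from a short frame of samples. The sketch below is a minimal illustration, not a production method. It assumes a mono signal stored as a list of floats in [-1, 1] at a known sample rate. Pitch is estimated with plain autocorrelation over an assumed 50 to 500 Hz search range, and intensity is reported as RMS level in dB relative to full scale. The function names `estimate_pitch` and `rms_dbfs` are hypothetical, and real pitch trackers add normalization, voicing detection and octave-error checks.

```python
import math

def rms_dbfs(frame):
    """Intensity as RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) by picking the lag with the
    largest autocorrelation inside the lag range matching [fmin, fmax].
    fmin/fmax are assumed bounds for adult speech, not fixed standards."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # No positive correlation (e.g. silence): report no pitch.
    return sample_rate / best_lag if best_lag else None

# Demo on a synthetic 200 Hz tone (0.1 s at 8 kHz) standing in for a voiced frame.
sr = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(estimate_pitch(frame, sr))   # 200.0
print(round(rms_dbfs(frame), 1))   # -9.0
```

A lower-pitched voice would show its autocorrelation peak at a longer lag, and a shout would raise the dB figure; the same two numbers, tracked frame by frame, are a common starting point for the gender, emotion and intensity cues listed above.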
All of the above can now be extracted and exploited by a computing device using signal processing methods. A few interesting applications are:
  • Speech recognition (speech to text): To identify what the speaker has said, essentially by converting it into text for further processing or storage. Examples are Siri and Google Assistant.
  • Speaker recognition (voice biometrics): To establish the identity of the speaker, for example to unlock a phone or start a car. This is different from speech recognition, but the two can be combined so that a device unlocks only when the right person says a ‘pass phrase’.
  • Speech coding: To store and transmit speech efficiently in digital form over a channel (internet calls, mobile networks, telephone cables, satellite links), using as little bandwidth as possible and with as few errors as possible.
  • Speech synthesis (text to speech): To produce speech artificially using systems that mimic the entire mechanism of speech production, from airflow out of the trachea and vibration of the vocal cords to the filtering effects of the oral and nasal cavities. Examples are Google Assistant and Microsoft Sam.
  • Voice analysis: To medically diagnose disorders of the human vocal system from a patient's voice samples.
  • Speech enhancement: To improve the quality of speech degraded by noise in applications such as teleconferencing, VoIP, mobile calls and hearing aids.
  • Voice morphing: To impersonate another individual’s voice using words spoken by you. Voice mimicry is one form of this performed by humans; now we are training computers to do the same. One day we may be able to regenerate Michael Jackson’s voice and songs perfectly while the lyrics are written and sung by someone else. An example from fiction is the voice ‘sticker’ used by Tom Cruise in the Mission: Impossible film series.
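Speech coding can be illustrated with μ-law companding, the logarithmic quantization that ITU-T G.711 telephony uses (in a segmented, table-based form) with μ = 255. The sketch below is a simplified illustration, not a bit-exact G.711 codec. It assumes samples normalized to [-1, 1], applies the continuous μ-law curve, and quantizes the result uniformly to 8 bits. The function names `mulaw_encode` and `mulaw_decode` are hypothetical.

```python
import math

MU = 255  # mu-law constant used in North American and Japanese telephony

def mulaw_encode(x, bits=8):
    """Compress a sample in [-1, 1] logarithmically, then quantize it
    uniformly to an integer code in [0, 2**bits - 1]."""
    x = max(-1.0, min(1.0, x))  # clip to the assumed full-scale range
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    levels = 2 ** bits
    return int(round((y + 1) / 2 * (levels - 1)))

def mulaw_decode(code, bits=8):
    """Map an integer code back to [-1, 1] and undo the compression."""
    levels = 2 ** bits
    y = code / (levels - 1) * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Compare reconstruction error for a quiet and a loud sample.
for sample in (0.01, 0.5):
    code = mulaw_encode(sample)
    print(sample, code, mulaw_decode(code))
```

Because the curve is logarithmic, quiet samples (which make up much of speech) get much finer quantization steps than loud ones. A plain 8-bit linear quantizer has a step of about 0.0078 over [-1, 1], which is coarse relative to a sample of 0.01, while the μ-law version above reconstructs it to within about 0.0003. This is why 8 bits per sample at 8 kHz (64 kbit/s) is enough for telephone-quality speech.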
Signal processing of speech has come a long way, from the invention of the telephone to voice calls on WhatsApp (which are, surprisingly, often clearer than calls over the mobile network).
