Detecting Words and Phrases in Encrypted VoIP Calls
Abstract: Although Voice over IP (VoIP) is rapidly being adopted, its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that it is possible to identify the phrases spoken within encrypted VoIP calls when the audio is encoded using variable bit rate codecs. To do so, we train a hidden Markov model using only knowledge of the phonetic pronunciations of words, such as those provided by a dictionary, and search packet sequences for instances of specified phrases. Our approach does not require examples of the speaker’s voice, or even example recordings of the words that make up the target phrase. We evaluate our techniques on a standard speech recognition corpus containing over 2,000 phonetically rich phrases spoken by 630 distinct speakers from across the continental United States. Our results indicate that we can identify phrases within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.