Project Motivations
The goals of this project are to research automatic syllabification techniques and identify one that can improve the performance of the current Applied Linguistics Speech Lab tools.
After breaking down speech into syllables, the pitch, rhythm, and stress of each individual syllable can be analyzed to detect features of the speaker.
These features include emotion, sarcasm, country of origin, and whether the speaker is asking a question, issuing a command, or simply making a statement.
Software Architecture
- Sound files containing English speech are recorded
- The raw sound files are then processed to attach start and end times to each individual phone
- After attaching timing data, the phones are structured into objects
- The structured phones are then run through various systems to calculate the syllabification of the phone data
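The "phones structured into objects" step above might look like the following sketch; the `Phone` class and the sample phone labels are illustrative assumptions, not the lab's actual data model.

```python
from dataclasses import dataclass

# Hypothetical sketch of a phone object: each phone carries its
# symbol plus the start/end times attached in the previous step.
@dataclass
class Phone:
    symbol: str     # e.g. an ARPAbet-style phone label (assumed)
    start: float    # start time in seconds
    end: float      # end time in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start

# Raw (symbol, start, end) tuples from the timing step -- sample values
raw = [("HH", 0.00, 0.08), ("EH", 0.08, 0.21), ("L", 0.21, 0.30), ("OW", 0.30, 0.52)]
phones = [Phone(*p) for p in raw]
```

Lists of `Phone` objects like this would then be the input to each syllabification system.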
Challenges
- Very few public syllabification solutions exist
- There is very little foundation to build off of
- Iterative Process
- Each of the syllabification solutions required many iterations to achieve the desired accuracy
- Memory Management
- The software had to handle large amounts of data at once; special care was needed to stay within reasonable memory limits
All results are based on 1680 utterances
Genetic Algorithm
- Iterative technique that takes cues from evolution in biology
Genes
- Genes are candidate syllabification solutions
- Genes start and end with vowels, contain a variable number of consonants, and have one syllable split in between
Evolution
- The genes that have a more accurate syllabification result will be kept and ones that are inaccurate will be culled
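The selection step described above can be sketched as follows. This is a minimal, illustrative genetic-algorithm loop: the gene encoding (a single split index inside an intervocalic consonant cluster), the fitness function, and all constants are assumptions, not the project's actual implementation.

```python
import random

random.seed(1)

GOLD_SPLIT = 1    # known-correct split index for this cluster (assumed)
CLUSTER_LEN = 3   # number of consonants between the two vowels (assumed)

def fitness(gene: int) -> float:
    # a gene closer to the known-correct split scores higher
    return 1.0 - abs(gene - GOLD_SPLIT) / CLUSTER_LEN

# initial population of candidate split positions
population = [random.randint(0, CLUSTER_LEN) for _ in range(8)]
for _ in range(20):
    # keep the more accurate genes, cull the inaccurate ones
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]
    # refill the population with mutated copies of the survivors
    population = survivors + [min(CLUSTER_LEN, max(0, g + random.choice([-1, 1])))
                              for g in survivors]

best = max(population, key=fitness)
```

Because selection is driven by random mutation, the result improves over generations but is not guaranteed, which matches the "based on random chance" caveat noted later in the poster.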
Hidden Markov Model
- Relies on statistical calculations to make predictions based on a series of sonority values
Markov Chain
- Works with a chain of states, where the current state is statistically reliant on the previous one
Syllabification with States
- With an input list of sonority values, the HMM outputs states which represent the beginning of each syllable
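The state decoding described above can be sketched with a standard Viterbi pass over the chain. The state names, transition/emission probabilities, and the two-level sonority observations below are illustrative placeholders, not the lab's trained model.

```python
# Hidden states mark whether a phone begins a syllable;
# observations are discretized sonority levels (0 = low, 1 = high).
STATES = ["B", "I"]          # B = begins a syllable, I = inside one
start_p = {"B": 0.9, "I": 0.1}
trans_p = {"B": {"B": 0.2, "I": 0.8},
           "I": {"B": 0.4, "I": 0.6}}
# assumed emissions: low sonority tends to start syllables
emit_p = {"B": {0: 0.7, 1: 0.3},
          "I": {0: 0.3, 1: 0.7}}

def viterbi(obs):
    # dynamic programming over the chain: best probability of each
    # state at each step, remembering predecessors for backtracking
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in STATES}]
    back = [{}]
    for o in obs[1:]:
        V.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: V[-2][p] * trans_p[p][s])
            V[-1][s] = V[-2][prev] * trans_p[prev][s] * emit_p[s][o]
            back[-1][s] = prev
    # backtrack from the most likely final state
    path = [max(STATES, key=lambda s: V[-1][s])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

print(viterbi([0, 1, 0, 1]))  # ['B', 'I', 'B', 'I'] -- each 'B' marks a syllable start
```

Each "B" in the output state sequence would mark the beginning of a syllable in the input phone list.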
K-Means Clustering
- Allows for organizing utterances into a specified number of groups
Data Grouping
- Performs grouping based on similar features, e.g., mean and median sonority values
Syllabification with Grouping
- Each group of utterances receives an individual Hidden Markov Model, allowing for more precise syllabification results
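The grouping step can be sketched with a bare-bones k-means loop over per-utterance summary features. The feature choice, the sample values, and the pure-Python implementation are illustrative assumptions, not the lab's pipeline.

```python
def kmeans(points, k, iters=20):
    centers = list(points[:k])               # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute each center as the mean of its group
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# (mean sonority, median sonority) per utterance -- illustrative values
utterances = [(0.2, 0.1), (0.25, 0.15), (0.8, 0.9), (0.85, 0.8)]
centers, groups = kmeans(utterances, k=2)
# each resulting group would then get its own Hidden Markov Model
```

Training one HMM per cluster lets each model specialize on utterances with similar sonority profiles, which is the source of the precision gain described above.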
Testing
- Integration Testing
- Focused on utility functions that all algorithms use
- Evaluated code coverage of the systems
- Helps eliminate code that is not being used
- Basic Unit Testing
- Checks the robustness of the systems by ensuring inputs are in the expected format and stay within boundaries
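A boundary check of the kind the unit tests cover might look like this sketch; `validate_sonority` and the range bounds are hypothetical, not actual lab functions.

```python
# Assumed invariant: sonority values fall within a fixed numeric range.
def validate_sonority(values, low=0, high=10):
    if not values:
        raise ValueError("empty utterance")
    for v in values:
        if not (low <= v <= high):
            raise ValueError(f"sonority {v} outside [{low}, {high}]")
    return values

# basic unit tests: expected format passes, out-of-range input fails
assert validate_sonority([1, 4, 9]) == [1, 4, 9]
try:
    validate_sonority([1, 42])
except ValueError:
    pass  # out-of-bounds input rejected as expected
```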
Result Highlights
Genetic Algorithm: 72.60%
Hidden Markov Model: 77.64%
K-Means Clustering: 78.10%
Room for Improvement
These are the accuracies achieved for the three solutions, but because each requires continual reworking, higher accuracies may be attainable.
- The genetic algorithm theoretically becomes more accurate over time; however, it relies on random chance
- K-Means and the HMM both run on a set of numerical parameters, and a more optimal driving set of parameters may exist
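One way to look for a better parameter set is a simple grid search over candidate values, scoring each combination on held-out utterances. Everything below is a hedged sketch: `score`, the parameter names, and the candidate values are illustrative placeholders.

```python
from itertools import product

# stand-in for "syllabify the test set with these parameters and
# measure accuracy"; a real score() would run the full pipeline
def score(params):
    k, smoothing = params
    return 0.78 - abs(k - 4) * 0.01 - abs(smoothing - 0.5) * 0.02

# candidate cluster counts and a hypothetical smoothing parameter
grid = product([2, 3, 4, 5], [0.1, 0.5, 0.9])
best = max(grid, key=score)
print(best)  # (4, 0.5)
```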
Sponsors
Dr. David Johnson, PhD, Computer Science
Dr. Okim Kang, PhD, Linguistics
Mentor
Dr. John Georgas
Associate Professor with the Electrical Engineering and Computer Science Department
The Programming Team
From left to right:
Drew McDaniel: Genetic Algorithm
Salvatore Bottiglier: Genetic Algorithm
Michael Albanese: Hidden Markov Model and K-Means Clustering
Adam Thomas: Website, Documents, and Testing
Trent Cooper: Documents and Testing