In this thesis, a recently developed non-parametric Bayesian method for factor analysis is modified and applied to the problem of word discovery from continuous speech. Non-negative Matrix Factorization (NMF) has previously been applied to the same problem and serves here as a reference. Both methods decompose a large feature matrix into a weighted combination of smaller sparse components. The new method, based on Beta Process priors, has the advantage over NMF of being able to infer the size of the basis, and thereby also the number of recurring patterns, or word candidates, found in the data. Results obtained with this new method, called Beta Process Factor Analysis, are compared with NMF on the TIDigits database, showing that the new method is capable of finding not only the correct words but also the correct number of words. It is further demonstrated, by testing on randomly generated sequences of words, that the method can infer the approximate number of words for different vocabulary sizes.