Sunday, June 19, 2011

Joshua Tenenbaum: How to grow a mind: Statistics, structure and abstraction

Joshua Tenenbaum is Associate Professor of Computational Cognitive Science at MIT. On Friday, the last day of ICANN 2011, he gave an inspiring plenary presentation about reverse-engineering learning and cognitive development.

He stated that the most perplexing quality of the brain from a machine learning perspective is its ability to grasp abstract concepts and infer causal relations from such sparse data, i.e. "how does the mind get so much from so little?". He gave an entertaining example of this by showing a grid full of pictures of computer-generated, unidentifiable objects and naming three of them "tufas". He then pointed at other objects in the grid and asked the audience whether each was or wasn't a tufa. There was a strong consensus, and the answers were quite confidently "yes" or "no".

To explore how this kind of inference could be possible, Tenenbaum focused on what he called abstract knowledge. His talk was then divided into three parts, answering three different questions about abstract knowledge.

How does abstract knowledge guide learning and inference from sparse data?

According to Tenenbaum, the mind learns and reasons according to Bayesian principles. Simply put, there exists some sort of generative model of data and hypotheses, and the probability of a hypothesis given the data follows from Bayes' rule. Abstract background knowledge enters the model through the set of available hypotheses and the prior probabilities assigned to them, while the likelihood gives the probability of the data under each hypothesis.
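The tufa demo can be rendered as a tiny Bayesian calculation. To be clear, the hypothesis sets, priors and the "size principle" likelihood below are made-up illustrations in this spirit, not Tenenbaum's actual model or numbers:

```python
# Toy Bayesian concept learning, loosely in the spirit of the "tufa" demo.
# Hypotheses are candidate extensions of the word (sets of object indices);
# the set sizes and priors are invented for illustration.
hypotheses = {
    "subordinate": {0, 1, 2},              # just the labeled tufas
    "basic": {0, 1, 2, 3, 4, 5},           # tufas plus similar objects
    "superordinate": set(range(12)),       # everything in the grid
}
prior = {"subordinate": 0.3, "basic": 0.4, "superordinate": 0.3}

data = [0, 1, 2]  # three objects named "tufa"

def likelihood(h, data):
    """Size principle: examples are sampled uniformly from the concept,
    so smaller consistent hypotheses get exponentially higher likelihood."""
    ext = hypotheses[h]
    if not all(x in ext for x in data):
        return 0.0
    return (1.0 / len(ext)) ** len(data)

# Bayes' rule: posterior ∝ prior × likelihood, then normalize.
post = {h: prior[h] * likelihood(h, data) for h in hypotheses}
z = sum(post.values())
post = {h: p / z for h, p in post.items()}
```

With three consistent examples the smallest hypothesis dominates the posterior, which matches the audience's confident generalization from just three labeled tufas.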

What forms does abstract knowledge take?

It doesn't seem feasible to assume that every logically possible hypothesis is somehow represented along with its prior and likelihood. The hypotheses need to be represented in a more structured way. As Tenenbaum puts it: "some more sophisticated forms of knowledge representation must underlie the probabilistic generative models needed for Bayesian cognition".

Causes and effects can be modeled in a general way with directed graphs. As an example, in a symptom-disease model we would have symptoms and diseases as nodes and edges running from the diseases to the symptoms. The role of background knowledge here would be to know that there are two kinds of nodes and that the edges always run from diseases to symptoms, in effect limiting the number of hypotheses to be considered.
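A sketch of such a bipartite cause→effect network, with all edges running from diseases to a symptom. The diseases, priors and weights are invented for illustration, and I use a noisy-OR to combine causes, a common choice for this kind of model:

```python
import itertools

# Invented two-disease, one-symptom network; edges run disease → symptom.
p_disease = {"flu": 0.10, "cold": 0.20}   # prior P(disease present)
w = {"flu": 0.9, "cold": 0.3}             # P(disease alone causes fever)

def p_fever_given(present):
    """Noisy-OR: fever is absent only if every present cause fails."""
    p_absent = 1.0
    for d in present:
        p_absent *= 1.0 - w[d]
    return 1.0 - p_absent

# Posterior P(flu | fever) by enumerating the four disease configurations.
num = den = 0.0
for flu, cold in itertools.product([True, False], repeat=2):
    present = [d for d, on in (("flu", flu), ("cold", cold)) if on]
    joint = ((p_disease["flu"] if flu else 1 - p_disease["flu"])
             * (p_disease["cold"] if cold else 1 - p_disease["cold"])
             * p_fever_given(present))
    den += joint
    if flu:
        num += joint
p_flu_given_fever = num / den   # observing fever raises P(flu) well above 0.10
```

The structural constraint does the work here: because edges can only run from diseases to symptoms, only four configurations need to be enumerated rather than every logically possible graph over the nodes.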

On the other hand, tree-structured representations seem most effective for learning words and concepts from examples.

How is abstract knowledge acquired?

So it seems that abstract background knowledge is required to make learning possible. But how then is this background knowledge itself learned? How does one know when to use a tree-structured representation and when some other form is more suitable?

Tenenbaum presented the answer in hierarchical Bayesian models, or HBMs. They enable hypothesis spaces of hypothesis spaces and priors on priors. More specifically, Tenenbaum proceeded to show how HBMs can be used to infer the form (e.g. tree, ring, chain) and the structure simultaneously. An impressive example was sorting synthesized faces varying in race and masculinity into the correct matrix structure, with race varying along one axis and masculinity along the other.
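The "priors on priors" idea can be sketched with a much smaller hierarchy than the one in the talk. In this made-up example, several coins share an unknown bias distribution Beta(a, a), and a grid prior is placed over the hyperparameter a itself, so the prior over coin biases is learned from data:

```python
import math

# Invented flip counts: each pair is (heads, tails) for one coin.
coins = [(10, 0), (0, 10), (10, 0)]

def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def marginal(h, t, a):
    # P(data | a) for one coin whose bias ~ Beta(a, a), bias integrated out
    return beta_fn(a + h, a + t) / beta_fn(a, a)

grid = [0.1, 1.0, 10.0]   # candidate hyperparameter values
post = {a: math.prod(marginal(h, t, a) for h, t in coins) for a in grid}
z = sum(post.values())
post = {a: p / z for a, p in post.items()}
# All-or-nothing coins favor a small a, i.e. a prior concentrated near 0
# and 1: the upper level of the hierarchy is itself inferred from data.
```

The same mechanism, scaled up, is what lets an HBM infer whether a tree, ring or chain is the right form for a domain while simultaneously fitting the structure.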

Conclusion

Clearly one of the goals of the talk was to establish that abstract background knowledge is essential in human learning. Its role is to constrain the logically valid hypotheses to make learning possible. Human learning was then formulated as Bayesian inference over richly structured hierarchical generative models.

Friday, June 17, 2011

Ramya Rasipuram and Mathew Magimai Doss: Improving Articulatory Feature and Phoneme Recognition using Multitask Learning

Articulatory features define properties of speech production, i.e. they describe the basic sounds we make. Phonemes, on the other hand, are the smallest units of sound used to form meaningful speech. In Finnish essentially every phoneme corresponds to a letter, whereas in English this is not the case. However, phonemes are used to model pronunciation, and they are therefore cross- and multilingual.

The authors evaluated their model on the TIMIT corpus, which contains speech from American English speakers of different sexes and dialects, along with the correct phoneme transcriptions. The following methods for phoneme recognition were applied:

  1. Independent MLP (multilayer perceptron)
  2. Multitask MLP
  3. Phoneme MLP

Independent MLP is a standard method, whereas (2) and (3) are novel methods presented in the paper. In each method, articulatory features were learned from the audio, and an MLP network was trained to predict the phonemes. In the independent MLP the classifiers are independent; however, since the features actually are interrelated, multitask learning was considered necessary. The prediction accuracies (speech to phoneme) for the independent, multitask and phoneme MLPs were 67.4%, 68.9% and 70.2%, respectively.
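The multitask idea can be sketched as one shared hidden layer feeding separate output heads, one per articulatory feature group, so the interrelated tasks share a representation. This is only an illustration of the general technique, not the authors' exact architecture; the dimensions and task names below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 39, 64              # e.g. 39 acoustic features per frame
heads = {"manner": 5, "place": 7}    # invented tasks: classes per head

W1 = rng.normal(0, 0.1, (n_in, n_hidden))                      # shared layer
W_out = {t: rng.normal(0, 0.1, (n_hidden, k)) for t, k in heads.items()}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.tanh(x @ W1)              # one hidden representation for all tasks
    return {t: softmax(h @ W) for t, W in W_out.items()}

x = rng.normal(size=(2, n_in))       # two audio frames
probs = forward(x)
# Training would sum the cross-entropy losses of all heads, so gradients
# through the shared layer combine information from every task.
```

Because the gradients of all task losses flow through the same hidden layer, each classifier benefits from what the others learn, which is the motivation given for moving beyond independent classifiers.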

Additionally, a hierarchical version of each method was presented. These performed better than the original ones, maintaining the order of performance.

Rasipuram said the work will be continued with:

  • Automatic speech recognition studies
  • Different importance weights for features
  • Adding gender and rate of speech as features

The talk drew some criticism, as one researcher in the audience stated that better performance than this had already been achieved years ago. The author didn't really address this.