ICANN 2011: icann

Wednesday, June 22, 2011

Valero et al, Complex-Valued Independent Component Analysis of Natural Images

Natural image statistics are a useful tool for understanding the function of the visual part of the brain. Research on natural image statistics has mainly focused on the “linear” model, which tries to extract independent sources from natural images. These sources correspond to Gabor-like receptive fields in the primary visual cortex.

More recently, researchers have moved on to describing the statistics of the signals after the simple “linear” model stage. It has been pointed out that learned squared outputs of the simple cells may lead to complex responses. For instance, when the FFT is applied to natural images, the result has a phase term and a magnitude term, which can be described together by complex numbers. Similarly, a natural image might contain phase information from some source images and magnitude information from other source images. Furthermore, the information contained in the phase terms is generally of great importance. Complex independent component analysis (cICA) is therefore a natural tool for this problem.

Traditional complex ICA, according to the authors of the paper, assumes a uniform distribution over the phase term in the complex plane for natural images. This is not always the case. In the experiment reported in the paper, the features learned by traditional complex ICA differ considerably from the actual features, as shown in the following figure. The blue curve shows the phases of complex ICA sources, and the black curve shows the learned phases of complex ICA sources.

http://cl.ly/190Q0n2O2l312V171p0X

Therefore, in this paper, the authors propose an extension to traditional complex ICA that also models the phase information in natural images. This is done by assuming a von Mises distribution for the phase of the output signal, instead of the uniform distribution that standard cICA assumes. The extension allows a better fit to the signal, as the phase distributions are often far from uniform. The assumed distribution is capable of capturing two peaks in the phase information. After learning, the learned features fit better than the features learned by traditional complex ICA. The result is shown in the following figure.

http://cl.ly/2s420N1C2W281B3W1V15

In summary, the authors assume a modified distribution over the phase information of natural images and thereby obtain a better fit to it. The new distribution is therefore an improvement in terms of accuracy.
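To make the difference between the two phase assumptions concrete, here is a minimal sketch that compares a uniform phase density with a plain von Mises density on synthetic phases. It is not the authors' model or code: the mean direction `mu`, the concentration `kappa`, and the sample data are illustrative assumptions, and the sketch uses a single-peaked von Mises density rather than whatever variant the paper uses for two peaks.

```python
import math
import random


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (x/2)^(2k) / (k!)^2
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution on the circle (mu, kappa are assumed values)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def uniform_pdf(theta):
    """Density of the uniform phase distribution assumed by standard complex ICA."""
    return 1.0 / (2.0 * math.pi)


def mean_log_likelihood(phases, pdf):
    """Average log-density of the observed phases under a candidate density."""
    return sum(math.log(pdf(t)) for t in phases) / len(phases)


# Synthetic, clearly non-uniform phases (illustrative data, not from the paper).
random.seed(0)
phases = [random.vonmisesvariate(0.0, 2.0) for _ in range(2000)]

ll_uniform = mean_log_likelihood(phases, uniform_pdf)
ll_von_mises = mean_log_likelihood(phases, lambda t: von_mises_pdf(t, 0.0, 2.0))
print("uniform:", ll_uniform, "von Mises:", ll_von_mises)
```

With `kappa = 0` the von Mises density reduces to the uniform one, so the uniform assumption is a special case of the more flexible model.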

Geoffrey Hinton: Learning structural descriptions of objects using equivariant capsules

The second plenary talk at ICANN 2011 was given by Prof. Geoffrey Hinton of the University of Toronto. The topic was "Learning structural descriptions of objects using equivariant capsules". The accompanying paper in the proceedings is titled “Transforming Auto-encoders”. In this talk, he discussed the limitations of convolutional neural networks and proposed a new way of learning invariant features under a new neural network framework.

The human brain does not need to go through a rotation step to recognize an object. This has been shown by a test that compares the task of recognizing objects positioned at arbitrary angles with the task of mentally rotating the same object. However, several recently popular computer vision algorithms violate this rule.

In most popular computer vision research, people use explicitly designed operators to extract invariant features from images. These operators, according to Prof. Hinton, turn out to be misleading and inefficient. For instance, a convolutional neural network tries to learn invariant features in different parts of the image and discards the spatial relationships between them. This does not work for the higher-level features needed for tasks such as face identity analysis, which depends strongly on the spatial relationship between the mouth and the eyes.

Prof. Hinton argues that the convolutional-network way of representing invariant features, where only a scalar output represents the presence of a feature, cannot represent highly complex invariant feature sets. Subsampling methods have been proposed to make convolutional neural networks invariant to small changes in the viewing angle of the object. Prof. Hinton argues that this is the wrong goal: feature learning should not ultimately aim for viewpoint invariance. Instead, the goal should be equivariant features, where changes in viewpoint lead to corresponding changes in the neural network's activities. Equivariance means that when an object is rotated, the building blocks of its feature representation are rotated correspondingly.

Therefore, he developed a new way of learning feature extractors that learn equivariant features through local computations in units called "capsules", which output informative results. These local features are accumulated hierarchically towards a more abstract representation. The network is then trained with images of the same objects, slightly shifted and rotated. In this way, each learned capsule is a "generative model". The difference between a convolutional neural network and the capsule method is that a capsule takes the spatial relationships between image features into account: it carries the spatial position of a feature along with the probability that the feature is present.

This new way of representing image transformations opens a new possibility for learning invariant features. Prof. Hinton argues that the approach behaves more like the human brain and will be more promising than traditional computer vision methods.

For a detailed explanation and demonstration, please see the full paper in the proceedings of ICANN 2011.
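The distinction between invariance and equivariance can be illustrated with a toy one-dimensional example. This is only a sketch of the two concepts, not Hinton's capsule model: the function names, the circular shift, and the signal are made up for illustration. A pooled scalar stays the same when the input moves, while a feature that also reports position changes in step with the input.

```python
def shift(signal, s):
    """Circularly shift a 1-D list to the right by s positions."""
    s %= len(signal)
    if s == 0:
        return list(signal)
    return signal[-s:] + signal[:-s]


def invariant_feature(signal):
    """Global max pooling: reports only how strongly the feature is present."""
    return max(signal)


def equivariant_feature(signal):
    """Reports the feature strength together with the position where it occurs."""
    strength = max(signal)
    return strength, signal.index(strength)


signal = [0, 0, 1, 0, 0, 0]   # a single "feature" at position 2
moved = shift(signal, 2)      # the same feature, now at position 4

# The pooled scalar cannot tell the two inputs apart.
print(invariant_feature(signal), invariant_feature(moved))

# The position-carrying output moves by the same amount as the input.
print(equivariant_feature(signal), equivariant_feature(moved))
```

The invariant output discards exactly the information (where the feature is) that a higher layer would need in order to check spatial relationships between parts.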

Sunday, June 19, 2011

Joshua Tenenbaum: How to grow a mind: Statistics, structure and abstraction

Joshua Tenenbaum is Associate Professor of Computational Cognitive Science at MIT. On Friday, the last day of ICANN 2011, he gave an inspiring plenary presentation about reverse-engineering learning and cognitive development.

He stated that the most perplexing quality of the brain from a machine learning perspective is its ability to grasp abstract concepts and infer causal relations from very sparse data, i.e. "how does the mind get so much from so little?". He gave an entertaining example of this by showing a grid full of pictures of computer-generated, unidentifiable objects and naming three of them as "tufas". He then pointed at other objects in the grid and asked the audience whether each was or wasn't a tufa. There was a strong consensus, and the answers were quite confidently "yes" or "no".

To explore how this kind of inference could be possible, Tenenbaum focused on what he called abstract knowledge. His talk was then divided into three parts, answering three different questions about abstract knowledge.

How does abstract knowledge guide learning and inference from sparse data?

According to Tenenbaum, the mind learns and reasons according to Bayesian principles. Simply put, there exists some sort of generative model of data and hypotheses, and the probability of a certain hypothesis given the data is given by Bayes' rule. The abstract background knowledge affects the model through the set of available hypotheses and through the prior probabilities given to these hypotheses. The likelihood gives the probability of the data given a hypothesis.
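As a minimal sketch of the rule described above, the posterior of each hypothesis is its prior times its likelihood, normalized over all hypotheses. The hypotheses and numbers below are invented for illustration and do not come from the talk.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h | data) is proportional to P(h) * P(data | h)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(data), summed over hypotheses
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical example: is a coin fair, or biased towards heads (p = 0.9)?
priors = {"fair": 0.9, "biased": 0.1}

# Observed data: five heads in a row.
likelihoods = {"fair": 0.5 ** 5, "biased": 0.9 ** 5}

result = posterior(priors, likelihoods)
print(result)
```

Even though the prior strongly favours the fair coin, five observations are enough to make the biased hypothesis the more probable one, which shows how priors and likelihoods trade off when data are sparse.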

What forms does abstract knowledge take?

It doesn't seem feasible to assume that every logically possible hypothesis is somehow represented along with its prior and likelihood. The hypotheses need to be represented in a more structured way. As Tenenbaum puts it: "some more sophisticated forms of knowledge representation must underlie the probabilistic generative models needed for Bayesian cognition".

Causes and effects can be modeled in a general way with directed graphs. As an example, in a symptom-disease model we would have symptoms and diseases as nodes and edges running from the diseases to the symptoms. The role of background knowledge here would be to know that there are two kinds of nodes and that the edges always run from diseases to symptoms, in effect limiting the number of hypotheses to be considered.

On the other hand, it seems that tree-structured representations would be most effective for learning words and concepts from examples.
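A quick count shows how much such a constraint shrinks the hypothesis space. The sketch below assumes that a hypothesis is any set of directed edges without self-loops; the node counts are arbitrary illustrative values, not figures from the talk.

```python
def count_unconstrained(n_nodes):
    """Directed graphs on n labelled nodes: each ordered pair (a, b), a != b, may be an edge."""
    return 2 ** (n_nodes * (n_nodes - 1))


def count_disease_to_symptom(n_diseases, n_symptoms):
    """Graphs whose edges may only run from a disease node to a symptom node."""
    return 2 ** (n_diseases * n_symptoms)


n_diseases, n_symptoms = 2, 3
total = count_unconstrained(n_diseases + n_symptoms)
constrained = count_disease_to_symptom(n_diseases, n_symptoms)
print(total, constrained)  # 1048576 64
```

With only five nodes, knowing the two node types and the edge direction already reduces the candidates from about a million graphs to 64.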

How is abstract knowledge acquired?

So it seems that abstract background knowledge is required to make learning possible. But how then is this background knowledge learned? How does one know when to use a tree-structured representation and when some other form is more suitable?

Tenenbaum presented the answer in hierarchical Bayesian models, or HBMs. They enable hypothesis spaces of hypothesis spaces and priors on priors. More specifically, Tenenbaum proceeded to show how HBMs can be used to infer the form (e.g. tree, ring, chain) and the structure simultaneously. An impressive example was sorting synthesized faces varying in race and masculinity into a correct matrix structure, where race varied along one axis and masculinity along the other.
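The two-level idea can be sketched in a few lines: each form has a prior, each form induces a prior over concrete structures, and the posterior over forms is obtained by summing over the structures of each form. This is a toy illustration only; the forms, structures, and probabilities are invented, and the function is not the algorithm from the talk.

```python
def form_posterior(form_prior, structure_prior, likelihood):
    """P(form | data), marginalizing over the structures allowed by each form."""
    evidence = {}
    for form, p_form in form_prior.items():
        # Sum over structures: P(structure | form) * P(data | structure)
        marginal = sum(
            p_structure * likelihood[structure]
            for structure, p_structure in structure_prior[form].items()
        )
        evidence[form] = p_form * marginal
    total = sum(evidence.values())
    return {form: value / total for form, value in evidence.items()}


# Hypothetical forms and structures over three objects a, b, c.
form_prior = {"chain": 0.5, "ring": 0.5}
structure_prior = {
    "chain": {"a-b-c": 0.5, "a-c-b": 0.5},
    "ring": {"a-b-c-a": 1.0},
}
# Assumed probabilities of the observed data under each concrete structure.
likelihood = {"a-b-c": 0.08, "a-c-b": 0.02, "a-b-c-a": 0.03}

forms = form_posterior(form_prior, structure_prior, likelihood)
print(forms)
```

The same data thus inform both levels at once: which structures are plausible, and which form of structure is appropriate in the first place.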

Conclusion

Clearly one of the goals of the talk was to establish that abstract background knowledge is essential in human learning. Its role is to constrain the logically valid hypotheses to make learning possible. Human learning was then formulated as Bayesian inference over richly structured hierarchical generative models.