
Wednesday, June 22, 2011

Geoffrey Hinton: Learning structural descriptions of objects using equivariant capsules

The second plenary talk at ICANN 2011 was given by Prof. Geoffrey Hinton from the University of Toronto. The topic was "Learning structural descriptions of objects using equivariant capsules"; the accompanying paper in the proceedings is titled "Transforming Auto-encoders". In this talk, he discussed the limitations of convolutional neural networks and proposed a new way of learning features under a new neural network framework.



The human brain does not need to mentally rotate an object in order to recognize it. This is supported by experiments that compare recognizing objects at arbitrary orientations with deliberately imagining the rotation of the same objects. Several recently popular computer vision algorithms, however, violate this principle.

In most popular computer vision research, people use explicitly designed operators to extract invariant features from images. These operators, according to Prof. Hinton, turn out to be misleading and inefficient. For instance, a convolutional neural network learns invariant features in different parts of the image but discards the spatial relationships between them. This fails for higher-level tasks such as face identification, which requires a precise spatial relationship between the mouth and the eyes.



Prof. Hinton argues that the convolutional network's way of representing invariant features, where only a scalar output signals the presence of a feature, cannot capture highly complex feature sets. Subsampling (pooling) has been proposed to make convolutional neural networks invariant to small changes in the viewing angle of an object. Prof. Hinton argues that this pursues the wrong goal: the ultimate goal of feature learning should not be viewpoint invariance but equivariance, where changes in viewpoint lead to corresponding changes in the network's activities. Equivariance means that the building blocks of an object's features transform correspondingly when the object is rotated.

Therefore, he developed a new way of learning feature extractors, called "capsules", which learn equivariant features through computation on a local region of the image. These local features are accumulated hierarchically into more abstract representations. The network is trained with pairs of images of the same object that are slightly shifted and rotated relative to each other, so that each learned capsule becomes a small generative model. The difference between a convolutional neural network and the capsule method is that capsules preserve the spatial relationships of image features, carrying an explicit spatial pose along with the probability that the feature is present.
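The equivariance idea can be illustrated with a toy sketch. This is a hypothetical stand-in for Hinton's capsules, using hand-coded template matching rather than a trained generative model: the "capsule" outputs a presence probability together with the instantiation parameters (x, y) of its feature, and translating the input translates the pose output correspondingly while leaving the presence score unchanged.

```python
import numpy as np

def capsule(image, template):
    """Toy 'capsule' (illustrative only, not Hinton's trained model):
    returns a presence probability and the (row, col) pose of its feature,
    found by exhaustive template matching."""
    th, tw = template.shape
    best, pos = -np.inf, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            score = np.sum(image[i:i+th, j:j+tw] * template)
            if score > best:
                best, pos = score, (i, j)
    # Crude squashing of the match score into (0, 1); an arbitrary choice here.
    presence = 1.0 / (1.0 + np.exp(-(best - template.sum() + 0.5)))
    return presence, pos

template = np.array([[1.0, 1.0], [1.0, 0.0]])
img = np.zeros((8, 8))
img[2:4, 3:5] = template                       # place the feature at (2, 3)

p1, (y1, x1) = capsule(img, template)
shifted = np.roll(img, shift=(2, 1), axis=(0, 1))  # translate the feature
p2, (y2, x2) = capsule(shifted, template)

# Equivariance: the pose shifts with the input, (y2 - y1, x2 - x1) == (2, 1),
# while the presence probability is unchanged (p1 == p2).
```

An invariant detector would report only "feature present" for both images; the capsule additionally reports *where*, and that pose changes in lockstep with the input transformation.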

This new way of representing image transformations opens up new possibilities for feature learning, and Prof. Hinton argues that the approach is closer to how the human brain functions and more promising than traditional computer vision methods.

For a detailed explanation and demonstration, please see the full paper in the proceedings of ICANN 2011.

Wednesday, June 15, 2011

Riitta Hari: Towards Two-Person Neuroscience

Prof. Riitta Hari kicked off ICANN'11 with her invited talk "Towards Two-Person Neuroscience". So far, research on the human brain has mostly focused on the study of a single brain. Humans, however, are social creatures whose thoughts and actions are shaped by the other members of their community. In virtually any human culture, isolation is used as a punishment, not only for children but also for adults.


We all know that interaction with other people strongly affects our moods and thoughts. While two individuals interact, their brains become coupled, as each analyzes the behavior of the other. This is why the neuroscience community is now considering a pair, rather than an individual, as the proper unit of analysis.



There have already been studies of humans under controlled conditions, such as watching a movie or playing a computer game. While watching a movie, the brains of individual viewers have been shown to activate in a highly synchronous fashion. A game against a human opponent activates the brain differently from a game against a computer, which is also reflected in the reported feelings of the players.

Mirroring is a phenomenon that has been possible to study with existing technology. We feel pain when we are shown a picture of a suffering person. Ludwig Wittgenstein already noted that "The human body is the best picture of the human soul". How an individual's feelings tune into another person's feelings is a more complicated question. It is a combination of the following factors:

  • similar senses, motor systems and the brain that the individuals have

  • the experience that they collect throughout their lives, and

  • the beliefs they test by acting in the community.


Machine learning steps in for the analysis of the high-dimensional data produced by the functional measurement technologies. Dimensionality reduction methods such as independent component analysis (ICA) extract noise-free components that can potentially be biologically interpreted.
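To make the role of ICA concrete, here is a minimal sketch of the classic blind source separation setting, with FastICA implemented from scratch in numpy. This is a toy example on synthetic signals, not the actual pipeline used on brain data, and the source waveforms and mixing matrix are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))             # source 2: square wave
S = np.vstack([s1, s2])                 # true sources, shape (2, n)

A = np.array([[1.0, 0.5], [0.3, 1.0]])  # unknown mixing matrix
X = A @ S                               # observed mixtures ("sensor data")

# Center and whiten the observations.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

def fastica(Z, n_components=2, iters=200):
    """FastICA with a tanh nonlinearity and deflation orthogonalization."""
    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(iters):
            wx = w @ Z
            g, g_prime = np.tanh(wx), 1 - np.tanh(wx) ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Decorrelate from previously extracted components (deflation).
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < 1e-10
            w = w_new
            if converged:
                break
        W[i] = w
    return W

W = fastica(Z)
S_est = W @ Z   # recovered sources, up to sign and permutation
```

Each recovered component should match one of the true sources almost perfectly (up to sign and ordering), which is exactly the property that makes the extracted components candidates for biological interpretation.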

So far, most studies of human interaction have measured the activity of only one brain, even when another interacting person was present. Soon, however, accurate simultaneous measurements of several subjects will be possible, which will most likely drive a leap in the development of computational data-fusion techniques. Then we will have a link not only between a stimulus and one brain image, but between a stimulus and images of several subjects' brains.

When the focus of brain research moves towards the analysis of two or more interacting subjects, efficient multi-view methods will be needed. Thus, multi-view learning is currently a hot area of machine learning research.
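One standard multi-view tool is canonical correlation analysis (CCA), which finds maximally correlated components across two views, e.g. two subjects' recordings of the same stimulus. The sketch below is a toy illustration on synthetic data, with an invented shared latent signal standing in for the stimulus-driven activity; it is not a method from the talk itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)                       # shared latent "stimulus" signal
# View 1 ("subject A"): first channel carries the signal plus noise.
X = np.column_stack([z + 0.3 * rng.standard_normal(n),
                     rng.standard_normal(n)])
# View 2 ("subject B"): second channel carries the signal plus noise.
Y = np.column_stack([rng.standard_normal(n),
                     z + 0.3 * rng.standard_normal(n)])

def cca_correlations(X, Y):
    """Canonical correlations between two views, via whitened cross-covariance."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sxx = X.T @ X / len(X)
    Syy = Y.T @ Y / len(Y)
    Sxy = X.T @ Y / len(X)
    def inv_sqrt(S):
        d, E = np.linalg.eigh(S)
        return E @ np.diag(d ** -0.5) @ E.T
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)    # canonical correlations

corrs = cca_correlations(X, Y)
# The first canonical correlation is high (shared signal found across views);
# the remaining one is low (independent noise).
```

The high first canonical correlation recovers the coupling between the two views, which is the kind of cross-subject structure multi-view methods aim to find in simultaneously measured brains.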

Prof. Hari's message to the ICANN audience was that data analysis remains the bottleneck in brain research. As methodological researchers, we should next consider the opportunities opened by the new experimental settings and measurement technologies, and see how to learn more from the data.