Wednesday, June 22, 2011

Groot et al.: Learning from Multiple Annotators with Gaussian Processes

Among the ICANN'11 posters, Perry Groot et al. proposed an approach for learning a consensus from multiple unreliable annotators. In their paper “Learning from Multiple Annotators with Gaussian Processes” they present a way of using noisy target value information from several sources to regress observed data onto these target values.

The problem can be seen as a generalization of the multiple annotators' classification problem to continuous target values. A typical setting for the multiple annotators' problem is Amazon's Mechanical Turk, an Internet service in which simple tasks are carried out by users who receive a small payment for their trouble. The accuracy of the users, however, depends on how much effort they put into the task and on how experienced they are. The idea of the proposed model is that this variation between users (i.e. annotators) can be learned from the data in order to improve performance in the overall task of fusing the results from multiple users.

The authors solve the multiple annotators' problem with a slightly modified Gaussian process (GP) model. A Gaussian process is a widely used nonparametric Bayesian model that places a prior over functions: any finite collection of function values is jointly Gaussian, with covariances given by a kernel. The standard GP regression model includes a single noise variance that describes the uncertainty in the observed target values. The trick in this paper is to make this noise variance source-specific. Knowing the source (user/annotator) of each annotation, the model can learn a separate noise variance for each source, representing the uncertainty in the annotations from that source.
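To make the idea concrete, here is a minimal NumPy sketch of GP regression in which each observation gets the noise variance of the annotator who produced it. It is my own illustration, not the authors' code: the function names (`rbf_kernel`, `gp_predict`), the squared-exponential kernel, and the fixed noise variances are all assumptions. In practice the noise variances would be estimated from the data, for example by maximizing the marginal likelihood, whereas here they are simply given.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_predict(x_train, y_train, annotator_ids, noise_vars, x_test,
               length_scale=1.0, signal_var=1.0):
    """GP posterior mean and variance with annotator-specific noise.

    annotator_ids[i] is the index of the annotator who produced y_train[i];
    noise_vars[j] is the (assumed known) noise variance of annotator j.
    """
    K = rbf_kernel(x_train, x_train, length_scale, signal_var)
    # The only change from standard GP regression: the noise term on the
    # diagonal is looked up per observation instead of being one shared value.
    noise = noise_vars[annotator_ids]
    L = np.linalg.cholesky(K + np.diag(noise))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    K_s = rbf_kernel(x_test, x_train, length_scale, signal_var)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, var


# Toy example: two annotators label the same input and disagree.
x = np.array([0.0, 0.0])
y = np.array([1.0, -1.0])
ids = np.array([0, 1])                 # annotator 0 gave y[0], annotator 1 gave y[1]
noise_vars = np.array([0.01, 1.0])     # annotator 0 is assumed far more reliable
x_test = np.array([0.0])

mean, var = gp_predict(x, y, ids, noise_vars, x_test)
print(mean[0], var[0])  # mean is 99/102 (about 0.97), variance is 1/102
```

With a single shared noise variance the two conflicting labels would cancel out and the prediction at that input would be zero. With annotator-specific variances the prediction stays close to the label from the more reliable annotator.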

In the study, the authors compare the proposed multi-annotator model to the standard GP model learned with pooled data, with each individual annotator's data, and with weighted individual annotators' data. The proposed model outperformed the comparison methods on the UCI 'housing' data set, for which the authors manually created the annotations. In principle, the GP framework also applies to classification problems, where the target variables are binary, but an exact solution for that case is intractable.

The multiple annotators' problem undoubtedly remains a central question in machine learning. Services such as Amazon's Mechanical Turk have shown that there is, and will continue to be, a need for computational methods for fusing data from several uncertain sources. This work is a nice, simple tweak to a widely used probabilistic model. Despite the straightforward nature of the solution, it provides promising results for the problem. Gaussian processes have been a field of intensive research over the past ten years. At ICANN'11, too, there was at least one other GP poster in addition to this one.
