To estimate the relevance of an image, a probabilistic generative model was used. The relevance of each image is represented by a latent variable, whose value is estimated by linear regression from the observed content feature vector (hand-coded for the test set) with a Gaussian error term. The number of clicks is then predicted from a multinomial distribution whose weights are derived from the latent variables, and the mouse movement features are predicted from a Gaussian distribution whose mean is given by a linear mapping of the latent variables.
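The generative structure can be sketched roughly as follows. All dimensions, the softmax weighting of the click distribution, and the variance values are my own illustrative assumptions, since the poster does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_images, d_content, d_mouse = 100, 8, 4

# Observed content feature vectors (hand-coded in the test set).
X = rng.normal(size=(n_images, d_content))

# Latent relevance: a linear map of the content features plus Gaussian noise.
w = rng.normal(size=d_content)
z = X @ w + rng.normal(scale=0.5, size=n_images)

# Clicks: multinomial over images, with weights derived from the latents
# (a softmax is one plausible choice; the actual weighting is not given).
p_click = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
clicks = rng.multinomial(200, p_click)

# Mouse-movement features: Gaussian with mean a linear mapping of the latents.
A = rng.normal(size=d_mouse)
mouse = np.outer(z, A) + rng.normal(scale=0.1, size=(n_images, d_mouse))

print(clicks.sum())   # all 200 simulated clicks are distributed over the images
print(mouse.shape)    # one mouse feature vector per image
```

Inference in the actual model would run in the opposite direction, recovering the latent relevance from the observed clicks and mouse features; the sketch only shows the assumed generative directions.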
In the experiment, the subjects watched a 17-minute video clip and later had to recall certain events from it with the help of selected frames shown on a timeline. The authors compared three user interfaces: the proactive one described above, one that shows all the images at once and zooms with a fisheye effect on hover, and an ordinary scrollable one with a constant image size. The results showed that the mean average precision (higher precision means less user effort) of the proactive interface was better than that of the fisheye interface in all six tasks considered, and it lost to the traditional interface in only one task. This single loss seemed to be caused by the traditional scrolling interface initially showing the first image, while the relevant images in that task happened to lie at the very beginning. The hypothesis thus seemed to hold, but the absolute measured precision was still not very high.
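For reference, mean average precision over the tasks can be computed as below. The toy relevance lists are invented for illustration and are not the poster's data:

```python
def average_precision(ranked_relevance):
    """Average precision for one task: mean of precision@k over the
    positions k where a relevant image appears in the ranked list."""
    hits, ap = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / k
    return ap / hits if hits else 0.0

def mean_average_precision(tasks):
    """Mean of the per-task average precisions."""
    return sum(average_precision(t) for t in tasks) / len(tasks)

# Toy example: two tasks; 1 marks a relevant image in the ranked list.
tasks = [[1, 0, 1, 0], [0, 1, 1, 0]]
print(mean_average_precision(tasks))  # prints 0.7083333333333333
```

A higher score means the relevant frames are ranked nearer the top, so the subject needs less scrolling and searching effort to find them.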
What I thought would be interesting to see in this application was the use of eye movement detection. The author confirmed that this is indeed the plan for future work. It would have been enlightening to have a demo version of the application available at the poster session, but I guess that would have violated some (un)written rules of poster sessions.