Latest

SeminarPsychology

Error Consistency between Humans and Machines as a function of presentation duration

Thomas Klein
Eberhard Karls Universität Tübingen
Jul 1, 2024

Within the last decade, Deep Artificial Neural Networks (DNNs) have emerged as powerful computer vision systems that match or exceed human performance on many benchmark tasks such as image classification. But whether current DNNs are suitable computational models of the human visual system remains an open question: while DNNs have proven capable of predicting neural activations in primate visual cortex, psychophysical experiments have shown behavioral differences between DNNs and human subjects, as quantified by error consistency. Error consistency is typically measured by briefly presenting natural or corrupted images to human subjects and asking them to perform an n-way classification task under time pressure. But for how long should stimuli ideally be presented to guarantee a fair comparison with DNNs? Here we investigate the influence of presentation time on error consistency, to test the hypothesis that higher-level processing drives behavioral differences. We systematically vary presentation times of backward-masked stimuli from 8.3 ms to 266 ms and measure human performance and reaction times on natural, lowpass-filtered and noisy images. Our experiment constitutes a fine-grained analysis of human image classification under both image corruptions and time pressure, showing that even drastically time-constrained humans who are exposed to the stimuli for only two frames, i.e. 16.6 ms, can still solve our 8-way classification task with success rates well above chance. We also find that human-to-human error consistency is already stable at 16.6 ms.
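The abstract does not spell out how error consistency is computed. A common formulation in this literature is Cohen's kappa applied to trial-wise correctness: the observed fraction of trials on which two observers are both right or both wrong, corrected for the overlap expected from their accuracies alone. The sketch below assumes that definition; the function name and the example data are hypothetical and are not taken from the talk.

```python
import math


def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-wise correctness of two observers.

    correct_a, correct_b: equal-length sequences of 0/1 (or bool) values,
    one entry per trial, where 1 means the observer classified that
    trial correctly. Assumed definition, not confirmed by the abstract.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed overlap: both correct or both wrong on the same trial.
    c_obs = sum(bool(a) == bool(b) for a, b in zip(correct_a, correct_b)) / n

    # Overlap expected if errors were independent given the two accuracies.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if math.isclose(c_exp, 1.0):
        return float("nan")  # kappa is undefined when both are always right (or always wrong)
    return (c_obs - c_exp) / (1 - c_exp)


# Hypothetical data: 8 trials, 1 = correct response.
human = [1, 1, 0, 0, 1, 0, 1, 1]
model = [1, 1, 0, 1, 1, 0, 0, 1]
kappa = error_consistency(human, model)
```

A value of 0 means the two observers overlap in their errors no more than their accuracies predict, and 1 means they err on exactly the same trials.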

SeminarPsychology

Commonly used face cognition tests yield low reliability and inconsistent performance: Implications for test design, analysis, and interpretation of individual differences data

Anna Bobak & Alex Jones
University of Stirling & Swansea University
Jan 20, 2022

Unfamiliar face processing (face cognition) ability varies considerably in the general population. However, the means of its assessment are not standardised, and selected laboratory tests vary between studies. It is also unclear whether 1) the most commonly employed tests are reliable, 2) participants show a degree of consistency in their performance, and 3) the face cognition tests broadly measure one underlying ability, akin to general intelligence. In this study, we asked participants to perform eight tests frequently employed in the individual differences literature. We examined the reliability of these tests, the relationships between them, and the consistency of participants’ performance, and used data-driven approaches to determine the factors underpinning performance. Overall, our findings suggest that the reliability of these tests is poor to moderate, the correlations between them are weak, the consistency in participant performance across tasks is low, and that performance can be broadly split into two factors: telling faces together, and telling faces apart. We recommend that future studies adjust analyses to account for stimuli (face images) and participants as random factors and routinely assess reliability, and that newly developed tests of face cognition are examined in the context of convergent validity with other commonly used measures of face cognition ability.

SeminarPsychology

Consistency of Face Identity Processing: Basic & Translational Research

Jeffrey Nador
University of Fribourg
Nov 18, 2021

Previous work on individual differences in face identity processing (FIP) has found that the most commonly used lab-based performance assessments are, on their own, not sensitive enough to measure performance in both the upper and lower tails of the general population simultaneously. More recently, researchers have therefore begun incorporating multiple testing procedures into their assessments. Still, the growing consensus seems to be that, at the individual level, there is considerable variability between test scores. As a consequence, extreme scores will still occur simply by chance in large enough samples. To mitigate this issue, our recent work has developed measures of intra-individual FIP consistency to refine the selection of those with superior abilities (i.e. from the upper tail). First, we assessed the consistency of face matching and recognition in neurotypical controls and compared them to a sample of super-recognizers (SRs). In terms of face matching, we demonstrated psychophysically that SRs are significantly more consistent than controls in exploiting spatial frequency information. Meanwhile, we showed that SRs’ recognition of faces is highly related to the memorability of identities, whereas the two are effectively unrelated among controls. So overall, at the high end of the FIP spectrum, consistency can be a useful tool for revealing both qualitative and quantitative individual differences. Finally, in conjunction with collaborators from the Rheinland-Pfalz Police, we developed a pair of bespoke work samples to obtain bias-free measures of intra-individual consistency in current law enforcement personnel. Officers with higher composite scores on a set of 3 challenging FIP tests tended to show higher consistency, and vice versa.
Overall, this suggests not only that consistency is a reasonably good marker of superior FIP abilities, but also that it could offer important practical benefits for personnel selection in many other domains of expertise.

SeminarPsychology

Accuracy versus consistency: Investigating face and voice matching abilities

Robin Kramer
University of Lincoln
Mar 18, 2021

Deciding whether two different face photographs or voice samples are from the same person represents a fundamental challenge within applied settings. To date, most research has focussed on average performance in these tests, failing to consider individual differences and within-person consistency in responses. In the current studies, participants completed the same face or voice matching test on two separate occasions, allowing comparison of overall accuracy across the two timepoints as well as consistency in trial-level responses. In both experiments, participants were highly consistent in their performance. In addition, we demonstrated a large association between consistency and accuracy, with the most accurate participants also tending to be the most consistent. This is an important result for applied settings in which organisational groups of super-matchers are deployed in real-world contexts. The ability to reliably identify these high performers from a single test has implications for recruitment by law enforcement agencies worldwide.
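To make the distinction between accuracy and trial-level consistency concrete, here is a minimal sketch for one participant who takes the same matching test twice. It assumes consistency is the proportion of trials that receive the same response on both occasions; the abstract does not state the exact measure used, and the function names and data are hypothetical.

```python
def session_accuracy(responses, truth):
    """Proportion of trials answered correctly in one session."""
    if len(responses) != len(truth) or len(truth) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(r == t for r, t in zip(responses, truth)) / len(truth)


def trial_consistency(session_1, session_2):
    """Proportion of trials given the same response in both sessions,
    regardless of whether that response was correct (assumed measure)."""
    if len(session_1) != len(session_2) or len(session_1) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(session_1, session_2)) / len(session_1)


# Hypothetical data: six matching trials, "same" or "diff" identity.
truth = ["same", "diff", "same", "same", "diff", "diff"]
time_1 = ["same", "diff", "diff", "same", "diff", "same"]
time_2 = ["same", "diff", "diff", "same", "same", "same"]

acc_1 = session_accuracy(time_1, truth)
acc_2 = session_accuracy(time_2, truth)
consistency = trial_consistency(time_1, time_2)
```

A participant can be highly consistent without being accurate, for example by repeating the same wrong answers, which is why the two quantities are reported separately before being related to each other.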
