CAN I TRUST THIS PERSON? EVALUATIONS OF TRUSTWORTHINESS FROM FACES AND RELEVANT INDIVIDUAL VARIABLES

Provisionally accepted. The final version of the article will be published here soon pending final quality checks.

1 William James Center for Research, University of Aveiro, Portugal

2 Social Services of University of Aveiro, Portugal

Facial appearance plays a decisive role during impression formation (e.g., Hassin & Trope, 2000). Drawing trait inferences from faces occurs extremely fast (i.e., within 100 milliseconds; e.g., Willis & Todorov, 2006) and appears early in childhood. This seems to be particularly true for trustworthiness judgments, which develop earlier than other judgments and show high levels of consistency with judgments made by adults (Cogsdill et al., 2014). High agreement is commonly found in facial trustworthiness ratings (Todorov, 2008). Because this dimension is highly correlated with other judgments, Todorov (2008) proposed that facial trustworthiness “may reflect the general evaluation of the face” (p. 210). Furthermore, trustworthiness judgments tend to guide people’s decisions, even when other more diagnostic cues are available, suggesting that these judgments might be somewhat imprecise (e.g., Jaeger et al., 2020; however, see Chang et al., 2010). To underscore the relevance of these judgments per se, as well as their involvement in more ecologically relevant tasks, we briefly mention studies based on targeted judgments of facial trustworthiness and studies relying on actual behaviors.

In the criminal context, for example, Flowe (2012) reported that faces perceived as untrustworthy were rated as appearing more criminal than faces considered trustworthy. In related work, Porter et al. (2010) revealed that, when dealing with weak or ambiguous evidence, untrustworthy-looking defendants were more likely to receive a guilty verdict. Additionally, the jurors’ confidence levels in such decisions were higher for individuals rated as less trustworthy. Accordingly, untrustworthy-looking defendants received harsher criminal sentences (Wilson & Rule, 2015) and experienced less leniency (Jaeger et al., 2020).

Financial behavior is also impacted by trustworthiness inferences. Individuals invest more money in trust games when their opponents look trustworthy (vs. untrustworthy) (Chang et al., 2010; Kroneisen et al., 2021) and cooperate less with untrustworthy-looking opponents (Kroneisen et al., 2021). Duarte and collaborators (2012) also showed that more trustworthy-looking individuals had higher loan approval rates and better credit scores.

Social interactions are also informed by trustworthiness perceptions. For example, individuals with trustworthy faces are more likely to be accepted in groups than those with untrustworthy faces (Tracy et al., 2020). In some cultures, inferences related to trustworthiness are also valued in future leaders and predictive of electoral success (Rule et al., 2010).

Several individual characteristics seem to influence one’s behaviors and evaluations of trustworthiness. For example, an individual’s trust attitudes relate to their evaluations of others’ trustworthiness, as well as to their trust behaviors toward them. Whitener et al. (1998) reported that individuals with a high propensity to trust others expect more trustworthy (or cooperative) behaviors from them in real interactions (e.g., Gill et al., 2005). Given that participants’ trust attitudes have been measured mostly through self-report questions (e.g., Ben-Ner & Halldorsson, 2010; Glaeser et al., 2000), it would be interesting to explore whether higher trust attitudes relate to higher evaluations of facial trustworthiness.

Even though the evidence is scarce, some studies suggest that the sex of the rater may play a role in trustworthiness judgments. For instance, some studies have reported that women rate trustworthy-looking faces as significantly more trustworthy than men do; furthermore, women assigned higher trustworthiness to female than to male faces (Mattarozzi et al., 2015). Still, behavioral studies have returned mixed evidence. Whereas some found no sex differences regarding trust in others (see Croson & Buchan, 1999; Dreber & Johannesson, 2008), others have reported that females are more reluctant to trust others (e.g., Ben-Ner & Halldorsson, 2010; Glaeser et al., 2000). Still, in both types of studies, females tended to be evaluated as more trustworthy than males (e.g., Croson & Buchan, 1999; Mattarozzi et al., 2015). Therefore, one would expect female (vs. male) faces to be perceived as more trustworthy by raters of both sexes, but predictions regarding the influence of the rater’s sex on the evaluations are not clear-cut; the data reported here contribute to the ongoing debate.

Trust behaviors towards others seem to increase with age (e.g., Alesina & La Ferrara, 2002). Older adults are more likely than young adults to trust people with a reputation for being untrustworthy (e.g., Bailey et al., 2015; although see Sutter & Kocher, 2007), which makes them more vulnerable to subsequent exploitation and fraud (e.g., Castle et al., 2012). Accordingly, studies based on facial trustworthiness evaluations report that even though there is high agreement between the ratings provided by older and young adults, the former tend to perceive faces as more trustworthy (Cassidy et al., 2019; Castle et al., 2012). These studies used a relatively small number of faces and collected a relatively small number of ratings per age group. We provide ratings for a much larger set of stimuli and from a larger and more heterogeneous sample of participants; our data should contribute to the investigation of this topic.

Several studies have also reported a positive relation between trusting behaviors and level of education: the higher the level, the higher the trust in others (Holmberg & Rothstein, 2017). Relatedly, Knack and Zak (2003) noted that a society that raises the level of education concomitantly promotes trust in others. Whether the same relation would be obtained in facial trustworthiness evaluations is an open question and our data should speak to that.

Regarding the influence of marital status on the perception of facial trustworthiness, data are scarce. At the behavior level, Alesina and La Ferrara (2002) reported that although marital status per se had no effect on trust, when the divorce or separation processes were somehow traumatic, trust in others declined substantively. Glaeser et al. (2000), in contrast, reported that married individuals (vs. other types of relationship status) were more trusting. We present individual information on marital status in order to broaden this still limited literature.

As briefly reviewed, facial trustworthiness impacts various domains and much remains to be known about the role of several individual variables. Here, we provide facial trustworthiness evaluations from the Portuguese population for a large set of faces (N = 231) gathered from different international databases. Data were collected online and in the laboratory, thus contributing to the ongoing discussion regarding the reliability of online collected data (e.g., Walter et al., 2019). Finally, we present data on individual variables that might influence the perception of facial trustworthiness and/or trust behaviors: sex, age, years of education, marital status, self- and others-perceived trustworthiness, and the participants’ trust attitudes. To the best of our knowledge, only one study conducted with the Portuguese population reports trustworthiness evaluations for faces; yet, it was limited to a specific database (not included in our work) and it did not cover the variables explored here (cf. Ramos et al., 2016).

A questionnaire was completed online by 822 participants. The following exclusion criteria were applied: non-Portuguese participants (as we sought to collect normative data for the Portuguese population; n = 68); randomization errors in the questionnaire (n = 16); and underage participants (n = 1). The final sample consisted of 737 participants, aged between 18 and 78 years (M = 35.0, SD = 13.0). A further 314 participants did not fully complete the questionnaire.

The same online questionnaire was completed in a laboratory setting by 119 valid participants with a mean age of 21 years (SD = 4.1; range: 18-53). Eleven other participants were excluded for being non-Portuguese (n = 8) or due to questionnaire randomization errors (n = 3).

Table 1 reports a complete characterization of the two samples regarding sex, age group, marital status, and years of education.

We used 231 frontal-view color photographs of young adult faces (126 males and 105 females), displaying a neutral facial expression and direct eye gaze. Two of the authors selected existing face databases that complied with the following: (a) they contained faces similar to those of the Portuguese population; (b) photographs were taken under controlled and similar conditions (e.g., illumination setting and uniform background); (c) individuals wore a standard t-shirt and removed jewelry, glasses, and makeup; and (d) faces were of young adults. Faces from five databases were selected based on these criteria: (1) Karolinska Directed Emotional Faces (KDEF; Goeleven et al., 2008); (2) Warsaw Set of Emotional Facial Expression Pictures (WSEFEP; Olszanowski et al., 2014); (3) Radboud Faces Database (RaFD; Langner et al., 2010); (4) FACES Database (Ebner et al., 2010); and (5) Amsterdam Dynamic Facial Expression Set (ADFES; van der Schalk et al., 2011). Proper permission was secured to use these databases and, when necessary, to edit the pictures. The pictures were edited as described in Pandeirada et al. (2020), with the aim of obtaining a more homogeneous set of stimuli. The final distribution of stimuli by database and the specific faces used in the study are presented in the “Read me first” worksheets of the files available on OSF (https://osf.io/ckmd4/?view_only=83c5fe29850847868cad3a09de8ace2d). Each participant evaluated only 50 faces that were pseudo-randomly selected from the total set of stimuli with the following constraints: (1) the same number of female (n = 25) and male faces (n = 25) was presented in each questionnaire; and (2) the number of faces selected from a given database was proportionally similar for all databases. We decided to ask each participant to rate only a subset of the total set of faces to shorten the task and thus increase the likelihood of task completion.
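The constrained pseudo-random selection described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in the questionnaire software, whose details are not reported here. The per-database pool sizes below are hypothetical placeholders (only the totals of 105 female and 126 male faces come from the text), and the function names `proportional_quota` and `draw_faces` are introduced purely for illustration.

```python
import random

def proportional_quota(pool_sizes, total):
    """Split `total` slots across databases in proportion to pool size,
    using largest-remainder rounding so the quotas sum exactly to `total`."""
    n_all = sum(pool_sizes.values())
    exact = {db: total * n / n_all for db, n in pool_sizes.items()}
    quota = {db: int(x) for db, x in exact.items()}
    leftover = total - sum(quota.values())
    # Give the remaining slots to the databases with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda db: exact[db] - quota[db], reverse=True)
    for db in by_remainder[:leftover]:
        quota[db] += 1
    return quota

def draw_faces(pools, per_sex=25, rng=None):
    """Draw `per_sex` faces of each sex, proportionally across databases,
    and return them in a shuffled presentation order."""
    rng = rng or random.Random()
    chosen = []
    for sex, by_db in pools.items():
        quota = proportional_quota({db: len(ids) for db, ids in by_db.items()}, per_sex)
        for db, ids in by_db.items():
            chosen.extend(rng.sample(ids, quota[db]))  # sampling without replacement
    rng.shuffle(chosen)
    return chosen

def make_pool(sex, sizes):
    """Build (database, sex, index) identifiers for a hypothetical pool."""
    return {db: [(db, sex, i) for i in range(n)] for db, n in sizes.items()}

# HYPOTHETICAL per-database counts; the real distribution is in the OSF files.
POOLS = {
    "F": make_pool("F", {"KDEF": 30, "WSEFEP": 15, "RaFD": 20, "FACES": 28, "ADFES": 12}),
    "M": make_pool("M", {"KDEF": 33, "WSEFEP": 14, "RaFD": 25, "FACES": 40, "ADFES": 14}),
}

faces = draw_faces(POOLS, rng=random.Random(2014))
print(len(faces), sum(1 for _, sex, _ in faces if sex == "F"))
```

Largest-remainder rounding is one simple way to keep the per-database share "proportionally similar" while still hitting exactly 25 faces per sex; other allocation rules would satisfy the same constraints.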

A questionnaire was administered using the software LimeSurvey between January and June of 2014. The questionnaire, as well as the data, were housed in a local server at the University of Aveiro. For the online data collection, a brief description of the questionnaire and its electronic link was disseminated via electronic mail. As we were aiming to collect a large number of ratings per face, the link was sent to several organizations of various types across the country (e.g., universities, professional schools, industrial organizations, technological organizations) with a request to share the link with their collaborators. For the laboratory data collection, participants were recruited from three Portuguese public Universities. All participants were required to be at least 18 years old and of Portuguese nationality; no other exclusion criteria were presented. Data collection occurred between June of 2014 and May of 2015.

The questionnaire started with a brief description of the study and an informed consent request. If no consent was given, participants were thanked and the program ended; otherwise, the program moved on to collect information on sex, age, nationality, marital status, and years of education.

The trustworthiness-rating instructions then informed participants they would be shown faces sequentially and should: “observe each face and indicate how trustworthy each face is to you, that is, to what extent you could trust this person”. The 7-point rating scale was also described, indicating that 1 corresponded to “not trustworthy at all”, and 7 to “very trustworthy”. The task was self-paced but participants were instructed to respond quickly and to rely on their “gut feeling”. They were also told that their evaluations represented their personal view and that there were no correct or incorrect responses. Faces were then displayed, one at a time, at the center of a white screen with the response scale below; this consisted of a series of radio buttons along with the labels for the values 1 and 7. Responses were mandatory and given by selecting the chosen number. Each picture was preceded by a 1000-ms fixation cross and followed by a 500-ms blank screen. A different presentation order was used for each participant.

After rating all the faces, participants were asked to rate their own trustworthiness (i.e., how trustworthy they considered themselves to be) and how trustworthy they thought other people would rate them. The rating procedure described above was followed here except that a shadow face represented the participant’s face. Then, to measure trust attitudes as in previous studies (e.g., Ben-Ner & Halldorsson, 2010), participants answered the following questions: (1) “Generally speaking, would you say that most people can be trusted, or that you cannot be too careful when dealing with people?”; (2) “Which of the following statements best reflects your view? I will not trust a person until there is clear evidence that he or she can be trusted, or I will trust a person until I have clear evidence that he or she can’t be trusted”; (3) “On a scale from 1 to 6 where 1 is ‘Relatively cautious’ and 6 is ‘Relatively trusting’, how would you describe your interactions with other people?”. The questionnaire ended here for male participants. Additional information, of no relevance for the current study, was collected from female participants. A final appreciation message was presented at the end. All procedures complied with the applicable aspects of the 1964 Declaration of Helsinki and its later amendments.

Due to the pseudo-random selection of the 50 faces presented in each questionnaire, a variable number of participants contributed to the rating of each face. On average, each face was rated by 159.52 participants from the online sample (SD = 20.55; range 117-215), and by 25.76 participants from the laboratory sample (SD = 5.35; range 13-38).

The data, collected both online and in the laboratory, are made available through OSF (https://osf.io/ckmd4/?view_only=83c5fe29850847868cad3a09de8ace2d). The responses given by each participant and the data organized by item are in the files “Trustworthiness_Subject Data.xlsx” and “Trustworthiness_Item Data.xlsx”, respectively. In the latter, we present the number of ratings that contributed to the mean rating (and corresponding SD) of each face. Item-based data are also presented separately for the online and laboratory samples, and according to different variable conditions as described below. We do not present data when the number of observations for a face was below 5, as these would be rather unrepresentative; these exceptions are noted in the corresponding spreadsheets. The item data file includes the following tabs:

1) Read me first: Describes what can be found in each of the remaining tabs. The number of faces from each database that was presented in each questionnaire is shown at the bottom of this tab;

2) Overall Data: Presents the overall trustworthiness ratings (means and SD) for each face;

3) Sex: Presents information for each face according to the sex of the participant;

4) Age Group: Data are provided separately for three age groups according to the age of the rater (as in McLellan & McKelvie, 1993): young-adult raters, middle-aged raters and old raters;

5) Marital Status: Ratings are provided according to the participants’ relationship situation; data were broken down depending on whether the participant was single, married, or divorced;

6) Years of Education: Ratings are presented broken down by the number of years of education participants have (≤ 12 or > 12);

7) Self and Others Evaluation: The participants’ responses on self and other evaluations were recoded to create three groups: low (ratings of 1–2), average (ratings of 3–5), and high trustworthiness (ratings of 6–7).
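The recoding of self and other evaluations into three groups can be written as a small Python function. The cut-offs (1–2, 3–5, 6–7) come from the description above; the function name `trust_group` and the error handling are illustrative assumptions, not part of the original materials.

```python
def trust_group(rating):
    """Recode a 1-7 self/other trustworthiness rating into three groups.

    Cut-offs follow the text: 1-2 = low, 3-5 = average, 6-7 = high.
    """
    if rating not in range(1, 8):
        raise ValueError(f"rating must be an integer from 1 to 7, got {rating!r}")
    if rating <= 2:
        return "low"
    if rating <= 5:
        return "average"
    return "high"

# Recode every possible scale value.
print({r: trust_group(r) for r in range(1, 8)})
```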

To give the reader a better sense of the obtained data and their potential use, we report in Table 2 the mean number of ratings and mean trustworthiness values per face and broken down by different variables.
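For readers who wish to recompute item-level summaries from the subject-level file, the following Python sketch shows one way to obtain the number of ratings, mean, and SD per face and rater group, withholding cells with fewer than 5 observations as in the item data file. The long-format input of `(face_id, group, rating)` tuples, the function name `summarize_items`, and the use of the sample SD are assumptions made for illustration; the actual spreadsheet layout should be checked in the “Read me first” tab.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_RATINGS = 5  # cells with fewer observations are withheld, as described in the text

def summarize_items(rows, min_n=MIN_RATINGS):
    """Aggregate (face_id, group, rating) rows into per-face, per-group summaries.

    Returns {(face_id, group): {"n": int, "mean": float|None, "sd": float|None}}.
    Sample SD is used here; this is an assumption, not stated in the article.
    """
    cells = defaultdict(list)
    for face_id, group, rating in rows:
        cells[(face_id, group)].append(rating)

    summary = {}
    for key, ratings in cells.items():
        n = len(ratings)
        if n < min_n:
            # Too few observations: keep the count but withhold mean and SD.
            summary[key] = {"n": n, "mean": None, "sd": None}
        else:
            summary[key] = {"n": n, "mean": mean(ratings), "sd": stdev(ratings)}
    return summary

# Toy data (invented for illustration only).
rows = [("face_01", "female", r) for r in (4, 5, 3, 4, 4)]
rows += [("face_01", "male", r) for r in (2, 3)]
for key, stats in summarize_items(rows).items():
    print(key, stats)
```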

Trustworthiness judgments are one of the cornerstones of trait inferences and play an important adaptive function (Todorov et al., 2008). Trustworthiness is one of the most important aspects people consider when evaluating faces, although some variation seems to exist across different countries (Jones et al., 2021). We report trustworthiness evaluations for a large set of face images. We also provide information on several variables of potential interest to trustworthiness inferences, hoping to inspire readers to explore the potential of this dataset, and thus help complement past research and shed light on future research avenues.

Regarding data collection, we sought to contribute to the debate concerning the reliability of data collected online and in the laboratory. A brief inspection of our data revealed excellent agreement between these two datasets from young adults [intraclass correlation coefficient (ICC) = .92], similar to what was reported in other studies (e.g., Maeder et al., 2018). We should note that we only used young-adult faces and that our sampling procedures did not guarantee samples representative of the Portuguese population; these are limitations of our study. For the online data collection, the questionnaire was disseminated through organizations from across the country, whereas the laboratory data were gathered at only three universities. Still, the data from these two samples are in high agreement and we provide a fairly large number of data points per face from the online sample.
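As an illustration of how agreement between two sets of per-face mean ratings can be quantified, the Python sketch below computes an intraclass correlation from a faces-by-samples table. The specific ICC form used for the value reported above is not stated in the text; the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), and the function name `icc_2_1` and the toy ratings are hypothetical.

```python
def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `table` is a list of rows (one per face), each holding one value per
    sample (e.g., [online_mean, laboratory_mean]).
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Two-way ANOVA sums of squares (faces = rows, samples = columns).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Toy per-face means (invented): [online, laboratory].
toy = [[3.1, 3.3], [4.5, 4.2], [2.8, 3.0], [5.0, 4.7], [3.9, 4.1]]
print(round(icc_2_1(toy), 3))
```

Other ICC forms (e.g., consistency rather than absolute agreement, or average rather than single measures) would give different values, so the form should be matched to the one used in the study being compared.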

Some literature reports agreement on trustworthiness evaluations across observers and cultures (e.g., Sutherland et al., 2018; Sutherland, Rhodes, et al., 2020). An exploratory analysis revealed that our ratings agree strongly with those obtained in the Netherlands by Jaeger (2020) for the RaFD faces (ICC = .95 and .92, with our young-adult online and laboratory data, respectively). Good agreement was also achieved between our young-adult laboratory ratings and those collected in Spain by Aguado et al. (2011) for the KDEF faces (ICC = .78), in spite of the fact that faces were presented in grayscale in that study. Considering these exploratory results, we are fairly confident that our data could reliably be used by researchers from other countries. Nonetheless, more comparisons could be made with other studies that provide data for some of the stimuli used here, while considering methodological differences that likely affect the results. For example, trustworthiness ratings of the KDEF faces were provided by Gutiérrez-García et al. (2019), Sutton et al. (2019), and Sutherland et al. (2017); however, they all used faces displaying different emotions and, in the latter, also shown from different viewpoints.

Inferences from faces rarely occur for a single dimension and tend to influence our behavior in a concerted way, purportedly to activate the most adaptive responses (Todorov et al., 2008). In a previous study, we made available ratings of attractiveness for 96.5% of the faces included in the present report. These combined sets of data will allow researchers to select stimuli and/or analyze the data while considering these two important dimensions. In all, the current data add to the build-up of an integrated dataset that should be of great use for researchers from various areas.