Which personality traits are real? Stress-testing the lexical hypothesis

Jun 21, 2023

This post is also available on LessWrong. Thank you to Justis Mills for proofreading and giving feedback.

Most scientific personality models are, directly or indirectly[1], based on the lexical hypothesis, which roughly speaking states that there is a correspondence between important personality traits and abstract behavior-descriptive adjectives. For example, the Big Five was created by having people rate themselves using words like “outgoing”, “hard-working” and “kind”, and finding patterns in these. It is neat that one can create models in this way, but the large amount of abstraction involved in using abstract adjectives raises huge questions about how “real” the personality traits are.

I have created a new personality test, currently named Targeted Personality Test. I have multiple goals with this test, but one of them is to investigate which personality traits are “real”[2] without relying on the lexical hypothesis. I do this mainly by assessing lots of specific narrow behaviors, rather than abstract vague adjectives.[3]

By the end of this blog post, I hope to have introduced some concepts that make my approach make sense, and thereby enable you to understand this diagram I made summarizing my results:

The semi-formal understanding of what is going on in this chart is very long, so before we proceed, let me give a brief, vague indication of what you will be informed about:

Trait impact: a measure of how strongly the personality trait influences the various behaviors and thoughts that we would expect it to.
Factor model loss: a measure of how much the personality trait conflates different unrelated things together.
Correlation with lexical notion: a measure of how well-labelled the personality trait is. (You can mostly ignore this variable as all of the personality traits performed reasonably well on it.)

Easy-Goingness: An example

A conventional personality test such as the SPI-81-27&5 might measure a personality trait such as Easy-Goingness by asking you how well a few abstract statements describe you, e.g.:

I like to take it easy
I like a leisurely lifestyle
I have a slow pace to my life

It seems plausible that someone who agrees that such statements describe them would be Easy-Going in some sense, and indeed I bet this sort of measure can pass all sorts of criteria used by psychologists to evaluate the quality of the test.[4]

So if it is probably valid by the standard criteria, what could go wrong that the standard criteria don’t test for?

Well, let’s imagine the sort of person who is Easy-Going. They probably tend to relax in their free time, e.g. watching TV, and they probably don’t get worked up about controversial stuff, and they probably don’t go above and beyond at work. Basically, a relaxed person who doesn’t get too stressed or excited about things.

When we call the person above “Easy-Going”, is this just a convenient label we use for someone who happens to have a constellation of traits like the above? Or are we saying that there is some underlying factor, like a motive to take it easy, which causes them to have these sorts of characteristics? Or maybe there are some underlying factors, but they are heterogeneous and “Easy-Goingness” lumps them together? These are the sorts of questions I tried to investigate.

My first step was to come up with a more concrete characterization of Easy-Goingness than abstract statements like “I like to take it easy” or “I have a slow pace to my life”. I did this by giving the SPI-81-27&5 test to a bunch of people, and then asking the people who scored high and low in Easy-Goingness to describe an example of how they could be said to be Easy-Going. To give you a taste of the answers, one person who scored high in Easy-Goingness said:

When I finish work for the day I often go straight home and jump into my pyjamas. I like to relax and watch some tv and films to unwind after a long day - usually with a glass of wine. Certain days when I come home my partner would like to travel for a couple hours to go dog walking and enjoying time outside. No matter what kind of day I have at work I am always keen to do anything my partner/family/friends would like to do as is in my nature.

Meanwhile, a person who scored low in Easy-Goingness said:

A simple example is that when I arrive at work, my boss often asks at once if I want a coffee, as he often wants one at the beginning of the day. I prefer to do some work before having a coffee, as to me it signifies a moment of relaxation and to the puritan work ethic part of me, it doesn't make sense to have a break until I have "earned" it.

Based on 10 descriptions like this, I constructed some statements that would be reflective of Easy-Going people (+), or non-Easy-Going people (-):

(+) In the evening I tend to relax and watch some videos/TV
(+) I don’t feel the need to arrange any elaborate events to go to in my free time
(+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
(+) I think you’ve got to have low expectations of others, as otherwise they will let you down
(-) I get angry about politics
(-) I have a stressful job
(-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
(-) I spend a lot of effort on parenting

I included these statements in the Targeted Personality Test, as well as similar statements I designed for the 26 other personality traits from the test that I based this study on, and some additional statements that were useful for research purposes.

These statements are quite unlike the typical statements used in personality tests, because they are intentionally aiming to be much more narrow and concrete. This probably makes them less “efficient” in the sense that respondents will have to answer a lot more questions before we can get a detailed view of what they are like.

However, by being so concrete and narrow, it also allows us to more strongly test how real the traits are, e.g. whether it is just a coincidence that they sometimes co-occur to lead to “Easy-Going” people, and whether the statements conflate multiple unrelated traits together.

Trait Impact as a measure of realness

I have multiple measures of realness, but I think the most important measure of whether a personality trait is real is whether the associated behaviors do in fact correlate with each other, rather than them just sometimes coincidentally occurring together.

The simplest way to visualize whether this applies is with a correlation matrix, which is a diagram that shows how strongly a set of variables correlate with each other:

Here, each row and column represents a personality statement that I asked people to rate themselves on, and each cell represents the Pearson Correlation between the row variable and the column variable. While the exact correlations varied, overall the different concrete behaviors associated with Easy-Goingness had a correlation of about 0.06 with each other. If you are not familiar with Pearson Correlations, then here is a visualization of how weak 0.06 is:

*Hypothetical simulated distribution for r=0.06*
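For readers who want to reproduce a picture like this, here is a minimal sketch of how such a simulated distribution could be generated. It assumes the two variables are standard bivariate normal; the function name `simulate_pair`, the sample size and the seed are illustrative choices, not something taken from the original analysis.

```python
import numpy as np

def simulate_pair(r, n=1000, seed=0):
    """Draw n points from a standard bivariate normal with population correlation r."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])  # unit variances, covariance equal to r
    return rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# A large sample, so the sample correlation lands close to the target of 0.06.
xy = simulate_pair(0.06, n=100_000)
sample_r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
print(f"sample correlation: {sample_r:.3f}")
```

Plotting `xy[:, 0]` against `xy[:, 1]` as a scatterplot should give a nearly round cloud, which is the point: at this strength, knowing one variable tells you very little about the other.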

This suggests to me that Easy-Goingness is not very “real”. While it might make sense to describe a person as doing something Easy-Going, for instance when they are watching TV, it is kind of arbitrary to talk about people as being more or less Easy-Going, because it depends a lot on context/what you mean.

If we take the square root of the extent to which two behaviors associated with the trait correlate with each other, we get the extent to which the behaviors correlate with the overall level of the trait. Doing this for Easy-Goingness, we get an effect of around 0.25. This is a somewhat stronger connection, but still quite weak:

*Hypothetical simulated distribution for r=0.25*

The fact that this is weak means that even the most Easy-Going people cannot necessarily be expected to be particularly Easy-Going in all contexts. It is much more subtle than that.
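The square-root step can be checked with a small simulation. The sketch below assumes a simplified model that is not necessarily the one used for the actual analysis: every behavior depends on a single underlying trait with the same strength (`loading`) plus independent noise. Under that assumption two behaviors correlate at `loading**2`, so taking the square root of the average inter-item correlation recovers the loading. The sample size and number of items are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 50_000, 8
loading = 0.25  # hypothetical correlation between each behavior and the trait

# Each item = loading * trait + independent noise, scaled to unit variance.
trait = rng.standard_normal(n_people)
noise = rng.standard_normal((n_people, n_items))
items = loading * trait[:, None] + np.sqrt(1 - loading**2) * noise

# Pearson correlation matrix between items (columns are the variables).
corr = np.corrcoef(items, rowvar=False)

# Average the off-diagonal cells, then take the square root.
off_diagonal = corr[~np.eye(n_items, dtype=bool)]
mean_r = off_diagonal.mean()
implied_loading = np.sqrt(mean_r)

print(f"mean inter-item correlation: {mean_r:.3f}")
print(f"implied item-trait correlation: {implied_loading:.3f}")
```

With real data the loadings differ from item to item, so the square root of the average correlation is only a summary of the typical item-trait connection.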

The “Trait impact” axis in my original diagram at the start of the post shows this correlation for all of the different traits.

I picked Easy-Goingness as an example because it had the lowest “Trait impact”. It may also be informative to look at an example with a high “Trait impact”. The highest “Trait impact” was Art Appreciation, but it feels too narrow, so I am going to skip over it[5] and consider Conservatism as an example of a trait with a high “Trait impact”.

The correlation matrix for Conservatism looks like the following:

As you can see and probably expected, there are strong correlations between different Conservative/Progressive responses. Visualized as a scatterplot, it might look like this:

*Hypothetical simulated distribution for r=0.56*

Of course this is still far from deterministic, but now it looks like we’ve got something fairly strong. Ideology seems more “real” than Easy-Goingness, in the sense measured by “Trait impact”.

Factor model loss as a measure of conflation

One of my other measures of personality trait realness was “Factor model loss”. What does that mean? Let’s take one of the traits that scored the worst in “Factor model loss”: Creativity.

If we look carefully at its correlation matrix, we can see that there are two distinct groups of items:

Creative problem-solving: finding root causes for problems at work, copying old methods instead of coming up with new ones at work, being good at ideas during brainstorming
Artistic creativity: creating decorations, quizzes/games/adventures/trips, being imaginative

Two of the items, involving creating visualizations and coming up with fictional stories, correlated with both Creative problem-solving and Artistic creativity. Meanwhile math vs humanities didn’t really correlate with either.

Thus, it seems that the term “Creativity” is problematic as a personality trait, because it conflates Creative problem-solving with Artistic creativity, treating them as being the same thing when really they are basically unrelated.

To quantify the extent of the problem, I approximated what the correlation matrix would have to look like if there was absolutely no conflation problem and there was only a single trait of Creativity which covered both Creative problem-solving and Artistic creativity. I got this result:

“Factor model loss” refers to the size of the difference between these two correlation matrices: the observed correlations versus the hypothetical correlations if there only was a single trait.
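To make the idea concrete, here is a rough sketch of one way such a loss could be computed. Everything specific in it is an assumption for illustration: the single-factor fit uses iterated principal-axis factoring, the “size of the difference” is taken to be the root-mean-square gap over off-diagonal cells, and both example matrices are made up. The actual fitting procedure and loss behind the diagram may differ.

```python
import numpy as np

def one_factor_loadings(corr, n_iter=200):
    """Estimate single-factor loadings by iterated principal-axis factoring."""
    reduced = np.array(corr, dtype=float)  # working copy
    communalities = np.full(reduced.shape[0], 0.5)  # arbitrary starting guess
    for _ in range(n_iter):
        np.fill_diagonal(reduced, communalities)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        communalities = loadings**2
    # An eigenvector's sign is arbitrary; orient the loadings to be mostly positive.
    return loadings if loadings.sum() >= 0 else -loadings

def factor_model_loss(corr):
    """RMS gap between observed and single-factor-implied off-diagonal correlations."""
    corr = np.asarray(corr, dtype=float)
    loadings = one_factor_loadings(corr)
    implied = np.outer(loadings, loadings)  # correlations if only one trait existed
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.sqrt(np.mean((corr - implied)[mask] ** 2)))

def block_corr(sizes, within, between):
    """Build a made-up correlation matrix with clusters of items."""
    n = sum(sizes)
    corr = np.full((n, n), between)
    start = 0
    for size, r in zip(sizes, within):
        corr[start:start + size, start:start + size] = r
        start += size
    np.fill_diagonal(corr, 1.0)
    return corr

# Hypothetical trait that really is a single factor.
true_loadings = np.array([0.7, 0.6, 0.5, 0.6, 0.4, 0.5, 0.3])
unified = np.outer(true_loadings, true_loadings)
np.fill_diagonal(unified, 1.0)

# Hypothetical trait that lumps two nearly unrelated clusters together.
conflated = block_corr(sizes=[4, 3], within=[0.4, 0.3], between=0.05)

loss_unified = factor_model_loss(unified)
loss_conflated = factor_model_loss(conflated)
print(f"loss, single real trait: {loss_unified:.4f}")
print(f"loss, two conflated traits: {loss_conflated:.4f}")
```

In this toy setup the single factor latches onto the larger cluster and cannot reproduce the correlations inside the smaller one, which is what drives the loss up for the conflated matrix.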

Correlation with lexical notion: naming things

The final notion of “realness” in my diagram was “Correlation with lexical notion”. What does that mean? Well, remember how I keep separating things into “concrete” and “abstract” descriptors?

I think of the “abstract” descriptors as being a measure of the informal common-sense version of the trait. You are probably easy-going if you think you are easy-going, conservative if you think you are conservative, and creative if you think you are creative. It may be a matter of definition to strictly know whether you fit, but it certainly seems like a good starting point.

But the fact that we can measure the common-sense notion of the trait separately from the behaviors associated with the trait raises the question: Do these measure the same thing? For instance, maybe there was a flaw in the way we collected examples of behaviors, so that they don’t correspond to what the trait is actually like.

I quantified this with “Correlation with lexical notion”. It is based on[6] the correlations between the abstract and the concrete questions.

However, it turns out that there is not much more to say about this, because all of the traits did great with respect to this; the “Correlation with lexical notion” was consistently close to 1, showing that the concrete and the abstract descriptors were getting at the same thing. (And when I inspected the ones that did the worst, it often seemed to be because of a technical form of noise that I am not going to get into.)
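The footnote on this measure mentions adjusting for the reduction in correlation expected from imperfect internal correlations. A standard tool for that kind of adjustment is Spearman's correction for attenuation, sketched below together with the standardized Cronbach's alpha as a reliability estimate. Whether this matches the adjustment actually used here is an assumption, and the numbers in the example are made up.

```python
import math

def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)

def disattenuated_correlation(r_xy, reliability_x, reliability_y):
    """Spearman's correction: estimated correlation between the error-free scores."""
    return r_xy / math.sqrt(reliability_x * reliability_y)

# Hypothetical scales: 3 abstract items and 8 concrete items.
abstract_reliability = standardized_alpha(n_items=3, mean_inter_item_r=0.5)
concrete_reliability = standardized_alpha(n_items=8, mean_inter_item_r=0.2)

observed_r = 0.6  # made-up observed correlation between the two scale scores
corrected_r = disattenuated_correlation(
    observed_r, abstract_reliability, concrete_reliability
)
print(f"reliabilities: {abstract_reliability:.2f}, {concrete_reliability:.2f}")
print(f"corrected correlation: {corrected_r:.2f}")
```

The corrected value can exceed 1 when reliabilities are low and estimates are noisy, which is one reason to read it as a rough indicator and not a precise quantity.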

Summary

I have three different measures of the realness of a personality trait:

Trait impact: how strongly the personality trait influences the various behaviors and thoughts that we would expect it to.
Factor model loss: how much the personality trait conflates different unrelated things together.
Correlation with lexical notion: how well-labelled the personality trait is.

Since the correlation with the lexical notion is consistently high, it appears that the personality traits have been assigned reasonably descriptive labels; however, some of the labels conflate multiple personality traits, such as:

Creativity (appears to conflate Creative problem-solving and Artistic creativity)
Charisma (appears to conflate Interpersonal Sensitivity and Social Ease)
Emotional Stability (appears to conflate Problem-Handling Confidence and (opposite of) Catastrophizing)
Conformity (appears to conflate Government Conformity, Aesthetic Conformity, and Religiosity)
Authoritarianism (appears to conflate Political Authoritarianism and Law Adherence)
Attention-Seeking (appears to conflate Benign Narcissism and (opposite of) Shyness)

To see more about what is getting conflated, skip to the appendix, where I show the correlation matrices for each of the traits.

But most importantly, a lot of traits are not that impactful. Examples of non-impactful traits include Easy-Goingness, Conformity, Irritability, Perfectionism, Sensation-Seeking, Trust, Compassion and Impulsivity. While people to some extent exhibited general, context-independent differences from each other in these traits, the differences were small relative to the context-dependent differences. So rather than seeing people as e.g. trusting or non-trusting in general, it may be much more productive to ask who they trust and who they don’t trust.

One caveat: I think the trait impact for Orderliness could be overestimated, because I think in practice participants interpreted half of the questions as being about how tidy they kept their own home, which might be much narrower than general Orderliness.

Bonus: Going beyond the lexical hypothesis

Many of the 27 traits in the original test turned out to be problematic for my purposes.[7] You might think this shows my test to be irreparably flawed, but actually I had sort of hoped this would happen.

It is possible to use a statistical technique called factor analysis to identify patterns of correlations in empirical data. This is the technique that was used to create the original SPI test that I based my test on, and it is also the technique that has been used for many other psychometric tests.

Using factor analysis, I reshuffled the items from my test into 7 alternate factors, and made a version of the test that is less than a third of the length of the original one. In the future, I will likely write an in-depth description of how factor analysis works and which factors I have found.

This is almost certainly not the final form of the test. I have many plans for additional investigations I can perform as I get more data.

[1] The Big Five personality factors were originally derived by asking people to rate themselves on a large number of personality adjectives, and using statistics to find the biggest clusters of related descriptors. Other tests have been developed through other methods, many of which don’t primarily focus on abstract adjectives, though for reasons I won’t get into right now, I think they have a lot of dependence on the lexical hypothesis.

[2] Of course, this is a subtle, complex question which depends on what exactly one means by “real”. I define the notion of “realness” I focus on later in the post, but other notions may be relevant for other purposes.

[3] Because it is inherently difficult to measure behavior, I still had to rely on self-report surveys.

[4] These are internal reliability, i.e. the sort of person who says “I like a leisurely lifestyle” is also more likely to say “I have a slow pace to my life”; test-retest reliability, i.e. the sort of person who says “I like a leisurely lifestyle” today will also tend to do so tomorrow, in a month, in a year, or in a decade; inter-rater validity, i.e. if a person says “I like a leisurely lifestyle” then their friends and family will also tend to say “They like a leisurely lifestyle”; criterion validity, i.e. the sort of person who says “I like a leisurely lifestyle” scores higher on some objective criterion of leisurely lifestyle such as amount of vacation days; and maybe also heritability, i.e. if one twin in a pair says “I like a leisurely lifestyle” then the other twin likely says so too.

[5] The narrower the trait you are considering, the stronger the associated correlations will be. To see this, consider the absurd example where you are only considering a specific behavior, say watching TV. Any behavior has a correlation of 1 with itself, so watching TV would have a Trait Impact of 1. It is only by abstracting over multiple different behaviors that Trait Impact can be nontrivial.

[6] Since the different questions don’t correlate perfectly internally, e.g. “I enjoy cooking food for other people” and “I like to dance with people at parties” only correlate at 0.24, we can’t exactly expect abstract “sociability” to correlate perfectly with either. So I adjust for the reduction in correlation that would be expected from imperfect internal correlations.

[7] Not necessarily for all purposes. Just because a trait is weak by my measures does not mean it cannot be relevant by other measures. Talk with personality researchers and read their papers if you want to find out what criteria they care about.

Emil O. W. Kirkegaard

Did you consider asking people a lot of frequency type questions, a kind of semi-objective measure of personality? E.g. how many times in the last 2 weeks did you: cry, drink alcohol with drinks, play a board game with drinks, play an electronic game on computer/console/phone, browse Twitter, eat candy, watch TV in bed/sofa.

Apple Pie

Sep 8, 2023

Thank you very much for posting this where we could read it! It's great to see this kind of thing on substack, and seeing this post convinced me to put up some of my own research, which I'd previously decided against - you can see it up here.

My personal take on your own work here (which you probably won't agree with) is that you've done a good job showing that the usual lexical pathway psychology has been using is actually on the correct track. When you say things like "Creativity (appears to conflate Creative problem-solving and Artistic creativity)," this is a feature, not a bug, of dimensional models. No one says that things like A) creative problem solving, and B) artistic creativity are literally the same; there's a large body of research on the difference between scientific and artistic success out there.* Yet there's obviously going to be some similarity between these two things that allows them to be positioned nearby in a space of personality traits.

More useful to you would probably be this bit of advice: The impact of personality on behavior is strongest when options are open.

Twelve different people wandering in the desert for three days are all going to be fighting over a bottle of water they come upon lying in the crevice between a rock. "If you were thirsty in the desert, would you want water?" is more a question designed to flush Lizardmen than an attempt to measure personality.

In other words, when you design concrete questions, a good strategy would be to focus on choices over duties. Having a stressful job, spending a lot of effort on parenting, and many other concrete details often depend on other situations or people around you much less than the effects of innate personality. I know that just over the past six months, my answers to those two questions would have changed dramatically.

* For example, I liked: Kaufman, S. B., Quilty, L. C., Grazioplene, R. G., Hirsh, J. B., Gray, J. R., Peterson, J. B., & DeYoung, C. G. (2016). Openness to experience and intellect differentially predict creative achievement in the arts and sciences. Journal of personality, 84(2), 248-258.

tailcalled