
View Full Version : The Precision of Personality Tests


polenka
09-25-2007, 11:02 PM
I do have one problem with this test spitting out numbers at the end to give an indication of preference strength. It only looks at what number of questions you answered related to each preference. It doesn't really tell you how strong that preference is for each question. So it's good for indicating that you may exhibit a lot of those traits, but it still doesn't necessarily mean you exhibit that preference strongly.

So, someone who gets an 80 T score, for example, could still be much stronger or weaker in that preference than someone else with the same score. Not that it isn't good for giving someone an idea. It can just be misleading when people use them as absolutes for their preference strengths.


The assumption behind the test is that if a trait is dominant, it will express itself in a variety of areas and situations. Thus, each individual question is not a separate indicator of a separate trait; rather, certain questions load onto one underlying trait. When the measure was created, an alpha (internal consistency level) was probably computed at some point, and it would have had to be acceptable for the test to become a well-accepted measure. The internal consistency level basically indicates how well each indicator on a scale goes with all the other indicators on the same scale. Any indicators not loading onto the underlying trait acceptably (and thus reducing the internal consistency of the scale) would have been removed.

That's not to say, of course, that these scales are perfect. Given that they are forced-choice questions with only two possible choices, one is restricted in the range of responses, and in a strict technical sense, such dichotomous indicators should not be added or subtracted, for the very reasons you cite. But statistically, if you have enough questions with an acceptable internal consistency, the scale will behave as though it's measured continuously (and in this case, there are enough). Since statisticians haven't found a better method of measurement yet, this is the best we get.
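
For anyone curious what the internal consistency check looks like in practice, here is a minimal sketch of Cronbach's alpha for two-choice items. The response data and the function name are made up for illustration; nothing here comes from the actual test or its item statistics.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item responses.

    responses: one row per respondent, one 0/1 entry per question
    (1 = chose the preference side). Hypothetical data layout.
    """
    k = len(responses[0])                      # number of questions on the scale
    item_columns = list(zip(*responses))       # regroup answers by question
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Illustrative data: every respondent answers all three questions the same way.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Illustrative data: answers to one question say little about the others.
inconsistent = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

print("consistent items:  ", cronbach_alpha(consistent))
print("inconsistent items:", cronbach_alpha(inconsistent))
```

Items that all move together push alpha toward 1, while items that do not hang together pull it down, which is the basis on which a poorly fitting question would be dropped.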




Edit: split from the "What's your Temperament Numbers?" thread. Title changed from "accuracy" to "precision". -Jezebel

Jezebel
09-26-2007, 02:22 AM
I understand this, and agree that it's good enough to get a general idea of preference strengths. My problem isn't that at all, it's just when I see people using this as a precise instrument to measure and compare their preference strengths to others (and I have seen people do this). I could go either way on several of the questions because I exhibit both sides of the traits almost equally as far as I can tell. I don't agree that there are enough questions for each preference to compensate for this. Every question is worth around 11-12ish points. Two traits someone feels borderline about could affect their score by around 24 points. Three could affect it by around 36. And so on. Yet, they could still get the exact same number as the person who is extreme in all of those traits while having a much lower strength of preference.
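
To make that arithmetic concrete, here is a minimal sketch, assuming a scale of 9 two-choice questions (so each answer is worth about 11 points). The question count, the function name, and the answer lists are assumptions for illustration, not the test's real scoring.

```python
QUESTIONS_PER_SCALE = 9  # assumption: gives roughly 11 points per question

def scale_score(answers):
    """Percentage of answers that chose the preference side, rounded."""
    return round(100 * sum(answers) / len(answers))

points_per_question = round(100 / QUESTIONS_PER_SCALE)

# Someone who is extreme on every trait the questions ask about.
strong = [True] * QUESTIONS_PER_SCALE
# Someone borderline on three questions whose coin-flip answers all landed
# on the preference side: the recorded answers are identical to `strong`.
borderline_lucky = [True] * QUESTIONS_PER_SCALE
# The same borderline person on a day the three coin flips go the other way.
borderline_unlucky = [True] * 6 + [False] * 3

print("points per question:", points_per_question)
print("strong:             ", scale_score(strong))
print("borderline, lucky:  ", scale_score(borderline_lucky))
print("borderline, unlucky:", scale_score(borderline_unlucky))
```

The score only counts which side was picked, so the first two people are indistinguishable, and three borderline answers alone can swing the result by about a third of the scale.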

Like I said, this is fine for a general idea, just not precision.

"statisticians haven't found a better method of measurement yet, this is the best we get."

I disagree. This is just a free online test; I hardly think it's going to be the best of what MBTI professionals can offer. There are even other free online tests that use weighted questions, which I think are a bit more accurate if they're going to try to tell me the strength of my preferences at the end.

polenka
09-26-2007, 03:59 AM
Absolutely, there is residual error, which means that if one lies on the edge, as you argue, he or she could score either way, perhaps even due to chance. But there are enough indicators for the measure to be treated as continuous (which doesn't necessarily mean exact), thus enabling the "adding" of indicators.

This may be a free online test, but it is pretty representative of other personality tests, it's been around for quite a while, and it's been widely validated. Whether a measure is free depends only on the author of the measure and whether they want to get any money out of it--thus, taking the time and effort to get it published (I don't even know if publication of measures was common back in the 40's and 50's). The Myers-Briggs is not published, which means anyone can use its items--if you pay for a Myers-Briggs (which you can) you're not paying for the measure itself, but for the therapist's or psychologist's interpretation of the measure--which, ironically enough, will heavily depend on interpreting the more precise numbers that are so problematic. As a psychologist I should get hammered for this, but really all you'd be paying for is the degree of the person who is interpreting your results.

I mean, we could go into the full-blown gold standard, the Minnesota Multiphasic Personality Inventory, which uses a true/false scale, and will run you approximately 550 items and 1-2 hours. But it also includes primarily clinical scales, so unless you're interested in depression, anxiety, hypochondriasis, etc., it's not going to be that much more informative. Or one could go to one of the Big Five measures such as the Neuroticism-Extraversion-Openness Personality Inventory, which you might like better because it's measured on a five-point Likert scale ranging from strongly agree to strongly disagree, thus allowing more variation in responses. That will run you 240 items and about 40 minutes, and is a published measure, so it would cost you.

Statistically, anyways, you run into the same problems--technically the difference between strongly agree and agree on one indicator is not the same as the difference between strongly agree and agree on another indicator, but we still add them up to create a scale. Even inclusive of weights, the weights could differ based on the person--one indicator might be more telling for some than for others. Statistically we don't have anything better, though arguably we have better options in terms of measurement instruments.

These free versions of the Myers-Briggs are widely used in psychological literature (in part because research gets really costly, really fast when using a published measure).

That being said, I agree that it is not an uber-precise measure--at the very least, one could expect deviations of +/- 10 around some unknown "true" score. But if I get 1 on one scale and 100 on another, or even 50 on one scale and 100 on another, I can be pretty certain that the scale I get 100 on is more dominant than the scale I get 1 or 50 on.

Take it 50 times at varying points of the day, in varying moods, on various days over a period of a year--these scores should be roughly normally distributed--calculate the mean, and you'd probably get a good estimation of your "true"--and I use that word loosely--score.
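
A quick simulation of that idea, as a rough sketch only: it assumes a hypothetical "true" score of 70 and normally distributed noise with a standard deviation of 10 (echoing the +/- 10 above). Both numbers and the function name are assumptions, not properties of the real test.

```python
import random
from statistics import mean, stdev

def simulate_takes(true_score, noise_sd, n_takes, seed=0):
    """Simulate repeated test scores as true score plus random noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_score, noise_sd) for _ in range(n_takes)]

# Assumed values: true score 70, noise SD 10, taken 50 times.
scores = simulate_takes(true_score=70, noise_sd=10, n_takes=50)

estimate = mean(scores)
# Standard error of the mean: how far the average itself tends to wander.
standard_error = stdev(scores) / len(scores) ** 0.5

print("single-take spread (SD):", round(stdev(scores), 1))
print("mean of 50 takes:       ", round(estimate, 1))
print("standard error of mean: ", round(standard_error, 1))
```

Any single take can be off by about the full noise level, while the average of 50 takes is off by only a fraction of it, since the standard error shrinks with the square root of the number of takes.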

Jezebel
09-26-2007, 04:33 AM
You're reading a bit too much into what I'm saying and nitpicking. And yet, after reading through all that, I don't even see you really disagreeing with me. I never said it wasn't good for giving a good general indication, just that it isn't quite precise.

Psyborg
10-06-2007, 04:30 AM
To be scientific, it may be accurate but not necessarily precise. Precision is the ability to arrive at a tight, well-defined grouping of conclusions that appears to answer a question adequately but is not necessarily accurate. The example given in high school textbooks is the dart board, where a person may be able to hit the board in a tight cluster that is nowhere near the bullseye.

Accuracy, however, is where the conclusions not only appear to answer the question but do so in a believable way, if not in a tight way. On the dart board, all of the darts would land similarly close to the bullseye but not necessarily in a tight grouping.
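
The dart board picture can be put into numbers with a small sketch: bias (how far the average lands from the target) stands in for accuracy, and spread (how tightly the results cluster) stands in for precision. The target value and both sets of results are invented for illustration.

```python
from statistics import mean, pstdev

TRUE_VALUE = 60  # hypothetical bullseye

def bias(results, true_value=TRUE_VALUE):
    """Distance of the average result from the target (accuracy)."""
    return mean(results) - true_value

def spread(results):
    """How tightly the results cluster together (precision)."""
    return pstdev(results)

# Tight cluster, far from the target: precise but not accurate.
precise_not_accurate = [80, 81, 79, 80, 80]
# Centered on the target, widely scattered: accurate but not precise.
accurate_not_precise = [45, 75, 60, 50, 70]

for name, results in [("precise, not accurate", precise_not_accurate),
                      ("accurate, not precise", accurate_not_precise)]:
    print(name, "| bias:", bias(results), "| spread:", round(spread(results), 1))
```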

So in this sense you are very much correct. As a career counselor told me, this tells you within a good degree of accuracy what your "personality profile" is, but there is plenty of room for error. What if you were only scoring a 55 on one of the letters? Or two? What does this say about your personality?

On the other hand, it's safe to say that it's accurate enough to give you a rough idea of how you'd interact with another random person in any given situation, society as a whole, and therefore what careers might be applicable to you, etc. It isn't precise enough to give you a list of only 5 careers or to give you the exact match for a perfect date or spouse.

There is definitely a huge gap between accuracy, as the subject line states, and precision, which you seem to imply.

Jezebel
10-06-2007, 04:36 AM
"There is definitely a huge gap between accuracy, as the subject line states, and precision, which you seem to imply."
The above posts actually came before the subject title (this was split from another thread), but I can see what you mean. Taken into account and changed.