TQ Nordic offers advanced psychometric solutions tailored to meet specific organizational needs. Their assessments focus on providing a personalized, digital, and efficient candidate journey, ensuring fairness, accuracy, and relevance to the workplace. TQ Nordic leverages the latest advancements in behavioral science and technology to deliver assessments that are not only robust but also future-focused, helping organizations make informed talent decisions.
Normative: Respondents rate each statement independently, allowing comparisons with a benchmark group. Susceptible to response biases.
Ipsative: Respondents compare or rank different statements, reducing potential for faking but limiting the ability to compare individuals.
Quasi-Ipsative (Nipsative): Combines elements of both normative and ipsative measures, allowing for individual and group comparisons with reduced biases.
High reliability and validity are essential for psychometric assessments. While high reliability does not guarantee high validity, high validity presupposes high reliability. By ensuring both, we can provide meaningful and accurate measurements that help organizations make informed decisions about talent selection and development.
Reliability refers to the consistency and dependability of a measurement instrument. In the context of psychometric assessments, it ensures that the test produces stable and consistent results over repeated applications. To assess reliability, we use several methods, including internal consistency reliability, test-retest reliability, and inter-rater reliability.
Internal consistency reliability gauges the precision of the scales within a psychometric test. It is commonly measured using Cronbach’s alpha coefficient, which typically ranges from 0 to 1. A value of 0.7 is commonly treated as the minimum acceptable level in the industry.
Our questionnaire aims for reliability scores ranging from 0.7 to 0.85 per trait, with an average of 0.8.
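To make the coefficient concrete, here is a minimal sketch of how Cronbach’s alpha can be computed from item-level ratings. The function name `cronbach_alpha` and the sample ratings are hypothetical and are not taken from the TQ Personality Questionnaire; the formula is the standard one, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: list of respondents, each a list of k item ratings.
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(k)
    ]
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents rating 4 items on a 1-5 scale.
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha: {alpha:.2f}")
```

With only five respondents the estimate is purely illustrative; a real scale would be evaluated on a much larger sample.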
For quasi-ipsative measures of personality, like the TQ Personality Questionnaire, internal consistency can be meaningfully calculated because these measures do not suffer from algebraic dependence among the scales. To complement internal consistency, we also consider the standard error of measurement (SEM), which helps calculate confidence intervals around a candidate’s test score. SEM is inversely related to test reliability; thus, a higher reliability indicates a lower SEM, which is desirable.
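The inverse relationship between reliability and SEM can be sketched with the classical formula SEM = SD × sqrt(1 − reliability). The example below assumes a hypothetical reporting scale with a standard deviation of 10 and a reliability of 0.8; these numbers, and the function names, are illustrative and do not describe TQ Nordic’s actual score scale.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM for a scale with the given SD."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def confidence_interval(score, sd, reliability, z=1.96):
    """Approximate interval around an observed score (z=1.96 ~ 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


# Hypothetical scale: SD of 10, reliability of 0.8, observed score of 55.
sem = standard_error_of_measurement(sd=10, reliability=0.8)
low, high = confidence_interval(score=55, sd=10, reliability=0.8)
print(f"SEM: {sem:.2f}, 95% interval: {low:.1f} to {high:.1f}")
```

Raising the reliability from 0.8 to 0.9 in this sketch shrinks the SEM and therefore narrows the interval around the candidate’s score.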
Test-retest reliability involves administering the same test to the same group at two different points in time to check for consistency. Inter-rater reliability assesses the degree of agreement among different raters. High reliability in these areas indicates that the test provides stable and consistent results, regardless of when it is administered or who scores it.
Validity, on the other hand, refers to the degree to which a test measures what it claims to measure. It ensures that the inferences made from the test scores are appropriate and meaningful in the given context.
Let’s take a closer look at various psychometric concepts that are good to know as we proceed in this course.
The word psychometrics derives from the Greek roots “psyche” (soul, mind) and “metron” (measure). However, psychometrics is not about measuring the soul or personality as such; it is a field of study within statistics that develops methods and analyses for constructing and evaluating tests and assessments. Using mathematical and statistical techniques, psychometrics provides ways to quantify, for example, mental abilities, attitudes, and various types of performance.
A norm group is a reference sample that represents the relevant population in terms of variables such as age, gender, education level, and sector of employment. It is worth keeping in mind that cultural differences tend to be smaller in professional life than in private life.
TQ Nordic’s tests report results against a norm group. This is a global norm group that represents an average of the world’s working population. If a test is conducted in a specific language, that language is weighted more heavily to account for cultural differences. For example, tests conducted in English may include adjustments that reflect cultural nuances specific to the English-speaking job market.
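As a rough illustration of what reporting against a norm group means, the sketch below converts a raw score into a z-score and a percentile relative to a reference sample. The function name, the tiny norm sample, and the assumption that norm scores are approximately normally distributed are all hypothetical; the language weighting described above is not modeled here.

```python
from statistics import NormalDist, mean, stdev


def norm_referenced_score(raw_score, norm_scores):
    """Return (z_score, percentile) of raw_score relative to a norm sample.

    Assumes the norm group's scores are roughly normally distributed.
    """
    mu = mean(norm_scores)
    sigma = stdev(norm_scores)
    z = (raw_score - mu) / sigma
    percentile = 100 * NormalDist().cdf(z)
    return z, percentile


# Hypothetical raw scores from a (very small) norm sample.
norm_sample = [10, 12, 14, 16, 18]
z, pct = norm_referenced_score(16, norm_sample)
print(f"z = {z:.2f}, percentile = {pct:.0f}")
```

A candidate who scores exactly at the norm group’s mean gets a z-score of 0 and lands at the 50th percentile; scores above the mean land higher.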
Reliability and validity are two very important concepts in psychometrics.
Validity means that we measure what is relevant in the context: the right things at the right time. Reliability means that we measure in a dependable way, so that the results are stable and would be the same if the assessment were conducted again.
Let’s take a closer look at each concept and why it is important.
Validity is about ensuring that the methods and tools we use to measure something actually measure what we intend to measure. This means that the results from a measurement or test are accurate and useful for the specific purpose.
For example:
– If we want to measure job performance, our methods should focus on work-related tasks and behaviors, not activities in private life.
Validity thus means that our measurements are accurate and relevant for the context we want to examine, so that we can trust the conclusions we draw from them.
Reliability means that our measurements are consistent and dependable. If we conduct a measurement or test several times, the results should be stable over time. This is often assessed with a test-retest design, where the same test is taken twice with a few weeks in between and the results are compared.
For example:
If a student scores 85 points on a math test today and then scores 85 points again when they take the same test a week later, the test has high reliability.
But if the same student scores 85 points the first time and 70 points the second time, then the test has low reliability because the results vary too much. This may mean that the result is strongly affected by factors such as mood or sleep.
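In practice, test-retest reliability is estimated for a group rather than a single student, usually as the correlation between the two sets of scores. The sketch below computes a Pearson correlation for two hypothetical retest scenarios; all scores and names are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return covariation / (spread_x * spread_y)


# Hypothetical math test scores for five students.
first_attempt = [85, 70, 92, 60, 78]
stable_retest = [84, 72, 90, 62, 77]    # scores barely move
unstable_retest = [70, 88, 65, 81, 59]  # scores shift unpredictably

r_stable = pearson_r(first_attempt, stable_retest)
r_unstable = pearson_r(first_attempt, unstable_retest)
print(f"stable: {r_stable:.2f}, unstable: {r_unstable:.2f}")
```

A coefficient close to 1 indicates that students keep roughly the same relative position on both occasions; a low or negative coefficient indicates an unreliable test.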
Reliability and validity are both important to ensure that measurements and tests provide useful and accurate results. They are related but not the same thing.
Here are some simple rules that describe their relationship:
Low reliability always means low validity:
If our measurements are not consistent and stable (low reliability), we cannot trust that they are measuring what we want to measure (low validity).
For example, if a math test gives a student different results each time they take it, we cannot say that the test reliably measures the student’s math skills.
High reliability does not guarantee high validity:
Even if the measurements are consistent and stable (high reliability), it does not necessarily mean that they measure the right thing (high validity).
For example, if we measure shoe size very accurately and get the same result every time, the measurement is reliable. But shoe size says nothing about a person’s math skills, so it has low validity in that context.
High validity requires high reliability:
For our measurements to be valid and measure what we want (high validity), they must first be consistent and stable (high reliability).
For example, if we want to measure math skills with a test, the test must give the same result each time (high reliability) for us to say that it really measures math skills (high validity).
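The rule that high validity requires high reliability can be expressed numerically. In classical test theory, the observed correlation between a test and an outcome cannot exceed the square root of the product of their reliabilities. The sketch below computes that ceiling; the function name and the example reliabilities are hypothetical.

```python
from math import sqrt


def max_observable_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the test-criterion correlation (classical test theory)."""
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must be between 0 and 1")
    return sqrt(test_reliability * criterion_reliability)


# Hypothetical case: a reliable test vs. an unreliable one, both compared
# with a criterion (e.g. performance ratings) of reliability 0.6.
ceiling_reliable = max_observable_validity(0.8, 0.6)
ceiling_unreliable = max_observable_validity(0.3, 0.6)
print(f"{ceiling_reliable:.2f} vs {ceiling_unreliable:.2f}")
```

The bound is only a ceiling: a highly reliable test can still have low validity if it measures the wrong thing, as in the shoe-size example.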
To summarize this chapter:
For a measurement to be useful, it must be both reliable (consistent and stable) and valid (measure the right thing). High reliability is a prerequisite for high validity, but it is not sufficient by itself. Both reliability and validity are typically expressed as coefficients on a scale from 0 to 1.
For a personality questionnaire, reliability should be in the range of 0.7 to 0.9 per personality trait. TQ Nordic’s 18 personality traits all have good reliability, above 0.7. The average across all 18 personality traits is 0.8.
Validity should be in the range of 0.2 to 0.4. TQ’s assessments show good validity, above 0.35.
Why does TQ not have a reliability of 1.0?
Reliability is influenced by several factors, including the environment at the time of testing. Was the environment optimal? Did you take the test undisturbed, without distractions? Did you read all the instructions carefully, or did you decide to take it as it comes and learn along the way? How you feel on the day also matters, for example whether you were well-rested.
Why is validity lower than reliability?
It is harder to measure well-being, intelligence, knowledge, perceptions, or experiences than, for example, body weight or height. These concepts are often subjective and can vary greatly among individuals, whereas a weight or height is what it is. To conduct good validity studies, one must relate what is measured to something else. The results we obtain must be consistent with results from other studies or simultaneous measurements with another method.
Beyond these concepts, it is worth asking why we use psychometric assessments at all. Agreeing on what to measure requires fairly clear job profiles, and the assessments help leaders prioritize and focus on the right things. The goal is a reliable tool for predicting performance and suitability for a role. Because all candidates are evaluated in the same way, the assessment is a fair and objective method for measuring potential and the likelihood of good performance.
Candidates are compared against a norm group, not against other CVs. Personality questionnaires can help answer questions like: How is this person usually perceived? How does the person behave at work? How is the person best motivated? And with whom can this person work well? They also help the individual find the right role, where they are likely to succeed and thrive, and show how they normally handle situations when they are at their best.