• The Influence of the Invisible

  • Flying bombers in World War II was one of the most dangerous jobs in history. If you were part of a Bomber Command crew, you had only a 55% chance of survival, and by the middle of the war, officials were getting desperate to decrease bomber crew casualties. In 1943, the Statistical Research Group (SRG) was challenged to solve a central problem that had been plaguing US bombing campaigns throughout the war: bombers have a higher likelihood of returning safely if they are heavily armored, but armor weighs down the plane, and the more armor it carries, the more cumbersome it is to fly. Finding the optimum balance requires knowing which parts of the plane are most vulnerable to attack and strategically reinforcing those areas. Researchers at the Center for Naval Analyses analyzed data on planes that returned from their missions, detailing the number and placement of bullet holes along the aircraft. After compiling their data, they recommended where armor should be added, largely focusing on the areas that had sustained the most damage based on the patterns of the bullet holes. Now that they knew where the armor should go, they had to determine how much armor should be placed in these vulnerable areas, and they submitted their findings to the SRG, specifically to a man named Abraham Wald, for further analysis.

  • But the report Wald sent back wasn’t what they expected. Wald revisited the original assessment, and, instead of calculating how much more armor was needed, he drastically revised the recommendations about where the armor should be placed. The Navy analysis looked at returning planes and deduced where armor was needed from the common patterns of holes in the aircraft. But what their data set didn’t account for were the bombers that never came back. The bombers that returned were the ones that survived. This led Wald to a seemingly obvious but startling conclusion:

  • The vulnerable areas of the aircraft are not where the bullet holes were, but where the bullet holes weren’t.

  • In fact, the bullet hole patterns on the aircraft that returned actually marked out the places where the armor was sufficient to protect them, precisely the areas where more armor wasn’t required. The bombers that didn’t return, Wald reasoned, would have taken damage in the very areas that were untouched on the bombers that returned. Therefore, the most vulnerable areas of the aircraft, the ones that needed reinforcement, were precisely the areas where no bullet holes were found. By accounting for what was not visible in the data set, Wald used the same data but reached a completely different conclusion.
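  • The logic of Wald’s correction is easy to see in a toy simulation. The sketch below is a minimal illustration, not a reconstruction of the historical analysis: the aircraft sections, hit counts, and loss rate are all invented for the example. It scatters random hits across four sections, removes most of the planes whose engine section is hit, and then tallies the bullet holes visible on the survivors.

```python
import random
from collections import Counter

random.seed(42)

SECTIONS = ["fuselage", "wings", "tail", "engine"]  # hypothetical sections
LETHAL = {"engine"}                                 # assumed vulnerable area

def fly_mission(num_hits=3, loss_prob=0.8):
    """One sortie: random hits; the plane is usually lost if a lethal area is hit."""
    hits = [random.choice(SECTIONS) for _ in range(num_hits)]
    downed = any(s in LETHAL for s in hits) and random.random() < loss_prob
    return hits, downed

all_holes, survivor_holes = Counter(), Counter()
for _ in range(10_000):
    hits, downed = fly_mission()
    all_holes.update(hits)           # what actually happened (unobservable)
    if not downed:
        survivor_holes.update(hits)  # what the analysts could count

print("Hits on all planes:      ", dict(all_holes))
print("Hits on returning planes:", dict(survivor_holes))
```

  • Every section is hit with roughly equal frequency, but the engine is conspicuously underrepresented in the survivor counts. Read naively, the survivor data says the engine needs the least protection; read the way Wald read it, the near-absence of engine hits among returning planes is itself the evidence of vulnerability.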

  • Abraham Wald’s insight has been made famous not only because it helped solve a critical problem for the US military and likely saved countless lives, but because it has become a powerful example of a basic cognitive bias that we all share: the Survivorship Bias. The Survivorship Bias is a logical error in which we draw conclusions from data that is visible and available, data that has survived a selection process of some kind, rather than from data that is invisible because it has been, for various reasons, left out of the analysis. We draw conclusions by examining what we can see, and often vastly underestimate the influence of factors that lie in the shadows. By neglecting to account for the influence of the invisible, we can end up making drastic errors in judgment, connecting the wrong dots.

  • Shadow Cognition and Human Performance Assessment

  • The way we bias data collection and assessment is especially relevant for designers of immersive learning experiences. We are charged with creating decision environments that drive optimal learning, and to ensure we are getting the best outcomes, we are always looking for innovative ways to measure performance. But when we look to traditional assessment methods, we find that the data and measurements typically used are dangerously susceptible to the Survivorship Bias. Because human psychology and decision making are so complex, because the mechanisms are difficult to understand, and because the science behind them is still very much in its infancy, too often we assess human behavior and decision making based on what is most visible and most easily measurable (e.g. speed, accuracy, error rate, time, proximity), rather than on the less visible cognitive dimensions that might more meaningfully drive performance outcomes (e.g. sensemaking, situation awareness, self-regulation, adaptation).

  • The concept of shadow cognition has its genesis in Gary Klein’s book “Streetlights and Shadows”. In the book, Klein demonstrates the many ways in which we underestimate the powerful influence of complexity, ambiguity, and our own ignorance on our decision making, and how this error leads to costly assumptions about how to build systems, how to train our workforces, and how to measure performance. Klein argues that correcting these erroneous assumptions requires a better understanding of the “shadowy dimensions” of human decision making and the role they play in shaping our approaches. For our purposes here, shadow cognition provides a framework we can use to better understand the gaps in our current assessment paradigms and how we can begin to address them. In our view, the existing gap in human performance assessment casts three primary shadows: 1) shadow environments, 2) shadow cognition, 3) shadow data.

  • Shadow Environments: Accounting for Ambiguous Decision Contexts

  • We make decision errors because we believe we live in a well-ordered world, one with rational rules and optimal courses of action. But often we find ourselves in a world full of ambiguities and complexities beyond our understanding. The primary source of our decision errors is overestimating the degree to which we can derive optimum courses of action, assess risk, and achieve pre-determined goals. Our errors reside in the gap between our belief in a world of clarity and the actual world, full of ambiguity. Complex domains force us to confront this gap. How we approach a situation, the skills that are needed, and the decision-making requirements are all contingent upon the complexity and unpredictability of the environment we’re operating in. Procedures, for example, work well in well-ordered domains. But in complex domains, context needs to be taken more seriously, and procedures won’t always suffice; in fact, they can be detrimental. Here, we need judgment to follow procedures where they apply and to go beyond them when necessary. The expert not only needs to know how to operate when procedures aren’t available, but more importantly needs to know how to determine whether and when to go beyond procedures in the first place. And yet evaluators, even in high-uncertainty task environments like the military, often overlook the nuanced distinctions between well-ordered, complicated, and chaotic task environments, and the differential training needs each requires.

  • Shadow Cognition: Tacit Knowledge and the Problem of Expertise

  • Another error we make, in both our daily lives and in our assessment models, is to underestimate the degree to which we are influenced by nonconscious processes. We tend to measure the explicit knowledge we are aware of: declarative knowledge, memorized facts, elaboration of routines, and adherence to procedure. But when confronted by complex environments, experience and expertise are essential, and the knowledge experts use to make effective decisions is largely tacit. The knowledge that resides in this tacit layer, used by experts in complex environments, involves sensemaking, pattern matching, judging typicality, and mental models. This presents evaluators with an ongoing challenge, because a central feature of expertise is that it is not available to conscious processing, and is therefore extremely difficult to measure and evaluate. While these skills are difficult to assess, they are not out of reach. Methods such as Cognitive Task Analysis provide useful knowledge elicitation tools, and the design of immersive learning experiences can elicit behavior that may not be known to the operator but can be observed given the right set of assessment tools.

  • Shadow Data: Illuminating the Invisible

  • The final shadow relates back to the problem Abraham Wald helped illuminate: both in our daily lives and in our assessment models, we tend to dismiss the data that is missing from our data sets as irrelevant. Without knowing what is not being measured, and without digging deeper into what can be measured, evaluators can have all the data they need and still reach exactly the wrong conclusion. As the bomber example shows, more data doesn’t necessarily mean meaningful data. Suppose Wald had taken the Navy’s recommendations at face value and simply asked for more data. He would have gotten more returned bombers with more bullet holes, reinforcing the original conclusion. More striking, getting more data in this case would actually have been worse, because it would have made the Navy’s original case seem more robust. The ability to recontextualize and reframe the data, accounting for what is not easily measurable, is of critical importance to enhanced assessment.
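  • The point about more data is worth making concrete. Survivorship bias is a property of the sampling process, not the sample size, so collecting more biased data only makes the wrong estimate look more precise. The sketch below is purely illustrative; the 25% true engine-hit share and the 80% loss rate are invented numbers. It estimates, from survivors only, what fraction of bullet holes land on the engine as the number of observed planes grows.

```python
import random

random.seed(0)

TRUE_ENGINE_SHARE = 0.25   # assumed: 25% of all hits actually strike the engine
LOSS_PROB = 0.80           # assumed: a plane with an engine hit is usually lost

def survivor_estimate(num_planes, hits_per_plane=3):
    """Estimate the engine's share of bullet holes using returning planes only."""
    engine = other = 0
    for _ in range(num_planes):
        hits = [random.random() < TRUE_ENGINE_SHARE for _ in range(hits_per_plane)]
        if any(hits) and random.random() < LOSS_PROB:
            continue  # plane lost; its bullet holes never reach the analysts
        engine += sum(hits)
        other += hits_per_plane - sum(hits)
    return engine / (engine + other)

for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7,} planes observed -> estimated engine share: {survivor_estimate(n):.3f}")
```

  • No matter how many planes are observed, the survivor-based estimate settles well below the true 25%; the extra data simply narrows the error bars around the wrong answer. Only reframing the question, asking what the missing planes would have looked like, changes the conclusion.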

  • The Future of Human Performance Assessment

  • As the design of immersive learning experiences becomes more complex, so will the data collection and assessment needs. Creating an assessment model capable of capturing the full spectrum of human performance data in complex domains is key to the scalability, resilience, and sustainability of training outputs. Leveraging these assessment models and making them central to our synthetic and live training environments could enhance the capability of evaluators to assess cognition and decision making in complex domains for individual and collective training.

  • By having more granular assessments of the full spectrum of decision strategies, we can better target specific developmental needs. By capturing all stages of the learner’s developmental trajectory, we can anticipate what the needs will be at the next stage in the training process. By incorporating a learning model that motivates adaptive reasoning, we can prepare learners for a wider range of situational demands.

  • While traditional training and assessment methods continue to be valuable for assessing performance on routine tasks, procedures, and protocols, they are not well suited for a deep understanding of performance during high-uncertainty events, where cognitive skills associated with sensemaking and decision making are critical. The future demands on human performance assessment will require a greater understanding of how operators make decisions in high-uncertainty contexts, using new training tools and paradigms to ensure proficiency and operational readiness.