Decision Referral: Combining strengths of radiologists and AI
Decision referral is at the core of our AI concept. Here we explain what it is, how it works, and the impact it can have on radiologists' metrics when reporting with Vara.

Safety net: For cases where the AI is very confident that the images are suspicious, it acts as a safety net. Should the radiologist classify one of those cases as negative, the safety net triggers and points the radiologist to the specific region in the image that the AI considers suspicious. The radiologist can then reconsider the decision, potentially catching a cancer that would otherwise have been missed.
The safety net localizes where in the image a suspicious lesion is detected.

Unclassified cases: Importantly, the AI does not make a statement for all cases. Some cases are neither classified as normal (the least suspicious cases) nor flagged by the safety net (the most suspicious cases). For those cases, the AI is not confident enough, and the decision should rest on the radiologist's expertise.
An intrinsic property of decision referral is its configurability: One can configure the AI so that the lowest 50% of cases are labelled normal, or one can configure it to label the lowest 70% as normal. Similarly, the safety net can be activated for the 1% of most suspicious cases, or alternatively for the 2% of most suspicious cases.
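The configurability described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Vara's implementation: it assumes every case has a single suspicion score (higher means more suspicious), and the function name label_cases, its parameters, and the example scores are all invented here. A real system would presumably fix its cut-offs on separate data rather than rank the cases being read.

```python
def label_cases(scores, normal_fraction=0.5, safety_net_fraction=0.01):
    """Label each case "normal", "safety net" or "unclassified" by score rank.

    Hypothetical helper: `scores` are assumed AI suspicion scores,
    where a higher score means a more suspicious case.
    """
    if normal_fraction + safety_net_fraction > 1:
        raise ValueError("the two fractions must not overlap")

    n = len(scores)
    n_normal = round(n * normal_fraction)
    n_safety_net = round(n * safety_net_fraction)

    # Case indices ordered from least to most suspicious.
    order = sorted(range(n), key=lambda i: scores[i])

    labels = ["unclassified"] * n
    for i in order[:n_normal]:
        labels[i] = "normal"        # least suspicious cases
    for i in order[n - n_safety_net:]:
        labels[i] = "safety net"    # most suspicious cases
    return labels


# Made-up scores for 100 cases, used to compare the two configurations.
scores = [i / 100 for i in range(100)]
config_a = label_cases(scores, normal_fraction=0.5, safety_net_fraction=0.01)
config_b = label_cases(scores, normal_fraction=0.7, safety_net_fraction=0.02)

print(config_a.count("normal"), config_a.count("safety net"))
print(config_b.count("normal"), config_b.count("safety net"))
```

Everything that falls between the two cut-offs stays "unclassified" and is left to the radiologist, so widening either fraction shrinks that middle group.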
The question is: Which combination of normal triaging and safety net configuration has the most positive impact on the radiologist's metrics?
Let's look at both AI systems in isolation. Normal triaging can only reduce the sensitivity↓ of the radiologist, never increase it: If it labels a case “normal” that is actually a cancer, the radiologist's sensitivity decreases. On the other hand, normal triaging can label a case as “normal” that a radiologist would otherwise have recalled unnecessarily. This way, the specificity↑ can go up.
Now, for the safety net, the behaviour is exactly the opposite. Each case that the safety net proposes can only decrease the specificity↓, because it might activate for a case that isn’t actually cancerous. On the other hand, each additional cancer that the safety net catches that a radiologist would have missed otherwise will increase the sensitivity↑.
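The two opposing effects follow directly from the standard definitions of the metrics. As a brief reminder (the symbols TP, FN, TN and FP are notation introduced here, not taken from the original text), let TP be cancers that are recalled, FN cancers that are not recalled, TN non-cancers that are not recalled, and FP non-cancers that are recalled:

```latex
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
```

The denominators are fixed by the dataset: TP + FN is the number of cancers and TN + FP the number of non-cancers. Normal triaging can only turn a recall into a non-recall, which moves a case from TP to FN (sensitivity↓) or from FP to TN (specificity↑). The safety net can only turn a non-recall into a recall, which moves a case from FN to TP (sensitivity↑) or from TN to FP (specificity↓).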
We will now analyze the following: Is there a configuration for the safety net (specificity↓ and sensitivity↑) and normal triaging (sensitivity↓ and specificity↑) that cancels out the negative effects and results in a net-positive impact on the radiologist's metrics?
Evaluating decision referral
To evaluate this question, we use a large retrospective dataset from the German screening program. For each case, we know the assessment of the two initial radiologists, the recall decision, as well as the final biopsy result (malignant or benign) if a biopsy was done. All positive cases are biopsy-proven, and all negative cases have a two-year negative follow-up to make sure they are truly negative and not missed cancers.
This rich metadata allows us to evaluate the performance of decision referral retrospectively. We are testing the following scenario: How would the radiologist's metrics change if they:
Followed the AI’s recommendation for each case where the AI makes a statement (i.e. a "normal" assessment for any case that normal triaging labels as "normal", and "suspicious" for any case where the safety net triggers).
Did their standard assessment for all cases where the AI was not confident enough to make a statement.
The table below uses six example cases to illustrate how these assumptions play out depending on what Vara's AI said and what the radiologist said:

For each case, Vara's AI either classifies the case as “normal”, triggers the “safety net”, or leaves the case unclassified. If it's one of the first two, we assume the radiologist followed the AI's recommendation and check how that would change their metrics.
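The evaluation scenario can be expressed as a small simulation. The sketch below is a hedged illustration, not the analysis code behind the article: the function names and the eight cases are invented for this example (they are not the six cases from the table, and not data from the screening program), and each case is simplified to an AI label, one radiologist's recall decision, and the confirmed ground truth.

```python
def sensitivity_specificity(recalled, has_cancer):
    """Return (sensitivity, specificity) for a list of recall decisions."""
    pairs = list(zip(recalled, has_cancer))
    tp = sum(1 for r, c in pairs if r and c)          # cancer, recalled
    fn = sum(1 for r, c in pairs if not r and c)      # cancer, missed
    tn = sum(1 for r, c in pairs if not r and not c)  # no cancer, not recalled
    fp = sum(1 for r, c in pairs if r and not c)      # no cancer, recalled
    return tp / (tp + fn), tn / (tn + fp)


def with_decision_referral(ai_label, radiologist_recalls):
    """Final decision if the radiologist follows the AI wherever it speaks."""
    if ai_label == "normal":
        return False                 # normal triaging: no recall
    if ai_label == "safety net":
        return True                  # safety net: treated as suspicious
    return radiologist_recalls       # unclassified: radiologist decides


# Hypothetical cases: (AI label, radiologist recalls?, cancer confirmed?)
cases = [
    ("normal", False, False),        # unchanged
    ("normal", True, False),         # unnecessary recall avoided
    ("normal", True, True),          # cancer lost to normal triaging
    ("safety net", False, True),     # missed cancer caught
    ("safety net", False, False),    # additional false alarm
    ("safety net", True, True),      # unchanged
    ("unclassified", True, True),    # radiologist decides
    ("unclassified", False, False),  # radiologist decides
]

has_cancer = [cancer for _, _, cancer in cases]
radiologist_alone = [recall for _, recall, _ in cases]
referred = [with_decision_referral(label, recall) for label, recall, _ in cases]

before = sensitivity_specificity(radiologist_alone, has_cancer)
after = sensitivity_specificity(referred, has_cancer)
print("radiologist alone:     ", before)
print("with decision referral:", after)
```

The invented cases deliberately include one instance of each of the four effects, so the gains and losses cancel exactly here. Whether a given configuration is net-positive depends on how often each effect occurs in real data.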
Sources
[1] https://fachservice.mammo-programm.de/download/evaluationsberichte/Jahresbericht-Evaluation_2018.pdf
[2] Internal data from 10 German screening units.