It is embarrassing that healthcare does not assess its quality with the same rigour as other industries.
The previous post (Guns, Van Halen, and Brown M&Ms - Real-Time Quality Checks) discussed the use of fictional data in standard workflows as a means to continuously check the quality of different systems in areas as diverse as airport security, agriculture, and aviation.
Summary
This post will look at two types of quality: Upper Limit - measured under observation, and Lower Limit - measured under real-life conditions. I will show how most medical quality assessments only get at the upper quality limit, and we never learn about the day-to-day quality of healthcare. Fictitious data can be used to understand healthcare’s true variation in quality. In the long run, this will help us create better systems that train better clinicians in real time.
Listen to the post on YouTube, or on your podcast app under: Gregory Schmidt
Two types of quality
I think there are two major types of quality that can be measured: upper limit and lower limit.
1. Upper Limit Quality (Simulated Quality)
Simulated quality assesses the upper limit of the quality of a system. It assesses how a system performs under optimal conditions. In this case, the system knows it is under observation. It knows it should perform as well as possible. It is ‘under a test’.
If a system fails an upper limit quality assessment, there are major problems. But in my opinion, it tells you nothing about the system’s actual day-to-day quality. AKA, the thing you care about.
2. Lower Limit Quality (Real-Life Quality)
2a. A system’s Day-to-Day Quality is assessed by watching it perform under typical operating conditions. It is also critical that the system does not know it is being observed. The system should be operating at its ‘average’ variation.
The example in the last post, in which fictional threat items are inserted on screen into x-ray scans of luggage, helps assess the typical alertness of operators. This is the real-life quality I’m particularly interested in. Not how well that operator did on a test 10 years ago. How well do they do on average during the week? In the morning? While tired? With a hangover?
2b. Under Stress Quality is another way to measure real-life quality. In this case the system is observed when it is known to be stretched to the limit. When a system is at or over capacity it may become particularly dangerous. Such an assessment may recommend increasing particular resources when the system is under stress. Or, at a certain stress level, the system should prohibit certain functions because it may not be safe to perform them.
Types of quality assessments in medicine
Unfortunately, from what I can tell, medicine only assesses its upper limit (simulated) quality.
Physicians have to write an exam. All this tells you is how a physician performs based on textbook knowledge in a simulated environment. It really says, in my opinion, nothing of the variation in day-to-day quality they provide.
Hospitals (and medical schools) undergo accreditation. Again, this is entirely fake conditions. Institutions spend large sums of money and years preparing for these infrequent assessments. This does not tell you the day-to-day quality of the hospital.
These types of upper limit quality assessments are fine for identifying those individuals and institutions that, even when studied under allegedly ‘optimal conditions’, still produce ‘red flags’.
However, let us be clear here: we are setting in these cases a ‘low bar’ (just pass), and studying the system while under ‘optimal conditions’ (aware of its evaluation).
Would you trust the quality of a production line that was only certified under fake conditions once? Or would you trust a production system that had its quality inspected only once every few years? That is crazy.
Factories go to great lengths, and spend great sums of money, to create quality assessment systems that operate as close to real time as possible, and they feed these results back into their production system to fine-tune their output. A specific tolerance for variability is permitted. They know what it is, and they know when they deviate from it.
In healthcare, I find we neither know what the tolerance for variability is, nor when we deviate from it.
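To make ‘tolerance for variability’ concrete, here is a minimal sketch of the kind of check a production line might run, written in Python. It derives limits from a baseline of past measurements and reports new values that fall outside them. The three-standard-deviation rule, the function names, and the numbers (minutes from a lab result to physician acknowledgement) are all assumptions made up for illustration, not data from any real system.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) limits derived from a baseline of past measurements."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_tolerance(baseline, new_values, sigmas=3.0):
    """Return the new values that fall outside the baseline's limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [v for v in new_values if v < lower or v > upper]


# Hypothetical numbers: minutes from result to acknowledgement.
baseline_minutes = [30, 35, 28, 32, 31, 29, 34, 33]
this_week = [31, 36, 90, 27]

flagged = out_of_tolerance(baseline_minutes, this_week)
print(flagged)  # [90]
```

The point is not the particular rule. It is that the limits are written down, so a deviation is something the system can notice on its own.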
Real-life evaluation in medicine?
Eg. Pathology: Pathology laboratories must conform to national certification bodies. Labs are routinely sent special ‘audit samples’ based on the capabilities the lab is accredited to perform. The sample goes through the lab’s processes, and then a final diagnosis is submitted by the pathologist. Multiple labs across the country receive the same ‘audit sample’. Their results are pooled together, and the ‘correct’ answer is determined based on what most labs reported.
This is good. But this is really only an Upper Limit Quality (Simulated Quality) assessment. Clearly the lab will handle the special ‘audit specimen’ with more care. The sample will receive more time on its analysis, and a more thoughtful diagnosis. This tells you very little about how the lab handles specimens day-to-day when it is busy. All this process can catch is a lab that is grossly incompetent.
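The pooling step in the pathology audit above amounts to a majority vote. A minimal sketch in Python follows. The lab names and diagnoses are hypothetical, and a real certification body may well resolve the ‘correct’ answer differently (for example with expert review). Ties are not handled here.

```python
from collections import Counter


def consensus(diagnoses):
    """Return the diagnosis reported by the most labs (ties are not resolved)."""
    counts = Counter(diagnoses.values())
    answer, _ = counts.most_common(1)[0]
    return answer


def discordant_labs(diagnoses):
    """Return the labs whose diagnosis differs from the consensus, sorted by name."""
    answer = consensus(diagnoses)
    return sorted(lab for lab, dx in diagnoses.items() if dx != answer)


# Hypothetical submissions for one audit sample.
submissions = {
    "lab-1": "adenocarcinoma",
    "lab-2": "adenocarcinoma",
    "lab-3": "benign",
    "lab-4": "adenocarcinoma",
}

print(consensus(submissions))        # adenocarcinoma
print(discordant_labs(submissions))  # ['lab-3']
```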
Eg. Blood Labs: From what I can tell, blood labs actually have rather well-enforced quality controls. Whenever I speak with lab directors they ‘get it’. They know the language of quality, variation, process, and systems. And - critically - they genuinely care about it.
There are multiple lab certification bodies, with standards that must be met. Labs often run quality control checks against samples several times a day. All of this is highly standardized. It is obviously not perfect. Clearly some labs seem to be better than others in terms of quality - particularly in specialized tests. But overall, as a sector, they are light years ahead of the rest of medicine.
Ways to determine the Real-life (Lower-Limit) Quality of medicine
I was intrigued to find out how fictitious data is used in other industries to assess their real time quality. I think there are ways we can incorporate this in medicine.
Mystery customer
Secret shoppers, mystery diners, and secret hotel guests are all common within the hospitality and service industry. They permit the assessment of these businesses under real-time operating conditions. Often this is for third-party assessments and ratings of these businesses.
One could think of many ways to have ‘mystery patients’. Such patients need not be actors; they could be real people, with real conditions. But they would gather much of the same type of undercover, real-time quality of service data. This is far better than asking a doctor a multiple choice question on a test.
The area of eConsults or ‘electronic consultation’ is increasing in popularity. A ‘synopsis’ of a patient’s file is sent to a consultant for a second opinion. These files could be assessed to gauge the quality of responses given. And critically, the person receiving this eConsultation would not know they were being evaluated.
Data flag
It is hard to insert fictitious data directly into clinical care. But there is an alternative way to accomplish the same goal. A list of common lab-work and imaging tests that are often misinterpreted by the receiving physician could be generated. When lab or radiology software generates a report on a real patient that matches this ‘watch list’, that patient’s report could be ‘flagged for follow up’.
The report could then be sent to the receiving physician as it ordinarily is. But in the future the system would follow up on what happened. Did the patient get the timely referral they required based on those results? Did they get the repeat imaging that was suggested? Did the results get filed without anything being done about them? How long did it take for the physician to act on these results?
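A minimal sketch of the ‘data flag’ idea in Python, under stated assumptions: the finding codes, the follow-up windows in days, and the `Report` fields are all hypothetical names invented for this example, not taken from any real lab or radiology system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical watch list: finding code -> days within which action is expected.
WATCH_LIST = {"INCIDENTAL_LUNG_NODULE": 90, "CRITICAL_POTASSIUM": 1}


@dataclass
class Report:
    report_id: str
    finding_code: str
    reported_on: date
    acted_on: Optional[date] = None  # date the physician acted, if ever


def is_flagged(report):
    """A report is flagged when its finding is on the watch list."""
    return report.finding_code in WATCH_LIST


def follow_up_status(report, today):
    """Classify a flagged report as 'timely', 'late', 'overdue' or 'pending'."""
    deadline = report.reported_on + timedelta(days=WATCH_LIST[report.finding_code])
    if report.acted_on is not None:
        return "timely" if report.acted_on <= deadline else "late"
    return "overdue" if today > deadline else "pending"


reports = [
    Report("r1", "CRITICAL_POTASSIUM", date(2024, 1, 1), acted_on=date(2024, 1, 1)),
    Report("r2", "INCIDENTAL_LUNG_NODULE", date(2024, 1, 1)),
    Report("r3", "NORMAL_CBC", date(2024, 1, 1)),  # not on the watch list
]
today = date(2024, 6, 1)

statuses = {r.report_id: follow_up_status(r, today) for r in reports if is_flagged(r)}
print(statuses)  # {'r1': 'timely', 'r2': 'overdue'}
```

Nothing about the physician’s workflow changes here. The report is delivered as usual, and the follow-up check runs quietly afterwards.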
Critical Case Sequence
In radiology, a file of scans that have resulted in malpractice cases can be created. These scans can enter into the circulation of radiologists’ work-lists without them realizing it. If a radiologist also fails to identify the critical finding, the system would alert them.
(The only trouble with this example is that critical radiology scans often require a direct phone call to the ordering physician. It might become obvious if the MD and phone number submitting these fictitious scans into the system are always the same. There are ways around this.)
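The critical case sequence can be sketched in a few lines of Python. This is only an illustration: the case ids, the findings, and the way a ‘read’ is represented (a set of reported findings per case) are assumptions, and a real work-list system would be far more involved.

```python
import random


def seed_worklist(real_cases, audit_cases, rng):
    """Return a new work-list with audit cases inserted at random positions."""
    worklist = list(real_cases)
    for case in audit_cases:
        position = rng.randint(0, len(worklist))  # inclusive on both ends
        worklist.insert(position, case)
    return worklist


def missed_findings(reads, answer_key):
    """Return ids of audit cases whose known critical finding was not reported."""
    return [
        case_id
        for case_id, finding in answer_key.items()
        if finding not in reads.get(case_id, set())
    ]


# Hypothetical audit cases with their known critical findings.
answer_key = {"audit-A": "pneumothorax", "audit-B": "aortic dissection"}
real_cases = ["real-1", "real-2", "real-3"]

worklist = seed_worklist(real_cases, list(answer_key), random.Random(0))

# Hypothetical reads: what the radiologist reported for each audit case.
reads = {"audit-A": {"pneumothorax"}, "audit-B": {"no acute findings"}}
print(missed_findings(reads, answer_key))  # ['audit-B']
```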
Black box recording
My understanding is that colonoscopies have accepted standards for how long they should take. If you go faster, the thought is you may miss something. From what I’ve seen, physicians start and stop their own clock to document how long it took.
Why are all colonoscopies not video recorded, by default? This would be great to build machine learning models off of, and provide objective evidence of the quality of the colonoscopy. One could even send the tape to someone else for a second opinion.
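If procedures were recorded by default, the duration could come from the recording’s own timestamps rather than a self-operated clock. A minimal Python sketch follows. The six-minute threshold is a placeholder assumption for illustration, not a clinical recommendation, and the procedure ids and timestamps are made up.

```python
from datetime import datetime, timedelta

# Placeholder threshold, assumed for illustration only.
MIN_DURATION = timedelta(minutes=6)


def recorded_duration(start, end):
    """Duration taken from the recording's own start and end timestamps."""
    if end < start:
        raise ValueError("recording ends before it starts")
    return end - start


def too_fast(procedures, minimum=MIN_DURATION):
    """Return ids of procedures whose recorded duration is below the minimum."""
    return [
        proc_id
        for proc_id, (start, end) in procedures.items()
        if recorded_duration(start, end) < minimum
    ]


# Hypothetical recordings: id -> (start, end).
procedures = {
    "p1": (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 9)),
    "p2": (datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 9, 34)),
}
print(too_fast(procedures))  # ['p2']
```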
Quality assessments in Medicine won’t work
I can already hear all the criticisms. The quality of care in healthcare is good. There is nothing to be worried about. The system is stretched to its limit; it doesn’t have time to deal with additional workload.
This post doesn’t have time to provide a rebuttal. But the takeaway is that although these people and institutions have passed simulated upper limit quality assessments, the real day-to-day quality remains unpublished and a dark secret.
Note: this article did not get into performance metrics, such as ‘number of patients with post-operative complications’ or ‘number of patients who return to hospital after discharge with heart failure’. Performance metrics are really hard to compare between patient and physician populations, and are slightly different from this article’s objective of real-life, real-time assessments.
The Future of education
There is tremendous opportunity to use the electronic health record as a real-time education tool for new and old doctors alike. Systems could be designed to ask the clinician ‘what would you do?’. Or the systems could ‘observe’ what the clinician does, and provide alternative suggestions and advice.
This could help train doctors and keep their skills fresh. This could be particularly helpful in low resource environments.
Doctor, what do you think of this EKG? Did you not see the T wave inversion? Essentially, the EHR becomes a personal lifelong tutor. Always improving its skill, always improving the clinician’s skill.