My good friend ChatGPT and I joined 575 other participants at the virtual FHIR Connectathon a few weeks ago. ChatGPT behaved itself (most of the time …).
This FHIR Connectathon was a perfect setting for experimenting with the value of generative AI in healthcare. I previously explored the opportunities for generative AI and reported on some of my early experimentation in my blog last September:
A FHIR Connectathon like this mixes testing of implementations of well-documented FHIR Implementation Guides using realistic, non-PHI patient data; education about FHIR for newcomers to the standards community; and discussions among experts about how to leverage the FHIR standard in new applications that improve patient outcomes. ChatGPT (and I) felt like kids in a candy shop!
I have been involved in the FHIR standards community since 2014. My GPT-4 version of ChatGPT was trained on all of the FHIR documentation available on the Internet up until last year. A good example of how ChatGPT can collaborate with software developers and healthcare providers was my experimentation with the FHIR Physical Activity Implementation Guide during one of the tracks of the Connectathon last week.
Lloyd McKenzie, technical lead for the HL7 FHIR Physical Activity project, walked us through details of the implementation guide. He then demonstrated reference implementations of iPhone apps for patients and providers. These apps create the FHIR resources needed to capture a physician's creation of a care plan with physical activity goals, monitor the patient's progress toward those goals, and report that progress back to the physician.
I then connected a software prototype application I built with the Superblocks low-code app builder to the remote HAPI FHIR server where the demo FHIR physical activity data was stored. The application executed FHIR API calls to read the Patient, CarePlan, and Observation resources from the demo database, then executed OpenAI API calls to GPT-4 with a prompt to summarize the FHIR data and compare the results to the American Heart Association (AHA) recommendations for physical activity. GPT-4 knew about the AHA recommendations from its training and did a decent job of analyzing the results. It did not, however, notice an error in the demo reporting data, which stated that the patient "increased their activity to 8 days per week". I explained to GPT-4 in a follow-up prompt that it needs to watch for this type of error in the data. It apologized for the oversight and revised its comparison of the results.
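The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Superblocks application: the base URL is a placeholder, and the function names and prompt wording are my assumptions. The "implausible values" instruction reflects the lesson learned from the "8 days per week" error.

```python
# Hypothetical sketch of the read-then-summarize pipeline.
# FHIR_BASE is a placeholder; point it at a real HAPI FHIR server to run.
import json

FHIR_BASE = "https://hapi.fhir.example.org/fhir"  # assumed endpoint, not real


def fetch_resource(resource_type: str, params: dict) -> dict:
    """Read a FHIR resource bundle via the standard RESTful search API."""
    import requests  # imported lazily so the sketch loads without it installed
    resp = requests.get(
        f"{FHIR_BASE}/{resource_type}",
        params=params,
        headers={"Accept": "application/fhir+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def build_prompt(patient: dict, care_plan: dict, observations: list) -> str:
    """Assemble a summarization prompt comparing FHIR data to AHA guidance."""
    return (
        "Summarize this patient's physical activity data and compare it to "
        "the American Heart Association recommendations for physical activity. "
        "Flag any implausible values (e.g. more than 7 active days per week).\n\n"
        f"Patient: {json.dumps(patient)}\n"
        f"CarePlan: {json.dumps(care_plan)}\n"
        f"Observations: {json.dumps(observations)}"
    )


def summarize(prompt: str) -> str:
    """Send the assembled prompt to GPT-4 via the OpenAI Chat Completions API."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content
```

Keeping the prompt assembly separate from the network calls makes it easy to inspect exactly what FHIR data the model sees before paying for an API call.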
Lloyd and I also experimented with a simple Retrieval Augmented Generation (RAG) application to test ChatGPT's ability to answer questions about the Physical Activity Implementation Guide. We passed it the web page below from a draft of the implementation guide in a prompt. This page described the different types of physical activity measures and included tables listing the LOINC codes for each of those measures.
ChatGPT responded accurately when asked for the LOINC code for most of the measures but could not find a LOINC code for Activity Duration. At that point, I explained to ChatGPT how to find the LOINC code for Activity Duration in the second column of the first row of the table in the Activity-Based Measures section. ChatGPT apologized for the oversight and its response accuracy improved after that.
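The RAG experiment boils down to grounding the model's answer in retrieved text. Here is a minimal sketch, assuming the implementation guide page has already been fetched as plain text; the function names and prompt wording are illustrative assumptions, not the code we actually ran.

```python
# Hypothetical RAG sketch: the retrieved implementation-guide text is placed
# in the system message so the model answers only from that context.


def build_rag_messages(question: str, context: str) -> list:
    """Build a chat message list grounding the answer in retrieved IG text."""
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the implementation guide excerpt below. "
                "If the answer is not in the excerpt, say so.\n\n" + context
            ),
        },
        {"role": "user", "content": question},
    ]


def ask(question: str, context: str) -> str:
    """Send the grounded question to GPT-4 via the OpenAI API."""
    from openai import OpenAI  # imported lazily; requires OPENAI_API_KEY
    client = OpenAI()
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=build_rag_messages(question, context),
    )
    return chat.choices[0].message.content
```

The instruction to admit when the answer is not in the excerpt is one way to discourage the model from guessing a LOINC code instead of reading it from the table, which is exactly the failure mode we hit with Activity Duration.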
I will be including transcripts from these conversations with ChatGPT in upcoming blogs along with the results of other experiments with GPT-4 and other large language models in healthcare applications.
Let me know if you have ideas for how we can test and prove the value of generative AI in future FHIR Connectathons. We are fortunate in the healthcare industry to have the FHIR standard for data exchange, understood by software developers and by our AI co-pilots such as ChatGPT. That lets us focus together on solving real-world problems!