ChatGPT in Healthcare - What Could Go Wrong?
How OpenAI's ChatGPT Health initiative earlier this year jeopardizes medical records, positions OpenAI to sell your personal data, and risks increasing medical misdiagnosis at scale.
What is it?
Early this year, OpenAI announced ChatGPT Health, what it describes as a secure gateway for your health and wellness questions, tailored to you if you upload your medical records and connect your health apps.
Does the technology work for health advice?
Evidence suggests it does not, despite over 40 million people already using the generic chat product for this purpose. Setting aside potentially biased, OpenAI-sponsored research claiming the model minimizes diagnostic errors, the broader medical community responded quickly: within a month of the announcement, studies suggested LLMs make a negligible difference in diagnosis, and a few weeks later a study reported that ChatGPT specifically misdiagnosed cases against clinician-designed criteria.
Is this data really secure?
Returning to the question of customer privacy: OpenAI says any medical records you upload are safe. However, the company has had insurance deals in the works to share data in undisclosed capacities, and it recently announced a partnership for insurance quotes. One week after that announcement, it raised $122 billion at an $852 billion valuation.
How could this have affected the most recent valuation?
Beyond the questionable use of medical records described above, the valuation may also price in the automation of doctors, with displaced labor valued at a per-hour rate higher than any realistic average. The same approach is used to justify valuations around developers and other industries: presumed economic displacement that is likely not an accurate representation of reality, as the sketch below illustrates.
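To see how this kind of math balloons, here is a back-of-envelope sketch in which every number (hourly rate, headcount, hours, automation share, revenue multiple) is a hypothetical of mine, not a figure OpenAI has published:

    # Hypothetical displacement-based valuation; all inputs are illustrative assumptions.
    physician_hourly_rate = 150.0  # assumed $/hour, set above a realistic average
    physicians = 1_000_000         # assumed addressable headcount
    hours_per_year = 2_000         # assumed working hours per physician
    automation_share = 0.20        # assumed fraction of work "automated away"
    revenue_multiple = 10          # assumed multiple applied to implied revenue

    implied_revenue = (physician_hourly_rate * physicians
                       * hours_per_year * automation_share)  # $60B/year
    implied_value = implied_revenue * revenue_multiple       # $600B
    print(f"implied displaced labor value: ${implied_revenue / 1e9:.0f}B per year")
    print(f"implied valuation contribution: ${implied_value / 1e9:.0f}B")

Nudge the per-hour rate or the automation share upward and the implied valuation moves by hundreds of billions, which is why per-hour displacement estimates deserve skepticism.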
Well, does OpenAI at least claim the product is unique for health?
OpenAI publicly states the system is designed around and trained on HealthBench, a synthetically generated dataset of diagnostic evaluations sourced from a diverse set of doctors, but one that is, by definition of being synthetic, limited to its training data. The paper's authors acknowledge that the dataset does not cover the full breadth of diagnostic norms and that it loses depth in the multi-turn conversations doctors need to diagnose medical symptoms. Furthermore, an analysis of the dataset undercuts OpenAI's claim that it even represents multi-turn conversations: over 50% of its examples are single-turn, i.e. one-shot diagnoses. Why would OpenAI be this misleading, releasing a non-peer-reviewed paper as solid research to a medical community that may not have the expertise to look this deep?
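You can check the single-turn claim yourself. Below is a minimal sketch, assuming the dataset is exported as a local JSONL file; the filename healthbench.jsonl and the prompt/role field names are my assumptions about the schema, not something OpenAI guarantees:

    # Count single-turn vs. multi-turn conversations in a HealthBench-style export.
    # Assumes each JSONL record has a "prompt" field: a list of
    # {"role": ..., "content": ...} messages leading up to the model's reply.
    import json
    from collections import Counter

    def user_turns(example):
        """Number of user messages in the conversation history."""
        return sum(1 for msg in example.get("prompt", []) if msg.get("role") == "user")

    counts = Counter()
    with open("healthbench.jsonl", "r", encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            counts["single-turn" if user_turns(example) <= 1 else "multi-turn"] += 1

    total = sum(counts.values())
    for label, n in counts.items():
        print(f"{label}: {n} ({n / total:.1%})")

If the over-50% figure holds, a script like this surfaces it in seconds, which is exactly the kind of scrutiny peer review would have applied.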
OpenAI typically deflects criticism by deferring to the next version of the dataset, promising improvements from larger context windows and scaling laws, presumptions that have since been contradicted on both counts by data that has been available for a while now.
In summary, OpenAI claims HealthBench is beneficial, while independent medical research suggests otherwise. Who do you believe? And who is at fault when a diagnosis is wrong?
Which industry will be the next focus?
There will be more, likely wherever higher salaries are involved. Stay tuned for more in this space!