The Future of Biomedical Informatics

Biomedical Informatics Q&A with Dr. Sean Mooney


 

Q&A

About Dr. Sean Mooney 

In this edition of Healthcare's Data Innovations, our subscribers get to meet Dr. Sean Mooney, an accomplished figure in biomedical informatics and medical education. Serving as Professor and Chief Research Information Officer (CRIO) at UW Medicine, he pioneers collaborative electronic systems for research and develops next-gen informatics tools. With a background spanning prestigious institutions like UC San Francisco and Stanford, his expertise lies in unraveling genetic disease origins and cancer causes. Dr. Mooney's influential work, supported by NIH and the National Library of Medicine, shapes the future of healthcare data science. In this Q&A, he shares insights on data challenges, bias mitigation, and the transformative role of data-driven medicine.

 

Top Takeaways:

The Future of Healthcare is Driven by Data Science:

  • Data science efforts are moving from basic predictions to more complete and integrated forecasts for individual patients, leading to more accurate outcomes.
  • Using patient records to understand broader population health trends could revolutionize how we approach healthcare.

There are Solutions to Global Healthcare Data Challenges:

  • Healthcare data has issues like inconsistency and missing information, but advanced data science tools are helping create learning health systems that improve care over time.
  • Data science is enabling more efficient research and operational improvements while safeguarding patient privacy.

Bias and Fairness in Data Science:

  • Bias exists in healthcare data even before using machine learning, but data-driven approaches actually help uncover and address these inequalities.
  • It's important to actively explore how machine learning might worsen biases and inequities, while collecting data that reveals social and environmental factors affecting health.

Q&A

Jake: Dr. Mooney, in the brief time that we have come to know each other this year, and in conjunction with some additional research, I’d categorize your focus as rethinking what the next generation of informatics and genetics will look like. It’s also interesting that you’ve been so passionate about computing tools and infrastructure in research. Is that a fair assessment?

Dr. Mooney: Yes it is! I have kind of a funny path that I’ve taken to CRIO here at University of Washington School of Medicine. I'm actually trained as a computational chemist and have spent my life kind of trying to understand the biochemistry and the molecular biology of human genetic diseases. Really with the goal of being able to use biology to interpret the human genome from a clinical perspective.

At some point in my career, maybe 15 years ago now, I got really interested in understanding humans better, and how patients are all different from each other and of course, how they're also the same. As an example, when we do genetic studies, often times we would see some sort of inclusion criteria. Let's say we were doing a study on Parkinson's disease and we'd have inclusion criteria that would define whether the patient had Parkinson's or not for the purposes of the study. So we would get a database stood up and create files of genetic elements that came from those patients, with the expectation that it's a yes/no if the patients have Parkinson’s or not.

But if you ever meet a group of Parkinson's patients, they're all different. They're all really, really different. These patients live in different environments, have different exposures, and nearly everything about them is different. And I just felt like at that point in my career the real opportunity for me was to try to understand those differences better! As we start to understand the biology of human disease better, we can also understand how that biology gives rise to those individuals who are also different even though they have the same condition.

I know this is long, but looking back 15 or 20 years ago when I was a professor running a lab, I also had an interest in what we call open data science or open science. That is, making data available to use for discovery, to try to understand the complexities of biology or other insights from data. By making data open and available to data scientists, frankly around the world, I felt and still feel like we can advance science more efficiently and likely with more impact. So, it was a long road, but those two kind of pathways led me to want to get closer to healthcare research. That’s what led me to look for positions where I could understand humans better and understand clinical data better to more accurately describe how people are similar and different from a clinical perspective.

Jake: A fact that won’t be lost on our readers is that there are a lot of challenges to utilizing clinical data.

Dr. Mooney: There certainly are a lot of challenges with using clinical data. Historically, clinical data has not been harmonized from site to site, so it's not easily integrated. Clinical data is very noisy with a lot of missing data, and sometimes what is there is very low quality. Trying to interpret a human and how different we are, how our diseases or conditions express themselves, is complex enough. But that interpretation comes from data, and unfortunately we are missing large tracts of that data. I tend to remind people that clinical data is not collected for the sole purpose of describing a patient. Instead, it's collected for the purposes of managing healthcare, which can include making sure the patient is going to the right department, making sure the patients have the right scheduling, ensuring their visits are all correct, and that they're communicated with correctly, let alone the financial aspects of healthcare that are all managed from within our EMRs.

Those challenges also leave out managing the use of the data in a secure way. HIPAA, patient privacy, and other policies that could be unique to a healthcare system present challenges for the use of health data for research and discovery. So, yeah, you could say there are challenges!

I really wanted to help implement and oversee the use of healthcare data as well and happened to find my way to University of Washington as the Chief Research Information Officer. Our goal is to enable data to be used, but also to enable transparency for patients and the health system that manages that data while also protecting privacy.
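
To make the missing-data point above concrete, here is a minimal sketch of a per-field missingness profile, a common first look at how incomplete a clinical extract is. The records, field names, and the `missing_fraction` helper are made up for illustration and do not come from any UW Medicine system.

```python
# Made-up records for illustration; None marks a value that was never captured.
RECORDS = [
    {"age": 64, "smoking_status": None, "hba1c": 7.1},
    {"age": 58, "smoking_status": "never", "hba1c": None},
    {"age": None, "smoking_status": None, "hba1c": 6.4},
    {"age": 71, "smoking_status": "former", "hba1c": None},
]


def missing_fraction(records):
    """Return {field: fraction of records where the field is absent or None}."""
    # Collect every field name that appears in at least one record.
    fields = sorted({key for record in records for key in record})
    return {
        field: sum(record.get(field) is None for record in records) / len(records)
        for field in fields
    }


missingness = missing_fraction(RECORDS)
for field, fraction in missingness.items():
    print(f"{field}: {fraction:.0%} missing")
```

A profile like this says nothing about why a value is missing, which matters a great deal when the data was collected to manage care rather than to describe the patient.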

Jake: In the face of these challenges, I’m always so impressed by the work of learning health system and academic medical center research leaders and their teams. Our mutual friend, Dr. Philip Payne at Washington University School of Medicine in St. Louis, is one we have been lucky to support. You have an advanced data science team – can you talk about that path and progress?

Dr. Mooney: Of course, I want to enable the process of learning healthcare. From a research perspective we want to be able to take health data and develop methods that leverage that data like machine learning algorithms, new statistical algorithms, things like that and of course the computer infrastructure that can run that level of analysis.

It’s fundamental for me that we run those models in a way that’s regulatory appropriate but also in a way that is repeatable and scalable operationally. Our methods actually become more effective the more we use them and operationalize them. Which leaves us wanting to collect more data, build training data, validation data processes and create this continuous cycle of learning that’s highly operational. I’m lucky that this CRIO role helps me oversee all of those components.

Jake: The infrastructure does vary from system to system – how does your role help ensure the verified learning that comes from that continuous cycle is implemented in care?

Dr. Mooney: Well, once we establish infrastructure that can support continuous learning, we build up our collection of data, deploy our machine learning algorithms, then we implement those tools back into research data and effort for further verification, and eventually the right insights do make their way back into, in our case, Epic. That allows us to then collect new data as a part of those studies and keep the cycle alive. Really, that’s what my role does.

Something I'm really excited about in this vein is that we are building an institute called the Institute for Medical Data Science between the College of Engineering, the School of Public Health and the School of Medicine where we're really trying to enable all the bright minds that are at this university, maybe even those that are outside of UW, to be able to get together and really leverage the healthcare platform to understand how data-driven methods are used. That type of mission and these types of effort are why I'm here in this role.

Jake: Not surprisingly, items you mentioned are very popular topics in healthcare right now – precision medicine, AI, predictive health. There are system challenges past just interoperability to really attaining these goals, right? There are serious infrastructure challenges – our team has been building these solutions for some time now, including Databasin (databasin.co) for healthcare research. Given the intersection of systemic challenges and technical enablement, where do you see data science driving healthcare in the next 3 years?

Dr. Mooney: That’s a good question! I think we're seeing a transition right now. Let's take machine learning, because I believe machine learning is really important for the future of medicine, and it's something I'm really interested in.

I know a lot of work, historically, was based in tools that we use in clinical data that were statistical or even rule based, very simple. But I'm very interested in machine learning algorithms that take a large amount of data and read that data in an unsupervised or supervised way, generalize that data, and then can make predictions based on new data points that we've never seen before that are within the scope of what our acceptable data methods have been trained to do.

I don’t want to lose sight of the fact that the methods we've built in the past are really reductionist in terms of how they're built. That is, someone wants to predict whether an inpatient has sepsis or not, is at risk for deteriorating, or is headed for another bad sepsis-based outcome. And we know that there are signals that they may be getting sepsis that providers may not always see. So we build a method that can predict sepsis in a way that would allow for interventions that prevent deterioration or worse outcomes in the future. Another example might be a method that predicts whether a patient is going to miss a clinic appointment, right. If the patient is likely to miss a clinic appointment based on their data, we can intervene and make sure they get there, because missed appointments cost healthcare systems money. So we could build methods to predict that.

Those are both very reductionist because you think about a patient's exposure as if it happens in a vacuum, and that doesn’t exist in healthcare. There are all sorts of things going on that could potentially be predicted. But we're going to build a method in this case to predict sepsis, or are they going to have a heart attack? Are they going to be transferred to acute care? Transferred outside of the ICU? Are they going to be discharged? Are they going to come back again?

When we predict things, we're largely reductionist in that we're kind of predicting things in a vacuum by themselves. And we know that in the machine learning world those individual predictions are less accurate if we don't know the complete scope of that patient's healthcare universe.

I think we're going to be more integrated in the future in terms of how we build predictions and forecasts for an individual patient. The next generation of data science in health is going to be more comprehensive, it's going to be more holistic, it's going to be broader! Data science is going to move away from being individualized with individualized methods, to being more integrated and broad, and it's going to predict everything about that patient.

And I think that by doing that, we will be more accurate. Many times our inaccuracy in machine learning is caused by other confounding things that are similar but not exactly what you're trying to predict.

I'm going to go a little bit further in the future. Right now we have a lot of different high-level ways of collecting data. We have the US census, and we have data that's collected for the purposes of tracking the health of the population across various government, public, and private agencies. That collection is generally done at the population level. We can look at items like the burden of patients that have heart attacks or other conditions in a healthcare region and of course much more.

I believe, and I'm stretching a bit here, but I believe in the future that we'll be able to infer the health of the population by looking at individual patient records and understanding all of the patients in a region and what they're being treated for.
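
As a rough illustration of inferring population-level burden from individual patient records, the sketch below rolls made-up records up into the share of each region's patients being treated for each condition. The record layout, region names, and the `regional_burden` helper are hypothetical; a real analysis would also have to account for who never shows up in the records at all.

```python
from collections import defaultdict

# Hypothetical, made-up patient records: (patient_id, region, condition).
RECORDS = [
    ("p1", "north", "heart_attack"),
    ("p2", "north", "diabetes"),
    ("p3", "north", "heart_attack"),
    ("p4", "south", "diabetes"),
    ("p5", "south", "asthma"),
]


def regional_burden(records):
    """Return {region: {condition: share of that region's patients}}."""
    patients = defaultdict(set)                    # region -> patient ids seen there
    cases = defaultdict(lambda: defaultdict(set))  # region -> condition -> patient ids
    for patient_id, region, condition in records:
        patients[region].add(patient_id)
        cases[region][condition].add(patient_id)
    return {
        region: {
            condition: len(ids) / len(patients[region])
            for condition, ids in conditions.items()
        }
        for region, conditions in cases.items()
    }


burden = regional_burden(RECORDS)
for region, shares in sorted(burden.items()):
    print(region, {condition: round(share, 2) for condition, share in shares.items()})
```

Counting distinct patient IDs rather than rows keeps a patient with many visits from inflating the estimate.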

Jake: Do you think that contributes to a level of sustainability? One of the challenges now is how quickly care models fall apart, even when deployed at different hospitals within the same hospital system. Do you feel like we are going to get around that curve?

Dr. Mooney: Super important problem, and a super relevant challenge. You're totally right, it’s really hard. We know that it's hard because the data's different, maybe it’s coded differently, maybe not harmonized well, and even semantically similar items may be quite different in terms of how they look in a database between two clinical sites, even within the same health system. Further, you have to update those methods we talked about earlier, and you have to run that cycle of learning in a very clean and well-defined manner. To be a learning health system, you have to be running that cycle, so how often do you run that cycle of learning? How much investment can you make to get data harmonized well at all levels?

I think some of these problems are going to solve themselves over time, not without a lot of work for many people, but they're going to solve themselves over time based on drivers that may not be machine learning based. I have to believe that data is going to become more harmonized as we move into the future, right? We’re coming along now though, we're starting to use APIs. We have HL7, FHIR, OMOP for example. Healthcare systems will keep adapting to more and more standardizations and technologies for the exchange of data.

But our analysis is not saying we’ll ever set and forget, we’re still going to train them, retrain them, watch them over time, and ultimately learn from them.
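
The harmonization problem described above, where semantically similar items look different between two sites, can be sketched as a mapping from site-local codes to one shared vocabulary. This is a toy example under stated assumptions: the code tables and the `harmonize` helper are invented here and are not the OMOP or FHIR data models, which define much richer vocabularies and structures.

```python
# Hypothetical site-specific code tables. Real mappings would come from a
# curated standard vocabulary, not a hand-written dict.
SITE_A_TO_SHARED = {"GLU": "glucose", "HGB": "hemoglobin"}
SITE_B_TO_SHARED = {"glucose_serum": "glucose", "hb": "hemoglobin"}


def harmonize(rows, mapping):
    """Split (local_code, value) rows into harmonized rows and unmapped codes."""
    harmonized, unmapped = [], []
    for local_code, value in rows:
        shared_code = mapping.get(local_code)
        if shared_code is None:
            # Keep track of codes nobody has mapped yet instead of dropping them silently.
            unmapped.append(local_code)
        else:
            harmonized.append((shared_code, value))
    return harmonized, unmapped


site_a, missing_a = harmonize([("GLU", 95), ("XYZ", 1)], SITE_A_TO_SHARED)
site_b, missing_b = harmonize([("glucose_serum", 101), ("hb", 13.5)], SITE_B_TO_SHARED)
print(site_a, missing_a)
print(site_b, missing_b)
```

Returning the unmapped codes alongside the harmonized rows gives a team something to review on each pass through the learning cycle.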

Jake: My family and I are on both sides of the need for growth in healthcare. We’re obviously in the consulting and product space now with Databasin, but our oldest also has a rare chromosome disorder, and we’ve been on a long healthcare journey. As a parent of a child with a rare disease, I’m always thoughtful about the AI we support not extending biases that already exist in healthcare. What are your thoughts on that?

Dr. Mooney: So this is a really hard question to answer. The first thing that I'll say is there are biases that have existed way before machine learning did. Biases exist within the health system, but proper methodologies, including machine learning, and being data driven actually help to expose them. The first piece to look at when evaluating inequity in healthcare is your data. You’ll see that bias is there, and that should happen way before machine learning algorithms tell you there is bias. I think some inequity is addressable, some inequity is very addressable, and unfortunately some inequity is very hard to address – but in all cases it’s right there, visible in your data right now. We need to look harder into our data and really, truly understand that.

This is the second side of inequity, that data can ameliorate on some level inequity because it is the record of inequity that exists today. And that record is in our data and every group that's studying inequity in a healthcare organization, the first thing they should say is let's look at our data and try to understand it, and hire a data scientist to help us understand it.

None of this is to say that inequity cannot be exacerbated with machine learning and data because machine learning is a great generalizer. And we're seeing this right now from generative artificial intelligence being applied. We’re seeing some cases where I think personally, I'm not going to give examples, but I think personally that if some of these models that data science efforts create were applied in practice they would cause disparities or cause inequities to be exacerbated.

I believe that for a lot of reasons. Machine learning's a great generalizer and the majority sets of data points are going to kind of dominate in the end. We need to invest in studying how machine learning could continue inequity and we need to be very explicit to make sure that we collect data as part of healthcare that may not be used for financial reasons or directly in the patient's clinical care. Instead, the purpose would be to give us insights into that inequity. Give light to the social determinants of health, environmental determinants of health, their socioeconomic status, and population health features.

We can certainly get a better handle on when and how inequity occurs, particularly data-driven inequity and inequity that's going to be caused by a machine learning algorithm or the products of a machine learning algorithm.
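
One simple way to see the "majority dominates" effect described above is to report a model's accuracy per group next to its overall accuracy. The sketch below uses made-up predictions and hypothetical group labels; it illustrates the general idea and is not a method described in the interview.

```python
from collections import defaultdict

# Made-up examples for illustration: (group, true_label, predicted_label).
PREDICTIONS = [
    ("majority", 1, 1),
    ("majority", 0, 0),
    ("majority", 1, 1),
    ("majority", 0, 0),
    ("minority", 1, 0),
    ("minority", 0, 0),
]


def accuracy_by_group(predictions):
    """Return {group: fraction of that group's predictions that were correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in predictions:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


overall = sum(truth == predicted for _, truth, predicted in PREDICTIONS) / len(PREDICTIONS)
per_group = accuracy_by_group(PREDICTIONS)
print(f"overall: {overall:.2f}")
for group, accuracy in per_group.items():
    print(f"{group}: {accuracy:.2f}")
```

In this toy data the overall figure looks respectable while the smaller group fares much worse, which is the kind of gap a single aggregate number hides.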

Jake: I know we are hitting the end of our time, thank you so much Dr. Mooney for your time today!
