Could AI help select influenza strains for seasonal vaccines?
Every year, scientists grapple with predictions of which strains of influenza will most likely make the rounds in the upcoming fall and winter, informing the year’s seasonal flu vaccines. But these vaccines are not always effective, and the virus can change between strain prediction, vaccine development, and the onset of flu season.
Could artificial intelligence (AI) aid in phylogenetic identification of potential flu candidates for seasonal vaccines? We spoke with Caroline Colijn, senior author of a recent research paper that aimed to answer this question (1), to find out.
What are the limitations of traditional methods for selecting strains for the seasonal flu vaccines?
Seasonal influenza is constantly evolving and it’s always going to be challenging to predict what is most likely going to circulate in the upcoming season. Traditional methods benefit from data on the immunity generated by circulating strains as well as global surveillance and immunological expertise, but the core challenge remains. And that’s why we wanted to explore whether large-scale phylogenetics can shed light on which strains are likely to grow and circulate. Phylogenetic trees encode information about the spread of viruses because they track patterns of ancestry in the population. Spreading among humans is how these viruses have descendants, so we suspected there would be signatures of which viruses are spreading more within the shapes and structures of large-scale phylogenetic trees.
Why did you turn to AI?
AI is an increasingly important tool to learn from data and inform decision-making processes. AI can find patterns in big datasets, such as large phylogenies, that may be too large to analyze with traditional approaches; in turn, AI models can become more accurate as more data are provided to them.
Our approach reads large-scale phylogenetic trees to identify signatures that a strain is likely to keep growing into the next season. These signatures are part of the shapes and branching patterns in the phylogenetic tree, and include features such as tree imbalance, branching rate, and the number of “cherry” configurations, among many others. We trained AI models to detect the signatures that a small tree is likely to grow in the future and applied that knowledge to the question of vaccine strain selection.
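To make two of the tree-shape features mentioned above concrete, here is a minimal Python sketch. It is not the code from the study. It assumes a rooted binary tree written as nested tuples with tip names as strings, and the function name `summarize` is hypothetical. A cherry is a pair of tips that share an immediate parent, and imbalance is measured here with the Colless index, which is one common choice among several.

```python
from typing import Tuple, Union

# Assumed representation: a tip is a string, an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def summarize(tree: Tree) -> Tuple[int, int, int]:
    """Return (number of tips, number of cherries, Colless imbalance)."""
    if isinstance(tree, str):
        # A single tip: one tip, no cherries, no imbalance.
        return 1, 0, 0
    left, right = tree
    left_tips, left_cherries, left_colless = summarize(left)
    right_tips, right_cherries, right_colless = summarize(right)
    # This node is a cherry if both of its children are tips.
    cherry = 1 if left_tips == 1 and right_tips == 1 else 0
    # Colless index: sum over internal nodes of |tips on left - tips on right|.
    colless = left_colless + right_colless + abs(left_tips - right_tips)
    return left_tips + right_tips, left_cherries + right_cherries + cherry, colless


balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")
print(summarize(balanced))
print(summarize(ladder))
```

In this toy comparison, the balanced tree has more cherries and a lower imbalance score than the ladder-like tree with the same number of tips. Summary values of this kind, computed on many small trees, are the sort of input a model could learn from.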
How could your model feed into the current strain selection process?
Influenza vaccine strain selection is a complex process that involves global collaboration, surveillance, and laboratory analysis. As part of the Global Influenza Surveillance and Response System, centers around the world continuously collect influenza virus samples from patients. They monitor the types of circulating viruses and identify genetic changes in the virus, particularly in the hemagglutinin (HA) protein for H3N2. Laboratories analyze the samples, for example, with genetic sequencing to characterize how influenza is evolving and with antigenic characterization to see how the viruses react to antibodies. The goal is to identify new viral strains that are antigenically different from the strains included in the current vaccine. The human population is less likely to have immunity to these strains than to strains that are similar to what is currently circulating and, as such, they are natural vaccine candidates.
We hope that our approach will be an additional source of information that could be used in the collaborative vaccine selection process because it can provide a complementary tool to explore what might circulate in the coming season.
Did you encounter any challenges during the research?
The data used in this study are publicly available sequences collected from an online database. There can be issues with genome sequencing data: sequences collected from one particular area or time can be overrepresented in a dataset compared with the true number of infections, which can skew the results. To account for this, we used downsampling strategies to include only a percentage of the full dataset, proportional to the population, and re-ran our models to validate the results.
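As an illustration of the downsampling idea described above, the following Python sketch draws a subsample in which each region's share is proportional to its population. It is an assumption about one way such a strategy could look, not the procedure used in the study. The function name `downsample_by_population`, the `region` field, and the population figures are all hypothetical.

```python
import random
from collections import defaultdict


def downsample_by_population(records, populations, target_size, seed=0):
    """Sample records so each region's share tracks its population.

    records: list of dicts with a "region" key (hypothetical schema).
    populations: dict mapping region name to population size.
    target_size: approximate size of the subsample.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_region = defaultdict(list)
    for rec in records:
        by_region[rec["region"]].append(rec)

    total_pop = sum(populations[region] for region in by_region)
    sample = []
    for region in sorted(by_region):
        # Quota proportional to population, capped at what is available.
        quota = round(target_size * populations[region] / total_pop)
        quota = min(quota, len(by_region[region]))
        sample.extend(rng.sample(by_region[region], quota))
    return sample


# Toy data: region "A" is heavily oversampled relative to its population.
records = [{"id": i, "region": "A"} for i in range(90)]
records += [{"id": 90 + i, "region": "B"} for i in range(10)]
populations = {"A": 1000, "B": 1000}

subset = downsample_by_population(records, populations, target_size=10)
counts = {r: sum(1 for rec in subset if rec["region"] == r) for r in populations}
print(counts)
```

Re-running an analysis on several such subsamples, for example with different seeds, is one way to check that a result does not depend on which regions happened to sequence the most.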
Where will this research take you next?
One of the challenging aspects of AI methods is that they can be hard to interpret; for example, why do we find the signatures that we do? Could we understand this better with a simpler model or a tool that includes a more mechanistic representation of why some strains are more likely to circulate? Additionally, each influenza season brings new data and we can refine our methods further. Incorporating antigenic characterizations of the strains would also be a natural direction of research.