A collaborative study by researchers from the Yong Loo Lin School of Medicine, National University of Singapore (NUS Medicine), and the Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Germany, investigated how advanced AI tools, such as large language models (LLMs), can make it easier to evaluate interventions for ageing and provide personalised recommendations. The findings were published in the leading review journal Ageing Research Reviews.
Research into ageing is producing an overwhelming amount of data, making it difficult to determine which interventions (such as new medicines, dietary changes, or exercise routines) are safe and effective. The study investigated how AI can analyse such data more efficiently and accurately, and proposed a comprehensive set of standards to ensure that AI systems analysing complex biological data deliver accurate, reliable, and understandable evaluations.
The researchers identified eight critical requirements for effective AI-based evaluations:
- Correctness of the evaluation results. Data quality will be assessed for accuracy.
- Usefulness and comprehensiveness.
- Interpretability and explainability of the evaluation results. Clarity and conciseness of the results and the given explanations.
- Specific consideration of causal mechanisms affected by the intervention.
- Consideration of data in a holistic context:
  - Efficacy and toxicity, and evidence for the existence of a large therapeutic window;
  - Analyses in an “interdisciplinary” setting.
- Enabling reproducibility, standardisation, and harmonisation of the analyses (and of the reporting).
- Specific emphasis on diverse longitudinal large-scale data.
- Specific emphasis on results that relate to known mechanisms of ageing.
Including these requirements in the prompts given to LLMs improved the quality of the recommendations they produced.
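As a rough illustration of what passing the requirements to a model in the prompt could look like, the sketch below assembles a prompt string from the eight requirements listed above. It is not the prompt used in the study: the function name `build_prompt`, the wording of the instructions, and the condensed phrasing of each requirement are assumptions made for this example, and no particular LLM or API is implied.

```python
# Hypothetical sketch: the study's actual prompts are not reproduced here.
# The requirement texts are condensed from the list in this article.
REQUIREMENTS: tuple[str, ...] = (
    "Correctness of the evaluation results",
    "Usefulness and comprehensiveness",
    "Interpretability and explainability of the evaluation results",
    "Specific consideration of causal mechanisms affected by the intervention",
    "Consideration of data in a holistic context "
    "(efficacy and toxicity; interdisciplinary analyses)",
    "Reproducibility, standardisation, and harmonisation of analyses and reporting",
    "Specific emphasis on diverse longitudinal large-scale data",
    "Specific emphasis on results that relate to known mechanisms of ageing",
)


def build_prompt(intervention: str,
                 requirements: tuple[str, ...] = REQUIREMENTS) -> str:
    """Return a prompt asking for an evaluation that meets the requirements."""
    # Number the requirements so the model can refer to them individually.
    numbered = "\n".join(
        f"{i}. {req}" for i, req in enumerate(requirements, start=1)
    )
    return (
        f"Evaluate the following intervention for healthy ageing: {intervention}.\n"
        "A good response should meet these requirements:\n"
        f"{numbered}\n"
        "State caveats, such as possible side effects, explicitly."
    )


if __name__ == "__main__":
    # Example: the article mentions rapamycin as one of the tested cases.
    print(build_prompt("rapamycin"))
```

The resulting string would be sent to whichever model is being evaluated; comparing responses with and without the requirements section is one simple way to probe their effect.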
Professor Brian Kennedy from the Department of Biochemistry & Physiology, and Healthy Longevity Translational Research Programme at NUS Medicine, who co-led the study, said, “We tested AI methods using real-world examples such as medicines and dietary supplements. We found that by following specific guidelines, AI can provide more accurate and detailed insights. For instance, when analysing rapamycin, a drug often studied for its potential to promote healthy ageing, the AI not only evaluated its efficacy but also provided context-specific explanations and caveats, such as possible side effects.”
“The study’s findings could have far-reaching effects,” added Professor Georg Fuellen, Director of the Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, who co-led the study. “For healthcare, telling the AI about the critical requirements of a good response can enable it to find more effective treatments and make them safer to use. Generally, AI tools could design better clinical trials and help tailor health recommendations to each person. This research is a major step toward using AI to improve health outcomes for everyone, especially as they age.”
Moving forward, the team is focusing on a large-scale study of how best to prompt AI models for longevity-related intervention advice, evaluating their accuracy and reliability against a wide array of carefully designed benchmarks, that is, curated, high-quality data. Validating such AI systems is particularly important because longevity interventions may be adopted by large numbers of healthy people. Prospective studies will need to demonstrate that AI-based evaluations can accurately predict successful outcomes in human trials, paving the way for safer and more effective health interventions.
The team hopes to use their findings to make health and longevity interventions more precise and accessible, and ultimately improve the quality and duration of life. Collaboration between researchers, clinicians, and policymakers will be essential to establish robust regulatory frameworks, ensuring the safe and effective use of AI-driven evaluations.