Academic progress

Prof. Bin Wang's team proposes a viewpoint on the methods, challenges, and prospects of applying small-sample learning to next-generation human health risk assessment.

Traditional epidemiological research on environmental exposure and population health faces numerous challenges, such as difficulty handling high-dimensional data and the conflict between large sample requirements and research costs. These limitations are particularly evident in complex exposure environments. This study proposes small-sample learning as an innovative approach that can significantly enhance assessment efficiency and the interpretability of results by integrating exposome data, artificial intelligence techniques, and systems biology information. The article provides an in-depth discussion of the application of large-scale exposome databases, the necessity of multi-omics data integration, and the potential and practice of deep learning technologies in optimizing small-sample learning. In the future, risk assessment methods based on AI and big data are expected to become key solutions to the limitations of traditional epidemiological approaches. This research was a collaborative effort led by Professor Bin Wang's team at Peking University, in partnership with Professor Song He (Academy of Military Medical Sciences, China), Professor Le Zhang (University of Electronic Science and Technology of China), and Professor Mingliang Fang (Fudan University).

Figure 1. Schematic diagram of small-sample learning for human health risk assessment (HRA) of environmental exposure

Title: Small-Sample Learning for Next-Generation Human Health Risk Assessment: Harnessing AI, Exposome Data and Systems Biology

Link: https://doi.org/10.1021/acs.est.4c11832

First Affiliation: Peking University

Journal: Environmental Science & Technology (CAS Tier 1)

Human health risk assessment (HRA) has traditionally relied on epidemiological methods that require large sample sizes, extensive follow-up, and significant resources. However, these approaches face limitations when applied to exposome big data, which includes diverse environmental exposures and internal biological responses. The study highlights the growing need for small-sample learning methods that integrate artificial intelligence (AI), exposome data, and systems biology to address these challenges.

Traditional HRA methods struggle with the scale and complexity of exposome big data: databases such as ExposomeX contain over 119 million exposures and 17,000 disease subtypes. These data are difficult to process due to high dimensionality and variability. Moreover, the resources needed for large-scale studies limit their applicability in assessing rare contaminants or individual susceptibility variations. The study proposes that innovative small-sample learning techniques can overcome these challenges, enabling meaningful insights from limited data.

The Role of Small-Sample Learning

Small-sample learning offers a transformative approach to HRA by leveraging data augmentation, transfer learning, and multimodal integration. These techniques enable the derivation of robust insights even in data-scarce scenarios. For example, transfer learning allows models trained on large datasets to be fine-tuned for smaller, domain-specific tasks, while multimodal learning integrates diverse data types such as omics, imaging, and textual records. These methods enhance model efficiency and accuracy in identifying risk factors and disease burdens in small populations.
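As a rough illustration of the transfer learning idea described above, the following is a minimal sketch and not the method used in the paper. It assumes a simple linear setting with simulated data: a model is first fitted on a large "source" cohort, and the small "target" cohort is then fitted with a ridge penalty that shrinks the weights toward the source model instead of toward zero. The function name `fit_ridge`, the cohort sizes, and the penalty strengths are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0, w_prior=None):
    """Ridge regression shrunk toward w_prior (the zero vector if None).

    Solves min_w ||X w - y||^2 + lam * ||w - w_prior||^2 in closed form.
    """
    n_features = X.shape[1]
    if w_prior is None:
        w_prior = np.zeros(n_features)
    A = X.T @ X + lam * np.eye(n_features)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
n_features = 20

# Simulated "true" effects: the target task is related to the source task.
w_source = rng.normal(size=n_features)
w_target = w_source + 0.1 * rng.normal(size=n_features)

# Large source cohort and a small target cohort (fewer samples than features).
X_src = rng.normal(size=(5000, n_features))
y_src = X_src @ w_source + rng.normal(scale=0.5, size=5000)
X_tgt = rng.normal(size=(15, n_features))
y_tgt = X_tgt @ w_target + rng.normal(scale=0.5, size=15)

# Step 1: pretrain on the source cohort.
w_pre = fit_ridge(X_src, y_src, lam=1.0)
# Step 2a: baseline, fit the small cohort from scratch.
w_scratch = fit_ridge(X_tgt, y_tgt, lam=1.0)
# Step 2b: fine-tune, shrinking toward the pretrained weights.
w_transfer = fit_ridge(X_tgt, y_tgt, lam=10.0, w_prior=w_pre)

# Compare both models on held-out, noise-free target data.
X_test = rng.normal(size=(2000, n_features))
y_test = X_test @ w_target

def mse(w):
    return float(np.mean((X_test @ w - y_test) ** 2))

print("from scratch:", mse(w_scratch))
print("with transfer:", mse(w_transfer))
```

In this simulated setting the fine-tuned model should show a lower held-out error than the model fitted from scratch, because the 15 target samples only need to correct a small difference between the two tasks. Real exposome applications would typically use deep networks and real cohorts, where the benefit depends on how closely the source and target tasks are related.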

Big Data and Exposome Integration

The integration of large-scale exposome databases with advanced AI tools is central to small-sample learning. ExposomeX and TOXRIC are highlighted as key resources for HRA, offering extensive data on environmental exposures and their toxicological effects. These databases provide prior knowledge on associations between exposures and health outcomes, which can be used to refine models for smaller cohorts. Additionally, multiomics data, such as transcriptomics and proteomics, offer comprehensive insights into the biological impacts of exposures.

Advanced AI and Machine Learning Methods

Deep learning methods are critical for addressing the challenges of small-sample HRA. Techniques like generative adversarial networks (GANs), diffusion models, and variational autoencoders (VAEs) are used for data augmentation and feature extraction. These models can generate synthetic data to enhance sample diversity and improve model performance. Transfer learning further refines these approaches, enabling models to generalize insights from one domain to another, such as applying findings from common pollutants to emerging contaminants.

The integration of multimodal data is pivotal for comprehensive HRA. By combining numerical, graphical, and textual data, multimodal learning offers a holistic perspective on how environmental exposures affect health. Systems biology frameworks, such as GLUE, facilitate the integration of multiomics data, enabling the identification of regulatory interactions across biological levels. These approaches improve the accuracy and interpretability of models in small-sample scenarios.
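The data augmentation step mentioned in this section can be illustrated with a deliberately simple stand-in. The sketch below is not a GAN, diffusion model, or VAE; it uses plain interpolation between randomly chosen pairs of real samples (in the spirit of SMOTE or mixup) to show what "generating synthetic data from a small cohort" means in practice. The function name `interpolate_augment` and the simulated exposure matrix are assumptions made for illustration.

```python
import numpy as np

def interpolate_augment(X, n_new, seed=0):
    """Create n_new synthetic rows as convex combinations of random row pairs."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    i = rng.integers(0, n_samples, size=n_new)   # first parent of each synthetic row
    j = rng.integers(0, n_samples, size=n_new)   # second parent
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1)) # mixing weight per synthetic row
    return lam * X[i] + (1.0 - lam) * X[j]

# Simulated small cohort: 12 participants, 5 exposure measurements each.
rng = np.random.default_rng(42)
X_small = rng.lognormal(mean=0.0, sigma=1.0, size=(12, 5))

X_synth = interpolate_augment(X_small, n_new=100)
X_aug = np.vstack([X_small, X_synth])

print(X_small.shape, "->", X_aug.shape)
```

Interpolated rows always stay within the range spanned by the observed samples, so this kind of augmentation adds diversity but no genuinely new information. Generative models such as those named above aim to learn the underlying data distribution instead, at the cost of needing more careful training and validation.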

Opportunities and Challenges

The study identifies several opportunities for advancing small-sample HRA, including the integration of wearable devices, continuous environmental monitoring, and active learning strategies. A significant challenge in AI-driven HRA is balancing model accuracy with interpretability. The study emphasizes the importance of translating AI-derived insights into meaningful biological explanations for healthcare providers and policymakers. Techniques like Shapley Additive Explanations (SHAP) and perturbation-based analyses help clarify the significance of features within models. The study also highlights the need for robust, generalizable models that can address biases and variability in small-sample datasets. Challenges also remain in standardizing data formats, addressing missing data, and optimizing multimodal integration. The development of high-quality data-sharing platforms is critical for fostering collaboration and ensuring the reliability of HRA models.
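To make the idea of a perturbation-based analysis concrete, here is a minimal sketch of permutation importance, one common technique of this kind. It is not SHAP and is not taken from the paper. The simulated data, the least-squares model, and the function name `permutation_importance` are assumptions for illustration: each feature column is shuffled in turn, and the resulting increase in prediction error is taken as that feature's importance.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the outcome.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_error)
        importances[j] = np.mean(increases)
    return importances

# Simulated small cohort: 40 participants, 4 exposures.
# Only exposures 0 and 1 affect the outcome, exposure 0 most strongly.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=40)

# Fit an ordinary least-squares model as the "black box" to be explained.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
scores = permutation_importance(lambda A: A @ w, X, y)
ranking = np.argsort(scores)[::-1]  # most important feature first

print("importance per feature:", scores)
print("ranking:", ranking)
```

In this simulation the ranking should place exposure 0 first and exposure 1 second, matching how the outcome was generated. With real small-sample data, importance scores can be unstable, so repeating the analysis across resamples is a sensible precaution.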

Conclusion

This study advocates for a paradigm shift in health risk assessment by leveraging small-sample learning, AI, and systems biology. It emphasizes the potential of these methodologies to overcome traditional limitations, providing more precise and personalized insights into environmental health risks. The proposed approaches align with the goals of next-generation HRA, offering innovative solutions for addressing the complexities of the exposome.


Authors:

Tianxiang Wu (First Author)

Doctoral candidate at the School of Public Health, Peking University. His primary research focuses on artificial intelligence and computational toxicology. He has published two first-author papers in the internationally renowned environmental journal Environmental Science & Technology and one paper in the Chinese core journal Science Bulletin.

Bin Wang (Corresponding Author)

Researcher/Tenured Associate Professor at the Institute of Reproductive and Child Health and jointly appointed at the College of Urban and Environmental Sciences, Peking University. His research focuses on environmental health, exposomics, and artificial intelligence. He has led four projects funded by the National Natural Science Foundation of China (General and Young Scientists’ grants) and served as a key member in three national key R&D projects. As a first or corresponding author, he has published 60 articles, including original research and reviews, in prestigious international journals such as Science, Environmental Health Perspectives, Environmental Science & Technology, and The Innovation. His H-index is 45, with over 6,000 citations. He serves as an Associate Editor for Environmental Science & Technology, a leading journal in the field of environmental health. His accolades include the Second Prize of the Beijing Preventive Medicine Association Scientific and Technological Award, the Third Prize of the Huaxia Medical Science and Technology Award, and recognition as an "Outstanding Individual in the Fight Against COVID-19 in the National Science and Technology System."