AI Foundation Models Are Reshaping Environmental Health Risk Assessment Strategies-Featured research-PKU Institute of Reproductive & Child Health



Date: 2026-01-31


Date: 2026-2-1

In traditional environmental health risk assessment, analytical frameworks have largely relied on single pollutants, isolated exposure scenarios, or specific outcome-oriented models, making them ill-suited to address real-world problems characterized by the highly coupled, cross-scale, and nonlinear interactions among climate change, complex pollution mixtures, socioeconomic disparities, and population health. Although vast amounts of environmental monitoring, epidemiological, and socioeconomic data have been accumulated globally, existing assessment systems still depend predominantly on highly customized, task-driven model structures. As a result, they struggle to achieve unified analysis and generalizable prediction across regions, scenarios, and health outcomes.

In recent years, the emergence of artificial intelligence (AI) foundation models has introduced a fundamentally new paradigm for environmental health risk assessment. Unlike task-specific machine learning (ML) models, AI foundation models are pre-trained on large-scale, multimodal datasets to learn generalizable representations of the “environment-exposure-health” nexus, enabling efficient transfer and fine-tuning across diverse regions, scenarios, and research objectives. This framework aims to overcome key bottlenecks in current environmental health risk assessment, including fragmented models, inconsistent spatial and temporal scales, and limited reproducibility.

A collaborative team led by Prof. Bin Wang of the Institute of Reproductive and Child Health, Peking University, together with Dr. Kai Zhang (postdoctoral researcher), Prof. Tiantian Li from the National Institute of Environmental Health, Chinese Center for Disease Control and Prevention, and Associate Researcher Dr. Xiangyu Min from the Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, systematically articulated the theoretical foundations and practical pathways of this paradigm shift. Focusing on atmospheric and soil environments, two domains closely linked to human survival and supported by relatively rich spatiotemporal monitoring data, the team proposed two classes of foundation models that are urgently needed: (1) an integrated foundation model targeting the “climate change–air pollution–socioeconomic factors–population health” system, which emphasizes Transformer-based and graph neural network architectures to bridge environmental exposure prediction, disease burden assessment, and policy scenario simulation, thereby shifting risk assessment from retrospective attribution to forward-looking early warning and strategy optimization; and (2) an environmental sustainability foundation model for agricultural systems, which integrates remote sensing, climate, soil, pollution emission, and health-related data to enable unified prediction and decision support from production systems to ecological and health risks, providing system-level tools for source-oriented exposure control and health-driven management.

Overall, AI foundation models hold the promise of reshaping environmental health risk assessment strategies, transforming them from fragmented, static, and experience-driven approaches into a multiscale, dynamic, and health-outcome-oriented systemic framework, and offering interpretable, transferable, and sustainable scientific support for precision prevention and policy making.

(1) Developing an Integrated Foundation Model for “Climate Change–Air Pollution–Socioeconomics–Population Health”

At present, approximately 99% of the global population breathes air that exceeds the World Health Organization (WHO) air quality guideline limits, and air pollution is responsible for an estimated 7 million premature deaths each year. Meanwhile, climate warming and the increasing frequency and intensity of extreme heat events are projected to impose health burdens that, for a substantial proportion of the population, may surpass those attributable to conventional air pollution within this century. In addition, socioeconomic factors, such as income, education, and fertility, profoundly shape the spatial patterns of health risk inequality by influencing exposure profiles, disease susceptibility, and the distribution of healthcare resources. Although large-scale assessment frameworks such as the Global Burden of Disease (GBD) provide valuable insights at the macro level, their multi-stage, assumption-intensive modeling systems still face notable limitations in transparency, reproducibility, and rapid scenario exploration.

Traditional machine learning models, including random forests, regression-based approaches, or standalone deep learning networks, are typically designed for single tasks. As such, they struggle to simultaneously integrate heterogeneous data sources encompassing air pollution, meteorological variability, population structure, and health outcomes, and they are difficult to extend naturally to downstream health risk assessment or policy simulation. In contrast, AI foundation models, through pre-training on large-scale, multimodal environmental datasets, can learn generalizable representations of Earth systems and health risks. These representations can then be rapidly adapted to diverse application scenarios through fine-tuning, substantially enhancing predictive performance across scales and tasks.

Building on this concept, the authors propose the CASH-FM (Climate-Air Pollution-Socioeconomics-Health Foundation Model) as a unifying framework (Figure 1). CASH-FM adopts two types of model design. On the one hand, Transformer and U-Net architectures are used to process high-resolution gridded data, enabling fine-scale prediction of air pollution, climate variables, and their temporal dynamics. On the other hand, multi-scale graph neural networks (GNNs) are introduced to encode national or regional nodes, capturing spatial dependencies, socioeconomic linkages, and the propagation and heterogeneity of health burdens across regions. These two architectures complement each other, allowing the model to deliver both high-resolution spatiotemporal predictions and robust regional-scale decision support.
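To make the graph side of this dual design more concrete, the following is a minimal sketch of a single message-passing step over regional nodes. It is not the CASH-FM implementation: the region names, adjacency, feature values, and the `mix` weight are all hypothetical, and a trained GNN would apply learned weight matrices and nonlinearities rather than the fixed averaging used here.

```python
# Illustrative only: one round of mean-aggregation message passing over a
# toy graph of regions. All names, edges, and numbers are hypothetical.

def message_passing_step(features, neighbors, mix=0.5):
    """Blend each node's feature vector with the mean of its neighbors'.

    features:  dict mapping node -> list of floats
               (e.g. [pollutant level, socioeconomic index])
    neighbors: dict mapping node -> list of adjacent nodes
    mix:       weight given to the neighbor average (0 ignores neighbors)
    """
    updated = {}
    for node, own in features.items():
        adjacent = neighbors.get(node, [])
        if not adjacent:
            updated[node] = list(own)  # an isolated node keeps its features
            continue
        dim = len(own)
        neighbor_mean = [
            sum(features[n][k] for n in adjacent) / len(adjacent)
            for k in range(dim)
        ]
        updated[node] = [
            (1.0 - mix) * own[k] + mix * neighbor_mean[k] for k in range(dim)
        ]
    return updated


# Hypothetical three-region chain: A - B - C
toy_features = {"A": [60.0, 0.2], "B": [30.0, 0.5], "C": [10.0, 0.9]}
toy_neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
smoothed = message_passing_step(toy_features, toy_neighbors)
```

Stacking several such steps lets information from distant regions reach a node, which is one way a graph model can represent the cross-regional propagation of health burdens described above.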

The paper further defines three core functional objectives of CASH-FM. First, multimodal data integration: by combining remote sensing, ground-based monitoring, emission inventories, meteorological data, and health statistics, the model can systematically identify pollution hotspots and quantify associated health burdens. Second, multiscale predictive capability: from neighborhood-level exposure assessment to national-scale policy scenario analysis, CASH-FM supports health risk prediction under alternative emission reduction or climate adaptation strategies. Third, public health decision support: by quantifying health benefits and economic costs on the basis of model predictions, CASH-FM provides a scientific foundation for air quality early warning systems, resource allocation, and clean energy transition policies.
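The article does not specify how health burdens would be quantified from model predictions. As one hedged illustration, the sketch below uses a log-linear concentration-response form that is common in health impact assessment, and compares a current scenario with an emission-reduction scenario. The function name, the relative risk of 1.08 per 10 µg/m³, the baseline mortality rate, and the population are hypothetical values chosen only for the example.

```python
import math

# Illustrative only: attributable deaths under a log-linear
# concentration-response assumption. All inputs are hypothetical.

def attributable_deaths(population, baseline_rate, beta, delta_conc):
    """Estimate deaths attributable to an exposure increment.

    population:    number of exposed people
    baseline_rate: baseline annual mortality rate (deaths per person)
    beta:          log relative risk per unit concentration
    delta_conc:    concentration above the counterfactual level
    """
    if delta_conc <= 0:
        return 0.0
    relative_risk = math.exp(beta * delta_conc)
    attributable_fraction = 1.0 - 1.0 / relative_risk
    return population * baseline_rate * attributable_fraction


# Hypothetical relative risk of 1.08 per 10 ug/m3 increase
beta_example = math.log(1.08) / 10.0

# Compare a current scenario (20 ug/m3 above the counterfactual)
# with a policy scenario (10 ug/m3 above the counterfactual).
current = attributable_deaths(1_000_000, 0.007, beta_example, 20.0)
policy = attributable_deaths(1_000_000, 0.007, beta_example, 10.0)
avoided = current - policy
```

In a workflow of the kind described above, the concentration inputs would come from the model's exposure predictions, and `avoided` would feed into the health benefit side of a cost-benefit comparison.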

Figure 1. Schematic illustration of the dual-architecture CASH-FM foundation model.

(Full text available at: https://pubs.acs.org/10.1021/acs.est.5c16891)

(2) Developing a Sustainable Risk Assessment Foundation Model for Agriculture-Environment-Health

The sustainable development of global agricultural systems has long been a central concern across agricultural science, environmental science, and public policy. Agriculture sits at the intersection of multiple, intertwined challenges: food security, environmental degradation, and climate change. On the one hand, it underpins food supply for nearly 8 billion people and forms the foundation of societal stability; on the other, it consumes approximately 70% of global freshwater and land resources and represents a major source of soil nutrient depletion, greenhouse gas emissions, and chemical pollution. A growing body of evidence suggests that, under certain conditions, crop yield, resource-use efficiency, and environmental sustainability can be improved synergistically through optimized cropping systems, fertilizer and pesticide management, and land-use practices. However, such evidence is largely derived from local surveys, field experiments, or single-process models with limited spatial scale and fixed scenario assumptions. These constraints hinder system-level integration and long-term prediction at regional or global scales, thereby limiting their relevance for macro-level decision-making and policy formulation.

As agricultural intensification continues, fertilizer and pesticide inputs, together with soil contamination, have become increasingly prominent concerns. Excessive fertilization not only drives nitrogen and phosphorus losses and aquatic eutrophication but also exacerbates agricultural greenhouse gas emissions. Pesticides and industrial pollutants may further interact synergistically or additively with heavy metals and emerging contaminants, substantially amplifying ecological and population health risks. In China, for example, studies indicate that soil arsenic contamination has continued to increase since 2000 and is projected to intensify further over the coming decades, posing long-term threats to rice production safety and public health. These complex challenges expose fundamental limitations in existing agricultural intervention and assessment frameworks, particularly in their capacity for system integration, cross-scale prediction, and scenario analysis, underscoring the urgent need for new technological pathways.

Although directly incorporating large-scale, multimodal data into biophysical simulations of agriculture, environment, and health remains challenging, the emergence of foundation models offers new opportunities to build systematic and scalable analytical frameworks. Through self-supervised pre-training on large, multimodal datasets, foundation models can learn generalizable system representations and demonstrate superior generalization and transfer efficiency in downstream tasks. Breakthroughs achieved by models such as GraphCast and Aurora in climate and environmental prediction suggest that similar approaches could unify climate change data, remote sensing observations, soil properties, crop management practices, pollution emissions, and socioeconomic information within a single, efficient, and extensible modeling framework, thereby enabling integrated “agriculture–environment–health” assessments.

Building on this vision, the research team proposes the development of an Agricultural Sustainability Foundation Model (ASFM), designed to support prediction, scenario simulation, and decision-making from field-scale to national and global scales through multi-source data integration and domain knowledge embedding. As illustrated in Figure 2, ASFM follows a two-stage development strategy. Stage I focuses on the integration and pre-training of global spatiotemporal datasets, leveraging architectures such as Transformers, graph neural networks (GNNs), or U-shaped convolutional neural networks (U-Net) to learn general representations of agricultural systems. Stage II incorporates local observations or scenario-specific data and employs efficient fine-tuning techniques, such as low-rank adaptation (LoRA), quantized low-rank adaptation (QLoRA), and adaptive low-rank adaptation (AdaLoRA), to rapidly adapt the model to specific regions or application contexts. By combining multiple tuning strategies, ASFM is designed to maintain strong generalization while accommodating diverse application needs. This enables cross-scale prediction, scenario-based simulation, and multi-system interaction modeling, allowing the model to function as a decision-support system embedded with sustainability objectives. Ultimately, ASFM can generate optimized strategies and provide localized recommendations, facilitating the coordinated advancement of food security, ecological protection, and public health goals.
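The core idea behind low-rank adaptation in Stage II can be sketched in a few lines. In LoRA, the pre-trained weight matrix stays frozen and only a low-rank update, scaled by alpha divided by the rank, is trained. The code below is a plain-Python illustration with hypothetical matrix sizes and values, not ASFM code; QLoRA and AdaLoRA build on the same update but are not shown.

```python
# Illustrative only: the LoRA update W_eff = W + (alpha / r) * B @ A,
# written with nested lists. Shapes and values are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for row in a
    ]


def lora_effective_weight(w_frozen, lora_b, lora_a, alpha):
    """Combine a frozen weight matrix with a scaled low-rank update.

    w_frozen: d_out x d_in matrix from pre-training (not trained)
    lora_b:   d_out x r matrix (trainable, usually initialized to zeros)
    lora_a:   r x d_in matrix (trainable)
    alpha:    scaling hyperparameter
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_frozen, delta)
    ]


def trainable_parameter_counts(d_out, d_in, rank):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # frozen 2 x 3 weights
a = [[1.0, 2.0, 3.0]]                    # rank-1 factor, 1 x 3
b_zero = [[0.0], [0.0]]                  # zero init: no change at start
b_tuned = [[1.0], [0.0]]                 # hypothetical values after tuning

w_start = lora_effective_weight(w, b_zero, a, alpha=2.0)
w_tuned = lora_effective_weight(w, b_tuned, a, alpha=2.0)
full_count, lora_count = trainable_parameter_counts(1024, 1024, 8)
```

For a 1024 x 1024 layer at rank 8, the low-rank factors hold 16,384 trainable values against 1,048,576 for full fine-tuning, which is why this family of methods suits rapid adaptation to regions with limited local data.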

Figure 2. Two-stage conceptual framework of the Agricultural Sustainability Foundation Model (ASFM).

(Full text available at: https://doi.org/10.1021/acs.estlett.5c01105)

Challenges and Outlook

The development of AI foundation models for integrated environment–health applications faces a set of shared and structural challenges. First, global environmental and health data remain highly fragmented and heterogeneous. Data from different sources vary widely in spatial resolution, temporal coverage, quality control, and accessibility, and existing data ecosystems are still largely dominated by high-income countries. This imbalance may amplify regional inequalities and introduce systematic biases, thereby constraining model generalizability in low-income and data-scarce regions. Second, both CASH-FM for environmental health and ASFM for agricultural systems confront fundamental challenges in cross-scale modeling and generalization. Local or field-scale processes do not naturally align with regional or global climate and socioeconomic dynamics, and purely data-driven models are prone to failure when extrapolated beyond observed scenarios. Addressing this issue requires multiscale model architectures and hybrid strategies that integrate mechanistic understanding with AI to enhance physical consistency and interpretability. Third, model governance, transparency, and the explicit integration of sustainability objectives remain insufficient. Most current models prioritize predictive accuracy, with limited capacity to balance multiple objectives such as food security, environmental protection, economic costs, and social equity.

Looking ahead, lowering data barriers through federated learning and privacy-preserving training, strengthening interdisciplinary collaboration, and promoting open and co-developed infrastructures will be essential. AI foundation models should gradually be built as a form of public infrastructure serving global public health and agricultural sustainability. Such efforts can drive a paradigm shift in risk assessment—from fragmented, retrospective, and static analyses toward unified, forward-looking, and scenario-driven decision frameworks, providing robust, long-term scientific support for policy making and practical interventions. More broadly, AI foundation models are rapidly becoming a foundational layer across disciplines, including text, vision, biomedicine, and knowledge networks, substantially improving efficiency and productivity. In environmental science, however, their application is still at an early stage. The research team believes that over the next 5–10 years, AI foundation models possess genuine “game-changing” potential and will inevitably reshape the paradigms of environmental health research.
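As a hedged illustration of the federated learning mentioned above, the sketch below shows the aggregation step of federated averaging, in which a coordinator combines locally trained parameters weighted by each site's sample count, so raw records never leave the site. The site sizes and parameter values are hypothetical, and a real deployment would add secure aggregation and other privacy safeguards that are not shown.

```python
# Illustrative only: the aggregation step of federated averaging.
# Site sizes and parameter values are hypothetical.

def federated_average(client_updates):
    """Average model parameters across sites, weighted by sample count.

    client_updates: list of (num_samples, parameters) pairs, where
                    parameters is a list of floats of equal length
    """
    total = sum(num_samples for num_samples, _ in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for num_samples, params in client_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same parameter shape")
        weight = num_samples / total
        for k, value in enumerate(params):
            averaged[k] += weight * value
    return averaged


# Two hypothetical sites: a small one and a larger one
site_updates = [(100, [1.0, 2.0]), (300, [3.0, 6.0])]
global_params = federated_average(site_updates)
```

Only the parameter lists and sample counts cross institutional boundaries in this scheme, which is what makes it relevant to lowering data barriers between regions.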

References:

1) Tham, et al. Building the world’s first truly global medical foundation model. Nat. Med. 2025, 31 (11), 3580−3585.

2) Xie, et al. Crop switching can enhance environmental sustainability and farmer incomes in China. Nature 2023, 616 (7956), 300−305.

3) Lam, et al. Learning skillful medium-range global weather forecasting. Science 2023, 382 (6677), 1416−1421.

4) Bodnar, et al. A foundation model for the Earth system. Nature 2025, 641 (8065), 1180−1187.

5) Bommasani, et al. On the opportunities and risks of foundation models. arXiv 2021, DOI: 10.48550/arXiv.2108.07258.

6) Brown, et al. Language models are few-shot learners. arXiv 2020, DOI: 10.48550/arXiv.2005.14165.

7) Wang, et al. ExposomeX: Development of an Integrative Exposomic Platform to Expedite Discovery of the “Exposure−Biology−Disease” Nexus. Environ. Sci. Technol. 2025, 59 (26), 13251−13263.

Acknowledgements

We acknowledge the China Cohort Consortium (https://chinacohort.bjmu.edu.cn/) and the Key Laboratory of Epidemiology of Major Diseases, Ministry of Education (Peking University) (https://klemd.pku.edu.cn/index.htm) for their collaborative support. We also acknowledge financial support from the National Natural Science Foundation of China and key research and development programs of the Ministry of Science and Technology of China.

Author Information

First Authors:

Dr. Kai Zhang received his PhD from Shanghai Jiao Tong University and is currently a postdoctoral researcher at the Institute of Reproductive and Child Health, School of Public Health, Peking University. His research focuses on climate change and its health effects, integrating traditional epidemiological approaches with artificial intelligence to investigate the impacts and underlying mechanisms of environmental pollutants and climate change on cardiopulmonary diseases. To date, he has published more than 10 papers as first author in journals including Environmental Science & Technology, Environmental Pollution, and the Journal of Infection. He is a recipient of the National Postdoctoral Program (Category C) and has participated in three projects funded by the National Natural Science Foundation of China, including Youth and General Programs.

Dr. Xiangyu Min is an Associate Researcher at the Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences. His research focuses on green agricultural development and its health impacts. He specializes in integrating multimodal agricultural-environmental data with machine learning approaches to investigate agricultural system sustainability, the restoration of soil ecological functions, and the mechanisms underlying “soil-health” linkages. To date, he has published 22 peer-reviewed articles as first or corresponding author in journals including Soil & Tillage Research, Habitat International, and Environmental Science & Technology Letters. He has led five research projects, including a General Program and a Young Scientists Program funded by the National Natural Science Foundation of China, as well as a subproject of the National Key Research and Development Program.

Corresponding Authors:

Prof. Tiantian Li, Ph.D., is a Researcher, Doctoral Supervisor, recipient of the National Science Fund for Distinguished Young Scholars, and Deputy Director of the National Institute of Environmental Health, Chinese Center for Disease Control and Prevention. Her research focuses on climate change, air pollution and health, health big data, and risk prediction. Dr. Li has led 11 national-level research projects, including a National Outstanding Youth Fund project, a National Natural Science Foundation of China key project, an Integrated Project topic, and National Key R&D Program projects. Over the past five years, Dr. Li has published 76 original research and opinion papers as the first or corresponding author in top-tier journals, including Lancet, Nature, and their families of journals. Of these, 45 appeared in journals ranked Category 1 (top tier) by the Chinese Academy of Sciences, with an average Impact Factor of 21. Her honors include the inaugural Cell Press China Female Scientist Award, the Huaxia Medical Science and Technology Young Investigator Award, and the First Prize of the 2024 China Meteorological Service Association Science and Technology Award (ranked first of 15 recipients). Dr. Li is a member of the World Heart Federation Committee on Climate Change, Air Pollution and Health and a member of the National Environmental and Health Expert Advisory Committee.

Prof. Bin Wang is a tenured Associate Professor and Vice Dean at the Institute of Reproductive and Child Health, Peking University, and also serves as an adjunct professor at the College of Urban and Environmental Sciences, Peking University. His research focuses on exposomics and AI-driven environmental health risk assessment. He is a co-developer of the integrative exposomics platform ExposomeX (www.exposomex.cn), which accelerates research on the “Exposure-Biology-Disease” nexus. Prof. Wang has made significant contributions by developing statistical models to predict internal human exposure levels of common environmental pollutants across different regions. He has further quantitatively evaluated associations between maternal exposure to high pollution levels and adverse reproductive health outcomes, providing critical evidence for the impacts of environmental pollutants on reproductive health. Prof. Wang has published over 140 SCI-indexed papers and has an H-index of 54. He is also a leader in education and developed the course “Exposomics” for undergraduate and graduate students, which was recognized as an Excellent Teaching Demonstration Course at Peking University.