Amid the rapid advancement of AI-driven materials design, extracting key features from complex material structures in a scientific, interpretable, and efficient manner has emerged as a core challenge for intelligent materials discovery. Topological structural chemistry, a methodology that maps microscopic material structures onto mathematical topological models, has in recent years demonstrated powerful capabilities in structural representation and property prediction across fields such as materials genome engineering, catalytic activity exploration, and energy materials design. Professor Feng Pan’s team at the School of Advanced Materials, Peking University Shenzhen Graduate School, has long worked on extending graph theory and topological data analysis (TDA) methods and applying them to structural feature extraction, achieving a series of innovative results. These include topological representations of material structures (J. Phys. Chem. Lett., 2023, 14, 954), inverse materials design (npj Comput. Mater., 2025, 11, 147), novel solid-state electrolyte design (J. Am. Chem. Soc., 2024, 146, 18535; J. Am. Chem. Soc., 2025, 147, 24), chemical reaction path search (CCS Chemistry, 2024, 7, 1), and catalytic active phase search (Nat. Commun., 2025, 16, 2542). Together, these studies establish a full-chain research framework running from topological structural representation through property prediction to functional materials design, laying a solid theoretical and methodological foundation for the deep integration of structural chemistry and artificial intelligence.
Recently, Professor Feng Pan’s team proposed a materials structural feature extraction framework based on Topological Data Analysis (TDA), offering a new approach to structural representation and property prediction that combines mathematical rigor with high interpretability. The findings were published in the internationally renowned journal The Journal of Physical Chemistry Letters under the title “Structural Feature Extraction via Topological Data Analysis” (J. Phys. Chem. Lett., 2025, 16, 8056–8067, DOI: 10.1021/acs.jpclett.5c01831).

Figure 1. Principle of Topological Data Analysis for Material Structural Feature Extraction
The study is grounded in algebraic topology: it abstracts the atomic structures of materials into mathematical complexes (simplicial or path complexes) and computes topological invariants (such as Betti numbers and cycle density) to quantify multi-scale structural morphology, connectivity, and void features. Compared with traditional empirical structural descriptors (e.g., coordination number, local environment parameters) or “black-box” deep learning features, this method exhibits stronger structural sensitivity and physical interpretability, effectively capturing high-dimensional geometric and connectivity information relevant to material performance.
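To make the idea of a topological invariant concrete, the following is a minimal sketch, not the team's implementation. It builds a Vietoris-Rips simplicial complex on a set of atomic coordinates at one distance cutoff and computes the Betti numbers b0 (connected components) and b1 (independent loops) from the ranks of the boundary matrices. The function name `betti_numbers`, the cutoff values, and the toy square of four "atoms" are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def betti_numbers(points, cutoff):
    """Return (b0, b1) of the Vietoris-Rips complex built at `cutoff`.

    Vertices are points, edges join points no farther apart than `cutoff`,
    and a triangle is filled in whenever all three of its edges are present.
    Ranks are taken over the reals.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i, j] <= cutoff]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [
        t for t in combinations(range(n), 3)
        if all(pair in edge_index for pair in combinations(t, 2))
    ]

    # Boundary matrix d1: maps each edge [i, j] to (vertex j) - (vertex i).
    d1 = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        d1[i, k], d1[j, k] = -1.0, 1.0

    # Boundary matrix d2: maps each triangle [a, b, c] to [b, c] - [a, c] + [a, b].
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], k] = 1.0
        d2[edge_index[(a, c)], k] = -1.0
        d2[edge_index[(a, b)], k] = 1.0

    rank1 = np.linalg.matrix_rank(d1) if edges else 0
    rank2 = np.linalg.matrix_rank(d2) if triangles else 0

    b0 = n - rank1                    # number of connected components
    b1 = len(edges) - rank1 - rank2   # number of unfilled loops
    return int(b0), int(b1)


# Toy example: four points on the corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(betti_numbers(square, 0.5))  # (4, 0): four isolated points
print(betti_numbers(square, 1.1))  # (1, 1): one ring enclosing a void
print(betti_numbers(square, 1.5))  # (1, 0): diagonals appear, the loop is filled
```

The same structure yields different invariants at different cutoffs, which is why the multi-scale view described above matters: the loop in the square exists only over a limited range of length scales.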

Figure 2. Structure–Property Relationships Revealed by Topological Features
The team systematically demonstrated the application of methods such as Persistent Homology, GLMY Homology, and Higher-Order Hypergraph Homology in crystals, MOFs, and multicomponent molecular systems, highlighting the advantages of topological features in describing bonding networks, pore distribution, orientation relationships, and defect sensitivity. By integrating these features with Graph Neural Network (GNN) models, the team reduced model error by up to 55% in defect-sensitive property prediction and improved the R² from 0.74 to 0.85 in MOF gas adsorption performance prediction, demonstrating excellent generalization and explanatory power.
This study is the first to systematically summarize the application mechanisms of topological data analysis across multiple structural systems (crystals, molecules, and biomacromolecules) and proposes a selection guide tailored to different structural features. For instance: Persistent Homology is suitable for periodic or pore-dominated systems; GLMY Homology is ideal for systems with directionality or non-equilibrium behavior (e.g., chemical reaction networks, charged molecular graphs); Hypergraph Homology is well-suited for systems dominated by many-body interactions (e.g., protein–ligand recognition, complex adsorption processes).
The study highlights the unique advantages of topological features in capturing structure–property relationships, providing a unified mathematical representation framework for energy materials, catalysts, functional oxides, and other systems. It lays a theoretical foundation for integrated “structure–performance–generation” materials design.
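As an illustration of the first method in the selection guide, the sketch below computes zero-dimensional persistent homology for a small point set. It is a minimal example under stated assumptions rather than the pipeline used in the paper. Every point starts as its own connected component at scale 0, and a component "dies" at the edge length where it merges into another. The function name `h0_persistence` and the sample coordinates are hypothetical.

```python
from itertools import combinations
import math


def h0_persistence(points):
    """Return the sorted death scales of the finite H0 bars.

    All components are born at scale 0. Processing edges from shortest to
    longest, each edge that joins two different components ends one bar.
    One component never dies (the infinite bar) and is not returned.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Two close points and one distant point on a line.
print(h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))  # [1.0, 4.0]
```

The resulting list of death scales can be binned or summarized into a fixed-length vector and appended to other descriptors before being passed to a learning model. Production work would more likely rely on a dedicated TDA library that also handles higher-dimensional features such as loops and voids.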
The research demonstrates the potential of deeply integrating topological feature extraction and analysis with artificial intelligence algorithms. By embedding topological features into Graph Neural Networks and Transformer models, the team not only significantly improved prediction accuracy but also enhanced the interpretability of model outputs, providing quantitative insights into structure–property relationships.
This work was co-supervised by Professor Feng Pan and Dr. Shunning Li. PhD student Bingxu Wang and master’s student Bin Feng are co-first authors. The research was supported by the National Natural Science Foundation of China and the Guangdong Provincial Key Laboratory, among other programs.
Link to the paper: https://pubs.acs.org/doi/10.1021/acs.jpclett.5c01831