AI for predicting the physicochemical properties of materials
Eva Gerardin, Diane Gauthier and Members of JEFAI.UC
This article was co-written with JEFAI.UC, a Portuguese Junior-Enterprise from the University of Coimbra specializing in artificial intelligence. It is the result of several months of work and discussions between Chimie Lille Etudes and JEFAI.UC.
Introduction
Predicting the physicochemical properties of materials is a crucial challenge in materials science, underpinning advances in energy, healthcare, and electronics. Traditional approaches, which rely on experiments and simulations, are often slow and resource-intensive. Artificial Intelligence (AI) and machine learning offer a faster, more efficient alternative: by analyzing large datasets, identifying patterns, and building data-driven models, researchers can make accurate predictions that accelerate materials discovery and innovation, exploring a vast range of candidate materials and optimizing their properties with unprecedented efficiency.
This article examines how AI is transforming the prediction of physicochemical properties, highlighting key methodologies, current challenges, and future perspectives in this rapidly evolving field.
I. Machine Learning and AI: Transforming Materials Science
Machine Learning (ML) and Artificial Intelligence (AI) are revolutionizing materials science by enabling data-driven approaches to predict and design materials with desired properties. These technologies are transforming traditional trial-and-error methods into efficient, high-throughput processes, significantly reducing the time and cost associated with materials discovery [1][2]. This chapter explores the role of ML and AI in materials science, focusing on supervised, unsupervised, and deep learning techniques, as well as the importance of databases in training models.
Supervised Learning
Supervised learning relies on labeled datasets, where the model learns to map input features (e.g., atomic structure, composition) to target properties. It is widely used for regression and classification tasks in materials science, for example to predict properties such as bandgaps, mechanical strength, and thermal conductivity. A notable application is the prediction of bandgaps in perovskites using ML models, which achieved high accuracy compared to traditional methods [1][2].
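As a simple illustration of this workflow, the sketch below fits a random forest regressor to a synthetic table of composition descriptors with a stand-in band-gap target. The data are generated on the fly purely for demonstration, so only the pattern (descriptor features, labeled property, trained predictor) reflects the approach described above.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical dataset: rows are materials, columns are simple composition/structure
# descriptors (e.g. mean electronegativity, mean atomic radius, ...)
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(500)  # stand-in band gap (eV)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("MAE (eV):", mean_absolute_error(y_test, model.predict(X_test)))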
Unsupervised Learning
Unsupervised learning is used for clustering and dimensionality reduction, enabling the discovery of patterns and relationships in unlabeled data. For instance, clustering algorithms have been employed to categorize materials into families based on their structural or chemical properties [1][2]. This approach is particularly useful for exploratory analysis and identifying novel materials with similar characteristics.
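A brief sketch of this exploratory workflow on synthetic data: descriptors are standardized, projected to two dimensions with PCA, and grouped into candidate "families" with k-means. The three artificial clusters are generated on purpose so the grouping is visible; real studies would start from measured or computed descriptors.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical descriptor matrix: each row is a material, each column a
# structural/chemical descriptor (density, mean atomic mass, band gap, ...)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 5)) for c in (0.0, 2.0, 4.0)])

X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)                 # dimensionality reduction
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_scaled)

print(X_2d[:3])      # low-dimensional coordinates for plotting/exploration
print(labels[:10])   # cluster ("family") assignment for the first materials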
Deep Learning
Deep learning (DL), a subset of ML, has gained significant traction in materials science due to its ability to analyze unstructured data and automatically extract features. DL models, such as convolutional neural networks (CNNs) and graph neural networks (GNNs), have been used for atomistic simulations, materials imaging, and spectral analysis [1][2]. For example, GNNs have been applied to predict adsorption energies and catalytic properties of materials, demonstrating their potential for complex property prediction [1][2].
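A minimal sketch of this graph-in, property-out pattern, assuming PyTorch Geometric is installed: a two-layer graph convolutional network maps a toy atomic graph (random node features and hand-written bonds) to a single scalar property such as an adsorption energy. It illustrates the general idea rather than any published architecture.

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.data import Data

class CrystalGNN(torch.nn.Module):
    def __init__(self, num_node_features, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(num_node_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.readout = Linear(hidden, 1)          # one scalar property, e.g. adsorption energy

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)             # graph-level embedding
        return self.readout(x).squeeze(-1)

# Toy graph: 3 atoms with 4-dimensional features, bonded in a chain
data = Data(
    x=torch.randn(3, 4),
    edge_index=torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]),
    batch=torch.zeros(3, dtype=torch.long),
)
model = CrystalGNN(num_node_features=4)
print(model(data))                                 # predicted property for this toy structure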
Databases and Their Role in Training Models
The availability of large, high-quality datasets is critical for training accurate ML models. Databases such as the Materials Project, AFLOW, and NOMAD provide structured data on material properties, enabling researchers to train models for property prediction and materials discovery [1]. These databases are often curated using computational methods like density functional theory (DFT) and experimental data, ensuring their reliability.
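As an example of how such databases are queried programmatically, the sketch below uses the Materials Project's mp-api client to retrieve band gaps for candidate semiconductors. The exact endpoint and field names vary between client versions and the API key is a placeholder, so this should be read as an assumption-laden sketch rather than a verified recipe.

# Requires: pip install mp-api, plus a (free) Materials Project API key.
# NOTE: method and field names reflect one recent mp-api version and may differ in others.
from mp_api.client import MPRester

with MPRester("YOUR_API_KEY") as mpr:              # placeholder key
    docs = mpr.materials.summary.search(
        band_gap=(1.0, 3.0),                       # semiconductors with a 1-3 eV gap
        fields=["material_id", "formula_pretty", "band_gap"],
    )

rows = [(d.material_id, d.formula_pretty, d.band_gap) for d in docs]
print(rows[:5])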
Despite the growing availability of materials databases, challenges remain in data accessibility and quality. Many datasets are incomplete or lack standardization, making it difficult to train robust models. Additionally, extracting data from unstructured sources like scientific literature requires advanced natural language processing (NLP) techniques, as demonstrated by tools like ChemDataExtractor and IBM DeepSearch [2].
AI-powered tools are increasingly being used to extract and structure data from scientific literature. For example, the Eunomia AI agent autonomously creates structured datasets from natural language text, simplifying the compilation of machine learning-ready datasets [3]. Such tools are essential for overcoming the limitations of manual data extraction and ensuring the scalability of ML applications in materials science.
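The snippet below is a deliberately simplified stand-in for such tools: a regular expression pulls band-gap values out of abstract-like text and turns them into records. Real systems such as ChemDataExtractor or Eunomia rely on NLP models and LLM agents, so this only illustrates the text-to-structured-data step, not their methods.

import re

# Toy abstract text; the regex below is only a minimal illustration of the structuring step.
text = ("The measured band gap of the perovskite film was 1.62 eV, "
        "while the reference sample showed a band gap of 2.30 eV.")

pattern = re.compile(r"band gap[^\d]*(\d+\.\d+)\s*eV", re.IGNORECASE)
records = [{"property": "band_gap", "value_eV": float(v)} for v in pattern.findall(text)]
print(records)   # [{'property': 'band_gap', 'value_eV': 1.62}, {'property': 'band_gap', 'value_eV': 2.3}]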
Types of Models Used in Materials Science
Regression models: linear regression, support vector regression (SVR), and Gaussian process regression (GPR) are commonly used for predicting continuous material properties. For instance, GPR has been applied to predict thermal conductivity and mechanical properties with high accuracy [1][2]; a short GPR sketch follows below.
Classification models: decision trees, random forests, and neural networks are used for categorical predictions, such as identifying material phases or classifying materials into functional categories. These models are particularly useful for high-throughput screening of materials [1][2].
Deep learning models, such as CNNs and GNNs, are increasingly being used for complex tasks like atomistic simulations and property prediction. For example, GNNs have been employed to predict electronic properties and catalytic activity, leveraging their ability to model complex relationships in material structures [1][2].
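To make the regression case concrete, here is a minimal Gaussian process regression sketch on synthetic data: a single composition descriptor maps to a stand-in property value, and the GP returns both a prediction and an uncertainty for each new point. The data and kernel choices are illustrative assumptions, not taken from the cited studies.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical 1-D example: predict a continuous property (e.g. thermal conductivity)
# from a single composition descriptor, with per-prediction uncertainty from the GP.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(4 * X[:, 0]) + 0.05 * rng.standard_normal(40)     # stand-in property values

kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-3)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = np.linspace(0, 1, 5).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)               # prediction + uncertainty
print(np.column_stack([X_new[:, 0], mean, std]))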
II. AI applications for predicting physicochemical properties
AI has demonstrated remarkable success in predicting a wide range of material properties, including mechanical, thermal, electronic, optical, chemical, and catalytic properties. This section explores these applications in detail, supported by recent advancements and case studies.
Mechanical Properties
AI models have been widely used to predict mechanical properties such as strength, elasticity, and hardness. These models leverage compositional and structural descriptors to establish relationships between material inputs and mechanical outputs. For example, ML models trained on Density Functional Theory (DFT) datasets have accurately predicted the elastic moduli of crystalline materials.
The MLMD platform is a notable example of how AI can address data scarcity in this domain. It integrates surrogate optimization and active learning to predict mechanical properties of materials like high-entropy alloys and steel. By using a Bayesian toolkit, the platform enables accurate predictions even with limited datasets, making it a valuable tool for materials discovery [4].
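As a toy illustration of the surrogate-plus-active-learning idea (not the MLMD implementation itself), the loop below refits a Gaussian process surrogate after each simulated "measurement" of a stand-in objective and uses an upper-confidence-bound rule to pick the next composition to evaluate.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def measure(x):
    # Stand-in for a costly experiment or DFT run (e.g. hardness vs. one composition variable)
    return float(-(x - 0.6) ** 2)

X = np.array([[0.1], [0.5], [0.9]])                # small initial dataset
y = np.array([measure(x[0]) for x in X])
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

for _ in range(5):                                  # active-learning loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True).fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 1.96 * std                         # upper-confidence-bound acquisition
    x_next = candidates[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, measure(x_next[0]))

print("best composition tried:", float(X[np.argmax(y), 0]), "property:", float(y.max()))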
Additionally, databases such as the Materials Project and JARVIS-DFT provide extensive datasets for training ML models to predict mechanical properties, such as elastic constants and hardness. These resources have been instrumental in advancing the field, enabling researchers to design materials with tailored mechanical properties for specific applications [5][6].
Thermal Properties
Thermal properties, including conductivity and heat capacity, are critical for designing materials for energy applications. AI models, particularly deep learning architectures, have been employed to predict these properties with high accuracy.
A notable example is a Transformer model based on the Pointwise Distance Distribution (PDD) representation, developed to predict thermal properties of crystals. It outperforms traditional methods in both accuracy and computational efficiency, making it a promising tool for materials design [5].
Furthermore, the Garden-AI infrastructure hosts ML models for predicting thermodynamic properties, providing uncertainty quantification and domain guidance to ensure robust predictions. These advancements highlight the potential of AI to accelerate the discovery of materials with optimized thermal properties, which are essential for applications in energy storage, thermal management, and beyond [6].
Electronic and Optical Properties
AI has been particularly effective in predicting bandgap, electrical conductivity, and optical properties, which are crucial for designing semiconductors, solar cells, and optoelectronic devices.
For instance, researchers at the Indian Institute of Science (IISc) developed a transfer learning-based model using Graph Neural Networks (GNNs) to predict electronic properties like bandgaps and dielectric constants. Their model demonstrated superior performance compared to models trained from scratch, showcasing the power of transfer learning in materials science [7].
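The snippet below sketches the transfer-learning pattern in plain PyTorch, without reproducing the IISc model: the feature-extracting layers of a hypothetical pretrained band-gap network are frozen, and only a new output head is trained on a small toy dataset standing in for the data-scarce target property.

import torch
import torch.nn as nn

# Hypothetical pretrained predictor (descriptor -> band gap, trained on a large source dataset)
pretrained = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
# pretrained.load_state_dict(torch.load("bandgap_model.pt"))   # hypothetical source-task weights

# Transfer: freeze the feature-extracting layers, train only a new output head
# on the small target dataset (e.g. dielectric constants).
for p in pretrained[:4].parameters():
    p.requires_grad = False
model = nn.Sequential(*pretrained[:4], nn.Linear(128, 1))

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

X_small, y_small = torch.randn(50, 32), torch.randn(50)        # toy target-task data
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X_small).squeeze(-1), y_small)
    loss.backward()
    optimizer.step()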
Additionally, the Materials Project database has been instrumental in training ML models to predict electronic properties, leveraging DFT-computed data for high-accuracy predictions. These advancements enable researchers to design materials with tailored electronic and optical properties, paving the way for innovations in electronics, photonics, and renewable energy [5].
Chemical and Catalytic Properties
AI models are increasingly used to predict reactivity, stability, and adsorption properties, which are essential for designing catalysts and chemical processes.
The MLMD platform is a prime example of how AI can be applied in this domain. It incorporates active learning and surrogate optimization to predict catalytic properties of materials like perovskites, enabling the discovery of novel catalysts with targeted properties through iterative design processes [4].
Another notable development is the ElaTBot LLM, which has been designed to predict elastic constants and design materials with specific chemical properties. By integrating with general LLMs like GPT-4o, the ElaTBot enhances prediction accuracy and inverse design capabilities, making it a powerful tool for materials discovery. These advancements demonstrate the potential of AI to revolutionize catalysis and chemical engineering, enabling the design of more efficient and sustainable chemical processes [8].
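As a sketch of the interaction pattern only (ElaTBot couples a fine-tuned model with GPT-4o, which this does not reproduce), the snippet below queries a general LLM for a structured property estimate through the OpenAI Python client. It assumes the openai package is installed and an API key is configured, and the returned value is an LLM estimate rather than a validated prediction.

# Requires: pip install openai, plus an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a materials science assistant. "
                                      "Answer with a JSON object only."},
        {"role": "user", "content": "Estimate the bulk modulus (GPa) of MgO and return "
                                    '{"material": ..., "bulk_modulus_GPa": ...}.'},
    ],
)
print(response.choices[0].message.content)          # structured estimate from the LLM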
III. Challenges and limitations of AI in predicting material properties
Despite its transformative potential, AI faces several challenges when applied to the prediction of physicochemical properties. These limitations must be addressed to ensure reliable and practical applications in materials science.
One major issue is data quality and availability: many material datasets are incomplete, inconsistent, or biased, making it difficult for models to generalize accurately, especially for rare or novel materials. Another challenge is interpretability and reliability: predictions are sometimes difficult to understand and to trust, and enhancing explainability and integrating physics-based constraints can improve confidence in AI-driven results. Additionally, computational complexity remains a hurdle, as training advanced AI models requires significant resources, limiting accessibility for smaller research groups. Addressing these challenges is crucial to making AI predictions more accurate, interpretable, and practical for materials science.
Moreover, AI models trained on laboratory data may struggle to predict material behavior in large-scale manufacturing, where factors like impurities and processing variations come into play. Finally, many industries still rely on conventional methods and lack the infrastructure to fully integrate AI into their workflows. Overcoming these challenges will be key to bridging the gap between AI predictions and real-world applications [9][10].
IV. Prospects and future developments
The integration of Artificial Intelligence (AI) and Machine Learning (ML) into materials science holds immense promise, though the field is still evolving. Future advancements will focus on enhanced data generation, advanced algorithms, and seamless integration with experimental workflows. High-throughput experimentation and autonomous labs are expected to generate vast datasets, enabling robust AI model training. However, challenges such as data quality, standardization, and intellectual property concerns must be addressed. Collaborative platforms like the Materials Project and initiatives like FAIR (Findable, Accessible, Interoperable, and Reusable) data principles are promoting better data sharing and management, though barriers remain.
Advanced algorithms, including explainable AI (XAI) and hybrid models, will drive future progress. XAI techniques like SHAP and LIME are being adapted to improve the interpretability of AI predictions, while hybrid models combining ML with physics-based simulations, such as density functional theory, enhance accuracy and reliability. Transfer learning will enable AI applications to materials with limited data, though ensuring model generalizability across diverse systems remains a challenge. Integration with experimental workflows, such as autonomous discovery systems like A-Lab and real-time data analysis, will accelerate materials discovery by providing immediate feedback and optimization.
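As an illustration of this XAI step, the sketch below trains a random forest on synthetic descriptors and uses SHAP's TreeExplainer to attribute each prediction to its input features. The data and model are placeholders; only the explanation workflow is meant to carry over.

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical trained model: descriptors -> formation energy; SHAP attributes each
# prediction to the input descriptors, making the model's behavior inspectable.
rng = np.random.default_rng(3)
X = rng.random((300, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])         # per-feature contributions
print(shap_values.shape)                             # (10, 5): 10 materials x 5 descriptors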
AI’s expanding applications include multiscale modeling, sustainable materials, and personalized materials. Multiscale models will bridge atomic and macroscopic behaviors, while AI-driven discovery of sustainable materials, such as energy-efficient catalysts, will address global challenges. Personalized materials, tailored for specific applications like biomedical implants, will benefit from AI optimization. Ethical considerations, such as mitigating bias in AI models and workforce transformation, must also be addressed to ensure responsible development. By fostering interdisciplinary collaboration and investing in advanced algorithms and infrastructure, the materials science community can fully harness AI’s potential to accelerate discovery and innovation.
Conclusion
Artificial Intelligence (AI) and Machine Learning (ML) have emerged as transformative tools in materials science, enabling the rapid prediction and discovery of materials with tailored physicochemical properties. By leveraging supervised, unsupervised, and deep learning techniques, researchers can analyze vast datasets, identify patterns, and make accurate predictions that were previously unattainable through traditional methods. The integration of AI with high-throughput experimentation, advanced databases, and physics-based models has significantly accelerated the materials discovery process, reducing time and costs while expanding the range of explorable material systems.
Despite these advancements, challenges remain, including data quality and availability, model interpretability, and the need for seamless integration with experimental workflows. Addressing these issues will require continued innovation in algorithms, data curation, and interdisciplinary collaboration. The future of AI in materials science is promising, with prospects such as autonomous discovery systems, sustainable material design, and personalized materials poised to revolutionize the field. By fostering ethical practices and investing in workforce development, the scientific community can ensure that AI-driven advancements are both impactful and equitable. Ultimately, AI is not just a tool but a paradigm shift, empowering researchers to unlock new frontiers in materials science and address some of the most pressing challenges in energy, healthcare, and sustainability.
Sources
[1] - Choudhary, Kamal, et al. "Recent advances and applications of deep learning methods in materials science." npj Computational Materials 8.1 (2022): 59.
[2] - Chong, Sue Sin, et al. "Advances of machine learning in materials science: Ideas and techniques." Frontiers of Physics 19.1 (2024): 13501.
[3] - Ansari, Mehrad, and Seyed Mohamad Moosavi. "Agent-based learning of materials datasets from the scientific literature." Digital Discovery 3.12 (2024): 2607-2617.
[4] - Ma, Jiaxuan, et al. "MLMD: a programming-free AI platform to predict and design materials." npj Computational Materials 10.1 (2024): 59.
[5] - Balasingham, Jonathan, Viktor Zamaraev, and Vitaliy Kurlin. "Accelerating material property prediction using generically complete isometry invariants." Scientific Reports 14.1 (2024): 10132.
[6] - Jacobs, Ryan, et al. "Machine learning materials properties with accurate predictions, uncertainty estimates, domain guidance, and persistent online accessibility." Machine Learning: Science and Technology 5.4 (2024): 045051.
[7] - "Researchers Develop AI Model to Predict Material Properties." [Online]. Available: https://www.sciencenewstoday.org/researchers-develop-ai-model-to-predict-material-properties
[8] - Liu, Siyu, et al. "Large Language Models for Material Property Predictions: elastic constant tensor prediction and materials design." arXiv preprint arXiv:2411.12280 (2024).
[9] - P. Turkowski, "How Is AI Used in Chemistry? Conversational Assistant Case Study." [Online]. Available: https://www.netguru.com/blog/ai-assistant-chemistry
[10] - "L'apprentissage automatique et l'IA aident à prédire les résultats des réactions chimiques." [Online]. Available: https://www.chemeurope.com/fr/news/1183452/l-apprentissage-automatique-et-l-ia-aident-a-predire-les-resultats-des-reactions-chimiques.html