Multi-Scale Residue Interaction Learning for Protein pKa Prediction via Physicochemical Graph Representation Networks
Keywords:
protein pKa prediction, multi-scale graph neural networks, physicochemical representation, molecular property prediction, systems infrastructure, fairness in bioinformatics
Abstract
Accurate prediction of the acid dissociation constants (pKa values) of ionizable residues is a foundational challenge in structural biology, with consequences for protein engineering, enzyme design, and the understanding of pH-dependent conformational dynamics. Traditional empirical methods, while computationally efficient, often fail to capture the interplay between local chemical microenvironments and long-range electrostatic effects. This paper introduces a systems-oriented architecture for protein pKa prediction that applies multi-scale residue interaction learning within a physicochemical graph representation framework. The model constructs a hierarchical graph in which atomic, residue, and protein-level features are embedded through physically motivated node attributes, including partial charges, solvent accessibility, and hydrogen-bonding capacities. Message passing operates across spatial scales, allowing the network to capture both short-range inductive effects and global dielectric responses. From a large-scale systems perspective, we analyze the trade-offs between graph granularity, computational throughput, and predictive fidelity, and show how modular design choices enable deployment on heterogeneous computing clusters. Robustness is examined through perturbations of structural inputs and cross-family generalization, revealing that physically regularized multi-scale aggregation confers resilience against conformational noise. Fairness is addressed by auditing prediction discrepancies across amino acid types and between buried and surface-exposed residues, leading to architectural adjustments that mitigate systematic biases. The paper further discusses infrastructure and sustainability, including containerized microservice deployment, energy-efficient inference, and policy implications for AI-driven molecular property servers. By reconciling biophysical rigor with scalable system engineering, the proposed framework illustrates how graph-based multi-scale learning can serve as a responsible, interpretable, and deployable component of the computational structural biology ecosystem.
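The audit of prediction discrepancies across amino acid types and burial states can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `audit_by_group`, the record layout, and the relative solvent accessibility cutoff of 0.2 are assumptions chosen for illustration, and the input values are placeholders rather than results.

```python
from collections import defaultdict
from statistics import mean


def audit_by_group(records, sasa_cutoff=0.2):
    """Return the mean absolute pKa error per (residue type, burial) group.

    records: iterable of (residue_type, relative_sasa, predicted_pka, reference_pka).
    sasa_cutoff: hypothetical relative solvent accessibility threshold;
                 residues below it are counted as buried.
    """
    errors = defaultdict(list)
    for res_type, rel_sasa, predicted, reference in records:
        burial = "buried" if rel_sasa < sasa_cutoff else "exposed"
        errors[(res_type, burial)].append(abs(predicted - reference))
    return {group: mean(values) for group, values in errors.items()}


# Placeholder values for illustration only; these are not results from the paper.
toy_records = [
    ("ASP", 0.05, 4.5, 3.5),
    ("ASP", 0.60, 4.0, 3.5),
    ("ASP", 0.10, 3.0, 3.5),
    ("HIS", 0.70, 6.5, 6.0),
]

report = audit_by_group(toy_records)
for group, mae in sorted(report.items()):
    print(group, round(mae, 2))
```

A large gap between groups in such a report (for example, buried versus exposed residues of the same type) would be the kind of systematic discrepancy the audit is meant to surface.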
Copyright (c) 2026 Bioinformatics Insights and Analytics

This work is licensed under a Creative Commons Attribution 4.0 International License.