On Pre-trained Language Models for Antibody

Antibodies are vital proteins that provide robust protection for the human body against pathogens. Both general protein and antibody-specific pre-trained language models can facilitate antibody prediction tasks. However, few studies have comprehensively explored the representation capability of distinct pre-trained language models on different antibody problems. To investigate this, we aim to answer the following key questions: (1) How
do pre-trained language models perform on antibody tasks of varying specificity? (2) How much does a model gain if specific biological mechanisms are introduced into the pre-training process? (3) Do the learned
antibody representations carry meaningful signal for real-world antibody problems, such as drug discovery and understanding of the immune process? Previously, the lack of an available benchmark largely hindered studies from answering these questions. To
facilitate the investigation, we provide an AnTibody Understanding Evaluation
(ATUE) benchmark. We comprehensively evaluate the performance of protein pre-trained language models through an empirical study and distill conclusions and new insights. Our ATUE and code are released at https://github.com/dqwang122/EATLM.