AVIDa-SARS-CoV-2
AVIDa-SARS-CoV-2 is a dataset of interactions between antigens and the variable domain of the heavy chain of heavy-chain antibodies (VHHs), obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. It includes binary labels indicating whether diverse VHH sequences bind or do not bind to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. The dataset provides benchmarks for evaluating how well the representations learned by antibody language models support binding prediction, thereby facilitating the development of AI-driven antibody discovery.
Columns
Descriptions of the columns in the dataset CSV files.
AVIDa-SARS-CoV-2.csv
Column | Description |
VHH_sequence | Amino acid sequence of VHH |
Ag_label | Antigen Type |
label | Binary label represented by 1 for the binding pair and 0 for the non-binding pair |
subject_species | Species of the subject from which VHH was collected |
subject_name | Name of the subject from which VHH was collected |
subject_sex | Sex of the subject from which VHH was collected |
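As a minimal sketch of reading this file with the Python standard library, the snippet below checks that the columns listed above are present and converts `label` to an integer. It assumes the file has a header row whose names match the table exactly; the helper name `read_pairs` is hypothetical and not part of the dataset's tooling.

```python
import csv
from typing import Iterable, Iterator

# Column names as listed in the table above.
EXPECTED_COLUMNS = [
    "VHH_sequence", "Ag_label", "label",
    "subject_species", "subject_name", "subject_sex",
]


def read_pairs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per VHH-antigen pair, with `label` converted to int.

    `lines` is any iterable of CSV text lines, for example an open file.
    (Hypothetical helper; assumes a header row matching EXPECTED_COLUMNS.)
    """
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        row["label"] = int(row["label"])
        # The card defines label as binary: 1 = binding, 0 = non-binding.
        if row["label"] not in (0, 1):
            raise ValueError(f"unexpected label: {row['label']}")
        yield row
```

With a local copy of the file, usage would be along the lines of opening it with `open("AVIDa-SARS-CoV-2.csv", newline="")` and passing the file object to `read_pairs`.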
antigen_sequences.csv
Column | Description |
Ag_label | Antigen Type |
Ag_sequence | Amino acid sequence of antigen |
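Because both files share the `Ag_label` column, each VHH-antigen pair can be given its full antigen sequence by a simple lookup. The following is a sketch under the assumption that every `Ag_label` in the pair file also appears in `antigen_sequences.csv`; the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def load_antigen_sequences(lines: Iterable[str]) -> Dict[str, str]:
    """Map each Ag_label to its Ag_sequence (hypothetical helper)."""
    return {row["Ag_label"]: row["Ag_sequence"] for row in csv.DictReader(lines)}


def attach_antigen_sequences(
    pairs: Iterable[dict], antigens: Dict[str, str]
) -> List[dict]:
    """Return copies of the pair rows with an added Ag_sequence field.

    Raises KeyError if a pair refers to an Ag_label that is missing
    from the antigen table, rather than silently dropping the row.
    """
    return [{**pair, "Ag_sequence": antigens[pair["Ag_label"]]} for pair in pairs]
```

Failing on an unknown label is a deliberate choice in this sketch: a silent drop would change the class balance without any warning.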
Pipeline
AVIDa-SARS-CoV-2 was generated through the following workflow. The scripts highlighted in blue are available on GitHub.
Statistics
AVIDa-SARS-CoV-2 contains 77,003 data samples, comprising 22,002 binding pairs and 55,001 non-binding pairs. The following figure shows the number of data samples for each antigen type.
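The totals above imply that roughly 28.6% of the samples are binding pairs, so the classes are imbalanced. The sketch below checks that arithmetic and shows one way to recompute per-antigen counts from the rows of the CSV file; `count_by_antigen` is a hypothetical helper, and the rows are assumed to be dicts keyed by the column names of `AVIDa-SARS-CoV-2.csv`.

```python
from collections import Counter
from typing import Iterable

# Totals quoted in the text above.
N_BINDING = 22_002
N_NON_BINDING = 55_001
N_TOTAL = N_BINDING + N_NON_BINDING  # 77,003

# Fraction of samples that are binding pairs.
binding_fraction = N_BINDING / N_TOTAL
print(f"binding fraction: {binding_fraction:.3f}")  # prints 0.286


def count_by_antigen(rows: Iterable[dict]) -> Counter:
    """Count samples per (Ag_label, label) combination.

    Hypothetical helper; `rows` are dicts with `Ag_label` and `label` keys.
    """
    return Counter((row["Ag_label"], int(row["label"])) for row in rows)
```

Given the imbalance, metrics that account for class skew may be more informative than plain accuracy when evaluating on this dataset.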
Subjects
Two alpacas were used for dataset generation.
Name | Species | Sex |
Christy | Alpaca | Female |
Puta | Alpaca | Male |
Antigen Types
We used 13 types of antigens as targets listed in the table below.
Antigen Type | Panning | Description |
WT | cell | Wild-type (WT) SARS-CoV-2 identified in Wuhan |
D614G | cell | Mutant with D614G mutation |
Alpha | cell, bead | Mutant with representative mutations of Alpha variant with a C9 tag at the C-terminus |
Alpha+K417N | cell | Mutant of antigen type “Alpha” with K417N mutation |
Alpha+E484K | cell | Mutant of antigen type “Alpha” with E484K mutation |
Beta | cell, bead | Mutant with representative mutations of Beta variant |
Delta | cell, bead | Mutant with representative mutations of Delta variant |
Kappa | bead | Mutant with representative mutations of Kappa variant |
Lambda | bead | Mutant with representative mutations of Lambda variant |
Omicron | cell, bead | Mutant with representative mutations of Omicron (BA.1) variant |
PMS | bead | Polymutant spike (PMS) protein |
S2-domain | bead | S2-domain of the WT |
OC43 | bead | Human coronavirus OC43 (HCoV-OC43) |