A joint research paper on artificial intelligence (AI)-driven drug discovery by SAKURA internet Research Center, an in-house research institute of SAKURA internet Inc., and COGNANO, Inc. (hereinafter “COGNANO”) has been accepted to the Datasets and Benchmarks Track of the Conference on Neural Information Processing Systems (NeurIPS) 2024, one of the most prestigious international conferences in the field of AI and machine learning.
The paper will be presented from Wednesday, December 11 to Friday, December 13, 2024 (local time) in Vancouver, British Columbia, Canada.
SAKURA internet Research Center and COGNANO have publicly released a large-scale labeled dataset of interactions between diverse antibodies and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), generated using the immune system of live alpacas. The paper was accepted in recognition of the novelty and usefulness of the released dataset.
The emergence of ChatGPT has catalyzed the research and development of large language models (LLMs), leading to rapid advances in natural language processing technology. These advances are not limited to the natural languages we use in daily life; they are also extending into drug discovery, particularly antibody discovery. This is because an antibody sequence can be represented as a string of letters, each denoting one type of amino acid. Leveraging this property, there is growing interest in building language models trained on vast amounts of antibody sequence data (hereafter “antibody language models”), which are expected to make antibody discovery significantly more efficient. However, in contrast to natural languages, for which vast amounts of data are publicly available on the Internet, publicly accessible antibody sequence data are limited. In particular, the scarcity of labeled datasets indicating which antibody sequences interact with particular antigens, such as viruses or bacteria, poses a significant challenge for the future development of antibody language models.
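As a rough illustration of the point above, the following minimal Python sketch shows how an antibody sequence written in one-letter amino acid codes could be turned into integer tokens for a language model, and how a labeled interaction record might look. The special tokens, the `encode` function, the record fields, and the short example fragment are all assumptions made for illustration; they do not describe the format of the released dataset or the tokenizer used in the paper.

```python
from typing import List

# The 20 standard amino acids in one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical special tokens; real models define their own vocabulary.
SPECIAL_TOKENS = ["<pad>", "<cls>", "<eos>", "<unk>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def encode(sequence: str) -> List[int]:
    """Map each residue letter to an integer ID, one token per amino acid."""
    unk = VOCAB["<unk>"]
    ids = [VOCAB["<cls>"]]
    ids += [VOCAB.get(residue, unk) for residue in sequence.upper()]
    ids.append(VOCAB["<eos>"])
    return ids


# Hypothetical labeled record: a short illustrative fragment (not taken from
# the dataset) with a binary label meaning "binds the target antigen".
example = {"sequence": "QVQLVESGGG", "label": 1}
token_ids = encode(example["sequence"])
print(len(token_ids), token_ids)
```

Per-residue tokenization is only one possible design choice; it keeps the vocabulary small and gives each amino acid its own token.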
To address this challenge, SAKURA internet Research Center and COGNANO have established a method for generating large-scale antigen-antibody interaction datasets using live alpacas and have published the generated dataset. Camelids, such as alpacas and llamas, possess exceptionally simple antibody structures compared with other animals, which makes sequence data generation more efficient. SAKURA internet Research Center and COGNANO selected SARS-CoV-2, which triggered a global pandemic in early 2020, as the target antigen and publicly released a large-scale labeled dataset of interactions between diverse antibodies and SARS-CoV-2, generated using the immune system of live alpacas. The availability of this dataset will enable researchers worldwide to develop and evaluate high-performance, practical antibody language models. The paper was accepted in recognition of the novelty and usefulness of the released dataset. SAKURA internet Research Center and COGNANO envision that these research findings will open new possibilities for AI-driven drug discovery, contributing to advances in medical science and the expansion of AI applications.
Roles of each organization in the joint research
COGNANO:
COGNANO was responsible for constructing, through biological experiments, the datasets essential for training the AI models. COGNANO established a novel method for constructing labeled datasets of antigen-antibody interactions and publicly released the generated datasets targeting SARS-CoV-2.
SAKURA internet Research Center:
SAKURA internet Research Center was responsible for constructing and evaluating AI models that predict antigen-antibody interactions using the datasets generated by COGNANO. SAKURA internet Research Center verified the usefulness of the released datasets through benchmark experiments using an original antibody language model and various existing general protein language models and antibody-specific language models.
Accepted paper
Title:
A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models
Authors:
Hirofumi Tsuruta (SAKURA internet Inc., COGNANO, Inc.), Hiroyuki Yamazaki (COGNANO, Inc., Biorhodes, Inc.), Ryota Maeda (COGNANO, Inc., Biorhodes, Inc.), Ryotaro Tamura (SAKURA internet Inc., COGNANO, Inc.), Akihiro Imura (COGNANO, Inc., Biorhodes, Inc.)
Paper: https://arxiv.org/abs/2405.18749
Released datasets: https://datasets.cognanous.com
Presentation at NeurIPS 2024
About NeurIPS:
NeurIPS, established in 1987, is one of the world's most prestigious international conferences in the fields of AI and machine learning because of its large number of submissions and low acceptance rate, driven by a rigorous peer-review process. NeurIPS 2024, the Thirty-Eighth Annual Conference on Neural Information Processing Systems, will be held in Canada in December 2024.
Date and Location:
Date: Wednesday, December 11 – Friday, December 13, 2024
Location: Vancouver Convention Center, Vancouver, British Columbia, Canada
Presenter:
Hirofumi Tsuruta (SAKURA internet Inc., COGNANO, Inc.)
Details:
Please refer to the following website: https://neurips.cc/Conferences/2024
- The contents of this press release are current as of the date of announcement. This information is subject to change without notice.
- The company names and service names described in this press release are registered trademarks or trademarks of the respective companies.
Representative: Kunihiro Tanaka, Founder & CEO, President
Headquarters: GRAND GREEN OSAKA North, JAM BASE, 3F 6-38 Ofukacho, Kita-ku, Osaka-shi, Osaka
Date of foundation: December 23, 1996
Date of establishment: August 17, 1999
URL: https://www.sakura.ad.jp/corporate/
Representative: Akihiro Imura, CEO & Co-founder
Headquarters: #101, 64 Higashiyama, Kamitakano, Sakyo-ku, Kyoto-shi, Kyoto
Date of foundation: October 17, 2014
Date of establishment: October 17, 2014
URL: https://www.cognano.co.jp/
SAKURA internet Inc.
https://sakura.f-form.com/sakurapr
COGNANO, Inc.
https://cognanous.com/contact