Skip to main content
arXiv is now an independent nonprofit! Learn more
archive
Search Submit Donate Log in
Press Enter to search · Advanced search

Computer Science > Machine Learning

arXiv:2411.10548 (cs)
[Submitted on 15 Nov 2024 (v1), last revised 8 Sep 2025 (this version, v5)]

Title:BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

Authors:Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, Keisuke Kamata, George Armstrong, Kevin Boyd, Zhonglin Cao, Han-Yi Chou, Simon Chu, Allan dos Santos Costa, Sajad Darabi, Eric Dawson, Kieran Didi, Cong Fu, Mario Geiger, Michelle Gill, Darren J Hsu, Gagan Kaushik, Maria Korshunova, Steven Kothen-Hill, Youhan Lee, Meng Liu, Micha Livne, Zachary McClure, Jonathan Mitchell, Alireza Moradzadeh, Ohad Mosafi, Youssef Nashed, Saee Paliwal, Yuxing Peng, Sara Rabhi, Farhad Ramezanghorbani, Danny Reidenbach, Camir Ricketts, Brian C Roland, Kushal Shah, Tyler Shimko, Hassan Sirelkhatim, Savitha Srinivasan, Abraham C Stern, Dorota Toczydlowska, Srimukh Prasad Veccham, Niccolò Alberto Elia Venanzi, Anton Vorontsov, Jared Wilber, Isabel Wilkinson, Wei Jing Wong, Eva Xue, Cory Ye, Xin Yu, Yang Zhang, Guoqing Zhou, Becca Zandstein, Alejandro Chacon, Prashant Sohani, Maximilian Stadler, Christian Hundt, Feiwen Zhu, Christian Dallago, Bruno Trentini, Emine Kucukbenli, Saee Paliwal, Timur Rvachov, Eddie Calleja, Johnny Israeli, Harry Clifford, Risto Haukioja, Nicholas Haemel, Kyle Tretina, Neha Tadimeti, Anthony B Costa
View a PDF of the paper titled BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, by Peter St. John and 92 other authors
View PDF HTML (experimental)
Abstract:Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Cite as: arXiv:2411.10548 [cs.LG]
  (or arXiv:2411.10548v5 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2411.10548
arXiv-issued DOI via DataCite

Submission history

From: Micha Livne [view email]
[v1] Fri, 15 Nov 2024 19:46:16 UTC (1,210 KB)
[v2] Mon, 9 Jun 2025 22:23:50 UTC (1,230 KB)
[v3] Thu, 12 Jun 2025 14:06:28 UTC (1,230 KB)
[v4] Sat, 30 Aug 2025 04:41:58 UTC (367 KB)
[v5] Mon, 8 Sep 2025 19:12:19 UTC (1,230 KB)
Full-text links:

Access Paper:

    View a PDF of the paper titled BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery, by Peter St. John and 92 other authors
  • View PDF
  • HTML (experimental)
  • TeX Source
view license

Current browse context:

cs.LG
< prev   |   next >
new | recent | 2024-11
Change to browse by:
cs
q-bio
q-bio.BM

References & Citations

  • NASA ADS
  • Google Scholar
  • Semantic Scholar
Loading...

BibTeX formatted citation

Data provided by:

Bookmark

BibSonomy Reddit

Bibliographic and Citation Tools

Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)

Code, Data and Media Associated with this Article

alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
ScienceCast (What is ScienceCast?)

Demos

Replicate (What is Replicate?)
Hugging Face Spaces (What is Spaces?)
TXYZ.AI (What is TXYZ.AI?)

Recommenders and Search Tools

Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
IArxiv Recommender (What is IArxiv?)
  • Author
  • Venue
  • Institution
  • Topic

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
We gratefully acknowledge support from our major funders, member institutions, , and all contributors.
About · Help · Contact · Subscribe · Copyright · Privacy · Accessibility · Operational Status (opens in new tab)
Major funding support from
Simons Foundation Schmidt Sciences