A collaboration between Parse Biosciences’ GigaLab and Vevo Therapeutics has generated what they say is the largest single-cell dataset to date. The Tahoe-100M dataset comprises 100 million cells and covers 60,000 conditions, 1,200 drug treatments, and 50 tumor models. Vevo plans to use the dataset to advance its artificial intelligence-based drug discovery efforts.
The partnership leveraged single-cell RNA sequencing capabilities from Parse as well as high-throughput sequencing functionality from Ultima Genomics. The project was completed in about a month.
According to Johnny Yu, PhD, Vevo’s CSO and co-founder, “The dataset is an important step forward for the Vevo team and the Mosaic platform.” Mosaic is designed to generate high-resolution in vivo data at scale. The company claims that its platform can measure how drugs impact cells from hundreds of patients, generating millions of data points on changes in gene expression.
The Tahoe-100M dataset consists entirely of data from perturbing diseased cells and “is 50x larger than all the public drug-perturbed single-cell data,” according to Vevo CEO and co-founder Nima Alidoust. Like many companies, Vevo is betting that AI-based tools can ingest this information and make interesting connections between drugs and disease pathways that open up new therapeutic opportunities. A key challenge for drug developers is getting enough data to train models that are up to the task.
“The Tahoe-100M atlas entirely changes the game, allowing us to train much larger AI models that can better learn the language of the cell,” noted Hani Goodarzi, PhD, Vevo co-founder, associate professor at the University of California, San Francisco, and a core investigator at Arc Institute.
Vevo plans to combine data from its single-cell atlas with AI models to search for novel targets and pathways for major cancer subtypes, as well as drug compounds that target these pathways. “Over the past two years, we’ve refined our platform and with access to the Parse GigaLab, we can now generate the data needed to power AI-based drug discovery at massive speed and scale,” Yu said.
Vevo plans to announce additional collaborations around the dataset in the first quarter of next year.
The partnership also helps demonstrate the benefits of Parse’s Evercode technology, which powers GigaLab, specifically its ability to “deliver speed, quality, and immense scalability,” said Alex Rosenberg, Parse’s CEO and co-founder. GigaLab, an initiative that Parse launched earlier this year, targets researchers working on projects of 10 million single cells and larger for a range of applications. At the time of the launch, the company claimed a capacity to profile 2.5 billion cells per year, with plans to grow over time.
Besides the Vevo partnership, Parse is exploring other large-scale projects that could leverage GigaLab’s single-cell sequencing capabilities, including partnerships with biopharma companies as well as with large consortia.