MDI Biological Laboratory

07/13/2024 | News release

MDI Bio Lab Data Team Taps the Cloud

Big data analysis is steadily moving away from local servers and into "the cloud," and the team in MDI Bio Lab's Computational Genomics and Data Science Core is playing a role in the biomedical world's transition.

The National Institutes of Health is helping researchers affordably access cloud services and environments while solving biological problems. As part of an NIH initiative, MDI Bio Lab collaborated with two commercial cloud service providers, Google and Deloitte, to develop a training module for cloud-based analysis of specific types of molecular activity.

Led by MDI Bio Lab Comparative Genomics and Data Science Core Director Joel Graber, Ph.D., and primarily implemented by analyst Ryan Seaman, the module covers how to use software, data, and cloud services to take a deep dive into a cell sample's entire set of RNA, the messenger molecules that carry DNA instructions telling a cell what proteins to make.

Taken together, a cell's RNA set is called its transcriptome, and its study is called transcriptomics: the creation of a detailed snapshot of gene expression at a given moment in a cell's life. The training module focuses on using raw data to refine our knowledge of the possible transcripts that each gene can generate, information that is often incomplete for less well-studied organisms such as the African turquoise killifish or the axolotl (Mexican salamander).

It's one element in the growing field called "single-cell omics," which relies on crunching extraordinary amounts of data on RNA, DNA, proteins and other molecules in a sample to characterize an individual cell's functions.

Thanks to cloud services, individual researchers can remotely access the massive computing power needed to run these complex analyses. And they can rent that computing time by the hour, which costs much less than maintaining huge servers on premises.

As a bioinformatician, Seaman works as a human bridge connecting scientist and sample to data and remote server.

"Just this morning, I set up a computer for one of our summer students that's probably about 10 times more powerful than a desktop computer," Seaman says. "And it's like $1.50 an hour. So we can just use it for a couple hours. And then when we're done, just turn it off. Ten bucks, and you get basically a mega-computer."

A 2023 graduate of Colby College, Seaman was first exposed to genomics analysis during a short course at MDI Bio Lab, sponsored by the Maine INBRE (an NIH-funded network of 17 higher education and research institutions led by MDI Bio Lab).

Administered by the NIH's National Institute of General Medical Sciences, the national INBRE program is supporting the creation of cloud-based bioinformatics modules at INBRE networks around the country. The overall NIH program is called the Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.

"It was just supposed to be a module to teach people how to do bioinformatics in the cloud," Seaman says. "I had no idea it was going to turn out to be a published paper."

Seaman came to the Lab as an entry-level analyst and quickly grew into a critical part of the Computational Genomics and Data Science Core team, according to Graber, who is a co-author on the recent publication.

"He came in and helped drive the project to its conclusion," Graber says.

Research reported in this article was supported by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103423.