In 2001, the world celebrated the first sequenced human genome, a monumental feat of science and technology that promised new insights into human biology and disease. It was also the beginning of the transformation of biology into a “big data” science, in which computation would become a crucial tool in understanding how genetic information is translated into physical traits.
Recognizing this shift in paradigm, Franco Preparata, now professor emeritus of computer science, guided Brown in launching one of the first undergraduate computational biology concentrations in the country. That program would eventually evolve into the Center for Computational Molecular Biology (CCMB), where researchers today use state-of-the-art data techniques such as machine learning and artificial intelligence to understand the genetic underpinnings of human life.
Sohini Ramachandran, a professor of ecology and organismal biology and director of CCMB as well as the Brown Data Science Initiative, uses advanced statistical methods, machine learning and other techniques to understand human genetic history and the genetic foundation for complex physical traits. Her work in deciphering the human genome earned her a Presidential Early Career Award for Scientists and Engineers, the highest U.S. government honor for early career researchers.
One example of Ramachandran’s work is a method that uses machine learning to search through human genomic datasets to find rare beneficial mutations. Most genetic mutations are neutral — neither helping nor hurting an individual’s chance of surviving and reproducing. But the ability to spot beneficial mutations that have spread through natural selection helps to reveal the evolutionary history of people around the world and to shed light on the evolutionary roots of medical conditions.
Genetics, Evolution and Human Health
More recently, Ramachandran has been investigating how the genetic architecture of complex traits and diseases may diverge in people of different ancestries. The vast majority of studies aimed at linking genes to medical conditions have used data from people of European ancestry, and it has been assumed that those findings apply to everyone regardless of ancestral background. But that is often not the case, according to research by Ramachandran and Lorin Crawford, an assistant professor of biostatistics and a CCMB faculty member.
Most diseases are caused not by a single genetic variant, but by a suite of different mutations and genes that interact in complex ways. That means the variants associated with a disease in one population aren’t always the same variants associated with it in another. Ramachandran and Crawford are developing methods to look at the genome at various scales, from single genes to gene networks, in order to account for variation across populations. Such work could help in establishing genetic risk factors and developing treatments for a wide variety of complex medical conditions.
Crawford, whose research has earned awards from the Sloan Foundation and the David and Lucile Packard Foundation, among others, is also developing ways of using deep neural networks, a form of artificial intelligence, to discover associations between complex traits and genetic pathways. Deep neural networks are known to be good at detecting subtle patterns in large datasets. However, such systems are something of a “black box”: the way in which they arrive at their answers is somewhat mysterious, even to the people who design them. In genomics, that makes their output difficult to fully interpret and validate. Crawford is working on ways to open the black box. He has developed a method that supplies neural networks with biologically relevant annotations, which help to make their output more interpretable. The work is a step toward bringing the power of artificial intelligence to bear in understanding complex genetic traits.
Another CCMB researcher, Associate Professor Emilia Huerta-Sanchez, uses advanced data analysis and statistical modeling to examine how events in the distant past have helped to shape human genetic variation today. One area of focus is how interbreeding with archaic humans such as Neanderthals and Denisovans has influenced the modern human genome.
In a landmark 2014 study published in the journal Nature, Huerta-Sanchez and colleagues found that a genetic variant that helps Tibetans live at extremely high altitudes came from ancient interbreeding with archaic Denisovans. More recently, she showed that this gene variant was likely introduced into the lineage around 50,000 years ago, but it remained selectively neutral (meaning not favored by natural selection) until around 9,000 years ago, when permanent inhabitation of the Tibetan highlands began. The findings are a striking example of how variation from ancient interbreeding can be helpful to a population many millennia later.