3 Questions: Using computation to study the world’s best single-celled chemists
Assistant Professor Yunha Hwang utilizes microbial genomes to examine the language of biology. Her appointment reflects MIT’s commitment to exploring the intersection of genetics research and AI.
Today, out of an estimated 1 trillion species on Earth, 99.999 percent are considered microbial — bacteria, archaea, viruses, and single-celled eukaryotes. For much of our planet’s history, microbes ruled the Earth, able to live and thrive in the most extreme of environments. Researchers have only just begun in the last few decades to contend with the diversity of microbes — it’s estimated that less than 1 percent of known genes have laboratory-validated functions. Computational approaches offer researchers the opportunity to strategically parse this truly astounding amount of information.
An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific life form on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor at the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang is exploring the intersection of computation and biology.
Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?
A: Extreme environments are great places to look for interesting biology. I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth. And the only thing that lives in those extreme environments are microbes. During a sampling expedition that I took part in off the coast of Mexico, we discovered a colorful microbial mat about 2 kilometers underwater that flourished because the bacteria breathed sulfur instead of oxygen — but none of the microbes I was hoping to study would grow in the lab.
The biggest challenge in studying microbes is that a majority of them cannot be cultivated, which means that the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We’re hoping to develop a computational system so we can probe the organism as much as possible “in silico,” just using sequence data. A genomic language model is technically a large language model, except the language is DNA as opposed to human language. It’s trained in a similar way, just in biological language as opposed to English or French. If our objective is to learn the language of biology, we should leverage the diversity of microbial genomes. Even though we have a lot of data, and even as more samples become available, we’ve just scratched the surface of microbial diversity.
Q: Given how diverse microbes are and how little we understand about them, how can studying microbes in silico, using genomic language modeling, advance our understanding of the microbial genome?
A: A genome is many millions of letters. A human cannot possibly look at that and make sense of it. We can program a machine, though, to segment data into pieces that are useful. That’s sort of how bioinformatics works with a single genome. But if you’re looking at a gram of soil, which can contain thousands of unique genomes, that’s just too much data to work with — a human and a computer together are necessary in order to grapple with that data.
During my PhD and master’s degree, we were only just discovering new genomes and new lineages that were so different from anything that had been characterized or grown in the lab. These were things that we just called “microbial dark matter.” When there are a lot of uncharacterized things, that’s where machine learning can be really useful, because we’re just looking for patterns — but that’s not the end goal. What we hope to do is to map these patterns to evolutionary relationships between each genome, each microbe, and each instance of life.
Previously, we’ve been thinking about proteins as a standalone entity — that gets us to a decent degree of information because proteins are related by homology, and therefore things that are evolutionarily related might have a similar function.
What is known about microbiology is that proteins are encoded into genomes, and the context in which that protein is bounded — what regions come before and after — is evolutionarily conserved, especially if there is a functional coupling. This makes total sense because when you have three proteins that need to be expressed together because they form a unit, then you might want them located right next to each other.
What I want to do is incorporate more of that genomic context in the way that we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity to add contextual information to how we understand proteins and hypothesize about their functions.
Q: How can your research be applied to harnessing the functional potential of microbes?
A: Microbes are possibly the world’s best chemists. Leveraging microbial metabolism and biochemistry will lead to more sustainable and more efficient methods for producing new materials, new therapeutics, and new types of polymers.
But it’s not just about efficiency — microbes are doing chemistry we don’t even know how to think about. Understanding how microbes work, and being able to understand their genomic makeup and their functional capacity, will also be really important as we think about how our world and climate are changing. A majority of carbon sequestration and nutrient cycling is undertaken by microbes; if we don’t understand how a given microbe is able to fix nitrogen or carbon, then we will face difficulties in modeling the nutrient fluxes of the Earth.
On the more therapeutic side, infectious diseases are a real and growing threat. Understanding how microbes behave in diverse environments relative to the rest of our microbiome is really important as we think about the future and combating microbial pathogens.
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
assistantstudyresearch
AI is Driving Cognitive Surrender Whilst Influencing Confidence Levels
AI has rapidly transformed how people access information and make decisions. Tools like ChatGPT offer speed, convenience and support for everyday tasks, however growing evidence suggested overreliance on AI may influence how we think, reason and evaluate information. The research from the University of Pennsylvania’s Wharton School of Business has reviewed 1,300 subjects use of [ ] The post AI is Driving Cognitive Surrender Whilst Influencing Confidence Levels appeared first on DIGIT .

98% of Firms Struggling to Manage Wireless as AI Explodes
Wi-Fi has evolved into a strategic growth engine delivering exponential value for enterprises, according to new research from Cisco, to the extent that a single network investment drives returns across employee productivity, customer engagement, and revenue. Polling more than 6,000 global wireless professionals, Cisco’s latest State of Wireless report found that 80% of large businesses [ ] The post 98% of Firms Struggling to Manage Wireless as AI Explodes appeared first on DIGIT .

SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models
As Large Language Models (LLMs) increasingly power decision-making systems across critical domains, understanding and mitigating their biases becomes essential for responsible AI deployment. Although bias assessment frameworks have proliferated for attributes such as race and gender, socioeconomic status bias remains significantly underexplored despite its widespread implications in the real world. We introduce SocioEval, a template-based framework for systematically evaluating socioeconomic bias in foundation models through decision-making tasks. Our hierarchical framework encompasses 8 theme — Divyanshu Kumar, Ishita Gupta, Nitin Aravind Birur
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Products

98% of Firms Struggling to Manage Wireless as AI Explodes
Wi-Fi has evolved into a strategic growth engine delivering exponential value for enterprises, according to new research from Cisco, to the extent that a single network investment drives returns across employee productivity, customer engagement, and revenue. Polling more than 6,000 global wireless professionals, Cisco’s latest State of Wireless report found that 80% of large businesses [ ] The post 98% of Firms Struggling to Manage Wireless as AI Explodes appeared first on DIGIT .

ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts
arXiv:2604.02771v1 Announce Type: new Abstract: Smart contracts are increasingly targeted by adversaries employing obfuscation techniques such as bogus code injection and control flow manipulation to evade vulnerability detection. Existing multimodal methods often process semantic, temporal, and structural features in isolation and fuse them using simple strategies such as concatenation, which neglects cross-modal interactions and weakens robustness, as obfuscation of a single modality can sharply degrade detection accuracy. To address these challenges, we propose ContractShield, a robust multimodal framework with a novel fusion mechanism that effectively correlates multiple complementary features through a three-level fusion. Self-attention first identifies patterns that indicate vulnerab




Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!