TaxaAdapter: Vision Taxonomy Models are Key to Fine-grained Image Generation over the Tree of Life
Authors: Mridul Khurana, Amin Karimi Monsefi, Justin Lee, Medha Sawhney, David Carlyn, Julia Chae, Jianyang Gu, Rajiv Ramnath, Sara Beery, Wei-Lun Chao, Anuj Karpatne, Cheng Zhang
Abstract: Accurately generating images across the Tree of Life is difficult: there are over 10M distinct species on Earth, many of which differ only by subtle visual traits. Despite the remarkable progress in text-to-image synthesis, existing models often fail to capture the fine-grained visual cues that define species identity, even when their outputs appear photo-realistic. To this end, we propose TaxaAdapter, a simple and lightweight approach that incorporates Vision Taxonomy Models (VTMs) such as BioCLIP to guide fine-grained species generation. Our method injects VTM embeddings into a frozen text-to-image diffusion model, improving species-level fidelity while preserving flexible text control over attributes such as pose, style, and background. Extensive experiments demonstrate that TaxaAdapter consistently improves morphology fidelity and species-identity accuracy over strong baselines, with a cleaner architecture and training recipe. To better evaluate these improvements, we also introduce a multimodal Large Language Model-based metric that summarizes trait-level descriptions from generated and real images, providing a more interpretable measure of morphological consistency. Beyond this, we observe that TaxaAdapter exhibits strong generalization capabilities, enabling species synthesis in challenging regimes such as few-shot species with only a handful of training images and even species unseen during training. Overall, our results highlight that VTMs are a key ingredient for scalable, fine-grained species generation.
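The abstract states only that VTM embeddings are injected into a frozen text-to-image diffusion model; it does not say where or how. As an illustration only, the sketch below assumes an IP-Adapter-style decoupled cross-attention: the frozen text cross-attention is left unchanged, and a second attention over a few tokens projected from a VTM embedding is added with a scale factor. The class VTMAdapterSketch, its parameters (num_tokens, scale, W_tokens, W_k, W_v) and the 512-dimensional embedding size are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (n, d), k (m, d), v (m, d_v) -> (n, d_v).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


class VTMAdapterSketch:
    """HYPOTHETICAL adapter (not the paper's code): maps one VTM image embedding
    (e.g. a BioCLIP-style vector) to a few conditioning tokens and attends to them
    alongside the frozen text tokens (IP-Adapter-style decoupled cross-attention)."""

    def __init__(self, vtm_dim, model_dim, num_tokens=4, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.num_tokens = num_tokens
        self.model_dim = model_dim
        self.scale = scale
        # Adapter weights: randomly initialised here; they would be the only
        # trained parameters, while the diffusion model itself stays frozen.
        self.W_tokens = rng.normal(0.0, 0.02, (vtm_dim, num_tokens * model_dim))
        self.W_k = rng.normal(0.0, 0.02, (model_dim, model_dim))
        self.W_v = rng.normal(0.0, 0.02, (model_dim, model_dim))

    def tokens(self, vtm_embedding):
        # Project the single embedding into num_tokens tokens of width model_dim.
        return (vtm_embedding @ self.W_tokens).reshape(self.num_tokens, self.model_dim)

    def __call__(self, q, k_text, v_text, vtm_embedding):
        # Frozen path: the diffusion model's original text cross-attention.
        text_out = attention(q, k_text, v_text)
        # Added path: separate key/value projections over the VTM tokens.
        t = self.tokens(vtm_embedding)
        vtm_out = attention(q, t @ self.W_k, t @ self.W_v)
        # Decoupled cross-attention: sum both paths, weighting the VTM path by scale.
        return text_out + self.scale * vtm_out


# Usage with random stand-ins for latent queries, text keys/values and a VTM embedding.
rng = np.random.default_rng(1)
q = rng.normal(size=(16, 64))       # 16 latent positions, width 64
k_text = rng.normal(size=(77, 64))  # 77 text tokens
v_text = rng.normal(size=(77, 64))
vtm = rng.normal(size=512)          # assumed 512-d VTM embedding
adapter = VTMAdapterSketch(vtm_dim=512, model_dim=64)
out = adapter(q, k_text, v_text, vtm)
print(out.shape)  # (16, 64)
```

In this design the text path is never modified, and setting scale to 0 returns exactly the frozen model's text-only attention output. That is one plausible way such an adapter could add species-level conditioning while leaving text control over pose, style, and background intact.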
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.26128 [cs.CV]
(or arXiv:2603.26128v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.26128
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Mridul Khurana [v1] Fri, 27 Mar 2026 07:22:43 UTC (38,303 KB)