
TaxaAdapter: Vision Taxonomy Models are Key to Fine-grained Image Generation over the Tree of Life

arXiv, March 30, 2026

arXiv:2603.26128v1 Announce Type: new

Authors: Mridul Khurana, Amin Karimi Monsefi, Justin Lee, Medha Sawhney, David Carlyn, Julia Chae, Jianyang Gu, Rajiv Ramnath, Sara Beery, Wei-Lun Chao, Anuj Karpatne, Cheng Zhang


Abstract: Accurately generating images across the Tree of Life is difficult: there are over 10M distinct species on Earth, many of which differ only by subtle visual traits. Despite the remarkable progress in text-to-image synthesis, existing models often fail to capture the fine-grained visual cues that define species identity, even when their outputs appear photo-realistic. To this end, we propose TaxaAdapter, a simple and lightweight approach that incorporates Vision Taxonomy Models (VTMs) such as BioCLIP to guide fine-grained species generation. Our method injects VTM embeddings into a frozen text-to-image diffusion model, improving species-level fidelity while preserving flexible text control over attributes such as pose, style, and background. Extensive experiments demonstrate that TaxaAdapter consistently improves morphology fidelity and species-identity accuracy over strong baselines, with a cleaner architecture and training recipe. To better evaluate these improvements, we also introduce a multimodal Large Language Model-based metric that summarizes trait-level descriptions from generated and real images, providing a more interpretable measure of morphological consistency. Beyond this, we observe that TaxaAdapter exhibits strong generalization capabilities, enabling species synthesis in challenging regimes such as few-shot species with only a handful of training images and even species unseen during training. Overall, our results highlight that VTMs are a key ingredient for scalable, fine-grained species generation.
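The abstract does not spell out how VTM embeddings are injected into the frozen diffusion model. As a rough illustration only, the sketch below assumes one common adapter pattern: a frozen text cross-attention layer plus a small trainable branch that attends over tokens projected from a single taxonomy embedding. The class name `AdapterCrossAttention`, every weight name, the token count, and the zero initialization are hypothetical choices made for this example, not details taken from the paper, and random vectors stand in for real BioCLIP and text-encoder outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values


class AdapterCrossAttention:
    """Hypothetical layer: frozen text cross-attention plus a taxonomy branch.

    ASSUMPTION: this decoupled-branch design is illustrative and is not
    confirmed by the source to be the TaxaAdapter architecture.
    """

    def __init__(self, d_model, d_text, d_vtm, n_vtm_tokens=4, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            return rng.normal(0.0, 0.02, size=(rows, cols))

        # Stand-ins for the frozen diffusion-model weights (never updated).
        self.w_q = init(d_model, d_model)
        self.w_k_text = init(d_text, d_model)
        self.w_v_text = init(d_text, d_model)
        # Adapter weights: the only parameters that would be trained.
        self.w_tokens = init(d_vtm, n_vtm_tokens * d_model)
        self.w_k_vtm = init(d_model, d_model)
        # Zero-initialized so the adapter starts as a no-op (an assumption).
        self.w_v_vtm = np.zeros((d_model, d_model))
        self.n_vtm_tokens = n_vtm_tokens
        self.d_model = d_model
        self.scale = scale

    def text_branch(self, latents, text_tokens):
        """Output of the frozen text-conditioned cross-attention alone."""
        q = latents @ self.w_q
        return attend(q, text_tokens @ self.w_k_text, text_tokens @ self.w_v_text)

    def __call__(self, latents, text_tokens, vtm_embedding):
        q = latents @ self.w_q
        # Expand one taxonomy embedding into a few context tokens.
        vtm_tokens = (vtm_embedding @ self.w_tokens).reshape(
            self.n_vtm_tokens, self.d_model
        )
        vtm_out = attend(q, vtm_tokens @ self.w_k_vtm, vtm_tokens @ self.w_v_vtm)
        # Text control is preserved; the taxonomy signal is added on top.
        return self.text_branch(latents, text_tokens) + self.scale * vtm_out


# Toy usage with random stand-ins for latents, text tokens, and a VTM embedding.
rng = np.random.default_rng(1)
layer = AdapterCrossAttention(d_model=8, d_text=6, d_vtm=5)
latents = rng.normal(size=(16, 8))       # 16 spatial positions
text_tokens = rng.normal(size=(3, 6))    # 3 prompt tokens
vtm_embedding = rng.normal(size=5)       # one species embedding
out = layer(latents, text_tokens, vtm_embedding)
```

In this sketch the output has the same shape as the latents, and because the adapter's value projection starts at zero, the layer initially reproduces the frozen text branch exactly. That is one way a lightweight adapter can leave text control over pose, style, and background intact while training gradually adds species-level guidance.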

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26128 [cs.CV]

(or arXiv:2603.26128v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26128

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mridul Khurana
[v1] Fri, 27 Mar 2026 07:22:43 UTC (38,303 KB)

Original source: arXiv, https://arxiv.org/abs/2603.26128
