Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessTwo Subtle Bugs That Broke Our Remotion Vercel Sandbox (And How We Fixed Them)DEV CommunityZero-Shot Attack Transfer on Gemma 4 (E4B-IT)DEV CommunityGetting Started with the Gemini API: A Practical GuideDEV CommunityLAB: Terraform Dependencies (Implicit vs Explicit)DEV CommunityDesigning a UI That AI Can Actually Understand (CortexUI Deep Dive)DEV CommunityI Went to a Hot Spring via API Call at MidnightDEV CommunityStrong,Perfect,Neon Number ProgramsDEV CommunityThe Mandate Had No Return AddressDEV CommunityCursor AI Review 2026: The Code Editor That Thinks Alongside YouDEV CommunityRescuing 216 Pages from the GeoCities Era: How I Built an HTML-to-Blogger ToolDEV CommunityQ&A: AWS on new AI agents, quantum computing in healthcare - MobiHealthNewsGNews AI quantumGemma 4 Arrives: Google Drops Restrictions, Embraces True Open Models - eWeekGNews AI GemmaBlack Hat USADark ReadingBlack Hat AsiaAI BusinessTwo Subtle Bugs That Broke Our Remotion Vercel Sandbox (And How We Fixed Them)DEV CommunityZero-Shot Attack Transfer on Gemma 4 (E4B-IT)DEV CommunityGetting Started with the Gemini API: A Practical GuideDEV CommunityLAB: Terraform Dependencies (Implicit vs Explicit)DEV CommunityDesigning a UI That AI Can Actually Understand (CortexUI Deep Dive)DEV CommunityI Went to a Hot Spring via API Call at MidnightDEV CommunityStrong,Perfect,Neon Number ProgramsDEV CommunityThe Mandate Had No Return AddressDEV CommunityCursor AI Review 2026: The Code Editor That Thinks Alongside YouDEV CommunityRescuing 216 Pages from the GeoCities Era: How I Built an HTML-to-Blogger ToolDEV CommunityQ&A: AWS on new AI agents, quantum computing in healthcare - MobiHealthNewsGNews AI quantumGemma 4 Arrives: Google Drops Restrictions, Embraces True Open Models - eWeekGNews AI Gemma
AI NEWS HUBbyEIGENVECTOREigenvector

ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning

arXivby [Submitted on 30 Mar 2026]March 31, 20262 min read1 views
Source Quiz

arXiv:2603.28575v1 Announce Type: cross Abstract: The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. This disparity is particularly pronounced in available data, with extensive screening databases for organic compounds compared to only a few thousand characterized metal complexes. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified — Mohamad Koohi-Moghadam, Hongzhe Sun, Hongyan Li, Kyongtae Tyler Bae

View PDF

Abstract:The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. This disparity is particularly pronounced in available data, with extensive screening databases for organic compounds compared to only a few thousand characterized metal complexes. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified representations based on shared anticancer activities rather than structural similarity. We compiled complementary datasets comprising 44,854 unique organic compounds and 5,164 unique metal complexes, standardized across 60 cancer cell lines. By training parallel encoders with activity-aware hard negative mining, we mapped structurally distinct compounds into a shared 256-dimensional embedding space where biologically similar compounds cluster together regardless of chemical class. We systematically evaluated four molecular encoding strategies: Morgan fingerprints, ChemBERTa, MolFormer, and Chemprop, through quantitative alignment metrics, embedding visualizations, and downstream classification tasks. Morgan fingerprints achieved superior performance with an average alignment ratio of 0.899 and downstream classification AUCs of 0.859 (inorganic) and 0.817 (organic). This work establishes contrastive learning as an effective strategy for unifying disparate chemical domains and provides empirical guidance for encoder selection in multi-modal chemistry applications, with implications extending beyond anticancer drug discovery to any scenario requiring cross-domain chemical knowledge transfer.

Comments: 15 pages

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.28575 [cs.LG]

(or arXiv:2603.28575v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28575

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mohamad Koohi-Moghadam [view email] [v1] Mon, 30 Mar 2026 15:28:36 UTC (873 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
ChemCLIP: B…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 197 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!