ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning

arXiv [Submitted on 30 Mar 2026] · March 31, 2026

arXiv:2603.28575v1 Announce Type: cross

Authors: Mohamad Koohi-Moghadam, Hongzhe Sun, Hongyan Li, Kyongtae Tyler Bae

Abstract: The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. This disparity is particularly pronounced in available data, with extensive screening databases for organic compounds compared to only a few thousand characterized metal complexes. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified representations based on shared anticancer activities rather than structural similarity. We compiled complementary datasets comprising 44,854 unique organic compounds and 5,164 unique metal complexes, standardized across 60 cancer cell lines. By training parallel encoders with activity-aware hard negative mining, we mapped structurally distinct compounds into a shared 256-dimensional embedding space where biologically similar compounds cluster together regardless of chemical class. We systematically evaluated four molecular encoding strategies (Morgan fingerprints, ChemBERTa, MolFormer, and Chemprop) through quantitative alignment metrics, embedding visualizations, and downstream classification tasks. Morgan fingerprints performed best, with an average alignment ratio of 0.899 and downstream classification AUCs of 0.859 (inorganic) and 0.817 (organic). This work establishes contrastive learning as an effective strategy for unifying disparate chemical domains and provides empirical guidance for encoder selection in multi-modal chemistry applications, with implications extending beyond anticancer drug discovery to any scenario requiring cross-domain chemical knowledge transfer.
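The abstract does not spell out the training objective, so the following is only a minimal sketch of the kind of symmetric, CLIP-style contrastive loss that a dual-encoder framework of this sort might use. It assumes that row i of the organic batch and row i of the inorganic batch form an activity-matched positive pair and that every other row in the batch acts as a negative. The function names, the temperature of 0.07, and the toy 3-dimensional vectors are illustrative assumptions, not details from the paper, and the paper's activity-aware hard negative mining is not modeled here. The code uses only the Python standard library to stay self-contained; a real implementation would more likely use a deep learning framework.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, temperature):
    """Mean cross-entropy of matching anchors[i] to candidates[i].

    Every other candidate in the batch serves as a negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def clip_style_loss(organic_emb, inorganic_emb, temperature=0.07):
    """Symmetric contrastive loss over two batches of paired embeddings.

    Assumption: pairs are aligned by index. The temperature is a
    hypothetical default, not a value reported in the abstract.
    """
    org = [l2_normalize(v) for v in organic_emb]
    inorg = [l2_normalize(v) for v in inorganic_emb]
    return 0.5 * (info_nce(org, inorg, temperature)
                  + info_nce(inorg, org, temperature))


# Toy 3-dimensional embeddings (the paper's shared space has 256 dimensions).
organic = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
aligned = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # deliberately mismatched pairs

loss_aligned = clip_style_loss(organic, aligned)
loss_shuffled = clip_style_loss(organic, shuffled)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

In this toy setup the loss is lower when each organic vector is paired with the inorganic vector pointing in roughly the same direction, which is the behavior a contrastive objective rewards: matched compounds are pulled together in the shared space and mismatched ones are pushed apart.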

Comments: 15 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.28575 [cs.LG] (or arXiv:2603.28575v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28575

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mohamad Koohi-Moghadam

[v1] Mon, 30 Mar 2026 15:28:36 UTC (873 KB)

Original source

arXiv

https://arxiv.org/abs/2603.28575
