Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude can now use your computer to finish tasks for you in AI agent push - MSNGoogle News: ClaudeVariance, which develops AI agents for compliance and fraud investigations, raised a $21.5M Series A led by Ten Eleven Ventures and joined by YC and others (Ryan Lawler/Axios)TechmemeHow to Test Discord Webhooks with HookCapDEV CommunitySaaS Pricing Models Decoded: What Per-Seat, Usage-Based, and Flat-Rate Really Cost YouDEV CommunityClaude Code hooks: intercept every tool call before it runsDEV CommunityHow to Test Twilio Webhooks with HookCapDEV CommunityI'm an AI Agent That Built Its Own Training Data PipelineDEV CommunityMy React Portfolio SEO Checklist: From 0 to Rich Results in 48 HoursDEV CommunityWhy AI Agents Need a Trust Layer (And How We Built One)DEV CommunityBuilding a scoring engine with pure TypeScript functions (no ML, no backend)DEV Community🚀 I Vibecoded an AI Interview Simulator in 1 Hour using Gemini + GroqDEV CommunityBuilding Human Resilience for the Age of AI - Elon UniversityGoogle News: AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude can now use your computer to finish tasks for you in AI agent push - MSNGoogle News: ClaudeVariance, which develops AI agents for compliance and fraud investigations, raised a $21.5M Series A led by Ten Eleven Ventures and joined by YC and others (Ryan Lawler/Axios)TechmemeHow to Test Discord Webhooks with HookCapDEV CommunitySaaS Pricing Models Decoded: What Per-Seat, Usage-Based, and Flat-Rate Really Cost YouDEV CommunityClaude Code hooks: intercept every tool call before it runsDEV CommunityHow to Test Twilio Webhooks with HookCapDEV CommunityI'm an AI Agent That Built Its Own Training Data PipelineDEV CommunityMy React Portfolio SEO Checklist: From 0 to Rich Results in 48 HoursDEV CommunityWhy AI Agents Need a Trust Layer (And How We Built One)DEV CommunityBuilding a scoring engine with pure TypeScript functions (no ML, no backend)DEV Community🚀 I Vibecoded an AI Interview Simulator in 1 Hour using Gemini + GroqDEV CommunityBuilding Human Resilience for the Age of AI - Elon UniversityGoogle News: AI

LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2407.15264v2 Announce Type: replace-cross Abstract: Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, thi — Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Quresh, Scott Mahlke, Wen-mei Hwu

View PDF HTML (experimental)

Abstract:Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use graph partitioning and sharding techniques to scale up across multiple GPUs within a node and/or scale out across multiple nodes. However, this approach suffers from the high computational costs of graph partitioning algorithms and inefficient communication across GPUs. To address these overheads, we propose Large-scale Storage-based Multi-GPU GNN framework (LSM-GNN), a storage-based approach to train GNN models that utilizes a novel communication layer enabling GPU software caches to function as a system-wide shared cache with low overheads. LSM-GNN incorporates a hybrid eviction policy that intelligently manages cache space by using both static and dynamic node information to significantly enhance cache performance. Furthermore, we introduce the Preemptive Victim-buffer Prefetcher (PVP), a mechanism for prefetching node feature data from a Victim Buffer located in CPU pinned-memory to further reduce the pressure on the storage devices. Experimental results show that despite the lower compute capabilities and memory capacities, LSM-GNN in a single node with two GPUs offers superior performance over two-node-four-GPU Dist-DGL baseline and provides up to 3.75x speed up on end-to-end epoch time while running large-scale GNN training

Subjects:

Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Cite as: arXiv:2407.15264 [cs.DC]

(or arXiv:2407.15264v2 [cs.DC] for this version)

https://doi.org/10.48550/arXiv.2407.15264

arXiv-issued DOI via DataCite

Submission history

From: Jeongmin Park [view email] [v1] Sun, 21 Jul 2024 20:41:39 UTC (3,844 KB) [v2] Sat, 28 Mar 2026 00:47:07 UTC (1,794 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
LSM-GNN: La…researchpaperarxivmachine-lea…deep-learni…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 189 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers