
NVIDIA L40S GPUs are here

Replicate Blog, November 15, 2024

NVIDIA L40S GPUs are here, with better performance and lower cost.

Posted November 15, 2024 by zeke

Today we added NVIDIA L40S GPUs to our supported hardware types. These new GPUs are around 40% faster than A40 GPUs.

We’re also removing support for A40 GPUs. We will begin migrating all existing models and deployments from A40 GPUs to L40S GPUs over the coming weeks. You’ll continue to pay the same price for your private models and deployments, but you might pay more if you’re using public models or training models on A40 GPUs.

You can now use L40S GPUs for any new models, existing models, or deployments. To learn how to change the hardware type for your models and deployments, check out the docs.

Starting today, you have the option to switch any of your existing models and deployments to L40S GPUs, but you are not required to do so. If you choose not to switch, your models and deployments will continue to run on A40 GPUs for another few weeks until we migrate them to L40S GPUs.

Migration timeline

To give you better performance per dollar, we will begin migrating all existing models and deployments from A40 GPUs to L40S GPUs over the coming weeks:

  • 2024-12-02: Public model migration. We will begin migrating all public models running on A40 GPUs, and will finish migrating all models by 2024-12-04.

  • 2024-12-09: Private model and deployment migration. We will begin migrating all private models and deployments running on A40 GPUs, and will finish migrating all models by 2024-12-11. A40 GPUs will no longer be available after this date.

Speed and cost

The L40S GPUs are more expensive per second than the A40 GPUs, but they are also faster.

Our benchmarks across the highest-usage A40 models today show a ~40% median speed improvement when migrating from A40 GPUs to L40S GPUs. For more detail, see our Observable notebook which covers our benchmarking methodology, collection methods, and results.

How much you’ll pay once your models are migrated to L40S GPUs depends on what you’re using them for:

  • Private models and deployments: If you’re currently using A40 GPUs for private models or deployments, this is all good news: you’ll continue to pay the same price you pay today after we’ve migrated them to L40S GPUs. Because these new GPUs are faster, your bill will probably go down, and at worst stay the same.

  • Public models and training: If you’re using public models or training models on A40 GPUs, you’ll pay the new price of $3.51 per hour for L40S GPUs.

If you’re using standard A40s, L40S GPUs cost 70% more per hour, but they’re about 40% faster (each run takes roughly 60% as long), so you can expect your bill to stay roughly the same (1.7 × 0.6 ≈ 1).

If you’re using large A40s, L40S GPUs cost 34% more per hour, but they’re about 40% faster, so you can expect your bill to go down by about 20% (1.34 × 0.6 ≈ 0.8).
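The arithmetic above can be reproduced with a short sketch. It assumes, as the article's own figures do, that "about 40% faster" means each run takes about 60% as long. The function name `relative_bill` and its parameters are illustrative, not part of any Replicate tooling.

```python
def relative_bill(price_increase: float, runtime_reduction: float) -> float:
    """Return the new bill as a multiple of the old bill.

    price_increase: fractional rise in price per hour (0.70 means 70% higher).
    runtime_reduction: fractional drop in run time (0.40 means runs take 60% as long).
    Assumption: billing is proportional to price per hour times run time.
    """
    return (1 + price_increase) * (1 - runtime_reduction)


# Percentages quoted in the article for public models and training.
standard_a40 = relative_bill(price_increase=0.70, runtime_reduction=0.40)
large_a40 = relative_bill(price_increase=0.34, runtime_reduction=0.40)

print(f"standard A40 -> L40S: {standard_a40:.2f}x the old bill")
print(f"large A40 -> L40S: {large_a40:.2f}x the old bill")
```

Actual savings depend on how much faster your particular model runs, so the 40% median figure is best treated as a starting estimate.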

To compare the performance and pricing of all our available hardware types, visit replicate.com/pricing.

Updating your deployments

If you have deployments that are using A40 GPUs, you will need to decrease their minimum instances to avoid being charged more.

For example, if your deployment is running 10 minimum instances on A40 GPUs, you should change your deployment configuration to 6 minimum instances when you switch to L40S GPUs, as they are approximately 40% faster.
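As a rough sketch of that calculation, the helper below scales a minimum instance count by the expected reduction in run time. The name `scaled_min_instances` is hypothetical, and rounding up is an assumption made here to avoid under-provisioning; the 40% default is the article's median figure and may not match your model.

```python
def scaled_min_instances(current: int, runtime_reduction_percent: int = 40) -> int:
    """Estimate the minimum instances needed after moving to faster hardware.

    Uses integer arithmetic and rounds up, so a fractional result never
    leaves the deployment with less capacity than before.
    """
    if current < 0:
        raise ValueError("current must be non-negative")
    if not 0 <= runtime_reduction_percent <= 100:
        raise ValueError("runtime_reduction_percent must be between 0 and 100")
    remaining_percent = 100 - runtime_reduction_percent
    # Ceiling division: (a + b - 1) // b
    return (current * remaining_percent + 99) // 100


# The article's example: 10 minimum instances on A40 GPUs.
print(scaled_min_instances(10))
```

For small deployments the rounding matters: a deployment with a single minimum instance stays at one.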

You can edit your deployment configuration on the web or use the HTTP API.
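For the HTTP API route, a minimal sketch of building such a request is shown below. Everything specific in it is an assumption to check against the API documentation: the endpoint path, the `hardware` and `min_instances` field names, the `gpu-l40s` hardware identifier, and the bearer-token header. The owner, deployment name, and token are placeholders. The code only constructs the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL; confirm against the API documentation.
API_ROOT = "https://api.replicate.com/v1"


def build_deployment_update(
    owner: str,
    name: str,
    token: str,
    min_instances: int,
    hardware: str = "gpu-l40s",  # assumed hardware identifier for L40S
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates a deployment."""
    payload = {"hardware": hardware, "min_instances": min_instances}
    return urllib.request.Request(
        url=f"{API_ROOT}/deployments/{owner}/{name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


# Placeholder values for illustration only.
request = build_deployment_update("acme", "my-deployment", "<token>", min_instances=6)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the fields are confirmed, the request can be sent with `urllib.request.urlopen(request)`.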

If you’re not sure how to best configure your deployments, email us at [email protected].
