How to integrate VS Code with Ollama for local AI assistance
The post How to integrate VS Code with Ollama for local AI assistance appeared first on The New Stack.
If you’re starting your journey as a programmer and want to jump-start that process, you might be interested in taking advantage of AI to make the process of getting up to speed a bit simpler. After all, coding can be a tough business to break into, and every advantage you can give yourself should be considered.
Before I continue, I will say this: use AI to help you learn the language that you’re interested in and not as a substitute for actually learning the language. Consider this an assistant, not a replacement for skill.
When I need to turn to AI, I always go for locally installed options, for a couple of reasons. First, running models on my own hardware means I'm not leaning on power-hungry cloud data centers. Second, I don't have to worry that a third party is going to get a glimpse of my queries, so privacy is actually possible.
To that end, I depend on Ollama as my local AI tool of choice. Ollama is easy to use, flexible, and reliable.
If your IDE of choice is Visual Studio Code, you’re in luck, as you can integrate it with a locally installed instance of Ollama.
I’m going to show you how this is done.
What you’ll need
To make this work, you’ll need a desktop running Linux, macOS, or Windows. I’ll demonstrate the process on an Ubuntu-based Linux distribution (Pop!_OS). If you’re using either macOS or Windows, the only things you’ll need to change are the installations of Ollama and VS Code. Fortunately, in both instances, it’s just a matter of downloading the binary installer for each tool, double-clicking the downloaded file, and walking through the setup process.
On Linux, it’s a bit different.
Let me show you.
Installing Ollama
The first thing we’ll do is install Ollama. If you’re using macOS or Windows, download the .dmg for Mac or the .exe for Windows, double-click the file, and you’re off.
On Linux, open a terminal window and issue the command:
curl -fsSL https://ollama.com/install.sh | sh
You’ll be prompted for your sudo password before the installation begins.
After the installation is complete, you’ll then need to pull a specific LLM for Ollama. On macOS and Windows, open the Ollama GUI, go to the query field, click the downward-pointing arrow, type codellama, and click the entry to install the model.
On Linux, open a terminal app and pull the necessary LLM with:
ollama pull codellama
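Before moving on, it's worth confirming that the CLI is on your PATH and that the model actually downloaded. A quick check from the terminal looks like this (the guard is just so the snippet degrades gracefully on a machine without Ollama):

```shell
# Verify the Ollama CLI is installed and list locally available models.
# `ollama list` should show codellama once the pull above has finished.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
  ollama list
else
  echo "ollama not found on PATH"
fi
```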
Install VS Code
Next, you’ll need to install VS Code.
The same thing holds true: with macOS or Windows, download the VS Code executable binary for your OS of choice, double-click the downloaded file, and walk through the installation wizard.
On Linux, you’ll also need to download the installer for your distribution of choice (.deb for Debian-based distributions, .rpm for Fedora-based distributions, or the Snap package).
To install VS Code on Linux, change into the directory housing the installer file you downloaded. Install the app with one of the following commands:
- For Ubuntu-based distributions: sudo dpkg -i code*.deb
- For Fedora-based distributions: sudo rpm -i code*.rpm
- For Snap packages: sudo snap install code --classic
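Whichever package format you used, you can confirm the install from the terminal (this assumes the `code` launcher was added to your PATH, which the .deb, .rpm, and Snap packages all handle for you):

```shell
# Print the installed VS Code version, commit hash, and architecture.
if command -v code >/dev/null 2>&1; then
  code --version
else
  echo "VS Code 'code' CLI not found on PATH"
fi
```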
You now have the two primary pieces to get you started.
Setting up VS Code
The next step is to set up VS Code to work with Ollama. To do that, you’ll need to install an extension called Continue.
For that, hit Ctrl+P (on macOS, that’s Cmd+P).
In the resulting field, type:
ext install continue.continue
In the resulting page (Figure 1), click Install.
Figure 1: Installing the necessary extension on VS Code is simple.
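If you prefer working from the terminal, the same extension can be installed non-interactively with the `code` CLI, using the same `continue.continue` extension ID shown above:

```shell
# Install the Continue extension via the VS Code command line.
if command -v code >/dev/null 2>&1; then
  code --install-extension continue.continue
else
  echo "VS Code 'code' CLI not found on PATH"
fi
```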
Once the extension is installed, click on the Continue icon in the left sidebar. In the resulting window, click the Select Model drop-down and click Add Chat model (Figure 2).
Figure 2: You have to add a model before you can continue.
In the resulting window, select Ollama from the provider drop-down (Figure 3).
Figure 3: You can select from any one of the available models, but we’re going with Ollama.
Next, make sure to select Local from the tabs and then click the terminal icon to the right of each command. This will open the built-in terminal, where you’ll then need to hit Enter on your keyboard to execute the command (Figure 4).
Figure 4: This is where the meat of the configuration takes place.
When the first command (the Chat model command) completes, do the same for the second command (the Autocomplete model) and the third (the Embeddings model). This will take some time, so be patient. When each step is complete, you’ll see a green check by it.
After that’s completed, click Connect.
If you click the Continue extension, you should now see a new chat window that is connected to your locally installed instance of Ollama (Figure 5).
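If the chat window doesn't respond, check that the Ollama server itself is answering. Ollama exposes an HTTP API on port 11434 by default, so a quick curl (guarded here with a reachability check) tells you whether the backend is up:

```shell
# Probe the local Ollama HTTP API (default port 11434).
# /api/tags returns the installed models; if it answers, Continue can connect too.
if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/tags
else
  echo "Ollama server not reachable on localhost:11434"
fi
```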
You are all set up and ready to rock.