Using Local AI models with .NET Aspire

Aaron Powell

Using local AI models can be a great way to experiment on your own machine without needing to deploy resources to the cloud. In this post, we'll look at how to use .NET Aspire with Ollama to run AI models locally, while using the Microsoft.Extensions.AI abstractions to make it easy to transition to cloud-hosted models when you deploy.

Setting up Ollama in .NET Aspire


We're going to need a way to use Ollama from our .NET Aspire application, and the easiest way to do that is with the Ollama hosting integration from the .NET Aspire Community Toolkit. You can install it from NuGet via the Visual Studio tooling, VS Code tooling, or the .NET CLI. Let's take a look at how to install the Ollama hosting integration into our app host project via the command line:

dotnet add package CommunityToolkit.Aspire.Hosting.Ollama

Once you've installed the Ollama hosting integration, you can configure it in the app host's Program.cs file. Here's an example:

Code:
var ollama = builder.AddOllama("ollama")
                    .WithDataVolume()
                    .WithOpenWebUI();

Here, we've used the AddOllama extension method to add the Ollama container to the app host. Since we're going to download some models, we want to persist them in a data volume across container restarts (it means we don't have to pull several gigabytes of data every time we start the container!). And to give ourselves a playground, we'll add the OpenWebUI container, which gives us a web interface for interacting with the model outside of our app.

Running a local AI model


The ollama resource that we created in the previous step only runs the Ollama server; we still need to add some models to it, which we can do with the AddModel method. Let's use the Llama 3.2 model:

var chat = ollama.AddModel("chat", "llama3.2");

If we wanted to use a variation of the model, or a specific tag, we could specify that in the AddModel method, such as ollama.AddModel("chat", "llama3.2:1b") for the 1b tag of the Llama 3.2 model. Alternatively, if the model you’re after isn’t in the Ollama library, you can use the AddHuggingFaceModel method to add a model from the Hugging Face model hub.
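
As an illustration, here's a hedged sketch of the Hugging Face route (the repository name is only an example, and the (name, model) argument order is assumed to match AddModel):

Code:
// A minimal sketch: pull a GGUF model from the Hugging Face hub instead of the Ollama
// library. The repository name below is an illustrative placeholder, not from the post,
// and this call would take the place of the AddModel call above.
var chat = ollama.AddHuggingFaceModel("chat", "bartowski/Llama-3.2-1B-Instruct-GGUF");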

Now that we have our model resource, we can reference it from any of the other resources in the app host:

Code:
builder.AddProject<Projects.MyApi>("api")
       .WithReference(chat);

When we run the app host project, the Ollama server will start up and download the model we specified (make sure you don’t stop the app host before the download completes), and then we can use the model in our application. If you want the resources that depend on the model to wait until the model is downloaded, you can use the WaitFor method with the model reference:

Code:
builder.AddProject<Projects.MyApi>("api")
       .WithReference(chat)
       .WaitFor(chat);

.NET Aspire dashboard showing health checks and model download status

In the screenshot of the dashboard above, we can see that the model is still being downloaded. The Ollama server is running but unhealthy because the model hasn't finished downloading, and the api resource hasn't started because it's waiting for the model to download and become healthy.

Using the model in your application


With our API project set up to use the chat model, we can now use the OllamaSharp library to connect to the Ollama server and interact with the model. To do this, we'll use the OllamaSharp integration from the .NET Aspire Community Toolkit:

dotnet add package CommunityToolkit.Aspire.OllamaSharp

This integration allows us to register the OllamaSharp client as the IChatClient or IEmbeddingGenerator service from the Microsoft.Extensions.AI package. Because these are abstractions, we can later swap the local Ollama server for a cloud-hosted option such as Azure OpenAI Service without changing the code that uses the client:

builder.AddOllamaSharpChatClient("chat");
Note: If you are using an embedding model and want to register the IEmbeddingGenerator service, you can use the AddOllamaSharpEmbeddingGenerator method instead.
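
For completeness, here's a hedged sketch of what the embeddings path could look like. It assumes the app host also exposes an embedding model resource named "embeddings" (for example, ollama.AddModel("embeddings", "all-minilm")), and the /embed endpoint is purely illustrative:

Code:
// A minimal sketch, assuming an "embeddings" model resource exists in the app host.
builder.AddOllamaSharpEmbeddingGenerator("embeddings");

// The IEmbeddingGenerator abstraction from Microsoft.Extensions.AI is injected just like
// IChatClient; this hypothetical endpoint returns the raw embedding vector for some text.
app.MapPost("/embed", async (IEmbeddingGenerator<string, Embedding<float>> generator, string text) =>
{
    var embeddings = await generator.GenerateAsync(new[] { text });
    return embeddings[0].Vector.ToArray();
});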

To make full use of the Microsoft.Extensions.AI pipeline, we can provide that service to the ChatClientBuilder:

Code:
builder.AddKeyedOllamaSharpChatClient("chat");
builder.Services.AddChatClient(b => b
    .UseFunctionInvocation()
    .UseOpenTelemetry(configure: t => t.EnableSensitiveData = true)
    .UseLogging()
    // Use the OllamaSharp client
    .Use(b.Services.GetRequiredKeyedService<IChatClient>("chat")));

Lastly, we can inject the IChatClient into our route handler:

Code:
app.MapPost("/chat", async (IChatClient chatClient, string question) =>
{
    var response = await chatClient.CompleteAsync(question);
    return response.Message;
});
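
If you'd rather stream the reply as it's generated, a hedged variant is sketched below; it assumes the streaming members of the same Microsoft.Extensions.AI preview (CompleteStreamingAsync and the update's Text property), so treat the exact names as assumptions rather than something the post prescribes:

Code:
// A minimal streaming sketch (API names assumed from the same Microsoft.Extensions.AI preview).
app.MapPost("/chat/stream", (IChatClient chatClient, string question) =>
{
    async IAsyncEnumerable<string> StreamReply()
    {
        // Yield each chunk of text as the model produces it.
        await foreach (var update in chatClient.CompleteStreamingAsync(question))
        {
            yield return update.Text ?? string.Empty;
        }
    }

    return StreamReply();
});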

Supporting cloud-hosted models


While Ollama is great as a local development tool, when it comes to deploying your application, you’ll likely want to use a cloud-based AI service like Azure OpenAI Service. To handle this, we’ll need to update the API project to register a different implementation of the IChatClient service when running in the cloud:

Code:
if (builder.Environment.IsDevelopment())
{
    builder.AddKeyedOllamaSharpChatClient("chat");
}
else
{
    builder.AddKeyedAzureOpenAIClient("chat");
}

builder.Services.AddChatClient(b => b
    .UseFunctionInvocation()
    .UseOpenTelemetry(configure: t => t.EnableSensitiveData = true)
    .UseLogging()
    // Use the previously registered IChatClient, which is either Ollama or Azure OpenAI
    .Use(b.Services.GetRequiredKeyedService<IChatClient>("chat")));
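
On the app host side, one hedged way to mirror that switch (a sketch of one possible approach, not code from the post) is to only spin up the Ollama resources while running locally, and to hand the API a connection string named "chat" that points at your provisioned Azure OpenAI resource when publishing:

Code:
// A minimal sketch: local Ollama in run mode, an externally provisioned Azure OpenAI
// endpoint (surfaced as a connection string named "chat") in publish mode.
if (builder.ExecutionContext.IsRunMode)
{
    var ollama = builder.AddOllama("ollama").WithDataVolume();
    var chat = ollama.AddModel("chat", "llama3.2");

    builder.AddProject<Projects.MyApi>("api")
           .WithReference(chat)
           .WaitFor(chat);
}
else
{
    // The connection string value is supplied at deployment time (e.g. via configuration).
    var chat = builder.AddConnectionString("chat");

    builder.AddProject<Projects.MyApi>("api")
           .WithReference(chat);
}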

Conclusion


In this post, we've seen how, with only a few lines of code, we can set up an Ollama server with .NET Aspire, specify a model that we want to use, have it downloaded for us, and then integrate it into a client application. We've also seen how the Microsoft.Extensions.AI abstractions make it easy to switch between local and cloud-hosted models. This is a powerful way to experiment with AI models on your local machine before deploying them to the cloud.

Check out the eShop sample application for a full example of how to use Ollama with .NET Aspire.

The post Using Local AI models with .NET Aspire appeared first on .NET Blog.
