SwiftUI Guide

Build an AI Chat App in SwiftUI: Streaming From Scratch vs the Pre-Wired Version

A working AI chat screen is deceptively hard: the bubble UI takes an afternoon, but token streaming, cancellation, and secure key handling are where weeks disappear. This guide walks the from-scratch path step by step, then shows what's already wired in The Swift Kit.

Last updated: 2026-06-09 · 8 min read · By Ahmed Gagan, iOS Engineer

Quick Answer

To build an AI chat app in SwiftUI you stream responses from an LLM (OpenAI, Anthropic Claude, or Apple Foundation Models) into a scrolling list of message bubbles, appending tokens to the last message as they arrive over a URLSession bytes stream. The hard parts are not the UI — they are server-side key handling, per-token state updates without stutter, request cancellation, and rate limiting. You can write this from scratch in roughly a week, or start from The Swift Kit, where the streaming chat, proxied keys, and rate limiting are already wired.

  • From-scratch effort: ~1 week for a robust streaming chat (UI, proxy, cancellation, limits)
  • Recommended transport: URLSession.bytes(for:) + SSE parsing for true token streaming
  • Pre-wired in The Swift Kit: OpenAI + Claude + Apple Foundation Models, keys proxied, per-user rate limiting

Why the bubbles are the easy 10%

Most 'build a SwiftUI ChatGPT clone' tutorials stop at a pretty bubble list and a single non-streaming request. That demo looks finished but ships almost none of the real work. The hard 90% is everything around the UI: streaming tokens so the response feels alive instead of arriving as a frozen block, keeping the key off the device so it can't be scraped from your binary, cancelling an in-flight generation cleanly, retrying after a dropped connection without losing the partial answer, and stopping a single user from burning your entire OpenAI budget in an afternoon. None of that shows up in a screenshot, which is exactly why it gets underestimated. Streaming in particular changes the architecture: your view model must mutate the last message in place on the main actor, your network layer reads bytes incrementally rather than awaiting a full body, and your provider abstraction has to normalize different SSE formats between OpenAI and Claude.

  • Streaming changes your whole network + state layer, not just the view
  • Server-side key proxying is non-negotiable for a shipped app
  • Cancellation, retry, and rate limiting are the parts demos skip
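
The provider-normalization point above can be sketched roughly as follows. This is an illustration under stated assumptions, not The Swift Kit's implementation: `Provider` and `textDelta(from:provider:)` are hypothetical names, and the JSON shapes are assumed to be OpenAI's streamed chat chunks (`choices[0].delta.content`) and Anthropic's `content_block_delta` events (`delta.text`). Check each provider's current streaming documentation before relying on either shape.

```swift
import Foundation

// Hypothetical names for illustration; not part of any SDK.
enum Provider { case openAI, claude }

// Assumed OpenAI chunk shape: {"choices":[{"delta":{"content":"Hi"}}]}
private struct OpenAIChunk: Decodable {
  struct Choice: Decodable {
    struct Delta: Decodable { let content: String? }
    let delta: Delta
  }
  let choices: [Choice]
}

// Assumed Anthropic event shape:
// {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hi"}}
private struct ClaudeEvent: Decodable {
  struct Delta: Decodable { let text: String? }
  let type: String
  let delta: Delta?
}

/// Returns the text fragment carried by one SSE JSON payload, or nil when
/// the payload carries no text (pings, stop events, undecodable JSON).
func textDelta(from payload: Data, provider: Provider) -> String? {
  let decoder = JSONDecoder()
  switch provider {
  case .openAI:
    return (try? decoder.decode(OpenAIChunk.self, from: payload))?
      .choices.first?.delta.content
  case .claude:
    guard let event = try? decoder.decode(ClaudeEvent.self, from: payload),
          event.type == "content_block_delta" else { return nil }
    return event.delta?.text
  }
}
```

With a function like this, the view model only ever sees plain text fragments, so switching providers does not touch the chat UI.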

From scratch vs the pre-wired version

Writing it yourself is the right call when AI chat is the product and you want full control over the provider abstraction, prompt pipeline, and streaming internals: you will learn a lot and own every line. The trade-off is time: expect roughly a week to reach something robust, plus ongoing maintenance as provider APIs shift. The pre-wired path is The Swift Kit, where streaming chat against OpenAI, Anthropic Claude, and free on-device Apple Foundation Models is already built, with keys proxied through Supabase Edge Functions and per-user rate limiting in place. You start at the feature work (your prompts, your UX) instead of the plumbing. The Swift Kit is $99 one-time for unlimited commercial projects with lifetime updates and a 14-day refund, so the question is whether a week or more of your time spent on plumbing is worth more than $99 to you. If chat is one feature in a larger app rather than the whole app, the pre-wired version almost always wins.

  • From scratch: full control, ~1 week + maintenance, best when chat is the core
  • Pre-wired: ship today, $99 one-time, best when chat is one feature among many
  • Both end at the same UI — the difference is the week of plumbing underneath

Build a streaming AI chat screen in SwiftUI

Do it yourself with the steps below, or skip the plumbing — The Swift Kit ships this exact stack (streaming chat, server-proxied keys, per-user rate limiting) pre-wired so you start at step 7.

  1. Model your message + conversation state

    Define an Identifiable Message with a role and a mutable text string, and hold the messages in an @Observable view model. Because text is a var, you can append streamed tokens to the last bubble in place.

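    import Foundation   // UUID
    import Observation  // @Observable requires iOS 17+

    enum Role { case user, assistant }

    struct Message: Identifiable {
      let id = UUID()
      let role: Role   // .user / .assistant
      var text: String // var: appended during streaming
    }

    @Observable final class ChatVM {
      var messages: [Message] = []
      var isStreaming = false
      var error: String? // last failure, surfaced next to a retry action
    }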
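
    One way to drive this model during a turn, as a hedged sketch: append the user's message plus an empty assistant placeholder, then grow the placeholder as tokens arrive. `Conversation`, `beginTurn(prompt:)`, and `appendToLast(_:)` are hypothetical helper names, and the message types are redeclared so the snippet stands alone.

    ```swift
    import Foundation

    enum Role { case user, assistant }

    struct Message: Identifiable {
      let id = UUID()
      let role: Role
      var text: String
    }

    // Hypothetical helper that owns the message array.
    struct Conversation {
      private(set) var messages: [Message] = []

      /// Adds the user's prompt and an empty assistant bubble to stream into.
      mutating func beginTurn(prompt: String) {
        messages.append(Message(role: .user, text: prompt))
        messages.append(Message(role: .assistant, text: ""))
      }

      /// Appends a streamed token to the last bubble in place.
      /// Does nothing if the last message is not an assistant bubble.
      mutating func appendToLast(_ token: String) {
        guard let last = messages.indices.last,
              messages[last].role == .assistant else { return }
        messages[last].text += token
      }
    }
    ```

    Because the bubble keeps its id while its text grows, SwiftUI can update the existing row instead of inserting a new one for every token.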

  2. Build the bubble list UI

    Render messages in a ScrollViewReader + LazyVStack so you can auto-scroll to the newest token. Keep user and assistant bubbles visually distinct and pin a composer at the bottom.

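    ScrollViewReader { proxy in
      ScrollView {
        LazyVStack(spacing: 8) {
          ForEach(vm.messages) { msg in
            BubbleView(message: msg).id(msg.id)
          }
        }
      }
      // The two-parameter onChange closure requires iOS 17+
      .onChange(of: vm.messages.last?.text) { _, _ in
        // Unwrap so the scroll target has the same type (UUID) as the .id() above
        guard let lastID = vm.messages.last?.id else { return }
        withAnimation { proxy.scrollTo(lastID, anchor: .bottom) }
      }
    }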
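
    The list above references a BubbleView that the guide does not define. A minimal version might look like the sketch below; the colors, padding, and corner radius are arbitrary choices, and the message types are redeclared so the snippet stands alone.

    ```swift
    import SwiftUI

    enum Role { case user, assistant }

    struct Message: Identifiable {
      let id = UUID()
      let role: Role
      var text: String
    }

    /// One chat bubble: user messages hug the trailing edge,
    /// assistant messages hug the leading edge.
    struct BubbleView: View {
      let message: Message

      private var isUser: Bool { message.role == .user }

      var body: some View {
        HStack {
          if isUser { Spacer(minLength: 40) }
          Text(message.text)
            .padding(.horizontal, 12)
            .padding(.vertical, 8)
            .background(isUser ? Color.accentColor : Color.gray.opacity(0.2))
            .foregroundStyle(isUser ? Color.white : Color.primary)
            .clipShape(RoundedRectangle(cornerRadius: 16))
          if !isUser { Spacer(minLength: 40) }
        }
        .padding(.horizontal, 12)
      }
    }
    ```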

  3. Proxy your API key server-side

    Never ship an OpenAI or Anthropic key in the app binary — it can be extracted in minutes. Put the key in a serverless function (Supabase Edge Function) that calls the LLM and streams the response back. The Swift Kit does exactly this so the key never touches the client.

    # Edge Function holds the secret; the app only knows this URL
    supabase functions deploy chat
    supabase secrets set OPENAI_API_KEY=sk-...
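
    On the client side, the request then targets your own endpoint rather than the provider. The sketch below assumes a JSON body of role/content turns and a bearer token for the signed-in user; `ChatRequestBody` and `makeProxyRequest` are hypothetical names, and the real contract is whatever your Edge Function expects.

    ```swift
    import Foundation

    // Hypothetical request body; your Edge Function defines the real contract.
    struct ChatRequestBody: Encodable {
      struct Turn: Encodable {
        let role: String
        let content: String
      }
      let messages: [Turn]
    }

    /// Builds a POST to your own proxy. The only credential sent is the
    /// signed-in user's session token; no provider API key is involved.
    func makeProxyRequest(endpoint: URL,
                          userToken: String,
                          turns: [ChatRequestBody.Turn]) throws -> URLRequest {
      var request = URLRequest(url: endpoint)
      request.httpMethod = "POST"
      request.setValue("application/json", forHTTPHeaderField: "Content-Type")
      request.setValue("text/event-stream", forHTTPHeaderField: "Accept")
      request.setValue("Bearer \(userToken)", forHTTPHeaderField: "Authorization")
      request.httpBody = try JSONEncoder().encode(ChatRequestBody(messages: turns))
      return request
    }
    ```

    For a Supabase function named chat, the endpoint would typically have the form https://YOUR-PROJECT-REF.supabase.co/functions/v1/chat, where YOUR-PROJECT-REF is a placeholder for your own project.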

  4. Open a streaming request with URLSession bytes

    Use URLSession.bytes(for:) to read the response line by line as it arrives, instead of awaiting the full body. Each Server-Sent Events data: line carries a token delta that you parse and append; other lines (event names, comments) are skipped.

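    let (bytes, response) = try await URLSession.shared.bytes(for: request)
    // Fail early on a non-200 reply instead of parsing an error body as tokens
    guard (response as? HTTPURLResponse)?.statusCode == 200 else { throw URLError(.badServerResponse) }
    for try await line in bytes.lines {
      if line == "data: [DONE]" { break }              // OpenAI-style end-of-stream marker
      guard line.hasPrefix("data: ") else { continue } // skip event names and comments
      let token = try parseDelta(line)                 // parseDelta: your decoder for the provider's JSON
      await MainActor.run {
        guard let last = vm.messages.indices.last else { return } // avoid indexing an empty array
        vm.messages[last].text += token
      }
    }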
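
    The loop above calls parseDelta without defining it. A minimal version for OpenAI-style chunks could look like this; the `StreamChunk` type and the assumed payload shape (`choices[0].delta.content`) are illustrative, and a Claude stream would need a different decoder.

    ```swift
    import Foundation

    // Assumed OpenAI-style chunk: {"choices":[{"delta":{"content":"Hi"}}]}
    struct StreamChunk: Decodable {
      struct Choice: Decodable {
        struct Delta: Decodable { let content: String? }
        let delta: Delta
      }
      let choices: [Choice]
    }

    /// Turns one "data: {...}" SSE line into the text it carries.
    /// Returns "" for chunks without text; throws if the JSON is malformed.
    /// The caller is expected to have checked the "data:" prefix already.
    func parseDelta(_ line: String) throws -> String {
      // SSE allows "data:" with or without a single following space.
      var payload = line.dropFirst("data:".count)
      if payload.first == " " { payload = payload.dropFirst() }
      let chunk = try JSONDecoder().decode(StreamChunk.self, from: Data(payload.utf8))
      return chunk.choices.first?.delta.content ?? ""
    }
    ```

    Throwing on malformed JSON lets the surrounding loop decide whether to abort or skip the line, rather than silently dropping text.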

  5. Append tokens on the main actor without stutter

    Mutate the last message's text on the MainActor so SwiftUI redraws smoothly. Batch very fast token streams if you see frame drops, and flip isStreaming so the composer shows a stop button.
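
    If per-token updates do cause frame drops, one option is to coalesce tokens and publish at most once per interval. The sketch below is a hypothetical helper, not a SwiftUI API; the 50 ms default is an arbitrary starting point to tune, and the clock is passed in so the logic stays easy to test.

    ```swift
    import Foundation

    /// Coalesces rapid tokens so the UI updates at most once per interval.
    struct TokenBuffer {
      private var pending = ""
      private var lastFlush: TimeInterval
      let interval: TimeInterval

      init(interval: TimeInterval = 0.05, now: TimeInterval) {
        self.interval = interval
        self.lastFlush = now
      }

      /// Buffers a token. Returns the accumulated text once the interval has
      /// elapsed since the last flush, otherwise nil (nothing to publish yet).
      mutating func add(_ token: String, now: TimeInterval) -> String? {
        pending += token
        guard now - lastFlush >= interval else { return nil }
        return flush(now: now)
      }

      /// Returns whatever is still buffered; call once when the stream ends.
      mutating func flush(now: TimeInterval) -> String? {
        guard !pending.isEmpty else { return nil }
        defer { pending = ""; lastFlush = now }
        return pending
      }
    }
    ```

    In the streaming loop you would append to the last message only when add returns text, then call flush after the loop so the tail of the response is not lost.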

  6. Handle cancellation and errors

    Store the streaming Task so a Stop button can call task.cancel(). Wrap the loop to catch network drops mid-stream and surface a retry, leaving the partial message intact rather than discarding it.

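    streamTask = Task {
      do { try await stream() }
      catch is CancellationError { /* keep partial text */ }
      // URLSession usually reports a cancelled transfer as URLError(.cancelled)
      catch let e as URLError where e.code == .cancelled { /* keep partial text */ }
      catch { vm.error = error.localizedDescription }
      vm.isStreaming = false
    }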
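
    To decide when to surface a retry, it can help to classify the failure first. The sketch below is one possible mapping: `StreamOutcome` and `classify(_:)` are hypothetical names, and which URLError codes count as retryable is a judgment call you should adapt to your app.

    ```swift
    import Foundation

    enum StreamOutcome: Equatable {
      case stoppedByUser      // keep partial text, show no error
      case retryable(String)  // keep partial text, offer Retry
      case failed(String)     // keep partial text, show the error
    }

    /// Maps an error thrown by the streaming loop to a UI outcome.
    func classify(_ error: Error) -> StreamOutcome {
      if error is CancellationError { return .stoppedByUser }
      guard let urlError = error as? URLError else {
        return .failed(error.localizedDescription)
      }
      switch urlError.code {
      case .cancelled:
        return .stoppedByUser
      case .networkConnectionLost, .notConnectedToInternet, .timedOut:
        return .retryable(urlError.localizedDescription)
      default:
        return .failed(urlError.localizedDescription)
      }
    }
    ```

    In every case the partial message stays in the list; only the banner or button shown beneath it changes.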

  7. Add rate limiting and a paywall

    Free AI usage is a cost sink — gate requests with per-user limits in your Edge Function and put heavier usage behind a paywall. The Swift Kit wires Supabase Edge Function rate limiting and RevenueCat entitlements so this is a config change, not a build.
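
    The limit itself lives on the server, but the app still has to react when the proxy refuses a request. The sketch below assumes a contract in which the Edge Function answers 429 when the per-user limit is hit (optionally with a Retry-After header in seconds) and 402 when a paid entitlement is required. Both status choices and the names `ChatGate` and `gate` are assumptions; your function may signal these cases differently.

    ```swift
    import Foundation

    enum ChatGate: Equatable {
      case allowed
      case rateLimited(retryAfter: TimeInterval?) // show a cooldown message
      case upgradeRequired                        // present the paywall
      case serverError(Int)
    }

    /// Maps the proxy's HTTP status to a UI decision.
    /// Assumed contract: 429 = per-user limit hit, 402 = entitlement required.
    func gate(statusCode: Int, retryAfterHeader: String?) -> ChatGate {
      switch statusCode {
      case 200..<300:
        return .allowed
      case 429:
        // Retry-After may also be an HTTP date; that case yields nil here.
        let seconds = retryAfterHeader.flatMap { TimeInterval($0) }
        return .rateLimited(retryAfter: seconds)
      case 402:
        return .upgradeRequired
      default:
        return .serverError(statusCode)
      }
    }
    ```

    You would call this with the HTTPURLResponse status code and its Retry-After header value before starting to read the byte stream.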

From scratch vs The Swift Kit

The Swift Kit vs Build from scratch comparison

| Feature                | The Swift Kit              | Build from scratch |
|------------------------|----------------------------|--------------------|
| Token streaming (SSE)  | Pre-wired                  | Build yourself     |
| Server-side key proxy  | Supabase Edge Functions    | DIY backend        |
| Per-user rate limiting | Included                   | Build yourself     |
| Providers              | OpenAI + Claude + Apple FM | Wire each yourself |
| Time to first chat     | Same day                   | ~1 week            |
| Cost                   | $99 one-time               | Your time          |

Frequently Asked Questions

How do I stream AI responses token-by-token in SwiftUI?
Use URLSession.bytes(for:) and iterate the response with for try await line in bytes.lines. Each Server-Sent Events data: line carries a token delta; parse it and append it to the last message's text on the MainActor so SwiftUI redraws the bubble as the text grows. The Swift Kit ships this streaming loop already built for both OpenAI and Claude.

Is it safe to put my OpenAI or Claude API key in the app?
No. Keys embedded in an iOS binary can be extracted with standard tools in minutes, exposing you to unlimited billing. Proxy the key through a serverless function instead. The Swift Kit uses Supabase Edge Functions so the key lives server-side and the app only ever talks to your endpoint.

OpenAI, Claude, or Apple Foundation Models: which should I use?
Apple Foundation Models run free and on-device, which is ideal for privacy and cost but limited in capability. OpenAI and Anthropic Claude are more powerful but billed per token and require a server proxy. The Swift Kit wires all three behind one interface so you can switch or mix without rewriting your chat layer.

How long does it take to build a streaming AI chat app from scratch?
The bubble UI is an afternoon, but a robust version (streaming, server-proxied keys, cancellation, retry on dropped connections, and per-user rate limiting) is realistically about a week, plus ongoing upkeep as provider APIs change. Starting from The Swift Kit collapses that to configuration.

How do I stop one user from running up my AI bill?
Enforce limits server-side, not in the app, since client checks are trivially bypassed. The Swift Kit implements per-user rate limiting inside Supabase Edge Functions and pairs it with RevenueCat entitlements so you can gate heavier usage behind a paywall.

Skip the week of plumbing

The Swift Kit ships streaming AI chat across OpenAI, Claude, and Apple Foundation Models — keys proxied, rate limiting wired, paywall ready. $99 one-time, unlimited commercial projects, lifetime updates, 14-day refund. Start at the feature work, not the plumbing.
