The Swift Kit
Tutorial

How to Build an AI-Powered iOS App with SwiftUI and ChatGPT in 2026

A hands-on, step-by-step guide to integrating OpenAI's ChatGPT, DALL-E image generation, and GPT-4o vision into a SwiftUI app. Includes real code for the proxy backend, the chat interface, streaming responses, and production hardening.

Ahmed Gagan
14 min read

Why AI Features Are Now Table Stakes for Mobile Apps

Let me be blunt: if you shipped a productivity or content app in 2025 without some form of AI integration, you probably noticed the download numbers stalling. According to Sensor Tower's 2025 year-end report, apps with AI-powered features saw 37% higher Day-30 retention than their non-AI peers. Q4 2025 data from data.ai (formerly App Annie) showed that "AI" appeared in the subtitle or keyword set of 4 out of 10 new App Store submissions.

Users now expect smart suggestions, natural language input, and generative features. The good news? OpenAI has made this remarkably accessible. You do not need a machine learning background. You need an API key, a lightweight proxy server, and solid SwiftUI fundamentals. This tutorial walks you through all three.

Architecture Overview: Why You Need a Proxy Server

Before we write a single line of Swift, let us talk architecture. I have seen too many tutorials that call the OpenAI API directly from the iOS app. Do not do this. Here is why:

  • Your OpenAI API key is embedded in the app binary. Anyone with a jailbroken device (or even just strings on the IPA) can extract it.
  • You have zero control over usage. A single malicious user can rack up thousands of dollars in API calls before you notice.
  • You cannot implement rate limiting, content moderation, or usage analytics on the client side.
  • Apple has rejected apps that make direct calls to AI APIs without a proxy because they cannot moderate content effectively.

The correct architecture looks like this:

iOS App (SwiftUI) → Your Proxy Server (Flask / FastAPI / Node) → OpenAI API

Your proxy server holds the API key, enforces per-user rate limits, logs usage, and optionally caches responses. The iOS app only ever talks to your server. In my experience, Flask is the fastest way to get a proxy running if you are a mobile developer who does not live in backend code daily. It is roughly a hundred lines of Python for a fully functional proxy.

Step 1: Setting Up the Flask Proxy Backend

We will build a minimal Flask server with four endpoints: /chat and /chat/stream for text completions (blocking and streaming), /image/generate for DALL-E image generation, and /image/analyze for GPT-4o vision. Here is the complete server code:

# server.py
from flask import Flask, request, jsonify, Response
import openai
import os
import time
from functools import wraps

app = Flask(__name__)
openai.api_key = os.environ["OPENAI_API_KEY"]

# Simple in-memory rate limiter (use Redis in production)
user_requests: dict[str, list[float]] = {}
RATE_LIMIT = 20  # requests per minute

def rate_limit(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        user_id = request.headers.get("X-User-ID", "anonymous")
        now = time.time()
        timestamps = user_requests.get(user_id, [])
        timestamps = [t for t in timestamps if now - t < 60]
        if len(timestamps) >= RATE_LIMIT:
            return jsonify({"error": "Rate limit exceeded"}), 429
        timestamps.append(now)
        user_requests[user_id] = timestamps
        return f(*args, **kwargs)
    return decorated

@app.route("/chat", methods=["POST"])
@rate_limit
def chat():
    data = request.json
    messages = data.get("messages", [])
    model = data.get("model", "gpt-4o")

    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=data.get("max_tokens", 1024),
        temperature=data.get("temperature", 0.7),
    )
    return jsonify({
        "content": response.choices[0].message.content,
        "usage": {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
        }
    })

@app.route("/chat/stream", methods=["POST"])
@rate_limit
def chat_stream():
    data = request.json
    messages = data.get("messages", [])

    def generate():
        stream = openai.chat.completions.create(
            model=data.get("model", "gpt-4o"),
            messages=messages,
            max_tokens=data.get("max_tokens", 1024),
            stream=True,
        )
        for chunk in stream:
            if not chunk.choices:
                continue  # some providers send keep-alive chunks with no choices
            delta = chunk.choices[0].delta.content
            if delta:
                # Naive SSE framing: a token containing a newline would break the
                # client's line-based parser. JSON-encode each delta if your
                # model output can be multi-line.
                yield f"data: {delta}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype="text/event-stream")

@app.route("/image/generate", methods=["POST"])
@rate_limit
def generate_image():
    data = request.json
    response = openai.images.generate(
        model="gpt-image-1",
        prompt=data["prompt"],
        n=1,
        size=data.get("size", "1024x1024"),
        quality=data.get("quality", "auto"),
    )
    # gpt-image-1 returns base64 image data (b64_json), not a hosted URL
    # (dall-e-3 is the model that returns response.data[0].url). Wrap the
    # bytes in a data URI so the client contract stays a plain URL string.
    b64 = response.data[0].b64_json
    return jsonify({"url": f"data:image/png;base64,{b64}"})

@app.route("/image/analyze", methods=["POST"])
@rate_limit
def analyze_image():
    data = request.json
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": data.get("prompt", "Describe this image.")},
                {"type": "image_url", "image_url": {"url": data["image_url"]}},
            ],
        }],
        max_tokens=500,
    )
    return jsonify({"content": response.choices[0].message.content})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)

A few things to note. The rate limiter here is in-memory and resets when the server restarts. In production, swap it for a Redis-backed solution. The X-User-ID header should be a verified user token from your auth system — if you are using Supabase, check out our Supabase SwiftUI tutorial for how to pass JWT tokens to your backend.

To run the server locally:

pip install flask openai
export OPENAI_API_KEY="sk-..."
python server.py

For deployment, I recommend Railway or Fly.io. Both can have your Flask app live in under 5 minutes with a free tier that handles thousands of requests per day. Render is another solid option if you prefer a more traditional PaaS.

Step 2: Building the SwiftUI Chat Interface

Now let us build the iOS side. We need three things: a message model, an API service, and a chat view. Here is the message model:

// Models/ChatMessage.swift
import Foundation

struct ChatMessage: Identifiable, Codable, Sendable {
    let id: UUID
    let role: Role
    let content: String
    let timestamp: Date

    enum Role: String, Codable, Sendable {
        case user
        case assistant
        case system
    }

    init(role: Role, content: String) {
        self.id = UUID()
        self.role = role
        self.content = content
        self.timestamp = Date()
    }
}

Next, the API service. This class talks to your Flask proxy, not to OpenAI directly:

// Services/AIService.swift
import Foundation

actor AIService {
    private let baseURL: URL
    private let session: URLSession

    init(baseURL: URL) {
        self.baseURL = baseURL
        let config = URLSessionConfiguration.default
        config.timeoutIntervalForRequest = 30
        self.session = URLSession(configuration: config)
    }

    func sendMessage(
        messages: [ChatMessage],
        model: String = "gpt-4o"
    ) async throws -> String {
        let url = baseURL.appendingPathComponent("chat")
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        let payload: [String: Any] = [
            "messages": messages.map { ["role": $0.role.rawValue, "content": $0.content] },
            "model": model,
        ]
        request.httpBody = try JSONSerialization.data(withJSONObject: payload)

        let (data, response) = try await session.data(for: request)

        guard let httpResponse = response as? HTTPURLResponse else {
            throw AIError.invalidResponse
        }
        guard httpResponse.statusCode == 200 else {
            if httpResponse.statusCode == 429 {
                throw AIError.rateLimited
            }
            throw AIError.serverError(httpResponse.statusCode)
        }

        let result = try JSONDecoder().decode(ChatResponse.self, from: data)
        return result.content
    }
}

struct ChatResponse: Codable {
    let content: String
    let usage: TokenUsage?
}

struct TokenUsage: Codable {
    let prompt_tokens: Int
    let completion_tokens: Int
}

enum AIError: LocalizedError {
    case invalidResponse
    case rateLimited
    case serverError(Int)

    var errorDescription: String? {
        switch self {
        case .invalidResponse: return "Invalid response from server"
        case .rateLimited: return "Too many requests. Please wait a moment."
        case .serverError(let code): return "Server error (\(code))"
        }
    }
}

Notice we are using actor for the AI service. This gives us thread safety for free — no manual locking, no data races. This is important because multiple views might trigger API calls simultaneously.

Now, the chat view. This is where SwiftUI really shines:

// Views/ChatView.swift
import SwiftUI

struct ChatView: View {
    @State private var messages: [ChatMessage] = []
    @State private var inputText = ""
    @State private var isLoading = false
    @FocusState private var isInputFocused: Bool

    private let aiService: AIService

    init(aiService: AIService) {
        self.aiService = aiService
    }

    var body: some View {
        VStack(spacing: 0) {
            // Messages list
            ScrollViewReader { proxy in
                ScrollView {
                    LazyVStack(spacing: 12) {
                        ForEach(messages) { message in
                            MessageBubble(message: message)
                                .id(message.id)
                        }

                        if isLoading {
                            TypingIndicator()
                                .id("typing")
                        }
                    }
                    .padding(.horizontal, 16)
                    .padding(.vertical, 12)
                }
                .onChange(of: messages.count) {
                    withAnimation(.easeOut(duration: 0.3)) {
                        // Scroll to the newest message; nil-coalescing a UUID?
                        // with a String literal would not compile, so fall back
                        // to the typing indicator's id explicitly.
                        if let lastID = messages.last?.id {
                            proxy.scrollTo(lastID, anchor: .bottom)
                        } else if isLoading {
                            proxy.scrollTo("typing", anchor: .bottom)
                        }
                    }
                }
            }

            Divider()

            // Input bar
            HStack(spacing: 12) {
                TextField("Ask anything...", text: $inputText, axis: .vertical)
                    .lineLimit(1...5)
                    .textFieldStyle(.plain)
                    .padding(.horizontal, 16)
                    .padding(.vertical, 10)
                    .background(.ultraThinMaterial, in: RoundedRectangle(cornerRadius: 20))
                    .focused($isInputFocused)

                Button {
                    sendMessage()
                } label: {
                    Image(systemName: "arrow.up.circle.fill")
                        .font(.system(size: 32))
                        .foregroundStyle(inputText.isEmpty ? .gray : .blue)
                }
                .disabled(inputText.isEmpty || isLoading)
            }
            .padding(.horizontal, 16)
            .padding(.vertical, 8)
        }
        .navigationTitle("AI Chat")
    }

    private func sendMessage() {
        let text = inputText.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty else { return }

        let userMessage = ChatMessage(role: .user, content: text)
        messages.append(userMessage)
        inputText = ""
        isLoading = true

        Task {
            do {
                let response = try await aiService.sendMessage(messages: messages)
                let assistantMessage = ChatMessage(role: .assistant, content: response)
                messages.append(assistantMessage)
            } catch {
                let errorMessage = ChatMessage(
                    role: .assistant,
                    content: "Sorry, something went wrong: \(error.localizedDescription)"
                )
                messages.append(errorMessage)
            }
            isLoading = false
        }
    }
}

struct MessageBubble: View {
    let message: ChatMessage

    var body: some View {
        HStack {
            if message.role == .user { Spacer(minLength: 60) }

            Text(message.content)
                .padding(.horizontal, 16)
                .padding(.vertical, 10)
                .background(
                    message.role == .user
                        ? Color.blue
                        : Color(.systemGray5)
                )
                .foregroundStyle(message.role == .user ? .white : .primary)
                .clipShape(RoundedRectangle(cornerRadius: 18))

            if message.role == .assistant { Spacer(minLength: 60) }
        }
    }
}

struct TypingIndicator: View {
    @State private var dotCount = 0

    var body: some View {
        HStack {
            Text(String(repeating: ".", count: dotCount + 1))
                .padding(.horizontal, 16)
                .padding(.vertical, 10)
                .background(Color(.systemGray5))
                .clipShape(RoundedRectangle(cornerRadius: 18))
                .task {
                    // Cycle through 1-3 dots. Unlike a Timer scheduled in
                    // onAppear (which is never invalidated), this task is
                    // cancelled automatically when the view disappears.
                    while !Task.isCancelled {
                        try? await Task.sleep(for: .seconds(0.4))
                        dotCount = (dotCount + 1) % 3
                    }
                }
            Spacer()
        }
    }
}

That is a fully functional chat UI. The ScrollViewReader auto-scrolls to the latest message. The LazyVStack keeps memory usage low even with thousands of messages. The input field expands vertically as the user types multi-line prompts.

Step 3: Adding Image Generation (DALL-E / gpt-image-1)

Image generation is one of the most impressive features you can add to an app. Users love it because the results are tangible and shareable. Here is how to extend the AI service to support image generation:

// Add to AIService.swift
func generateImage(prompt: String, size: String = "1024x1024") async throws -> URL {
    let url = baseURL.appendingPathComponent("image/generate")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let payload = ["prompt": prompt, "size": size]
    request.httpBody = try JSONEncoder().encode(payload)

    let (data, response) = try await session.data(for: request)
    guard let httpResponse = response as? HTTPURLResponse,
          httpResponse.statusCode == 200 else {
        throw AIError.invalidResponse
    }

    let result = try JSONDecoder().decode(ImageResponse.self, from: data)
    guard let imageURL = URL(string: result.url) else {
        throw AIError.invalidResponse
    }
    return imageURL
}

struct ImageResponse: Codable {
    let url: String
}

And a simple SwiftUI view to display generated images:

struct ImageGeneratorView: View {
    @State private var prompt = ""
    @State private var imageURL: URL?
    @State private var isGenerating = false

    let aiService: AIService

    var body: some View {
        VStack(spacing: 20) {
            if let imageURL {
                AsyncImage(url: imageURL) { image in
                    image.resizable().scaledToFit()
                } placeholder: {
                    ProgressView()
                }
                .clipShape(RoundedRectangle(cornerRadius: 16))
            }

            HStack {
                TextField("Describe an image...", text: $prompt)
                    .textFieldStyle(.roundedBorder)

                Button("Generate") {
                    generate()
                }
                .disabled(prompt.isEmpty || isGenerating)
            }
        }
        .padding()
    }

    private func generate() {
        isGenerating = true
        Task {
            do {
                imageURL = try await aiService.generateImage(prompt: prompt)
            } catch {
                print("Image generation failed: \(error)")
            }
            isGenerating = false
        }
    }
}

Step 4: Adding Vision / Image Analysis

GPT-4o can analyze images, which opens up incredible use cases: receipt scanning, food logging, homework help, plant identification, accessibility descriptions. Here is how to send a camera image to your proxy for analysis:

// Add to AIService.swift
func analyzeImage(imageData: Data, prompt: String = "Describe this image.") async throws -> String {
    let base64 = imageData.base64EncodedString()
    let dataURI = "data:image/jpeg;base64,\(base64)"

    let url = baseURL.appendingPathComponent("image/analyze")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let payload: [String: String] = [
        "image_url": dataURI,
        "prompt": prompt,
    ]
    request.httpBody = try JSONEncoder().encode(payload)

    let (data, response) = try await session.data(for: request)
    guard let httpResponse = response as? HTTPURLResponse,
          httpResponse.statusCode == 200 else {
        throw AIError.invalidResponse
    }

    let result = try JSONDecoder().decode(ChatResponse.self, from: data)
    return result.content
}

One gotcha: base64-encoded images can be very large. A typical 12MP iPhone photo is 3-5 MB as JPEG, which becomes about 4-7 MB as base64. You should resize images before sending them. I generally scale down to 1024px on the longest side, which keeps the base64 payload under 500 KB while preserving enough detail for GPT-4o to work with.
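The base64 overhead is easy to quantify: every 3 raw bytes become 4 ASCII characters, so payloads grow by a factor of 4/3. A quick sanity check (the 3 MB buffer here is just a stand-in for a JPEG):

```python
import base64

raw = b"\xff" * 3_000_000  # stand-in for a ~3 MB JPEG
encoded = base64.b64encode(raw)

# Every 3 raw bytes become 4 base64 characters, so a ~3 MB photo
# grows to ~4 MB on the wire before it ever reaches your proxy.
print(len(encoded))             # 4000000
print(len(encoded) / len(raw))  # ~1.333
```

This is why resizing on-device before encoding pays off twice: smaller uploads and fewer vision tokens billed.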

Step 5: Handling Streaming Responses

Streaming responses dramatically improve perceived performance. Instead of waiting 3-5 seconds for a complete response, the user sees text appear word-by-word in real time — just like ChatGPT's own interface. Here is how to implement streaming with URLSession and the AsyncBytes API:

// Add to AIService.swift
func streamMessage(
    messages: [ChatMessage],
    onToken: @Sendable @escaping (String) -> Void
) async throws {
    let url = baseURL.appendingPathComponent("chat/stream")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let payload: [String: Any] = [
        "messages": messages.map { ["role": $0.role.rawValue, "content": $0.content] },
        "model": "gpt-4o",
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: payload)

    let (bytes, response) = try await session.bytes(for: request)
    guard let httpResponse = response as? HTTPURLResponse,
          httpResponse.statusCode == 200 else {
        throw AIError.invalidResponse
    }

    for try await line in bytes.lines {
        guard line.hasPrefix("data: ") else { continue }
        let token = String(line.dropFirst(6))
        if token == "[DONE]" { break }
        onToken(token)
    }
}
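The Swift loop above is a minimal server-sent-events reader: keep only `data: `-prefixed lines, stop at the `[DONE]` sentinel. The same logic, sketched in Python for clarity (`parse_sse_lines` is an illustrative helper, not part of the project):

```python
def parse_sse_lines(lines):
    """Mirror of the Swift for-try-await loop: collect tokens from
    'data: '-prefixed lines, stopping at the [DONE] sentinel."""
    tokens = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        token = line[len("data: "):]
        if token == "[DONE]":
            break
        tokens.append(token)
    return tokens

print(parse_sse_lines(["data: Hel", "", "data: lo", "data: [DONE]", "data: late"]))
# ['Hel', 'lo']
```

Note how anything after `[DONE]` is ignored, which is exactly what `break` gives you in the Swift version.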

Then update your sendMessage function in the view to use streaming:

private func sendMessageStreaming() {
    let text = inputText.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !text.isEmpty else { return }

    messages.append(ChatMessage(role: .user, content: text))
    inputText = ""
    isLoading = true

    // Capture the context before appending the placeholder, so the empty
    // assistant message is never sent to the API.
    let context = messages
    messages.append(ChatMessage(role: .assistant, content: ""))

    Task {
        do {
            try await aiService.streamMessage(messages: context) { token in
                Task { @MainActor in
                    // Rebuild the last message with the new token appended.
                    // Reading the accumulated text back out of `messages` avoids
                    // mutating a local var captured by a @Sendable closure, which
                    // would not compile under strict concurrency checking.
                    if let lastIndex = messages.indices.last {
                        messages[lastIndex] = ChatMessage(
                            role: .assistant,
                            content: messages[lastIndex].content + token
                        )
                    }
                }
            }
        } catch {
            messages.append(ChatMessage(role: .assistant, content: "Error: \(error.localizedDescription)"))
        }
        isLoading = false
    }
}

The key insight here is that we append an empty assistant message first, then update its content as tokens arrive. This gives the user a smooth, real-time typing effect. In my testing, streaming reduces the perceived response time from 3-5 seconds to under 200 milliseconds for the first visible token.

Comparing OpenAI Models for iOS Apps

Choosing the right model for each feature is critical for balancing cost, speed, and quality. Here is a breakdown based on my production experience:

Model           | Best For                                        | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Avg Latency
GPT-4o          | Complex chat, vision analysis, nuanced writing  | $2.50                      | $10.00                      | 2-4s
GPT-4o-mini     | Quick suggestions, autocomplete, classification | $0.15                      | $0.60                       | 0.5-1.5s
o3-mini         | Reasoning, math, code generation                | $1.10                      | $4.40                       | 3-8s
gpt-image-1     | Image generation (replaces DALL-E 3)            | $0.04-$0.19 per image      | n/a                         | 8-15s
GPT-4o (vision) | Image analysis, OCR, visual Q&A                 | $2.50                      | $10.00                      | 3-6s

My recommendation: use GPT-4o-mini as your default for most interactions. It is 16x cheaper than GPT-4o and fast enough that streaming is almost unnecessary. Reserve GPT-4o for features where quality visibly matters — long-form writing, complex analysis, or vision tasks.
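To make the cost gap concrete, here is a small calculator using the prices from the table above (`PRICES` and `request_cost` are illustrative names, not an OpenAI API):

```python
# Per-1M-token prices from the table above: (input, output) in dollars.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single request at the table's rates."""
    price_in, price_out = PRICES[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000

# A typical chat turn: 1,000 prompt tokens, 500 completion tokens.
print(request_cost("gpt-4o", 1000, 500))       # 0.0075
print(request_cost("gpt-4o-mini", 1000, 500))  # 0.00045
```

At these rates a million such turns cost about $7,500 on GPT-4o versus $450 on GPT-4o-mini, which is where the "16x cheaper" figure comes from.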

Error Handling and Rate Limiting Strategies

Your app will encounter errors. The OpenAI API returns 429 when you hit rate limits, 500 for server issues, and occasionally times out during peak hours. Here is a robust error handling pattern I use in every production AI app:

// Services/RetryableAIService.swift
import Foundation  // for pow

actor RetryableAIService {
    private let inner: AIService
    private let maxRetries = 3

    init(baseURL: URL) {
        self.inner = AIService(baseURL: baseURL)
    }

    func sendWithRetry(messages: [ChatMessage]) async throws -> String {
        var lastError: Error?

        for attempt in 0..<maxRetries {
            do {
                return try await inner.sendMessage(messages: messages)
            } catch AIError.rateLimited {
                // Exponential backoff: 1s, 2s, 4s. Matching the case directly
                // avoids requiring Equatable on AIError, which `error == .rateLimited`
                // would need (serverError carries an associated value).
                let delay = UInt64(pow(2.0, Double(attempt))) * 1_000_000_000
                try await Task.sleep(nanoseconds: delay)
                lastError = AIError.rateLimited
            } catch {
                throw error  // Non-retryable errors fail immediately
            }
        }

        throw lastError ?? AIError.invalidResponse
    }
}

On the UI side, always show a meaningful error state. Do not just swallow errors silently. A simple "Something went wrong. Tap to retry." is infinitely better than a loading spinner that never stops.
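The backoff schedule in the retry actor generalizes to any attempt count. One refinement worth knowing about (not shown in the Swift code) is random jitter, which keeps many clients that were rate-limited at the same moment from all retrying in lockstep; a sketch with an illustrative `backoff_delay` helper:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, jitter: bool = False) -> float:
    """Seconds to wait before retry `attempt` (0-indexed): 1s, 2s, 4s, ..."""
    delay = base * (2 ** attempt)
    if jitter:
        # Spread out clients that hit the rate limit simultaneously.
        delay += random.uniform(0, base)
    return delay

print([backoff_delay(a) for a in range(3)])  # [1.0, 2.0, 4.0]
```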

Security: API Keys Must NEVER Go in the iOS App

I want to hammer this point home because I see it violated constantly. Never, ever, under any circumstances put your OpenAI API key in your iOS app. Not in a .plist, not in an environment variable, not in a keychain entry that you set at build time, not obfuscated in a string. If it ships in the binary, it can be extracted.

Here is what to do instead:

  1. Store the API key as an environment variable on your proxy server.
  2. Authenticate iOS users against your proxy using their Supabase (or any auth provider) JWT token.
  3. Your proxy validates the JWT, checks the user's usage quota, and then makes the OpenAI call.
  4. Optionally, tie API usage to the user's subscription tier (free = 10 messages/day, pro = unlimited).

This approach also lets you switch AI providers without updating the app. If you want to experiment with Anthropic's Claude API or Google's Gemini alongside OpenAI, you change the proxy code — the iOS app never knows the difference.

Production Tips: Caching, History, and Token Management

Cache Common Responses

If your app has predictable queries (like "Summarize today's news" or "Generate a workout plan"), cache responses on the proxy with a 15-30 minute TTL. This cuts costs dramatically. I have seen caching reduce API spend by 40% in apps with repetitive query patterns.
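On the proxy, such a cache can be as simple as a dictionary keyed by a hash of the request payload. A minimal sketch (in-process only; use Redis with EXPIRE for multi-worker deployments — the names here are illustrative):

```python
import hashlib
import json
import time

_cache = {}            # payload hash -> (expires_at, response)
CACHE_TTL = 15 * 60    # 15 minutes

def cached_chat(messages, fetch, now=None):
    """Return a cached response for an identical message payload within the TTL;
    otherwise call `fetch` (your OpenAI call) and store the result."""
    now = time.time() if now is None else now
    key = hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]  # cache hit: no upstream API call
    response = fetch(messages)
    _cache[key] = (now + CACHE_TTL, response)
    return response

calls = []
fake_fetch = lambda msgs: calls.append(msgs) or "summary"
msgs = [{"role": "user", "content": "Summarize today's news"}]
cached_chat(msgs, fake_fetch)
cached_chat(msgs, fake_fetch)   # second identical request is served from cache
print(len(calls))  # 1
```

Hashing the canonical JSON of the messages means any change to the conversation misses the cache, which is exactly what you want.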

Manage Conversation History Wisely

GPT-4o has a 128K token context window, but that does not mean you should send the entire conversation history every time. Each token costs money, and latency increases with context length. Here is my approach:

  • Keep the last 10 messages in the active context.
  • Summarize older messages into a system prompt: "Previous conversation summary: The user asked about X and you suggested Y."
  • Store full history locally on the device for the user to scroll through, but only send the trimmed context to the API.
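The trimming step above can be sketched as a proxy-side helper (`trim_context` is an illustrative name; generating the summary itself would be a separate, cheaper model call):

```python
def trim_context(history, keep_last=10, summary=None):
    """Send only the newest messages; fold older ones into a summary system prompt."""
    recent = history[-keep_last:]
    if summary and len(history) > keep_last:
        prefix = {"role": "system",
                  "content": f"Previous conversation summary: {summary}"}
        return [prefix] + recent
    return recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(25)]
trimmed = trim_context(history, summary="The user asked about X; you suggested Y.")
print(len(trimmed))        # 11: one summary message plus the last 10
print(trimmed[0]["role"])  # system
```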

Track Token Usage Per User

Log every API call with the user ID, model used, prompt tokens, and completion tokens. This data is invaluable for:

  • Setting usage-based pricing tiers
  • Identifying abusive users before they rack up costs
  • Optimizing your prompts (shorter prompts = lower cost)
  • Deciding which model to use for which feature
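A minimal shape for those usage records, sketched in memory (illustrative names; in production append to your database from the proxy's request handlers):

```python
import time

usage_log = []  # in production: a database table, one row per API call

def log_usage(user_id: str, model: str,
              prompt_tokens: int, completion_tokens: int) -> None:
    """Record one API call for later billing, quota, and abuse analysis."""
    usage_log.append({
        "ts": time.time(),
        "user_id": user_id,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    })

def total_tokens(user_id: str) -> int:
    """Aggregate a user's consumption across all models."""
    return sum(r["prompt_tokens"] + r["completion_tokens"]
               for r in usage_log if r["user_id"] == user_id)

log_usage("u1", "gpt-4o-mini", 1200, 300)
log_usage("u1", "gpt-4o", 800, 400)
print(total_tokens("u1"))  # 2700
```

The `usage` dictionary the /chat endpoint already returns gives you the token counts for free; this is just a matter of persisting them.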

Pre-warm Connections

The first request to OpenAI's API always takes longer due to TCP/TLS handshake. Send a lightweight "warmup" request when your app launches (or when the user navigates to the AI feature) to eliminate this cold start latency.

Putting It All Together

Here is a quick recap of what we built:

  1. A Flask proxy server with rate limiting and three endpoints (chat, image generation, image analysis).
  2. A SwiftUI chat interface with message bubbles, auto-scroll, and a send button.
  3. Image generation using gpt-image-1 with a simple prompt-to-image flow.
  4. Image analysis using GPT-4o vision for camera input.
  5. Streaming responses for real-time token delivery.
  6. Error handling with exponential backoff retries.

If this feels like a lot of code to write and maintain, that is because it is. Between the proxy server, the Swift networking layer, the UI components, error handling, token management, and the dozen edge cases I did not even cover (network drops mid-stream, backgrounding during a request, conversation persistence) — you are looking at 2-3 weeks of solid work to get this production-ready.

That is exactly why we built all of this into The Swift Kit. The AI module comes pre-wired with a configurable proxy backend, streaming chat UI, image generation, vision analysis, conversation history management, and token usage tracking. You bring your OpenAI API key, paste it into the proxy config, and it works. Check out the features page to see the full AI integration, or head to pricing to grab the kit and start building today.


Ready to ship your iOS app faster?

The Swift Kit gives you a production-ready SwiftUI codebase with onboarding, paywalls, auth, AI integrations, and more. Stop building boilerplate. Start building your product.

Get The Swift Kit