Semi-random thoughts and tales of tinkering
This is the section where things get real. We are going to build a working app — a VU meter that listens to the microphone and displays the volume level in real time. It will not do spectral analysis yet (that comes in Sections 5 and 6), but it establishes the entire audio pipeline: permissions, capture, processing, and display. Every piece we build here carries forward into the final spectrum analyzer.
Apple provides several frameworks for working with audio. The one we care about is
AVFoundation, specifically the AVAudioEngine class. Here is the
mental model:
┌─────────────┐      ┌──────────────────┐      ┌──────────────┐
│ Microphone  │─────▶│  AVAudioEngine   │─────▶│   Output     │
│ (hardware)  │      │  (audio graph)   │      │  (speaker)   │
└─────────────┘      └──────────────────┘      └──────────────┘
                              │
                              │ installTap()
                              ▼
                     ┌──────────────────┐
                     │  Your callback   │
                     │ (process audio)  │
                     └──────────────────┘
AVAudioEngine is a real-time audio graph — a pipeline of nodes that process audio.
The input node represents the microphone. The output node represents
the speaker. You can insert processing nodes between them, but for our purposes we just need to
observe what the microphone is picking up.
The mechanism for observing is called a tap. You install a tap on a node, and the engine calls your closure every time a buffer of audio data is ready. That buffer is a chunk of floating-point samples — the raw PCM data we discussed in Section 2.
If you have used NAudio or WASAPI on Windows, AVAudioEngine
is the equivalent. The difference is that Apple wraps everything in a higher-level graph API.
There is no need to manually configure sample rates or buffer formats — the engine negotiates
that with the hardware.
iOS will not let your app touch the microphone without explicit user consent. This is a two-step process:
Step 1: Declare intent in Info.plist. Open your project's Info.plist
(or the "Info" tab in your target settings) and add this key:
// Info.plist key (shown as raw key + value)
Key: NSMicrophoneUsageDescription
Value: "VUMeter needs the microphone to measure audio levels"
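If you prefer editing the plist as raw XML (right-click Info.plist and choose Open As → Source Code), the same entry looks like this:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>VUMeter needs the microphone to measure audio levels</string>
```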
This string is displayed to the user in the system permission dialog. Make it clear and honest. Apple reviews these strings and will reject your app if the description is vague or misleading.
Step 2: The runtime dialog. The first time your app calls
engine.start() on a node connected to the microphone, iOS presents a system dialog:
"VUMeter Would Like to Access the Microphone." The user taps Allow or Don't Allow. If they deny
it, the audio engine still starts but delivers silence.
During development, if you accidentally deny the permission and want to reset it, go to Settings → Privacy & Security → Microphone on the device (or Simulator), find your app, and toggle it back on. You can also reset all permissions in the Simulator via Device → Erase All Content and Settings.
Create a new Swift file in your project called AudioEngine.swift. This is the first
version — it only computes volume level (RMS). We will extend it with FFT and spectrum data in
later sections.
Here is the complete file:
import AVFoundation
import Observation

@Observable
class AudioEngine {
    var level: Float = 0.0   // 0.0 (silence) to 1.0 (loud)
    var isRunning = false

    private let engine = AVAudioEngine()

    func start() {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)

        input.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, _ in
            let rms = Self.computeRMS(buffer: buffer)
            let normalized = Self.normalize(rms)
            DispatchQueue.main.async {
                self?.level = normalized
            }
        }

        do {
            try engine.start()
            isRunning = true
        } catch {
            print("Audio engine failed to start: \(error)")
        }
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        isRunning = false
        level = 0.0
    }

    // Root Mean Square — the standard way to measure perceived loudness
    private static func computeRMS(buffer: AVAudioPCMBuffer) -> Float {
        guard let channelData = buffer.floatChannelData?[0] else { return 0 }
        let frameCount = Int(buffer.frameLength)
        guard frameCount > 0 else { return 0 }
        var sum: Float = 0
        for i in 0..<frameCount {
            let sample = channelData[i]
            sum += sample * sample
        }
        return sqrt(sum / Float(frameCount))
    }

    // Map raw RMS to a 0–1 display range
    private static func normalize(_ rms: Float) -> Float {
        let db = 20 * log10(max(rms, 1e-6))
        let minDb: Float = -60
        let maxDb: Float = 0
        let clamped = max(minDb, min(maxDb, db))
        return (clamped - minDb) / (maxDb - minDb)
    }
}
That is about 50 lines of code that captures live audio from the microphone, computes a volume level, and exposes it as a reactive property. Let's walk through every important piece.
@Observable
class AudioEngine {
    var level: Float = 0.0
    var isRunning = false
    // ...
}
@Observable is Swift's equivalent of C#'s INotifyPropertyChanged. When you
mark a class with @Observable, the Swift compiler automatically synthesizes change-tracking
for every stored property. Any SwiftUI view that reads audio.level will automatically
re-render when that value changes.
In WPF/MAUI, you would write a property with a backing field, call
OnPropertyChanged() in the setter, and bind to it in XAML. In Swift,
@Observable eliminates all of that boilerplate. You just write a normal property
and it works. The framework tracks which views read which properties at runtime, not through
string-based bindings.
let input = engine.inputNode
let format = input.outputFormat(forBus: 0)

input.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, _ in
    let rms = Self.computeRMS(buffer: buffer)
    let normalized = Self.normalize(rms)
    DispatchQueue.main.async {
        self?.level = normalized
    }
}
installTap(onBus:bufferSize:format:) registers a callback on the audio node. Here is
what each parameter means:
- onBus: 0 — Audio nodes can have multiple buses (channels of data). Bus 0 is the default, and for the input node, it is the microphone.
- bufferSize: 1024 — A hint to the engine about how many samples per callback. At 44,100 Hz, 1024 samples is about 23 ms of audio. The engine may deliver a different size — this is a request, not a guarantee. Smaller buffers mean lower latency but more CPU overhead; larger buffers are the opposite.
- format — The audio format (sample rate, channels, bit depth). We read it from the node rather than hardcoding, because it depends on the hardware.

The trailing closure is your callback. It receives an AVAudioPCMBuffer (the raw samples) and an AVAudioTime (the timestamp, which we ignore with _). This callback fires on a real-time audio thread — not the main thread, not a background queue, but a high-priority thread managed by the audio system.
{ [weak self] buffer, _ in
    // ...
    self?.level = normalized
}
The [weak self] capture list is critical. Without it, here is what happens:

- The closure strongly captures self (the AudioEngine instance).
- self owns the AVAudioEngine, which owns the tap, which owns the closure.
- That is a retain cycle: nothing in the chain can ever be deallocated.

[weak self] makes the closure's reference to self a weak reference — it does not prevent deallocation. If the AudioEngine is deallocated while the tap is still installed, self becomes nil, and the self?.level call safely does nothing.
This is the same concept as WeakReference<T> in C#. In C#, the garbage
collector handles most cycles for you, but in Swift (which uses reference counting, not GC),
you must break cycles manually. You will see [weak self] in nearly every closure
that captures self and might outlive it. It becomes second nature.
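To see the mechanics in isolation, here is a self-contained sketch with no audio involved (Owner and probe are made-up names for illustration) showing a stored closure that captures its owner weakly:

```swift
final class Owner {
    var fired = 0
    var callback: (() -> Void)?

    init() {
        // Without [weak self], Owner → callback → Owner would be a retain cycle.
        callback = { [weak self] in self?.fired += 1 }
    }
}

var owner: Owner? = Owner()
owner?.callback?()          // increments fired to 1
weak var probe = owner      // weak observer of the instance
owner = nil                 // Owner deallocates despite the stored closure
print(probe == nil)         // true — the weak capture broke the cycle
```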
DispatchQueue.main.async {
    self?.level = normalized
}
The audio tap callback runs on a real-time audio thread. SwiftUI views must be updated from the
main thread. DispatchQueue.main.async dispatches a block of work to the main thread's
run loop — it will execute on the next iteration.
This is exactly like Dispatcher.BeginInvoke() in WPF, or
MainThread.BeginInvokeOnMainThread() in MAUI. Same concept, slightly different API.
In Swift, Grand Central Dispatch (GCD) is the primary concurrency mechanism, and
DispatchQueue.main is the queue bound to the main/UI thread.
private static func computeRMS(buffer: AVAudioPCMBuffer) -> Float {
    guard let channelData = buffer.floatChannelData?[0] else { return 0 }
    let frameCount = Int(buffer.frameLength)
    guard frameCount > 0 else { return 0 }
    var sum: Float = 0
    for i in 0..<frameCount {
        let sample = channelData[i]
        sum += sample * sample
    }
    return sqrt(sum / Float(frameCount))
}
This is the theory from Section 2 turned into code. Let's trace through it:
- buffer.floatChannelData?[0] — The buffer stores audio as 32-bit floats, one array per channel. [0] grabs channel 0 (mono, or the left channel of stereo). The ? is Swift's optional chaining — if the buffer somehow has no float data, the entire expression is nil and the guard returns 0.
- sum / Float(frameCount) — The mean of the squared values.
- sqrt(...) — The root of the mean square. This brings the result back into the same scale as the original samples.

The result is a single float — the RMS level — typically between 0.0 (silence) and maybe 0.3 for normal speech. Clapping close to the mic might push it to 0.7. It will rarely hit 1.0 unless the input is distorted.
You could just take the maximum absolute sample value (the "peak"). But peak is jumpy and does not correspond well to perceived loudness. A single spike sample could max out the peak while the audio sounds quiet. RMS averages over the entire buffer, giving a much more stable and perceptually meaningful reading. This is why professional VU meters use RMS.
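To make that concrete, here is a small sketch (pure Swift with synthetic samples, no audio engine needed) comparing peak and RMS on a quiet buffer that contains a single spike:

```swift
import Foundation

// 1023 quiet samples plus one loud spike — like a click in otherwise quiet audio.
let samples = [Float](repeating: 0.05, count: 1023) + [0.9]

// Peak: the single spike dominates the reading.
let peak = samples.map { abs($0) }.max() ?? 0

// RMS: the spike barely moves the average energy.
let rms = (samples.map { $0 * $0 }.reduce(0, +) / Float(samples.count)).squareRoot()

print(peak)   // 0.9
print(rms)    // ≈ 0.057 — still close to the quiet floor
```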
private static func normalize(_ rms: Float) -> Float {
    let db = 20 * log10(max(rms, 1e-6))
    let minDb: Float = -60
    let maxDb: Float = 0
    let clamped = max(minDb, min(maxDb, db))
    return (clamped - minDb) / (maxDb - minDb)
}
Raw RMS values are not great for display. Normal speech might be 0.01 to 0.05 — you would barely see the meter move. We need to convert to a logarithmic (decibel) scale, then map to 0–1.
- 20 * log10(rms) — Converts amplitude to decibels. An RMS of 1.0 is 0 dB (maximum). An RMS of 0.01 is -40 dB. An RMS of 0.001 is -60 dB. The max(rms, 1e-6) guard prevents log10(0), which would be negative infinity.
- max(minDb, min(maxDb, db)) — We display the range from -60 dB to 0 dB; this clamps the value into that range.
- (clamped - minDb) / (maxDb - minDb) — A standard linear interpolation. -60 dB maps to 0.0, 0 dB maps to 1.0, -30 dB maps to 0.5. This is the value we hand to the UI.

Human hearing is logarithmic — we perceive the difference between -60 dB and -40 dB as roughly the same "jump" as -40 dB to -20 dB, even though in linear terms the second jump is 100x larger. By converting to decibels before mapping to the display, our meter moves in a way that matches how we actually hear sound. This is fundamental to every audio meter, VU or otherwise.
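Plugging a few values through the math confirms the mapping (normalize is re-stated here as a free function purely for the worked example):

```swift
import Foundation

func normalize(_ rms: Float) -> Float {
    let db = 20 * log10(max(rms, 1e-6))
    let minDb: Float = -60
    let maxDb: Float = 0
    let clamped = max(minDb, min(maxDb, db))
    return (clamped - minDb) / (maxDb - minDb)
}

print(normalize(1.0))    // 1.0   — 0 dB, top of the meter
print(normalize(0.01))   // ≈0.33 — -40 dB
print(normalize(0.001))  // ≈0.0  — -60 dB, bottom of the range
print(normalize(0.0))    // 0.0   — guarded, no negative infinity
```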
Now let's build the view. Open ContentView.swift (the file Xcode created for you)
and replace its contents with this:
import SwiftUI

struct ContentView: View {
    @State private var audio = AudioEngine()

    var body: some View {
        VStack(spacing: 40) {
            Text("VU Meter")
                .font(.largeTitle)
                .bold()

            GeometryReader { geo in
                ZStack(alignment: .bottom) {
                    RoundedRectangle(cornerRadius: 12)
                        .fill(Color.gray.opacity(0.2))

                    RoundedRectangle(cornerRadius: 12)
                        .fill(meterColor(level: audio.level))
                        .frame(height: geo.size.height * CGFloat(audio.level))
                        .animation(.easeOut(duration: 0.05), value: audio.level)
                }
            }
            .frame(width: 80, height: 300)

            Text(dbLabel(level: audio.level))
                .font(.title2.monospacedDigit())
                .foregroundStyle(.secondary)

            Button(audio.isRunning ? "Stop" : "Start") {
                if audio.isRunning {
                    audio.stop()
                } else {
                    audio.start()
                }
            }
            .buttonStyle(.borderedProminent)
            .tint(audio.isRunning ? .red : .green)
        }
        .padding(40)
    }

    func meterColor(level: Float) -> Color {
        switch level {
        case 0..<0.6: return .green
        case 0.6..<0.85: return .yellow
        default: return .red
        }
    }

    func dbLabel(level: Float) -> String {
        let db = (level * 60) - 60
        if level < 0.001 { return "-∞ dB" }
        return String(format: "%.1f dB", db)
    }
}
This gives us a vertical bar that fills from bottom to top, changes color at threshold levels, displays the current dB reading, and has a start/stop button. Let's break down the key SwiftUI concepts.
@State private var audio = AudioEngine()
@State tells SwiftUI: "own this object's lifecycle." SwiftUI creates the
AudioEngine instance when the view first appears and keeps it alive across re-renders.
Combined with @Observable on the class, SwiftUI tracks exactly which properties
the view's body reads. When audio.level changes, only the parts of the
view that depend on level are recomputed — not the entire view tree.
Think of @State as the SwiftUI equivalent of a view model that is scoped to
a particular view. In WPF, you would set DataContext = new AudioEngine() and bind
properties in XAML. SwiftUI collapses binding, change notification, and lifecycle management
into two keywords: @State on the view side, @Observable on the
model side.
GeometryReader { geo in
    ZStack(alignment: .bottom) {
        RoundedRectangle(cornerRadius: 12)
            .fill(Color.gray.opacity(0.2))

        RoundedRectangle(cornerRadius: 12)
            .fill(meterColor(level: audio.level))
            .frame(height: geo.size.height * CGFloat(audio.level))
    }
}
.frame(width: 80, height: 300)
GeometryReader is SwiftUI's way of giving you access to the parent container's
size. The closure receives a GeometryProxy (here called geo) with
properties like geo.size.width and geo.size.height.
We set the outer frame to 80x300 points. Inside, we stack two rounded rectangles with
ZStack (a z-axis stack — layers on top of each other). The gray one is the background
track. The colored one is the fill, whose height equals the container height times the audio level.
When audio.level is 0.5, the colored bar is 150 points tall. When it is 1.0, it fills
the whole container.
This is like measuring a WPF Grid or Canvas with
ActualWidth/ActualHeight and using those values to size a child
element. The difference is that in SwiftUI, GeometryReader is reactive — when
the container resizes (say, on device rotation), the inner content automatically recomputes.
.animation(.easeOut(duration: 0.05), value: audio.level)
This single modifier makes the meter bar animate smoothly. Whenever audio.level
changes, SwiftUI interpolates the frame height from the old value to the new value over 0.05
seconds with an ease-out curve. No animation state variables. No timers. No manual interpolation.
You declare "animate this property" and the framework handles it.
The duration of 0.05 seconds (50 ms) is a deliberate choice. Our audio tap fires roughly every 23 ms (1024 samples at 44.1 kHz), so 50 ms gives the animation just enough time to smooth out jitter without adding noticeable lag. Try changing it to 0.3 and you will see the meter feel sluggish. Remove it entirely and the meter will stutter.
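The arithmetic behind those numbers is worth checking once. A quick sketch of buffer duration at a 44.1 kHz sample rate:

```swift
import Foundation

// Duration of one callback = samples per buffer / samples per second.
for size in [512, 1024, 4096] {
    let ms = Double(size) / 44_100.0 * 1000.0
    print("\(size) samples ≈ \(String(format: "%.1f", ms)) ms")
}
// 512  ≈ 11.6 ms — lower latency, more callbacks
// 1024 ≈ 23.2 ms — the buffer size used in this chapter
// 4096 ≈ 92.9 ms — smoother, but noticeably laggy
```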
func meterColor(level: Float) -> Color {
    switch level {
    case 0..<0.6: return .green
    case 0.6..<0.85: return .yellow
    default: return .red
    }
}
Swift's switch can match against ranges; the closest C# equivalent is relational patterns (case < 0.6f, added in C# 9), so this will feel new but familiar. The 0..<0.6 syntax creates a half-open range (includes 0, excludes 0.6).
The result: the meter is green up to 60% of maximum, yellow from 60% to 85%, and red above 85%.
This mimics the color scheme of classic analog VU meters.
Also notice that Swift's switch does not fall through by default (no
break needed), and the compiler checks exhaustiveness — the default
case is required because the compiler cannot prove our ranges cover all possible Float
values.
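Range patterns compose with Swift's other pattern forms, too. As a small aside (a hypothetical variant, not the chapter's code), a one-sided range can catch out-of-range input explicitly instead of relying on default:

```swift
// Hypothetical variant of meterColor using a one-sided range pattern.
func meterZone(_ level: Float) -> String {
    switch level {
    case ..<0:       return "invalid"  // negative levels never occur, but are caught
    case 0..<0.6:    return "green"
    case 0.6..<0.85: return "yellow"
    default:         return "red"
    }
}

print(meterZone(0.3))    // green
print(meterZone(0.7))    // yellow
print(meterZone(0.95))   // red
```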
Button(audio.isRunning ? "Stop" : "Start") {
    if audio.isRunning {
        audio.stop()
    } else {
        audio.start()
    }
}
.buttonStyle(.borderedProminent)
.tint(audio.isRunning ? .red : .green)
The button label and tint color both depend on audio.isRunning. When the engine
is running, you see a red "Stop" button. When it is stopped, you see a green "Start" button.
Because isRunning is a property on an @Observable object, the button
automatically re-renders when it changes. No manual state synchronization.
The .buttonStyle(.borderedProminent) modifier gives us a filled, rounded button
— the standard iOS "primary action" style. .tint sets the fill color.
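One helper the walkthrough has not touched is dbLabel(level:). Its (level * 60) - 60 is the exact inverse of normalize's dB-to-0–1 mapping, which a quick round trip confirms (normalize is re-stated as a free function for this sketch):

```swift
import Foundation

func normalize(_ rms: Float) -> Float {
    let db = 20 * log10(max(rms, 1e-6))
    let clamped = max(-60, min(0, db))
    return (clamped + 60) / 60
}

func dbLabel(level: Float) -> String {
    if level < 0.001 { return "-∞ dB" }
    return String(format: "%.1f dB", (level * 60) - 60)
}

// An RMS of about 0.0316 is -30 dB; the label should read -30.0 dB.
print(dbLabel(level: normalize(0.0316)))   // -30.0 dB
print(dbLabel(level: 0))                   // -∞ dB
```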
You now have two files: AudioEngine.swift and ContentView.swift.
Build and run (⌘R). When the app launches, tap Start. iOS will present
the microphone permission dialog — tap Allow.
You should see the green bar responding to sound in real time. Try speaking at a normal volume, then clapping near the microphone — the bar should climb into the yellow and red zones as the level rises.
This is a working VU meter. It is a real audio application capturing live input, computing a meaningful metric, and displaying it with smooth animation. Everything from here builds on top of this foundation.
Before moving on, try these modifications to deepen your understanding:
- bufferSize — Try 512 (lower latency, choppier) and 4096 (smoother, more lag). Notice how it affects the meter's responsiveness.
- minDb — Change -60 to -40. The meter becomes more sensitive (quiet sounds register higher). Change it to -80 and the meter barely moves unless you shout.
- Remove the .animation(...) line. The meter still works but looks jittery. The animation smooths out the discrete jumps between buffer callbacks.

The VU meter computes a single number per buffer. The spectrum analyzer will compute
hundreds of numbers — one for each frequency band. But first, in Section 4, we will optimize
our math using Apple's Accelerate framework. The hand-written for loop in
computeRMS works fine for 1024 samples, but the FFT will need to process thousands
of samples many times per second. We need faster tools.