Shared Audio Experiences (Headphones Recommended)

The media we interact with today is expanding into new dimensions. This is a demo I made with a couple of my lab peers in school.

Using the Multipeer Connectivity framework, I brought shared audio experiences to nearby iOS devices for our project. We wanted a nature theme, so we gathered a collection of nature sounds and recorded the demo in the EcoCommons. My job was to extend this Apple demo (https://developer.apple.com/documentation/arkit/world_tracking/tracking_and_visualizing_planes), which uses an ARWorldTrackingConfiguration and an MCSession to send data, such as items in the scene, back and forth between peers.
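The MultipeerSession helper the view controller relies on comes from Apple's sample; below is a condensed sketch of what such a wrapper looks like. The service-type string and the auto-accept invitation behavior are assumptions for a trusting demo environment, and the real sample handles peer state changes in more detail.

```swift
import UIKit
import MultipeerConnectivity

/// Condensed sketch of Apple's MultipeerSession wrapper.
class MultipeerSession: NSObject {
    static let serviceType = "ar-multi-sample" // must match on all peers (assumed name)

    private let myPeerID = MCPeerID(displayName: UIDevice.current.name)
    private var session: MCSession!
    private var advertiser: MCNearbyServiceAdvertiser!
    private var browser: MCNearbyServiceBrowser!
    private let receivedDataHandler: (Data, MCPeerID) -> Void

    init(receivedDataHandler: @escaping (Data, MCPeerID) -> Void) {
        self.receivedDataHandler = receivedDataHandler
        super.init()
        session = MCSession(peer: myPeerID, securityIdentity: nil,
                            encryptionPreference: .required)
        session.delegate = self
        // Advertise ourselves and browse for other peers at the same time.
        advertiser = MCNearbyServiceAdvertiser(peer: myPeerID, discoveryInfo: nil,
                                               serviceType: Self.serviceType)
        advertiser.delegate = self
        advertiser.startAdvertisingPeer()
        browser = MCNearbyServiceBrowser(peer: myPeerID, serviceType: Self.serviceType)
        browser.delegate = self
        browser.startBrowsingForPeers()
    }

    func sendToAllPeers(_ data: Data) {
        try? session.send(data, toPeers: session.connectedPeers, with: .reliable)
    }
}

extension MultipeerSession: MCSessionDelegate {
    func session(_ session: MCSession, didReceive data: Data, fromPeer peerID: MCPeerID) {
        receivedDataHandler(data, peerID)
    }
    // Remaining MCSessionDelegate requirements are unused in this demo.
    func session(_ session: MCSession, peer peerID: MCPeerID, didChange state: MCSessionState) {}
    func session(_ session: MCSession, didReceive stream: InputStream, withName streamName: String, fromPeer peerID: MCPeerID) {}
    func session(_ session: MCSession, didStartReceivingResourceWithName resourceName: String, fromPeer peerID: MCPeerID, with progress: Progress) {}
    func session(_ session: MCSession, didFinishReceivingResourceWithName resourceName: String, fromPeer peerID: MCPeerID, at localURL: URL?, withError error: Error?) {}
}

extension MultipeerSession: MCNearbyServiceAdvertiserDelegate {
    func advertiser(_ advertiser: MCNearbyServiceAdvertiser, didReceiveInvitationFromPeer peerID: MCPeerID,
                    withContext context: Data?, invitationHandler: @escaping (Bool, MCSession?) -> Void) {
        invitationHandler(true, session) // auto-accept: fine for a demo, not for production
    }
}

extension MultipeerSession: MCNearbyServiceBrowserDelegate {
    func browser(_ browser: MCNearbyServiceBrowser, foundPeer peerID: MCPeerID, withDiscoveryInfo info: [String: String]?) {
        browser.invitePeer(peerID, to: session, withContext: nil, timeout: 10)
    }
    func browser(_ browser: MCNearbyServiceBrowser, lostPeer peerID: MCPeerID) {}
}
```

The view controller constructs this with a closure pointing at its receivedData(_:from:) method, so all peer traffic funnels through one place.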

Tap to view Swift Code (View Controller)

import UIKit
import SceneKit
import ARKit
import MultipeerConnectivity

class ViewController: UIViewController, ARSCNViewDelegate, ARSessionDelegate {
	// MARK: - IBOutlets
    
    @IBOutlet weak var sessionInfoView: UIView!
    @IBOutlet weak var sessionInfoLabel: UILabel!
    @IBOutlet weak var sceneView: ARSCNView!
    @IBOutlet weak var sendMapButton: UIButton!
    @IBOutlet weak var mappingStatusLabel: UILabel!
    
    // MARK: - View Life Cycle
    
    var multipeerSession: MultipeerSession!
    var audioSpatializer = AudioSpatializer()
    let soundIdentities = [
    	"pine", 
    	"bird", 
    	"bird2", 
    	"frog", 
    	"monkey", 
    	"rain", 
    	"wind"
    ]
    var soundIdentity = "pine"

    // Usual setup code here (e.g. viewDidLoad)

    // MARK: - Multiuser shared session
	    
    /// - Tag: PlaceCharacter
    @IBAction func handleSceneTap(_ sender: UITapGestureRecognizer) {
        // Hit test to find a place for a virtual object
        guard let hitTestResult = sceneView.hitTest(
            sender.location(in: sceneView),
            types: [.existingPlaneUsingGeometry, .estimatedHorizontalPlane]
        ).first else { return }

        // Place an anchor for a virtual character.
        let anchor = ARAnchor(
        	name: soundIdentity, 
        	transform: hitTestResult.worldTransform
        )
        sceneView.session.add(anchor: anchor)

        // Send the anchor info to peers
        guard let data = try? NSKeyedArchiver.archivedData(
            withRootObject: anchor,
            requiringSecureCoding: true
        ) else { fatalError("can't encode anchor") }
        self.multipeerSession.sendToAllPeers(data)
    }

    /// - Tag: ReceiveData
    var mapProvider: MCPeerID?
    func receivedData(_ data: Data, from peer: MCPeerID) {
        do {
            if let worldMap = try NSKeyedUnarchiver.unarchivedObject(
            	ofClass: ARWorldMap.self,
				from: data
			) {
                // Run the session with the received world map
                let configuration = ARWorldTrackingConfiguration()
                configuration.planeDetection = .horizontal
                configuration.initialWorldMap = worldMap
                sceneView.session.run(
                	configuration, 
                	options: [.resetTracking, .removeExistingAnchors]
                )
                // Remember who provided the map for showing UI feedback
                mapProvider = peer
            }
            else if let anchor = try NSKeyedUnarchiver.unarchivedObject(
            	ofClass: ARAnchor.self,
            	from: data
            ) {
                // Add anchor to the session
                sceneView.session.add(anchor: anchor)
            }
            else {
                print("unknown data received from \(peer)")
            }
        } catch {
            print("can't decode data received from \(peer)")
        }
    }
}
						

The missing piece of the puzzle was getting the audio to correspond with the objects placed in the scene, each one emitting its own sound source. To do this, I created my own AudioSpatializer class, which uses PHASE to place a sound source for each sound object.

Tap to view Swift Code (Audio Spatializer)

import PHASE
import ModelIO // for MDLMesh
import SwiftUI
import CoreMotion

class AudioSpatializer {
	// The audio engine for PHASE
    let phaseEngine = PHASEEngine(updateMode: .automatic)
    // Reference frame (used for calibration; here we simply use the identity, or a value measured at the start of motion)
    private var referenceFrame = matrix_identity_float4x4
    // The listener in the scene
    var listener: PHASEListener!
    // The sound source in the scene
    var source: PHASESource!
    var sources: [PHASESource] = []
    init() {
        // Create a Listener.
        listener = PHASEListener(engine: phaseEngine)
        // Set the Listener's transform to the origin with no rotation.
        listener.transform = referenceFrame
        // Attach the Listener to the Engine's Scene Graph via its Root Object.
        // This activates the Listener within the simulation.
        try! phaseEngine.rootObject.addChild(listener)
        // Create an Icosahedron Mesh.
        let mesh = MDLMesh.newIcosahedron(
            withRadius: 1.0, inwardNormals: false, allocator: nil
        )
        // Create a Shape from the Icosahedron Mesh.
        let shape = PHASEShape(engine: phaseEngine, mesh: mesh)
        // Create a Volumetric Source from the Shape.
        source = PHASESource(engine: phaseEngine, shapes: [shape])
        // Translate the source to the origin
        source.transform = referenceFrame
        source.gain = 1.0 // PHASE expects gain in the range [0, 1]
        // Attach the Source to the Engine's Scene Graph.
        // This activates the Source within the simulation.
        try! phaseEngine.rootObject.addChild(source)
        let pipeline = PHASESpatialPipeline(
        	flags: [.directPathTransmission, .lateReverb]
        )!
        let mixer = PHASESpatialMixerDefinition(spatialPipeline: pipeline)
        pipeline.entries[PHASESpatialCategory.lateReverb]!.sendLevel = 0.1
        phaseEngine.defaultReverbPreset = .mediumRoom
        // Register the sound asset to play
        try! phaseEngine.assetRegistry.registerSoundAsset(
            url: Bundle.main.url(
                forResource: "Forest_Amb",
                withExtension: "wav"
            )!,
            identifier: "forest",
            assetType: .resident,
            channelLayout: nil,
            normalizationMode: .dynamic
        )
        // Do this for all the other sounds/objects ...
       	// Then create the sampler and load it with the first sound
        let samplerNodeDefinition = PHASESamplerNodeDefinition(
            soundAssetIdentifier: "forest",
            mixerDefinition: mixer // the spatial mixer configured above
        )
        // Loop playback and set the Sampler Node's calibration to Relative SPL at 0 dB.
        samplerNodeDefinition.playbackMode = .looping
        samplerNodeDefinition.setCalibrationMode(
            calibrationMode: .relativeSpl, level: 0.0
        )

        try! phaseEngine.assetRegistry.registerSoundEventAsset(
            rootNode:samplerNodeDefinition,
            identifier: "nature_event"
        )

        // Associate the Source and Listener with the Spatial Mixer
        let mixerParameters = PHASEMixerParameters()
        mixerParameters.addSpatialMixerParameters(
        	identifier: mixer.identifier, source: source, listener: listener
        )

        // Associate the event with the mixer
        let soundEvent = try! PHASESoundEvent(
            engine: phaseEngine,
            assetIdentifier: "nature_event",
            mixerParameters: mixerParameters
        )

        // Start the engine and sound playback
        try! phaseEngine.start()
        soundEvent.prepare()
        soundEvent.start(completion: nil)
    }

}
						

Two audio tasks remained: updating the sound relative to where the person is standing in the world, and adding sources when anchor data is received from a peer. I also noticed that, despite configuring PHASE, the sound sources were not attenuating as the listener moved further away from them. This was addressed by applying the inverse square law in the updateListenerTransform(updatedTransform: simd_float4x4) method.
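In isolation, the attenuation rule is simple. Here is a framework-free sketch (the helper name is hypothetical), clamped so sources closer than one meter are not boosted past full volume:

```swift
/// Inverse-square gain falloff, clamped to PHASE's valid [0, 1] gain range.
func inverseSquareGain(distance: Double) -> Double {
    guard distance > 0 else { return 1.0 }
    return min(1.0, 1.0 / (distance * distance))
}

// Doubling the distance quarters the gain, matching how a point
// source behaves in free space.
```

The methods below apply this same rule directly to each PHASESource's gain as the listener moves.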

Tap to view Swift Code (Audio Spatializer Methods)

// Update the listener's position and orientation relative to each source
// and attenuate gain using inverse square law
func updateListenerTransform(updatedTransform: simd_float4x4) {
    listener.transform = updatedTransform
    for source in sources {
        let distance = sqrt(
        	pow(source.transform.columns.3.x - updatedTransform.columns.3.x, 2)
        	+ pow(source.transform.columns.3.y - updatedTransform.columns.3.y, 2)
        	+ pow(source.transform.columns.3.z - updatedTransform.columns.3.z, 2)
        )
        // Clamp to PHASE's valid [0, 1] gain range.
        source.gain = min(1.0, Double(1 / pow(distance, 2)))
    }
}
// Add a source to the audio scene with a specific transform
func addSource(withTransform transform: simd_float4x4, identifier: String) {
    let mesh = MDLMesh.newIcosahedron(
        withRadius: 1.0, inwardNormals: false, allocator: nil
    )
    // Create a Shape from the Icosahedron Mesh.
    let shape = PHASEShape(engine: phaseEngine, mesh: mesh)
    // Create a Volumetric Source from the Shape.
    source = PHASESource(engine: phaseEngine, shapes: [shape])
    // Position the source at the supplied transform.
    source.transform = transform
    // Attach the Source to the Engine's Scene Graph.
    // This activates the Source within the simulation.
    try! phaseEngine.rootObject.addChild(source)
    sources.append(source)
    let pipeline = PHASESpatialPipeline(
    	flags: [.directPathTransmission, .lateReverb]
    )!
    let mixer = PHASESpatialMixerDefinition(spatialPipeline: pipeline)
    pipeline.entries[PHASESpatialCategory.lateReverb]!.sendLevel = 0.1
    phaseEngine.defaultReverbPreset = .mediumRoom
    // Associate the Source and Listener with the Spatial Mixer
    let mixerParameters = PHASEMixerParameters()
    mixerParameters.addSpatialMixerParameters(
    	identifier: mixer.identifier, source: source, listener: listener
    )
    let samplerNodeDefinition = PHASESamplerNodeDefinition(
        soundAssetIdentifier: identifier,
        mixerDefinition: mixer
    )

    // Set the Push Node's Calibration Mode to Relative SPL and Level to 0 dB.
    samplerNodeDefinition.playbackMode = .oneShot // Only play the sound once
    samplerNodeDefinition.setCalibrationMode(
        calibrationMode: .relativeSpl, level: 0.0
    )

    // Register the new sampler (sound object)
    try! phaseEngine.assetRegistry.registerSoundEventAsset(
        rootNode: samplerNodeDefinition,
        identifier: "nature_event\(sources.count)"
    )
    let soundEvent = try! PHASESoundEvent(
        engine: phaseEngine,
        assetIdentifier: "nature_event\(sources.count)",
        mixerParameters: mixerParameters
    )
    soundEvent.prepare()
    soundEvent.start()
}
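One detail the listing above leaves open is what calls updateListenerTransform in the first place. One option (a sketch, not part of the original listing) is the ARSessionDelegate callback that fires on every new frame, using the camera pose as the listener pose; this assumes sceneView.session.delegate = self was set during setup:

```swift
// In the view controller, which already adopts ARSessionDelegate.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // The camera transform is the device's pose in world coordinates,
    // which doubles as the listener's pose for spatial audio.
    audioSpatializer.updateListenerTransform(updatedTransform: frame.camera.transform)
}
```

This keeps the audio scene in lockstep with head movement at the AR frame rate, with no extra timers needed.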
						

Once the custom audio class was set up, it was simple to integrate into the existing view controller methods where the peers send and receive world data. Inside handleSceneTap(_ sender: UITapGestureRecognizer) and receivedData(_ data: Data, from peer: MCPeerID), the audio source could be updated from the anchor transform by calling audioSpatializer.addSource(withTransform: anchor.transform, identifier: anchor.name!) right after the anchor is added to the scene (sceneView.session.add(anchor: anchor)). The result ended up sounding something like this (headphones recommended for the best experience):
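For example, the receiving side of the integration amounts to one added line after decoding a peer's anchor (a sketch of the relevant lines only):

```swift
// Inside receivedData(_:from:), after decoding an ARAnchor:
sceneView.session.add(anchor: anchor)
// New: mirror the peer's sound object locally, keyed by the
// sound identity carried in the anchor's name.
audioSpatializer.addSource(withTransform: anchor.transform, identifier: anchor.name!)
```

Because the anchor's name carries the sound identity ("pine", "bird", and so on), both devices spawn the same sound at the same spot in the shared world.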

Spatialized Sound Objects (Headphones Recommended)