Spatial Sound

Large Room-scale Virtual Reality in Audio Visual Arts using Sensors and Biofeedback

Pierre Jolivet, Paul Oomen, Gábor Pribék


November 2019

Pierre Jolivet (UCD SMARTlab)
Paul Oomen (Spatial Sound Institute, 4DSOUND)
Gábor Pribék (MOME)

published by:
LINKs serie 3-4, Spatial Sound Institute

The paper outlines the possibilities of a new transdisciplinary praxis for sound art. It highlights the key current technologies that can serve as creative tools for this praxis, without claiming to be exhaustive. It also provides real-life case studies and outlines the areas where further research is possible or desirable. Inside-out tracking systems, with tracking areas as large as the physical environments that surround us, are now a reality. This can set standards and provide flexibility in large stage or gallery-based live music performances. VR combined with an omnidirectional sound environment and a new untethered freedom of movement allows the replication of a physical surrounding, where sound and visuals can be composed by spatial parameters within volume variables for audience immersion or virtual concerts. An artist could compose using sensors, both environmental and biofeedback-related, through EEG, EKG and even forthcoming haptic suits. In the context of sound spatialisation, all of these can be controlled through OSC, as 4DSOUND has the protocol at its core.

Mixed Reality, Sound Art, Sensor, Biofeedback, Live Performance

1. Introduction

1.1 Multimodal interaction as a creative tool

This paper grew out of the realisation that sound art can be creatively expanded beyond hearing to involve the other senses, specifically sight and touch (taste and smell are not part of this paper but could open up further research possibilities). The artworks can use a wide range of automated or live-performed input and output methods, especially biofeedback and mixed-reality-based spatial parameters that extend or alter a physical environment. Accessible technology interconnected through open-source communication protocols such as Open Sound Control (OSC) makes possible seamless multi-platform systems in which sensors, sound and visuals communicate with each other in an embedded environment. More robust low-level integration of technologies, e.g. through shared memory and full-duplex data transmission, is offered by emerging C++ platforms like the NAP framework.

Experimental environment using Unity, Processing, Spout and weather asset (rain)

1.2 Transdisciplinary Praxis

Computational developments have made the technologies needed for the input and output methods (biofeedback sensors, head-mounted displays, speakers, CPUs and GPUs) accessible at a consumer level. The technology has been around, albeit in primitive form, since the last century, but was restricted to research labs. It has now become economically viable to create modular set-ups. In the current state of technology, interaction with these systems is not always intuitive and natural. Ergonomic design solutions are needed to provide user-friendly experiences in which artists or audience can confidently participate. Integrating these input and output methods into one modular system stimulates the senses in a way that creates a new language between the artist and the audience by interconnecting the virtual and physical space. The concentric diagram below exemplifies the concept of the described transdisciplinary praxis.

2. Technology

2.1 Virtual Reality

Mixed-Reality stimulates the senses (sight, hearing, touch, smell, taste) through an illusion that makes users believe, relative to their physical presence, that they are situated either in another, simulated environment (Virtual Reality) or in a physical environment extended with simulated content (Augmented Reality) (1). Mixed-Reality is everything in between, where physical and virtual worlds can be blended or switched at any time.

With the arrival of Oculus Rift, HTC Vive and HoloLens in 2016 (and the list could go on), Virtual, Augmented and Mixed Reality (VR, AR, MR) created a stir in the audio-visual world and brought a more compact MR device a step closer. HTC Vive could even do room-scale tracking using two base stations (Lighthouse tracking system) covering a 5 m x 5 m area, and the Lighthouse 2.0 tracking system now goes beyond that. Oculus Rift S and the standalone headset Oculus Quest (no PC required) introduced inside-out tracking for consumers in 2019: external sensors are no longer needed, and tracking is handled by multiple cameras placed on the headset itself. Computer vision technology makes this possible and goes beyond positional tracking of the headset with six degrees of freedom: built-in hand tracking is coming to the Oculus Quest in 2020. This will allow more natural interactions (no need for VR controllers), so the user can easily interact with virtual and physical objects simultaneously (previously achieved with a connected Leap Motion sensor).

These headsets bring closer the possibility of supplementing or replacing the physical environment in which a live performance is held with spatial visuals, sounds and hand-based gestures.

Oculus Quest hand-tracking concept (Excerpt from

2.2 Omnidirectional Sound Environment

The large room-scale application of VR, especially with a large audience moving freely within the performance environment, requires a system that can compute and convincingly project dimensional sound sources in space without the sound image being limited to a sweet spot. In parallel to the visual possibilities of VR, this means not only surround or spherical projection of sound around the audience, but also the ability to place sounds physically anywhere inside an environment, among and in between the audience, as well as at any distance outside the physical boundaries of the performance space. This allows the sounds to be interconnected with their visual representation in VR.

4DSOUND has pioneered the design of omnidirectional sound environments since 2012. An example of a venue equipped with such a system is the Spatial Sound Institute in Budapest, covering an active listening area of 300 m².

4DSOUND system at the Spatial Sound Institute, Budapest.

High-end omnidirectional sound environments such as those built by 4DSOUND with Omniwave speaker technology come with a considerable price tag, but more affordable alternatives have been employed in numerous projects, such as a recently developed 3D-printed omnidirectional speaker by LASG and Bloomline Acoustics that costs under $100 apiece.

The 4DSOUND system is driven by a 4D Engine that processes discrete audio inputs together with matching metadata describing the characteristics and behaviour of virtual sound sources in space, including position, dimension, rotation and perceptive characteristics. All parameters of the 4D Engine are accessible via OSC (Open Sound Control), which allows core integration with other platforms such as VR devices, sensor aggregates and haptic systems.
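As a minimal sketch of how such source metadata might be organised before being sent over OSC, the following Python fragment groups position, dimension and rotation into one object and turns them into address/argument pairs. The class name, field names, units and the /source/<n>/... address scheme are all assumptions made for illustration; they are not the documented 4D Engine address space.

```python
from dataclasses import dataclass


@dataclass
class VirtualSource:
    """Spatial metadata for one virtual sound source (illustrative fields only)."""
    index: int
    position: tuple = (0.0, 0.0, 0.0)    # x, y, z in metres (assumed unit)
    dimension: tuple = (1.0, 1.0, 1.0)   # width, height, depth in metres (assumed unit)
    rotation: tuple = (0.0, 0.0, 0.0)    # degrees around x, y, z (assumed unit)

    def to_osc(self):
        """Return one (address, arguments) pair per parameter."""
        base = f"/source/{self.index}"   # hypothetical address scheme
        return [
            (f"{base}/position", list(self.position)),
            (f"{base}/dimension", list(self.dimension)),
            (f"{base}/rotation", list(self.rotation)),
        ]


src = VirtualSource(index=1, position=(2.0, 1.5, -4.0))
messages = src.to_osc()
for address, args in messages:
    print(address, args)
```

Keeping one message per parameter means a sensor or VR device can update, say, only the position of a source without resending the rest of its state.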

With the release of 4DSOUND v2.0, built with the NAP Framework, new possibilities emerge for deeper systems integration. Inspired by game engine architecture, NAP includes powerful rendering capabilities and makes it possible to connect different technologies and domains, such as sound, light, kinetics, microprocessors, sensors and mobile devices, in one responsive, stable and durable core. NAP provides a thin layer that hosts a modular workflow for such complex system applications, and offers flexible, deep integration of specific third-party libraries like ImGui and Faust.

2.3 Physical-Virtual vs Virtual Audio Systems

The state of the art in VR sound is binaural audio with personalised real-time rendering of the Head-Related Transfer Function (HRTF), delivering a virtual sound space primarily based on directional cues (2). An omnidirectional sound environment offers an interesting alternative, with increasing advantages when catering to a large audience within one experience.

Compared to the built-in headphones of VR headsets, sound coming from the physical environment around the audience provides listeners with spatial cues that correlate better with the acoustics of the performance space and, as a result, gives better depth-of-field to the projected sound images. An omnidirectional sound environment thus represents a virtual sound space projected within, or blending with, the physical space one is in. Because audience and sound waves are embedded in one physical space, the system is inherently better matched to the acoustics of the listeners' bodies: sound waves propagating in physical space are received by the full body, whereas directional cues presented exclusively to the ears lack acoustic correlation with the rest of the body.

Besides the qualitative improvements to the experience, the omnidirectional sound environment bypasses other limitations of binaural audio applications. These include the need to match personalised HRTFs to the shape of the individual listener's auricle to optimise the effectiveness of the directional cues, and the fact that coherent dimensional projection is limited to sound sources very close to the ears, or to sources dimensionally smaller than or equal to the size of the sound transmitters, i.e. the built-in headphones of the VR headsets.

2.4 Hybrid Audio Systems for Far- and Very-Near-Field Sound

A better alternative for including personalised spatial audio comes with the introduction of acoustically transparent headphones that deliver sound through bone conduction. AfterShokz has recently brought several consumer-ready models to market that support wireless audio transmission, although questions remain about the level of fidelity such headphones provide.

AfterShokz wireless bone-conducting headphones

The ergonomics of carrying such devices also need to be considered, especially when they are worn together with a head-mounted display. Bone-conduction headphones are already being integrated into VR headsets. HoloLens and Oculus Quest approach the problem with a hybrid solution: small speakers placed very near the ears on the head strap itself allow the wearer to hear the real surroundings and the simulated environment simultaneously, both spatially. Another valid alternative could be the recent Bose Soundwear speakers, powered by Bose's own waveguide technology; claims are made about higher-fidelity sound and, as they are worn on the shoulders, they might increase the participant's physical comfort.

Bose Soundwear with personalised projection of binaural audio through waveguide technology

Including such headphones could make it possible to reproduce selected sound sources that are small and occur in the very near field (<0.3 m from the ears of the listener). Such very-near-field proximity is difficult to achieve within physically installed sound environments. This can work in fluent conjunction with an omnidirectional sound environment that takes over the reproduction once sound sources have a virtual distance of >0.3 m from the perceiver and/or when they exceed the range of the headphone's sound transmitters in dimensionality. As such, the two reproduction methods driven by one 4DSOUND engine create a hybrid system that has the potential to produce fluency across the very near field and the far field of the virtual sound space.
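The hand-over between the two reproduction methods can be sketched as a distance-dependent crossfade. The Python fragment below is only an illustration under stated assumptions: the 0.3 m boundary comes from the text, while the 0.1 m blend zone, the equal-power law and the function name are our own choices and do not describe how the 4DSOUND engine actually performs the transition.

```python
import math

CROSSOVER_M = 0.3   # boundary between very near field and far field (from the text)
BLEND_M = 0.1       # assumed width of the transition zone around the boundary


def hybrid_gains(distance_m):
    """Return (headphone_gain, room_gain) for a source at distance_m from the listener."""
    lo = CROSSOVER_M - BLEND_M / 2
    hi = CROSSOVER_M + BLEND_M / 2
    # t runs from 0 (headphones only) to 1 (room system only)
    t = min(1.0, max(0.0, (distance_m - lo) / (hi - lo)))
    # equal-power law: the squared gains always sum to 1
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


for d in (0.1, 0.3, 1.0):
    head, room = hybrid_gains(d)
    print(f"{d:.1f} m  headphones={head:.3f}  room={room:.3f}")
```

An equal-power crossfade is used here because it keeps the summed acoustic power roughly constant while a source crosses the boundary; a hard switch at 0.3 m would be simpler but could produce an audible jump.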

The complexity of broadcasting audio and data from one central sound engine to multiple individual rendering units remains to be considered, as do the logistics of providing an individual engine for each headphone system to render personalised audio in real time. Nevertheless, advances are being made to run such micro-engines on portable single-board computers such as the Raspberry Pi.

2.5 Environmental Sensors

New off-the-shelf single-board computers are now available, with the Raspberry Pi setting a standard as a quad-core, USB-powered device with GPIO (General Purpose Input Output) pins. The GPIO allows the system to receive HATs (Hardware Attached on Top) such as the Sense HAT, which includes on a single board a gyroscope, accelerometer, magnetometer, and temperature, barometric pressure and humidity sensors, or the much more compact Enviro pHAT, with a temperature/pressure sensor, a light and RGB colour sensor, an accelerometer/magnetometer and a 4-channel 3.3 V analogue-to-digital converter (ADC).

Raspberry Pi equipped with an Enviro pHAT connected via Ethernet to a LAN (Local Area Network)

An electromagnetic field-translating device (Elektrosluch) has been utilized by one of the authors (Pierre Jolivet) to fully control an omnidirectional sound environment through the OSC protocol in the interactive sound installation Mémétique Élucubrations. The large-scale 4DSOUND environment at the Spatial Sound Institute was an integral part of the composition as the audience was required to navigate through a computer self-generated analog-to-digital soundscape to experience a complex auditory alteration of the sound.

2.6 Biofeedback

In parallel, non-intrusive EEG headbands like Muse are small enough to be integrated with a VR headset (albeit with ergonomic difficulty) for brain-controlled events. An OSC host could even be set up on a smartphone or tablet for mobility. Neurable created a versatile system incorporating six dry sensors as a strap replacement, but it is restricted to vertical applications due to high equipment cost. Another interesting and complementary option is MySignals from Cooking Hacks, a development platform in the form of an Arduino shield with 18 possible sensors. An Arduino could easily be integrated with a Raspberry Pi or used as a stand-alone IoT (Internet of Things) device.

Muse 2 headband (includes EKG)

Following in the footsteps of Alvin Lucier's pioneering Music for Solo Performer (1965), NUE by the Korean artist Lisa Park uses a custom communication platform between MindWave and 4DSOUND: a custom-built smartphone app translates the real-time brainwave data into OSC messages for spatial control of sound.
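To illustrate the kind of translation step such an app might perform, the sketch below smooths a stream of normalised brainwave readings and scales them to a spatial parameter. Everything specific here is an assumption for illustration: the 0 to 1 input range, the 0 to 4 m output range (imagined as the height of a sound source), the smoothing factor and the class name are not taken from NUE or from any headset SDK.

```python
class SmoothedMapper:
    """Exponentially smooth a normalised reading and scale it to an output range."""

    def __init__(self, out_min, out_max, alpha=0.2):
        self.out_min = out_min
        self.out_max = out_max
        self.alpha = alpha      # 1.0 = no smoothing, values near 0 = heavy smoothing
        self.state = None       # last smoothed value, None until the first reading

    def update(self, value):
        """Take a reading in 0..1 and return the smoothed, scaled output."""
        value = min(1.0, max(0.0, value))   # clamp out-of-range readings
        if self.state is None:
            self.state = value
        else:
            self.state = self.alpha * value + (1 - self.alpha) * self.state
        return self.out_min + self.state * (self.out_max - self.out_min)


# Hypothetical use: map a relative band power to a source height of 0..4 m
mapper = SmoothedMapper(out_min=0.0, out_max=4.0, alpha=0.5)
heights = [mapper.update(v) for v in (1.0, 0.0, 0.0)]
print(heights)
```

Smoothing matters in this context because raw EEG-derived values tend to fluctuate quickly; without it, a sound source driven by the data would jitter in space rather than move.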

Lisa Park testing NeuroSky EEG Headband within a 4DSOUND-built environment

Custom smartphone OSC aggregate between MindWave and 4DSOUND

The work Body Echoes captures the inner movements in the body of yoga master Amanda Morelli through a grid of custom-built Arduino controllers and contact microphones attached to Morelli's body. The raw, noisy sounds of Morelli's blood flow and breath were linked to the sensor-captured biodata, sent through OSC to 4DSOUND and translated into audible spatial movements correlating with the energy flows within Morelli's body as she performs various yoga positions.

Amanda Morelli performing yoga postures in 4DSOUND wired to Arduino sensors and microphones

Last but not least, haptic integration presents itself in the form of the Teslasuit, a suit merging 80 electrostimulation channels (tactile feedback), 10 motion capture sensors (motion tracking) and 5 temperature channels. A more basic but affordable modular suit (sub-€500) has been developed by bHaptics, which makes for healthy competition.

3. Spatial Performance

3.1 OSC in Large Room-scale Situation

OSC in a large room-scale situation offers an approach to a new VR compositional feature, incorporating procedural and self-generative art for sound and visuals. As previously stated, the technology is now reaching maturity for creative implementation within a performing space. Data can flow in multiple directions between OSC hosts and clients: communication is addressed by IP address, port and address path, and messages can carry integer, floating-point, string or blob (binary) arguments.

OSC Tag    Data Class
i          Integer
f          Floating-point
s          String
b          Blob
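As a hedged illustration of the four argument types listed above, the following Python sketch encodes an OSC message using only the standard library. It follows the OSC 1.0 binary layout as we understand it (NUL-terminated strings padded to a multiple of four bytes, big-endian 32-bit integers and floats, length-prefixed blobs). The address /test and the function names are illustrative and are not taken from any particular OSC library.

```python
import struct


def _pad(data: bytes) -> bytes:
    """Pad with NUL bytes up to a multiple of four bytes."""
    return data + b"\x00" * (-len(data) % 4)


def _osc_string(text: str) -> bytes:
    """OSC strings are NUL-terminated and then padded."""
    return _pad(text.encode("ascii") + b"\x00")


def encode_message(address, *args):
    """Encode an OSC message whose arguments are int, float, str or bytes."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are outside the four tags covered here")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)          # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)          # big-endian float32
        elif isinstance(arg, str):
            tags += "s"
            payload += _osc_string(arg)
        elif isinstance(arg, (bytes, bytearray)):
            tags += "b"
            payload += struct.pack(">i", len(arg)) + _pad(bytes(arg))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return _osc_string(address) + _osc_string(tags) + payload


packet = encode_message("/test", 1, 2.0, "hi", b"\x01\x02\x03")
print(len(packet), packet)
```

The resulting bytes can be sent as the payload of a single UDP datagram, which is the most common transport for OSC on a local network.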

The software needs to support the OSC protocol internally or via a third-party plugin or library. For example, Ableton Live requires Max for Live (Max/MSP from Cycling '74), while Processing, Python and Unity each need a third-party library or asset to communicate through OSC. All in all, straightforward actions are accessible to anybody with beginner-level programming skills. It is quite easy to use OSC as a structural core for routing data of various types in and out of a room-scale spatial experience.

With headsets such as Oculus Quest, it is now possible to walk around freely in, for instance, the sound environment of the Spatial Sound Institute in Budapest, which covers 300 m². A virtual representation of the environment is needed for safe navigation in the space, and the users in it should also be represented by avatars at their exact locations. Another solution is the passthrough feature, which feeds a stereo video stream to the device in real time so that users remain aware of their surroundings while walking around. A third solution is to use augmented-reality headsets with semi-transparent displays such as HoloLens or Magic Leap, through which users see the real environment extended with rendered imagery. Numerous headsets are needed to include the audience in the experience generated by the audio-visual artists, which is now feasible as the cost of the devices has come down (Oculus Quest). A local high-bandwidth server is needed to handle the data flow through OSC (or alternative network communication protocols) and ensure real-time communication between multiple devices. This technology needs further research and development for successful and safe implementation.
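A rough feel for the load on such a server can be obtained with a back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that every headset sends one OSC pose message (three position floats plus a four-float rotation quaternion) 72 times per second and that the server relays each message to every other headset. The address /user/00/pose, the update rate and the relay topology are assumptions, and link-layer overhead is ignored.

```python
def padded(n):
    """Round n up to the next multiple of four (OSC alignment)."""
    return (n + 3) // 4 * 4


def osc_size(address, n_floats):
    """Size in bytes of an OSC message carrying n_floats float32 arguments."""
    addr = padded(len(address) + 1)     # address plus NUL terminator
    tags = padded(1 + n_floats + 1)     # ',' + one 'f' per float + NUL terminator
    return addr + tags + 4 * n_floats


def relay_load(headsets, rate_hz=72, n_floats=7, address="/user/00/pose"):
    """Estimate packet rates and outbound bandwidth for a central relay server."""
    packet = osc_size(address, n_floats) + 28          # + IPv4 (20) and UDP (8) headers
    inbound = headsets * rate_hz                       # packets per second received
    outbound = headsets * (headsets - 1) * rate_hz     # packets per second relayed
    return {
        "packet_bytes": packet,
        "in_pps": inbound,
        "out_pps": outbound,
        "out_mbit_s": outbound * packet * 8 / 1e6,
    }


load = relay_load(20)
print(load)
```

Under these assumptions the outbound traffic grows with the square of the number of headsets, which suggests that the packet rate on a shared wireless network, rather than raw bandwidth, is likely to become the first bottleneck.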

Screen from video, Dead and Buried Arena game at Oculus Connect 5 in 2018

3.2 Effective Scalability

Ableton Live has become an industry standard, not only as a host for plug-ins but also as a platform for software development. Max for Live devices are programmable add-ons that allow effects and instruments to be developed and that can serve as MIDI/OSC communication tools. The OSCular Collection, originally from EraserMice, brings OSC support to Ableton as both host and client. Any OSC feed can be captured, analysed and processed.

OSCular receiver capturing data from Muse and assigning to Reaktor (Curve to Jitter for adjustments)

This series of devices is well suited to sound art or even conventional music through OSC <-> MIDI conversion. For our topic it is paramount, as control streams, especially from Muse, could be set to manipulate aspects of 4DSOUND. The same could be done directly with Raspberry Pi-based sensors using Python scripts, with value tuning as the data output may vary.
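The OSC-to-MIDI direction of such a conversion can be sketched in a few lines. The example below is not how the OSCular devices are implemented; it only shows the general idea of clamping and scaling a floating-point control value to the 7-bit range of a MIDI Control Change message. The function names and the chosen controller number are illustrative.

```python
def to_midi_cc(value, in_min=0.0, in_max=1.0):
    """Scale a float from [in_min, in_max] to a 7-bit MIDI value (0..127)."""
    norm = (value - in_min) / (in_max - in_min)
    norm = min(1.0, max(0.0, norm))     # clamp readings outside the expected range
    return round(norm * 127)


def control_change(channel, controller, value):
    """Build the three raw bytes of a MIDI Control Change message."""
    status = 0xB0 | (channel & 0x0F)    # 0xB0 = Control Change, low nibble = channel
    return bytes([status, controller & 0x7F, value & 0x7F])


cc = to_midi_cc(0.25)
message = control_change(channel=0, controller=7, value=cc)
print(cc, message.hex())
```

The conversion is lossy: a 32-bit OSC float is reduced to 128 steps, which is one reason to keep control streams in OSC where the receiving system supports it.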

Python script using the Sense HAT and an OSC library, with the barometric pressure integer output divided by 1000 to moderate the data sent to the client
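A script of the kind described in the caption above might look roughly like the following sketch. It is not the authors' original code: the sensor read is replaced by a stand-in function so that the example runs without hardware, the OSC message is assembled by hand instead of with an OSC library, and the address /sensor/pressure, the host and the port are assumptions.

```python
import socket
import struct


def read_pressure_mbar():
    """Stand-in for a real sensor read; returns a fixed, plausible integer value.

    On a Raspberry Pi with a Sense HAT this could instead call
    SenseHat().get_pressure() from the sense_hat package.
    """
    return 1013


def osc_float_message(address, value):
    """Encode an OSC message carrying a single float32 argument."""
    def osc_string(text):
        raw = text.encode("ascii") + b"\x00"
        return raw + b"\x00" * (-len(raw) % 4)   # pad to a multiple of four bytes
    return osc_string(address) + osc_string(",f") + struct.pack(">f", value)


def send_udp(packet, host="127.0.0.1", port=9000):
    """Send one datagram to the OSC client (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


scaled = read_pressure_mbar() / 1000             # the divider mentioned in the caption
packet = osc_float_message("/sensor/pressure", scaled)
print(scaled, len(packet))
```

Calling send_udp(packet) in a timed loop would stream the scaled value to the client; dividing by 1000 brings the reading close to 1.0, a range that is easier to map onto sound parameters.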

4DSOUND 2.0 uses a Python API for quick fine-tuning and rapid prototyping. Without recompiling or restarting the 4D Engine, it is possible to edit modules or modify behaviours in real time. The implementation of a scripting language opens the 4D Engine to creative coding that goes beyond the open back-end via OSC, but does not require the level of complexity of the C++ layer. However, even at the C++ level it is possible to create modules within a straightforward architecture, as it is designed for modular integration of effects and components, up to the design of new virtual entities with shared or unique properties within existing functionalities.

3.3 Geolocational Composition

The structural methods described so far converge toward a truly inclusive approach to composition in a new medium such as VR, in combination with sound art and spatialisation, and toward public inclusiveness intermingled with self-generative, data-driven production. Besides data inputs, Spout, for example, can create an intermediary channel for texture mapping, bringing translated sound, environmental or biofeedback data into VR. Note that Spout (or Syphon, its counterpart on the Mac platform) is currently restricted to bitmap output, with all the scalability limitations that implies.

Spout fed Unity scene with self-generated live texture mapping produced via a sensor-based Processing sketch

3.4 Tracer - Spatial sound composer

VR has been gaining popularity in the game and educational markets, and it is now time to set a standard for creative art practices in this medium. To do that, we have to consider two approaches: compositional and live performance.

Tracer already provides a framework for composition, using HTC Vive. It is an immersive interface with which composers develop their composition in a spatial sound environment (e.g. a 4DSOUND system) using VR. Tracer consists of two parts: an Ableton Live set for sound and a VR interface developed in Unity, which translates it into 3D space. Tracer therefore presents a somewhat extended, immersive interface for composers planning their composition on a 4DSOUND system. It is a prototyping tool that lets artists design the spatial composition using freely drawn paths.

Tracer OSC communication flow

The performer is able to draw paths in the virtual 3D space, and to assign Tracers – sound emitting objects – to these hand-drawn paths. A Tracer then emits the assigned sound to a dynamic position in virtual space. The performer has control over several sonic and spatial parameters directly from within the VR. They are able to manipulate these in real time, making it possible to create live performances and experiences. The tool in its current version is meant to be used by performers and not by an audience, but because the environment is being mapped and extended with real-time sound visualizations and spatial interactions beyond the auditory experience, the VR based visual experience in Tracer could be opened to the audience as well. Such experience would need further design decisions by the involved artists.
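The idea of a sound-emitting object travelling along a hand-drawn path can be illustrated with a simple polyline interpolation. The sketch below is not Tracer's actual implementation (which is built in Unity); it only shows, under our own assumptions, how a position could be computed for a given progress value along a list of recorded 3D points. The function name and the sample path are hypothetical.

```python
import math


def point_on_path(points, progress):
    """Return the (x, y, z) position at progress (0..1) along a polyline.

    Progress is measured by arc length, so the object moves at constant
    speed regardless of how densely the path was sampled while drawing.
    """
    progress = min(1.0, max(0.0, progress))
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = progress * sum(lengths)            # distance to travel from the start
    for (a, b), seg in zip(segments, lengths):
        if seg > 0 and target <= seg:
            t = target / seg                    # fraction along this segment
            return tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))
        target -= seg
    return tuple(points[-1])                    # end of path (or rounding leftovers)


# Hypothetical L-shaped path: 2 m along x, then 2 m along z
path = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.0, 2.0)]
print(point_on_path(path, 0.75))
```

The resulting coordinates could then be sent as the position of a virtual sound source, for instance once per rendered frame.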

Tracer can set a standard for creative integration of VR and sound environments, albeit in a conventional architecture where the creator takes centre stage, excluding the audience from actively taking part in the visual experience. With this paper, we intend to cover an additional sound art-oriented perspective, not excluding tools like Tracer but expanding toward a standpoint in which experience is central to the creative process.

Tracer interface in-situ at the Spatial Sound Institute

4. Conclusion

It is now not so far-fetched to imagine an even higher integration with Internet/5G-connected IoT devices for better and more diverse data streams (gas sensors, for example, are also available). The key issue will be cost, as an artist or a community of artists needs as much autarky from mercantilism as possible to achieve their full potential. This new tactic-in-creation could see the emergence of a Symbiotic Art, creating frames of reference with ecological aspects. One can envision compositional output as environmental statements in our current climatic precarity. A complementary feature not explored here is haptics, due to the complexity (and cost) of a suit aggregating the necessary sensors. The Teslasuit is currently the only full-body multifaceted system in development, but bHaptics now offers a basic low-cost modular torso-based product.

Full description of the Teslasuit (includes also motion capture, temperature control and biometrics)


1. Milgram, P., Takemura, H., Utsumi, A. and Kishino, F., 1995, December. Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-293). International Society for Optics and Photonics.

2. Geronazzo, M., 2019. User Acoustics with Head-Related Transfer Functions.