
Spatial Audio


The Google Virtual Reality (VR) SDK features a best-in-class audio rendering engine that is highly optimized for mobile VR. The goal of the engine is to give listeners a truly realistic spatial audio experience by replicating how sound waves interact with the environment and the listener's head and ears.



How Spatial Audio works

Spatial Audio is a powerful tool that you can use to control user attention. You can present sounds from any direction to draw a listener's attention and give them cues on where to look next. But most importantly, Spatial Audio is essential for providing a believable VR experience. When VR users detect a mismatch between their senses, the illusion of being in another world breaks down.

The Google VR SDK simulates the main audio cues humans use to localize sounds:

Interaural time differences.

Interaural level differences.

Spectral filtering done by our outer ears.


Interaural time differences: When a sound wave reaches a person's head, it takes a different amount of time to arrive at the listener's left and right ears. This time difference varies depending on where the sound source is in relation to the listener's head. The farther the source is to the left or right of the head, the larger this time difference becomes.
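To make the size of this cue concrete, here is a minimal Python sketch using Woodworth's spherical-head approximation of the interaural time difference; the head radius and the formula are textbook simplifications, not the SDK's internal model.

```python
import numpy as np

def interaural_time_difference(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth's spherical-head approximation of ITD.

    azimuth_rad: source azimuth relative to straight ahead
    (0 = front, pi/2 = directly to the listener's right).
    Returns the extra travel time, in seconds, to the far ear.
    """
    return (head_radius_m / speed_of_sound) * (np.sin(azimuth_rad) + azimuth_rad)

# A source directly to one side produces roughly a 0.66 ms difference,
# which is near the maximum ITD for an average-sized head.
print(interaural_time_difference(np.pi / 2))  # ~6.6e-4 s
```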


Interaural level differences: For higher-frequency sounds, humans can no longer rely on differences in arrival time. When a sound source lies to one side of the head, the ear on the opposite side sits within the head's acoustic shadow and receives a quieter signal. Above about 1.5 kHz, we mainly use these level (volume) differences between our ears to tell which direction sounds are coming from.

Spectral filtering: Sounds coming from different directions bounce off the inside of the outer ears in different ways. The outer ears modify the sound's frequencies in unique ways depending on the direction of the sound. These changes in frequency are what humans use to determine the elevation of a sound source.



Spatial Audio in Google VR

To simulate sound waves coming from virtual objects, we use a technology known as ambisonics to envelop the listener's head in a sphere of sound. The Google VR audio system surrounds the listener with a high number of virtual loudspeakers to reproduce sound waves coming from any direction in the listener's environment. The denser the array of virtual loudspeakers, the higher the accuracy of the synthesized sound waves.
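As a rough illustration of the idea, the sketch below encodes a mono signal into first-order ambisonics and projects the resulting sound field onto a set of virtual loudspeaker directions. The ACN/SN3D channel convention and the simple projection ("sampling") decoder are assumptions made for this example; the SDK's internal channel order and decoder design are not documented here.

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into first-order ambisonics (ACN order W, Y, Z, X; SN3D)."""
    w = mono
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    return np.stack([w, y, z, x])

def decode_to_virtual_speakers(foa, speaker_directions):
    """Project an ambisonic sound field onto virtual loudspeaker feeds.

    speaker_directions: list of (azimuth, elevation) pairs. Each feed would then
    be rendered binaurally through the HRTF for its direction (see the next sketch).
    A denser set of directions reproduces the sound field more accurately.
    """
    feeds = []
    for az, el in speaker_directions:
        gains = np.array([1.0,
                          np.sin(az) * np.cos(el),
                          np.sin(el),
                          np.cos(az) * np.cos(el)])
        feeds.append(gains @ foa)
    return np.stack(feeds)
```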



Virtual loudspeakers are made possible through the use of head-related transfer functions (HRTFs). The cues discussed in the previous section are captured within these HRTFs. When audio is played through HRTFs over headphones, the listener is fooled into thinking the sound is located at a particular point in 3D space.
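A minimal sketch of that last step, assuming the per-speaker head-related impulse responses (HRIRs) have already been measured or loaded from a dataset (not shown):

```python
import numpy as np

def binauralize(speaker_feeds, hrirs_left, hrirs_right):
    """Render virtual-loudspeaker feeds to a two-channel headphone signal.

    speaker_feeds: array of shape (num_speakers, num_samples).
    hrirs_left / hrirs_right: per-speaker HRIRs of shape (num_speakers, hrir_len).
    """
    out_len = speaker_feeds.shape[1] + hrirs_left.shape[1] - 1
    left = np.zeros(out_len)
    right = np.zeros(out_len)
    for feed, hl, hr in zip(speaker_feeds, hrirs_left, hrirs_right):
        # Convolving each feed with its ear-specific impulse response applies the
        # time, level, and spectral cues for that virtual speaker's direction.
        left += np.convolve(feed, hl)
        right += np.convolve(feed, hr)
    return np.stack([left, right])
```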

In the real world, as sound waves travel through the air, they bounce off of every surface in our environment, resulting in a complex mix of reflections. The Google VR SDK breaks this complex set of sound waves down into three components:


Direct sound

Early reflections

Late reverb



The first wave that hits our ears is the direct sound, which has travelled straight from the source to the listener. The farther a sound source is from the listener, the less energy the direct sound carries when it arrives, resulting in a lower volume than for closer sources.
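A simple inverse-distance gain illustrates this relationship; the reference and minimum distances below are illustrative defaults, and the SDK also offers other rolloff behaviors not modeled here.

```python
def distance_gain(distance_m, reference_distance_m=1.0, min_distance_m=0.1):
    """Inverse-distance attenuation of the direct sound relative to a source at
    the reference distance. Distances are clamped to avoid infinite gain."""
    return reference_distance_m / max(distance_m, min_distance_m)

print(distance_gain(2.0))  # a source at 2 m arrives at half the amplitude of one at 1 m
```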


The first few reflected waves that arrive at your ears are known as the early reflections. These give the listener an impression of the size and shape of the room they are in. The Google VR SDK spatializes the early reflections in real time by creating a new, artificial source for each of them.
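One common way to compute such reflections is a first-order image-source model for a box-shaped room, sketched below; the uniform absorption value and the shoebox assumption are simplifications for illustration, not the SDK's actual algorithm.

```python
import numpy as np

def first_order_reflections(source, listener, room_dims, absorption=0.3, c=343.0):
    """Delay and gain of the reflection off each of the six walls of a shoebox room.

    source, listener: (x, y, z) positions inside the room; room_dims: (Lx, Ly, Lz).
    Each returned reflection can be treated as a new, artificial source and
    spatialized from the direction of its mirrored image position.
    """
    reflections = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            image = np.array(source, dtype=float)
            image[axis] = 2.0 * wall - image[axis]          # mirror the source in the wall
            dist = np.linalg.norm(image - np.asarray(listener, dtype=float))
            delay_s = dist / c
            gain = (1.0 - absorption) / max(dist, 0.1)      # absorption plus distance loss
            reflections.append((delay_s, gain))
    return reflections
```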


Over time, the density of reflections arriving at your ears builds more and more until the individual waves are indistinguishable. This is what we refer to as the late reverb. The Google VR SDK has a powerful built-in reverb engine that can be used to very closely match the sound of real rooms. If you change the size of the room or the surface materials of the walls around you, the reverb engine reacts in real time and adjusts the sound waves to match the new conditions.
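As a rough stand-in for such a reverb engine, the sketch below sums a few feedback comb filters whose feedback gains are tuned so the tail decays by 60 dB over a chosen reverberation time; a larger room or harder wall materials would correspond to a longer rt60_s. The delay values and the single rt60 parameter are illustrative, not the SDK's internals.

```python
import numpy as np

def late_reverb_tail(signal, sample_rate, rt60_s=0.8):
    """Crude late reverb: parallel feedback comb filters with a shared decay time."""
    out = np.zeros(len(signal) + int(rt60_s * sample_rate))
    for delay_ms in (29.7, 37.1, 41.1, 43.7):              # classic Schroeder delays
        delay = int(delay_ms / 1000.0 * sample_rate)
        # Feedback gain chosen so the tail reaches -60 dB after rt60_s seconds.
        gain = 10.0 ** (-3.0 * (delay_ms / 1000.0) / rt60_s)
        buf = np.zeros(len(out))
        buf[:len(signal)] = signal
        for n in range(delay, len(buf)):
            buf[n] += gain * buf[n - delay]
        out += buf / 4.0
    return out
```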


The Google VR audio system can also simulate the way sound waves traveling between a source and the listener are blocked by objects in between. It models these occlusion effects by treating high- and low-frequency components differently, with high frequencies being blocked more than low frequencies, which mimics what happens in the real world.
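A hedged sketch of this frequency-dependent behavior: crossfade toward a low-pass filtered copy of the signal as the occlusion amount increases. The cutoff frequency, filter order, and extra level dip are illustrative choices, not the SDK's internal values.

```python
import numpy as np
from scipy.signal import butter, lfilter

def apply_occlusion(signal, sample_rate, occlusion, cutoff_hz=800.0):
    """Attenuate high frequencies more than low ones for an occluded source.

    occlusion: 0.0 = unobstructed, 1.0 = fully blocked by an object.
    """
    if occlusion <= 0.0:
        return signal
    b, a = butter(2, cutoff_hz / (sample_rate / 2.0), btype="low")
    lowpassed = lfilter(b, a, signal)
    mixed = (1.0 - occlusion) * signal + occlusion * lowpassed
    return mixed * (1.0 - 0.5 * occlusion)   # slight overall level drop when occluded
```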




Closely related to the effect of occlusion is a sound object’s directivity pattern. A directivity pattern is a shape or pattern that describes the way in which sound emanates from a source in different directions. For example, if you walk in a circle around someone playing a guitar, it sounds much louder from the front (where the strings and sound hole are) than from behind. When you are behind, the body of the guitar and the person holding it occlude the sound coming from the strings.



With the GVR audio system, you can change the shape of an object's directivity pattern to mimic the non-uniform ways in which real-world objects emit sound. There are two available parameters (combined in the sketch after this list):


Alpha: Changes the shape of the sound emission pattern.

Sharpness: Controls how wide or narrow the emission pattern is.
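A minimal sketch of how the two parameters might combine, using a common parameterization that blends an omnidirectional and a dipole pattern and raises the result to a sharpness power; the exact formula inside the SDK is an assumption here.

```python
import numpy as np

def directivity_gain(alpha, sharpness, angle_rad):
    """Gain of a source heard from angle_rad off its forward axis.

    alpha = 0 is omnidirectional, alpha = 0.5 is cardioid-like, alpha = 1 is a
    figure-of-eight; larger sharpness values narrow the emission lobe.
    """
    return np.abs((1.0 - alpha) + alpha * np.cos(angle_rad)) ** sharpness

# A cardioid-like guitar source is loud in front and silent directly behind:
print(directivity_gain(0.5, 1.0, 0.0))     # 1.0 in front
print(directivity_gain(0.5, 1.0, np.pi))   # 0.0 behind
```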

Head movements and sound

By moving our heads, we can perceive the relative changes in all of the time, level, and frequency cues. This helps us localize sounds more accurately.


When a user moves their head in VR, the head-mounted display tracks the movement. The Google VR SDK uses this rotation information to rotate sounds within the virtual loudspeaker array in the direction opposite to the head movement. In this way, virtual sounds stay locked in position.
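Conceptually, this amounts to re-expressing each source position in the listener's head frame by applying the inverse of the tracked head rotation, as in the sketch below (quaternion convention (w, x, y, z) assumed; the SDK's actual math is internal to the renderer).

```python
import numpy as np

def quat_to_matrix(w, x, y, z):
    """Rotation matrix for a unit quaternion (w, x, y, z)."""
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def world_to_head_space(source_position, head_quaternion):
    """Counter-rotate a world-space source position by the tracked head rotation,
    so the rendered sound stays locked in place as the listener turns."""
    rotation = quat_to_matrix(*head_quaternion)
    return rotation.T @ np.asarray(source_position, dtype=float)  # inverse = transpose
```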


Design Tips

GVR Audio Room

For each part of your GVR experience, you should first determine if you need a GVR Audio Room. Audio Rooms provide early reflections and reverb, which help make the sound more realistic when there are nearby walls or structures. They are—not surprisingly—most useful when your scene takes place in an actual room. For outdoor scenes, an Audio Room can feel less natural, because you may have only one reflective surface (the ground).

You have full control over the amount of reverb and the material of the surfaces, so take care to match the room sound to the environment.


Atmospheric sounds

For producing the general ambience of a scene, like the wind in the trees, ocean waves, and birds, there are two choices for sound playback.

GVR Audio Sources.

GVR Audio Sources are the most flexible and work best for objects that move dynamically or that users might interact with. For example, if you attach an Audio Source to a bird, the listener hears the sound of the bird change naturally as it flies near, and then farther away. You can sprinkle Audio Sources throughout the environment to create the general ambience.





GVR SoundField Sources.

GVR SoundField Sources play back ambisonic files that let you hear audio from every direction. This is similar to how skyboxes or 360° photos work. Since ambisonic files respond only to head rotation, they work best for sounds in the distance.


Animate the sound source

If you want to command the listener's attention, but a sound source is out of view, you can animate the position of the sound. This enables the listener to pinpoint sounds much more quickly.


Repeat the sound

To help the listener pinpoint a sound, play it more than once. This is why, for example, your phone's ringtone is not a single beep. If it were, you would have a hard time finding your phone, and you might not even be sure the sound came from it. You can achieve the same effect by using sounds made up of many distinct elements.



Use more complex sounds


You should avoid using overly quiet sounds, sounds lacking in high frequencies, or simple tones like a sine wave beep. Instead, craft sounds that have sufficient volume levels, are complex, and contain a full spectrum of frequencies.



Tips for crafting sounds

Audio Source sounds

For Audio Sources, make sure the sound files you use are monophonic and don't include reverb.
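If an asset is stereo, a quick downmix can prepare it for use as an Audio Source; this sketch assumes the third-party soundfile package is available, and the file names are hypothetical. Note that baked-in reverb cannot be removed this way, so dry source recordings are still needed.

```python
import numpy as np
import soundfile as sf  # third-party package, assumed installed (pip install soundfile)

def downmix_to_mono(in_path, out_path):
    """Average all channels of an audio file into a single mono channel."""
    data, sample_rate = sf.read(in_path)
    if data.ndim > 1:
        data = data.mean(axis=1)
    sf.write(out_path, data, sample_rate)

downmix_to_mono("bird_stereo.wav", "bird_mono.wav")  # hypothetical file names
```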

SoundField Source sounds (ambisonic sounds)

For SoundField Source sounds, we currently support first-order ambisonic files. These files are more complex than Audio Source files, and the tools and libraries that support them are still in the early stages.

With digital audio workstation (DAW) software and a plugin such as Ambix, you can create ambisonic files in two ways:

Using monophonic files, place sounds on a virtual sphere around the user. You can move the sounds around and add effects to them.




Use an ambisonic microphone like the SoundField ST450, TetraMic, or Zoom H2n to capture the sound of an environment in 3D. You can load the captured sound into the Ambix plugin and run effects on it, rotate it if needed, and then mix.

Check your work

After you get your VR experiences up and running, make sure to check your work. You want to ensure that what you see matches what you hear. For example, if you can hear the sound of ocean waves crashing, but the ocean looks frozen, it takes away from the sense of realism.


Be sure to visit all the places your users will go in your VR experience to confirm that everything sounds natural. In experiences where users can roam freely, they love putting their ear right up to sound sources.





Take extra care to ensure that the sounds you're using are of high quality, play at a clear but comfortable volume, and adjust realistically with any movement. Because most of your users will be listening through headphones, make sure to test your sound on a variety of headphones, not on laptop or desktop speakers.