Expectations of Perception

July 23rd, 2013

Recently I was working on a project in which a country road had a small drainage ditch to the side of it with flowing water in it. I looked at it once, and instantly thought, “I need to add a sound for that!”

Two weeks later, I was taking a hike through Cougar Mountain Regional Park (didn’t see any cougars– feline or otherwise), when I came across a very similar scenario in real life: a small stream of water flowing downhill. I stopped, looked, and listened, but to my surprise I heard no water trickling or babbling sounds emanating from this little stream.

If I went back and removed the sound from my project, someone could walk through the world, see that ditch, and wonder, “why the hell isn’t there a water flowing sound coming from that water?” The simple point here is that our perception of sound often differs from the reality of sound, and in games (or any form of media for that matter) we need to carefully weigh this when crafting an aural landscape. If a user is expecting a sound and it’s not there, it makes a negative impression. Not necessarily because the overall sound design is bad, but rather s/he notices a sound is missing. We have broken the wall of immersion. In the real world, slow moving water needs speed, but it also needs an obstruction in its path to cause enough movement to generate an audible sound. In the game world; however, it may just need to exist with the illusion of movement: perhaps it’s just an animated texture, or a shader trick. There doesn’t need to be a rock or an eddy causing a rapid, it’s just there, it’s expected, so let it have sound. Unless of course that goes against the aesthetic you’re trying to develop in the course of your project.

Sound design is all about managing perceptual expectation. We all know how weak gunfire sounds in real life compared to that which we create for games and film. So there is both the need to manage perception in the design of individual sounds as well as on the implementation side of sound design. But how do we choose what aspects in the world should and should not have sound and how those sounds behave? There are two things to consider here: technical and aesthetic.

On the technical side there are decisions to make based on what is available to you. What device(s) are you developing for? How much memory do you have available? Do you have DSP? Is there any sort of scripting or complex behavioral structure at your disposal? How many concurrent sounds can we play? What else may be concurrently going on in the world? Fortunately, as technology evolves, tools and technical specs are both improving so that even mobile games can use Wwise, FMOD, Unreal, etc. to provide the designer with more options, power, creativity, and flexibility to achieve their sonic goals for a project. Handhelds and mobile are losing their “stripped down,” “less powerful” monikers so that the only limitations we may have on our sound design are those we choose to put there. Of course, we’re not to the Mecca of no technical restrictions yet. Even on Playstation 4, I don’t have limitless memory and resources and that’s probably a good thing. Limitations often drive creativity and allow you to see things in a different light. We still need to fit our design into the technology we’re using, it’s just a matter of understanding the limitations of that tech and working through them.

The aesthetic side is more of a gray area. Technical specs are often set in stone, and while you may be able to negotiate for extra resources, you’re still working in an established ballfield. When determining what should have sound and how it should sound, that’s where the creative and artistic part of sound design really kicks in. This is where you get to decide (either by yourself is sometimes with the assistance of a game/creative director or other audio personnel) where you want the audio to take the user and how it should make them feel. There’s no real science in determining what is right or wrong, it’s usually a mix of gut feeling, experience, and inspiration from others that can drive you to the right place creatively.

I do not mean to suggest that technical and aesthetic design decisions are mutually exclusive. On the contrary, in a well designed audio plan, they are intimately entwined, each one informing the other. We generally want to create a believable soundscape within the context of the game world. What that means specifically is part of the beauty and mystery that is our craft. And the key to meaningful sound design is often understanding the differences in perception and reality and ensuring your audio vision for a project matches the sonic landscape you wish to create.

Wwise HDR best practices

May 19th, 2013

Audiokinetic has released Wwise 2013.1 with many new features, among them PS4 support, ITU BS 1770 compliant loudness meters and HDR audio. We worked with Audiokinetic to develop the HDR feature set over the past year and now that it’s out, I’d like to share some of my best practices that I’ve come up with (so far) in using it:

1). Keep it mellow: The first thing to be aware of is that the Wwise implementation of HDR audio is a relative volume scheme. We initially played with using SPL, similar to DICE’s Frostbite Engine, but abandoned that because a). we learned that even DICE didn’t use real-world SPL values, which sort of negates the whole reasoning behind using real-world values to set volume and b). because not everyone would use HDR and introducing a second volume slider (Volume and SPL) in Wwise just confused and overcomplicated things. So anything you want to be affected by the HDR effect (which may generally include all game sounds except UI, mission critical VO and the like) will live in its own bus with a special HDR effect on it. But this bus should be kept at a reasonable level. Generally around -12 to -18 dB. This will give you headroom in the final mix, and give your loudest sounds the ability to play without clipping. Furthermore when you have lots of very loud sounds plays, a more conservative bus level will allow things to sound cleaner. For individual sound structures, you can start with 0dB as your baseline, bring down sounds that should be quieter in the mix and bump up the louder ones above 0dB so they’ll push the HDR window up when they play.

2). The voice monitor is your best friend – The new voice monitor (shortcut: Ctrl + Shift + H) is a fantastic asset for tuning individual sound levels within the HDR space. Being able to visualize the input and output of all sounds, as well as see what affects the HDR window and how is immeasurably important when it comes to tuning individual sounds within the HDR space or preventing pumping of quiet sounds when a loud sound plays. The voice monitor is a fantastic tool whether or not you’re using HDR, but the ability to see the window behavior really makes it very intuitive as to how the effect works.

3). It’s okay to cheat: Don’t be afraid of a little judicious use of make-up gain to make an important sound punch through without affecting the HDR mix. Make up gain is applied post-HDR effect, so it won’t affect the window movement, but will boost a sound’s level. More importantly, play with the sensitivity slider in the HDR tab to dial in the best curve for your sounds. The HDR window can follow the volume of a sound, but often you only want the initial transient to affect the window and the tail to decay naturally while letting quieter sounds come through. For even more granular control, you can edit the envelope of individual waveforms in the source editor window. As an additional control, you can also reduce the tail threshold for louder sounds. Most of my louder sounds are set to 3 or 6, which means after the first 3 or 6 dB of loudness, the sound is removed from the calculation of the HDR window.WwiseEnvelopeEditing

4). Only generate envelopes on the important sounds in the game. This is a simple optimization tip. It takes CPU to constantly analyze the envelope of every sound. I only generate envelopes for the louder sounds in my game (making sure they’re not generated on ambience and incidental effects). It won’t affect the mix, but provides some performance savings.

5). The EBU/ITU-R BS.1770 standard is gold. Keep you game averaging around -23 LUFS/LKFS (based on a minimum of 30 minutes of gameplay). Everytime you play your game, connect Wwise and keep an eye on the Integrated meter in the Loudness Meter. What matters here is the AVERAGE loudness, the longer you capture, the more accurate your measurement. As a rule of thumb, I always keep the loudness meter up and running in my project.


6). Inverse square attenuation make sounds behave naturally – one of the initial “issues” I had once we got HDR working in our game was that using our old attenuation curves (generally an exponential curve over a set distance based on the general loudness of the sound ranging from 15m – 250m) just didn’t work as we needed them to. We wanted attenuation curves to sound natural in a real-world environment, so I created a set of inverse-square curves. The inverse square law states that the volume of a sound is halved by a multiple of every x meters. For example the most common curve we use falls off fifty percent every 4 meters over the span of 80 meters. So at 0m it’s 0 dB, at 4m it’s -6dB, at 8m it’s -12dB, at 16m it’s -18dB, at 32m it’s -24dB, etc. This has the added benefit of limiting the number of attenuation curves needed which is a performance savings. Of course, inverse square curves are not a blanket solution, there will always be times when you want/need something custom, so we still maintain some custom curves.


I’m happy to share the settings I have on my HDR effect, but I feel this will vary based on project, so I’m not sure how useful that would really be for people. Another feature we’ve added is a speaker_type switch controlled by an rtpc which affects the HDR threshold based on the speaker type the user is playing through. The end result is automatic dynamics switching based on speaker type where the better your speaker system, the greater the dynamic range in the mix (similar to what games like Uncharted offer in their audio options menu). In short, there’s a ton of ways to use this great feature, and I’m sure there’s going to be plenty of other tips and tricks people figure out as they start to play around. Enjoy!

A sound designer is born: my origin story, or how to get lucky and lie your way to success

April 7th, 2013

During GDC this year and the week after I ended up telling the story of how I got into the industry a few times, so I decided to commit it to the ether for posterity or some false sense of self-worth. I’ve also decided to embarrass myself publicly by digitizing the demo I made way back then in 1998 that got me into game audio.  It is horrible and borders on unlistenable. Well technically you can listen to it, but you wouldn’t want to, and it’s hard to fathom how someone could have heard this monstrosity and then offered me a job.

My story, while it may not have been exactly common 15+ years ago, doesn’t really happen anymore. The short story is that I lied my way into game audio. The longer story is that I was temping at Berkeley Systems, a video game company in Berkeley, CA after graduating college and they liked me enough to keep me on as their shipping guy. I liked it there, but really wanted to be doing something creative, so I started making a lot of noise as such. I was passed up for a production assistant job (thankfully) and ended up talking to their sound designer a couple times because I thought he had such a cool, crazy job. At this point in my life I’d never used a computer program related to sound ever. I knew how to play notes in BASIC and had a cassette 4-track and had done tons of music, tape loops, and other weird experimental stuff ever since I was a kid, but I didn’t know what MIDI was, how to create a sound effect or really much of anything in regards to sound and computers.

Anyway, one day the VP of Product Development called me into his office to tell me they fired their sound designer (apparently he didn’t come into work very often and they’d had to contract out all their sound work as a result). So he wondered what experience I had and if I’d be interested in the job. I couldn’t believe this was happening, so seeing an amazing opportunity, I lied through my teeth, telling him I had tons of experience and had scored some student films, blah blah blah. He asked me to bring in a demo the next day. I ran home that night and banged a couple things out on my sampler (half of which were a couple synthy pad soundscapes I claimed were from a student film I worked on. They weren’t.) and threw another horrible track called “Gall Stone Attack” onto a cassette and gave it to him. The next week he called me into his office and said “It’s nothing you’re ever gonna hear in any of our games, but it shows you know what you’re doing, so you got the job.” I was ecstatic. And because they’d already farmed out their sound work for the next 6 months or more, I locked myself in my office and started teaching myself everything I could about digital audio and sound design. I believe my first experiment in editing digital audio was removing all the guitar solos from Slayer’s Seasons in the Abyss, but that’s a story for another day. Nowadays, kids are coming out of school with degrees in sound design and blowing me away with their skillsets, so this whole thing known as my career could never happen today.

Everything on my demo was recorded with a Roland S-50 12-bit (!) sampler. It had a floppy drive and I had tons of sample disks for everything from pads to horns and strings to sfx. “Gall Stone Attack” also had a Roland R-8mkII drum machine and Casio SK-5 on it (and I think I used the SK-5 on “Silly Torture” as well). Since I had no sequencer or even an audio editor or audio interface for my computer, each track was recorded live onto my Fostex 4-track and mixed down to the cassette below. (I opted to not de-noise these as part of the digitization process, so they could “preserved” in the state in which they were originally heard).

And so without further ado, I present a public shaming: two tracks from my demo reel in early 1998. I cringe when I listen, and laugh a little. My skills have definitely come a long way, but I still can’t believe they listened to this crap and took a gamble on me anyway. I’m eternally grateful and shocked. Be forewarned.  Be gentle.



Modulation in Wwise

March 21st, 2013

One of Wwise’s few shortcomings is its current lack of support for LFOs. Modulation can be a godsend to make looping static sounds feel way more dynamic and alive. (an example using volume, pitch, and lpf is here). I wanted to outline two different means here you can “cheat” modulation in Wwise using some technical trickery.

1). Modulation RTPC:
This is the simpler method, although somewhat limited in it’s dynamism. Simply create an RTPC linked to a global timer, and once the timer reaches it’s max, it resets itself to 0. I’ve called mine “modulation” with a min of 0 and a max of 100 (units being set in the engine as seconds). I can then draw an rtpc curve for modulation on any sound I want and affect the pitch, volume, lpf, etc. over time (friendly reminder: subtle pitch changes are WAY more appealing than extreme pitch changes).The most important factor here is to remember to have your values at 0 and 100 be identical, so there’s no pop in the loop. The obvious drawback to this solution is that the modulation is uniform with no possibility of change per cycle. However with a 100 second loop, you have a fair bit of time to build a dynamic modulation curve whose looping won’t be easily detected by a user.

2). Using a tremolo effect as an LFO:
This solution comes from Steven Grimley-Taylor who posted about it on the Wwise forums, and is nothing short of a brilliant use of the tools available in Wwise to make an LFO a reality. It also has some limitations, which we’ll discuss in a bit. The basic gist of this concept is to create a white or pink noise sound generator and sidechain it to a tremolo effect. As Steven explains it:

“Create a Sound SFX object with a Tone Generator Source set to White Noise. Then add a tremolo plugin and then a metering plugin which generates an RTPC


The tremolo becomes your LFO control and you can map it anywhere you want. It becomes unstable at faster speeds, but then this is probably not the best solution for Audiorate FM. For normal ‘modulation’ speed LFOs it works a treat. Wwise_Mod_LFO_tremoloWwise_Mod_LFO_rtpc


You can also go into modular synth territory by creating another of these LFO’s and then modulating the frequency of the first LFO with the amplitude of the second.

Oh the LFO audio should be routed to a muted bus, you don’t actually want to hear them, just generate a control RTPC”Wwise_Mod_LFO_bus

I’m currently using a couple of these in my project and it works great, the only drawback is that using the tone generator plus the tremolo per LFO isn’t super cheap (~2 – 3% of CPU), and the more modulators you want to add, the more expensive it gets. But you can drive the parameters of the LFO from other rtpcs, opening up enormous avenues of creativity and evolving sounds.  It’s a really nice way to spice up some bland looping sounds and give them a bit more life.

The future of next-gen sound blah blah blah

March 9th, 2013

Sorry, I couldn’t resist being a little snarky as I typed that title out. Every time there’s a new generation of consoles on the horizon, words begin to flow about what “next-gen” means in relation to (insert your discipline here).  For me, there are two interrelated aspects that we can look at to push envelopes further: technological and structural. Technological advances are those made possible by the capabilities of the hardware and how that interacts with software. CD-ROMS meant we could start streaming redbook audio, tons of voiceover, and video. The Xbox’s DSP chip gave us a low pass filter and reverbs built into the system. The PS3’s SPU core architecture gave us an entire core (or more I suppose if you sufficiently bribed your programmers) to do with whatever we wanted: create custom DSP or utilize FMOD or Wwise and go crazy with the delays and reverbs and eqs, etc. The PS4’s 8gb of memory means, given the time to load data into RAM, we have a near limitless reservoir for game resources. Ok, so “near limitless” is probably an over-exaggeration, but we’re talking a 16x increase over the last generation!

By structural, I mean how does the technology create new ways for us to deliver a sonic experience. The sub-industry of mobile and web development have democratized game development significantly, and with them and the rise of Unity as a viable engine has audio middleware solutions like FMOD and Wwise along for the ride as well. Even Wwise, which started out as a PC, 360, PS3 only platform 6 years ago, now has support for iOS, Android, Windows Phone, and direct integration into Unity. With the democratization of tools comes the possibility to use these tools in novel ways. One such example is adaptive mixing. While in console land we’ve been doing this for years (for a great example of this see Rod Bridgett’s discussions of mixing Scarface: The World is Yours for PS2 back in 2006), this is only now being possible across all platforms. And with the potential for the Ouya, Green Throttle, Steam Box, Apple TV and other small Android, iOS and PC-based home consoles in the coming years we should see “next-gen” meaning what can we do to push content to be more impactful no matter the platform.

While I think the structural aspect has far more implications for sound design as a whole, much of what becomes possible is through technology. I want to touch on one specific technology in this post: procedural synthesis and design. Procedural synthesis is nothing new. Guys like Nicolas Fournel and Dylan Menzies have been doing it for years. Audiokinetic have had wind and impact generation tools in Wwise for several years now. Audiogaming’s wind and rain tools are integrated into the new version of FMOD Studio and will be making their way into Wwise soon (not to mention their latest plug-ins for procedural generation of footsteps, vehicle engine models, and fire).  And there have been countless papers and demonstration videos showing off better and better sounding procedural algorithms from the aforementioned to elements like fabric and full physics simulations.

When developing a game, we often take a cross-platform approach because it’s the easiest way to maximize profits for minimal cost: put your product out on every possible platform and you have a multiplier effect on how many people may play/buy your game, ideally with minimal additional effort on your part. Hopefully in the next year, if all these new hardware devices do come out, we’ll be at a point where we have enough processor to utilize procedural synthesis in games across all platforms, and not just minimal use on 360, PS3, and PC. Having these effects not just available, but possible, across all platforms maybe the shot in the arm procedural synthesis needs to finally bridge the gap from “talked about incessantly” to “the here and now.”

These two elements: realtime, dynamic mixing and procedural synthesis, while nothing new, may be the holy grail of audio development for games in the near-future. I am eagerly looking forward to how things shape up over the next few years, to see what others are doing, and to further explore these waters myself.