The Future of Audio: OSSIC X

Adrian Ip

Interviewing the CEO of OSSIC

The Interview

I had some time to sit down with Jason Riggs and put some questions to him. He's a talkative chap and clearly both passionate and knowledgeable about his chosen field.

W: Thanks for taking the time to talk to me, Jason. Positional audio is important for gamers, and with the rise of VR it is becoming even more important. Over the years, many have tried to implement it properly using virtual surround or discrete multi-driver surround in headphones, but the experiences have been lacking. First question then: have you actually done it?!

J: *Laughs* Well, you’re going to have to take a listen and convince yourself, but I think we have addressed a lot of the challenges with those historic systems. The team has a lot of background in audio. At one time all 4 co-founders worked at Logitech, and I ran the acoustic department there for a number of years. We made a lot of those surround sound headphones, and we used the model that a lot of the gaming companies do, which is to build the hardware and license an algorithm from Dolby or DTS or whoever.

The challenge with most of those algorithms is that they’re generic in a number of ways, a kind of one-size-fits-all for headphones and users. It turns out that the way we localise sound in 3D space is actually pretty specific to our individual anatomy. The idea of HRTF (head-related transfer function) is that for any point in space, a sound creates impulses in our left and right ears, and the way it does that is by interacting with our anatomy.

There are some pretty well known phenomena such as inter-aural time delay and inter-aural intensity difference, and that just means “Hey, if a sound is over here to the left, it gets to the left ear before the right ear”, so we measure that time difference. That works quite well for left and right, so almost all of these headphones can do a good job of putting sound to the left or right. The reason is that there’s only one point in space that has a large amplitude difference and a large time delay. But imagine now that a sound is horizontal but 45 degrees in front of us to the left. There’s also a mirror point at the back that has a very similar set of delays to our ears, because we’re almost axially symmetric. Our head isn’t so different from a bowling ball with 2 microphones on it. I’m babbling, but the point is that there’s a whole circle of spots that have those delays and those levels, and that’s what creates the confusion. Even in real life, if we don’t have the visual cues, it can be hard for us to determine front or back. This is what is called the cone of confusion, where it can be challenging to localise things.
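As a rough illustration of the time-delay cue and the front/back ambiguity described above, here is a minimal Python sketch. It assumes the classic spherical-head (Woodworth) approximation, a head radius of 8.75 cm and a speed of sound of 343 m/s. The function name and constants are choices made for this example and are not taken from OSSIC.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly quoted average head radius (assumed)


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS):
    """Inter-aural time delay for a rigid spherical head, horizontal plane only.

    azimuth_deg: 0 = straight ahead, 90 = directly left, 180 = directly behind.
    A positive result means the sound reaches the left ear first.
    """
    az = math.radians(azimuth_deg)
    # Only the angle away from the median (front/back) plane matters in this
    # model, so a front source and its mirror image behind look identical.
    lateral = math.asin(max(-1.0, min(1.0, math.sin(az))))
    return head_radius / SPEED_OF_SOUND * (lateral + math.sin(lateral))


front_left = itd_seconds(45)   # 45 degrees in front, to the left
back_left = itd_seconds(135)   # the mirror point behind, to the left
print(f"front-left: {front_left * 1e6:.0f} us, back-left: {back_left * 1e6:.0f} us")
```

In this model the two positions produce the same delay, which is the ambiguity described in the interview. Passing a larger head_radius gives a larger delay for the same angle.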

3d audio

W: What about binaural?

J: Well, doing it with binaural rendering becomes really hard. The challenge is that when there are a lot of points in space that have similar values of delays and levels, what information are we going to use to decide whether a sound is in front of us or behind us? There’s less information, hence the confusion.

So we look at other things like movement. If you imagine an animal in the woods that hears a sound, it can turn its head to resolve it. When we turn our head to the left, for example, and the sound gets louder in the right ear, we know that it’s in front, so that helps resolve the confusion. But that only works if we’re moving and if the sound is sustained long enough for us to hear it, so if we’re playing a first person shooter and there’s a bullet, head tracking can’t help us. If there’s 10 bullets in a row and we’re not dead yet? Maybe! Maybe we can sample it and turn around and be like “Oh yeah! It’s over there!”
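The head-turn cue can be sketched with the same kind of simplified model. The Python example below assumes a spherical head, horizontal-plane sources and a small head turn that does not carry the source across the ear axis. All names and constants are hypothetical and chosen for illustration.

```python
import math


def itd_seconds(azimuth_deg, head_radius=0.0875, c=343.0):
    """Spherical-head time delay; positive means the left ear hears it first."""
    az = math.radians(azimuth_deg)
    lateral = math.asin(max(-1.0, min(1.0, math.sin(az))))
    return head_radius / c * (lateral + math.sin(lateral))


def resolve_front_back(source_az_deg, head_turn_left_deg):
    """Guess front or back from how the delay changes during a small head turn.

    source_az_deg: source direction relative to the head before the turn
                   (0 = ahead, 90 = left, 180 = behind).
    head_turn_left_deg: positive for a turn to the left, negative for right.
    Only reliable for small turns that keep the source on the same side of
    the ear axis (an assumption of this sketch).
    """
    before = itd_seconds(source_az_deg)
    after = itd_seconds(source_az_deg - head_turn_left_deg)
    change = after - before
    if head_turn_left_deg == 0 or change == 0:
        return "ambiguous"  # no movement, no extra information
    # Turning left moves a front source towards the right ear and a rear
    # source towards the left ear, so the sign of the change separates them.
    return "front" if change * head_turn_left_deg < 0 else "back"


print(resolve_front_back(30, 10))   # source ahead and to the left
print(resolve_front_back(150, 10))  # its mirror image behind
```

Without a head turn the two sources are indistinguishable in this model, which matches the point that a single short sound, such as one gunshot, leaves no time for the cue to work.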

So the other thing we have is that our anatomy is asymmetric. Take our shoulders and torso: sound bounces off them and into our ears, which gives us a cue that a sound might be above or below, or, if it comes from below, our torso may block some of the sound. That gives us an up-down asymmetry, because we have a torso below our head and not above it. But we also have front-rear asymmetry, and these are things like our outer ear, which blocks more from the back but is designed to reflect more from the front. When a sound comes in from the front and bounces off the ear, it makes some frequencies louder and cancels others, so there will be these bumps and dips based on the geometry of the ear.

But the key challenge here is that it’s YOUR ear’s geometry, and across humans our ears vary. So a big problem with these surround headphones is that they don’t know your position in space and the algorithm is generic, so you’re listening through someone else’s ears and the cues are quite different.

W: So in terms of the anatomy, I know you guys are taking some measurements and readings when someone puts the headphones on, but how accurate are you in terms of sampling “what is a person’s ear?”

J: Well, the ear is definitely a challenging spot. We found, studying a lot of HRTF stuff, that there are zones where it doesn’t vary much, then zones where it transitions really quickly, followed by more zones where it doesn’t vary much again. So we spent a lot of time looking at what spatial resolution you need to sample this at and where you can approximate in between. In this first product we have 4 transducers per side. (I hesitate to use “calibration” for the ear part, but…) there’s a calibration process, and part of what we measure is the head size and the ear spacing, and that’s absolutely a calibration: the sensor measures you and says your head is this big and your ears are this far apart. That helps us calculate all these levels and delays and get them correct for you, and this is per sample, so it is basically taking place in real time.

The ear part is also happening in real time, but we’re doing it a bit differently in that we’re not doing a direct measurement. Instead we steer the high frequency component around your ear to make sure that it hits your ear from the right angle, using those 4 zones combined with some amplitude panning to get the reflections happening in the right order for your ear. There’s a big difference between front, side and behind, and this happens in real time with real-time panning around the ear.
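Amplitude panning in general means sharing a signal between neighbouring drivers so that it appears to come from a direction between them. The Python sketch below shows one textbook form, pairwise constant-power panning. The driver angles are invented for illustration, and nothing here should be read as OSSIC’s actual driver layout or algorithm.

```python
import math

# Hypothetical angles (degrees) of 4 drivers arranged around one ear.
# These values are made up for this example.
DRIVER_ANGLES = [-60.0, -20.0, 20.0, 60.0]


def pan_gains(target_deg, angles=DRIVER_ANGLES):
    """Return one gain per driver using pairwise constant-power panning.

    The target angle is clamped to the range covered by the drivers, then
    shared between the two drivers on either side of it so that the sum of
    the squared gains stays at 1.
    """
    target = max(angles[0], min(angles[-1], target_deg))
    gains = [0.0] * len(angles)
    for i in range(len(angles) - 1):
        lo, hi = angles[i], angles[i + 1]
        if lo <= target <= hi:
            frac = (target - lo) / (hi - lo)  # 0 at lo, 1 at hi
            gains[i] = math.cos(frac * math.pi / 2)
            gains[i + 1] = math.sin(frac * math.pi / 2)
            break
    return gains


print(pan_gains(0.0))    # split equally between the two middle drivers
print(pan_gains(-60.0))  # entirely on the first driver
```

The point of the constant-power form is that the overall level stays steady as the target direction sweeps from one driver to the next.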

This is important because we have a goal for the user experience of the product: we don’t want the user to have to do anything weird like take measurements or put microphones in their ears, as a lot of things can go wrong with that. The user should put it on and it should just work.

Finalised dimensions and packaging...

W: You guys have been through a number of prototypes to get here right?

J: Yes, we’re kind of on version 9. The one we’re showing here tonight for the audio demo is an older one, I think 6 or 7. It’s a pretty simple demo playing some 5.1 music content where you can hear stereo and then the 5.1 mix. The other demo we’re showing is the core software algorithms, with the software plugged into a generic headphone, showing what we can do in virtual reality with full 3D content. We have a plugin in the game engine so, as opposed to being limited to 5 channels, we have full xyz positioning of all the sources in the demo.

W: One of the things you mentioned there was these multiple drivers you have, and a lot of audio aficionados often criticise multi-channel rendering of stereo, from the elitist’s view?

J: Well, we’ve been spending a lot of this week at Abbey Road Studios and we’re talking with a lot of engineers and figuring out the future for music so we’re in this new music tech program they have called Abbey Road Red and we’re looking at a lot of the tools and utilities to put this all together. But in terms of the purity of the sound, people should be thinking about the fact that in general, music is mixed on speakers. So even in a 2 channel scenario, things are often mixed in a mixing room with a pair of speakers in front of you and the presentation you get out of 2 speakers is very much out of head localisation with a soundstage in front of you. That’s how the content was created.

So when we play this back on headphones, we get a very different presentation. When it was coming out of speakers, say out of the left, it got to our left ear and then a delayed version got to our right ear, and it was louder on the left and quieter on the right, so all this crosstalk and these cues tell us it was out front to the left.

But when we play that same mix out on headphones, everything on that left channel is now just going into the left ear and that’s actually a real unnatural way to listen. I mean the only way that should happen is if someone is whispering into your left ear, then you should only hear it in your left ear, otherwise you’d expect to hear some in both.
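The difference between speaker and headphone listening can be sketched with a very simple crossfeed: each ear also receives a delayed, quieter copy of the opposite channel. The Python example below is a broadband simplification under stated assumptions. The delay and gain values are placeholders, real crosstalk is frequency dependent, and this is not the processing OSSIC uses.

```python
def crossfeed(left, right, delay_samples=12, gain=0.5):
    """Mix a delayed, attenuated copy of each channel into the other ear.

    left, right: equal-length lists of samples.
    delay_samples: extra travel time to the far ear, in samples (assumed value).
    gain: level of the far-ear copy relative to the near ear (assumed value).
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay_samples
        # Before the delay has elapsed, nothing has reached the far ear yet.
        bleed_from_right = right[j] if j >= 0 else 0.0
        bleed_from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + gain * bleed_from_right)
        out_right.append(right[i] + gain * bleed_from_left)
    return out_left, out_right


# A click that exists only in the left channel of the mix.
click_left = [1.0, 0.0, 0.0, 0.0]
silence = [0.0, 0.0, 0.0, 0.0]
ear_left, ear_right = crossfeed(click_left, silence, delay_samples=2, gain=0.5)
print(ear_left, ear_right)
```

With gain set to 0 the function returns the input unchanged, which corresponds to plain headphone playback where a hard-left sound reaches only the left ear.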

So forget about the multi-channel thing. The multi-channel is really about beam steering the high frequency component into your ear at the right angles, so if you play a stereo mix on our headphones, the high frequency component is only coming from a pair of drivers. The channels have no direct correlation to the drivers. We could be rendering 500 points in space; the drivers are just there to get the high frequency component hitting from the right angle, but they don’t represent channels. What you absolutely can’t do is have 7 channels feeding into 7 speakers in headphones. It just doesn’t make sense, because you have to be doing all the real-time crosstalk and level pieces to give you the out of head localisation, which you couldn’t get by feeding the channels directly to drivers.

When it comes to people asking “but hey, this out of head localisation and this different processing, is it unnatural and should we just be listening to a straight mix on headphones?” Well, if you like a straight mix on headphones, go for it, but just keep in mind that’s most likely not what the original artists mixed on or recorded on, it was likely some speakers with out of head localisation and soundstage. One of the things we can do is put a model in for a mixing room and a really simple set of near field monitors and replicate what you get at a mixing desk and that sounds more like what this was created on than headphones.

HRTF concepts...

W: OK, next question and a slightly odd one. Some of the criticism/concern I’ve read about you guys is on price. Namely that you’re actually too cheap to be a high end headphone with this many drivers, given the Kickstarter pricing tiers etc. What’s the story here? It is a high end headphone, right?

J: It is a high end headphone. The retail price is $399 and the goal with that was absolutely to make something that can be widespread as a consumer device and do a few things. The first being act as a bridge to the future and be a headphone that can play 5.1 content and stereo content and do these cool things, but the future is having a headphone that can render with infinite 3D resolution and not be limited by these channel formats that are the constraint of speakers.

I mentioned we were working with Abbey Road this week, and when you go into the building there’s a plaque that says “First stereo recording 1931”, so it’s like “OK, that’s been around for a while”. Then there’s the quadraphonic thing, and the first surround ones were there by the late ’30s. Disney did Fantasia in, I guess, around ’39 or ’40 with 4 channel surround.

You know, some of these channel based formats are limited by “well, the film for the movie is this wide so we can fit 4 tracks”, and we’ve been in that domain for 70 or 80 years. But in some ways VR and game audio can lead on that, because all of these things are already object based. The sound is already attached to all the objects in full 3D. Mixing them out in 5.1 is an artefact of some film format and not necessarily a logical thing to do.

W: So how do you guys envisage audio in gaming in the future? Obviously we have a lot of standards, some which interoperate and some don’t. I see you guys have the Vive here and I’m thinking “OK, as I put these on and I’m thinking about having audio being rendered as I’m tracked, that’s a very different use case from plugging them into my phone and listening to some music”. Are you guys trying to create a standard here somehow?

J: Again, it’s what is the bridge to the future and what IS the future. In one way, we’re creating a device that in its simplest form is a portable listening room or portable theatre. So for all the content today, here’s a device you can put some 2 channel or 5 channel content into and have a good listening experience, with out of head localisation and a spatial presentation. With all the standards today, we want to best serve that and give the user some different rooms and environments and the ability to really spatially perceive something in a way that you can’t on normal headphones.

In terms of what the future is, there are of course a lot more question marks, but in games we’re definitely thinking about those things. One version, as in the Vive demo, is going into the middleware, where we can make this a standard plugin which game developers can put in, allowing us direct access to the object based content. Now, that requires them to plug our thing in, or somebody else’s, whoever is doing this. That’s cool for now with the first VR content, but the thing that was great about 5.1 and 7.1 was that everyone had it.

The time is right for us to standardise on some format(s). Now there are a lot of things out there, like ambisonics or channels etc., and I’m not sure how all that’s going to shake out. We’re a little bit agnostic as to what it is, apart from wanting to help push OPEN standards especially, whether for 22 channel formats or higher order ambisonic formats or a way to pass the objects, to make it easy to connect not just our part. Again, to the extent that people come out with closed standards or a closed ecosystem, it creates a challenge for the entire ecosystem, and we WANT people to hear higher order, higher spatial resolution content. So to the extent that we can standardise on ways to produce those, or make tools that make it easier to produce those, that’s part of our mission.

W: So have you spoken to AMD at all? I know they had some initiatives tied to TrueAudio on their graphics cards.

J: Some discussions but nothing major yet.

W: Last question then. You clearly sound like you’re not a business guy. You are an audio engineer of some sort, so what is your background?

J: *Laughs* Is it that obvious? Yeah so my background is I got into audio and acoustics as a kid. I probably thought I was going to be a rock star and was building my own speakers and I started building high end speakers when I was probably about 14 and I’ve basically been doing it ever since. I studied mechanical engineering and after school I went to work designing transducers and then got more into the system side, went to Logitech and eventually started the acoustic department there.

So yeah, my background is really in acoustics and system design and I’m just masquerading as a business person.

About the author: Run Product Management for Aquis stock exchange. Designed, built and managed several market making, algorithmic and aggregation trading systems for most exchange traded asset classes including Equities, FI, FX and Commods cash and derivatives markets as well as multi-venue FX spot. Massive PC gamer!
