The Future of Audio: OSSIC X

Apr 13, 2016 at 06:01pm EDT

Interviewing the CEO of OSSIC

Preamble

I remember when I first read about the reboot of Clash of the Titans. I’d loved the original as a kid and the trailer looked good; I was so excited for that movie. I watched the trailer many times and showed it to friends and family, getting more and more hyped each time.

I also remember the first time I heard that Chris Pine was going to be Captain Kirk in a rebooted Star Trek… I won’t go into the details of my reaction, but I vaguely recall not being particularly pleased at the time…

These two scenarios are relevant to this article; I’ll explain why later.

Intro

I’m kind of into audio. Not as much as I used to be but enough that bad audio bothers me. In years gone by I’ve spent too much money on audio equipment of all kinds.

Pre-amps, amps, power amps, receivers, processors, speakers, speaker cable, CD players, Super Audio CD players, DVD-Audio, headphones, earphones, noise-cancelling headphones, MP3 players (I was one of the first aboard the digital audio revolution, having bought a Diamond Rio in the ’90s!); the list goes on and on.

I’ve looked at ohms, THD figures, delays, SPL meters, vibration-absorbing footpads and more. I’ve had vinyl, tape, DAT, MiniDisc, Harman, Denon, Arcam, Pioneer, Rotel, Meridian, KEF, TAG McLaren, B&W, Cambridge SoundWorks, AKG, Sennheiser, Bose, Beats (didn’t say it was all good stuff!), Sony, Yamaha, Trace Elliot, Roland. I’ve had 1.0, 2.0, 2.1, 5.1 and 7.1.

You get the idea. I kind of got my geek on when it came to audio. Since those days, I’ve had kids and downsized on the audio hardware significantly. What I’ve discovered since then is that I can still get a decent audio experience from earphones and headphones.

Unfortunately, there’s a huge “BUT” that comes with that statement: multi-channel audio, which has really come to the fore for me with my return to PC gaming. I’ve tried a lot of different earphones and headphones over the years, ranging from the cheap to the ludicrous, but when it comes to surround sound, I found everything on offer lacking. I yearned for the kind of positional audio that a proper 7.1 speaker system can provide, and I’ve spent a fair amount on discrete multi-driver and virtual surround headphones looking for something that can genuinely reproduce a decent surround experience.

The OSSIC X

Alas, I’ve come up short. There are some decent headphones out there for sure, but none of them genuinely make me believe the sound is coming from where it should be. I figured it was a limitation of the form factor (two things on your head, one in/on/over each ear) and that there was nothing else to be done. I’ve moaned about it to friends for ages. Everyone has tried to convince me that existing virtual surround or multi-driver surround technology is fine. Some have tried to show me the way of binaural audio, which is also not bad, but in all the examples I’ve heard it still fails to accurately recreate a decent soundscape, particularly sounds from behind.

Enter OSSIC with their X. I’d read a little about OSSIC on their Kickstarter page and it sounded great. But you know what? Virtual surround headphones sounded great on paper too. So did discrete multi-driver headphones, and binaural recording. All of it sounds great on paper, but when I try it, it fails to do some of what it’s supposed to. So I looked at OSSIC but resolved to leave it be until I’d read some proper reviews once it had been released. I read up a bit on HRTF (Head Related Transfer Function), which is what OSSIC uses to create some of the alleged magic of the X, and again, it sounded great on paper. But there was no way I was going to spend yet more money on yet more headphones with a lacking positional audio experience. I already have lots of headphones that can’t do that properly.

That was until a couple of weeks ago, when OSSIC (I’d signed up to their mailing list to keep up to date) announced they were doing an open house demo in London. OK, fine, time to get my hopes up and go try this thing out.

I RSVP’d and asked if I could get an interview for the site before trying the equipment out. The very friendly Jordan and Kristen sorted me out with a chat with the CEO of OSSIC, who was over here on Thursday.

The Interview

I had some time to sit down with Jason Riggs and put some questions to him. He's a talkative chap and clearly both passionate and knowledgeable about his chosen field.

W: Thanks for taking the time to talk to me, Jason. Positional audio is important for gamers, and with the rise of VR it’s becoming even more important. Over the years, many have tried to implement it properly using virtual surround or discrete multi-driver surround in headphones, but the experiences have been lacking. First question, then: have you actually done it?!

J: *Laughs* Well, you’re going to have to take a listen and convince yourself, but I think we have addressed a lot of the challenges with those historic systems. The team has a lot of background in audio: at one time all four co-founders worked at Logitech, and I ran the acoustic department there for a number of years. We made a lot of those surround sound headphones, and we used the model a lot of the gaming companies use, which is to build the hardware and license an algorithm from Dolby or DTS or whoever.

The challenge with most of those algorithms is that they’re generic in a number of ways, a kind of one-size-fits-all across headphones and users, and it turns out that the way we localise sound in 3D space is actually pretty specific to our individual anatomy. The idea of the HRTF is that for any point in space, a sound creates certain impulses at our left and right ears, and it does that by interacting with our anatomy.

There are some pretty well-known phenomena, such as inter-aural time delay and inter-aural intensity difference. That just means “Hey, if a sound is over here to the left, it gets to the left ear before the right ear, and it’s louder there,” so we measure that time difference. That works quite well for left and right, so almost all of these headphones can do a good job of putting sound to the left or right, because there’s only one point in space that has both a large amplitude difference and a large time delay. But now imagine a sound on the horizontal plane, 45 degrees in front of us to the left. There’s also a mirror point at the back with a very similar pattern of delays to our ears, because we’re almost axisymmetric; our head isn’t so different from a bowling ball with two microphones on it. I’m babbling, but the point is that there’s a whole circle of spots that share those delays and levels, and that’s what creates the confusion. Even in real life, without visual cues, it can be hard to tell front from back. This is called the cone of confusion, where it can be challenging to localise things.
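
The cone of confusion can be made concrete with the textbook Woodworth spherical-head approximation, ITD ≈ (a/c)(θ + sin θ). Here a is the head radius, c the speed of sound and θ the source’s lateral angle. This is a minimal sketch, not OSSIC’s model. The 8.75 cm radius and 343 m/s are assumed typical values, and `itd_woodworth` is a hypothetical helper. The sketch compares a source 45 degrees to the front-left with its mirror point 135 degrees to the rear-left.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
HEAD_RADIUS = 0.0875    # m, assumed average adult head radius


def itd_woodworth(azimuth_deg: float, radius: float = HEAD_RADIUS) -> float:
    """Interaural time difference in seconds for a source in the horizontal plane.

    Azimuth is measured from straight ahead, positive to the left
    (0 = front, 90 = left, 180 = behind). A positive result means the
    sound reaches the left ear first.
    """
    # Lateral angle: how far the source sits off the median plane.
    # asin(sin(x)) folds rear azimuths onto their front mirror images,
    # which is exactly why front and back are ambiguous.
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return (radius / SPEED_OF_SOUND) * (lateral + math.sin(lateral))


front_left = itd_woodworth(45.0)
rear_left = itd_woodworth(135.0)
print(f"45 deg front-left: {front_left * 1e6:.1f} us")
print(f"135 deg rear-left: {rear_left * 1e6:.1f} us")
```

Both lines print the same delay, so timing alone cannot separate front from back. Other cues such as head movement and the shape of the outer ear have to fill that gap.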

W: What about binaural?

J: Well, that becomes really hard with binaural rendering. When a lot of points in space have similar delays and levels, what information are we going to use to decide whether a sound is in front of us or behind us? There’s less information, hence the confusion.

So we look at other things, like movement. Imagine an animal in the woods that hears a sound: it can turn its head to resolve it. When we turn our head to the left, for example, and the sound gets louder in the right ear, we know it’s in front, so that helps resolve the confusion. But that only happens if we’re moving and if the sound lasts long enough for us to hear it. So if we’re playing a first person shooter and there’s a bullet, head tracking can’t help us. If there are 10 bullets in a row and we’re not dead yet? Maybe! Maybe we can sample it, turn around and be like “Oh yeah! It’s over there!”
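
The head-turning trick can be sketched with the same spherical-head approximation. The sketch assumes azimuth and head turn in degrees, both positive to the left. When you turn toward a source in front, it drifts toward the midline and its delay shrinks. A source behind drifts further to the side instead. The function `front_or_back` is hypothetical and only illustrates the cue; it is not how OSSIC or the brain actually implements it.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
HEAD_RADIUS = 0.0875    # m, assumed


def itd(relative_azimuth_deg: float) -> float:
    """Woodworth spherical-head ITD in seconds (positive = left ear leads)."""
    lateral = math.asin(math.sin(math.radians(relative_azimuth_deg)))
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (lateral + math.sin(lateral))


def front_or_back(source_azimuth_deg: float, head_turn_deg: float) -> str:
    """Guess front/back from how the ITD changes while the head turns.

    For a front source the ITD moves opposite to the turn; for a rear
    source it moves with the turn. Without movement there is no cue.
    """
    if head_turn_deg == 0.0:
        raise ValueError("the head has to move for this cue to exist")
    before = itd(source_azimuth_deg)
    after = itd(source_azimuth_deg - head_turn_deg)  # source relative to the turned head
    change = after - before
    return "front" if change * head_turn_deg < 0 else "back"


for source in (45.0, 135.0, -45.0, -135.0):
    print(source, front_or_back(source, head_turn_deg=10.0))
```

The comparison needs the sound both before and after the turn. A single short gunshot gives the listener no second sample to compare against.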

The other thing we have is that our anatomy is asymmetric. Sound bounces off our shoulders and torso into our ears, which gives us a cue that a sound might be above or below us, and our torso may block some of the sound coming from below. That gives us an up-down asymmetry, because we have a torso below our head and not above it. We also have front-rear asymmetry from things like our outer ear, which blocks more sound from the back but is designed to reflect more from the front. When sound comes in from the front and bounces off the ear, some frequencies get louder and some cancel, so there are peaks and dips based on the geometry of the ear.

But the key challenge here is that it’s YOUR ear’s geometry, and ears vary across humans. A big problem with these surround headphones is that they don’t know your position in space and the algorithm is generic, so you’re listening through someone else’s ears and the cues are quite different.
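
A very simplified model shows why ear geometry matters. It treats a pinna reflection as a delayed copy of the direct sound. Adding a signal to an equal-strength copy of itself delayed by τ cancels the frequencies where the copy arrives half a cycle late. That gives notches at f = (2k + 1)/(2τ). The path lengths below are hypothetical illustrations, not measurements. Real pinna responses involve several reflections and resonances.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def reflection_notches(extra_path_m: float, max_hz: float = 20_000.0) -> list[float]:
    """Notch frequencies (Hz) for direct sound plus one equal-strength reflection.

    extra_path_m is how much farther the reflected sound travels than the
    direct sound. The reflection is delayed by tau = extra_path / c, and the
    two cancel where the delay is an odd number of half periods:
    f = (2k + 1) / (2 * tau).
    """
    if extra_path_m <= 0.0:
        raise ValueError("the reflection must travel farther than the direct sound")
    tau = extra_path_m / SPEED_OF_SOUND
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * tau)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


# Same source direction, two hypothetical ears whose reflection paths differ.
for label, path in (("ear A", 0.020), ("ear B", 0.025)):
    first = reflection_notches(path)[0]
    print(f"{label}: first notch near {first:.0f} Hz")
```

In this toy model, a 5 mm difference in reflection path moves the first notch from about 8.6 kHz to about 6.9 kHz. A generic HRTF built for ear A would put that cue in the wrong place for ear B.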

W: So in terms of anatomy, I know you guys take some measurements and readings when someone puts the headphones on, but how accurately can you sample “what is a person’s ear?”

J: Well, the ear is definitely a challenging spot. Studying a lot of HRTF data, we found there are zones where it doesn’t vary much, then zones where it transitions really quickly, followed by more zones where it doesn’t vary much again. So we spent a lot of time looking at what spatial resolution you need to sample it at and where you can approximate in between. In this first product we have 4 transducers per side. There’s a calibration process (I hesitate to call the ear part calibration, but…), and part of what we measure is head size and ear spacing, and that’s absolutely a calibration: the sensor measures you and says your head is this big and your ears are this far apart. That helps us calculate all these levels and delays and get them correct for you, and it’s done per sample, so basically in real time.
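
Jason says the measured head size feeds directly into the per-sample delay calculation. The sketch below illustrates that idea under several assumptions: the Woodworth spherical-head formula, a 48 kHz sample rate, and whole-sample delays only. A real renderer would use fractional delays and frequency-dependent level differences. The helper `render_itd` and the two head widths are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def itd_seconds(azimuth_deg: float, head_width_m: float) -> float:
    """Woodworth ITD for a spherical head of the given ear-to-ear width.

    Azimuth is in degrees, positive to the left; positive result = left ear leads.
    """
    radius = head_width_m / 2.0
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return (radius / SPEED_OF_SOUND) * (lateral + math.sin(lateral))


def render_itd(mono: list[float], azimuth_deg: float, head_width_m: float,
               sample_rate: int = 48_000) -> tuple[list[float], list[float]]:
    """Return (left, right) with the far ear delayed by a whole number of samples."""
    itd = itd_seconds(azimuth_deg, head_width_m)
    delay = round(abs(itd) * sample_rate)
    near = list(mono) + [0.0] * delay   # pad so both ears have equal length
    far = [0.0] * delay + list(mono)    # far ear hears the same signal later
    return (near, far) if itd >= 0 else (far, near)


impulse = [1.0] + [0.0] * 63
for width in (0.150, 0.175):  # hypothetical "generic" vs "measured" head widths
    left, right = render_itd(impulse, azimuth_deg=90.0, head_width_m=width)
    print(f"head width {width * 100:.1f} cm: right ear delayed {right.index(1.0)} samples")
```

At 48 kHz, the 2.5 cm difference in head width shifts the far-ear delay by roughly four samples. A generic model would get that wrong for every listener whose head differs from its default.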

The ear part also happens in real time, but we do it a bit differently: rather than taking a direct measurement, we steer the high-frequency component around your ear so that it hits your ear from the right angle. The 4 zones, combined with some amplitude panning, get the reflections happening in the right order for your ear, so there’s a big difference between front, side and behind, and this happens in real time with real-time panning around the ear.
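
Jason doesn’t detail how the panning across the four drivers works, so the following is only a generic illustration. It uses pairwise constant-power (sine/cosine law) amplitude panning between adjacent drivers arranged around one ear cup. The driver angles and the `pan_gains` helper are hypothetical placeholders, not the OSSIC X’s actual layout or algorithm.

```python
import math

# Hypothetical driver positions (degrees) around one ear cup, in ascending
# order: 0 = front, 90 = top, 180 = rear, 270 = bottom. Not OSSIC's layout.
DRIVER_ANGLES = (0.0, 90.0, 180.0, 270.0)


def pan_gains(source_angle_deg: float,
              driver_angles: tuple[float, ...] = DRIVER_ANGLES) -> list[float]:
    """Constant-power gains that steer a signal between the two nearest drivers.

    The source angle is where the sound should arrive from around the ear,
    on the same circle as driver_angles. The squared gains sum to 1.
    """
    n = len(driver_angles)
    angle = source_angle_deg % 360.0
    gains = [0.0] * n
    for i in range(n):
        start = driver_angles[i]
        nxt = (i + 1) % n
        span = (driver_angles[nxt] - start) % 360.0  # arc to the next driver
        offset = (angle - start) % 360.0             # how far along that arc
        if span > 0.0 and offset < span:
            frac = offset / span                     # 0 at this driver, 1 at the next
            gains[i] = math.cos(frac * math.pi / 2)  # sine/cosine panning law
            gains[nxt] = math.sin(frac * math.pi / 2)
            return gains
    raise ValueError("driver_angles must be distinct and sorted")


for angle in (0.0, 45.0, 200.0, 315.0):
    print(angle, [round(g, 3) for g in pan_gains(angle)])
```

The constant-power law keeps perceived loudness roughly steady as a sound moves between drivers. A plain linear crossfade would dip in the middle.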

This is important because we have a goal for the user experience: we don’t want users to have to do anything weird like take measurements or put microphones in their ears, as a lot of things can go wrong with that. You should just put it on and it should work.

[Image: Finalised dimensions and packaging]

W: You guys have been through a number of prototypes to get here right?

J: Yes, we’re kind of on version 9. The audio hardware we’re demoing here tonight is an older one, I think version 6 or 7. It’s a pretty simple demo playing some 5.1 music content where you can hear the stereo and then the 5.1 mix. The other demo we’re showing is the core software algorithms running on a generic headphone, showing what we can do in virtual reality with full 3D content. We have a plugin for the game engine, so rather than being limited to 5 channels, we have full XYZ positioning of all the sources in the demo.

W: You mentioned the multiple drivers there. A lot of audio aficionados criticise multi-channel rendering of stereo from a purist point of view. What’s your take?

J: Well, we’ve spent a lot of this week at Abbey Road Studios talking with a lot of engineers and figuring out the future of music. We’re in a new music tech program they have called Abbey Road Red, and we’re looking at a lot of the tools and utilities to put this all together. But in terms of the purity of the sound, people should bear in mind that, in general, music is mixed on speakers. Even in a 2 channel scenario, things are usually mixed in a room with a pair of speakers in front of you, and the presentation you get from 2 speakers is very much out-of-head localisation with a soundstage in front of you. That’s how the content was created.

So when we play it back on headphones, we get a very different presentation. When a sound came out of the left speaker, it reached our left ear and then a delayed version reached our right ear, and it was louder on the left and quieter on the right. All that crosstalk and those cues told us it was out front to the left.

But when we play that same mix on headphones, everything on the left channel goes only into the left ear, and that’s actually a really unnatural way to listen. The only time that should happen is if someone is whispering into your left ear; otherwise you’d expect to hear some of it in both ears.
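
The speaker crosstalk Jason describes is what headphone “crossfeed” processing tries to restore. The sketch below is a generic crossfeed, not OSSIC’s renderer: each ear also receives the opposite channel, delayed and attenuated. The 12-sample delay is an assumed value, about 0.25 ms at 48 kHz, which is roughly what a spherical-head model gives for speakers at plus or minus 30 degrees. The 0.5 gain is also assumed. Practical designs also low-pass the crossed signal to mimic head shadowing, which is omitted here.

```python
def crossfeed(left: list[float], right: list[float],
              delay_samples: int = 12, gain: float = 0.5) -> tuple[list[float], list[float]]:
    """Mix a delayed, attenuated copy of each channel into the opposite ear."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    out_left, out_right = [], []
    for n in range(len(left)):
        k = n - delay_samples
        from_right = right[k] if k >= 0 else 0.0  # right speaker reaching the left ear
        from_left = left[k] if k >= 0 else 0.0    # left speaker reaching the right ear
        out_left.append(left[n] + gain * from_right)
        out_right.append(right[n] + gain * from_left)
    return out_left, out_right


# A click panned hard left, like the opening of "Money" on headphones.
hard_left = [1.0] + [0.0] * 31
silence = [0.0] * 32
ears_l, ears_r = crossfeed(hard_left, silence)
print("right ear first hears the click at sample", ears_r.index(0.5))
```

Without crossfeed the right ear would hear nothing at all. With it, the click arrives a fraction of a millisecond later and quieter, much as it would from a left speaker.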

So forget about the multi-channel thing. The multiple drivers are really about beam-steering the high-frequency component onto your ear from the right angles, so if you play a stereo mix on our headphones, the high-frequency component is only coming from a pair of drivers. The channels have no direct correlation to the drivers. We could be rendering 500 points in space; the drivers are just there to get the high-frequency component hitting from the right angle, and they don’t represent channels. What you absolutely can’t do is take 7 channels and feed them into 7 speakers in a headphone. It just doesn’t make sense, because you have to do all the real-time crosstalk and level processing to get the out-of-head localisation, which you couldn’t get by feeding channels directly to drivers.

Then people ask, “But hey, is this out-of-head localisation and different processing unnatural, and should we just be listening to a straight mix on headphones?” Well, if you like a straight mix on headphones, go for it. But keep in mind that’s most likely not what the original artists mixed or recorded on; it was likely a pair of speakers with out-of-head localisation and a soundstage. One of the things we can do is put in a model of a mixing room with a really simple set of near-field monitors and replicate what you get at a mixing desk, and that sounds more like what the music was created on than headphones do.

[Image: HRTF concepts]

W: OK, next question, and a slightly odd one. Some of the criticism and concern I’ve read about you guys is about price: namely, that you’re actually too cheap to be a high-end headphone with this many drivers, given the Kickstarter pricing tiers. What’s the story here? It is a high-end headphone, right?

J: It is a high-end headphone. The retail price is $399, and the goal with that was absolutely to make something that can be widespread as a consumer device and do a few things. The first is to act as a bridge to the future: a headphone that can play 5.1 and stereo content and do these cool things. But the future is a headphone that can render with infinite 3D resolution and isn’t limited by channel formats, which are a constraint of speakers.

I mentioned we were working with Abbey Road this week, and when you go into the building there’s a plaque that says “First stereo recording 1931”, so it’s like “OK, that’s been around for a while”. Then you’ve got the quadraphonic thing, and the first surround formats go back to the late ’30s: Disney did Fantasia, I guess around ’39 or ’40, with 4 channel surround.

You know, some of these channel-based formats are limited by “well, the film for the movie is this wide, so we can fit 4 tracks”, and we’ve been in that domain for 70 or 80 years. But in some ways, VR and game audio can lead the way out of that, because that content is already object-based: the sound is already attached to the objects in full 3D. Mixing it down to 5.1 is an artefact of a film format and not necessarily a logical thing to do.
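
The object-based versus channel-based distinction can be illustrated under assumed conventions: metres, +x forward, +y left, +z up, and angles positive to the left. An object renderer recomputes each source’s direction from the tracked head on every frame. Folding into 5.1 instead snaps sources to the nearest of a few horizontal speaker angles, which discards elevation. The speaker angles are the nominal ITU-R BS.775 values: centre 0, front left/right ±30 and surrounds ±110 degrees (the recommendation allows 100 to 120 for the surrounds). `AudioObject` and both helpers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound source attached to something in the scene (hypothetical type)."""
    name: str
    x: float  # metres, +x forward
    y: float  # metres, +y left
    z: float  # metres, +z up


def direction_from_head(obj: AudioObject, head: tuple[float, float, float],
                        head_yaw_deg: float) -> tuple[float, float]:
    """Azimuth and elevation (degrees) of obj relative to a tracked head."""
    dx, dy, dz = obj.x - head[0], obj.y - head[1], obj.z - head[2]
    world_az = math.degrees(math.atan2(dy, dx))
    # Subtract the head's yaw and wrap into [-180, 180).
    azimuth = (world_az - head_yaw_deg + 180.0) % 360.0 - 180.0
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


# Nominal 5.1 speaker azimuths (LFE omitted: it carries no direction).
SPEAKERS_5_1 = {"C": 0.0, "L": 30.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}


def nearest_5_1_channel(azimuth_deg: float) -> str:
    """Channel a 5.1 mix would snap this direction to; elevation is simply lost."""
    def angular_gap(channel: str) -> float:
        return abs((azimuth_deg - SPEAKERS_5_1[channel] + 180.0) % 360.0 - 180.0)
    return min(SPEAKERS_5_1, key=angular_gap)


bird = AudioObject("bird overhead", x=1.0, y=0.0, z=1.0)
footsteps = AudioObject("footsteps below", x=1.0, y=0.0, z=-1.0)
for obj in (bird, footsteps):
    az, el = direction_from_head(obj, head=(0.0, 0.0, 0.0), head_yaw_deg=0.0)
    print(f"{obj.name}: azimuth {az:.0f}, elevation {el:.0f}, 5.1 channel {nearest_5_1_channel(az)}")
```

The object renderer keeps the two sources 90 degrees apart vertically. The 5.1 fold-down sends both to the centre channel.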

W: So how do you guys envisage audio in gaming in the future? Obviously we have a lot of standards, some of which interoperate and some of which don’t. I see you have the Vive here, and I’m thinking “OK, putting these on and having audio rendered as I’m tracked is a very different use case from plugging them into my phone and listening to some music”. Are you trying to create a standard here somehow?

J: Again, it’s about what the bridge to the future is and what the future IS. In one sense, we’re creating a device that in its simplest form is a portable listening room or portable theatre. For all of today’s content, here’s a device you can feed 2 channel or 5 channel content into and get a good listening experience, with out-of-head localisation and a spatial presentation. With today’s standards, we want to serve that as well as possible, give the user a choice of rooms and environments, and let them perceive space in a way you can’t on normal headphones.

In terms of what the future is, there are of course a lot more question marks, but in games we’re definitely thinking about those things. One approach, used in the Vive demo, goes into the middleware, where we can make a standard plugin that game developers drop in, giving us direct access to the object-based content. Now, that requires them to plug our thing in, or whoever else’s. That’s fine for now with the first VR content, but the great thing about 5.1 and 7.1 was that everyone had it.

The time is right for us to standardise on some format or formats. There are a lot of options out there, like ambisonics or channels, and I’m not sure how all that’s going to shake out. We’re a bit agnostic as to what it is, apart from wanting to help push OPEN standards, whether that’s 22 channel formats, higher order ambisonic formats or a way to pass the objects, so that it’s easy to connect more than just our part. To the extent that people come out with closed standards or a closed ecosystem, it creates a challenge for the entire ecosystem, and we WANT people to hear higher order, higher spatial resolution content. So standardising on ways to produce that content, or making tools that make it easier to produce, is part of our mission.
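
For reference, higher-order ambisonics is one of the open formats Jason names. Encoding a mono source into first-order ambisonics in the AmbiX convention takes only four gains. AmbiX uses ACN channel order (W, Y, Z, X) with SN3D normalisation. Higher orders add more spherical-harmonic channels for finer spatial resolution. This is a generic illustration, not something OSSIC has said it uses. `encode_foa` is a hypothetical name. Angles follow the ambisonic convention: azimuth positive to the left, elevation positive upward.

```python
import math


def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float) -> list[float]:
    """Encode one mono sample to first-order ambisonics, AmbiX (ACN/SN3D).

    Returns [W, Y, Z, X]. Azimuth is positive to the left, elevation positive up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        sample,                                # W: omnidirectional
        sample * math.sin(az) * math.cos(el),  # Y: left-right
        sample * math.sin(el),                 # Z: up-down
        sample * math.cos(az) * math.cos(el),  # X: front-back
    ]


print(encode_foa(1.0, azimuth_deg=90.0, elevation_deg=0.0))  # a source hard left
```

Unlike a 5.1 mix, the four signals describe the sound field rather than speaker feeds. A decoder or binaural renderer turns them into speaker or headphone signals afterwards, and the Z channel carries the elevation that a 5.1 mix drops.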

W: So have you spoken to AMD at all? I know they had some initiatives tied to TrueAudio on their graphics cards.

J: Some discussions but nothing major yet.

W: Last question, then. You clearly don’t sound like a business guy. You’re an audio engineer of some sort, so what is your background?

J: *Laughs* Is it that obvious? Yeah, I got into audio and acoustics as a kid. I probably thought I was going to be a rock star; I was building my own speakers, and I started building high-end speakers when I was about 14. I’ve basically been doing it ever since. I studied mechanical engineering, and after school I went to work designing transducers, then got more into the system side, went to Logitech and eventually started the acoustic department there.

So yeah, my background is really in acoustics and system design and I’m just masquerading as a business person.

Demo 1: VR with the Vive

First up is the Vive. I put on the headset and have the headphones (generic ones) slipped onto my head. This is to showcase the algorithms OSSIC has developed for 3D positional audio, plugged into a game engine that tracks the location of each sound source. I’m in some kind of wizard’s lab with clutter all over the place. Something seems odd, and I’m trying to figure out what I should be clicking or waving the controller at, spinning and looking around as I get auditory clues to what’s going on all the time.

Then it hits me. There is sound everywhere.

Let me just say that again. Because it’s important.

There is sound.

EVERYWHERE

[Image: OSSIC + VR]

It’s an assault on the ears. I’m not just talking about front, back, left and right, but up and down too.

All of a sudden, I realise what was odd. It took me a minute to register because usually, when I put on headphones for gaming after all these years, I (and I suspect most people) make some kind of automatic mental adjustment not to expect “real” sound like you get, you know, in real life.

But here, it’s real. And it’s weird. In all the VR demos I’ve done over the years, this is the first time I can genuinely pick out where something is coming from. I was at EGX Rezzed last week (check out my VR report from the show), and one of the demos I tried was TheBlu, where you’re on an underwater wreck and a blue whale comes out of nowhere to stop next to you. What I didn’t say in that article was that when the whale first appeared, I wasn’t looking in the direction it came from. I was facing the completely opposite horizontal direction, so in real life I would have heard it coming from behind and above me. As it was, I couldn’t tell where the sound was coming from and just spun wildly around looking for something I could hear but couldn’t see.

None of that happens here. I can place (to within maybe a couple of feet) where EVERYTHING is. I don’t know how else to describe it. It’s amazing, and that’s for 2 reasons.

  1. It’s the first time I’ve experienced anything approaching a genuine 5.1 or 7.1 soundscape from something on my head.
  2. It’s the first time I’ve experienced 3D positional audio from something on my head.

Most gaming headphones are aimed at doing the best job they can of making 5.1 or 7.1, whether that’s from virtual surround or multi-driver surround. This fully achieves that.

And then goes further by giving you the vertical axis. I’m sorry but I just don’t have the words to describe how truly blown away by this I am.

Demo 2: 2 Channel Audio

OK, so for the next demo, I have the version 6 or 7 prototype (out of 9, in case you didn’t read the full interview) with some stereo audio. If you’ve read some of my headphone reviews previously, you may recall that I usually use Pink Floyd’s Echoes as one of my sample pieces. No Echoes here, unfortunately: the demo runs on a laptop with software for switching between normal 2 channel and OSSIC’s “naturalised” stereo sound. I’m happy to note, however, that they do have a couple of Pink Floyd songs in the selection.

The headphones themselves are pretty large and have a small circuit board on top. They’re plugged into a big box which is plugged into the laptop.

Joy (the OSSIC CTO) explains that for the prototypes, everything was built large to prove the engineering principles of the audio before worrying about miniaturisation. They’ve done that engineering now and have a mock-up of the final size for me to look at. It’s pretty big, but not huge. There’s no boom mic hanging out; Joy explains that there’s a mic array instead, as they wanted it to be easily portable without too much stuff hanging off the headphones.

[Image: Prototype 6... or maybe 7...]

I pull on the prototype and hit play. The familiar introduction to “Money” starts. Now I realise why they’ve chosen this song. Right, left, left, right, right, right, right, right, left, right. Unmistakeable, and what I’ve been accustomed to hearing on headphones for years. The opening 12 seconds are a completely segregated audio experience of discrete left-then-right sequential sounds, and this continues, with some overlay from both channels, as the bass comes in with that iconic riff.

Joy restarts the song and clicks the control to switch over to the naturalised OSSIC soundscape. Wow. The effect is dazzling. This is Money as I first heard it on my dad’s stereo when I was a kid. Yes, left and right, but more… natural. Left isn’t a hard left. It’s a left that’s there, but as Jason mentioned in the interview, bits of the left also reach the right ear, just a bit later and a bit quieter. I’m not quick enough to pick out the money sounds and measure the delay in my head; I try a couple of times, restarting the song, but give up and decide to focus on the experience. It’s excellent.

While writing this article, I fired up Money again at home, trying it on my HyperX Cloud IIs (my current gaming headset) and switching back and forth to the PC’s speakers. They’re right, you know: the discrete left and right in headphones is very unnatural, at least to my ears.

Another Pink Floyd song now. Time. Again, I understand the choice. The introduction with its famous clocks chiming left and right sets the soundscape again. Once again I’m clicking the switch to change back and forth between stereo and naturalised stereo. Again, the difference is huge and I prefer the OSSIC sound.

Wrapping Up

So here I am, having written way more than I’d intended about the brief glimpse I was given of OSSIC and the future of audio. I thought long and hard before I titled this article “The Future of Audio”. I don’t proclaim myself any great visionary, but I know what I heard. I’ve been looking for something like OSSIC for the better part of two decades. I haven’t had a finalised product on my head, and as with any pre-production Kickstarter, there are risks.

But, and it’s a significant but, I was so impressed with what I heard, both in the demos and in Jason and Joy’s explanations, that it was more than enough to convince me.

Will it succeed? Only time will tell. The Kickstarter was funded within a day, I think, so in that respect it has already succeeded. I agree with Jason’s assessment that a lot of these surround setups are vestiges of the past, but standards are a tricky thing. Will OSSIC ultimately succeed in establishing a new standard based on the object location cues used by video game engines? Who knows. But even if they don’t, this is without doubt the most realistic soundscape I’ve ever heard from something that sits on my head. In many ways, given its vertical-axis audio capabilities, it’s also more real than having 5 or 7 speakers spread out around the room.

[Image: Various prototypes]

I started this article talking about two very different films. One I was so excited for ultimately turned out to be a disappointment (Clash of the Titans). The other I was very cautious about when it was announced, but I ended up loving it (Star Trek). In many ways, this is like my experience with surround headphones. I’ve always wanted to believe the next great thing was this or that fancy thing I’d read about. Having tried all manner of surround sound headphones, I’ve become both weary and wary of all the hype. But then along came OSSIC and, like the Star Trek reboot, it blew through all my expectations and then some.

That’s why I’ve pledged for the OSSIC X (I did it from my phone straight after the demos!). Given that I’d decided to stop buying new headphones until I finally found something that did proper 3D positional audio, that fact alone should tell you all you need to know.

Visit the OSSIC website to learn more about 3D audio and the OSSIC X.


About the author: Runs Product Management for Aquis Stock Exchange. Designed, built and managed several market making, algorithmic and aggregation trading systems for most exchange-traded asset classes, including equities, FI, FX and commodities cash and derivatives markets, as well as multi-venue FX spot. Massive PC gamer!

Follow Wccftech on Google to get more of our news coverage in your feeds.