
Google’s WorldSense

Lenovo just announced the Mirage Solo VR headset with Google’s WorldSense inside-out tracking capability. The result is an untethered VR headset that presumably has spatial mapping capabilities, allowing spatial maps to be saved and shared. If so, this would be a massive advance over ARKit- and ARCore-based AR, which makes persistence and collaboration all but impossible (the post here goes into a lot of detail about the various issues related to persistence and collaboration with current technology). The lack of a tether also gives it an edge over Microsoft’s (so-called) Mixed Reality headsets.

Google’s previous Tango system (that’s a Lenovo Phab 2 Pro running it above) had much more interesting capabilities than ARCore but has fallen by the wayside. In particular, Tango had an area learning capability that is missing from ARCore. I am very much hoping that something like this will exist in WorldSense, so that virtual objects can be placed persistently in spaces and spatial maps can be shared, letting multiple headsets see exactly the same virtual objects in exactly the same place in the real space. Of course this isn’t all that helpful when used with a VR headset – but maybe someone will manage a pass-through or see-through mixed reality headset using WorldSense that enables persistent spatial augmentation at a cost reasonable enough for ubiquitous use. If it were also able to perform real-time occlusion (where virtual objects can be occluded by real objects), that would be even better!

An interesting complement to this is the Lenovo Mirage stereo camera. This is capable of taking 180 degree videos and stills suitable for use with stereoscopic 3D displays, such as the Mirage headset. It suddenly occurred to me that this might be a way of hacking a pass-through AR capability for the Mirage before someone does it for real :-). This is kind of what Stereolabs are doing for existing VR headsets with their ZED Mini, except that theirs is a tethered solution. The nice thing would be to do this in an untethered way.


Samsung 360 Round – 360 degree 3D live streaming

The problem with the traditional technique of texturing the inside of a sphere with a 360 equirectangular image or video to form a background in a virtual scene is that it looks like a painting of reality when viewed in a VR headset. What’s missing is a sense of depth. The Samsung 360 Round looks like it could solve that problem by live streaming a 3D 360 feed with 4096 x 2048 resolution per eye. Hopefully this would mean that the background would merge better with virtual objects in the foreground.
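
To see why a mono equirectangular background reads as flat, it helps to look at how a view direction gets mapped onto the texture: both eyes sample exactly the same texel, so there is no parallax and hence no depth cue. Below is a minimal Python sketch of that mapping (the -Z-forward, Y-up convention is my own choice for illustration, not anything specified by Samsung or a particular engine).

    import numpy as np

    def direction_to_equirect_uv(d):
        """Map a unit view direction to (u, v) in an equirectangular texture.

        Every direction lands on a single texel of one shared texture, so the
        left and right eyes see identical pixels: the background behaves as if
        it were infinitely far away and reads as a flat painting.
        """
        x, y, z = d
        u = 0.5 + np.arctan2(x, -z) / (2.0 * np.pi)          # longitude -> [0, 1]
        v = 0.5 - np.arcsin(np.clip(y, -1.0, 1.0)) / np.pi   # latitude  -> [0, 1]
        return u, v

    # Looking straight ahead samples the centre of the texture.
    print(direction_to_equirect_uv((0.0, 0.0, -1.0)))  # -> (0.5, 0.5)

A per-eye stereoscopic feed like the 360 Round’s sidesteps this by supplying a separate texture for each eye, which is where the sense of depth comes back.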

The downside is the cost ($10,500!) and the PC requirements for live streaming – a couple of GTX 1080 Tis (about $1,500 in total) and an i7-6950X CPU (around $1,500) make for a pretty expensive setup. And I am not sure I understand this correctly, but you might need two PCs with that specification. Yikes.

ZenFone AR – Tango and Daydream together

The ZenFone AR is a potentially very interesting device, combining Tango for spatial mapping and Daydream capability for VR headset use in one package. This is a step up from the older Phab 2 Pro Tango phone in that it can also be used with Daydream (and it looks like a neater package). Adding Tango to Daydream means that it is possible to do inside-out spatial tracking in a completely untethered VR device. It should be a step up from ARKit in its current form, which relies on just inertial and VSLAM tracking from what I understand. Still, the ability for ARKit to be used with existing devices is a massive advantage.

Maybe in the end the XR market will divide up into those applications that don’t need tight spatial locking (where standard devices can be used) and those that do require tight spatial locking (demanding some form of inside-out tracking).

Latest fun thing in the office: a Garmin VIRB 360 camera

360 degree video is all the rage right now, so I cannot be left behind! One of the things I like about the Garmin VIRB 360 is the in-camera stitching and very high resolution. It is also incredibly small. Judging by my photo, though, keeping the dust off will be a challenge :-).

Typically, I forgot to order a micro-HDMI cable, so I can’t yet test live capture to a PC, but I can create videos on the SD card. Great fun!

The cable will turn up tomorrow with any luck. I am eager to see how usable the HDMI output is for live 360 video.

Mixed reality: does latency matter and is it immersive anyway?

I had a brief discussion last night about latency and its impact on augmented reality (AR) versus virtual reality (VR). It came up in the context of tethered versus untethered HMDs. An untethered HMD either has to have the entire processing system in the HMD (as in the HoloLens) or else use a wireless connection to a separate processing system. There’s a lot to be said for not putting the entire system in the HMD – weight, heat, etc. However, having a separate box and requiring two separate battery systems is annoying, though it certainly has precedent (the iPhone and Apple Watch, for example).

The question is whether the extra latency introduced by a wireless connection is noticeable and, if so, whether it is a problem for AR and MR applications (there’s no argument for VR – latency wants to be as close to zero as possible).

Just for the record, my definitions of virtual, augmented and mixed reality are:

  • Virtual reality. HMD based with no sense of the outside world and entire visual field ideally covered by display.
  • Augmented reality. This could be via HMD (e.g. Google Glass) or via a tablet or phone (e.g. Phab 2 Pro). I am going to define AR as the case where virtual objects are overlaid on the real world scene with no or partial spatial locking but no support for occlusion (where a virtual object correctly goes behind a real object in the scene). Field of view is typically small for AR but doesn’t have to be.
  • Mixed reality. HMD based with see-through capability (either optical or camera based) and the ability to accurately spatially lock virtual objects in the real world scene. Field of view ideally as large as possible but doesn’t have to be. Real time occlusion support is highly desirable to maintain the apparent reality of virtual objects.

Back to latency and immersion. VR is the most highly immersive of these three and is extremely sensitive to latency. This is because any disagreement between what the body’s sensors report and what the eyes are seeing (sensory inconsistency) is pretty unpleasant, leading rapidly to motion sickness. Personally, I can’t stand using the DK2 for any length of time because there is always something or some mode that causes a sensory inconsistency.

AR is practically insensitive to latency since virtual objects may not be locked to the real world at all. Plus, the ability to maintain sight of the real world seems to override any transient problems. It’s also only marginally immersive in any meaningful sense – there is very little telepresence effect.

MR is virtually the same as AR when it comes to latency sensitivity and is actually the least immersive of all three modes when done correctly. Immersion implies a person’s sense of presence is transported to somewhere other than the real space. Instead, mixed reality wants to cement the connection to the real space by also locking virtual objects down to it. It’s the opposite of immersion.

Real world experience with the HoloLens tends to support the idea that latency is not a terrible problem for MR. Even running code in debug mode with lots of messages being printed (which can reduce the frame rate to a handful of frames per second) isn’t completely awful. With MR, latency breaks the reality of virtual objects because they may not remain perfectly fixed in place when the user’s head is moving fast. But at least this doesn’t generate motion sickness, or at least not for me.
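
To put very rough numbers on that, the apparent drift of a world-locked hologram is approximately the head’s angular velocity multiplied by the motion-to-photon latency. The sketch below is a back-of-envelope illustration only; the velocity and latency figures are my own assumptions, not HoloLens measurements.

    def hologram_drift_deg(head_deg_per_s, latency_ms):
        """Apparent angular drift of a world-locked hologram during a head turn."""
        return head_deg_per_s * latency_ms / 1000.0

    # A brisk head turn of ~200 deg/s with a few illustrative latencies (ms):
    for latency_ms in (20, 60, 200):
        drift = hologram_drift_deg(200, latency_ms)
        print(f"{latency_ms} ms latency -> ~{drift:.0f} deg of apparent drift")

The drift can be large during a fast turn, but it collapses back to zero as soon as the head stops, which fits with the observation that the effect is jarring rather than nauseating.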

There is a pretty nasty mode of the HoloLens though. If the spatial sensors get covered up, usually because it is placed on a table with things blocking them, the HoloLens can get very confused and virtual objects display horrendous jittering for a while until it settles down again. That can be extremely disorienting (I have seen holograms rotated through 90 degrees and bouncing rapidly from side to side – very unpleasant!).

On balance though, it may be that untethered, lightweight HMDs with separate processor boxes will be the most desirable design for MR devices. The ultimate goal is to be able to wear MR devices all day, and this may be the only realistic way to reach that goal.

HoloLens Spectator View…without the HoloLens


I’ll explain the photo above in a moment. Microsoft’s Spectator View is a great device but not that practical in the general case. For example, the original requires modifications to the HoloLens itself and a fairly costly camera capable of outputting clean 1080p, 2K or 4K video over HDMI. The total cost can be more than $6,000 depending on the camera used. My goal is to do much the same thing but without requiring a HoloLens and at a much lower cost – just using a standard camera with fairly simple calibration. Not only that, but I want to stream the mixed reality video across the internet using WebRTC for both conventional displays and stereo headsets (such as VR headsets).

So, why is there a HoloLens in the photo? This is the calibration setup. The camera that I am using for this mixed reality streaming system is a Stereolabs ZED. I have been working with it quite a bit lately and it seems to work extremely well. Notably, it can produce a 2K stereo 3D output, a depth map and a 6 DoF pose, all available via a USB 3 interface and a very easy-to-use SDK.
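
For anyone curious what that looks like in practice, here is a minimal sketch of grabbing a stereo frame, the depth map and the 6 DoF pose using the ZED’s Python bindings (pyzed). It is illustrative only; enum and class names have changed between ZED SDK releases, so check it against the SDK version you actually have.

    import pyzed.sl as sl

    zed = sl.Camera()
    init = sl.InitParameters()
    init.camera_resolution = sl.RESOLUTION.HD2K      # 2K stereo capture
    init.depth_mode = sl.DEPTH_MODE.PERFORMANCE

    if zed.open(init) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("Failed to open ZED camera")

    # Inside-out positional tracking provides the 6 DoF pose mentioned above.
    zed.enable_positional_tracking(sl.PositionalTrackingParameters())

    left, depth, pose = sl.Mat(), sl.Mat(), sl.Pose()
    runtime = sl.RuntimeParameters()

    if zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_image(left, sl.VIEW.LEFT)            # left-eye image
        zed.retrieve_measure(depth, sl.MEASURE.DEPTH)     # per-pixel depth map
        zed.get_position(pose, sl.REFERENCE_FRAME.WORLD)  # 6 DoF camera pose
        print("Camera position:", pose.get_translation(sl.Translation()).get())

    zed.close()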

Unlike Spectator View, the Unity Editor is not used on the desktop. Instead, a standard HoloLens UWP app is run on a Windows 10 desktop, along with a separate capture, compositor and WebRTC streamer program. There is some special code in the HoloLens app that talks to the rest of the streaming system; it can remain in the app even when the app runs on a real HoloLens, where it simply stays inactive.

The calibration process determines, amongst other things, the actual field of view of the ZED and its orientation and position in the Unity scene used to create the virtual part of the mixed reality scene. This is essential in order to correctly render the virtual scene in a form that can be composited with the video coming from the ZED. This is why the HoloLens is placed in the prototype rig in the photo. It puts the HoloLens camera roughly in the same vertical plane as the ZED camera with a small (known) vertical offset. It’s not critical to get the orientation exactly right when fitting the HoloLens to the rig – this can be calibrated out very easily. The important thing is that the cameras see roughly the same field. That’s because the next step matches features in each view and, from the positions of the matches, can derive the field of view of the ZED and its pose offset from the HoloLens. This then makes it possible to set the Unity camera on the desktop in exactly the right position and orientation so that the scene it streams is correctly composed.
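
As a concrete illustration of the kind of feature matching involved, the sketch below matches ORB features between a HoloLens camera frame and the ZED’s left frame using OpenCV and recovers the relative rotation between the two views. The file names and intrinsics are placeholders, and this is one plausible way of getting the correspondences rather than the actual calibration code described above.

    import cv2
    import numpy as np

    holo = cv2.imread("hololens_frame.png", cv2.IMREAD_GRAYSCALE)
    zed_left = cv2.imread("zed_left_frame.png", cv2.IMREAD_GRAYSCALE)

    # Detect and describe features in both views.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(holo, None)
    kp2, des2 = orb.detectAndCompute(zed_left, None)

    # Brute-force Hamming matching with a ratio test to drop weak matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = [pair[0] for pair in matcher.knnMatch(des1, des2, k=2)
               if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Placeholder intrinsics; using one matrix for both cameras is a
    # simplification that assumes broadly similar lenses.
    K = np.array([[1500.0, 0.0, 960.0],
                  [0.0, 1500.0, 540.0],
                  [0.0, 0.0, 1.0]])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    print("Relative rotation between HoloLens and ZED views:\n", R)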

Once the calibration step has completed, the HoloLens can be removed and used as required. The prototype version looks very ungainly like this! The real version will have a nice 3D printed bracket system, which will also have the advantage of reducing the vertical separation and limiting the possible offsets.

In operation, the HoloLens apps running on the HoloLens(es) and on the desktop need to share data about the Unity scene so that each device computes exactly the same scene. In this way, everyone sees the same thing. I am actually using Arvizio‘s own sharing system, but any sharing system could be used. The Unity scene generated on the desktop is then composited with the ZED camera’s video feed and streamed over WebRTC. The nice thing about using WebRTC is that almost anyone with a Chrome or Firefox browser can display the mixed reality stream without having to install any plugins or extensions. It is also worth mentioning that the ZED does not have to remain fixed in place after calibration. Because it is able to measure its pose with respect to its surroundings, the ZED could potentially pan, tilt and dolly if required.
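
For completeness, the heart of the compositing step is just an alpha-over blend of the rendered virtual scene on top of the ZED frame before the result goes to the WebRTC encoder. The sketch below (NumPy, with placeholder inputs and no occlusion handling) shows the idea; it is not the actual compositor described above.

    import numpy as np

    def composite(camera_bgr, virtual_bgra):
        """Alpha-blend the rendered virtual scene over the camera frame.

        camera_bgr:   HxWx3 uint8 frame from the ZED (one eye).
        virtual_bgra: HxWx4 uint8 render of the Unity scene, alpha = coverage.
        Both are assumed to share the same resolution and channel order.
        """
        alpha = virtual_bgra[..., 3:4].astype(np.float32) / 255.0
        virtual = virtual_bgra[..., :3].astype(np.float32)
        camera = camera_bgr.astype(np.float32)
        blended = alpha * virtual + (1.0 - alpha) * camera
        return blended.astype(np.uint8)

    # For a stereo stream, the same blend would be applied to the left and
    # right eye views separately before they are handed to the encoder.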