
Second version of HoloLens HPU – separating mixed reality from the cloud

Some information from Microsoft here about the next generation of HoloLens. I am a great fan of only using the cloud to enhance functionality when there’s no other choice. This is especially relevant to MR devices where internet connectivity might be dodgy at best or entirely non-existent depending on the location. Putting some AI inference capability right on the device means that it can be far more capable in stand-alone mode.

There seems to be the start of a movement towards putting serious but low-power AI capability in wearable devices. The Movidius VPU is a good example of this kind of technology and probably every CPU manufacturer is on a path to include inference engines in future generations.

While the HoloLens could use updating in many areas (WiFi capability, adding cellular communications, more general purpose processing power, supporting real-time occlusion), adding an inference engine is extremely interesting.

Using the HoloLens to aid back surgery

Fascinating video of a HoloLens being used in a real back surgery – presumably the video was mostly shot using Spectatorview or something similar. I have seen other systems where mocap type technology is used to get more precision in the pose of the HoloLens but this system doesn’t seem to do that. Not that I am a surgeon but I doubt that the HoloLens can replace the usual fluoroscope since that gives real time feedback on the location of things like needles with respect to the body (yes, I have been on the literal sharp end of this!). However, if the spatial stability of the hologram is good enough, I am sure that it greatly helps with visualization.

As one of the many people with dodgy backs, I am always interested in anything that can improve outcomes and minimize risk and side-effects. If the HoloLens can do that – brilliant!

Mixed Reality and the missing fourth dimension

The screen capture above is a scene from a HoloLens via mixed reality capture (MRC) showing four virtual rings with different levels of brightness. The top left is 100% red, the bottom right is black and the other two are intermediate levels of brightness.

The photograph above was shot through a HoloLens and is a reasonable representation of what the wearer actually sees. Unsurprisingly, since all see-through MR headsets work by overlaying light on the real scene, the black ring has vanished and the intermediate brightness rings have become transparent to a degree that depends on their brightness relative to the real world scene.

This is a considerable obstacle for inserting realistic virtual objects into the real world – if they are dark, they will be almost transparent. And while indoors it is possible to control ambient lighting, the same is certainly not true outdoors.

What is needed is not just support for RGB but RGBA where A is the fourth dimension of color in this case. The A (alpha) value specifies the required transparency. The Unity app running on the HoloLens does of course understand transparency and can generate the required data but the HoloLens has no way to enforce it. One way to do this would be to supplement the display with an LCD that acts as a controllable matte. The LCD controls the extent to which the real world is visible at each display pixel while the existing display controls the color and intensity of the virtual object. No doubt there are significant challenges to implementation but this may be the only way to make see-through MR headsets work properly outdoors.
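The difference between the two display models can be sketched in a few lines. This is only an illustration under simple assumptions (linear light values between 0 and 1, a single color channel, function names invented here), not a model of the real HoloLens optics:

```python
def additive_display(real, virtual):
    """See-through display: virtual light is simply added to the real scene."""
    return min(1.0, real + virtual)


def matte_display(real, virtual, alpha):
    """Hypothetical display with a per-pixel LCD matte.

    alpha = 1.0 blocks the real world completely, alpha = 0.0 lets it all through.
    """
    return real * (1.0 - alpha) + virtual * alpha


if __name__ == "__main__":
    background = 0.6  # assumed brightness of the real scene behind the rings
    for ring in (1.0, 0.5, 0.0):  # bright, intermediate and black rings
        print(ring,
              additive_display(background, ring),
              matte_display(background, ring, alpha=1.0))
```

With the additive model a black ring leaves the background value unchanged, so it is invisible. With the matte fully opaque, the same ring comes out black regardless of how bright the background is.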

Mixed reality: does latency matter and is it immersive anyway?

I had a brief discussion last night about latency and its impact on augmented reality (AR) versus virtual reality (VR). It came up in the context of tethered versus untethered HMDs. An untethered HMD either has to have the entire processing system in the HMD (as in the HoloLens) or else use a wireless connection to a separate processing system. There’s a lot to be said for not putting the entire system in the HMD – weight, heat etc. However, having a separate box and requiring two separate battery systems is annoying but certainly has precedent (iPhone and Apple Watch for example).

The question is whether the extra latency introduced by a wireless connection is noticeable and, if so, is it a problem for AR and MR applications (there’s no argument for VR – latency wants to be as close to zero as possible).

Just for the record, my definition of virtual, augmented and mixed reality is:

  • Virtual reality. HMD based with no sense of the outside world and entire visual field ideally covered by display.
  • Augmented reality. This could be via HMD (e.g. Google Glass) or via a tablet or phone (e.g. Phab 2 Pro). I am going to define AR as the case where virtual objects are overlaid on the real world scene with no or partial spatial locking but no support for occlusion (where a virtual object correctly goes behind a real object in the scene). Field of view is typically small for AR but doesn’t have to be.
  • Mixed reality. HMD based with see-through capability (either optical or camera based) and the ability to accurately spatially lock virtual objects in the real world scene. Field of view ideally as large as possible but doesn’t have to be. Real time occlusion support is highly desirable to maintain the apparent reality of virtual objects.

Back to latency and immersion. VR is the most highly immersive of these three and is extremely sensitive to latency. This is because any time the body’s sensors disagree with what the eyes are seeing (sensory inconsistency) is pretty unpleasant, leading rapidly to motion sickness. Personally I can’t stand using the DK2 for any length of time because there is always something or some mode that causes a sensory inconsistency.

AR is practically insensitive to latency since virtual objects may not be locked at all to the real world. Plus the ability to maintain sight of the real world seems to override any transient problems. It’s also only marginally immersive in any meaningful sense – there is very little telepresence effect.

MR is virtually the same as AR when it comes to latency sensitivity and is actually the least immersive of all three modes when done correctly. Immersion implies a person’s sense of presence is transported to somewhere other than the real space. Instead, mixed reality wants to cement the connection to the real space by also locking virtual objects down to it. It’s the opposite of immersion.

Real world experience with the HoloLens tends to support the idea that latency is not a terrible problem for MR. Even running code in debug mode with lots of messages being printed (which can reduce the frame rate to a handful of frames per second) isn’t completely awful. With MR, latency breaks the reality of virtual objects because they may not remain perfectly fixed in place when the user’s head is moving fast. But at least this doesn’t generate motion sickness, or at least not for me.
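To get a feel for how far a virtual object can appear to slip, here is a back-of-envelope sketch. It assumes the hologram is drawn using a head pose that is simply one latency period old and ignores any pose prediction or reprojection that a headset may apply. The function name and the numbers are made up for illustration:

```python
import math


def hologram_drift(head_speed_deg_per_s, latency_s, distance_m):
    """Return (angular error in degrees, apparent sideways offset in metres).

    Assumes a constant head rotation rate and no latency compensation.
    """
    error_deg = head_speed_deg_per_s * latency_s
    offset_m = distance_m * math.tan(math.radians(error_deg))
    return error_deg, offset_m


if __name__ == "__main__":
    # Example values only: a fairly quick head turn, 50 ms of latency,
    # and a hologram placed 2 m away.
    print(hologram_drift(100.0, 0.050, 2.0))
```

Under these assumptions a 100 degrees per second head turn with 50 ms of latency gives an error of about 5 degrees, which is roughly 17 cm of slip for an object 2 m away. That is easy to see, but the real world stays put, which fits the experience above that it doesn’t feel sickening.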

There is a pretty nasty mode of the HoloLens though. If the spatial sensors get covered up, usually because it is placed on a table with things blocking them, the HoloLens can get very confused and virtual objects display horrendous jittering for a while until it settles down again. That can be extremely disorientating (I have seen holograms rotated through 90 degrees and bouncing rapidly side to side – very unpleasant!).

On balance though, it may be that untethered, lightweight HMDs with separate processor boxes will be the most desirable design for MR devices. The ultimate goal is to be able to wear MR devices all day and this may be the only realistic way to reach that goal.

HoloLens Spectator View…without the HoloLens

I’ll explain the photo above in a moment. Microsoft’s Spectator View is a great device but not that practical in the general case. For example, the original requires modifications to the HoloLens itself and a fairly costly camera capable of outputting clean 1080p, 2k or 4k video on an HDMI port. Total cost can be more than $6000 depending on the camera used. My goal is to do much the same thing but without requiring a HoloLens and at a much lower cost – just using a standard camera with fairly simple calibration. Not only that, but I want to stream the mixed reality video across the internet using WebRTC for both conventional and stereo headsets (such as VR headsets).

So, why is there a HoloLens in the photo? This is the calibration setup. The camera that I am using for this Mixed Reality streaming system is a Stereolabs ZED. I have been working with this quite a bit lately and it seems to work extremely well. Notably it can produce a 2K stereo 3D output, a depth map and a 6 DoF pose, all available via a USB 3 interface and a very easy to use SDK.

Unlike Spectator View, the Unity Editor is not used on the desktop. Instead, a standard HoloLens UWP app is run on a Windows 10 desktop, along with a separate capture, compositor and WebRTC streamer program. There is some special code in the HoloLens app that talks to the rest of the streaming system. This can be present in the HoloLens app even when run on the HoloLens without problems (it just remains inactive in this case).

The calibration process determines, amongst other things, the actual field of view of the ZED and its orientation and position in the Unity scene used to create the virtual part of the mixed reality scene. This is essential in order to correctly render the virtual scene in a form that can be composited with the video coming from the ZED. This is why the HoloLens is placed in this prototype rig in the photo. It puts the HoloLens camera roughly in the same vertical plane as the ZED camera with a small (known) vertical offset. It’s not critical to get the orientation exactly right when fitting the HoloLens to the rig – this can be calibrated out very easily. The important thing is that the cameras see roughly the same field. That’s because the next step matches features in each view and, from the positions of the matches, can derive the field of view of the ZED and its pose offset from the HoloLens. This then makes it possible to set the Unity camera on the desktop in exactly the right position and orientation so that the scene it streams is correctly composed.
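As a rough illustration of how matched features can give a field of view, here is a simplified pinhole-camera sketch. It is not the actual calibration code. It assumes the two cameras share the same optical centre and orientation (so it ignores the vertical offset in the rig), that there is no lens distortion, and that the field of view of the reference camera is already known. All names and numbers are invented for the example:

```python
import math


def focal_from_fov(fov_deg, width_px):
    """Focal length in pixels for a pinhole camera with this horizontal FOV."""
    return (width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def fov_from_focal(focal_px, width_px):
    """Horizontal FOV in degrees for a pinhole camera with this focal length."""
    return math.degrees(2.0 * math.atan((width_px / 2.0) / focal_px))


def estimate_fov(ref_fov_deg, ref_width_px, ref_xs, new_width_px, new_xs):
    """Estimate the FOV of a second camera from matched features.

    ref_xs and new_xs are the horizontal pixel offsets from the image centre
    of the same features seen by the reference camera and the new camera.
    For aligned pinhole cameras these offsets scale with focal length.
    """
    ref_focal = focal_from_fov(ref_fov_deg, ref_width_px)
    # Least-squares scale factor (through the origin) between the two views.
    scale = (sum(r * n for r, n in zip(ref_xs, new_xs))
             / sum(r * r for r in ref_xs))
    return fov_from_focal(ref_focal * scale, new_width_px)
```

A real calibration would also have to solve for the rotation and translation between the cameras and cope with bad matches, which is why this is only a sketch of the idea.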

Once the calibration step has completed, the HoloLens can be removed and used as required. The prototype version looks very ungainly like this! The real version will have a nice 3D printed bracket system that will also have the advantage of reducing the vertical separation and limiting the possible offsets.

In operation, the HoloLens apps running on the HoloLens(es) and on the desktop must share data about the Unity scene so that each device can compute exactly the same scene. In this way, everyone sees the same thing. I am actually using Arvizio’s own sharing system but any sharing system could be used. The Unity scene generated on the desktop is then composited with the ZED camera’s video feed and streamed over WebRTC. The nice thing about using WebRTC is that almost anyone with a Chrome or Firefox browser can display the mixed reality stream without having to install any plugins or extensions. It is also worth mentioning that the ZED does not have to remain fixed in place after calibration. Because it is able to measure its pose with respect to its surroundings, the ZED could potentially pan, tilt and dolly if that is required.
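The idea of letting the camera move after calibration can be sketched as a pose composition: the pose of the virtual camera is the calibrated starting pose combined with whatever motion the ZED has tracked since then. The code below is a minimal sketch with plain 4x4 matrices. The function names are invented, rotation is limited to yaw to keep it short, and a real system would also need to convert between the coordinate conventions of the tracker and the rendering engine:

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def pose(yaw_deg, x, y, z):
    """4x4 pose: rotation about the vertical (y) axis plus a translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [[c, 0.0, s, x],
            [0.0, 1.0, 0.0, y],
            [-s, 0.0, c, z],
            [0.0, 0.0, 0.0, 1.0]]


def virtual_camera_pose(world_from_camera_start, camera_start_from_camera_now):
    """Pose of the virtual camera in the scene after the real camera has moved.

    world_from_camera_start: camera pose in the scene, found by calibration.
    camera_start_from_camera_now: tracked motion since calibration.
    """
    return matmul4(world_from_camera_start, camera_start_from_camera_now)


if __name__ == "__main__":
    calibrated = pose(90.0, 1.0, 0.0, 0.0)  # example calibration result
    dolly = pose(0.0, 0.0, 0.0, 1.0)        # camera moved 1 m along its own z
    for row in virtual_camera_pose(calibrated, dolly):
        print(row)
```

In this example the camera was calibrated at x = 1 with a 90 degree yaw, so moving 1 m along its own z axis puts the virtual camera at roughly x = 2 in the scene.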

Using a webcam with a HoloLens SpectatorView rig

Following on from my previous post regarding HoloLens SpectatorView, I had been wondering if it was possible to use a webcam instead of a DSLR. It changes the mounting concepts but, just for testing, it wasn’t hard to place a Logitech C920 webcam on top of the HoloLens and get it aligned enough physically so that the calibration data numbers looked reasonable.

An immediate problem was that the code was not setting the webcam’s frame size. A quick look at OpenCVFrameProvider.cpp showed the problem. The code was trying to set the frame width and height before opening the capture object which doesn’t work. This is the original:

The fix is to put line 44 before line 41. Then it works fine. The preview window in the calibration code has red and blue swapped but the processed images are correct. Once it was calibrated, I could go on and run the Unity app and look at the composite output – now pretty decent 1080p video.
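The ordering issue is easy to show in isolation. This is not the SpectatorView code, just a minimal Python sketch of the same idea using OpenCV’s Python bindings: open the device first, then ask for the frame size, then read back what the driver actually granted. The helper names are made up, and the property constants are defined locally with the same values as cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT so that the helper can be exercised without a camera:

```python
CAP_PROP_FRAME_WIDTH = 3   # same value as cv2.CAP_PROP_FRAME_WIDTH
CAP_PROP_FRAME_HEIGHT = 4  # same value as cv2.CAP_PROP_FRAME_HEIGHT


def request_frame_size(cap, width, height):
    """Request a frame size on an already opened capture object.

    Returns the (width, height) the device reports afterwards, which may
    differ from the request if the camera doesn't support that mode.
    """
    if not cap.isOpened():
        raise RuntimeError("open the capture device before setting properties")
    cap.set(CAP_PROP_FRAME_WIDTH, width)
    cap.set(CAP_PROP_FRAME_HEIGHT, height)
    return (int(cap.get(CAP_PROP_FRAME_WIDTH)),
            int(cap.get(CAP_PROP_FRAME_HEIGHT)))


def open_webcam(index=0, width=1920, height=1080):
    """Open a webcam and then request the frame size, in that order."""
    import cv2  # imported here so request_frame_size works without OpenCV

    cap = cv2.VideoCapture(index)  # open first ...
    size = request_frame_size(cap, width, height)  # ... then set the size
    return cap, size
```

Reading the size back is worthwhile because a webcam is free to ignore a request for a mode it doesn’t support.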

Using a webcam like the C920 is far from perfect however. The field of view was measured at 75 degrees by the calibration software which really isn’t enough to be useful. Another problem is the autofocus which causes frequent focus breathing. And then there’s the challenge of proper mounting but at least the C920 does have a 1/4 inch thread so there are possibilities.

A decent DSLR (this would be my choice as it can output clean 4k 4:2:2 video over HDMI apparently at a decent price and I have all the lenses 🙂 ) is going to give better results for sure. On the other hand, there may be many applications where a webcam is just fine and you can’t argue with the price.

Putting together a HoloLens SpectatorView rig

Having sorted out a way of mounting the HoloLens on a camera using the alternate rig described here, it was then time to put the rest of the system together. First thing was the Blackmagic Intensity Pro 4K capture card. Hardware and software installation was very straightforward and nicely captured video from the camera’s HDMI port. Next up was the SpectatorView software itself.

The first step is to get the calibration software working – instructions are here. The OpenCV link doesn’t work – use this instead. I am actually using VS2017 but had no problems apart from being asked if it was ok to upgrade things.

To calibrate the rig, a pattern is needed. This is my attempt:

Seemed to work ok. The next thing is to build the Compositor. It needs to be built for x86 and x64 in Release mode (x86 for the HoloLens app, x64 for the Unity Editor app). I think I had to force it to build SpatialPerceptionHelper in x86 mode. Anyway, once all that’s done, the DLLs need to be copied into the sample app (I was using the sample which is the Shared Holograms tutorial code).

It took me a while to realize that the CopyDLL.cmd needs parameters for it to work with the sample app. Comments in the code tell you what to do but it is basically this:

CopyDLL "%~dp0\Samples\SharedHolograms\Assets\"

Time to fire up Unity using the sample app. Double click on the Sharing scene to kick it in. Then, click on the SpectatorViewManager object and look at the inspector. The Spectator View IP address needs to be set to the IP address of the HoloLens in the SpectatorView rig. Took me a while to work that out :-(. The Sharing Service IP field needs to be set to the address of the machine running the sharing server. The sharing server can be kicked off from Unity using the menu bar with HoloToolkit->SharingServer->Launch Sharing Service. The Sharing prefab also needs to be configured with the address of the sharing server. Once that’s done, it’s pretty much ready to deploy to the SpectatorView HoloLens and any others in the system.

The app needs to be run in the Unity Editor and then, using the menu bar again, kick off Spectator View->Compositor. This will show a window with the combined live video from the camera and the virtual objects mixed in. This window also provides buttons to save video and snapshots.

Unfortunately, I only have one HoloLens to hand so I couldn’t really test the system. I did build a little test app, and that also seemed to work ok as far as I could test it.

The biggest issue was my inadequate camera. I was hoping to find a way to use my Canon 6D for this, even though it does not fill the 1920 x 1080 output frame via its live HDMI port. I figured an OpenCV hack could deal with that. The bigger problem is that the output is interlaced, which causes horrible horizontal tearing in the composed video if anything in the scene is moving. I think it’s the end of the line for the 6D and SpectatorView.
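For what it’s worth, the crudest way to hide interlace tearing is to throw away one field and double the lines of the other, at the cost of half the vertical resolution. I haven’t tried this with the compositor, so the following is just a sketch of the idea on a frame stored as a list of rows, with an invented function name and assuming an even number of rows:

```python
def bob_deinterlace(frame, keep_even=True):
    """Keep one field of an interlaced frame and repeat each of its rows.

    frame is a list of rows (each row a list of pixel values) with an even
    number of rows. The result has the same number of rows as the input.
    """
    start = 0 if keep_even else 1
    out = []
    for row in frame[start::2]:
        out.append(list(row))  # the kept row
        out.append(list(row))  # a copy standing in for the discarded row
    return out


if __name__ == "__main__":
    interlaced = [[1, 1], [9, 9], [2, 2], [8, 8]]
    print(bob_deinterlace(interlaced))
```

Here the rows from the other field (the 9s and 8s) are dropped, so anything that moved between the two fields no longer shows up as comb-like tearing.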

Time for a proper 1080p/4K camera.