Th screen capture above and video below are from a walk-through of a procedurally generated sentient space model with video and IoT data displays (derived from ZeroSensor data, rt-ai Edge and Manifold). This was made using rt3DView and the actual Unity video recording made with the aid of this very nice Unity store asset.
The idea of this model is that it reflects the major features of the real sentient space so that users of VR and AR can interact correctly. For example, an AR headset wearer in one of the rooms would also see the displays on the equivalent physical wall. This model is pretty basic but obviously a lot more bling could be added to get further along the road to realism. Plus I made no attempt to sort out the exterior for this test.
Now that the basics are working and the XR world is fully coupled to the rt-ai Edge design that is the real world element of the sentient space, the focus will move to more interaction. Instantiating new objects, positioning objects, real-time sharing of camera poses leading to avatars… The list is endless.
The Windows Mixed Reality version of 3DView is now working nicely. Had a few problems with my Windows development PC which is a few years old and didn’t have adequate USB ports. In the end this PCI-e USB 3.1 card solved that problem otherwise a complete upgrade might have been required. A different USB 3.0 card did not work however.
Hopefully this is the last time that I see the displays all lined up like that. The space modeling software is coming along and soon it will be possible to model a space with a (relatively) simple procedural definition file. Potentially this could be texture mapped from a 3D scan of rooms but the simplified models generated procedurally with simple textures might well be good enough. Then it will be possible to position versions of these displays (and lots of other things) in the correct rooms.
XRView is intended to be runnable both on Windows MR headsets (I am using the Samsung Odyssey as it has a good display and built-in audio) and HoloLens. Now clearly VR modes and AR modes have to be completely different. In VR, you navigate and interact with the motion controllers and see the modeled space whereas in AR you navigate by walking around, interact using the clicker and don’t see the modeled space directly. However, the modeled space will still be there and will be used instead of the spatially mapped surfaces that the HoloLens might normally use. This means that objects placed in the model by a VR user will appear to AR users correctly positioned and vice versa. One key advantage of using the modeled space rather than the dynamically mapped space generated by the HoloLens itself is that it is easy to add context to the surfaces using the procedural model language. Another is the ability to interwork with non-HoloLens AR headsets that can share the HoloLens spatial map data. The procedural model becomes a platform-independent spatial mapping that “just” leaves the problem of spatial synchronization to the individual headsets.
I am sure that there will be some fun challenges in getting spatial synchronization but that’s something for later.
This may not look impressive to you (or my wife as it turns out) but it has a lot of promise for the future. Following on from 3DView, there’s now an Android version called (shockingly) AndroidView that is essentially the same thing running on an Android phone in this case. The screen capture above shows the current basic setup displaying sensor data. Since Unity is basically portable across platforms, the main challenge was integrating with Manifold to get the sensor data being generated by ZeroSensors in an rt-aiEdge stream processing network.
I did actually have a Java implementation of a Manifold client from previous work – the challenge was integrating with Unity. This meant building the client into an aar file and then using Unity’s AndroidJavaObjectto pass data across the interface. Now I understand how that works, it really is quite powerful and I was able to do everything needed for this application.
There are going to be more versions of the viewer. For example, in the works is rtXRView which is designed to run on Windows MR headsets. The way I like to structure this is to have separate Unity projects for each target and then move common stuff via Unity’s package system. With a bit of discipline, this works quite well. The individual projects can then have any special libraries (such as MixedRealityToolkit), special cameras, input processing etc without getting too cute.
Once the basic platform work is done, it’s back to sorting out modeling of the sentient space and positioning of virtual objects within that space. Multi-user collaboration and persistent sentient space configuration is going to require a new Manifold app to be called SpaceServer. Manifold is ideal for coordinating real-time changes using its natural multicast capability. For Unity reasons, I may integrate a webserver into SpaceServer so that assets can be dynamically loaded using standard Unity functions. This supports the idea that a new user walking into a sentient space is able to download all necessary assets and configurations using a standard app. Still, that’s all a bit in the future.
It’s only a step on the road to the ultimate goal of AR headset support for sentient spaces but it is a start at least. As mentioned in an earlier post, passing data from rt-ai Edge into Manifold allows any number of ad-hoc uses of real time and historic data. One of the intended uses is to support a number of AR headset-wearing occupants in a sentient space – the rt-ai Edge to Manifold connection makes this relatively straightforward. Almost every AR headset supports Unity so it seemed like a natural step to develop a Manifold connection for Unity apps. The result, an app called 3DView, is shown in the screen capture above. The simple scene consists of a couple of video walls displaying MJPEG video feeds captured from the rt-ai Edge network.
The test design (shown above) to generate the data is trivial but demonstrates that any rt-ai Edge stream can be piped out into the Manifold allowing access by appropriate Manifold apps. Although not yet fully implemented, Manifold apps will be able to feed data back into the rt-ai Edge design via a new SPE to be called GetManifold.
Next step for the 3DView Unity app is to provide visualization for all ZeroSensor streams correctly physically located within a 3D model of the sentient space. Right now I am using a SpaceMouse to navigate within the space but ultimately this should work with any VR headset with appropriate controller. AR headsets will use their spatial mapping capability to overlay visualizations on the real space so they won’t need a separate controller for navigation.
One application for rt-ai Edge is ubiquitous sensing leading to sentient spaces – spaces that can interact with people moving through and provide useful functionality, whether learned or programmed. A step on the road to that is the ZeroSensor, four prototypes of which are shown in the photo. Each ZeroSensor consists of a Raspberry Pi Zero W, a Pi camera module v2, an Adafruit BME 680 breakout and an Adafruit TSL2561 breakout. The combination gives a video stream and a sensor stream with light, temperature, pressure, humidity and air quality values. The video stream can be used to derive motion sensing and identification while the other sensors provide a general idea of conditions in the space. Notably missing is audio. Microphone support would be useful for general sensing and I might add that in real devices. A 3D printable case design is underway in order to allow wide-scale deployment.
Voice-based interaction is a powerful way for users to interact with sentient spaces. However, it is assumed that people who want to interact are using an AR headset of some sort which itself provides the audio I/O capabilities. Gesture input would be possible via the ZeroSensor’s camera. For privacy reasons video would not be viewed directly or stored but just used as a source of activity data and interaction.
This is the simple rt-ai design used to test the ZeroSensors. The ZeroSynth modules are rt-ai Edge synth modules that contain SPEs that interface with the ZeroSensor’s hardware and generate a video stream and a sensor data stream. An instance of a video viewer and sensor viewer are connected to each ZeroSynth module.
This is the result of running the ZeroSensor test design, showing a video and sensor window for each ZeroSensor. The cameras are staring at the ceiling because the four sensors were on a table. When the correct case is available, they will be deployed in the corners of rooms in the space.
I came across OpenALPR a little while ago when thinking about the general problem of enhancing the value of video feeds. It has an easy to use Python binding so it didn’t take very long to create an rt-ai Edge stream processing element (SPE). Actually, the OpenALPR part of it is one line of code – it takes a jpeg from the video stream and adds any recognized plate info as metadata to the output message. The trivial stream processing network in the screen capture above shows its operation as an inline semantic enhancer of a video stream. The OpenALPR SPE only outputs a video frame it if either already contains metadata or else the OpenALPR SPE has added metadata. In this way, multiple recognizers can be applied to the same frame using a pipeline of SPEs.
While I can now see a few private houses starting to sprout specialized license plate reading cameras (which are optimized for this purpose, especially for night operation), I don’t have anything set up as yet so I had to make do with printing car images and waving them in front of a webcam. Seemed to work fine but it would be nice to have a proper setup.
Recognized license plate metadata then becomes another feature that can be used for machine learning and inference within the edge environment – another step on the path to sentient spaces perhaps.
rt-ai Edge is progressing nicely and now supports multi-node operation (i.e. multiple networked servers participating in a processing network) along with real-time monitoring. The screen capture shows a simple processing network where the video feed from a camera is passed through a DeepLab-v3+ stream processing element (SPE) and then on to two separate media viewers. At the top of each SPE block in the Designer window is some text like Cam(Default). Here, Cam is the name given to the SPE while Default is the name of the node (server) on which the SPE is running. In this design there are two nodes, Default and rtai0.
The code underlying the common SPE API communicates with the Designer window and supplies the stats about bytes and messages in and out. Soon, this path will also allow SPE-specific real-time parameter tweaking from the Designer window.
To add a node to the system, it just needs to have all of the prerequisites installed and run a special NodeManager SPE. This also communicates with the Designer and supports SPE deployment and runtime control, activated when the user presses the Deploy design button. Moving an SPE between nodes is just a case of reassigning it, generating the design and then deploying the design again.
The green outlines around each SPE indicate the state of the SPE and the node on which it is running. When it is all green, as in the first screen capture, this indicates that both SPE and node are running. For the second screen capture, I manually terminated the View2 SPE on rtai0. The inner part of the outline has now gone red. This indicates that the node is up but the SPE is down. If the outline is all red, it means that the node is down and not communicating with the Designer.
It’s interesting to note that DeepLab-v3+ is processing around 5 frames per second using a GTX-1080 GPU. The input rate from the camera is 30 frames per second. The processor drops frames while it is still processing an earlier frame, ensuring that queues do not build up and latency is kept to a minimum.