
DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward, building on an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream: the DeepLab SPE has removed all pixels that aren't part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The model I am using was trained on the PASCAL VOC dataset, which covers only a limited set of object categories; waving a cow about didn't seem practical, hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right-hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.
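
Conceptually the masking step is simple. Here is an illustrative Python sketch (not the actual SPE code) of how a per-pixel segmentation map from DeepLabv3+ could be used to produce the masked output; the function name and the assumption that the model returns a class-index map at frame resolution are mine:

    # Illustrative sketch only -- not the actual DeepLab SPE code. Assumes a
    # DeepLabv3+ model that returns a per-pixel PASCAL VOC class index map
    # (seg_map) the same size as the input frame.
    import numpy as np

    PASCAL_VOC_BOTTLE = 5   # index of "bottle" in the PASCAL VOC label list

    def mask_frame(frame, seg_map, category=None):
        """Blank out pixels that are not part of a recognized object.

        frame    -- HxWx3 uint8 image from the camera
        seg_map  -- HxW int array of PASCAL VOC class indices (0 = background)
        category -- if given, keep only pixels belonging to that single class
        """
        if category is None:
            keep = seg_map != 0          # any recognized object
        else:
            keep = seg_map == category   # e.g. just the bottle
        masked = frame.copy()
        masked[~keep] = 0                # black out everything else
        return masked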

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE and Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the webcam capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"

        "Subs" : {
            "VideoIn" : "VideoMJPEG"

This makes it very easy to try out different settings. Use the module's dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.


How rt-ai Edge will enable Sentient Spaces

The idea of creating spaces that understand the needs of the people moving within them – Sentient Spaces – has been a long-term personal goal. Our ability today to generate sensor data (video, audio, environmental, etc.) is incredible; our ability to make practical use of this enormous body of data is minimal. The question is: how can ubiquitous sensing in a space be harnessed to make the space more functional for the people within it?

rt-ai Edge could be the basis of an answer to this question. It is designed to receive large volumes of multi-sensor data, extract meaningful information and then take control actions as necessary. This closes the local loop without requiring external cloud server interaction. This is important because creating a space with ubiquitous sensing raises all kinds of privacy issues. Because rt-ai Edge keeps all raw data (such as video and audio) within the space, privacy is much less of a concern.

I believe that a key to making a space sentient is to harness artificial intelligence concepts such as online learning of event sequences and anomaly detection. It is not practical for anyone to sit down and program a system to correctly recognize normal behavior in a space and what actions might be helpful as a result. Instead, the system needs to learn what is normal and develop strategies that might be helpful. Reinforcement via user feedback can be used to refine responses.

A trivial example would be someone moving through a dark space at night. It might be helpful to provide light at a suitable intensity to help a person navigate the space safely. The system could deduce this by having observed other people moving through the space, turning lights on and off as they go. Meanwhile, face recognition could be employed to see if the person is known to the space and, if not, an assessment could be made as to whether an alert needs to be generated. Finally, a video record of the person moving through the space could be assembled from clips from all relevant cameras and stored (on-site) for a time in case it is useful.

Well, that's a trivial example to describe but not at all trivial to implement. However, my goal is to see if AI techniques can be used to approach this level of functionality. In practical terms, this means developing a series of rt-ai modules using TensorFlow to perform feature extraction, anomaly detection and sequence prediction, which are then glued together with sensor and control modules to form a complete system that requires minimal supervised training to perform useful functions.
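
As a flavor of what one such module might wrap, here is a minimal TensorFlow sketch of a next-event predictor; the event vocabulary, sequence length and layer sizes are arbitrary placeholders rather than anything from rt-ai Edge itself:

    # Illustrative only: a tiny next-event predictor of the sort an rt-ai
    # sequence prediction module might wrap. Vocabulary size, sequence length
    # and layer sizes are arbitrary placeholders.
    import tensorflow as tf

    NUM_EVENT_TYPES = 32    # e.g. "motion in hallway", "light switched on", ...
    SEQ_LEN = 16            # number of past events used for prediction

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN,)),
        tf.keras.layers.Embedding(NUM_EVENT_TYPES, 16),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(NUM_EVENT_TYPES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # model.fit() would be driven online by event streams from sensor SPEs;
    # a low predicted probability for an observed event can flag an anomaly.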

rt-ai Edge

rt-ai Edge is a new concept in edge processing that makes it easy for anyone to build AI and ML enhanced stream processing pipelines in order to close the local loop and offload communications networks and the cloud. Semantic extraction of meaningful data from raw data feeds at the edge ensures that the core only has to deal with actionable information, not noise. rt-ai Edge leverages hardware acceleration within embedded devices to filter raw data into highly salient messages for higher level processing.

rt-ai Edge is in active development right now.

DroNet – flying a drone using data from cars and bikes

Fascinating video about a system that teaches a drone to fly around urban environments, trained on driving data collected from cars and bikes. There's a paper here and code here. It's a great example of leveraging CNNs in embedded environments. I believe that moving AI and ML to the edge, and ultimately into devices such as IoT sensors, is going to be very important. Having dumb sensor networks and edge devices just means that an enormous amount of worthless data has to be transferred into the cloud for processing. Instead, if the edge devices can perform extensive semantic mining of the raw data, only highly salient information needs to be communicated back to the core, massively reducing bandwidth requirements and also allowing low-latency decision making at the edge.

Take as a trivial example a system of cameras that read vehicle license plates. One solution would be to send the raw video back to somewhere for license number extraction. Alternatively, if the cameras themselves could extract the data, then only the recognized numbers and letters need to be transferred, along with possibly an image of the plate. That's a massive bandwidth saving over sending a constant compressed video stream. Even more interesting would be edge systems that can perform unsupervised learning to optimize performance, all moving towards eliminating noise and recognizing what's important without extensive human oversight.
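
To put rough numbers on that saving (the bitrates and rates below are assumptions for illustration, not measurements):

    # Back-of-envelope bandwidth comparison using assumed figures.
    video_bitrate_bps = 4_000_000      # ~4 Mbps for a compressed 1080p stream
    plates_per_minute = 10             # vehicles passing the camera
    bytes_per_event = 100 + 20_000     # plate text + metadata plus a JPEG crop of the plate

    event_bitrate_bps = plates_per_minute / 60 * bytes_per_event * 8
    print(f"raw video:  {video_bitrate_bps / 1e6:.1f} Mbps")
    print(f"plate data: {event_bitrate_bps / 1e3:.1f} kbps")
    print(f"reduction:  ~{video_bitrate_bps / event_bitrate_bps:.0f}x")

Even with a plate image attached to every read, the event stream comes out at a few tens of kilobits per second against several megabits for continuous video.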

Latest fun thing in the office: a Movidius Neural Compute Stick

A Movidius Neural Compute Stick just turned up in a delightfully retro style box. Won’t have time to do anything with it until the weekend unfortunately but very interested to see what it can do. It’s another enabler of the movement to add inference to low power mobile devices without relying on a cloud server.