Moving into the big (data) league with rt-ai Edge and Apache NiFi

The main reason for rt-ai Edge‘s existence is to reduce large volumes of raw data into much smaller amounts of data with high semantic content. Sometimes this can be acted upon in the local loop (i.e. within the edge space) when that makes sense or low latency is critical. Even if it is, it may still be useful to store the information extracted from the raw streams for later offline processing such as machine learning. Since Apache NiFi has all the required interfaces, it makes sense that rt-ai Edge can pass data into Apache NiFi, using it as a gateway to big data type applications.

For this simple example, I am storing recovered license plate data in Elasticsearch. The screen capture above shows the rt-ai Edge stream processing network (SPN) with the new PutNiFi stream processing element (SPE). PutNiFi transfers any rt-ai message desired into an Apache NiFi instance using MQTT for transport.

This screen capture shows the very simple Apache NiFi design. The ConsumeMQTT processor is used to collect messages from the PutNiFi SPE and then passes these to Elasticsearch for storage. Obviously a lot more could be going on here if required.


Automatic license plate recognition with OpenALPR and rt-ai Edge

I came across OpenALPR a little while ago when thinking about the general problem of enhancing the value of video feeds. It has an easy to use Python binding so it didn’t take very long to create an rt-ai Edge stream processing element (SPE). Actually, the OpenALPR part of it is one line of code – it takes a jpeg from the video stream and adds any recognized plate info as metadata to the output message. The trivial stream processing network in the screen capture above shows its operation as an inline semantic enhancer of a video stream. The OpenALPR SPE only outputs a video frame it if either already contains metadata or else the OpenALPR SPE has added metadata. In this way, multiple recognizers can be applied to the same frame using a pipeline of SPEs.

While I can now see a few private houses starting to sprout specialized license plate reading cameras (which are optimized for this purpose, especially for night operation), I don’t have anything set up as yet so I had to make do with printing car images and waving them in front of a webcam. Seemed to work fine but it would be nice to have a proper setup.

Recognized license plate metadata then becomes another feature that can be used for machine learning and inference within the edge environment – another step on the path to sentient spaces perhaps.

Real time edge inference monitoring with rt-ai Edge

rt-ai Edge is progressing nicely and now supports multi-node operation (i.e. multiple networked servers participating in a processing network) along with real-time monitoring. The screen capture shows a simple processing network where the video feed from a camera is passed through a DeepLab-v3+ stream processing element (SPE) and then on to two separate media viewers. At the top of each SPE block in the Designer window is some text like Cam(Default). Here, Cam is the name given to the SPE while Default is the name of the node (server) on which the SPE is running. In this design there are two nodes, Default and rtai0.

The code underlying the common SPE API communicates with the Designer window and supplies the stats about bytes and messages in and out. Soon, this path will also allow SPE-specific real-time parameter tweaking from the Designer window.

To add a node to the system, it just needs to have all of the prerequisites installed and run a special NodeManager SPE. This also communicates with the Designer and supports SPE deployment and runtime control, activated when the user presses the Deploy design button. Moving an SPE between nodes is just a case of reassigning it, generating the design and then deploying the design again.

The green outlines around each SPE indicate the state of the SPE and the node on which it is running. When it is all green, as in the first screen capture, this indicates that both SPE and node are running. For the second screen capture, I manually terminated the View2 SPE on rtai0. The inner part of the outline has now gone red. This indicates that the node is up but the SPE is down. If the outline is all red, it means that the node is down and not communicating with the Designer.

It’s interesting to note that DeepLab-v3+ is processing around 5 frames per second using a GTX-1080 GPU. The input rate from the camera is 30 frames per second. The processor drops frames while it is still processing an earlier frame, ensuring that queues do not build up and latency is kept to a minimum.

Sending and receiving binary data using JSON encoding, Python and MQTT

I really like using JSON encoding as a way of transferring messages between processes as it is machine and language independent. Plus, it is very well suited to stream processing networks (such as rt-ai Edge) as arbitrary fields can be added to existing JSON messages and passed along. Contrast this with compiled IDLs which typically have no flexibility whatsoever.

One problem though is that binary data cannot be included in JSON messages directly. Typically base64 encoding is used to convert binary data into text. However, this is inefficient, especially in a stream processing network where base64 decoding and encoding might have to be done several times.

There are a variety of modifications to JSON around but it is very simple to just add binary data on to the end of a JSON message to form a complete message that can be transferred via MQTT for example.

In Python, an MQTT message can be published like this:

    import json
    import struct
    def publish(topic, jsonData, binData = None):
        jsonDump = json.dumps(jsonData)
        jsonString = struct.pack('>I', len(jsonDump)) + jsonDump + binData
        MQTTClient.publish(topic, jsonString)

Here, jsonData contains the normal JSON message text, binData contains the binary data to be sent along with it. To receive the message, use something like this:

    import json
    import struct
    def onMessage(client, userdata, message):
        jsonLength = struct.unpack('>I', message.payload[0:4])[0]
        jsonData = json.loads(message.payload[4:4+jsonLength])
        binData = message.payload[4+jsonLength:]

DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward and follows from an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream, where the DeepLab SPE has removed all pixels that aren’t part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The PASCAL VOC dataset on which the model I am using has been trained can recognize a finite set of categories of objects. Waving a cow about didn’t seem practical hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE, Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"

        "Subs" : {
            "VideoIn" : "VideoMJPEG"

This makes it very easy to try out different settings. Use the module’s dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates and scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.

Semantic image segmentation with TensorFlow using DeepLab

I have been trying out a TensorFlow application called DeepLab that uses deep convolutional neural nets (DCNNs) along with some other techniques to segment images into meaningful objects and than label what they are. Using a script included in the DeepLab GitHub repo, the Pascal VOC 2012 dataset is used to train and evaluate the model. One of the results is shown above. It has managed to extract some pretty ugly furniture from a noisy background quite nicely. Here are couple more examples:

The software has done a nice job of extracting the foreground objects in another very noisy scene.

The person in the background is picked up pretty nicely here – I didn’t even notice the person at first.

Incidentally, to get the to work on Ubuntu 16.04 I had to change the call to to use bash instead of sh otherwise it generated an error. Also, I needed to install cuDNN 7.0.4 for Cuda 9.0 rather than cuDNN 7.1.1 in order to get the Jupyter notebook example operating.

What I would like to do now is to create an rt-ai Edge Stream Processing Element (SPE) based on this code to act as a preprocessor stage in order to isolate and identify salient objects in a video stream in real time. One of my interests is understanding behaviors from video and this could be a valuable component in that pipeline by allowing later stages to focus on what’s important in each frame.

How rt-ai Edge will enable Sentient Spaces

The idea of creating spaces that understand the needs of the people moving within them – Sentient Spaces – has been a long term personal goal. Our ability today to create sensor data (video, audio, environmental etc) is incredible. Our ability to make practical use of this enormous body of data is minimal. The question is: how can ubiquitous sensing in a space be harnessed to make the space more functional for people within it?

rt-ai Edge could be the basis of an answer to this question. It is designed to receive large volumes of multi-sensor data, extract meaningful information and then take control actions as necessary. This closes the local loop without requiring external cloud server interaction. This is important because creating a space with ubiquitous sensing raises all kinds of privacy issues. Because rt-ai Edge keeps all raw data (such as video and audio) within the space, privacy is much less of a concern.

I believe that a key to making a space sentient is to harness artificial intelligence concepts such as online learning of event sequences and anomaly detection. It is not practical for anyone to sit down and program a system to correctly recognize normal behavior in a space and what actions might be helpful as a result. Instead, the system needs to learn what is normal and develop strategies that might be helpful. Reinforcement via user feedback can be used to refine responses.

A trivial example would be someone moving through a dark space at night. It might be helpful to provide light at a suitable intensity to safely help a person navigate the space. The system could deduce this by having experienced other people moving though the space, turning on and off lights as they go. Meanwhile, face recognition could be employed to see if the person is known to the space and, if not, an assessment could be made if an alert needs to be generated. Finally, a video record could be put together of the person moving through the space, using assembled clips from all relevant cameras, and stored (on-site) for a time in case it is useful.

Well that’s a trivial example to describe but not at all trivial to implement. However, my goal is to see if AI techniques can be used to approach this level of functionality. In practical terms, this means developing a series of rt-ai modules using TensorFlow to perform feature extraction, anomaly detection and sequence prediction that are then glued together with sensor and control modules to perform a complete system requiring minimal supervised training to perform useful functions.