There’s a new TensorFlow model for image captioning available here. It combines a deep convolutional neural network (Inception-v3) with an LSTM-based decoder network. LSTM is cropping up just about everywhere now…
I’ve certainly been learning a fair bit about Docker lately. Didn’t realize that it is reasonably easy to containerize GUI nodes as well as console mode nodes, so now rtnDocker contains scripts to build and run almost every rtndf and Manifold node. There are only a few that haven’t been successfully moved yet. imuview, which is an OpenGL node to view data from IMUs, doesn’t work for some reason. The audio capture node (audio) and the audio part of avview (the video and audio viewer node) also don’t work as there’s something wrong with mapping the audio devices. It’s still possible to run these outside of a container so it isn’t the end of the world, but it is definitely a TODO.
Settings files for relevant containerized nodes are persisted at the same locations as the un-containerized versions making it very easy to switch between the two.
rtnDocker has an all script that builds all of the containers locally. These include:
- manifoldcore. This is the base Manifold core built on Ubuntu 16.04.
- manifoldcoretf. This uses the TensorFlow container as the base instead of raw Ubuntu.
- manifoldcoretfgpu. This uses the TensorFlow GPU-enabled container as the base.
- manifoldnexus. This is the core node that constructs the Manifold.
- manifoldmanager. A management tool for Manifold nodes.
- rtndfcore. The core rtn data flow container built on manifoldcore.
- rtndfcoretf. The core rtn data flow container built on manifoldcoretf.
- rtndfcoretfgpu. The core rtn data flow container built on manifoldcoretfgpu.
- rtndfcoretfcv2. The core rtn data flow container built on rtndfcoretf and adding OpenCV V3.0.0.
- rtndfcoretfgpucv2. The core rtn data flow container built on rtndfcoretfgpu and adding OpenCV V3.0.0.
The last two are good bases to use for anything combining machine learning and image processing in an rtn data flow PPE. For example, the recognize PPE node, an encapsulation of Inception-v3, is based on rtndfcoretfgpucv2. The OpenCV build instructions were based on the very helpful example here. The easiest way to build these is to use the scripts in the rtnDocker repo.
In order to use the dockerized version of TensorFlow with GPU support, it is necessary to install nvidia-docker. To do this, the Ubuntu docker.io package seemingly has to be uninstalled and the procedure here followed to install the required docker-engine. Once that is done, the instructions here can be followed to complete the installation of nvidia-docker.
Yes, that is me waving my Taylor (made in San Diego 🙂 ) guitar around in a very careless manner. It’s all in a good cause though. Turns out that Inception-v3 is very good at recognizing acoustic and electric guitars. I put together a new rtndf PPE called recognize based on the code here from the TensorFlow repo.
In its simplest mode, the recognize PPE takes an incoming video stream and tries to recognize an object in the entire frame. If it finds something, it adds a label in the bottom left corner of the image and uses that to generate a new output stream. That’s ok, but what’s more interesting is when it works with another PPE, modet. modet detects moving objects in the stream and draws a box around them. It now also adds metadata to the outgoing pipeline messages that can be used by downstream PPEs to do something with the regions where motion has been detected.
recognize can work in a mode where it uses the modet metadata to recognize moving objects in the stream. The screen capture with the guitar is an example. That’s why I was waving it around – it had to be in motion to get detected and recognized. The box is that big because I am in motion too! However, Inception-v3 seems quite able to recognize the dominant object in the image segment. While there is only one recognized object in this example, if there were more regions they would be individually recognized.
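As a rough sketch of the region-based mode (not the actual recognize code), the snippet below cuts one image segment out of a frame for each motion region, so that each segment could then be passed to the recognizer separately. The metadata layout (a `regions` list of `x`, `y`, `w`, `h` boxes) and the function name `crop_regions` are assumptions made for illustration; the real modet metadata format may differ. The frame is modelled as a plain list of pixel rows to keep the example dependency-free.

```python
def crop_regions(frame, regions):
    """Return one sub-image per region; frame is a list of pixel rows."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    crops = []
    for region in regions:
        # Clamp the box to the frame so a box running off the edge is still usable.
        x0 = max(0, region["x"])
        y0 = max(0, region["y"])
        x1 = min(width, region["x"] + region["w"])
        y1 = min(height, region["y"] + region["h"])
        if x1 > x0 and y1 > y0:
            crops.append([row[x0:x1] for row in frame[y0:y1]])
    return crops


# A tiny 6x4 "frame" whose pixel values encode their own row and column.
frame = [[10 * r + c for c in range(6)] for r in range(4)]

# Hypothetical metadata attached by an upstream motion detector.
metadata = {
    "regions": [
        {"x": 1, "y": 1, "w": 2, "h": 2},
        {"x": 4, "y": 2, "w": 5, "h": 5},  # extends past the frame edge
    ]
}

segments = crop_regions(frame, metadata["regions"])
for segment in segments:
    print(segment)
```

Each entry in `segments` is an independent image, which is why several moving objects in one frame can be labelled individually.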
Of course, the example data set for Inception-v3 only knows so many things, guitars being an example. However, something I want to use this for is to detect a UPS truck coming up the drive. I’ll probably have to try retraining the final layer to do this.
The idea of joining together separate, lightweight processing elements to form complex pipelines is nothing new. DirectX and GStreamer have been doing this kind of thing for a long time. More recently, Apache NiFi has done a similar kind of thing but with Java classes. While Apache NiFi does have a lot of nice features, I really don’t want to live in Java hell.
I have been playing with MQTT for some time now and it is a very easy to use publish/subscribe system that’s used in all kinds of places. Seemed like it could be the glue for something…
So that’s really the background for rtnDataFlow or rtndf as it is now called. It currently uses MQTT as its pub/sub infrastructure but there’s nothing too specific there – MQTT could easily be swapped out for something else if required. The repo consists of a number of pipeline processing elements that can be used to do some (hopefully) useful things. The primary language is Python, although there’s nothing stopping any other language being used provided it has an MQTT client and handles the JSON messages correctly. It will even be possible to put pipeline processing elements in Docker containers. This will make deployment of new, complex pipeline processing elements very simple.
The pipeline processing elements are all joined up using topics. Pipeline processing elements can publish to one or more topics and/or subscribe to one or more topics. Because pub/sub systems are intrinsically multicasting, it’s very easy to process data in multiple ways in parallel (for redundancy, performance or functionality). MQTT also allows pipeline processing elements to be distributed on multiple systems, allowing load sharing and heterogeneous computing systems (where only some machines might be fitted with GPUs for example).
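To make the multicasting point concrete, here is a toy in-process stand-in for a broker. It is not MQTT and not rtndf code: the `TopicBus` class, the topic names and the message fields are all made up for illustration. It only shows how two elements subscribed to the same topic both receive every message, and how one element can republish to a new topic for a downstream stage.

```python
from collections import defaultdict
import json


class TopicBus:
    """Toy in-process stand-in for a pub/sub broker: topic -> callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Serialize once, then hand every subscriber its own decoded copy,
        # mimicking JSON messages travelling over the wire.
        message = json.dumps(payload)
        for callback in list(self._subscribers[topic]):
            callback(json.loads(message))


bus = TopicBus()
results = []


def scale(msg):
    # A processing element: consume one topic, publish to another.
    bus.publish("demo/scaled", {"value": msg["value"] * 2})


def log(msg):
    # A second, independent consumer of the same source topic.
    results.append(("log", msg["value"]))


bus.subscribe("demo/raw", scale)
bus.subscribe("demo/raw", log)
bus.subscribe("demo/scaled", lambda msg: results.append(("scaled", msg["value"])))

bus.publish("demo/raw", {"value": 21})
print(results)
```

With a real broker the callbacks would live in separate processes, possibly on separate machines, but the wiring by topic name is the same idea.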
Obviously, tools are required to design the pipelines and also to manage them at runtime. The design aspect will come from an old code generation project. While that actually generates C and Python code from a design that the user inputs via a graphical interface, the rtnDataFlow version will just make sure all topic names and broker addresses line up correctly and then produce a pipeline configuration file. A special app, rtnFlowControl, will run on each system and will be responsible for implementing the pipeline design specified.
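The "line up correctly" check could look something like the sketch below. The pipeline description format (a list of dicts with `name`, `broker`, `publishes` and `subscribes` keys), the topic names and the `check_pipeline` function are assumptions for illustration only; the real design tool and configuration file have not been described in that detail.

```python
def check_pipeline(elements):
    """Return a list of human-readable problems in a pipeline description."""
    # Collect every (broker, topic) pair that some element publishes.
    published = set()
    for element in elements:
        for topic in element.get("publishes", []):
            published.add((element["broker"], topic))

    problems = []
    for element in elements:
        for topic in element.get("subscribes", []):
            # A subscription is only satisfied by a publisher on the same broker.
            if (element["broker"], topic) not in published:
                problems.append(
                    "%s subscribes to %s on %s, but no element publishes it there"
                    % (element["name"], topic, element["broker"])
                )
    return problems


# Hypothetical three-stage pipeline with a typo in the last subscription.
pipeline = [
    {"name": "camera", "broker": "localhost", "publishes": ["video/raw"]},
    {"name": "modet", "broker": "localhost",
     "subscribes": ["video/raw"], "publishes": ["video/motion"]},
    {"name": "recognize", "broker": "localhost", "subscribes": ["video/moton"]},
]

problems = check_pipeline(pipeline)
for problem in problems:
    print(problem)
```

Catching a mistyped topic at design time matters because a pub/sub system will not complain at runtime: the subscriber just never receives anything.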
So what’s the point of all of this? I’m tired of writing (or reworking) code multiple times for slightly different applications. My goal is to keep the pipeline processing elements simple enough and tightly focused so that the specific application can be achieved by just wiring together pipeline processing elements. There’ll end up being quite a few of these of course and probably most applications will still need custom elements but it’s better than nothing. My initial use of rtnDataFlow will be to assist with experiments to see how machine learning tools can be used with IoT devices to do interesting things.
TensorFlow provides a very convenient dataflow graph framework for not just machine learning applications but really anything where data goes through a number of processing stages. The great thing about using TensorFlow for this is that all the GPU and scaling capabilities are potentially available, along with a Python API for added convenience.
To test this out, I created a simple Python script to act as an image processor that can be inserted into a JPEG video stream using MQTT as a way of moving the data around. The collection of scripts to generate, process and display the video can be found here. The imageproc.py script uses TensorFlow to shrink each frame in the stream by a factor of two (using average pooling) and then performs simple edge detection using a discrete Laplacian, implemented with a 2-D convolution. JPEG encoding and decoding are also performed using TensorFlow functions.
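For anyone unfamiliar with the two operations, here is what they compute, written in plain Python rather than TensorFlow. This is only a sketch of the maths on a single-channel image held as a list of rows; it is not the imageproc.py code, and the function names are made up for the example. The 3x3 kernel used is the common 4-neighbour discrete Laplacian, which is an assumption about which variant the script uses.

```python
def average_pool_2x2(image):
    """Halve width and height by averaging each non-overlapping 2x2 block."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# 4-neighbour discrete Laplacian. The kernel is symmetric, so sliding it
# without flipping (correlation) gives the same result as true convolution.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def convolve_valid(image, kernel):
    """Apply the kernel at every position where it fits entirely in the image."""
    k = len(kernel)
    out_rows, out_cols = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# 8x8 test image: dark left half, bright right half (a vertical edge).
image = [[0] * 4 + [100] * 4 for _ in range(8)]

small = average_pool_2x2(image)            # 4x4 after shrinking by two
edges = convolve_valid(small, LAPLACIAN)   # 2x2 after a "valid" 3x3 pass
print(edges)
```

The output is non-zero only where the brightness changes, which is what makes the Laplacian usable as a simple edge detector; a completely flat image gives all zeros.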
The frame rate tops out at around 17 frames per second on my i7-2700K/GTX 970 machine (video source frame rate was 30 frames per second). I am guessing that there is a finite startup latency in TensorFlow – it’s no doubt highly inefficient to run the graph with one image at a time.
There’s no rocket science here and the functionality is trivial. However, it is interesting to think how else TensorFlow can be used. Given the incredible interest and the likelihood of dedicated hardware acceleration one day, there might be considerable value in mapping problems onto TensorFlow graphs.
Great tutorial here showing a fun use of TensorFlow to fill in missing parts in images of faces. The training part took just over two hours on my i7-5820K/GTX 1070 machine (80 epochs). Clearly the GPU was getting used a lot – nvidia-smi reported over 150W of power being used at times. Nice to see. I used the LFW image set for training as it can be downloaded without filling in forms and things.
Had a few small issues getting things running properly – probably my errors. My system required a few extra prerequisites:
sudo apt-get install libboost-python-dev
sudo pip install --upgrade dlib
sudo pip install --upgrade scipy
Seems that the SciPy version has to be at least 0.18. 0.16 didn’t work.
Also, the models have to be downloaded for OpenFace:
cd <path to openface>/models
./get-models.sh
With all that done, training worked just fine and it was time to move on to testing the completion code. Since I was using Python 2.7, I had to make a change in model.py as otherwise it would blow up on an unsupported parameter when creating the output directories. The start of the complete() function should look like this to work with Python 2.7:
if not os.path.exists(os.path.join(config.outDir, 'hats_imgs')):
    os.makedirs(os.path.join(config.outDir, 'hats_imgs'))
if not os.path.exists(os.path.join(config.outDir, 'completed')):
    os.makedirs(os.path.join(config.outDir, 'completed'))
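The unsupported parameter was presumably the exist_ok argument to os.makedirs, which only exists in Python 3.2 and later. If the same check is needed in several places, a small helper along these lines works on both Python 2.7 and Python 3. The name `ensure_dir` is made up for this sketch and is not part of model.py, and the temporary directory below just stands in for config.outDir.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as err:
        # Swallow "already exists" only when the existing entry is a directory;
        # anything else (permissions, a file in the way) is a real error.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Stand-in for config.outDir in this example.
out_dir = tempfile.mkdtemp()

for name in ("hats_imgs", "completed"):
    ensure_dir(os.path.join(out_dir, name))
    ensure_dir(os.path.join(out_dir, name))  # second call is a harmless no-op

print(sorted(os.listdir(out_dir)))
```

Unlike the check-then-create version, this also copes with another process creating the directory between the check and the makedirs call.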
Another problem I had with running completion was that model.py could not find the checkpoint from the training session. The load() function in model.py was appending the model_dir to the checkpoint_dir, so it was looking in the wrong place. Commenting out the line that does this fixed the problem. The results of the completion process are shown in the gif. I was lazy and used some faces from the training set – I really should have found some different images to test the results with of course.