There’s a new TensorFlow model for image captioning available here. It combines a deep convolutional neural network (Inception-v3) with an LSTM-based decoder network. LSTM is cropping up just about everywhere now…
Very interesting work here that uses recurrent neural network ideas to predict the next frames in a video sequence. It’s amazing how often LSTM pops up these days. Unsupervised learning is one of the most interesting areas of machine learning at the moment and its potential is seemingly unlimited. This is another example of using LSTM to understand video representations. It’s a fascinating area.
Some very interesting software from Facebook’s AI Research that implements segmentation and labelling of images. Code is available on GitHub that uses Torch as its AI engine. Could be a good addition to rtndf as part of a video pipeline. Even if the segmentation and labelling are slower than real time, it’s possible to use a bypass system to keep the frame rate up while also processing selected key frames. This is done by the OpenFace PPE already. As things may move between key frames in a video pipeline, a strategy might be to buffer frames after the first key frame until the results from the second key frame are available and then interpolate the segmentation results for the intermediate frames. The buffered frames can then be played out at the correct rate. Obviously this adds latency but that might be acceptable in some situations.
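To make the buffering idea a bit more concrete, here is a minimal Python sketch. It is not rtndf code. It assumes the segmentation result for an object can be reduced to an (x, y, w, h) bounding box and that linear interpolation between the two key frame results is good enough; the function names are made up for illustration.

```python
def lerp_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes: t=0 gives box_a, t=1 gives box_b."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))


def annotate_buffered_frames(frames, first_box, second_box):
    """Attach an interpolated box to every buffered frame.

    frames[0] is the first key frame and frames[-1] is the second key frame;
    everything in between was buffered while the slow stage was still working.
    (Hypothetical helper, assuming one box per object.)
    """
    if len(frames) < 2:
        raise ValueError("the buffer must contain both key frames")
    last = len(frames) - 1
    return [
        (frame, lerp_box(first_box, second_box, index / last))
        for index, frame in enumerate(frames)
    ]


# Five buffered frames; the object moves right and down between key frames.
buffered = ["f0", "f1", "f2", "f3", "f4"]
annotated = annotate_buffered_frames(buffered, (0, 0, 10, 10), (40, 20, 10, 10))
```

Once the list has been built, the frames can be played out at the original frame rate, which is where the extra latency comes from.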
I had obtained some very nice results with OpenFace in a previous project and thought it would be fun to wrap it into an rtndf pipeline processing element (PPE). It’s also a good test to see whether docker containers can be used with rtndf. Turns out they work just fine. OpenFace has some complex dependencies and it is much easier just to pull a docker container than build it locally. One approach would have been to build a new container based on the original bamos/openface but instead facerec uses a bit of a hack involving host directory mapping.
To make it easy to use, there’s a bash script in the rtndf/facerec directory called facerecstart that takes care of the docker command line (which is a bit messy). Of course, in order to recognize faces, the system needs to have been trained. rtndf/facerec includes a modified version of the OpenFace web demo that saves the data from the training in the correct form for facerec. There’s a bash script, trainstart, that starts it going and then a browser and webcam can be used to perform the training.
As with the recognize PPE, facerec can either process the whole frame or just segments that contain motion by using the output from the modet PPE. In fact both recognize and facerec can be used in the same pipeline to get combined recognition:
uvccam -> modet -> facerec -> recognize -> avview
This illustrates one of the nice features of the pipeline concept: metadata and annotation can be added progressively by multiple processing stages, adding significant value to the resulting stream.
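As a rough illustration of progressive annotation, here is a small Python sketch. It is not the actual rtndf message format or API: the dictionary keys and stage functions are hypothetical, and each stage just adds placeholder metadata rather than running a real detector.

```python
from functools import reduce


def modet_stage(message):
    """Hypothetical motion detection stage: adds regions that contain motion."""
    annotated = dict(message)
    annotated["motion"] = [{"x": 100, "y": 80, "w": 64, "h": 64}]  # placeholder
    return annotated


def facerec_stage(message):
    """Hypothetical face stage: only looks at regions flagged by modet."""
    annotated = dict(message)
    annotated["faces"] = [
        {"region": region, "name": "unknown"} for region in message.get("motion", [])
    ]
    return annotated


def recognize_stage(message):
    """Hypothetical object stage: also reuses the motion regions."""
    annotated = dict(message)
    annotated["objects"] = [
        {"region": region, "label": "unknown"} for region in message.get("motion", [])
    ]
    return annotated


def run_pipeline(message, stages):
    """Pass the message through each stage in order, accumulating metadata."""
    return reduce(lambda msg, stage: stage(msg), stages, message)


source = {"frame": b"jpeg bytes would go here", "timestamp": 0.0}
result = run_pipeline(source, [modet_stage, facerec_stage, recognize_stage])
```

Each stage copies the incoming message and adds its own key, so the frame leaving the last stage carries the motion, face and object annotations together while the source message is left untouched.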
I am currently working with TensorFlow and I thought it’d be interesting to see what kind of performance I could get when processing video and trying to recognize objects with Inception-v3. While I’d like to get TensorFlow integrated with some of my Qt apps, the whole “build with Bazel” thing is holding that up right now (problems with Eigen includes – one day I’ll get back to that). As a way of taking the path of least resistance, I included TensorFlow in an inline MQTT filter written in Python. It subscribes to a video topic sourced from a webcam and outputs recognized objects in the stream.
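The shape of an inline filter like this can be sketched in a few lines of Python. This is not the actual filter: the JSON message layout, the topic name and the classify function are all assumptions, and the MQTT client is left out so the sketch stays self-contained. In practice the handler would be called from the MQTT client's message callback and classify would wrap the Inception-v3 graph.

```python
import json


def make_filter(classify, publish, out_topic):
    """Build a handler that annotates each incoming frame message and republishes it.

    classify:  callable taking the frame data and returning a list of labels
               (stands in for the TensorFlow model; hypothetical)
    publish:   callable taking (topic, payload), e.g. an MQTT client's publish
    out_topic: topic for the annotated stream (hypothetical name)
    """
    def on_message(payload):
        message = json.loads(payload)            # assumed JSON message layout
        message["objects"] = classify(message["frame"])
        publish(out_topic, json.dumps(message))  # pass the frame on with its labels
    return on_message


# Stand-ins so the sketch runs without a broker or a model.
published = []
handler = make_filter(
    classify=lambda frame: ["cat"],
    publish=lambda topic, body: published.append((topic, body)),
    out_topic="video/recognized",
)
handler(json.dumps({"frame": "abc", "seq": 1}))
```

Because the filter sits inline, downstream subscribers see the same video stream with the recognized objects attached as metadata.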
As can be seen from the screen capture, it’s currently achieving 11 frames per second using 640 x 480 frames with a GTX 970 GPU. With a GTX 960 GPU, the rate falls to around 8 frames per second, which makes the GTX 970 nearly 40% faster in this test (11 / 8 ≈ 1.4). That is broadly in line with what I have seen with other TensorFlow graphs, where the GTX 970 is about 50% faster than a GTX 960, probably due to the restricted memory bus width on the GTX 960.
Hopefully I’ll soon have a 10 series GPU – that should be an interesting comparison.
This one looks quite a bit nicer than my previous attempt at this design! The functionality is the same but now a lot of the heavier processing has been moved into a new infrastructure that’s been developed to integrate artificial intelligence and machine learning functions into data flows very efficiently. Now I am able to leverage Apache NiFi’s extensive range of processors to interface to all kinds of things but also escape the JVM environment to get bare metal performance for the higher level functions including access to GPUs and things like that. In this design I am just using NiFi’s MQTT and Elasticsearch processors but it could just as easily fire processed data into HDFS, Kafka etc.
Just came across this new book all about deep learning. I have only had time to scan through it so far but it looks to cover a lot of ground that is often assumed elsewhere. If you want to know all about how regularized autoencoders and recurrent neural nets work (to pick random examples), this is the place.