There’s a new TensorFlow model for image captioning available here. It combines a deep convolutional neural network (Inception-v3) with an LSTM-based decoder network. LSTM is cropping up just about everywhere now…
Very interesting work here that uses recurrent neural network ideas to predict the next frames in a video sequence. It’s amazing how often LSTM pops up these days. Unsupervised learning is one of the most interesting areas of machine learning at the moment and its potential seems unlimited. This is another example of using LSTM to learn video representations. It’s a fascinating area.
I am currently working with TensorFlow and I thought it’d be interesting to see what kind of performance I could get when processing video and trying to recognize objects with Inception-v3. While I’d like to get TensorFlow integrated with some of my Qt apps, the whole “build with Bazel” thing is holding that up right now (problems with Eigen includes – one day I’ll get back to that). As a way of taking the path of least resistance, I included TensorFlow in an inline MQTT filter written in Python. It subscribes to a video topic sourced from a webcam and outputs recognized objects in the stream.
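The shape of such an inline filter can be sketched roughly as below. This is only an illustration under stated assumptions, not the code I actually run: the topic names, the JSON output format, the `min_score` threshold and the stub classifier are all made up for the example, and the MQTT client and the Inception-v3 graph are replaced by plain callables so the sketch runs on its own.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical topic names; the real ones are not given in this post.
VIDEO_TOPIC = "video/webcam"
OBJECTS_TOPIC = "video/webcam/objects"

# A classifier takes one encoded frame and returns (label, score) pairs.
Classifier = Callable[[bytes], List[Tuple[str, float]]]
# A publisher takes a topic and a payload, like an MQTT client's publish call.
Publisher = Callable[[str, bytes], None]


def make_filter(classify: Classifier, publish: Publisher,
                min_score: float = 0.5) -> Callable[[str, bytes], None]:
    """Build a message handler that turns video frames into object messages."""

    def on_message(topic: str, payload: bytes) -> None:
        # Ignore anything that is not a frame on the video topic.
        if topic != VIDEO_TOPIC:
            return
        # Keep only the labels the classifier is reasonably sure about.
        objects = [
            {"label": label, "score": score}
            for label, score in classify(payload)
            if score >= min_score
        ]
        publish(OBJECTS_TOPIC, json.dumps({"objects": objects}).encode("utf-8"))

    return on_message


def stub_classifier(frame: bytes) -> List[Tuple[str, float]]:
    """Stand-in for the Inception-v3 inference step (assumption)."""
    return [("coffee mug", 0.91), ("keyboard", 0.12)]


if __name__ == "__main__":
    handler = make_filter(stub_classifier, lambda t, p: print(t, p.decode()))
    handler(VIDEO_TOPIC, b"fake-jpeg-bytes")
```

In a real setup the handler would be called from the MQTT client's message callback and `publish` would be the client's own publish method, with the stub swapped for the TensorFlow session that runs the graph.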
As can be seen from the screen capture, it’s currently achieving 11 frames per second using 640 x 480 frames with a GTX 970 GPU. With a GTX 960 GPU, the rate falls to around 8 frames per second, which makes the GTX 970 roughly 40% faster here. That is broadly in line with what I have seen with other TensorFlow graphs, where the GTX 970 is about 50% faster than a GTX 960, probably due to the narrower memory bus on the GTX 960 (128-bit versus 256-bit on the GTX 970).
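For what it’s worth, the frame rates quoted above can be turned into per-frame times and a relative speedup with a couple of lines of arithmetic. The helper names are just for this example, and the 8 frames per second figure is only approximate.

```python
def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def relative_speedup_percent(fast_fps: float, slow_fps: float) -> float:
    """How much faster the first rate is than the second, as a percentage."""
    return (fast_fps / slow_fps - 1.0) * 100.0


if __name__ == "__main__":
    gtx_970_fps = 11.0  # measured rate from the post
    gtx_960_fps = 8.0   # "around 8" in the post, so treat as approximate

    print(f"GTX 970: {frame_time_ms(gtx_970_fps):.1f} ms per frame")  # 90.9
    print(f"GTX 960: {frame_time_ms(gtx_960_fps):.1f} ms per frame")  # 125.0
    print(f"Speedup: {relative_speedup_percent(gtx_970_fps, gtx_960_fps):.1f}%")  # 37.5
```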
Hopefully I’ll soon have a 10 series GPU – that should be an interesting comparison.
Came across this great blog about machine learning with the most recent entry describing how to build a neural stack machine in Python based on a paper published by DeepMind. There are some earlier blog entries that build up to this to help with the background. Looks like a tremendous amount of effort was put into this work and it’s well worth a read – and trying out the Python code.
Interesting story here about what parallel resources the brain musters to perform simple tasks. It suggests that trying to build a functional brain-analog by simulating individual neurons is unnecessary. A much more practical silicon implementation would come from understanding the aggregate behavior of groups of neurons and simulating that instead. Not a new idea but it’s interesting to see an attempt to start to understand how this might work.
Very interesting paper here about Neural Turing Machines, essentially adding memory to conventional neural networks to achieve new capabilities. Might explain why Google wanted DeepMind so much!
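To give a flavor of what "adding memory" means, here is a minimal sketch of a content-based read as I understand it from the paper: a key vector is compared with every memory row by cosine similarity, the similarities are sharpened by a factor beta and passed through a softmax, and the read result is the weighted sum of the rows. It uses plain Python lists, has no learning and no write head, and the function names are my own, so treat it as an illustration rather than the paper’s implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_weights(memory: List[Vector], key: Vector, beta: float = 1.0) -> Vector:
    """Softmax over beta-scaled similarities between the key and each row."""
    scores = [beta * cosine(row, key) for row in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory: List[Vector], weights: Vector) -> Vector:
    """Weighted sum of memory rows: a soft, differentiable lookup."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = content_weights(memory, key=[1.0, 0.0], beta=10.0)
    print(weights)                 # most of the weight lands on the first row
    print(read(memory, weights))
```

Because every step is a smooth function, the whole lookup can be trained by gradient descent along with the rest of the network, which is what makes this kind of memory attractive.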