Apache NiFi is a great way of capturing and processing streams while Apache Kafka is a great way of storing stream data. There’s an excellent description here of how to configure NiFi to pass data to Kafka using MovieLens data as its source. Since I am not running HDFS I modified the example to just put the movies and tags data into Kafka and save the ratings data to a local file. Trying to stash the ratings data into Kafka doesn’t work – there is just too much of it too fast and buffers overflow. It’s pretty easy to use the Kafka console consumer to check that the data is being stored for the movies and tags topics and the local ratings.dat file will be generated also.