The pyramid was originally built for another project but has been given a new lease of life as an rtn data flow (rtndf) point of presence. It uses a Logitech C920 webcam for video and audio, and has powered speakers for text-to-speech or direct audio output. An LED panel on top of the pyramid indicates its current state:
- Idle – waiting for the wakeup phrase.
- Listening – collecting input.
- Processing – performing speech recognition and processing.
- Speaking – generating sound.
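The four states form a simple cycle, which can be sketched as a small state machine. This is a minimal illustration, not the pyramid's actual code: it assumes the pyramid goes from Idle to Listening on the wakeup phrase, to Processing when input ends, to Speaking when a reply is ready, and back to Idle when playback finishes. The event names (`wakeup_phrase`, `input_complete`, `reply_ready`, `no_reply`, `playback_done`) are hypothetical.

```python
from enum import Enum


class PyramidState(Enum):
    IDLE = "idle"              # waiting for the wakeup phrase
    LISTENING = "listening"    # collecting input
    PROCESSING = "processing"  # speech recognition and processing
    SPEAKING = "speaking"      # generating sound


# Hypothetical event names; the real triggers are not documented here.
TRANSITIONS = {
    (PyramidState.IDLE, "wakeup_phrase"): PyramidState.LISTENING,
    (PyramidState.LISTENING, "input_complete"): PyramidState.PROCESSING,
    (PyramidState.PROCESSING, "reply_ready"): PyramidState.SPEAKING,
    (PyramidState.PROCESSING, "no_reply"): PyramidState.IDLE,
    (PyramidState.SPEAKING, "playback_done"): PyramidState.IDLE,
}


def next_state(state: PyramidState, event: str) -> PyramidState:
    """Return the new state; events that don't apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = PyramidState.IDLE
    for event in ["wakeup_phrase", "input_complete", "reply_ready", "playback_done"]:
        state = next_state(state, event)
        print(event, "->", state.value)
```

Ignoring events that don't apply in the current state (rather than raising) means a stray wakeup detection while the pyramid is speaking simply has no effect.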
Inside the pyramid is a Raspberry Pi 2, along with a USB-connected Teensy 3.1 fitted with an OctoWS2811 adaptor that drives the LED panel. The powered speakers were salvaged from an old set of Dell PC speakers, and the case was 3D printed.
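With this split, the Pi only needs to tell the Teensy which state to display, while the Teensy and OctoWS2811 take care of the LED timing. The message format isn't described here, so the sketch below assumes a hypothetical newline-terminated ASCII command with one letter per state (`STATE_CODES` and `encode_state_command` are illustrative names).

```python
# Hypothetical command set: one letter per LED panel state.
STATE_CODES = {"idle": "I", "listening": "L", "processing": "P", "speaking": "S"}


def encode_state_command(state: str) -> bytes:
    """Encode a state change as one ASCII line, e.g. 'STATE L' plus a newline."""
    try:
        code = STATE_CODES[state]
    except KeyError:
        raise ValueError(f"unknown pyramid state: {state!r}") from None
    return f"STATE {code}\n".encode("ascii")


if __name__ == "__main__":
    for state in STATE_CODES:
        print(state, encode_state_command(state))
```

With pyserial installed, sending a command would look something like `serial.Serial("/dev/ttyACM0", 115200).write(encode_state_command("speaking"))`. The device path and baud rate are assumptions. A line-based text format keeps the Teensy-side parser trivial and makes the link easy to debug from a serial terminal.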
It runs these rtndf/Manifold nodes:
- uvccam – generates a 1280 x 720 video stream at 30fps.
- audio – generates a PCM audio stream suitable for speech recognition.
- tts – converts text to speech.
- tty – a serial interface used to communicate with the Teensy 3.1.
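A quick back-of-the-envelope calculation shows how much data these streams represent. The sketch assumes uncompressed YUYV video (2 bytes per pixel) and 16 kHz, 16-bit mono PCM audio. That audio format is a common choice for speech recognition, but it is an assumption, not something stated for the audio node.

```python
def video_bytes_per_second(width: int, height: int, fps: int, bytes_per_pixel: int = 2) -> int:
    """Raw video data rate; 2 bytes/pixel corresponds to YUYV 4:2:2."""
    return width * height * bytes_per_pixel * fps


def pcm_bytes_per_second(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM audio data rate."""
    return sample_rate * (bits_per_sample // 8) * channels


if __name__ == "__main__":
    video = video_bytes_per_second(1280, 720, 30)
    audio = pcm_bytes_per_second(16000, 16, 1)
    print(f"video: {video:,} B/s ({video * 8 / 1e6:.0f} Mbit/s)")  # video: 55,296,000 B/s (442 Mbit/s)
    print(f"audio: {audio:,} B/s ({audio * 8 / 1e3:.0f} kbit/s)")  # audio: 32,000 B/s (256 kbit/s)
```

Roughly 442 Mbit/s of raw video is far more than the Raspberry Pi 2's 10/100 Ethernet port can carry, so in practice the video presumably has to leave the Pi compressed. The C920 can deliver MJPEG or H.264 directly. The audio stream, by comparison, is negligible.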
Speech recognition is performed on a server by the speechdecode node, as are object recognition (recognize), motion detection (modet) and face recognition (facerec).
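The split between the pyramid and the server can be written down as a small routing table. This sketch assumes that speechdecode consumes the audio stream and that recognize, modet and facerec consume the video stream. The stream labels and the `consumers_of` helper are illustrative, and this is not Manifold's actual configuration format.

```python
# Illustrative topology: the stream each pyramid node produces...
PYRAMID_STREAMS = {"uvccam": "video", "audio": "audio"}

# ...and the stream each server-side processor is assumed to consume.
SERVER_PROCESSORS = {
    "speechdecode": "audio",  # speech recognition
    "recognize": "video",     # object recognition
    "modet": "video",         # motion detection
    "facerec": "video",       # face recognition
}


def consumers_of(source_node: str) -> list[str]:
    """Server processors that subscribe to the stream produced by source_node."""
    stream = PYRAMID_STREAMS[source_node]
    return sorted(name for name, wanted in SERVER_PROCESSORS.items() if wanted == stream)


if __name__ == "__main__":
    for node in PYRAMID_STREAMS:
        print(node, "->", consumers_of(node))
```

Keeping the heavy processors on the server means the Pi only has to capture and forward streams, and several processors can share a single video stream.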
The old project had an intelligent agent that took the output of the various stream processors and generated the messages to control the pyramid. This has yet to be moved over to rtndf.
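For illustration only, an agent of this kind could map processor outputs to control messages for the pyramid. The old agent isn't described here, so everything in this sketch is hypothetical: the event shapes (a dict with a `source` field, plus `text` from speechdecode or `name` from facerec), the `Command` type, and the `led`/`tts` targets.

```python
from dataclasses import dataclass


@dataclass
class Command:
    target: str   # hypothetical: "led" (sent via the tty node) or "tts"
    payload: str  # LED state name or text to speak


def handle_event(event: dict) -> list[Command]:
    """Turn one stream-processor output into zero or more pyramid commands."""
    source = event.get("source")
    if source == "speechdecode":
        text = event.get("text", "").strip()
        if not text:
            return [Command("led", "idle")]  # nothing recognized: go back to idle
        return [Command("led", "speaking"), Command("tts", f"You said {text}")]
    if source == "facerec" and event.get("name"):
        return [Command("led", "speaking"), Command("tts", f"Hello {event['name']}")]
    # recognize/modet output and unknown sources are ignored in this sketch.
    return []


if __name__ == "__main__":
    print(handle_event({"source": "facerec", "name": "Ann"}))
```

Returning a list of commands rather than acting directly keeps the agent easy to test and leaves delivery (tty for the LED panel, tts for speech) to the nodes that already exist.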