I’ve been using HTC Vive Trackers a bunch for BodyMouth and would like to explore methods of computationally generated visuals that involve them. My program of choice for this is TouchDesigner, which has built-in operators for VR controllers such as Vive Trackers and is a powerful and popular tool for real-time generated graphics. I’d like to attempt to create a “movement extrapolator”: a neural network that looks at, say, 120 frames of tracker position/orientation data and tries to predict what the next frame would be. By predicting a new frame, appending it to the most recent 119 frames, and repeating the process, one could conceivably generate visual effects such as ribbons, particles, etc. whose movements match those of a human performer. Presumably.
TDxTF Workflow
My preliminary goal, before training any sort of neural network, is to create the workflow with which I’ll accumulate Vive Tracker data, use a neural network to predict the next frame of data, generate some visual effects using the new frame, and repeat. I plan to use TensorFlow to train my neural network, but first need to consider how the NN will fit into the TouchDesigner network.
I’m planning to run a Python script that will receive the tracker data over UDP, make the prediction, then send the new frames back over UDP. This type of remote setup makes it easy to outsource the machine learning to a faster virtual machine later.
TouchDesigner -> Python w/ UDP
The OpenVR CHOP gets position and rotation data from a single Vive Tracker. To normalize the rotation data for machine learning, I add 360 degrees to each datapoint, wrap it around 360 (mod 360), then divide by 360 to get a rotation value between 0 and 1. All of the values are fed into a trail CHOP with a window length of 121 frames (representing a 120-frame “history” and the frame that immediately followed it).
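Outside of TouchDesigner, that normalization boils down to a one-liner; here’s a minimal sketch (the function name is just for illustration):

def normalize_rotation(deg):
    # shift into positive range, wrap to [0, 360), then scale to [0, 1)
    return ((deg + 360) % 360) / 360

print(normalize_rotation(-90))   # 0.75
print(normalize_rotation(450))   # 0.25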
To transmit a UDP message with the entire history, I’m using a udpOut DAT and the following code in a script CHOP connected to the trail:
import json
# ...rest of starter script CHOP code

def onCook(scriptOp):
    scriptOp.copy(scriptOp.inputs[0])
    chans = scriptOp.chans()
    # round decimals to reduce message size, then send the whole history as one JSON message
    history = [[round(val, 3) for val in chan.vals] for chan in chans]
    op('toTrain').send(json.dumps(history), terminator="")
The following Python code receives the message and parses the contents into a numpy array. Be sure to use the same port number (4242, in this case) here and in your udpOut DAT settings.
import socket, json, numpy as np

socketIn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
socketIn.bind(("localhost", 4242))

while True:
    data = np.array(json.loads(socketIn.recv(100000)))
    # do stuff with newest data
NN Training w/ TensorFlow
To build my training dataset, I’ll assemble two numpy arrays of equal length: one containing 120-frame “histories” and one containing the corresponding frames that immediately followed those histories. I’ll train a neural network to receive one such history and predict the frame that will follow it.
Training Data Capture Script
#capture.py
import socket, json, keyboard, os, numpy as np

socketIn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
socketIn.bind(("localhost", 4242))

try:
    histories = np.load("./train/histories.npy")
    frames = np.load("./train/frames.npy")
except FileNotFoundError: #files don't exist? Start new arrays
    histories = np.zeros((0, 120, 6))
    frames = np.zeros((0, 6))

while True:
    data = np.transpose(np.array(json.loads(socketIn.recv(1000000))))
    histories = np.append(histories, [data[:120]], axis=0)
    frames = np.append(frames, [data[-1]], axis=0)
    if keyboard.is_pressed("q"):
        break

if not "train" in os.listdir("."):
    os.mkdir("./train")
np.save("./train/histories", histories)
np.save("./train/frames", frames)
Two accumulator numpy arrays, histories and frames, are created to hold training data. Each incoming message initially has shape (6, 121): 6 CHOP channels with 121 datapoints each. Transposing this data flips the dimensions to represent 121 frames with 6 datapoints each. The data is then split into the 120-frame history and the frame immediately following it, and appended to the appropriate accumulators. By saving/loading these arrays to/from local files, this script can be run at any time to add more training data.
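To make the reshaping concrete, here’s a small illustrative example of the shapes involved, using dummy data rather than actual tracker output:

import numpy as np

msg = np.zeros((6, 121))                    # what arrives: 6 channels x 121 samples
data = np.transpose(msg)                    # (121, 6): 121 frames of 6 values each
history, next_frame = data[:120], data[-1]  # the 120-frame history and the frame after it
print(history.shape, next_frame.shape)      # (120, 6) (6,)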
Model Training Script
For my neural network, I’ll use LSTM (long short-term memory) layers, which are designed to handle sequential data. I’ll load the numpy arrays from the local files, create and train my model, then save it to a local file. The following code, written in a separate Python script, accomplishes all of this:
#train.py
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Input
histories = np.load("./train/histories.npy")
frames = np.load("./train/frames.npy")
model = Sequential()
model.add(Input(shape=(120, 6)))
model.add(LSTM(64, activation="relu", return_sequences=True))
model.add(LSTM(32, activation="relu"))
model.add(Dense(6))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(histories, frames, epochs=30, batch_size=32, validation_split=0.1)
model.save("./model.keras") #note: this overwrites any existing model.keras
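Before wiring the model back into TouchDesigner, a quick optional sanity check (my own addition, assuming the files produced above exist) confirms that the saved model loads and predicts a frame of the right shape:

#check.py (optional)
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("./model.keras")
histories = np.load("./train/histories.npy")
frames = np.load("./train/frames.npy")

pred = model.predict(histories[:1])[0]      # predict from the first recorded history
print("predicted:", np.round(pred, 3))      # 6 values
print("actual:   ", np.round(frames[0], 3))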
Model Runtime Script
The following code loads the saved model, creates two sockets (one for receiving a history, one for sending back the prediction), and makes predictions on incoming data in a loop:
#run.py
from tensorflow.keras.models import load_model
import socket, json, numpy as np

model = load_model("model.keras")
tdSocketIn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tdSocketIn.bind(("localhost", 4243)) #note port #
tdSocketOut = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    data = np.transpose(np.array(json.loads(tdSocketIn.recv(1000000))))
    frame = model.predict(np.expand_dims(data, axis=0))[0]
    toSend = bytes(json.dumps([round(val, 3) for val in frame.tolist()]), encoding="utf-8")
    tdSocketOut.sendto(toSend, ("localhost", 7000))
Note the two port numbers in the code above: 4243 must match the runtime udpOut DAT that sends the history out of TouchDesigner, and 7000 must match the udpIn DAT that will receive the predicted frames. The numbers themselves are arbitrary, as long as each pair matches!
Complete TD Network
Back in TouchDesigner, a udpIn DAT receives the predicted frames from the Python script above. Each frame is parsed into CHOP channels, which are renamed to be identical to those received from the Vive Trackers.
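One possible way to do that parsing is with another script CHOP reading from the udpIn DAT. This is only a sketch under my assumptions: the operator name 'predIn', the message living in the last row / first column of the DAT, and the channel names all depend on your network.

# script CHOP onCook: rebuild channels from the newest predicted frame
import json

def onCook(scriptOp):
    scriptOp.clear()
    dat = op('predIn')                               # hypothetical udpIn DAT name
    if dat.numRows == 0:
        return
    frame = json.loads(dat[dat.numRows - 1, 0].val)  # newest message (assumed in column 0)
    names = ['tx', 'ty', 'tz', 'rx', 'ry', 'rz']     # rename to match your tracker channels
    for name, val in zip(names, frame):
        scriptOp.appendChan(name).vals = [val]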
From here, I’ve created a second trail CHOP with a window of 120 frames. Just like the one used in training, it outputs its history to a script CHOP and corresponding udpOut DAT, which differ from the ones created during training only by the UDP port number.
Initially, the trail CHOP stores data coming from the Vive Tracker (the “human”). However, when a switch is toggled, the source of the data switches to the output of the neural network. This creates a feedback loop: the running history, initially filled with tracker data, is eventually replaced entirely with frames the neural network itself previously predicted. In effect, the neural network extrapolates on the Vive Tracker data that was stored when the switch was flipped. This could also lead to some interesting, though undesired, positive feedback effects where quirks in the predictions are amplified.
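For reference, the same feedback idea can be written as a few lines of plain Python outside TouchDesigner. This is only an illustrative sketch of the autoregressive loop, seeded here with a recorded history from the training files:

# standalone extrapolation sketch (illustration only)
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("./model.keras")
history = np.load("./train/histories.npy")[0]         # one recorded (120, 6) history as a seed

generated = []
for _ in range(240):                                   # extrapolate 240 new frames
    frame = model.predict(np.expand_dims(history, axis=0), verbose=0)[0]
    generated.append(frame)
    history = np.vstack([history[1:], frame])          # drop the oldest frame, append the prediction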
Training + Results
I want to test this entire workflow by training some neural networks on very limited training data.
This network, trained only on vertical motion, behaves as expected, and even a bit better than I anticipated. I start with a motion similar to the one I used during training, and when I hit the switch and stop moving, the network is able to keep the motion going for a while.
When I repeat the process, this time keeping the tracker still, the network actually maintains the stillness for a bit, which I didn’t expect. Even so, with its extremely limited training, this network has a strong tendency to predict vertical motion: what starts as a small vertical “twitch” in the produced motion, amplified through the feedback loop, eventually turns into the full vertical motion the network was trained on.
Here’s how the network performs after I added some training data consisting of clockwise motion and retrained it. When I give it more sequential clockwise data, it’s able to keep the motion going for a while! After a few seconds, however, the motion transitions to a purely vertical one, showing the network’s bias towards predicting vertical motion.
When I repeat the process with COUNTER-clockwise motion, the network definitely seems to struggle. The extrapolated motion doesn’t follow much of a pattern, but you can really see it “trying” to create vertical or clockwise motion out of something which is neither.
I’m definitely excited by this preliminary step towards motion prediction! These initial tests show just how much the training data (or lack thereof) affects the behavior of the resulting neural network. My next step will focus on methods of producing sufficient training data that don’t involve me standing in a room waving Vive Trackers around for hours.