Deflectometric Eye Tracking

Eye tracking plays a crucial role in the development of virtual reality devices, in neuroscience research, and in psychology. Despite its significance in numerous applications, achieving an accurate, robust, and fast eye-tracking solution remains a considerable challenge for current state-of-the-art methods. While existing reflection-based techniques (e.g., “glint tracking”) are considered the most accurate, their performance is limited by their reliance on sparse 3D surface data acquired solely from the corneal surface.

In this research track, we rethink how specular reflections can be used for eye tracking: we have developed a set of novel methods for accurate and fast evaluation of the gaze direction that utilize dense deflectometric information. Deflectometry is a well-established technique in optical 3D metrology for the measurement of specular surfaces. Our group has developed the first family of “computational deflectometry” methods for eye tracking. The reflection of a patterned screen is observed over the specular eye surface, with the information about the gaze encoded in the deformation of the pattern in the camera image. With “standard” screens and cameras (~1 Mpix resolution), the number of acquired reflection surface points (“glints”) can easily be increased by factors of 3000X and more compared to the current state of the art in glint tracking. This additional information allows for eye tracking at high accuracy.
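To illustrate how a screen pattern encodes dense surface information, the sketch below uses classical multi-shot phase shifting (a standard deflectometry decoding scheme, not the single-shot methods described on this page): per camera pixel, the phase of the reflected sinusoidal screen pattern is recovered, and this phase encodes which screen point is seen in reflection. The distortion field here is a made-up toy example.

```python
import numpy as np

# Illustrative sketch (classical 4-step phase shifting, not our single-shot
# method): per camera pixel, recover the phase of the sinusoidal screen
# pattern seen in reflection; the phase encodes the screen coordinate.

def wrapped_phase(images):
    """Recover the wrapped phase from four pi/2-shifted sinusoidal images."""
    i0, i1, i2, i3 = images
    return np.arctan2(i3 - i1, i0 - i2)

# Simulate a camera observing a pattern whose phase is distorted by a
# (specular) surface; the distortion field below is a toy stand-in.
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w]
true_phase = 2 * np.pi * x / w + 0.5 * np.sin(2 * np.pi * y / h)

shots = [0.5 + 0.4 * np.cos(true_phase + k * np.pi / 2) for k in range(4)]
phi = wrapped_phase(shots)

# The recovered phase matches the true phase pixel-wise (modulo 2*pi).
err = np.angle(np.exp(1j * (phi - true_phase)))
print(float(np.abs(err).max()))
```

Each camera pixel now carries a dense correspondence value instead of a single sparse glint, which is the source of the >3000X gain in surface points mentioned above.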

So far, we have explored three different approaches to decoding the gaze-direction information encoded in the deflectometric images:

Selected Talks
Utilizing deflectometric information for single-shot eye-tracking with high accuracy
Florian Willomitzer
Single-Shot Stereo Deflectometry approach with shape/normal-based geometric processing

This method uses a crossed fringe pattern on the screen to measure the surface normal map of the eye surface. To establish the required correspondence between screen and camera, the phase information for both fringe directions is evaluated in single-shot via a 2D continuous wavelet transform. A second camera is used to resolve the deflectometric normal-depth ambiguity via stereo deflectometry. The (small) overlap region between both cameras is used to reconstruct an initial surface model of the eye. Eventually, we use deflectometric correspondences in conjunction with geometric constraints on the eyeball to expand the measured surface and retrieve the eye’s optical axis via backtracing of the captured surface normals towards the eye center. Our current experimental evaluations achieve gaze errors below 0.13° on a realistic model eye. Moreover, we have demonstrated quantitative measurements on real human eyes in vivo, reaching accuracies between 0.46° and 0.97°.
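The backtracing step can be sketched as follows under a simplified spherical cornea model: every measured surface normal points away from the sphere center, so the center is recoverable as the least-squares intersection of the normal lines, and the optical axis then follows from the center. All numeric values below (center, radius, noise level) are illustrative, not calibrated quantities from our setup.

```python
import numpy as np

# Minimal sketch of normal backtracing on a spherical cornea model.
# Hypothetical parameters; a real corneal radius is roughly 7.8 mm.
rng = np.random.default_rng(0)
center = np.array([1.0, -2.0, 30.0])   # illustrative cornea-sphere center
radius = 7.8

# Sample "measured" surface points with slightly noisy normals.
d = rng.normal(size=(200, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
points = center + radius * d
normals = d + 0.01 * rng.normal(size=d.shape)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# Least-squares intersection of the normal lines:
# minimize sum_i || (I - n_i n_i^T)(c - p_i) ||^2 over the center c.
A = np.zeros((3, 3))
b = np.zeros(3)
for p, n in zip(points, normals):
    P = np.eye(3) - np.outer(n, n)   # projector orthogonal to the line
    A += P
    b += P @ p
c_est = np.linalg.solve(A, b)
print(np.linalg.norm(c_est - center))
```

In the actual method, the recovered center together with the geometric eyeball constraints yields the optical axis; this toy version only shows the center estimation.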

Selected Publications
Optimization-based inverse rendering using deflectometric information

This method uses the known geometry of our calibrated deflectometry setup to build a PyTorch3D-based differentiable rendering pipeline that simulates a virtual computer-generated (CG) eye model under screen illumination. The images and screen-camera correspondence information from the real eye measurement are then used to optimize the CG eye’s rotation, translation, and shape parameters via gradient descent, until the simulated setup produces images and correspondences that closely match the real measurements; the gaze direction of the CG eye then serves as an estimate of the real eye’s gaze direction. The method does not require a second camera. Moreover, it does not require a specific screen pattern and can even work with ordinary video frames of the main VR screen itself - also in single-shot. All evaluated errors are below 0.5°.
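The analysis-by-synthesis loop at the core of this approach can be sketched in a few lines. The toy forward model and finite-difference gradients below stand in for the actual PyTorch3D renderer and its automatic differentiation; the pattern function and parameter values are purely illustrative.

```python
import numpy as np

# Toy analysis-by-synthesis loop: a forward model maps gaze parameters to a
# rendered signal, and gradient descent adjusts the parameters until the
# simulation matches the "measurement". Finite differences stand in for
# PyTorch3D's automatic differentiation; the forward model is invented.

x = np.linspace(-1, 1, 200)

def render(theta_x, theta_y):
    """Hypothetical forward model: gaze angles phase-shift the pattern."""
    return np.sin(8 * x + theta_x) + 0.5 * np.cos(5 * x - theta_y)

target = render(0.30, -0.20)        # simulated "real" measurement

def loss(p):
    return np.mean((render(*p) - target) ** 2)

p = np.array([0.0, 0.0])            # initial CG-eye gaze guess
lr, eps = 0.5, 1e-5
for _ in range(500):
    g = np.array([(loss(p + eps * e) - loss(p - eps * e)) / (2 * eps)
                  for e in np.eye(2)])
    p -= lr * g                      # gradient-descent update

print(p)  # converges toward the true parameters (0.30, -0.20)
```

The real pipeline optimizes rotation, translation, and shape parameters jointly against images and correspondence maps, but the structure of the loop is the same.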

Selected Publications
Deep learning-based deflectometric eye tracking

The deep learning-based deflectometric eye-tracking approach uses the “digital twin” of our deflectometry setup to generate a training dataset that simulates the deflectometric pattern reflection for different eye shapes at different locations and rotation angles. Instead of trying to render photorealistic intensity camera images under varying ambient lighting and screen illumination (which is difficult and leads to an “unrealistic” training dataset), we use the retrieved camera-screen correspondence maps as input to our model. The network learns the relation between the input correspondence map and the gaze direction. The simulated correspondence maps closely resemble the real captured correspondence maps that are used for gaze evaluation. Moreover, the use of “pure” correspondence maps prevents the network from learning secondary eye features (such as veins and wrinkles in the periocular region), which may not be present in real images.

Our first quantitative experiments on a realistic rotated eye model deliver gaze errors below 1°.
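The digital-twin training idea can be sketched as follows: a simulator produces correspondence maps for known gaze angles, and a regressor is fit from map to gaze. A regularized linear model stands in here for the actual network, and the one-line simulator is a toy stand-in for the calibrated setup; all shapes and coefficients are illustrative.

```python
import numpy as np

# Sketch of digital-twin training: simulate correspondence maps for known
# gaze angles, then fit a map -> gaze regressor. Linear least squares
# replaces the real network; the simulator is a toy stand-in.

rng = np.random.default_rng(1)
h, w = 8, 8
v, u = np.mgrid[0:h, 0:w] / 8.0

def simulate_map(gx, gy):
    """Toy correspondence map: gaze tilts shift the screen coordinates."""
    return np.stack([u + 0.3 * gx * (1 - v), v + 0.3 * gy * (1 - u)]).ravel()

# Training set: random gaze angles (radians) and their simulated maps,
# with a little additive noise standing in for sensor effects.
gaze = rng.uniform(-0.5, 0.5, size=(500, 2))
X = np.stack([simulate_map(gx, gy) for gx, gy in gaze])
X = X + 0.005 * rng.normal(size=X.shape)

# Fit map -> gaze with ridge-regularized least squares (network stand-in).
Xb = np.hstack([X, np.ones((len(X), 1))])
W = np.linalg.solve(Xb.T @ Xb + 1e-6 * np.eye(Xb.shape[1]), Xb.T @ gaze)

# Evaluate on a fresh simulated "measurement".
test = np.append(simulate_map(0.2, -0.1), 1.0)
print(test @ W)   # close to (0.2, -0.1)
```

In the real approach, a network replaces the linear model precisely because the correspondence-to-gaze relation for actual eye shapes is nonlinear; the simulate-then-fit structure, however, is the same.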

Selected Publications