
Gaze-LLE: A Gaze Target Prediction Tool for People in Video

2025-01-24

General Introduction

Gaze-LLE is a gaze target prediction tool built on a large-scale learned encoder. Developed by Fiona Ryan, Ajay Bati, Sangmin Lee, Daniel Bolya, Judy Hoffman, and James M. Rehg, the project aims to enable efficient gaze target prediction using pre-trained visual foundation models such as DINOv2. The Gaze-LLE architecture is simple: it freezes the pre-trained visual encoder and learns only a lightweight gaze decoder, which reduces the parameter count by 1-2 orders of magnitude compared with prior work and requires no additional input modalities such as depth or pose information.


Function List

  • Gaze target prediction: efficient prediction of gaze targets based on pre-trained visual encoders.
  • Multi-person gaze prediction: supports gaze prediction for multiple people in a single image.
  • Pre-trained models: provides a variety of pre-trained models covering different backbone networks and training datasets.
  • Lightweight architecture: learns only a lightweight gaze decoder on top of a frozen pre-trained visual encoder.
  • No additional input modalities: requires no extra depth or pose inputs.

 

Using Help

Installation process

  1. Clone the repository:
   git clone https://github.com/fkryan/gazelle.git
cd gazelle
  2. Create a virtual environment and install the dependencies:
   conda env create -f environment.yml
conda activate gazelle
pip install -e .
  3. Optional: install xformers to accelerate attention computation (if supported by your system):
   pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118

Using pre-trained models

Gaze-LLE provides a variety of pre-trained models that users can download and use as needed:

  • gazelle_dinov2_vitb14: model based on DINOv2 ViT-B, trained on GazeFollow.
  • gazelle_dinov2_vitl14: model based on DINOv2 ViT-L, trained on GazeFollow.
  • gazelle_dinov2_vitb14_inout: model based on DINOv2 ViT-B, trained on GazeFollow and VideoAttentionTarget.
  • gazelle_dinov2_vitl14_inout: model based on DINOv2 ViT-L, trained on GazeFollow and VideoAttentionTarget.

Usage Example

  1. Load the model via PyTorch Hub:
   import torch
model, transform = torch.hub.load('fkryan/gazelle', 'gazelle_dinov2_vitb14')
  2. See the demo notebook on Google Colab to learn how to detect the gaze target of every person in an image.
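The model's forward pass takes head bounding boxes normalized to the [0, 1] range alongside the transformed image, per the repository's documented input format. A minimal sketch of that normalization step, where the helper name is illustrative rather than part of the library:

```python
# Sketch: convert a pixel-space head bounding box (xmin, ymin, xmax, ymax)
# into fractions of the image size, as Gaze-LLE expects. The function name
# normalize_bbox is an assumption for illustration, not a library API.

def normalize_bbox(bbox, img_width, img_height):
    """Return the bbox with coordinates expressed as fractions in [0, 1]."""
    xmin, ymin, xmax, ymax = bbox
    return (xmin / img_width, ymin / img_height,
            xmax / img_width, ymax / img_height)

# Example: a head box in a 640x480 frame
norm = normalize_bbox((200, 100, 300, 220), 640, 480)
print(norm)
```

Normalized coordinates keep the boxes valid regardless of the resolution the image transform resizes to.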

Gaze Prediction

Gaze-LLE supports multi-person gaze prediction: a single image is encoded once, and the shared features are then used to predict the gaze targets of every person in the image. The model outputs a spatial heatmap representing the probability that each location in the scene is the gaze target, with values in [0, 1], where 1 indicates the most likely gaze target location.
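To turn such a heatmap into a single point estimate, a common post-processing step (not part of the library itself) is to take the argmax of the map; a NumPy sketch, assuming the heatmap has been converted to a 2D array:

```python
import numpy as np

def heatmap_to_point(heatmap):
    """Return (x, y) of the most likely gaze target in a 2D heatmap.

    np.argmax on the flattened (H, W) map is unraveled back to
    (row, col), then returned as (x, y) pixel coordinates.
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)

# Synthetic example: a map that peaks at column 7, row 3
hm = np.zeros((10, 10))
hm[3, 7] = 1.0
print(heatmap_to_point(hm))  # (7, 3)
```

The coordinates are in the heatmap's own resolution; to overlay them on the original frame, scale by the ratio of image size to heatmap size.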
