Proposal

Project Title: Super Rapid Annotator 2.0: Advanced Multimodal Video Annotation Agent

Mentors: Raúl Sánchez Sánchez, Manish Kumar Thota, Cristobal Pagán Cánovas, Rosa Illán Castillo

Project Size: Medium (175-hour project)

Difficulty Level: Medium

Objective:

Building upon the foundation of the previous Super Rapid Annotator project (https://github.com/manishkumart/Super-Rapid-Annotator-Multimodal-Annotation-Tool), this initiative aims to develop an advanced annotation agent that leverages state-of-the-art multimodal large language models (MLLMs) and reasoning models. The agent will process videos and generate structured CSV outputs for annotation purposes, operable via a command-line interface (CLI) or Python.

We already have a tool called Rapid Annotator (https://sites.google.com/case.edu/techne-public-site/red-hen-rapid-annotator). Students upload a batch of videos and watch them one by one, annotating whether the person is indoors or outdoors, whether they wear glasses, and so on. We want to automate this wherever possible, sparing students the repetitive work, and use a multimodal model to produce a CSV containing all the annotations.
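As a rough illustration of the target output, the sketch below builds a CSV with one row per video and one column per annotation question. The question names (`indoors`, `wears_glasses`) and the `annotate_video` function are hypothetical placeholders. The real question set and the actual multimodal model call are not specified here.

```python
import csv
import io

# Hypothetical annotation questions; the real set depends on the annotation task.
QUESTIONS = ["indoors", "wears_glasses"]


def annotate_video(video_path):
    """Placeholder for the multimodal model call (an assumption, not a real API).

    Returns one answer per question; this stub always answers 'unknown'.
    """
    return {question: "unknown" for question in QUESTIONS}


def annotations_to_csv(video_paths):
    """Return CSV text with one row per video and one column per question."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["video"] + QUESTIONS, lineterminator="\n"
    )
    writer.writeheader()
    for path in video_paths:
        row = {"video": path}
        row.update(annotate_video(path))
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(annotations_to_csv(["clip_001.mp4", "clip_002.mp4"]))
```

Keeping one column per question means the file can be opened directly in a spreadsheet and compared against manual annotations of the same videos.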

System Components:

  1. Agent-Based Annotation System:
  2. Command-Line Interface (CLI):
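The command-line component could take roughly the following shape. This is a minimal sketch under stated assumptions: the program name `annotate`, the `--output` and `--questions` options, and the list of video extensions are all invented for illustration, and the annotation step itself is left out.

```python
import argparse
from pathlib import Path

# Assumed set of video extensions; adjust to the data actually used.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def build_parser():
    """Create the argument parser for the hypothetical 'annotate' command."""
    parser = argparse.ArgumentParser(
        prog="annotate",
        description="Annotate every video in a directory and write a CSV.",
    )
    parser.add_argument("video_dir", type=Path, help="directory containing videos")
    parser.add_argument(
        "--output", type=Path, default=Path("annotations.csv"),
        help="path of the CSV file to write",
    )
    parser.add_argument(
        "--questions", nargs="+", default=["indoors", "wears_glasses"],
        help="annotation questions, one CSV column each (hypothetical names)",
    )
    return parser


def find_videos(video_dir):
    """Return the video files directly inside video_dir, sorted by path."""
    return sorted(
        p for p in video_dir.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def main(argv=None):
    """Parse arguments and list the videos that would be annotated."""
    args = build_parser().parse_args(argv)
    videos = find_videos(args.video_dir)
    print(f"Found {len(videos)} video(s); output would go to {args.output}")
    return videos


if __name__ == "__main__":
    main()
```

Because `main` accepts an argument list, the same entry point can be called from Python, for example `main(["videos/", "--output", "out.csv"])`, which matches the goal of supporting both CLI and Python use.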

Example Workflow:

Red Hen Lab Potential Contributors