Glance Transforms Long Videos into Mobile Clips Using Advanced Technology

As audiences increasingly consume content on mobile devices, Glance, a mobile-first content platform, is tackling the challenge of turning lengthy videos into bite-sized clips. The company processes videos one to two hours long from sources such as podcasts and movies, and converts them into 30- to 180-second vertical clips designed for mobile lock screens. With daily output projected to grow from 3,500 to more than 10,000 videos, manual editing is no longer feasible.

Glance's solution goes beyond basic cropping: it identifies the key speakers and adjusts the layout dynamically to keep viewers engaged. Here’s a closer look at how the pipeline works.

Creating Mobile-Optimized Content

The primary objective is to develop a comprehensive pipeline that converts long-form landscape videos (16:9) into multiple short-form portrait videos (9:16). Key functionalities include:

  1. Key Moment Identification: Extracting the most compelling 60-second segments from extensive footage.
  2. Active Speaker Detection: Identifying speakers in each frame and positioning them appropriately.
  3. Split Screen Detection: Recognizing interview formats and stacking frames to maintain context.
  4. Intelligent Reframing: Adapting wide shots into focused vertical frames without losing essential details.
  5. Dynamic Caption Highlighting: Creating engaging captions that enhance viewer interaction.
  6. Automated Branding: Consistently applying branding elements across all videos.

The pipeline is built on Google Cloud Speech-to-Text, Gemini, and the Google Cloud Vision API, along with custom video manipulation tools.

Pipeline Architecture

The process consists of three main modules:

Module 1: Video Clipping

This module transcribes the long video, identifies key segments in the transcript, and clips the footage accurately. Its steps are:

  • Audio Extraction: Isolating the audio from the video.
  • Speech-to-Text Transcription: Converting audio to text with precise timestamps.
  • Segment Identification: Analyzing transcripts to determine optimal clip timings.
  • Video Clipping: Cutting the video into short segments based on identified timestamps.
  • Transcript Validation: Ensuring accurate capture of phrases and words.
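
The clipping and validation steps above can be sketched as follows. This is a minimal illustration, not Glance's implementation: it assumes the transcript is already available as a list of words with start and end times, and the names Word, snap_to_words, is_valid_length, and ffmpeg_clip_command are hypothetical. The use of ffmpeg for cutting is also an assumption; the article only mentions custom video manipulation tools.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the source video
    end: float    # seconds from the start of the source video


def snap_to_words(words, start, end):
    """Widen a proposed [start, end] range so no word is cut in half."""
    inside = [w for w in words if w.end > start and w.start < end]
    if not inside:
        raise ValueError("no words fall inside the proposed range")
    return inside[0].start, inside[-1].end


def is_valid_length(start, end, min_s=30.0, max_s=180.0):
    """Check the clip against the 30 to 180 second target range."""
    return min_s <= end - start <= max_s


def ffmpeg_clip_command(src, dst, start, end):
    """Build (but do not run) an ffmpeg command that re-encodes one clip.

    Assumption: ffmpeg is the cutting tool. -ss and -t placed after -i
    give frame-accurate cuts at the cost of re-encoding.
    """
    return [
        "ffmpeg", "-i", src,
        "-ss", f"{start:.3f}",
        "-t", f"{end - start:.3f}",
        "-c:v", "libx264", "-c:a", "aac",
        dst,
    ]
```

Snapping to word boundaries before cutting is one simple way to make sure a clip does not start or end mid-word; the resulting command list can be passed to subprocess.run once the range passes the length check.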

Module 2: Intelligent Reframing Engine

This module converts horizontal frames into vertical ones. It uses multi-stage scene analysis to ensure key elements are preserved during cropping.

Active Speaker Detection

Using the Google Cloud Vision API, the system identifies who is speaking in each frame. The process includes:

  • Liveness Check: Distinguishing live speakers from static images through facial landmark tracking.
  • Engagement Quantification: Measuring activity scores based on facial movements and emotional changes.
  • Primary Speaker Identification: Designating the most animated speaker as the primary focus.
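
One way to picture the three steps above is the sketch below. It assumes facial landmark coordinates have already been extracted for each tracked face (for example from a face detection service) and makes no API calls itself. The scoring rule, the threshold value, and the function names are illustrative assumptions, and the sketch covers landmark movement only, not emotional changes.

```python
from math import hypot


def _centered(points):
    """Express landmarks relative to their centroid so that whole-head
    movement does not count as facial activity."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def activity_score(track):
    """Mean landmark displacement between consecutive frames.

    track: list of frames, each a non-empty list of (x, y) landmarks
    given in the same order in every frame.
    """
    if len(track) < 2:
        return 0.0
    frames = [_centered(f) for f in track]
    step_means = []
    for prev, curr in zip(frames, frames[1:]):
        moved = [hypot(cx - px, cy - py)
                 for (px, py), (cx, cy) in zip(prev, curr)]
        step_means.append(sum(moved) / len(moved))
    return sum(step_means) / len(step_means)


def pick_primary_speaker(tracks, liveness_threshold=0.5):
    """Return the id of the most animated live face, or None.

    tracks: dict mapping a face id to its landmark track.
    Faces scoring below the (hypothetical) threshold are treated as
    static images, such as a poster or a photo on a wall.
    """
    scores = {face_id: activity_score(t) for face_id, t in tracks.items()}
    live = {f: s for f, s in scores.items() if s >= liveness_threshold}
    if not live:
        return None
    return max(live, key=live.get)
```

A face whose landmarks never move scores zero and is filtered out by the liveness threshold, while the face with the highest score among the remaining ones becomes the primary focus.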

Split-Screen Detection

This feature detects interview layouts and stacks the two halves of the frame vertically to maintain context. The system uses continuous face tracking and frame-by-frame detection to identify split-screen segments.
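
As a rough sketch of frame-by-frame detection, the code below flags a frame when it contains exactly two faces, one in each half of the frame, and reports only runs of such frames that last long enough. This rule is a deliberately crude assumption for illustration: it would also fire on two people sharing a single shot, so a production system would need additional cues. The function names and the min_frames default are hypothetical.

```python
def looks_split(face_xs):
    """face_xs: horizontal face centers for one frame, normalized to [0, 1]."""
    if len(face_xs) != 2:
        return False
    left, right = sorted(face_xs)
    return left < 0.5 <= right


def split_screen_segments(frames, min_frames=3):
    """Return (first_frame, last_frame) index pairs for sustained runs.

    frames: list of per-frame face center lists.
    Runs shorter than min_frames are ignored as detection noise.
    """
    segments = []
    run_start = None
    for i, face_xs in enumerate(frames):
        if looks_split(face_xs):
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the final frame.
    if run_start is not None and len(frames) - run_start >= min_frames:
        segments.append((run_start, len(frames) - 1))
    return segments
```

Requiring a minimum run length keeps a single noisy frame from switching the layout back and forth.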

Automated Reformatting

Once each segment has been analyzed, the system applies the layout that matches what it found:

  • Single Speaker Crop: Centers the frame on the primary speaker.
  • Split Screen: Stacks the two halves vertically.
  • Multi-Speaker Crop: Focuses on the most prominent speaker in multi-person discussions.
  • Fallback Options: Applies center crops or letterboxing when no faces are detected.
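
The geometry behind these options is simple enough to sketch. The example below assumes a landscape source frame and a known horizontal position for the primary speaker; the function names and layout labels are hypothetical, and the logic is an illustration of the idea rather than Glance's actual reframing code.

```python
def choose_layout(num_faces, is_split_screen):
    """Map the analysis result for a segment to one of the four layouts."""
    if is_split_screen:
        return "split_stack"
    if num_faces == 0:
        return "fallback"
    if num_faces == 1:
        return "single_speaker"
    return "multi_speaker"


def portrait_crop(frame_w, frame_h, center_x=None):
    """Return (left, top, width, height) of a full-height 9:16 crop.

    center_x is the speaker's horizontal position in pixels. When it is
    None (no face found), the crop falls back to the frame center.
    """
    crop_w = frame_h * 9 // 16
    if crop_w > frame_w:
        raise ValueError("frame is too narrow for a full-height 9:16 crop")
    if center_x is None:
        center_x = frame_w // 2
    left = int(center_x) - crop_w // 2
    # Clamp so the crop window never leaves the frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, frame_h
```

For a 1920x1080 frame the crop is 607 pixels wide, so roughly two thirds of the original width is discarded. That is why the choice of crop center matters so much, and why a speaker near the edge needs the clamping step.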

Module 3: Finishing and Branding

The final module prepares clips for publication, enhancing viewer engagement and reinforcing brand identity through dynamic captioning and consistent logo placement.
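
The article does not describe how the caption highlighting works. One plausible reading is word-by-word highlighting driven by the transcript timestamps, sketched below under that assumption. The function name and the bracket marker are hypothetical; a real renderer would apply a color or style instead.

```python
def highlight_caption(words, t):
    """Render a caption line with the word being spoken at time t marked.

    words: list of (text, start, end) tuples with times in seconds.
    The active word is wrapped in brackets as a stand-in for styling.
    """
    parts = []
    for text, start, end in words:
        parts.append(f"[{text}]" if start <= t < end else text)
    return " ".join(parts)
```

Calling this once per rendered frame with the frame's timestamp gives a caption whose highlight moves in step with the audio.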

Conclusion

Glance’s video processing pipeline exemplifies the potential of automated systems in video editing. By integrating advanced technologies, the platform efficiently converts long-form content into mobile-friendly clips, allowing organizations to maximize the value of their video archives.

This editorial summary is based on reporting from Google and other public sources on how Glance transforms long videos into mobile clips.

Reviewed by WTGuru editorial team.