Complete Data Dictionary
Field | Type | Description |
---|---|---|
segment_id | UUID | Primary Key - Globally unique identifier for each transcript segment |
video_id | UUID | Unique YouTube video identifier |
channel_id | UUID | Babbl Labs' immutable internal unique identifier for the channel |
channel_uri | UUID | YouTube’s immutable unique identifier for the channel |
channel_custom_url | STRING | Mutable custom URL for channel |
channel_name | STRING | Channel title from YouTube metadata |
channel_description | STRING | Channel description from YouTube metadata |
channel_locale | ENUM | Country code of the channel (ISO 3166-1 alpha-2) |
channel_published_at | TIMESTAMP | Timestamp when channel was created |
channel_coverage_initiated_at | TIMESTAMP | Timestamp when we initiated coverage |
video_title | STRING | Video title from YouTube metadata |
video_description | STRING | Video description from YouTube metadata |
video_language | STRING | Video language code (ISO 639-1) |
video_published_dt | TIMESTAMP | Timestamp when video was originally published |
video_download_dt | TIMESTAMP | Timestamp when we downloaded the video |
video_transcribed_at | TIMESTAMP | Timestamp when we transcribed the video |
video_in_dataset_at | TIMESTAMP | Timestamp when video was included in dataset |
model_transcription_tag | STRING | Identifier for transcription model used |
segment_start | FLOAT | Starting point of segment in seconds from video start |
segment_end | FLOAT | End point of segment in seconds from video start |
segment_start_char | INT | Character index where segment starts in transcript |
segment_end_char | INT | Character index where segment ends in transcript |
segment_text | STRING | Complete verbatim transcript text for this segment |
speaker_name | STRING | Name of the speaker (optional; may be null) |
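
As a rough illustration of how the segment-level fields relate to one another, the sketch below models a subset of the dictionary as a Python dataclass and runs a few basic consistency checks. The class name `Segment`, the helper functions, and the sample values are hypothetical and not part of the dataset. The checks are assumptions about what a well-formed row probably looks like; in particular, the dictionary does not say whether `segment_end_char` is inclusive or exclusive, so the sketch only checks ordering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical container for a subset of the data dictionary fields."""
    segment_id: str
    video_id: str
    segment_start: float       # seconds from video start
    segment_end: float         # seconds from video start
    segment_start_char: int    # character index in the transcript
    segment_end_char: int      # character index in the transcript
    segment_text: str
    speaker_name: Optional[str] = None  # optional, may be null


def segment_duration(seg: Segment) -> float:
    """Length of the segment in seconds."""
    return seg.segment_end - seg.segment_start


def validate_segment(seg: Segment) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if seg.segment_start < 0:
        problems.append("segment_start is negative")
    if seg.segment_end < seg.segment_start:
        problems.append("segment_end precedes segment_start")
    if seg.segment_start_char < 0:
        problems.append("segment_start_char is negative")
    if seg.segment_end_char < seg.segment_start_char:
        problems.append("segment_end_char precedes segment_start_char")
    return problems


# Made-up sample row, for illustration only.
example = Segment(
    segment_id="00000000-0000-0000-0000-000000000001",
    video_id="00000000-0000-0000-0000-000000000002",
    segment_start=12.5,
    segment_end=17.0,
    segment_start_char=240,
    segment_end_char=312,
    segment_text="Sample transcript text.",
)

print(segment_duration(example))   # 4.5
print(validate_segment(example))   # []
```

Because `speaker_name` may be null, downstream code should treat it as optional rather than assuming every segment has an attributed speaker.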