Field categories organize the Extended Dataset’s 24 fields into logical groupings for easier understanding and implementation.
Channel Fields
YouTube channel metadata and coverage information.
| Field | Type | Description |
|---|---|---|
| channel_id | STRING | Immutable Babbl Labs internal unique identifier (UUID) |
| channel_custom_url | STRING | Mutable custom URL for channel |
| channel_name | STRING | Channel title from YouTube metadata |
| channel_description | STRING | Channel description from YouTube metadata (default: NONE) |
| channel_locale | STRING | Geographic country code (ISO 3166-1 alpha-2) (default: NONE) |
| channel_published_at | FLOAT | Unix timestamp when channel was created |
| channel_coverage_initiated_at | FLOAT | Unix timestamp when we initiated coverage |
Video Fields
Video-level metadata and publication information.
| Field | Type | Description |
|---|---|---|
| video_published_at | FLOAT | Unix timestamp when video was originally published |
| video_title | STRING | Video title from YouTube metadata |
| video_description | STRING | Video description from YouTube metadata (default: NONE) |
| video_language | STRING | Video language code (ISO 639-1) |
Processing Fields
Data processing timeline and model versioning information.
| Field | Type | Description |
|---|---|---|
| downloaded_at | FLOAT | Unix timestamp when we downloaded video |
| transcribed_at | FLOAT | Unix timestamp when we transcribed video |
| transcription_version_tag | STRING | Identifier for transcription model used |
| recorded_at | FLOAT | Unix timestamp when video was added to dataset |
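The processing timestamps can be turned into readable dates and stage-to-stage lags with a few lines of Python. This is a minimal sketch, not an official client: it assumes the FLOAT values are seconds since the Unix epoch and that the stages run in the order download, transcription, recording. The helper names `to_utc` and `processing_lag_seconds` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def to_utc(ts: float) -> datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime.

    ASSUMPTION: the value is expressed in seconds, not milliseconds.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def processing_lag_seconds(row: dict) -> dict:
    """Return the elapsed seconds between consecutive processing stages.

    ASSUMPTION: stages occur in the order downloaded -> transcribed -> recorded.
    """
    return {
        "download_to_transcribe": row["transcribed_at"] - row["downloaded_at"],
        "transcribe_to_record": row["recorded_at"] - row["transcribed_at"],
    }


# Hypothetical values, for illustration only.
row = {
    "downloaded_at": 1700000000.0,
    "transcribed_at": 1700000090.5,
    "recorded_at": 1700000120.0,
}

print(to_utc(row["downloaded_at"]).isoformat())
print(processing_lag_seconds(row))
```

Using timezone-aware datetimes avoids silently interpreting the values in the local timezone of the machine running the code.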
Segment Fields
Transcript segment identification, timing, and content information.
| Field | Type | Description |
|---|---|---|
| segment_id | STRING | Identifier for the transcript segment, unique across all videos |
| segment_start_ts | FLOAT | Starting point in seconds from video start |
| segment_end_ts | FLOAT | End point in seconds from video start |
| segment_text | STRING | Complete verbatim transcript text for segment (5-60s of speech) |
| segment_char_start | INT | Character index in the full video transcript where the segment starts |
| segment_char_end | INT | Character index in the full video transcript where the segment ends |
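The timing and character-offset fields above can be combined as in the following sketch. It assumes that segment_char_start and segment_char_end form a half-open range (start inclusive, end exclusive) over the full transcript; the table does not state this, so check it against real data before relying on it. The function names, the sample transcript, and the sample values are hypothetical.

```python
def segment_duration(seg: dict) -> float:
    """Length of the segment in seconds, from its start and end offsets."""
    return seg["segment_end_ts"] - seg["segment_start_ts"]


def segment_from_transcript(transcript: str, seg: dict) -> str:
    """Slice the segment's text out of the full video transcript.

    ASSUMPTION: the character range is half-open, [start, end).
    If segment_char_end turns out to be inclusive, add 1 to the end index.
    """
    return transcript[seg["segment_char_start"]:seg["segment_char_end"]]


# Hypothetical transcript and segment, for illustration only.
transcript = "Welcome back. Markets opened higher today."
seg = {
    "segment_start_ts": 12.5,
    "segment_end_ts": 49.5,
    "segment_char_start": 14,
    "segment_char_end": 42,
}

print(segment_duration(seg))
print(segment_from_transcript(transcript, seg))
```

Comparing the sliced text with segment_text on a few rows is a quick way to confirm which offset convention the data actually uses.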
Speaker Fields
Speaker identification and context information. Every field in this category defaults to NONE.
| Field | Type | Description |
|---|---|---|
| speaker_name | STRING | Name of speaker if identifiable (default: NONE) |
| speaker_associated_entity | STRING | Entity/company speaker is associated with (default: NONE) |
| speaker_position | STRING | Known title of speaker (default: NONE) |
| speaker_role_context | STRING | Role within video context (HOST, GUEST, etc.) (default: NONE) |
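When loading rows into Python, it is often convenient to map the NONE default to a native `None`. The sketch below assumes the default arrives as the literal string "NONE"; the tables do not say how it is encoded, so the `sentinel` parameter is there to be adjusted. The function name `normalize_none` and the sample row are hypothetical; the field list is taken from the tables on this page.

```python
from typing import Any

# Fields documented on this page as having a NONE default.
NONE_DEFAULT_FIELDS = (
    "channel_description",
    "channel_locale",
    "video_description",
    "speaker_name",
    "speaker_associated_entity",
    "speaker_position",
    "speaker_role_context",
)


def normalize_none(row: dict[str, Any], sentinel: str = "NONE") -> dict[str, Any]:
    """Return a copy of the row with sentinel defaults replaced by None.

    ASSUMPTION: missing values are encoded as the string given by `sentinel`.
    Fields absent from the row are left absent; the input is not modified.
    """
    cleaned = dict(row)
    for field in NONE_DEFAULT_FIELDS:
        if cleaned.get(field) == sentinel:
            cleaned[field] = None
    return cleaned


# Hypothetical row, for illustration only.
row = {"channel_name": "CNBC", "speaker_name": "NONE", "speaker_position": "Anchor"}
print(normalize_none(row))
```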
Transcript Metrics
Complete transcript statistics and duration information.
| Field | Type | Description |
|---|---|---|
| transcript_total_char_count | INT | Total characters in complete video transcript |
| transcript_total_duration | FLOAT | Total video duration in seconds |
Data Types & Constraints
| Type | Format | Example |
|---|---|---|
| STRING | Variable-length text | "CNBC" |
| FLOAT | Floating-point number | 49.5 |
| INT | Integer number | 3543 |
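Fields whose names end in _at are Unix timestamps; segment_start_ts and segment_end_ts are offsets in seconds from the start of the video, not Unix timestamps. Fields with a NONE default are set to NONE when the information is unavailable. Geographic codes follow ISO 3166-1 alpha-2; language codes follow ISO 639-1.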
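The type and code-format constraints can be checked with a lightweight validator such as the sketch below. It covers only a subset of fields for brevity, checks the shape of the ISO codes (two letters) rather than membership in the official code lists, accepts whole numbers for FLOAT fields, and assumes the NONE default is the string "NONE". The names `FIELD_TYPES` and `check_row` are hypothetical.

```python
import re

# Subset of the documented fields, mapped to the Python type for each dataset type.
FIELD_TYPES = {
    "channel_name": str,                 # STRING
    "channel_locale": str,               # STRING
    "video_language": str,               # STRING
    "video_published_at": float,         # FLOAT
    "segment_start_ts": float,           # FLOAT
    "segment_char_start": int,           # INT
    "transcript_total_char_count": int,  # INT
}

TWO_LETTER = re.compile(r"[A-Za-z]{2}")


def check_row(row: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name not in row:
            continue
        value = row[name]
        is_bool = isinstance(value, bool)  # bool is a subclass of int; reject it
        ok = isinstance(value, expected) and not is_bool
        if expected is float and isinstance(value, int) and not is_bool:
            ok = True  # whole-number values are acceptable for FLOAT fields
        if not ok:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Shape check only: two letters, skipping the assumed "NONE" sentinel.
    for name in ("channel_locale", "video_language"):
        value = row.get(name)
        if isinstance(value, str) and value != "NONE" and not TWO_LETTER.fullmatch(value):
            problems.append(f"{name}: {value!r} is not a two-letter code")
    return problems


# Hypothetical row using the example values from the table above.
row = {
    "channel_name": "CNBC",
    "channel_locale": "US",
    "video_language": "en",
    "segment_start_ts": 49.5,
    "transcript_total_char_count": 3543,
}
print(check_row(row))
```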