The YouTube Core Dataset’s 40+ fields are grouped into seven categories: channel, video, processing, segment, speaker, entity, and sentiment.
Channel Fields
Channel-level metadata and statistics from YouTube.
Field | Type | Description |
---|---|---|
channel_id | UUID | Primary Key - Immutable Babbl Labs internal unique identifier |
channel_uri | UUID | YouTube’s immutable unique identifier for the channel |
channel_custom_url | STRING | Mutable custom URL for channel |
channel_name | STRING | Channel title from YouTube metadata |
channel_description | STRING | Channel description from YouTube metadata |
channel_locale | ENUM | Channel country code (ISO 3166-1 alpha-2) |
channel_view_count | INT | Most recent channel total view count |
channel_subscriber_count | INT | Most recent channel total subscriber count |
channel_video_count | INT | Most recent channel total video count |
channel_published_at | TIMESTAMP | Timestamp when channel was created |
channel_coverage_initiated_at | TIMESTAMP | Timestamp when we initiated coverage |
channel_last_updated_at | TIMESTAMP | Timestamp when we last polled YouTube metadata |
Video Fields
Video-level metadata and engagement metrics from YouTube.
Field | Type | Description |
---|---|---|
video_id | UUID | Primary Key - Unique YouTube video identifier |
video_view_count | INT | Most recent view count recorded |
video_like_count | INT | Most recent video like count recorded |
video_comment_count | INT | Most recent video comment count recorded |
video_published_dt | TIMESTAMP | Timestamp when video was originally published |
video_transcribed_at | TIMESTAMP | Timestamp when we originally transcribed video |
video_scored_at | TIMESTAMP | Timestamp when we originally processed video |
video_last_updated_at | TIMESTAMP | Timestamp when we last polled YouTube metadata |
video_title | STRING | Video title from YouTube metadata |
video_description | STRING | Video description from YouTube metadata |
video_language | STRING | Video language code (ISO 639-1) |
Processing Fields
Model and processing metadata for transcription and analysis.
Field | Type | Description |
---|---|---|
model_transcription_tag | STRING | Identifier for transcription model used |
model_scoring_tag | STRING | Identifier for scoring model used |
Segment Fields
Transcript segment timing and content information.
Field | Type | Description |
---|---|---|
segment_start | FLOAT | Starting point of segment - seconds from start of video |
segment_end | FLOAT | End point of segment - seconds from start of video |
segment_text | STRING | Transcribed text (50 words before/after entity mention) |
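
Because `segment_start` and `segment_end` are both offsets in seconds from the start of the video, a segment's duration is their difference. The following is a minimal sketch of that arithmetic; the `Segment` class and the `format_offset` helper are hypothetical names introduced for illustration and are not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical container mirroring the three segment fields."""
    segment_start: float  # seconds from start of video
    segment_end: float    # seconds from start of video
    segment_text: str

    def duration(self) -> float:
        """Length of the segment in seconds."""
        if self.segment_end < self.segment_start:
            raise ValueError("segment_end precedes segment_start")
        return self.segment_end - self.segment_start


def format_offset(seconds: float) -> str:
    """Render a seconds-from-start offset as H:MM:SS (fraction truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

For example, `format_offset(168.7303438)` gives `0:02:48`, which can be handy when linking a segment back to a position in the video.
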
Speaker Fields
Speaker identification and context information.
Field | Type | Description |
---|---|---|
speaker_name | STRING | Name of speaker in this segment |
speaker_associated_entity | UUID | Entity (company) speaker is associated with (FIGI) |
speaker_position | STRING | Known title of speaker in this segment |
speaker_role_context | ENUM | Role of speaker (Host, Guest, ReferencedSource, Other) |
Entity Fields
Named entity recognition and financial instrument mapping.
Field | Type | Description |
---|---|---|
entity_id | UUID | Immutable identifier of entity referenced |
entity_symbol | STRING | Public company ticker symbol (ORG entities only) |
entity_figi_id | UUID | Financial Instrument Global Identifier (FIGI) |
entity_type | ENUM | Type of entity (ORG, PERSON, PRODUCT) |
entity_name | STRING | Mapped entity name from raw anchor |
entity_name_raw_anchor | STRING | Raw string of named entity detected |
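
The table states that `entity_symbol` applies to ORG entities only. A consumer might want to check that rule on ingest; the sketch below shows one way to do it. The function name `check_entity` is hypothetical, and the code assumes a missing symbol arrives as `None` or an empty string, which the source does not specify.

```python
from typing import List, Optional

# Entity types listed in the Entity Fields table.
ENTITY_TYPES = {"ORG", "PERSON", "PRODUCT"}


def check_entity(entity_type: str, entity_symbol: Optional[str]) -> List[str]:
    """Return a list of problems found; an empty list means the row looks consistent."""
    problems = []
    if entity_type not in ENTITY_TYPES:
        problems.append(f"unknown entity_type: {entity_type!r}")
    # Assumption: None or "" means no ticker symbol is present.
    if entity_symbol and entity_type != "ORG":
        problems.append("entity_symbol set on a non-ORG entity")
    return problems
```
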
Sentiment Fields
Multi-layered sentiment analysis for entity mentions.
Field | Type | Description |
---|---|---|
entity_sentiment_overt_buy_sell | ENUM | Overt buy/sell sentiment (POSITIVE, NEGATIVE, NONE_EXPRESSED, NULL) |
entity_sentiment_generic | ENUM | General sentiment (POSITIVE, NEGATIVE, NONE_EXPRESSED, NEUTRAL) |
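
The two sentiment fields share most of their values but not all of them, so it can help to model them as separate enumerations. The sketch below is one possible mapping, not a definitive one. It assumes that `NULL` in `entity_sentiment_overt_buy_sell` denotes a missing value, arriving either as a real null or as the literal string `"NULL"`; the class and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class OvertBuySell(Enum):
    POSITIVE = "POSITIVE"
    NEGATIVE = "NEGATIVE"
    NONE_EXPRESSED = "NONE_EXPRESSED"


class GenericSentiment(Enum):
    POSITIVE = "POSITIVE"
    NEGATIVE = "NEGATIVE"
    NONE_EXPRESSED = "NONE_EXPRESSED"
    NEUTRAL = "NEUTRAL"


def parse_overt(raw: Optional[str]) -> Optional[OvertBuySell]:
    """Parse entity_sentiment_overt_buy_sell; unknown values raise ValueError."""
    # Assumption: NULL means "no value", represented as None or the string "NULL".
    if raw is None or raw == "NULL":
        return None
    return OvertBuySell(raw)


def parse_generic(raw: str) -> GenericSentiment:
    """Parse entity_sentiment_generic; unknown values raise ValueError."""
    return GenericSentiment(raw)
```

Keeping the enumerations separate means a value such as `NEUTRAL` is rejected for the overt buy/sell field, where the table does not list it.
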
Data Types & Constraints
Type | Format | Example |
---|---|---|
UUID | 36-character identifier | 550e8400-e29b-41d4-a716-446655440000 |
STRING | Variable-length text | "Bloomberg Technology" |
INT | Integer number | 12906 |
FLOAT | Floating-point number | 168.7303438 |
TIMESTAMP | UTC timestamp | 1747251178 |
ENUM | Predefined values | POSITIVE, NEGATIVE, NEUTRAL |
All timestamp fields are in UTC. Geographic codes follow ISO 3166-1 alpha-2; language codes follow ISO 639-1.
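
The formats in the table above can be checked or converted with the Python standard library. The sketch below assumes that TIMESTAMP values are Unix epoch seconds, which the example value `1747251178` suggests but the source does not state outright. The helper names `parse_timestamp` and `is_canonical_uuid` are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def parse_timestamp(value: int) -> datetime:
    """Convert a TIMESTAMP value to a timezone-aware UTC datetime.

    Assumption: the value is Unix epoch seconds.
    """
    return datetime.fromtimestamp(value, tz=timezone.utc)


def is_canonical_uuid(value: str) -> bool:
    """True if value is a 36-character hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and unhyphenated hex, so compare
    # against the canonical form to enforce the 36-character format.
    return str(parsed) == value.lower()
```

Under that assumption, `parse_timestamp(1747251178).isoformat()` gives `2025-05-14T19:32:58+00:00`.
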