Field categories organize the Core Dataset’s 40+ fields into logical groupings for easier understanding and implementation.
Channel Fields
Channel-level metadata and statistics from YouTube.
| Field | Type | Description |
|---|---|---|
| channel_id | UUID | Primary Key - Immutable Babbl Labs internal unique identifier |
| channel_uri | UUID | YouTube’s immutable unique identifier for the channel |
| channel_custom_url | STRING | Mutable custom URL for channel |
| channel_name | STRING | Channel title from YouTube metadata |
| channel_description | STRING | Channel description from YouTube metadata |
| channel_locale | ENUM | Channel country code (ISO 3166-1 alpha-2) |
| channel_view_count | INT | Most recent channel total view count |
| channel_subscriber_count | INT | Most recent channel total subscriber count |
| channel_video_count | INT | Most recent channel total video count |
| channel_published_at | TIMESTAMP | Timestamp when channel was created |
| channel_coverage_initiated_at | TIMESTAMP | Timestamp when we initiated coverage |
| channel_last_updated_at | TIMESTAMP | Timestamp when we last polled YouTube metadata |
Video Fields
Video-level metadata and engagement metrics from YouTube.
| Field | Type | Description |
|---|---|---|
| video_id | UUID | Primary Key - Unique YouTube video identifier |
| video_view_count | INT | Most recent view count recorded |
| video_like_count | INT | Most recent video like count recorded |
| video_comment_count | INT | Most recent video comment count recorded |
| video_published_dt | TIMESTAMP | Timestamp when video was originally published |
| video_transcribed_at | TIMESTAMP | Timestamp when we originally transcribed video |
| video_scored_at | TIMESTAMP | Timestamp when we originally processed video |
| video_last_updated_at | TIMESTAMP | Timestamp when we last polled YouTube metadata |
| video_title | STRING | Video title from YouTube metadata |
| video_description | STRING | Video description from YouTube metadata |
| video_language | STRING | Video language code (ISO 639-1) |
Processing Fields
Model and processing metadata for transcription and analysis.
| Field | Type | Description |
|---|---|---|
| model_transcription_tag | STRING | Identifier for transcription model used |
| model_scoring_tag | STRING | Identifier for scoring model used |
Segment Fields
Transcript segment timing and content information.
| Field | Type | Description |
|---|---|---|
| segment_start | FLOAT | Starting point of segment - seconds from start of video |
| segment_end | FLOAT | End point of segment - seconds from start of video |
| segment_text | STRING | Transcribed text (50 words before/after entity mention) |
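
As a small illustration of how the timing fields might be used, the sketch below derives a segment's duration from `segment_start` and `segment_end`. The `Segment` dataclass and the `duration_seconds` helper are hypothetical names chosen for this example and are not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical container mirroring the three segment fields."""
    segment_start: float  # seconds from start of video
    segment_end: float    # seconds from start of video
    segment_text: str     # transcribed text around the entity mention


def duration_seconds(segment: Segment) -> float:
    """Return the segment length in seconds, rejecting inverted ranges."""
    if segment.segment_end < segment.segment_start:
        raise ValueError("segment_end precedes segment_start")
    return segment.segment_end - segment.segment_start


# Illustrative values only; not taken from real data.
example = Segment(segment_start=160.0, segment_end=168.5,
                  segment_text="placeholder transcript text")
print(duration_seconds(example))  # 8.5
```
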
Speaker Fields
Speaker identification and context information.
| Field | Type | Description |
|---|---|---|
| speaker_name | STRING | Name of speaker in this segment |
| speaker_associated_entity | UUID | Entity (company) speaker is associated with (FIGI) |
| speaker_position | STRING | Known title of speaker in this segment |
| speaker_role_context | ENUM | Role of speaker (Host, Guest, ReferencedSource, Other) |
Entity Fields
Named entity recognition and financial instrument mapping.
| Field | Type | Description |
|---|---|---|
| entity_id | UUID | Immutable identifier of entity referenced |
| entity_symbol | STRING | Public company ticker symbol (ORG entities only) |
| entity_figi_id | UUID | Financial Instrument Global Identifier (FIGI) |
| entity_type | ENUM | Type of entity (ORG, PERSON, PRODUCT) |
| entity_name | STRING | Mapped entity name from raw anchor |
| entity_name_raw_anchor | STRING | Raw string of named entity detected |
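
Because several raw anchors can map to the same entity, it can be useful to see which surface strings resolved to each `entity_id`. The following is a minimal sketch that assumes rows are available as plain dictionaries keyed by field name; the `raw_anchors_by_entity` helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def raw_anchors_by_entity(rows):
    """Group the detected raw strings under the entity they were mapped to."""
    anchors = defaultdict(set)
    for row in rows:
        anchors[row["entity_id"]].add(row["entity_name_raw_anchor"])
    return dict(anchors)


# Made-up rows for illustration; the identifiers are not real entity IDs.
sample_rows = [
    {"entity_id": "id-1", "entity_name": "Example Corp",
     "entity_name_raw_anchor": "Example"},
    {"entity_id": "id-1", "entity_name": "Example Corp",
     "entity_name_raw_anchor": "Example Corporation"},
    {"entity_id": "id-2", "entity_name": "Sample Inc",
     "entity_name_raw_anchor": "Sample"},
]

grouped = raw_anchors_by_entity(sample_rows)
print(sorted(grouped["id-1"]))  # ['Example', 'Example Corporation']
```
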
Sentiment Fields
Multi-layered sentiment analysis for entity mentions.
| Field | Type | Description |
|---|---|---|
| entity_sentiment_overt_buy_sell | ENUM | Overt buy/sell sentiment (POSITIVE, NEGATIVE, NONE_EXPRESSED, NULL) |
| entity_sentiment_generic | ENUM | General sentiment (POSITIVE, NEGATIVE, NONE_EXPRESSED, NEUTRAL) |
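
The two sentiment fields accept slightly different value sets, which is easy to overlook when validating data. The sketch below checks a row against the values listed in the table above. It assumes rows are plain dictionaries and that a NULL value arrives as Python `None`; both are assumptions about the export format, and `validate_sentiment` is a hypothetical helper.

```python
# Allowed values as listed in the Sentiment Fields table.
# Representing NULL as None is an assumption about the export format.
OVERT_BUY_SELL_VALUES = {"POSITIVE", "NEGATIVE", "NONE_EXPRESSED", None}
GENERIC_VALUES = {"POSITIVE", "NEGATIVE", "NONE_EXPRESSED", "NEUTRAL"}


def validate_sentiment(row: dict) -> list:
    """Return the names of sentiment fields whose values are not allowed."""
    problems = []
    if row.get("entity_sentiment_overt_buy_sell") not in OVERT_BUY_SELL_VALUES:
        problems.append("entity_sentiment_overt_buy_sell")
    if row.get("entity_sentiment_generic") not in GENERIC_VALUES:
        problems.append("entity_sentiment_generic")
    return problems


# NEUTRAL is valid for the generic field but not for the overt buy/sell field.
print(validate_sentiment({
    "entity_sentiment_overt_buy_sell": "NEUTRAL",
    "entity_sentiment_generic": "NEUTRAL",
}))  # ['entity_sentiment_overt_buy_sell']
```
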
Data Types & Constraints
| Type | Format | Example |
|---|---|---|
| UUID | 36-character identifier | 550e8400-e29b-41d4-a716-446655440000 |
| STRING | Variable-length text | "Bloomberg Technology" |
| INT | Integer number | 12906 |
| FLOAT | Floating-point number | 168.7303438 |
| TIMESTAMP | UTC timestamp (example shown as Unix epoch seconds) | 1747251178 |
| ENUM | Predefined values | POSITIVE, NEGATIVE, NEUTRAL |
All timestamp fields use the UTC timezone. Geographic codes follow ISO 3166-1 alpha-2, and language codes follow ISO 639-1.
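
To make the type conventions concrete, here is a minimal sketch that converts a TIMESTAMP value to a timezone-aware `datetime` and checks that a string has the 36-character UUID shape. It assumes timestamps are Unix epoch seconds, as the example value in the table suggests; `parse_timestamp` and `is_uuid` are hypothetical helper names.

```python
import uuid
from datetime import datetime, timezone


def parse_timestamp(value: int) -> datetime:
    """Interpret value as Unix epoch seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def is_uuid(value: str) -> bool:
    """True if value is a hyphenated 36-character UUID string."""
    try:
        # uuid.UUID also accepts other spellings (no hyphens, braces),
        # so compare against the canonical form to require the 36-char shape.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(parse_timestamp(1747251178).isoformat())  # 2025-05-14T19:32:58+00:00
print(is_uuid("550e8400-e29b-41d4-a716-446655440000"))  # True
```
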