Field categories organize the YouTube Extended Dataset’s 24 fields into logical groupings for easier understanding and implementation.

Channel Fields

YouTube channel metadata and coverage information.

| Field | Type | Description |
| --- | --- | --- |
| channel_id | STRING | Immutable Babbl Labs internal unique identifier (UUID) |
| channel_custom_url | STRING | Mutable custom URL for channel |
| channel_name | STRING | Channel title from YouTube metadata |
| channel_description | STRING | Channel description from YouTube metadata (default: NONE) |
| channel_locale | STRING | Geographic country code (ISO 3166-1 alpha-2) (default: NONE) |
| channel_published_at | FLOAT | Unix timestamp when channel was created |
| channel_coverage_initiated_at | FLOAT | Unix timestamp when we initiated coverage |

Video Fields

Video-level metadata and publication information.

| Field | Type | Description |
| --- | --- | --- |
| video_published_at | FLOAT | Unix timestamp when video was originally published |
| video_title | STRING | Video title from YouTube metadata |
| video_description | STRING | Video description from YouTube metadata (default: NONE) |
| video_language | STRING | Video language code (ISO 639-1) |

Processing Fields

Data processing timeline and model versioning information.

| Field | Type | Description |
| --- | --- | --- |
| downloaded_at | FLOAT | Unix timestamp when we downloaded video |
| transcribed_at | FLOAT | Unix timestamp when we transcribed video |
| transcription_version_tag | STRING | Identifier for transcription model used |
| recorded_at | FLOAT | Unix timestamp when video was added to dataset |
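
The processing timestamps are plain floats, so they usually need converting before they can be read or compared. The sketch below assumes the values are seconds since the Unix epoch (not milliseconds) and uses a made-up row; `to_utc` and `transcription_lag` are illustrative names, not part of the dataset.

```python
from datetime import datetime, timezone


def to_utc(ts: float) -> datetime:
    """Convert a Unix timestamp (assumed to be in seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Hypothetical row holding only the processing fields; all values are invented.
row = {
    "downloaded_at": 1700000000.0,
    "transcribed_at": 1700000360.5,
    "transcription_version_tag": "example-tag",
    "recorded_at": 1700000400.0,
}

# Seconds elapsed between downloading and transcribing the video.
transcription_lag = row["transcribed_at"] - row["downloaded_at"]

print(to_utc(row["downloaded_at"]).isoformat())
print(transcription_lag)
```

Keeping the converted values timezone-aware avoids mixing local time with UTC when these fields are compared with timestamps from other sources.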

Segment Fields

Transcript segment identification, timing, and content information.

| Field | Type | Description |
| --- | --- | --- |
| segment_id | STRING | Unique identifier for transcript segment across all videos |
| segment_start_ts | FLOAT | Starting point in seconds from video start |
| segment_end_ts | FLOAT | End point in seconds from video start |
| segment_text | STRING | Complete verbatim transcript text for segment (5-60s of speech) |
| segment_char_start | INT | Character index in the full video transcript where the segment starts |
| segment_char_end | INT | Character index in the full video transcript where the segment ends |
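
Because segment_start_ts and segment_end_ts are offsets from the start of the video, a segment's duration is simply their difference. The following sketch orders a few invented segments (assumed to come from a single video) and flags any whose duration falls outside the 5-60 second range mentioned for segment_text. The helper names and sample values are hypothetical.

```python
def segment_duration(segment: dict) -> float:
    """Length of a segment in seconds, from its start and end offsets."""
    return segment["segment_end_ts"] - segment["segment_start_ts"]


def outside_expected_length(segment: dict, shortest: float = 5.0, longest: float = 60.0) -> bool:
    """True when the duration falls outside the documented 5-60 second range."""
    duration = segment_duration(segment)
    return duration < shortest or duration > longest


# Invented segments, deliberately listed out of order.
segments = [
    {"segment_id": "seg-b", "segment_start_ts": 49.5, "segment_end_ts": 71.0},
    {"segment_id": "seg-a", "segment_start_ts": 0.0, "segment_end_ts": 49.5},
    {"segment_id": "seg-c", "segment_start_ts": 71.0, "segment_end_ts": 73.0},
]

# Restore playback order, then collect segments worth a closer look.
ordered = sorted(segments, key=lambda s: s["segment_start_ts"])
flagged = [s["segment_id"] for s in ordered if outside_expected_length(s)]

print([s["segment_id"] for s in ordered])
print(flagged)
```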

Speaker Fields

Speaker identification and context information. All four fields default to NONE.

| Field | Type | Description |
| --- | --- | --- |
| speaker_name | STRING | Name of speaker if identifiable (default: NONE) |
| speaker_associated_entity | STRING | Entity/company speaker is associated with (default: NONE) |
| speaker_position | STRING | Known title of speaker (default: NONE) |
| speaker_role_context | STRING | Role within video context (HOST, GUEST, etc.) (default: NONE) |
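
Downstream code often prefers a real null over a sentinel value. The sketch below assumes the NONE default is delivered as the literal string "NONE"; the documentation does not say how it is encoded, so adjust the check if your export uses nulls or empty strings instead. The field list is taken from the descriptions marked "default: NONE"; `normalize_none` is a hypothetical helper.

```python
# Fields whose descriptions state "default: NONE".
NONE_DEFAULT_FIELDS = (
    "channel_description",
    "channel_locale",
    "video_description",
    "speaker_name",
    "speaker_associated_entity",
    "speaker_position",
    "speaker_role_context",
)


def normalize_none(row: dict) -> dict:
    """Return a copy of row with the assumed "NONE" sentinel replaced by None."""
    cleaned = dict(row)
    for field in NONE_DEFAULT_FIELDS:
        if cleaned.get(field) == "NONE":  # assumption: sentinel is the string "NONE"
            cleaned[field] = None
    return cleaned


# Invented row: the speaker's role is known, the name is not.
row = {"speaker_name": "NONE", "speaker_role_context": "HOST", "segment_text": "NONE"}
cleaned = normalize_none(row)
print(cleaned)
```

Fields outside the list, such as segment_text, are left untouched even if their text happens to read "NONE".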

Transcript Metrics

Complete transcript statistics and duration information.

| Field | Type | Description |
| --- | --- | --- |
| transcript_total_char_count | INT | Total characters in complete video transcript |
| transcript_total_duration | FLOAT | Total video duration in seconds |

Data Types & Constraints

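| Type | Format | Example |
| --- | --- | --- |
| STRING | Variable-length text | "CNBC" |
| FLOAT | Floating-point number | 49.5 |
| INT | Integer number | 3543 |

Fields ending in _at hold Unix timestamps. segment_start_ts and segment_end_ts are offsets in seconds from the start of the video, not Unix timestamps. Fields with a NONE default carry NONE when the information is unavailable. Geographic codes follow ISO 3166-1 alpha-2, and language codes follow ISO 639-1.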
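
The three data types map naturally onto Python's built-in types, which makes a lightweight row check easy to write. This is a minimal sketch covering only a handful of fields; `TYPE_MAP`, `SCHEMA_SUBSET`, and `type_errors` are illustrative names, and accepting integers for FLOAT fields is a design choice made here, not something the dataset specifies.

```python
# Dataset type names mapped to the Python types accepted for them.
TYPE_MAP = {"STRING": (str,), "FLOAT": (int, float), "INT": (int,)}

# A subset of the documented fields, for illustration only.
SCHEMA_SUBSET = {
    "channel_name": "STRING",
    "channel_published_at": "FLOAT",
    "segment_start_ts": "FLOAT",
    "segment_char_start": "INT",
    "transcript_total_char_count": "INT",
}


def type_errors(row: dict) -> list:
    """Return one message per field whose value does not match its declared type."""
    errors = []
    for field, type_name in SCHEMA_SUBSET.items():
        if field not in row:
            continue  # missing fields are not checked in this sketch
        value = row[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPE_MAP[type_name]):
            errors.append(f"{field}: expected {type_name}, got {type(value).__name__}")
    return errors


good = {"channel_name": "CNBC", "segment_start_ts": 49.5, "transcript_total_char_count": 3543}
bad = {"channel_name": "CNBC", "segment_char_start": "12"}

print(type_errors(good))
print(type_errors(bad))
```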