
What is Babbl Labs’ Entity Resolution Strategy?

Our entity resolution system identifies company mentions and maps them to standardized Financial Instrument Global Identifiers (FIGIs), enabling consistent tracking across different naming conventions and contexts. We use specialized financial NLP models combined with hierarchical matching against company databases, achieving 95.6% overall accuracy across tens of thousands of analyzed records. Our approach handles company name variations and point-in-time resolution, with error rates declining exponentially as market cap increases.
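As a simplified illustration of this kind of name normalization and hierarchical matching (not our production code), the sketch below matches a raw mention against a small reference table keyed by FIGI. The company table, suffix list, and similarity cutoff are placeholders chosen for the example.

```python
import difflib
import re

# Illustrative reference table mapping normalized names to FIGIs
# (identifiers shown are examples, not a guaranteed-current mapping).
COMPANY_DB = {
    "apple": "BBG000B9XRY4",
    "tesla": "BBG000N9MNX3",
    "tesla motors": "BBG000N9MNX3",
}

_SUFFIXES = re.compile(r"\b(inc|incorporated|corp|corporation|co|ltd|plc|llc)\b\.?$")

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    name = re.sub(r"\s+", " ", name).strip()
    return _SUFFIXES.sub("", name).strip()

def resolve_to_figi(mention: str, cutoff: float = 0.85) -> str | None:
    """Map a raw company mention to a FIGI: exact lookup first, then fuzzy fallback."""
    key = normalize(mention)
    if key in COMPANY_DB:
        return COMPANY_DB[key]
    match = difflib.get_close_matches(key, list(COMPANY_DB), n=1, cutoff=cutoff)
    return COMPANY_DB[match[0]] if match else None

print(resolve_to_figi("Apple, Inc."))    # exact hit after normalization
print(resolve_to_figi("Teslla Motors"))  # fuzzy fallback to "tesla motors"
```

In practice, matching at the highest-confidence tier first (exact lookup) before falling back to fuzzier tiers is what keeps accuracy high for large, frequently mentioned names.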

How do we measure quality in data sources?

We employ a comprehensive five-stage processing pipeline that transforms raw video content into structured, analyzed data. The stages are transcription with a fine-tuned model, speaker diarization, named entity recognition, cross-video speaker identification, and dual-classification sentiment scoring. Each stage includes automated consistency checks and regular expert review, with continuous monitoring to maintain quality standards across our expanding dataset.
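To make the flow concrete, here is a simplified sketch of how such a staged pipeline can be wired together with per-stage consistency checks. The stage bodies are placeholders standing in for the actual models; only the ordering and the check pattern are the point.

```python
from typing import Callable

# Placeholder stage implementations; each stands in for a real model and
# writes its output under a known key on the record dict.

def transcribe(rec: dict) -> dict:
    rec["transcript"] = "placeholder transcript"       # 1. fine-tuned transcription
    return rec

def diarize(rec: dict) -> dict:
    rec["segments"] = [{"speaker": "SPK_0", "text": rec["transcript"]}]  # 2. diarization
    return rec

def extract_entities(rec: dict) -> dict:
    rec["entities"] = []                               # 3. NER + entity resolution
    return rec

def identify_speakers(rec: dict) -> dict:
    rec["speakers"] = {"SPK_0": "unidentified"}        # 4. cross-video speaker ID
    return rec

def score_sentiment(rec: dict) -> dict:
    rec["sentiment"] = [0.0 for _ in rec["segments"]]  # 5. dual-classification scores
    return rec

# (stage function, key the stage is expected to produce)
PIPELINE: list[tuple[Callable[[dict], dict], str]] = [
    (transcribe, "transcript"),
    (diarize, "segments"),
    (extract_entities, "entities"),
    (identify_speakers, "speakers"),
    (score_sentiment, "sentiment"),
]

def run_pipeline(video_id: str) -> dict:
    rec = {"video_id": video_id}
    for stage, output_key in PIPELINE:
        rec = stage(rec)
        # Automated consistency check: flag the record if a stage produced nothing.
        if output_key not in rec:
            raise ValueError(f"{stage.__name__} produced no '{output_key}' output")
    return rec

print(run_pipeline("video_001"))
```

Records that fail a consistency check are the natural candidates for the expert review described above.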

How do we QA our data?

Our quality assurance combines automated validation with human expert oversight. We perform stratified accuracy assessments across market cap segments, maintain an expanding catalog of financial experts for speaker identification, and use mixture-of-experts classification for sentiment analysis. Channel curation involves keyword search, social media monitoring, and network similarity analysis to ensure comprehensive coverage of relevant financial content.
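As a simplified illustration of a stratified accuracy assessment, the sketch below buckets labeled records by market cap and reports per-segment accuracy. The bucket boundaries and field names are illustrative, not our internal schema.

```python
from collections import defaultdict

# Hypothetical market cap buckets in USD; the actual segmentation may differ.
BUCKETS = [
    ("mega_cap", 200e9, float("inf")),
    ("large_cap", 10e9, 200e9),
    ("mid_cap",    2e9, 10e9),
    ("small_cap",    0, 2e9),
]

def bucket_of(market_cap: float) -> str:
    for name, lo, hi in BUCKETS:
        if lo <= market_cap < hi:
            return name
    return "unknown"

def stratified_accuracy(records: list[dict]) -> dict[str, float]:
    """Entity resolution accuracy per market cap segment.

    Each record carries a market cap plus predicted and true FIGI labels
    (field names here are illustrative).
    """
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        seg = bucket_of(r["market_cap"])
        total[seg] += 1
        correct[seg] += int(r["predicted_figi"] == r["true_figi"])
    return {seg: correct[seg] / total[seg] for seg in total}

sample = [
    {"market_cap": 3e12, "predicted_figi": "A", "true_figi": "A"},
    {"market_cap": 5e8,  "predicted_figi": "B", "true_figi": "C"},
]
print(stratified_accuracy(sample))  # {'mega_cap': 1.0, 'small_cap': 0.0}
```

Reporting accuracy per segment rather than as a single aggregate is what surfaces the relationship between error rates and market cap noted above.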

How do we incorporate customer feedback into our builds?

We continuously expand our coverage based on keyword search, social media relevance monitoring, and network similarity to existing channels. Our dataset optimization focuses on financially relevant transcript segments, and we recommend applying appropriate delivery lags for point-in-time simulation. Regular accuracy assessments and feedback integration drive ongoing improvements to our models and feature enhancements.
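For point-in-time simulation, a simplified sketch of applying a delivery lag is shown below. The 24-hour lag and field names are placeholders; the appropriate lag for a given feed should come from its delivery documentation.

```python
from datetime import datetime, timedelta

# Assumed delivery lag for illustration; use the recommended lag for the feed.
DELIVERY_LAG = timedelta(hours=24)

def available_at(published_at: datetime) -> datetime:
    """Earliest timestamp at which a record may be used in a backtest."""
    return published_at + DELIVERY_LAG

def point_in_time_slice(records: list[dict], as_of: datetime) -> list[dict]:
    """Keep only records that would have been deliverable by `as_of`."""
    return [r for r in records if available_at(r["published_at"]) <= as_of]

history = [
    {"video_id": "a", "published_at": datetime(2024, 3, 1, 9, 0)},
    {"video_id": "b", "published_at": datetime(2024, 3, 2, 9, 0)},
]
print(point_in_time_slice(history, as_of=datetime(2024, 3, 2, 12, 0)))
# Only video "a" qualifies: video "b" clears the lag after the as-of time.
```

Lagging availability this way avoids lookahead bias when replaying the dataset against historical prices.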