Data Transformation
Part 1 of 2 — How we turn messy time entries into structured matter profiles
The Challenge of Legal Narratives
Time entry narratives are notoriously difficult to parse:
"Review docs"
"Call w/ client re closing"
"Draft; revise per comments"
"Various"
"Work on matter"These entries are meaningful to the person who wrote them — but useless for analysis. Traditional keyword search fails. Simple categorization rules miss context. Manual review doesn't scale.
Narrative solves this through a combination of techniques that understand legal work the way an experienced lawyer would — but at scale and with consistency.
Core Techniques
1. Multi-Pass Contextual Parsing
We don't analyze entries in isolation. Each entry is interpreted in context:
Matter Context
What type of matter is this? (M&A, litigation, regulatory, etc.)
What's the typical workflow for this matter type?
What phase is the matter currently in?
Temporal Context
What entries came before and after?
Is this part of a sequence of related tasks?
Where does this fall in the matter timeline?
Team Context
Who else is working on this matter?
What are they doing?
How do activities relate across team members?
Example:
Raw entry: "Draft motion"
Without context: Could be any kind of motion in any phase.
With context:
Matter type: Employment litigation
Recent entries: "Review deposition transcripts," "Research summary judgment standard"
Timeline: 8 months into matter
Conclusion: Summary judgment motion, litigation phase
The system makes multiple passes, refining its understanding each time. Initial classifications are validated against patterns, corrected where inconsistent, and enriched with inferred details.
2. Role-Aware Inference
Different roles perform different functions. A partner's "review" is different from an associate's "review." Narrative uses role information to understand the structure of work:
Partner
Review, strategy, client calls, negotiation
Decision points, supervision patterns
Senior Associate
Drafting, research, coordination
Core work products, matter progression
Junior Associate
Research, document review, support
Volume tasks, learning activities
Paralegal
Document management, filing, admin
Process steps, compliance tasks
When a partner entry says "Review closing checklist" and associate entries show "Prepare closing documents," the system infers:
The associate prepared the deliverable
The partner reviewed/approved it
This is the closing phase of a transaction
The work product is a closing checklist
This role-aware inference reconstructs the narrative of how work actually happened, even when individual entries are sparse.
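The closing-checklist inference above can be sketched with a toy pairing rule. The roles, the `removeprefix`-based product extraction, and the sample entries are all simplified assumptions; a real system would use richer linguistic matching:

```python
def infer_relationships(entries):
    """entries: list of (role, text) tuples.
    Pairs a partner 'Review X' entry with an associate entry that
    mentions the same work product X, and records the inferred link."""
    inferred = []
    for role_r, text_r in entries:
        if role_r != "partner" or not text_r.lower().startswith("review"):
            continue
        product = text_r.lower().removeprefix("review").strip()
        for role_p, text_p in entries:
            if role_p == "associate" and product in text_p.lower():
                inferred.append({"preparer": role_p, "reviewer": role_r,
                                 "work_product": product})
    return inferred

entries = [
    ("associate", "Prepare closing documents and closing checklist"),
    ("partner", "Review closing checklist"),
]
links = infer_relationships(entries)
```

Even this crude rule recovers the prepare-then-review structure: the associate produced the checklist and the partner signed off on it.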
3. Consistency Normalization
The same work gets described many ways:
"DD," "due dili," "diligence review," "review data room"
Due Diligence
"Call w/ opposing," "conf call defense counsel," "tel. w/ OC"
Opposing Counsel Communication
"Draft SPA," "work on purchase agreement," "revise acquisition docs"
Draft Purchase Agreement
Narrative builds a normalization layer that:
Maps synonyms and abbreviations to standard terms
Learns firm-specific terminology over time
Applies consistent phase/task coding across all matters
Flags and corrects obvious miscategorizations
The result: clean, uniform data regardless of who wrote the original entries or when.
4. Phase Reclassification
Time entries are often coded to the wrong phase — or not coded at all. Narrative infers the correct phase from the work described:
Before (raw phase codes):
Phase: General
- "Initial client meeting"
- "Review target financials"
- "Draft LOI"
- "Due diligence calls"
- "Negotiate SPA"
- "Closing call"After (Narrative reclassification):
Phase: Origination
- "Initial client meeting"
Phase: Due Diligence
- "Review target financials"
- "Due diligence calls"
Phase: Negotiation
- "Draft LOI"
- "Negotiate SPA"
Phase: Closing
- "Closing call"This reclassification happens automatically, using patterns learned from thousands of matters. The system identifies phase boundaries, assigns entries correctly, and produces accurate phase-level analytics.
The Data Enrichment Pipeline
Accuracy & Validation
How We Measure Accuracy
Phase classification accuracy: % of entries assigned to correct phase
Task categorization accuracy: % of entries mapped to correct task type
Timeline accuracy: Predicted matter duration vs. actual
Fee prediction accuracy: Estimated fees vs. actual closed fees
Target accuracy: ≥95% on phase classification for well-documented matters.
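The phase classification metric is just the fraction of entries whose predicted phase matches the ground-truth phase on closed matters. A minimal sketch, with made-up sample data:

```python
def phase_accuracy(predicted, actual):
    """Fraction of entries whose predicted phase matches ground truth."""
    assert len(predicted) == len(actual), "label lists must align"
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical backtest on one small matter: 3 of 4 entries correct.
predicted = ["Origination", "Due Diligence", "Negotiation", "Closing"]
actual    = ["Origination", "Due Diligence", "Due Diligence", "Closing"]
acc = phase_accuracy(predicted, actual)
```

The task-categorization metric is computed the same way over task labels; the timeline and fee metrics compare predicted against actual values rather than label matches.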
Validation Approach
Historical backtesting: Process closed matters, compare predictions to actuals
Human review sampling: Pricing team reviews sample of enriched profiles
Feedback loop: Corrections feed back into model improvement
Continuous monitoring: Track accuracy metrics over time
Handling Edge Cases
Not all matters are equally well-documented. Narrative handles this gracefully:
Sparse entries: Flagged with confidence scores; use related matters to infer patterns
Unusual matters: Identified as outliers; don't distort benchmarks
Incomplete matters: Partial profiles generated; updated as data becomes available
Mixed-quality data: Normalize what we can; flag what we can't
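Edge-case handling starts with a quality check on the raw entries. The thresholds below (`min_entries`, `min_avg_words`) are illustrative assumptions, not the product's actual cutoffs:

```python
def assess_matter(entries, min_entries=20, min_avg_words=4):
    """Return a data-quality flag for a matter's time entries.
    Flagged matters are scored with lower confidence and lean on
    related matters for pattern inference, rather than being dropped."""
    if len(entries) < min_entries:
        return "sparse"        # too few entries to profile on its own
    avg_words = sum(len(e.split()) for e in entries) / len(entries)
    if avg_words < min_avg_words:
        return "low-detail"    # normalize what we can, flag the rest
    return "ok"

flag = assess_matter(["Review docs", "Call w/ client"])  # only 2 entries
```

The flag then controls downstream behavior: sparse and low-detail matters still produce profiles, but carry confidence scores and are excluded from benchmarks as outliers where appropriate.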
SALI Compatibility
Narrative is actively working towards full alignment with the SALI (Standards Advancement for the Legal Industry) taxonomy as a core part of our future product roadmap.
Our goal is to ensure:
Standardized classification of matter types, practice areas, and tasks
Interoperability with other systems using SALI standards
Future-proofing as the industry converges on common data standards
Easier integration with benchmarking data and external tools
We are building our data model to be adaptable, ensuring that as SALI standards evolve, your enriched data remains structured and useful across your technology ecosystem.
What We Don't Do
To be clear about scope:
We don't access privileged materials without guardrails: We process document content strictly for context building and entity extraction, respecting all privilege and security boundaries.
We don't replace human judgment: We provide data; partners make decisions
We don't require perfect input data: We improve messy data; we don't demand clean data
We don't train AI on your data: everything is ring-fenced, and all LLM processing runs on a zero-data-retention basis
Narrative is designed to work with the data firms actually have — not the data they wish they had.