Data Transformation

Part 1 of 2 — How we turn messy time entries into structured matter profiles


Time entry narratives are notoriously difficult to parse:

"Review docs"
"Call w/ client re closing"
"Draft; revise per comments"
"Various"
"Work on matter"

These entries are meaningful to the person who wrote them — but useless for analysis. Traditional keyword search fails. Simple categorization rules miss context. Manual review doesn't scale.

Narrative solves this through a combination of techniques that understand legal work the way an experienced lawyer would — but at scale and with consistency.


Core Techniques

1. Multi-Pass Contextual Parsing

We don't analyze entries in isolation. Each entry is interpreted in context:

Matter Context

  • What type of matter is this? (M&A, litigation, regulatory, etc.)

  • What's the typical workflow for this matter type?

  • What phase is the matter currently in?

Temporal Context

  • What entries came before and after?

  • Is this part of a sequence of related tasks?

  • Where does this fall in the matter timeline?

Team Context

  • Who else is working on this matter?

  • What are they doing?

  • How do activities relate across team members?

Example:

Raw entry: "Draft motion"

Without context: Could be any kind of motion in any phase.

With context:

  • Matter type: Employment litigation

  • Recent entries: "Review deposition transcripts," "Research summary judgment standard"

  • Timeline: 8 months into matter

  • Conclusion: Summary judgment motion, litigation phase

The system makes multiple passes, refining its understanding each time. Initial classifications are validated against patterns, corrected where inconsistent, and enriched with inferred details.
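The multi-pass idea can be illustrated with a minimal sketch: a first pass classifies each entry in isolation, and a second pass resolves ambiguous entries from their neighbors in the timeline. The keyword rules and phase names below are hypothetical examples, not Narrative's actual implementation.

```python
# Illustrative sketch of multi-pass contextual parsing.
# Pass 1 classifies entries in isolation; pass 2 refines
# ambiguous entries using surrounding context. Rules are hypothetical.

PHASE_KEYWORDS = {
    "deposition": "discovery",
    "summary judgment": "motion practice",
    "closing": "closing",
}

def first_pass(entry: str) -> str:
    """Pass 1: classify each entry in isolation."""
    text = entry.lower()
    for keyword, phase in PHASE_KEYWORDS.items():
        if keyword in text:
            return phase
    return "unknown"

def second_pass(labels: list[str]) -> list[str]:
    """Pass 2: resolve 'unknown' labels from temporal context."""
    refined = labels[:]
    for i, label in enumerate(labels):
        if label == "unknown" and i > 0 and refined[i - 1] != "unknown":
            # Adopt the phase of the preceding entry in the timeline.
            refined[i] = refined[i - 1]
    return refined

entries = [
    "Review deposition transcripts",
    "Research summary judgment standard",
    "Draft motion",  # ambiguous on its own
]
labels = second_pass([first_pass(e) for e in entries])
```

On its own, "Draft motion" classifies as unknown; with the preceding summary-judgment research in context, the second pass assigns it to motion practice, mirroring the example above.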

2. Role-Aware Inference

Different roles perform different functions. A partner's "review" is different from an associate's "review." Narrative uses role information to understand the structure of work:

Role             | Typical Activities                          | Inference Value
-----------------|---------------------------------------------|----------------------------------------
Partner          | Review, strategy, client calls, negotiation | Decision points, supervision patterns
Senior Associate | Drafting, research, coordination            | Core work products, matter progression
Junior Associate | Research, document review, support          | Volume tasks, learning activities
Paralegal        | Document management, filing, admin          | Process steps, compliance tasks

When a partner entry says "Review closing checklist" and associate entries show "Prepare closing documents," the system infers:

  • The associate prepared the deliverable

  • The partner reviewed/approved it

  • This is the closing phase of a transaction

  • The work product is a closing checklist

This role-aware inference reconstructs the narrative of how work actually happened, even when individual entries are sparse.
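A simplified sketch of this pairing logic, assuming entries carry a role field and matching a partner's "review" entry to the associate entry that produced the deliverable (the matching rule is an illustration, not the production heuristic):

```python
# Illustrative sketch of role-aware inference: pair a partner
# "review" entry with the associate entry that prepared the
# underlying work product. Matching rules are hypothetical.

from dataclasses import dataclass

@dataclass
class Entry:
    role: str       # e.g. "partner", "associate", "paralegal"
    narrative: str

def infer_supervision(entries: list[Entry]) -> list[tuple[str, str]]:
    """Return (preparer, reviewer) narrative pairs where a partner's
    review plausibly covers an associate's work product."""
    pairs = []
    for reviewer in entries:
        if reviewer.role != "partner" or "review" not in reviewer.narrative.lower():
            continue
        # The reviewed subject is what's left after the word "review".
        subject = reviewer.narrative.lower().replace("review", "").strip()
        for preparer in entries:
            if preparer.role == "associate" and subject in preparer.narrative.lower():
                pairs.append((preparer.narrative, reviewer.narrative))
    return pairs

entries = [
    Entry("associate", "Prepare closing documents"),
    Entry("associate", "Prepare closing checklist"),
    Entry("partner", "Review closing checklist"),
]
pairs = infer_supervision(entries)
```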

3. Consistency Normalization

The same work gets described many ways:

Variations                                                           | Normalized Form
---------------------------------------------------------------------|-------------------------------
"DD," "due dili," "diligence review," "review data room"             | Due Diligence
"Call w/ opposing," "conf call defense counsel," "tel. w/ OC"        | Opposing Counsel Communication
"Draft SPA," "work on purchase agreement," "revise acquisition docs" | Draft Purchase Agreement

Narrative builds a normalization layer that:

  • Maps synonyms and abbreviations to standard terms

  • Learns firm-specific terminology over time

  • Applies consistent phase/task coding across all matters

  • Flags and corrects obvious miscategorizations

The result: clean, uniform data regardless of who wrote the original entries or when.
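At its simplest, such a layer is a synonym table plus a fallback for unmapped narratives. A minimal sketch, using the example mappings from the table above (a real system would also learn firm-specific variants over time):

```python
# Illustrative sketch of a normalization layer: map free-text
# variants to standard terms. Mappings below are examples only.

SYNONYMS = {
    "dd": "Due Diligence",
    "due dili": "Due Diligence",
    "diligence review": "Due Diligence",
    "review data room": "Due Diligence",
    "call w/ opposing": "Opposing Counsel Communication",
    "tel. w/ oc": "Opposing Counsel Communication",
    "draft spa": "Draft Purchase Agreement",
    "work on purchase agreement": "Draft Purchase Agreement",
}

def normalize(raw: str) -> str:
    """Return the standard term for a raw narrative, or flag it."""
    key = raw.strip().lower()
    # Exact match first, then substring match for longer narratives.
    if key in SYNONYMS:
        return SYNONYMS[key]
    for variant, standard in SYNONYMS.items():
        if variant in key:
            return standard
    return "UNMAPPED"  # flagged for review / terminology learning

normalize("DD")                     # → "Due Diligence"
normalize("Tel. w/ OC re motion")   # → "Opposing Counsel Communication"
```

Entries that map to nothing are flagged rather than guessed, which is where the miscategorization-flagging step above takes over.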

4. Phase Reclassification

Time entries are often coded to the wrong phase — or not coded at all. Narrative infers the correct phase from the work described:

Before (raw phase codes):

Phase: General
  - "Initial client meeting"
  - "Review target financials"
  - "Draft LOI"
  - "Due diligence calls"
  - "Negotiate SPA"
  - "Closing call"

After (Narrative reclassification):

Phase: Origination
  - "Initial client meeting"

Phase: Due Diligence
  - "Review target financials"
  - "Due diligence calls"

Phase: Negotiation
  - "Draft LOI"
  - "Negotiate SPA"

Phase: Closing
  - "Closing call"

This reclassification happens automatically, using patterns learned from thousands of matters. The system identifies phase boundaries, assigns entries correctly, and produces accurate phase-level analytics.
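The before/after example above can be sketched as a pattern lookup: each "General" entry is tested against per-phase keyword patterns and reassigned on the first match. The patterns here are hypothetical stand-ins for the patterns learned from historical matters.

```python
# Illustrative sketch of phase reclassification: reassign entries
# coded "General" using keyword patterns. Patterns are hypothetical.

PHASE_PATTERNS = [
    ("Origination",   ["initial client meeting", "pitch", "engagement letter"]),
    ("Due Diligence", ["diligence", "target financials", "data room"]),
    ("Negotiation",   ["loi", "negotiate", "spa"]),
    ("Closing",       ["closing"]),
]

def reclassify(entry: str) -> str:
    """Infer the phase from the work described, or leave it General."""
    text = entry.lower()
    for phase, keywords in PHASE_PATTERNS:
        if any(k in text for k in keywords):
            return phase
    return "General"  # unchanged when no pattern matches

raw = [
    "Initial client meeting",
    "Review target financials",
    "Draft LOI",
    "Due diligence calls",
    "Negotiate SPA",
    "Closing call",
]
phases = {entry: reclassify(entry) for entry in raw}
```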


The Data Enrichment Pipeline
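The techniques above compose into a sequential pipeline: each stage takes the evolving matter profile and returns an enriched version of it. The sketch below illustrates the flow only; the stage names mirror the four techniques described, and the implementations are deliberately stubbed.

```python
# Illustrative sketch of the enrichment pipeline as composed stages.
# Stage names mirror the techniques above; bodies are stubs.

def parse_entries(profile: dict) -> dict:
    profile["parsed"] = True           # multi-pass contextual parsing
    return profile

def infer_roles(profile: dict) -> dict:
    profile["roles_inferred"] = True   # role-aware inference
    return profile

def normalize_terms(profile: dict) -> dict:
    profile["normalized"] = True       # consistency normalization
    return profile

def reclassify_phases(profile: dict) -> dict:
    profile["phases_assigned"] = True  # phase reclassification
    return profile

PIPELINE = [parse_entries, infer_roles, normalize_terms, reclassify_phases]

def enrich(raw_entries: list[str]) -> dict:
    """Run every stage in order over a fresh matter profile."""
    profile = {"entries": raw_entries}
    for stage in PIPELINE:
        profile = stage(profile)
    return profile

profile = enrich(["Draft motion", "Closing call"])
```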


Accuracy & Validation

How We Measure Accuracy

  • Phase classification accuracy: % of entries assigned to correct phase

  • Task categorization accuracy: % of entries mapped to correct task type

  • Timeline accuracy: Predicted matter duration vs. actual

  • Fee prediction accuracy: Estimated fees vs. actual closed fees

Target accuracy: ≥95% on phase classification for well-documented matters.
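Phase classification accuracy, for instance, reduces to a simple ratio over a labeled sample. The data below is made up for illustration; only the computation is shown.

```python
# Illustrative sketch: phase classification accuracy against a
# human-labeled sample (data is fabricated for the example).

def phase_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of entries assigned to the correct phase."""
    if len(predicted) != len(actual):
        raise ValueError("prediction/label lists must align")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

predicted = ["Due Diligence", "Negotiation", "Closing", "Closing"]
actual    = ["Due Diligence", "Negotiation", "Negotiation", "Closing"]
accuracy = phase_accuracy(predicted, actual)  # 3 of 4 correct → 0.75
```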

Validation Approach

  1. Historical backtesting: Process closed matters, compare predictions to actuals

  2. Human review sampling: Pricing team reviews sample of enriched profiles

  3. Feedback loop: Corrections feed back into model improvement

  4. Continuous monitoring: Track accuracy metrics over time

Handling Edge Cases

Not all matters are equally well-documented. Narrative handles this gracefully:

  • Sparse entries: Flagged with confidence scores; use related matters to infer patterns

  • Unusual matters: Identified as outliers; don't distort benchmarks

  • Incomplete matters: Partial profiles generated; updated as data becomes available

  • Mixed-quality data: Normalize what we can; flag what we can't
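The confidence-scoring idea behind the sparse-entry handling can be sketched as follows; the 60% threshold and the scoring rule are hypothetical, chosen only to make the flagging behavior concrete.

```python
# Illustrative sketch: score a matter profile's confidence by how
# many of its entries could be classified, and flag sparse matters
# for related-matter inference. Threshold is hypothetical.

def profile_confidence(labels: list[str]) -> tuple[float, bool]:
    """Return (confidence, flagged): flagged matters are too sparsely
    documented to trust without inferring from related matters."""
    if not labels:
        return 0.0, True
    classified = sum(1 for label in labels if label != "unknown")
    confidence = classified / len(labels)
    return confidence, confidence < 0.6  # flag sparse matters

conf, flagged = profile_confidence(["Closing", "unknown", "unknown"])
```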


SALI Compatibility

Narrative is actively working towards full alignment with the SALI (Standards Advancement for the Legal Industry) taxonomy as a core part of our future product roadmap.

Our goal is to ensure:

  • Standardized classification of matter types, practice areas, and tasks

  • Interoperability with other systems using SALI standards

  • Future-proofing as the industry converges on common data standards

  • Easier integration with benchmarking data and external tools

We are building our data model to be adaptable, ensuring that as SALI standards evolve, your enriched data remains structured and useful across your technology ecosystem.


What We Don't Do

To be clear about scope:

  • We don't access privileged materials without guardrails: We process document content strictly for context building and entity extraction, respecting all privilege and security boundaries.

  • We don't replace human judgment: We provide data; partners make decisions

  • We don't require perfect input data: We improve messy data; we don't demand clean data

  • We don't train AI on your data: everything is ring-fenced, and we operate on a zero-data-retention basis with the underlying LLMs

Narrative is designed to work with the data firms actually have — not the data they wish they had.
