Matter Matching & Pricing

Part 2 of 2: How we find relevant precedents and generate data-backed RFP responses


From Enriched Data to Actionable Intelligence

Once historical matters are transformed into structured profiles (see Part 1), Narrative uses that data to power intelligent matter matching and automated pricing generation.


Step 1: RFP Ingestion & Requirement Extraction

When a new RFP arrives, Narrative doesn't just keyword-search; it understands the request.

What We Extract

| Requirement Type | Examples | Why It Matters |
| --- | --- | --- |
| Matter Type | M&A, patent litigation, arbitration | Determines which historical matters are relevant |
| Deal/Case Characteristics | Deal size, claim value, number of parties | Affects complexity and resource needs |
| Jurisdiction | England & Wales, Delaware, multi-jurisdictional | Impacts timeline, staffing, and costs |
| Industry/Sector | Technology, healthcare, financial services | Relevant experience demonstration |
| Timeline | "Complete by Q2", "expedited" | Affects staffing intensity |
| Special Requirements | Regulatory approvals, cross-border elements | Complexity drivers |

Clarifying Questions

The system proactively identifies ambiguities and asks clarifying questions:

"The RFP mentions 'regulatory approval' β€” is this FDI screening, merger control, or sector-specific regulation?"

"Deal value isn't specified β€” what range should we assume for benchmarking purposes?"

This ensures the matter matching is precise, not just approximate.
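To make the extraction concrete, here is a minimal sketch of the kind of structured record the table above could map into. This is an illustration only: the `RfpRequirements` dataclass, its field names, and the `needs_clarification` helper are assumptions for the example, not Narrative's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape for requirements extracted from an RFP (illustrative field names).
@dataclass
class RfpRequirements:
    matter_type: str                           # e.g. "M&A", "patent litigation"
    deal_value_gbp: Optional[float]            # None if the RFP doesn't state it
    jurisdictions: list[str] = field(default_factory=list)
    industry: Optional[str] = None
    timeline: Optional[str] = None             # e.g. "complete by Q2", "expedited"
    special_requirements: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)  # ambiguities spotted during extraction

def needs_clarification(req: RfpRequirements) -> list[str]:
    """Collect clarifying questions for anything the RFP left ambiguous."""
    questions = list(req.open_questions)
    if req.deal_value_gbp is None:
        questions.append("Deal value isn't specified. What range should we assume?")
    return questions

# Example: a cross-border technology M&A RFP with no stated deal value.
rfp = RfpRequirements(
    matter_type="M&A",
    deal_value_gbp=None,
    jurisdictions=["England & Wales", "Germany"],
    industry="Technology",
    timeline="expedited",
    special_requirements=["regulatory approval (type unclear)"],
    open_questions=["Is 'regulatory approval' FDI screening, merger control, or sector-specific?"],
)
print(needs_clarification(rfp))
```

In practice the extraction itself is done by a language model filling a schema like this; the sketch only shows the downstream shape and how unanswered fields become clarifying questions.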


Step 2: Semantic Matter Matching

Traditional search finds matters by keywords. Narrative finds matters by meaning and relevance.

Every enriched matter profile is converted into a high-dimensional vector (embedding) that captures its semantic meaning: not just keywords, but the underlying characteristics of the work.
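As a rough illustration of that embedding step, the sketch below uses the open-source sentence-transformers library and the `all-MiniLM-L6-v2` model as stand-ins for whatever embedding model runs in production; the profile and RFP texts are invented.

```python
# Illustrative only: sentence-transformers stands in for the production embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

matter_profile = (
    "Cross-border technology M&A, UK and Germany, mid-market deal value, "
    "expedited four-month timeline, merger control filings in both jurisdictions."
)
rfp_summary = (
    "Acquisition of a German software business by a UK-listed buyer, "
    "completion targeted within one quarter, merger control approval required."
)

# Each text becomes a dense vector; cosine similarity measures semantic closeness.
vectors = model.encode([matter_profile, rfp_summary], normalize_embeddings=True)
similarity = util.cos_sim(vectors[0], vectors[1]).item()
print(f"semantic similarity: {similarity:.3f}")
```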

What Makes a Match "Relevant"

The matching algorithm considers multiple dimensions:

| Dimension | Weight | Rationale |
| --- | --- | --- |
| Matter type | High | M&A matters should match M&A, not litigation |
| Complexity indicators | High | Deal size, party count, jurisdictions |
| Practice area | High | Corporate, disputes, regulatory, etc. |
| Industry/sector | Medium | Sector expertise often matters to clients |
| Jurisdiction | Medium | Local law impacts effort significantly |
| Timeline/duration | Medium | Fast-tracked vs. standard pacing |
| Outcome | Low | Success patterns, but less predictive |

Beyond Simple Similarity

The system also identifies:

  • Near misses: Matters that are 80% similar but differ in one key dimension (useful for understanding variance)

  • Complexity outliers: Matters with unusual characteristics that drove costs up or down

  • Recency weighting: Recent matters may be more relevant for current market conditions (a simplified scoring sketch follows this list)
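A simplified view of how weighted dimensions and recency could combine into a single score is sketched below. The numeric weights and the two-year half-life are assumptions chosen for the example; the table above only gives qualitative weights.

```python
from datetime import date

# Assumed numeric weights for the qualitative High/Medium/Low weights above.
DIMENSION_WEIGHTS = {
    "matter_type": 0.30,
    "complexity": 0.25,
    "practice_area": 0.20,
    "industry": 0.10,
    "jurisdiction": 0.10,
    "timeline": 0.05,
}

def recency_factor(closed_on: date, half_life_days: float = 730.0) -> float:
    """Exponential decay: a matter closed two years ago counts half as much."""
    age_days = (date.today() - closed_on).days
    return 0.5 ** (age_days / half_life_days)

def match_score(dimension_scores: dict[str, float], closed_on: date) -> float:
    """Combine per-dimension similarity scores (0..1) into one weighted, recency-adjusted score."""
    weighted = sum(DIMENSION_WEIGHTS[d] * s for d, s in dimension_scores.items())
    return weighted * recency_factor(closed_on)

# Example: strong on matter type and practice area, weaker on jurisdiction, closed in mid-2024.
score = match_score(
    {
        "matter_type": 0.95, "complexity": 0.80, "practice_area": 0.90,
        "industry": 0.70, "jurisdiction": 0.55, "timeline": 0.85,
    },
    closed_on=date(2024, 6, 1),
)
print(f"match score: {score:.2f}")
```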


Step 3: Relevance Ranking & Curation

Automatic Baseline Selection

Narrative automatically surfaces and selects the most relevant matters, whether that's 3 or 15, based on the clarified criteria.

The baseline set is presented with:

  • Match score (e.g., 92% relevance)

  • Key similarities (why this matter matched)

  • Key differences (how it differs from the RFP requirements)

Human-in-the-Loop Curation

Users can refine the selection:

  • Add matters the system didn't prioritize (e.g., "Include Project X; the client specifically mentioned it")

  • Remove matters that aren't relevant (e.g., "Exclude that one; it had unusual circumstances")

  • Sort by relevance, fees, recency, or outcome

  • Filter by specific attributes (e.g., "Only matters with successful outcomes")

This ensures the final comparison set reflects both data-driven relevance and human judgment.
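To illustrate how a curated comparison set might look programmatically, here is a minimal sketch; the `MatterMatch` shape and the `curate` helper are hypothetical, not Narrative's API.

```python
from dataclasses import dataclass, field

# Hypothetical shape of one entry in the baseline comparison set.
@dataclass
class MatterMatch:
    name: str
    relevance: float                              # match score, e.g. 0.92
    outcome: str                                  # e.g. "successful", "settled"
    key_similarities: list[str] = field(default_factory=list)
    key_differences: list[str] = field(default_factory=list)

def curate(matches: list[MatterMatch],
           include: set[str] | None = None,
           exclude: set[str] | None = None,
           successful_only: bool = False) -> list[MatterMatch]:
    """Apply user overrides, keep pinned matters first, sort the rest by relevance."""
    include, exclude = include or set(), exclude or set()
    kept = [m for m in matches if m.name not in exclude]
    if successful_only:
        kept = [m for m in kept if m.outcome == "successful" or m.name in include]
    return sorted(kept, key=lambda m: (m.name in include, m.relevance), reverse=True)

matches = [
    MatterMatch("Project Atlas", 0.92, "successful"),
    MatterMatch("Project X", 0.71, "successful"),
    MatterMatch("Project Y", 0.88, "settled"),
]
# "Include Project X" plus "only successful outcomes", as a user might request.
for m in curate(matches, include={"Project X"}, successful_only=True):
    print(m.name, m.relevance)
```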


Step 4: Pricing Generation

Once the precedent set is finalized, Narrative generates a comprehensive pricing proposal.

What Gets Generated

| Output | Description |
| --- | --- |
| Total fee estimate | Recommended fee based on precedent averages and adjustments |
| Phase-by-phase breakdown | Hours and fees per phase (e.g., Due Diligence: 30%, Negotiation: 45%, Closing: 25%) |
| Staffing plan | Recommended role mix (partner %, associate %, paralegal %) |
| Timeline estimate | Expected duration based on comparable matters |
| Confidence range | Low/mid/high estimates based on precedent variance |

The Pricing Rationale

Critically, every recommendation comes with justification:

"Recommended fee: Β£1.2M"

Based on 5 comparable matters with average fees of Β£1.15M. Adjusted +5% for:

  • Multi-jurisdictional complexity (UK + Germany)

  • Expedited timeline (4 months vs. typical 6)

Key precedent: Project Atlas (Β£1.4M) β€” similar deal size and jurisdictions, but included post-merger integration work not scoped here.

This data-backed rationale:

  • Gives pricing teams confidence in the recommendation

  • Provides defensible justification for client discussions

  • Highlights risk factors that might drive costs up or down (the underlying calculation is sketched below)
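The arithmetic behind a rationale like the one above is straightforward. Below is a minimal sketch with invented precedent fees; the 3%/2% split of the stated +5% adjustment, the one-standard-deviation confidence range, and the phase percentages are assumptions for the example.

```python
from statistics import mean, pstdev

# Invented precedent fees (GBP) for the finalised comparison set.
precedent_fees = [1_050_000, 1_100_000, 1_150_000, 1_200_000, 1_250_000]

# Illustrative adjustment factors (the source only states a combined +5%).
adjustments = {
    "multi-jurisdictional complexity (UK + Germany)": 0.03,
    "expedited timeline (4 months vs. typical 6)": 0.02,
}

base = mean(precedent_fees)                        # £1.15M in this toy data
adjusted = base * (1 + sum(adjustments.values()))  # roughly £1.2M after +5%

# Confidence range derived from precedent variance (one standard deviation).
spread = pstdev(precedent_fees)
low, high = adjusted - spread, adjusted + spread

# Phase split taken from comparable matters (illustrative percentages).
phase_split = {"Due Diligence": 0.30, "Negotiation": 0.45, "Closing": 0.25}

print(f"Recommended fee: £{adjusted:,.0f} (range £{low:,.0f} to £{high:,.0f})")
for phase, share in phase_split.items():
    print(f"  {phase}: £{adjusted * share:,.0f}")
```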


Step 5: Scenario Modeling

Before finalizing, users can test alternative approaches:

What-If Analysis

| Scenario | Adjustment | Impact |
| --- | --- | --- |
| More junior leverage | Increase associate ratio from 60% to 75% | -12% on fees |
| Fixed fee | Cap at £1.1M | Margin risk if complexity exceeds baseline |
| Expedited timeline | Compress from 6 months to 4 | +15% on fees (overtime, parallel workstreams) |
| AI tool usage | Apply Harvey to due diligence | -20% on DD hours, +8% margin |
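Each row in the table amounts to a simple adjustment applied to the baseline estimate. A minimal sketch with invented deltas follows; the fixed-fee cap is left out because it shifts risk allocation rather than the estimate itself.

```python
# Baseline estimate (illustrative numbers).
baseline_fee = 1_200_000.0
baseline_margin = 0.35

# Hypothetical scenario deltas: multiplicative on fees, additive on margin.
SCENARIOS = {
    "more junior leverage":  {"fee_delta": -0.12, "margin_delta": 0.00},
    "expedited timeline":    {"fee_delta": +0.15, "margin_delta": -0.02},
    # -20% on DD hours, with DD at roughly 30% of fees, is about -6% on total fees.
    "AI-assisted diligence": {"fee_delta": -0.06, "margin_delta": +0.08},
}

def apply_scenario(name: str) -> tuple[float, float]:
    """Return (adjusted fee, adjusted margin) for a named scenario."""
    s = SCENARIOS[name]
    return baseline_fee * (1 + s["fee_delta"]), baseline_margin + s["margin_delta"]

for name in SCENARIOS:
    fee, margin = apply_scenario(name)
    print(f"{name}: fee £{fee:,.0f}, expected margin {margin:.0%}")
```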

Real-Time Impact Visualization

Adjustments immediately show:

  • Fee impact (total and by phase)

  • Margin impact (expected profitability)

  • Risk indicators (how the scenario compares to precedent range)


The Complete Workflow

In short: RFP ingestion and requirement extraction → semantic matter matching → relevance ranking and curation → pricing generation → scenario modeling, ending in a data-backed draft proposal ready for review.

Benchmarking & Evaluation

We rigorously evaluate our system to ensure consistent, accurate results across two critical dimensions: matter selection and resource allocation.

Evaluation Framework

Every RFP scenario is tested across 100+ runs to measure both accuracy and consistency. We evaluate two things: how consistently the right precedent matters are selected, and how accurately hours and fees are allocated.


Matter Selection Consistency

For each test RFP, we run the matter selection 100 times and measure how consistently the system selects the same precedent matters.

What We Measure

| Metric | Description | Target |
| --- | --- | --- |
| Selection rate | How often does a matter appear across 100 runs? | ≥95% for core matches |
| Rank stability | Does the matter appear at consistent positions? | Low variance |
| Similarity score | Semantic similarity between RFP and matter | Consistent across runs |

Example Benchmark Output

For a cross-border tech M&A RFP, here's what a benchmark run looks like:

📌 SELECTED MATTERS CONSISTENCY (100 runs):
────────────────────────────────────────────────────────────────

1. HarborPoint–NorthBeacon Strategic Alliance (IP Cross-License)
   ✅ Appeared in: 97/100 runs (97.0%)
   Average rank: 1.4
   ██████████████████████████████████████
   Rank distribution:
     Rank 1: 74.2%  ██████████████
     Rank 2: 16.5%  ███
     Rank 3:  7.2%  █
     Rank 4:  2.1%

2. Orion Analytics IP Divestiture (Cross-Border Sale)
   ✅ Appeared in: 98/100 runs (98.0%)
   Average rank: 2.6
   ██████████████████████████████████████
   Rank distribution:
     Rank 1: 17.3%  ███
     Rank 2: 39.8%  ███████
     Rank 3: 11.2%  ██
     Rank 4: 24.5%  ████
     Rank 5:  7.1%  █

3. Helios Data Systems SaaS IP Spin-Out (UK/US)
   ⚠️ Appeared in: 93/100 runs (93.0%)
   Average rank: 3.8
   ████████████████████████████████████
   Rank distribution:
     Rank 2:  5.4%  █
     Rank 3: 39.8%  ███████
     Rank 4: 22.6%  ████
     Rank 5: 30.1%  ██████
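Selection rates and average ranks like the ones above come from simple counting over per-run selections. A minimal sketch, using toy run logs rather than real benchmark data:

```python
from collections import defaultdict

# Toy run logs: each run yields an ordered list of selected matters.
runs = [
    ["HarborPoint-NorthBeacon", "Orion Analytics", "Helios Data Systems"],
    ["HarborPoint-NorthBeacon", "Orion Analytics", "Helios Data Systems"],
    ["Orion Analytics", "HarborPoint-NorthBeacon", "Helios Data Systems"],
    ["HarborPoint-NorthBeacon", "Helios Data Systems", "Orion Analytics"],
]

appearances: dict[str, int] = defaultdict(int)
rank_sums: dict[str, int] = defaultdict(int)

for run in runs:
    for rank, matter in enumerate(run, start=1):
        appearances[matter] += 1
        rank_sums[matter] += rank

for matter, count in sorted(appearances.items(), key=lambda kv: -kv[1]):
    selection_rate = count / len(runs)
    avg_rank = rank_sums[matter] / count
    print(f"{matter}: selected in {selection_rate:.0%} of runs, average rank {avg_rank:.1f}")
```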

Acceptance Criteria

| Criterion | Current | Target |
| --- | --- | --- |
| Core matters (top 3-4) | ≥95% selection rate | 99% |
| Extended set (top 5-6) | ≥80% selection rate | ≥90% |
| Rank variance | Low variance | Minimal variance |
| Jaccard similarity | ≥85% pairwise similarity | ≥95% |

Our target: 99% selection consistency. We're actively working to minimize variance and ensure the system reliably surfaces the same high-quality precedent matters on every run.


Queried Matters Analysis

Beyond selected matters, we track the full set of matters queried (typically 40 from the database) to ensure the retrieval layer is stable:

📌 QUERIED MATTERS CONSISTENCY:
────────────────────────────────────────────────────────────────
Average queried matters per run: 40.0

Top queried matters by appearance:
  1. Helios Data Systems (5d91...)      - 100/100 runs, avg rank: 1.92, avg similarity: 0.656
  2. Orion Analytics (8e42...)          - 100/100 runs, avg rank: 3.95, avg similarity: 0.639
  3. HarborPoint–NorthBeacon (69c6...)  - 100/100 runs, avg rank: 4.64, avg similarity: 0.642
  4. SentinelWave Acquisition (67e0...) - 100/100 runs, avg rank: 2.79, avg similarity: 0.648

✓ Queried matters with >95% consistency: 33/50
✓ Average pairwise Jaccard similarity: 0.881
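The pairwise Jaccard figure is the average set overlap between any two runs' queried matters. A minimal sketch, again with toy data:

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def avg_pairwise_jaccard(runs: list[set[str]]) -> float:
    """Average Jaccard similarity over every pair of runs."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy run results: the set of matter IDs each run queried.
runs = [
    {"helios", "orion", "harborpoint", "sentinelwave"},
    {"helios", "orion", "harborpoint", "atlas"},
    {"helios", "orion", "sentinelwave", "atlas"},
]
print(f"average pairwise Jaccard: {avg_pairwise_jaccard(runs):.3f}")
```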

Similarity Distribution

We track the semantic similarity scores to ensure the matching is based on meaningful relevance:

📈 SIMILARITY DISTRIBUTION:
────────────────────────────────────────────────────────────────
Count: 4000 (40 matters × 100 runs)
Mean similarity: 0.585
Std deviation: 0.048

Percentiles:
  p10: 0.520  |  p25: 0.551  |  p50: 0.587  |  p75: 0.620  |  p90: 0.646

Distribution:
  [0.4, 0.5)  █ 158
  [0.5, 0.6)  ████████████████████ 2256
  [0.6, 0.7)  ██████████████ 1578
  [0.7, 0.8)  █ 8

Phase & Keyword Consistency

We also verify that the system consistently identifies the correct practice area, phase taxonomy, and key terms:

🎯 PHASE GROUP CONSISTENCY:
────────────────────────────────────────────────────────────────
✅ "M&A" - 100/100 runs (100.0%)

🔀 KEYWORD CONSISTENCY (top terms):
────────────────────────────────────────────────────────────────
  ✅ "minority"    - 100/100 runs (100.0%)
  ✅ "licensing"   - 100/100 runs (100.0%)
  ✅ "AI"          -  99/100 runs (99.0%)
  ✅ "acquisition" -  98/100 runs (98.0%)
  ⚠️ "cloud"       -  94/100 runs (94.0%)
  ⚠️ "antitrust"   -  59/100 runs (59.0%)

Resource Allocation Accuracy

Once matters are selected, we evaluate whether the system accurately predicts hours and fees by phase and role.

| Metric | What We Measure | Current | Target |
| --- | --- | --- | --- |
| Budget variance | Variance in final budget prediction | 5% | <3% |
| Total hours accuracy | Predicted vs. actual total hours | ±10% | ±5% |
| Total fee accuracy | Predicted vs. actual total fees | ±10% | ±5% |
| Phase-level accuracy | Predicted vs. actual hours per phase | ±15% | ±10% |
| Role-level accuracy | Predicted vs. actual hours per role | ±15% | ±10% |

Current budget variance: 5%. We're actively working toward <3% variance to give pricing teams even greater confidence in generated estimates.
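Accuracy figures like ±10% reduce to checking each held-out matter's prediction against what it actually cost. A minimal sketch, with invented holdout numbers:

```python
# Invented held-out matters: predicted vs. actual total fees (GBP).
holdout = [
    {"matter": "A", "predicted_fees": 1_200_000, "actual_fees": 1_150_000},
    {"matter": "B", "predicted_fees": 480_000, "actual_fees": 550_000},
    {"matter": "C", "predicted_fees": 2_050_000, "actual_fees": 2_010_000},
]

def within_tolerance(predicted: float, actual: float, tolerance: float = 0.10) -> bool:
    """True if the prediction lands within +/- tolerance of the actual figure."""
    return abs(predicted - actual) / actual <= tolerance

hits = sum(within_tolerance(m["predicted_fees"], m["actual_fees"]) for m in holdout)
errors = [abs(m["predicted_fees"] - m["actual_fees"]) / m["actual_fees"] for m in holdout]

print(f"within +/-10%: {hits}/{len(holdout)}")
print(f"mean absolute percentage error: {sum(errors) / len(errors):.1%}")
```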

How we test:

  • Holdout validation: Train on historical matters, predict on held-out closed matters

  • Thousands of prediction runs: Ensure consistency and low variance

  • Stratified testing: Evaluate separately by matter type, complexity band, and jurisdiction


Overall Assessment

Each benchmark run produces a pass/fail assessment:

================================================================================
🎯 OVERALL ASSESSMENT:
────────────────────────────────────────────────────────────────────────────────

✅ PASS: System consistency meets threshold
   ✓ Core matters (3-4) with >95% selection rate
   ✓ Queried matters with >95% consistency: 33
   ✓ Pairwise Jaccard similarity: 0.881 (target: ≥0.85)
   ✓ Phase group consistency: 100%

   Average run duration: 77.3s
   Total test duration: 11636.3s (100 runs)
================================================================================

Continuous Monitoring

Beyond initial benchmarking, we continuously monitor production accuracy:

| Signal | Frequency | Action Threshold |
| --- | --- | --- |
| Selection consistency | Per release | Alert if <95% for core matters |
| Prediction vs. actual | Weekly | Alert if accuracy drops below 90% |
| User overrides | Real-time | Flag patterns where users adjust recommendations |
| Feedback submissions | Real-time | Route to model improvement pipeline |
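The action thresholds above translate directly into simple alerting checks. A minimal sketch, with hypothetical metric names and values:

```python
# Hypothetical production metrics collected for the current period.
metrics = {
    "core_selection_rate": 0.96,    # per release
    "prediction_accuracy": 0.92,    # weekly: share of estimates within tolerance
}

# Illustrative constants mirroring the action thresholds in the table above.
THRESHOLDS = {
    "core_selection_rate": 0.95,
    "prediction_accuracy": 0.90,
}

def check_alerts(current: dict[str, float]) -> list[str]:
    """Return alert messages for any signal that falls below its threshold."""
    return [
        f"ALERT: {name} at {value:.0%} (threshold {THRESHOLDS[name]:.0%})"
        for name, value in current.items()
        if value < THRESHOLDS[name]
    ]

print(check_alerts(metrics) or "all signals within thresholds")
```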

Feedback Loop

When predictions diverge from actuals:

  1. Root cause analysis: Why did this matter cost more/less than predicted?

  2. Pattern identification: Is this a systematic issue (e.g., a certain phase always overruns)?

  3. Model refinement: Adjust weights and factors based on new data

  4. Re-benchmark: Validate improvements against full test suite before deployment

Every matter completed adds to the precedent database, making future predictions more accurate.


Summary: From RFP to Proposal in Minutes

| Before Narrative | After Narrative |
| --- | --- |
| Hours searching for similar matters | Seconds to surface best matches |
| Guesswork on pricing | Data-backed recommendations |
| Generic phase breakdowns | Precedent-based phase estimates |
| No rationale for fees | Justified pricing with citations |
| Static quotes | Interactive scenario modeling |
| Days to prepare response | Minutes to generate draft |
