A number is only useful if you can defend it. One of the more common conversations we have with managing directors at mid-market consulting firms goes like this: they've seen the engagement health score for a current project, it shows yellow or red, and their first instinct is to challenge the methodology rather than act on the signal. This is not irrational. They've spent years being sold dashboards that produce attention-grabbing metrics with opaque or questionable logic underneath.
So this piece is straightforward transparency: here is what the engagement health score measures, what goes into it, how the components are weighted, and — equally important — what we deliberately chose not to include, and why.
What We're Trying to Measure
The engagement health score answers one question: given the behavioral signals available in the current engagement's operational data, how does the trajectory of this engagement compare to the distribution of trajectories that historically led to strong outcomes versus degraded outcomes?
That framing has an important implication: the score is not an absolute quality rating. It's a relative position on a distribution derived from your own firm's engagement history. An engagement health score of 72 doesn't mean "this engagement is doing well" in any universal sense. It means "this engagement's current trajectory is consistent with the top two-thirds of engagements in your historical record at a similar project phase."
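To make the firm-relative idea concrete, here is a minimal sketch that places one engagement's current trajectory value against two different firms' histories at the same project phase. The function name, the single "trajectory metric," and all of the sample numbers are hypothetical, and the sketch does not reproduce how the published score is scaled. It only shows why the same raw value can sit in very different places depending on whose history it is compared with.

```python
from bisect import bisect_right

def relative_position(current_value, history_at_phase):
    """Percent of historical engagements (same phase) at or below current_value."""
    if not history_at_phase:
        raise ValueError("no historical engagements at this phase")
    ordered = sorted(history_at_phase)
    at_or_below = bisect_right(ordered, current_value)
    return 100.0 * at_or_below / len(ordered)

# Hypothetical week-6 trajectory metric for ten past engagements at one firm.
firm_a_week6 = [0.52, 0.58, 0.61, 0.66, 0.70, 0.74, 0.79, 0.83, 0.88, 0.93]
# A second hypothetical firm whose engagements typically run at a higher level.
firm_b_week6 = [0.70, 0.75, 0.80, 0.82, 0.85, 0.88, 0.90, 0.93, 0.95, 0.97]

current = 0.80
position_a = relative_position(current, firm_a_week6)  # 70.0
position_b = relative_position(current, firm_b_week6)  # 30.0
```

The same value of 0.80 sits ahead of most of the first firm's history and behind most of the second's, which is the reason a universal baseline would mislead one of them.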
This firm-relative calibration is deliberate. Engagement dynamics vary substantially by firm type, client industry, and engagement scope. A score calculated against a universal baseline would be misleading for most firms. The signal is most useful when it reads your history back to you.
The Input Variables
Engagement health scoring draws from three input categories, weighted differently based on empirical analysis of their predictive relationship to outcome scores in historical data:
Execution pace signals (weighted ~45%) are derived from time and billing records. Specifically: actual versus planned utilization per workstream per week, deliverable completion rate against the project plan, and milestone slip accumulation rate. Utilization that falls below 80% of plan for two or more consecutive weeks is a consistent early warning pattern. Deliverable slip accumulation above 15% of total project deliverables within the first quarter of the engagement is a strong predictor of outcome degradation.
Client engagement signals (weighted ~35%) are derived from CRM activity records and calendar data where available. Specifically: meeting cadence change (an increase or decrease relative to the engagement's own established pattern), client-side stakeholder substitution (when the senior sponsor stops attending and delegates attendance), and response latency on deliverables and approval requests. These signals are noisier than utilization data: a client sponsor missing one meeting isn't a signal, but missing three consecutive meetings with no substitute is.
Staffing fit indicators (weighted ~20%) are derived from the engagement staffing record combined with the consultant history. Specifically: vertical familiarity score for the lead consultant (how many prior engagements in this client's primary vertical), team pairing history (whether the lead-senior pairing has a prior engagement record), and organizational-level match between consultant experience history and client sponsor seniority. These are fixed at engagement start and don't change week to week, which is why they carry less weight in an active engagement score. They do, however, set the baseline expectation against which the other signals are measured.
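The weighting and the threshold patterns above can be sketched in a few lines. This is an illustration under stated assumptions, not the production scoring logic: it assumes each category has already been reduced to a sub-score on a 0-100 scale, and every function and field name here is hypothetical. Only the default weights and the thresholds (80% of plan for two consecutive weeks, 15% deliverable slip, three consecutive uncovered sponsor absences) come from the text.

```python
# Default category weights from the text; everything else here is illustrative.
DEFAULT_WEIGHTS = {
    "execution_pace": 0.45,
    "client_engagement": 0.35,
    "staffing_fit": 0.20,
}

def composite_score(sub_scores, weights=None):
    """Weighted average of category sub-scores (assumed to be on a 0-100 scale)."""
    weights = weights or DEFAULT_WEIGHTS
    total_weight = sum(weights.values())
    return sum(weights[name] * sub_scores[name] for name in weights) / total_weight

def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def utilization_compressed(actual_hours, planned_hours, floor=0.80, min_weeks=2):
    """True if utilization fell below floor * plan for min_weeks consecutive weeks."""
    below = [a < floor * p for a, p in zip(actual_hours, planned_hours)]
    return longest_run(below) >= min_weeks

def deliverable_slip_warning(slipped, total, threshold=0.15):
    """True if slipped deliverables exceed the threshold share of all deliverables.
    The caller is assumed to pass counts from the first quarter of the engagement."""
    return slipped / total > threshold

def sponsor_disengaged(sponsor_attended, substitute_attended, min_missed=3):
    """True if the sponsor missed min_missed consecutive meetings with no substitute."""
    uncovered = [not s and not sub
                 for s, sub in zip(sponsor_attended, substitute_attended)]
    return longest_run(uncovered) >= min_missed

score = composite_score(
    {"execution_pace": 60, "client_engagement": 80, "staffing_fit": 90}
)
```

With these hypothetical sub-scores, the weak execution pace pulls the composite down to roughly 73 even though the other two categories look healthy, which reflects the heavier weight on execution signals.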
What the Score Intentionally Doesn't Measure
Client satisfaction surveys — the standard NPS or CSAT scores that many firms collect at project milestones — are not included in the engagement health signal. This is deliberate and worth explaining.
Survey-based satisfaction scores have two characteristics that make them unreliable as leading indicators. First, they're lagging: by the time a client completes a satisfaction survey, the experience being measured is already in the past. Second, they're compressed: clients of professionally run consulting firms tend to avoid giving negative survey scores unless the situation has deteriorated to the point where they've already decided not to re-engage. As a result, the survey scores in most CRMs cluster at the top of the scale with very little variance, and they don't distinguish between a good engagement and an excellent one.
The behavioral signals — utilization, meeting patterns, deliverable velocity — are leading indicators. They change before the client articulates dissatisfaction. That's the operational value. Survey scores belong in the outcome record, not the early warning system.
Calibration and Firm-Specific Adjustment
The default weights described above are starting points derived from pattern analysis across the firms whose data we've worked with. Individual firms often show a different predictive weighting based on their specific engagement mix and client base.
A firm heavily concentrated in financial services advisory work, for example, tends to show client engagement signals as a stronger predictor of outcome than the default weight suggests, because FS clients are operationally demanding about meeting cadence and deliverable timing in ways that a mixed client base typically isn't. The weight recalibrates based on the firm's historical pattern.
This means the score you see in week three of a new engagement reflects your firm's specific history, not a generic consulting industry baseline. The recalibration runs as more historical data is ingested and the pattern engine refines its understanding of your engagement distribution.
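One way to picture recalibration is as a blend between the default weights and weights estimated from the firm's own history, with the firm-specific estimate trusted more as the history grows. The sketch below is an assumption-heavy illustration, not a description of the pattern engine: using absolute Pearson correlation as the measure of predictive strength, the linear trust ramp, and the `full_trust_at` parameter are all hypothetical choices made for the example.

```python
from math import sqrt

DEFAULT_WEIGHTS = {
    "execution_pace": 0.45,
    "client_engagement": 0.35,
    "staffing_fit": 0.20,
}

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def recalibrate(history, outcomes, defaults=None, full_trust_at=50):
    """Blend default weights with firm-specific weights.

    history:  {category: [sub-score for each past engagement]}
    outcomes: [outcome score for each past engagement]
    full_trust_at: hypothetical number of engagements at which the
                   firm-specific estimate fully replaces the defaults.
    """
    defaults = defaults or DEFAULT_WEIGHTS
    n = len(outcomes)
    if n < 2:
        return dict(defaults)  # not enough history to estimate anything
    strength = {name: abs(pearson(history[name], outcomes)) for name in defaults}
    total = sum(strength.values())
    if total == 0:
        return dict(defaults)
    trust = min(1.0, n / full_trust_at)
    return {
        name: (1 - trust) * defaults[name] + trust * strength[name] / total
        for name in defaults
    }

# Hypothetical history in which client engagement tracks outcomes closely.
outcomes = [50, 60, 70, 80, 90]
history = {
    "execution_pace": [70, 60, 75, 65, 72],
    "client_engagement": [40, 55, 65, 80, 95],
    "staffing_fit": [60, 60, 60, 60, 60],
}
weights = recalibrate(history, outcomes)
```

With only five engagements the result stays close to the defaults, but the client engagement weight has already moved above 0.35. Because both the defaults and the estimated weights sum to 1, the blend does too.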
A Note on Confidence Levels
Engagements in their first two to three weeks produce health scores with explicit low-confidence notation. This is because the behavioral signals haven't yet established a pattern — a deliverable that's two days late in week one may be normal for this engagement's scope, or it may be the first sign of pace problems. We don't compress this uncertainty into a false-precision score. The display shows the signal direction with a confidence band until sufficient behavioral data exists to support a reliable read.
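A rough sketch of how a direction-plus-band display might be represented follows. The band formula (a half-width that shrinks with the square root of weeks observed), the 20-point starting width, and all names are assumptions chosen for illustration. Only the idea of flagging the first two to three weeks as low confidence comes from the text.

```python
from dataclasses import dataclass

@dataclass
class ScoreReading:
    direction: str        # "improving", "declining", or "steady"
    low: float            # lower edge of the confidence band
    high: float           # upper edge of the confidence band
    low_confidence: bool  # True while too little behavioral data exists

def reading(score, previous_score, weeks_observed,
            low_confidence_weeks=3, base_half_width=20.0):
    """Wrap a point score in a band that narrows as more weeks are observed."""
    # Hypothetical rule: the half-width shrinks with the square root of weeks observed.
    half_width = base_half_width / max(weeks_observed, 1) ** 0.5
    low = max(0.0, score - half_width)
    high = min(100.0, score + half_width)
    if score > previous_score:
        direction = "improving"
    elif score < previous_score:
        direction = "declining"
    else:
        direction = "steady"
    return ScoreReading(direction, low, high,
                        weeks_observed <= low_confidence_weeks)

early = reading(64, 70, weeks_observed=1)   # band 44.0 to 84.0, low confidence
later = reading(64, 70, weeks_observed=16)  # band 59.0 to 69.0
```

The same point score reads very differently in week one and week sixteen: early on, the band is wide enough that only the direction is worth acting on.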
We'd rather show uncertainty honestly than produce a number that feels authoritative and misleads a practice lead into either unwarranted concern or false reassurance. The score earns trust by being calibrated: it acknowledges what it doesn't yet know and shows greater confidence only as the pattern becomes legible.
When a managing director challenges the methodology and we walk through the inputs and the historical correlation, the typical response is: "That's actually more defensible than I expected." That's the bar we hold ourselves to.