Cybersecurity Maturity Assessment: NIST CSF vs CMMI, and How to Pick the Right Scale Before the Board Asks for a Number
Alexander Sverdlov
Security Analyst

Key Takeaways
- NIST CSF and CMMI answer different questions. NIST CSF asks what you do (coverage across Govern, Identify, Protect, Detect, Respond, Recover) and rates implementation on a four-tier scale. CMMI asks how well you do it (process discipline) and rates capability on a five-level scale. A program can be broad and shallow (high CSF coverage, low CMMI capability) or narrow and deep, and conflating the two produces a score nobody can defend
- NIST CSF Tiers (Partial, Risk Informed, Repeatable, Adaptive) are not maturity levels and the framework explicitly says so. Treating Tier 4 Adaptive as the universal goal is the single most common assessment error. Tiers describe how risk decisions are made and integrated, not a ladder every organization must climb
- A defensible maturity assessment scores each control area on a documented 0-to-5 or 1-to-5 capability scale, anchors every score to evidence, and produces a current-state radar, a target-state radar, and a prioritized gap closure roadmap. A spreadsheet of self-assessed green-yellow-red is not a maturity assessment and will not survive a customer security review
- For most mid-market organizations the answer is a hybrid: use NIST CSF 2.0 as the control taxonomy (the what), and apply a CMMI-style 1-to-5 capability score to each function or category (the how well). This is the pattern the original CMMC model followed for the defense base, and it is the model most external assessors run
- Independent assessment matters because self-scored maturity inflates by roughly one full level on the categories that are hardest to evidence (governance, third-party risk, detection engineering, recovery testing). Customers, cyber insurers, and acquirers increasingly require third-party validation rather than a self-attestation spreadsheet
- A focused maturity assessment for a mid-market company runs USD 12,000 to USD 35,000 and ships in 3 to 6 weeks. The output is not a grade, it is a board-ready narrative plus a costed 12 to 24 month roadmap that turns "we are a 2.4" into a sequenced investment plan with owners and dates
The question arrived in a single line at the bottom of a board deck. The head of the audit committee had added a comment to slide 14: "Can we get one number for where our cybersecurity maturity sits today, and where it will be in twelve months?" The VP of Engineering at a 480-person SaaS company forwarded it to us with a note of his own: "I have a NIST CSF self-assessment that says we are mostly Tier 3. I have a SOC 2 Type 2 with no exceptions. And I have a customer who just asked us to confirm we are at CMMI Level 3. I genuinely do not know which of these answers the board's question, or whether they are even the same scale."
They are not the same scale. They are not even measuring the same property. The NIST CSF self-assessment was describing how the company makes and integrates risk decisions. The CMMI Level 3 request was describing how repeatable and well-defined the company's security processes are. The SOC 2 was describing whether a specific set of controls operated effectively over a window. Three credible artifacts, three different axes, and a board question that assumed they all collapse into one number. They do not, and the most useful thing an assessor can do in the first conversation is explain why.
We scoped a four-week maturity assessment. The taxonomy was NIST CSF 2.0, all six functions and their categories. The scoring scale was a documented 1-to-5 capability model adapted from CMMI. The evidence base was 22 interviews, a document review of 60-plus policies and runbooks, and configuration evidence pulled from their identity provider, cloud accounts, and endpoint platform. The deliverable was a current-state radar showing a weighted average of 2.6, a target-state radar at 3.4 set deliberately below the theoretical maximum, and an 18-month roadmap with eleven initiatives, each costed and assigned to an owner.
The board got its number. More importantly, it got the context that makes a number safe to act on: what the scale means, why the target is 3.4 and not 5, which three gaps move the score the most per dollar, and what the company should deliberately choose not to fix this year. Below is the full picture of the two models, how they differ, when each one is the right tool, how the assessment actually runs, and what the report has to contain.
Section 1
"Maturity" Is a Confused Word: Two Models, Two Different Axes
The reason maturity conversations go sideways is that the word carries two distinct meanings that nobody separates out loud. The first meaning is coverage and integration: do you have the full set of capabilities a modern security program needs, and are they wired into how the business makes decisions? The second meaning is process discipline: for the things you do, are they ad hoc and personality-dependent, or are they documented, repeatable, measured, and continuously improved? These are independent properties. You can have one without the other.
Picture two companies. Company A has every capability a framework asks for - vulnerability management, detection, incident response, third-party risk, identity governance - but each one lives in a single engineer's head, runs differently every time, and produces no metrics. Company B has only four capabilities, but each one is documented, owned, measured monthly, and improved on a schedule. On a coverage axis, A scores higher. On a process-discipline axis, B scores higher. Which one is "more mature" depends entirely on which axis the person asking has in mind, and they almost never say.
NIST CSF lives primarily on the coverage-and-integration axis. Its Tiers describe how thoroughly an organization has integrated cybersecurity risk management into its decision-making and how adaptive that integration is. CMMI lives on the process-discipline axis. Its capability levels describe how well-defined, managed, and optimized a given process is, independent of which processes you have chosen to run. The two are complementary, which is exactly why so many real assessments use the CSF taxonomy for the what and a CMMI-style scale for the how well.
Getting this distinction right at the start of an engagement is not pedantry. It determines what you measure, how you score, who the report is for, and what "improvement" even means. A roadmap that improves coverage (adding capabilities) looks completely different from a roadmap that improves discipline (hardening the capabilities you already have). Most mid-market organizations need some of both, in a deliberate ratio that the assessment is supposed to reveal.
Section 2
NIST CSF 2.0: Six Functions, Four Tiers, and the Tier Trap
The NIST Cybersecurity Framework, updated to version 2.0, organizes cybersecurity outcomes into six Functions. The headline change in 2.0 was the addition of Govern alongside the five functions that have been there since the start. The Functions are the top of a hierarchy: each contains Categories, and each Category contains Subcategories that state specific outcomes. The six Functions are the backbone of the framework and the natural taxonomy for any coverage assessment.
Govern covers organizational context, risk management strategy, roles and responsibilities, policy, oversight, and cybersecurity supply chain risk management. It is the function that ties cybersecurity to enterprise risk, and it was elevated to a Function of its own in 2.0 precisely because so many programs had strong technical controls and weak governance. Identify covers asset management, risk assessment, and improvement, the lessons-learned loop that feeds back into the rest of the program (understanding the business context moved to Govern in 2.0). Protect covers identity management, authentication and access control, awareness and training, data security, platform security, and technology infrastructure resilience. Detect covers continuous monitoring and the analysis that finds adverse events. Respond covers incident management, analysis, mitigation, and reporting and communication. Recover covers executing the recovery plan to restore affected assets and operations, and communicating during recovery.
Separately from the Functions, the framework defines four Tiers: Tier 1 Partial, Tier 2 Risk Informed, Tier 3 Repeatable, and Tier 4 Adaptive. In CSF 2.0 the Tiers characterize the rigor of an organization's cybersecurity risk governance and cybersecurity risk management practices, including how it manages third-party and supply-chain risk. CSF 1.1 framed the same idea across three dimensions: the risk management process, the integrated risk management program, and external participation (how the organization shares and consumes threat and supply-chain risk information). A Tier 1 organization handles risk reactively and in isolation. A Tier 4 organization adapts its practices based on lessons learned and predictive indicators and actively participates in a broader ecosystem of risk sharing.
The Tier trap: NIST states plainly that the Tiers do not represent maturity levels and that progression to higher Tiers is encouraged only when it would reduce cybersecurity risk and be cost-effective. Yet the most common assessment error we see is treating Tier 4 Adaptive as the goal every organization must reach. A 60-person company that makes sound, documented, repeatable risk decisions is appropriately at Tier 3, and pushing it toward Tier 4 often means spending on threat-intelligence integration and predictive tooling that delivers no risk reduction for its profile. The right Tier is the one that matches your risk, not the highest one on the page.
Because the Tiers are coarse - four points, and applied to the whole organization rather than to each control area - they are poor at showing where a program is strong and where it is weak. An organization that is excellent at Protect and weak at Recover gets averaged into a single Tier that hides the imbalance. This is the central limitation of using NIST CSF Tiers alone for a maturity assessment, and it is exactly the gap that a per-category capability score fills.
What NIST CSF does superbly is define the what. Its Subcategories are a comprehensive, widely recognized checklist of cybersecurity outcomes, mapped to other standards, understood by customers and auditors, and stable enough to anchor a multi-year program. This is why nearly every credible maturity assessment uses the CSF as its control taxonomy even when it borrows its scoring scale from somewhere else. The framework's own Profiles mechanism - a Current Profile versus a Target Profile - is essentially a gap assessment, and it pairs naturally with a numeric capability score per category.
Section 3
CMMI: Five Capability Levels and Where CMMC Borrowed Them
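CMMI, the Capability Maturity Model Integration, grew out of software process improvement work at Carnegie Mellon's Software Engineering Institute and became the canonical way to describe how disciplined a process is. Its core idea is a five-level scale that applies to any process, security included. The levels describe a progression from chaos to continuous improvement, and the language has become the lingua franca of maturity scoring even for assessors who never formally appraise against CMMI itself. Strictly, the five named stages are CMMI's organizational maturity levels, and CMMI defines a separate capability-level scale for individual process areas. Security assessors typically borrow the five maturity-level names and apply them per process or category, which is the sense in which this article uses "capability level."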
- Level 1 Initial: the process exists but is ad hoc and unpredictable. Outcomes depend on individual heroics. It works when the right person is in the room and fails when they are not.
- Level 2 Managed: the process is planned and executed according to policy, with resources, responsibilities, and basic project-level discipline. It is repeatable within a team but may differ across the organization.
- Level 3 Defined: the process is documented as an organizational standard, tailored consistently from that standard, and understood the same way across teams. This is the level most organizations target for their core security processes.
- Level 4 Quantitatively Managed: the process is measured and controlled using metrics, with statistical understanding of performance and variation.
- Level 5 Optimizing: the process is continuously improved based on quantitative feedback and deliberate experimentation.
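The five levels form an ordered scale, which is why they translate cleanly into code. The sketch below is a minimal Python illustration. It assumes scores are stored as integers from 1 to 5. The `CapabilityLevel` and `describe` names are hypothetical, and nothing here comes from CMMI itself.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    """Five-level scale borrowed from CMMI and applied per process or category."""

    INITIAL = 1                 # ad hoc; outcomes depend on individual heroics
    MANAGED = 2                 # planned, resourced, repeatable within a team
    DEFINED = 3                 # documented organizational standard, followed consistently
    QUANTITATIVELY_MANAGED = 4  # measured and controlled against metrics
    OPTIMIZING = 5              # continuously improved from quantitative feedback


def describe(score: int) -> str:
    """Render a stored integer score as a label such as 'Level 3 Defined'."""
    level = CapabilityLevel(score)  # raises ValueError for anything outside 1..5
    return f"Level {level.value} {level.name.replace('_', ' ').title()}"


print(describe(3))  # Level 3 Defined
print(describe(4))  # Level 4 Quantitatively Managed
```

Using an `IntEnum` keeps the levels comparable (`DEFINED > MANAGED`) while rejecting out-of-range scores at the point they are recorded.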
The strength of the CMMI scale is that it gives precise, defensible meaning to a number. "We are a 3 on incident response" has a definition: there is a documented organizational standard, it is followed consistently, and people are trained on it. "We are a 4" means you also measure mean time to detect and respond and manage those metrics against targets. This precision is what makes a CMMI-style score survive scrutiny. When a customer's security team challenges a maturity claim, an answer anchored to capability-level definitions and evidence holds up; a green dot on a heat map does not.
The defense industrial base lived through the most consequential application of this idea. CMMC, the Cybersecurity Maturity Model Certification, took a control set (largely from NIST SP 800-171) and wrapped it in a tiered construct. Its original version paired the practices at each level with process-maturity requirements, which CMMC 2.0 later removed when it consolidated five levels into three. The lesson the wider market took from CMMC is the one this whole article is built around: a control taxonomy answers the what, a maturity scale answers the how well, and you need both. You can run the same pattern by layering a CMMI-style 1-to-5 capability score on top of any control set - NIST CSF, ISO 27001 Annex A, the CIS Controls.
Section 4
Head to Head: When NIST CSF, When CMMI, When Both
The choice between the two models is not really a choice in most engagements, because they answer different questions and the best assessments use both. But the emphasis shifts depending on who is asking and why. The decision comes down to the audience, the driver, and what you intend to do with the result.
Reach for NIST CSF as the primary lens when the driver is coverage and communication: a board that wants to know whether the program is complete, a customer security questionnaire that references the framework directly, a regulator or insurer that expects CSF alignment, or a leadership team that needs a shared vocabulary across functions. CSF is recognized, mapped to nearly everything, and easy for non-specialists to follow. Its Functions translate cleanly into a one-slide story.
Reach for the CMMI scale as the primary lens when the driver is process improvement and you already know which capabilities you have: a security team that wants to harden how it runs the things it already does, an organization preparing for a regime that scores capability explicitly, or a leadership team that has heard "we have a tool for that" too many times and wants to know whether the tool is actually operated with any discipline. CMMI is unforgiving about the difference between owning a capability and running it well, which is exactly the question process-improvement work needs answered.
For most mid-market organizations, the right answer is the hybrid: NIST CSF 2.0 supplies the taxonomy, every category is scored on a documented 1-to-5 capability scale derived from CMMI, and the report shows both a coverage view (which functions are thin) and a discipline view (which capabilities are personality-dependent). This is the model that produces a board number that is also defensible to a customer's security reviewer, because the number has a named scale and per-category evidence behind it.
| Dimension | NIST CSF 2.0 | CMMI capability scale |
|---|---|---|
| Primary question | What do you do, and is it integrated? | How well do you do it? |
| Scale | 4 Tiers, whole-organization | 5 Levels, per-process |
| Best audience | Board, customers, insurers, regulators | Security team, process owners |
| Strength | Coverage, recognition, mappings | Precise, defensible per-area scores |
| Weakness alone | Tiers too coarse to show where you are weak | Says nothing about whether coverage is complete |
| In the hybrid | Supplies the control taxonomy | Supplies the per-category score |
Section 5
How a Maturity Assessment Actually Runs, Step by Step
A maturity assessment is not a questionnaire someone fills in over coffee. A defensible one is an evidence-gathering exercise that produces a score per control area, each score anchored to artifacts a skeptical reviewer could verify. The mechanics are consistent across frameworks, and understanding them is the difference between a report that drives investment and a spreadsheet that gets filed and forgotten.
Step one, scope and scale. Agree the taxonomy (for most engagements, the six CSF functions and their categories), the scoring scale (a documented 1-to-5 capability model with explicit anchor definitions for each level), and the boundary (which entities, which environments, which business units). Write down what a 3 means for each category before scoring anything, so the scores are reproducible rather than impressionistic.
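Step one can be enforced mechanically: no category is scored until every level has a written anchor. The sketch below is a minimal illustration. It assumes anchors are kept as plain text keyed by CSF 2.0 category identifier (`DE.CM` is Continuous Monitoring, `RC.RP` is Incident Recovery Plan Execution). The anchor wording and the `missing_anchors` helper are illustrative and are not drawn from NIST or CMMI.

```python
LEVELS = range(1, 6)  # the documented 1-to-5 capability scale

# Hypothetical anchor text; a real engagement writes one entry per level
# for every category in scope before any scoring starts.
ANCHORS: dict[str, dict[int, str]] = {
    "DE.CM": {
        1: "Monitoring exists but depends on one engineer",
        2: "Alert triage is repeatable within the SOC team",
        3: "Documented detection standard followed across teams",
        4: "Detection coverage and time to detect tracked against targets",
        5: "Detection content tuned continuously from metrics",
    },
    "RC.RP": {
        1: "Restores happen ad hoc",
        2: "Backup restores follow a team runbook",
        3: "Organization-wide recovery standard, tested on schedule",
    },
}


def missing_anchors(anchors: dict[str, dict[int, str]]) -> dict[str, list[int]]:
    """Return, per category, the levels that still lack a written anchor."""
    gaps = {}
    for category, definitions in anchors.items():
        missing = [lvl for lvl in LEVELS if not definitions.get(lvl, "").strip()]
        if missing:
            gaps[category] = missing
    return gaps


print(missing_anchors(ANCHORS))  # {'RC.RP': [4, 5]}
```

An empty result is the signal that scoring can begin. Anything else means a score in that category would be an opinion rather than a reproducible judgment.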
Step two, evidence collection. Three streams run in parallel: interviews with the people who own each area, document review of policies, standards, and runbooks, and technical evidence pulled from the systems themselves (identity provider configuration, cloud security posture, endpoint coverage, logging and detection content, backup and recovery test records). The technical stream is what separates a real assessment from a self-reported one. A claim that "we have MFA everywhere" is scored against the identity provider's conditional access policies, not against the interviewee's recollection.
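Scoring a claim against configuration rather than recollection can be as simple as checking an exported policy list. The sketch below is minimal and assumes a hypothetical, simplified export format. The field names `enabled`, `users`, `apps`, and `requires_mfa` are placeholders, not any identity provider's real schema, and must be mapped to whatever your platform actually exports.

```python
# Hypothetical, simplified export of conditional access policies.
# Field names are placeholders; map them to your identity provider's real export.
POLICIES = [
    {"name": "Require MFA for admins", "enabled": True,
     "users": "admins", "apps": "all", "requires_mfa": True},
    {"name": "Require MFA for all users", "enabled": False,  # drafted, never switched on
     "users": "all", "apps": "all", "requires_mfa": True},
]


def mfa_everywhere(policies: list[dict]) -> bool:
    """True only if some enabled policy requires MFA for all users on all apps."""
    return any(
        p["enabled"] and p["requires_mfa"] and p["users"] == "all" and p["apps"] == "all"
        for p in policies
    )


def evidence_note(policies: list[dict]) -> str:
    """Turn the check into a sentence for the written score justification."""
    if mfa_everywhere(policies):
        return "Configuration supports the 'MFA everywhere' claim."
    enforced = [p["name"] for p in policies if p["enabled"] and p["requires_mfa"]]
    return "Claim not supported; MFA enforced only by: " + (", ".join(enforced) or "no policy")


print(evidence_note(POLICIES))
# Claim not supported; MFA enforced only by: Require MFA for admins
```

Real conditional access policies also carry exclusions, report-only modes, and per-application scoping, which this sketch deliberately ignores. An exclusion list alone can turn "all users" into "most users."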
Step three, scoring against anchors. Each category gets a capability score justified in writing against the scale definitions and the evidence. A category sits at Level 2 if it is repeatable within a team but not documented as an organizational standard; it reaches Level 3 only when the documented standard exists, is followed consistently, and people are trained on it. The justification text matters as much as the number, because that is what makes the score auditable.
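The Level 2 versus Level 3 rule in step three is a cumulative check: a level counts only if every requirement beneath it is also met. The sketch below is minimal and assumes the evidence for one category has already been reduced to yes/no findings. The `Evidence` flags and the `score` function are hypothetical; in a real assessment each flag would cite a specific artifact.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical yes/no findings an assessor records for one category."""
    repeatable_in_team: bool = False
    documented_standard: bool = False
    followed_consistently: bool = False
    staff_trained: bool = False
    metrics_against_targets: bool = False
    improved_from_metrics: bool = False


def score(e: Evidence) -> tuple[int, str]:
    """Return (level, justification); each level requires every level below it."""
    if not e.repeatable_in_team:
        return 1, "Ad hoc: no evidence the process repeats reliably."
    if not (e.documented_standard and e.followed_consistently and e.staff_trained):
        return 2, "Repeatable within a team, but not an organizational standard."
    if not e.metrics_against_targets:
        return 3, "Documented standard, followed consistently, staff trained."
    if not e.improved_from_metrics:
        return 4, "Managed against metrics and targets."
    return 5, "Continuously improved from quantitative feedback."


level, why = score(Evidence(repeatable_in_team=True, documented_standard=True))
print(level, why)  # 2 Repeatable within a team, but not an organizational standard.
```

The function returns the justification alongside the number because the text is what makes the score auditable. Note that metrics without a documented standard still score 2, since levels cannot be skipped.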
Step four, current versus target. The current-state radar is only half the deliverable. The target-state radar sets a deliberate goal per category, informed by the organization's risk profile, customer and regulatory expectations, and cost-effectiveness. Target is rarely a uniform 5; it is typically a 4 on the categories where the business risk justifies the investment and a 3 on the rest. Setting the target is a risk decision, made with leadership, not an assessor's unilateral call.
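The default described in step four can be sketched as a small function: 4 where leadership judges the business risk justifies it, and 3 elsewhere. The function-level scores and the choice of high-risk functions below are invented for illustration. The rule that a target never sits below the current score is an assumption of this sketch, not something the method requires.

```python
def target_profile(current: dict[str, float], high_risk: set[str],
                   high_target: float = 4.0, default_target: float = 3.0) -> dict[str, float]:
    """Target 4 on leadership-designated high-risk areas and 3 elsewhere.

    Assumption of this sketch: an area already above its default target keeps
    its current score as the target instead of being planned downward.
    """
    return {
        area: max(score, high_target if area in high_risk else default_target)
        for area, score in current.items()
    }


current = {"GV": 2.5, "ID": 3.0, "PR": 3.5, "DE": 2.0, "RS": 2.0, "RC": 1.5}
print(target_profile(current, high_risk={"DE", "RS"}))
# {'GV': 3.0, 'ID': 3.0, 'PR': 3.5, 'DE': 4.0, 'RS': 4.0, 'RC': 3.0}
```

The `high_risk` set is the leadership input. The function only applies that decision consistently; it does not make it.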
Step five, gap to roadmap. Every gap between current and target becomes a costed initiative with an owner, an effort estimate, a dependency map, and a sequence. The roadmap is ordered by risk reduction per dollar, not by category number. This is the step that converts a score into a plan, and it is the part of the deliverable that leadership actually uses.
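Ordering by risk reduction per dollar has one complication: dependencies. The sketch below is greedy. It always schedules the best-value initiative whose prerequisites are already scheduled, and it assumes leadership has agreed a relative risk-reduction estimate per initiative. The initiative names, owners, costs, and risk points are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Initiative:
    name: str
    owner: str
    cost_usd: int
    risk_reduction: float  # hypothetical relative points agreed with leadership
    depends_on: set[str] = field(default_factory=set)

    @property
    def value_per_dollar(self) -> float:
        return self.risk_reduction / self.cost_usd


def sequence(initiatives: list[Initiative]) -> list[Initiative]:
    """Order by risk reduction per dollar, never before an initiative's dependencies."""
    pending = {i.name: i for i in initiatives}
    done: set[str] = set()
    ordered: list[Initiative] = []
    while pending:
        ready = [i for i in pending.values() if i.depends_on <= done]
        if not ready:
            raise ValueError("circular or missing dependency among: " + ", ".join(pending))
        best = max(ready, key=lambda i: i.value_per_dollar)
        ordered.append(best)
        done.add(best.name)
        del pending[best.name]
    return ordered


plan = [
    Initiative("Recovery test program", "IT ops lead", 40_000, 6.0),
    Initiative("Detection content and monthly review", "Platform lead", 28_000, 8.0,
               {"Central log pipeline"}),
    Initiative("Central log pipeline", "Platform lead", 20_000, 5.0),
    Initiative("Vendor risk tiering", "GRC lead", 10_000, 2.0),
]
print([i.name for i in sequence(plan)])
# ['Central log pipeline', 'Detection content and monthly review', 'Vendor risk tiering', 'Recovery test program']
```

Detection content has the best ratio, but it is scheduled second because it depends on the log pipeline.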
Section 6
The Five Scoring Mistakes That Make a Maturity Report Worthless
Most maturity assessments that fail to drive any change fail for the same handful of reasons. None of them is exotic, and all of them are avoidable if the assessment is designed properly at the scoping stage.
6.1 Scoring without anchor definitions
If "what counts as a 3" is not written down before scoring, every score is an opinion and the assessment cannot be reproduced or defended. Anchor definitions per level, per category, are the foundation. Without them, two assessors score the same program a full level apart, and the board has no basis to trust the number.
6.2 Self-assessment inflation
People score what they intend, not what they evidence. Self-scored maturity runs roughly a full level high on the areas hardest to prove: governance, third-party risk, detection engineering, and recovery testing. The fix is evidence, not optimism. A score of 4 on incident response requires the metrics that define Level 4, pulled from the ticketing and detection systems, not a confident assertion in an interview.
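Inflation becomes visible when the interview self-score is recorded next to the evidence-backed score for the same category. The sketch below is minimal. The category identifiers are real CSF 2.0 categories (`GV.SC` supply chain risk management, `DE.AE` adverse event analysis, `PR.AA` identity management and access control, `RC.RP` recovery plan execution), but the scores are invented for illustration, not measured data.

```python
from statistics import mean


def inflation(self_scores: dict[str, float], evidenced: dict[str, float],
              threshold: float = 1.0) -> tuple[float, list[str]]:
    """Mean self-minus-evidence gap, plus categories inflated by at least `threshold`."""
    shared = self_scores.keys() & evidenced.keys()
    deltas = {cat: self_scores[cat] - evidenced[cat] for cat in shared}
    if not deltas:
        return 0.0, []
    flagged = sorted(cat for cat, d in deltas.items() if d >= threshold)
    return mean(deltas.values()), flagged


# Invented scores for illustration: interview self-score vs evidence-backed score.
self_scored = {"GV.SC": 3.0, "DE.AE": 3.0, "PR.AA": 3.0, "RC.RP": 3.0}
evidence_backed = {"GV.SC": 2.0, "DE.AE": 1.5, "PR.AA": 3.0, "RC.RP": 2.0}
print(inflation(self_scored, evidence_backed))
# (0.875, ['DE.AE', 'GV.SC', 'RC.RP'])
```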
6.3 Treating the maximum as the target
A target of 5 everywhere is almost always wrong. Level 5 means continuous, metric-driven optimization, which is expensive to operate and justified only where the risk warrants it. A target profile that sets 4 on the few categories that carry the most business risk and 3 on the rest is more honest and far more achievable. Aiming uniformly high produces a roadmap nobody funds.
Practitioner note: The most valuable line in a maturity report is often the one that says what not to do. "We recommend Recover stays at target 3, not 4, this cycle - the cost of quantitative recovery metrics is not justified by your current RTO commitments, and the budget is better spent moving Detect from 2 to 3." A report that only ever recommends spending more is easy to write and easy to ignore. A report that makes deliberate trade-offs gets acted on.
6.4 Averaging away the shape
A single overall score hides the imbalance that matters most. A program averaging 2.8 with every category near 2.8 is in a very different position from one averaging 2.8 with Protect at 4 and Recover at 1. The first needs broad uplift; the second has a specific, dangerous hole. Always lead with the radar shape and the lowest spokes, and treat the average as a headline, not the analysis.
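Two programs can share an average and have very different shapes, and this is easy to demonstrate. The sketch below is minimal and reports the headline alongside the spread and the lowest spokes. The scores are illustrative. The optional weights are one way to produce a weighted average like the 2.6 in the opening example, and they are an assumption of this sketch.

```python
from statistics import mean, pstdev
from typing import Optional


def summarize(scores: dict[str, float],
              weights: Optional[dict[str, float]] = None,
              lowest_n: int = 2) -> dict:
    """Headline average plus the shape information the average hides."""
    if weights is None:
        headline = mean(scores.values())
    else:  # weighted average; weights must cover every scored area
        total = sum(weights[area] for area in scores)
        headline = sum(scores[area] * weights[area] for area in scores) / total
    lowest = sorted(scores, key=scores.get)[:lowest_n]
    return {
        "average": round(headline, 2),
        "spread": round(pstdev(scores.values()), 2),  # population standard deviation
        "lowest_spokes": [(area, scores[area]) for area in lowest],
    }


even = {"GV": 2.8, "ID": 2.8, "PR": 2.8, "DE": 2.8, "RS": 2.8, "RC": 2.8}
lopsided = {"GV": 3.0, "ID": 3.0, "PR": 4.0, "DE": 2.8, "RS": 3.0, "RC": 1.0}
print(summarize(even))
# {'average': 2.8, 'spread': 0.0, 'lowest_spokes': [('GV', 2.8), ('ID', 2.8)]}
print(summarize(lopsided))
# {'average': 2.8, 'spread': 0.89, 'lowest_spokes': [('RC', 1.0), ('DE', 2.8)]}
```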
6.5 No path from score to roadmap
A score with no costed, sequenced, owned remediation plan is trivia. The board cannot act on "you are a 2.4." It can act on "moving from 2.4 to 3.2 takes eleven initiatives over eighteen months, the first three cost roughly USD 90,000 combined and close the highest-risk gaps, and here are the owners and dates." The roadmap is the deliverable; the score is just its title.
Section 7
What a Defensible Maturity Deliverable Contains
The maturity report serves two audiences at once: the board and executives who need a clear narrative and a number they can act on, and the security and IT teams who have to execute the roadmap and will be challenged on the scores by customers and auditors. A deliverable that serves only one audience fails the other. The structure below serves both.
An executive summary, two pages, plain English. The headline score, the scale it sits on stated explicitly, the radar shape, the three findings that move the score the most, and the total cost and timeline of the recommended roadmap. A board member should be able to read this and explain the program's position to a peer without opening the appendix.
The scoring scale and methodology. The anchor definitions for each capability level, the taxonomy used, the evidence sources, and the boundary. This is what makes every downstream number defensible. When a customer's security reviewer asks how you arrived at a 3 on access control, this section is the answer.
The per-category scorecard with evidence. One row per CSF category: current score, target score, the written justification tied to evidence, and the key gaps. This is the working heart of the report and the part the security team lives in. A reviewer should be able to trace any score back to a specific artifact.
Current and target radars. The visual that makes the shape legible to non-specialists. Current state and agreed target on the same chart, with the gap visible at a glance.
The costed roadmap. The sequenced list of initiatives, each with the gap it closes, the score movement it produces, the owner, the effort, the dependencies, and the cost. Ordered by risk reduction per dollar. This is the artifact leadership uses to fund the program, and it is the difference between an assessment that changes the budget and one that decorates a shared drive.
What turned the board question into a decision: For the SaaS company in the opening, the line that landed was not the 2.6 current score. It was a single roadmap row: "Detect moves from 2.0 to 3.0 by standing up documented detection content and a monthly review, owned by the platform lead, roughly USD 28,000 over the first quarter, and it closes the gap a recent customer flagged in their security review." The board funded it in the same meeting, because the score had been turned into a specific, owned, costed decision rather than a grade.
Section 8
Delivery Models, Timeline, and Real Pricing
A maturity assessment is sized by the breadth of the taxonomy, the number of entities and environments in scope, and the depth of evidence required. The driver behind the engagement shapes the model, and there are four common ones.
Model 1: Board-driven baseline
Triggered by a board or executive request for a maturity number and a forward plan. NIST CSF taxonomy, hybrid CMMI-style scoring, current and target radars, and an 18 to 24 month roadmap. Timeline 3 to 5 weeks. The deliverable is built for the boardroom first and the security team second.
Model 2: Customer or sales-driven validation
Triggered by enterprise customers asking for an independent maturity attestation as part of vendor due diligence. The emphasis is on defensible per-category scores and an independent assessor signature that a customer security team will accept in place of a self-assessment. Timeline 3 to 4 weeks.
Model 3: Annual re-assessment and trend line
An annual cadence that re-scores against the same scale and shows the movement since the prior year. The trend line - current score rising toward target, the radar filling out - is the artifact the board and customers value most, because it demonstrates a program that is actually improving rather than one that buys a snapshot and shelves it. Timeline 2 to 4 weeks, typically at 60 to 70% of the baseline cost.
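Each re-assessment scores against the same scale, so the trend line reduces to a per-category difference plus the remaining gap to target. The sketch below is minimal and uses invented scores. The `trend` helper is hypothetical.

```python
def trend(prior: dict[str, float], current: dict[str, float],
          target: dict[str, float]) -> list[tuple[str, float, str]]:
    """Per-category movement since the prior year and the remaining gap to target."""
    rows = []
    for area, goal in target.items():
        moved = round(current[area] - prior[area], 2)
        gap = round(goal - current[area], 2)
        rows.append((area, moved, "at target" if gap <= 0 else f"{gap} to go"))
    return rows


prior = {"DE": 2.0, "RS": 2.0, "RC": 1.5}
current = {"DE": 3.0, "RS": 2.5, "RC": 1.5}
target = {"DE": 3.0, "RS": 3.0, "RC": 3.0}
for row in trend(prior, current, target):
    print(row)
# ('DE', 1.0, 'at target')
# ('RS', 0.5, '0.5 to go')
# ('RC', 0.0, '1.5 to go')
```

A spoke with no movement, like RC here, is the row worth raising first.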
Model 4: Targeted deep-dive
A focused assessment of one or two functions (commonly Detect and Respond, or Govern) at a higher depth, for organizations that already know where their gap is and want a precise capability score and remediation plan for that area rather than a full-program sweep. Timeline 2 to 3 weeks.
| Scope | Typical organization | Timeline | Indicative cost |
|---|---|---|---|
| Targeted deep-dive (1-2 functions) | Knows the gap, wants depth | 2-3 weeks | USD 8,000 - 14,000 |
| Full-program baseline (all 6 functions) | Mid-market, single environment | 3-5 weeks | USD 12,000 - 22,000 |
| Full-program, multi-entity | Multi-BU or multi-cloud | 5-7 weeks | USD 22,000 - 35,000 |
| Annual re-assessment | Existing baseline client | 2-4 weeks | 60-70% of baseline |
The cost of the assessment is small relative to what it redirects. A mid-market security budget is frequently spent on the categories that are already strong, because those are the ones the team is comfortable with, while the thin spokes - usually Detect, Respond, and Recover - stay thin. A maturity assessment that reorders the spend toward the gaps that carry the most risk pays for itself many times over in the first budget cycle, which is the real argument for running one before the next planning round rather than after.
FAQ
Six Questions We Get on Every Maturity Assessment Call
Is a NIST CSF Tier the same as a CMMI maturity level?
No. A NIST CSF Implementation Tier (1 through 4) describes how an organization manages and integrates cybersecurity risk across the whole organization, and NIST explicitly states the Tiers are not maturity levels. A CMMI capability level (1 through 5) describes how disciplined a specific process is. They sit on different axes and have different numbers of points. You can be Tier 3 on the CSF scale and still have individual processes scoring anywhere from Level 1 to Level 4 on the CMMI scale. Most assessments avoid the confusion by using the CSF as the control taxonomy and applying a separate, documented CMMI-style 1-to-5 capability score to each category.
Our board wants one maturity number. Is a single score even meaningful?
A single number is fine as a headline as long as it sits on a named, documented scale and is backed by a per-category breakdown. The danger is leading with the average and stopping there, because the average hides the shape. A 2.6 with every category near 2.6 means broad uplift is needed; a 2.6 with Protect at 4 and Recover at 1 means there is a specific dangerous hole. We give the board the single number it asked for and immediately follow it with the radar so the shape is visible. The number opens the conversation; the radar and roadmap are what the board actually decides on.
Should our target maturity be 5 across the board?
Almost never. Level 5 means continuous, metric-driven optimization of a process, which is expensive to operate and justified only where the business risk warrants it. A realistic target profile sets 4 on the few categories that carry the most risk and 3 on the rest. Setting the target is a risk and cost decision made with leadership, not an assessor picking the top of the scale. A roadmap that aims for 5 everywhere is a roadmap nobody funds, which means the assessment changes nothing.
We already have SOC 2 and ISO 27001. Do we still need a maturity assessment?
They answer different questions. A SOC 2 report is an attestation that a defined set of controls was suitably designed at a point in time (Type 1) or also operated effectively over a period (Type 2). An ISO 27001 certification confirms that an information security management system meets the standard's requirements. A maturity assessment measures how well-developed and disciplined those controls and the broader program are on a graduated scale, and it produces a forward roadmap rather than a pass or fail. Many organizations with clean SOC 2 reports score 2 to 3 on a maturity scale, because the audit confirms the control exists while the maturity assessment reveals it is repeatable but not yet measured or optimized. The two are complementary: the attestation or certification proves the control runs, and the maturity assessment shows how to make it run better.
Can we just self-assess with a spreadsheet instead of paying for an assessment?
You can, and it is a reasonable way to get a rough internal picture. The limitations are two. First, self-scored maturity inflates by roughly a full level on the hardest-to-evidence areas because people score their intentions, not their evidence. Second, customers, insurers, and acquirers increasingly want independent validation, and a self-assessment spreadsheet does not satisfy a due-diligence request the way a signed third-party assessment does. If the audience is purely internal and you have an honest, evidence-disciplined team, a self-assessment is a fine starting point. If the audience is external, or if the number will drive significant budget, independent assessment is worth the cost.
How often should we re-run the assessment?
Annually is the right cadence for most organizations, aligned to the budget planning cycle so the roadmap feeds directly into the next year's spend. Re-scoring against the same scale produces the trend line - the movement from current toward target - which is the single most persuasive artifact for a board or a customer, because it shows a program that is genuinely improving rather than one that bought a one-time snapshot. Organizations going through rapid change (a major acquisition, a shift to a new cloud platform, a significant headcount jump) benefit from an interim re-assessment of the affected functions rather than waiting the full year.

Alexander Sverdlov
Founder of Atlant Security. Author of 2 information security books, cybersecurity speaker at the largest cybersecurity conferences in Asia and a United Nations conference panelist. Former Microsoft security consulting team member, external cybersecurity consultant at the Emirates Nuclear Energy Corporation.