Riley v. Commissioner, SSA: Enforcing a Meaningful “Supportability” Explanation for Medical Opinions in Social Security Disability Cases
I. Introduction
The Tenth Circuit’s order and judgment in Riley v. Commissioner, SSA, No. 25‑5007 (10th Cir. Nov. 25, 2025), though formally designated as non‑precedential, delivers a pointed clarification of what the Social Security Administration’s (SSA’s) “supportability” requirement actually demands of Administrative Law Judges (ALJs) when they rely on medical opinions—especially psychological opinions based on standardized testing—to craft a residual functional capacity (RFC).
The case involves Plaintiff–Appellant Colton James Riley, a young man who suffered a severe traumatic brain injury (TBI) at age 21 in a 2018 car accident. His third application for disability benefits turned on whether he was disabled on or before March 31, 2020. An ALJ acknowledged multiple serious mental impairments, including major depressive disorder, PTSD, anxiety disorder, and an unspecified neurocognitive disorder, all associated with the TBI. Nonetheless, relying heavily on the report of consultative psychologist Dr. Rebecca Fisher, the ALJ concluded that Riley could perform simple work and therefore was not disabled.
On appeal, the Tenth Circuit did not question the ALJ’s general authority to rely on Dr. Fisher’s opinion. Instead, it focused on a narrower but critical legal issue: whether the ALJ adequately explained why Dr. Fisher’s opinion was “supported” by her own examination findings, as required by 20 C.F.R. § 404.1520c. The panel held that the ALJ’s explanation was legally insufficient, because it failed to engage with highly salient test results—Wechsler Memory Scale scores in the first and second percentiles—that appeared in tension with Dr. Fisher’s relatively mild characterization of some of Riley’s functional limitations.
The decision reinforces the principle that courts must be able to “follow the adjudicator’s reasoning” in weighing medical opinions, and that simply saying an opinion is “supported by” examination findings is not enough where those findings are detailed and seemingly inconsistent with the ALJ’s (or the examiner’s) conclusion. This has direct implications for how ALJs articulate their reasoning under the SSA’s current medical‑opinion framework and for how claimants and advocates frame challenges to RFC determinations in mental‑impairment cases.
II. Summary of the Opinion
A. Factual and Procedural Background
In May 2018, Riley sustained a severe TBI in a motor vehicle accident at age 21. He has since filed at least three claims for Social Security disability benefits. The appeal concerns his third application, which, for reasons the panel did not detail, turned on whether he was disabled on or before March 31, 2020 (a cutoff that appears to function as his “date last insured” for eligibility purposes).
The ALJ found that as of that date, Riley suffered from several severe impairments:
- Major depressive disorder
- Post-traumatic stress disorder (PTSD)
- Anxiety disorder
- Unspecified neurocognitive disorder
- Status post-traumatic brain injury in 2018 with residuals of shortness of breath with exertion and dizziness
The ALJ then assessed Riley’s RFC and concluded that he could:
“understand, remember, and carry out only simple instructions on a sustained basis in a work-related setting.”
Crucially, this RFC rested on the opinion of Dr. Rebecca Fisher, a psychologist who examined Riley in December 2018. Dr. Fisher opined that Riley had:
- A moderate limitation in his ability to understand and remember simple instructions
- A marked limitation in his ability to understand and remember detailed instructions
- A marked limitation in his ability to sustain concentration, persistence, and pace for detailed tasks
Based on this RFC, the ALJ concluded Riley could perform several simple, unskilled jobs existing in significant numbers in the national economy, including:
- Laundry sorter
- Hotel housekeeper
- Inspector and hand packager
The ALJ therefore found Riley not disabled as of March 31, 2020. The Appeals Council declined review, and the district court (N.D. Okla.) affirmed the denial of benefits. Riley appealed to the Tenth Circuit.
B. Issues on Appeal
The appeal centered on two alleged errors:
- Whether the ALJ satisfied the regulatory obligation under 20 C.F.R. § 404.1520c(b)(2) to explain how Dr. Fisher’s opinion was “supported” by her own examination findings.
- Whether the ALJ erred by not discussing a June 2022 report from examiner Kim Beair, MS, LPC, who performed additional memory testing and documented further deficits.
The Tenth Circuit resolved the first issue in Riley’s favor and determined that this error required remand. Because that was sufficient to overturn the agency decision, the panel declined to resolve the second issue but noted that Riley may raise it on remand.
C. Holding
Applying the familiar standard of review—that the court must determine whether the Commissioner’s factual findings are supported by substantial evidence and whether the correct legal standards were applied—the panel held:
- The ALJ failed to properly apply the legal standards governing evaluation of medical opinions, specifically the “supportability” requirement of 20 C.F.R. § 404.1520c.
- The ALJ’s conclusory statement that Dr. Fisher’s opinion was supported by “deficits in memory, [and] attention/concentration” did not suffice, given the detailed and strikingly poor test results in Dr. Fisher’s own report.
- Because a reviewing court could not “follow the adjudicator’s reasoning” from the test results to the conclusion of only moderate functional limitations regarding simple instructions, this amounted to reversible legal error.
The court therefore:
- Reversed the district court’s judgment.
- Remanded with instructions to vacate the agency’s decision.
- Directed the district court to remand the matter to the SSA for further proceedings consistent with its order and judgment.
Although the panel designated the order as non‑binding precedent (except under law of the case, res judicata, and collateral estoppel), it may be cited for its persuasive value under Fed. R. App. P. 32.1 and 10th Cir. R. 32.1.
III. Analysis
A. Precedents and Authorities Cited
1. Barnett v. Apfel (Standard of Review)
The court began by reciting the conventional Social Security standard of review from Barnett v. Apfel, 231 F.3d 687, 689 (10th Cir. 2000):
“We review the Commissioner’s decision to determine whether the factual findings are supported by substantial evidence and whether correct legal standards were applied.”
This dual inquiry—substantial evidence plus correct legal standards—is critical. Even when evidence in the record might be sufficient to uphold a decision, applying the wrong legal framework or ignoring mandatory rules in weighing evidence constitutes reversible error. In Riley, the court found error not in the sufficiency of the evidence, but in the failure to follow required legal standards in evaluating a medical opinion.
2. Reyes v. Bowen (Failure to Follow Rules in Weighing Evidence)
The panel invoked Reyes v. Bowen, 845 F.2d 242, 244 (10th Cir. 1988), for a key principle:
“There are specific rules of law that must be followed in weighing particular types of evidence in disability cases. Failure to follow these rules constitutes reversible error.”
This framed the case as one involving an error of law in how the ALJ weighed medical opinion evidence, not merely a disagreement about how to interpret the facts. Under Reyes, when SSA regulations or circuit precedent set out specific requirements for describing or justifying the treatment of certain evidence, violation of those requirements mandates reversal and remand.
3. Keyes‑Zachary v. Astrue (Requirement of Discernible Reasoning)
The Tenth Circuit also relied on Keyes-Zachary v. Astrue, 695 F.3d 1156 (10th Cir. 2012), to articulate the balance between demanding detailed explanations from ALJs and avoiding hypertechnical review. In Keyes-Zachary, the court observed:
“The more comprehensive the ALJ’s explanation, the easier our task; but we cannot insist on technical perfection. Still, we must be able to follow the adjudicator’s reasoning.”
Riley applies this principle in the context of the newer SSA regulations on medical opinions. The court acknowledged that ALJs need not provide exhaustive discussion in every case. However, when the record contains sophisticated, quantitative test results—such as Wechsler Memory Scale scores in the 1st and 2nd percentiles—that appear inconsistent with the severity ratings used (“moderate” vs. “marked”), a generic statement that an opinion is “supported by” examination findings does not allow the court to follow the reasoning.
4. 20 C.F.R. § 404.1520c (Supportability and Consistency)
The crucial regulatory framework is 20 C.F.R. § 404.1520c, which governs how ALJs evaluate medical opinions for claims filed on or after March 27, 2017, under the SSA’s “persuasiveness” regime (which replaced the old “treating physician rule”).
Two factors are “most important” in assessing medical opinions:
- Supportability – 20 C.F.R. § 404.1520c(c)(1)
- Consistency – 20 C.F.R. § 404.1520c(c)(2)
The regulation defines “supportability” as:
“the objective medical evidence and supporting explanations presented by a medical source . . . to support his or her medical opinion(s).”
As the panel paraphrased it, this effectively asks:
“Are the examiner’s opinions well explained, and are they supported by the medical evidence he or she relied upon?”
“Consistency,” by contrast, considers how the medical opinion lines up with the rest of the evidence (other medical opinions, clinical records, imaging studies, etc.). The panel noted that consistency was not at issue in this appeal; the focus was solely on supportability—whether Dr. Fisher’s narrative opinion was adequately grounded in her own testing and examination.
Section 404.1520c(b)(2) requires ALJs to explain in their decisions how they considered these factors (at least supportability and consistency) when determining how persuasive they find each medical opinion. Riley turns on the adequacy of that explanation.
B. The Court’s Legal Reasoning
1. The ALJ’s Supportability Finding and Its Defect
The ALJ characterized Dr. Fisher’s opinion as persuasive and addressed supportability as follows:
“Her opinion is supported by her consultative examination findings of deficits in memory, [and] attention / concentration . . . .”
On its face, this might appear to satisfy the regulation: the ALJ linked the opinion to specific domains of impairment (memory, attention, concentration) documented in the exam. However, the Tenth Circuit found this explanation insufficient under the circumstances of the case.
The problem lay in the nature and severity of the underlying test results and the tension between those results and Dr. Fisher’s characterization of Riley’s limitations. During her consultative examination, Dr. Fisher administered the Wechsler Memory Scale‑IV, which includes:
- Immediate Memory – recall of verbal and visual information immediately after presentation.
- Delayed Memory – recall of verbal and visual information after a 20–30 minute delay.
On these measures, Riley scored:
- Immediate Memory: 69 – “borderline range,” 2nd percentile.
- Delayed Memory: 70 – “borderline range,” 2nd percentile.
- Auditory Memory Overall: in the 1st percentile.
These scores indicate that Riley performed worse than approximately 98–99% of the normative population. Against this backdrop, Dr. Fisher nonetheless described only a moderate limitation in Riley’s ability to understand and remember simple instructions, while finding marked limitations for understanding/remembering detailed instructions and for sustaining concentration, persistence, and pace for detailed tasks.
The panel highlighted Riley’s argument that this combination of test scores and functional ratings raises a natural question:
“[H]ow being in the first or second percentile for memory tasks can yield an opinion that he has only ‘moderate limitation in his ability to understand and remember simple instructions,’ especially given Dr. Fisher’s further opinion that he has ‘marked limitation in his ability to understand and remember detailed instructions.’”
As the court observed, numerically, his immediate and delayed memory could only be “one percentile lower” than they already were, and his auditory memory was at the bottom (1st percentile). Riley’s quip—“one wonders what kind of score [Dr. Fisher] would have needed for the individual to [have] ‘marked’ [limitations] in remembering simple instructions”—underscored the conceptual tension.
The Tenth Circuit did not hold that Dr. Fisher’s opinion was necessarily wrong or that the ALJ had to reject it. Rather, the court emphasized that, in light of these striking test results:
- There was an apparent inconsistency between the raw data and the moderate/marked distinction drawn by Dr. Fisher.
- The ALJ’s generic statement that the opinion was “supported” by memory and concentration deficits failed to address that inconsistency in a way that allowed meaningful judicial review.
In other words, the ALJ’s analysis did not grapple with how the specific low percentile scores translated into only moderate limitations for simple instructions. Without further explanation, a reviewing court could not tell whether the ALJ understood those scores, discounted them, or simply failed to appreciate their implications for functional capacity.
2. Applying the Keyes‑Zachary Standard: Beyond Boilerplate
The Tenth Circuit’s reliance on Keyes‑Zachary is important. The court reiterated that it does not demand perfect, exhaustive explanations in every case:
“Certainly, ‘[t]he more comprehensive the ALJ’s explanation, the easier our task; but we cannot insist on technical perfection.’ Still, we must be able to ‘follow the adjudicator’s reasoning.’”
On one level, the ALJ’s explanation in Riley used language commonly seen in Social Security decisions: a short, conclusory statement that an opinion is supported by examination findings in certain domains (e.g., memory, concentration). In many cases, that might suffice because the underlying findings are straightforward or largely self‑explanatory.
However, where:
- The ALJ relies on a single medical source (here, Dr. Fisher) to justify a crucial RFC finding, and
- That source’s testing reveals extremely poor performance on standardized measures (1st–2nd percentile), yet
- The narrative opinion classifies some resulting functional limitations as merely moderate, and
- The ALJ simply asserts that the opinion is “supported” by those same findings without elaboration,
the court cannot reasonably be expected to “follow the adjudicator’s reasoning” without more. That is the precise line Riley draws: when the internal logic from test data to limitation rating is not facially obvious, the ALJ must say something substantive about that link.
The Tenth Circuit stressed that it was not suggesting that the ALJ could never find Dr. Fisher’s opinion supportable. Rather, the problem was that:
“the ALJ’s supportability analysis merely pointed to the alleged congruence with the ‘consultative examination findings,’ without discussing what those consultative examination findings actually were.”
Given the stark nature of the memory scores, that omission meant the ALJ did not “fulfill his duty to explain the supportability of the key medical opinion on which he based the RFC.”
3. The “Supportability” Requirement as a Legal Standard
From a doctrinal standpoint, Riley reinforces that “supportability” is not a perfunctory box to be checked. Rather, it is a substantive legal requirement that:
- Requires the ALJ to consider both the objective medical evidence and the supporting explanations that a medical source provides for an opinion.
- Requires enough articulation in the written decision that a reviewing court can understand the logical bridge between the evidence and the ALJ’s persuasiveness finding.
Where the underlying testing is numerical and technically complex—such as psychometric memory scales—the ALJ is not expected to become a neuropsychologist. But the ALJ must at least:
- Recognize the severity indicated by such scores (e.g., “1st percentile” implying exceptionally poor performance).
- Explain why, despite such low scores, the functional limitation on a particular domain (e.g., remembering simple instructions) is interpreted as only moderate rather than marked, or vice versa.
- Indicate whether and why the ALJ adopts or discounts the examiner’s severity ratings in light of the data.
Failure to supply that explanation is a failure to apply the “supportability” standard properly—and by virtue of Reyes, that is reversible legal error.
4. The Unresolved Issue: Later Evidence (Beair Report)
Riley also challenged the ALJ’s failure to discuss a June 2022 report by Kim Beair, MS, LPC. Beair performed additional memory testing and reportedly identified memory deficits “arguably more severe” than Dr. Fisher’s earlier findings. The government argued that this later evidence was irrelevant because it post‑dated the March 31, 2020 disability cutoff. Riley argued it was relevant because it assessed long‑term effects of the same 2018 TBI.
The Tenth Circuit acknowledged this dispute but declined to resolve it, explaining:
“Because the ALJ never explained why he chose not to discuss the Beair opinion, and because the ALJ will need to take another look at Riley’s case regardless, we do not resolve this issue. Riley may raise it before the agency, or not, as he sees fit.”
While not a holding, this passage suggests that:
- On remand, the ALJ will likely need to articulate a rationale for either considering or discounting the Beair report, especially if Riley presses the issue.
- Post‑cutoff evidence that reflects the enduring effects of a pre‑cutoff condition may be potentially relevant if it illuminates the claimant’s functioning during the insured period, a principle widely recognized in Social Security law.
Because the panel expressly left the question open, the passage is best read as signaling that, at minimum, an ALJ should not silently ignore such evidence where it is arguably probative of the earlier period.
C. Impact and Implications
1. For ALJs and the SSA
Even as a non‑precedential order, Riley sends an important message to ALJs within the Tenth Circuit (and may be persuasive elsewhere):
- Boilerplate is not enough in complex cases. In mental‑impairment cases involving standardized psychological testing, ALJs must do more than broadly state that an opinion is supported by its exam findings. They must address the key data points that logically bear on functional limitations.
- Explain the moderate/marked distinction. When a medical source (or the ALJ) distinguishes between “moderate” and “marked” limitations in the face of extremely poor test scores, the decision should give at least a brief explanation of why one domain (e.g., simple instructions) is rated less severely than another (e.g., detailed instructions).
- Build a record that can be reviewed. ALJs should ensure that their written decisions contain a clear reasoning chain: from test scores and clinical observations to the examiner’s opinion, to the ALJ’s persuasiveness finding, and then to the RFC. Without this, courts may remand even where underlying evidence might support the agency’s ultimate conclusion.
Practically, Riley may encourage the SSA to refine its templates and training materials to emphasize more robust articulation of “supportability,” particularly for consultative examinations that rely heavily on psychometric data.
2. For Claimants and Advocates
For claimants and their representatives, Riley provides several strategic tools:
- Leverage standardized test results. When a consultative examiner’s test scores are extremely low but the examiner (or ALJ) still labels limitations as merely moderate, claimants can invoke Riley to argue that the ALJ must either:
  - Explain how the test results support that classification; or
  - Reevaluate the severity of limitations in light of the scores.
- Challenge “supportability” articulations. If the ALJ’s discussion of a critical medical opinion simply parrots that it is “supported” by the examiner’s findings, without engaging with crucial data, Riley supports the argument that the explanation does not meet the requirements of § 404.1520c(b)(2).
- Highlight inability to follow the reasoning. Advocates can frame legal error not merely as disagreement over weight of evidence but as a failure to construct an RFC that is logically traceable from the evidence, relying on Keyes‑Zachary as applied in Riley.
3. For the Law of Medical Opinions under § 404.1520c
More broadly, Riley contributes to an emerging body of case law interpreting the SSA’s post‑2017 medical‑opinion framework. It underscores that:
- The end of the “treating physician rule” did not reduce judicial scrutiny of how ALJs evaluate and articulate their treatment of medical opinions.
- “Supportability” is a meaningful, reviewable concept: ALJs must engage with the logic and evidence underlying each medical opinion they rely upon or reject.
- The “substantial evidence” standard does not shield decisions where the legal framework is misapplied or key reasoning steps are omitted.
Although this particular order is non‑precedential, its reasoning will likely be used in future cases:
- To argue that ALJs must specifically address unusual or extreme test results that bear directly on functional limitations.
- To require at least brief narrative justifications for accepting an examiner’s moderate/marked distinction when the data appear to point more strongly in one direction.
IV. Clarifying Key Legal and Technical Concepts
A. Residual Functional Capacity (RFC)
RFC is a central Social Security concept. It represents the most a claimant can still do on a sustained basis despite physical and mental limitations. Here, the ALJ decided Riley could:
“understand, remember, and carry out only simple instructions on a sustained basis in a work-related setting.”
That RFC, combined with vocational evidence, led to a finding that Riley could perform simple, unskilled jobs in the national economy, which precluded a disability finding under the SSA’s five‑step sequential evaluation.
B. “Moderate” vs. “Marked” Limitations
ALJs and medical sources often classify mental limitations using terms like:
- Mild – slight limitations, generally compatible with most work.
- Moderate – noticeable limitations but not completely preclusive of work in that domain.
- Marked – very serious limitations that significantly interfere with the ability to perform work‑related functions.
In Riley, this moderate/marked distinction became critical because:
- “Moderate” limitations in understanding and remembering simple instructions are often thought to be compatible with simple, unskilled work.
- “Marked” limitations can be work‑preclusive, especially when they affect basic abilities necessary for any competitive employment (memory, concentration, persistence, pace).
That is why the panel scrutinized how Dr. Fisher drew the line between moderate and marked limitations in light of the very low memory scores.
C. “Supportability” vs. “Consistency” (20 C.F.R. § 404.1520c)
The Tenth Circuit helpfully summarized these two core factors:
- Supportability – looks at the strength of the opinion based on the examiner’s own work:
  - What tests did they administer?
  - What observations did they make?
  - How well did they explain the connection between those findings and their conclusions?
- Consistency – compares the opinion to the rest of the record:
  - Do other providers’ records line up or conflict?
  - Are imaging studies, lab results, and treatment notes congruent with the opinion?
In this appeal, the court focused solely on supportability. It was not enough for the ALJ to assert that Dr. Fisher’s opinion was supported; he had to demonstrate, through discussion of her test results and narrative, that he understood and accepted the basis for her functional ratings—or explain why he interpreted the data differently.
D. Wechsler Memory Scale Percentiles
The Wechsler Memory Scale (WMS‑IV) is a standardized psychological test. Scores are often converted into percentile ranks:
- 2nd percentile – the individual performed as well as or better than only 2 out of 100 people in the normative sample.
- 1st percentile – essentially at the very bottom of the distribution; 99 out of 100 people perform better.
Scores in the 1st–2nd percentiles place performance at or near the bottom of the normative distribution and typically indicate substantial impairment, even though Dr. Fisher’s report labeled Riley’s index scores “borderline range.” Thus, when a psychologist finds such scores but rates related functional limitations as only “moderate,” it naturally invites scrutiny and requires explanation. Riley leverages this intuitive mismatch as the basis for demanding a clearer articulation of supportability.
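The relationship between the index scores and the percentile ranks quoted in Dr. Fisher’s report can be checked with simple arithmetic. The sketch below is illustrative only and is not drawn from the opinion. It assumes that WMS‑IV index scores are scaled to a mean of 100 and a standard deviation of 15, and that percentile ranks approximately follow a normal curve; the percentiles a psychologist actually reports come from the test publisher’s normative tables. The function name `approx_percentile` is hypothetical.

```python
from statistics import NormalDist

# Assumption (not stated in the opinion): index scores are scaled to
# mean 100 and standard deviation 15, and follow a normal curve.
INDEX_SCALE = NormalDist(mu=100, sigma=15)


def approx_percentile(index_score: float) -> float:
    """Return the approximate percentile rank (0-100) for an index score."""
    return 100 * INDEX_SCALE.cdf(index_score)


# Index scores reported for Riley in Dr. Fisher's examination.
for label, score in [("Immediate Memory", 69), ("Delayed Memory", 70)]:
    print(f"{label}: index {score} -> about {approx_percentile(score):.1f} percentile")
```

Under these assumptions, index scores of 69 and 70 fall at roughly the 1.9 and 2.3 percentile marks, which is in line with the “2nd percentile” figures in the report: both scores sit about two standard deviations below the normative mean.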
V. Conclusion
Riley v. Commissioner, SSA stands as a significant—even if formally non‑precedential—clarification of what the “supportability” requirement under 20 C.F.R. § 404.1520c demands in practice. The Tenth Circuit reversed and remanded because the ALJ:
- Relied heavily on a single psychological consultative opinion to support a limiting but work‑compatible RFC;
- Failed to engage with the striking results of that examiner’s own testing (WMS‑IV scores at or near the bottom of the distribution); and
- Offered only a conclusory statement that the opinion was supported by deficits in memory and concentration, without explaining how those deficits translated into the specific moderate/marked limitations assigned.
By insisting that the court must be able to “follow the adjudicator’s reasoning,” the panel reaffirmed that:
- ALJs must do more than recite that an opinion is “supported” by exam findings; they must connect the dots between data and conclusions.
- Where standardized test scores and narrative opinions appear in tension, the ALJ’s duty of explanation is heightened.
- Failure to comply with the supportability framework is a legal error requiring remand, even if the underlying evidence might support the agency’s conclusion.
For the broader Social Security landscape, Riley underscores that the new medical‑opinion regulations did not diminish the need for reasoned, reviewable decisionmaking. Instead, they restructured the factors but preserved—and arguably sharpened—the judiciary’s role in ensuring that ALJs meaningfully apply those factors. In mental‑impairment cases involving neurocognitive testing, this decision will likely be cited frequently as persuasive authority to demand careful, transparent articulation of how raw scores translate into functional capacities and, ultimately, into disability determinations.