Technical Guide
Security Architecture Review Guide for SaaS Platforms
A practical guide for CTOs and engineering leaders building or scaling SaaS platforms
Updated June 2026
Most SaaS platforms were not designed. They evolved. A monolith got split into services. Auth was bolted on at the gateway. Internal APIs were built fast and secured later, or never. Each decision made sense at the time. But the security posture of the platform is the sum of those decisions, and nobody has looked at the whole picture since the early days.
A security architecture review is a structured analysis of how your system is designed, how its parts interact, and where the real weaknesses live. Not theoretical ones. The specific weaknesses that exist because of how your services trust each other, how your data flows, and which assumptions stopped being true two years ago.
This guide covers what the process looks like, what we usually find, and how to tell whether your platform would benefit from one.
Important distinction
What a security architecture review is not
Before going further, it helps to be explicit about what this is not:
- Not a penetration test. A pentest finds individual vulnerabilities by attacking a running system. An architecture review finds classes of vulnerabilities by analyzing how the system is designed. We have reviewed platforms that passed two consecutive pentests with minor findings, and still had unauthenticated internal APIs that any compromised service could call freely.
- Not a compliance audit. Compliance frameworks check whether controls exist on paper. An architecture review checks whether those controls actually work given the system's real trust boundaries. SOC 2 will verify that you have an access control policy. It will not tell you that your billing service bypasses it entirely because it predates the policy engine.
- Not a code review. Code reviews examine implementation details. An architecture review examines the structural decisions that determine whether a correct implementation is even possible. If your authorization model is broken at the design level, no amount of code-level fixes will help.
Architecture-level issues survive multiple pentests because they are not bugs. They are design decisions. Fixing them requires changing how services interact, not patching individual endpoints. That is why pentests and architecture reviews are complementary: one finds what is broken, the other finds what cannot work.
Why it matters
The most dangerous security issues are not bugs
Most security investment goes into late-lifecycle activities: penetration testing, vulnerability scanning, dependency checks, compliance assessments. These catch real problems. But the issues that cause the worst incidents are rarely code-level bugs found by scanners.
They are structural:
- A billing service that can query any tenant's data because tenant isolation was implemented at the API gateway but not at the service layer, so internal callers skip it entirely
- An integration service that stores third-party OAuth tokens with platform-wide read access because scoping them per-tenant would have required a schema migration nobody prioritized
- A webhook endpoint that accepts unsigned payloads from a payment provider, trusting source IP filtering that stopped working when the provider migrated to new infrastructure
- Service-to-service calls authenticated by a shared API key that is the same across all environments, including local development
None of these are hypothetical. Each one is a pattern we have seen in real platforms, and each one was invisible to the pentests and compliance audits that preceded our review. The common thread: they are the result of reasonable decisions that were never revisited as the system grew.
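To make one of these patterns concrete, take the unsigned webhook: the structural alternative to IP filtering is to verify a signature over the raw payload. The following is a minimal sketch, assuming the provider signs the body with HMAC-SHA256 and sends a hex digest. The function names and the signing scheme are illustrative assumptions; real providers differ in header format and usually add a timestamp to prevent replay.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature a provider would attach (assumed scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, signature: str) -> bool:
    """Return True only if the signature matches the raw payload bytes."""
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The check depends on a shared secret rather than on where the request came from, so it keeps working when the provider changes infrastructure.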
Before review
What we typically see in a growing SaaS platform
The diagram below shows a simplified SaaS platform with four layers: external clients, API gateway, internal services, and data stores. This is a composite based on patterns we see repeatedly. Not a single platform, but a representative architecture.
The pattern is always the same: authentication enforced at the edge, but once a request passes the gateway, every downstream service implicitly trusts it. The billing service calls the order service directly. The integration service talks to the cache with no identity verification.
If an attacker compromises any single internal service (through SSRF, a dependency vulnerability, or a misconfigured debug endpoint), they can move laterally to every other service with no additional barriers.
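The lateral movement described above can be reasoned about as graph reachability. This is a rough sketch, assuming the platform's call permissions are modeled as a directed graph. The service names and the `flat_trust` graph are hypothetical and stand in for the composite architecture, not for any specific platform.

```python
from collections import deque


def reachable_from(trust_graph: dict[str, set[str]], start: str) -> set[str]:
    """Return every service reachable from `start` by following trusted calls."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in trust_graph.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Hypothetical flat network: every internal service accepts calls from every other.
services = {"billing", "orders", "integrations", "users"}
flat_trust = {s: services - {s} for s in services}

# Hypothetical explicit model: only billing may call orders.
explicit_trust = {"billing": {"orders"}}
```

With `flat_trust`, a compromised `integrations` service reaches all three other services. With `explicit_trust`, the same compromise reaches none.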
After review
The same architecture with explicit trust
After a review, the architecture has not changed dramatically. The same services exist, the same data flows. What changes is that trust is explicit instead of implicit.
Every service now authenticates with mTLS and a verified identity, proving who it is on every call instead of only at the edge. But identity is only half of it: proving who you are is not the same as being allowed to act.
Each call is also checked against an authorization policy. The billing service can reach the order service, but only for the operations it genuinely needs, and only in the tenant context it was invoked with. The policy decision can live in one place; the enforcement has to happen at every service, not at the gateway.
The real payoff is blast radius: a compromised service no longer inherits the platform. It inherits exactly what that one service was already allowed to do, and nothing more.
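The split between identity and authorization can be sketched as a per-call check. This is an illustration under stated assumptions: the `POLICY` table, the operation names, and the `CallContext` type are hypothetical, and in a real deployment the caller identity would come from the verified mTLS certificate rather than a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallContext:
    caller: str     # verified service identity (assumed to come from mTLS)
    tenant_id: str  # tenant the original request was made on behalf of


# Hypothetical policy: (caller, target, operation) triples that are allowed.
POLICY = {
    ("billing", "orders", "read_order"),
    ("billing", "orders", "list_orders"),
}


def is_allowed(ctx: CallContext, target: str, operation: str, resource_tenant: str) -> bool:
    """Allow the call only if the policy permits it and the tenant matches."""
    if (ctx.caller, target, operation) not in POLICY:
        return False
    # Identity alone is not enough: the resource must belong to the tenant
    # the call was invoked for.
    return ctx.tenant_id == resource_tenant
```

The policy table can be defined centrally, but a function like `is_allowed` has to run inside each receiving service (or its sidecar) for the check to survive a gateway bypass.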
When to engage
When companies typically request an architecture review
Most teams do not wake up one morning and decide they need an architecture review. Something triggers it: a specific event that makes the current state of security feel inadequate.
The monolith is gone and nobody mapped the new trust boundaries
A platform that started as a monolith or a few services has grown into a distributed system. What used to be function calls within a single process are now HTTP requests crossing network boundaries. The original security model assumed co-location. That assumption broke when the services were split, but nobody redesigned the trust model to match. We regularly find that the first 3-4 services in a platform have stricter security than the 10 that followed, because the early ones were built when there was time to think about it.
An enterprise deal is blocked by security questions you cannot answer
Enterprise customers send security questionnaires that ask about service-to-service authentication, tenant isolation models, data residency, and key management. Teams discover that they cannot confidently answer because the architecture was never documented with these questions in mind. Access control models are inconsistent across services. Nobody is certain which services can access which tenant data. An architecture review creates the clarity needed to answer these questions honestly, and to fix the gaps those answers would reveal.
Pentest reports keep pointing at the same structural themes
When the third pentest in a row flags "insufficient authorization on internal endpoints" or "lateral movement possible between services," the issue is not the individual endpoints. It is the architecture. A pentest can tell you that service X has an authorization bypass. An architecture review tells you that 8 of your 15 services have the same bypass because they all inherited the same flawed middleware pattern from the original service template.
You are shipping AI features that touch customer data
AI-powered features introduce attack surface that does not fit traditional security models. Consider a common pattern: a customer-facing chat feature backed by an LLM, with access to internal APIs for retrieving account data. The LLM can now be manipulated through prompt injection to call APIs it should not call, exfiltrate data through its responses, or access other tenants' context if the retrieval layer does not enforce tenant isolation. These are not code bugs. They are architecture decisions about what the AI component is allowed to access, how its tool calls are scoped, and where tenant boundaries are enforced in the retrieval pipeline. We see this in nearly every platform that has added AI features in the last 18 months.
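One way to picture the scoping decision is a tool executor that takes the tenant from the authenticated session and never from the model's output. The sketch below is an assumption-laden illustration: the tool names, the `INVOICES` store, and `execute_tool_call` are hypothetical and not tied to any particular LLM framework.

```python
ALLOWED_TOOLS = {"get_invoice", "list_invoices"}  # hypothetical tool names

# Hypothetical data store keyed by (tenant_id, invoice_id).
INVOICES = {
    ("tenant-a", "inv-1"): {"amount": 120},
    ("tenant-b", "inv-9"): {"amount": 990},
}


def execute_tool_call(session_tenant: str, tool: str, args: dict) -> dict:
    """Run a model-requested tool call inside the session's tenant scope."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool}")
    # Drop any tenant the model supplied; scope comes from the session only.
    args = {k: v for k, v in args.items() if k != "tenant_id"}
    if tool == "get_invoice":
        record = INVOICES.get((session_tenant, args["invoice_id"]))
        if record is None:
            raise LookupError("invoice not found in this tenant")
        return record
    # list_invoices: return only invoice ids owned by the session's tenant.
    return {"invoices": [inv for (tenant, inv) in INVOICES if tenant == session_tenant]}
```

Even if a prompt injection convinces the model to request another tenant's invoice, the lookup is keyed on `session_tenant`, so the request fails instead of leaking data.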
You are about to make a major architectural change
A migration to microservices, a new identity provider, a move to multi-region: these are the moments when the security model is cheapest to get right, because you are already changing the code that enforces it. Reviewing the design before you build it costs a fraction of retrofitting trust boundaries into a system that has already shipped.
If none of these describe your platform, you probably do not need an architecture review right now.
Recognize your platform in any of these?
If two or more of these triggers feel familiar, a 30-minute call is enough to tell you whether a focused architecture review is worth it. No prep required.
Talk to a security architect
Process
How we actually run an architecture review
Every consultancy lists the same phases: "architecture analysis, threat modeling, recommendations." That tells you nothing. Here is what we actually do, and why each step matters.
Map the real architecture, not the documented one
Architecture diagrams are always out of date. We start by building an accurate picture of the system from conversations with engineers, infrastructure configuration, API schemas, and service dependency graphs. The goal is to identify every trust boundary: every point where the system decides whether to trust a request.
In a typical platform with 10-15 services, we find 3-5 trust assumptions that the team did not know existed: services that implicitly trust each other because they share a database, internal endpoints reachable from the public internet through a misconfigured load balancer, admin APIs that rely on IP allowlists that no longer match the actual infrastructure.
Trace how sensitive data actually moves
We follow sensitive data (credentials, PII, payment information, API tokens) from ingestion to storage to access. This is where the gap between design intent and reality becomes visible. A platform may have a clean data model on paper, but in practice user email addresses flow through a logging pipeline into an analytics warehouse that half the engineering team can query.
Model threats against the actual system, not a generic one
Threat modeling is only useful if it is specific. We do not work from generic threat catalogs. We identify the 5-8 most realistic attack paths for this particular system, based on its actual entry points, service topology, and data sensitivity. For a platform with heavy third-party integrations, the most dangerous path might be through a compromised webhook handler. For a multi-tenant platform, it might be through tenant context manipulation in the session layer. The output is a ranked list of attack paths with concrete exploitation steps, specific enough that an engineer can reproduce the chain.
Test authorization boundaries end-to-end
Authorization is where we consistently find the highest-impact issues. We trace authorization decisions from the API gateway through every service in the call chain. The question is simple: if a user has access to tenant A, can they reach tenant B's data through any path? In practice, the answer is "yes" more often than anyone expects. Not because the gateway is broken, but because a downstream service trusts the calling service's context without revalidating the tenant claim.
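The end-to-end question can be expressed as a simple probe: call every path as tenant A and ask for a resource owned by tenant B. The sketch below is hypothetical throughout. `gateway_path` and `reporting_path` are toy stand-ins for real call chains, and `ORDERS` is an invented ownership table.

```python
from typing import Callable, Optional

ORDERS = {"order-1": "tenant-a", "order-2": "tenant-b"}  # order id -> owning tenant


def gateway_path(tenant: str, order_id: str) -> Optional[str]:
    """Checks ownership before returning the order."""
    return order_id if ORDERS.get(order_id) == tenant else None


def reporting_path(tenant: str, order_id: str) -> Optional[str]:
    """Trusts the caller and never rechecks the tenant (the flawed pattern)."""
    return order_id if order_id in ORDERS else None


def find_cross_tenant_paths(paths: dict[str, Callable[[str, str], Optional[str]]]) -> list[str]:
    """Return the names of paths where tenant-a can read tenant-b's order."""
    return sorted(
        name for name, handler in paths.items()
        if handler("tenant-a", "order-2") is not None
    )
```

Running the probe over both paths reports only `reporting`, which matches the pattern described above: the gateway is sound, and the leak is in the downstream service that does not revalidate the tenant claim.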
Evaluate integration trust chains
Every external integration is a trust chain: your platform trusts the integration provider, the provider trusts the webhook payload, the webhook handler trusts the data it receives. We evaluate each link. A single weak link (an unsigned webhook, an overly scoped OAuth token, a shared secret reused across environments) collapses the entire chain. We recently reviewed a platform where a Slack integration had been granted database-level read access because the original developer needed it for a demo, and nobody revoked it.
Multi-tenancy
Tenant isolation is the defining SaaS security problem
If there is one architectural concern that matters more than all others in a SaaS platform, it is tenant isolation. A cross-tenant data leak is not just a security incident. It can be a business-ending event. And yet, tenant isolation is almost always weaker than the team believes.
Where isolation actually breaks
Tenant isolation looks simple from the outside: each request carries a tenant ID, and every data access is scoped to that tenant. In practice, the enforcement is spread across the entire stack, and each layer does it differently.
The API gateway extracts the tenant from the JWT. The user service filters by tenant in its database queries. The reporting service trusts whatever tenant context it receives from the calling service. The background job processor has no tenant context at all because jobs were originally single-tenant.
The result is a patchwork where some paths enforce isolation strictly and others bypass it entirely.
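The background job case is a useful illustration of failing closed. In this minimal sketch, every job must carry a tenant explicitly and the processor refuses to run without one. The `Job` type, `run_job`, and the row layout are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


class MissingTenantContext(Exception):
    """Raised when a job arrives without an explicit tenant."""


@dataclass(frozen=True)
class Job:
    name: str
    tenant_id: Optional[str] = None


def run_job(job: Job, rows: list[dict]) -> list[dict]:
    """Process only the rows that belong to the job's tenant."""
    if not job.tenant_id:
        # Fail closed: a job with no tenant must not fall back to "all tenants".
        raise MissingTenantContext(job.name)
    return [row for row in rows if row["tenant_id"] == job.tenant_id]
```

A legacy job enqueued without a tenant then fails loudly instead of silently processing every tenant's rows.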
The shared infrastructure blind spot
Even platforms with strong application-level isolation often share infrastructure in ways that undermine it: a single Redis instance for all tenants' session data, a shared message queue where a malformed message from one tenant's pipeline can affect another's, a shared search index where a missing tenant filter returns another tenant's documents directly. These are not theoretical. They are the patterns we see in platforms that have passed SOC 2 audits and multiple penetration tests.
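For the shared cache case, one common mitigation is to make the tenant part of every key so that an unscoped read is not expressible. The sketch below uses a plain dict as a stand-in for a shared store such as Redis. The `TenantCache` class and its key format are hypothetical, and namespacing of this kind addresses accidental cross-reads only, not noisy-neighbor or resource exhaustion issues.

```python
from typing import Optional


class TenantCache:
    """Wrapper that namespaces every key by tenant over a shared store."""

    def __init__(self, store: dict, tenant_id: str):
        # Reject ids that could blur the boundary between prefix and key.
        if not tenant_id or ":" in tenant_id:
            raise ValueError("tenant_id must be non-empty and must not contain ':'")
        self._store = store
        self._prefix = f"tenant:{tenant_id}:"

    def set(self, key: str, value: str) -> None:
        self._store[self._prefix + key] = value

    def get(self, key: str) -> Optional[str]:
        return self._store.get(self._prefix + key)
```

Because callers only ever hold a `TenantCache` bound to one tenant, forgetting the tenant filter is no longer a mistake an individual code path can make.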
Findings
What we find in practice
Every platform is different, but after dozens of reviews, the patterns are clear. The specific implementations vary. The categories do not.
Implicit trust between services
This is the single most common finding. Internal services assume that requests from other internal services are trustworthy because they originate within the VPC. There is no service identity, no per-request authorization, no call graph enforcement. The fix is not a network firewall. It is service-to-service authentication that verifies identity on every call.
Authorization that works at the gate but not inside
Teams invest heavily in API gateway authorization: OAuth flows, JWT validation, scope checks. But past the gateway, authorization degrades. Service B trusts that service A already checked permissions. Service C was originally internal-only and never had authorization at all. The result: if you can reach any internal service directly (through SSRF, a debug endpoint, or a compromised integration), you bypass all the authorization that was carefully built at the edge.
Secrets that outlived their intended scope
API keys, service tokens, and database credentials accumulate over time. A token created for a one-time data migration two years ago is still active, still has write access, and is now embedded in three different services' environment variables. We have found active credentials in CI/CD pipelines that grant broader access than any human user in the organization.
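A credential inventory makes this kind of drift visible. The following sketch flags credentials by age, idle time, and wildcard scope. The thresholds, the `Credential` fields, and the reason labels are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Credential:
    name: str
    created: date
    last_used: date
    scopes: frozenset


def flag_credentials(creds, today: date, max_age_days: int = 365, idle_days: int = 90):
    """Return {name: [reasons]} for credentials that deserve a manual look."""
    findings = {}
    for cred in creds:
        reasons = []
        if today - cred.created > timedelta(days=max_age_days):
            reasons.append("older than max age")
        if today - cred.last_used > timedelta(days=idle_days):
            reasons.append("idle")
        if "*" in cred.scopes:
            reasons.append("wildcard scope")
        if reasons:
            findings[cred.name] = reasons
    return findings
```

Note that a token still in active use, like the migration token described above, would not be flagged as idle. Age and scope checks are what catch it, which is why the sketch evaluates each signal independently.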
Session design that predates the current architecture
Session tokens designed for a simpler system get inherited by the more complex one. The token does not encode tenant context, or it encodes it in a way that services interpret differently. In a multi-tenant platform, this turns a stolen token, or a tenant claim that is never re-checked, into a cross-tenant breach. In one case, a session token's tenant field was set at login and never revalidated. A user who had been removed from a tenant could still access its data until the token expired 30 days later.
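The removed-user case comes down to where the tenant check reads its truth from. This minimal sketch checks current membership on each request instead of trusting the tenant field written at login. The `MEMBERSHIPS` store and the token layout are hypothetical, and signature and expiry validation are assumed to have happened before this function runs.

```python
# Hypothetical membership store: tenant id -> set of current member user ids.
MEMBERSHIPS = {"tenant-a": {"alice", "bob"}}


def authorize_request(token: dict, memberships: dict) -> bool:
    """Check current membership instead of trusting the token's tenant field."""
    members = memberships.get(token.get("tenant_id"), set())
    return token.get("user_id") in members
```

With this check in place, removing a user from a tenant takes effect on the next request rather than when the token expires. A short token lifetime or a revocation list would be alternative ways to narrow the same window.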
Deliverables
What you get at the end
The output is a technical report, not a slide deck. It is written for engineers who will implement the changes, not for executives who will file it.
Architecture risk map
A description of systemic risks and how they interact. Not a flat list of findings. A map showing that the weak tenant isolation in service A, combined with the unauthenticated internal API in service B and the overly scoped integration token in service C, creates a specific cross-tenant data access path. Understanding how risks chain together is what makes the difference between patching symptoms and fixing root causes.
Prioritized recommendations
Every finding is ranked by three factors: impact (what can an attacker achieve), exploitability (how hard is it to reach), and engineering effort (how much work to fix). The result is a realistic roadmap, not a list of 50 equally urgent items. In most reviews, a handful of changes account for the bulk of the risk, and they cluster in the same places: service identity, tenant isolation enforcement, and credential scoping. Those go at the top.
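As an illustration of how three factors can produce an ordering, the sketch below scores each finding as impact times exploitability divided by effort. This formula, the 1 to 5 scales, and the sample findings are assumptions made for the example. The text above does not specify how the three factors are combined, and a real ranking involves judgment that a single number cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    impact: int          # 1 (low) to 5 (severe)
    exploitability: int  # 1 (hard to reach) to 5 (trivial)
    effort: int          # 1 (small change) to 5 (major rework)


def priority(finding: Finding) -> float:
    """Illustrative score: risk (impact x exploitability) per unit of effort."""
    return finding.impact * finding.exploitability / finding.effort


def rank(findings: list[Finding]) -> list[str]:
    """Return finding titles ordered from highest to lowest priority."""
    return [f.title for f in sorted(findings, key=priority, reverse=True)]
```

Dividing by effort pushes cheap, high-risk fixes to the top, which is one way to turn a long list into a roadmap a team can actually start on.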
Concrete architectural changes
Rather than "implement better authorization," the recommendations describe specific changes: introduce a service mesh with mTLS for internal communication; enforce tenant validation somewhere a service cannot skip it (a mesh sidecar on every call, or row-level security at the data layer) rather than in a library each team has to remember to import; rotate and scope the integration tokens that currently have platform-wide access. Each recommendation includes enough technical detail for an engineering team to estimate and plan the work.
Engagement
What a typical engagement looks like
A standard engagement runs 6-8 days for a platform with 8-20 services. Larger or more complex systems (heavy third-party integrations, multiple identity providers, AI components) may take 14+ days.
Here is what the process looks like in practice:
- Kickoff call (1 hour). We understand the system's purpose, its customers, and the team's specific concerns. We agree on scope and access requirements.
- Access and documentation. We need architecture diagrams (even outdated ones), API schemas or OpenAPI specs, infrastructure configuration (Terraform, Kubernetes manifests), and read access to the codebase. Your team's time commitment at this stage is minimal, typically 1-2 hours to set up access.
- Architecture deep-dive (2-3 hours). A working session with 2-3 senior engineers who know how the system actually works. We walk through service interactions, data flows, and authentication/authorization patterns. This session is where the most important context comes from.
- Analysis and threat modeling (3-5 days). We work independently, tracing trust boundaries, modeling attack paths, and testing assumptions against the actual codebase and infrastructure. We may ask follow-up questions asynchronously.
- Report and walkthrough (1-2 hours). We deliver the report and walk through the findings with the engineering team. This is a working session, not a presentation. The goal is to make sure the recommendations are practical and to answer technical questions about implementation.
At Pentecton, this is one of the most common engagements we run. The structure above reflects how we typically work, adapted to the specific system, scale, and concerns of each platform.
Know where your security architecture stands before it becomes a blocker
A 30-minute call is enough to understand your system, identify the highest-risk areas, and decide whether a focused review would help.
Talk to a security architect