Security Checklist for AI-Built Apps
A practical security checklist for AI-built apps: auth, data, secrets, dependencies, logs, and backups. Specific items, not vague principles.
This is the security checklist we run through on AI-built apps before they go to production. It is deliberately specific. Vague advice (“validate your inputs”) is the kind of thing AI-generated code already nods at without actually doing. The items below are things you can check today, on your codebase, and either find or not find.
It is grouped by area: auth, data, secrets, dependencies, logs, backups. If you only have an hour, start with the first two — they catch the most damage.
For a broader pre-launch view, see the public AI App Launch Checklist. This post focuses specifically on the security slice.
Authentication
- Every authenticated route requires a valid session or token. “Requires” means the server rejects the request, not that the UI hides the link.
- Sessions expire. Idle timeout is set to a value you actually decided on, not the framework default.
- Password reset uses a single-use, time-limited token. Tokens are invalidated after use and on password change.
- Login does not leak whether an email is registered (“user not found” vs. “wrong password” — pick one generic message).
- Rate limiting on login, signup, and password reset endpoints. A bot cannot hammer them indefinitely.
- Multi-factor authentication is at least available for accounts with access to sensitive data, even if not required.
- OAuth flows verify the `state` parameter and validate redirect URIs against an allow-list.
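The password-reset item above is concrete enough to sketch. This is a minimal in-memory version, assuming illustrative names (`issue_reset_token`, `redeem_reset_token`, `RESET_TTL`); a real app would persist tokens server-side, but the properties to check are the same: single-use, time-limited, invalidated on password change.

```python
# Sketch: single-use, time-limited password reset tokens.
# In-memory store for illustration only; use your database in production.
import secrets
import time

RESET_TTL = 15 * 60  # seconds a reset link stays valid

_tokens: dict[str, tuple[str, float]] = {}  # token -> (user_id, issued_at)

def issue_reset_token(user_id: str) -> str:
    token = secrets.token_urlsafe(32)  # unguessable, from the OS CSPRNG
    _tokens[token] = (user_id, time.time())
    return token

def redeem_reset_token(token: str):
    """Return the user_id once, or None if unknown or expired."""
    entry = _tokens.pop(token, None)  # pop = single-use
    if entry is None:
        return None
    user_id, issued_at = entry
    if time.time() - issued_at > RESET_TTL:
        return None
    return user_id

def invalidate_user_tokens(user_id: str) -> None:
    """Call on password change so outstanding reset links die immediately."""
    for tok in [t for t, (uid, _) in _tokens.items() if uid == user_id]:
        del _tokens[tok]
```

If your reset handler can't point at the equivalent of each of these three functions, one of the checklist properties is missing.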
Authorization
This is where AI-built apps fail most often. Authentication (“who are you?”) gets implemented because login screens are visible. Authorization (“are you allowed to do this to this resource?”) gets skipped because it is invisible.
- Every endpoint that operates on a record checks that the current user owns or has access to that specific record. Not just that the user is logged in.
- IDOR (insecure direct object reference) is tested: pick any URL with an ID, change the ID to one belonging to another user, confirm the request is rejected.
- Role checks (admin, owner, member) are enforced server-side. “The UI doesn’t show the button” is not a security control.
- Multi-tenant apps scope every database query by tenant ID at the query level, not the application level.
- File downloads check ownership before serving the file. Direct links to S3 objects, if used, are signed and short-lived.
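The per-record ownership check is the one to verify first. A minimal sketch, using an in-memory `DOCS` table and a `NotAuthorized` error as stand-ins for your real database and error handling:

```python
# Sketch: ownership check before returning a record (the anti-IDOR pattern).
DOCS = {
    "doc-1": {"owner_id": "alice", "body": "alice's notes"},
    "doc-2": {"owner_id": "bob", "body": "bob's notes"},
}

class NotAuthorized(Exception):
    pass

def get_document(current_user_id: str, doc_id: str) -> dict:
    doc = DOCS.get(doc_id)
    # Same error for "missing" and "not yours", so attackers can't probe
    # IDs to learn which records exist.
    if doc is None or doc["owner_id"] != current_user_id:
        raise NotAuthorized(doc_id)
    return doc
```

The IDOR test in the list is just this function called with someone else's ID: it must raise, not return.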
Input handling
- Database queries use parameterized statements or an ORM. No string concatenation with user input into SQL.
- User-supplied content rendered in HTML is escaped by the templating layer. If you need to render raw HTML, you sanitize it with a maintained library, not a regex.
- File uploads validate type, size, and (where relevant) content. Filenames are not used as paths on disk.
- User input is never used to build shell commands, file paths, or URLs to internal services without strict allow-list validation.
- Forms accept the data they say they accept. If the API contract says “string of length 1–100”, the server enforces that, not just the client.
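The first item above is the cheapest to verify in code review. Here is what parameterized statements look like with `sqlite3` from the standard library (the `users` table is illustrative); the same shape applies to any driver or ORM:

```python
# Sketch: parameterized query — user input is bound by the driver,
# never concatenated into SQL text.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

def find_user(email: str):
    row = conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row
```

With placeholders, a classic payload like `' OR '1'='1` is just a string that matches no row. If you find f-strings or `+` building SQL anywhere, that endpoint fails the check.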
Data exposure
- API responses return only the fields the client needs. No leaking of hashed passwords, internal IDs, admin flags, soft-delete columns, or audit metadata.
- Error responses do not leak stack traces, SQL, file paths, or internal hostnames in production.
- Search and listing endpoints have pagination caps. A single request cannot dump the entire users table.
- PII is identified explicitly. You can answer “what columns contain personal data?” without grepping.
- Data exports (CSV, JSON dumps) require explicit authorization and are logged.
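"Only the fields the client needs" is easiest to enforce as an allow-list at serialization time. A sketch, with illustrative field names:

```python
# Sketch: allow-list serializer. The API decides what leaves the server;
# anything not listed (password_hash, is_admin, deleted_at, ...) simply
# never reaches the response, even if someone adds a column later.
PUBLIC_USER_FIELDS = ("id", "name", "avatar_url")

def serialize_user(row: dict) -> dict:
    return {k: row[k] for k in PUBLIC_USER_FIELDS if k in row}
```

The direction matters: an allow-list fails safe when the schema grows, while a deny-list ("strip the password hash") silently leaks every new sensitive column.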
Secrets and configuration
- No secrets in the repo. `git log -p` searched for `key`, `secret`, `token`, `password`, and any provider names returns nothing real.
- `.env` is in `.gitignore`. `.env.example` exists with placeholder values, not real ones.
- Production secrets live in a secret manager (host platform, vault, cloud KMS). They are not in CI logs, build artifacts, or shipped bundles.
- Different environments use different secrets. Dev does not have access to prod credentials.
- Secret rotation procedure is documented for at least the database, payment provider, and AI provider keys. You have done it at least once.
- Webhook endpoints verify the provider’s signature header. They reject requests that don’t.
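Most providers sign webhooks with some variant of HMAC-SHA256; the exact header name and encoding differ per provider, so check their docs. The verification itself looks roughly like this:

```python
# Sketch: webhook signature verification with HMAC-SHA256.
import hmac
import hashlib

def verify_webhook(secret: bytes, payload: bytes, signature_header: str) -> bool:
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids timing leaks from early-exit string comparison
    return hmac.compare_digest(expected, signature_header)
```

Two details AI-generated handlers routinely get wrong: comparing with `==` instead of `compare_digest`, and verifying against the parsed JSON instead of the raw request bytes.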
Dependencies
- You have a single, current lockfile: `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml`, `requirements.txt`, etc., committed.
- You can list every direct dependency and what it does. Anything you can’t justify gets removed.
- An automated vulnerability scanner runs on every push or at least weekly (Dependabot, Renovate, npm audit, pip-audit, GitHub Advanced Security).
- Critical-severity findings get triaged within a week, not left in a tab.
- No dependencies installed from random URLs, forks, or untrusted registries.
Transport and headers
- HTTPS is enforced everywhere. HTTP redirects to HTTPS. HSTS is enabled.
- Cookies for sessions are `Secure`, `HttpOnly`, and have a sensible `SameSite` setting.
- CORS is configured to a specific allow-list, not `*`. If you needed `*` to make a thing work, that thing is wrong.
- Standard security headers are set: `Content-Security-Policy`, `X-Content-Type-Options: nosniff`, `Referrer-Policy`, `X-Frame-Options` (or CSP `frame-ancestors`).
- CSRF protection is enabled for any form-based, cookie-authenticated endpoint that mutates state.
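For reference, a baseline header set as a plain dict. The CSP below is a deliberately strict starting point (same-origin everything, no framing); you will almost certainly need to loosen it for your app rather than the reverse:

```python
# Sketch: a strict baseline of security response headers.
# Values are illustrative starting points, not a universal config.
def security_headers() -> dict:
    return {
        "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
        "Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'",
        "X-Content-Type-Options": "nosniff",
        "Referrer-Policy": "strict-origin-when-cross-origin",
        "X-Frame-Options": "DENY",
    }
```

However you wire these in (middleware, reverse proxy, CDN), verify them on a live response with your browser's network tab, not in the config file.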
Logs and observability
- Application logs go somewhere durable, not just `stdout` of the current container.
- You can search logs by user ID, request ID, and time range without writing a custom script.
- Logs do not contain passwords, tokens, full credit card numbers, or complete request bodies for sensitive endpoints.
- Errors are tracked in a service that alerts you (Sentry, Honeybadger, Rollbar, etc.). You will hear about a crash before a user emails you.
- Authentication events (login, password change, MFA reset) are logged with timestamp, user, and source IP.
- You have a dashboard or metric for “is this app currently working” that does not require a human to refresh it.
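Keeping secrets out of logs is simplest as a redaction pass that runs before anything reaches the logger. A sketch; the key list is illustrative and should be extended for your app:

```python
# Sketch: scrub known-sensitive keys from a structured log record,
# including nested payloads, before it is emitted.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number", "secret"}

def redact(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if key.lower() in SENSITIVE_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)  # recurse into nested request bodies
        else:
            out[key] = value
    return out
```

The useful property is that redaction is centralized: one function to audit, rather than hoping every call site remembered.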
Backups and recovery
- The database is backed up at a frequency that matches what you can afford to lose. “Daily” is the floor, not the goal.
- You have restored from backup at least once, into a non-production environment, end to end. You know how long it takes.
- User-uploaded files are backed up or replicated. “It’s in S3” is not a backup; an S3 bucket can still be deleted, locked by a bad lifecycle rule, or have its contents corrupted by a deploy.
- Backups are stored in a different account or region than the primary, with separate credentials.
- You have written down what you would do if the entire environment vanished tonight. The document is short and specific.
AI-specific concerns
If your app calls a large language model with user input or returns model output to users, add these:
- User-controlled input that becomes part of a model prompt is treated as untrusted. Prompts are constructed with separation between system instructions and user content.
- Model output that gets rendered in the UI is escaped or sanitized the same way you’d treat any user-generated content.
- Model output that gets used to take actions (run code, call APIs, modify data) goes through a validation layer, not a “trust the model” layer.
- You have a per-user rate limit on AI calls. A single user cannot run your provider bill into four figures overnight.
- You log enough about each AI call to debug an incident, but not so much that the logs themselves become a PII liability.
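The per-user rate limit is worth sketching because it is the cheapest of these to add. A fixed-window limiter, in-memory for illustration; in production you would back it with Redis or similar so it survives restarts and works across instances. `WINDOW` and `MAX_CALLS` are placeholder numbers:

```python
# Sketch: per-user fixed-window rate limit on AI calls.
import time

WINDOW = 60 * 60   # one hour
MAX_CALLS = 50     # per user per window — pick a number you can afford

_usage: dict = {}  # user_id -> (count, window_start)

def allow_ai_call(user_id: str, now: float = None) -> bool:
    now = time.time() if now is None else now
    count, start = _usage.get(user_id, (0, now))
    if now - start >= WINDOW:
        count, start = 0, now  # window elapsed: start a fresh one
    if count >= MAX_CALLS:
        return False
    _usage[user_id] = (count + 1, start)
    return True
```

Call it before every provider request and return a 429 when it says no. The point is not the algorithm (a token bucket works too) but that the cap exists per user, server-side, before money is spent.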
How to use this list
Run the list against your own codebase. For each item, the answer is yes, no, or “I don’t know.” Anything that isn’t yes is your work list.
If a third of the items are “I don’t know,” that’s a good sign you want a code audit rather than another solo pass — the items you can identify are usually not the most dangerous ones.
When you’re ready to walk through these against a real codebase, book an audit.