Before you approve the AI tool, red-team the operating assumptions

A red-team workflow for executives reviewing AI vendor pitches and internal AI initiatives. The workflow compresses reading time. The decision is always the executive's.

May 2026 · 9 min read

What we’re building

A red-team workflow that takes any AI vendor proposal or internal AI initiative, reads the deck and the email thread and the demo notes, and writes a one-page red-team note for the executive who has to decide whether to take the next meeting or fund the next phase. The note has four sections: defensible claims, vapor claims, hidden costs, hidden risks. Below that, a recommendation: take the meeting, ask three questions first, or pass.

The point is to compress reading time. Calls are the executive’s. The workflow lowers the cost of judgment by surfacing what an experienced reader would have noticed on a careful read, fast enough that the executive actually reads it before the meeting.

Who this is for

Founders and COOs in the GCC and elsewhere who are pitched by AI vendors weekly and don’t have time to read every deck forensically. Also useful for heads of product or engineering reviewing internal AI initiatives, where the same questions apply (what data does this need, where does it sit, what happens at scale, who owns the model behavior, what’s the failure mode, what’s the exit path).

What you need before starting

A paid Claude Cowork account. A folder. A small red-team-questions.md file you write once and edit as you see new patterns. A folder of past pitches so the workflow can compare a new vendor’s claims against the deck of the last vendor that pitched something similar. Five minutes per pitch on intake (creating the subfolder, dropping the deck and email thread in).

The workflow map

A folder called Vendors/. Inside it, one subfolder per vendor pitch ({vendor-name}-{date}/). Inside each: the deck (PDF), the email thread (saved as text), any demo notes or transcript, and a public/ subfolder where the workflow saves what it can fetch from the vendor’s site. At the top of Vendors/: red-team-questions.md, a template.md for the red-team note shape, a decisions.md where the executive’s calls are logged, and a meta/ folder holding the rolling cross-vendor view (meta/cross-vendor.md).
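Intake is the only manual step, so it helps to make it one command. A minimal sketch in Python, assuming the {vendor-name}-{date} pattern is rendered as a lowercase hyphenated name plus an ISO date; the function name and that naming choice are illustrative, not something Cowork requires.

```python
from datetime import date
from pathlib import Path


def scaffold_pitch(vendors_root: Path, vendor_name: str, pitch_date: date) -> Path:
    """Create Vendors/{vendor-name}-{date}/ with an empty public/ subfolder."""
    # Assumption: lowercase, hyphen-separated vendor name followed by an ISO date.
    slug = "-".join(vendor_name.lower().split())
    pitch_dir = vendors_root / f"{slug}-{pitch_date.isoformat()}"
    # public/ is where the workflow later saves what it fetches from the vendor's site.
    (pitch_dir / "public").mkdir(parents=True, exist_ok=True)
    return pitch_dir
```

Calling scaffold_pitch(Path("Vendors"), "Vendor X", date(2026, 5, 2)) creates Vendors/vendor-x-2026-05-02/public/. The deck, email thread, and demo notes still go into the vendor folder by hand.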

Flow: when a pitch arrives, you create the subfolder and drop the materials in. Cowork runs (manually triggered or scheduled to look for new subfolders nightly). It fetches the public site, reads everything, applies the questions, and writes red-team.md inside the vendor’s subfolder. You read it before the meeting.
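How Cowork picks up new subfolders on a schedule is not specified here. The selection rule itself is simple enough to sketch: a pitch is pending when its folder holds materials but no red-team.md. The helper below is a hypothetical illustration of that rule in Python, not Cowork's mechanism.

```python
from pathlib import Path

# Top-level folders in Vendors/ that are not vendor pitches (meta/ in the map above).
NON_PITCH_DIRS = {"meta"}


def pending_pitches(vendors_root: Path) -> list[Path]:
    """Return vendor subfolders that have materials but no red-team.md yet."""
    pending = []
    for entry in sorted(vendors_root.iterdir()):
        # Skip top-level files (questions, template) and non-pitch folders.
        if not entry.is_dir() or entry.name in NON_PITCH_DIRS:
            continue
        has_materials = any(p.is_file() for p in entry.iterdir())
        if has_materials and not (entry / "red-team.md").exists():
            pending.append(entry)
    return pending
```

An empty subfolder is skipped on purpose: a folder created during intake but not yet filled should not produce a red-team note about nothing.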

Claude Cowork setup

Connect Cowork to the folder. Write the questions file. Run the first pitch manually so you can adjust the questions before the workflow runs at scale.

Prompt block 1: the red-team questions

red-team-questions.md. Treat this as living. Every time the workflow misses a question that a meeting later surfaced, add it.

For every claim in the materials, separate into:
- Defensible (the materials, the public site, or a source you can name
  back up the claim).
- Vapor (the claim is asserted but not supported, or it is supported
  by a screenshot of one demo that does not generalize).

For every product capability, ask:
- What data does it need from us? Specifically: data type, sensitivity,
  volume, format.
- Where does that data sit when they have it? Their cloud, our cloud,
  a third party, on-device.
- What happens to that data after the trial ends or after we leave?
  Look for the explicit deletion clause. If it isn't present, that is
  the answer.
- Who has access to it on their side, and under what controls?
- Is the data used to train any model, theirs or anyone else's?

For every cost, ask:
- What is the headline price? What is the price at our actual usage?
- What does the integration cost in our time? Estimate in person-weeks.
- What does the deployment cost in change management? Who has to
  learn what?
- What is the renewal pattern? Annual? Per-seat? Per-call?
- What happens if our usage doubles?

For every workflow claim, ask:
- Who in our company is the owner of this workflow on day one?
- Who reviews the output?
- What is the failure mode? When the workflow is wrong, what happens?
- What is the escape path? If we want to stop using this in 90 days,
  what does that look like?

For the company itself, ask:
- How long has the product existed in its current form?
- Who are their reference customers? Have we talked to one independently?
- What is their funding state, and how long can they operate at current burn?

For our company, ask:
- What is the one thing we already do that this product replaces or
  augments, and is that thing actually a problem worth this cost?
- What's the smallest version we could pilot first?
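Outside the questions file itself: the cost questions are easier to answer with the arithmetic written down. A minimal sketch, assuming a per-seat price that steps up by usage tier. The tier boundaries, prices, seat count, and call volumes are all hypothetical and not taken from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_calls_per_month: int      # upper bound of the tier, inclusive
    price_per_seat_month: float   # price per seat per month within this tier


def annual_cost(seats: int, calls_per_month: int, tiers: list[Tier]) -> float:
    """Annual cost at the first tier whose ceiling covers the given usage."""
    for tier in sorted(tiers, key=lambda t: t.max_calls_per_month):
        if calls_per_month <= tier.max_calls_per_month:
            return seats * tier.price_per_seat_month * 12
    raise ValueError("usage exceeds the top published tier; ask the vendor for a quote")


# Hypothetical price list and usage, for illustration only.
TIERS = [Tier(10_000, 100.0), Tier(50_000, 180.0), Tier(200_000, 300.0)]

headline = annual_cost(seats=20, calls_per_month=5_000, tiers=TIERS)   # what the deck quotes
actual = annual_cost(seats=20, calls_per_month=30_000, tiers=TIERS)    # our real volume
doubled = annual_cost(seats=20, calls_per_month=60_000, tiers=TIERS)   # if usage doubles
```

With these made-up numbers the headline is 24,000 a year, the price at actual usage is 43,200, and doubling usage takes it to 72,000. The gap between the first and second figure is what belongs in the note's hidden-costs table.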

Prompt block 2: the run instruction

Red-team this vendor pitch.

The pitch lives at Vendors/{vendor-folder}/. Read everything inside:
the deck, the email thread, any demo notes or transcript, and the
public/ subfolder if populated.

If public/ is empty and the materials reference a vendor URL, fetch
the homepage, the pricing page if any, and the security or trust
page if any, and save them as public/site.md.

Apply red-team-questions.md to the materials. Use template.md for the
output shape. Write the red-team note to Vendors/{vendor-folder}/red-team.md.

For each claim and capability you assess, cite the source: page of
the deck, line of the email, file in public/. If the source for a
claim is missing, name the missing source.

Compare this vendor's claims against meta/cross-vendor.md if it
exists. Note where this vendor's claims contradict or echo a previous
vendor in the same category.

Do not contact the vendor. Do not draft any reply. Do not commit our
side to anything.

Prompt block 3: the red-team note shape

template.md.

# Red-team: {vendor name}, {date}

## One-line read
A single sentence naming what this vendor sells, to whom, and why
this pitch is in front of us.

## Defensible claims
| Claim | Source | Note |
|---|---|---|
| ... | deck p.4 | matches their public site |
| ... | email thread | reference customer named publicly |

## Vapor claims
| Claim | Why it is vapor | What would make it defensible |
|---|---|---|
| ... | only the demo, not generalized | a second customer using the same flow |
| ... | unsupported number | source citation, time period, sample size |

## Hidden costs
| Cost | Estimate | Confidence |
|---|---|---|
| Integration time | 4 to 6 person-weeks | medium |
| Annual at our usage | AED ~X | low (depends on call volume) |
| Change management | one full team for two weeks | medium |

## Hidden risks
| Risk | What we lose if it happens | How we would notice |
|---|---|---|
| ... | ... | ... |

## Questions to ask in the next meeting
1. ...
2. ...
3. ...

## Recommendation
[ ] Take the meeting as scheduled.
[ ] Take the meeting only after they answer the questions above.
[ ] Pass.

Reasoning (one paragraph).

## Cross-vendor note
What previous pitches in this category said. Where this vendor differs.
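Outside the template file itself: a fixed note shape can be checked mechanically before the executive opens it. A minimal sketch, assuming the note keeps the template's "## " headings verbatim; the function name is hypothetical, and the check says nothing about whether the content under each heading is any good.

```python
# Section headings copied from template.md above.
REQUIRED_SECTIONS = [
    "One-line read",
    "Defensible claims",
    "Vapor claims",
    "Hidden costs",
    "Hidden risks",
    "Questions to ask in the next meeting",
    "Recommendation",
    "Cross-vendor note",
]


def missing_sections(note_text: str) -> list[str]:
    """Return template sections that have no matching '## ' heading in the note."""
    present = {
        line[3:].strip()
        for line in note_text.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in present]
```

An empty result means the shape is intact. A non-empty one is a reason to rerun the pitch, or to read the materials directly.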

Example red-team report (excerpt, redacted)

# Red-team: VendorX, 2026-05-02

## One-line read
VendorX sells an AI sales-copilot to mid-market sales teams,
positioned as 'AI for outbound', priced per-seat with usage tiers.

## Defensible claims
| Claim | Source | Note |
|---|---|---|
| Connects to HubSpot and Salesforce | deck p.6 | listed on their site |
| 200+ paying customers | website footer | unverified count |

## Vapor claims
| Claim | Why it is vapor | What would make it defensible |
|---|---|---|
| 'Reps see 3x more meetings booked' | one customer, one quarter, no methodology | three customers across two segments, named time period |
| 'Setup in one week' | demo screenshot only | a written customer reference confirming |

## Hidden costs
- Integration time: HubSpot connector requires admin access and a
  data mapping that someone on our side has to own. Two person-weeks
  realistic, not the 'one afternoon' the deck implies.
- Renewal pattern: per-seat with a usage tier; the deck quotes the
  base seat, not the tier we'd hit at our outbound volume.

## Hidden risks
- Data handling: deck names SOC 2 in progress, not certified.
  No DPA (data processing agreement) was attached to the pitch;
  ask for one before the meeting.
- Model behavior on Arabic prompts: not addressed in the materials.
  We have an Arabic-speaking customer base; this matters.

## Recommendation
Take the meeting only after they send a DPA and a written reference
for the 3x meetings claim.

Approval questions (for the executive after reading)

These aren’t part of the workflow output. They’re the questions the executive should answer to themselves before the meeting.

Is the underlying problem this vendor solves a problem we have, or a problem we’d like to have? If it’s the second, the meeting is about whether to invent the problem.

Who, by name, would own this on our side from day one? If the answer is ‘we’d hire someone’, the cost just doubled.

What’s the smallest version we could pilot? If the vendor’s smallest pilot is ‘company-wide deployment’, that’s the answer to whether to take the meeting.

What does our exit path look like at month three, month six, month twelve? If we can’t describe it in a sentence, neither can they.

Decision recommendation format

The recommendation in the red-team note is one of three: take the meeting, ask N questions first, or pass. The executive can override either way. The point isn’t to remove the executive from the loop. It’s to give the executive a defensible starting point, fast.

Human review point

The executive, before the next meeting. Reads the red-team note in three to five minutes. Decides. The executive’s decision goes in decisions.md at the top of Vendors/ so that next quarter, the executive can ask ‘what did I pass on, and was I right’.
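The post does not fix a format for decisions.md. One line per decision is enough to answer the next-quarter question. A minimal sketch, assuming a pipe-separated line that records both the workflow's recommendation and the executive's call; the field order and function names are hypothetical.

```python
from datetime import date
from pathlib import Path


def decision_line(decided_on: date, vendor_folder: str,
                  recommendation: str, decision: str, reason: str) -> str:
    """Format one decision as a single list item for decisions.md."""
    return (f"- {decided_on.isoformat()} | {vendor_folder} | "
            f"recommended: {recommendation} | decided: {decision} | {reason}")


def append_decision(decisions_file: Path, line: str) -> None:
    """Append a decision line; earlier entries are never rewritten."""
    with decisions_file.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Keeping the recommendation next to the decision is the design choice that matters: it lets a later review see where the executive overrode the workflow and why.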

What not to automate yet

Four things stay with the executive. Contact with the vendor stays with a person, every time. Drafting a reply that goes out without the executive reading it is off the table. A ‘second pass’ deeper analysis of vendors the workflow flagged is a person reading more carefully, not a longer prompt. And the decision is the executive’s, every time; the recommendation is a recommendation.

Where this breaks at scale

When more than one person on the leadership team takes pitches, the red-team notes need to converge somewhere readable, or the company starts taking the same meetings twice. The decisions.md file at the top of Vendors/ is the start of a fix; it isn’t the fix.

When a vendor’s product is itself an AI workflow, the red-team has to ask questions of a different shape than the workflow knows how to ask (model lineage, eval methodology, prompt drift, deprecation policy). The questions file needs a ‘for AI vendors’ subsection.

When the company has a procurement process that doesn’t talk to the red-team archive, the deal closes before the red-team file is found. This is the most common failure mode in larger organizations.

When the executive starts trusting the recommendation more than the materials, the workflow has slipped from ‘compresses reading’ to ‘replaces reading’. This is the failure mode you can’t see from inside, because the workflow’s recommendations look reasonable. The fix is a calibration habit: each quarter, take three of the workflow’s recommendations and read them against your eventual decision, and ask whether the workflow was right for the right reasons.
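The calibration habit can be made harder to skip by picking the three notes at random instead of from memory. A minimal sketch, assuming each past pitch is recorded as a recommendation and an eventual decision; the Outcome record and both function names are hypothetical. The code only chooses what to re-read. Judging whether the workflow was right for the right reasons stays with the reader.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    vendor_folder: str
    recommendation: str   # what the red-team note recommended
    decision: str         # what the executive actually decided


def calibration_sample(outcomes: list[Outcome], k: int = 3, seed=None) -> list[Outcome]:
    """Pick up to k past pitches at random for a quarterly re-read."""
    rng = random.Random(seed)
    return rng.sample(outcomes, min(k, len(outcomes)))


def override_rate(outcomes: list[Outcome]) -> float:
    """Share of pitches where the decision differed from the recommendation."""
    if not outcomes:
        return 0.0
    overridden = sum(1 for o in outcomes if o.recommendation != o.decision)
    return overridden / len(outcomes)
```

An override rate that sits at zero quarter after quarter is not proof the workflow is good. It may be the drift described above, which is why the sampled notes still have to be read against the original materials.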

The operator lesson

The workflow lowers the cost of judgment. It doesn’t replace it. The decision is always the executive’s.
