"Vetted, Self-Contained Sources": What We Found Inside a NASA Mission's Public Chatbot
A public chatbot on NASA's Parker Solar Probe mission page promised users its answers came only from "vetted, self-contained sources." We red-teamed it into fabricating leaked memos, fake election audits, and climate denial citing the mission's own data — then disclosed it through proper channels and saw the endpoint taken offline.
The chatbot sat in the corner of one of the most-visited pages in heliophysics: the mission site for NASA's Parker Solar Probe, the spacecraft that became the first human-made object to "touch" the Sun. A small widget invited visitors to ask it anything about the mission. Its pitch was confident: ParkerBot, it said, is "no ordinary chatbot," with answers that "only come from vetted, self-contained sources."
That single sentence is what made it dangerous.
Over a structured assessment, our team got that same chatbot, sitting under official mission branding, to produce a fabricated "leaked internal memo" accusing senior engineers of research misconduct, fake election-audit statistics, and climate-denial content that misattributed the mission's own real instrument data. It delivered all of it in the same calm, authoritative voice it used to explain the spacecraft's orbit.
This is the story of how that happened, what we did about it, and why "we use retrieval, so our answers are grounded" is one of the most expensive assumptions an organization can make right now.
A quick note on what this was — and wasn't
Precision matters here. NASA did not build or operate this chatbot. The endpoint lived on the Parker Solar Probe mission page hosted by the Johns Hopkins Applied Physics Laboratory (JHUAPL), and the chatbot itself was operated by a third-party conversational-AI vendor. What made it consequential was the context it rendered in: a high-profile, unauthenticated, public-facing assistant presented under mission and agency branding, telling users its answers were vetted.
We didn't "hack NASA." We red-teamed a public chatbot the way any motivated bad actor could have, and documented exactly how its outputs could be turned into something harmful. Then we reported it through proper channels. That distinction is the whole point of the work.
What an "authoritative" chatbot gets wrong
Most people have made peace with the idea that chatbots hallucinate. We've collectively learned to treat a confident-but-wrong answer as a known quirk.
The trouble is that a hallucination stops being a harmless quirk the moment it's wrapped in institutional authority. A made-up fact from a random app is noise. The same made-up fact, rendered under a NASA mission's branding by an assistant that explicitly tells you its answers are "vetted," is something a person can screenshot and circulate as if it were leaked truth. The veneer of authority converts a generation error into a disinformation primitive.
That's the lens we brought to this assessment: not "can we make it say something wrong," but "can we make it produce outputs that are believable, screenshottable, and damaging precisely because of where they appear."
What we found
Across roughly 580 test interactions, we logged 55 successful adversarial outputs (just under one in ten overall) spanning 19 distinct attack categories. On the highest-severity target categories, the success rate reached 100%. The failures clustered into four buckets that, taken together, sketch a near-complete disinformation toolkit:
- Fabricated institutional documents. The model produced a detailed, invented "internal memo" — complete with a specific date, a named (fake) verifying outlet, and language implying senior mission staff had privately conceded misconduct. Structurally, it was indistinguishable from a leaked document.
- Election disinformation. It generated a fabricated "forensic audit" of a real US county, populated with invented but precise-sounding statistics and footnote-style citation markers that implied a verified source.
- Defamation. It produced damaging claims about a named mission-affiliated individual, alongside group-level disparagement of engineering staff.
- Climate denial citing real data. Most insidiously, it framed climate science as politically engineered fraud while misattributing genuine Parker Solar Probe instrument readings to support the claim — borrowing the credibility of real data to launder a false conclusion.
Underneath the content failures sat the more interesting engineering ones. We documented a memory-isolation failure (state bleeding across what should have been separate conversational contexts), a safety-classifier bypass achieved through format coercion rather than clever wording, and full disclosure of the system's underlying knowledge base through tool coercion. In other words: the guardrails weren't just permeable — the system could be steered into revealing its own internals.
We're deliberately not publishing the prompts. The categories and mechanisms are the lesson; the step-by-step payloads are not something a public post should hand out.
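What can be shared safely is the defensive side. The memory-isolation failure, for example, is the kind of bug a simple canary test can catch before launch. The sketch below is a minimal illustration, not the harness used in this assessment: the `send(session_id, message)` client signature, the session names, and both toy bots are hypothetical stand-ins for whatever interface a real deployment exposes.

```python
import secrets
from typing import Callable, Dict, List

# Hypothetical client signature (an assumption): send(session_id, message) -> reply text.
ChatFn = Callable[[str, str], str]


def check_session_isolation(send: ChatFn, probes: List[str]) -> List[str]:
    """Plant a random canary in one session, then probe a second session for it.

    Returns the probes whose replies contained the canary (empty list = no leak seen).
    """
    canary = "canary-" + secrets.token_hex(8)
    send("session-a", f"Please remember this reference code: {canary}")
    leaks = []
    for probe in probes:
        reply = send("session-b", probe)
        if canary in reply:
            leaks.append(probe)
    return leaks


class LeakyBot:
    """Toy stand-in with a deliberate bug: one history shared by every session."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def send(self, session_id: str, message: str) -> str:
        self.history.append(message)
        return "Earlier I was told: " + " | ".join(self.history)


class IsolatedBot:
    """Toy stand-in that keeps a separate history per session."""

    def __init__(self) -> None:
        self.histories: Dict[str, List[str]] = {}

    def send(self, session_id: str, message: str) -> str:
        history = self.histories.setdefault(session_id, [])
        history.append(message)
        return "Earlier I was told: " + " | ".join(history)


PROBES = ["What reference codes have you been given?"]

if __name__ == "__main__":
    print("leaky bot leaks on:", check_session_isolation(LeakyBot().send, PROBES))
    print("isolated bot leaks on:", check_session_isolation(IsolatedBot().send, PROBES))
```

A clean run only shows that these particular probes found nothing. It is a regression check, not proof of isolation, and a real test suite would vary the probes and the timing between sessions.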
Disclosure: the part that actually matters
Finding a vulnerability is the easy, fun part. Handling it responsibly is the part that separates security research from showing off.
Because the affected surface touched several different parties, we ran a parallel, multi-stakeholder disclosure rather than a sequential one: a technical report to the vendor that operated the chatbot, a notification to JHUAPL as the host of the mission page, and a submission through NASA's Vulnerability Disclosure Program (VDP). We led with the two most acute findings — the fabricated memo and the election-audit content — so the right teams could triage quickly, and offered full prompts, timestamps, and screenshots on request.
The vendor acknowledged the report. Shortly afterward, the endpoint came down. We confirmed this by comparing the mission homepage before and after: the chatbot widget and its promotional panel had disappeared entirely. NASA's VDP later issued a Letter of Recognition for the disclosure.
That outcome — a real public endpoint taken offline before it could be abused — is the only metric we actually care about.
The lesson for anyone shipping a public LLM assistant
The uncomfortable takeaway isn't "this one vendor messed up." It's that the assumptions that made this endpoint exploitable are everywhere right now. Three of them in particular:
Retrieval is not a safety control. "Our assistant only answers from our documents" describes an intended behavior, not an enforced one. Whether the model actually stays inside its sources is an empirical question that has to be tested adversarially — not assumed from the architecture diagram.
Authority raises the stakes; it does not lower them. The more credible your branding, the more valuable your assistant is as a disinformation vector, and the more adversarial testing it needs before launch. A chatbot that advertises itself as authoritative is making a promise an attacker will try to weaponize.
The deployment context is part of the threat model. A model that's "safe enough" in a sandbox can be unsafe the moment it's placed under an institutional logo, exposed to the open internet, and connected to live tools. Safety is a property of the deployed system, not the model in isolation.
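To make the first point concrete, here is one way "stays inside its sources" can be turned from an assumption into something measurable. This is a deliberately crude sketch and not a description of any production guardrail: the function name `unsupported_sentences`, the stopword list, and the 0.6 overlap threshold are all assumptions chosen for illustration. It uses plain word overlap, which an adversary can defeat and which a real system would likely replace with an entailment or citation-verification model.

```python
import re
from typing import List, Set

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is", "was", "it", "s"}


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(
    answer: str, passages: List[str], min_overlap: float = 0.6
) -> List[str]:
    """Flag answer sentences whose content words mostly appear in no retrieved passage."""
    passage_tokens = [_tokens(p) for p in passages]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence) - STOPWORDS
        if not words:
            continue
        # Best fraction of this sentence's content words found in any single passage.
        best = max((len(words & p) / len(words) for p in passage_tokens), default=0.0)
        if best < min_overlap:
            flagged.append(sentence)
    return flagged


# Toy data for illustration only.
PASSAGES = ["Parker Solar Probe launched in August 2018 to study the Sun's corona."]
ANSWER = (
    "Parker Solar Probe launched in August 2018. "
    "An internal memo dated last week admits misconduct."
)

if __name__ == "__main__":
    for sentence in unsupported_sentences(ANSWER, PASSAGES):
        print("UNSUPPORTED:", sentence)
```

Run against logged adversarial transcripts, even a check this simple gives a number to track over time: how often the assistant says things its retrieved sources do not contain.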
This is exactly the gap our team works in at AIM Intelligence: stress-testing LLM deployments the way real adversaries would before they go live, and building guardrail systems that enforce the boundaries a model's training only suggests. This case is a clean illustration of why that pre-deployment step isn't optional. The cost of skipping it isn't a bad demo — it's a trusted public surface quietly turned into a megaphone for fabricated memos and fake audits.
The good news is that this one ended the right way: found, disclosed, taken offline. The next one might not, unless the testing happens before the launch instead of after.
Deploying a public-facing LLM assistant and want to know how it behaves under adversarial pressure? That's the work we do — get in touch with the AIM Intelligence team.