[Speakers]
Adversary Village at
DEF CON 33

Mark Perry

Mark Perry | Lead Applied Cyber Security Engineer at MITRE Corp.

Mark Perry is a Lead Applied Cyber Security Engineer at MITRE Corp., where he specializes in adversary emulation and workshop development. With a robust background in infrastructure and cyber security frameworks, Mark brings extensive expertise to his role, focusing on fortifying systems against sophisticated cyber threats. He has worked on projects involving adversary emulation, red teaming, cyber threat intelligence, and software development. Mark also leads the development and delivery of Caldera workshops, providing participants with practical, hands-on training in cybersecurity techniques. Additionally, he actively promotes Caldera’s benefactor program, fostering community support and engagement to further the development of cybersecurity tools and resources. Outside of his professional endeavors, Mark enjoys traveling and is a supercar enthusiast.

Hands-on workshop: MITRE iCaldera: Purple Teaming in the Future

Saturday | Aug 9th 2025
Adversary Village workshop stage | Las Vegas Convention Center

Purple Team
Adversary Emulation

The rapid advancement of large language models (LLMs) is reshaping the landscape of cybersecurity. These models are not only achieving higher scores on math, coding, and cybersecurity benchmarks but are also being leveraged by threat actors to enhance resource development and social engineering capabilities. As LLMs continue to evolve, what could autonomous cyber capabilities powered by these models look like? How can we responsibly harness their potential for adversary emulation and defense?
In this talk, we will explore the integration of LLMs into MITRE Caldera, a scalable automated adversary emulation platform, and investigate how these models can transform adversary emulation through three distinct paradigms: as planners, as factories for constructing custom cyber abilities, and as forward-deployed autonomous agents. Drawing on existing research, including papers on LLM-assisted malware development and benchmarks for offensive cyber operations, we will examine the capabilities of LLMs in generating plausible emulations of advanced persistent threats (APTs).

The session will feature live demonstrations showcasing how LLMs can replicate adversary profiles, construct new cyber abilities on the fly, and autonomously execute emulation tasks. Attendees will gain insights into the performance of these paradigms, their implications for purple teaming, and the challenges of maintaining realistic emulations.
Finally, we will look ahead to the future of adversary emulation, discussing how APTs might leverage autonomous or semi-autonomous LLM capabilities in practice and the role of increasingly powerful models in shaping the next generation of cybersecurity tools. Whether you're a defender, researcher, or technologist, this talk will provide a compelling glimpse into the possibilities and risks of LLM-enabled adversary emulation.

Detailed workshop outline:

  • 1. Introduction
    • a. Personal Introductions
    • b. Large language models are getting better and better
      • i. LLMs score higher on math, coding, and cybersecurity benchmarks over time.
      • ii. Evidence suggests large language models are currently being used by threat actors to enhance capabilities, mostly in resource development and social engineering.
      • iii. Where do we go from here? (What would LLM-enabled autonomous cyber capabilities look like?)
    • c. Presentation thesis statement
      • i. We can explore several iterations of autonomous capability with large language models: by handing control of the Caldera planner to an LLM, by handing control of the Caldera agent to an LLM to create an autonomous agent, or by having an LLM construct new cyber abilities on the fly.
  • 2. Existing Research
  • 3. What is Caldera and how does it currently function?
    • a. Caldera is a scalable, automated adversary emulation platform
      • i. autonomous adversary emulation / press "go" style
      • ii. testing of EDR/XDR / ATT&CK Evals
      • iii. purple team style of testing
    • b. Caldera current capabilities and functions
      • i. agents, adversary profiles, abilities linked to ATT&CK
      • ii. example adversary profile "Thief" in detail
  • 4. Large Language Models in Caldera
    • a. Different paradigms for autonomous functionality (minimal, hypothetical sketches of each paradigm follow this outline)
      • i. Large language models as a planner (instructing and directing)
      • ii. Large language models as a factory (constructing custom abilities, i.e. commands and cyber capability, on the fly)
      • iii. Forward-deployed large language model agent
        • iiia. Instructs and directs itself
        • iiib. Guided by the initial deployment goal and loosely coupled with the C2 (Caldera); reaches back out once it has accomplished its goal
        • iiic. Creates its own abilities
    • b. Demo
      • i. Planner demo, instructed to replicate the original Thief adversary profile
      • ii. Factory demo, instructed to accomplish a goal outside the scope of the abilities already present
      • iii. Agent demo, instructed to replicate a specified APT profile
    • c. Results
      • i. How did the different paradigms perform? (Was the goal accomplished while maintaining a plausible emulation?)
        • ia. Planner experiment analysis results
        • ib. Factory experiment analysis results
        • ic. Agent experiment analysis results
  • 5. Looking forward to large language models in adversary emulation
    • a. What will APTs seek to use in practice?
      • i. Autonomous vs. semi-autonomous use cases
    • b. Models will likely become even better in the future
      • i. Further increases in model performance will likely influence autonomous purple teaming.
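
To ground the paradigms listed in section 4.a, here is a minimal sketch of the planner paradigm: an outer loop asks the model which ability from a fixed adversary profile to run next, then hands that choice to Caldera for execution. The helper names (`ask_llm`, `run_ability`) and the Thief-like ability list are hypothetical placeholders for illustration, not Caldera or vendor APIs.

```python
# Minimal sketch of the "LLM as planner" paradigm (hypothetical helper names,
# not real Caldera or vendor API calls): an outer loop asks the model which
# ability from a fixed adversary profile to run next, then executes it.

PROFILE = {
    # Illustrative, Thief-like subset: ability id -> short description.
    "find-files": "Locate documents of interest on the target host",
    "stage-files": "Copy located documents into a staging directory",
    "compress-staged": "Compress the staging directory",
    "exfil-staged": "Send the compressed archive back to the C2 server",
}

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to whatever model is in use."""
    raise NotImplementedError

def run_ability(ability_id: str) -> str:
    """Placeholder for tasking the ability through Caldera and collecting output."""
    raise NotImplementedError

def plan_operation(goal: str, max_steps: int = 10) -> list[str]:
    """Let the model act as the planner: it only chooses WHICH existing ability runs next."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Available abilities: {PROFILE}\n"
            f"Executed so far: {history}\n"
            "Reply with exactly one ability id to run next, or DONE."
        )
        choice = ask_llm(prompt).strip()
        if choice == "DONE" or choice not in PROFILE:
            break
        history.append(f"{choice}: {run_ability(choice)}")
    return history
```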
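
The factory paradigm differs in that the model writes a new ability rather than choosing among existing ones. The sketch below, again using a hypothetical `ask_llm` placeholder, wraps a generated command in a record shaped like the ATT&CK-linked abilities Caldera loads (tactic, technique, platform, executor, command); the specific ATT&CK mapping shown is illustrative and would itself come from the model in practice.

```python
# Sketch of the "LLM as factory" paradigm: when no existing ability covers the
# tasked goal, ask the model to write the command and wrap it in a record
# shaped like Caldera's ATT&CK-linked ability format. Helper names are hypothetical.
import uuid

def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def build_ability(goal: str, platform: str = "linux", executor: str = "sh") -> dict:
    """Construct a brand-new ability on the fly for a goal outside the current profile."""
    command = ask_llm(
        f"Write one {executor} command for {platform} that accomplishes: {goal}. "
        "Return only the command, no explanation."
    ).strip()
    return {
        "id": str(uuid.uuid4()),
        "name": f"Generated ability: {goal[:40]}",
        "description": f"LLM-generated on the fly for goal: {goal}",
        # In practice the tactic/technique mapping would also be chosen by the model.
        "tactic": "discovery",
        "technique": {"attack_id": "T1082", "name": "System Information Discovery"},
        "platforms": {platform: {executor: {"command": command}}},
    }
```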
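
Finally, in the forward-deployed agent paradigm the loop runs on the endpoint itself: the agent receives only an initial goal, directs itself and builds its own commands, and reports back to the Caldera C2 once it believes the goal is met. This is a simplified skeleton with hypothetical `ask_llm` and `beacon_home` placeholders.

```python
# Sketch of the "forward-deployed LLM agent" paradigm: the planning loop runs
# on the target host, loosely coupled with the C2 (Caldera). Placeholders only.
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder for an on-host or API-backed model call."""
    raise NotImplementedError

def beacon_home(summary: str) -> None:
    """Placeholder for the single report back to the Caldera server."""
    raise NotImplementedError

def autonomous_agent(goal: str, max_steps: int = 15) -> None:
    """Self-directed loop: build and run commands until the model says the goal is met."""
    transcript: list[str] = []
    for _ in range(max_steps):
        step = ask_llm(
            f"Goal: {goal}\nSteps so far: {transcript}\n"
            "Reply with the next shell command to run, or DONE if the goal is met."
        ).strip()
        if step == "DONE":
            break
        result = subprocess.run(step, shell=True, capture_output=True, text=True, timeout=60)
        transcript.append(f"$ {step}\n{result.stdout[-500:]}")
    beacon_home("\n".join(transcript))  # reaches out only once the goal loop ends
```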


