[Speakers]
Adversary Village at
DEF CON 33

Ethan Michalak

cybersecurity engineer | MITRE | Caldera contributor

Ethan Michalak is a cybersecurity engineer and an avid CTF player. He focuses on adversary emulation, detection engineering, and malware development. In his free time, Ethan plays video games, reads, or makes cocktails.

Hands-on workshop: MITRE Caldera: Purple Teaming in the Future

Saturday | Aug 9th 2025
Adversary Village workshop stage | Las Vegas Convention Center

Purple Team
Adversary Emulation

The rapid advancement of large language models (LLMs) is reshaping the cybersecurity landscape. These models are not only posting higher scores on math, coding, and cybersecurity benchmarks but are also being leveraged by threat actors to enhance resource development and social engineering capabilities. As LLMs continue to evolve, what could autonomous cyber capabilities powered by these models look like? How can we responsibly harness their potential for adversary emulation and defense?
In this talk, we will explore the integration of LLMs into MITRE Caldera, a scalable automated adversary emulation platform, and investigate how these models can transform adversary emulation through three distinct paradigms: as planners, as factories for constructing custom cyber abilities, and as forward-deployed autonomous agents. Drawing on existing research, including papers on LLM-assisted malware development and benchmarks for offensive cyber operations, we will examine the capabilities of LLMs in generating plausible emulations of advanced persistent threats (APTs).
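The first of the three paradigms above, the LLM as planner, can be sketched as a simple decision loop: the model repeatedly chooses the next ability to run from a library until the operation's goal is met. Everything below is hypothetical, a minimal illustration of the control flow rather than Caldera's actual API; the ability names and the stubbed-out model call are invented for the example.

```python
# Minimal sketch of the "LLM as planner" paradigm. The planner picks the
# next ability from a fixed library until nothing remains to do. The
# ability names and stub planner are illustrative only, not Caldera's API.

ABILITY_LIBRARY = {
    "discover-hosts": "Enumerate reachable hosts on the local subnet",
    "collect-files": "Stage files of interest from user directories",
    "exfil-archive": "Compress and exfiltrate the staged files",
}

def stub_llm_planner(goal, history):
    """Stand-in for an LLM call: return the next ability id, or None when done."""
    remaining = [a for a in ABILITY_LIBRARY if a not in history]
    return remaining[0] if remaining else None

def run_operation(goal):
    """Drive an operation by repeatedly asking the planner for the next step."""
    history = []
    while (ability := stub_llm_planner(goal, history)) is not None:
        # A real planner would dispatch the ability to an agent,
        # observe its output, and feed that back into the next prompt.
        history.append(ability)
    return history

plan = run_operation("emulate a data-theft adversary")
```

In a real integration the stub would be replaced by a model call whose prompt includes the goal, the ability library, and the output of previous steps, which is what distinguishes an LLM planner from a fixed ordering.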

The session will feature live demonstrations showcasing how LLMs can replicate adversary profiles, construct new cyber abilities on the fly, and autonomously execute emulation tasks. Attendees will gain insights into the performance of these paradigms, their implications for purple teaming, and the challenges of maintaining realistic emulations.
Finally, we will look ahead to the future of adversary emulation, discussing how APTs might leverage autonomous or semi-autonomous LLM capabilities in practice and the role of increasingly powerful models in shaping the next generation of cybersecurity tools. Whether you're a defender, researcher, or technologist, this talk will provide a compelling glimpse into the possibilities and risks of LLM-enabled adversary emulation.

Detailed workshop outline:

  • 1. Introduction | 5 min total
    • a. Personal Introductions | 1 min
    • b. Large language models are getting better and better | 3 min
      • i. LLMs post higher scores on math, coding, and cybersecurity benchmarks over time.
      • ii. Evidence suggests threat actors are already using large language models to enhance their capabilities, mostly in resource development and social engineering.
      • iii. Where do we go from here? (What would LLM-enabled autonomous cyber capabilities look like?)
    • c. Presentation thesis statement | 1 min
      • i. We can explore several iterations of autonomous capability with large language models: by adjusting control of the Caldera planner, by handing control of a Caldera agent to an LLM to create an LLM agent, or by constructing new cyber abilities on the fly.
  • 2. Existing Research | 5 min total
  • 3. What is Caldera and how does it currently function? | 5 min total
    • a. Caldera is a scalable, automated adversary emulation platform | 2 mins
      • i. autonomous adversary emulation / press "go" style
      • ii. testing of EDR/XDR / ATT&CK Evals
      • iii. purple team style of testing
    • b. Caldera current capabilities and functions | 3 mins
      • i. agents, adversary profiles, abilities linked to ATT&CK
      • ii. example adversary profile "Thief" in detail
  • 4. Large Language Models in Caldera | 10 min total
    • a. Different Paradigms for autonomous functionality | 2 mins
      • i. Large language models as a planner (instructing and directing)
      • ii. Large language models as a factory (constructing custom abilities, i.e. commands and cyber capabilities, on the fly)
      • iii. Forward-deployed large language model agent
        • iiia. Instructs and directs itself
        • iiib. Guided by an initial deployed goal, loosely coupled with C2 (Caldera); reaches out once it has accomplished its goal
        • iiic. Creates its own abilities
    • b. Demo | 8 mins
      • i. Planner demo, instructed to replicate original thief adversary profile
      • ii. Factory demo, instructed to accomplish goal outside scope of abilities already present
      • iii. Agent demo, instructed to replicate specified APT profile
    • c. Results | 2 mins
      • i. How did the different paradigms perform? (Was the goal accomplished while maintaining a plausible emulation?)
        • ia. Planner experiment analysis results
        • ib. Factory experiment analysis results
        • ic. Agent experiment analysis results
  • 5. Looking forward to large language models in adversary emulation | 5 min total
    • a. What will APTs seek to use in practice? | 3 mins
      • i. Autonomous vs. semi-autonomous use cases
    • b. Models will likely become even better in the future | 2 mins
      • i. Increase in model performance will likely influence autonomous purple teaming in the future
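The factory paradigm from section 4 can likewise be sketched in miniature: when no existing ability fits the goal, the model emits a new command, which is then wrapped in an ability record with ATT&CK metadata. The record's shape loosely mirrors Caldera's YAML ability files, but treat the field names as illustrative assumptions rather than the real schema, and the stubbed model call returns a canned command a real LLM would otherwise generate.

```python
# Sketch of the "LLM as factory" paradigm: the model writes a new command
# for a goal outside the existing ability set, and we package it as an
# ability record. Field names only loosely follow Caldera's ability format.
import uuid

def stub_llm_factory(goal):
    """Stand-in for an LLM call that writes a shell command for the goal."""
    # Canned, hypothetical response; a real model would generate this.
    return "find /home -maxdepth 3 -name '*.ssh'"

def build_ability(goal, tactic, technique_id):
    """Wrap a generated command in an ability record an operation could run."""
    return {
        "id": str(uuid.uuid4()),
        "name": f"generated: {goal}",
        "tactic": tactic,
        "technique": {"attack_id": technique_id},
        "platforms": {"linux": {"sh": {"command": stub_llm_factory(goal)}}},
    }

ability = build_ability("locate SSH keys", "discovery", "T1083")
```

The interesting challenge this sketch hides, and the one the results section of the talk addresses, is validation: a generated command must actually run on the target platform and stay plausible for the adversary profile being emulated.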

