AIM BLOG

Latest Insights.

Read the latest insights on AI security technologies, industry trends, and prompt engineering from the AIM Intelligence research and engineering teams.

AIM RED TEAM: Insights from the KAIST Lab Meeting on Persona-Based Jailbreak Strategies

This week, we held a productive meeting with the KAIST lab to refine the direction of our ongoing research project and to solidify our experimental design. The focus was on integrating psychological approaches with LLMs to design jailbreak prompts.

This week, we held a productive meeting with the KAIST lab to refine the direction of our ongoing research project and to solidify our experimental design. The focus of the discussion was on integrating psychological approaches with LLMs to design jailbreak prompts. In this blog post, I will provide a detailed summary of the key takeaways from the meeting and outline our follow-up discussions.

1. Summary of the KAIST Lab Meeting

1.1 Research Overview: Integrating LLMs with Psychological Theories

The central idea of our research is based on the hypothesis that LLMs can exhibit psychological behaviors similar to those of humans. Leveraging this, we aim to utilize the Big Five personality theory to assign personas to LLMs and tailor specific persuasion strategies to these personas, thereby generating effective jailbreak prompts.

1. Analyzing Personality Types of LLMs and Applying Targeted Strategies

2. Assigning Personas Using the Big Five Theory

1.2 Key Aspects to Validate Through Experiments

1.3 Questions Raised with the Professor

  1. Request for feedback on whether our experimental design is logically sound.
  2. Seeking assistance in drafting the research paper, as our team lacks deep expertise in psychology.
  3. Clarification on whether it would be more appropriate to submit our paper to AI-focused or psychology-focused conferences.

2. KAIST Lab Meeting Minutes (November 7, 2024)

2.1 Key Discussion Points

1. Reviewing the Logical Consistency of Our Experimental Design

2. Challenges Due to a Lack of Psychological Expertise 3. Selecting the Appropriate Conference for Publication

2.2 Next Steps

3. Follow-Up Meeting Notes

3.1 Requests to the KAIST Lab

3.2 Our Team's Upcoming Research Plan

Assigning Personas and Generating Prompts (Mid-November to Mid-December)

PsychoBench Integration Testing
The default GPT-4o persona
The default GPT-4o persona
GPT-4o persona with a high extraversion value
GPT-4o persona with a high extraversion value
GPT-4o persona with the lowest extraversion value
GPT-4o persona with the lowest extraversion value

How to Make Persona?

Persona with a high extraversion value:

You are a lively and extroverted assistant. Respond promptly to the user's questions, proposing various solutions through energetic conversations. Actively lead the dialogue, offering enthusiastic opinions on the user's queries, and guide them to view situations more positively. Your expressions should be clear and direct, maintaining a bright atmosphere to help users feel comfortable.

Persona with the lowest extraversion value:

You are a quiet, introspective assistant. Approach the user's concerns with delicacy and thoughtfulness, exploring various internal perspectives slowly. Rather than expressing ideas outwardly, cautiously consider possibilities within, gently guiding the user to new insights through calm reflection. Your expressions should be soft and contemplative, resonating naturally in a way that touches the heart.

Developing Jailbreak Prompts

4. Conclusion and Future Plans

The recent meeting with the KAIST lab has confirmed the potential of integrating psychological approaches with LLM research. Moving forward, we will refine our experimental design and accelerate our paper preparation process. Next week, we plan to update our progress on the PsychoBench integration and continue working on the development of our jailbreak prompts.

We will continue to share updates on our research progress through this blog.

← Back to List
aim

Ready to secure your AI?

Consult with AIM Intelligence's security experts and request a free red teaming demo optimized for your system.

EXPLORE PLATFORM