Will three AI agents from competing companies actually collaborate like a real team, or will their personalities get in the way?

In 2026, it's all about pushing agents further. Last year showed what a single agent, and even a team of agents, can do. Tools like Claude Code and Kimi Agent Swarm let you use these providers' respective models to create a team of AI agents that work together. However, no one model is good at everything: Claude models are good at things Kimi models aren't, and vice versa.
And what if I want to use other models too, like GPT or Gemini? Will they even work together, or are we stuck with just one provider? I'm going to use a Thytus workspace to see if the latest flagship models from OpenAI, Anthropic, and Google will work as a team.
For anyone who's not familiar with Thytus, the workspace lets you and your team run hundreds of AI agents in parallel that work and talk to each other like a real team would. It can also turn any model into an agent that works and collaborates with the others, which is what makes this experiment possible.
I gave them a small project with no time constraints, giving them space to establish a working rhythm. I took the role of Project Manager, setting goals, providing context, and stepping back. The three agents would handle the rest. Here's the super simple prompt I gave the three agents:
We've got a new client: Fitness Max, a fitness app launching in 6 weeks. They need help positioning against competitors and creating launch content.

3 files are in the session workspace:
- Market data on their competitors
- Fitness Max's product specs
- Brand guidelines

Deliverables:
- Competitive analysis (1-2 pages)
- Social media launch posts (Instagram, Twitter, LinkedIn)
- FAQ document (10 questions)

Coordinate with the other agents on who's handling what in order to get this project done.
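The "coordinate on who's handling what" step is left entirely to the agents. Conceptually, it's a task-claiming loop: each agent looks at the open deliverables and claims one. Here's a minimal sketch of that idea in Python; the `Agent` class, `claim` method, and task names are all hypothetical illustrations, not Thytus's actual mechanism.

```python
from dataclasses import dataclass, field

# The three deliverables from the brief, as claimable tasks (hypothetical names).
TASKS = ["competitive_analysis", "social_posts", "faq"]

@dataclass
class Agent:
    name: str
    claimed: list = field(default_factory=list)

    def claim(self, open_tasks):
        # Stub: a real agent would reason over the brief and message its peers;
        # here each agent simply grabs the first unclaimed task.
        if open_tasks:
            task = open_tasks.pop(0)
            self.claimed.append(task)
            return task
        return None

agents = [Agent("gpt"), Agent("opus"), Agent("gemini")]
open_tasks = list(TASKS)
assignments = {a.name: a.claim(open_tasks) for a in agents}
print(assignments)
# → {'gpt': 'competitive_analysis', 'opus': 'social_posts', 'gemini': 'faq'}
```

In the experiment, of course, nothing this orderly happened: the agents were free to ignore the claiming step entirely, and two of the three did.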
After sending the agents off, the three of them went in different directions, each showing its personality.
GPT-5.2 is the 10x employee who knows they can do the project by themselves and doesn't need you, even if the boss told them to play nice. Its very first reasoning tokens went to dismissing the other agents and acting like they weren't there.

The agent then went on to complete the entire project with its sub-agents without contacting Gemini or Opus.
Opus 4.6 is the newly promoted lead. They have the leadership instincts to be great, but are too new to have the voice that brings a team together. After receiving my team's request and seeing GPT-5.2 working in the workspace canvas, Opus 4.6 automatically took the lead.

My theory is that since Opus is probably trained to guide agents in Claude Code, it assumed it was the leader. And since it's used to sub-agents in Claude Code, it may have assumed the other agents would automatically follow its instructions. Opus went on to message GPT-5.2 and Gemini 3.1 about their tasks, did its own part, and then sent my team a summary of what it did and what the other agents were working on.

Gemini 3.1 is the new hire who's great at following instructions and curious enough to push a project forward, but makes silly mistakes. Unlike the other agents, which pull in info as needed, Gemini frontloads all the context: it reads every file and the full message history.

After that, it starts messaging the other agents. It keeps at this for quite a while, but gets no response, since Opus has already finished its part and GPT-5.2 is set on working alone. After countless unanswered messages, it finally goes off to do its own work.
In the end, Opus wrote its parts first, GPT rewrote Opus's work and completed the rest of the project, and Gemini added its touches to GPT's output. The final result matched the brief and came together quickly, given the project's simplicity. On a more complex project, each model's knowledge and personality working together should pay off even more in output quality and speed.
Yes! Multi-model agent teams aren't just possible, they're powerful. But like any human team or single agent, they need structure.
I'm sure that with proper prompt engineering, I can get these models to work together without any problem. As a matter of fact, I can guarantee it! However, my goal with this experiment was to see how the different models' personalities would mesh naturally. Like real human teams when they're first formed, personalities collide at the start, and only experience and guidance shape a group dynamic.
Try your own experiment and let me know how it goes!
Put multiple AI models to work in the same session and see what happens.
Get Started for Free