
At the time of writing, several Ask Mona systems in production run on Mistral Medium 3.5. Others run on GPT-5.4. Others still use Claude Sonnet 4.5. The choice of model is not made upfront. It is made project by project, based on the real constraints of the system being built.
This approach has shaped the work of Ask Mona’s teams from the beginning. Founded in the cultural sector in 2017, the company has deployed more than 200 conversational agents for institutions and brands with highly varied requirements.
A conversational agent is an architecture. Several technical components work together: the client’s knowledge base, the search engine, the request routing system, and one or more large language models that generate the final response.
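To make the idea of an agent as an architecture concrete, here is a minimal sketch of how the four components could be wired together. Everything in it is an assumption made for illustration: the `Agent` class, the toy `keyword_search` function and the `echo_model` stub are hypothetical names, not Ask Mona's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Hypothetical wiring of the four components named in the text."""
    knowledge_base: list[str]                       # the client's documents
    search: Callable[[str, list[str]], list[str]]   # search engine
    route: Callable[[str], str]                     # request routing: query -> model name
    models: dict[str, Callable[[str], str]]         # model name -> text generator

    def answer(self, query: str) -> str:
        # 1. Retrieve relevant passages from the knowledge base.
        passages = self.search(query, self.knowledge_base)
        # 2. Decide which model should handle this request.
        model_name = self.route(query)
        # 3. Let the chosen model generate the final response.
        prompt = "Context:\n" + "\n".join(passages) + "\nQuestion: " + query
        return self.models[model_name](prompt)


def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Toy search: keep documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in documents if words & set(d.lower().split())]


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no provider API is used here)."""
    return f"[stub answer based on {len(prompt)} characters of prompt]"


agent = Agent(
    knowledge_base=["The museum opens at 10 am.", "Tickets cost 12 euros."],
    search=keyword_search,
    route=lambda query: "default",   # trivial router: always the same model
    models={"default": echo_model},
)
print(agent.answer("When does the museum open?"))
```

Because the model sits behind a plain name-to-function mapping, swapping or adding a model in this sketch does not touch the search or routing code.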
The choice of these models depends on practical criteria:
• What kind of data is being processed
• Where that data needs to be hosted
• How many languages the agent must support
• What level of reasoning the conversation requires
• What latency the visitor experience can tolerate
These questions are not purely technical. They are first and foremost business questions. A museum opening its documentary collections to generative AI does not face the same constraints as a brand embedding a conversational object into its products. A regional tourism board does not have the same needs as a public service operating under government supervision. This is why model selection is a project-level decision, made on a case-by-case basis.
Four dimensions come up in most model selection processes:
• Data sovereignty and hosting
Some projects require user queries to be processed on European, or even French, infrastructure. This is the case for internal knowledge bases, or for systems that handle sensitive content. In these contexts, Mistral models provide a suitable answer.
• Conversational quality in a given language
Not all models perform equally well across languages. An agent that needs to respond in Japanese, Arabic or Mandarin with the same level of quality as in French requires several options to be tested before a decision is made. Depending on the use case, GPT-5.5, Claude Opus 4.8 or Gemini 3.1 Pro may behave differently.
• Multimodal capabilities
Recognising an artwork from a photo or identifying a product on a shelf requires models with vision capabilities. The market is evolving quickly in this area, and Ask Mona’s model palette evolves with it.
• Inference costs and latency
A public-facing agent handling tens of thousands of interactions per month cannot indiscriminately rely on the heaviest models. Economic trade-offs are part of the design process, just like the choice of hosting provider or cloud infrastructure.
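The four dimensions above can be read as filters applied to a list of candidate models. The sketch below illustrates that reading under stated assumptions: the `ModelProfile` and `ProjectNeeds` fields, the placeholder model names and all values are invented for the example and do not describe any real model. In practice, the language field would be filled in from the team's own tests, since language quality has to be measured rather than declared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of a candidate model."""
    name: str
    eu_hosted: bool                   # data sovereignty and hosting
    languages: frozenset[str]         # languages validated by internal tests
    vision: bool                      # multimodal capabilities
    cost_per_1k_interactions: float   # arbitrary units, for illustration only


@dataclass(frozen=True)
class ProjectNeeds:
    """Hypothetical description of a project's constraints."""
    requires_eu_hosting: bool
    languages: frozenset[str]
    requires_vision: bool
    max_cost_per_1k_interactions: float


def shortlist(needs: ProjectNeeds, candidates: list[ModelProfile]) -> list[ModelProfile]:
    """Keep the models that satisfy every constraint, cheapest first."""
    eligible = [
        m for m in candidates
        if (m.eu_hosted or not needs.requires_eu_hosting)
        and needs.languages <= m.languages
        and (m.vision or not needs.requires_vision)
        and m.cost_per_1k_interactions <= needs.max_cost_per_1k_interactions
    ]
    return sorted(eligible, key=lambda m: m.cost_per_1k_interactions)


# Placeholder candidates: names and values are made up.
candidates = [
    ModelProfile("model-a", True, frozenset({"fr", "en"}), False, 2.0),
    ModelProfile("model-b", False, frozenset({"fr", "en", "ja", "ar"}), True, 8.0),
    ModelProfile("model-c", True, frozenset({"fr", "en", "ja"}), True, 5.0),
]

needs = ProjectNeeds(
    requires_eu_hosting=True,
    languages=frozenset({"fr", "ja"}),
    requires_vision=False,
    max_cost_per_1k_interactions=6.0,
)
print([m.name for m in shortlist(needs, candidates)])
```

A shortlist like this only narrows the field. The final decision still depends on testing the remaining options against the project's real conversations.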
As of today, the Ask Mona back office gives access to four model families, selectable at project level:
• In the OpenAI family, the available models are GPT-5.4 Mini, GPT-5.4, GPT-5.2 and GPT-5 Mini.
• In the Anthropic family, the palette includes Claude Opus 4.5, Claude Haiku 4.5 and Claude Sonnet 4.5.
• On the Google side, the available models are Gemini 3.5 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash Lite and Gemini 3 Flash.
• Finally, in the Mistral family, the available models are Mistral Medium 3.5, Mistral Large 3 and Mistral Medium 3.
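As an illustration, a palette like this could be represented as a simple registry that a project configuration is checked against. The family and model names below are taken from the list above. The `PALETTE` structure and the `family_of` and `configure_project` helpers are hypothetical and do not reflect how the back office is actually built.

```python
# Model palette as listed above, grouped by family.
PALETTE: dict[str, list[str]] = {
    "OpenAI": ["GPT-5.4 Mini", "GPT-5.4", "GPT-5.2", "GPT-5 Mini"],
    "Anthropic": ["Claude Opus 4.5", "Claude Haiku 4.5", "Claude Sonnet 4.5"],
    "Google": ["Gemini 3.5 Flash", "Gemini 3.1 Pro", "Gemini 3.1 Flash Lite", "Gemini 3 Flash"],
    "Mistral": ["Mistral Medium 3.5", "Mistral Large 3", "Mistral Medium 3"],
}


def family_of(model: str) -> str:
    """Return the family a model belongs to, or raise if it is not in the palette."""
    for family, models in PALETTE.items():
        if model in models:
            return family
    raise ValueError(f"{model!r} is not in the palette")


def configure_project(project: str, model: str) -> dict[str, str]:
    """Hypothetical project-level setting: one validated model per project."""
    return {"project": project, "family": family_of(model), "model": model}


print(configure_project("museum-guide", "Mistral Medium 3.5"))
```

Keeping the palette as data means that adding a newly evaluated model is a one-line change to the registry.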
This palette is not fixed. Every significant model release is evaluated: comparative performance, behaviour in priority languages, costs and stability. Useful models are added to the back office and made available to the teams managing client projects. Today, Mistral Medium 3.5 is the default model for a significant share of production deployments.
This openness serves two objectives:
• The first is immediate: adapting each system to its real context. In a recent deployment for a European cultural institution, the conversational agent uses a sovereign model to process queries linked to the client’s internal data, and a leading general-purpose model to ensure conversational quality across around ten languages. The visitor experiences a single conversation. The complexity is absorbed by the architecture.
• The second is long-term: AI models are improving at a pace that requires an evolving architecture. What is state of the art this quarter may no longer be so in six months. An open architecture protects our clients’ investments: when a new model provides a clear improvement, we integrate it. When a client’s requirements evolve, we adjust. The infrastructure remains, while the components are renewed.
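The deployment described under the first objective, with a sovereign model for queries touching internal data and a general-purpose model for everything else, can be sketched as a small routing rule. This is one possible reading, not the actual system: the keyword-based detector, the `INTERNAL_KEYWORDS` set and both model stubs are assumptions made for the example.

```python
from typing import Callable

Generator = Callable[[str], str]

# Hypothetical keywords signalling that a query concerns the client's internal data.
INTERNAL_KEYWORDS = {"archive", "inventory"}


def touches_internal_data(query: str) -> bool:
    """Toy detector; a real system would likely use a classifier or retrieval signal."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in INTERNAL_KEYWORDS)


def make_router(sovereign: Generator, general: Generator,
                is_internal: Callable[[str], bool]) -> Generator:
    """Build a single entry point that hides the two models from the visitor."""
    def respond(query: str) -> str:
        model = sovereign if is_internal(query) else general
        return model(query)
    return respond


# Stubs standing in for real model calls.
def sovereign_stub(query: str) -> str:
    return f"sovereign: {query}"


def general_stub(query: str) -> str:
    return f"general: {query}"


respond = make_router(sovereign_stub, general_stub, touches_internal_data)
print(respond("Show me the archive record for this painting"))
print(respond("Bonjour, quels sont les horaires ?"))
```

The visitor only ever calls `respond`, which mirrors the single conversation described above. Replacing either model means passing a different function to `make_router`.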
This is what allows us to deliver on a promise we have upheld since 2017: every Ask Mona conversational agent is designed around the client’s context, and evolves with it. Choosing the right model for each system is part of that design work. It is a business decision, and it has shaped our architecture from the very beginning.