#30 2025 end of summer reflection: a year in AI
Summer to summer: what a year in AI has changed
Note: this reflects my personal opinions and reflections from my work over the last 12 months. It does not represent the view of my employer. It is also not a comprehensive view of the AI industry, focusing instead solely on my personal experiences.
A year that feels like a decade
I returned to work from a sabbatical just over 12 months ago, and as I took my summer holidays this year I’ve been reflecting on how my career has evolved over that time. A year in the world of AI moves at a different pace: it feels like a decade’s worth of change, evolving so rapidly that it’s been hard to keep up.
I’ve been working in AI consulting for several years now and have seen firsthand how attitudes towards and expectations of AI have evolved over this time, with the last 12 months feeling like a complete gear shift. Recent projects have included designing and implementing agentic AI solutions for data insights, customer service call auditing, and home WiFi troubleshooting, with current work focused on autonomous, self-healing telecoms networks. This time last year, the conversation was focused on the battle of the benchmarks, model specs, and parameter counts. In a year, the conversation has shifted: companies are increasingly facing the reality of practical implementation challenges beyond raw model capabilities.
Here’s a picture of summer 2024 to take you back 12 months:
Model-centric mindset: As models got bigger and better, everyone was comparing LLM capabilities on benchmarks and leaderboards. High-profile releases, both open-source and proprietary (GPT-4o, Claude 3.5 Sonnet, Meta Llama 3.1), drove a race on the speed and quality of LLM responses, where it felt like each week brought a new breakthrough or release. As models approached saturation on traditional language benchmarks like MMLU, researchers responded with new, more complex benchmarks like GPQA and MATH. It felt like people were comparing models like Pokémon cards. In my experience, lots of people (friends, family, and clients alike) were asking ‘which model should I use?’, rather than broader questions like ‘what problem should I solve?’ or ‘how can I use AI in my process or product?’.
Prompt engineering was the primary means of improving the quality of model responses, providing better context within expanded context windows. RAG (retrieval-augmented generation) was gaining traction as a way to supply that improved context from a company’s own documents, but this ‘domain specificity’ challenge remained a key focus for businesses.
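To make the RAG idea above concrete, here is a minimal sketch of the pattern: retrieve the documents most relevant to a query, then prepend them to the prompt as context. The keyword-overlap scoring and the sample documents are invented purely for illustration; real systems typically use embedding similarity over a vector store.

```python
# Minimal sketch of the RAG pattern: retrieve relevant documents,
# then build an augmented prompt for the model.
# Scoring here is naive keyword overlap, for illustration only.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share; keep the top k."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble the augmented prompt: retrieved context plus the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base for illustration:
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed to the original payment method.",
]
prompt = build_prompt("refund policy", docs)
```

The prompt now grounds the model in company-specific documents it was never trained on, which is exactly the ‘domain specificity’ gap RAG was adopted to close.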
Copilot-core: For many large companies, a Microsoft Copilot rollout was the default AI strategy. Giving users Copilot and developers GitHub Copilot was becoming table-stakes, but few companies had concrete frameworks for driving behaviour change and evaluating impact. Many also had high expectations that out-of-the-box Copilot would automate workflows, without fully understanding the level of integration or domain-specific grounding required to achieve this. I experienced this frequently, with clients asking ‘but can’t Copilot do this?’.
The year of agents - maybe: 2024 was positioned in the industry as ‘the year of AI agents’, based on the idea of shifting from single models to compound AI systems with the ability to take action (i.e. adding tool-calling alongside LLMs). However, while in summer 2024 this was gaining enthusiasm as a concept and was inescapable in conference talks and YouTube videos, the reality was still early stage: early agents outperformed LLM-based chatbots, but still made many errors and were far from fully autonomous. Only a few of my clients had begun exploring agentic builds.
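The ‘compound system’ idea above can be sketched as a simple loop: the model chooses a tool, the runtime executes it, and the observation is fed back until the model answers. Everything here is hypothetical for illustration - the `decide()` function is a hard-coded stand-in for a real LLM call, and `get_weather` is a stub tool.

```python
# Sketch of a tool-calling agent loop. decide() stands in for an LLM
# that would normally choose an action from the conversation history.

def get_weather(city: str) -> str:
    """Stub tool: a real agent would call an external API here."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def decide(history: list[str]) -> dict:
    """Stand-in for the model: call a tool first, then answer from the result."""
    if not any(msg.startswith("observation:") for msg in history):
        return {"action": "tool", "name": "get_weather", "args": {"city": "London"}}
    return {"action": "final", "answer": history[-1].removeprefix("observation: ")}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        step = decide(history)
        if step["action"] == "final":
            return step["answer"]
        result = TOOLS[step["name"]](**step["args"])  # execute the chosen tool
        history.append(f"observation: {result}")
    return "gave up"

answer = run_agent("What's the weather in London?")
```

The errors mentioned above typically creep in at the `decide()` step: a real model can pick the wrong tool, malform the arguments, or loop without converging, which is why `max_steps` guards exist.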
Multimodal: Models were increasingly focused on multimodal capabilities beyond text, including voice and images. Amidst this, GPT-4o’s voice capabilities sparked celebrity controversy: Sam Altman tweeted a single word in May (‘her’) as a reference to the 2013 sci-fi movie starring Scarlett Johansson as an AI assistant. OpenAI’s voice, called ‘Sky’, sounded uncannily like Johansson’s character in the film, despite Johansson having declined to voice the system for ‘personal reasons’. But this wasn’t just celebrity drama: the incident raised key questions about consent, creativity, and intellectual property. If AI could so convincingly mimic human attributes and creativity, what did this mean for artists, actors, and creators? There weren’t (and aren’t) yet clear answers, but it felt like the first time many of these questions were raised in earnest outside of niche groups.
Consumers outpacing business: ordinary consumers had, for perhaps the first time, better access to advanced AI technology, and with fewer limits, than in an enterprise setting. Many people I know were using LLMs as ‘Google-plus’ to get faster answers to questions, or as a novelty to play with, and were beginning to bring these expectations into their workplaces. Surveys at the time showed nearly half of Europeans had used generative AI for personal tasks, but only a quarter had used it for work.
Now: Summer 2025 - Agentic Everything: Promise vs Practice
The biggest shift overall that I have experienced has been from thinking about ‘models’ to thinking about ‘systems’. Models may still be at the heart, but more of my conversations now focus on workflows, architectures, and products rather than the models themselves. The proliferation of agentic capabilities means we’ve moved beyond assuming chat as the default UI and into thinking about what AI can do on a user’s behalf and how it can be embedded in experiences. Moving beyond knowledge-retrieval copilots opens up new opportunities and makes it an exciting time to be working in this space.
Expectations got real: the move from POCs to delivering outcomes. By the end of 2024, only 1 in 4 companies had successfully launched AI initiatives, and surveys showed a slowdown in companies’ AI investments compared to 2023-2024. Clients are increasingly focused on upfront visibility of potential benefits, in many cases requiring a clear route-to-value before starting development. The good news is fewer dead-end experiments and POCs. The less good news is that experimentation becomes harder, and the pre-work required to even get started massively slows the pace.
Rise of agentic systems: agentic frameworks and multi-agent orchestration tools have surged over the last few months, enabling the move from isolated ReAct agents to multi-agent systems. Accenture reports 1 in 3 companies are pivoting towards agentic AI. I’ve seen this firsthand: many of my clients have, in the last 3-4 months, moved their thinking beyond copilots providing knowledge retrieval and begun asking about creating end-to-end autonomous workflows. Many vendors, especially big tech players, offer easily configurable agentic capabilities, which is opening up new opportunities for companies to quickly trial and roll out agents in their business. However, there remains a lack of understanding and consensus on what an agent actually is: many of my clients want to introduce agents without understanding what they are or being comfortable with non-deterministic behaviour, creating internal tensions as development progresses and the realities become clear. GPT-5 also illustrates this systems way of thinking: rather than being a single model, GPT-5 is actually made up of multiple models (GPT-5, GPT-5 thinking, mini) and a router which interprets intent and sends each query to the best model for the task. Many of the capabilities now associated with ChatGPT are part of the application engineering (e.g. short- and long-term memory, conversation history, agentic tool calling) which sits outside the core model(s).
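The router idea above can be sketched in a few lines: classify the query, then dispatch it to a fast or a deeper model. The heuristics and model stand-ins below are entirely invented for illustration - OpenAI’s actual routing logic is not public.

```python
# Illustrative sketch of a model router: simple intent heuristics decide
# which backend handles each query. Both "models" are stubs.

def needs_reasoning(query: str) -> bool:
    """Crude intent check: long or analytical queries go to the heavier model."""
    keywords = ("prove", "derive", "step by step", "analyse", "plan")
    return len(query.split()) > 30 or any(k in query.lower() for k in keywords)

def fast_model(query: str) -> str:
    return f"[fast model] {query}"

def reasoning_model(query: str) -> str:
    return f"[reasoning model] {query}"

def route(query: str) -> str:
    """Dispatch the query to the most suitable model for the task."""
    model = reasoning_model if needs_reasoning(query) else fast_model
    return model(query)
```

The point is that the ‘model’ users experience is really a system: the routing layer, like memory and tool calling, lives in application engineering rather than in any single set of weights.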
Introducing ‘AgentOps’: companies are beginning to think more about ‘post-launch’ agent operations - cost monitoring and management, performance management, and so on. We are in the early stages of defining the new skillsets and roles needed for these kinds of tasks.
Challenge to traditional software development methodologies: given the iterative nature of working with LLMs, AI projects are well suited to early trials, feedback, and iteration. Typical ‘wagile’ approaches in legacy companies lose time and buy-in by extensively designing something before trying out whether it can work, and traditional test cases don’t work for non-deterministic components. But companies don’t yet have the frameworks in place to work in a different way at scale, and they underemphasise new challenges like hallucination mitigation.
Continued divergence between consumer expectations and enterprise capability: consumer tools have continued to evolve massively, putting the most advanced agentic features directly in the hands of ordinary users - both through leaders like OpenAI, Anthropic, and Google, and through newer players like Manus. Consumer use of these tools continues to rise, with many employees using their personal tools in the workplace as well as at home. But enterprise rollout remains patchy: businesses are held back by legacy systems and architecture and a slow pace of delivery. Risk-averse cultures mean companies may spend more time following governance processes and aligning stakeholders than building and iterating, costing them a critical time advantage.
Creative AI & the response to IP lawsuits: as lawsuits have kicked off, a new crop of startups and applications, particularly for the creative industries, has emerged, focused on responsible use of training data and on keeping control in the hands of expert users rather than letting the model do everything. For example, Runway’s Act 2 and Moonvalley’s Marey provide video generation models along with rich application features that let users specify motion paths, camera angles, reference imagery, and effects, rather than controlling the model purely through text prompting. This is coupled with increased adoption within creative industries - for example, H&M’s recent product imagery was created with a hybrid photography-and-AI process.
My overall reflection is that we currently remain in an ‘implementation gap’. Consumers are dragging enterprises forward through their expectations, and while there is openness and desire to keep up, the pace and reality within enterprises tells a different story.
For example, one client wanted to roll out agentic data insights - but didn’t have most of its data available in a cloud data lake. Another wanted to automate a manual process, but doesn’t yet have API endpoints available to interact with critical systems. A third has an existing governance process for new tech developments, which takes a minimum of 12 weeks to progress through business case approval, security and privacy approvals, and cost estimations - all before even provisioning an environment and starting to experiment.
Back in 2017, Andrew Ng likened AI to ‘the new electricity’, but many companies still don’t know how to wire it. It’s similar to retrofitting electric wiring into an old house, versus new-builds with electricity built in. Tech vendors offer ‘plug and play’ capabilities, but these still require a plug - a plug which many legacy enterprises don’t yet have. It feels like AI is surfacing every weakness or lack of investment: limited digital adoption and data modernisation are no longer things we can get by without. Legacy tech, legacy data, and legacy mindsets all limit the potential for AI in large enterprises and, in my opinion, will continue to drive a divergence between those who can adapt versus those who will stagnate and shrink.
So what’s next? Where will we focus in Q4 2025 and into 2026?
Orchestration and agent-to-agent interactions: to truly automate business workflows and outsource labour to digital workers, agents will need to communicate with each other across systems. Isolated agents within specific vendor platforms can drive value in narrow use cases, but autonomous operations will require greater integration and orchestration. The need for this will grow as more companies successfully deploy agents within distinct domains and interoperability becomes the limiting factor.
Double down on systems thinking: the applications which have the biggest impact won’t necessarily be those with the biggest or best models, but those which best utilise available AI and traditional software tools for the tasks at hand. Companies who are able to use both traditional and AI components throughout their processes and experiences, or who can rewire their business with AI at the core, will find more success than those who continue to try and manage AI within a bottlenecked dedicated team or try to solve every problem with LLMs.
Agent safety and controls will become frontline: I wouldn’t be surprised to see the first major AI agent-driven business disruption, which would raise important questions about the organisational structure, skills, and infrastructure needed to safely roll agents out and operate them in production. While there is already good discussion on agent testing approaches, observability, and building trust, this remains nascent. There will be important lessons learned from any such incident, but the key will be ensuring it doesn’t further hamper innovation through restrictive governance while keeping customer and employee safety first.
Skill gaps remain front and centre: I expect continued strain on AI-skilled staff, across both technical and non-technical roles. Hands-on experience with AI in an enterprise context remains limited, meaning those who have it are in high demand. For example, McKinsey reports a 985% increase in job postings for agentic AI compared to 2023-2024. More broadly, I am already beginning to see increasing demand for new AI roles which many don’t yet have the skills to fill - for example, knowledge strategy for AI context, agent operations (AgentOps), and new kinds of BAs and Product Owners who can ‘vibe code’ a prototype - demand which will hopefully continue to grow.
References
https://www.accenture.com/gb-en/insights/data-ai/front-runners-guide-scaling-ai
https://www.accenture.com/gb-en/insights/data-ai/hive-mind-harnessing-power-ai-agents
https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-top-trends-in-tech
https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality
https://action.deloitte.com/insight/4210/europe-keeps-calm-carries-on-with-genai
https://www.gsb.stanford.edu/insights/andrew-ng-why-ai-new-electricity