If 2023 was the year of generative AI-powered chatbots and search, 2024 was all about AI agents. What started with Devin earlier this year grew into a full-blown phenomenon, offering enterprises and individuals a way to transform how they work at different levels, from programming and development to personal tasks such as planning and booking tickets for a holiday.
Among these wide-ranging applications, we also saw the rise of data agents this year: AI-powered agents that handle different types of tasks across the data infrastructure stack. Some did basic data integration work while others handled downstream tasks, such as analysis and management in the pipeline, making things simpler and easier for enterprise users.
The benefits were improved efficiency and cost savings, leading many to wonder: How will things change for data teams in the years to come?
Gen AI Agents took over data tasks
While agentic capabilities have been around for some time, allowing enterprises to automate certain basic tasks, the rise of generative AI has taken things to an entirely new level.
With gen AI’s natural language processing and tool use capabilities, agents can go beyond simple reasoning and answering to actually planning multi-step actions, independently interacting with digital systems to complete actions while collaborating with other agents and people at the same time. They also learn to improve their performance over time.
Cognition AI’s Devin was the first major agentic offering, enabling engineering operations at scale. Then, bigger players began providing more targeted enterprise and personal agents powered by their models.
In a conversation with VentureBeat earlier this year, Google Cloud’s Gerrit Kazmaier said he heard from customers that their data practitioners constantly faced challenges including automating manual work for data teams, reducing the cycle time of data pipelines and analysis and simplifying data management. Essentially, the teams were not short on ideas on how they could create value from their data, but they lacked the time to execute those ideas.
To fix this, Kazmaier explained, Google revamped BigQuery, its core data infrastructure offering, with Gemini AI. The resulting agentic capabilities not only provide enterprises the ability to discover, cleanse and prepare data for downstream applications (breaking down data silos and ensuring quality and consistency) but also support pipeline management and analysis, freeing up teams to focus on higher-value tasks.
Multiple enterprises today use Gemini’s agentic capabilities in BigQuery, including fintech company Julo, which tapped Gemini’s ability to understand complex data structures to automate its query generation process. Japanese IT firm Unerry also uses Gemini SQL generation capabilities in BigQuery to help its data teams deliver insight more quickly.
But discovering data, preparing it and assisting with analysis were just the beginning. As the underlying models evolved, even granular data operations, pioneered by startups specializing in their respective domains, were targeted with deeper agent-driven automation.
For instance, Airbyte and Fastn made headlines in the data integration category. The former launched an assistant that created data connectors from an API documentation link in seconds. Meanwhile, the latter enhanced its broader application development offering with agents that generated enterprise-grade APIs, whether for reading or writing information on any topic, using just a natural language description.
San Francisco-based Altimate AI, for its part, targeted different data operations, including documentation, testing and transformations, with its new DataMates technology, which used agentic AI to pull context from the entire data stack. Multiple other startups, including Redbird and RapidCanvas, also worked in the same direction, claiming to offer AI agents that can handle up to 90% of data tasks required in AI and analytics pipelines.
Agents powering RAG and more
Beyond wide-ranging data operations, agentic capabilities have also been explored in areas such as retrieval-augmented generation (RAG) and downstream workflow automation. For instance, the team behind vector database Weaviate recently discussed the idea of agentic RAG, a process allowing AI agents to access a wide range of tools — like web search, calculator or a software API (like Slack/Gmail/CRM) — to retrieve and validate data from multiple sources to enhance the accuracy of answers.
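To make the idea of agentic RAG a little more concrete, here is a minimal Python sketch. It is not Weaviate's implementation or any vendor's API. The tool functions (`search_docs`, `search_chat`), the in-memory data and the "at least two independent sources" validation rule are all hypothetical stand-ins chosen for illustration; a real agent would call live tools such as web search or a CRM and would likely use a model to decide which tools to invoke.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evidence:
    source: str  # name of the tool that produced the snippet
    text: str    # the retrieved snippet


def _overlaps(query: str, text: str) -> bool:
    """Crude keyword match standing in for vector similarity (assumption)."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split())) >= 2


# Hypothetical tools backed by in-memory data, for illustration only.
def search_docs(query: str) -> List[Evidence]:
    docs = ["Q3 revenue was 12M USD", "Headcount grew in Q3"]
    return [Evidence("docs", d) for d in docs if _overlaps(query, d)]


def search_chat(query: str) -> List[Evidence]:
    messages = ["Finance confirmed Q3 revenue was 12M USD"]
    return [Evidence("chat", m) for m in messages if _overlaps(query, m)]


TOOLS: Dict[str, Callable[[str], List[Evidence]]] = {
    "docs": search_docs,
    "chat": search_chat,
}


def agentic_retrieve(query: str, tools=TOOLS, min_sources: int = 2) -> dict:
    """Query every tool, then treat the result as validated only if
    evidence came back from at least `min_sources` distinct tools."""
    evidence: List[Evidence] = []
    for tool in tools.values():
        evidence.extend(tool(query))
    distinct_sources = {e.source for e in evidence}
    return {
        "evidence": evidence,
        "validated": len(distinct_sources) >= min_sources,
    }


if __name__ == "__main__":
    result = agentic_retrieve("Q3 revenue")
    for item in result["evidence"]:
        print(f"[{item.source}] {item.text}")
    print("validated:", result["validated"])
```

The cross-source check is deliberately simplistic: it only counts how many tools returned something, not whether the snippets actually agree. It is meant to show the shape of "retrieve from several tools, then validate" rather than a production-ready rule.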
Further, towards the end of the year, Snowflake Intelligence appeared, giving enterprises the option to set up data agents that could tap not only business intelligence data stored in their Snowflake instance, but also structured and unstructured data across siloed third-party tools, such as sales transactions in a database, documents in knowledge bases like SharePoint and information in productivity tools like Slack, Salesforce and Google Workspace.
With this additional context, the agents surface relevant insights in response to natural language questions and take specific actions around the generated insights. For instance, a user could ask their data agent to enter the surfaced insights into an editable form and upload the file to their Google Drive. They could even be prompted to write to Snowflake tables and make data modifications as needed.
Much more to come
While we may not have covered every application of data agents seen or announced this year, one thing is pretty clear: The technology is here to stay. As gen AI models continue to evolve, the adoption of AI agents will move at full steam, with most organizations, regardless of their sector or size, choosing to delegate repetitive tasks to specialized agents. This will directly translate into efficiencies.
As evidence of this, in a recent survey of 1,100 tech executives conducted by Capgemini, 82% of the respondents said they intend to integrate AI-based agents across their stacks within the next 3 years — up from a current 10%. More importantly, as many as 70 to 75% of the respondents said they would trust an AI agent to analyze and synthesize data on their behalf, as well as handle tasks such as generating and iteratively improving code.
This agent-driven shift would also mean significant changes to how data teams function. Currently, agents’ outcomes are not production-grade, which means a human has to take over at some point to fine-tune the work for their needs. However, with a few more advancements over the coming years, this gap will most likely go away, giving teams AI agents that would be faster, more accurate and less prone to the errors usually made by humans.
So, to sum up, the roles of data scientists and analysts that we see today are likely to change, with users possibly moving to the AI oversight domain (where they could keep an eye on AI’s actions) or higher-value tasks that the system could struggle to perform.