Sumeet Tandure, senior manager of sales engineering at Snowflake, spoke about the current state and future direction of data engineering, highlighting key disruptions and the principles shaping the practice.
Speaking at AIM’s DES 2025 event, he said, “This is the year when we will see more and more use cases going into production. Over the last couple of years, we saw a lot of experimentation and pilots…and now it seems like…more of these use cases will actually make it to production.”
He outlined core principles for a modern data engineering practice, centred on simplification, openness, productivity, strong governance, and operational efficiency.
He added that this transition from experimentation to production necessitates a robust data engineering foundation. Deriving return on investment (ROI) from these projects is also a critical focus. Tandure stressed that when it comes to enterprise AI, GenAI products cannot be productionised if the data is inadequate.
Tandure observed that LLMs’ impact on unstructured data mirrors SQL’s impact on structured data. While unstructured data constitutes much of the world’s data, extracting value from it has historically been a complex process. LLMs make it “very easy to get value from PDF, audio, video and all sorts of multimodal data”.
Tandure gave the example of Snowflake’s Document AI, which extracts specific fields from documents. “Once the fields are defined via a UI, a function is generated to automate extraction into structured tables,” he said.
This, according to him, empowers data engineers to treat unstructured data as just another source, enabling a tight coupling between structured and unstructured data that was previously difficult to achieve.
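As a rough sketch of how such extraction can be invoked from SQL, assuming a Document AI model build named invoice_model (with its fields defined in the UI) and a stage @invoice_stage holding the documents, with all names illustrative:

```sql
-- Sketch only: invoice_model is a hypothetical Document AI model build;
-- @invoice_stage is a hypothetical stage holding the source PDFs.
SELECT
    relative_path,
    doc_ai_db.models.invoice_model!PREDICT(
        GET_PRESIGNED_URL(@invoice_stage, relative_path), 1
    ) AS extracted_fields           -- JSON of field values with confidence scores
FROM DIRECTORY(@invoice_stage);     -- one row per staged document
```

The JSON output can then be flattened into ordinary columns, which is what lets engineers treat documents as just another table-shaped source.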
The second disruption is data interoperability: the ability for multiple engines to work with the same data, regardless of which engine produced it. Pointing to the rise of open table formats like Iceberg, Tandure stated, “The vendor support that has built up around formats like Iceberg has become phenomenal.”
He added that Iceberg has seen significant adoption, with contributions from different customers, vendors, and partners.
Moreover, Tandure highlighted that the Iceberg ecosystem is evolving beyond just table formats. “What really is happening now in the Iceberg space is that the catalogues are also becoming open source,” he said, adding that this shift reduces vendor lock-in. “Regardless of the vendor, you can actually make use of the same catalogues.”
He added that these layers can then come together in a completely open fashion, allowing multiple interoperable engines to work on the same data while maintaining the same level of governance.
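A minimal sketch of what this looks like from the Snowflake side, assuming an external volume named my_s3_volume has already been configured (all names illustrative):

```sql
-- Sketch: data and metadata are written in open Iceberg and Parquet
-- formats on external storage, readable by other Iceberg engines.
CREATE ICEBERG TABLE sales_events (
    event_id  STRING,
    event_ts  TIMESTAMP_NTZ,
    amount    NUMBER(10, 2)
)
    CATALOG = 'SNOWFLAKE'               -- Snowflake acts as the catalogue here
    EXTERNAL_VOLUME = 'my_s3_volume'    -- hypothetical, pre-configured volume
    BASE_LOCATION = 'sales_events/';
```

Because the table’s files sit in open formats, another Iceberg-compatible engine can query the same data through the catalogue rather than through a copy.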
Improving Developer Productivity
Tandure further explained that enterprises building search engines and chatbots often rely on Retrieval-Augmented Generation (RAG) pipelines.
He said that, typically, data lands on the data platform, passes through AI systems for chunking, embedding, and other processing, and is then written back. To simplify this process, Snowflake offers Cortex Search, which handles embedding, chunking, and indexing on the backend, requiring only minimal preprocessing.
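A minimal sketch of such a service, assuming a support_docs table with a free-text column (names are illustrative):

```sql
-- Sketch: Cortex Search takes care of chunking, embedding and indexing
-- of doc_text on the backend; only this declaration is needed.
CREATE OR REPLACE CORTEX SEARCH SERVICE support_docs_search
    ON doc_text                     -- the column to make searchable
    ATTRIBUTES doc_title            -- extra column returned with results
    WAREHOUSE = my_wh
    TARGET_LAG = '1 hour'           -- how stale the index may get
    AS (
        SELECT doc_text, doc_title
        FROM support_docs
    );
```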
Cortex Search reflects a broader move towards fewer, simpler pipelines. “The best pipelines are those which do not have to be built,” Tandure said. This, he added, is made possible by dynamic tables, which use a single declarative SQL statement to handle incremental updates. “It keeps calculating automatically at a periodic interval on an incremental data set and keeps persisting those changes.”
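The idea can be sketched with a hypothetical revenue rollup, which Snowflake then keeps incrementally up to date within the declared lag:

```sql
-- Sketch: one declarative statement stands in for an incremental pipeline.
CREATE OR REPLACE DYNAMIC TABLE daily_revenue
    TARGET_LAG = '15 minutes'       -- refresh target for the result set
    WAREHOUSE = my_wh               -- hypothetical warehouse name
    AS
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders                 -- hypothetical source table
    GROUP BY order_date;
```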
Snowflake also supports data sharing to eliminate the need for pipelines altogether. Native integrations with platforms like Salesforce and ServiceNow allow direct data exchange. “You do not have to build the pipeline,” Tandure said.
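In its plainest SQL form, such sharing amounts to granting a consumer account access to data in place, so nothing is copied and no pipeline is built; account and object names below are illustrative:

```sql
-- Sketch of zero-copy sharing; partner_account is hypothetical.
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales_db               TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales_db.public        TO SHARE sales_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;
```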
Technology from Snowflake’s acquisition of Datavolo is now embedded in Snowflake OpenFlow, allowing users to configure connectors natively within Snowflake without external ETL frameworks.
The company is also pushing DevOps principles into data operations. Through the use of Python APIs, Snowflake CLI, and declarative change management, users can integrate CI/CD workflows. “You can push those changes… from the GitHub repo into Snowflake and release these pipelines continuously,” Tandure explained.
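As a sketch of what that release flow can look like, assuming a Git repository object named my_repo already linked to the team’s GitHub repository:

```sql
-- Sketch: my_repo is a hypothetical Git repository object; deploy.sql
-- holds the versioned, declarative pipeline definitions.
ALTER GIT REPOSITORY my_repo FETCH;                        -- pull the latest commits
EXECUTE IMMEDIATE FROM @my_repo/branches/main/deploy.sql;  -- apply them to the account
```

A CI job can trigger the same two statements on every merge, which is what turns pipeline releases into a continuous process.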
Governance and Efficiency at the Core
Tandure said that data governance is framed around three core tasks: knowing the data, protecting it, and maintaining efficiency.
Snowflake supports automatic classification of Indian identifiers like PAN, Aadhaar, and GSTIN. “If there are columns which contain this PII (Personally Identifiable Information)…Snowflake will automatically classify that as a semantic category.”
He said that very few platforms offer support for Indian identifiers, but Snowflake has integrated that capability.
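Classification of an existing table can be kicked off with a single call, sketched here against a hypothetical HR table:

```sql
-- Sketch: scans the table's columns and, with auto_tag enabled, tags
-- detected semantic categories (such as PAN or Aadhaar numbers).
CALL SYSTEM$CLASSIFY('hr_db.public.employees', {'auto_tag': true});
```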
According to him, Snowflake offers attribute-based access controls, row-level and column-level masking policies, and data clean rooms. “You only expose the required amount of data and not the PII directly.”
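A column-level masking policy of the kind he described might be sketched as follows, with the role and object names assumed for illustration:

```sql
-- Sketch: only the hypothetical COMPLIANCE_ROLE sees raw PAN values;
-- every other role sees a masked string instead.
CREATE MASKING POLICY mask_pan AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('COMPLIANCE_ROLE') THEN val
        ELSE '**********'
    END;

ALTER TABLE hr_db.public.employees
    MODIFY COLUMN pan_number SET MASKING POLICY mask_pan;
```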
He emphasised that the governance capabilities are built in, not added later. “From day one, you have a platform where you can start with governance baked in,” he said. This extends even to AI workloads. “When we do things like embedding and vectorisation, we can actually make sure there are access controls implemented at that level.”
Beyond Data Engineering
Snowflake supports a complete end-to-end data engineering lifecycle. Integration options include Kafka, streaming pipelines, and OpenFlow. Transformations are handled through dynamic tables, Snowpark, and stored procedures. For delivery, options include data sharing and app building using Streamlit.
“You can actually do this end-to-end data engineering framework with Snowflake,” Tandure said, adding that support for notebooks and containers offers flexibility for different user preferences.
He emphasised that data engineering is just one part of the platform’s broader capability. “Snowflake is very well known in the analytics space,” he said, pointing to features like geospatial, time series, and lakehouse analytics.
Tandure concluded that Snowflake continues to move towards a unified data platform where AI, data engineering, governance, and application development coexist with minimal friction.