Google unveiled a wide array of new generative AI-powered services at its Google Cloud Next 2023 conference in San Francisco on August 29. At the pre-briefing, we got an early look at Google’s new Cloud TPU, A3 virtual machines powered by NVIDIA H100 GPUs and more.
Vertex AI increases capacity, adds other improvements
June Yang, vice president of cloud AI and industry solutions at Google Cloud, announced improvements to Vertex AI, the company’s generative AI platform that helps enterprises train their own AI and machine learning models.
Customers have asked for the ability to input larger amounts of content into PaLM, a foundation model under the Vertex AI platform, Yang said, which led Google to increase its capacity from 4,000 tokens to 32,000 tokens.
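To put the jump from 4,000 to 32,000 tokens in practical terms, here is a minimal sketch of how many separate requests a long document might need under each limit. The four-characters-per-token ratio is a rough rule of thumb assumed for illustration, not PaLM's actual tokenizer, and the function names are hypothetical.

```python
import math

OLD_LIMIT = 4_000    # previous token capacity cited by Yang
NEW_LIMIT = 32_000   # new token capacity cited by Yang
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not PaLM's real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def chunks_needed(text: str, limit: int) -> int:
    """Number of separate requests needed to fit the text under a token limit."""
    return max(1, math.ceil(estimate_tokens(text) / limit))


document = "x" * 100_000  # stand-in for a long report, about 25,000 estimated tokens
print(chunks_needed(document, OLD_LIMIT))  # 7
print(chunks_needed(document, NEW_LIMIT))  # 1
```

Under these assumptions, a document that previously had to be split into seven pieces fits into a single request.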
Customers have also asked for more languages to be supported in Vertex AI. At the Next ’23 conference, Yang announced that PaLM is now available in Arabic, Chinese, Japanese, German, Spanish and more. That’s a total of 38 languages for public use; 100 additional languages are now options in private preview.
SEE: Google opened up its PaLM large language model with an API in March. (TechRepublic)
Vertex AI Search, which lets users create a search engine inside their AI-powered apps, is available today. “Think about this like Google Search for your business data,” Yang said.
Also available today is Vertex AI Conversation, a tool for building chatbots. Search and Conversation were previously available under different product names in Google’s Generative AI App Builder.
Improvements to the Codey foundation model
Codey, the text-to-code model inside Vertex AI, is getting an upgrade. Although details on this upgrade are sparse, Yang said developers should be able to work more efficiently on code generation and code chat.
“Leveraging our Codey foundation model, partners like GitLab are helping developers to stay in the flow by predicting and completing lines of code, generating test cases, explaining code and many more use cases,” Yang noted.
Match your business’ art style with text-to-image AI
Vertex’s text-to-image model will now be able to perform style tuning, or matching a company’s brand and creative guidelines. Organizations need to provide just 10 reference images for Vertex to begin to work within their house style.
New additions to Model Garden, Vertex AI’s model library
Google Cloud has added Meta’s Llama 2 and Anthropic’s Claude 2 to Vertex AI’s model library. The decision to add Llama 2 and Claude 2 to the Google Cloud AI Model Garden is “in line with our commitment to foster an open ecosystem,” Yang said.
“With these additions compared with other hyperscalers, Google Cloud now provides the widest variety of models to choose from, with our first-party Google models, third-party models from partners, as well as open source models on a single platform,” Yang said. “With access to over 100 curated models on Vertex AI, customers can now choose models based on modality, size, performance latency and cost considerations.”
BigQuery and AlloyDB upgrades are ready for preview
Google’s BigQuery Studio — which is a workbench platform for users who work with data and AI — and AlloyDB both have upgrades now available in preview.
BigQuery Studio added to cloud data warehouse preview
BigQuery Studio will be rolled out to Google’s BigQuery cloud data warehouse in preview this week. BigQuery Studio assists with analyzing and exploring data and integrates with Vertex AI. BigQuery Studio is designed to bring data engineering, analytics and predictive analysis together, reducing the time data analytics professionals need to spend switching between tools.
Users of BigQuery can also add Duet AI, Google’s AI assistant, starting now.
AlloyDB enhanced with generative AI
Andi Gutmans, vice president and general manager for databases at Google, announced the addition of generative AI capabilities to AlloyDB, Google’s PostgreSQL-compatible database for high-end enterprise workloads, at the pre-brief. AlloyDB includes capabilities for organizations building enterprise AI applications, such as vector search up to 10 times faster than standard PostgreSQL, Gutmans said. Developers can generate vector embeddings within the database to streamline their work. AlloyDB AI integrates with Vertex AI and open source tool ecosystems such as LangChain.
“Databases are at the heart of gen AI innovation, as they help bridge the gap between LLMs and enterprise gen AI apps to deliver accurate, up to date and contextual experiences,” Gutmans said.
AlloyDB AI is now available in preview through AlloyDB Omni.
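For readers unfamiliar with vector search, the idea is to store each piece of content as a list of numbers (an embedding) and return the stored items whose embeddings sit closest to the query's. The sketch below is a plain-Python, brute-force illustration of that idea and not AlloyDB's implementation. The three-number embeddings and row names are invented for the example; a real system would get embeddings from a model and use an index rather than scanning every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the keys of the k rows most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in rows.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical rows: text labels paired with made-up 3-dimensional embeddings.
rows = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return window": [0.8, 0.2, 0.1],
}

print(top_k([1.0, 0.0, 0.0], rows))  # ['refund policy', 'return window']
```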
A3 virtual machine supercomputing with NVIDIA for AI training revealed
General availability of the A3 virtual machines, which run on NVIDIA H100 GPUs as a GPU supercomputer, will open next month, Mark Lohmeyer, vice president and general manager for compute and machine learning infrastructure at Google Cloud, announced during the pre-brief.
The A3 supercomputers’ custom-made 200 Gbps virtual machine infrastructure supports GPU-to-GPU data transfers that bypass the CPU host. Those transfers power AI training, tuning and scaling with up to 10 times more bandwidth than the previous generation, A2. Training will be three times faster, Lohmeyer said.
NVIDIA “enables us to offer the most comprehensive AI infrastructure portfolio of any cloud,” said Lohmeyer.
Cloud TPU v5e is optimized for generative AI inferencing
Google introduced Cloud TPU v5e, the fifth generation of cloud TPUs, optimized for generative AI inferencing. A TPU, or Tensor Processing Unit, is a machine learning accelerator hosted on Google Cloud. The TPU handles the massive amounts of data needed for inferencing, the process by which a trained AI model makes predictions from new data.
Cloud TPU v5e boasts two times faster performance per dollar for training and 2.5 times better performance per dollar for inferencing compared to the previous-generation TPU, Lohmeyer said.
“(With) the magic of that software and hardware working together with new software technologies like multi-slice, we’re enabling our customers to easily scale their [generative] AI models beyond the physical boundaries of a single TPU pod or a single TPU cluster,” said Lohmeyer. “In other words, a single large AI workload can now span multiple physical TPU clusters, scaling to literally tens of thousands of chips and doing so very cost effectively.”
The new TPU is available in preview starting this week.
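As a rough guide to what the performance-per-dollar figures could mean for a budget, the sketch below converts them into the cost of running the same amount of work. The $1,000 baseline is hypothetical, and the calculation assumes the quoted ratios apply uniformly to a given workload, which real workloads may not match.

```python
def cost_for_same_work(baseline_cost: float, perf_per_dollar_gain: float) -> float:
    """Cost of the same workload if each dollar buys `perf_per_dollar_gain` times more work."""
    return baseline_cost / perf_per_dollar_gain


baseline = 1000.0  # hypothetical spend on the previous-generation TPU

print(cost_for_same_work(baseline, 2.0))  # training at 2x: 500.0
print(cost_for_same_work(baseline, 2.5))  # inferencing at 2.5x: 400.0
```

Put another way, the claimed gains would amount to roughly a 50% saving on training and a 60% saving on inferencing for an unchanged workload.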
Introducing Google Kubernetes Engine Enterprise edition
Google Kubernetes Engine, which many customers use for AI workloads, is getting a boost. The GKE Enterprise edition will include multi-cluster horizontal scaling and GKE’s existing services running across both cloud GPUs and cloud TPUs. Early reports from customers have shown productivity gains of up to 45% and software deployment times reduced by more than 70%, Google said.
GKE Enterprise edition will be available in September.