gRPC can be used through a Vertex AI Prediction private endpoint, but it is not yet officially supported. See this sample: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/vertex_endpoints/optimized_tensorflow_runtime/tabular_optimized_online_prediction.ipynb

Answer from Aleksey Vlasenko on Stack Overflow
API usage overview | Vertex AI | Google Cloud Documentation
This guide provides an overview of using the Vertex AI API and its reference documentation. You can access the API via REST, gRPC, or one of the provided client libraries (built on gRPC).
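To make the REST/gRPC distinction concrete, here is a minimal sketch of the addresses each transport uses for online prediction. It assumes the regional host pattern `{location}-aiplatform.googleapis.com`, the `v1` API surface, and the `PredictionService/Predict` RPC; the project, region, and endpoint IDs are placeholders.

```python
# Sketch: addresses used to reach Vertex AI online prediction over REST vs gRPC.
# Assumptions: regional host pattern, the v1 API surface, and placeholder IDs.

PREDICT_RPC = "/google.cloud.aiplatform.v1.PredictionService/Predict"


def regional_host(location: str) -> str:
    """Regional service host, e.g. us-central1-aiplatform.googleapis.com."""
    return f"{location}-aiplatform.googleapis.com"


def rest_predict_url(project: str, location: str, endpoint_id: str) -> str:
    """REST: HTTPS POST to the endpoint resource with the :predict verb."""
    return (
        f"https://{regional_host(location)}/v1/projects/{project}"
        f"/locations/{location}/endpoints/{endpoint_id}:predict"
    )


def grpc_target(location: str) -> str:
    """gRPC: same host, TLS on port 443; the method invoked is PREDICT_RPC."""
    return f"{regional_host(location)}:443"


if __name__ == "__main__":
    print(rest_predict_url("my-project", "us-central1", "1234567890"))
    print(grpc_target("us-central1"))
```

The client libraries hide these details, but they still need to talk to the region that hosts the endpoint, so the configured host is worth ruling out when a gRPC call fails.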
Hoop (hoop.dev)
The Simplest Way to Make Vertex AI gRPC Work Like It Should
October 17, 2025 - Vertex AI over gRPC connects services directly to Google’s managed ML endpoints, letting them exchange data efficiently with less overhead. When tuned well, it reduces payload sizes, supports streaming inference, and integrates identity checks through Google Application Credentials or external OIDC tokens.
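Whichever credential source is used, a gRPC call to a Google API ultimately carries an OAuth 2.0 access token as `authorization: Bearer <token>` request metadata. The sketch below only shows that shape; obtaining the token (from application credentials or an exchanged OIDC token) is assumed to happen elsewhere, and `auth_metadata` plus the token value are hypothetical placeholders.

```python
# Sketch: the per-call metadata a gRPC client attaches for Google API auth.
# Assumption: an OAuth 2.0 access token has already been obtained elsewhere.
from typing import List, Tuple


def auth_metadata(access_token: str) -> List[Tuple[str, str]]:
    """Return gRPC metadata carrying the token as a bearer credential."""
    if not access_token:
        raise ValueError("access token must not be empty")
    # gRPC metadata keys are lowercase; the value uses the HTTP Bearer scheme.
    return [("authorization", f"Bearer {access_token}")]


# With grpcio, a list like this can be passed as the `metadata=` argument of a
# stub call; in practice, call credentials usually attach it automatically.
print(auth_metadata("placeholder-token"))
```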
Discussions

Vertex AI: Getting a GRPC Exception when sending a prediction request in Java
Hi, I have deployed a custom model (from a Docker image) in a Vertex AI endpoint. When I try to get a prediction in Java with the following code: private PredictResponse predict(String endpointId, String query, String…
googlecloudcommunity.com · June 7, 2023
Vertex AI on GCP (Google Cloud)
You’ll find that most managed/serverless services cost more than rolling your own, even on the same cloud provider. There are commercial reasons for that which I won’t get into now; suffice it to say that if you aren’t running a fairly high-volume, critical production system, services like Vector Search on Vertex AI probably aren’t for you. Either keep using Pinecone, or spin up a Qdrant instance on a Compute Engine VM (or a Cloud Run service if you’re feeling adventurous).
r/googlecloud · January 6, 2026
How to make a prediction to a private Vertex AI endpoint with Node.js client libraries? - Stack Overflow
Documentation on this is a bit vague at the time of posting (https://cloud.google.com/vertex-ai/docs/predictions/using-private-endpoints#sending-prediction-to-private-endpoint); they only mention ho...
stackoverflow.com
VertexAI — 🦜🔗 LangChain documentation
Desired API endpoint, e.g., us-central1-aiplatform.googleapis.com
param api_transport: str | None = None
The desired API transport method, either 'grpc' or 'rest'. Uses the default parameter in vertexai.init if defined.
param cache: BaseCache | bool | None = None
Whether to cache the response.
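The `api_transport` entry describes a simple precedence rule: an explicit value wins, otherwise the default from `vertexai.init` applies. A minimal sketch of that rule follows; `resolve_transport` is a hypothetical helper written for illustration, not part of LangChain or the Vertex AI SDK.

```python
# Sketch of the documented precedence for api_transport.
# resolve_transport is hypothetical; it is not a LangChain or SDK function.
from typing import Optional

VALID_TRANSPORTS = {"grpc", "rest"}


def resolve_transport(explicit: Optional[str],
                      init_default: Optional[str]) -> Optional[str]:
    """Explicit api_transport wins; otherwise fall back to the init default."""
    chosen = explicit if explicit is not None else init_default
    if chosen is not None and chosen not in VALID_TRANSPORTS:
        raise ValueError(f"api_transport must be 'grpc' or 'rest', got {chosen!r}")
    # None means "leave the choice to the client library's own default".
    return chosen


print(resolve_transport(None, "rest"))
print(resolve_transport("grpc", "rest"))
```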
Reliable AI with Vertex AI Prediction Dedicated Endpoints | Google Cloud Blog
May 5, 2025 - Announcing Vertex AI Prediction Dedicated Endpoints, a new family of Vertex AI Prediction endpoints, designed to address the needs of modern AI applications.
Vertex AI API overview | Google Distributed Cloud air-gapped | Google Cloud Documentation
To get the endpoints for the pre-trained APIs, view service status and endpoints. You can access the pre-trained APIs using gRPC or one of the provided client libraries. The client libraries are built on gRPC.
vertex-ai-samples/notebooks/community/vertex_endpoints/optimized_tensorflow_runtime/tabular_optimized_online_prediction.ipynb at main · GoogleCloudPlatform/vertex-ai-samples
Here we upload the same model using TF 2.7 GPU and Vertex AI Prediction optimized TensorFlow runtime images. ... In order to send requests to your models over gRPC, you need to set the `model_name` argument and update `predict_route` and `health_route` accordingly.
Author: GoogleCloudPlatform
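The notebook’s instruction to update `predict_route` and `health_route` together with `model_name` can be sketched as follows. This assumes the optimized TensorFlow runtime exposes TensorFlow Serving’s REST layout (`/v1/models/<name>:predict` for predictions and `/v1/models/<name>` for model status); `tf_serving_routes` and the model name `tabular` are hypothetical, so check the notebook for the exact values it uses.

```python
# Sketch: derive predict/health routes from a model name.
# Assumption: TensorFlow Serving-style REST paths under /v1/models/<name>.
from typing import Dict


def tf_serving_routes(model_name: str) -> Dict[str, str]:
    """Return the routes a serving container would register for model_name."""
    if not model_name or "/" in model_name:
        raise ValueError(f"invalid model name: {model_name!r}")
    base = f"/v1/models/{model_name}"
    return {
        "predict_route": f"{base}:predict",  # where prediction requests go
        "health_route": base,  # model status path used as the health check
    }


print(tf_serving_routes("tabular"))
```

Deriving both routes from one name keeps them consistent: if `model_name` changes but hard-coded routes do not, requests go to a path the server is not serving.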
Google Cloud Vertex AI | xAI Docs
6 days ago - This guide walks through setting up and using Grok models on Google Cloud Vertex AI / Gemini Enterprise Agent Platform. Grok on Vertex AI is accessed as a partner model through the OpenAI-compatible API, including the Responses API and Chat Completions.
Vertex AI: Getting a GRPC Exception when sending a prediction request in Java - Custom ML & MLOps - Google Developer forums
June 7, 2023 - Hi, I have deployed a custom model (from a Docker image) in a Vertex AI endpoint. When I try to get a prediction in Java with the following code: private PredictResponse predict(String endpointId, String query, String project, String location) throws IOException { try (PredictionServiceClient serviceClient = getPredictionServiceClient()) { EndpointName endpointName = EndpointName.of(project, location, endpointId); ListValue.Builder listValue = ListValue.newBuilder(); ...
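For reference, the Java snippet is assembling a PredictRequest: an endpoint resource name (what `EndpointName.of(project, location, endpointId)` formats) plus a list of instances. A minimal Python sketch of the same structure as a plain, JSON-serializable dict follows; the `{"query": ...}` instance shape and the helper names are hypothetical, because a custom container defines what each instance must look like.

```python
# Sketch: the fields of a Vertex AI PredictRequest, as a JSON-serializable dict.
# Assumption: the {"query": ...} instance shape is hypothetical; the deployed
# custom container decides what an instance must contain.
import json
from typing import Any, Dict, List


def endpoint_name(project: str, location: str, endpoint_id: str) -> str:
    """Resource name in the same format EndpointName.of(...) produces in Java."""
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"


def build_predict_request(project: str, location: str, endpoint_id: str,
                          instances: List[Any]) -> Dict[str, Any]:
    """Pair the endpoint resource name with the instances to score."""
    if not instances:
        raise ValueError("a prediction request needs at least one instance")
    return {
        "endpoint": endpoint_name(project, location, endpoint_id),
        "instances": instances,
    }


request = build_predict_request("my-project", "us-central1", "1234567890",
                                [{"query": "example text"}])
print(json.dumps(request, indent=2))
```

When a call like this fails with a gRPC exception, comparing the instance shape against what the container’s prediction handler expects is a reasonable first check.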
Package google.rpc | Vertex AI | Google Cloud Documentation
June 27, 2025 - Package google.rpc defines Code (enum) and Status (message). Code lists the canonical error codes for gRPC APIs. Sometimes multiple error codes may apply; services should return the most specific error code that applies. For example, prefer OUT_OF_RANGE over FAILED_PRECONDITION if both codes apply.
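When a Vertex AI call fails, client exceptions typically report one of these canonical codes by name or number. The sketch below lists the google.rpc.Code values and encodes only the single preference quoted above; `pick_code` is a hypothetical helper, and a real service would apply its own, fuller rules.

```python
# Sketch: canonical gRPC status codes (google.rpc.Code) and the quoted rule.
# pick_code is hypothetical and encodes only OUT_OF_RANGE > FAILED_PRECONDITION.
from enum import IntEnum
from typing import Iterable


class Code(IntEnum):
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def pick_code(applicable: Iterable[Code]) -> Code:
    """Prefer OUT_OF_RANGE over FAILED_PRECONDITION; else keep the first code."""
    codes = list(applicable)
    if not codes:
        raise ValueError("at least one applicable code is required")
    if Code.OUT_OF_RANGE in codes and Code.FAILED_PRECONDITION in codes:
        return Code.OUT_OF_RANGE
    return codes[0]


# Decode a numeric status seen in an error message.
print(Code(14).name)
```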
Prediction gRPC API | Google Distributed Cloud air-gapped | Google Cloud
July 10, 2025 - PredictionService represents a model API to serve online predictions from customized models. This is the gRPC API reference for the Online Predictions pre-trained API. Use this API to get online predictions from your custom-trained models.
r/googlecloud on Reddit: Vertex AI on GCP (Google Cloud)
January 6, 2026 -

My team and I have been developing a RAG app using Pinecone for the vector DB, AWS for storage, Google Cloud for the Gemini API, and DigitalOcean for web admin hosting.
With so many third parties involved, we decided to move to a single platform that offers all of these services, both to reduce latency and cross-platform billing and to leverage its robust AI, so we opted for GCP (it was my decision). While looking into replacing these technologies with GCP services, we found that Vertex AI Vector Search could replace Pinecone, so I created an index and deployed it before the developer responsible for deploying the AI service could test it.
Google then charged us over 100 USD per day before anything was working, and nothing had even been added to the index.
So we deactivated it, and now we’re wondering what the issue was.

I’m here to ask for help and advice on RAG with Vertex AI Vector Search. Has anyone used it successfully without being overcharged? What advantages and disadvantages did you run into?

Agent Platform API | Gemini Enterprise Agent Platform | Google Cloud Documentation
The Agent Platform API lets you manage Agent Platform resources in Google Cloud. To call this service, we recommend that you use the Google-provided client libraries. If your application needs to use your own libraries to call this service, use the following information when you make the ...
google/cloud-ai-platform - Packagist.org
Idiomatic PHP client for Google Cloud Vertex AI. ext-protobuf: Provides a significant increase in throughput over the pure PHP protobuf implementation. See https://cloud.google.com/php/grpc for installation instructions.
Leanware (leanware.co)
What Is GCP Vertex AI? Ultimate Guide
November 13, 2025 - Vertex AI runs on Google's infrastructure, which includes NVIDIA GPUs (T4, V100, A100, H100) for general-purpose training and Google's custom TPUs (v2, v3, v5e) for large-scale workloads.
GCP Vertex AI | Elastic integrations
It aims to streamline and expedite ... Cloud Platform (GCP) Vertex AI allows you to gather metrics such as token usage, latency, overall invocations, and error rates for deployed models....