The answer you referred to will likely work, though I have not fully tested it. The Cloud ML Engine team is working hard to simplify this.
In the meantime, if you prefer working with a VM (just remember to shut it off when not in use), follow the instructions on this page. For example:
export IMAGE_FAMILY="pytorch-latest-cu91"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-instance"
gcloud compute instances create "$INSTANCE_NAME" \
  --zone="$ZONE" \
  --image-family="$IMAGE_FAMILY" \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator='type=nvidia-tesla-v100,count=1' \
  --metadata='install-nvidia-driver=True'
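The reminder to shut the VM off when not in use could look something like the sketch below. It is not part of the original answer: the helper names `stop_instance` and `start_instance` are hypothetical, and the sketch assumes the `INSTANCE_NAME` and `ZONE` variables exported above are still set in the shell.

```shell
# Hypothetical helpers (not from the original answer). They assume
# INSTANCE_NAME and ZONE are set as in the create command above.

stop_instance() {
  # Stop the VM so the machine and its GPU stop accruing charges.
  # Attached persistent disks remain and are still billed.
  gcloud compute instances stop "${INSTANCE_NAME:?set INSTANCE_NAME first}" \
    --zone="${ZONE:?set ZONE first}"
}

start_instance() {
  # Bring the same VM back up when you need it again.
  gcloud compute instances start "${INSTANCE_NAME:?set INSTANCE_NAME first}" \
    --zone="${ZONE:?set ZONE first}"
}
```

Run `stop_instance` at the end of a session and `start_instance` before the next one. The `${VAR:?...}` expansions make the helpers fail with a message instead of calling gcloud with an empty instance name or zone.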
Answer from rhaertel80 on Stack Overflow, to the question "Can we deploy Pytorch models to Google Cloud ML Engine? If so, how?"
Deploy ML model on GCP
Hello experts,
What is the most practical way to serve an ML model on GCP for daily batch predictions? Each batch has to go through multiple preprocessing and feature engineering steps before being fed to the model to produce predictions. The preprocessing is done with pandas, so it does not use distributed processing. I am therefore assuming that a vertically scalable instance has to be triggered at inference time. Based on your experience, what should I use? I am thinking of Cloud Functions that run the preprocessing steps and then call the model for predictions.