A Unique No-Code Solution for ML: Google's Vertex AI lets you train ML models with its AutoML feature, no coding required. By Vaishnavee Baskaran
Google Cloud Platform offers a unified platform for AI solutions. Vertex AI brings AutoML and AI Platform together into a single API, client library, and user interface. With Vertex AI, you can easily train and compare models using either AutoML or custom code training.
In this blog, we will see how to build a simple time series forecasting model using AutoML. AutoML also lets you train models on image, tabular, text, and video datasets without writing code.
AutoML Forecasting Model
In GCP, Vertex AI provides a unified UI for the entire ML workflow. The first step in training a machine learning model is to select a suitable training dataset and prepare it. From the GCP products menu, under Artificial Intelligence, select Vertex AI to open the dashboard shown in the screenshot below.
Select Create Dataset, as highlighted in the screenshot above, to add a data source as the training dataset. You will be taken to the Create Dataset page, where you enter the dataset name. Vertex AI is one AI platform for every ML tool you need: it supports tabular, image, text, and video data, and can power entirely new intelligent applications or extend existing ones, including translation and speech-to-text.
In this blog, we will go through the steps to build forecasting models using AutoML. Select 'Tabular', which lets you predict a target column's value using classification/regression or forecasting algorithms, then click 'Forecasting' to prepare the training data for time series forecasting.
The data source can be uploaded from your local computer, Cloud Storage, or BigQuery. Here, I will upload a table from BigQuery as the training data for our model.
Choose "Select a table or view from BigQuery" and enter the table or view path in the BigQuery Path field, in the format 'projectId.datasetId.tableId'.
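If the path is mistyped, the import fails, so it can help to sanity-check it first. The helper below is a hypothetical sketch, not part of Vertex AI; the regex is a simplification of BigQuery's actual naming rules (project IDs are 6-30 lowercase letters, digits, and hyphens; dataset and table IDs allow letters, digits, and underscores).

```python
import re

# Simplified pattern for 'projectId.datasetId.tableId'
# (hypothetical helper; real BigQuery naming rules have more cases).
BQ_PATH_RE = re.compile(
    r"^[a-z][a-z0-9-]{4,28}[a-z0-9]"  # project ID: 6-30 chars, lowercase start
    r"\.[A-Za-z0-9_]{1,1024}"         # dataset ID: letters, digits, underscores
    r"\.[A-Za-z0-9_]{1,1024}$"        # table ID (simplified)
)

def is_valid_bq_path(path: str) -> bool:
    """Return True if path looks like projectId.datasetId.tableId."""
    return BQ_PATH_RE.match(path) is not None
```

For example, `is_valid_bq_path("my-project.sales_data.product_prices")` passes, while a path missing the table ID does not.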
As shown in the screenshot below, you will be asked to select a Series Identifier Column and a Timestamp Column. The series identifier column is the variable that uniquely identifies each time series. For example, the product_id column of our table distinguishes one product's observations from another's over the same time periods; since our data contains several product ids, the table holds multiple time series. The timestamp column is the periodic time field in the table.
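To make the "multiple time series in one table" idea concrete, here is a toy sketch with hypothetical column names shaped like our training table: each (product_id, timestamp) pair identifies one observation, and each distinct product_id is its own series.

```python
from datetime import date

# Toy rows shaped like the training table (hypothetical column names).
rows = [
    {"product_id": "A100", "timestamp": date(2021, 1, 1), "price": 9.99},
    {"product_id": "A100", "timestamp": date(2021, 2, 1), "price": 10.49},
    {"product_id": "B200", "timestamp": date(2021, 1, 1), "price": 4.25},
    {"product_id": "B200", "timestamp": date(2021, 2, 1), "price": 4.30},
]

# Group rows into one series per identifier.
series = {}
for row in rows:
    series.setdefault(row["product_id"], []).append((row["timestamp"], row["price"]))

print(len(series))  # 2 -> two product ids means two time series in one table
```

This is exactly what the series identifier column tells Vertex AI: how to split one flat table into separate per-product series.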
After you have selected the required fields as highlighted above, select Train New Model. As shown in the screenshot below, the dataset name is product_price_forecasting, and I chose AutoML to train our model. We can also run custom training with deep learning libraries such as TensorFlow, ML libraries such as scikit-learn, and boosting algorithms, which we will cover in later posts in our time series forecasting blog series.
Google has also added a new training method called Seq2Seq+, which means sequence to sequence: the model takes a sequence of items as input and outputs another sequence, using an encoder-decoder deep learning architecture. We will look at this type of model in our series of blogs soon.
After you select AutoML and click Continue, you will be taken to the Model details page, as in the image below, where you should give the model an appropriate name.
Target column – the value which needs to be predicted.
E.g.: the price column
Data Granularity – the time gap between each entry in the timestamp column.
E.g.: monthly means that the timestamp column in our table has month-wise entries.
Holiday regions – select the respective holiday region of your training data.
E.g.: EMEA (Europe)
Forecast Horizon – the number of future timesteps the model predicts after the most recent timestamp in the data.
E.g.: In our case, I have taken the granularity as monthly. Hence Forecast horizon 7 means our model predicts the product price for the next 7 months.
Context Window – can be set from 0 to 5 times the forecast horizon. Hence, in our case, we can give a value from 0 to 35.
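The relationship between the two settings is simple arithmetic; this small sketch (a hypothetical helper, not part of the Vertex AI API) makes the valid context-window range explicit:

```python
def context_window_range(forecast_horizon: int) -> range:
    """Valid context-window values: 0 up to 5x the forecast horizon."""
    return range(0, 5 * forecast_horizon + 1)

# Monthly granularity with a horizon of 7 (predict the next 7 months)
# allows a context window anywhere from 0 to 35.
valid = context_window_range(7)
print(min(valid), max(valid))  # 0 35
```

A larger context window lets the model look further back in each series when making a prediction, at the cost of needing more history per series.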
You can also export your test dataset into your BigQuery project by giving an appropriate BigQuery destination path. After clicking Continue, the timestamp column, target column, and series identifier column are automatically assigned the required feature type and Available at forecast options. For the rest of the columns, we need to specify the feature type, i.e. how each feature relates to its time series, and whether it is available at forecast time. Learn more about feature type and availability.
Generating statistics takes a while; once it finishes, continue to the next step, 'Compute and pricing'. The recommended node hours are calculated from the number of rows in our dataset.
Let's take a break! Since training takes hours, you can also check out our other blog, 'Introduction of BigQuery ML'.
Now our model has been successfully trained. Go to Models in Vertex AI and select your model's name to evaluate it.
In the screenshot below, you can see the evaluation results of our trained model. The feature importance of each column is visualized in a bar graph, and evaluation metrics such as MAE, MAPE, and RMSE are calculated automatically.
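These metrics are worth understanding even when Vertex AI computes them for you. Below is a minimal sketch of how they are defined, run on made-up actual vs. predicted prices (the numbers are illustrative, not from our model):

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no zero actual values."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up example values:
actual = [10.0, 12.0, 11.0]
predicted = [9.0, 12.5, 11.5]
```

On these toy numbers, MAE is about 0.67 and RMSE about 0.71; RMSE exceeding MAE reflects the one larger error in the first prediction.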
The next step is batch prediction. Select Batch Predictions and create a new batch prediction by giving the BigQuery source and destination paths in the required format.
Take a Coffee Break !!!
Once the batch prediction is done, you will receive an auto-generated email in your Gmail account, just as you did for model training.
Select the finished batch prediction and go to the BigQuery export location. You will see the prediction results in the specified destination in BigQuery, in a table whose name starts with 'prediction---------'. If you open the table, you will see three columns: predicted_price.value, predicted_price.lower_bound, and predicted_price.upper_bound, as shown in the screenshot below.
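The lower and upper bounds form a prediction interval around each value. A quick sanity check on exported rows is to ask how often the actual price falls inside that interval; the snippet below sketches this on made-up rows shaped like the export (hypothetical values, not our real results):

```python
# Made-up rows shaped like the BigQuery export columns.
predictions = [
    {"predicted_price": {"value": 10.2, "lower_bound": 9.1, "upper_bound": 11.4}, "actual": 10.0},
    {"predicted_price": {"value": 4.4,  "lower_bound": 4.0, "upper_bound": 4.9},  "actual": 5.1},
]

def within_interval(row) -> bool:
    """Did the actual value land inside the model's prediction interval?"""
    p = row["predicted_price"]
    return p["lower_bound"] <= row["actual"] <= p["upper_bound"]

# Fraction of actuals covered by their prediction intervals.
coverage = sum(within_interval(r) for r in predictions) / len(predictions)
print(coverage)  # 0.5
```

A coverage far below what you expect from the interval's confidence level is a sign the model is overconfident on your data.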
The model's predictions are pretty close to the actual price values. Our next step is to visualize the predicted forecast prices in Google Data Studio. In BigQuery, go to the batch prediction export location and select Export. You will see the options shown in the screenshot below.
Choose Explore with Data Studio and add a time series chart to visualize the forecast results of the trained model.
Vertex AI is a simplified, no-code, fast solution for organizing machine learning tasks, even with fairly complex input data. The dataset can be used directly from the cloud data warehouse BigQuery, and we can even run AutoML straight from BigQuery ML. We will look at the impressive features of BigQuery ML and how to build ML models using standard SQL queries in the upcoming post of this series.
This post is part of the Data Analytics series from datadice enlightening GCP AI in building ML models and unfolding data handling, manipulation, and visualization techniques using Google Cloud Services.
Check out our LinkedIn account to get insights into our daily working life and important updates about BigQuery, Data Studio, and marketing analytics.
We also started our own YouTube channel, where we talk about DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.
If you want to learn more about how to use Google Data Studio and take it to the next level in combination with BigQuery, check out our Udemy course here.
If you are looking for help setting up a modern and cost-efficient data warehouse or analytical dashboard, send us an email at email@example.com and we will schedule a call.