- Predict project risk sentiment using NLP and machine learning
- Develop and Deploy an NLP Model
You can use the FastAI (v2) deep learning library to develop the NLP model to implement this solution.
FastAI uses a transfer learning approach, which is relatively recent (2019) in NLP practice. Transfer learning originated in image recognition modeling: it lets you start from a pretrained (usually large) model and train on a small dataset, adding or adjusting only the final layers for a specific classification task. Before the advent of deep neural networks and transfer learning, classic NLP development involved tokenization and statistical analysis of word occurrences. So, before jumping into transfer learning, review the sample Jupyter Notebook, which presents some classic statistical analysis. This analysis provides sanity checks on the dataset and follows the principle that the more you check and recheck your data, the better.
You'll need to develop a model for this solution to work. The model used in this solution is based on NLP using a deep learning library called FastAI.
Get started using the following samples available on GitHub:
- A Jupyter Notebook in the GitHub repository that demonstrates the model development process using the sample data.
- A dataset of project and task status reports in this solution.
Note:
Consider providing more datasets in the project and task management domain to improve model training and prediction accuracy.
The following are the steps to develop and deploy this model:
- Create the final model binary file (export.pkl) using the Jupyter Notebook.
- Upload it to the Oracle Cloud Infrastructure Data Science model catalog.
- Deploy it to the ODS model deployment running on Oracle Cloud Infrastructure Compute.
- Expose it as a REST API for the Oracle Functions application to call.
For your convenience, we provide a Python script (scripts/ModelDeployment.py) that demonstrates how to use the OCI Python SDK to create and deploy a model to ODS.
Develop a Model
You will now learn about the model-building processes using the example Jupyter notebook. Follow these steps:
- Set up Jupyter notebook
  Set up the Jupyter notebook in a conda environment defined in the environment YAML file for this solution.
- Prepare dataset
  Use the sample dataset provided in a CSV file. Review and analyze the data so that you can create a good machine learning model. Include additional data from the project or task management domain if required.
- Process dataset
  Process your dataset using the provided framework, which makes it easier to load and process the data. This includes tokenization, splitting the data into training and validation sets, and numericalization.
- Perform statistical analysis
  Analyze the proportion of reports with each status label: Red, Amber, and Green. For example, look for the most frequently occurring words in reports with Red status. From this analysis, you can build a simple classifier known as Naive Bayes.
- Create NLP model using FastAI
  Use FastAI to create a language model based on the vocabulary of the dataset. Then use a transfer learning technique to integrate the language model with the core model (AWD_LSTM).
- Test and validate your model
  Use some sample data to test your model.
- Create a model binary file
  Export the model to a pickle file (pkl) and upload it to the OCI Data Science platform.
After you have the model binary file, you must package it together with the model runtime and the custom logic Python script (score.py).
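The statistical analysis step can be illustrated without a deep learning library. The following is a minimal sketch, not the code from the sample notebook: it computes the proportion of each status label, counts the most frequent words per label, and builds a small multinomial Naive Bayes classifier with add-one smoothing. The sample reports, the tokenize helper, and all function names are hypothetical and exist only for illustration; the real data is the CSV file in the repository.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample reports, used only to make the sketch runnable.
REPORTS = [
    ("Vendor delivery is delayed and the budget is blocked", "Red"),
    ("Critical defect is blocked and release is delayed", "Red"),
    ("Minor delay expected, mitigation plan in progress", "Amber"),
    ("Some risk remains, testing in progress", "Amber"),
    ("All tasks completed on schedule", "Green"),
    ("Milestone completed and testing on schedule", "Green"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_proportions(reports):
    """Return the fraction of reports that carry each status label."""
    counts = Counter(label for _, label in reports)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def word_counts_by_label(reports):
    """Count word occurrences separately for each status label."""
    counts = defaultdict(Counter)
    for text, label in reports:
        counts[label].update(tokenize(text))
    return counts


def train_naive_bayes(reports):
    """Collect the statistics a multinomial Naive Bayes classifier needs."""
    word_counts = word_counts_by_label(reports)
    vocab = {word for counter in word_counts.values() for word in counter}
    return word_counts, vocab, label_proportions(reports)


def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, vocab, priors = model
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(REPORTS)
print(label_proportions(REPORTS))
print(word_counts_by_label(REPORTS)["Red"].most_common(3))
print(classify("release is blocked and delayed", model))
```

On real data, common words such as "is" and "and" tend to dominate the frequency counts, so filtering stop words before counting is usually worthwhile.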
Upload and Deploy the Model
You can now upload the model to the Oracle Cloud Infrastructure Data Science (ODS) platform and expose it as a REST API.
Use the model deployment Python script (ModelDeployment.py) to learn how to use the OCI Python SDK to create and deploy the model to ODS. Additionally, use the included example custom logic Python script to load the serialized model objects into memory and define the inference endpoint, predict().
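The shape of such a custom logic script can be sketched as follows. This is an illustrative outline rather than the score.py shipped with the solution: it assumes the model was exported with FastAI and can be restored with load_learner, and it assumes a request payload of the form {"text": "..."}, which is a hypothetical choice. The response keys are also hypothetical.

```python
import os

MODEL_FILE_NAME = "export.pkl"  # model binary exported from the notebook
_model = None                   # cache so the model is loaded once per process


def load_model(model_file_name=MODEL_FILE_NAME):
    """Load the serialized learner into memory (once) and return it."""
    global _model
    if _model is None:
        # Deferred import: fastai is only required when the model is loaded.
        from fastai.text.all import load_learner

        model_dir = os.path.dirname(os.path.realpath(__file__))
        _model = load_learner(os.path.join(model_dir, model_file_name))
    return _model


def predict(data, model=None):
    """Return the predicted status label and the per-class probabilities.

    `data` is assumed to be a dict such as {"text": "..."} (hypothetical shape).
    """
    if model is None:
        model = load_model()
    # A FastAI learner returns (decoded label, label index, probabilities).
    label, _, probabilities = model.predict(data["text"])
    return {
        "prediction": str(label),
        "probabilities": [float(p) for p in probabilities],
    }
```

Accepting the model as an optional argument keeps predict() easy to exercise locally with a stand-in object before the real export.pkl is available.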
The model deployment Python script performs the following steps to deploy the model:
- Retrieves the existing model from Model Catalog and Model Deployment.
- Deactivates the existing model in Model Deployment.
- Deletes the existing model artifact from Model Catalog.
- Uploads the artifacts export.pkl, runtime.yaml, and score.py in a compressed ZIP format to Model Catalog.
- Creates or updates Model Deployment.
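The packaging part of these steps can be sketched with the Python standard library alone. The following is a minimal illustration, not an excerpt from ModelDeployment.py: the function name build_model_artifact and the archive name are hypothetical, and placing the three files at the root of the archive is an assumption about what the model catalog expects, so verify it against the Data Science documentation. The upload itself is left to the OCI SDK calls in the provided script.

```python
import os
import tempfile
import zipfile

ARTIFACT_FILES = ("export.pkl", "runtime.yaml", "score.py")


def build_model_artifact(source_dir, zip_path, files=ARTIFACT_FILES):
    """Compress the model artifacts into a single ZIP file and return its path."""
    missing = [name for name in files
               if not os.path.isfile(os.path.join(source_dir, name))]
    if missing:
        raise FileNotFoundError(f"Missing artifact files: {', '.join(missing)}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            # arcname stores each file at the root of the archive (assumption).
            archive.write(os.path.join(source_dir, name), arcname=name)
    return zip_path


if __name__ == "__main__":
    # Demonstration with placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as work_dir:
        for name in ARTIFACT_FILES:
            with open(os.path.join(work_dir, name), "w") as handle:
                handle.write("placeholder")
        artifact = build_model_artifact(work_dir, os.path.join(work_dir, "artifact.zip"))
        with zipfile.ZipFile(artifact) as archive:
            print(archive.namelist())
```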
Modify the following parameters in the Python script and provide the correct values for your environment:
compartmentID="<YOUR OCI COMPARTMENT OCID>"
projectID="<YOUR ODS PROJECT OCID>"
modelDisplayName="Risk Predictor"
modelDescription="Risk Predictor Sample"
modelDeploymentDisplayName="Risk Predictor"
modelDeploymentDescription="Risk Predictor Sample"
modelDeploymentInstanceCount=1
modelDeploymentLBBandwidth=10 #mbps
modelDeploymentPreditLogID="<YOUR ODS DEPLOYMENT PREDICT LOG OCID>"
modelDeploymentPreditLogGroupID="<YOUR ODS DEPLOYMENT PREDICT LOG GROUP OCID>"
modelDeploymentAccessLogID="<YOUR ODS DEPLOYMENT ACCESS LOG OCID>"
modelDeploymentAccessLogGroupID="<YOUR ODS DEPLOYMENT ACCESS LOG GROUP OCID>"
modelDeploymentInstanceShapeName="<OCI VM SHAPE, FOR EXAMPLE: VM.Standard2.1>"
Alternatively, you can update the parameters from the OCI Console.
Follow the instructions in About Model Catalog in the Oracle Cloud Infrastructure Data Science documentation linked in the Explore section.
Note:
The maximum model artifact file size is 2 GB, and the upload file size limit when using the OCI Console is 100 MB. In most cases, the NLP model binary file exceeds the 100 MB Console limit, so upload it with the Python script instead.
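A quick size check before uploading can save a failed attempt. The sketch below only encodes the two limits stated in the note; the function names and return values are hypothetical, and treating MB and GB as binary units (1024-based) is an assumption.

```python
import os

CONSOLE_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024   # 100 MB (binary units assumed)
ARTIFACT_LIMIT_BYTES = 2 * 1024 * 1024 * 1024    # 2 GB (binary units assumed)


def upload_route(size_bytes):
    """Classify an artifact size against the limits described in the note."""
    if size_bytes > ARTIFACT_LIMIT_BYTES:
        return "too_large"           # exceeds the maximum artifact size
    if size_bytes > CONSOLE_UPLOAD_LIMIT_BYTES:
        return "script_only"         # too big for the Console, use the script
    return "console_or_script"       # small enough for either route


def upload_route_for_file(path):
    """Apply upload_route() to a file on disk."""
    return upload_route(os.path.getsize(path))
```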
You can choose the Virtual Machine (VM) shape when you deploy the model. See the Oracle Cloud Infrastructure Data Science Models and VM Shapes documentation linked in the Explore section to find the supported shapes.
Create an ODS Project
Follow these instructions to create a project and populate the ODS Project ID:
- In the OCI Console, open the compartment you used to create the ODS Project.
- In the navigation menu, under Analytics & AI, click Data Science.
- Click Create project to create a Data Science project.
- After the project is created, copy the ODS Project OCID, and paste it into the projectID parameter in the Python script.
Create a Log Group and Log
For Model Deployment Predict and Access Logs, you must first create a log group and then create a log shared by both events.
Follow these steps:
- In the OCI Console, confirm that you are viewing the compartment you used to create the ODS Project.
- In the navigation menu, under Observability & Management, click Log Groups.
- Click Create Log Group to create a log group for the model deployment. The predict and access log can use the same log group.
- After the log group is created, copy the log group OCID, and paste it in the Python script parameter under modelDeploymentPreditLogGroupID and modelDeploymentAccessLogGroupID.
- While you are still in the Logging service screen, click Logs.
- Click Create custom log to create a custom log and select the log group you created earlier.
- After the log is created, copy the log OCID, and paste it in the Python script parameter under modelDeploymentPreditLogID and modelDeploymentAccessLogID.
Develop Functions
You can now develop and deploy your functions to complete the implementation of this solution.
You implement multiple API calls to backend applications and perform data aggregation, filtering, and manipulation.
OCI Health Checks does the following:
- Calls Oracle Functions at the end of each day.
- Gets the task progress using the Oracle Project Management REST API.
- Identifies the project manager using the Oracle Cloud HCM REST API.
- Retrieves the OSN task wall conversation and the text comments posted by the project team member.
Oracle Functions then posts the prediction about the project status to the project manager's OSN wall.
For the purposes of this example, we have used Java as the language and the Apache HttpClient library to connect to the REST Service. This example uses the Apache library because it's easy to use and implement. Alternatively, you can use the new HTTP client that Java 11 provides. You also use the OCI Java SDK to retrieve the resource principal and the secret stored in Oracle Cloud Infrastructure Vault.
The sample Functions Java code executes the following steps:
- Retrieve the credential/secret from Oracle Cloud Infrastructure Vault.
- Retrieve the project details from Oracle Project Management.
- Retrieve the project manager person ID from HCM.
- Retrieve the username of the project manager in HCM.
- Retrieve the project manager wall details in OSN.
- Retrieve all tasks for the project using the project ID.
- For each project and task:
  - Retrieve the social object ID using the task external ID.
  - Retrieve the latest comment in OSN.
  - Send the comment to the Oracle Cloud Infrastructure Data Science platform to predict the sentiment.
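The per-project and per-task loop in these steps can be outlined as follows. The sample function itself is written in Java; this Python sketch only illustrates the control flow and is not a translation of the sample code. Every function and parameter name here is hypothetical, and the backend calls (Oracle Project Management, OSN, and the ODS endpoint) are passed in as plain callables so that the flow can be read and tested without any service. The lookup of the social object ID is assumed to happen inside get_latest_comment.

```python
def predict_project_risks(project_ids, get_tasks, get_latest_comment, predict_sentiment):
    """Return {(project_id, task_id): predicted status} for tasks with comments.

    get_tasks(project_id)       -> iterable of task IDs (hypothetical callable)
    get_latest_comment(task_id) -> latest comment text, or None if there is none
    predict_sentiment(text)     -> status label returned by the model endpoint
    """
    results = {}
    for project_id in project_ids:
        for task_id in get_tasks(project_id):
            comment = get_latest_comment(task_id)
            if not comment:
                continue  # nothing has been posted on this task wall yet
            results[(project_id, task_id)] = predict_sentiment(comment)
    return results


if __name__ == "__main__":
    # Stub data standing in for the real REST calls.
    tasks = {"P1": ["T1", "T2"], "P2": ["T3"]}
    comments = {"T1": "Delivery is blocked", "T2": None, "T3": "On schedule"}
    print(predict_project_risks(
        ["P1", "P2"],
        get_tasks=lambda pid: tasks[pid],
        get_latest_comment=lambda tid: comments[tid],
        predict_sentiment=lambda text: "Red" if "blocked" in text else "Green",
    ))
```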
Tip:
Oracle recommends that you avoid frameworks that instantiate many in-memory objects when calling REST APIs. These objects are discarded on each call and may slow down the function's execution.
Store Credentials in Oracle Vault
Store the Fusion Applications Suite user credentials in Oracle Cloud Infrastructure Vault.
To create a secret, follow the instructions in the OCI Vault documentation linked in the Explore section.
Create an app
Run the following command to create an Oracle Functions application.
fn create app RiskPredictor --annotation oracle.com/oci/subnetIds='["<subnet-ocid>"]'
Configure Environment Parameters
You can define certain parameters in the Oracle Functions environment and then reference those parameters from your code.
For this use case, you must set the following parameters used in this sample code. After creating the app, you can execute the following commands to create the configuration parameters. Then modify the values with the correct values for your configuration and environment:
fn config app RiskPredictor FA_BASE_URL <ORACLE PROJECT MANAGEMENT BASE URL>
fn config app RiskPredictor ODS_REST_API <ODS MODEL ENDPOINT>
fn config app RiskPredictor PROJECT_ID_LIST <PROJECT ID IN ORACLE PROJECT MANAGEMENT>
fn config app RiskPredictor FA_USER <ORACLE PROJECT MANAGEMENT USER NAME>
fn config app RiskPredictor OCI_REGION <OCI REGION>
fn config app RiskPredictor FA_SECRET_OCID <FA SECRET OCID>
You can provide multiple Oracle Project Management project IDs in PROJECT_ID_LIST, separated by commas.
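Application configuration values set with fn config are exposed to the function as environment variables. The following minimal sketch shows one way to read and validate them, including splitting the comma-separated project ID list. It is written in Python for illustration only (the sample function is in Java, where System.getenv plays the same role), and the load_config name and the returned dictionary layout are hypothetical.

```python
import os

REQUIRED_KEYS = (
    "FA_BASE_URL",
    "ODS_REST_API",
    "PROJECT_ID_LIST",
    "FA_USER",
    "OCI_REGION",
    "FA_SECRET_OCID",
)


def load_config(env=os.environ):
    """Read the required parameters and split the project ID list."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing configuration parameters: {', '.join(missing)}")
    config = {key: env[key] for key in REQUIRED_KEYS}
    # PROJECT_ID_LIST holds one or more IDs separated by commas.
    config["PROJECT_ID_LIST"] = [
        item.strip() for item in env["PROJECT_ID_LIST"].split(",") if item.strip()
    ]
    return config
```

Failing fast on a missing parameter makes a misconfigured application visible on the first invocation instead of partway through the REST calls.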