How can machine learning predict the price of the diamond you want to buy?

Predicting diamond prices using basic measurement metrics.

Photo by chuttersnap on Unsplash


I want to buy my mother a diamond ring as soon as I have enough money. The other day I searched Google for ring prices, but I didn’t know which metrics drive them. So I decided to apply some machine learning techniques to figure out what drives the price of a flawless diamond ring!


My goal was to build a web application where users can look up a predicted price for the diamond they want.


For this project, I used a dataset from pycaret’s dataset folder on GitHub, performed data preprocessing transformations, and built a regression model to predict the price ($326-$18,823) of the diamond using basic diamond measurement metrics. Each diamond in this dataset is given a price. The price of the diamond is determined by 7 input variables:

  1. Carat Weight: 0.2 – 5.01 carats
  2. Cut: Fair, Good, Very Good, Premium, Ideal
  3. Color: from J (Worst) to D (Best)
  4. Clarity: I1 (Worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (Best)
  5. Polish: ID (Ideal), EX (Excellent), G (Good), VG (Very Good)
  6. Symmetry: ID (Ideal), EX (Excellent), G (Good), VG (Very Good)
  7. Report: AGSL (American Gem Society Laboratories), GIA (Gemological Institute of America)
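
Since most of these inputs are categorical, a price request only makes sense when every value is one of the grades listed above. Here is a minimal sketch of such a check. The names `ALLOWED_VALUES`, `CARAT_RANGE` and `validate_diamond` are hypothetical helpers I made up for illustration; they are not part of PyCaret or the dataset. The allowed values and the carat range are copied from the list above.

```python
# Allowed grades for each categorical input, copied from the list above.
ALLOWED_VALUES = {
    "Cut": ["Fair", "Good", "Very Good", "Premium", "Ideal"],
    "Color": ["J", "I", "H", "G", "F", "E", "D"],
    "Clarity": ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"],
    "Polish": ["ID", "EX", "G", "VG"],
    "Symmetry": ["ID", "EX", "G", "VG"],
    "Report": ["AGSL", "GIA"],
}
CARAT_RANGE = (0.2, 5.01)  # range quoted in the list above


def validate_diamond(row):
    """Return a list of problems found in one diamond description (empty if valid)."""
    problems = []
    try:
        carat = float(row.get("Carat Weight", ""))
    except (TypeError, ValueError):
        problems.append("Carat Weight must be a number")
    else:
        low, high = CARAT_RANGE
        if not low <= carat <= high:
            problems.append(f"Carat Weight must be between {low} and {high}")
    # Every categorical field must be one of its known grades.
    for column, allowed in ALLOWED_VALUES.items():
        if row.get(column) not in allowed:
            problems.append(f"{column} must be one of {allowed}")
    return problems
```

An empty list means the description can be passed on to the model; otherwise each entry explains what needs fixing.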


The project is split into four tasks:

  1. Model Training and Validation: Train and validate models, and develop a machine learning pipeline for deployment, using Python (PyCaret).
  2. Front End Web Application: Build a basic HTML front end with an input form for the independent variables (Carat Weight, Cut, Color, Clarity, Polish, Symmetry, Report).
  3. Back End Web Application: Build the back end using the Flask framework.
  4. Deployment of the Web Application: Deploy on Heroku; once deployed, the app is publicly available and can be accessed via a web URL.

Project Workflow

Machine Learning Workflow (from Training to Deployment on PaaS)

Task 1: Model Training and Validation

Training and model validation are performed in Python (Jupyter Notebooks) using PyCaret to develop machine learning pipelines and train regression models. On top of PyCaret’s default preprocessing (missing value imputation, categorical encoding, etc.), I enabled normalization, polynomial and trigonometric features, feature interactions, and binning of Carat Weight:

from pycaret.datasets import get_data
from pycaret.regression import *

data = get_data('diamond')  # assumes the 'diamond' dataset from PyCaret's repository

s2 = setup(data, target = 'Price', session_id = 123,
           normalize = True,
           polynomial_features = True, trigonometry_features = True, feature_interaction=True,
           bin_numeric_features= ['Carat Weight'])

Comparison of the transformation in the dataset

The setup step transformed the dataset: it now has 65 features for training, derived from only 8 columns in the original dataset.
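
To make the jump from 8 to 65 less mysterious, here is a minimal sketch of the kind of derived features these options produce. It is an illustration in plain Python, not PyCaret's actual implementation; the function names `expand_numeric` and `one_hot`, the polynomial degree, the bin edges, and the feature names are all assumptions I chose for the example.

```python
import math


def expand_numeric(name, value, degree=2, bin_edges=(0.5, 1.0, 2.0)):
    """Derive polynomial, trigonometric and binned features from one number."""
    features = {name: value}
    # Polynomial features: value^2, value^3, ... up to `degree`.
    for power in range(2, degree + 1):
        features[f"{name}_Power{power}"] = value ** power
    # Trigonometric features of the raw value.
    features[f"sin({name})"] = math.sin(value)
    features[f"cos({name})"] = math.cos(value)
    features[f"tan({name})"] = math.tan(value)
    # Binning: count how many bin edges the value has reached.
    features[f"{name}_bin"] = sum(value >= edge for edge in bin_edges)
    return features


def one_hot(name, value, categories):
    """Turn one categorical value into one 0/1 column per category."""
    return {f"{name}_{category}": int(value == category) for category in categories}


row = {}
row.update(expand_numeric("Carat Weight", 1.1))
row.update(one_hot("Cut", "Ideal", ["Fair", "Good", "Very Good", "Premium", "Ideal"]))
print(len(row), "features derived from 2 original columns")
```

Even in this toy version, two original columns already fan out into many more, which is why a handful of inputs can end up as dozens of training features.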

Model training and validation in PyCaret:

# Model Training and Validation
lr = create_model('lr')

10-fold cross-validation of the Linear Regression Model
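
If the term is new to you: 10-fold cross-validation splits the training rows into ten parts, trains on nine of them, validates on the remaining one, and then rotates until every part has been used for validation once. Below is a minimal sketch of the index bookkeeping in plain Python. `k_fold_indices` is a hypothetical helper; PyCaret handles this internally, and its shuffling details may differ.

```python
import random


def k_fold_indices(n_rows, k=10, seed=123):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for fold_number, (train, validation) in enumerate(k_fold_indices(25), start=1):
    print(f"fold {fold_number}: {len(train)} training rows, {len(validation)} validation rows")
```

The metrics reported for the model are then averaged over the ten validation parts.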

Here the transformations and feature engineering have significantly impacted the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE).
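
For reference, both metrics are easy to compute by hand. The sketch below uses the standard definitions in plain Python; the three prices are made-up numbers for illustration, not results from my model.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned as a fraction (0.05 means 5%)."""
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(relative_errors) / len(actual)


# Made-up prices in dollars, for illustration only.
actual_prices = [1000, 2000, 4000]
predicted_prices = [1100, 1800, 4000]

print("RMSE:", round(rmse(actual_prices, predicted_prices), 2))
print("MAPE:", round(mape(actual_prices, predicted_prices), 4))
```

RMSE is expressed in dollars and punishes large misses more heavily, while MAPE is relative to each diamond's price, so the two can move differently when the features change.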

# plot the trained model
plot_model(lr)

Residual plot of the Linear Regression Model

After building the model, I saved it as a file that can be transferred to and consumed by other applications:

# save transformation pipeline and model 
save_model(lr, 'deployment_28042020') 

When the model is saved in PyCaret, the entire transformation pipeline based on the configuration defined in the setup() function is saved along with it. All inter-dependencies are orchestrated automatically. The final machine learning pipeline and the linear regression model are now stored together in a single file written by the save_model() function.
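
The idea of shipping the preprocessing and the model together can be shown with a toy example. The sketch below uses Python's standard `pickle` module and a deliberately tiny, hypothetical pipeline (a scaling step plus a one-variable linear model with made-up numbers). It illustrates the concept only; it is not how PyCaret structures its saved file.

```python
import os
import pickle
import tempfile

# Hypothetical fitted pipeline: scaling parameters and model coefficients
# kept together, so they can never get out of sync. All numbers are made up.
pipeline = {
    "scale": {"mean": 1.0, "std": 0.5},
    "model": {"intercept": 5000.0, "slope": 3000.0},
}


def predict_price(pipeline, carat):
    """Apply the stored scaling first, then the stored linear model."""
    scaled = (carat - pipeline["scale"]["mean"]) / pipeline["scale"]["std"]
    return pipeline["model"]["intercept"] + pipeline["model"]["slope"] * scaled


# Save everything into one file ...
path = os.path.join(tempfile.mkdtemp(), "toy_pipeline.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# ... and restore it later, for example inside a web application.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(predict_price(restored, 1.5))
```

Because the scaling parameters travel with the coefficients, the application that loads the file transforms new inputs exactly as the training data was transformed.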

Task 2: Front End Web Application

CSS Style Sheet
CSS (Cascading Style Sheets) describes how HTML elements are displayed on a web page. It is an efficient way of controlling the layout of your application. Style sheets contain information such as background color, font size, color, margins, etc. They are saved externally as a .css file and are linked to the HTML code.

<meta charset="UTF-8">
<title>Predict Diamond Price</title>
<link href='' rel='stylesheet' type='text/css'>
<link href='' rel='stylesheet' type='text/css'>
<link href='' rel='stylesheet' type='text/css'>
<link href='' rel='stylesheet' type='text/css'>
<link type="text/css" rel="stylesheet" href="{{ url_for('static', filename='./style.css') }}">


Generally, the front-end of a web application is built using HTML. We have used a simple HTML template and a CSS style sheet to design an input form. Here’s the HTML snippet of the front-end page of our web application.

<div class="login">
  <h1>Predict Diamond Price</h1>

  <!-- Form to enter new data for predictions -->
  <form action="{{ url_for('predict') }}" method="POST">
    <input type="text" name="Carat Weight" placeholder="Carat Weight" required="required" /><br>
    <input type="text" name="Cut" placeholder="Cut" required="required" /><br>
    <input type="text" name="Color" placeholder="Color" required="required" /><br>
    <input type="text" name="Clarity" placeholder="Clarity" required="required" /><br>
    <input type="text" name="Polish" placeholder="Polish" required="required" /><br>
    <input type="text" name="Symmetry" placeholder="Symmetry" required="required" /><br>
    <input type="text" name="Report" placeholder="Report" required="required" /><br>
    <button type="submit" class="btn btn-primary btn-block btn-large">Predict</button>
  </form>
</div>
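
When the form is submitted, the browser sends these seven fields by name. Here is a minimal sketch of turning such a submission into a row in the column order the model expects. `form_to_row` is a hypothetical helper, and the plain dictionary stands in for the submitted form data. Looking fields up by name, rather than relying on the order in which they arrive, keeps the row correct even if the form layout changes.

```python
# Column order the model was trained on (the same seven inputs as the form).
COLUMNS = ["Carat Weight", "Cut", "Color", "Clarity", "Polish", "Symmetry", "Report"]


def form_to_row(form):
    """Build one model-ready row from submitted form fields, looked up by name."""
    missing = [c for c in COLUMNS if not str(form.get(c, "")).strip()]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    # Dictionaries keep insertion order, so the row follows COLUMNS.
    row = {c: str(form[c]).strip() for c in COLUMNS}
    # Carat Weight is the only numeric input; everything else stays text.
    row["Carat Weight"] = float(row["Carat Weight"])
    return row


# A stand-in for a submitted form, deliberately out of order.
submitted = {
    "Report": "GIA", "Cut": "Ideal", "Carat Weight": " 1.1 ", "Color": "H",
    "Clarity": "SI1", "Symmetry": "EX", "Polish": "VG",
}
print(form_to_row(submitted))
```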

Task 3: Back End Web Application

The back end of the web application is developed with the Flask framework. Here is the back-end code, written in Python:

from flask import Flask, request, url_for, redirect, render_template, jsonify
from pycaret.regression import *
import pandas as pd
import pickle
import numpy as np

app = Flask(__name__)

# load the saved transformation pipeline and model
model = load_model('deployment_28042020')
cols = ['Carat Weight', 'Cut', 'Color', 'Clarity', 'Polish', 'Symmetry', 'Report']

# serve the input form
@app.route('/')
def home():
    return render_template("home.html")

# handle the form submission and render the predicted price
@app.route('/predict', methods=['POST'])
def predict():
    int_features = [x for x in request.form.values()]
    final = np.array(int_features)
    data_unseen = pd.DataFrame([final], columns=cols)
    prediction = predict_model(model, data=data_unseen, round=0)
    prediction = int(prediction.Label[0])
    return render_template('home.html', pred='Price of the Diamond is ${}'.format(prediction))

# JSON endpoint for programmatic access
@app.route('/predict_api', methods=['POST'])
def predict_api():
    data = request.get_json(force=True)
    data_unseen = pd.DataFrame([data])
    prediction = predict_model(model, data=data_unseen)
    output = prediction.Label[0]
    return jsonify(output)

if __name__ == '__main__':
    app.run()
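
The `/predict_api` route accepts a JSON object whose keys are the column names. Here is a minimal sketch of a client request built with Python's standard library. The URL assumes the app is running locally on Flask's default port 5000, the sample values are made up, and `build_request` is a hypothetical helper. The request is only constructed here; passing it to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request

# Assumption: the Flask app is running locally on its default port.
API_URL = "http://localhost:5000/predict_api"


def build_request(diamond, url=API_URL):
    """Wrap one diamond description in a JSON POST request."""
    body = json.dumps(diamond).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up sample input using the seven column names the model expects.
sample = {
    "Carat Weight": 1.1, "Cut": "Ideal", "Color": "H", "Clarity": "SI1",
    "Polish": "VG", "Symmetry": "EX", "Report": "GIA",
}
req = build_request(sample)
print(req.get_method(), req.full_url)
```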

Task 4: Deployment of the Web Application

Now that the model is trained, the machine learning pipeline is ready, and the application is tested on our local machine, deployment on Heroku is the final step. There are a couple of ways to upload the application source code onto Heroku. The simplest way is to link a GitHub repository to Heroku. The code for this project can be found on my GitHub repository here.

I then deployed the web app on Heroku; the app is published at this URL:

I would like to thank the Founder and Principal Author of PyCaret, Moez Ali. The process behind this project was inspired by his recent Medium post.

Thanks for reading!
