How to host your machine learning model as a REST API endpoint on Python Flask

In this tutorial, we take the best-performing model for predicting HDB resale prices and expose it as a REST endpoint with Flask. Training and prediction are served by separate endpoints. To try out the prediction accuracy for yourself, you may go to hdbpricer.com

This is part 2 of a 4-part tutorial
1. Building a good prediction model
2. Hosting the model prediction as an API endpoint on Flask
3. Building a simple VueJS frontend for users to price their HDBs
4. Deploying the entire full stack application to the internet

The Git repository for the implementation can be found here
My hdbpricer app server

Background

In part 1 of the tutorial, we found that K nearest neighbors yielded the best prediction outcome. Here, we package that model so that it can be used by a front-end application (part 3 of the 4-part tutorial).

Server setup (app.py)

Firstly,

  • Import the necessary libraries
  • Set up CORS

from flask import Flask, jsonify, request
from flask_cors import CORS
import random
from predict import predictPrice
from train import train
import os

# configuration
DEBUG = True

# instantiate the app
app = Flask(__name__)
app.config.from_object(__name__)

# enable CORS
CORS(app, resources={r'/*': {'origins': '*'}})

Here’s an example of what the Flask application stores: a list of HDB dicts, each holding a flat's attributes and its resale price.

HDBs = [
    {
        'town': 'ANG MO KIO',
        'flat_type': '2 ROOM',
        'storey_range': '10 TO 12',
        'floor_area_sqm': 44.0,
        'lease_commence_date': 1979,
        'resale_price': 232000.0,
    },
    #town, flat_type,storey_range,floor_area_sqm,lease_commence_date
]

We then set up the routes.

ping: To check if the server is running fine

# sanity check route
@app.route('/ping', methods=['GET'])
def ping_pong():
    return jsonify('pong!')

hdbs:
– GET method returns all the HDBs in the list
– POST method receives a payload of HDB information, runs the predict function, and then appends the newly priced HDB to the list

@app.route('/hdbs', methods=['GET', 'POST'])
def all_hdbs():
    response_object = {'status': 'success'}
    if request.method == 'POST':
        post_data = request.get_json()
        # collect the five model inputs from the payload
        features = {
            'town': post_data.get('town'),
            'flat_type': post_data.get('flat_type'),
            'storey_range': post_data.get('storey_range'),
            'floor_area_sqm': post_data.get('floor_area_sqm'),
            'lease_commence_date': post_data.get('lease_commence_date'),
        }
        # price returned from the model, scaled by 1.01 and rounded
        resale_price = round(predictPrice(**features) * 1.01)
        HDBs.append({**features, 'resale_price': resale_price})
        response_object['message'] = 'Priced!'
    else:
        response_object['hdbs'] = HDBs
    return jsonify(response_object)
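To make the expected payload concrete, here is a minimal client-side sketch that builds a POST request for the /hdbs route using only the standard library. The helper name build_price_request and the base URL http://localhost:5000 are assumptions for illustration; the field names come from the route above. The JSON content type is set because Flask's request.get_json() expects it.

```python
import json
from urllib.request import Request


def build_price_request(base_url, flat):
    """Build (but do not send) a JSON POST request for the /hdbs route."""
    body = json.dumps(flat).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/hdbs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The five inputs the route reads from the payload
flat = {
    "town": "ANG MO KIO",
    "flat_type": "2 ROOM",
    "storey_range": "10 TO 12",
    "floor_area_sqm": 44.0,
    "lease_commence_date": 1979,
}

# Assumed local address; adjust to wherever the server is running
req = build_price_request("http://localhost:5000", flat)

# Sending is left out so the sketch runs without a live server:
# from urllib.request import urlopen
# print(urlopen(req).read())
```

Once the server is running, passing req to urlopen should return the JSON response with the 'Priced!' message, and a subsequent GET on the same route should list the newly priced flat.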

train: Runs the training function and returns the score of the latest training run. This only needs to be called when we have a new dataset to train on.

@app.route('/train', methods=['GET'])
def train_model():
    response_object = {'status': 'success'}
    response_object['score'] = train()
    return jsonify(response_object)

Last but not least, we need the following block so that python app.py can be run both locally and on a server. (Deploying to a server will be covered in part 4 of the 4-part tutorial.)

Essentially, if there is an environment variable named ‘PORT’, the app is being deployed on a server and will use the corresponding port number. Otherwise, it is hosted on port 5000 on your localhost.

if __name__ == '__main__':
    port = int(os.getenv('PORT', 5000))

    print("Starting app on port %d" % port)
    if port != 5000:
        app.run(debug=False, port=port, host='0.0.0.0')
    else:
        app.run()
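The port logic above can be pulled into a small function that is easy to check in isolation. This is only a sketch: the name resolve_run_options and the idea of returning keyword arguments for app.run are assumptions, not part of the original app.py.

```python
import os

DEFAULT_PORT = 5000


def resolve_run_options(environ=os.environ):
    """Return keyword arguments for app.run() based on the PORT variable."""
    port = int(environ.get("PORT", DEFAULT_PORT))
    if port != DEFAULT_PORT:
        # A custom port is treated as a server deployment:
        # listen on all interfaces and switch debug mode off.
        return {"debug": False, "port": port, "host": "0.0.0.0"}
    # No PORT (or PORT=5000): fall back to Flask's local defaults.
    return {}


local_options = resolve_run_options({})
server_options = resolve_run_options({"PORT": "8080"})
```

With this helper, the main block would reduce to app.run(**resolve_run_options()). Note that, as in the original logic, a server that happens to assign port 5000 would be treated as a local run.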

Here’s the full code block for app.py

from flask import Flask, jsonify, request
from flask_cors import CORS
import random
from predict import predictPrice
from train import train
import os

# configuration
DEBUG = True

# instantiate the app
app = Flask(__name__)
app.config.from_object(__name__)

# enable CORS
CORS(app, resources={r'/*': {'origins': '*'}})

HDBs = [
    {
        'town': 'ANG MO KIO',
        'flat_type': '2 ROOM',
        'storey_range': '10 TO 12',
        'floor_area_sqm': 44.0,
        'lease_commence_date': 1979,
        'resale_price': 232000.0,
    },
    #town, flat_type,storey_range,floor_area_sqm,lease_commence_date
]

# sanity check route
@app.route('/ping', methods=['GET'])
def ping_pong():
    return jsonify('pong!')

@app.route('/hdbs', methods=['GET', 'POST'])
def all_hdbs():
    response_object = {'status': 'success'}
    if request.method == 'POST':
        post_data = request.get_json()
        # collect the five model inputs from the payload
        features = {
            'town': post_data.get('town'),
            'flat_type': post_data.get('flat_type'),
            'storey_range': post_data.get('storey_range'),
            'floor_area_sqm': post_data.get('floor_area_sqm'),
            'lease_commence_date': post_data.get('lease_commence_date'),
        }
        # price returned from the model, scaled by 1.01 and rounded
        resale_price = round(predictPrice(**features) * 1.01)
        HDBs.append({**features, 'resale_price': resale_price})
        response_object['message'] = 'Priced!'
    else:
        response_object['hdbs'] = HDBs
    return jsonify(response_object)

@app.route('/train', methods=['GET'])
def train_model():
    response_object = {'status': 'success'}
    response_object['score'] = train()
    return jsonify(response_object)

if __name__ == '__main__':
    port = int(os.getenv('PORT', 5000))

    print("Starting app on port %d" % port)
    if port != 5000:
        app.run(debug=False, port=port, host='0.0.0.0')
    else:
        app.run()

Now, we will move on to the training API.

train.py

Import the necessary libraries

import math
from collections import defaultdict
import numpy as np
from numpy import unique
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from sklearn.neighbors import KNeighborsRegressor
import pickle

You will see that the code closely mirrors the Jupyter notebook.

As a quick recap, here’s what we do:

  1. Read dataset
  2. Preprocess data
  3. Geocode the towns
  4. Encode String data into integers
  5. Drop columns
  6. Split data into training and testing
  7. Scaling
  8. Train and fit

The additional work done was
1. Saving the Scaler
2. Saving the model
3. Evaluating the saved model

This is so that the model and scaler can be reused by the predict function (predict.py) later. We use Pickle here, but joblib works fine as well.
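As a rough illustration of the save-and-reload round trip, the sketch below pickles an object to disk and loads it back using context managers so the files are always closed. The helper names save_artifact and load_artifact are hypothetical, and a plain dict with made-up numbers stands in for the fitted scaler so the example runs without scikit-learn.

```python
import os
import pickle
import tempfile


def save_artifact(obj, path):
    """Serialize obj to path with pickle, closing the file afterwards."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load_artifact(path):
    """Load a pickled object; only do this for files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a fitted scaler (the values are made up for illustration)
stand_in_scaler = {"mean": [95.0, 1990.0], "scale": [24.0, 12.0]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.sav")
    save_artifact(stand_in_scaler, path)
    restored = load_artifact(path)
```

The restored object is an equal copy of what was saved, which is what lets predict.py pick up exactly the scaler and model that train.py produced.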

def train():
    #Dataset from https://data.gov.sg/dataset/resale-flat-prices
    file_url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8OfO82KXoRmO0E6c58MdwsOSc8ns5Geme87SiaiqTUrS_hI8u8mYE5KIOfQe4m2m3GGf9En22xuXx/pub?gid=382289391&single=true&output=csv"
    data = pd.read_csv(file_url)

    dataframe = data.copy()

    #let's break date to years, months
    dataframe['date'] = pd.to_datetime(dataframe['month'])
    dataframe['month'] = dataframe['date'].apply(lambda date:date.month)
    dataframe['year'] = dataframe['date'].apply(lambda date:date.year)

    #Get number of years left on lease as a continuous number (ignoring months)
    dataframe['remaining_lease'] = dataframe['remaining_lease'].apply(lambda remaining_lease:remaining_lease[:2])

    #Get storey range as a continuous number (the first two characters, i.e. the lowest floor in the range)
    dataframe['storey_range'] = dataframe['storey_range'].apply(lambda storey_range:storey_range[:2])

    #Concat address
    dataframe['address'] = dataframe['block'].map(str) + ', ' + dataframe['street_name'].map(str) + ', Singapore'

    '''
    #Geocode by address (disabled; kept for reference)
    locator = Nominatim(user_agent="myGeocoder")

    # 1 - convenient function to delay between geocoding calls
    geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
    # 2 - create location column
    dataframe['location'] = dataframe['address'].apply(geocode)
    print("step 2")
    # 3 - create longitude, latitude and altitude from location column (returns tuple)
    dataframe['point'] = dataframe['location'].apply(lambda loc: tuple(loc.point) if loc else None)
    print("step 3")
    # 4 - split point column into latitude, longitude and altitude columns
    dataframe[['latitude', 'longitude', 'altitude']] = pd.DataFrame(dataframe['point'].tolist(), index=dataframe.index)
    print("step 4")
    '''

    #Geocode by town (Singapore is so small that geocoding by address might not make much difference compared to geocoding by town)
    town = [x for x in dataframe['town'].unique().tolist()
                if isinstance(x, str)]
    latitude = []
    longitude = []
    # one geolocator is enough for all the lookups
    geolocator = Nominatim(user_agent="ny_explorer")
    for name in town:
        try:
            loc = geolocator.geocode(name)
            latitude.append(loc.latitude)
            longitude.append(loc.longitude)
            #print('The geographical coordinate of location are {}, {}.'.format(loc.latitude, loc.longitude))
        except Exception:
            # if the geolocator fails or finds nothing, append NaN
            # so the lists stay the same length as `town`
            latitude.append(np.nan)
            longitude.append(np.nan)
    # create a dataframe with the town, latitude and longitude
    df_ = pd.DataFrame({'town':town,
                        'latitude': latitude,
                        'longitude':longitude})
    # merge on town to add the latitude and longitude columns
    dataframe = dataframe.merge(df_, on='town', how='left')

    ### label encode the categorical values and convert them to numbers
    '''
    le = LabelEncoder()

    dataframe['town']= le.fit_transform(dataframe['town'].astype(str))

    dataframe['flat_type'] = le.fit_transform(dataframe['flat_type'].astype(str))

    dataframe['street_name'] = le.fit_transform(dataframe['street_name'].astype(str))

    #dataframe['storey_range'] = le.fit_transform(dataframe['storey_range'].astype(str))

    dataframe['flat_model'] = le.fit_transform(dataframe['flat_model'].astype(str))

    dataframe['block'] = le.fit_transform(dataframe['block'].astype(str))

    dataframe['address'] = le.fit_transform(dataframe['address'].astype(str))
    '''

    townDict = {'ANG MO KIO': 1,'BEDOK': 2,'BISHAN': 3,'BUKIT BATOK': 4,'BUKIT MERAH': 5,'BUKIT PANJANG': 6,'BUKIT TIMAH': 7,'CENTRAL AREA': 8,'CHOA CHU KANG': 9,'CLEMENTI': 10,'GEYLANG': 11,'HOUGANG': 12,'JURONG EAST': 13,'JURONG WEST': 14,'KALLANG/WHAMPOA': 15,'MARINE PARADE': 16,'PASIR RIS': 17,'PUNGGOL': 18,'QUEENSTOWN': 19,'SEMBAWANG': 20,'SENGKANG': 21,'SERANGOON': 22,'TAMPINES': 23,'TOA PAYOH': 24,'WOODLANDS': 25,'YISHUN': 26,}
    flat_typeDict = {'1 ROOM': 1,'2 ROOM': 2,'3 ROOM': 3,'4 ROOM': 4,'5 ROOM': 5,'EXECUTIVE': 6,'MULTI-GENERATION': 7,}

    dataframe['town'] = dataframe['town'].replace(townDict, regex=True)
    dataframe['flat_type'] = dataframe['flat_type'].replace(flat_typeDict, regex=True)

    # drop some unnecessary columns
    dataframe = dataframe.drop('date',axis=1)
    dataframe = dataframe.drop('block',axis=1)
    #dataframe = dataframe.drop('lease_commence_date',axis=1)
    dataframe = dataframe.drop('month',axis=1)
    dataframe = dataframe.drop('street_name',axis=1)
    dataframe = dataframe.drop('address',axis=1)
    dataframe = dataframe.drop('flat_model',axis=1)
    #dataframe = dataframe.drop('town',axis=1)
    dataframe = dataframe.drop('year',axis=1)
    #dataframe = dataframe.drop('latitude',axis=1)
    dataframe = dataframe.drop('remaining_lease',axis=1)

    X = dataframe.drop('resale_price',axis =1)
    y = dataframe['resale_price']
    X = X.values
    y = y.values
    #splitting Train and Test
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

    #standardization scaler - fit and transform on train, transform only on test
    s_scaler = StandardScaler()
    # np.float was removed in NumPy 1.24; the built-in float is equivalent
    X_train = s_scaler.fit_transform(X_train.astype(float))
    X_test = s_scaler.transform(X_test.astype(float))

    knn = KNeighborsRegressor(algorithm='brute')
    knn.fit(X_train,y_train)

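    #save model and scaler (the with blocks make sure the files are closed)
    filename = 'hdbknn.sav'
    scalername = 'scaler.sav'
    with open(filename, 'wb') as model_file:
        pickle.dump(knn, model_file)
    with open(scalername, 'wb') as scaler_file:
        pickle.dump(s_scaler, scaler_file)

    #reload the saved model and evaluate it on the test set
    with open(filename, 'rb') as model_file:
        loaded_model = pickle.load(model_file)
    result = loaded_model.score(X_test, y_test)
    print(result)
    return result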
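For intuition about what the fitted KNeighborsRegressor does at prediction time, here is a pure-Python sketch of nearest-neighbour regression: find the k closest training rows by Euclidean distance and average their targets. With its default settings (5 neighbours, uniform weights), scikit-learn's regressor follows the same idea. The function name knn_predict and the toy numbers are made up for illustration and are not taken from the HDB dataset.

```python
import math


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training rows closest to query."""
    # Pair every training row's distance to the query with its target
    distances = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy, already-scaled feature rows and their prices (illustrative only)
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [200000.0, 220000.0, 240000.0, 900000.0]

# The far-away fourth row is ignored when k=3
estimate = knn_predict(train_X, train_y, query=(0.2, 0.2), k=3)
print(estimate)  # 220000.0
```

Because the prediction depends on distances, features on larger scales would dominate the result, which is why the training code standardizes the inputs before fitting.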

Without further ado, let’s look at the prediction function.

predict.py

Import the necessary libraries

import math
from collections import defaultdict
import numpy as np
from numpy import unique
import pandas as pd
from sklearn.preprocessing import StandardScaler
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from sklearn.neighbors import KNeighborsRegressor
import pickle

Define a function to receive inputs from the REST API

def predictPrice(town,flat_type,storey_range,floor_area_sqm,lease_commence_date):

    #town, flat_type,storey_range,floor_area_sqm,lease_commence_date
    input_data = {
        'town': town,
        'flat_type': flat_type,
        'storey_range': storey_range,
        'floor_area_sqm': floor_area_sqm,
        'lease_commence_date': lease_commence_date,
    }

Geocode the town (future versions of hdbpricer will also be able to geocode exact addresses, since the method is the same)

    #Geocode by town (Singapore is so small that geocoding by address might not make much difference compared to geocoding by town)
    town = input_data["town"]
    latitude = 0
    longitude = 0
    try:
        geolocator = Nominatim(user_agent="ny_explorer")
        loc = geolocator.geocode(town)
        latitude = loc.latitude
        longitude = loc.longitude
        #print('The geographical coordinate of location are {}, {}.'.format(loc.latitude, loc.longitude))
    except Exception:
        # if the geolocator fails or finds nothing, fall back to NaN
        latitude = np.nan
        longitude = np.nan
    input_data['latitude'] = latitude
    input_data['longitude'] = longitude
    # keep only the first two characters, i.e. the lowest floor in the range
    input_data['storey_range'] = input_data['storey_range'][:2]
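The storey_range slice is easy to misread, so here is a small sketch of what it yields. It assumes ranges are always written with zero-padded two-digit floors, such as '10 TO 12' or '01 TO 03'. Both function names are hypothetical; the second is an optional, slightly more explicit variant rather than what the tutorial code does.

```python
def storey_lower_bound(storey_range):
    """Mirror the tutorial's slice: first two characters as a number."""
    return float(storey_range[:2])


def storey_bounds(storey_range):
    """Alternative sketch: parse both ends of the range explicitly."""
    low, high = storey_range.split(" TO ")
    return int(low), int(high)


low = storey_lower_bound("10 TO 12")   # 10.0
ground = storey_lower_bound("01 TO 03")  # 1.0
bounds = storey_bounds("01 TO 03")     # (1, 3)
```

In the tutorial code the slice stays a string such as '10' until the later astype call converts the whole row to floats.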

Encode string values

    townDict = {'ANG MO KIO': 1,'BEDOK': 2,'BISHAN': 3,'BUKIT BATOK': 4,'BUKIT MERAH': 5,'BUKIT PANJANG': 6,'BUKIT TIMAH': 7,'CENTRAL AREA': 8,'CHOA CHU KANG': 9,'CLEMENTI': 10,'GEYLANG': 11,'HOUGANG': 12,'JURONG EAST': 13,'JURONG WEST': 14,'KALLANG/WHAMPOA': 15,'MARINE PARADE': 16,'PASIR RIS': 17,'PUNGGOL': 18,'QUEENSTOWN': 19,'SEMBAWANG': 20,'SENGKANG': 21,'SERANGOON': 22,'TAMPINES': 23,'TOA PAYOH': 24,'WOODLANDS': 25,'YISHUN': 26,}
    flat_typeDict = {'1 ROOM': 1,'2 ROOM': 2,'3 ROOM': 3,'4 ROOM': 4,'5 ROOM': 5,'EXECUTIVE': 6,'MULTI-GENERATION': 7,}

    input_data['town'] = townDict[input_data['town']]
    input_data['flat_type'] = flat_typeDict[input_data['flat_type']]
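A plain dictionary lookup raises KeyError for any value that is not in the mapping (for example a lowercase town name), which an unhandled Flask route would surface as a 500 error. One possible way to make that failure clearer is sketched below. The helper encode_category is an assumption and not part of the original predict.py; the codes are copied from flat_typeDict above.

```python
# Same codes as flat_typeDict in predict.py
FLAT_TYPE_CODES = {
    '1 ROOM': 1, '2 ROOM': 2, '3 ROOM': 3, '4 ROOM': 4,
    '5 ROOM': 5, 'EXECUTIVE': 6, 'MULTI-GENERATION': 7,
}


def encode_category(value, codes, field):
    """Look up value in codes, raising a descriptive error if unknown."""
    try:
        return codes[value]
    except KeyError:
        raise ValueError(f"Unknown {field}: {value!r}") from None


code = encode_category('4 ROOM', FLAT_TYPE_CODES, 'flat_type')  # 4

try:
    encode_category('4 room', FLAT_TYPE_CODES, 'flat_type')
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # "Unknown flat_type: '4 room'"
```

The route could catch the ValueError and return the message to the client instead of failing with a generic server error.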

Convert to dataframe

    dataframe = pd.DataFrame.from_records([input_data])
    data = dataframe.values

Scale the input with our saved scaler. It is important to reuse the saved scaler because a scaler should never be fitted on new or test data. This scaler was fitted on the training data in train.py.

    scalername = 'scaler.sav'
    with open(scalername, 'rb') as scaler_file:
        s_scaler = pickle.load(scaler_file)
    # np.float was removed in NumPy 1.24; the built-in float is equivalent
    data = s_scaler.transform(data.astype(float))
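To see why the training-time scaler has to be reused, the sketch below standardizes a single floor area by hand. StandardScaler subtracts the training mean and divides by the training (population) standard deviation per feature; the helper names and the five floor areas here are made up for illustration.

```python
import statistics


def fit_standardizer(values):
    """Return the mean and population standard deviation of values."""
    return statistics.fmean(values), statistics.pstdev(values)


def standardize(value, mean, std):
    """Apply z = (x - mean) / std using statistics from the training set."""
    return (value - mean) / std


# Illustrative training floor areas in square metres (not real data)
train_areas = [44.0, 67.0, 90.0, 110.0, 139.0]
mean, std = fit_standardizer(train_areas)

# A new request is scaled with the training statistics
scaled = standardize(44.0, mean, std)

# Refitting on the single request would give a spread of zero,
# so the value could not be placed relative to the training data at all
single_request_std = statistics.pstdev([44.0])
```

Here the 44 sqm flat lands well below the training mean of 90, which is the kind of relative position the KNN model was trained on.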


Predict!

    filename = 'hdbknn.sav'
    with open(filename, 'rb') as model_file:
        loaded_model = pickle.load(model_file)
    result = loaded_model.predict(data)
    #print(result)
    return result[0]

The full predict.py file

import math
#import tensorflow as tf
from collections import defaultdict
import numpy as np
from numpy import unique
import pandas as pd
from sklearn.preprocessing import StandardScaler
#from tensorflow import keras
#from tensorflow.keras import layers
#import geopandas as gpd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from sklearn.neighbors import KNeighborsRegressor
import pickle

def predictPrice(town,flat_type,storey_range,floor_area_sqm,lease_commence_date):

    #town, flat_type,storey_range,floor_area_sqm,lease_commence_date
    input_data = {
        'town': town,
        'flat_type': flat_type,
        'storey_range': storey_range,
        'floor_area_sqm': floor_area_sqm,
        'lease_commence_date': lease_commence_date,
    }

    #Geocode by town (Singapore is so small that geocoding by address might not make much difference compared to geocoding by town)
    town = input_data["town"]
    latitude = 0
    longitude = 0
    try:
        geolocator = Nominatim(user_agent="ny_explorer")
        loc = geolocator.geocode(town)
        latitude = loc.latitude
        longitude = loc.longitude
        #print('The geographical coordinate of location are {}, {}.'.format(loc.latitude, loc.longitude))
    except Exception:
        # if the geolocator fails or finds nothing, fall back to NaN
        latitude = np.nan
        longitude = np.nan
    input_data['latitude'] = latitude
    input_data['longitude'] = longitude
    # keep only the first two characters, i.e. the lowest floor in the range
    input_data['storey_range'] = input_data['storey_range'][:2]

    townDict = {'ANG MO KIO': 1,'BEDOK': 2,'BISHAN': 3,'BUKIT BATOK': 4,'BUKIT MERAH': 5,'BUKIT PANJANG': 6,'BUKIT TIMAH': 7,'CENTRAL AREA': 8,'CHOA CHU KANG': 9,'CLEMENTI': 10,'GEYLANG': 11,'HOUGANG': 12,'JURONG EAST': 13,'JURONG WEST': 14,'KALLANG/WHAMPOA': 15,'MARINE PARADE': 16,'PASIR RIS': 17,'PUNGGOL': 18,'QUEENSTOWN': 19,'SEMBAWANG': 20,'SENGKANG': 21,'SERANGOON': 22,'TAMPINES': 23,'TOA PAYOH': 24,'WOODLANDS': 25,'YISHUN': 26,}
    flat_typeDict = {'1 ROOM': 1,'2 ROOM': 2,'3 ROOM': 3,'4 ROOM': 4,'5 ROOM': 5,'EXECUTIVE': 6,'MULTI-GENERATION': 7,}

    input_data['town'] = townDict[input_data['town']]
    input_data['flat_type'] = flat_typeDict[input_data['flat_type']]

    dataframe = pd.DataFrame.from_records([input_data])
    data = dataframe.values

    scalername = 'scaler.sav'
    with open(scalername, 'rb') as scaler_file:
        s_scaler = pickle.load(scaler_file)
    # np.float was removed in NumPy 1.24; the built-in float is equivalent
    data = s_scaler.transform(data.astype(float))
    #print(data)

    filename = 'hdbknn.sav'
    with open(filename, 'rb') as model_file:
        loaded_model = pickle.load(model_file)

    result = loaded_model.predict(data)
    #print(result)
    return result[0]

Running the server

Install your dependencies (either pip or conda works fine)

pip install -r requirements.txt

Run the server, depending on whether you use vanilla Python or Conda

  1. Python (virtual environment on Windows)
env\Scripts\activate
(env) python server/app.py
  2. Conda
conda activate modelenv
python server/app.py


Conclusion

That’s all on how to host your machine learning model, with separate training and prediction endpoints, as a REST API using Python’s Flask library. Some inspiration was taken from here. In the next tutorial, you can look forward to building a front-end application using VueJS!

Side note – This is not a production-ready implementation. Running Flask’s built-in server with “python app.py” is not recommended for real production use cases 🙂
