K-Nearest Neighbor Algorithm (KNN)

Theory

Mahima Jain
2 min read · Jan 6, 2021

K-Nearest Neighbor is one of the most basic and essential classification algorithms in Machine Learning. It is a type of supervised learning and has applications in pattern recognition, data mining, and more. It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data. We are given some prior data (also called training data), which classifies coordinates into groups identified by an attribute.

Algorithm

Implement the KNN algorithm using the following steps (a code sketch of these steps follows the list):

  1. Load the data set.
  2. Initialize the value of k.
  3. To get the predicted class, iterate over the training data points as follows:
  • Calculate the distance between the test data and each row of the
    training data using the Euclidean distance.
  • Sort the calculated distances in ascending order.
  • Get the top k rows from the sorted array.
  • Get the most frequent class among these rows.
  • Return the predicted class.
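
The steps above translate almost directly into code. Here is a minimal sketch in plain Python; the function names (`euclidean_distance`, `knn_predict`) and the toy data are my own illustration, not from the original post.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training_data, training_labels, test_point, k):
    # Step 3a: distance from the test point to each training row.
    distances = [
        (euclidean_distance(row, test_point), label)
        for row, label in zip(training_data, training_labels)
    ]
    # Step 3b: sort the distances in ascending order.
    distances.sort(key=lambda pair: pair[0])
    # Step 3c: take the top k rows.
    top_k = distances[:k]
    # Steps 3d-3e: return the most frequent class among these rows.
    return Counter(label for _, label in top_k).most_common(1)[0][0]

# Toy usage: two clusters, one test point near cluster "A".
X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (1.2, 1.4), k=3))  # -> "A"
```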

Euclidean Distance

It is the straight-line distance between two points. These points can lie in spaces of different dimensions and are represented by different forms of coordinates. In 1D space, the points lie on a straight line. In 2D, the coordinates are given as points on the x and y axes, and in 3D the x, y, and z axes are used. The formula for the Euclidean distance depends on the dimension of the space in which the points lie. In 2D, the Euclidean distance between two points is: √((x₂ − x₁)² + (y₂ − y₁)²)
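
As a quick worked example, the Euclidean distance between the points (1, 2) and (4, 6) is √((4 − 1)² + (6 − 2)²) = √(9 + 16) = √25 = 5.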

Pros and Cons of KNN

Below are the pros of the KNN algorithm:

  • KNN is very simple and easy to understand and use.
  • KNN is a non-parametric algorithm, which means it makes no
    assumptions about the underlying data distribution.
  • It has no explicit training step.
  • The model evolves with new data, allowing the algorithm to respond
    quickly to changes in the input.
  • KNN is easily applied to multi-class problems.
  • It can be used for both classification and regression.
  • One hyperparameter: choosing k might take some time, but after that
    the rest of the model follows from it.
  • A variety of distance criteria to choose from: KNN gives the user the
    flexibility to choose a distance metric while building the model
    (see the sketch after this list):
  1. Euclidean Distance
  2. Hamming Distance
  3. Manhattan Distance
  4. Minkowski Distance
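
To make these options concrete, here is a minimal sketch of the four metrics in plain Python; the function names are illustrative assumptions on my part.

```python
import math

def euclidean(a, b):
    # Straight-line distance; Minkowski with p = 2.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences; Minkowski with p = 1.
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # Generalization: p = 1 gives Manhattan, p = 2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    # Count of positions where the values differ (useful for categorical data).
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))      # 5.0
print(manhattan((0, 0), (3, 4)))      # 7
print(minkowski((0, 0), (3, 4), 3))   # ~4.498
print(hamming("karolin", "kathrin"))  # 3
```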

Below are the cons of the KNN algorithm:

  • KNN is a slow algorithm at prediction time, since it must compute the
    distance to every training point.
  • It suffers from the curse of dimensionality.
  • It needs homogeneous (similarly scaled) features; see the scaling
    sketch after this list.
  • The optimal number of neighbors must be considered when classifying a
    new data entry.
  • Imbalanced data causes problems.
  • It is very sensitive to outliers, as it simply chooses the neighbors
    based on the distance criterion.
  • KNN has no built-in way of dealing with missing values.
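
In practice, the homogeneous-features requirement is usually handled by scaling the features before fitting. As one way to do this (an assumption on my part; the original post does not prescribe a library), here is a minimal sketch using scikit-learn's pipeline:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features so no single feature dominates the distance,
# then classify with k = 5 neighbors.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```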

In the next part, I'll explain the coding of the KNN algorithm.
