# How Does the K-Nearest Neighbors (KNN) Algorithm Work in Python?

Today I will describe one of the most intuitive classification algorithms, fascinating in its simplicity: K-Nearest Neighbors, known as KNN.

KNN is based on a simple yet powerful concept: “Tell me who you’re with, and I’ll tell you who you are.” In practical terms, it classifies a new data point by finding the k nearest data points in the training set and assigning the class that is most common among them. Euclidean distance is the usual metric, although other distance measures can be used.

### The Nearest Neighbors

Imagine you have a dataset that represents various animals in a zoo, with information like weight, height, and age. When a new animal is added, KNN finds the k animals closest to it in terms of Euclidean distance over those features and assigns the species that appears most often among them.
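
The zoo scenario above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the animal measurements, the species labels, and the `knn_classify` helper are all hypothetical and chosen only to show the "find the k closest, then vote" idea.

```python
import math
from collections import Counter

# Hypothetical zoo data: (weight in kg, height in m, age in years) -> species label
animals = [
    ((2.2, 0.40, 3.0), "lemur"),
    ((2.8, 0.45, 5.0), "lemur"),
    ((2.5, 0.42, 2.0), "lemur"),
    ((40.0, 0.80, 4.0), "wolf"),
    ((36.0, 0.75, 6.0), "wolf"),
    ((45.0, 0.85, 3.0), "wolf"),
]


def knn_classify(training_data, new_point, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the new point
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], new_point))
    # Keep the labels of the k closest examples
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]


new_animal = (38.0, 0.80, 4.0)
print(knn_classify(animals, new_animal, k=3))  # prints wolf
```

Note that weight dominates the distance here because it has the largest numeric range, so in this toy data the vote is effectively decided by weight alone.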

### Euclidean Distance

Euclidean distance represents the “straight line” between two points in Euclidean space. Mathematically, the distance between two points $P(x_1, y_1)$ and $Q(x_2, y_2)$ is calculated as:

$$d(P, Q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

The same formula extends to any number of dimensions by adding one squared difference per feature, making it suitable for datasets with many features.
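
As a small sketch of that extension, the function below computes the Euclidean distance between two points with any number of coordinates. The name `euclidean_distance` is an illustrative choice, not part of any library.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences feature by feature, then take the square root
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((1, 2), (4, 6)))              # 2 features: prints 5.0
print(euclidean_distance((1, 2, 3, 4), (1, 2, 3, 4)))  # 4 features, same point: prints 0.0
```

For Python 3.8 and later, the standard library's `math.dist` performs the same calculation.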

### Implementation in Python

Let’s see how to put KNN into practice using scikit-learn, a Python library that simplifies the implementation of machine learning algorithms.

```python
# Import the necessary libraries
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Create a small example dataset
X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]])
y = np.array([0, 0, 0, 1, 1])

# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Predict the class of a new entry
new_entry = np.array([[5, 5]])
prediction = knn.predict(new_entry)
print("Predicted class:", prediction[0])
# Output: 0 or 1, depending on the nearest neighbors
```

In just a few lines of code, we created, trained, and used a KNN classifier. This example illustrates how simple it is to implement this algorithm.
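
To see why the prediction comes out the way it does, it can help to look at the neighbors themselves. The sketch below rebuilds the same toy dataset and uses the classifier's `kneighbors()` method to list the three nearest training points. In this particular dataset, `[2, 3]` and `[7, 8]` are equally far from `[5, 5]`, so the third neighbor is a tie between the two classes and the predicted class depends on how scikit-learn orders the tied points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]])
y = np.array([0, 0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

new_entry = np.array([[5, 5]])
# kneighbors() returns the distances to the k nearest training points
# and the row indices of those points in X
distances, indices = knn.kneighbors(new_entry)
for dist, idx in zip(distances[0], indices[0]):
    print(f"neighbor {X[idx]} (class {y[idx]}) at distance {dist:.3f}")
```

Choosing a query point or a value of k that avoids such ties makes a toy example easier to reason about.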

### Advantages and Limitations of KNN

After exploring how KNN works, it’s important to examine its advantages and limitations.

#### Advantages of KNN

– Simplicity: Easy to implement and understand.

– No Preliminary Assumptions: Being a non-parametric algorithm, it doesn’t make assumptions about the data distribution.

– Multi-class Adaptability: Easily handles classification problems with multiple classes.

– Versatility: Can be used for both classification and regression problems.
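
As a brief illustration of the regression use mentioned above, scikit-learn provides `KNeighborsRegressor`, which by default predicts the average target value of the k nearest neighbors. The data below is hypothetical and deliberately trivial (the target is ten times the feature) so the result is easy to check by hand.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical 1-D data where the target is simply 10 times the feature
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, y)

# The two nearest neighbors of 2.6 are 3.0 and 2.0,
# so the prediction is the mean of their targets, 30 and 20
print(reg.predict(np.array([[2.6]]))[0])  # prints 25.0
```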

#### Limitations of KNN

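– Computational Efficiency: Training is cheap because the model simply stores the data, but prediction can be slow on large datasets, since a brute-force search calculates the distance from the new point to every point in the training set.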

– Sensitivity to Outliers: Outliers can negatively affect predictions.

– Curse of Dimensionality: In high-dimensional spaces, Euclidean distance can become less meaningful.

– Choosing the Value of k: Finding the optimal value of k often requires experimentation and validation.
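
One common way to run that experimentation for k is cross-validation. The sketch below assumes the Iris dataset bundled with scikit-learn, 5-fold cross-validation, and an arbitrary list of candidate values; none of these choices is prescribed by the algorithm itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11]  # arbitrary candidates chosen for illustration
mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: accuracy on five different train/validation splits
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy {mean_scores[k]:.3f}")

# Pick the candidate with the highest mean accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("Best k among the candidates:", best_k)
```

Odd values of k are often preferred in two-class problems because they avoid tied votes, but the best value still depends on the data.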

In summary, KNN is a powerful algorithm, but it is essential to understand when and how to use it effectively.

### Practical Example: Flower Species Classification with the Iris Dataset

To further illustrate the use of KNN, let’s consider the famous Iris dataset, which contains 150 flower samples divided into three species: Setosa, Versicolor, and Virginica. Each sample includes four features: sepal length and width, and petal length and width.

#### KNN Implementation in Python

Here’s how to implement KNN to classify flower species:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Create the KNN model with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy * 100:.2f}%")
```

– Dataset: Using the Iris dataset with `load_iris()`.

– Data Preparation: Splitting the data into a training set and test set with `train_test_split()`.

– Model Creation: Initializing the KNN classifier with `n_neighbors=3`.

– Model Training: Training with `fit()`.

– Prediction and Evaluation: Predicting the classes and calculating accuracy with `accuracy_score()`.
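
Once the model is trained, the same steps can be extended to classify a single new flower and report its species by name. The sketch below repeats the setup so that it runs on its own; the measurements in `new_flower` are hypothetical values picked to resemble a typical Setosa sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Hypothetical measurements in cm: sepal length, sepal width, petal length, petal width
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
predicted_index = knn.predict(new_flower)[0]

# target_names maps the integer class index back to the species name
print("Predicted species:", iris.target_names[predicted_index])
```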

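This example demonstrates the complete KNN workflow, from loading the data to evaluating accuracy, on a classic benchmark dataset. The same steps apply to real-world classification problems.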
