Crime Analysis Using HCA
Project information
- Title: Crime Analysis Using HCA
- Category: Data Science
- Client: Mr Tobi Dare
- Project date: 17 August, 2020
- Project URL: Crime Analysis Using HCA
Project Details
This study begins with the collection of the 2017 Nigeria crime dataset from the Nigeria Bureau of Statistics. The data is then passed through a series of stages known as data pre-processing. These stages replace missing values with either the mean or the median, then clean and trim the dataset into a new one that can be used in the training phase. Finally, the hierarchical clustering technique is applied to the pre-processed data to detect the states most prone to crime.
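The missing-value step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `impute_missing` and the sample figures are hypothetical, and the real dataset's columns are not shown in this write-up.

```python
from statistics import mean, median


def impute_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Hypothetical counts of one offence type across five states (None = missing).
reported_cases = [120, None, 95, 300, None]

mean_filled = impute_missing(reported_cases, "mean")
median_filled = impute_missing(reported_cases, "median")
print(mean_filled)
print(median_filled)
```

When a column contains an extreme value, such as the 300 in this made-up sample, the median is usually the safer fill because a single outlier pulls the mean upward.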
Hierarchical Clustering Algorithm
Hierarchical cluster analysis, or HCA, is an unsupervised clustering algorithm that builds clusters with a predominant ordering from top to bottom. For example, all files and folders on a hard disk are organised in a hierarchy. The algorithm groups similar objects into groups called clusters. The endpoint is a set of clusters in which each cluster is distinct from the others and the objects within each cluster are broadly similar to one another.
This clustering technique is divided into two types:
- Agglomerative Hierarchical Clustering
- Divisive Hierarchical Clustering
Agglomerative Hierarchical Clustering
In this technique, each data point is initially considered an individual cluster. At each iteration, the most similar clusters are merged, until one cluster or K clusters remain.
The basic agglomerative algorithm is straightforward:
- Compute the proximity matrix
- Let each data point be a cluster
- Repeat: Merge the two closest clusters and update the proximity matrix
- Until only a single cluster remains
The essential operation is the computation of the proximity of two clusters.
To understand this better, let's look at a pictorial representation of the agglomerative hierarchical clustering technique. We have six data points {A, B, C, D, E, F}.
- Step 1: In the initial step, we calculate the proximity of the individual points and treat all six data points as individual clusters, as shown in the image below.
- Step 2: In step two, similar clusters are merged to form single clusters. Suppose B and C, and D and E, are the similar pairs merged in this step. We are now left with four clusters: A, BC, DE, and F.
- Step 3: We again calculate the proximity of the new clusters and merge the similar clusters DE and F, leaving the clusters A, BC, and DEF.
- Step 4: Calculate the proximity of the new clusters. The clusters DEF and BC are similar and are merged to form a new cluster. We are now left with two clusters: A and BCDEF.
- Step 5: Finally, the remaining clusters are merged to form a single cluster.
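The agglomerative procedure on the six points A to F can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the project's implementation: the coordinates are invented so that the merges follow the order described for this example, and single linkage (distance between the closest members of two clusters) is assumed as the proximity measure, since the text does not name one.

```python
from itertools import combinations
from math import dist


def single_linkage(cluster_a, cluster_b, points):
    """Proximity of two clusters: distance between their closest members."""
    return min(dist(points[p], points[q]) for p in cluster_a for q in cluster_b)


def agglomerative(points, linkage=single_linkage):
    """Merge the two closest clusters until one remains; return each state."""
    clusters = [(name,) for name in sorted(points)]  # every point is a cluster
    history = []
    while len(clusters) > 1:
        # Recompute proximities for all current cluster pairs. This is a simple
        # stand-in for incrementally updating the proximity matrix.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], points),
        )
        merged = tuple(sorted(a + b))
        clusters = sorted([c for c in clusters if c not in (a, b)] + [merged])
        history.append(["".join(c) for c in clusters])
    return history


# Hypothetical 2-D coordinates for the six data points.
points = {
    "A": (0, 0), "B": (10, 1), "C": (11, 1),
    "D": (20, 0), "E": (21, 1), "F": (23, 2),
}

history = agglomerative(points)
for state in history:
    print(state)
```

Because this sketch merges exactly one pair per iteration, the two merges that the walkthrough groups into a single step (B with C, and D with E) appear as two consecutive states. Recomputing every pairwise proximity on each pass keeps the code short but is slow for large datasets; library implementations update the matrix incrementally instead.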