Crime Analysis Using HCA

Figures from the analysis:

  • Scatter plot of crime analysis using HCA
  • Bar chart of murder rate against the State
  • Bar chart of assault rate against the State
  • Rape and indecent assault against the State
  • Offences against the person
  • Clusters of crime per State

Project information

  • Title: Crime Analysis Using HCA
  • Category: Data Science
  • Client: Mr Tobi Dare
  • Project date: 17 August, 2020
  • Project URL: Crime Analysis Using HCA

Project Details

This study begins with the collection of the 2017 Nigeria crime dataset from the Nigeria Bureau of Statistics. The data is then passed through a series of stages collectively called data pre-processing: missing values are replaced with either the mean or the median, and the dataset is cleaned and trimmed into a new one that can be used for the training phase. Finally, the hierarchical clustering technique is applied to the pre-processed data to detect the States most prone to crime.
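
As an illustration of this pipeline, the sketch below shows the pre-processing (median imputation) and clustering steps in Python with pandas and scikit-learn. The file name nigeria_crime_2017.csv, the column layout and the choice of three clusters are assumptions made for the example, not details of the project's actual code.

    # A minimal sketch of the pre-processing and clustering pipeline.
    # File name and column names are assumptions for illustration only.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import AgglomerativeClustering

    df = pd.read_csv("nigeria_crime_2017.csv")   # hypothetical file name

    # Data pre-processing: replace missing values with the column median
    # (the mean is an equally valid choice).
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Standardise the features so no single crime type dominates the distances.
    X = StandardScaler().fit_transform(df[numeric_cols])

    # Hierarchical (agglomerative) clustering of the States into three groups.
    model = AgglomerativeClustering(n_clusters=3, linkage="ward")
    df["cluster"] = model.fit_predict(X)
    print(df[["State", "cluster"]])              # assumes a 'State' column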

Hierarchical Clustering Algorithm

Hierarchical cluster analysis (HCA) is an unsupervised clustering algorithm that builds clusters with a predominant ordering from top to bottom; for example, all files and folders on a hard disk are organised in such a hierarchy. The algorithm groups similar objects into groups called clusters. The endpoint is a set of clusters in which each cluster is distinct from the others, and the objects within each cluster are broadly similar to one another.

    This clustering technique is divided into two types:

    1. Agglomerative Hierarchical Clustering
    2. Divisive Hierarchical Clustering

    Agglomerative Hierarchical Clustering

    In this technique, each data point is initially considered an individual cluster. At each iteration, the most similar clusters are merged, until a single cluster (or K clusters) remains.

    The basic agglomerative algorithm is straightforward; a from-scratch sketch of this loop follows the list below.

  • Compute the proximity matrix
  • Let each data point be a cluster
  • Repeat: Merge the two closest clusters and update the proximity matrix
  • Until only a single cluster remains
  • The essential operation is the computation of the proximity of two clusters
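
    As a concrete illustration of the loop above, here is a from-scratch sketch using single-linkage proximity. The function name, toy coordinates and choice of linkage are assumptions made for the example.

        # Toy agglomerative clustering: single-linkage proximity,
        # merging until only k clusters remain.
        import numpy as np

        def agglomerative(points, k=1):
            # Let each data point be its own cluster.
            clusters = [[i] for i in range(len(points))]
            # Proximity matrix of pairwise Euclidean distances.
            d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

            def proximity(a, b):
                # Single linkage: distance between the closest members of a and b.
                return min(d[i, j] for i in a for j in b)

            # Repeat: merge the two closest clusters until only k clusters remain.
            while len(clusters) > k:
                pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
                i, j = min(pairs, key=lambda p: proximity(clusters[p[0]], clusters[p[1]]))
                clusters[i] = clusters[i] + clusters[j]
                del clusters[j]
            return clusters

        toy = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])
        print(agglomerative(toy, k=2))   # points 0-3 group together, point 4 stays apart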

    To understand better, let's walk through a pictorial representation of the agglomerative hierarchical clustering technique; a short SciPy sketch follows the steps below. We have six data points {A, B, C, D, E, F}.

  • Step 1: In the initial step, we calculate the proximity of the individual points and consider all six data points as individual clusters.
  • Step 2: Similar clusters are merged into single clusters. Suppose B and C, and D and E, are similar and are merged in this step. We are now left with four clusters: A, BC, DE and F.
  • Step 3: We again calculate the proximity of the new clusters and merge the similar ones, leaving A, BC and DEF.
  • Step 4: We calculate the proximity of the new clusters. Clusters DEF and BC are similar and are merged, leaving two clusters: A and BCDEF.
  • Step 5: Finally, all the clusters are merged to form a single cluster.
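
    The same walkthrough can be reproduced with SciPy. In the sketch below the coordinates are made up so that B and C, and D, E and F, sit close together; linkage performs the successive merges and dendrogram draws the resulting top-to-bottom hierarchy.

        # Reproduce the A-F walkthrough with SciPy (coordinates are illustrative).
        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.cluster.hierarchy import linkage, dendrogram

        labels = ["A", "B", "C", "D", "E", "F"]
        points = np.array([
            [0.0, 8.0],   # A sits far from everything else
            [4.0, 4.0],   # B and C are close, so they merge early (the "BC" cluster)
            [4.2, 4.1],
            [8.0, 0.0],   # D, E and F are close: "DE" merges first, then "DEF"
            [8.1, 0.2],
            [8.4, 0.5],
        ])

        Z = linkage(points, method="single")   # one row per merge, closest pairs first
        dendrogram(Z, labels=labels)
        plt.title("Agglomerative merges for points A-F")
        plt.show()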