Writing articles with high algorithmic relevancy can be time-consuming, and organizing the sub-topics you want to cover adds even more work. In this tutorial, you will learn how to use some of the most popular Python machine learning libraries to thematically organize the subjects you are (or will be) writing about. To achieve this, we will use the AffinityPropagation class from the sklearn.cluster module.


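Prerequisites:

  1. At least a basic understanding of Python.
  2. A Jupyter Notebook server installed.
  3. The numpy, scikit-learn, matplotlib and pyxDamerauLevenshtein packages installed (for example with pip).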

Step 1 – Import Dependencies

Import dependencies:

%matplotlib inline
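import numpy as np
import sklearn.cluster  # provides AffinityPropagation
# Edit-distance functions (installed from the pyxDamerauLevenshtein package)
from pyxdameraulevenshtein import damerau_levenshtein_distance, normalized_damerau_levenshtein_distance
from pylab import *  # MATLAB-style namespace; supplies scatter(), xlabel(), ylabel() used below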

Step 2 – Keywords & Phrases Insertion

Now we need to enter the keyword phrases we want to cluster (that is, group according to their similarity). They are written as one comma-separated string and split into a list:

keywords = "what about artificial intelligence,where is artificial intelligence,how is artificial intelligence,what does artificial intelligence mean,what does the term artificial intelligence mean,what is artificial intelligence,what is artificial intelligence technology,ai or artificial intelligence,what defines artificial intelligence,what is artificial intelligence definition,analytics software,business analytics software,predictive analytics software,marketing analytics software,seo analytics software,best analytics software,social media analytics software,web analytics software,website analytics software,arbutus audit analytics software,google site analytics,site analytics,add site to google analytics,google analytics site search,google analytics add new site,how to add a site to google analytics,add site to google analytics 2019,analytics site id,clinical trial site selection analytics,user behavior analytics,google analytics behavior flow,analytical behavior,behavior flow report google analytics,behavioral analytics jobs,predictive analytics consumer behavior,user entity behavior analytics,analytical behaviorism,behavior analytics in retail,add user to google analytics,user behavior analytics,google analytics user id,google analytics users,google analytics new users,how to add a user to google analytics,google analytics active users,how to add user to google analytics,unique users google analytics,user analytics".split(",")
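The keyword string above contains at least one exact duplicate ("user behavior analytics" appears twice), and pasted lists often carry stray spaces or empty entries. Duplicates produce identical rows in the similarity matrix, so you may prefer to clean the list before clustering. The following is a minimal sketch; clean_keywords is a hypothetical helper name, not part of any library.

```python
from typing import List


def clean_keywords(raw: str) -> List[str]:
    """Hypothetical helper: split a comma-separated string into phrases,
    trim whitespace, and drop empty entries and exact duplicates."""
    phrases = (p.strip() for p in raw.split(","))
    # dict.fromkeys keeps insertion order (Python 3.7+) and discards repeats,
    # so the first occurrence of each phrase wins
    return list(dict.fromkeys(p for p in phrases if p))


raw = "user behavior analytics,google analytics users, user behavior analytics,,user analytics"
print(clean_keywords(raw))
```

In the notebook you could write keywords = clean_keywords("what about artificial intelligence,...") instead of calling .split(",") directly.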

Step 3 – Keyword & Phrase Clustering

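Now the magic happens. We compute the Damerau-Levenshtein distance between every pair of phrases (the number of single-character insertions, deletions, substitutions and transpositions needed to turn one phrase into the other) and negate it, because Affinity Propagation expects similarities, where higher means more alike, rather than distances. The normalized variant, normalized_damerau_levenshtein_distance, is already imported and can be swapped in if you want long phrases not to be penalized simply for their length. For each cluster, Affinity Propagation picks a seed keyword, the exemplar, which can be used as an ad title, while the similar phrases in the same cluster can be used as the ad text.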
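To make the distance concrete, here is a small pure-Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), a normalized version that divides by the length of the longer string, and the negated similarity matrix built from it. The helper names osa_distance, normalized_osa_distance and similarity_matrix are hypothetical. We assume that pyxdameraulevenshtein normalizes the same way, but its exact distance variant may differ from this sketch on some inputs.

```python
from typing import List, Sequence


def osa_distance(a: str, b: str) -> int:
    """Hypothetical helper: restricted Damerau-Levenshtein (optimal string
    alignment) distance via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def normalized_osa_distance(a: str, b: str) -> float:
    """Distance divided by the longer length (assumed normalization), in [0, 1]."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0


def similarity_matrix(words: Sequence[str]) -> List[List[int]]:
    """Negated pairwise distances: 0 on the diagonal, more negative = less alike."""
    return [[-osa_distance(w1, w2) for w2 in words] for w1 in words]


print(osa_distance("site analytics", "user analytics"))
print(similarity_matrix(["ab", "ba", "ab"]))
```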

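keywords = np.array(keywords)  # a NumPy array, so boolean-mask indexing works below
# Pairwise edit distances, negated: AffinityPropagation expects similarities (higher = more alike)
word_similarity = -1 * np.array([[damerau_levenshtein_distance(w1, w2) for w1 in keywords] for w2 in keywords])

affinity_propagation = sklearn.cluster.AffinityPropagation(affinity="precomputed", damping=0.8)
affinity_propagation.fit(word_similarity)  # runs the clustering; fills labels_ and cluster_centers_indices_

clusters = np.unique(affinity_propagation.labels_)
for cluster_id in clusters:
    # The exemplar is the keyword Affinity Propagation chose to represent this cluster
    exemplar = keywords[affinity_propagation.cluster_centers_indices_[cluster_id]]
    cluster = np.unique(keywords[affinity_propagation.labels_ == cluster_id])
    cluster_str = ", ".join(cluster)
    print("* %s: %s" % (exemplar, cluster_str))
    scatter(cluster_id, len(cluster))  # one point per cluster: its number of distinct keywords

xlabel("cluster id")
ylabel("number of keywords")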
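The scikit-learn call hides the algorithm itself. As a rough illustration of what Affinity Propagation does, here is a minimal pure-Python sketch of its message-passing updates (responsibilities and availabilities, with damping), run on six toy numbers instead of keywords. The function name affinity_propagation_sketch and the preference value of -20 are hypothetical choices for this example. scikit-learn's implementation additionally adds a tiny amount of noise to break ties, stops early once the exemplars stop changing, and by default sets the preference to the median of the similarities.

```python
from typing import List, Tuple


def affinity_propagation_sketch(S: List[List[float]], damping: float = 0.5,
                                max_iter: int = 200) -> Tuple[List[int], List[int]]:
    """Hypothetical helper. S is an n x n similarity matrix; S[k][k] is the
    'preference' of item k to become an exemplar. Returns (exemplars, labels),
    where labels[i] is an index into exemplars."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # Responsibility: how well k suits i as exemplar, versus the best rival k'
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: how much positive support k collects from other items
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(support) - support[k]  # support from all i != k
            for i in range(n):
                if i == k:
                    new_a = total
                else:
                    new_a = min(0.0, R[k][k] + total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new_a
    # Items whose combined self-evidence is positive become exemplars
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n
    labels = []
    for i in range(n):
        if i in exemplars:
            labels.append(exemplars.index(i))  # exemplars label themselves
        else:
            best = max(exemplars, key=lambda k: S[i][k])
            labels.append(exemplars.index(best))
    return exemplars, labels


points = [1, 2, 3, 10, 11, 12]
# Similarity = negative squared difference; preference -20 is an arbitrary toy value
S = [[-float((a - b) ** 2) for b in points] for a in points]
for k in range(len(points)):
    S[k][k] = -20.0
exemplars, labels = affinity_propagation_sketch(S)
print(exemplars, labels)
```

On this data the sketch should settle on the middle point of each group (indices 1 and 4) as exemplars. A higher preference yields more clusters and a lower one fewer, which is the same trade-off the preference parameter of sklearn.cluster.AffinityPropagation controls.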

Step 4 – Visualizing the Results

Let’s calculate the number of clusters we have ended up with:

num_clusters = len(clusters)
print(num_clusters)

Output: 6
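If you also want to see how large each cluster is without a plot, a short sketch with collections.Counter is enough. cluster_sizes and text_bar_chart are hypothetical helper names; in the notebook you would pass affinity_propagation.labels_ as the labels. The sketch skips the label -1, which scikit-learn assigns to samples when Affinity Propagation finds no exemplars.

```python
from collections import Counter
from typing import Dict, Sequence


def cluster_sizes(labels: Sequence[int]) -> Dict[int, int]:
    """Hypothetical helper: number of members per cluster id, ignoring -1."""
    counts = Counter(int(label) for label in labels if label != -1)
    return dict(sorted(counts.items()))


def text_bar_chart(sizes: Dict[int, int]) -> str:
    """Hypothetical helper: one '#' per member, one line per cluster."""
    return "\n".join(f"cluster {cid}: {'#' * n} ({n})" for cid, n in sizes.items())


sizes = cluster_sizes([2, 0, 0, 1, 2, 2])
print(len(sizes))
print(text_bar_chart(sizes))
```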

And the clustered keyword phrases are:

  1. * what is artificial intelligence: ai or artificial intelligence, how is artificial intelligence, what about artificial intelligence, what defines artificial intelligence, what does artificial intelligence mean, what does the term artificial intelligence mean, what is artificial intelligence, what is artificial intelligence definition, what is artificial intelligence technology, where is artificial intelligence
  2. * best analytics software: analytics software, arbutus audit analytics software, behavior analytics in retail, best analytics software, business analytics software, marketing analytics software, predictive analytics software, seo analytics software, social media analytics software, web analytics software, website analytics software
  3. * analytical behavior: analytical behavior, analytical behaviorism, analytics site id, predictive analytics consumer behavior
  4. * add user to google analytics: add site to google analytics, add site to google analytics 2019, add user to google analytics, behavior flow report google analytics, clinical trial site selection analytics, unique users google analytics, user behavior analytics, user entity behavior analytics
  5. * google analytics users: behavioral analytics jobs, google analytics active users, google analytics add new site, google analytics behavior flow, google analytics new users, google analytics site search, google analytics user id, google analytics users, google site analytics, site analytics, user analytics
  6. * how to add a user to google analytics: how to add a site to google analytics, how to add a user to google analytics, how to add user to google analytics

In each line, the phrase before the colon is the exemplar. It is a keyword that could be used as a document title, a heading on a webpage, or an ad title if you are writing AdWords copy. The rest of the keywords in the cluster could be used as body text.
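The title/body split described above can be sketched as a dictionary that maps each exemplar to the other members of its cluster. group_titles_and_bodies is a hypothetical helper; in the notebook, keywords, affinity_propagation.labels_ and affinity_propagation.cluster_centers_indices_ would be its three arguments. The sample data below is a hand-picked subset of the tutorial's keywords with made-up labels, not real clustering output.

```python
from typing import Dict, List, Sequence


def group_titles_and_bodies(
    keywords: Sequence[str],
    labels: Sequence[int],
    exemplar_indices: Sequence[int],
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each cluster's exemplar (title candidate) to the
    sorted, de-duplicated list of its other members (body text candidates)."""
    groups: Dict[str, List[str]] = {}
    for cluster_id, center in enumerate(exemplar_indices):
        title = str(keywords[center])
        body = {str(kw) for kw, label in zip(keywords, labels)
                if label == cluster_id and kw != title}
        groups[title] = sorted(body)
    return groups


sample_keywords = ["seo analytics software", "best analytics software",
                   "google analytics users", "google analytics new users"]
groups = group_titles_and_bodies(sample_keywords, [0, 0, 1, 1], [1, 2])
for title, body in groups.items():
    print(title, "->", ", ".join(body))
```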

Lastly, the matplotlib graph output:


And there you have it! You have learned how to cluster words or key phrases using machine learning!
