Topic modeling in SEO is the use of statistical models for discovering topics in a collection of documents. By examining the co-occurrence of words and phrases across thousands of pages, algorithms are able to assign topical relevancy to a page and rank the page against a search query.
From Keywords to Topics
In the early days of search engines – the late 1990s – algorithms did little more than match keywords in the results to keywords in the query. The search engines didn’t understand the context of the query or the intent behind the keyword.
But search engines have come a long way since then. Search engine algorithms now understand not just keywords but the topic behind the keywords. This emphasis on topics rather than keywords is called semantic SEO.
The first big advance toward understanding topics came with the Google Hummingbird Update in 2013. That’s when Google started analyzing entire phrases, and not just individual keywords.
The next big step forward came in 2015 with Google’s RankBrain algorithm, which used natural language processing (NLP) to understand the context and intent behind search queries.
By this time, keyword density as a measure of relevance was fast disappearing in the rearview mirror. It was being replaced by topical relevance. How well you rank on Google now depends on how comprehensively your content piece covers the topic.
Since then, Google and other search engines have been getting better and better at understanding topics. They do this through a technique called topic modeling.
Topic Modeling vs Topic Classification
Topic modeling is a statistical method for discovering the relationships that exist between words and phrases.
With topic modeling, the algorithm scans a set of documents and clusters words and phrases based on how frequently they occur alongside other words and phrases. It is an ‘unsupervised’ learning technique: the algorithm discovers the categories itself, based on the patterns that it finds, rather than being told what they are.
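To make the idea of co-occurrence concrete, here is a minimal Python sketch that counts how many documents each pair of words appears in together. The three sample documents and the function name `cooccurrence_counts` are made up for illustration. Real topic models go well beyond raw pair counts, but this is the kind of signal they start from.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of words appears in together."""
    counts = Counter()
    for text in documents:
        # Use a sorted set so each pair is counted once per document,
        # always in alphabetical order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts

# Hypothetical mini-corpus: two gardening pages and one networking page.
docs = [
    "lawnmower blades grass",
    "grass lawnmower fuel",
    "router wifi signal",
]
pairs = cooccurrence_counts(docs)
print(pairs[("grass", "lawnmower")])  # 2
print(pairs[("grass", "wifi")])       # 0
```

Words that keep turning up in the same documents ("grass" and "lawnmower") are candidates for the same topic, while words that never meet ("grass" and "wifi") are not.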
Topic modeling is distinct from topic classification, which is a ‘supervised’ machine learning technique where humans have to ‘train’ the algorithm by giving it labeled examples.
With topic classification, you first need to define the categories of information that you want to use. You then give the algorithm some examples of raw data that has been tagged with those pre-defined categories. The algorithm learns from those examples and then sorts new data into the same pre-defined categories.
The difference between the two techniques is this: in topic classification, humans tell the algorithm what the categories are, whereas, in topic modeling, the algorithm discovers what the categories are through statistical analysis of how words and phrases cluster together in certain patterns.
These methods of text analysis are being used not only by search engines but right across the Internet.
For example, a business that receives high volumes of online customer feedback might use topic modeling or topic classification to sort its feedback into categories, such as post-purchase notifications, experience follow-ups, brand loyalty feedback, customer complaints, and customer reviews.
Two Types of Topic Modeling
So far, I’ve been using the term ‘topic modeling’ as if it were a single thing. But it’s actually an umbrella term that covers a range of different techniques.
Let’s look now at some of the different types of topic modeling.
Latent Dirichlet allocation (LDA)
Latent Dirichlet Allocation (LDA) is based on two assumptions: that similar topics make use of similar words and that documents talk about several topics for which a statistical distribution can be detected.
LDA maps each document to a mixture of topics, and each topic to a set of terms that tend to occur together. Those terms can be single words or n-grams. An n-gram is a contiguous sequence of n words, a unit that is widely used in Natural Language Processing.
The ‘n’ refers to the number of words in the n-gram. Where n=1, the n-gram contains one word; where n=2, the n-gram contains two words; and so on.
For example, the sentence “The cow jumps over the moon” would contain the following 2-word n-grams (known as bi-grams):
- the cow
- cow jumps
- jumps over
- over the
- the moon
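The list above can be reproduced with a few lines of Python. This is a minimal sketch that splits on whitespace and lowercases the text; the function name `ngrams` is my own, and a production tokenizer would also handle punctuation.

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences in text, lowercased."""
    words = text.lower().split()
    # Slide a window of n words across the sentence, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams("The cow jumps over the moon", 2)
print(bigrams)
# ['the cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
```

Changing the second argument to 3 gives you the tri-grams for the same sentence.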
Once you have n-grams, you can then make calculations that predict the likelihood that certain words will occur in the same sentence or in the same paragraph, or at a certain distance from each other.
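Latent Dirichlet Allocation works on the assumption that each document is a mixture of topics and that each topic is characterized by the words most likely to appear in it. In its standard form, it looks at which words a document contains rather than the order in which they appear.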
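One way to get a feel for LDA is to run its underlying assumption forwards: pick a topic mix for a document, then pick each word from one of those topics. The sketch below does that in plain Python. The two topics, their word probabilities, and the `alpha` values are invented for illustration, and the code only generates text. Fitting a real LDA model means working backwards from documents to topics, which is not shown here.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Pick a topic mix for the document, then a topic and a word for each position."""
    topic_names = list(topics)
    mixture = sample_dirichlet(alpha, rng)  # this document's topic proportions
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=mixture)[0]
        vocab = list(topics[topic])
        weights = list(topics[topic].values())
        words.append(rng.choices(vocab, weights=weights)[0])
    return mixture, words

# Hypothetical topics: each maps words to their probability within the topic.
TOPICS = {
    "gardening": {"lawnmower": 0.5, "hedge": 0.3, "mulch": 0.2},
    "networking": {"router": 0.6, "wifi": 0.4},
}
rng = random.Random(0)  # fixed seed so the run is repeatable
mixture, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print(mixture, words)
```

A document that is mostly "gardening" will be full of words like "lawnmower" and "hedge", which is exactly the pattern a fitted model looks for when it assigns topics to a page.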
Latent semantic analysis
Like LDA, latent semantic analysis (LSA) is based on the distributional hypothesis: the meaning of words can be grasped by looking at the contexts in which words appear. As the English linguist J. R. Firth put it: “You shall know a word by the company it keeps” (Firth, J. R. 1957:11).
Unlike LDA, which models each document as a probabilistic mixture of topics, latent semantic analysis starts from a table of how frequently each word occurs in each document and then reduces that table mathematically (using singular value decomposition) to a smaller set of underlying concepts. It assumes that documents belonging to similar topics will contain approximately the same distribution of word frequencies for certain words.
The word frequencies are usually weighted using Term Frequency-Inverse Document Frequency, or TF-IDF.
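Term Frequency (TF) refers to the number of times a keyword appears in a single document.
Inverse Document Frequency (IDF) measures how rare the term is across a collection of documents: the fewer documents that contain the term, the higher its IDF. It is typically calculated as the logarithm of the total number of documents divided by the number of documents that contain the term.
The Term Frequency (TF) is then multiplied by the Inverse Document Frequency (IDF) to get the TF-IDF value. A term scores highly when it appears often in one document but is uncommon in the rest of the collection.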
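Here is a small worked example of one common TF-IDF formulation: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. There are several variants (smoothed, normalized, and so on), so treat this as a sketch rather than the exact formula any particular search engine uses. The three-document corpus is made up.

```python
import math

def tf_idf(term, document, corpus):
    """TF-IDF of term in document, using raw counts and idf = log(N / df)."""
    tf = document.lower().split().count(term)
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0  # the term appears nowhere in the corpus
    return tf * math.log(len(corpus) / df)

# Hypothetical corpus of three short "pages".
corpus = [
    "the lawnmower cuts the grass",
    "the hedge trimmer cuts the hedge",
    "the router sends the wifi signal",
]
print(tf_idf("the", corpus[0], corpus))        # 0.0
print(tf_idf("lawnmower", corpus[0], corpus))  # about 1.10
print(tf_idf("hedge", corpus[1], corpus))      # about 2.20
```

The word "the" appears in every document, so its IDF is log(3/3) = 0 and it contributes nothing. "Hedge" appears twice in a single document and nowhere else, so it gets the highest score.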
Both LDA and LSA are unsupervised techniques.
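The idea that documents on similar topics share similar word-frequency distributions can be illustrated with cosine similarity between word-count vectors. The sketch below uses raw counts and three invented sentences. It is not LSA itself, which adds a dimensionality-reduction step on top, but it shows the comparison that the technique builds on.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (1.0 = same distribution)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical snippets: two about gardening, one about networking.
garden_1 = "prune the hedge then mow the lawn"
garden_2 = "mow the lawn and trim the hedge"
network = "restart the router to fix the wifi"

print(round(cosine_similarity(garden_1, garden_2), 2))  # 0.78
print(round(cosine_similarity(garden_1, network), 2))   # 0.44
```

The two gardening snippets score much higher than the gardening and networking pair. The second score is not zero only because all three snippets share the word "the", which is why common words are usually down-weighted with TF-IDF before documents are compared.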
Topic Clusters – The Key To Ranking Higher
As you can see, search engines are turning their attention from keywords to topics. They are using various statistical methods to identify patterns in the way certain words are found with other words. Those patterns allow search engines to identify topics.
And that’s why topic clusters are now a vital part of ranking high in the search results.
Google wants to deliver search results that are authoritative. That means delivering content that covers a topic well, in both depth and breadth.
Pillar posts and topic clusters
The best way to do that is to use the topic cluster model. That’s a collection of pages with a central page called a pillar post. The pillar post covers the topic in depth and is usually at least 3000 words long.
In the pillar post, you cover all the subtopics associated with your topic. But you don’t necessarily go into those subtopics in great detail. Spend a few paragraphs introducing each subtopic and then link out to a separate blog post where you cover that subtopic in more detail.
For example, your pillar post might be about ‘garden tools’. That would be a longer than average article where you briefly describe all the main types of garden tools: lawnmowers, line trimmers, hedge trimmers, pruning shears, mulchers, leaf blowers, edging tools, sprinklers, etc.
You would then create a separate piece of content for each of those subtopics and link to those articles from the pillar post.
Why do topic clusters help with SEO?
How does a topic cluster help you rank higher? It shows search engines that your website has topical authority for a particular topic. When you create a topic cluster, your content will be full of related keywords. And that’s exactly what search engine algorithms are now looking for. A website that has ten or fifteen pages of closely related content full of keywords that are typically found together will get a green light from the algorithm.
So far in this article, we have looked at why topics are replacing keywords as the focus of SEO and how search engines use various topic modeling tools to understand topics and their subtopics.
As a content creator, you might be wondering if there are topic modeling tools that will help you ‘map out’ a particular topic so you can create content that comprehensively covers that topic.
Well, not surprisingly, such tools already exist. And in the next section, I’m going to show you two of them.
Topic Modeling Tools
This section gives you a walk-through of two topic modeling tools that will help you write content with high topical authority.
MarketMuse
MarketMuse is an AI-powered content research and keyword planner tool. It uses machine learning and artificial intelligence to analyze content, suggest topics to cover, and develop briefs to help you create better content.
When you log in to MarketMuse, you’ll see five tools in the left-hand menu: Research, Compete, Optimize, Questions, and Connect.
Let’s look at these tools one by one.
The Research tool
In the research tool, type in your keyword, and MarketMuse will identify the main topics for that keyword:
The topics appear in the left-hand column. In the right-hand column, you’ll see the estimated search volume for each related topic, as well as a graph showing the search trend for that topic.
The column at the far right shows you the suggested number of times you should mention that related topic in your content. MarketMuse uses a color code for this:
- Yellow = 1 to 2 mentions
- Green = 3 to 10 mentions
- Blue = 10+ mentions
You can drill down into each related topic by clicking on the topic. You’ll see a list of variants for that topic:
Including these variants in your content will help you rank for multiple keywords. It will also increase the topical authority of your article because search engines are now aware that certain words appear together in content that covers a topic in depth.
The Compete tool
The Compete tool creates a topic model by analyzing thousands of documents. It then analyzes the top 20 results against that model and presents the results as a heat map.
Compete is used to assess and analyze the competition for a given topic and make decisions about the coverage you want to have for that topic.
Compete’s heat map helps you quickly understand how the competition approaches a subject that you want to write about, what related topics you need to include, and which ones you should cover to make your content stand out from the crowd:
At the top of the Compete screen, you’ll see the top 20 search results for that topic. Underneath each search result is the MarketMuse content score for that article. This is a proprietary score developed by MarketMuse that shows how well the page covers a topic.
The color codes on the heat map show you how well each piece of content covers the topic:
- Red = 0 mentions
- Yellow = 1-2 mentions
- Green = 3-10 mentions
- Blue = 10+ mentions
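If you want to apply the same color bands to your own mention counts (in a spreadsheet export, for example), the mapping is easy to express in code. This is my own sketch of the legend above, not MarketMuse’s code. The legend lists 10 under both green and blue, so the function assumes that exactly 10 mentions is green and anything above 10 is blue.

```python
def mention_color(mentions):
    """Map a mention count to the heat-map color bands listed above."""
    if mentions <= 0:
        return "red"
    if mentions <= 2:
        return "yellow"
    if mentions <= 10:  # assumption: exactly 10 counts as green
        return "green"
    return "blue"

for count in (0, 2, 7, 25):
    print(count, mention_color(count))
# 0 red
# 2 yellow
# 7 green
# 25 blue
```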
A quick way to assess how well a page covers a topic is to scan vertically down a column:
Likewise, you can see how the competition covers a particular topic by scanning horizontally across a row:
Another thing to look for in the Compete tool is the content scores. These allow you to see at a glance how well the top-ranking content covers that topic:
If the scores are low, that’s an indication that you have a good chance to rank high for that topic with a well-researched piece of content.
Down the left side of the Compete screen, you’ll see all the topics that make up the topic model.
When using the Compete tool, there are two things to look for: must-have topics and topic gaps.
Must-have topics are those that are consistently found among the top-ranking pages in the search results. To perform well, these topics must be included in your piece.
Topic gaps are topics that are not covered by the competition. They are an excellent opportunity to optimize your content by including topics that your competitors are missing.
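The two definitions above boil down to a simple calculation: for each topic, what share of the top-ranking pages mention it at all? The sketch below works that out from a small, made-up table of mention counts. The function name and the 80% and 20% thresholds are my own assumptions for illustration, not values taken from MarketMuse.

```python
def classify_topics(coverage, must_have_share=0.8, gap_share=0.2):
    """Split topics into must-haves and gaps by the share of pages mentioning them."""
    pages = len(coverage)
    topics = {topic for counts in coverage.values() for topic in counts}
    must_have, gaps = [], []
    for topic in sorted(topics):
        mentioned = sum(1 for counts in coverage.values() if counts.get(topic, 0) > 0)
        share = mentioned / pages
        if share >= must_have_share:
            must_have.append(topic)  # nearly every competitor covers it
        elif share <= gap_share:
            gaps.append(topic)       # hardly any competitor covers it
    return must_have, gaps

# Hypothetical mention counts for three top-ranking pages.
coverage = {
    "page-a": {"lawnmowers": 4, "mulchers": 0, "sprinklers": 2},
    "page-b": {"lawnmowers": 6, "mulchers": 0, "sprinklers": 0},
    "page-c": {"lawnmowers": 3, "mulchers": 0, "sprinklers": 1},
}
must_have, gaps = classify_topics(coverage)
print(must_have)  # ['lawnmowers']
print(gaps)       # ['mulchers']
```

In this example every page mentions lawnmowers, so that topic is a must-have, while nobody covers mulchers, which makes it a gap you could fill. Sprinklers fall in between and are left out of both lists.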
The Optimize tool
The Optimize tool is a text editor that gives you real-time feedback on how well your content covers a topic. Just type in your keyword and the URL of your article, and MarketMuse will show you a list of suggested terms in the right-hand panel.
The color codes in the right-hand panel show you how many times you have used that term and how many times you should be using that term.
As you add suggested terms to your content piece, the color codes will update to show that you are approaching the optimum number of mentions for that term.
The ‘Feed’ tab gives you a running assessment of how well your content addresses the topics, as you scroll down the page:
At the top of the Optimize screen, you’ll see a status bar that tells you your content score, the average score, your target score, your word count, the average word count, and your target word count:
The Questions tool
The Questions tool in MarketMuse is useful when you are in the research stage of writing your article. It shows you the most frequently asked questions related to your topic:
Including related questions in your content is another way to boost the topical authority of your article.
On the right-hand side of the screen, you’ll see a column with a button that says “Run in”. This gives you the option to run each question in one of the other four tools:
MarketMuse is a powerful tool for analyzing a topic and ensuring that your piece of content covers as much of the topic as possible. What makes MarketMuse particularly useful is that it is based on the top-ranking results for that particular keyword.
It shows you not only which topics are covered by the pages that rank at the top of the search results but also the topic gaps. By addressing the topic gaps, you can make your content stand out from the other pages.
Article Insights
Article Insights is another topic modeling tool.
It helps you to identify the keywords that appear in the top 10 search results for a particular topic. It helps with competitor analysis by comparing your content to that of your competitors so you can see which keywords they are using that you are not. And it helps with entity detection by tagging keywords as either a person, product, company, or place.
The first thing you need to do in Article Insights is to create a project. Give your project a name and then add the keyword you want to target:
The keyword then goes into a processing queue – it may take a few minutes to complete the analysis.
Once the keyword has been processed, you need to click on the View button.
You’ll then see a screen that consists of two parts: the writing interface on the left and the analytics on the right:
In the article editor, you have two tabs: ‘Article’ and ‘Brief’:
Brief is where you can leave notes about the article. There’s a share button where you can get a link to share the article with your writers.
On the right-hand side is a panel with all the analytics for your content:
These include:
- number of words
- keywords you have used in your article
- keywords your competitors have used (gap analysis)
- headings you have used and the number of headings your competitors have used
- uniqueness of your content
- readability score
You can start writing your article from scratch, or you can import an article-in-progress from a URL:
Once you have content loaded in the article editor, the tool analyzes your content against the top 10 search results for that keyword:
- Panels 1 and 2 show you how complete your article is and the number of words you should be aiming for.
- Panel 3 shows you the top 15 keywords used in your content.
- Panel 4 shows you the keywords your competitors have used and how many of them you have used.
- Panel 5 shows you the headings you have used and compares them against the headings used by your competitors.
Beneath the Headings panel is a panel that shows a ‘Uniqueness’ score and a tool that gives you a Flesch reading score:
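For reference, the standard Flesch Reading Ease formula is simple enough to compute yourself once you have the word, sentence, and syllable counts for a piece of text. The sketch below assumes the tool uses this standard formula, which I have not confirmed, and the counts in the example are made up.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease from total word, sentence, and syllable counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 1))  # 59.6
```

Higher scores mean easier reading. Shorter sentences and shorter words both push the score up.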
The ‘uniqueness’ tool contains a button called ‘Article Re-writer’.
Click on that and it opens the article editor, with useful suggestions for synonyms you can use to re-write the snippets that you added from the ‘research’ tab. Hover your cursor over any highlighted word, and the tool gives you alternative synonyms for that word:
This is a very useful feature that helps you quickly re-write your content.
Along the top of the right-hand panel are seven tabs. So far, we’ve been working in the Score tab.
If you click on the Competitors tab, you’ll see a list of the top 10 competitors for that keyword, together with a keyword grouping for each competitor. These keyword groupings show you the top keywords used by each competitor:
You can select and unselect competitors, which is useful if there are results that you believe are not relevant to your content.
The next tab is ‘Research’. This tab pulls in snippets from top-ranking content:
Click on a research snippet and it will be added to the article editor. You then need to re-write it to make it part of your own content.
The next tab is ‘Headings’. This tab shows the headings used for each competitor that you selected. You can see exactly how many headings they have on their page, and what level the heading is.
Next is the ‘Questions’ tab.
This tab pulls questions from Google that are related to your main keyword. These are subtopics you can add to your article to gain topical authority:
The next tab is ‘Topics’. This tool shows you related keywords, grouped into topics. Paragraphs matching those topics are placed into that topic panel for you:
The topic outline helps you to discover related keywords you can easily add to your paragraphs. Adding these related words to your paragraph will increase the topical authority of your content and drastically improve the quality of your article.
The last tab is ‘Duplicates’. This tool detects fragments within your content that are duplicates. You need to re-write anything marked in red by this tool.
Let’s go back now to the keyword panel in the ‘Score’ tab because it has a useful feature. Click on a keyword in that panel:
That keyword will then be highlighted in the Competitor tab. You can then see how many times your competitors have used that keyword:
That same keyword will also be highlighted in the ‘Research’ tab:
This is a useful feature when you are trying to optimize your content for a particular keyword.
Conclusion
As algorithms move away from a focus on keywords and try to understand topics, it’s becoming increasingly important that your content covers a topic comprehensively.
That’s becoming the key to ranking at the top of the search results.
In this article, we have looked at various topic modeling techniques that search engines are now using to better understand the co-occurrence of words within a document and within a set of documents.
We’ve seen how the presence, frequency, and proximity of similar keywords within a document are being used by search engines to understand topics.
It stands to reason that if search engines are using these tools to understand topical authority, content creators need to use the same techniques to ensure their content covers a topic properly.
And that’s where tools like MarketMuse and Article Insights come in. They use AI to analyze the topic you are writing about and show you what the subtopics are within that topic and which keywords you should be using to rank well for that topic.