
Wednesday, 1 May 2019

Brands can better understand users on third-party sites by using a keyword overlap analysis

These scripts help you analyze cross-site branded traffic through overlapping keywords so you can capture untapped audiences.
If you are a manufacturer selling on your own site as well as through retail partners, you likely don’t have visibility into who is buying your products beyond your own site, or why. More importantly, you probably don’t have enough insight to improve your marketing messaging.
One technique you can use to identify and understand users buying on third-party websites is to track your brand through organic search. You can compare the branded searches landing on your site and on the retail partner’s site, see how big the overlap is, and see how many of the overlapping keywords you rank above the retailer for, and vice versa. More importantly, you can see whether you are appealing to different audiences or competing for the same one. Armed with these insights, you could restructure your marketing messaging to unlock audiences you didn’t tap into before.
In previous articles, I’ve covered several useful data blending examples, but in this one, we will do something different. We will do a deeper dive into just one data blending example and perform what I call a cross-site branded keyword overlap analysis. As you will learn below, this type of analysis will help us understand your users buying on third-party retailer partners.

In the Venn diagram above, you can see an example of the visualization we will put together in this article. It represents the number of overlapping organic search keywords for the brand “Tommy Hilfiger” between their main brand site and Macy’s, a retail partner.
We recently had to perform this analysis for one of our clients and our findings surprised us. We discovered that with 60% of our client’s organic SEO traffic coming from branded searches, as much as 30% of those searches were captured by four retailer partners that also sell their products.
Armed with this evidence and with the knowledge that selling through their retail partners still made business sense, we provided guidance on how to improve their brand searches so they can compete more effectively, and change their messaging to appeal to a different customer than the one that buys from the retailers.
After my team conducted this analysis manually and I saw how valuable it was, I set out to automate the whole process in Python so we could easily reproduce it for all our manufacturing clients. Let me share the code snippets I wrote and walk you through how to use them.

Pulling branded organic search keywords

I am using the Semrush API to collect the branded keywords. I created a function that takes the API response and returns a pandas data frame, which simplifies collecting data for multiple domains.
import requests
from urllib.parse import urlencode, urlparse, urlunparse, quote
import pandas as pd

def get_seo_branded_data(brand, domain, database="us", export_columns="Ph,Po,Nq,Ur,Tg,Td,Ts", display_limit=10000, display_filter="+|Ph|Co|{brand}"):
  global key
  url_params = {"type": "domain_organic",
                "key": key,
                "display_filter": display_filter.format(brand=brand),
                "display_limit": display_limit,
                "export_columns": export_columns,
                "domain": domain,
                "database": database
                }
  api_url = "https://api.semrush.com/"
  qs = urlencode(url_params)
  u = urlparse(api_url)
  api_request = urlunparse((u.scheme, u.netloc, u.path, u.params, qs, u.fragment))
  #print(api_request) # uncomment to inspect the full request URL
  r = requests.get(api_request)
  if r.status_code == 200:
    results = r.text.split("\r\n") # split the response into lines
    headers = results[0].split(";") # save result headers to list
    table = [x.split(";") for x in results[1:]] # save columns to list of lists
    df = pd.DataFrame(table, columns=headers).dropna() # remove null rows
    return df
  else:
    print("API call failed with code {code}".format(code=r.status_code))
    return None
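Note that the function reads the Semrush API key from a global variable named key, so it needs to be set before the function is called. The value below is just a placeholder, not a real key:
key = "YOUR_SEMRUSH_API_KEY" # placeholder: replace with your own Semrush API key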
Here is the code to get organic searches for “Tommy Hilfiger” going to Macy’s.
database="us"
macys="macys.com"
brand="Tommy Hilfiger"
macys_df = get_seo_branded_data(brand, macys, export_columns="Ph,Po,Tg") # only keyword, position and traffic
#we explicitly convert numbers to integers to be able to perform arithmetic operations later
convert_dict = {'Keyword': str, 'Position': int, 'Traffic': int} 
macys_df = macys_df.astype(convert_dict)
Here is the code to get organic searches for “Tommy Hilfiger” going to Tommy Hilfiger directly.
database="us"
tommy="usa.tommy.com"
brand="Tommy Hilfiger"
tommy_df = get_seo_branded_data(brand, tommy, export_columns="Ph,Po,Tg") # only keyword, position and traffic
#we explicitly convert numbers to integers to be able to perform arithmetic operations later
convert_dict = {'Keyword': str, 'Position': int, 'Traffic': int} 
tommy_df = tommy_df.astype(convert_dict)
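Before moving on, a quick optional sanity check of both data frames can confirm the API returned what we expect; something like this would do:
# Optional sanity check: confirm the expected columns and row counts
print(macys_df.shape, tommy_df.shape)
print(macys_df.head())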

Visualizing the branded keyword overlap

After we pull the searches for “Tommy Hilfiger” from both sites, we want to understand the size of the overlap. We accomplish this in the following lines of code:
macys_set = set(macys_df["Keyword"]) # this eliminates duplicates
print(len(macys_set)) # prints -> 4210
tommy_set = set(tommy_df["Keyword"])
print(len(tommy_set)) # prints -> 4601
in_macys_only = macys_set - tommy_set # in Macy's but not in Tommy Hilfiger
print(len(in_macys_only)) # prints -> 124
in_tommy_only = tommy_set - macys_set # in Tommy Hilfiger but not in Macy's
print(len(in_tommy_only)) # prints -> 515
We can quickly see that the overlap is significant: 4,086 keywords in common, with only 515 unique to Tommy Hilfiger and 124 unique to Macy’s.
Here is the code to visualize this overlap as the Venn diagram illustrated above.
#See https://jingwen-z.github.io/data-viz-with-matplotlib-series6-venn-diagram/
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
grp1 = set(macys_df["Keyword"]) # Macy's branded keywords
grp2 = set(tommy_df["Keyword"]) # Tommy Hilfiger branded keywords
#calculating percentages
total = grp1.union(grp2)
print(len(total)) # prints -> 4725
both = grp1 & grp2 # set intersection
print(len(both)) # prints -> 4086
only_macys = len(grp1 - grp2) # keywords unique to Macy's
only_tommy = len(grp2 - grp1) # keywords unique to Tommy Hilfiger
#percentages of the total
print(only_tommy/len(total)*100) # prints -> 10.9%
print(len(both)/len(total)*100) # prints -> 86.5%
print(only_macys/len(total)*100) # prints -> 2.6%
# Plotting
fig = plt.figure()
fig.suptitle('Branded keywords overlap between Macys and Tommy Hilfiger')
fig.set_size_inches(18.5, 10.5)
v2 = venn2([grp1, grp2], set_labels = ('', ''))
v2.get_patch_by_id('10').set_color('yellow')
v2.get_patch_by_id('01').set_color('red')
v2.get_patch_by_id('11').set_color('orange')
v2.get_patch_by_id('10').set_edgecolor('none')
v2.get_patch_by_id('01').set_edgecolor('none')
v2.get_patch_by_id('11').set_edgecolor('none')
v2.get_label_by_id('10').set_text('Only Macys\n(2.6%)')
v2.get_label_by_id('01').set_text('Only Tommy\n(10.9%)')
v2.get_label_by_id('11').set_text('Both\n(86.5%)')
plt.show()
fig.savefig('overlap.jpg') # save the image locally

Who ranks better for the overlapping keywords?

Given how significant the overlap is, the logical next question is: which site ranks higher for those shared keywords? How can we figure this out? With data blending, of course!
First, as we learned in my first data blending article, we will merge the two data frames using an inner join to keep only the keywords common to both sets.
# combine keyword sets by keywords in common
merged_df = pd.merge(macys_df, tommy_df, how="inner", on="Keyword")
merged_df.groupby("Keyword").count().info() # we have 4086 unique entries
When we merge data frames that share column names, pandas keeps both copies and appends _x to the columns coming from the first (left) data frame and _y to the columns coming from the second (right) one. So, Macy’s columns end with _x and Tommy Hilfiger’s columns end with _y.
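If the default suffixes are hard to read, pandas also lets you pass explicit ones when merging; here is an optional variation (the rest of the article keeps the default _x/_y names):
# Optional: use descriptive suffixes instead of the default _x/_y
merged_named_df = pd.merge(macys_df, tommy_df, how="inner", on="Keyword", suffixes=("_macys", "_tommy"))
print(merged_named_df.columns.tolist()) # e.g. ['Keyword', 'Position_macys', 'Traffic_macys', 'Position_tommy', 'Traffic_tommy']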

Here is how we create a new data frame with the overlapping branded keywords where Macy’s ranks higher.
# if Macy's position number is lower (e.g., 1 vs. 6), it ranks higher/better
macys_ranks_better = merged_df.query("Position_x < Position_y")
len(set(macys_ranks_better["Keyword"])) # we have 1075 better rankings
Here is the corresponding data frame where Tommy Hilfiger ranks higher.
# if Tommy Hilfiger's position number is lower (e.g., 1 vs. 6), it ranks higher/better
tommy_ranks_better = merged_df.query("Position_x > Position_y")
len(set(tommy_ranks_better["Keyword"])) # we have 3173 better rankings
Here we can see that while the overlap is big, Tommy ranks higher for many more branded keywords than Macy’s (3,173 vs. 1,075). So, is Tommy doing better? Not quite!
As you remember, we also pulled traffic numbers from the API. In the next snippet of code, we will check which keywords are pulling more traffic.
import numpy as np
macys_ranks_better.groupby("Keyword").agg({"Traffic_x": np.sum})["Traffic_x"].sum() # Output -> 75026 (Macy's traffic)
tommy_ranks_better.groupby("Keyword").agg({"Traffic_y": np.sum})["Traffic_y"].sum() # Output -> 66415 (Tommy Hilfiger's traffic)
Surprisingly, while Macy’s ranks better for fewer keywords than Tommy Hilfiger, when we add up the traffic, Macy’s attracts more visitors (75,026 vs. 66,415).
As you can see, sweating the details matters a lot in this type of analysis!

How different are the audiences?

Finally, let’s use the branded keywords unique to each site to learn about any differences in the audiences that visit them. We will simply strip the branded phrase from the keywords and create word clouds to understand them better. When we remove the branded phrase “Tommy Hilfiger,” we are left with the additional qualifiers users add to signal their intent.
I created a function to create and display the word clouds. Here is the code:
from collections import Counter
import re
import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
nltk.download('stopwords')
def create_word_cloud(phrase_list):
  cnt=Counter()
  english_stopwords = set(stopwords.words('english'))
  for phrase in [x.replace("tommy hilfiger", "") for x in phrase_list]: # remove brand to learn what people want
    words = re.split(" ", phrase)
    for word in words:
      if len(word) > 0 and word not in english_stopwords and not word.isdigit():
        cnt[word] += 1
  word_cloud = [x[0] for x in cnt.most_common(25)]
  word_cloud_obj = WordCloud(max_words=25, background_color="white").generate(" ".join(word_cloud))
  #word_cloud_obj = WordCloud().generate(" ".join(word_cloud)) #default with ugly black background
  plt.imshow(word_cloud_obj, interpolation='bilinear')
  plt.axis("off")
  plt.show()  
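As a usage sketch, assuming we pass in the keyword sets unique to each site (in_macys_only and in_tommy_only from earlier), generating the two word clouds could look like this:
create_word_cloud(in_macys_only) # qualifiers unique to Macy's branded keywords
create_word_cloud(in_tommy_only) # qualifiers unique to the Tommy Hilfiger brand site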
Here is the word cloud with the most popular words left after you remove the phrase “Tommy Hilfiger” from Macy’s keywords.

Here is the corresponding word cloud when you do the same for the Tommy Hilfiger ones.

The main difference I see is that people looking for Tommy Hilfiger products at Macy’s have specific products in mind, like boots and curtains, while people searching for the brand site primarily have the outlets in mind. This might indicate that they intend to visit a store rather than purchase online. It may also indicate that people going to the brand site are bargain hunters, while the ones going to Macy’s might not be. These are very interesting and powerful insights!
Given these insights, Tommy Hilfiger could review the SERPs, compare the difference in messaging between Macy’s and their brand site, and adjust it to appeal to their unique audience’s interests.
