Detecting Brands in User Search Queries
Capturing user intent with brands can be valuable, especially in online advertising. In the online advertising domain, brand detection can help capture user interests and improve user modeling, which, in turn, can lead to an increase in precision of user targeting with ads relevant to their interests and needs.
Magnetic focuses on understanding user intent. An important intent source is user search queries, which we use for search retargeting. In a previous article we discussed how we categorize queries, and in our hackathon project we showed that we also detect brands from search queries, but until now we’ve never explained how we do this.
Our brand detection technique can be used in conquesting campaigns or in campaigns where marketers do not want to show ads to users familiar with certain brands. Additionally, we can use brands detected from searches as a feature for predictive modeling tasks (e.g. click-through rate and conversion rate prediction), reporting, and search analytics.
Brand detection has been in production at Magnetic for more than a year now, but recently we made some improvements and published a paper describing how it works.
What We Understand by Brand Detection
With brand detection, we detect queries related to a brand name or product under a specific brand. E.g., the brand “Apple” should be detected not only in a query such as “apple store”, where the brand name appears explicitly, but also in queries like “iPad” or “iPhone”.
Similarly, if the brand “Coca-Cola” is defined, the query “Fanta” will be detected as related to the “Coca-Cola” brand. One may argue that “Fanta” also names a brand, but our approach focuses on detecting brands from a predefined list. So, if “Fanta” is not in the predefined list of brands and it is strongly related to the “Coca-Cola” brand, we detect it as part of that brand.
The list of brands was defined manually and comprises about 1,000 well-known brand and company names. Each brand is associated with one or more Wikipedia pages. For example, “Coca-Cola” is associated with the “Coca-Cola” and “The Coca-Cola Company” Wikipedia pages.
Data Set Preparation
The main idea is to map a user search query to related concepts, represented by Wikipedia pages, similar to how we perform query categorization.
We download the Wikipedia archive, parse it, and extract fields such as title, alternative names, links and anchor texts of links, categories, abstract, article text, and section headers. We then create a searchable Lucene index. More details on this entity search step can be found in the related paper.
Some of the fields are used as search fields, others as alternative names of wiki pages, and some for the subsequent task of brand detection.
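As a rough illustration of the indexing and search step, the sketch below builds a tiny in-memory inverted index over a few hand-written page records. It is not Lucene and not our production pipeline: the `WikiPage` fields, the `build_index` and `search` helpers, the token-overlap scoring, and the sample abstracts are all assumptions made for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    title: str
    alt_names: list = field(default_factory=list)  # e.g. redirects, anchor texts
    abstract: str = ""


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    # Map each token to the set of page titles whose searchable fields contain it.
    index = defaultdict(set)
    for page in pages:
        for text in [page.title, page.abstract] + page.alt_names:
            for token in tokenize(text):
                index[token].add(page.title)
    return index


def search(index, query):
    # Toy scoring: number of distinct query tokens found in the page.
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for title in index.get(token, ()):
            scores[title] += 1
    # Highest score first; ties broken alphabetically by title.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative records only, not actual Wikipedia content.
pages = [
    WikiPage("iPhone 5S", ["5S"],
             "The iPhone 5S is a smartphone designed by Apple Inc."),
    WikiPage("Samsung Galaxy S4", ["Galaxy S4"],
             "The Samsung Galaxy S4 is an Android smartphone produced by Samsung Electronics."),
    WikiPage("Fanta", [],
             "Fanta is a brand of fruit-flavored soft drinks created by The Coca-Cola Company."),
]

index = build_index(pages)
results = search(index, "Galaxy S4 vs Apple 5S smartphone")
print(results)
```

A real index would weight fields differently and use a proper relevance function, so this toy only shows the shape of the data: pages go in with several text fields, and a ranked list of candidate pages comes out.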
Doing Brand Detection
Entity search includes two main steps:
- Search in the index and retrieve candidate Wikipages
- Entity “back-mapping” — mapping entity alternative name to the query
Entity search, which represents queries by Wikipedia pages, is the same process we use for query categorization, and it is described in detail in our paper and a previous blog post.
To briefly illustrate the process, consider the query “Galaxy S4 vs Apple 5S smartphone”, and suppose that searching our index for it returns the following results (names correspond to Wikipedia page titles):
- iPhone 5S
- Samsung Galaxy S4
- Smartphone patent wars
- Samsung Galaxy S5
- Samsung Galaxy S4 Mini
- iPhone 5C
- Apple Inc.
- Samsung Galaxy
Only some of these results, such as “iPhone 5S” and “Samsung Galaxy S4”, are mentioned directly in the query. We want to keep only those results and remove the rest. This is done by the entity back-mapping step.
Note that the first two search results are the most relevant to the query, but this is not always the case, which is why we need the second, post-filtering step. The third result is also very relevant, but it does not represent an entity that is directly mentioned in the query. If we made this third result eligible for brand detection, we would detect many more brands, because the first sentence of that article mentions all the important smartphone manufacturers. The search result list also contains other products that are relevant but do not match the query intent precisely. So once we have the list of candidate entities, we post-filter it in the second step, entity back-mapping, where we map the alternative names of all search results back to the query.
Here we also use the results of Wikipedia parsing, which give us all relevant alternative names for Wikipages (entities), such as “Galaxy S4” for the “Samsung Galaxy S4” page or “5S” for “iPhone 5S”. Only the results that are successfully mapped back to the query by one of their alternative names are kept.
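To make back-mapping more concrete, here is a minimal sketch that keeps a candidate page only if its title or one of its alternative names occurs in the query as a whole-token phrase. The `back_map` helper and the matching rule are simplifications for illustration, and apart from “Galaxy S4” and “5S” the alternative names listed are hypothetical.

```python
import re


def normalize(text):
    # Lowercase and collapse every run of non-alphanumeric characters to one space.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def back_map(query, candidates):
    """Keep candidates whose title or an alternative name occurs in the query.

    `candidates` maps a page title to its list of alternative names.
    """
    padded_query = f" {normalize(query)} "
    kept = []
    for title, alt_names in candidates.items():
        for name in [title] + alt_names:
            norm = normalize(name)
            # Pad with spaces so that only whole-token matches count.
            if norm and f" {norm} " in padded_query:
                kept.append(title)
                break
    return kept


# Search results from the example; most alternative names are hypothetical.
candidates = {
    "iPhone 5S": ["5S"],
    "Samsung Galaxy S4": ["Galaxy S4"],
    "Smartphone patent wars": [],
    "Samsung Galaxy S5": ["Galaxy S5"],
    "Samsung Galaxy S4 Mini": ["Galaxy S4 Mini"],
    "iPhone 5C": ["5C"],
    "Apple Inc.": ["Apple"],
    "Samsung Galaxy": ["Samsung Galaxy series"],
}

kept = back_map("Galaxy S4 vs Apple 5S smartphone", candidates)
print(kept)  # ['iPhone 5S', 'Samsung Galaxy S4', 'Apple Inc.']
```

With these inputs, “Smartphone patent wars” is dropped even though it is topically relevant, because none of its names appears in the query.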
Now, with a list of Wikipedia pages relevant to the user query, we need to find out whether those Wikipages represent a brand. For example, is “Fanta” strongly related to “Coca-Cola”?
We tried many approaches, but in the end the best one was a simple approach, which works as follows:
- First, we prepare all alternative names of each brand by simply taking the titles and redirects of the Wikipedia pages corresponding to our brands. For example, for “Coca-Cola” we would have the following list: “Coca-Cola”, “CocaCola”, “Coca Cola”, “The Coca-Cola”, and so on.
- For each search result (a Wikipage mapped to the query), we take the first sentence of the article’s abstract.
- We search for the alternative names from the first step in that first sentence and return the matching brands.
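The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than our production code: `BRAND_NAMES`, `first_sentence`, and `detect_brands` are hypothetical names, the sentence splitter is deliberately naive, the “Apple” and “Samsung” name lists are made up, and the abstracts are hand-written stand-ins for real Wikipedia text.

```python
import re

# Step 1: alternative names per brand (titles and redirects of the brand's pages).
# The Coca-Cola names come from the example above; the others are made up.
BRAND_NAMES = {
    "Apple": ["Apple", "Apple Inc."],
    "Samsung": ["Samsung", "Samsung Electronics"],
    "Coca-Cola": ["Coca-Cola", "CocaCola", "Coca Cola", "The Coca-Cola"],
}


def first_sentence(abstract):
    # Step 2 (naive): cut at the first sentence-ending punctuation mark that is
    # followed by whitespace and a capital letter. Abbreviations can fool this.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", abstract.strip(), maxsplit=1)[0]


def detect_brands(sentence, brand_names=BRAND_NAMES):
    # Step 3: return every brand that has an alternative name in the sentence.
    detected = []
    for brand, names in brand_names.items():
        for name in names:
            # Lookarounds act as word boundaries that also work for names ending in ".".
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                detected.append(brand)
                break
    return detected


fanta_abstract = (
    "Fanta is a brand of fruit-flavored soft drinks created by "
    "The Coca-Cola Company. It is sold in many countries."
)
galaxy_abstract = (
    "The Samsung Galaxy S4 is an Android smartphone produced by "
    "Samsung Electronics. It was announced in 2013."
)

fanta_brands = detect_brands(first_sentence(fanta_abstract))
galaxy_brands = detect_brands(first_sentence(galaxy_abstract))
print(fanta_brands)   # ['Coca-Cola']
print(galaxy_brands)  # ['Samsung']
```

Restricting the match to the first sentence is what keeps this simple approach precise: later sentences of an article tend to mention competitors and unrelated companies.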
Using this approach we can detect brands that are not explicitly mentioned in our example query, such as “Samsung”, which is a brand in our list and is mentioned in the abstract of the “Samsung Galaxy S4” page, as well as those that are explicitly mentioned, like “Apple”. All eligible results were evaluated for brands as follows:
By finding alternative names of the “Apple” brand in the “iPhone 5S” and “Apple Inc.” wiki pages, we detected the “Apple” brand, and by finding “Samsung” alternative names in the title and first sentence of the “Samsung Galaxy S4” wiki page, we detected the “Samsung” brand.
The confidence scores of detected brands are based on the search scores, which reflect the similarity of each Wikipedia page to the query. We return the combination of detected brands.
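The exact scoring formula is beyond the scope of this post. Purely as an illustration, one simple way to turn per-page search scores into per-brand confidences is to take, for each brand, the best score among the pages where it was detected and normalize by the top score. The `brand_confidences` helper, the max-and-normalize rule, and the numeric scores below are all assumptions of this sketch.

```python
def brand_confidences(page_results):
    """Aggregate per-page search scores into per-brand confidences.

    `page_results` is a list of (search_score, detected_brands) pairs, one per
    Wikipedia page that survived back-mapping. Taking the maximum and dividing
    by the top score are assumptions of this sketch, not a documented formula.
    """
    if not page_results:
        return {}
    top_score = max(score for score, _ in page_results)
    if top_score <= 0:
        return {}
    confidences = {}
    for score, brands in page_results:
        for brand in brands:
            relative = score / top_score
            confidences[brand] = max(confidences.get(brand, 0.0), relative)
    return confidences


# Made-up search scores for the pages kept in the running example.
page_results = [
    (8.0, ["Apple"]),    # "iPhone 5S"
    (6.0, ["Samsung"]),  # "Samsung Galaxy S4"
    (2.0, ["Apple"]),    # "Apple Inc."
]

confidences = brand_confidences(page_results)
print(confidences)  # {'Apple': 1.0, 'Samsung': 0.75}
```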
Our brand detection works quite well. On tested queries it achieved over 80% precision and recall. During our research, we discovered that only about 10% of search queries mention brands or products relevant to brands. We are also sharing a data set of annotated queries along with our results.
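For readers less familiar with these metrics, the sketch below shows one common way to compute precision and recall for this kind of task, counting (query, brand) pairs across all queries. The `precision_recall` helper, the micro-averaging scheme, and the three toy queries are assumptions for illustration; they are not taken from our data set or our evaluation code.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over (query, brand) pairs.

    Both arguments map a query string to a collection of brand names.
    """
    tp = fp = fn = 0
    for query in set(predicted) | set(gold):
        pred = set(predicted.get(query, ()))
        true = set(gold.get(query, ()))
        tp += len(pred & true)   # brands detected and annotated
        fp += len(pred - true)   # brands detected but not annotated
        fn += len(true - pred)   # brands annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy annotations and predictions, made up for this example.
gold = {"ipad case": ["Apple"], "fanta": ["Coca-Cola"], "cheap flights": []}
predicted = {"ipad case": ["Apple"], "fanta": [], "cheap flights": ["Delta"]}

precision, recall = precision_recall(predicted, gold)
print(precision, recall)  # 0.5 0.5
```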
If you are interested in more details of our approach, please check our paper or the talk we gave about it at the Query Understanding Workshop at WSDM 2016 in San Francisco.