Articles tagged data science

  1. Finding a Confidence Interval for Lift

    The motivation for this blog post is simple: I was having trouble searching Google for a simple formula for the confidence interval of lift. Lift is a very important metric in our industry, and after all the work I put into researching it I want to make sure the next person to google ‘confidence interval of lift’ has an easier time.

  2. Distributed Metrics for Conversion Model Evaluation

    At Magnetic we use logistic regression and Vowpal Wabbit to determine the probability of a given impression resulting in either a click or a conversion. To decide which variables to include in our models, we need objective metrics to determine if we are doing a good job. Out of these metrics, only the computation of lift quality (in its exact form) is not easily parallelizable. In this post, I will show how the computation of lift quality can be re-ordered to make it distributable.

  3. Mag-a-thon

    Our engineers from all offices participated in the latest hackathon, cleverly titled “Straight Outta Mag-a-Thon”.

  4. Demystifying Logistic Regression

    For our hackathon this week, I, along with several co-workers, decided to re-implement Vowpal Wabbit (aka “VW”) in Go as a chance to learn more about how logistic regression, a common machine learning approach, works, and to gain some practical programming experience with Go.

    Though our hackathon project focused on learning Go, in this post I want to spotlight logistic regression, which is far simpler in practice than I had previously thought. I’ll use a very simple (perhaps simplistic?) implementation in pure Python to explain how to train and use a logistic regression model.

  5. VIRBs and Sampling Events from Streams

    VIRB (Variable Incoming Rate Biased) reservoir sampling is a streaming sampling algorithm that stores a representative fixed-sized sample of events from the recent past (the user specifies the desired mean age of samples), even when the incoming rate varies. It is heavily inspired by reservoir sampling.

  6. Detecting Brands in User Search Queries

    Capturing user intent with brands can be valuable, especially in online advertising. In the online advertising domain, brand detection can help capture user interests and improve user modeling, which, in turn, can lead to an increase in precision of user targeting with ads relevant to their interests and needs.

  7. Measuring Statistical Lift on Search Categories

    One of the most popular features of the Magnetic Insight platform is our category rankings for an advertiser’s audience of page visitors. The rankings give a completely unbiased look into which search categories are the most popular amongst the users that visit a customer’s different web pages.

    A category lift report from Magnetic Insight

  8. Information Theoretic Metrics for Multi-Class Predictor Evaluation

    How do you decide if a predictive model you have built is any good? How do you compare the performance of two models? As time goes on, data changes and you have to rebuild your models — how do you compare the new model’s behavior on the new data with the old model’s behavior on the old data?

  9. One-Pass Distributed Random Sampling

    One of the important factors that affects the efficiency of our predictive models is the recency of the model. The earlier our bidders get a new version of the prediction model, the better decisions they can make. Delays in producing the model result in lost money due to incorrect predictions.

    The slowest steps in our modeling pipeline are those that require manipulating the full data set (multiple weeks’ worth of data). Our sampling process has historically required two full passes over the data set, and so was an obvious target for optimization.

  10. Click Prediction with Vowpal Wabbit

    At the core of our automated campaign optimization algorithms lies a difficult problem: predicting the outcome of an event before it happens. With a good predictor, we can craft algorithms to maximize campaign performance, minimize campaign cost, or balance the two in some way. Without a good predictor, all we can do is hope for the best.

  11. SKIP, The Search Keyword Intent Predictor

    Magnetic specializes in search retargeting, so we really need to understand our users’ searches; it is our bread and butter. We need to recognize what a user’s search means in a way that is understandable for both humans and computers. This is why we map each search to a category (e.g. “Automotive”), brand (e.g. “BMW”), or other intent data. Our keyword categorization service, the Search Keyword Intent Predictor (SKIP), is the core technology that addresses this need.

  12. Search Query Categorization at Scale

    Classification of short text into a predefined hierarchy of categories is a challenge. The need to categorize short texts arises in multiple domains: page keywords and search queries in online advertising, improvement of search engine results, analysis of tweets or messages in social networks, etc.

    The meetup garnered a large audience.