
Durham e-Theses
Statistical modelling of clickstream behaviour to inform real-time advertising decisions

JESSOP, RYAN, DANIEL (2020) Statistical modelling of clickstream behaviour to inform real-time advertising decisions. Masters thesis, Durham University.

PDF - Accepted Version


Online user browsing generates vast quantities of data that typically go unexploited. Investigating this data and uncovering the information it contains can be of substantial value to online businesses, and statistics plays a key role in this process.

The data takes the form of an anonymous digital footprint associated with each unique visitor, resulting in $10^{6}$ unique profiles across $10^{7}$ individual page visits each day. Exploring, cleaning and transforming data of this scale and dimensionality (over 2TB of memory) is particularly challenging and requires cluster computing.

We outline a variable selection method to summarise clickstream behaviour with a single value, and make comparisons to other dimension reduction techniques. We illustrate how to apply generalised linear models and zero-inflated models to predict sponsored search advert clicks based on keywords.
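
As an illustration of the zero-inflated idea, one common form is the zero-inflated Poisson model. The abstract does not state which count distribution is used, so the Poisson choice and the symbols $\pi$ (probability of a structural zero, e.g. a keyword that never attracts clicks) and $\lambda$ (mean click count otherwise) are assumptions made only for this sketch:

$$
P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\,\frac{\lambda^{y} e^{-\lambda}}{y!}, \quad y = 1, 2, \dots
$$

Setting $\pi = 0$ recovers the ordinary Poisson model, which is the count-data case of a generalised linear model.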

We consider the problem of predicting customer purchases (known as conversions) from the customer’s journey, or clickstream: the sequence of pages seen during a single visit to a website. We treat each page as a discrete state, with transition probabilities between pages, providing the basis for a simple Markov model.
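
The page-as-state idea can be sketched by counting page-to-page moves and normalising each row. This is a minimal illustration, not the thesis implementation: the function name `transition_probabilities`, the page labels and the three example sessions are all hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from page sequences.

    Each session is a list of page labels in the order they were visited.
    Returns a nested dict: probs[current_page][next_page] = estimated probability.
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        # Pair every page with the page that follows it in the same visit.
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical clickstreams (assumed data, for illustration only).
sessions = [
    ["home", "product", "basket", "purchase"],
    ["home", "product", "home"],
    ["home", "search", "product", "basket"],
]
P = transition_probabilities(sessions)
print(P["home"])
```

Pages that are never left within a visit (here `purchase`) have no outgoing row, which is one reason a practical model would usually add an explicit exit state.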

Further, hidden Markov models (HMMs) are applied to relate the observed clickstream to a sequence of hidden states, uncovering meta-states of user activity. We also apply conventional logistic regression to model conversions in terms of summaries of each profile’s browsing behaviour. Both approaches are incorporated into a set of tools that handles a wide range of conversion types and allows the predictive capability of each model to be compared directly.
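
To make the link between an observed clickstream and hidden states concrete, the sketch below computes the likelihood of a short page sequence under a discrete HMM using the standard forward algorithm. The two hidden states, the three page types and every probability are assumed values chosen for illustration; none of them come from the thesis.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM using the forward algorithm.

    obs   : list of observation indices
    start : start[i]    = P(first hidden state is i)
    trans : trans[i][j] = P(next hidden state is j | current hidden state is i)
    emit  : emit[i][o]  = P(observing o | hidden state is i)
    """
    n_states = len(start)
    # alpha[i] = P(observations so far, current hidden state is i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Assumed hidden meta-states: 0 = "browsing", 1 = "buying intent".
start = [0.8, 0.2]
trans = [[0.7, 0.3],
         [0.2, 0.8]]
# Assumed observed page types: 0 = content, 1 = product, 2 = basket.
emit = [[0.70, 0.25, 0.05],
        [0.10, 0.50, 0.40]]

likelihood = forward_likelihood([0, 1, 2], start, trans, emit)
print(likelihood)
```

In practice the parameters would be estimated from data rather than fixed by hand, and long sequences would need scaling or log-space arithmetic to avoid underflow.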

Predicting, in real time, which profiles are likely to follow behaviour patterns similar to those of known conversions will have a critical impact on targeted advertising. We illustrate these analyses with results from real data collected by Carbon, an Audience Management Platform (AMP).

Item Type: Thesis (Masters)
Award: Master of Science
Keywords: variable selection; Markov chain; hidden Markov model; generalised linear model; clickstream; zero-inflated model
Faculty and Department: Faculty of Science > Mathematical Sciences, Department of
Thesis Date: 2020
Copyright: Copyright of this thesis is held by the author
Deposited On: 09 Sep 2020 10:03
