Test Design Advice for SEOmoz on their Landing Page Competition
I’m incredibly excited about the SEOmoz landing page competition. Not only did they choose to use the Offermatica platform, but the awareness, education and information that this test will provide means countless marketers will experience the fun and excitement of testing. What is testing if not a competition?
Testing is also an ongoing learning experience. There is something to learn from every test, even tests that don’t perform well. Often results are highly counterintuitive. Be forewarned, participants: testing can also be very humbling.
Those learning curves happen quite a bit. The very formation of OTTO Digital was due to the fact that getting great results from testing can be difficult, even for seasoned marketers. The technology is new and powerful, test design and result analysis are highly quantitative, and testing requires new ways of thinking about creative and messaging.
So, it’s no surprise that Rand’s testing methodology for the SEOmoz Landing Page Contest could use a bit of improvement. I offer Rand some suggestions at the end of this post, foremost because I admire Rand and want this test to be successful for his business. At the same time, his mistakes are common to many marketers who begin testing. Hopefully this post can help further the understanding of what kind of thinking is needed to ensure a successful test and add even more awareness, information and education around the event.
Testing Asks Questions
The questions start with what to test and then go to how to test. Here is where it gets tricky. A/B, A/B/C, 7x2 Multivariate (MVT), 4x3 MVT? These decisions on test design have a huge impact on the results.
Your test design should be predicated on the answers to five questions:
1. How much traffic will enter the test?
2. What do you want to learn?
3. What are the success metrics?
4. How long can you run the test?
5. Is your creative optimized?
Let’s go into these in a bit more detail. Remember, test success is determined prior to testing!
Traffic levels and the associated conversion rates (however you choose to define a conversion) are the overriding factors for sound test design. This is because tests have to be designed so that they have the best chance of reaching confidence in the results. The more elements that are tested, the more traffic (data) is needed to reach confidence, and consequently the longer the test needs to run. This is not a bad thing (more data is often better), only a consideration that needs to be settled as a first order of business when deciding how many pages or elements can be tested.
A/B or MVT
Traffic and associated conversions will help determine the first part of test design: what kind of test to create. A good rule of thumb is 100 conversions per page (or what we refer to as a recipe) for any test. So an A/B/C test should need about 300 conversions to reach confidence and stability, and a 4x3 Multivariate test, being an L9 array with 9 recipes, would need 900 conversions. WARNING: This is just a rule of thumb. Your results may vary!
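To make the rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The function names and the figure of 20 conversions per day are hypothetical, chosen only for illustration; the 100-conversions-per-recipe constant is the rule of thumb above, not a statistical guarantee.

```python
import math

# Rule of thumb from the post: roughly 100 conversions per recipe.
CONVERSIONS_PER_RECIPE = 100


def conversions_needed(recipes, per_recipe=CONVERSIONS_PER_RECIPE):
    """Total conversions the whole test should collect."""
    return recipes * per_recipe


def days_needed(recipes, daily_conversions, per_recipe=CONVERSIONS_PER_RECIPE):
    """Days to collect enough conversions, given total daily conversions
    across all recipes (a hypothetical input you would estimate yourself)."""
    return math.ceil(conversions_needed(recipes, per_recipe) / daily_conversions)


print(conversions_needed(3))   # A/B/C test -> 300
print(conversions_needed(9))   # 4x3 MVT as an L9 array -> 900
print(days_needed(5, 20))      # 5 pages at an assumed 20 conversions/day -> 25
```

Plugging in your own daily conversion volume gives a quick sanity check on whether a test design can finish inside the time you have.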
A/B testing produces the highest signal-to-noise ratio. It is a great tool for getting clarity on single elements, even if those elements are entire pages. MVT allows us to gain a greater understanding of how much influence each of the many elements on a page has on conversion. This ultimately creates an optimized design based on the results of a number of different variations tested simultaneously. The test design is built with the Taguchi methodology, which produces orthogonal arrays. Knowing what matters and how much it matters allows us to design follow-up tests that continue to improve performance.
Confidence levels in your results are based on margin of error. This metric is driven by the actual performance of all your tested elements in relation to one another. The greater the discrepancy between results, the more confident you can be in them. For example, if my A/B test has 18 conversions on page A and 2 conversions on page B, I will have high statistical significance that A is the better performing page (even with a small data set). However, if the results are 12-8, I will have low statistical significance because the margin of error is very high: a conversion one way or another will greatly swing the results.
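The 18-2 versus 12-8 example can be checked with a few lines of Python. This is only a sketch under a stated assumption: both pages received equal traffic, so under "no real difference" each conversion is equally likely to land on either page. It uses an exact two-sided binomial test, which is one simple way to put a number on the intuition and is not necessarily the calculation a testing platform performs. The function name is made up for this example.

```python
from math import comb


def two_sided_p_value(conversions_a, conversions_b):
    """Exact binomial test, assuming an even traffic split between A and B.

    Returns the probability of a split at least this lopsided if the two
    pages actually convert equally well.
    """
    n = conversions_a + conversions_b
    k = max(conversions_a, conversions_b)
    # Probability that the leading page gets k or more of the n conversions.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(two_sided_p_value(18, 2), 4))  # -> 0.0004
print(round(two_sided_p_value(12, 8), 4))  # -> 0.5034
```

An 18-2 split would happen by chance well under 1% of the time, while a 12-8 split would happen about half the time, which is why the second result tells you almost nothing.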
One of the biggest challenges we face at OTTO Digital is running tests for clients around set timeframes like seasonality and promotions. Anytime your results need to be achieved by a certain date you can compromise confidence because results are unpredictable. Ideally the data and most importantly the stability of the data will determine when your test is over, not your marketing calendar. Test stability is best determined by analyzing results over a period of time and looking for performance fluctuations. Stability is achieved when the tested elements exhibit similar performance behavior across enough data.
• The blue and red page reacted similarly to temporal factors (day of week)
• The blue page consistently outperformed the red page on a daily basis
• The green page sucked the entire time
From this daily performance data we can conclude with a high level of confidence that the cumulative results are stable.
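A rough way to express that stability check in Python is sketched below. The daily conversion rates are invented purely for illustration (they are not data from any real test), and the rule used here, "the same page leads every day for at least a full week", is a simplifying assumption rather than a formal definition of stability.

```python
def daily_leaders(daily_rates):
    """Return the best-performing page for each day."""
    pages = list(daily_rates)
    days = len(daily_rates[pages[0]])
    return [max(pages, key=lambda p: daily_rates[p][d]) for d in range(days)]


def is_stable(daily_rates, min_days=7):
    """True if one page led every day across at least min_days of data.

    min_days=7 is an assumed threshold so that every day of the week
    is covered at least once.
    """
    leaders = daily_leaders(daily_rates)
    return len(leaders) >= min_days and len(set(leaders)) == 1


# Hypothetical daily conversion rates, Monday through Sunday.
rates = {
    "blue":  [0.031, 0.029, 0.034, 0.036, 0.033, 0.027, 0.026],
    "red":   [0.027, 0.025, 0.030, 0.031, 0.029, 0.023, 0.022],
    "green": [0.012, 0.011, 0.013, 0.012, 0.014, 0.010, 0.009],
}

print(is_stable(rates))  # -> True
```

If the leader flips from day to day, or there is less than a week of data, the check fails and the test should keep running.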
Creative differentiation is another core foundation of good test design. When the creative elements have large variation between them, users are more likely to respond to one over another. Differentiation is also a key to driving understanding of what matters on the page. Many times we’ll go as far as to test page elements being present or not present to see what effect if any this has on conversion.
Creative methodology for testing also revolves around the idea that getting the best performing creative is not a single project but that testing allows for the creative process to be ongoing and iterative or agile. For many designers the concepts of differentiation and iteration are very hard to execute on (which after some frustration with outside designers led us to forming our own world-class design team at OTTO Digital) but this is the wave of the future for digital design.
Determining a Winner
Once we have confidence and stability we have a winner. Could there be more than one winner? Again, this depends on the test design. If source traffic to the page is segmented it’s quite possible we will. Let’s say frequent visitors to SEOmoz prefer one page while PPC traffic prefers another. If this were the case then we have two winners. Rand could then use the Offermatica tool to serve the best performing page based on rules of source or behavior. Many times though we see more differentiation in segment behavior in MVT when messages, images and offers are all tested. So maybe that’s part of the next test?
Advice for SEOmoz
My primary concern is that Rand and SEOmoz would like to run a test on every single entry. Based on their conversion rates and what I think will be a fairly large number of submissions, this is not a good test design. With what Rand has disclosed about his traffic and conversion rates, I would suggest running 5 page variations against each other: an A/B/C/D/E test. These pages should be selected based not only on how they look, but on how they look compared to one another. This test design should produce a nice data set with high confidence in the 3-4 week timeline Rand has suggested.
Another option would be starting with a larger test, maybe 10 pages, and then dropping pages from the test if they show consistently poor performance, thus increasing the share of traffic going to the remaining pages while still working within a feasible timeframe for results. The downside to this scenario is that it only works if there is clear confidence among a handful of better performing pages. Otherwise you run the risk of needing the test to run longer to get better results, or worse, of never getting confidence in the results because the traffic is spread too thin across the tested pages.
Also, to clarify test type, Rand is (correctly, in my opinion) running an A/B or split test as his first course of action. This is the best design for reaching confidence with smaller conversion volumes. MVT, or multivariate testing, requires much more data. Rand made a reference to conducting a multivariate test, but what he actually described for the contest is an A/B or split test. A follow-up multivariate test of the winner with alternate elements would be a great idea, and hopefully one that warrants a future contest.
Good luck to all the participants. This idea is an incredibly exciting mix of UGC, testing, social media and optimization. Hats off to Rand and everyone over at SEOmoz for doing this. Rand, may your conversions be many!