How to prevent bots and farms from taking over and ruining your online experiment
Many contributors to our blog have raised awareness of the costs of recruiting participants through online labor markets and platforms, like MTurk, Prolific, and, recently, also CloudResearch Connect. For instance, in this recent post, I discuss the criticism that online experiments increasingly face because some participants use server farms or VPS/VPNs to hide their location, or use bots and automated scripts to complete many studies quickly (Ahler et al. 2019, Dennis et al. 2020, Kennedy et al. 2020). However, in the same post, I also indicate there could still be good reasons to recruit participants through these online labor markets and platforms (see also Jeremy Bentley's post). Furthermore, many of these threats can be overcome using pre- and post-screening techniques (Bentley 2021). The idea behind these screening techniques is that inattentive participants, bots, and automated scripts are more likely to fail them, leading to a cleaner sample.
In this post, I share some simple, complementary techniques to filter participants before they take part in your experiment. These techniques rely on some programming, and I will present them using oTree, which is an easy-to-use, free, and open-source platform for experiments and surveys (Chen et al. 2016). However, you can apply these techniques on any platform that allows JavaScript. The week before I published this post, these techniques screened out around 15-25% of the participants on MTurk after applying basic MTurk criteria (i.e., located in the U.S., at least 500 HITs approved, and at least 95% approval rate). However, I suspect the majority of the participants who were screened out would also have been dropped if I had used traditional pre- and post-screening questions. The advantage of the techniques that I will showcase in this post is that they are more objective and fall in the pre-screening category. They also do not require any storage of additional data on your server or a third party's server. This last advantage can be a requirement for researchers who must adhere to data privacy and protection requirements (e.g., GDPR).
There are three techniques I nearly always use one way or another. First, I include an initial page that presents participants with a reCAPTCHA. Second, on the same page, I include an adblock detection script, which ensures that the participant accepts scripts to be run while participating in the experiment. This is not only essential for most experiments I run, but it also makes these techniques work. Lastly, the same initial page typically also includes a proxy/vpn detection script.
The three techniques, when presented on an initial page, work as follows. First, participants must complete the reCAPTCHA successfully to start the experiment. Also, if either the Adblock or the proxy/vpn detection script is triggered, the initial page redirects participants to a "dead end" page. On this "dead end" page, participants are instructed to either stop blocking scripts or drop their proxy/vpn connection before reclicking the invitation link. I have designed a screening app in oTree that features such an initial page. Its code can be downloaded here.
reCAPTCHA
To feature a reCAPTCHA on an initial page, you must first create a v2 reCAPTCHA on the Google Developer page. You can then implement the reCAPTCHA by including the following HTML code as part of the form on the page:
<div class="g-recaptcha" data-sitekey="ENTER_CAPTCHA_SITE_KEY_HERE"></div>
The part of the code that says "ENTER_CAPTCHA_SITE_KEY_HERE" should contain the reCAPTCHA site key that Google shares with you after you create the v2 reCAPTCHA on the Google Developer page. Next, you must include two scripts on your page. The first script is necessary to run the reCAPTCHA:
<script src='https://www.google.com/recaptcha/api.js'></script>
The second script, which is also included on the initial page, triggers an alert if the participant did not complete the reCAPTCHA:
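<script>
window.onload = function() {
  // The reCAPTCHA widget adds a hidden field with this id to the form
  var recaptcha = document.querySelector('#g-recaptcha-response');
  if (!recaptcha) {
    return; // widget not loaded, nothing to mark as required
  }
  // Marking the field as required blocks form submission while it is empty
  recaptcha.required = true;
  recaptcha.oninvalid = function(e) {
    alert("Please complete the Captcha!");
  };
};
</script>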
Together, the HTML code and scripts should make the reCAPTCHA function effortlessly. On the Google Developer page, you can also increase the security that your reCAPTCHA provides on the settings page. I always put it on "Most secure."
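As a variation on the second script, the check can also run when the form is submitted. The following is only a minimal sketch under assumptions, not the code of the screening app: it assumes the page contains a single form, and the helper names captchaCompleted and guardSubmit are hypothetical. It reads the token with grecaptcha.getResponse(), which the reCAPTCHA v2 script provides and which returns an empty string until the participant has solved the challenge.

```javascript
// Hypothetical helper: true once the widget has produced a non-empty response token.
function captchaCompleted(token) {
  return typeof token === 'string' && token.trim().length > 0;
}

// Hypothetical helper: stop the form from submitting until the captcha is completed.
// getToken and notify are passed in so the logic can be tried outside a browser.
function guardSubmit(form, getToken, notify) {
  form.addEventListener('submit', function (event) {
    if (!captchaCompleted(getToken())) {
      event.preventDefault();
      notify('Please complete the Captcha!');
    }
  });
}

// Browser wiring. Assumption: one <form> on the page and api.js already included.
if (typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    var form = document.querySelector('form');
    if (!form) {
      return;
    }
    guardSubmit(
      form,
      function () {
        // grecaptcha is defined by the reCAPTCHA script; treat a missing widget as "not completed"
        return typeof grecaptcha !== 'undefined' ? grecaptcha.getResponse() : '';
      },
      function (message) { window.alert(message); }
    );
  });
}
```

Like the script above, this only checks the reCAPTCHA in the participant's browser, so it stores no additional data on your server.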
Adblock and proxy/vpn detection script
To detect whether participants are using a VPN/VPS, proxy, or server farm, you can use an IP Geolocation API service. While there are free options out there, a high-quality, accurate, and current API is IP API. You can get API access to their IP Geolocation service by signing up for a pro plan here. After signing up for their pro plan, you will receive a Pro IP API key, which you can use to check whether a participant is using a VPN/VPS, proxy, or server farm IP address.
On your initial page, add the following script:
<script>
// Ask the IP Geolocation API whether the participant's IP address is flagged as a proxy
var xhr = new XMLHttpRequest();
xhr.open("GET", "https://pro.ip-api.com/json/?fields=status,message,proxy&key=ENTER_PRO IP_API_KEY_HERE", true);
xhr.onreadystatechange = function() {
  if (this.readyState === 4 && this.status === 200) {
    var response = JSON.parse(this.responseText);
    if (response.status !== 'success') {
      console.log('query failed: ' + response.message);
      return;
    }
    // Redirect flagged participants to the "dead end" page
    if (response.proxy === true) {
      window.location.replace("../Proxy.html");
    }
  } else if (this.readyState === 4 && this.status === 0) {
    // Status 0: the request did not go through, e.g., because it was blocked
    window.location.replace("../Adblock.html");
  }
};
xhr.send();
</script>
The part of the code indicating “ENTER_PRO IP_API_KEY_HERE” should contain the Pro IP API key you received after signing up for IP API’s pro plan. Besides the key, the script may need two more edits, one for each “dead end” page. The script checks whether the participant’s IP address is listed as a proxy server, VPN/VPS, or server farm address. If the response is true, it redirects the participant to a “dead end” page called “Proxy.html.” Similarly, if the script detects that its request did not go through, which happens if participants block scripts using an adblock add-in or their browser, it redirects the participant to a “dead end” page called “Adblock.html.” If you also want to use two “dead end” pages, you will need to update the “../” part of both paths so that they point to the locations of your pages. You can also use the Adblock and proxy/vpn detection script on the other pages of your experiment to ensure that participants do not temporarily bypass your screening attempt.
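If you reuse the check on several pages, it may help to separate the decision from the redirect. The following is a minimal sketch under assumptions rather than the code of the screening app: the helper names classifyIpCheck and deadEndFor and the pages object are hypothetical, while the URL, the key placeholder, and the two “dead end” paths are copied from the script above. As in that script, any outcome other than a flagged address or a blocked request lets the participant continue.

```javascript
// Hypothetical helper: turn the outcome of the request into a label.
// httpStatus is the XMLHttpRequest status; body is the parsed JSON answer (or null).
function classifyIpCheck(httpStatus, body) {
  if (httpStatus === 0) {
    return 'blocked'; // the request did not go through
  }
  if (httpStatus !== 200 || !body || body.status !== 'success') {
    return 'unknown'; // the query failed, so nothing can be concluded
  }
  return body.proxy === true ? 'proxy' : 'ok';
}

// Hypothetical helper: map a label to a "dead end" page, or null to let the participant continue.
function deadEndFor(outcome, pages) {
  if (outcome === 'proxy') {
    return pages.proxy;
  }
  if (outcome === 'blocked') {
    return pages.adblock;
  }
  return null;
}

// Browser wiring; skipped when the code runs outside a browser.
if (typeof window !== 'undefined' && typeof XMLHttpRequest !== 'undefined') {
  var pages = { proxy: '../Proxy.html', adblock: '../Adblock.html' };
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'https://pro.ip-api.com/json/?fields=status,message,proxy&key=ENTER_PRO IP_API_KEY_HERE', true);
  xhr.onreadystatechange = function () {
    if (this.readyState !== 4) {
      return; // wait until the request has finished
    }
    var body = null;
    try {
      body = JSON.parse(this.responseText);
    } catch (e) {
      body = null; // empty or non-JSON answer
    }
    var target = deadEndFor(classifyIpCheck(this.status, body), pages);
    if (target) {
      window.location.replace(target);
    }
  };
  xhr.send();
}
```

With this split, only the pages object needs to change when a page sits in a different folder, and the two helpers can be tried without a browser.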
Summary
You can use simple programming techniques to filter a large chunk of the undesirable participants before they take part in your online experiment. Since these techniques rely on a bit of programming, I am happy to receive any suggestions to improve the code I shared in this post. Please leave suggestions in the comments or send them by email, so I can include them as an edit in this post (with a mention of course)!