The Fall of Constantinople for Amazon's Mechanical Turks?

In May 1453, the Ottoman Empire captured Constantinople, then the capital of the Byzantine Empire. Although the city’s walls were widely recognized as the strongest in Europe, the city fell to the Ottoman Turks led by the 21-year-old Sultan Mehmed II, who fought the forces of Byzantine Emperor Constantine XI Palaiologos. Following its capture, Constantinople was established as the new capital of the Ottoman Turks, and is presently known as Istanbul in Türkiye.

The Fall of Constantinople marked the end of the Roman Empire, which had lasted for nearly 1,500 years. Besides marking the end of the Roman Empire, this event also opened the door for Ottoman conquests in the Balkans, which ultimately resulted in the Battle of Vienna in 1683. While the Ottoman Turks’ conquests in the Balkans led to devastating sieges, they also had positive effects: many refugees escaped to Western Europe, helped to shape humanism and the Renaissance, and thereby helped to spur an interest in Classical scholarship.

While the Ottoman Turks caused turmoil in the late Middle Ages and the Age of Reformation, Amazon’s Mechanical Turks are causing turmoil in experimental research today. Although early studies documented that Amazon’s Mechanical Turk participants were valid proxies for experimental accounting research (e.g., Farrell, Grenier, and Leiby, 2017), there are increasing concerns about the quality of Amazon’s Mechanical Turk (MTurk) data. For instance, AccountingExperiments.com contributor Jeremy Bentley (2021) writes that “MTurk presents challenges related to statistical power and reliability” and while these challenges are not unique to MTurk, they are “more prevalent than in research conducted with other participant pools.” Another study, by Dennis, Goodson, and Pearson (2020) in Behavioral Research in Accounting, documents online worker fraud that threatens the integrity of MTurk data, as participants may use virtual private servers, which limits the usefulness of IP-based screening procedures.

Although these developments have besieged the walls of Amazon’s Mechanical Turk’s fortress, its last defenders have not yet surrendered. In the battle of Constantinople, the first two attacks on the Gate of Saint Romanus were met with fierce resistance by its last defenders, but the third attack on the gate, by the Janissaries, was successful. Our modern-day Janissary seems to be Cameron S. Kay from Union College. Using a pre-registered unpublished study (Link), he shows that items with contradictory content are positively correlated in MTurk data. For instance, the items “I talk a lot” and “I rarely talk” have a significant positive correlation in his sample of MTurkers. Figure 1 reports the correlations of contradictory statements for MTurk participants without screening procedures, MTurk participants with screening procedures, and CloudResearch Connect participants.

Figure 1: Correlations of contradictory items among MTurk and CloudResearch Connect participants. Adapted from Kay (2024)
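A check of this kind is easy to run on one's own data. The following is a minimal sketch in Python, not the procedure used by Kay (2024): it computes the Pearson correlation between an item and its antonym and flags the sample when the sign is positive. The item names, the function names, and the responses are hypothetical and invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant item")
    return cov / sqrt(var_x * var_y)


def contradiction_check(sample, item="talk_a_lot", antonym="rarely_talk"):
    """Return (r, suspicious); suspicious means the antonym pair correlates positively."""
    r = pearson(sample[item], sample[antonym])
    return r, r > 0


# Hypothetical 1-5 Likert responses, invented for illustration only.
attentive = {"talk_a_lot": [5, 4, 2, 1, 3, 5], "rarely_talk": [1, 2, 4, 5, 3, 1]}
careless = {"talk_a_lot": [5, 4, 2, 1, 3, 5], "rarely_talk": [5, 4, 1, 2, 3, 4]}

for label, sample in [("attentive", attentive), ("careless", careless)]:
    r, suspicious = contradiction_check(sample)
    print(f"{label}: r = {r:.2f}, suspicious = {suspicious}")
```

With real data, a sign check alone is crude; a significance test or a confidence interval around the correlation would be a sensible next step.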

This observation aligns well with my own recent experience with MTurk participants. In one experiment I recently ran on Amazon’s Mechanical Turk, I included a customary feedback box and found that consecutive observations contained identical feedback. Figure 2 reports these data. Such a pattern suggests the work of bots or farms, and it persisted over approximately 1,000 observations. Such a waste of money.

Figure 2
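Repeated feedback of this kind can be flagged automatically. Below is a minimal sketch in Python, assuming the feedback is available as a list of strings in submission order; the function name `duplicate_runs`, the `min_run` threshold, and the example responses are hypothetical. It reports runs of consecutive, identical (case- and whitespace-insensitive) non-empty answers.

```python
def duplicate_runs(feedback, min_run=3):
    """Find runs of consecutive identical feedback.

    Returns a list of (start_index, run_length, original_text) tuples for
    every run of at least `min_run` identical non-empty answers.
    """
    # Collapse whitespace and ignore case so trivial variations still match.
    normalized = [" ".join(text.split()).casefold() for text in feedback]
    runs = []
    start = 0
    for i in range(1, len(normalized) + 1):
        # Close the current run at the end of the list or when the text changes.
        if i == len(normalized) or normalized[i] != normalized[start]:
            length = i - start
            # Empty boxes are common among genuine participants, so skip them.
            if length >= min_run and normalized[start]:
                runs.append((start, length, feedback[start]))
            start = i
    return runs


# Hypothetical responses in submission order, invented for illustration only.
responses = ["Nice study", "good survey", "Good survey ", "GOOD SURVEY", "", "", "", "thanks"]
for start, length, text in duplicate_runs(responses):
    print(f"rows {start}-{start + length - 1}: {length} x {text!r}")
```

The threshold is a judgment call: short, generic answers such as "none" may legitimately repeat, so flagged runs are better treated as candidates for manual inspection than as proof of fraud.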

In another study, where I conducted several experiments over a period of time and measured participants’ financial literacy, I found that the average financial literacy of participants decreased significantly over time, potentially indicating a decline in the quality of MTurk participants.

While the fortress of the Amazon Mechanical Turks seems to have been captured, some important observations remain. First of all, the study by Kay (2024) shows that there is still a positive correlation of contradictory items on CloudResearch Connect. Also, this correlation differs significantly, and by a huge margin, from that in the MTurk data. This indicates that there are places that have not been subjugated by the forces of bots, farms, and maybe generative AI. Anecdotally, I hear that experiences with participant panels such as Qualtrics, CloudResearch Connect, and Prolific are better, which gives hope for current experimental research.

However, just as the Fall of Constantinople opened the gate for the Ottoman Turks to conquer parts of Europe, the fall of Amazon’s Mechanical Turk may open the gates for bots, farms, and the influence of generative AI to conquer other online participant pools. Therefore, researchers should keep a finger on the pulse of these pools and conduct sanity checks. It is also important for the experimental community to think of ways in which the influence of these factors can be addressed or mitigated.

Finally, it is important for researchers to consider the time-dependency of a sample. To draw a parallel to the fifteenth century, the Janissaries who conquered Constantinople may not have realized, while celebrating the Ottoman victories, that the Roman Empire was once the most powerful empire on earth. Similarly, researchers nowadays should pay attention to when MTurk data were collected, as the decline in MTurker quality seems to be a more recent phenomenon.

To defeat the Ottoman Turks in the Battle of Vienna (1683), the Holy Roman Empire of the German Nation (consisting of many territories, states, duchies, and autonomous regions and cities) and the Polish-Lithuanian Commonwealth had to join forces. Similarly, to defeat the bots, farms, and usage of generative AI in online participant pools, researchers should coordinate their efforts to come up with screening mechanisms to defend what is important to us. I am optimistic that although these bots and farms may have won the battle, experimental researchers will win the war.

References

Bentley, J. W. (2021). Improving the statistical power and reliability of research using Amazon Mechanical Turk. Accounting Horizons, 35(4), 45-62.

Dennis, S. A., Goodson, B. M., & Pearson, C. A. (2020). Online worker fraud and evolving threats to the integrity of MTurk data: A discussion of virtual private servers and the limitations of IP-based screening procedures. Behavioral Research in Accounting, 32(1), 119-134.

Farrell, A. M., Grenier, J. H., & Leiby, J. (2017). Scoundrels or stars? Theory and evidence on the quality of workers in online labor markets. The Accounting Review, 92(1), 93-114.

Kay, C. S. (2024). Extraverted introverts, cautious risk-takers, and selfless narcissists: A demonstration of why you can’t trust data collected on MTurk. Accessible via: (last accessed May 7, 2024).

Authors

Christian Peters

Assistant Professor in Accounting