No one has ever tried to calculate the cumulative time human beings have spent over the years identifying traffic lights, storefronts, and bridges to convince reCAPTCHA tools of their human nature.
I would bet that the human-hours spent on this activity add up to a significant amount of time, a testament to just how popular this method of bot detection is. Google's reCAPTCHA is a leading tool for detecting abusive traffic on websites, and most users encounter this mechanism multiple times a day, often without even realizing it, as the newest version is completely "invisible" to the end user.
In this article we will go over what is new in reCAPTCHA v3 for enterprise, and why the latest update still falls short of answering the needs of enterprise customers.
The latest addition to the reCAPTCHA family
V2 is still the most common reCAPTCHA method, consisting of an "I am not a robot" checkbox or an invisible reCAPTCHA badge. But this method is no longer foolproof against bots, as services are readily available that make it easy for bots to bypass these challenges. At the same time, the method adds friction for legitimate users.
While bots manage to bypass the captchas, legitimate users suffer from the added friction, causing conversion rates to drop. The result is a 'lose-lose' situation: a conversion drop on one hand and poor bot mitigation on the other. Mobile devices are a particular weak spot for reCAPTCHA V2, causing extremely high friction for mobile users.
To counteract these usability issues, in late 2018 Google launched a new version called reCAPTCHA v3, which requires no user interaction while offering superior bot protection. V3 dropped the captchas altogether and introduced a completely frictionless approach.
About a year later, reCAPTCHA v3 for enterprise was released. It adds enhancements specifically designed to protect enterprise businesses: detection with more granular scores, reason codes for risky events, and options for advanced customization.
reCAPTCHA enterprise: the good, the bad and the ugly
Like its predecessor, v3 for enterprise verifies whether an interaction is legitimate without the need for any user interaction. And just like v3, it interprets a variety of signals from the user and returns a score via a simple API call, instead of the boolean value of V2.
Based on the score, the webmaster can set thresholds for various actions: for instance, requiring additional authentication factors, sending a post to moderation, or throttling bots that may be scraping content (or performing other nefarious actions that we are all familiar with).
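A minimal sketch of how a backend might map scores to mitigations. The threshold values and action names below are illustrative assumptions, not Google defaults (apart from the documented 0.5 starting point):

```python
def action_for_score(score: float) -> str:
    """Map a reCAPTCHA v3 score (0.0 = very likely a bot,
    1.0 = very likely human) to a mitigation action.
    Thresholds and action names are illustrative only."""
    if score >= 0.7:
        return "allow"         # treat as a trusted interaction
    if score >= 0.3:
        return "step_up_auth"  # e.g. require a second authentication factor
    return "throttle"          # likely bot: rate-limit or block

# A mid-range score triggers additional verification rather than a hard block.
print(action_for_score(0.5))  # step_up_auth
```

In practice these cutoffs would be tuned per use case, which is exactly the difficulty discussed below.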
Below, we cover v3 for enterprise's value proposition: the improvements it shows and what is still lacking to properly solve enterprise challenges.
The good: improved accuracy, auto-tuning by labels and going beyond bots
V3 for enterprise starts with a baseline pre-trained model. The tool lets you label fraud cases, which it then uses as feedback to tune the model and adapt to the unique characteristics of your legitimate traffic on the one hand, and your fraud patterns on the other. This approach ensures better, continuously improving accuracy and coverage.
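The label feedback is sent back through the reCAPTCHA Enterprise REST API's assessment-annotation call. The sketch below only builds the request URL and JSON body; it deliberately omits authentication and the HTTP call itself, and the project/assessment IDs are placeholders:

```python
def build_annotation_request(project_id: str, assessment_id: str,
                             is_fraud: bool) -> tuple:
    """Build the URL and JSON body for reCAPTCHA Enterprise's
    assessments:annotate call, which feeds fraud/legitimate labels
    back into the model. Auth and transport are omitted here."""
    url = ("https://recaptchaenterprise.googleapis.com/v1/projects/"
           f"{project_id}/assessments/{assessment_id}:annotate")
    body = {"annotation": "FRAUDULENT" if is_fraud else "LEGITIMATE"}
    return url, body

# Example with placeholder IDs:
url, body = build_annotation_request("my-project", "abc123", is_fraud=True)
```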
There are indications in the API documentation that it is also opening up to cover not only bots but other attack vectors as well, which is a welcome development indeed (note: the type 'Automation' is the bots solution).
The bad: lack of scoring threshold
reCAPTCHA Enterprise analyzes user interactions and returns a score (1.0 is very likely a good interaction, 0.0 is very likely a bot). Based on the score, the webmaster can take variable action in the context of the site. However, it suffers from the same shortcoming as v3 in this regard: actually taking action based on its scores presents significant challenges.
This new version learns by seeing real traffic on your site, and for this reason scores observed in a staging environment, or soon after implementation, may differ from production. Additionally, scores without thresholds are useless. Google's development documentation states that webmasters can:
“...decide on thresholds by looking at your traffic in the admin console. By default, you can use a threshold of 0.5.”
Stating that 'a threshold should be implemented' is easier said than done. Actually calculating and adjusting thresholds for optimal accuracy is a complex, ongoing task that needs to be managed and maintained by data scientists; it is beyond the scope of expertise of a webmaster. What's more, using 0.5 as a default threshold doesn't provide proper accuracy.
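To illustrate why threshold selection is a data-science task rather than a one-line config, the sketch below sweeps thresholds over a small set of synthetic, made-up labeled scores and measures the trade-off between blocking humans and letting bots through:

```python
# Synthetic, made-up (score, is_bot) pairs standing in for labeled traffic.
labeled = [(0.9, False), (0.8, False), (0.6, False), (0.55, True),
           (0.5, False), (0.4, True), (0.3, True), (0.1, True)]

def error_rates(threshold: float):
    """Return (false_positive_rate, false_negative_rate) when all
    scores below `threshold` are blocked as bots."""
    humans = [s for s, bot in labeled if not bot]
    bots = [s for s, bot in labeled if bot]
    fpr = sum(s < threshold for s in humans) / len(humans)  # humans blocked
    fnr = sum(s >= threshold for s in bots) / len(bots)     # bots allowed
    return fpr, fnr

for t in (0.3, 0.5, 0.7):
    print(t, error_rates(t))
```

Even in this toy data, moving the cutoff from 0.5 to 0.7 stops the last bot but starts blocking half the humans; on real traffic, with shifting score distributions, that balance has to be re-evaluated continuously.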
The bottom line: generalizing the solution with a single threshold doesn't provide the accuracy required for efficient fraud detection. At the same time, there's no simple way to tune the system. The only way to improve the efficiency of reCAPTCHA enterprise bot detection is to train your own machine-learning model using the scores alongside other data points, an approach whose added complexity is prohibitive in most cases.
Threshold issues aside, the tool's detection and scoring are simply not accurate enough. A quick search for score-related issues will reveal many complaints about detection accuracy. Our own research team wrote a bot that, again and again, easily beat the 0.5 default threshold.
The ugly: limitations that have not been addressed
Despite significant moves in the right direction, several big gaps still need to be addressed, sooner rather than later:
- Support on mobile is severely lacking: the solution is supported on mobile web, but according to many complaints it delivers poor accuracy there.
- The lack of coverage for native apps. reCAPTCHA remains unsupported on iOS and only partially supported on Android. This shortcoming will push fraudsters to native apps, which are already the weakest link for fraud detection.
- The lack of support for different accuracy and precision levels. Flexible mitigation is a core requirement for fraud detection tools: webmasters must be able to adjust the settings, for example to block on high severity, challenge on lower severity, and adapt per use case or financial risk. The one-size-fits-all approach simply doesn't work, as each organization's needs and fraud patterns are different.
- The inability to provide positive indicators alongside negative indicators for risk and fraud. At SecuredTouch, we call these "safe sessions." Whitelisting specific users allows our customers to provide a better UX with less friction.
The future of fraud detection: flexibility and risk-based scoring
We've explored how the latest version of reCAPTCHA still lacks critical features such as mobile and native support, and how the limitations of its scoring threshold create severe accuracy issues. Flexibility and accuracy controls are lacking to the point of making the tool unusable for most enterprise customers.
Yet, it is undoubtedly a positive step that will help improve the overall security of eCommerce merchants’ digital channels. reCAPTCHA v3 for enterprise is seamless and invisible to the user, supports auto-tuning by labels, and is moving towards covering more use cases than just bots.
This aligns closely with the approach we take at SecuredTouch, where user experience is put front and center. We take this new update from Google as a vote of confidence that we are leading fraud detection technology in the right direction. Our strategy focuses on ensuring a friction-free customer journey for trusted users: by deploying Behavioral Biometrics with flexible machine-learning algorithms, we don't need to compromise on accuracy.