Constant Tracking: How All the Major Websites You Visit Record Your Every Keystroke and Mouse Movement
While internet users have come to expect some tracking by the websites we visit and the operating systems we use, we have yet to fully grasp how invasive this tracking has become. We also tend to focus on a few particular products more than others, but this tracking and keylogging is far more intrusive than we may like to believe.
Researchers from Princeton’s Center for Information Technology Policy (CITP) have recently started a new series titled “No Boundaries,” which examines how third-party scripts run on websites and track a user’s every keystroke. The first post in the series focuses on the exfiltration of personal data by session-replay scripts (via Motherboard). “You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make,” the researchers write.
“But lately, more and more sites use ‘session replay’ scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.”
Many of us have started to feel exactly this way, as if someone really is watching over our shoulder. A few years back the concern was mostly about Google, when users complained about seeing advertisements based on their Gmail conversations, but the practice is far more prevalent now. Some of the world’s most-visited sites run session replay scripts that track every move users make on their websites without their explicit permission.
It is considered the “expected” behavior. “Take yourself offline if you really don’t want anyone to follow you” has become the advice of the decade whenever you start talking about user security or privacy.
How session replay scripts fail to anonymize data
Since this collected data is often shared with publishers and is tied to a user’s real identity, those third-party companies can profile users based on data sent from multiple products or websites. Anonymity cannot “reasonably be expected,” the researchers warn, despite many websites’ continued promises that user data is anonymized.
Researchers found scripts from top session replay companies on 482 of Alexa’s top 50,000 sites and discovered several security vulnerabilities in how this user data is stored and redacted. “Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details and other personal information displayed on a page to leak to the third-party as part of the recording,” they write.
This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes.
Some of these session replay scripts also capture what users type when they start filling out a registration form but never submit it. One of them, from the software company FullStory, redacts credit card fields only if the “autocomplete” attribute is set to “cc-number.” If it is not, the script collects the complete credit card information.
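To see why a rule like this is fragile, consider a minimal sketch in Python. This is not FullStory’s code; the class, function, and field names below are hypothetical and only illustrate a redaction rule that depends on a single attribute. Any card field whose markup lacks autocomplete="cc-number" falls through unredacted.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Hypothetical illustration: records each <input> and whether an
    attribute-based rule would redact it."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Assumed rule: redact only when autocomplete is exactly "cc-number".
        autocomplete = (attributes.get("autocomplete") or "").strip().lower()
        self.fields.append(
            {"name": attributes.get("name"), "redacted": autocomplete == "cc-number"}
        )


def classify_fields(html):
    """Return one entry per <input> field found in the given HTML."""
    collector = FieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Two card-number fields; only the first carries the attribute.
FORM = """
<form>
  <input name="card_a" autocomplete="cc-number">
  <input name="card_b">
</form>
"""

for field in classify_fields(FORM):
    print(field["name"], "redacted" if field["redacted"] else "captured in full")
```

In this sketch the second field holds the same kind of data as the first, yet it is recorded because the site’s developers did not annotate it.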
These services can also undermine the protection of HTTPS-secured websites, because some of them deliver recorded sessions over unencrypted HTTP pages.
The researchers provide a specific example of how recording services can fail to protect the data they collect: “Once a session recording is complete, publishers can review it using a dashboard provided by the recording service. The publisher dashboards for Yandex, Hotjar, and Smartlook all deliver playbacks within an HTTP page, even for recordings which take place on HTTPS pages. This allows an active man-in-the-middle to inject a script into the playback page and extract all of the recording data. Worse yet, Yandex and Hotjar deliver the publisher page content over HTTP – data that was previously protected by HTTPS is now vulnerable to passive network surveillance.”
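The downgrade described here can be expressed as a simple check. The following Python sketch is only an illustration; the function name and the example.com-style URLs are hypothetical and do not correspond to any vendor’s real dashboard. It compares the scheme of the page that was recorded with the scheme of the page used to play the recording back.

```python
from urllib.parse import urlsplit


def playback_exposure(recorded_url, playback_url):
    """Classify how a recording is exposed when it is played back.

    Hypothetical helper: "downgraded" means data captured on an HTTPS
    page is later served over plain HTTP.
    """
    recorded_scheme = urlsplit(recorded_url).scheme.lower()
    playback_scheme = urlsplit(playback_url).scheme.lower()
    if recorded_scheme == "https" and playback_scheme == "http":
        return "downgraded"
    if playback_scheme == "https":
        return "encrypted"
    return "unencrypted"


print(playback_exposure("https://shop.example/checkout",
                        "http://dashboard.example/replay/1"))
```

A “downgraded” result corresponds to the case the researchers describe, where data that travelled over HTTPS during the visit becomes readable on the network during playback.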
Walgreens, one of FullStory’s clients, also had user data leaked to FullStory: despite heavy redaction, sensitive information, including medical conditions and prescriptions, was sent to the company. Walgreens has since said it will stop using session replay scripts. “We take the protection of our customers’ data very seriously and are investigating the claims made in the study that was published yesterday,” Walgreens said. “As we look into the concerns that were raised, and out of an abundance of caution, we have stopped sharing data with FullStory.”
While some companies have suggested they would stop using these invasive analytics tools, it is unlikely that websites and advertisers will stop pushing for them, since the tools let them record every keystroke, mouse movement, and scroll.
“I don’t think most users realize that when they interact with a website that their information about that visit is being shared with 40 to 100 third parties,” Ashkan Soltani, a security and privacy researcher, said. While many expect these companies to record only which pages a user visits, they are now capturing “not only that I visited that page, but also what content I submitted.”
Some ad blockers have started to block popular session replay scripts, as well. Here’s the complete list of websites profiled in the Princeton research along with the session replay companies they use (includes some big names like Lenovo, Grammarly, Udemy, Norton, Souq, Kaspersky, GoDaddy, Michael Kors, Microsoft, Samsung, and others).