Metadata privacy matters – and this VPN promises to help
Nym Technologies says it may have a solution for traffic analysis
Encrypt your communications and online activities: you've probably heard these privacy-boosting tips before. Metadata tends to be a rarer topic of conversation, yet it can reveal a lot about who you are and what you do on the internet – and not even the best VPN apps can do anything about it.
As Whitfield Diffie and Susan Landau noted in the 1998 book Privacy on the Line, "Traffic analysis, not cryptanalysis, is the backbone of communications intelligence." A few years later, former NSA General Counsel Stewart Baker would confirm this statement, as Wired reported: "Metadata absolutely tells you everything about somebody’s life. If you have enough metadata you don’t really need content."
Claudia Diaz has known about the extent of metadata surveillance for a long time. A data scientist, she earned a PhD in anonymity and privacy at the University of Leuven between 2000 and 2005. "More specifically, I worked on mix networks," she told me. Mix networks are a concept proposed by the cryptographer David Chaum in the 1980s as a way to cover up the little, yet important, traces we all leave scattered across our digital lives.
Now, nearly 20 years later, Diaz is the Chief Scientist at Nym Technologies, busy turning this academic concept into a consumer privacy tool.
NymVPN is the first piece of software to use the mix network infrastructure. It promises to be the "world's most private VPN" as it claims to do something competitors cannot – hide your metadata.
What is metadata?
Diaz explains metadata with the analogy of a letter in an envelope. A physical letter is sealed, protecting its contents throughout the journey and ensuring privacy between the sender and receiver.
This is what encryption does for your online communications and activities. When only the sender and the receiver can read the content, it is known as end-to-end encryption.
The message is only one part of the process, though. A letter is marked with the sender's name and needs another name and address – indicating its target destination. Then, the Post Office will record its weight, and a timestamp will be created when the letter departs and when it arrives at its destination. Put simply, "Metadata includes everything about the data that is not the content."
The most obvious examples are IP addresses, location, phone numbers, who you have spoken with, and when. Yet the data packets carrying your online activities leak metadata too: how many bytes they contain, the patterns in which they move, their timestamps, and so on.
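To make the envelope analogy concrete, here is a minimal Python sketch of what an on-path observer could log about encrypted packets without decrypting anything. The `ObservedPacket` type, the field names, and the sample addresses (taken from the ranges reserved for documentation) are all assumptions made for illustration; nothing here comes from Nym or from a real capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedPacket:
    """Hypothetical record of one packet as seen on the wire."""
    timestamp: float  # seconds since the observer started watching
    src_ip: str
    dst_ip: str
    size: int         # bytes on the wire
    payload: bytes    # encrypted content, opaque to the observer


def metadata_view(packets):
    """Everything the observer can log without reading the payload."""
    return [(p.timestamp, p.src_ip, p.dst_ip, p.size) for p in packets]


def summarize(packets):
    """Aggregate the metadata: volume, duration, and who was contacted."""
    times = [p.timestamp for p in packets]
    return {
        "total_bytes": sum(p.size for p in packets),
        "duration_s": max(times) - min(times),
        "peers": sorted({p.dst_ip for p in packets}),
    }


# Made-up sample traffic; the payload bytes stand in for ciphertext.
capture = [
    ObservedPacket(0.0, "192.0.2.10", "198.51.100.7", 120, b"\x8f\x02\xa1"),
    ObservedPacket(0.5, "192.0.2.10", "198.51.100.7", 1400, b"\x11\xc4\x9e"),
    ObservedPacket(2.0, "192.0.2.10", "203.0.113.5", 200, b"\x7a\x00\x3d"),
]

for row in metadata_view(capture):
    print(row)
print(summarize(capture))
```

The payload never appears in either view, yet the observer still learns who talked to whom, when, and how much.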
Claudia Diaz is an Associate Professor at the COSIC research group of the Department of Electrical Engineering (ESAT) at KU Leuven, where she leads the Privacy Technologies Team. Her research is focused on the design, analysis, and applications of technologies to protect online privacy and, in particular, technologies that keep metadata private to prevent traffic analysis, tracking, localization, profiling, and surveillance.
Following online traces
Internet service providers (ISPs), telecom firms, Wi-Fi routers: anything with access to the wire, or operating an intermediate step before you connect to the internet, can gather metadata.
The main problem with metadata is that, contrary to the content you share online, you don't have control over it. As Diaz says, it's like the difference between what you say and how you say it. You generally have control over the message you wish to convey: you choose your words and tone of voice. Controlling your body language is a lot harder. Yet these details can be recorded and analyzed to infer what you don't want to say.
Metadata speaks the language of computers: strings of code, letters, and numbers. This makes it even easier to look for patterns, aggregate, analyze, and extract intelligence. These details are so powerful, Diaz explained, that they enable snoopers to understand even encrypted content without breaking the encryption.
"Not only are these data exposed by default in the internet protocols," she told me. "If you don't want to expose it by default, you need to start doing very sophisticated things."
A VPN, short for virtual private network, is security software designed to boost your privacy when browsing the web – and it can hide some metadata. VPNs encrypt your internet connection and spoof your real IP address by rerouting your data in transit through one of their servers via an encrypted tunnel.
When you connect to a VPN, your IP and location (both pieces of metadata) are masked and the content of your communication encrypted. Yet, according to Diaz, even a normal VPN would not protect you against traffic analysis. "If it does not disrupt the pattern of packets back and forth, then this pattern is preserved," she said.
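A toy Python model can illustrate Diaz's point. It assumes a tunnel that delays every packet by the same amount and adds a fixed number of bytes of overhead; the `through_tunnel` function, the 30 ms latency, and the 60-byte overhead are illustrative assumptions, not measurements of any real VPN.

```python
def gaps(timestamps_ms):
    """Time between consecutive packets, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def through_tunnel(packets, latency_ms=30, overhead=60):
    """Toy tunnel: constant delay and fixed per-packet overhead (assumed values)."""
    return [(t + latency_ms, size + overhead) for t, size in packets]


# (timestamp in ms, size in bytes) as seen before the tunnel; made-up values.
entering = [(0, 120), (50, 1400), (60, 1400), (900, 200)]
leaving = through_tunnel(entering)

gaps_in = gaps([t for t, _ in entering])
gaps_out = gaps([t for t, _ in leaving])
size_shift = [out - inp for (_, inp), (_, out) in zip(entering, leaving)]

print(gaps_in, gaps_out)  # the timing pattern on each side of the tunnel
print(size_shift)         # how much each packet grew
```

In this model the gaps between packets are identical on both sides and every size shifts by the same constant, so an observer who sees both ends could match the two flows without touching the encryption.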
In some instances, even using the Tor browser – generally considered more secure than VPNs as it's fully decentralized and shields your internet connections in at least three layers of encryption – isn't enough.
"That's better," said Diaz. "But because Tor is still circuit-based, you might still be able to figure out the input and output of each of these intermediaries. By making delays and covering traffic, Mixnet raises the bar even more."
Building the Mixnet infrastructure
As Nym Technologies stated in a blog post to announce the launch of NymVPN free beta testing: "With advancements in AI-powered data analytics, data surveillance is becoming increasingly powerful. What is needed are sophisticated decentralized networks capable of confusing all the attempts to track us, not only today but in the future."
Confusing. This is the key word for understanding the concept of Mixnet. Building on Chaum's idea of mix networks, Chelsea Manning, whistleblower and security consultant at Nym Technologies, came up with the concept while in prison for disclosing classified documents to the non-profit media organization WikiLeaks.
The real revolution of Mixnet is in the way it processes different data packets entering the server. Not only do they go through five different servers, but they also get "shuffled like a deck of cards" and covered with "network noise" along the way.
As the image above illustrates, NymVPN's Mixnet approach employs several network strategies to confuse data surveillance efforts. These include data fragmentation, dummy data packets, timing delays, and data packet shuffling.
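The four strategies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Nym's implementation: the 32-byte packet size, the `fragment` and `mix` helpers, and the choice of exponentially distributed delays are all hypothetical, and the layered encryption a real mix network relies on is left out entirely.

```python
import random

PACKET_SIZE = 32  # bytes; hypothetical fixed size so all packets look alike


def fragment(message, size=PACKET_SIZE):
    """Split a message into equal-size chunks, zero-padding the last one."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [chunk.ljust(size, b"\x00") for chunk in chunks]


def mix(packets, n_dummies, mean_delay, rng):
    """Toy mix node: add dummies, delay each packet at random, release in delay order.

    Returns a list of (delay_seconds, packet) pairs sorted by release time.
    """
    # Dummy packets: random bytes of the same size as real packets.
    dummies = [
        bytes(rng.getrandbits(8) for _ in range(PACKET_SIZE))
        for _ in range(n_dummies)
    ]
    # Independent random delays (exponential distribution is an assumption here).
    scheduled = [(rng.expovariate(1 / mean_delay), p) for p in packets + dummies]
    # Releasing by delay rather than arrival order is what shuffles the packets.
    scheduled.sort(key=lambda item: item[0])
    return scheduled


rng = random.Random(7)  # seeded only so the demo is repeatable
real = fragment(b"meet me at the usual place at nine, bring the documents")
for delay, packet in mix(real, n_dummies=3, mean_delay=0.05, rng=rng):
    print(f"{delay:.3f}s", packet[:8])
```

Because each packet draws its own delay, the order in which packets leave the node no longer matches the order in which they arrived, and the dummies add traffic that carries no message at all.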
"If you have enough noise, it starts becoming very hard to find the [digital] fingerprint because it gets buried under a lot of crap," said Diaz. Browser fingerprinting is the practice of gathering and analyzing metadata about a user's browser, device, and operating system to create a unique identifier for each user.
NymVPN: not your usual VPN
NymVPN is supposed to be the first scalable real-world solution using the mix networks concept, counting over 700 Mixnet servers around the world at the time of writing.
When I first spoke with Nym Technologies CEO Harry Halpin, just after the company launched NymVPN in alpha in November last year, he told me that, while it may look like a VPN, the software the team is building is more of an anti-artificial intelligence machine.
He said: "AI models collect a lot of data by finding some patterns in the data. Our VPN does the reverse. We add fake traffic, we mix traffic up, and we scramble the pattern."
NymVPN offers two modes:
- Fast mode is best suited for everyday online activities such as messaging, casual browsing, and streaming. As the name suggests, it offers better connection speeds. The VPN service spoofs your IP address by rerouting traffic through a fully decentralized network running on two-hop servers and the speedy WireGuard protocol.
- Anonymous mode is the go-to for protecting highly sensitive activities – and what really promises to set NymVPN apart from the competition. The traffic not only gets rerouted via five different servers, but it also uses the Mixnet infrastructure to make it difficult for snoopers to intercept your metadata.
Another important point when thinking about Nym's infrastructure is that more people using the software will translate into stronger security for all. Users will hide in the crowd – the larger, the better.
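A rough back-of-the-envelope calculation shows why the crowd matters. The Python sketch below assumes the simplest possible case, in which an observer considers every user equally likely to be the sender; the function names are hypothetical, and real-world anonymity also depends on how distinguishable users' traffic is.

```python
import math


def guess_probability(crowd_size):
    """Chance of picking the right user at random from the crowd."""
    return 1 / crowd_size


def anonymity_bits(crowd_size):
    """Uncertainty about the sender, in bits, for a uniform crowd."""
    return math.log2(crowd_size)


for n in (4, 1024, 1_000_000):
    print(n, guess_probability(n), round(anonymity_bits(n), 1))
```

Under this assumption, each doubling of the crowd halves the observer's chance of guessing correctly and adds one bit of uncertainty.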
What's next?
As we have seen, Nym's Mixnet infrastructure promises a revolutionary way to give you more control over your metadata exposure. Injecting cover traffic and reshuffling data packets comes at the cost of performance, however. The more noise there is in the network, the slower the internet connection will be. This is why NymVPN operates on a fully decentralized infrastructure.
Halpin believes, in fact, that no-log VPN services – which supposedly never retain users' data – are just half of the solution. That's because some providers like Private Internet Access (PIA) and Mullvad may have proven their no-log policies against authorities' requests, but there have also been instances like the infamous HideMyAss! case in which companies have handed over users' data under court orders despite their no-log claims.
The idea behind Nym was born around 2015 from a European-level project focused on mix networks. Harry Halpin was involved in the project alongside Claudia Diaz. He decided to take up the challenge of finding a way to deploy this technology commercially, and NymVPN was born.
Nym now has more than 200 active decentralized nodes. The goal is to grow a community of professionals dedicated to running these independent nodes.
According to Diaz, the challenge now is to raise awareness among users about the risks of metadata surveillance and the need to use software like NymVPN.
She said: "Metadata is this abstract and unconscious thing. It's then so much harder to raise awareness about it with users. And even, if they have awareness, it's also challenging to tell them what to do about it. The key to improving the situation is to have highly usable technologies that do all the complicated jobs in the background."
That technology allegedly now exists – and is taking its first steps into the consumer world. While it's too early to say if it'll be able to live up to its promises, NymVPN certainly sounds like a big step toward a more private digital world.
Chiara is a multimedia journalist committed to covering stories to help promote the rights and denounce the abuses of the digital side of life, wherever cybersecurity, markets and politics tangle up. She mainly writes news, interviews and analysis on data privacy, online censorship, digital rights, cybercrime, and security software, with a special focus on VPNs, for TechRadar Pro, TechRadar and Tom’s Guide.