IP address truncation is not anonymization

URL: 00f.net
3 comments

TFA correctly points to (subnet-structure-preserving) encryption as the right way to anonymize IP addresses, although for some reason it calls it "IPCrypt" instead of "Crypto-PAn."

https://en.wikipedia.org/wiki/Crypto-PAn
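
For readers unfamiliar with the idea, here is a minimal sketch of prefix-preserving pseudonymization in the style Crypto-PAn describes: each output bit is the input bit XORed with a keyed pseudorandom bit derived from all preceding input bits. This is an illustration under stated assumptions, not the Crypto-PAn or IPCrypt reference code: HMAC-SHA256 is swapped in for the block cipher, and the function names and key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def prf_bit(key: bytes, prefix_bits: str) -> int:
    """Return one keyed pseudorandom bit for the input bits seen so far.

    Assumption: HMAC-SHA256 stands in for the block cipher used by the
    real scheme.
    """
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def pseudonymize_ipv4(key: bytes, address: str) -> str:
    """Map an IPv4 address to a pseudonym that preserves shared prefixes."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i, bit in enumerate(bits):
        # The flip for bit i depends only on bits 0..i-1, so two addresses
        # that share a k-bit prefix get the same first k output bits.
        flip = prf_bit(key, bits[:i])
        out.append(str(int(bit) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


if __name__ == "__main__":
    key = b"example-key-not-for-production"  # hypothetical key
    for addr in ("192.0.2.10", "192.0.2.200", "198.51.100.7"):
        print(addr, "->", pseudonymize_ipv4(key, addr))
```

The first two pseudonyms share their leading 24 bits, just as the inputs do. Anyone holding the key can invert the mapping bit by bit, which is the reversibility discussed in the replies.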

Anonymization is supposed to be irreversible. This scheme is reversible by whoever has the key. I don't really get the point of it.

Any stable hash can't truly anonymize IP addresses, because there is a finite number of outputs, all easily computable on ordinary machines.
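
To make the enumeration argument concrete, here is a small sketch that reverses an unkeyed hash by precomputing every address in a range. It assumes plain SHA-256 over the dotted-quad string and uses a /24 documentation range so it runs instantly; the full IPv4 space is the same loop over roughly 4.3 billion addresses. The function names are hypothetical.

```python
import hashlib
import ipaddress


def hash_ip(address: str) -> str:
    """Unkeyed, unsalted hash of an address string (the weak scheme)."""
    return hashlib.sha256(address.encode("ascii")).hexdigest()


def build_lookup(network: str) -> dict:
    """Precompute hash -> address for every address in the network."""
    return {hash_ip(str(ip)): str(ip) for ip in ipaddress.ip_network(network)}


lookup = build_lookup("203.0.113.0/24")
leaked = hash_ip("203.0.113.77")  # what a log or database would contain
print(lookup[leaked])  # prints 203.0.113.77
```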

Which is why we pepper and salt our hashes.

If you store a patient's blood type hashed, the problem is that there are only so many blood types. The same blood type will always have the same hash value, so attackers could (1) infer statistically which are which, (2) crack one and get the rest, and (3) group users even without cracking the hash.
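
A short sketch of that failure mode, assuming an unsalted SHA-256 and a made-up list of records: equal inputs produce equal hashes, so the hashes can be grouped and counted as they are, and a table of all eight blood types reverses every one of them.

```python
import hashlib
from collections import Counter

BLOOD_TYPES = ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]


def naive_hash(value: str) -> str:
    """Unsalted hash: the same input always yields the same output."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample data, for illustration only.
records = ["O+", "A+", "O+", "B+", "O+", "A+", "AB-"]
hashed = [naive_hash(r) for r in records]

# Grouping works without cracking anything: identical hashes cluster.
counts = Counter(hashed)

# Cracking is a lookup over the eight possible inputs.
table = {naive_hash(t): t for t in BLOOD_TYPES}
recovered = [table[h] for h in hashed]

print(len(counts), "distinct hashes among", len(hashed), "records")
print(recovered)
```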

That means we need to make the input values more complex by prefixing them with secrets from elsewhere.

If you have one secret (e.g. stored in an environment variable), that would be the pepper. Because it is the same for every value, pepper alone is not enough: identical inputs still hash identically. But because it is not stored next to the hashed values, it does make attacks harder.

A salt would be a per-value secret that is stored with each hashed blood type entry and prepended before hashing.

The two in combination make it much harder to get from the hashed value to the input value without having both salt and pepper.
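
A minimal sketch of the combination, under a few assumptions: the pepper is passed in by the caller (in practice it might come from an environment variable), the salt is 16 random bytes stored next to the digest, and HMAC-SHA256 keyed with the pepper is used in place of plain prefix concatenation. The function name is hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_with_salt_and_pepper(value: str, pepper: bytes, salt: bytes = None):
    """Return (salt, hex digest) for a value.

    pepper: one application-wide secret, kept away from the database.
    salt:   per-record random bytes, stored alongside the digest.
    """
    if salt is None:
        salt = secrets.token_bytes(16)
    # Assumption: HMAC keyed with the pepper replaces simple prefixing.
    digest = hmac.new(pepper, salt + value.encode("utf-8"), hashlib.sha256)
    return salt, digest.hexdigest()


pepper = b"example-pepper"  # hypothetical; load from configuration in practice
salt_a, digest_a = hash_with_salt_and_pepper("O+", pepper)
salt_b, digest_b = hash_with_salt_and_pepper("O+", pepper)
print(digest_a != digest_b)  # same blood type, different stored hashes
```

With only eight possible inputs, the salt alone adds little once the database leaks, since it sits next to the digest and each record can be brute-forced in eight guesses. The pepper is what an attacker is missing. The trade-off is that records with equal values can no longer be grouped by hash.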

This can be anonymization, if you throw away the key. If you keep it, it is worse than encryption, since now attackers can also differentiate subnets.

Can we get a tag for AI slop generated articles like this one?

If the author couldn't be bothered to write it, why would anyone think we should bother to read it?

Why do you feel this was generated by AI?

We would also truncate lat/lon coordinates.