
The question I have isn't why you need analytics but why you'd ever need any PII in the data. I don't care whether Bob clicked the button; I only care whether 1% or 50% of users click it, or whether those who clicked button A are likely to click button B, so the two should be placed closer together. Analytics should be anonymous usage statistics, not tracking of individuals. We are lumping two things together, where one is bad and the other is useful and mostly harmless to privacy.
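A minimal sketch of purely aggregate analytics, assuming a hypothetical event list that records only which button was clicked and a hypothetical page-view total. No user, cookie, or device identifier appears anywhere:

```python
from collections import Counter

# Hypothetical event stream: each entry records only which button was clicked,
# with no user, cookie, or device identifier attached.
events = ["A", "B", "A", "A", "C", "B", "A"]
page_views = 20  # hypothetical number of page views in the same period

clicks = Counter(events)

def click_rate(button: str) -> float:
    """Clicks on `button` per page view, computed from totals only."""
    return clicks[button] / page_views

for button in sorted(clicks):
    print(f"{button}: {click_rate(button):.0%}")
```

This answers "1% or 50%?" questions, but because no two events are linked to each other, it cannot tell whether the people who clicked A are the same people who clicked B.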


That’s the idea, but to know that an anonymous user who clicked button A goes on to click button B, you have to track that user via some kind of random ID that uniquely identifies their browser or device. This new Swedish ruling says that ID is itself PII.
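To make that concrete, here is a minimal sketch assuming a hypothetical click log keyed by a random ID (generated with Python's `uuid.uuid4`, standing in for a cookie value). The helper names `new_visitor_id` and `conditional_rate` are made up for illustration:

```python
import uuid

def new_visitor_id() -> str:
    # Random ID with no direct link to a name or account; a hypothetical
    # stand-in for the value a first-party analytics cookie might hold.
    return str(uuid.uuid4())

# Hypothetical click log of (visitor_id, button) pairs for three visitors.
v1, v2, v3 = new_visitor_id(), new_visitor_id(), new_visitor_id()
log = [(v1, "A"), (v1, "B"), (v2, "A"), (v3, "B")]

def conditional_rate(log, first: str, then: str) -> float:
    """Fraction of visitors who clicked `first` that also clicked `then` (order ignored)."""
    buttons_by_visitor: dict[str, set[str]] = {}
    for visitor, button in log:
        buttons_by_visitor.setdefault(visitor, set()).add(button)
    saw_first = [b for b in buttons_by_visitor.values() if first in b]
    if not saw_first:
        return 0.0
    return sum(then in b for b in saw_first) / len(saw_first)

print(conditional_rate(log, "A", "B"))  # 0.5
```

The ID carries no name, yet it is exactly what lets two clicks be attributed to the same visitor. This sketch ignores click order; a real funnel would also compare timestamps.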


How does the ruling come to that conclusion? How can an ID that uniquely identifies a user, but can't be traced back to a physical person, be PII?

Reading the linked rulings, it seems it's not the IP (even though only its last octet is blanked) but rather other cookie values, which may in turn be traceable to the user?
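Blanking the last octet can be sketched with Python's standard `ipaddress` module. This only illustrates the idea; it is not Google Analytics' own code, and `blank_last_octet` is a hypothetical helper name:

```python
import ipaddress

def blank_last_octet(ip: str) -> str:
    """Zero the last octet of an IPv4 address, e.g. 203.0.113.57 -> 203.0.113.0."""
    network = ipaddress.IPv4Network(f"{ip}/24", strict=False)
    return str(network.network_address)

print(blank_last_octet("203.0.113.57"))  # 203.0.113.0
# All 256 addresses in the /24 map to the same blanked value.
print(ipaddress.IPv4Network("203.0.113.0/24").num_addresses)  # 256
```

The blanked address still narrows a visitor down to one block of 256 addresses, which is why it can matter once it is combined with other values.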

Of course, if a cookie value is sent and that same value is stored next to a user's name in some other system, then the cookie value is definitely PII and can't be sent via GA. That much I understand.

The key passage from the longest ruling (DI) seems to be:

"These identifiers were created for the purpose of distinguishing individual visitors, such as the complainant. The unique identifiers therefore make the visitors to the Website identifiable. Even if such unique identifiers (per point 1 above) were not in themselves considered to make individuals identifiable, it must nevertheless be taken into account that in the present case these unique identifiers can be combined with further elements (per points 2–4 above), and that it is possible to draw conclusions with respect to information (per points 2–4 above) that make the data personal data, even though the IP address was not transferred in its entirety."

Basically: the random IDs aren't enough by themselves, nor is the IP, but the IDs together with the partial IP and something else are.
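The idea that individually weak identifiers become identifying in combination can be illustrated with a k-anonymity count: the size of the smallest group of records sharing the same values. The records and field choices below are hypothetical, and the ruling does not describe using this method:

```python
from collections import Counter

# Hypothetical records: (last-octet-blanked IP, browser) for visitors of one site.
records = [
    ("203.0.113.0", "Firefox"),
    ("203.0.113.0", "Chrome"),
    ("203.0.113.0", "Chrome"),
    ("198.51.100.0", "Firefox"),
    ("198.51.100.0", "Safari"),
]

def smallest_group(records, fields):
    """Size of the smallest set of records sharing the same values in `fields` (the k in k-anonymity)."""
    groups = Counter(tuple(r[i] for i in fields) for r in records)
    return min(groups.values())

print(smallest_group(records, [0]))     # 2: partial IP alone
print(smallest_group(records, [0, 1]))  # 1: partial IP + browser singles someone out
```

Neither field alone isolates anyone here, but together they leave groups of size 1.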

I don't know what the bottom line is, though, and that worries me a bit. Any analytics risks doing this. In my desktop app's analytics we blank IPs etc., but just storing some hardware data (amount of RAM, CPU frequency, Windows version, screen resolution...) means we eventually have enough entropy to say with certainty that each of our users has a unique set of parameters in the data we log. If you gather even basic hardware and OS info, it's almost impossible NOT to fingerprint perfectly. But there is of course zero possibility that we could use the data backwards and ask "which single physical person is it that has a 16-core machine and 16 GB of RAM?", so doesn't that make it "not PII"?
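One rough way to quantify that entropy is the empirical Shannon entropy of each field and of the combined tuple. With n records, log2(n) bits is the maximum, reached only when every record is unique. For scale, singling out one person among about 8 billion takes log2(8×10^9) ≈ 33 bits. The reports below are hypothetical, and with only four of them almost anything looks identifying; real estimates need far larger samples:

```python
import math
from collections import Counter

def entropy_bits(values) -> float:
    """Empirical Shannon entropy, in bits, of a list of hashable values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical hardware reports: (RAM in GB, CPU cores, Windows build, screen resolution)
reports = [
    (16, 8, "19045", "1920x1080"),
    (16, 16, "19045", "2560x1440"),
    (8, 4, "22631", "1920x1080"),
    (32, 16, "22631", "3840x2160"),
]

for i, name in enumerate(["ram", "cores", "build", "resolution"]):
    print(name, entropy_bits([r[i] for r in reports]))
print("combined", entropy_bits(reports), "max", math.log2(len(reports)))
```

Each field alone stays below the maximum, while the combination reaches it, meaning every report in this tiny sample is unique.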

I think the key issue in these GA cases is that it's more of a chain leading to actual PII. E.g. the cookie value that GA has access to can realistically be stored somewhere that also holds PII, such as an email address. And that's enough to violate the GDPR.
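A minimal sketch of that chain, assuming two hypothetical tables: analytics events keyed only by a random client ID, and a site's own account table that happens to store the same ID next to an email. The field names and the `reidentify` helper are made up and do not reflect GA's actual data format:

```python
# Hypothetical analytics export: events keyed only by a random client ID.
analytics_events = [
    {"client_id": "c-1a2b", "page": "/pricing"},
    {"client_id": "c-9f8e", "page": "/signup"},
]

# Hypothetical table kept by the site itself, e.g. written when a user logs in.
accounts = {"c-9f8e": "bob@example.com"}

def reidentify(events, accounts):
    """Attach an email to every event whose client ID also appears in the accounts table."""
    return [
        {**e, "email": accounts[e["client_id"]]}
        for e in events
        if e["client_id"] in accounts
    ]

print(reidentify(analytics_events, accounts))
```

The analytics table alone names nobody; a single lookup on the shared ID is all it takes to turn one of its rows into data about a named person.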



