Oracle’s BlueKai tracks people across the web – that data spilled online (techcrunch.com)
165 points by roldie on June 19, 2020 | 89 comments


An important feature of Firefox that isn't nearly as well known as it should be is first party isolation.

It can be a bit annoying due to e.g. reCAPTCHA, but it works fine 99% of the time. Some sites will break, but these are few and far between. I just use another browser for those.

It was upstreamed from Tor and scopes all data (cookies, caches, etc.) per first-party domain. It's not a panacea, but definitely a good step.

(about:config , privacy.firstparty.isolate)
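To make the scoping idea concrete, here is a minimal toy sketch, not Firefox's actual implementation. The class names and the `*.example` domains are made up for illustration; the only point is that isolated storage is keyed by the first-party site as well as the cookie's own domain.

```python
from collections import defaultdict


class SharedCookieJar:
    """Classic behaviour: one store per cookie domain, shared across all sites."""

    def __init__(self):
        self._store = defaultdict(dict)

    def set(self, first_party, cookie_domain, name, value):
        # first_party is ignored, so an embedded tracker sees one cookie everywhere.
        self._store[cookie_domain][name] = value

    def get(self, first_party, cookie_domain, name):
        return self._store[cookie_domain].get(name)


class IsolatedCookieJar:
    """Isolated behaviour: storage is keyed by (first party, cookie domain)."""

    def __init__(self):
        self._store = defaultdict(dict)

    def set(self, first_party, cookie_domain, name, value):
        self._store[(first_party, cookie_domain)][name] = value

    def get(self, first_party, cookie_domain, name):
        return self._store[(first_party, cookie_domain)].get(name)


def tracker_sees_same_id(jar):
    # tracker.example (hypothetical) sets an ID while embedded on news.example ...
    jar.set("news.example", "tracker.example", "uid", "abc123")
    # ... and tries to read it back while embedded on shop.example.
    return jar.get("shop.example", "tracker.example", "uid") == "abc123"
```

With the shared jar the tracker reads the same ID on both sites; with the isolated jar the second read comes back empty, which is also why third-party login flows can break.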

I feel like something like this should be the (standardized) behaviour of browsers. Sharing any kind of observable state between domains should require an explicit opt-in with a prompt.

Like: "Privacy Warning: techcrunch.com wants to share your data with facebook.com. Do you want to allow this?"

(such fine grained permissions are a complicated topic of course, but that's a longer discussion...)


Firefox's First Party Isolation can break some websites that might rely on third-party authentication. This may be acceptable for Tor users, but not regular users, which is why First Party Isolation is still off by default in Firefox. Mozilla is actively testing a variation called Dynamic First Party Isolation that will support those workflows.

For more details, here is the Firefox bug report:

https://bugzilla.mozilla.org/show_bug.cgi?id=1549587

To enable Dynamic First Party Isolation: in Firefox Nightly's about:preferences#privacy, select "Custom" Enhanced Tracking Protection and "Cookies" = "Cross-site and social media trackers, and isolate remaining cookies". Or just set `network.cookie.cookieBehavior` pref = 5 in about:config.


I just disabled third party cookies and everything works fine.


Disabling third-party cookies hasn't really been that useful in a long time though.

"When third-party cookies are allowed, 101 domains can reconstruct over 50% of a user's history and 161 could recover over 40%. Even when these cookies are blocked, 44 domains could recover over 40% of a user's history."[1]

[1] Study back in 2014: https://securehomes.esat.kuleuven.be/~gacar/persistent/the_w...

Add to that web beacons, ads, like buttons, browser fingerprinting via HTTP headers, JavaScript, canvas, plug-ins, add-ons or hardware, keystroke biometrics and gestures on smartphones. Of course, there are also several techniques for e-mail tracking as well as tracking of documents (Word files, PDFs, etc.).


Thankfully this became the default in Safari: https://webkit.org/blog/10218/full-third-party-cookie-blocki...

Previously it only enabled third-party cookies for domains you had explicitly visited. https://www.theverge.com/2017/9/14/16308138/apple-safari-11-...


Unfortunately, people are working out ways to get around this now that it's become a default.


Use a generic JS blocker. It protects you from workarounds and zero days.


It also blocks JS.


You say that like it's a bad thing?


Doesn't stop tracking pixels.


If you use umatrix and only allow 1st party cookies it should block those right? The pixel would load but without any cookie.


Tracking pixels don't need a cookie if they're using dynamically generated URLs.
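A minimal sketch of what that looks like, assuming a made-up tracker host (`tracker.example`) and an in-memory lookup table; real systems will differ. The identifier travels in the image URL itself, so no cookie is involved at all.

```python
import secrets
from urllib.parse import urlparse

# Hypothetical in-memory mapping from URL token to the user it was minted for.
_token_to_user = {}


def pixel_url_for(user_id, base="https://tracker.example/p"):
    """Mint a unique image URL for one user; the token lives in the path."""
    token = secrets.token_urlsafe(16)
    _token_to_user[token] = user_id
    return f"{base}/{token}.gif"


def identify_request(requested_url):
    """Server side: recover the user from the path alone, no Cookie header needed."""
    filename = urlparse(requested_url).path.rsplit("/", 1)[-1]
    token = filename[:-4] if filename.endswith(".gif") else filename
    return _token_to_user.get(token)
```

Blocking cookies does nothing here; only blocking the image request itself prevents the lookup.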


umatrix can block third-party images as easily as it blocks cookies.


Sharing would just be routed through cdn.firstparty.com


Thanks very much. Useful


I worked at oracle on the bluekai product (search 'oracle' in my comment history). I helped deliver data using bluekai from oracle to platforms like facebook and google. AMA.

If you are curious what data is housed in bluekai, here is a 170 page pdf with lists and descriptions of different data providers/vendors: http://www.oracle.com/us/solutions/cloud/data-directory-2810...


What is the best way to opt-out / avoid tracking like this? The Oracle opt-out page requires providing your physical address / email / name and then only sets a cookie in your single browser.


A good adblocker and using this tool on all of your devices: https://optout.networkadvertising.org/?c=1

It's difficult to actually opt out in full, but we will definitely get to that point in the future. Also, if you use niche hardware/software (like Netscape on an old Linux distro), your data will just get cleansed out. Nobody does fingerprinting unless you're a spy or in charge of purchasing for a hospital or something, so I wouldn't worry about that.


> Nobody does fingerprinting ... so I wouldn't worry about that.

I'm not sure how long you've been out, but this has changed a lot recently. Many adtech companies (not us) have responded to ITP by building out fingerprinting to continue personalization.

(Disclosure: I work on ads at Google)


Yeah that makes sense. As far as I was aware only the low-tier dmp/cross-device/measurement vendors did, but all the top tier dsps vetted their vendors and don't allow that.


I recommend ublock origin.

When I checked with the disclosure page (give me everything you have on me), after disabling ublock origin for that page, it said they had nothing on me. The truth of that statement is debatable, but it seems to be effective.


pfsense + pfBlockerNG


Is data of European citizens processed? The site says it's not, but how do they know? GeoIP?


From the article:

"One record detailed how a German man, whose name we’re withholding, used a prepaid debit card to place a €10 bet on an esports betting site on April 19. The record also contained the man’s address, phone number and email address."


A VPN based in Europe seems a good option if this is the way.


I don't know the current answer, but the vast majority of companies just turned off all operations to avoid a GDPR mess.

I don't know how it was turned off though. Likely ip.


> the vast majority of companies just turned off all operations to avoid a GDPR mess

Are you saying that regulation is effective? :)


Yeah, from the perspective that tons of companies stopped doing business in Europe


Can't think of a single company which did that and had any impact on me. I'm not able to read some news on a local foreign website once every few months and that's kind of everything.


Sure, as long as we're not insinuating that's an inherently undesirable thing.

We're quite literally on a thread involving a product that collected tonnes of sensitive user data without a clear opt-out mechanism while failing to be an effective custodian, after all.


Sounds like the next wave of startup opportunities to me. "Obeying The Law And Not Being An Asshole As A Service."


Do they store data forever, or does it eventually cycle out for cost reasons?


There is a metric called time to live (ttl) that cycles users out of an audience after x days. Technically they can live in bluekai forever, but pretty much all platforms default ttl to like 120 days and then advertisers pare it down even more.
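A rough sketch of the TTL idea, assuming membership is tracked as a last-seen timestamp per identifier. The class and method names are hypothetical, and whether re-qualifying resets the clock in the real platforms is an assumption here.

```python
from datetime import datetime, timedelta


class Audience:
    """Audience segment whose members expire ttl_days after they were last seen."""

    def __init__(self, ttl_days=120):
        self.ttl = timedelta(days=ttl_days)
        self._last_seen = {}

    def add(self, user_id, seen_at):
        # Assumption: seeing the user again simply overwrites the timestamp.
        self._last_seen[user_id] = seen_at

    def active_members(self, now):
        # Anyone not seen within the TTL window has cycled out of the audience.
        return {u for u, seen in self._last_seen.items() if now - seen <= self.ttl}
```

An advertiser paring the window down would just be constructing the audience with a smaller `ttl_days`.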

Oracle's data transfer/storage costs are meaningless since this business unit in oracle just exists to establish market share - they're not trying to make money yet. Other companies' data costs are what cause oracle to store/deliver less.


When I tell people I don't like tracking, the defense of all this nonsense is "People prefer getting relevant adverts". The campaign to normalize this is so good I often hear consumers repeating it.

But the inherent assumption is that the data is only used for benign ad targeting and ignores the possibility that the data will end up in the wrong hands like it has here. The response seems to be "{{ad tech company}} has some of the best developers in the world and they want to protect that data!". Except leaks like this are inevitable. So long as tracking is a thing, data will leak, and hostile actors will abuse it.


> The campaign to normalize this is so good I often hear consumers repeating it.

The campaign to normalize thinking of oneself as a consumer is so good, I often hear my fellow citizens replace the noun "people" with the noun "consumers".


Touché. But in my case only when I'm thinking about this topic.


> When I tell people I don't like tracking the defense of all this nonsense is "People prefer getting relevant adverts"

I then ask them, "Is it working? Do you really see advertisements online that mostly show you things that are meaningful to you? Are online ads more relevant to you than what you see on TV or hear on the radio?"

The honest always answer "No."


How many ads for hearing aids have you seen on TV? And how many ads for hearing aids have you seen online?

The honest answer is a lot, and none.


> How many ads for hearing aids have you seen on TV?

When I watch Redbull TV I see zero hearing aid adverts. That's about the only commercial TV I watch. Likewise when I more frequently watched commercial TV 15 years ago, I didn't see a ton. In the years since, the demographics of network TV viewership have shifted massively and the over 65 crowd is massively over-represented. As demographics switched advertising switched.

I suppose when I occasionally go to CNN or CNBC for news, I see hearing aid advertising, but the sites I generally frequent (should) know their readership better than blast us with that sort of advertising.

Which is ultimately the point: context-based advertising delivers 90% of the benefits of targeted/tracking-based advertising without the performance degradation, risks, and other nonsense.


That all makes sense, but you're missing the monetary aspect. If I know NYT readers are my target market, I want to advertise on the NYT. Except that's absurdly expensive, so I can just reach users I know read the NYT but are currently on other websites. Same audience, much cheaper.

There are a lot of caveats like this where price and business reasons play a role. If it weren't a $20B industry then your logic would probably ring more true.


Making something less expensive to advertisers is not my problem. People sucking up piles of personal information about me is my concern.

What you are missing here is that giving people the ability to advertise to New York Times users offsite takes income from the Times. If you read the NYT or you are the New York Times, this should be concerning. As a reader, you want the value of their brand to flow back to them so they can continue doing better work.


TL;DR: If you want to do any good to a newspaper, buy a subscription.

Advertising hasn’t been enough since the late 2000s. More monetization options arose and were rapidly adopted because news venues were hungry for them once they figured out that reservation (buying ads on a specific website with no control over delivery and performance) was declining. The creepiness of all this, and the sense of entitlement of a large part of the audience, fueled the rise of Adblock, which required even more creepiness and gave birth to the present cat-and-mouse game.


> If you want to do any good to a newspaper, buy a subscription.

My point wasn't that you shouldn't subscribe to the content you want—you should. It was that I have zero interest in sacrificing anything to lower advertising costs for some random company. Nor do I care to increase Google/ Facebook's earnings potential because they spy on me.


I see more advertisements that are relevant to me than I would otherwise. Often they are just off the mark, though; often I see advertisements for products similar to one I just bought. But most people only need one lawn mower, one circular saw, etc.

Overall, I don't see it as being remotely worth the trade-off.


> data will leak

It doesn't even have to leak. HR departments and shady government agencies could just buy it (or legally seize it). PR firms could dig through protester history and smear them so very easily.


> "People prefer getting relevant adverts"

that's a script.


> But BlueKai also uses more covert tactics like allowing websites to embed invisible pixel-sized images to collect information about you as soon as you open the page — hardware, operating system, browser and any information about the network connection.

View-source

Ctrl-f "pixel"

:-)

This seems really odd to include, given Techcrunch's use of pixels, and whatever their own covert pixel tactics may be...invisible pixel images are part and parcel when you start talking about pixels in general. (Edit: The author seems to have addressed this)

Also, is that _Kai_ as in the Japanese word for ocean? Or is it _Kai_ as in meeting?

Perhaps the entire word is a Japanese derivative, with buruukai meant to connote a gathering of nervous, fearful Oracle executives? Not the most auspicious reading...


To be fair, the author does touch on this, giving the fact that this very article has a Bluekai tracker as an example of its prevalence.


Thank you, I missed that on my first pass.


I know the founder of BlueKai. The name refers to "blue ocean" as in the blue ocean strategy of business where you enter or create an entirely new market rather than competing with everyone else in a saturated one (which is called red ocean).


> This seems really odd to include, given Techcrunch's use of pixels

So because TechCrunch does something bad, that makes Oracle doing the same thing not bad?

Yes, it's hypocrisy at the macro level. But it's not an excuse for either party.


This "commentary" is really common on HN. Perhaps, to add "integrity" for cynical HN readers, articles critical of tracking and advertising should be published on pages that do not include the means to access trackers or ad servers.

Not sure how that changes the effect of the article. One would think the content of the article is what matters.

Perhaps this is just another example of shooting the messenger.


My guess is that there was an Oracle MBA type who thought they were being very clever by not saying "Blue Ocean," but "Blue Kai."

https://www.blueoceanstrategy.com/what-is-blue-ocean-strateg...


Oracle bought BlueKai in 2014.


That's useless. If you have javascript, just make the request directly, why pixel? The facebook pixel is slightly above that and is done right.


pixels work with javascript disabled...


Oh, BlueKai is only the tip of the iceberg. I used to work for a spammer (that called itself an "incubator") that used BlueKai among other better sources.

3rd party cookie impressions can be sent to a company like LiveRamp to identify what websites users have visited (in theory only in buckets, but that's trivial to bypass). With the exception of the very largest websites, you could see traffic to many, many websites and tie it back to an individual user (even if their IP changed, etc.).

Companies like FullContact and People Data Labs offer the ability to take the few pieces of information you have on a user and get their other info (e.g., verify an email using name and address, or get social media profiles).

The problem is that no matter how good you are about using VPNs, clearing cookies, etc., the weak link is the websites you're forced to give some info to. Many of them will sell you out even if their TOS say otherwise.


> I used to work for a spammer (that called itself an "incubator")

CogoLabs?


small world ;)


I was there about 10 years ago, and wrote a bunch of the initial email stuff. At the time, we sent email only for things that people explicitly signed up for (Groupon competitor), but...


Yeah, that changed completely haha.


"Many of them will sell you out even if their TOS say otherwise."

Can you prove that?


If you want to check out what bluekai knows about your browser: https://datacloudoptout.oracle.com/registry/


It should be known that this tool is purposefully never updated and knowingly inaccurate.


It's probably more accurate to check the leaked data than this tool >_<


If you click through to request your offline data, it appears that their regex for validating the date of birth is faulty, so the identity check fails and you have to start a letter campaign to maybe get access to the data they hold about you.

  (0[1-9]|1[0-2])/(0[1-9]|1d|2d|3[01])/(19|20)\d{2}
If the day you were born is between 10-29 it will not validate as a proper day.
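A quick way to check this, taking the pattern exactly as quoted above. The `ASSUMED_FIX` pattern is only a guess at what was intended (restoring `\d`), and whether the site matches the whole string is also an assumption.

```python
import re

# Pattern as quoted: "1d" and "2d" match a literal letter "d",
# where "1\d" and "2\d" were presumably intended.
QUOTED = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|1d|2d|3[01])/(19|20)\d{2}")

# Assumed fix: restore the backslashes so the second day digit can be 0-9.
ASSUMED_FIX = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|1\d|2\d|3[01])/(19|20)\d{2}")


def is_valid(pattern, dob):
    """Return True if the whole string matches the MM/DD/YYYY pattern."""
    return pattern.fullmatch(dob) is not None


def rejected_days(pattern, month="06", year="1985"):
    """List the days of the month (as ints) that the pattern refuses."""
    return [d for d in range(1, 32) if not is_valid(pattern, f"{month}/{d:02d}/{year}")]
```

Running `rejected_days(QUOTED)` lists exactly the days 10 through 29, while `rejected_days(ASSUMED_FIX)` is empty. Note that neither pattern checks month lengths, so something like 02/31 would still pass.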


Ha ha! I disabled ublock origin, reloaded and it told me I have nothing.


Multiple Families / Single Family / Married / Single, and I apparently live in 3 different states. It did get the "Chrome" part right, though.


What kind of information do others see? I am currently getting: "No data available for this browser."


For me it just keeps spinning at "loading your data" with a bunch of CORS errors in the developer console.


Now that you have provided coordinates, you have to wait for the satellite to pass over for the data to populate.


I have a paid Chrome extension and recently implemented an uninstall survey to see why people are leaving. For people who indicate that cost is an issue, we explain that many "free" products are either supported by advertisements or by selling user data. We then ask people whether, if we offered such a "free" option, they would choose it over our paid option (which costs less than $2/mo).

Almost all of the respondents indicate that they would choose the "free" version. I was surprised that the feedback was this one-sided, especially after making the tradeoff salient (and especially for a browser plugin that sees every page you visit). Apparently most people just don't care about their privacy.


There are some very important distinctions there.

Maybe you said your free version would use targeted ads - which I'm fine with personally.

It's selling data or being dishonest about how the data is used that I have a problem with, so maybe the problem is that your users trust you so they are comfortable with you having and using their data.


How, exactly, are the ads supposed to be targeted if data isn't being sold? How is Fiat supposed to learn that you are interested in buying a car, if there's no tracking and sale of your browsing history? Not saying that being tracked is good. Just that saying "I'm ok with targeted ads, but not ok with tracking" doesn't make a lot of sense.


Ad personalization can be handled by the ad network. The network tracks that a user is interested in buying a car, and Fiat can request that its ads be shown to people interested in buying a car. Fiat only learns that you are in the "people interested in buying a car" category if they win the bidding to show you an ad. Not great, since showing an ad, especially well out of the viewport, is pretty cheap.

We can take this one step further with the proposed TURTLEDOVE browser API (https://github.com/michaelkleber/turtledove), at which point the ad network doesn't learn your interests and Fiat only learns you were interested in buying a car if you click on an ad.

Or with the proposed FLoC browser API (https://github.com/jkarlin/floc) where your browser uses on-device clustering to present an interest category instead of the ad network learning interests server-side.
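As a toy illustration of on-device clustering (not the proposal's specified algorithm), here is a sketch that hashes a set of visited domains into a small cohort number locally. Using SimHash, the 8-bit width, and the function names are all assumptions made for the example.

```python
import hashlib


def _domain_bits(domain, width):
    """Stable pseudo-random bit vector derived from one domain name."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return [(value >> i) & 1 for i in range(width)]


def cohort_id(visited_domains, width=8):
    """SimHash over the set of visited domains, computed entirely on the device."""
    totals = [0] * width
    for domain in set(visited_domains):
        for i, bit in enumerate(_domain_bits(domain, width)):
            totals[i] += 1 if bit else -1
    # Each output bit is a majority vote across all visited domains.
    return sum(1 << i for i, t in enumerate(totals) if t > 0)
```

Only the small integer would leave the browser in this sketch; the browsing history that produced it stays local, and similar histories tend to land in the same or nearby cohorts.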

(Disclosure: I work on ads at Google, and am friends with the folks behind these proposals)


In my survey, I didn't say anything about the type of ads or how we would use the data. We are very trustworthy, and part of that is that we don't monetize user data in any way. But of course if we created a free tier that was funded by monetizing user data, this would no longer be the case!


Privacy has no "cost" whereas paying $2 right now is a very understandable cost.

Also, it's not $2 but $50, for users wary of automatically extending monthly subscriptions.


This is a small echo chamber after all. The billions willingly share all their “private” stuff on FB and Google, which are concentrating orders of magnitude more data than any legit or fraudulent data company on the planet, and putting them out of business in the process.


also, your respondents might be a self-selecting population.


I think a lot of this comes down to the fact that so many people are tracking where you go online that there's not much point in trying to stop it.


The wide range of data the article hints at and the guesses at connecting data to users suggest that this data is probably full of inaccuracies. That is one of the really scary things: companies are making decisions about me based on outright wrong data.

This can be brutal for applications like credit worthiness, but I'm still worried about mistakes in data used for more mundane decisions like who to offer a discount to or which passenger to bump to business class.


This type of data isn't really used for business decisions, since an audience like "good credit score" (highly highly regulated and almost never used, fyi) contains like 150MM unique identifiers (cookies, device ids, etc).

Everyone using this data knows it contains inaccuracies, but it is much much better than not having any data at all.


Through no fault of the author, he is confusing how bluekai operates. He talks a lot about the sanitized huge anonymous audience (like in-market for cars) but then skips to user-specific data (Steve who spent $10 gambling on 4/19.)

These two things, 3rd party and 1st party data, are very different. Stored differently, different user access, different rules, etc. If Oracle accidentally made all their clients' 1st party data public they'd get sued out of existence.


Yes, although at this point I expect any adtech articles to be 90% inaccurate.

The real issue is the poor security practices of this company, especially under Oracle's watch, and how much data is shared by partners. BlueKai can't get personal details unless someone shares it with them.


Agreed. Oracle sends their dataset to like 200 different partners and it’s in a constant state of being updated, so it’s not surprising one of those was misconfigured.


> If Oracle accidentally made all their clients' 1st party data public they'd get sued out of existence.

Historically speaking, I don't know any entity that has been sued out of existence for these kinds of things. Usually after the media cycle everything is back to normal.


> There’s no big conspiracy. Ad tech can be creepily accurate.

> Tech giant Oracle is one of a few companies in Silicon Valley that has near-perfected the art of tracking people across the internet. The company has spent a decade and billions of dollars buying startups to build its very own panopticon of users’ web browsing data.

Doesn't that exactly describe a big conspiracy?


as if we need more reasons to hate Oracle


Anyone know where to find the dataset?



