
Collecting the rhetorical BS:

"scraping attacks"

Scraping is not an attack. Monopolists want to pretend they own your data because they get unlimited access to monetize it whereas competitors should have none.

"self-compromised"

Monopolists want to sell you, so it's imperative they maintain the fiction of "one person, one account". By admitting you own your account, they'd have to allow sharing and they wouldn't be able to provide their customers (advertisers) with reliable data about individuals.

"protect people from scraping"

Monopolists will protect themselves and call it protecting you. They will attempt to make you afraid of some other actor using your data in harmful ways so as to distract from how they monetize you and use your data in harmful ways.

"deter the abuse"

Monopolists don't want to argue about what constitutes abuse. Anything they write in their TOS is entirely for their benefit and only constrained by local law (if that). They will abuse you to the fullest extent they can get away with while arguing that any action to use your rights is "abuse."

"safeguard people against clone sites"

Monopolists want to maintain their monopoly; there is no greater threat than a direct challenge to that monopoly by allowing data to move freely.

--

More subtle but even more ironic rhetorical points

"for hire" / "paying for access"

Emphasizing that people making money (gasp) for providing this service is bad.

"industry leader in taking legal action" + "across many platforms and national boundaries, also requires a collective effort from platforms, policymakers and civil society"

Monopolists can pay high priced marketers to rebrand them as patriotic hero figures fighting valiantly for the little guy.



While I agree with your assessment of the BS in the article wrt scraping, and also agree with your assessment that the behaviour is completely about FB protecting itself and its monopoly control (the word control being important), I think it's important to emphasize it's not about FB caring whether other entities have access to the data; it's about FB caring about its public perception with regard to its having that data at all.

Over the last few years or so it feels like, to reference a @dril tweet[1], Facebook has just been 'turning a big dial taht says "data access" on it and constantly looking back at the audience for approval like a contestant on the price is right' with how much it allows 3rd parties to get at its data.

Keep in mind ~5 years ago the big thing at FB was "Open Graph" and "Graph Search" which gave everyone really in-depth access to their data with the idea that Facebook would be the "data platform" on top of which all of these 3rd parties would build apps and interfaces. This of course eventually resulted in the whole Cambridge Analytica thing and now this gigantic swing in the other direction of being overly protective of the data as a kneejerk PR reaction to all the bad press.

FB loved sharing data and provided a direct API for accessing it when the public narrative was about data freedom and 3rd party developer friendliness, and it hates giving any access at all and goes around suing web scrapers now that the public narrative is all about privacy.

Facebook will happily align itself in whatever way results in the least public outcry arguing they shouldn't be allowed to have the data in the first place regardless of if that means giving access or restricting it.

1: https://twitter.com/dril/status/841892608788041732


The example you stated is a truly fantastic one. Graph Search was pretty much like a direct API into their front facing network.


Great post that summarizes exactly what I feel about globocorps. The euphemisms and propaganda are disgusting.


The users agreed to share their data with Facebook, not some other company. If they didn't prevent this, they'd be asking for another Cambridge Analytica.


The users agreed to share their data with everyone that uses Instagram. Because that's how the site works.


There’s an important difference between technically consenting and informed consent.

Given what I know about the bot problem on Instagram, I would imagine many people have been tricked into sharing their private profiles with scraping bots. Many bots are copying real people’s profiles and then spamming their friends with follow requests. It’s highly effective and gives these bots access to private profiles.

Fooling people is fraudulent, period.


The user agreed on Facebook to make his data "public", so he can't complain that a robot scrapes it.

Nothing prevents him from restricting access to his pages and data to "trusted" friends.


The description in the article sounds like it scrapes private profile data.

> Octopus designed the software to scrape data accessible to the user when logged into their accounts


Were they showing the private data to everyone, or just to the person whose account was used for the scraping? If it’s the latter, then this is also not a crime, it is just someone accessing data they have been authorized to access, but in an automated way.


I don't think so; it is more like you scrape what is accessible to this user. So in the end you will scrape your friends' data. This is why I said that you are free to only share with friends that 'you trust'.


That is a very good point, but surely it was taken into consideration when scraping was declared legal?


All that case says is "scraping is not a violation of the CFAA". But of course the scraped data still exists in legal limbo; maybe you can compute derived information from it, but the moment a scraper reproduces it there is all of copyright law waiting for them.


In that case, the user owns the copyright, not the company, as the user is the author. So it would be up to them to take legal action if deemed necessary.



The only argument I have here (sadly in favor of FB) is with "safeguard people against clone sites". While I did give my data to FB, I didn't approve that transfer to another site/system. That is the only place I could possibly see some legal foothold.


What happens when FB builds a shadow instagram profile of you based on your FB account? That already happens. FB clones their own data for other projects no different than what you might fear happening if this data were cloned to a third party. The cat is out of the bag already but FB wants to pretend they are the only ones with the right to abuse.


It's impossible to control information once it's been created. The longer it's existed and the more locations you can see it in, the more likely it is to spread, exponentially so.

Whether we make that spread of information legal or not does little to affect whether it happens.

There are two things that might help. First, don't share as much information. Once it's no longer limited to you or your close group of friends, who hopefully won't share it along with your name, it's mostly out of your control. Second, put limits (laws) on what information companies are able to synthesize about you, and how long they can retain it. If there's less information created about you (or it's ephemeral, created and destroyed as needed), and if they need to clean out older data, there's less to be shared or stolen.


“It’s hard to enforce the rule of law” is not a good reason to abandon it entirely. Data privacy laws make data privacy better even without being 100% infallible.

We should be both practicing good data hygiene and using legal tools to combat those who abuse data privacy.


> “It’s hard to enforce the rule of law” is not a good reason to abandon it entirely.

I didn't?

> We should be both practicing good data hygiene and using legal tools to combat those who abuse data privacy.

That's what I said. The first thing is data hygiene, the second is legal requirements. The difference I think is that the legal requirements should be on the actual creation and retention of the data, not just who owns it, who it can be shared with, etc.

As soon as PII over a certain age is radioactive and linked to a fine per person, all of a sudden there'll be a lot fewer giant repositories of PII to worry about.


They also toss in the Chinese affiliation in hopes of bringing even more ill will from the reader towards the company. China is probably doing some bad things, but scraping Facebook ain’t one of them.


Scraping social media is something that China is very notorious for doing. They are 100% positively scraping all major social networks around the world.

They do this to collect information of foreign policy interest to them, to silence political dissidents abroad, etc.

For example: https://www.washingtonpost.com/national-security/china-harve...

And: https://www.propublica.org/article/even-on-us-campuses-china...


Good point, I missed that one.


I don't get the thing about "monopoly".

Let's start with one thing: copyright on databases. Take IMDb: they collect and combine totally open data on movie casts, crews, soundtracks used and so on. Anyone can go to the cinema, wait until the movie ends, write down data from the credits roll and put it in a database. There's no prohibition on this activity. A cinema may prohibit filming inside, but not using pencil on paper. Or you may buy a DVD released later, and do just the same. Or you may even email a movie company asking for that data in electronic form, and chances are they will send it to you or point to some promo materials website where it is already published.

But the entire database is a product of work, and that makes it valuable. So the company or organization spent time and money collecting, indexing and cross-linking that data, and has a right to bank on that work. Simply copying that database for commercial purposes _is_ stealing. This is why we have database copyright laws.

Now back to Meta. They created this product and made it attractive enough that people add their data voluntarily. Every single piece of data is quite open (maybe not really so for personal bits like face photos, emails and phone numbers). Meta spent a lot of cash making and keeping the product that attractive, and now banks on the collected data by targeting ads.

Nothing in the world prohibits anyone else from creating a service, making it valuable, attracting people, collecting data (according to data collection laws) and banking on that. But just copying data collected by Meta is stealing, and Meta is within its rights to protect it. The fact that Meta did it first doesn't make it a monopolist. In fact, there are lots of companies doing the same, like Google, Amazon, Apple, eBay etc. So in my opinion it is not a monopoly defending its position, but rather a business defending its assets from theft.


Missed this one:

> a US subsidiary of a "Chinese national" "high-tech" enterprise

Replacing it with "a business" would do just fine.


Indeed. It's the height of hypocrisy for a company to define the borders of its own system and then prosecute those who they consider in violation of them. There is no consideration given to whether the data should have been collected and retained by Facebook in the first place, regardless of whatever arbitrary access policies they defined to fit their own business and data model.

It's not clear what Facebook's position on scraping truly is. Sometimes they downplay it as "normalized and widespread," and other times they castigate it as inexplicably legal and clearly immoral, or even outright "in violation of state and federal law." For example:

- April 2021. Researchers find an exposed database containing the scraped data of 533 million facebook users. Some news reports refer to it as a "breach." Facebook attempts to downplay the issue as the result of third party scraping. Headline in ZDNet: "Internal Facebook email reveals intent to frame data scraping as ‘normalized, broad industry issue’" [0]

- October 2020. Facebook announces lawsuits against companies it claimed created a "malicious extension on Google’s Chrome Web Store designed to scrape Facebook, in violation of Facebook’s Terms and Policies and state and federal law." [1]

So... which is it? Does Facebook believe that scraping is a "broad, normalized industry issue?" Or is it a violation of "state and federal law?" It seems like they measure severity of its impact primarily based on the reactions of political commentators.

And what's the difference between automating a browser and automating an API client? Why did Facebook design an API for accessing the data they collected, if it's illegal to collect? They've even claimed to be the victim of Cambridge Analytica, who purchased a "quiz" application created by a developer who pieced it together using code straight from the "examples" section of Facebook's API documentation.

There is one obvious resolution to this apparent contradiction. If we remove Facebook from the question, then the contradiction resolves itself. All we need to do is stop presuming that Facebook has the right to collect and retain this data in the first place. And as a user, if you publish your data to a website designed for sharing it with other people, then by definition it is no longer private data. Therein lies the central question: what is "semi-private" data, and who controls its boundaries?

[0] https://www.zdnet.com/article/facebook-internal-email-reveal...

[1] https://about.fb.com/news/2020/10/taking-legal-action-agains...

p.s. another thing they never mention is why companies want to scrape lists of facebook users. perhaps it might have something to do with the "lookalike audience" feature, and its more precisely targetable predecessors, which allow advertisers to upload a list of usernames and email addresses for targeted advertising?


[flagged]


I've reread the previous comment and I really don't see where there is any justification stated for acting in an unethical manner. While Facebook may be making an argument against unethical behavior by a few, using the language they do is detrimental to legitimate uses of crawling content available on the Web.

Corporations, by nature, work in a way that individuals at those companies don't. They are literally "non-corporal" entities and work toward increasing profit and stakeholder value, not improving the lives or situations of their users, unless that happens to correspond to making them more money.

We should all be wary of corporate control and claim to rights built from their user base, especially if those services are offered for "free".


> We should all be wary of corporate control and claim to rights built from their user base, especially if those services are offered for "free".

That's fine then. And I agree with you. But I'll leave you with this.

Do. Not. Give. The. Company. Your. Data.

> They are literally "non-corporal" entities and work toward increasing profit and stakeholder value

Again, I agree. But if you think this is a bad thing, then you don't believe in capitalism, and I'm not quite sure what the intention is to argue this point on a platform (HN) that encourages the most basic forms of capitalism - starting up companies with innovative technology and solutions.


What a pretty picture capitalism is. Break out the popcorn for the latest regular installment of “ok for me but not for thee”:

People You May Know employs tons of shady stuff Facebook doesn’t reveal, and it saved their bacon early on from stagnating at around 100M users.

https://mashable.com/article/people-you-may-know-facebook-cr...

Facebook Beacon and others had a big outcry. They got hauled into Congress multiple times. And of course whenever they get caught, they always throw a “mea culpa” and do it all over again in a year under a different name. Here they are recording faces of their users secretly using camera permissions!!

https://www.independent.co.uk/tech/facebook-app-recording-ca...

Their entire business model is “Give us all your data for free.” Mark Z early on was flabbergasted himself when he realized he no longer needed to scrape sites on Harvard’s house websites and could just ask people to submit the data for each other: “They ‘trust me’, dumb fucks.”

https://www.esquire.com/uk/latest-news/a19490586/mark-zucker...

Proceeds to build entire business on this data…

BUT THEN. Someone else does it to them and they get mad. “You can’t scrape us!” LinkedIn tried this:

https://www.zdnet.com/google-amp/article/court-rules-that-da...

And it’s not like capitalist enterprises even try to be consistent in their legal complaints:

https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-hypo...


The problem isn’t “capitalism”, it’s crony-capitalism enabled by certain elements of state complicity.


Okay, is there a single problem with capitalism, or is it perfect? The problem is never with capitalism?


Yeah, nothing says "commie" like trust busting and keeping markets competitive.


Cough, cough, Google, cough, cough…

I’m not ashamed to admit that I’ve done some jQuery shenanigans on my Facebook friends page to “export” my friend list so I can retake control of my friend relationships (disintermediation for the in-crowd).

So easy to push data in to Facebook, so hard to get even basic data out of it.


In my opinion, breaking a click-through license agreement or violating the small print on some dense and difficult to read web page is hardly an issue of morality or ethics.

Let's also remember that a big reason Meta is hating on scraping is because of their own problematic behavior. It wasn't so long ago that they were suing NYU over research on political ads and how Facebook targets their readers.[0] In fact, it wouldn't surprise me if Meta's larger goal is to prevent this sort of research.

[0]: https://news.bloomberglaw.com/privacy-and-data-security/face...


Google search's business model is scraping the web, indexing it, and then pasting ads all over their search results made up of other people's content. If Google can build a business on third-party data then these Meta scrapers can do the same thing.

It is like saying a photographer can't photograph a building from the street because she doesn't own it. The building is there; taking a picture takes nothing from the building. That is all that is going on here: repeating publicly available information.


No, it's more like you subletting an apartment to a dodgy photographer who wants to take pictures of the children's playground your back window looks out on, even though your contract explicitly forbids subletting. The suit is against companies that use login credentials that are not theirs. It is not public information that is being scraped. It is information behind a login with a terms of service for what you are allowed to do with that login.


> the vast majority of Web scraping efforts are to build businesses on top of other organizations hard work and innovation.

Not really. Scraping just gets data, not code, so it's hard to support this argument. The anti-scraping view is that the right to use the data rests with the company that collected it, but I don't think that view is held by most people.


If you are arguing that an organization's data is worthless but only their code has worth, then I'm not quite sure where to go from this point in this discussion, other than to say that is crazy.


The data is obviously valuable, but they don't necessarily deserve a monopoly on that data, since that data primarily belongs to the users who created the data; so while it's understandable that organizations want to restrict that data, we have no obligation (moral or otherwise) to respect that desire.


Exactly. Your list of friends does not belong to Facebook, it belongs to you.

I am sure Facebook believes they deserve a monopoly for having obtained it first. They do not. The market forces you to compete for every dollar you earn, so you have every right to expect Facebook to compete for every dollar they earn, and "I touched it first therefore it's mine!" is not competition.


But, but, but..... you agreed that Facebook does own your friends list when you signed up for an account and started giving them all your data.

If I run a restaurant, and I stipulate that when you walk through the doors and place an order I reserve the right to take your picture and post it on the bulletin board, why would you place the order and then get pissed off when I post a picture of you on the bulletin board? And why would you be mad at me if I stipulated that no one else can use a camera in my restaurant? Terms of service, my friend. Unless prohibited by legislation, I can stipulate how things run in my restaurant.


If your bulletin board somehow let you monopolize the restaurant industry (? lol) then we should absolutely vote for some politicians to boot your entitled ass back into competition.

Obviously, the idea of a bulletin board granting a restaurant an effective monopoly is ridiculous so your analogy is trash, but even if your analogy wasn't trash, your conclusion would still be wrong.


I'm not saying that the data isn't valuable, but that possession of the data, valuable though it may be, is not related to the organization's hard work or innovation. For the most part, any control rights to the data likely rest or should rest with the people who provided it to the company.

Meta claiming that all of the photos on Instagram are Meta's property does not comport with current IP law or the views/opinions of most of the users on Instagram who do own the copyrights to those photos.

You really shouldn't be able to sue anyone for use or copying of data to which you do not hold copyright. The stuff on FB is licensed to FB by the people who own it (their users).


I don't sympathize with a monopoly that people are trying to weaken.

I loathe Meta and want to boycott it. Unfortunately this means I'm now locked out of the only repository of most local events and gatherings in my city.

In some countries, life is literally not possible without WhatsApp.

If Meta wants to cry about the mean bullies trying to exfiltrate data, they need to stop wiping out competitors.


> the vast majority (…) Period. End of story.

If you’re going to assert something as definitely true to the point of closing off discussion, I’d expect a modicum of evidence. At a minimum that you’d explain the reasoning behind your conclusion. What’s the source of the “vast majority” claim? There’s little point to advertising when you’re scraping a website for personal consumption, so it seems dubious anyone would have reliable numbers on which kind is more prevalent.


Regardless, it’s very rich that a company like Meta is mad that they’re being beaten at their own game (making money off of data that they obtained through shady means).


Sorry, but there are many legitimate reasons to scrape a website. Detecting price manipulation is one example. Because of scraping we know Amazon does things like price gouging and raising prices right before they go on “sale”. Scraping can be very useful for researchers to monitor trends and find correlations. It’s not just about bad guys stealing personal information. There are far too many legitimate uses; banning scraping would be a bad thing.


Pretty ironic that Mark Z himself started out exactly like this: scraping Harvard servers and photos to power Facemash.

He subsequently realized that he doesn’t need to scrape if he can just make a viral site that lets people share this info with each other while he can eavesdrop on ALL OF IT:

https://www.esquire.com/uk/latest-news/a19490586/mark-zucker...


Nah, you are straight-up wrong. In fact, it’s the opposite - the only companies who are scared of scraping are the ones whose business models rely on artificial lock-in, and we should all be working as hard as we can to demolish them.


It's wild that people are arguing that their friend list should belong exclusively to facebook and not, you know, to them and their friends.


>the only companies who are scared of scraping are the ones whose business models...<snip whatever other nonsense followed>

This is just patently false. There is an expense incurred by scraping. There is no benefit to a host from providing data to those scrapers. My logs are full of various bots that pull data from my webhost that costs me money to serve. I run various sites that do not serve ads. I do not include any 3rd party tracking. They're just simple sites that I pay for out of my own pocket because that's what I've chosen to do. Nothing shady about any of it.

It's just sad that your own personal feelings towards scraping prevent you from being able to accept that there are people with views other than your own.


Hey, I totally accept people have views other than my own. I just disagree with them.

It seems extremely weird that you’d want to publish content, but then get mad that people are using the thing that you published. But you do you.


How is that weird? I publish on my site to have people visit my site. I don't publish for people to take my data and do what they will without attribution for where they got the data. That this makes no sense to others has me saying: please don't "do you", because you are being inconsiderate to others.


> the vast majority of Web scraping efforts are to build businesses on top of other organizations hard work and innovation. Period. End of story.

Yeah and the vast majority of the internet and all these mega corps run on open source while paying pittance back to the ecosystem. Cry me a fuckin river.

Can't wait til someone sues them for "scraping" their site for web previews and thumbnails every time someone shares a link on Facebook.

The double standard of these muppets.


I disagree precisely for the simple reason that these businesses are using Meta's weapon against them. It will be an interesting battle to watch - and if my memory doesn't fail me, LinkedIn lost one already. The more the press writes about it, the better: (ordinary) people will sooner or later see through their doublespeak and realize what is at stake.


I feel the same way. My biggest pet peeve is that scrapers/bots traversing my site generate more traffic than the target audience of users. The scrapers get all of this data for "free", and I bear the hosting costs of providing them that "free" data.


>the vast majority of Web scraping efforts are to build businesses on top of other organizations hard work and innovation

The vast majority of Facebook/Google's efforts are to build businesses on top of other organizations hard work and innovation.


And that’s the trick. You use the bad apples to delegitimise the good ones. Works every time.


[flagged]


If simp is supposed to be short for simpleton, you might want to consider how simple your thoughts are.



I can also link to a source that's going to be biased in my favor: https://www.etymonline.com/word/simp


> 1903

> 1640s

Lol, no. I'm using the definition from this century:

> Someone who does way too much for a person they like


What a pretty picture capitalism is.

“Give us all your data for free.” “They ‘trust me’, dumb fucks.”

https://www.esquire.com/uk/latest-news/a19490586/mark-zucker...

Proceeds to build entire business on this data…

“You can’t scrape us!”

LinkedIn tried this:

https://www.zdnet.com/google-amp/article/court-rules-that-da...

And it’s not like capitalist enterprises even try to be consistent in their legal complaints:

https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-hypo...


> I love to hate on Meta, but their actions here are spot on and make my morning very enjoyable as I sip my cup of coffee.

You might want to reassess your intelligence there, friend. It seems to be suffering from a common form of cognitive dissonance combined with some form of confirmation bias.

How so?

Well you clearly don't like scraping, otherwise you wouldn't be agreeing with a criminal... So there's the confirmation bias...

Which is also the cognitive dissonance part. You clearly don't like Meta/Zuckerberg by your own admission, but you are agreeing with an empty rhetorical attack against people who are smart enough to make use of Zuckerberg's terrible security practices...

Do you not see the problem in this?


This is a total non-sequitur argument here. You've gone from accusing me of lack of intelligence to suffering from cognitive dissonance and confirmation bias, to Facebook's terrible security practices: simply because I'm pleased that an organization has taken action against Web scrapers for violation of Terms of Service.

Yes, I've gone on record indicating that I believe Web scraping to be generally unethical, and that I'm pleased that some action was taken against those that make it their business to do so. And that is all that I have stated in my OP. You've decided to take me on some circular mental gymnastics journey I'm still trying to wrap my head around.


Let me restate this how I view what you've stated: your position is that because Facebook has a Terms of Service that may define something that is not illegal - means that one must abide by it? Also... Facebook/Meta/Zuckerberg have lied over and over and over very publicly to get their way or to give themselves an advantage: by giving themselves unfettered and unwarranted access to data that they profit from by their own fast and loose rules.

If Facebook/Meta/Zuckerberg are OK with lying, stealing and cheating - then why should anyone leveraging their online properties need to abide? Until they're held accountable under broader rules I see no reason the consumption side can't bend them as well. And you may argue "this isn't how it works" but we all know this isn't how Facebook/Meta/Zuckerberg operate. They operate under the premise of: do whatever makes us money because breaking the rules is the cost of doing business. So, no - they don't get to spew propaganda to the advantage of their business under the guise of protecting users. That is complete and utter bullshit.


Thank you. This is pretty much what I am getting at as well, though in different words.

But of course, the general populace thinks it knows better than the people who actually know best. That being those of us who have been able to live our lives while learning from not just our own mistakes, but those of others around us.

We are the rare and few; and considered the enemy to the mob. Good luck comrade.


Who is the criminal here? Scraping is not illegal. This is a civil suit, so even if Meta wins, it's still not remotely criminal for anyone involved.

Also please explain to me how someone giving a company their Facebook credentials is an example of "people who are smart enough to make use of Zuckerbergs terrible security practices."



