memexy's comments | Hacker News

I think there is a solution to this problem. If moderator decisions are made and recorded publicly, then the data can at least be analyzed objectively. If there is indeed a bias, then someone should be able to sit down, do the statistical analysis, and show that "Yes, X type of stories / comments are more consistently flagged / removed / downvoted / etc." or "No, there is actually no bias in this instance".

I think there is contention right now because moderator decisions are opaque, so people come up with their own narratives. Without actual data there is no way to tell what type of bias exists and why, so it's easy to make up a personal narrative that is not backed by any actual data.

User flagging is also currently opaque, and a similar argument applies. If I had to provide a reason for why I flagged something, and knew that my name would be publicly associated with the items I've flagged, then I would be much more careful. Right now, flagging anything is consequence-free because it is opaque.
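
To make this concrete, here's the sort of analysis I have in mind, as a sketch. The counts are made up, and a real study would need carefully labeled data, but the mechanics are just a contingency-table test:

    # Sketch: test whether flag rates differ across story topics more than
    # chance would explain. All counts below are invented for illustration.
    from scipy.stats import chi2_contingency

    # Rows are topics; columns are [flagged, not flagged] story counts.
    observed = [
        [30, 970],   # topic A
        [55, 945],   # topic B
        [28, 972],   # topic C
    ]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
    # A small p-value suggests flag rates are not uniform across topics.
    # It says nothing, by itself, about *why* they differ.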


I completely understand, believe me I get it—but based on everything I've seen, it's a hopelessly romantic view. If I've learned one thing, it's that people are going to "come up with their own narratives", as you aptly put it, no matter what we do. Adding energy into that would only create more pressure and demand on a system which is maxed out already.

Making this mistake would lead to more argument, not less—the opposite of what was intended. It would simply reproduce the same old arguments at a meta level, giving the fire a whole new dimension of fuel to burn. Worse, it would skew more of HN into flamewars and meta fixation on the site itself, which are the two biggest counterfactors to its intended use.

Such lists would be most attractive to the litigious and bureaucratic sort of user, the kind that produces two or more new objections to every answer you give [1]. That's a kind of DoS attack on moderation resources. Since there are always more of them than of us, it's a thundering herd problem too.

This would turn moderation into even more of a double bind [2] and might even make it impossible, since we function on the edge of the impossible already. Worst of all, it would starve HN of time and energy for making the site better—something that unfortunately is happening already. This is a well-known hard problem with systems like this: a minority of the community consumes a majority of the resources. Really we should be spending those resources on making the site better for its intended use by the majority of its users.

So forgive me, but I think publishing a full moderation log would be a mistake. I'll probably be having nightmares about it tonight.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

[2] https://news.ycombinator.com/item?id=23656311


I merely outlined what I would want if I were a moderator. I would rather receive email with statistical analysis than be compared to Hitler and Stalin without any data to back it up. It would be way funnier if someone proved statistically that I was Hitler and Stalin at the same time. They'd have to go through a lot of trouble to actually do that, and if they managed to do so, that would be some high art.

Any complaint without data to back it up would be thrown in the trash pile.

In any case, it's a worthwhile experiment to try because it can't make your life worse. I can't really imagine anything worse than being compared to Hitler and Stalin, especially if all that person is doing is venting their anger. I'd want to avoid being the target of that anger, and I would require mathematical analysis from anyone who claimed to be justifiably angry, to show the actual justification for their anger. Without data you will continue to get hate mail that's nothing more than people making up a story to justify their own anger. And you have already noticed the personal narrative angle, so I'm not telling you anything new here. The data takes away the "personal" part of the narrative, which I think is an improvement.


Alas, there's no way to avoid being the target of that anger. It's inevitable in the system. You're right that one has to develop strategies for managing one's reaction to it. I don't think requiring a mathematical analysis would work in my case. It might not work in any case; I expect most angry people would probably get angrier if told that their anger is invalid because not backed up mathematically, and the dynamics of escalating mass anger could end up destroying the whole system.

There's a deeper issue though. Such an analysis would depend on labeling the data accurately in the first place, and opposing sides would never agree on how to label it. Indeed, they would adjust the labels until the analysis produced what they already 'know' to be the right answer—not because of conscious fraud but simply because the situation seems so obvious to them to begin with. As I said above, the only people motivated enough to work on this would be ones who would never accept any result that didn't reproduce what they already know, or feel they know.


Hey dang, first, I appreciate all of your comments in this post and your deep commitment to both HN and the state of online discussion. I'm learning a lot from reading your comments.

I'm curious whether, given all that you've shared, you think it's even _possible_ to scale a "healthy" discussion site any larger than HN currently is. It's clear that HN's success is in no small part due to the commitment, passion, and active participation of its few moderators. Contrast that with some of the top comments, which describe how toxic Twitter is, and I wonder if there's some sort of limit to effective moderation, or if we just haven't found more scalable solutions for letting millions of humans talk openly online sans toxicity? Cheers


I'm not dang, obviously, but I think it's a testament to his hard work that HN functions even as well as it does.

Most sites its size are far, far worse, I think.

I personally believe that is due to human nature.

I think that is what dang has observed and is trying to articulate - no matter how smart or rigorous or mathematical you are, you still are human and thus subject to the human condition.

One way that manifests is the conviction that the Other is winning the war (and that there is a war, for that matter).

I take it as almost axiomatic that a site with Twitter's volume cannot be anything but the cesspool it is.

It's too big for a single person to even begin to read a statistically significant fraction of the content.

That means moderation is a hilariously stupid concept at that scale. Any team of moderators large enough to do the job will itself suffer the fragmentation and conflicts that online forums do, and find itself unable to agree on what the policies should be, let alone how they should be adapted in contentious cases (and by definition, you only need moderation in contentious cases).


I agree with a lot of this, but the things you're talking about already apply to HN, so the argument can't really be used to show that a larger site would necessarily fail to work as well as HN (however 'well' that really is—I'm not making any grand claims here).

For example, the human nature you're talking about is by far the strongest force on HN, and the scale (though tiny compared to Twitter or Facebook or Reddit) is already beyond what one would suppose possible for a forum like this.


Very good point.

I would agree that HN is far too big for moderation alone to save it, though I hadn't quite put that together when I wrote my first post.

I think pg's original guidelines managed to capture enough of a cultural ideal that much of the original culture has been preserved organically by the users themselves (though I'm not qualified to speak to the culture of the early years, or how much it has changed since then).

You and the other mod(s?) have done a great job of being a guiding hand, and of understanding that it's too big for anything other than a loose guiding hand to be relevant, from a moderation perspective. You can remove things that shouldn't be discussed, show egregious repeat offenders the door, and encourage people to behave well and be restrained (in large part by example).

Twitter is so much vaster, and grew so fast, that even a guiding hand and good founding culture could not hope to save it. I suspect the way its design encourages rapid-fire back-and-forth also really hurts the nature of interaction on the site.


Hey, when you say “already beyond what one would suppose possible”, could you describe it? And I promise this is in good faith: could you share a personal example of when you stood on the precipice and saw the scale of the yawning depths below?

I’ve found that clear, vivid examples from people are crucial torch lights which can be shared around to give people a snapshot of what mods feel or witness. This then allows the conversation with non-mods to progress faster, since this type of storytelling is what people are best optimized to consume.


I don't mean anything fancy, just that HN is a large-ish (millions of users, but not tens of millions) completely open, optionally anonymous internet forum, and it's not obvious that one of those could function as well as HN does. When I say "as well", I don't mean "well". This place has tons of problems. But it could be a lot, lot worse, and the null hypothesis would be to expect worse.

I wrote about this a bit here: https://news.ycombinator.com/item?id=23727261. Shirky's famous 2003 essay about internet communities was talking in terms of a few hundred people, and argued that groups can't scale beyond that. HN has scaled far beyond that, and though it is not a group in every sense of that essay, it has retained, let's say, some groupiness. It's not a war of all against all—or at least, not only that.

As we learn more about how to operate it, I'm hoping that we can do more things to encourage positive group dynamics. We shall see. The public tides are very much against it right now, but those can change.


Oops - s/adapted/adopted/.


Not Dang.

No.

We haven’t really found it before the internet either (these problems are endemic to human/sentient nature).

The internet only industrializes them.

There are things you can do that reduce the number of friction points, making it possible for a community to self-govern:

1) narrow topics/purpose - the closer to an objective science the better.

2) no politics, no religion - as far as possible.

3) The topic should not be static or largely opinion-oriented. Better if it's goal-driven, with progress milestones easily discussed and queried (lose weight, get healthy, ask artists, learn Photoshop).

4) clear and shareable tests to weed out posers - r/badeconomics, askhistory

5) strong moderation.

6) no or little meta discussion

7) directed paths for self promotion

8) get lucky and have a topic that attracts polite good faith debaters who can identify and eject bad faith actors (the holy grail.)

Each of these options removes or modulates a source of drama. With enough of them removed you can still get flame wars, but it will be better than never having done these things at all.


I don't know. I think it might be on the unlikely side of possible, but I don't have any compelling reasons for saying so. It would require a lot of learning, but it's possible to learn. It's hard and wouldn't happen by default though. You'd be attempting to induce a group into functioning in a way that no group that large has ever done before.

NateEag makes some good points in the sibling comment. You'd have to create the culture at the level of the moderation team, and that's not easy. The way we approach this work on HN has aspects that reach deep into personal life, in a way that I would not feel comfortable requiring of anybody—nor would it work anyhow. If you tried to build such an organization using any standard corporate approach it would likely be a disaster. But maybe it could be done in a different way, or maybe there is an approach that doesn't resemble how we do it on HN.

Would it be possible with the economics of a startup, where the priority has to be growth and/or monetization? Probably less.


Absolutely not.

There are 2 mods running HN. Responding to people is TAXING - as in it's hugely costly. And it has some terrible edge cases which destroy the process:

The costly occasions are when you meet people who are either

a) Angry

b) Rule lawyers

c) Malignantly motivated

At this point their goal is to get attention or to apply coercive force to the moderation process.

These guys are an existential threat to the conversational process, and one of their win conditions is to get people to turn against the moderators.

Social media is a topic that HN gets wrong so regularly, and with so little recourse to research or analysis, that I would avoid discussing moderation in general here.

The fact is that if people are arguing in good faith, we can have some amount of peace, and even deal with inadvertent faux pas and ignorance, provided you never reach an eternal September scenario.

But bad faith actors make even this scenario impossible.


If you know of research or analysis that is essential on this topic, please tell us what it is. I'd like to be sure I'm aware of it, and other readers would surely be interested also.


Hmm. Given the broad range of topics "social media" covers, there are vast numbers of papers on it.

For people who have never thought about social networks and conversations online, I find this site discusses some of the blander but more game-theoretic elements of networks/trust, and therefore of online conversations:

https://ncase.me/crowds/

https://ncase.me/trust/

-----------------

For you guys (HN mods), I'd bet that you in particular are abreast of this stuff.

- I'd ask if you have heard of/seen Civil Servant by Nathan Matias - it's a system for running experiments on forums and testing the results (to see if there is a measurable change in user behavior)

https://natematias.com/ - Civil Servant; Nathan Matias is a professor at Cornell. He probably has an account here.

https://civilservant.io/moderation_experiment_r_science_rule...

- Books: Custodians of the Internet by Tarleton Gillespie.

------

Going through some of the papers I have stocked away, sadly in no sane order. I can't say if they are classic papers, you may have better.

- Policy/law paper: Georgetown Law, Regulating Online Content Moderation. https://www.law.georgetown.edu/georgetown-law-journal/wp-con...

- NBER paper on polarization - https://www.nber.org/papers/w23258. I disagreed with/was surprised by the conclusion. America-centric.

- Homophily and minority-group size explain perception biases in social networks, https://www.nature.com/articles/s41562-019-0677-4

- The spreading of misinformation online: https://www.pnas.org/content/113/3/554.full

- The University of Alabama has a Reddit research group - https://arrg.ua.edu/research.html. They have two papers, one of which explores the effect of a sudden influx of new users on r/2XChromosomes. https://firstmonday.org/ojs/index.php/fm/article/view/10143/...

- Policy: Ofcom (UK) has a policy paper on using AI for moderation: https://www.ofcom.org.uk/__data/assets/pdf_file/0028/157249/...

- Algorithmic content moderation: Technical and political challenges in the automation of platform governance - https://journals.sagepub.com/doi/10.1177/2053951719897945

- The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources

- Community Interaction and Conflict on the Web,

- You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech

Papers I have to read myself:

- Does Transparency in Moderation Really Matter?: User Behavior After Content Removal Explanations on Reddit. https://shagunjhaver.com/files/research/jhaver-2019-transpar...

- Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms: https://journals.sagepub.com/doi/abs/10.1177/146144481877305... (I need to read that paper, but I expect it to be a good foundation of knowledge and examples)

Other stuff:

- The Turing Institute wrote about moderators being key workers during COVID - https://www.turing.ac.uk/blog/why-content-moderators-should-...


Hey, a really big thank you for posting this! I'm working on a new discussion site and so much of this is pertinent. I may return with some follow-up commentary once I've read through some of it. Thank you.


NP. If you find any interesting papers, do share.


What papers/articles/sources do you guys read?


Not just made and recorded publicly, but easy to search and aggregate.


Yes. That's what I mean. If there is an API then we can use mathematical models to answer questions about bias or lack thereof.

I also don't think it's possible to have any forum without bias, so I'm certain the data will indicate some bias, but at least it will be transparent and obvious, so people can point to actual data to make their case one way or the other. It's hard to improve a situation if there is no data to point to and argue about. Without data, people just tell stories about whatever makes the most sense from whatever sparse data they have managed to reverse-engineer from personal observations.
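
To sketch what I mean (the endpoint and field names below are invented; no such API exists today):

    # Aggregate a hypothetical public moderation log into per-category
    # removal rates. Everything about the API here is assumed.
    import json
    from collections import Counter
    from urllib.request import urlopen

    def removal_rates(api_url):
        # Assume the endpoint returns a JSON list of records like
        # {"category": "politics", "removed": 1}.
        with urlopen(api_url) as resp:
            log = json.load(resp)
        totals, removed = Counter(), Counter()
        for entry in log:
            totals[entry["category"]] += 1
            removed[entry["category"]] += entry["removed"]
        return {cat: removed[cat] / totals[cat] for cat in totals}

    # e.g. removal_rates("https://example.com/hn-modlog.json")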


> What can be known in a system without rigor? That’s the question to make rigorous, I think.

Who is working on making that rigorous?


I think adding "Most Favorited" would create a popularity contest and people would start looking for ways to game the system. I don't think favorites should have metrics associated with them because as soon as metrics are introduced people will try to optimize them.

Now that I know comments can be favorited, I plan to bookmark comments that include useful reference information on topics I find interesting. Adding counters for how many times a comment was favorited wouldn't really help me with that use case, because I doubt anyone else cares about collecting useful references, so my favorites would never make it to the "most favorited" list. I personally don't care if I make it onto the list, but I'm certain some people would care, and they would start playing a popularity contest instead of looking for ways to favorite information that would be useful to them.


The [op] tags are really helpful.


The article outlines two approaches to causal AI:

> There are two approaches to causal AI that are based on long-known principles: the potential outcomes framework and causal graph models. Both approaches make it possible to test the effects of a potential intervention using real-world data. What makes them AI are the powerful underlying algorithms used to reveal the causal patterns in large data sets. But they differ in the number of potential causes that they can test for.

Does anyone have references and tutorials for either approach?


Imbens and Rubin’s book “Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction” is an excellent reference for the Potential Outcomes approach. There are also several summaries online done by Rubin which do a great job of explaining the core concepts and how they apply in concrete examples. Rubin’s class in grad school was a large factor in my decision to focus my PhD on Causal Inference, and that book is one I return to frequently.
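
To make the core idea concrete, here's a toy simulation of my own (not from the book): with a confounder that affects both treatment and outcome, the naive treated-vs-control difference is biased, while adjusting for the confounder recovers the true effect.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_effect = 2.0

    x = rng.binomial(1, 0.5, n)          # confounder
    t = rng.binomial(1, 0.2 + 0.6 * x)   # treatment is more likely when x = 1
    y = true_effect * t + 3.0 * x + rng.normal(size=n)

    naive = y[t == 1].mean() - y[t == 0].mean()
    # Stratify on x, then average the strata (x is split 50/50 here).
    adjusted = np.mean([
        y[(t == 1) & (x == v)].mean() - y[(t == 0) & (x == v)].mean()
        for v in (0, 1)
    ])
    print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: {true_effect}")

The naive estimate comes out well above 2 because treated units disproportionately have x = 1; the stratified estimate sits right at the true effect.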


Thanks.


Someone posted this flowchart in a previous HN thread on Causal Inference frameworks: https://www.bradyneal.com/which-causal-inference-book (I picked up Counterfactuals and Causal Inference and Elements of Causal Inference, and would recommend both).


Thanks.


I suggest a different approach: Plan Analysis (Caspar/Grawe). This is a neuropsychotherapy instrument that can be used to actually map behavioural causality. The models here assume that you already know the causes of behaviour and only have to map them. BS. The biggest problem with an AI designed this way is that it doesn't have CONTEXT. Behaviour is never 'if A then B'.


How does having a program / algorithm and checking it on various input values help with understanding causality?


Neural networks are largely black boxes, whereas a computer program or other symbolic configuration is a compositional, interpretable description.

The program is the explanation of the output, i.e. the program causes the output.


But why is that a causal explanation? If I can write down a simulation of planetary motion then that doesn't necessarily explain the causal mechanism behind why the planets actually move. In fact, there are simulations for planetary motion and none of them are causal explanations because they don't actually move the planets.


I guess it's tricky because the real world is full of feedback loops. If you want a causal model for fake news then your model needs to include some representation of incentives for ad revenue and clickbait. How does the causal inference framework handle feedback loops?


I believe DAG-based causal inference isn't able to handle feedback loops (the graphs are acyclic) or nonlinearity (the standard models are linear). Nonlinearities include stuff like deadbands and delays.

Control theory models handle these things just fine. But control models are hard to apply to sociological/epidemiological domains, where causal inference dominates.

From what I gather, causal inference is useful for designing studies. I'm not sure whether these models are used for prediction; I would appreciate it if someone in the know could chime in.


> DAG-based causal inference isn't able to handle feedback loops (the graphs are acyclic) or nonlinearity (the standard models are linear)

I don't think this should be true, and if it is, then "causal inference" should be qualified to refer only to a specific modeling framework. As a counter example, it's possible to formulate a nonlinear differential equation model of Covid spread and infer the parameters to construct a plausible, causal, generative model.
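
For example, here's a minimal SIR-style sketch (made-up parameters, state as fractions of the population). The feedback loop (more infected means more new infections) is a cycle between variables, but unrolled over time it becomes a perfectly well-defined generative model whose parameters (beta, gamma) could be fit to case data:

    import numpy as np
    from scipy.integrate import odeint

    def sir(state, t, beta, gamma):
        s, i, r = state
        ds = -beta * s * i               # nonlinear infection term
        di = beta * s * i - gamma * i
        dr = gamma * i
        return [ds, di, dr]

    t = np.linspace(0, 120, 500)         # days
    traj = odeint(sir, [0.99, 0.01, 0.0], t, args=(0.3, 0.1))
    print("peak infected fraction:", traj[:, 1].max())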


Interesting. I wonder if someone has tried to combine the two. I guess modern deep reinforcement learning is one such combination, because it combines feedback (reinforcement) with probabilistic descriptions, but maybe there are other interesting combinations of probability, causality, and feedback.


I just visited the web page and saw the following message:

> Internet Explorer not supported

I'm using Firefox. It's better to perform browser detection and show the message only if you detect that I'm actually using Internet Explorer. Otherwise it seems like something is wrong with my browser.

Here's a stackoverflow answer for how to perform browser detection with JavaScript: https://stackoverflow.com/questions/2400935/browser-detectio....
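
The linked answer is client-side JavaScript; for completeness, a server-side version of the same check could look something like this (a sketch assuming a Flask app, purely for illustration):

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/")
    def index():
        ua = request.headers.get("User-Agent", "")
        # IE 10 and earlier report "MSIE"; IE 11 reports "Trident".
        if "MSIE" in ua or "Trident/" in ua:
            return "Internet Explorer is not supported."
        return "Welcome!"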


Great feedback, thank you.


No problem.


I think this is useful, but do you have examples and tutorials for how to use it? If you write some blog posts and tutorials about how and why this tool is useful, you will increase the chances that people will use it and get value out of it.


What's funny about his use of the word?


Hey, are you on reddit, memexy? I'm /u/foobanana; ping me if you are. I admire your attempts at reasoning with the unreasonable here.


Thanks for the kind words. I stopped using reddit some time ago but good to know there are still good people there like you working to uphold standards. I am on keybase and I'm always happy to chat with fellow reasoners: keybase.io/memexy.

I dislike getting swept up in groups and that's what happened to me on reddit and other social media sites so now I make sure to deliberately talk with individuals without worrying about performing in front of a group.


I agree with you. His actions paint a very dark picture and I also don't understand why people are defending him.

It's like you said: he's either smart enough to know what he's doing, in which case his actions should be scrutinized as such, or he has no clue what he's doing and should be called a "moron" and left to his own devices. It can't be both at the same time. He can't be a moron and also a genius who was forced to do something he didn't want to do.

