Hacker News: 8chanAnon's comments

Third article in a series. Explores eval and self-invoking functions.
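The article itself isn't reproduced here, so this is only a generic illustration and not the article's code. A self-invoking function (IIFE) is defined and called in a single expression. Pages sometimes hide a value inside one so that the value only exists once the code runs, and a scraper can recover it by evaluating the snippet with `eval`. The snippet, URL and function names below are made up for the example.

```typescript
// Hypothetical obfuscated snippet of the kind a page might ship: the URL
// only exists after the self-invoking function has run.
const pageScript = `(function () {
  var parts = ["ht", "tps://exa", "mple.com/v.mp4"];
  return parts.join("");
})()`;

// eval runs the string as JavaScript and returns the value of its last
// expression statement, here the IIFE's return value. Only use it on code
// you are willing to execute: eval has the full power of the running script.
function evaluateSnippet(source: string): unknown {
  return eval(source);
}

// An IIFE in our own code: define and call a function in one expression so
// that its local variable stays private to the returned closure.
const nextId = (function () {
  let count = 0;
  return () => ++count;
})();

console.log(evaluateSnippet(pageScript)); // https://example.com/v.mp4
```

The same pattern works in reverse: the IIFE gives `nextId` a private counter that nothing else in the script can touch.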


My next article will be on bypassing Cloudflare's bot protection. You can then compare it with how Selenium handles this problem (if at all).


>Some walled garden sites seem completely unscrapable

Any examples besides LinkedIn? Tell me which sites you're trying to target and I'll have a look to see what can be done with them. It takes some pretty evil JavaScript obfuscation to block me, and only one site has managed it. I doubt the sites you're hitting are anywhere near that evil, lol. I'd appreciate a good example that I could use in a future article.


It's been ~18 months, so I'm fuzzy on the details. I remember Gmail being tricky too.

IIRC I ended up building an iframe-based scraper for sites that didn't yield any content with just a fetch, and I think I built a fallback mechanism so that if fetch didn't work, the URL would be queued up for the iframe scraper. The problem is that several widely used security headers prohibit loading a page in an iframe. (The reason for using an iframe rather than just loading the page in a tab and injecting my extension's script is that I wanted it to run "in the background" without being distracting for the user; the tab's favicon changing every second or two was pretty annoying.)
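A minimal sketch of that fetch-first, iframe-fallback decision, assuming the two headers that most often block framing: `X-Frame-Options` (`DENY` or `SAMEORIGIN`) and a `Content-Security-Policy` `frame-ancestors` directive. The function names, the `embedderOrigin` parameter and the crude "has visible text" heuristic are all hypothetical; this is not the commenter's actual code.

```typescript
type Route = "use-fetch" | "queue-iframe" | "give-up";

// Crude, hypothetical heuristic: is there any text left once scripts, styles
// and tags are stripped? JS-only app shells usually have none.
function hasVisibleText(html: string): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, "")
    .trim();
  return text.length > 0;
}

// False when the response headers forbid embedding the page in an iframe
// owned by embedderOrigin (for example an extension's origin).
function canBeFramed(headers: Headers, embedderOrigin: string): boolean {
  const xfo = headers.get("x-frame-options")?.trim().toUpperCase();
  if (xfo === "DENY" || xfo === "SAMEORIGIN") return false;

  const csp = headers.get("content-security-policy") ?? "";
  const directive = csp
    .split(";")
    .map((d) => d.trim())
    .find((d) => d.toLowerCase().startsWith("frame-ancestors"));
  if (directive === undefined) return true;

  // Simplified: only "*" or an exact origin match counts as allowed, so
  // 'none', 'self' and host wildcards like https://*.example.com all block.
  const sources = directive.split(/\s+/).slice(1);
  return sources.includes("*") || sources.includes(embedderOrigin);
}

// Decide what to do with a URL after a plain fetch().
function routeAfterFetch(ok: boolean, html: string, headers: Headers, embedderOrigin: string): Route {
  if (ok && hasVisibleText(html)) return "use-fetch";
  return canBeFramed(headers, embedderOrigin) ? "queue-iframe" : "give-up";
}
```

With a real response this would be called as `routeAfterFetch(res.ok, await res.text(), res.headers, origin)`, and anything routed to "queue-iframe" would go onto whatever queue the iframe scraper drains.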


WASM runs in a sandbox. It can only talk to the outside world via JavaScript, so you can forget the idea that it might be a way to crack through browser security.
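To make the sandbox point concrete, here is a tiny hand-assembled WebAssembly module, written for illustration and not taken from the article. It declares one import, `env.log`, and exports `run`, which calls `log(42)`. The module can reach nothing except what the host JavaScript passes in through the import object; the names `env`, `log`, `run` and `runSandboxed` are arbitrary choices for the example.

```typescript
// Minimal module in the WebAssembly binary format (version 1).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  // type section: type 0 = (i32) -> (), type 1 = () -> ()
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00,
  // import section: "env" "log" is a function of type 0 (function index 0)
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,
  // function section: one local function of type 1 (function index 1)
  0x03, 0x02, 0x01, 0x01,
  // export section: "run" -> function index 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,
  // code section: i32.const 42; call 0 (the import); end
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b,
]);

function runSandboxed(): number[] {
  const seen: number[] = [];
  const wasmModule = new WebAssembly.Module(wasmBytes);
  // The import object is the module's only way out: it can call env.log and
  // nothing else (no DOM, no network, no file system) unless JS provides it.
  const instance = new WebAssembly.Instance(wasmModule, {
    env: { log: (value: number) => { seen.push(value); } },
  });
  (instance.exports.run as () => void)();
  return seen;
}
```

If the host leaves out `env.log`, instantiation simply fails (with a `TypeError` or `LinkError`) instead of the module finding some other route to the outside.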

Maybe somebody will make a web browser with all of the security locks disabled. Sort of like the Russian commander in "The Hunt for Red October" who disabled his torpedo's safety features in order to hit his target more effectively, but then got blown up by his own torpedo.


>bad formatting

If you can elaborate, I would very much appreciate it. I'm always interested in doing better.

Why use Puppeteer etc. when you don't have to? What is the argument for using these additional tools versus not using them?


You don't have max-width set on the text, so unless you resize your browser window to a very small size, the paragraphs will span your whole screen.


Which places you, the reader, fully in control of the width of lines you prefer to see. Adjust your browser window width, or apply a user style sheet to tell your browser to format the text the way you want to see it formatted.


I knew this reply would come... How many people do you think use custom browser stylesheets? Probably fewer than 0.1% of internet users, and sites have moved on to formatting text so that everyone gets good readability. Also, not all devices even support custom stylesheets.

Of course it's always up to the site owner, but most people who share something want it to be read.


> Adjust your browser window width, or apply a user style sheet

very funny, both jokes


I see. I don't have a widescreen monitor (I'm still using an old tube type until it finally expires, but it's taking a few decades, lol). I've wondered whether people actually like reading websites on a widescreen. Some do and some don't. What would you suggest for a max width?

You could also try zooming in. My apps don't expand to full width because of the video box, but you can zoom.


There's a lot of info about this, but usually a width of 500-700px, or ~80 characters per line, is much easier to read: https://ux.stackexchange.com/questions/108801/what-is-the-be...


I have the opposite problem. I typically have many browser windows open at the same time but only two screens, and many sites that I use are designed on the assumption that everyone has a full-screen browser.


Similar problem here. My resolution is 1440x900 (27in monitor) paired with 1280x720 (32in TV), and I keep three browsers open (Edge, Opera and Firefox, each with its own purpose). Each is at 3/4 width and 1/2 height, offset so I can see each one partially.

With this setup many sites work, but a few have a top ad banner, a side banner and a 'cookie acceptance' footer, and then add a 'subscribe to our email' box and a Google login prompt on top. (Game wikis, mostly. I game in smaller windows too; what good is a multitasking computer if you don't use it?)


The red text on a yellow background is not great. Neither is a serif font. Also it should be JavaScript with a capital S.


Red text on yellow - you mean the website? Would you like the text to be darker?

And Beautiful Soup should be BeautifulSoup. Who makes the rules?


Yes, the website. Better contrast is good. Black on white works great.

Margins would also be nice on the left and right.

Beautiful Soup is two words. Just look at their website.


Too much contrast makes it hard to read too. (With bonus eye strain.)


Is there a happy in-between? Maybe not. What looks perfect to one user might appear atrocious to another. What is a poor website operator to do???

I dislike black-on-white, and I don't understand gray-on-black, which seems to be popular now, apparently because gamma settings are cranked up to 11 or something. I try to use some color as an in-between, but that may take some time to "perfect".


Even Google Chrome's Lighthouse said that your background and foreground colors do not have a sufficient contrast ratio.


Every dark mode site in existence should fail that test.


No, because the contrast can still be good as the colors are just reversed?
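For reference, the WCAG 2.x contrast ratio, which is what Lighthouse's contrast audit checks against, compares the relative luminance of the lighter and the darker color. It is therefore symmetric: reversing black-on-white into white-on-black gives the same 21:1. A minimal sketch, assuming pure red `#ff0000` on pure yellow `#ffff00` for the site's colors (the actual values weren't given); the function names are hypothetical.

```typescript
// WCAG 2.x relative luminance of an sRGB color given as "#rrggbb".
function relativeLuminance(hex: string): number {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  // Undo the sRGB gamma curve before weighting the channels.
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4),
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1:1 to 21:1.
// Only which color is lighter matters, so swapping text and background
// (light mode vs dark mode) gives exactly the same ratio.
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

const AA_NORMAL_TEXT = 4.5; // WCAG AA minimum for normal-size body text

function passesAA(fg: string, bg: string): boolean {
  return contrastRatio(fg, bg) >= AA_NORMAL_TEXT;
}
```

With the assumed colors, `contrastRatio("#ff0000", "#ffff00")` comes out at about 3.72, under the 4.5:1 threshold, while black and white give 21:1 whichever way round they are used.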


> Is there a happy in-between?

Since browsers let users configure a default font color and background color, one possible "happy in-between" would be to set neither a background color nor a font color, letting each user agent (i.e., browser) display the site in that user's default colors.

In that case, each viewer should get their preferred colors, all without you doing anything.


Cloudflare et al. do a lot of fingerprinting of the user agent. Any website with 'high' anti-bot settings will return a 403 to anything but a browser. Source: I've scraped lots of things.
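Reproducing the fingerprint itself isn't simple, but a scraper can at least recognize when it has hit that wall and route the URL elsewhere, for example to a real browser. A minimal sketch, assuming the block shows up as the 403 described above on a response carrying Cloudflare's usual `server: cloudflare` or `cf-ray` headers; `looksLikeCloudflareBlock` and `needsBrowser` are hypothetical names.

```typescript
// Hypothetical check: does this look like a Cloudflare bot block rather than
// an ordinary 403 from the origin server? Responses proxied by Cloudflare
// normally include "server: cloudflare" and a "cf-ray" request ID header.
function looksLikeCloudflareBlock(status: number, headers: Headers): boolean {
  const server = (headers.get("server") ?? "").toLowerCase();
  const viaCloudflare = server === "cloudflare" || headers.has("cf-ray");
  return viaCloudflare && status === 403;
}

// Example use: decide whether a URL should be retried in a real browser.
async function needsBrowser(url: string): Promise<boolean> {
  const res = await fetch(url);
  return looksLikeCloudflareBlock(res.status, res.headers);
}
```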


It's rather long, so I'll read it later. Thanks for the tip. Got more, or is that it?


Not that I know of; I'm just quipping a bit about my own work, lol.


I voted "light" because there is no "medium" option. Why is it a choice between gray-on-black or black-on-white? I would choose a "color" mode if there was such an option on the table. Maybe we can see a return to the yellow-on-blue mode which was popular with text editors a few decades ago?


I've been asking myself the same question, though I'm not trying to get paying customers. I'm offering free stuff, but I can't get many takers. One customer per 1000 views is pretty much the best you can expect, so you need a million views to snag 1000 customers. I would suggest that you stick to your plan and go through the free routes first. This will help you gauge the level of interest before you bankrupt yourself paying for marketing (which never lives up to expectations).

I do have an anecdote. I have a repository on the Internet Archive. One of my items had just a few hundred views, but all of a sudden it accumulated over 3000. I posted a comment on the item asking what sparked the sudden interest, and someone replied that it was a user comment on YouTube. It makes me wonder whether I'm making a huge mistake by not registering on YouTube so I can post self-promotional comments everywhere.

Obligatory shameless plug is below.

https://8chananon.github.io


Interesting, never considered YouTube! Also helps to hear that the conversion rate is pretty normal haha. Thanks!


I've updated the app to remove the user agent string so that the link will not redirect to "m.youtube.com". I hope that's all that's needed to fix this.


YouTube just changed their code. The app is not currently working. Coincidence?

I might have to wait a bit to see what else gets changed.

Sorry about that.

Update: Fixed! Will keep an eye on it.

