So I tried these search engines. They're fun to try but kinda remind me of search before Google.
I'm frustrated with Google search results lately: they don't reach far enough back and are diluted with all sorts of crud.
But these alternative search engines kinda just remind me of why people started using Google. When I use them, I don't get spam, but I don't really get things I'm interested in either. It's like they don't understand what I'm really interested in and I either get no hits, or get hits on things that are just completely unrelated. It's like old-school chat software or something, complete with all the computer misunderstandings and whatnot.
Don't get me wrong, I'd love to see a lot of competition in search. I use DDG a lot. But I'm surprised at how positive people's comments are about these alt search engines, because to me they have problems on the opposite end of the spectrum. To me, what Google returns is a pretty good understanding of what I want, but corrupted by manipulative spam; the other ones generally return no spam, but also not at all what I want.
I'm trying https://kagi.com/ at the moment; it was mentioned here a few months back when it was still in beta, and I finally remembered to switch one of my browsers over to try it.
So far, pretty happy with it. Before that I'd been using DDG, but found myself reaching for !g so frequently that it was almost pointless.
DDG feels absolutely rubbish at local searches, but I may have just lost patience with it.
To be honest, I don't think DDG has a future purely because of the name. No way I'm telling non-techy friends that as I'll just get "what? Ducky go? Duck what? Are you being serious or is this a joke?"
Being able to make rank adjustments is an absolute GAME CHANGER. I didn't realise just how terrible my search results were until I had granular control over the poor-quality sites which kept appearing in my Google results. It's so clear to me now that Google is heavily prioritising advertising over user experience. The recent discussion about how many Google searches include "reddit" should serve as a warning for Google to swing that pendulum back toward the user experience, or they will lose people like me. I'll be paying for Kagi when it's out of beta.
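In case "rank adjustments" sounds abstract, here's a toy Python sketch of what it boils down to mechanically (my own illustration with made-up weights, not Kagi's actual implementation): re-scoring results with per-domain multipliers the user controls.

    from urllib.parse import urlparse

    # Hypothetical per-domain adjustments: >1 boosts, <1 demotes, 0 blocks.
    DOMAIN_WEIGHTS = {
        "en.wikipedia.org": 1.5,
        "contentfarm.example": 0.2,  # heavily demote
        "pinterest.com": 0.0,        # never show again
    }

    def rerank(results):
        """results: list of (url, base_score) pairs from an upstream engine."""
        adjusted = []
        for url, score in results:
            weight = DOMAIN_WEIGHTS.get(urlparse(url).netloc, 1.0)
            if weight > 0:
                adjusted.append((url, score * weight))
        return sorted(adjusted, key=lambda r: r[1], reverse=True)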
>To be honest, I don't think DDG has a future purely because of the name. No way I'm telling non-techy friends that as I'll just get "what? Ducky go? Duck what? Are you being serious or is this a joke?"
Just as a side note/FYI: naming something is tricky, particularly if it's meant to be international/multilingual. DuckDuckGo may well sound funny to your friends, but, as an example, "kagi" in Italy would be pronounced the same as "Cagi", a renowned historical maker of men's underwear (to the point that "a cagi" is sometimes used as a synonym for "a tank top"), and it would surely make some people think it's a joke.
I’m delighted with Kagi. It finds the regulars I want (StackOverflow, Github, Wikipedia) just as well as Google, but is less infested with SEO spam and has a better selection of indie sites. Images and maps aren’t really up to par yet, but I mostly use OSM for maps anyway. I’ll be very happy to pay $10-$15 a month when Kagi start charging.
> That you aren't willing to pay for search means it isn't very important to you.
Or that they don't have money to throw away on things that aren't necessary to live. Your statement is only true if the person you're responding to is quite wealthy, which is a leap.
I don’t know where you live or what your circumstances are, but $10-15/month is less than Netflix or Spotify. It’s less than YouTube Premium. All those things are essentially you paying for content in a way that results in you being less advertised to. It’s not a lot of money.
Wait, why is being 'quite wealthy' necessary? Their FAQ says $10 on the low end and $20-30 for an unlimited offering. People of varying economic positions spend that much monthly on a myriad of non-essential services. I also feel like you're playing a semantic game with 'importance', since in another comment you immediately jump to a starving person trying to choose between a premium search engine and food. We're more or less in a typical HN thread about a product's barrier to entry, and it feels like you're on a class crusade for equal search results for all. Not the same conversation.
Totally fair, but why not just provide a throwaway email if you're worried about spam or privacy? Is it the principle? I've never had a principle problem with having a login to a service. I would do so with DDG, for example, if I could customise my results better.
When it comes to these older sites, it's hard for an automated engine to discover what they're about, for two reasons: (1) they lack the kind of structured description that's far more common on modern sites (including, to some limited extent, spammy ones, though extensive structured descriptions would still tend to favor legitimate content) and that powers "smart" suggestions in search results; and (2) the web directories that would have provided an accurate description back when those sites were current are now dead. To improve the Web search ecosystem, we'd need "non-commercial", hobbyist sites to work on addressing both problems.
HTML has had metadata tags forever; it's just that search engines quickly stopped using them because they were so inaccurate and prone to abuse. Even now, a heavy presence of these tags is arguably a marker that a website is very interested in its Google ranking, and probably fairly spammy.
Any sort of description, tagging, keywords, or genre labeling needs third-party vetting to be of any use whatsoever. It's simply too profitable to misrepresent your website for it to be any other way.
Those metadata tags were just a simple textual description and a bunch of keywords with no reference to any controlled vocabulary. This is what made them so easy to abuse. Modern schema-based structured data is vastly different, and with a bit of human supervision (that's the "third party vetting") it's feasible to tell when the site is lying. (Of course, low-quality bulk content can also be given an accurate description. But this is good for users, who can then more easily filter out that sort of content.)
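To make the contrast concrete, a small sketch (all values made up): the old meta keywords are free text with no vocabulary, while schema.org JSON-LD commits the site to typed claims a crawler or a human vetter can actually check.

    import json

    # Old-style meta keywords: free text, no vocabulary, trivially stuffed.
    legacy = '<meta name="keywords" content="toaster, best toaster, cheap">'

    # schema.org JSON-LD: typed claims against a controlled vocabulary.
    structured = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Acme 2-Slice Toaster",
        "brand": "Acme",
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": "4.1",
            "reviewCount": "87",
        },
    }
    html_block = ('<script type="application/ld+json">'
                  + json.dumps(structured) + '</script>')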
One could even let this vetting happen in decentralized fashion, by extending Web Annotation standards to allow for claims of the sort "this page/site includes accurate/inaccurate structured content."
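For a rough idea of what such a claim could look like: the claim vocabulary below is hypothetical (that's the extension being proposed), but the envelope follows the W3C Web Annotation data model, where "assessing" is one of the standard motivations.

    import json

    # Sketch of a third party asserting a page's structured data is accurate.
    # "structured-data:accurate" is an invented claim format.
    claim = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "assessing",
        "body": {"type": "TextualBody", "value": "structured-data:accurate"},
        "target": "https://example.com/some-product-page",
        "creator": "https://vetters.example/users/alice",
    }
    print(json.dumps(claim, indent=2))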
The thing is "a bit of human supervision" is difficult on a scale of ten thousand Wikipedias. It pretty much needs to be done completely automatically.
I just hope we can get a trust-based network combined with search. So if I trust some friends and search "best toaster", and one of my trusted friends has given a very high review score to a toaster, then I get that one as the top search result.
Extend this to online communities, and you can ask "what laptop would HN recommend for Linux?", etc.
Of course, there's a privacy issue to solve, but the functionality could be very useful compared to the crappy Google search results for commercial products.
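A back-of-the-envelope sketch of the ranking side (every name, score, and structure here is invented): boost results that people in your trust network have rated highly.

    # Invented data: trust per friend (0..1) and their review scores (0..5).
    TRUST = {"alice": 0.9, "bob": 0.4}
    REVIEWS = {
        "alice": {"https://toasters.example/acme": 5.0},
        "bob": {"https://toasters.example/acme": 3.0,
                "https://spamstore.example/supertoast": 5.0},
    }

    def trust_score(url):
        """Trust-weighted average of friends' reviews for this URL."""
        total = weight = 0.0
        for friend, trust in TRUST.items():
            if url in REVIEWS.get(friend, {}):
                total += trust * REVIEWS[friend][url]
                weight += trust
        return total / weight if weight else 0.0

    def rerank(results):
        """results: (url, engine_score) pairs; friends' ratings dominate."""
        return sorted(results, key=lambda r: (trust_score(r[0]), r[1]),
                      reverse=True)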
I've been having a similar idea, but with less user involvement: a browser extension that extracts keywords from the websites you actually visit and forms a p2p database with your followers. If you see spam or undesired ads in a search result, you can rate the content down, and the system automatically reduces the weighting of the peer that provided that history.
To avoid sharing sensitive pages/habits, maybe let the user review the list in batches and confirm before sharing it out.
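Something like this, as a very rough sketch (every structure and constant is invented):

    from collections import Counter

    peer_keywords = {}  # peer_id -> Counter of keywords from shared history
    peer_weight = {}    # peer_id -> reliability weight in [0, 1]

    def record_visit(peer_id, page_text):
        """Crude keyword extraction from a visited page; a real extension
        would filter stop words and let the user review the batch before
        it's shared, as suggested above."""
        words = [w.lower() for w in page_text.split() if len(w) > 4]
        peer_keywords.setdefault(peer_id, Counter()).update(words)
        peer_weight.setdefault(peer_id, 1.0)

    def rate_down(offending_peer):
        """User flagged a spammy result: halve that peer's influence."""
        peer_weight[offending_peer] *= 0.5

    def relevance(result_keywords):
        """Score a search result by overlap with trusted peers' histories."""
        return sum(peer_weight[p] * sum(hist[w] for w in result_keywords)
                   for p, hist in peer_keywords.items())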
>But these alternative search engines kinda just remind me of why people started using Google. When I use them, I don't get spam
In the past few months, at least half of my Google searches have had spam in the top 3 results: literally malware domains that community-made filter lists already know about, but that Google somehow chooses to serve anyway.
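Checking for them is trivial, too; a sketch assuming a hosts-style blocklist of the kind uBlock Origin or Pi-hole consume (the file path is hypothetical):

    from urllib.parse import urlparse

    def load_blocklist(path):
        """Parse a hosts-style list ('0.0.0.0 malware.example' per line)."""
        domains = set()
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    domains.add(line.split()[-1])
        return domains

    def flag_results(urls, blocked):
        """Return the result URLs whose host is on the blocklist."""
        return [u for u in urls if urlparse(u).netloc in blocked]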
Yes! I was trying to find an article on Polish objections to Nord Stream 2 a week or two ago and couldn't find anything. I tried a million search combinations and tried date restricting, but it all favored recency. I ended up finding it as a footnote on Wikipedia.
With marginalia I think that is by design. They intentionally do not index any websites that are deemed "too modern", which includes most things one might be searching for. It does seem to be intended more for exploring unusual places.
It looks good, but perhaps it should have some more options (specifiable as part of the search query text), and documentation for these features. Some possibilities would include filters (by file format, domain name, scheme, etc.), sort order, exclusions, and so on.
There are two menus of options ("Popular Sites", "Blogocentric Eigenvector", "Both Algorithms", "Experimental", "Allow JS", "Deny JS", "Require JS"), but they aren't explained very well. (For example, I might want the search engine to not execute any scripts when extracting a page's text, but to still include pages in the results if their text works with scripts disabled, even when the page has scripts.)
Also, they have some documentation in Gemini format. I have a Gemini viewer on my computer, but it won't open the files because of the "Content-disposition" response header.
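Back on the inline-filter idea: parsing such tokens out of the query is straightforward. A sketch with an invented token syntax (not anything Marginalia actually supports):

    import re

    # Invented inline-filter tokens: site:, format:, scheme:, sort:,
    # each negatable with a leading "-".
    FILTER_RE = re.compile(r'(-?)(site|format|scheme|sort):(\S+)')

    def parse_query(q):
        """'retro computing site:example.com -format:pdf' ->
        (['retro', 'computing'], {'site': ['example.com']},
         {'format': ['pdf']})"""
        include, exclude = {}, {}
        def grab(m):
            bucket = exclude if m.group(1) else include
            bucket.setdefault(m.group(2), []).append(m.group(3))
            return ""
        terms = FILTER_RE.sub(grab, q).split()
        return terms, include, exclude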
You deserve every bit of it, and then some. The scope of Marginalia may be small, but it is one of the best examples of making the non-commercial web accessible to a much larger audience. I hope it inspires others to tackle similar projects.
I'll be able to support the project starting a few weeks from now, and would love some non-Patreon options.
I think what you're doing with Breeze seems interesting, but the value-add of making it a commercial offering isn't clear; what does it offer that anyone else can't easily replicate with Google Custom Search? I'm not saying that there isn't a value add, I'm only saying that it isn't obvious as a user.
Something has to be a scarce resource; my guess is that the resource here is "labor" in building the CSE parameters and finding the sites to add to collections. Perhaps the effort that went into this should be emphasized.
Are there plans to move things server-side? Making client-side requests to Google has privacy implications.
tl;dr -- working on improving client-side privacy protection & there are a couple of other options that may permit using Google with a slight bit of user config; the premium version is all server-side, since it's proxied via Bing, Gigablast &/or our index; there's an alt premium option that would basically proxy Google via a cloud browser, say Browserling or KASM or similar
1. the client-side will be set to no personalized ads -- just found out about a fairly hidden setting that permits that -- should be changed / live later today
2. Google's API precludes going server-side unless the user configs an account which we can then drop in (see the sketch after this list), since it's limited to 10K queries / day -- too few to do anything meaningfully custom or full-web; we're planning a blog post on that as an intermediate alt
3. we've started to instrument whether client-side calls sidestep any privacy protections -- those results / protections are necessarily limited, but we can perhaps uncover anything in their client-side code that is privacy-revealing & possibly mitigate some of it
4. premium is server-side with a proxy to {ahrefs*, Bing, Gigablast, etc.} OR our Breeze index -- we scrape inventory-sensitive / time-sensitive sites, e.g., used car dealer pages
5. given enough ad or other revenue, we could alternatively give everyone a Bing proxy like DDG and other services do -- bit too bootstrapped to do that out of the gate, plus Bing has more constraints on custom search, so that's a mixed bag of outcomes
6. another approach, also necessarily premium, would be to proxy Google searches through a service like Browserling, since the API doesn't permit it at scale
7. premium also includes alerts, especially for things like car dealer pages that are more time-sensitive and harder to config than what free Google Alerts do
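to make #2 concrete, here's roughly what dropping in a user-configured account looks like against Google's Custom Search JSON API -- a sketch, with error handling & paging omitted:

    import requests  # third-party: pip install requests

    def cse_search(query, api_key, engine_id):
        """Query Google's Custom Search JSON API with the user's own
        credentials; the per-key daily quota is what rules out a single
        shared server-side deployment."""
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": api_key, "cx": engine_id, "q": query},
            timeout=10,
        )
        resp.raise_for_status()
        return [item["link"] for item in resp.json().get("items", [])]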
happy to DM some examples of CSE configs, ranging from easy to hard
if I'd add anything else, it's that we're positioning Breeze more as a search client + search engine, e.g., we use Google, Bing, etc. for web-wide search (search client) whereas we scrape car dealer pages for real-time alerts on inventory (search engine)
that same search-client philosophy is also why we're adding a low-code query builder, so that anyone can build a really extended query, aka a custom search engine, and either keep it private for their own use or share it with the community, since we can't possibly build all the CSEs ourselves
in that sense, our long-term trajectory is about building what amounts to a deep Reddit, where people can search sites / topics of interest in very direct ways that are far more substantive than wrestling with Bing / Google
that's also why we're shifting away from /topics at the top of our site to integrating the branches / filters / custom queries directly into the search experience, e.g., blogs is the first one we've done that with that's not easily available elsewhere, and podcasts is likely next.
-- custom / topic search, or what we call branches --
tl;dr -- the free version makes it easy for anyone to build, use & share custom searches; the premium version is zero ads + better web alerts + easier access to alt web indexes
1. the average user is unlikely to go through the config of a custom search
2. even if they did, it's so poorly documented & finicky that they'd be unlikely to achieve a comparable outcome
3. Bing CSE is even worse -- requires an Azure account, limited to 400 up/down boosts, etc.
4. there are other technical reasons a user might not do more than a couple, along with the sheer scale issues also mentioned
5. we're refactoring how the topic searches are experienced, making the topics, which we call branches, a more natural part of the search experience
6. "blogs" is the first one done that way -- you can filter to just blogs after searching for X; any other branches / topics will be added in that way
7. e.g., podcasts & RSS feeds are coming out soon, along with more traditional filters such as shopping
8. that approach also makes it easy for us to expose what we're calling a low-code builder, letting anyone build a custom search and either share it publicly or keep it private (a rough sketch follows this list)
9. that includes all possible filters -- advanced keyword combos, site inclusion / exclusion, URL patterns, schema structure, etc.
10. premium for users includes zero ads, alerts, and some other features that are a mix of TBD or too early to build
11. premium for teams includes similar things, along with the ability to config dashboards of searches, e.g., an HR dashboard of relevant custom searches
12. we're navigating that labor balance -- e.g., the blogs filter is fairly basic atm, whereas filtering college scholarships was a bit more nuanced to get working
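and since the low-code builder came up a few times, a rough sketch of what a shareable branch definition + compiled query could look like -- every field name here is invented, just to show the shape:

    # Hypothetical branch definition a low-code builder might produce.
    scholarships_branch = {
        "name": "college-scholarships",
        "include_sites": ["fastweb.com", "scholarships.example"],
        "exclude_sites": ["essaymill.example"],
        "require_terms": ["scholarship"],
        "exclude_terms": ["loan", "sponsored"],
        "public": True,
    }

    def compile_query(user_terms, branch):
        """Compile a branch plus the user's terms into standard operators."""
        parts = list(user_terms) + branch["require_terms"]
        sites = " OR ".join("site:" + s for s in branch["include_sites"])
        parts.append("(" + sites + ")")
        parts += ["-site:" + s for s in branch["exclude_sites"]]
        parts += ["-" + t for t in branch["exclude_terms"]]
        return " ".join(parts)

    # compile_query(["chemistry"], scholarships_branch) ->
    # 'chemistry scholarship (site:fastweb.com OR site:scholarships.example)
    #  -site:essaymill.example -loan -sponsored'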
Teclis[0] and Wiby[1] are similar contenders in the non-commercial search space.
[0]: http://teclis.com/
[1]: https://wiby.me/
What the article mentioned about the influence of PageRank definitely rings true. An interesting variation is used by the Secret Search Engine Lab's CashRank algorithm[2].
[2]: http://www.secretsearchenginelabs.com/tech/cashrank.php
I listed a bunch of engines with their own indexes, FWIW:
https://seirdy.one/2021/03/10/search-engines-with-own-indexe...