Building a legacy search engine for a legacy protocol (benjojo.co.uk)
137 points by benjojo12 on May 21, 2017 | 23 comments


Do you still have the raw crawl data? If so, have you considered uploading it to the Internet Archive to help preserve it for the future?


Nice idea. I've rolled up all of my data and crawl database and put it on the Internet Archive:

https://archive.org/details/gopher-may-2017.tar


Could someone please help seed the torrent? It has been stuck at 99.9% for the past day.


The Internet Archive should be seeding all the torrents they host. Have you tried restarting your torrent client, forcing it to recheck your local data, or re-downloading and re-adding the torrent file?


Thanks! It's amusing how the entirety of Gopher fits into a 1.6 GB archive compared to the petabytes (exabytes?) it would take to archive the entirety of the WWW.


The internet is many, many, many exabytes large at this point. Hell, I'd bet YouTube alone is a few hundred exabytes at minimum.


Windows 98 is a flaky OS. IIRC it used to crash with BSoDs multiple times a day even on real hardware. That AltaVista standalone crawler software could probably run much better under NT 4.0 or Windows 2000. And NTFS allows for much larger datasets than FAT32. NTFS hasn't changed much since that time, by the way.


Old Windows versions crashed so much because hardware manufacturers at the time slapped together poor drivers, and Microsoft couldn't do much about that if they wanted people's computers to work at all. The #1 thing that has improved since then is simply that more of the drivers are now written by Microsoft themselves. The #2 thing is Microsoft getting enough power over hardware makers to force driver quality-assurance and signing on them.

Which is all to say: if you run Windows 98 on hardware that has good drivers written for it—or especially on virtual hardware whose "drivers" are just paravirtualized calls into a modern OS kernel—your copy of Windows 98 won't be BSoDing any time soon.


There's also the thing where, uh, Windows NT has full memory protection.

On Windows 9x it was trivial for any app to hose the OS, because critical memory regions were unprotected.


The OP's issues seem to indicate that this is not the case: the system would just crash under HDD load in those virtualized systems.


Yup, 98 is trash for this use. This is why it is so exciting to try and run a "production service" on it!


Win98 SE was a lot more stable than original Win98, but I agree about 2000 being a better platform all around.


Maybe I am lucky, but what most "BSODs" on 9x amounted to was a "the program is unresponsive" message.


Anyone remember hearing about HTTP back in '93/'94 and saying 'gopher, but with images? Neat! Oh, but I need a graphics terminal... damn'?


Very cool and a nice write-up, thank you.

I know absolutely nothing about the AltaVista software so this was interesting. I'm wondering, were there limitations in scale? Did the search engine have other types of technical limitations? If you had to choose from modern software, what would have been your choice?


Gopher was awesome back in its day, and I'm sad that the semantic web efforts (its nearest modern aspirant) have done so poorly.


Wow ... weird to think I'm running one of the 370 gopher servers left in the world: gopher://gopher.conman.org/


I actually thought about doing this a couple months ago but never got past the "cool idea in my head" phase. Great to see this!


AltaVista brings back memories!

And in the archive I was able to find some of our old text from '92! It's alive I tell you.

I didn't use gopher much if at all when I had the chance back in mumble mumble '80s/'90s so I'm pleased to finally use it. Surely it'll be back in fashion soon?


But still, I would rather use Xapian over AltaVista. Same technology (inverted index), but stable and much better. Win98 as a service is just asking for too much trouble. And no, not ElasticSearch. No Java in the house, C++ is enough.
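
For anyone who hasn't met the term, here is a rough sketch of what an inverted index is: a map from each term to the set of documents containing it, so a query becomes a set intersection. This is a toy illustration in Python, not how AltaVista or Xapian are actually implemented. The function names and the example.org gopher URLs are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents; the URLs are placeholders, not real servers.
docs = {
    "gopher://example.org/1/about": "About this gopher server",
    "gopher://example.org/0/recipes.txt": "Recipes archive, plain text",
    "gopher://example.org/0/history.txt": "History of the gopher protocol",
}

index = build_index(docs)
print(sorted(search(index, "gopher protocol")))
```

Real engines add ranking, stemming, and an on-disk postings format on top of this, which is where the stability and quality differences come from.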


Fantastic article. Wonder if the AltaVista software would run under ReactOS?


Gopher strikes me as something of a file explorer for the net.


Wow, this really brings me back. Well done.



