Meredith L. Patterson ([identity profile] maradydd.livejournal.com) wrote in [personal profile] maradydd 2008-01-02 08:21 pm (UTC)

Making the algorithms open doesn't mean that any flaws will be fixed quickly; it just means that they'll be _found_ (more) quickly.

Sure, and I'm not pretending otherwise. (Note that I only said responses will be generated more quickly; it's anyone's guess as to how well those responses will work, particularly since the spammers certainly won't be opening up their source code!) I should have said above that I think the potential for an open-source search engine to implode in grand style due to sheer developer frustration is enormous. But I still think that if enough dedicated people were on board, cool things could happen; it's hard to say how many is "enough", though, or how dedicated they need to be. Startups tend to have fanatically dedicated people working for them because the people know that the volume and quality of their work has a direct influence on whether they're going to have a job in three months; this really can't be said for open-source projects. Even when the work sucks giant monkey balls, a sense of urgency can be a great source of inspiration.

random Wikia users won't have access to the data that informs the algorithm design

Do we know this is true? (I didn't see it indicated in the article, but it was a pretty short article.) I suppose the bandwidth costs would be kind of insane if just any random person could pull the data whenever they wanted ("hey, let's DDoS Wikia today!"), but perhaps developer keys and rate-limiting, or BitTorrent distribution, could keep that manageable.
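To make the developer-keys-plus-rate-limiting idea concrete, here's a minimal sketch of the standard approach, a token bucket per key. Everything here is hypothetical (the key names, the rates, the `handle_request` helper) — it's just the general technique, not anything Wikia has said they'd do:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill at a steady rate,
    each request spends one, and requests are refused when the
    bucket is empty. `capacity` bounds the allowed burst size."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per developer key, created lazily.
buckets = {}

def handle_request(dev_key, rate=2, capacity=10):
    """Return True if this key's request should be served."""
    bucket = buckets.setdefault(dev_key, TokenBucket(rate, capacity))
    return bucket.allow()
```

With a scheme like this, J. Random Person can still pull the data, but the "let's DDoS Wikia" failure mode costs you your key rather than their bandwidth. Bulk snapshots over BitTorrent would cover the heavy-download case that per-request limits make painful.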

Nor, I suspect, will Wikia be letting just anyone _edit_ their algorithms, unless they're complete idiots.

Sure. I took [livejournal.com profile] neoliminal's question to mean something like the setup that LiveJournal has, where the code is published and can be replicated elsewhere (e.g. DeadJournal), but users can't make changes to an instance of the system that they don't control.

From my experience, part of the problem with translating academic results to search engines in particular is that it's hard for an academic to demonstrate that their approach or improvement will work in actual practice.

Oh, absolutely, although the better conferences (e.g. SIGIR, KDD, &c) seem to at least pay lip service to scalability issues. But I totally agree that academics almost universally have blinders on when it comes to the notion of people using their systems in unintended or unexpected ways, and they don't write papers (and certainly don't implement code) with an eye toward this very real problem.

Still, I like the notion of J. Random Bored Hacker being able to read a paper, bang some code together, and see whether it works. J. Random Bored Hacker isn't going to have the hardware resources to put together his own private Google, but I know probably ten or twelve different people who have clusters running in their homes/warehouses/whatever just for shits and grins. There's got to be some guy out there with fifty Beowulfed dual-Xeons and a healthy curiosity about search...

I gather that you're in some completely other time zone these days

Yep, I'm in Belgium until late February, alas. If you'll be business-tripping later in the year, though, drop me a line in advance and we can grab dinner! (I am currently without a car, and my day-to-day transportation needs are met well enough by SF public transit that I'm not especially motivated to shell out to get my engine fixed or buy a new car, but Mountain View is fairly reachable by train.)
