maradydd | Small world


There's a post up on BoingBoing today (OK, yesterday for me) about open vs. closed search algorithms, suggesting that the search algorithms used by Google, Yahoo, et al. are bad because they lack transparency. It invokes a comparison to an important concept in computer security: "security through obscurity" is dangerous because an effective encryption scheme should be equally hard to break whether or not you know the internals of the algorithm that generated the ciphertext.

I think comparing this to search is a bad (or at best misleading) idea, and I expounded on this in the comments. But I'm far more entertained by the fact that the two best comments on the post so far come from two people I'm tangentially familiar with, albeit from totally different directions:

jrtom and radtea. Small damn world!

Current Mood: amused


From: jrtom.livejournal.com

Sure, I wouldn't expect anyone to try and mirror all the data. But I do think that independent developers could get useful data to work with through statistical sampling.

Ah, got it. Hmm. Might be feasible to do sampling.

As to your proposal: my completely uninformed guess is that these services overlap in something like 90% of the _types_ of data that they collect/maintain.

...

Actually, I'm going to revise that figure downwards. I'd guess that Google uses data from GMail and their ad programs, for instance, to inform their search ranking algorithms.

The suggestion is an interesting one, but I suspect that it wouldn't fly:

(1) It would have to be a (partial) mirror; there's no way that the services would want to base their own systems on something that wasn't dedicated to them. So, cost-wise, it would be pure overhead for the sponsors.

(2) A lot of the data has privacy implications that are hard to deal with. Remember that flap over the release of a bunch of search queries?

(3) I suspect that the various services don't really want attention drawn to the breadth and depth of the information that they use for these purposes.

Definitely an interesting idea, though, and it might be possible to do something like this even if the data sources that the major players use are widely disparate. It wouldn't help you to answer questions like "how would Google work if I tweaked this constant?" but for general search research, it could be useful.

YaCy might be an interesting resource in this context.

September 2010

Radio Free Meredith

science keeps me warm at night
