maradydd: (Default)
[ profile] alexey_rom tweeted Edward Z. Yang's Databases are categories (based on a talk by David Spivak) the other day. I only just got round to reading it, and having done so, I recommend you do too. The notion of arrows and their properties (identity and associative composition) can be a bit abstract for the amateur/novice category theorist (like me -- hell, I wouldn't call myself more than a category theory fangirl), and mapping this onto identity and joins in databases is a really clever concretization.

There is some nerking in the comments about the relational model really being about Cartesian relations rather than object relations. This is true, but AFAICT irrelevant if viewed from the perspective of object-relational mapping (which you get for free in Postgres and Oracle anyway).

Where I think this is really useful is the world of higher-order query languages. Category-friendly languages such as Haskell have already made a good deal of headway into database APIs; I do not yet know of any projects that (for example) can create a schema from a set of objects and morphisms, but (continuing the example) I could see using that approach to generate all necessary foreign key constraints from an ORM.
maradydd: (Default)
I owe the Berlin trip a proper writeup, but some highlights: talk went extremely well, saw many old friends and acquaintances, came up with yet another paper we need to write with Dan Kaminsky, had some interesting discussions about a computer science curriculum that emphasizes security from the get-go, narrowed down the scope of some tools I need to write in the very near future in such a way that I can put together a proper spec now, got invited to give our talk or something very much like it again at Dartmouth. [ profile] enochsmiles and I co-present extremely well, which bodes well for future joint presentations (which I enjoy better than solo presentations, when they go well at least).

We also sort of got stuck in Berlin after seeing [ profile] foxgrrl off at TXL, as it turns out that trains from Berlin to Leuven are not to be had after about 2 pm; the farthest west we could have gotten was Liège. A glance at a rail map suggested a wild possibility: Saarbrücken, so on a wild shot I called [ profile] oralelk's office and got him on the first ring. Despite not having had much contact at all over the last, um, five years (bad Meredith, no cookie!) he was still quite happy to have us crash on his couch for the night, even coming out to meet us at 11:30 at night, staying up to chat, and putting off going in to work until well past 11 am despite having quite a lot of work to do. It was rapidly discovered that Saarbrücken is one of the least convenient places in Germany to get to Belgium from; our options were basically the ICE high-speed train to Paris and the Thalys to Brussels, or a bus to Luxembourg and two trains for roughly a quarter the price. Thus I have now been to Luxembourg, making that eight countries so far this year.

I have also just received notification that our Black Hat talk has been accepted. Thus, I will be both there and at the Open Science Summit in Berkeley immediately thereafter, July 29-31. (Current plan is to arrive in CA on the 30th.) Unfortunately, this will mean missing DEFCON, for me at least; I'm not sure about [ profile] enochsmiles.

It is going to be a wild summer, with tools to write and a journal article to finish and a couple of big chewy proofs to prove on top of all my normal work. But I'm excited!
maradydd: (Default)
As most of you have probably picked up on, I'm among that minority of computer scientists who actually writes code, and often prefers it to writing papers (much to the chagrin of my advisors and colleagues). I enjoy my theoretical work, but if I spend too much time on theory alone, the joy turns hollow; I want to build things that people can use. Things that are better than what we have now. Things that are founded on sound principles and elegant proofs, that run fast, scale efficiently, are easy to use and extend, work well, and fail gracefully.

I'm often torn, rather badly, between two motivations, which I'll call the Mathematician and the Engineer. The Mathematician loves elegance and correctness, and is willing to spend exponentially increasing amounts of time to get them. The Engineer respects elegance and correctness, but is far more aware than the Mathematician of the fact that there's a lot of work that needs to get done yesterday, and just wants to get the fucking code out the door. With unit tests and regression tests and a decent build system, sure -- the Mathematician likes those too, they're a good way to demonstrate correctness -- but the Engineer recognises that nobody can use the code we don't ship, and sometimes there comes a point when you just have to take a deep breath, commit the changes and call it a release.

I submit that political decisionmaking is subject to its own Mathematician/Engineer conflicts. And it's often hard, damn hard, to switch between one perspective and the other or to find a point of harmony between them, especially when the brokenness of a system is obvious to both Mathematicians and Engineers. The argument isn't so much over that something needs to change, but what needs to change, and how much work under the hood it's going to take to ship something that is at least less broken and scales better, and what the balance is between getting it more correct and getting something that will fix at least some of the short-term problems out the door at all.

(This post brought to you by the fact that the Haskell parser generator doesn't have precedence settings for attribute rules, the realisation that my life would be so much easier if it did, the further realisation that if I want it, I'm probably going to have to put it there myself, and the nagging knowledge that there's a less elegant way to do it.)
maradydd: (Default)
Consider an arithmetic over {0,1}+ with three operators: addition, left-shift, and equality.

Is it decidable?
maradydd: (Default)
Shortly after my recent post ragging on Dembski, new reader and mathematician [ profile] coheleth wrote me asking for some pointers on learning information theory. I also got an anonymous comment asking for the same thing (hello, anonymous reader!), and who am I to say no when someone wants to talk about math?

However, I know for a fact that I have a lot of really sharp readers, many of whom are way better at this kind of thing than I am. So, you are all cordially invited to participate in a paper-reading and discussion group, right here on this very LJ.

Why information theory? Well, because it's the unsung hero of the modern age. Every form of communication we take for granted today -- telephones, cellphones, radio, the Internet, wifi, Bluetooth, you name it -- has its feet firmly planted in information theory. Information theory also helps us to make sense of the world around us, from DNA to black holes. Understanding information theory will help you to be a better scientist, even if you're not one already. On a very fundamental level, information theory has a lot to say about what we can know, what we can do. It's theoretical math for realists.

I figure we'll start with one of the classics -- Claude Shannon's The Mathematical Theory of Communication, available in several formats from the nice people at Bell Labs -- and work our way forward from there, letting the pace set itself. When I was doing this in grad school, we did a paper a week, reading on our own and meeting to discuss for a couple of hours once weekly. Since this is the internet, I figure discussions might go on for a couple of days, so my initial thinking is a paper every two weeks -- that's a week to read, a couple of days for discussion, then a breather before picking up the next one. But that's all assumption; I don't want to drag anyone away from a good discussion, nor do I want to rush anyone.

If you're curious about the subject but think that you're bad at math, then rejoice -- information theory is, to my mind at least, one of the easiest mathematical disciplines for laypeople to understand. It will help if you understand binary numbers (or, better yet, how bases work generally); a grasp of basic probability (i.e., how to compute the likelihood of a certain number coming up on a dice roll) will also be useful, as will ninth-grade algebra. You will also need to understand that the integral of a function is the area under the curve in the graph of that function (or, in three-space, the volume described by rotating that curve around an axis), and the general idea of summation (including the notion of a convergent series, which is an infinite series whose sum has a limit, i.e., it does not grow without bound). But that's it. Seriously. (You don't have to actually understand how to compute an integral. Hell, calculus was fifteen years ago; I barely remember how to do one myself. I am way overspecialised in discrete math, and underequipped for continuous math.)

If anyone's interested but doesn't feel like they have the prerequisites down, I can post something in the next couple of days to get you up to speed; don't be shy.

Where we'll go from Shannon is anyone's guess, and depends mostly on where the discussion goes. We'll likely end up talking about coding theory and compression (as in, how ZIP files work, and how your cellphone is able to hold a reliable connexion without being clobbered by the thousands of other conversations going over the cell network). But we might also get into cryptography and cryptanalysis, information-theoretic security (as in, cryptosystems that can't be broken even if the attacker has all the computational power in the universe -- a favourite subject of [ profile] enochsmiles' and mine), astronomy (radio telescopes are completely dependent on information theory), and computability theory, that latter by way of Gregory Chaitin and algorithmic information theory.

My goal here is to deepen and broaden understanding on all levels. In a meatspace paper-reading group that's difficult, but since the net is distributed, I'm hoping that we can address the curiosity of newbies, experts and self-proclaimed "non-math people" alike. Feel free to invite your non-LJ friends, too. (They might want to get OpenID accounts, to make discussion threads easier, but that's certainly not a requirement.)

So! You've got the link, up there in the fourth paragraph; go forth and read. It's 55 pages, so I'm thinking we might want to start with just the first part (pages 1-19 inclusive). We should definitely follow up with part two; I'm less sanguine about part three, but if there's interest, we'll do it.

I'll kick off discussion next Wednesday with some questions and maybe an observation or three. I'm looking forward to having you join us!

i can has?

Jul. 19th, 2009 01:23 pm
maradydd: (Default)
Would someone please invent an FPGA where the logic units are made up of a Hadamard gate, a phase rotation gate, and a controlled NOT gate apiece?

Alternately if I could get those gates in CMOS packages that would be cool too.
maradydd: (Default)
A mathematical argument for affirmative action, presented by [ profile] michiexile.

This tickles data-mining brain, because there's a notion of fuzziness that [ profile] michiexile gets into -- your fitness function will only ever be so good. Thus there's a probability problem (that I'm not quite sure how to set up) that basically answers the question of how many people you need in your pool of "perfect" (or "adequate", which is to say, "will achieve all performance goals set before him/her") candidates, if your fitness function is known to have some errors, in order to be reasonably well assured of having a candidate who really is equal among firsts. There's probably a birthday problem in there somewhere *flings up hands* -- this is a bit out of my depth, I could probably sort it out in a few hours if I had my library with me, but one of you probably knows how to fit it together. [ profile] martian_bob, you do this for a living ... anyone else? [ profile] michiexile? [ profile] coheleth?

I guess you want a fitness function that prefers false negatives (rejecting a candidate who would be qualified) to false positives (adding to the pool a candidate who actually isn't qualified). But naturally you want to minimize both, to the extent possible. *pounds head into wall* Data mining 101, where did you gooooo ...

Anyway, once you can bound your desired candidate pool size based on what you know about the accuracy of your fitness function, then you can talk about whether your candidate pool is large enough to be assured success no matter who you choose. Thus we eliminate "oh, he was hired only for his race / she was hired only for her gender" trash-talk, because we have some standard of proof of fitness for the decision criteria. (Or if the steel mills really are refusing to hire Irishmen, we can tell.) We probably also create a useful tool in battling salary disparities.


Oh, and I'm also implicitly suggesting an open standard for representing/documenting hiring criteria.
maradydd: (Default)
By way of Pharyngula, apparently the creationists are starting to abuse information theory, not just physics, in their tortured attempts to justify their doctrine.

Of course, you understand, this means war.

ETA: /me reads the comments. Oh. Apparently creationists reject Claude Shannon's work on information theory. Infidels. They shall be first against the wall when the revolution comes.

One thing that I will never understand is why creationists believe that an omniscient God is bad at math.


Jan. 11th, 2008 12:28 pm
maradydd: (Default)
Longtime readers of this journal may recall that a few years ago, I posted about a (thus far unpublished) short story I wrote for an anthology themed around Abraham Van Helsing, the vampire-hunter character from Bram Stoker's Dracula. Its main character/narrator was the mathematician Charles Babbage -- though not quite what you might expect; since the real Babbage died some twenty years before the events of Dracula, Our Hero was actually Vampire!Charles Babbage. Most of his backstory didn't make it into the short (as it should be), but [ profile] oralelk helped me flesh out some really fun details. (Brief version: Vampire!Charles Babbage sired by Vampire!Ada Byron Lovelace (world's first computer programmer, bled to death by her physicians in 1852), sired by Vampire!Evariste Galois (founder of group theory, shot 1832 in a duel).) It touched on NP-completeness and the Halting Problem, and was quite a lot of fun to write.

In later chitchat, someone -- I forget who, possibly [ profile] oralelk or [ profile] wintersweet -- suggested I could follow up with more stories of Vampire!Charles Babbage and other famous historical mathematicians, e.g. Richard Montague or Alan Turing. As yet, this idea has lain fallow. But I've been brushing up on my crypto theory these last few weeks, and as I was walking home from the print shop just now, an idea started to gel: Turing => Bletchley Park => cryptography => modern cryptography => abstract algebra => Galois!

Not sure if anything will come of it; one happy accident of a connexion isn't enough to hang a plot on. But I'll keep it in mind, and we'll see if it goes anywhere.
maradydd: (Default)
I generally prefer to absent myself from the morass of meta-journalism that is discussion about the blogosphere itself, but that said, if you want to read an incredibly patronising article, you don't have to look a lot farther than this Eric Engberg op-ed.

The editorial focuses on the shitstorm of commentary that swept the web on 2 November, natch. Certainly, a lot of that was wishful thinking, a lot of it was misinformation, and a lot of it was just flat-out wrong. That's fine, because it's all true. What gets under my skin, though, is stuff like the following:
While out on the campaign trail covering candidates, my own network’s political unit would not even give me exit poll information on election days because it was thought to be too tricky for a common reporter to comprehend. If you are standing in the main election night studio when your network’s polling experts start discussing the significance of a particular state poll, you the reporter will hear about three words out of one hundred that you will understand. These polls occur in the realm of statistics and probability. They require PhD-style expertise to understand. The people who analyze them for news organizations, like the legendary Warren Mitofsky and Martin Plissner at CBS News -- have trade associations like doctors do to certify their work.
First of all, never you mind that a binomial distribution absolutely does not take a PhD to understand; it's standard fare for the latter half of your average undergrad Stats 101, and I can explain it to a high school student of above-average intelligence such that he'll remember it when he gets into Stats 101. That isn't the point at all. The point to which I object is Engberg's attitude that because We the People aren't certified to deal with these Scary Data, we shouldn't be allowed to put our grubby little hands on them at all.

Well, you know, the vast majority of We the People aren't going to grok most of what goes into the EnsEMBL genomics database, or the reports and data on the Center for Army Lessons Learned, but it's all up there for anyone to take a look at. You want to see some data that could be outright dangerous if used irresponsibly, paw through some of the stuff on CALL; there are POIs in there that can get you killed if you're not observing proper safety precautions. Them's the breaks; you pick the information you want and how you want to use it.

Engberg continues:
When you the humble reporter are writing a story based on the polls you need one of these gurus standing over your shoulder interpreting what they mean or you almost certainly will screw it up. There is a word for this kind of teamwork and expertise. It’s called "journalism."
Now, I'll absolutely concede that it is the responsibility of people who provide information to others to double-check that what they're putting out is correct, and part of that responsibility includes consulting expert resources before running one's mouth. (It's also especially amusing that this sort of high-horsery is coming from CBS, given the colossal fuckup that was the Bush National Guard Documents scandal.) But there's also a flip side of the coin: when people screw things up, they are expected to print retractions. This happens in blogs all the time; this happens in print media as well, but I don't even have to invoke my expertise as someone whose job it was for several years to proofread the laid-out pages of a major metropolitan newspaper to remind you that the print media usually do their damnedest to bury retractions in the tiniest print they can get away with.

I'll even argue that in blogs, particularly in political blogs talking about transitional situations like elections, the truth will out not only because people call each other out (as was the topic of much discussion after Rathergate), but because transitional situations come to an end and everyone finds out What Really Happened all at once. This leads directly into a facet of the blogosphere that Engberg is utterly glossing over: its time scale is radically compressed from that of print or even TV journalism.

With one exception: sports.

The blogosphere allows for a play-by-play of what's going on from moment to moment, just like the commentators in a football game describing every action on the field for the loyal listeners back home. So what if the commentators point out that the Cowboys are up by a field goal at one point in the first half, but they end up losing the game anyway? That doesn't change the fact that, say, from the field goal at 7:34 pm until the Texans scored a touchdown at 7:52, the Cowboys were ahead. Likewise, if Wonkette points out at one point that Kerry is up 52-47 in Ohio -- which he was at one point, because I was one of those no-life dorks hitting Reload on all fucking night of the election, until I got sick of it and went off to grade papers -- that isn't changed by his ultimately losing the election. Engberg seems to be of the opinion that blog readers are looking for Gospel Truth and receiving at best, half-truth, at worst, lies, damned lies and statistics. I submit that Engberg misunderstands what we're interested in. Journalism, of the type he describes, can indeed provide a slice of what's going on all over the country, but it must wait until long after the fact to do so. Those of us who are interested in a truly up-to-the-minute assault of information understand that we're going to have to take it with a shakerful of salt; that seasoning is the price we pay for a slice of life that we can get from blogs.


maradydd: (Default)

September 2010

12131415 161718
26 27282930  


RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags