maradydd: (Default)
Whoever came up with apport -- the Ubuntu crash-reporting system -- is an unsung genius. Crash reporting isn't anything new, of course, but crash reporting that opens up a ticket in Launchpad and lets you, the user, customise the report and follow the problem-remediation on the web is elegant brilliance. Quality assurance that provides some level of accountability to the user reporting the problem? Who'd'a thunk?

Amazing what you can get your users to do with just a little presumption of good faith, innit.

Also, whoever came up with the idea of having apport report package installation failures -- those of us who have ever spent time in dependency hell salute you. If I ever meet you, I'm buying you a beer.

(This post brought to you by the karmic dev build.)
maradydd: (Default)
Has anyone reading this used Scrapy, the Python HTML-scraping framework, programmatically as part of a larger system? I'm interested in using it to replace BeautifulSoup in a project I'm working on which involves extracting specific, XPath-targetable tags from the contents of a whole bunch of different URLs. BeautifulSoup can do it, but the CPU and memory load is really heavy and I'd like to find a lighter-weight solution. (Scrapy supports XPath out of the box, which was a great design decision on their part.)

The specific problem I'm having with Scrapy is that despite the fact that it supports writing custom scrapers, it's designed as a command-line-driven tool to the exclusion of anything else. I want to instantiate a scraper from within a routine, run it, and hand the contents of the tags it collects off to another routine all within the same process, without having to invoke a separate process or touch the disk -- this system has to consume a lot of network data and I can't afford for it to become I/O bound. (I can queue the inbound network data -- in fact, since my current architecture is completely synchronous, I already am -- but not having to do so is preferable. Scrapy is asynchronous and that's a plus.)

Since it's written in Python, I can trace the control flow and figure out what specific pieces I need to import and/or customise to get it to do what I want, but it's a pretty densely layered system and it would be nice to have some examples to use for guidance. The documentation is unfortunately useless in this regard -- all the examples are for command-line invocation -- and neither Google Code Search nor koders.com turns up anything useful.

N.B.: I'm reluctant to just use libxml2, because most of the pages I'm scraping are not XHTML-compliant. In fact, a surprisingly large number of them have HTML so malformed that BeautifulSoup chokes on them and I have to use an exponential-backoff approach to parse only a piece of the document at a time. (And in practice, that means I sometimes lose data anyway; this is annoying, but frustratingly necessary. Dear web developers who cannot be bothered to make their content machine-readable without lots of massaging: die in a fire.) It is my understanding that Scrapy is quite tolerant of bad markup, but if I'm wrong about that, please correct me.
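One reading of the exponential-backoff approach described above, as a minimal sketch: try to parse a large window of the document, and whenever the parser chokes, halve the window and retry, giving up on a piece only once it is down to a single block. Everything named here is an assumption for illustration: `strict_parse` uses the standard library's XML parser as a stand-in for whatever parser is choking, and the document is assumed to have been split into `blocks` already.

```python
import xml.etree.ElementTree as ET


def strict_parse(fragment):
    """Stand-in for a parser that chokes on malformed markup."""
    return ET.fromstring("<root>" + fragment + "</root>")


def parse_with_backoff(blocks, parse=strict_parse):
    """Parse a list of markup blocks in windows of shrinking size.

    On a parse error the window is halved and the same position retried.
    A single block that still fails is skipped (that data is lost).
    After a success the window doubles again.
    Only ET.ParseError is treated as "the parser choked".
    """
    parsed, skipped = [], []
    i = 0
    window = len(blocks)
    while i < len(blocks):
        # Never read past the end, never use an empty window.
        window = max(1, min(window, len(blocks) - i))
        chunk = "".join(blocks[i:i + window])
        try:
            parsed.append(parse(chunk))
        except ET.ParseError:
            if window == 1:
                skipped.append(i)  # give up on this one block
                i += 1
            else:
                window //= 2       # back off and retry the same position
            continue
        i += window
        window *= 2                # recover after a clean parse
    return parsed, skipped


blocks = ["<p>one</p>", "<p>two</p>", "<p>bad<b></p>", "<p>four</p>"]
parsed, skipped = parse_with_backoff(blocks)
texts = [p.text for tree in parsed for p in tree.iter("p")]
```

With the sample input, only the malformed third block ends up in `skipped`; the blocks around it are still parsed.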
maradydd: (Default)
By way of sfllaw, a development paradigm I had not previously known about, and a tool for developing in this fashion. Holy shmoley. I agree with Bill Tozier, I want this for Python yesterday.

Behaviour-driven development is basically test-driven development on steroids: it takes the principle we like to cite, "write your man pages first!", and hooks it right into the test-driven development cycle, except now you're developing one behaviour at a time, so you can write your tests piece by piece and have individual chunks of the system working piece by piece. I like TDD, but sometimes I have to write code fast (and yes, TDD always ends up saving me time in the end, but we've all had those projects where OMG EVERYTHING IS ON FIRE AND THERE'S NOT TIME TO DO IT RIGHT). Behaviour-driven development eliminates your excuses to not do it right: you're producing code as discrete functional units, complete with tests to prove that they are correctly functioning functional units, and you're producing it fast enough to keep management/the client happy. (Clients are sometimes not happy when the first week of work goes into building the unit test suite. Yes, yes, I know, that week of work saves a month or more later on down the line. Some of my clients are no longer my clients for a reason.)
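A minimal Python sketch of that one-behaviour-at-a-time rhythm, using plain functions rather than Cucumber. The `Cart` class, the spec names and the `run_specs` helper are all hypothetical, made up for illustration: each spec states one behaviour in given/when/then form, and the class only grows as far as the specs written so far demand.

```python
class Cart:
    """Hypothetical system under development, grown one behaviour at a time."""

    def __init__(self):
        self._lines = []

    def add(self, price, quantity=1):
        self._lines.append((price, quantity))

    def total(self):
        return sum(price * quantity for price, quantity in self._lines)


def spec_new_cart_totals_zero():
    # Given a new cart
    cart = Cart()
    # When nothing has been added
    # Then the total is zero
    assert cart.total() == 0


def spec_total_reflects_quantity():
    # Given a new cart
    cart = Cart()
    # When two items priced at 5 are added
    cart.add(price=5, quantity=2)
    # Then the total is 10
    assert cart.total() == 10


SPECS = [spec_new_cart_totals_zero, spec_total_reflects_quantity]


def run_specs(specs):
    """Run each behaviour spec; map its name to True (passed) or False (failed)."""
    results = {}
    for spec in specs:
        try:
            spec()
            results[spec.__name__] = True
        except AssertionError:
            results[spec.__name__] = False
    return results


results = run_specs(SPECS)
```

The next behaviour (say, removing an item) would start life as a third spec that fails, followed by just enough `Cart` code to make it pass.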

Behaviour-driven development is also a great tool for the "design the UI first" school of programming, and any project that doesn't follow that school of programming is doing it wrong. (Think of it this way: if you're writing a library, design the API first -- that is to say, write the man page first. If you're writing a web application, mock up the user interface, figure out what the damn thing's going to look like and do all your changing-your-mind about how the UI is going to behave before you start laying down AJAX requests.)

Also courtesy sfllaw, a talk by Ben Mabey explaining not only these ideas but the business decisions which motivate behaviour-driven development. This is a really great overview and I strongly encourage any programmer with a pragmatic spirit -- or, even better, an entrepreneurial one -- to block out half an hour of your time to watch it.

Alas and alack, Cucumber is not available for Python yet, and from what I've seen, I really like the way it works. It apparently can be used with PHP, but I really would prefer to avoid PHP if at all possible; my preferred style is just way too functional these days to blend well with PHP. (I've developed a thing for continuation-passing style in the last month or so.) This may end up being the thing that finally motivates me to learn Ruby. I have a little side project going on right now that has a web-application-framework-shaped hole in it, and I had been planning on using Django, but given that it's going to be a Javascript-heavy front end with likely a healthy dose of script.aculo.us, Rails could be a better tool for the job. I'll need to decide if I like how Rails talks to databases; I'm madly in love with the way Django does it and anything less will be a major disappointment, so this is definitely a factor to consider. (Current Rails devs, your input is welcome -- I know very little about your framework. I used to be cranky about the lack of integration with Apache, but there's mod_rails these days and I assume that removes a lot of the reasons I had for bitching.)

And I'd have real continuations. That's always a plus.

Decisions, decisions. But I do like the fact that tools like this exist at all; it's me who needs to get over my uncanny-valley problem with Ruby.

(karnythia, thewayoftheid, tanyad, I'm not talking about the project I'm doing for y'all, this is a different project. So many irons in the fire!)
maradydd: (Default)
Prompted by a discussion with bunnykitteh, who's good at prompting these kinds of things:

Imagine a Facebook and/or MySpace application aimed at organising flash mobs for political action (e.g., the kind of thing Anonymous might use to quickly notify members of imminent $cientology activity in a particular location). What features should it have? (Twitter gateway?)

(Note that with Facebook, especially, there are all kinds of interesting concerns with respect to privacy...)
maradydd: (Default)
As some of you know, I have a rather lengthy post in the works about the history of challenges to initiative amendments in California -- that is, constitutional amendments which are proposed by a petition of the people and decided by popular vote. It's 1500 words and counting, and will probably hit 3000 by the time it's done, but I wanted to make sure that folks who want to understand the precedents coming into play with Strauss v. Horton, the ACLU's challenge to Prop 8, have a good resource for that. However, the following came up on theinated's journal, deep in a comment thread, and I think it's important enough to bring up here.

But first I'm going to talk about software engineering. I promise, it's relevant.

In the code-slinging trade, there's a concept called "shotgun debugging" which makes every seasoned engineer foam at the mouth. The Jargon File defines it as "the making of relatively undirected changes to software in the hope that a bug will be perturbed out of existence". "Relatively" is loosely applied here; typically the code you tweak has something to do with the problem -- if the problem is in your user interface, twiddling with interprocess communication usually isn't going to help -- but you're not sure where the exact problem is, so you poke at a bunch of different places and pray you got it right.

Don't do this. It's practically guaranteed that you will make things worse, most likely by creating new bugs that are subtler, more obscure, and will bite you in the ass for years to come. But keep the concept of shotgun debugging in mind, because we're going to talk about it again shortly.

Elsewhere, lather2002 wrote:
There are ways for same sex couples to have rights that allow them basically the same rights as "Married Couples".
In principle, lather2002 is correct. However, the institution of marriage is deeply embedded in the principles of English common law upon which our legal system is founded, and altering those principles to cover civil unions would involve a massive rewriting of the law which amounts to shotgun debugging of the very worst sort.

Looking only at statutes, we can easily find dozens of areas in which marriage plays a role: tax law, estate/inheritance law, family law, laws having to do with visitation rights (both for hospital patients and for prisoners), property law, insurance law, torts (e.g., wrongful death suits), and so on. Attempting to shotgun-debug the California code in an attempt to create parity between marriages and domestic partnerships is a fool's errand; there are just too many places where marriage is closely intertwined with statutory law to be able to do the job right. California tried to do it all in one go by providing that domestic partners are to have all the rights and responsibilities afforded to married partners, but the very bill that established this also carved out several exceptions. Establishing a domestic partnership requires different prerequisites -- among other things, the couple must live together before becoming domestic partners, which isn't required for marriage -- and it isn't possible to have a confidential domestic partnership (i.e., one that isn't a matter of public record), while it is possible to have a confidential marriage.

However, the matter gets fuzzier. In some situations, the principles of common law protect the institution of marriage in a way that isn't actually codified anywhere. A good example is the notion of privileged communication. There are certain types of communication, such as that between a lawyer and her client, a doctor and his patient, a priest and a penitent confessing to him, which are "privileged" in the sense that neither party can be compelled to disclose the contents of that communication. If a defendant admits to his lawyer that he committed the crime with which he is charged, the lawyer cannot be compelled to disclose this to a third party. Spouse-to-spouse communication is protected in exactly the same way: one spouse cannot be compelled to give evidence against the other (also known as "spousal immunity"), and in fact one spouse can prevent the other from disclosing information which was communicated privately between the two of them (also known as "marital privilege").

For what it's worth, the matter of privileged communication has a lot to do with why the right to marriage is viewed as derivative of the right to privacy -- which is expressly protected (in fact, it's inalienable) under CA constitutional law.

Some states have passed statutes which restrict privileged communication in some form; for instance, Washington state has made attorney-client privilege a one-way street from client to attorney (the client can be compelled to testify against the attorney on matters of communication that don't have to do with the client's communications). California has codified attorney-client privilege the opposite way, protecting all attorney-client communication regardless of subject, but that merely reinforces the common-law definition; it does not expand it. I can't find an example of a law which creates a new class of privileged communication. Expanding privileged communication to domestic partnerships is thus quite difficult, and privileged communication isn't the only area of common law where marriage comes into play.

Shotgun-debugging a body of statutory law is hard enough; how do you shotgun-debug hundreds of years of tradition? Under the principle of stare decisis (literally "to stand by and adhere to decisions"), which obligates judges to follow the precedents established in previous case law, you can't. Only marriage is marriage, and there is no precedent for "domestic partnership immunity"; in this respect, the court's hands are tied. Even if statutory law mandates equal treatment before the law for domestic partners, the court cannot magically create privilege where none exists. There can be no parity between marriages and domestic partnerships.

I'm going to turn back to the Jargon File, now, to address the topic of elegance: "Combining simplicity, power, and a certain ineffable grace of design." Software engineers love elegant code: it's easier to understand, easier to work with, and it's aesthetically pleasing. Linguists adhere to the principle of elegance, too: given two sets of rules which describe the exact same grammar equally well, the one with fewer rules is to be preferred, as complicated rules are difficult to apply and lead to errors.

I'm not going to pretend that law adheres to the principle of elegance -- the sheer size of the California constitution, much less the California code, is testament to that -- but in this instance, we would do well to observe it. If we wish to establish parity between same-sex and opposite-sex couples, the simplest, least confusion-causing, most elegant solution is to legalise marriage between both same-sex and opposite-sex partners.

(This is, incidentally, the fundamental flaw I see in the "then let's make everything a civil union" argument. Taking away spousal privilege is a horrible, horrible idea that would remove the protections of hundreds of years' worth of important, rights-preserving court decisions which hinge on spousal immunity or marital privilege. Please take a look at the bigger picture here; let's not cut off our noses to spite our faces.)

Small world

Jan. 2nd, 2008 07:52 pm
maradydd: (Default)
There's a post up on BoingBoing today (ok, yesterday for me) about open vs. closed search algorithms, suggesting that the search algorithms used by Google, Yahoo et al are bad because of their lack of transparency. It invokes a comparison to an important concept in computer security: "security through obscurity" is dangerous because an effective encryption scheme should be equally hard to break whether you know the internals of the algorithm that generated the ciphertext or whether you don't.

I think comparing this to search is a bad (or at best misleading) idea, and expounded on this in the comments. But I'm far more entertained by the fact that the two best comments on the post so far come from two sources with whom I am tangentially familiar, albeit from totally different directions: jrtom and radtea. Small damn world!
maradydd: (Default)
As usual I am probably the last one to notice, but just in case I'm not: Script.aculo.us fucking owns.

Sometime in the reasonably near future I want to arrange, um, everything I have been thinking about in the last week and a half into an essay, the thesis of which is: if you are a Web 2.0 coder, learning Lisp will make you a much better Web 2.0 coder. No, really. And not just because of what Paul Graham had to say about his experiences starting Viaweb (short version: back during the early days of Web 1.0, they built one of the first truly agile web applications -- hell, quite possibly the first web application full stop -- and one that could have justifiably been called a Web 2.0 app if we'd had AJAX back then). I do not have time to expound on this right now, but I leave you the following points to mull over while I get my house in order:
  1. Dynamic HTML lives and dies by the DOM. If your code spends a lot of time modifying innerHTML members, you are doing it wrong. Javascript makes it easy, blissfully easy, to manipulate your content by manipulating its structure -- adding, removing and altering elements and their attributes by type and value.

    The DOM is a tree, and here is the Big Secret Insight about trees: trees are lists. Trees are dead easy to represent as nested lists, and if you can think in Lisp then you think in trees all the time anyway. Use trees. Learn Lisp.
  2. Remember my enormous long rant about C++ functors from a few weeks ago? Remember the part where I talked about "functions as first-class data"? Javascript treats functions as first-class data. You can create, modify, assign and replace functions at runtime. Yes, you heard me right: self-modifying code. The hardest thing about self-modifying code is getting your head around the fact that yes, it exists, and yes, you can do it. Go get comfortable with it. Learn Lisp.
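Both points can be sketched in a few lines. This is a Python stand-in rather than Javascript or Lisp, and the `[tag, attributes, children...]` layout, `walk`, `make_tagger` and `render` are all assumptions made up for illustration, not any real DOM API. The tree is nothing but nested lists, and the function that transforms it is built at runtime and passed around like any other value.

```python
# A DOM-like tree as nested lists: [tag, attributes, child, child, ...].
# Text nodes are plain strings.
page = ["div", {"id": "menu"},
        ["ul", {},
         ["li", {"class": "item"}, "Home"],
         ["li", {"class": "item"}, "About"]]]


def walk(node, visit):
    """Apply `visit` to every element, depth first, and return a new tree."""
    if isinstance(node, str):
        return node
    tag, attrs, *children = node
    tag, attrs = visit(tag, dict(attrs))  # copy attrs so the input is untouched
    return [tag, attrs] + [walk(child, visit) for child in children]


def make_tagger(extra_class):
    """Build a visitor function on the fly (a closure over extra_class)."""
    def tagger(tag, attrs):
        if tag == "li":
            attrs["class"] = (attrs.get("class", "") + " " + extra_class).strip()
        return tag, attrs
    return tagger


def render(node):
    """Serialise the nested lists to markup (no escaping; sketch only)."""
    if isinstance(node, str):
        return node
    tag, attrs, *children = node
    attr_text = "".join(' %s="%s"' % (key, value) for key, value in attrs.items())
    body = "".join(render(child) for child in children)
    return "<%s%s>%s</%s>" % (tag, attr_text, body, tag)


highlight = make_tagger("highlight")  # a function created at runtime
html = render(walk(page, highlight))
```

Swapping `highlight` for `make_tagger("selected")` changes the behaviour without touching `walk` or the tree, which is the structure-first style of manipulation the first point argues for.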
More later.
maradydd: (Default)
... woot.com had the Roomba Discovery 4220 SE on special yesterday for $150 plus $5 shipping, so I did a little budget-checking and decided it was time to start establishing my robot vacuum cleaner army. (Ever since the Bluetooth-enabled Roomba cockfight at ETech back in March, I've been thinking it would be cool to get a bunch of Roombas and write flocking and swarming algorithms for them, then bring them to a conference, have them lock onto some poor bastard's Bluetooth cellphone or PDA, and watch while cackling hysterically.)

I could only afford one, but one is enough to start playing around with the Serial Command Interface. I'm pleased that the SCI manual shows a Python code fragment for changing the baud rate, but all the commands are bit-level, power-this-pin-for-this-long/send-this-opcode-and-data-packet instructions. It doesn't appear that anyone's written a higher-level API (at least not in Python, though the Illinois Roomba Lab (!) at UIUC has a C++ one). (And why would I want a Python Roomba API? Because then any Nokia S60 phone becomes my Bluetooth-Roomba-army control platform. I love you, Python interpreter. Muahahaha.)
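For a sense of what a thin higher-level layer over those opcode-and-data-packet instructions might look like, here is a minimal sketch. The opcode numbers, the speed and radius limits and the "drive straight" radius value are assumptions that should be checked against the SCI manual; the `Roomba` class and its method names are hypothetical, and the sketch ignores baud-rate setup and any timing the SCI requires between commands. It only builds bytes and hands them to anything with a `write()` method, such as a serial port object.

```python
import struct

# Opcode values assumed from the SCI specification; verify before use.
START, CONTROL, DRIVE = 128, 130, 137
# Radius 0x8000 (here as a signed 16-bit value) is assumed to mean "straight".
STRAIGHT = -32768


def start_packets():
    """Bytes that start the SCI and then enable control of the robot."""
    return bytes([START, CONTROL])


def drive_packet(velocity_mm_s, radius_mm):
    """Drive opcode followed by two big-endian signed 16-bit values."""
    if not -500 <= velocity_mm_s <= 500:
        raise ValueError("velocity out of range")
    if radius_mm != STRAIGHT and not -2000 <= radius_mm <= 2000:
        raise ValueError("radius out of range")
    return struct.pack(">Bhh", DRIVE, velocity_mm_s, radius_mm)


class Roomba:
    """Hypothetical wrapper around any object with a write(bytes) method."""

    def __init__(self, port):
        self.port = port
        self.port.write(start_packets())

    def forward(self, velocity_mm_s=200):
        self.port.write(drive_packet(velocity_mm_s, STRAIGHT))

    def stop(self):
        self.port.write(drive_packet(0, STRAIGHT))
```

Keeping packet construction in pure functions means the byte layout can be checked without a robot attached, and the same `Roomba` class would work over a serial cable or a Bluetooth link as long as the port object exposes `write()`.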

(Note to self: in that case, do we open up a need for encrypted channels between cellphones and robots? Should I draft an RFC for SRCP, the Secure Roomba Control Protocol? "Man-in-the-middle" takes on a whole new meaning when the attacker is somewhere in the room with you!)

Anyway, one robot vacuum cleaner does not an army make, but it'll be a neat sidekick. I need a naming convention for robot vacuum cleaners!
