maradydd: (Default)
alexey_rom tweeted Edward Z. Yang's Databases are categories (based on a talk by David Spivak) the other day. I only just got round to reading it, and having done so, I recommend you do too. The notion of arrows and their properties (identity and associative composition) can be a bit abstract for the amateur/novice category theorist (like me -- hell, I wouldn't call myself more than a category theory fangirl), and mapping this onto identity and joins in databases is a really clever concretization.

There is some nerking in the comments about the relational model really being about Cartesian relations rather than object relations. This is true, but AFAICT irrelevant if viewed from the perspective of object-relational mapping (which you get for free in Postgres and Oracle anyway).

Where I think this is really useful is the world of higher-order query languages. Category-friendly languages such as Haskell have already made a good deal of headway into database APIs; I do not yet know of any projects that (for example) can create a schema from a set of objects and morphisms, but (continuing the example) I could see using that approach to generate all necessary foreign key constraints from an ORM.
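To make the "schema from objects and morphisms" idea a bit more concrete, here is a minimal sketch in Python. It is not taken from any existing project: the function name schema_to_ddl, the (name, source, target) triple format, and the employee/department example are all assumptions made up for illustration. Each object becomes a table, and each arrow from A to B becomes a foreign key column on A that references B.

```python
def schema_to_ddl(objects, morphisms):
    """Return CREATE TABLE statements for a schema given as objects and arrows.

    objects   -- iterable of table names (hypothetical input format)
    morphisms -- iterable of (name, source, target) triples; each arrow
                 source -> target becomes a foreign-key column on source
    """
    # Every object gets a surrogate primary key to start with.
    columns = {obj: ["id INTEGER PRIMARY KEY"] for obj in objects}
    for name, source, target in morphisms:
        if source not in columns or target not in columns:
            raise ValueError("arrow %r mentions an unknown object" % name)
        # An arrow is a total function, hence NOT NULL.
        columns[source].append(
            "%s INTEGER NOT NULL REFERENCES %s(id)" % (name, target)
        )
    return [
        "CREATE TABLE %s (\n    %s\n);" % (obj, ",\n    ".join(cols))
        for obj, cols in columns.items()
    ]


# Illustrative schema: two objects, one arrow between them.
objects = ["employee", "department"]
morphisms = [("works_in", "employee", "department")]

ddl = schema_to_ddl(objects, morphisms)
for statement in ddl:
    print(statement)
```

Identity arrows correspond to the id column itself, and composing two arrows corresponds to following one foreign key and then the next, which is the join described in the article.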
maradydd: (Default)
When you inspect call graphs, what tools do you like to use? Shark? callgrind/kcachegrind? Something else? What makes a call graph inspector useful?
maradydd: (Default)
Waking up with a solution in mind for a leftover bug from last night's hacking session, getting it implemented within ten minutes of being sufficiently caffeinated to work, and discovering that one of the subsequent items on my TODO is actually as simple as I thought it would be.
maradydd: (Default)
Suppose you're designing a protocol, and you're deliberating over whether to use XML, YAML, JSON, s-expressions (!) or some other data representation format for it.

The question you need to ask yourself is, "have I written an EBNF definition for my protocol yet?"

If the answer is "no," drop everything you are doing. Yes, everything. Step away from the keyboard. Get a pencil and paper, or go to the whiteboard, and work out your EBNF first.

Then, and only then, you may decide what to use as a data representation. Deciding what data format to use before you have determined the grammar of your protocol is like deciding what language to use before you have figured out what algorithms you're using.

Protocol structure is a design decision. Data representation format is an implementation decision. Do not intermingle the two; it will only end in tears, yours or someone else's. Probably yours.
maradydd: (Default)
libevent is the second best thing that has ever happened to me. If you are doing asynchronous programming of any kind whatsoever[1], you should be using libevent or an appropriate interface to it.

(The first best thing that's ever happened to me? Diverting a long walk through Iowa City with ylla into the gym so that she could use the bathroom there. While waiting for her to finish up, I randomly found a newspaper and flipped through the classifieds, which led to me finding an ad for an internship at Integrated DNA Technologies, which I applied for and got; that led to me presenting at CodeCon, which led to my meeting enochsmiles and a bunch of other awesome people who have improved my life considerably, including, appropriately enough, one of the authors of libevent.)

[1] OK, fine, maybe not AJAX.
maradydd: (Default)
Google Analytics does some pretty cool stuff, but has one major drawback for mobile web application developers: it's Javascript-based, meaning that hits from mobile devices that don't speak Javascript silently go untracked. Recently, the Analytics team released some code that does server-side tracking; the linked ZIP file contains source and examples in ASP, JSP, PHP and Perl. Why not Python, you might wonder? I wondered too, particularly since an AppEngine project I'm working on is at least somewhat intended for phones (hey, you never know when you might be away from your desk but really want to know if a certain BioBrick exists), so I did a little poking around to see if it was possible to instrument an AppEngine application using server-side Mobile Analytics.

The short answer is no.
maradydd: (Default)
Oh, don't get me wrong, I laughed, but it's horrible:
I once saw a C++ filesystem driver that overrode the / operator to mean "append". So you could do something like:

directory = "/tmp/subdir1";
filename  = "myfile.txt";
full_path = directory/filename;

and end up with full_path being "/tmp/subdir1/myfile.txt"
And no, Stroustrup's not going to hell for designing a language that lets people do this. The sheer fact that people can do this means he's already there. And so are we.
maradydd: (Default)
By way of sfllaw, a development paradigm I had not previously known about, and a tool for developing in this fashion. Holy shmoley. I agree with Bill Tozier, I want this for Python yesterday.

Behaviour-driven development is basically test-driven development on steroids: it takes the principle we like to cite, "write your man pages first!", and hooks it right into the test-driven development cycle, except now you're developing one behaviour at a time, so you can write your tests piece by piece and build individual chunks of the system piece by piece. I like TDD, but sometimes I have to write code fast (and yes, TDD always ends up saving me time in the end, but we've all had those projects where OMG EVERYTHING IS ON FIRE AND THERE'S NO TIME TO DO IT RIGHT). Behaviour-driven development eliminates your excuses to not do it right: you're producing code as discrete functional units, complete with tests to prove that they are correctly functioning functional units, and you're producing it fast enough to keep management/the client happy. (Clients are sometimes not happy when the first week of work goes into building the unit test suite. Yes, yes, I know, that week of work saves a month or more later on down the line. Some of my clients are no longer my clients for a reason.)

Behaviour-driven development is also a great tool for the "design the UI first" school of programming, and any project that doesn't follow that school of programming is doing it wrong. (Think of it this way: if you're writing a library, design the API first -- that is to say, write the man page first. If you're writing a web application, mock up the user interface, figure out what the damn thing's going to look like and do all your changing-your-mind about how the UI is going to behave before you start laying down AJAX requests.)

Also courtesy sfllaw, a talk by Ben Mabey explaining not only these ideas but the business decisions which motivate behaviour-driven development. This is a really great overview and I strongly encourage any programmer with a pragmatic spirit -- or, even better, an entrepreneurial one -- to block out half an hour to watch it.

Alas and alack, Cucumber is not available for Python yet, and from what I've seen, I really like the way it works. It apparently can be used with PHP, but I really would prefer to avoid PHP if at all possible; my preferred style is just way too functional these days to blend well with PHP. (I've developed a thing for continuation-passing style in the last month or so.) This may end up being the thing that finally motivates me to learn Ruby. I have a little side project going on right now that has a web-application-framework-shaped hole in it, and I had been planning on using Django, but given that it's going to be a Javascript-heavy front end with likely a healthy dose of, Rails could be a better tool for the job. I'll need to decide if I like how Rails talks to databases; I'm madly in love with the way Django does it and anything less will be a major disappointment, so this is definitely a factor to consider. (Current Rails devs, your input is welcome -- I know very little about your framework. I used to be cranky about the lack of integration with Apache, but there's mod_rails these days and I assume that removes a lot of the reasons I had for bitching.)

And I'd have real continuations. That's always a plus.

Decisions, decisions. But I do like the fact that tools like this exist at all; it's me who needs to get over my uncanny-valley problem with Ruby.

(karnythia, thewayoftheid, tanyad, I'm not talking about the project I'm doing for y'all, this is a different project. So many irons in the fire!)
maradydd: (Default)
I'm building a C++ project for an unusual platform, and am having some confusing problems with my libstdc++. For some reason which I cannot fathom, I am getting an absurd number of "undefined reference to..." linker errors for symbols which are indeed undefined in libstdc++.a, but which are definitely defined in libc.a and libgcc.a. Yes, I am linking to both of those. (I know I am, because earlier I was getting some undefined-reference errors to symbols in libgcc.a from the code I'm actually compiling, and when I added -lgcc they went away.)

Any idea what's going on here? Do I need to compile libstdc++ from scratch rather than using the provided binary? (Please, God, let the answer to that be "no".)

ETA: enigmatic ld ordering issues for the lose. Thanks, tangaroa!
maradydd: (Default)
  1. config.log is your friend.
  2. If your build worked great, then you set up a cross-compile and ./configure failed in an AC_CHECK_LIB, it's probably the linker.
maradydd: (Default)
I love const_cast and it loves me back. (It's hard to go wrong adding const to things as long as you're confident that it's okay for them to be immutable. These totally are.)
maradydd: (Default)
Len's Principles of Programming #15: "Do not engage in edge play with the stack. The stack has no safeword."
maradydd: (Default)
So I've been working with Django lately, and I continue to be pleased with the preeminent saneness with which it handles the interaction between HTML and Python. Here is the latest example.

Suppose you have an HTML form that your Django backend will be processing. Give each input or select element in your form a name attribute, and whatever function you POST the form back to will receive a request.POST dictionary keyed by the name. For instance, if you have a form like this:

<form id="shopping" method="post">
    <select name="fruit">
      <option value="apples">apples</option>
      <option value="bananas">bananas</option>
      <option value="cherries">cherries</option>
    </select>
    <select name="meat">
      <option value="buffalo">buffalo</option>
      <option value="moose">moose</option>
      <option value="quail">quail</option>
    </select>
    <input type="submit" value="Submit" />
</form>

Then your receiving function will get a request.POST that behaves like a dictionary with the keys fruit and meat. Under the hood, the value stored for each key is a list containing the values that were selected: request.POST['fruit'] hands back the last one, and request.POST.getlist('fruit') hands back the whole list. And, yes, if you give those selects the multiple attribute, turning them into multi-valued choice sets, that list will contain all values that were selected. Very handy.

But wait, there's more!

Suppose that you want to give your hypothetical shopper the ability to select more than one type of fruit or meat at a time without using multiple, so you write some DOM-manipulating javascript to dynamically add more copies of the appropriate select element as needed, giving each element a unique name. (How to do this is left as an exercise for the reader. I did it, you can too.) Suppose further that you also want to give your users the ability to specify how many units of each item they want, so you add text inputs (with appropriate input validation, of course, also left as an exercise for the reader). Give each <input type="text"> the same name as its corresponding select, and you'll get a request.POST that looks like:

{ 'fruit_0': ['4', 'cherries'], 'fruit_1': ['3', 'apples'], 'meat_0': ['1', 'buffalo'] }

(In this case I'm using subscripts in my javascript to generate distinct names. There may actually be a simpler way to do this, though I haven't hit on it yet.)

This is especially useful in the case where you have some function that you want to pass each of your (amount, item) pairs to, because then you can use the handy *args syntax, e.g. [doStuffTo(*values) for name, values in request.POST.lists()]. (Use lists() rather than values() here: values() only gives you the last value submitted for each name, not the whole list.) You could also use a dictionary comprehension if you're using Python 3, you bleeding-edge hacker, you. Though I don't know if Django is compatible with Python 3 (and I doubt it, given all the backward-compatibility stuff that Python 3 breaks). That, too, is left as an exercise for the reader.
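Here is a small sketch of that unpacking trick that runs without Django installed. The plain dictionary posted is only a stand-in for the (name, list of values) pairs a real view would get from request.POST.lists(), and do_stuff_to is a hypothetical handler invented for the example.

```python
def do_stuff_to(amount, item):
    """Hypothetical handler: turn one (amount, item) pair into an order line."""
    return "%d x %s" % (int(amount), item)


# Stand-in for the submitted form data; in a real view these pairs would
# come from request.POST.lists() rather than a hand-written dict.
posted = {
    "fruit_0": ["4", "cherries"],
    "fruit_1": ["3", "apples"],
    "meat_0": ["1", "buffalo"],
}

# *values unpacks each two-element list into the (amount, item) arguments.
order_lines = [do_stuff_to(*values) for values in posted.values()]

# The dictionary-comprehension variant keeps the field names around.
by_field = {name: do_stuff_to(*values) for name, values in posted.items()}

print(order_lines)
print(by_field)
```

This relies on the text input appearing before its select in the form, so that the amount arrives first in each list. A real handler would also want to check that each list really has exactly two entries before unpacking it.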
maradydd: (Default)
Suppose you're a bioinformaticist and you have a gapped multiple sequence alignment (that is, a bunch of similar protein sequences, with dots and dashes inserted to line them all up), and you want the unaligned sequences, i.e., without all those dots and dashes. (Suppose further that you're using Bio.AlignIO from BioPython, and thus your multiple sequence alignment is an object containing other objects.)

You could write some loops and use .replace() a lot, or you could do something like this:

[''.join([x.upper() for x in record.seq if x.isalpha()]) for record in inputAlignment.get_all_seqs()]

(I'm really not sure how to describe the feeling of satisfaction I get from replacing ~20 lines of code with a list comprehension like this, but it's definitely rewarding.)
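For anyone who wants to poke at the comprehension without installing BioPython, here is a minimal sketch on plain strings. The dictionary aligned and its two toy sequences are made up for illustration; with a real alignment object you would pull each record's sequence out as in the one-liner above.

```python
# Toy gapped alignment: dots and dashes mark the gaps (made-up data).
aligned = {
    "seq1": "MK-TAY..IAKQR",
    "seq2": "mkvtay--iak-r",
}

# Keep only the letters, upper-casing as we go; gap characters are dropped
# because str.isalpha() is False for '.' and '-'.
unaligned = {
    name: "".join(ch.upper() for ch in seq if ch.isalpha())
    for name, seq in aligned.items()
}

print(unaligned)
```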
maradydd: (Default)
As most of you know, I have a deep and abiding love for the C++ Standard Template Library. Perhaps paradoxically, the first language I ever got any good with was Python, and I still adore it deeply. It's not the best language for much of the heavy numerical lifting that I have to do, so I mostly use Python for prototyping, and part of what I love about the STL is that it lets me do the same kinds of cool higher-order things I can do in Python.

For instance, suppose I'm working in Python and I have a list of objects, foo. (These could be numbers, strings, files, network sockets, objects I defined, whatever.) I can filter that list based on some arbitrary criterion that I define on the fly: filter(lambda x: x > 13, foo). Instead of the lambda, I could instead use a reference to some other function -- even a method of an object. Consider the following:
>>> foo = ['123', 'abc', '123abc']
>>> filter(str.isalpha, foo)
Better yet, those function references can be handed around just like ordinary variables. I can write a function which takes another function (let's call this one bar) as one of its arguments, and use bar anywhere it can be legally applied. I don't have to know bar's real name -- in fact, bar might not even have a real name, because it could be a lambda function and thus anonymous.
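A short sketch of what "handing functions around" looks like in practice. The helper keep_if is a made-up name for this example; it does roughly what the built-in filter does, and it never needs to know what its predicate argument is called.

```python
def keep_if(predicate, items):
    """Return the items for which predicate(item) is true.

    predicate can be any callable: a named function, an unbound method
    such as str.isalpha, or an anonymous lambda.
    """
    return [item for item in items if predicate(item)]


foo = ['123', 'abc', '123abc']

only_letters = keep_if(str.isalpha, foo)          # method reference
long_ones = keep_if(lambda s: len(s) > 3, foo)    # anonymous function

# The built-in does the same job; in Python 3 it returns a lazy iterator,
# so wrap it in list() to see the contents.
builtin = list(filter(str.isalpha, foo))

print(only_letters, long_ones, builtin)
```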

The technical term for this is functions as first-class data. It's a hallmark of the functional programming paradigm, so if you learn Lisp or Haskell, you'll be doing this all the time. Treating functions as first-class data lets you do all kinds of crazy shit: you can compose them, you can bind arguments to them (and perhaps those bound arguments are the results of other functions!), you can return them as results (seriously!), and most importantly, you can construct them at runtime. It provides a level of power, expressiveness, elegance and readability which just isn't found in languages which don't support functions as first-class data.
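Each item on that list fits in a line or two of Python. The sketch below is only an illustration; compose and at_least are names invented for the example, while functools.partial is the standard library's tool for binding arguments.

```python
from functools import partial


def compose(f, g):
    """Compose two one-argument functions: compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))


def at_least(threshold):
    """Build and return a brand-new predicate at runtime."""
    def check(x):
        return x >= threshold
    return check


# Composition: strip whitespace first, then upper-case the result.
strip_then_upper = compose(str.upper, str.strip)

# Binding an argument: int() with base fixed to 16.
parse_hex = partial(int, base=16)

# Returning a function as a result, constructed from a runtime value.
is_adult = at_least(18)

print(strip_then_upper("  abc "), parse_hex("ff"), is_adult(21), is_adult(17))
```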

Which languages don't support functions as first-class data? That would be most of them.

"But Meredith!" I hear you say. "I can pass a function pointer as an argument in C; doesn't that mean C supports functions as first-class data?" Well, no, it really doesn't. A function pointer is a pointer, not a function. You can construct a function pointer at runtime, but you can't construct the underlying function it points to at runtime, so you're stuck. Function pointers are also fugly as hell, and although readability is not a requirement -- see Unlambda as an example of a purely functional language which is also completely unreadable -- it sure is handy.

But let's get back to the STL. C++ also doesn't truly support functions as first-class data, but it comes closer by providing function objects, which it calls "functors". In C++, any class can have an operator() method, which allows objects of that type to behave as functions: pass a suitably-typed argument to the object, and voila, you'll get back a result. Template polymorphism makes this even more powerful: it's quite easy to define, for instance, a functor which takes two arguments of the same type, T a and T b, and returns their sum, a + b. (This works as long as operator+ is defined for type T; if not, the compiler will yell at you.) The STL provides a wide variety of predefined functors, along with ways to combine them to create more sophisticated ones. Best of all, any ordinary function can be instantly promoted to a functor with ptr_fun, and any class member function can become a functor via mem_fun (or its slightly less useful brother mem_fun_ref, but that's another story).

Okay, but why is this useful? Indeed, the <functional> library is rather masturbatory on its own, but it really comes into its own when paired with the <algorithm> library. This bad boy provides several dozen handy algorithms for operating on sequences -- searching, sorting, replacing, copying, swapping, removing, shuffling, reversing, applying arbitrary functors to members, and so on, plus conditional versions of many of these (e.g. find_if and remove_if). These algorithms are all generic: you can use them on any STL container[1], and an STL container can contain anything you want[2].

Also -- and I am putting this on its own line because it's important -- they are lightning goddamn fast. They are also somebody else's problem[3]. Many people who are much smarter than I will ever be have put hundreds of hours of work into making sure these libraries work and perform well. Sure, if you really know what you're doing, you can hand-hack yourself a special case in C or assembler and beat the STL's performance, but that requires (1) really knowing what you're doing, (2) the Copious Free Time to do it yourself, and (3) getting down and dirty with a profiler. (If you're trying to tell me that your C will always beat the C++ equivalent but you haven't profiled the two against each other and your name isn't Dennis Ritchie, get the hell off my lawn.)

Anyway. The <algorithm> library means that conditional operations over containers, like "how many even numbers are in this list?", don't require a for-loop -- you can express them in one line (see the example at the end). The catch, however, is that any function you apply in an STL algorithm must be expressed as a functor. So, if you want to start using C++ to its full power, you're going to need to take an hour or two to get comfortable with functors and how they work.

Now, one of the coolest inventions in the C++ world in the last fifteen years was the concept of smartpointers. (I have waxed rhapsodic on this before.) Smartpointers are objects that help programmers avoid the problem of memory leaks. Dynamic allocation functions (malloc) and operators (C++ new) grab memory off the heap and return a pointer to that storage. If the pointer goes out of scope and you haven't explicitly put that memory back with free or delete, the memory is now unreachable by your program or any other one -- congrats, you've created a memory leak. It'll free itself when the program terminates[4] ... but will that happen because the user terminated it normally, or because it ate up all available memory, threw bad_alloc and died?

By contrast, since smartpointers are objects, they have constructors and destructors. When they come into scope, they allocate whatever they point to; when they go out of scope, they destroy their target and free up its storage, even in troublesome cases like exception-triggered stack unwinding. They are immensely useful, and they are rather more complicated than I have described them here; indeed, it took some three months for the Boost team to decide how best to implement them, and they weren't accepted into the C++ Standards Committee's Technical Report 1 until 2005, some seven years later. But now they're here, they've been part of libstdc++ since gcc 4.0[5], and they're ready to rock.

They are also where all the trouble started.

I found myself in a coding situation where I needed to maintain a std::vector of objects all belonging to the same abstract class. Doing exactly this is impossible, for two reasons:
  1. A vector<T> stores its elements by value, so it needs to know how big T is in order to allocate its own storage properly, and it needs to be able to construct and copy actual objects of type T.
  2. Abstract classes can't be instantiated, so there is never an object of exactly that type to put in the vector, and the compiler won't let you try.
However, pointers are all the same size -- a pointer is just a variable that holds an address. So even if you can't store T itself, you can store a T*, and you can create a vector<T*> just fine -- create your derived-class objects, upcast them, and you're golden.

However, although vectors manage their own memory, a vector of pointers will only release the pointers it contains -- not their contents. I could write a routine to free up the pointers' contents, but could I be sure it would be called if the vector were unexpectedly destroyed? I couldn't, but a smartpointer could do that for me. Better yet, shared_ptr (one of the smartpointer classes described in TR1) will automatically upcast itself whenever it needs to. Result: I created a vector of smartpointers to my base class, filled it with smartpointers to objects of various derived classes, and life was good.

For my next trick, I needed to be able to search my vector for the first object which returned false for a certain parameterless method defined in the base class. In most cases, this is easy: find_if(v.begin(), v.end(), not1(mem_fun(&X::foo))), where X is the class, foo is the method, and not1 is an STL adaptor function[6] which provides a functor returning the opposite of whatever its argument (here, the functor created by mem_fun) returns.

Alas, this doesn't work for smartpointers, because a shared_ptr<T> is not a T; it doesn't have T's methods. This isn't the STL's fault; mem_fun entered the standard before shared_ptr did. The TR1 authors noticed this, and helpfully included mem_fn, a cleverer version of mem_fun which plays nicely with smartpointers.

However, they did not provide an update to not1 (or, more properly, its underlying function object adaptor unary_negate), and thus my compiler choked all over the place.[7] What's a girl to do?

Well, as cipherpunk pointed out when I brought this up to him after I'd already solved the problem, the canonical solution is to write up a little adaptor class and point not1 at that. I'll leave that as an exercise for the reader; I could do that, but I didn't want to, because, dammit, standard tools should be able to do standard things. There had to be a way -- but what was it?

The documentation for unary_negate provided a clue: a footnote at the end noted that unary_negate could be constructed from logical_not and unary_compose. Well, unary_compose isn't actually part of the Standard; it's an SGI extension, and gcc provides it in the <ext/functional> header, but that left me in the same boat as not1 did. However, TR1 has expanded facilities for function composition, in the form of bind. The STL has limited support for function composition in the form of binder1st and binder2nd; you can bind a constant value to either the first or second argument of a two-argument function, and that's it. TR1's bind -- which, like mem_fn, had its genesis in Boost's bind -- is infinitely more flexible. You can bind as many arguments as you like, to functions of as many arguments as you like. Cooler still, you don't have to bind constant arguments -- you can bind "placeholder arguments", which substitute runtime arguments positionally. _1 is "the first argument given", _2 is the second, and so on and so forth. And, best of all, you can bind other functions. Or functors, if you so choose.

Armed with this knowledge, I tried the following:
find_if(v.begin(), v.end(), bind(logical_not<bool>(), mem_fn(&X::foo)))
That failed too. If you've read this far, you probably understand enough to figure out why; take a moment and see if you can find the bug.

Okay, here's the deal. find_if takes as its last argument a functor, which it applies to every element of the sequence bounded by the iterators that are its first two arguments. bind, as we said, binds arguments to functions (or functors). logical_not is a functor, but it isn't a function adaptor. It takes a boolean value and returns the negation of that value. So, we have to bind a boolean value to it. mem_fn returns a functor. (Unary, in this case.) A functor isn't a boolean value. In this case, it returns a boolean value, but in order for logical_not to work, I have to bind it to a boolean value.

So, how do we turn a unary functor into a boolean value? bind to the rescue again! Use one of those placeholder values I mentioned, and now we have
find_if(v.begin(), v.end(), bind(logical_not<bool>(), bind(&X::foo, _1)))
It compiles, it runs! Hallelujah!
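Since this post started out by comparing the STL with Python, here is a rough Python analogue of the two attempts. It is a sketch of the shape of the composition, not a translation of the C++; the Connection class, its is_open method, and the compose helper are all hypothetical names invented for the example. The standard library's operator.not_ plays the part of logical_not, and operator.methodcaller plays the part of mem_fn.

```python
from operator import methodcaller, not_


class Connection:
    """Hypothetical stand-in for the objects held in the vector."""

    def __init__(self, name, is_open):
        self.name = name
        self._open = is_open

    def is_open(self):
        return self._open


def compose(outer, inner):
    """Return a function that feeds its argument through inner, then outer."""
    return lambda x: outer(inner(x))


connections = [
    Connection("a", True),
    Connection("b", False),
    Connection("c", False),
]

# The analogous mistake to the first attempt: negating the callable itself
# rather than its result. A function object is truthy, so this is just the
# constant False, not a predicate.
wrong = not_(methodcaller("is_open"))

# The working shape: call the method on the element first, then negate
# whatever it returned.
is_closed = compose(not_, methodcaller("is_open"))

# find_if equivalent: first element satisfying the predicate, else None.
first_closed = next((c for c in connections if is_closed(c)), None)

print(wrong, first_closed.name)
```

Python lets the mistake through silently where the C++ compiler refuses it, which is one point in favour of all that strong typing.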

And yet I am disappointed. In principle, I really shouldn't be. What I've written here is very functional in style -- I am quite literally composing a function with another composed function, and I am doing so with strong typing and polymorphism. (If foo were virtual, the appropriate subclass method would be called.) But it is less readable than it ought to be, and I had to do it because my new tools broke my old tools without providing a replacement for the old tools. I expect better from the latest-and-greatest in my language of choice; at the very least I expect not to have to find out the hard way. (Boost's mem_fn documentation provides a brief warning that it isn't 100% compatible with STL adaptors, but apparently this issue is not general knowledge.)

Hmm. I've been asked, in the past, to write articles for programming magazines about projects I've worked on. Perhaps something less long-winded than this diatribe is in order.

[1] With a few exceptions. Scott Meyers' Effective STL is a wonderful field guide to the STL, its capabilities, and most importantly, its weird edge cases. I recommend it highly.
[2] Except references. I haven't run into many situations where I'd want a container of references, though.
[3] Then again, tonight this got me into trouble. Which is actually why I'm so surprised.
[4] Unless you were using POSIX shared memory, in which case, sucks to be you. man(1) ipcrm or reboot, sucker.
[5] No love from Visual C++ until whatever the release after Visual Studio 2008 is, alas. Go download Boost.
[6] A syntactic-sugar function to create a particular function object adaptor, in this case unary_negate. The STL has several of these, to make code easier to read; others include bind1st and bind2nd (which are handy for function composition, but too nuanced for me to explain in a footnote; go read about them) and in fact our aforementioned ptr_fun and mem_fun.
[7] Reading the output closely, it actually appears to be a problem with argument deduction:
/Developer/SDKs/MacOSX10.4u.sdk/usr/include/c++/4.0.0/bits/stl_algo.h:259: error: no match for call to '(std::unary_negate<std::tr1::_Mem_fn<bool (pqxx::connection_base::*)()const> >) (std::tr1::shared_ptr<pqxx::lazyconnection>&)'
/Developer/SDKs/MacOSX10.4u.sdk/usr/include/c++/4.0.0/bits/stl_function.h:322: note: candidates are: bool std::unary_negate<_Predicate>::operator()(const typename _Predicate::argument_type&) const [with _Predicate = std::tr1::_Mem_fn<bool (pqxx::connection_base::*)()const>]
Looks to me like unary_negate's operator() doesn't like accepting a shared_ptr to a derived class, or else mem_fn isn't providing an argument_type. OTOH, if this is a const mismatch, I am going to go throw someone out a window.


maradydd: (Default)

September 2010
