From Writing and Analysis to the Repository: Taking the Scholars' Perspective on Scholarly Archiving

in Lambda the Ultimate, Wed, 27 Aug 2008 22:58:16 GMT
Marshall, C.C. From Writing and Analysis to the Repository: Taking the Scholars' Perspective on Scholarly Archiving. Proceedings of JCDL'08

This paper reports the results of a qualitative field study of the scholarly writing, collaboration, information management, and long-term archiving practices of researchers in five related subdisciplines. The study focuses on the kinds of artifacts the researchers create in the process of writing a paper, how they exchange and store materials over the short term, how they handle references and bibliographic resources, and the strategies they use to guarantee the long term safety of their scholarly materials.

Not directly programming language related, but two things makes this paper relevant. First, many of the tools involved, especially those that really enhance productivity are language-based, or include DSLs (e.g., Latex, Bibtex, R ( Sweave) etc.). Second, many of us write papers, and as language geeks we surely crave great tools...

So, what is you ideal tool chest when it comes to doing and publishing research? And what do you actually use everyday?

Erlang: Recursive anonymous functions and waiting for messages

in Tenerife Skunkworks, Wed, 27 Aug 2008 20:23:14 GMT

I have quite an elaborate test harness in OpenPlayer that allows me to refactor at will and make sure I don't break things in the process. Some tests are fairly basic while others enable me to fully simulate game play.

One of the tests starts a network server, sends messages and waits for a response from the server. Several messages can come back but I only care about one of them and need to skip the others.

Here's a macro that does what I need:

-define(waitmsg1(Msg, Timeout, Skip),
        %% return an anonymous function
    fun() ->
                %% store another anonymous function 
                %% in a variable so that we can invoke it
                F = fun(F) ->
                            %% take a function as argument
                            %% to be able to recurse.
                            receive
                                {packet, M} = P ->
                                    DoSkip = lists:member(element(1, M), Skip),
                                    if 
                                        DoSkip ->
                                            %% call ourselves
                                            %% recursively
                                            F(F);
                                        P == Msg ->
                                            success;
                                        true ->
                                            {error, M}
                                    end
                            after Timeout ->
                                    {error, timeout}
                            end
                    end,
                %% get the whole thing going
                F(F)
        end()).

Why bother with a macro that requires such horrendous contortions? Only a macro lets you to tell where in the code something happened. Erlang gives you ?MODULE and ?LINE to use in macros, e.g.

-define(error2(Message),
    io:format("~s at ~w:~w~n",
          [Message, ?MODULE, ?LINE])).

If waitmsg1 were a function then all the errors would have been reported at the location where the function was defined.

That said, I will take Lisp macros over Erlang macros any date of the week!

Towards Hard Real-Time Erlang

in Lambda the Ultimate, Wed, 27 Aug 2008 04:46:56 GMT

Erlang's actor concurrency model is a good fit for a wide range of concurrent applications. One domain that would seem ideal is real-time control of concurrent physical processes. But as it stands right now Erlang is best suited for soft real-time applications - there's really nothing in the language or runtime geared towards hard real-time constraints. Towards Hard Real-Time Erlang talks about one piece of the puzzle: a hard real-time scheduler.

In the last decades faster and more powerful computers made possible to seriously take into account high–level and functional programming languages also for non–academic projects. Haskell, Erlang, O’CAML have been effectively exploited in many application fields, demonstrating how high–level languages can help in writing efficient, readable and almost bug–free code, rapidly stealing the prominent position gained in many fields by OO languages such as Java and C . One of the fields where low–level imperative languages are still preferred to functional programming is that of hard real–time applications, since usually programmers (and managers) think that high–level languages are really not able to cope with the complex and critical requirements of real–time.

In this paper we propose an implementation of a hard real–time scheduler entirely written in Erlang, and perfectly integrated with the Erlang BEAM emulator. Performance analysis show that the proposed solution is effective, precise and efficient, while remaining really simple to use as expected by Erlang programmers.

The paper closes with mentions of two more pieces of the puzzle.

Real–time message passing will be introduced in a future version...

...

A solution to the unpredictable behaviour of garbage collection should be implemented before a really hard real–time scheduling can be done in Erlang.

Besides the scheduler, message passing, and garbage collector, what else do you think is needed before Erlang or something like it is a viable alternative in this domain? Or is the actor model really not such a great fit?

*Edit: Based on a comment from renox added closing quotes about message passing and garbage collector and added message passing to the editorial question.

Real-Time Concurrent Issues Drive Ada versus Java Choice

in Lambda the Ultimate, Tue, 26 Aug 2008 07:38:11 GMT

A useful short article by Ben Brosgol in COTS Journal.

On the surface, Ada and Java offer similar features to support real-time embedded military applications. But under the hood, they differ significantly in their underlying philosophy...

All that said, the Java / Ada decision need not be either/or. Mixed-language programming is provided by Java through the Java Native Interface, and by Ada through a standard interfacing framework. The enhancements in Ada 2005 make such interfacing easier, and there is current implementation support for mixed Ada/Java development. In a large system it may make sense to program different components in different languages—for example, a user interface in Java, hard real-time elements in Ada—thus taking advantage of the strengths of both. Ada and Java, rather than competing in the embedded defense system arena, may turn out to be comrades in arms.

Ben Brosgol is an expert on both Ada and Java support for real time programming, and I've linked to his papers that provide more detailed analysis a few times in the past.

Optimizing Erlang: A death match of arrays and tuples

in Tenerife Skunkworks, Mon, 25 Aug 2008 16:39:56 GMT

This is the first in my series of posts on optimizing Erlang. I plan to tackle optimizing Mnesia, profiling and scalability.

You need to use arrays of up to 10,000 elements. Erlang offers you tuples as well as fixed-size and extensible pseudo-arrays. What is the fastest option?

Let us start the death match by pitting arrays against tuples in a death match. Trees were an option before the array module became available, so lets throw in trees just for laughs.

I'm running Mac OSX Leopard 10.5.4 on a Mac Pro 2x2.8Ghz Quad-Core Intel Xeon with 14Gb 800Mhz DDR2 FB-DIMM.

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [kernel-poll:false]

27> arr:test().
Fixed-size array: get:     2921µs, set:     5902µs
Extensible array: get:     3336µs, set:     8144µs
Tuple:            get:      632µs, set:   107467µs
Tree:             get:     4321µs, set:    45256µs
ok

30> arr:test(100000).
Fixed-size array: get:    35314µs, set:    74653µs
Extensible array: get:    35349µs, set:    74059µs
Tuple:            get:     6411µs, set: 24304490µs
Tree:             get:    53681µs, set:   632795µs
ok

Note that timer:tc returns time in microseconds. I ran each test 3 times and the results above are from the third iteration.

Trees in Erlang (gb_trees) are built on top of regular tuples and so is the array module. The array module is much more efficient about using tuples than a regular tree, though, and this is the reason why it's so much faster.

The tuple test pre-allocates a tuple of 10k or 100k elements. There's no destructive assignment in Erlang and so the same large tuple needs to be allocated and discarded on every set operation. It's very inefficient to allocate and discard a large tuple on every set operation, thus naive tuple set is very slow.

The array module uses an efficient tree-like internal representation:

%% A tree is either a leaf, with LEAFSIZE elements (the "base"), an
%% internal node with LEAFSIZE 1 elements, or an unexpanded tree,
%% represented by a single integer: the number of elements that may be
%% stored in the tree when it is expanded. The last element of an
%% internal node caches the number of elements that may be stored in
%% each of its subtrees.
%%
%% Note that to update an entry in a tree of height h = log[b] n, the
%% total number of written words is (b 1) (h-1)*(b 2), since tuples use
%% a header word on the heap. 4 is the optimal base for minimizing the
%% number of words written, but causes higher trees, which takes time.
%% The best compromise between speed and memory usage seems to lie
%% around 8-10. Measurements indicate that the optimum base for speed is
%% 24 - above that, it gets slower again due to the high memory usage.
%% Base 10 is a good choice, giving 2/3 of the possible speedup from
%% base 4, but only using 1/3 more memory. (Base 24 uses 65% more memory
%% per write than base 10, but the speedup is only 21%.)

It's far more efficient to allocate small tuples on every set and this is why the array module wins hands down.

Use the code below to replicate my results on your hardware.

-module(arr).

-compile([export_all]).

data1(N) ->
    %% size implies fixed-size array 
    %% but lets be explicit
    array:new([{size, N}, {default, 0}, {fixed, true}]).

data2(N) ->
    %% extensible array
    array:new([{size, N}, {default, -1}, {fixed, false}]).

data3(N) ->
    erlang:make_tuple(N, 0).

data4(_) ->
    gb_trees:empty().

array_set(Array, I, Value) ->
    %% array indexing starts at 0
    array:set(I - 1, Value, Array).

tuple_set(Tuple, I, Value) ->
    %% tuple indexing starts at 1
    setelement(I, Tuple, Value).

tree_set(Tree, I, Value) ->
    gb_trees:enter(I, Value, Tree).

array_get(Array, I) ->
    array:get(I - 1, Array).

tuple_get(Tuple, I) ->
    element(I, Tuple).

tree_get(Tree, I) ->
    gb_trees:get(I, Tree).

get(_, _, 0) ->
    ok;

get(Fun, Data, N) ->
    Fun(Data, N),
    get(Fun, Data, N - 1).

set(_, Data, 0) ->
    Data;

set(Fun, Data, N) ->
    Data1 = Fun(Data, N, N),
    set(Fun, Data1, N - 1).

test() ->
    test(10000).

test(N) ->
    %% fixed-size array
    {S1, D1} = timer:tc(arr, set, [{arr, array_set}, data1(N), N]),
    {G1, _} = timer:tc(arr, get, [{arr, array_get}, D1, N]),
    %% extensible array
    {S2, D2} = timer:tc(arr, set, [{arr, array_set}, data2(N), N]),
    {G2, _} = timer:tc(arr, get, [{arr, array_get}, D2, N]),
    %% tuple
    {S3, D3} = timer:tc(arr, set, [{arr, tuple_set}, data3(N), N]),
    {G3, _} = timer:tc(arr, get, [{arr, tuple_get}, D3, N]),
    %% gb_trees
    {S4, D4} = timer:tc(arr, set, [{arr, tree_set}, data4(N), N]),
    {G4, _} = timer:tc(arr, get, [{arr, tree_get}, D4, N]),
    %% results
    io:format("Fixed-size array: get: ~8wµs, set: ~8wµs~n", [G1 , S1]),
    io:format("Extensible array: get: ~8wµs, set: ~8wµs~n", [G2 , S2]),
    io:format("Tuple:            get: ~8wµs, set: ~8wµs~n", [G3 , S3]),
    io:format("Tree:             get: ~8wµs, set: ~8wµs~n", [G4 , S4]),
    ok.

Untangling with Continued Fractions: Part 2

by sigfpe in A Neighborhood of Infinity, Sat, 23 Aug 2008 22:00:00 GMT
Recall that I defined a rational tangle to be what you get by starting with a pair of untangled strings whose ends are at infinity and then sliding the strings around as much as you like keeping the ends at infinity. If you don't like the infinity bit, just think of strings inside a sphere with the ends constrained to lie on (but free to slide around) the surface of the sphere. We also insist that the ends end up at the four diagonal directions.

Two rational tangles are considered to be isotopic if you can slide the strings about to turn one into the other, this time keeping the ends of the strings fixed. (Such a motion is called an isotopy.)

As I've previously mentioned, you only need twists, antitwists and rotations to untangle a rational tangle. Here are the twist operations again:



I'm using a box to represent some unknown tangle. The antitwist really is an inverse for the twist because one followed by the other gives you what you started with once you pull the strings straight:


We can also draw a rotation like this:

Let's call the twist T (so the antitwist is T-1) and the rotation S and write products of operations from right to left in the usual way. As both S and T are invertible, they generate a group. It's pretty clear that S4=1, where 1 is the identity operation. So we don't need a clockwise rotation, we can just use S3.

But here's a surprising fact: S2=1. If we perform a 180 degree rotation on a rational tangle the result might look different at first, but it is isotopic to what we started with. I'll leave you to convince yourself of this, but you can prove it by induction on the number of twists or antitwists you have in your diagram. Try it for simple rational tangles first, maybe real ones made with real string.

Here's an even more surprising fact, (TS)3=1. I can demonstrate this pictorially. Start with he result of applying T:



The result of ST:



The result of TST:



The result of STST:



After TSTST:


But by sliding one string behind the tangle, allowed in an isotopy, we find that that is isotopic to this:


It's not hard to see that another S brings us back to where we started. (To do that we need to rotate the block labelled A, but that is allowed in an isotopy as we're not moving the ends of the strings.) This means we can write an antitwist in terms of twists and rotations. But it's convenient to keep using antitwists.

So we have S2=1 and (ST)3=1. It can also be proved that any other relationships between S and T can be derived from these using the operations in a group. But this is a well studied group. We can approach it like this. Define two functions s and t by

s(x) = -1/x


t(x) = x 1.

It's easy to show that s2=identity=(st)3. In fact, we have an isomorphism between the group of operations we can perform on rational tangles and the functions we can build with s and t. It goes further.

Take the rational numbers and throw in ∞ to give us the extended rationals. Let s and t act in the usual way on most of the rationals but throw in the extra rules s(∞)=0, s(0)=∞ and t(∞)=∞ to handle infinity. We have defined an action of s and t on the extended rationals. We can exactly parallel this with the rational tangles. Call this tangle 0:



We can reach every rational tangle by applying the operations S and T to 0. But if we replace S with s and T with t then the sequence of s's and t's will give us an extended rational number that labels the tangle. Because S and T obey the same equations as s and t, isotopic knots will end up getting the same extended rational. So we can use extended rationals to label the rational tangles. Here are some examples:

Firstly just an antiwist, T-1. That corresponds to t-1(0)=-1.



Two antitwists gives -2:



Composing a 90 degree rotation gives



And applying another antitwist subtracts one again giving:


The name rational tangle is justified.

So, given any any rational tangle we simply need to find the corresponding extended rational and then figure out how to write it as a composition of s's and t's applied to 0. This is more or less the usual technique for finding the continued fraction representation of a rational. And once we know the sequence, we can then apply it in reverse to untangle the tangle.

But there's one catch, how can we go from the monadic representation of a tangle that I described previously to the corresponding extended rational? We'll start on that problem in my next post, but amazingly the vector space monad can be used to do almost all of the work. It's surprising: it's only a short bit of code that works out the untangling operations, but the explication is getting rather long now...

An amazing application of this work is in the study of the untangling of DNA. DNA is a double helix and the two parts of the helix must be separated in order to transcribe it. As you might imagine, the DNA can become incredibly tangled by this process. To deal with this, the topoisomerases snip, manipulate and repair the DNA so as to eliminate these tangles. It was realised that it was possible to work out exactly what operations the topoisomerases were performing by analysing the before and after DNA strings in terms of rational tangles.

The perosn who first figured out this relationship between tangles and rational numbers was the ubiquitous JH Conway.

References


For the proofs missing above, these references should fill in the details:
Rational Tangles, Kauffman and Goldman
Knots and Physics, Kauffman
And if I've lost you above, the following is a nice easy introduction:
Conway's Rational Tangles, Davis
Modeling protein-DNA complexes with tangles

One last thing. I've noticed some, but not all, papers define the twist oppositely from me.

Features of Common Lisp

in Lambda the Ultimate, Wed, 20 Aug 2008 20:28:56 GMT

A compelling description of the features that make CL the king of the Perl-Python-Ruby-PHP-Tcl-Lisp language ;)

Lisp is often promoted as a language preferable over others because it has certain features that are unique, well-integrated, or otherwise useful.

What follows is an attempt to highlight a selection of these features of standard Common Lisp, concisely, with appropriate illustrations.

In Praise of Scripting: Real Programming Pragmatism

in Lambda the Ultimate, Wed, 20 Aug 2008 01:50:03 GMT

Ronald Loui, In Praise of Scripting: Real Programming Pragmatism, IEEE Computer, vol. 41, no. 7, July 2008. [Openly accessible draft here]

The July IEEE Computer carries an article arguing for the use of scripting languages as first programming languages, and also arguing for a greater study of what the author calls "language pragmatics" (the original article is behind the IEEE paywall, but you can find a draft that has roughly the same content here). The argument for using scripting languages as educational languages can be summed up by Loui's abstract:

The author recommends that scripting, not Java, be taught first, asserting that students should learn to love their own possibilities before they learn to loathe other people's restrictions.
The bulk of the article is devoted to exploring this basic theme in more depth, and provides an interesting contrast to the arguments in favor of moving away from Java (and scripting languages) advanced in Computer Science Education: Where Are the Software Engineers of Tomorrow? (discussed earlier on LtU here).


Loui spends the latter part of the article arguing that, in addition to syntax and semantics, research on programming language should include a formal study of language pragmatics. According to Loui, a formal study of pragmatics would address questions such as:

  • What is the average lifetime of a program written in language X for programmers of type Y, for a program of type Z?
  • What is the average time spent authoring versus debugging a program in language X for programmers of type Y, for a program of type Z?
  • What is the consumption of short-term memory when programming in language X for programmers of type Y, for a program of type Z?

Cocoa bridge for Erlang

in Tenerife Skunkworks, Tue, 19 Aug 2008 21:02:06 GMT

Who wants a Cocoa bridge for Erlang?

Now, who is willing to pay 50 EUR for a license?

Just wondering...



Job trends: Erlang, Lisp and Haskell

in Tenerife Skunkworks, Sun, 17 Aug 2008 22:00:18 GMT
Erlang took off like a rocket at the end of 2007.

Why did it drop off the cliff just a few months later?

Mozilla "Ubiquity"

in Lambda the Ultimate, Sun, 17 Aug 2008 19:15:47 GMT

A command-line, textual, and probably linguistic, interface to the browser.

I am not sure how complex they are planning of making this, nor how it meshes with visions of the future of web browsing, but it's worth keeping an eye on.


I spent some time thinking about site-specific scripting, and had some discussions with students about building a site-specific DSL compiled into Greasemonkey, but alas nothing came of it. It would be interesting to see whether Ubiquity command lines could be integrated into scripts, and how extensible the vocabulary is going to be, and on what level (i.e., site by site, via plugins etc.) Clearly also related to the often discussed topic of "end-user programming".

 

Functional Programming

Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state.

Feeds