More complicated than this

Story: If you're going to do good science, release the computer code too
Total Replies: 12
Author Content
bigg

Feb 10, 2010
3:16 PM EDT
I generally agree with the article. Openness is critical, because (a) programs are long and complicated, so that bugs are inevitable, (b) a single small bug can completely change the results, and (c) there's no way a single company or research team has enough eyes looking at it to test the software properly.
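Point (b) is easy to demonstrate with a toy sketch (my own illustration, not from the article or the comment): two algebraically identical variance formulas, one of which quietly destroys the answer through rounding.

```python
# Toy illustration: two mathematically equivalent formulas for the
# population variance; only one is numerically trustworthy.

def naive_variance(xs):
    """One-pass 'textbook' formula E[x^2] - E[x]^2.

    Algebraically correct, but catastrophic cancellation makes it
    unreliable when the mean is large relative to the spread.
    """
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

def two_pass_variance(xs):
    """Stable two-pass formula: compute the mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

# Four readings clustered around one billion; the true variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(two_pass_variance(data))  # 22.5
print(naive_variance(data))     # wildly wrong -- the "bug" is pure rounding
```

Both functions are the "same" math; only the order of operations differs, which is exactly the kind of small defect that survives casual inspection but changes results completely.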

I disagree with the claim that software written by academics is generally of lower quality - many of the commercial programs draw developers from the same talent pool. When you dedicate your life to something, you gain expertise, and that won't be replaced by someone knocking out a product because this week it's profitable. Academics are also more likely to use established, well-tested libraries.

As I see it, the problem is that as software plays an increasingly important role in the process, as programs grow and require a supercomputer to run, there are few opportunities to have others check that your programs are correct. There's just no incentive to spend your time verifying that someone else's code gives the correct results. Who's going to pay you to do that?
TxtEdMacs

Feb 10, 2010
3:37 PM EDT
bigg,

Just one small, well-intentioned criticism. On LXer (and other forums) we respect opinion, NOT obvious knowledge of the topic. So in the future quit confusing us with facts when our deepest beliefs are our paramount criteria for determining "Truth."

Thank you (so don't do it again),

YBT
bigg

Feb 10, 2010
4:23 PM EDT
Sorry. Won't let it happen again. I'll work more on sensationalism in the future, so as to encourage flamefests.
tuxchick

Feb 10, 2010
4:27 PM EDT
TxtEdMacs uses Linspire and wears an "I heart Miguel" t-shirt.
gus3

Feb 10, 2010
4:33 PM EDT
"Those are the facts!"

"Facts are the enemy of truth!"

--The Ingenious Hidalgo Don Quixote of La Mancha
Kagehi

Feb 10, 2010
11:42 PM EDT
Problem is, bigg... we are not talking about drawing from the software pool to run science projects. If we were, you might have a point. There was a review of science projects, in astronomy and some other fields, maybe 3-4 years back. It found that more than 90% of the people on such projects wrote their own code, used things like Notepad to do it, and placed breakpoints and data dumps all over the place to test how it was working, because the vast majority of them had so little contact with the rest of the computer world that they didn't even know that integrated debuggers and editors existed, how to use one, or that you could check variables "on the fly" as the program ran, instead of working out what the answer should have been by hand, checking the results, then simply *throwing out* the program if it didn't generate correct results. Nearly every program and project tried over 20 years had been a) abandoned, b) shelved, or c) reworked, at 20 times the man hours needed, practically byte by byte, until it got "close enough" to the expected result to be usable. The people doing the review were flabbergasted by it, and amazed that we managed to do "anything" with the insane mess people were working with to run their experiments.

I am sure that, since then, many of these people have installed, and are using, much better systems, having been made aware that they even exist. I am also sure that a) most of them are not using them to their potential, since it's way too easy to just keep doing things as you already did them, with minor changes, and b) way too many projects are *still* being scrapped, because the code simply doesn't work as intended, and it's too complex for someone whose expertise is, say, star formation, not computer science, to figure out what they screwed up.

It's simply not been, and probably still isn't in some fields, **at all** true that academics use "established, well-tested libraries". Some may not even, if they have been at it long enough, have a clue what a library *is* in the context of a computer.
bigg

Feb 11, 2010
6:48 AM EDT
@Kagehi

I'm not disagreeing with the point you're making. There's definitely a lot of very bad code out there, and given the training many received, along with the use of FORTRAN 77 (thankfully we now have Fortran 2003), that is easy to understand.

But today, at least in the slice of the world that I see, things are different. Many of those who write code have years of experience writing large programs. They have also taken courses in both programming and numerical analysis. With the shift to software-based research, many who have the interest and talent in programming are willing to work in fields other than software development, because they can now feed themselves. My point is therefore that we should not automatically assume that because a product is closed and commercial it is of higher quality. In my area, I'd rather have a guy like Brian Ripley doing the programming than anyone else.

In terms of established, well tested libraries, I see a lot of use of Numerical Recipes (not bad in spite of the peculiar license), GNU Scientific Library, IMSL, minpack, etc.
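The value of reaching for a battle-tested routine instead of hand-rolling the numerics can be sketched even with Python's standard library (my stand-in example; the commenter is naming C/Fortran libraries, but the principle is the same):

```python
import statistics

# Same idea as relying on GSL, IMSL, or minpack: use a routine that has
# already survived many eyes, rather than a quick hand-written formula.
data = [1000000004, 1000000007, 1000000013, 1000000016]

# statistics.pvariance works with exact rational arithmetic internally,
# so it sidesteps the rounding traps a naive one-pass formula can hit
# when the mean is huge compared to the spread.
print(statistics.pvariance(data))  # 22.5
```

The library routine encodes numerical-analysis care that a domain scientist writing throwaway code has little time (or incentive) to rediscover.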
bigg

Feb 11, 2010
6:49 AM EDT
@Txt

See, my original post wasn't as bad as you thought.
tuxchick

Feb 11, 2010
11:37 AM EDT
So what's the real issue, some scientists being embarrassed by their ugly code? Or openness and peer review? If the code is bad, then their data interpretations are suspect too, which is the point of this article.
TxtEdMacs

Feb 11, 2010
12:24 PM EDT
bigg,

Please reread my comment and note its implication, i.e., that you seemed to be genuinely knowledgeable about the issues. The rest was insulting boilerplate meant to be discarded with a mere chuckle.

[/end serious]

As always,

Your Buddy Txt.
bigg

Feb 11, 2010
12:43 PM EDT
@tc

(My opinion only, based on my experience) Both openness and peer review are the issues.

I'm in agreement with the author that the code has to be open. It makes no sense to just take someone at her word that she actually did things correctly and got the reported results. There are mistakes and there is intentional misreporting of results, even just pulling good results out of the air.

The point I was making in my original post (beyond the minor issue about relative trustworthiness of closed, commercial software) is that openness is not sufficient to assure the accuracy of scientific results. Sure, if you've got a political motivation to attack the hockey stick, you might actually go through everything to look for any mistakes. Rarely, however, do we have the resources to do that. In short, there is no peer review of most research software, so open or closed doesn't matter. The only solution I can see is to have projects like the GNU Scientific Library providing the routines for which detailed knowledge is most important.
montezuma

Feb 11, 2010
1:12 PM EDT
Openness is one thing. Reproducibility is another.

Generally speaking, scientists write their own programs from scratch in attempts to reproduce the important results of others. This is a really good cross-check on important results. If the result is reproducible then openness is not so important. If it isn't, then it becomes crucial.
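That cross-check can be sketched concretely (my own hypothetical example, not from the comment): two "labs" independently implement different methods for the same quantity, and agreement between them is what builds confidence.

```python
# Hypothetical cross-check: two independently written programs compute the
# integral of f(x) = x**2 over [0, 1], whose true value is 1/3.

def trapezoid(f, a, b, n):
    """Composite trapezoid rule -- 'lab A' writes this from scratch."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
    return total * h

def simpson(f, a, b, n):
    """Composite Simpson's rule -- 'lab B' reimplements independently.

    n must be even.
    """
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += f(a + i * h) * (4 if i % 2 else 2)
    return total * h / 3

f = lambda x: x ** 2
r1 = trapezoid(f, 0.0, 1.0, 1000)
r2 = simpson(f, 0.0, 1.0, 1000)
# Independent methods agreeing is the reproducibility check; if they
# disagreed, we would need to see each other's code to find out why.
assert abs(r1 - r2) < 1e-6
```

When the independent attempt fails to agree, as in the controversy described next, access to the original program becomes the only way to locate the discrepancy.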

Recently there was a controversy in this field when the results of a particular scientist (who happened incidentally to be a "warming sceptic") could not be reproduced by another set of scientists (who were not "sceptics"). The second group wrote to the first asking for the program so they could find out why the results were not reproducible. Their request was denied. The net effect of that lack of openness was that the credibility of the first group's results was widely called into question.

The Journal of Irreproducible Results:

http://www.jir.com/history.html

satirically underscores this very important point.

gus3

Feb 11, 2010
1:44 PM EDT
Quoting: It makes no sense to just take someone at her word that she actually did things correctly and got the reported results.
All the more so, when those results are used to make public policy decisions. The methods used should be publicly available.
