Skilled C++ Developers Needed to Crash Code

A research study on failure sampling in KDE is scheduled for September. We are now looking for skilled C++ developers to test one or several C++ classes. The developers at KDAB have already provided some code for the occasion from kdepim. Discussions have been held with the SQO-OSS project on adding the sampling method's basics to Alitheia, the successor to the English Breakfast Network, if the outcome of the study is promising. Now all we need is you...

What do you need to do?

  1. Answer a questionnaire about your C++ experience, "life, the universe, and everything" by September 8. The answers will be confidential.
  2. Between September 10 and September 25, a study package will be available for you to download or have sent to your e-mail. You unit test either a single C++ class for 4 hours, or a component for 8 hours (with breaks of course). The approach is described in the package. You then send me the list of failures discovered.
  3. You send us your estimate of what percentage of the failures you think you found.
  4. You review the list of failures found by 2-3 other developers who tested the same class/component as you, and say which of them you would consider actual failures.
What happens after you have participated?

  5. The failures accepted are put into a statistical monster, called a Capture-Recapture estimation method, that chews on them for a while and then spits out the number of failures it thinks are likely left in the class.
  6. I take the estimate for the class and, using measurement weights from a previous kdepim project together with size and complexity measurements from the kdepim project to which the tested class belongs, I generalise the estimate from the one class to its neighbours, and their neighbours, and their neigh... well... you get it...

If the result is good, the concept will be added to Alitheia.

To join, fill in the questionnaire and send it to the e-mail address indicated.

Comments

by Thomas Zander (not verified)

"A research study on failure sampling in KDE"

What's that? Can someone elaborate?

by Hanna Scott (not verified)

Yes. First of all, the definition of failure in this case is "provoked erroneous behaviour", where erroneous behaviour means that the code deviates from the purpose for which it was created. (Note that this is not the same thing as a failure provoked by the end user of a system!)

a) People are divided into groups, where everyone within a group has a similar skill level.
b) The people then test the same code. The code is only part of a system, making the testing a "sample". The developers who test the code note how each failure was provoked, and when.
c) After the testing, the developers within a group review all failures noted and "haggle" over which ones will be accepted as actual failures. Majority wins.
d) The overlap between the failures found by the developers is used to statistically calculate, using a method from biology called capture-recapture, how many more failures are likely still left to provoke from the software. This estimate of "unprovoked failures" is added to the number of unique failures found during the testing to form "the number of failures possible to provoke from the code tested" - let us call this number simply "total failures".
e) The total failures for the code tested is then used to estimate the "number of failures possible to provoke from the entire system in test".

Before the study begins, meaning before point a), I will have extracted measurements from an old version of kdepim, where I have measured SLOC, CLOC, and incoming and outgoing function calls on all classes. I will also have collected the failures reported to the kdepim database at KDAB. The statistical correlation between these measurements and the number of failures for each class is calculated, and will be used as a sort of failure weight for that class. For example: if SLOC is above a certain number, and CLOC is below a certain ratio to SLOC, then I can, with the failure weight, calculate the probability of a certain class having a certain number of "failures possible to provoke" in it. This is a technique usually used in fault prediction with regression modelling. (A failure can be the symptom of one or more faults in the code.)

Using the weights extracted from the old kdepim version, in combination with the estimate for the tested class created above in step d), I can predict the number of "total failures" for all classes in the system from which the tested code was taken.

The idea is that we can thereby see whether the method works, and if it does, it can be replicated on other KDE projects, maybe even all of KDE if we are really lucky. It would also mean that the weights created are strong enough to predict the failure contents of classes neighbouring a sampled one, so it will be important to add the parsing of these measurements to Alitheia (EBN++), to make it possible to automate the calculations we will be doing manually now.

I hope this answers your question.

//Hanna

by Thomas Zander (not verified)

So, in short, you will be able to run a test on a different piece of code and combine that with sloc/bug-counts etc and predict how many bugs that code will have.
And if I fix 10 bugs in there over the time of a week, your prediction of bugs left will not change, right?

Hmm, ok, happy hunting. I'm not sure I see the relevance of this for anything but for people who like numbers of no practical use.

by Sebastian Kügler (not verified)

One question is of course if that really matters. Does fixing bugs influence the probability of not-yet-discovered bugs being in the code?

As I understand it, your bug fixing frenzy will change the second measurement, though. Ideally, after fixing bugs, you'll count ten fewer failures while doing the unit testing -- and that should influence the number of failures that is spit out after the capture-recapture computation.

Put roughly, as I understand it, it goes: "If there are a lot of bugs in the code that I can easily spot, there are probably a lot more which are harder to find".

I might be misunderstanding either of you, of course. :-)

by ninj (not verified)

As I understand it, the purpose of this is to be able to predict code quality based on characteristics of that code, i.e. sloc per function, etc... So, basically we'd have a system which "feels" comfortable or uncomfortable with different types of code, the latest in humanizing computer systems :)

by Maarten ter Huurne (not verified)

Besides C++ experience and a bit of free time, what more is needed? For example, what kind of tooling should be on my PC if I want to participate. GCC 4.x, I assume, but also CMake, Qt4, kdelibs from SVN, ...? Are you using a particular unit testing framework or just "int main()"?

by Hanna Scott (not verified)

Good Question!

I assume that the participants are willing, and able, to download and install the GCC 4.x compiler, CMake, Qt4 and kdelibs. I also assume they have, or have access to, a computer on which to do so. We are not using a unit testing framework.

There will be a guide for installation needs and a test guide in the information that is sent out on September 9 to those who have answered the questionnaire.

I hope that answers your question (?)

Kindest Regards:
Hanna

> Answer a questionnaire about your C++ experience, "life, the universe, and everything" by September 8

Isn't this a little harsh? You should know that by the time someone answers this, no one will be around to read it anymore.

by Thomas Zander (not verified)

At least some people will sit in the restaurant at the end of the universe able to finish this thing up, I suppose.

by Hanna Scott (not verified)

Quite right Zander!

by Paul Leopardi (not verified)

Questions 11, 13 and 15 are ambiguous to me. Question 11 comes after a series of questions which ask how long ago... Question 11 says:
"How long was your last programming experience?"
Is this meant to say "How long ago was...?" or "What was the duration of...?"
Questions 13 and 15 are similar.