Beyond Description: Testing and Bringing Information Theories to Bear on Free/Libre and Open Source Software Studies
Paul Jones, University of North Carolina

So far, research in F/LOSS has been reactive and retrospective. Given what we know through theory and practice in other fields of cooperation, information production, and publishing, can we also be predictive and integrative?

In mid-1999, when we at UNC began our study of contributors to a very open Open Source archive, a collection of programs that are accompanied by metadata called the Linux Software Maps [1], our aim was to understand the volatility of changes in the archive so as to facilitate our work with the Internet 2 Distributed Storage Initiative [2]. Along the way, we noticed something else: we could perhaps confirm or refute some propositions, commonly held as truths, in the various debates about the nature of contributors to the archive and to Free/Libre Open Source Software (to be called F/LOSS for the rest of this paper).

At that time and since, much has been said about the motivations and the make-up of the developer community, if indeed it is a community. The highest volume of debate, in every sense of the word "volume," was between Richard Stallman and Eric Raymond. Both speakers were, and continue to be, well informed by their central places in software development. But both seemed to be proposing untested theories of work and information production.

Briefly, our study, which has been available on the net since October 1999 (although it was not published in Communications of the ACM [3] until May 2002), found that commercial and European developers played larger roles than had been imagined earlier.

Using the data to direct our inquiry into the basis of discourse about F/LOSS, which was at that time experientially based, led me to question other givens about F/LOSS production.

I then tested, in a very elementary way, Eric Raymond's contention that Open Source breaks Brooks' "no silver bullet" law for software development [4] by interviewing important F/LOSS figures. The interviews revealed that not everyone's experiences were in harmony with Raymond's claims. Not even Raymond's.

Data and interviews are helpful in dispelling bad rhetoric and ill-formed theories, but with insight and reflection they can also point us to a better rhetoric and well-formed theories. That is to say, our data must also be informed by existing theories of information production, cooperation, and group work if we are to improve our understanding of F/LOSS development.

I am not certain that we are uncovering new theory, as we may have thought in our earlier stages, but I am certain that we need to use and refine existing theories and what we call "laws." Without a more integrative and predictive approach, we will end up following the wrong rabbits down the wrong rabbit holes far too many times. We will be stuck in a weak descriptive phase of research and be delayed by far more data than we can interpret wisely.

One recent step by our group [5] has been to notice that Lotka's Law, a bibliometric proposition developed in 1926 about scientific authors' production of papers [6], also applies to F/LOSS developers. This leads me to believe that vigorously testing existing theories from information science, sociology, and cooperative work against carefully collected data will yield fruitful and immediate insight into the important questions of how F/LOSS development actually works.
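
To illustrate the kind of prediction such a law permits, here is a minimal sketch in Python, assuming the inverse-square form of Lotka's Law given in note [6]. The function name and the starting figure of 1,000 single-contribution developers are hypothetical and are not drawn from our data.

```python
def lotka_expected_counts(single_contribution_authors, max_contributions):
    """Expected number of authors making n contributions under Lotka's Law.

    Assumes the inverse-square form: the number of authors making n
    contributions is about 1/n^2 of those making exactly one.
    """
    return {
        n: single_contribution_authors / n ** 2
        for n in range(1, max_contributions + 1)
    }


if __name__ == "__main__":
    # Hypothetical input: 1,000 developers with exactly one contribution.
    for n, expected in lotka_expected_counts(1000, 5).items():
        print(f"{n} contribution(s): about {expected:.1f} developers")
```

Comparing counts like these with the observed number of developers at each contribution level is one simple way to check whether an archive follows the law.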

We must go beyond description in order to further understand and accurately describe the nature and work of F/LOSS development.

[1] A Quantitative Profile of a Community of Open Source Linux Developers by Bert J Dempsey, Debra Weiss, Paul Jones, and Jane Greenberg. October 1999. http://www.ibiblio.org/osrt/develpro.html

[2] http://www.internet2.edu/dsi

[3] A Quantitative Profile of a Community of Open Source Linux Developers by Bert J Dempsey, Debra Weiss, Paul Jones, and Jane Greenberg. Communications of the ACM. April, 2002.

[4] Brooks' Law and open source: The more the merrier? by Paul Jones. IBM developerWorks. May 2000. http://www-106.ibm.com/developerworks/library/merrier.htm

[5] Open Source Software Development and Lotka's Law: Bibliometric Patterns in Programming by Greg Newby, Jane Greenberg, and Paul Jones. Journal of the American Society for Information Science and Technology. Forthcoming.

[6] Definition: The number of authors making n contributions is about 1/n^2 of those making one contribution. http://www.nist.gov/dads/HTML/lotkaslaw.html