The Economic Organization and Efficiency of OS/FS Software Production: An Agenda for Integrated Research
J.-M. Dalle [*], P. A. David [**], and W. E. Steinmueller [***]
[*] Université de Paris-VI – Jussieu
This version: 18 October 2002
We share the view that the OS/FS movements may carry broader economic and social significance, and so deserve to be the subject of systematic empirical and theoretical study. But although much about this particular phenomenon remains far from understood, the same might well be said about other aspects of the workings of modern economies, which are no less likely to turn out to be important for human well-being. One therefore might be forgiven for remarking that if the research attention that OS/FS software production is attracting from economists is to be rationalized primarily on grounds of novelty and mysteriousness, the extent of that attention is not well founded. The emergence of OS/FS activities at their present scale is hardly so puzzling or aberrant a development as to constitute a rationale for devoting substantial resources to studying it. The cooperative production of knowledge by members of distributed epistemic communities who did not expect to receive direct remuneration for their efforts is not a new departure; there are numerous historical precursors and precedents for OS/FS, notably in the “invisible colleges” that appeared among the practitioners of the new experimental and mathematical approaches to scientific inquiry in western Europe in the course of the 17th century. The “professionalization” of scientific research, as is well known, was a comparatively late development. Moreover, there is a substantial body of analysis by philosophers of science and epistemologists, as well as work on the economics of knowledge, that points to the superiority of cooperative knowledge sharing as a mode of generating additions to the stock of reliable empirical propositions. It is the scale on which OS/FS activities are being conducted, the rate at which their participants interact, and the geographical dispersion of those participants, rather than their mere existence, that can properly be deemed historically unprecedented.
The modularity and immateriality of software products, and the enabling effects of advances in computer-mediated telecommunications during the past several decades, however, take us a long way towards accounting for those aspects of the story.
In our view OS/FS warrants systematic investigation in view of a particular historical conjuncture, indeed a portentous constellation of trends in the modern economy. The first is that information-goods are moving increasingly to the center of the stage as drivers of economic growth. Secondly, the enabling of peer-to-peer organizations for information distribution and utilization is an increasingly obtrusive consequence of the direction in which digital technologies are advancing, and the “open” (and cooperative) mode of organizing the generation of new knowledge has long been recognized to have efficiency properties much superior to those institutional solutions to the public-goods problem which entail restricting access to information through secrecy or the enforcement of property rights. Thirdly, and of practical significance for those who seek to study it systematically, the OS/FS mode of production itself is generating a wealth of quantitative information about this instantiation of “open epistemic communities.” This last development makes OS/FS activities a valuable window through which to study the more generic and fundamental processes responsible for this mode's power, as well as the factors that are likely to limit its domain of viability in competition with other modes of organizing economic activities.
Most of the researchers associated with this project come to this particular subject matter from the perspective formed by their previous and on-going work in “the new economics of science,” which has focused attention upon the organization of collaborative inquiry in the “open science” mode, the behavioral norms and reinforcing reward systems that structure the allocation of resources, and the relationships of these self-organizing and relatively autonomous epistemic communities with their patrons and sponsors in the public and private sectors. As economists looking at OS/FS communities, the central pair of interrelated questions we pose are both simple and predictably familiar. Firstly, how do OS/FS projects mobilize resources, allocate expertise, and retain the commitment of their members? Secondly, how fully do the products of these essentially self-directed efforts meet the long-term needs of software users in the larger society, and not simply provide satisfactions of various kinds for the developers? In other words, the tasks we set ourselves in regard to OS/FS address the classic economic questions of whether and how it is possible for a decentralized resource allocation process to achieve coherent and socially efficient outcomes. What makes the problem especially interesting in this case is the possibility that the institutions evolved by the OS/FS movements enable them to accomplish that outcome without help either from the “invisible hand” of the market mechanism driven by price signals, or the “visible hands” of centralized managerial hierarchies. To respond to this challenge requires that the analysis be directed towards providing a basis for evaluating the social optimality properties of the way “open science”, “open source” and kindred cooperative communities organize the production and regulate the quality of the “information-tools” and “information-goods” that will be used not only for their own, internal purposes, but by others with quite different purposes in the society at large.
The initial thrusts of these four complementary research “salients” are briefly described, taking them in turn:
• Distribution of developer efforts within projects:
We anticipate soon being able to answer questions of the following sort, which give the flavor of the micro-level allocation issues that this class of data will enable us to address: Is the skew in the distribution of contributions to the Linux Kernel as a whole also a feature of the distributions found for its components, i.e., is the pattern of concentration in self-identified “authorship” fractal? Is the concentration significantly greater for some components than for others? Are these distributions stationary throughout the life of the project, or does concentration grow (or diminish) over time? The former has been found to be the case for the distribution of scientific authorship in specific fields over the lives of cohorts of researchers publishing in those fields. In addition, we expect to be able to identify clusters of authors who work together within, and across, different components of the Kernel project; to learn whether these grow by accretion of individuals, or coalesce through mergers; and whether, if they do not grow, they contract or remain constant. Further, by correlating the clusters of authors with the dependency data, it may be possible to obtain characterizations of the nature of “knowledge flows” between identified groups.
It will be an important methodological issue for subsequent work to ascertain whether or not there are significant biases in the ability of the extraction algorithm to identify the distribution of authorship within this, the project for which it was designed. Inasmuch as one cannot treat the Linux Kernel as a “representative” OS/FS project, other projects, which may differ in their conventions with regard to self-identification of contributions in the code itself, will eventually need to be studied.
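One simple way to operationalize the concentration comparisons posed above is a Gini coefficient computed at both the kernel-wide and per-component levels. The sketch below uses entirely hypothetical author and component data; the names and line counts are illustrative stand-ins for what the extraction algorithm would yield.

```python
# Gini-based concentration sketch for per-author contribution counts.
# All data here are hypothetical.

def gini(counts):
    """Gini coefficient of a list of non-negative contribution counts
    (0 = perfectly even, values near 1 = highly concentrated)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical credits: {component: {author: lines contributed}}
credits = {
    "mm": {"a": 500, "b": 40, "c": 10},
    "fs": {"d": 900, "e": 30, "f": 20, "g": 5},
}

# Kernel-wide distribution, aggregated across components.
whole = {}
for component in credits.values():
    for author, loc in component.items():
        whole[author] = whole.get(author, 0) + loc

print("kernel-wide:", round(gini(list(whole.values())), 3))
for name, component in credits.items():
    # Comparing per-component coefficients probes the "fractal" question:
    # is concentration within components as marked as in the whole?
    print(name, ":", round(gini(list(component.values())), 3))
```

Repeating the computation on successive snapshots of the credits files would address the stationarity question as well.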
• Allocation of developer communities’ efforts among projects:
Both large- and small-scale analyses seem feasible as a way of pinpointing the characteristics that enable (or fail to enable) the creation of ‘burgeoning’ communities that propel the growth of open source projects towards critical mass and engagement with a self-catalyzing process, by which user-producers (those who both use the software and contribute to its further development) sustain a high level of attention and incremental innovation in further developing projects. SourceForge itself provides sufficient information about the initial features of projects to permit analysing the influence of factors such as technical nature, intended users/audiences, internal project organization and release policies, and legal aspects (e.g., projected form of licensing).
Timing and path-dependencies may be hypothesized to affect the success or failure of OS/FS projects, and it may be important to recognize that success and failure are not determined in isolation from the characteristics of other projects that may be competing for developers’ adherence. A population ecology perspective (emphasising the processes of birth, growth in the context of rivalry, and demise) is potentially pertinent in this connection, and interactions between the characteristics of a project and the features of “the niche” into which it is launched should be empirically investigated. Given that “developer mind-share” is limited, we may suppose that older projects are entrenched through technological lock-in processes that make it more difficult for competing, similar projects to engage a critical mass of developers. Developers will tend to increase their co-operative activities in older projects as they gain experience and knowledge about them (these individuals are moving up project-specific learning curves, as well as acquiring generic and more transferable skills). Their attention and willingness to co-operate in other or newer projects is therefore likely to decrease. A key question about the governance of this process is whether the dominance of incumbent older projects serves to suppress innovative variety by creating a ‘standardised’ or ‘dominant design’ model of the attributes and features of the software. This kind of externality effect, through which accidents of timing and initial momentum may serve to “lock in” some projects while locking out others that are technically or socially more promising considered on their intrinsic merits, has been identified in studies of the (competitive) commercial development and diffusion of other technological artefacts.
It would therefore be of considerable interest to establish whether or not dynamics of this kind can be observed in the non-market allocation of developmental resources among software systems products. The fact that SourceForge makes it possible to filter projects according to the tools (such as programming languages and techniques) used in their development, and that differences between these tools may be an important factor in lock-in, makes the analysis of processes of this kind easier. The possibility of tracking individuals' co-operative records may also make it feasible to study their patterns of involvement in, and entry into and exit from, different projects. Quantitative and qualitative methods will be used to identify the presence or absence of path dependency in the attainment of successive “states” of project growth.
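The lock-in dynamic hypothesized above can be illustrated by a minimal Polya-urn sketch (our own illustration, not a model from the project): each arriving developer joins a project with probability proportional to its current membership, so a small head start gained by an early-launched project tends to snowball.

```python
import random

# Minimal Polya-urn sketch of developer mind-share lock-in (illustrative,
# not a model from the project): new developers join projects with
# probability proportional to current membership.

def polya(initial, steps, rng):
    counts = list(initial)
    for _ in range(steps):
        r = rng.random() * sum(counts)
        acc = 0.0
        for i, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[i] += 1   # the new developer joins project i
                break
    return counts

rng = random.Random(42)
# Two otherwise identical projects; project 0 launched slightly earlier.
final = polya([3, 1], steps=500, rng=rng)
shares = [c / sum(final) for c in final]
print("final membership shares:", shares)
```

Runs with different seeds converge to different limiting shares, which is precisely the path-dependence point: the outcome is determined by accidents of early history rather than by the projects' intrinsic merits.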
• Sponsorship support and individual developer relations with commercial sponsors:
A two-pronged approach to studying the issues this raises is being pursued at Stanford (SIEPR) by Dr. Seema Arora (project research associate) and Andrew Waterman (graduate research assistant). A web-based survey protocol has been developed to elicit more detailed information from developers about their contacts with, and possible involvements in, complementary/collateral commercial enterprises. This survey replicates a number of questions that were answered on the FLOSS survey, in order to establish the relationship between the two respondent populations, as well as to increase the sample density on particular questions. By asking respondents to identify the first OS/FS project on which they worked, and the two projects they deemed to be their most significant/important project participations (for identified reasons), the survey design seeks to link responses with the project characteristics information available from the SourceForge archives. A second line of inquiry also makes contact with the work on determinants of project “success,” previously described: data available at SourceForge will be used to investigate whether there are significant statistical relationships between the specifics of the licensing arrangements adopted at the time that projects are launched, and the subsequent development of derivative commercial enterprises around those projects that eventually do release code.
• Using Agent-Based Simulation Modelling as an Integrative Device:
In our original attempt to model open-source software development, developers/agents essentially choose how to allocate their efforts – typically contributing new functionalities, correcting bugs, etc. – among alternative projects, each project corresponding to a different software module. The available alternatives at each moment include the launching of new projects. Agents’ actions are probabilistic and conditioned on comparisons of the expected non-pecuniary or other rewards associated with each project, given specifications about the distribution of their potential effort endowments. The shape of the distribution of endowments, strictly speaking, cannot be inferred immediately from the (skewed) empirical distribution of the identified contributions measured in lines of code, but one can surmise that the former distribution is similarly skewed – on the basis of the relative sizes of the “high-activity” and “low-activity” segments of the developer population found by the FLOSS survey. To characterize the structure of the relative rewards associated with participation in various roles and in projects of different types, we begin with a rather coarse implementation of a set of “ethnographic” observations describing the norms of OS/FS hacker/developer communities – following Eric S. Raymond’s insights in the well-known essay “Homesteading the Noosphere.” The core of this is a variant of the collegiate reputational reward system: the more significance attached to the project, the agent’s role, and the extent or criticality of the contribution, the greater the anticipated “reward”.
Caricaturing Raymond’s more nuanced discussion, we stipulate that launching a new project is as a rule more rewarding than contributing to an existing one, especially when several contributions have already been made; early releases typically are more rewarding than later versions of project code; and some “project-modules” are systematically accorded more “importance” than others, ordered in a way that reflects meso-level technical dependences. One way to express this last rule is to say that there is a hierarchy within a family of projects, such that contributing to the Linux Kernel is deemed a (potentially) more rewarding activity than providing a Linux implementation of an existing and widely used applications program, and the latter dominates writing an obscure driver for a newly-marketed printer. In other words, we postulate a lexicographic ordering of rewards based upon a discrete, technically-based “ladder” of project-types. Lastly, for present purposes, new projects are created in relation to existing ones: we consider that it is always possible to add a new module in relation to an existing one, to which it adds a new functionality, and we assume that this new module will be located one level down the ladder.
As a consequence, all the modules, taken together, are organized as a tree which grows as new contributions are added, and which can grow in various ways depending on which parts of it (upstream or downstream modules, notably) developers select. We further conjecture that this tree will be to some extent correlated both with the directory tree and with the technical interdependencies between the modules, although this correlation will probably be especially imperfect in the first case.
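The module "ladder" and its lexicographic reward ordering can be sketched as a simple data structure. This is an assumed representation, not the authors' actual simulation code; the module names are illustrative.

```python
# Sketch of the module ladder: each new module attaches one level below
# its parent, and anticipated rewards are ordered lexicographically --
# ladder level first, then earliness of contribution. (Assumed structure,
# not the actual simulation code.)

class Module:
    def __init__(self, name, level, parent=None):
        self.name = name
        self.level = level            # 0 = most upstream (e.g. the Kernel)
        self.parent = parent
        self.children = []
        self.contributions = 0        # contributions received so far

def launch(parent, name):
    """Create a new module one level down the ladder, per the model's rule."""
    child = Module(name, parent.level + 1, parent)
    parent.children.append(child)
    return child

def reward(module):
    """Lexicographic anticipated reward: upstream level dominates; fewer
    prior contributions (i.e. earlier participation) ranks higher."""
    return (-module.level, -module.contributions)

root = Module("kernel", level=0)
app = launch(root, "linux-port-of-popular-app")
driver = launch(app, "obscure-printer-driver")

# An agent comparing options ranks the most upstream module highest.
best = max([root, app, driver], key=reward)
print(best.name)
```

Because Python compares tuples lexicographically, the `reward` tuple directly encodes the postulated ordering: no number of contributions to a downstream module can outrank participation one level up.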
Although our goal is eventually to account for the effects of the relationships that may develop with commercial sponsors of projects, and the direct and indirect market interactions between profit-seeking software firms and OS/FS development projects, our model-building effort thus far has been occupied with representations of the workings of OS/FS communities in isolation. Thus we focus mostly on social utility measurements according to the basic ideas that (1) upstream modules are more valuable than downstream ones simply because of the range of applications that eventually can be built upon them, and (2) a greater diversity of functionalities (breadth of the tree at the downstream layers) is more immediately valuable because it provides software solutions to fit a wider array of user needs.
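The two valuation ideas just stated can be given a toy formalization. The scoring rule below is an illustrative formula of our own devising, not the model's actual utility function: upstream modules are weighted by the size of the subtree that builds on them, and breadth at the downstream layers adds a diversity term.

```python
# Toy social-utility score for a module tree (illustrative formula,
# not the authors' actual utility function).

def tree_nodes(tree, root):
    """All modules reachable from `root` (tree given as {name: [children]})."""
    nodes = [root]
    for child in tree.get(root, []):
        nodes.extend(tree_nodes(tree, child))
    return nodes

def subtree_size(tree, node):
    """Number of modules that (transitively) build on `node`, itself included."""
    return 1 + sum(subtree_size(tree, c) for c in tree.get(node, []))

def social_utility(tree, root, diversity_weight=2.0):
    # Idea (1): a module's value grows with what can be built upon it.
    upstream_value = sum(subtree_size(tree, n) for n in tree_nodes(tree, root))
    # Idea (2): leaves stand in for distinct end-user functionalities.
    leaves = [n for n in tree_nodes(tree, root) if not tree.get(n)]
    return upstream_value + diversity_weight * len(leaves)

# A narrow "chain" of modules vs. a broad tree with the same module count.
chain = {"k": ["a"], "a": ["b"], "b": []}
broad = {"k": ["a", "b"], "a": [], "b": []}
print(social_utility(chain, "k"), social_utility(broad, "k"))
```

With the diversity term weighted heavily enough, the broad tree scores above the chain even though both contain the same number of modules, which is the trade-off between roundabout infrastructure-building and immediate breadth of functionality that the simulations explore.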
In this regard, preliminary results tend to stress the social efficiency of developer community “norms” that accord significantly greater reputational rewards for adding, and contributing to the releases of, upstream modules. Further, these preliminary explorations of the model suggest that policies of releasing code “early” tend to generate tree-shapes that have higher social efficiency scores. The intuitively plausible interpretation of this last finding is that early releases are especially important (adding larger increments of social utility) in the case of upstream modules, because they create bases for further applications development, and the reputational reward structure posited in the model encourages this “roundabout” (generic infrastructure) process of development by inducing individual efforts to share the recognition for contributing to upstream code. This is based upon a static ex post evaluation of the resulting tree form, and it is evident that the results may be altered by considering the dynamics and applying social time discount rates to applications that only become available to end users at considerably later dates. In other words, the social efficiency of the reward structure that allocates developers’ efforts will depend upon the temporal distribution of the benefits, as well as the relative extent to which OS/FS-generated code meets the needs of final users rather than the needs/goals of the agents who choose to work on these projects.
 This project is supported by an NSF Grant (IIS-0112962) to the Stanford Institute for Economic Policy Research’s “Knowledge Networks and Institutions for Innovation” Program, led by Paul David.
 A different use of this methodology would be to analyse the dependency measures of all contributions to the Linux Kernel – signed and unsigned alike – and investigate whether there are significant differences in the likelihood that contributions with high dependency measures will be signed.
 The SourceForge.net website contains data on over 33,000 Open Source projects, including their technical nature, intended audiences and stage of development. Records of their past history and the engagement of the OS/FS community in their improvement and evolution are also, in principle, available.
 The success/failure of a project can be characterised by its rate and speed of development, and by the engagement of the community of developers with its improvement/growth.
 Contraposed to this tendency could be that of individuals abandoning old projects as these reach their end, or as they grow bored with them. If individuals derive utility from the excitement associated with “new hacks”, the kind of attachment to projects previously described will not necessarily take place. If the entry/exit rates of developers in the OS community are fast enough, this problem may also be attenuated. Finally, individuals seeking to increase their status inside the community may have incentives to abandon their roles as collaborators on existing projects in order to start new ones (this possibility will be developed later).
 The instrument presently is undergoing the final stage of field-testing at a website to which volunteers willing to identify themselves are directed: http://www.stanford.edu/group/floss-us/survey.fft.
 In the simplest formulations of the model, agents’ endowments (measured in thousands of lines of code per period, KLOCs) are treated as “fixed effects” and are obtained as random draws from a stationary distribution. More complex schemes envisage endogenously determined and serially correlated coding capacities, with allowance for experience-based learning effects at the agent level.
 See The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, Sebastopol, CA: O’Reilly, 2001, pp. 65-112.