OSS Activity Study
Version 0.5 (23 Jul 2005)
The purpose of this study is to rank countries by a relative measure of open source community activity.
A presentation version of this study can be found here.
Determine a fair measure of open source software contribution per country by analyzing activity on public Internet mailing lists of major open source projects.
The key objectives of the study are:
- Global picture of OSS activity.
- Fair:
  - Don't favour one country or geographic region.
  - Don't favour infrastructure-rich countries.
- Measure activity data from a wide range of open source projects.
- Beginning of a larger project to gauge OSS 'Use vs Contribution'.
The methodology used in the study:
- Use open source development mailing list posts as a measure of activity.
- Select major open source projects:
  - Apache, Debian, Fedora, Gnome, JBoss, KDE, Linux kernel, OpenLDAP, Samba, Sendmail, X Windows
- Normalise activity per Internet capita, i.e. per Internet user, as the baseline (US Census and ITU data).
- Mailing list data was mirrored using news mirroring software from the gmane.org mailing-list-to-news gateway.
- Two years of data were studied, from 1 Jan 2003 through 31 Dec 2004.
- Assign email addresses to countries using TLD lookup and an IP geolocation database.
  - The free MaxMind GeoIP IP Country database was used, which is 97% accurate.
  - In this way we can attribute .com, .net and .org addresses to foreign countries.
- Remove spam by verifying the MX record of each posting email address.
- Remove outlier domains that can't be attributed to a country:
  - 'apache.org', 'kde.org', 'gnome.org', 'debian.org', 'gmane.org', 'samba.org', 'jboss.org', 'sourceforge.net'
  - This should not affect rankings, as these can be assumed to be proportionately distributed.
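The per Internet capita baseline in the methodology above can be sketched as follows. This is a minimal illustration, not the study's actual code: the function names, the "posts per million Internet users" scaling, and the country codes `AA` and `BB` with their figures are all made-up placeholders.

```python
def posts_per_million_users(posts, internet_users):
    """Return postings per million Internet users for each country.

    posts and internet_users are dicts keyed by country code. Countries
    with a missing or zero Internet population are skipped.
    """
    return {
        country: posts[country] * 1_000_000 / internet_users[country]
        for country in posts
        if internet_users.get(country)
    }


def rank_countries(posts, internet_users):
    """Return country codes ordered from most to least active per capita."""
    rates = posts_per_million_users(posts, internet_users)
    return sorted(rates, key=rates.get, reverse=True)


# Placeholder figures, not study data: BB posts more in absolute terms,
# but AA posts more per Internet user and so ranks first.
example_posts = {"AA": 500, "BB": 2000}
example_users = {"AA": 1_000_000, "BB": 10_000_000}
print(rank_countries(example_posts, example_users))
```

Ranking on the normalised rate rather than on raw posting counts is what keeps countries with a large Internet population from dominating the result.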
Method for country decoding
The following method is used to assign a country to an email posting:
- The primary MX for the host part of each email address is looked up.
- Messages are excluded if they don't have a valid primary MX.
- The two-letter ISO country-code top-level domain is used if it exists.
- If not, the first valid IP for the primary MX is looked up in the GeoIP IP Country database.
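The steps above can be sketched as a single function. This is an illustrative outline under stated assumptions, not the study's implementation: `lookup_mx`, `lookup_ip` and `geoip_country` are hypothetical callables supplied by the caller (for example wrappers around a DNS resolver and a GeoIP database), each returning `None` on failure.

```python
from typing import Callable, Optional


def assign_country(
    address: str,
    lookup_mx: Callable[[str], Optional[str]],
    lookup_ip: Callable[[str], Optional[str]],
    geoip_country: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a two-letter country code for a posting address, or None.

    The three lookup callables are hypothetical stand-ins:
      lookup_mx(host)   -> hostname of the primary MX, or None
      lookup_ip(host)   -> first valid IP address of that host, or None
      geoip_country(ip) -> country code from a GeoIP database, or None
    """
    host = address.rsplit("@", 1)[-1].lower().rstrip(".")

    # Step 1: postings without a valid primary MX are excluded.
    mx_host = lookup_mx(host)
    if mx_host is None:
        return None

    # Step 2: use a two-letter top-level domain directly if present.
    # Note that a ccTLD is not always identical to the ISO 3166 code
    # (for example .uk versus GB), so a real run may need a mapping.
    tld = host.rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isalpha():
        return tld.upper()

    # Step 3: otherwise geolocate the first valid IP of the primary MX.
    ip = lookup_ip(mx_host)
    if ip is None:
        return None
    return geoip_country(ip)


# Stub lookups with made-up values, for illustration only.
mx_table = {"example.de": "mail.example.de", "example.com": "mx.example.com"}
ip_table = {"mx.example.com": "192.0.2.1"}
geo_table = {"192.0.2.1": "SG"}

print(assign_country("dev@example.de", mx_table.get, ip_table.get, geo_table.get))
print(assign_country("dev@example.com", mx_table.get, ip_table.get, geo_table.get))
```

Passing the lookups in as parameters keeps the decision logic testable without network access; the real lookups can be swapped in later.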
Filtering criteria
To increase the signal-to-noise ratio, countries are excluded that have:
- internet population < 1,000,000
- fewer than 100 postings
- fewer than 20 unique email addresses found
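A minimal sketch of these three exclusion rules is shown below. The `CountryStats` type, the constant names and the sample figures for the placeholder countries `AA`, `BB` and `CC` are assumptions made for illustration; only the thresholds come from the criteria above.

```python
from dataclasses import dataclass


@dataclass
class CountryStats:
    internet_users: int     # Internet population of the country
    postings: int           # mailing list postings attributed to it
    unique_addresses: int   # distinct posting email addresses

# Thresholds taken from the filtering criteria.
MIN_INTERNET_USERS = 1_000_000
MIN_POSTINGS = 100
MIN_UNIQUE_ADDRESSES = 20


def passes_filter(stats: CountryStats) -> bool:
    """Return True if the country meets all three minimum thresholds."""
    return (
        stats.internet_users >= MIN_INTERNET_USERS
        and stats.postings >= MIN_POSTINGS
        and stats.unique_addresses >= MIN_UNIQUE_ADDRESSES
    )


def filter_countries(stats_by_country: dict) -> dict:
    """Keep only the countries that pass every filtering criterion."""
    return {
        country: stats
        for country, stats in stats_by_country.items()
        if passes_filter(stats)
    }


# Placeholder figures, not study data.
sample = {
    "AA": CountryStats(2_000_000, 150, 25),   # kept
    "BB": CountryStats(500_000, 1000, 50),    # too few Internet users
    "CC": CountryStats(5_000_000, 99, 30),    # too few postings
}
print(sorted(filter_countries(sample)))
```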
Limitations and assumptions in interpretation of study results:
- English language only.
  - Most global OSS development is conducted in English.
- Type of discourse not taken into account.
  - Assumption that posts are related to development and contribution, e.g. bugs and patches.
Computed results from mailing list postings:
Western Europe contributes the most with Norway currently ranked number one.
Please contact the authors if you are interested in further development of this study:
Copyright 2005 Metaparadigm Pte Ltd.