What, exactly, is Open Science?

Posted by Dan on July 28, 2009 at 11:45 am | Categories: Open Data, Policy, Science, open science | 26 Comments

I was recently asked to define what Open Science means. It would have been relatively easy to fall back on a litany of “Open Source, Open Data, Open Access, Open Notebook”, but these are just shorthand for four fundamental goals:

  • Transparency in experimental methodology, observation, and collection of data.
  • Public availability and reusability of scientific data.
  • Public accessibility and transparency of scientific communication.
  • Using web-based tools to facilitate scientific collaboration.

The idea I’ve been most involved with is the first one, since granting access to source code is really equivalent to publishing your methodology when the kind of science you do involves numerical experiments. I’m an extremist on this point, because without access to the source for the programs we use, we rely on faith in the coding abilities of other people to carry out our numerical experiments. In some extreme cases (i.e. when simulation codes or parameter files are proprietary or are hidden by their owners), numerical experimentation isn’t even science. A “secret” experimental design doesn’t give skeptics the ability to repeat (and hopefully verify) your experiment, and the same is true with numerical experiments. Science has to be “verifiable in practice” as well as “verifiable in principle”.

In general, we’re moving towards an era of greater transparency in all of these topics (methodology, data, communication, and collaboration). The problems we face in gaining widespread support for Open Science are really about incentives and sustainability. How can we design or modify the scientific reward systems to make these four activities the natural state of affairs for scientists? Right now, there are some clear disincentives to participating in these activities. Scientists are people, and we’re motivated by most of the same things as normal people:

  • Money, for ourselves, for our groups, and to support our science.
  • Reputation, which is usually (but not necessarily) measured by citations, h-indices, download counts, placement of students, etc.
  • Sufficient time, space, and resources to think and do our research (which is, in many ways, the most powerful motivator).

Right now, the incentive network that scientists work under seems to favor “closed” science. Scientific productivity is measured by the number of papers in traditional journals with high impact factors, and the importance of a scientist’s work is measured by citation count. Both of these measures help determine funding and promotions at most institutions, and doing open science is either neutral or damaging by these measures. Time spent cleaning up code for release, or setting up a microscopy image database, or writing a blog is time spent away from writing a proposal or paper. The “open” parts of doing science just aren’t part of the incentive structure.

Michael Faraday’s advice to his junior colleague, “Work. Finish. Publish.”, needs to be revised. It shouldn’t be enough to publish a paper anymore. If we want open science to flourish, we should raise our expectations to: “Work. Finish. Publish. Release.” That is, your research shouldn’t be considered complete until the data and meta-data are put up on the web for other people to use, until the code is documented and released, and until the comments start coming in to your blog post announcing the paper. If our general expectations of what it means to complete a project are raised to this level, the scientific community will start doing these activities as a matter of course.

If you meet a scientist who tells you that they did a fantastic experiment and have wonderful data, you naturally ask them to email you a reprint. Any working scientist would be perplexed if the response was: “Oh, I’m not going to be writing this work up for publication.” It would be absolute nonsense in the culture of science to not publish a report in a journal on the work you have done. And yet, no one seems surprised when scientists are too busy or too secretive to release their data to the community. We should be just as perplexed by this. Instead of complaining about the reward and incentive systems, we should be setting the standard higher: “What do you mean that you haven’t got around to putting your data on the web? You aren’t done yet!” Or: “How can I possibly review this paper if I can’t see the code they were using? There’s no way for me to tell if they did the calculation right.” We’re going to have to raise the expectations on completing a scientific project if we want to change the culture of science.

26 Comments »

  1. Thank you for a detailed explanation.

    Open-source software (such as Linux, Apache, or Python) is popular not so much because it is open/free but mostly because it is better than closed-source. Similarly, open science will prove itself when it is shown to be more powerful than closed/secretive (pseudo)science. To promote it, we should focus on success stories of scientists getting further in their research by documenting and sharing the details of computational experiments.

    Comment by Sergey Fomel — July 29, 2009 #

  2. I agree wholeheartedly with this :-) I also recommend the following article; although it’s meant for computational linguists I think the points it makes are relevant for all scientists:

    Empiricism is Not a Matter of Faith (Pedersen), Computational Linguistics, Volume 34, Number 3, pp. 465-470, September 2008. [Journal Citation Reports Index Factor 2007: 2.367]

    Comment by Kevin Brubeck Unhammer — July 29, 2009 #

  3. Pedersen’s article is wonderful. Thanks for posting that link, Kevin!

    Comment by Dan Gezelter — July 29, 2009 #

  4. [...] 29, 2009 · Leave a Comment Dan at The OpenScience Project has a new post defining what “open science” means. The author advocates a completely [...]

    Pingback by Defining Open Science « — July 29, 2009 #

  5. Great piece. One addition: although it is true that institutional incentives to collaborate are not present, the success of the networking community (the Internet Engineering Task Force and its RFCs) is food for thought that scientists tend to overlook, focusing instead just on openness of data, final results, and methodology. What about open process? The question is, can the sciences improve if we reuse aspects of what I call the Internet Model (Free Software + IETF)? I’m convinced the answer is positive. Are there institutional rewards? I don’t think there were any for the open source/data/methodology part either. But the situation changed as academics and volunteers of all kinds promoted, used, and developed the model in areas other than software. If we do the same for the volunteer-core open-process model that gave us the Internet and Open Source-type collaboration, we are justified in expecting significant leaps in the productivity of sciences that adopt this full Internet Model (open source + open process). Open source/data is half of it; volunteer open process is the half that we’re missing to make the leap. best, toni.

    Comment by toni prug — July 29, 2009 #

  7. [...] What, Exactly, is Open Science? – In general, we’re moving towards an era of greater transparency in all of these topics (methodology, data, communication, and collaboration). The problems we face in gaining widespread support for Open Science are really about incentives and sustainability. How can we design or modify the scientific reward systems to make these four activities the natural state of affairs for scientists? Right now, there are some clear disincentives to participating in these activities. (via Glyn Moody) [...]

    Pingback by Four short links: 30 July 2009 | Tech-monkey.info Blogs — July 30, 2009 #

  8. [...] Hanna posted The OpenScience Project » What, exactly, is Open Science? [...]

    Pingback by Moonlit Minds @Moonlit Minds — July 30, 2009 #

  9. [...] exactly is open science?: “your research shouldn’t be considered complete until the data and meta-data is put up on [...]

    Pingback by Science, publishing, and such - elearnspace — July 30, 2009 #

  10. Thanks for the post! I totally agree that transparency is essential. There will always be some aspects of science that will remain closed (e.g., real-time data sharing will be difficult for the reasons you stated above). But to add the perspective of a wet-lab scientist about the transparency of techniques: 99% of what we do at the bench on a daily basis is not a trade secret. In fact, in 15 years at the bench, I did a lot of experiments, and not one of them was novel. Most of the time the novelty comes in how those techniques are applied.

    So the fact that protocols and techniques are not openly shared is insane. Everyone wins with increased access. Moreover, the idea that a traditional “publication” is the only point at which information can be shared is also unfortunate. There are many grad students, postdocs, and professors who are excellent scientists with impeccable technique who will experience long droughts in between papers. There should be a way for others to learn from them and for them to gain recognition in their field during these downtimes.

    In fact, I believe so strongly in this, I just launched a website called BenchFly.com that addresses this exact issue by allowing anyone with an internet connection to upload a video of their protocols, techniques, tips, tricks- basically anything that will help another scientist out. The same way we learn in the lab, just now on the internet… We can’t overhaul the entire scientific process overnight, but we’ve got to start somewhere.

    Comment by Alan Marnett — July 30, 2009 #

  11. Thanks for a thought-provoking post. I’m interested in the idea of transparency in research, and am attempting to update the progress of my PhD study through my blog. However, I find it difficult to justify the time needed to write a considered post on the process. I’m also uncertain how much of my work my institution will be comfortable with me sharing in this way. I think it’s going to be difficult for researchers to dedicate time to this “additional” work, and in order to move forward, institutions must formally recognise it as part of the research process.

    Comment by Michael Rowe — July 31, 2009 #

  13. Great article!

    Regarding your second point, “Public availability and reusability of scientific data”, you may be interested in the Open Knowledge Definition (OKD), which provides criteria for openness in content/data:

    http://opendefinition.org/

    For anyone interested in open data in science, there is a Working Group on this at the Open Knowledge Foundation:

    http://wiki.okfn.org/wg/science

    Comment by Jonathan Gray — July 31, 2009 #

  14. [...] exactly is open science?: “your research shouldn’t be considered complete until the data and meta-data is put up on the [...]

    Pingback by New Technologies And Media By George Siemens, 07-31-09 | graphics and innovation — July 31, 2009 #

  16. [...] knowledge in a paper in order to support the one-to-three new assertions made in any one paper.” What exactly is open science?: “your research shouldn’t be considered complete until the data and meta-data is put up on the [...]

    Pingback by Media Literacy: Making Sense Of New Technologies And Media by George Siemens - Aug 1 09 | Write a Blog Site — August 1, 2009 #

  19. [...] What, exactly, is Open Science? ‘Transparency in experimental methodology, observation, and collection of data; public availability and reusability of scientific data; public accessibility and transparency of scientific communication; using web-based tools to facilitate scientific collaboration.’ [...]

    Pingback by Recent links on Open Access « Free Our Books — August 1, 2009 #

  20. [...] What exactly is open science?: “your research shouldn’t be considered complete until the data and meta-data is put up on the web for other people to use, until the code is documented and released, and until the comments start coming in to your blog post announcing the paper.” [...]

    Pingback by Media Literacy: Making Sense Of New Technologies And Media by George Siemens – Aug 1 09 « Argument — August 1, 2009 #

  21. [...] — gvwilson @ 7:28 pm Over at the OpenScience Project, Dan Gezelter is trying to define what “open science” means. He thinks there are four key [...]

    Pingback by What *Is* Open Science? « Software Carpentry — August 3, 2009 #

  22. [...] The backreaction blog post is a response, not specifically to the polymath project (in fact, it does not even mention polymath and is not specific to mathematics). Rather, it is a reaction to this post on the Open Science blog. [...]

    Pingback by Collaborative mathematics, etc. « What Is Research? — August 8, 2009 #

  23. A couple of you made some good points about open-process. And I think that the opening of academia itself (the free availability of course content, lesson plans, syllabi, etc) will help break down the institutionalized nature of academic science, and speed up evolution of ideas a bit more.

    But I also think that a lot of closed science is a result of ego/pride/desire for acclaim on the part of some scientists – those who are more concerned about building a legacy for themselves.

    Comment by Shingai — September 2, 2009 #

  24. Ultimately, I think there are some aspects of the evolutionary process that can’t be shared. Those that can synthesize multiple ideas and are inspired by a consequent vision, those that are driven by intuition, etc.

    You can deconstruct these things mechanically, perhaps, and devise theories about their process. But they are intrinsically a “property” of the originator, just as disciples of a great leader deal with permutations of the original philosophy.

    Comment by Shingai — September 2, 2009 #

  25. Thanks for this wonderful article.

    I’ve been thinking about these issues lately because of my role as a technical editor for a new journal called “Mathematical Programming Computation”. (Mathematical Programming already has series A and B; the C series is for “computational” papers.) Papers submitted for publication in MPC have to be submitted together with their source code, and I’m one of the group that reviews these codes.

    One very common issue that we face is that an author will call routines from a closed source commercial library (such as ILOG’s CPLEX) within their code. This creates all kinds of problems for the review process and later replication of the results, as the version of the library originally used by the author must be available to the reviewer, and then after a few years it’s extremely likely that that particular version of the library will no longer be available to anyone.

    Comment by Brian Borchers — November 8, 2009 #

  26. Thanks for this nice article.
    What I am missing here is a comment concerning the free availability of scientific papers. I think this is a very fundamental requirement for a scientific work as well. Otherwise, scientific results remain unavailable, for a very long time after publication, to institutions, universities, and individuals who cannot afford the often embarrassingly high-priced subscriptions and journal papers. This is despite production costs having dropped drastically with the change from paper to electronic journals. The editing and publishing tasks should be performed by institutions that are neither commercially driven nor politically biased. A good way might be to dismantle the entire (commercially oriented!) scientific publishing industry and put publicly funded, politically unbiased universities in charge. A collaboration between librarians, university presses, and informatics services might work well to organize the peer review, compose the journals, and provide them (for free) on their servers.

    Currently, the scientific industry is suffering from a typical lock-in situation: funding is based on publications in traditional journals with high impact factors. Moving to a new, freely available, open journal is just scientific suicide. A big change will have to happen to break this situation. Now is a good opportunity, given the whole climate-gate affair, which is causing many discussions about what has gone wrong in this industry.

    A remark to Brian Borchers:
    In the Debian GNU/Linux system (and probably other FOSS systems), a piece of software is only regarded as free if the software itself AND the libraries it depends on have been released under a FOSS licence. Otherwise, it will not be included in the ‘main’ repository, but in the contrib or non-free repo, without further support.

    So, concerning MPC: software that is free, but depends on non-free libraries should be rejected for publication.

    Comment by Gerber van der Graaf — December 6, 2009 #
