Open Science consulting gig with the American Heart Association

I just got notification about a short-term consulting gig with the American Heart Association that is specifically related to issues of Open Science:

The American Heart Association has recently formed a task force to explore and determine AHA’s role in open science. Our specific interest at this time is focused on data repositories for finalized research data. We are currently looking for a consultant to assist in gathering background information for this task force in the following areas:

  • What open science is, and how it relates to the AHA.  This should be approached more broadly than just our data focus so the committee is aware of other research transparency opportunities.
  • What other similar organizations are doing relative to open science and making data and publications available (e.g. NIH, HHMI, ACS, EU, etc.). The AHA can provide contact information within most of these organizations. We would like to see comparisons to about 6 similar organizations. Our publishing area has looked at policies related to public access to publications, but this task force will be looking at how open access could impact our research program.
  • Existing repositories in use which can hold the types of data that AHA funded grants produce.
  • Legal, data design, or intellectual property issues with requiring researchers to store their data in a repository.
  • Documenting issues that should be addressed by the task force if they decide to move forward (e.g. timing for providing data, retention of data, definition of data to store, requiring common terminology, issues related to the re-use of the stored data etc.)

We anticipate that these efforts may involve the consultant 5-10 hours a week for a 6 week time period.   The project will begin in mid-October and would ideally be completed by Dec 1st.  Consultants that are interested in this project should provide a summary of their qualifications by October 12th.

 

Belinda Orland
Research Information Manager
Division of Research Administration
American Heart Association/American Stroke Association
7272 Greenville Avenue
Dallas, TX 75231
214 360 6110
belinda.orland@heart.org

Posted in Uncategorized | Leave a comment

Advice to junior faculty who want to get promoted doing Open Science

I recently sent some advice to a colleague who is coming up for tenure at another university.  He’s quite well known in the Open Science community and is trying to figure out how best to make the case to his tenure committee that the open science contributions he has made in addition to his traditional journal publications are important.  We’re talking some major contributions here — lab protocols on OpenWetWare, open lecture materials on slideshare, data files released with CC0, videos of lab protocols on Benchfly, and he’s a regular contributor to science discussions on FriendFeed.

The advice I gave him was basically to make the committee’s job of measuring these contributions easier.  Here’s the advice (in a slightly edited form):

The audience for most tenure documents (and particularly the external letters) is a committee of non-specialists that advises the provost or other high-ranking administrator.  These committees are often somewhat skeptical of departments and candidates and are looking for external validation of what they are reading in the tenure dossier and the packet prepared by the departments.  They are swayed by real experts in the field (named chairs at other institutions, national academy members, people at top 10 institutions) and by things they can measure (publications, h-indices, grant money, citation counts). If you want to add a non-traditional contribution to a tenure dossier, you should also include a way of measuring the importance of that contribution.

First, if the rules of your institution allow it, make sure there is a strong defense of open ways of doing science in your dossier (1-2 paragraphs or so).  Make the case that it is important to consider non-standard contributions even though previous tenure committees did not.

Use as many metrics to back up your contributions as you can. Make a case that each of your software releases counts as much as a full publication, and use download statistics as if they were directly comparable to academic citations. List external users of your software as if they were research collaborators, because they are! If you can collect them, include download statistics on open contributions to sites like OpenWetWare and Wikipedia.

If your institution’s rules allow it, make sections directly under your publications for “Published Datasets”, “Contributed Software”, “Published Protocols & Notebooks”, and “Scientific Videos”. In each section, list authors, a title, description, and URL of the resource you have contributed along with a count of downloads or views, and a list of other groups using your data. Make this look as much like your publication section as possible, as you can then make the argument that these things should be treated with a similar weight to traditional academic publication. Provide the metrics in the document so that your committees aren’t guessing about how important something is. I can’t emphasize this enough: citation counts are easy for a committee to dig up, but download stats are harder. Do the measurement work for your committee and they’ll make the assumption that your metrics are important.

So that’s the advice.  I’ve been involved in a few internal tenure discussions, and the metrics are always important.  If there isn’t an easy analogy to something in my own experience, I look to the candidate’s documents and the external letters to tell me why something matters.

Posted in open science, Science | Leave a comment

An informal definition of OpenScience

Over at the open-science mailing list at okfn.org, Michael Nielsen just posted a great “informal” definition of open science:

 

Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.

The discussion on the list has been very interesting, but that particular “informal” definition is great because it gets at why we’re struggling with established social norms in science given the new technological methods of communicating results:

 

…when the journal system was developed in the 17th and 18th centuries it was an excellent example of open science.  The journals are perhaps the most open system for the dissemination of knowledge that can be constructed — if you’re working with 17th century technology.  But, of course, today we can do a lot better.

Posted in open science, Science | Leave a comment

10 years of CDK

Today marks (roughly) the tenth birthday of a fantastically successful open science project called the Chemistry Development Kit (CDK). At the time the skeleton of the project was set down on my office whiteboard, I was still the lead developer of Jmol, and Egon Willighagen and Christoph Steinbeck had contributed code to the Jmol project. Christoph’s pet code was a neat 2-d structure editor called JChemPaint, and Egon was working largely on the Chemical Markup Language (CML), although his code contributions were showing up nearly everywhere. Egon and Christoph were in the US for a “Chemistry and the Internet” conference and made a side trip by train to visit me so we could figure out how to unify these projects and to make a more general and reusable set of chemical objects.

The CDK waterfall whiteboard

The CDK design session was a fun weekend. In retrospect, they were some of the purest days of collaborative creativity I’ve ever experienced. We spent many hours and a lot of coffee hashing out some of the basic classes of CDK. The final picture of the whiteboard shows a classic waterfall diagram of what we were going to implement.

I’m the first to admit that my contributions to CDK were minimal. Egon & Chris ran with the design, expanded and improved it, implemented all the missing pieces, and released it to the world. It has become an important piece of scientific software, particularly in the bioinformatics community. Beyond Egon & Chris, Rajarshi Guha has been one of the prime developers of the software.

CDK is, by all objective standards, a fantastic success story of open source scientific software. It has a large and vibrant user community, active developers, and a number of people (including myself) who browse the code just to see how it does something difficult. Egon has written a thoughtful piece on where CDK should go from here.

Happy Birthday CDK!

Posted in open science, Science, Software | 3 Comments

Packmol

One of the biggest issues you face when you first start doing molecular dynamics (MD) simulations is how to create an initial geometry that won’t blow up in the first few time steps. Repulsive forces are very steep if the atoms are too close to each other, and if you are trying to simulate a condensed phase (liquid, solid, or interfacial) system, it can be hard to know how to make a sensible initial structure.

Packmol is a cool program that appears to solve this problem. It creates an initial point for molecular dynamics simulations by packing molecules in defined regions of space. The packing guarantees that short range repulsive interactions do not disrupt the simulations. The great variety of types of spatial constraints that can be attributed to the molecules, or atoms within the molecules, makes it easy to create ordered systems, such as lamellar, spherical or tubular lipid layers. It works with PDB and XYZ files and appears to be available under the GPL. Very, very cool!
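To make the problem concrete, here is a minimal sketch of the naive way to build a starting structure: drop points at random and reject any that land too close to one already placed. This is not Packmol's algorithm, just an illustration of the "no close contacts" requirement. The function names, the box size, and the 3.0 cutoff are all assumptions chosen for the example.

```python
import math
import random


def pack_points(n, box, min_dist, seed=0, max_tries=100_000):
    """Place n points in a cubic box so no pair is closer than min_dist."""
    rng = random.Random(seed)
    placed = []
    tries = 0
    while len(placed) < n:
        if tries >= max_tries:
            raise RuntimeError("box too crowded for this min_dist")
        tries += 1
        candidate = (rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
        # Reject any candidate that would overlap an already placed point.
        if all(math.dist(candidate, p) >= min_dist for p in placed):
            placed.append(candidate)
    return placed


def to_xyz(points, element="Ar"):
    """Format the points as the text of an XYZ file."""
    lines = [str(len(points)), "naive random packing"]
    lines += [f"{element} {x:.3f} {y:.3f} {z:.3f}" for x, y, z in points]
    return "\n".join(lines)


atoms = pack_points(n=50, box=20.0, min_dist=3.0)
print(to_xyz(atoms).splitlines()[0])
```

Rejection sampling like this gets slow as the box fills up, and it knows nothing about molecular shape or spatial constraints, which is exactly where a dedicated packing tool earns its keep.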

Posted in open science, Science, Software | 2 Comments

Gwyddion – Open Source SPM analysis

We just discovered a very cool open source program for analyzing scanning probe microscopy (SPM) data files. There are a number of incompatible and proprietary file formats for surface microscopies (AFM, MFM, STM, SNOM/NSOM), and getting data out from a microscope for further processing (including baseline leveling, profile analysis, and statistical analysis) can be a difficult task. Gwyddion is a Gtk+ based package that runs on Linux, Mac OS X (with MacPorts) and Windows and appears to do nearly everything that some expensive commercial packages (and some free closed-source packages) can do. Some of our colleagues were very happy to discover this piece of wizardry!

Posted in open science, Science, Software | Leave a comment

Open Science on “Future Tense”

Yesterday’s “Future Tense” radio program on Australian Broadcasting was just posted online. The topic was Open Science, and I managed to get interviewed for the show. The interview with Anthony Funnell was a great conversation, and he’s pulled out some of the better bits while making the Open Science movement sound only slightly utopian.

Posted in Uncategorized | Leave a comment

If you’re going to do good science, release the computer code too

A very nice article by Darrel Ince has just been posted over at the Guardian. It deals with the climate-gate email theft and the quality of academic scientific code. An excerpt:

Computer code is also at the heart of a scientific issue. One of the key features of science is deniability: if you erect a theory and someone produces evidence that it is wrong, then it falls. This is how science works: by openness, by publishing minute details of an experiment, some mathematical equations or a simulation; by doing this you embrace deniability. This does not seem to have happened in climate research. Many researchers have refused to release their computer programs — even though they are still in existence and not subject to commercial agreements. An example is Professor Mann’s initial refusal to give up the code that was used to construct the 1999 “hockey stick” model that demonstrated that human-made global warming is a unique artefact of the last few decades. (He did finally release it in 2005.)

Posted in open science, Science, Software | 1 Comment

Kitware has a blog!

Geoff Hutchinson just pointed us to the new blog over at Kitware (the makers of VTK).  I’ve found VTK enormously helpful in the past (particularly the source to vtkMath.cxx) and I’m glad they’ve made the commitment to Open Source.

My favorite post so far: Why Open Source Will Rule Scientific Computing by Will Schroeder.

Posted in open science, Software | Leave a comment

Being Scientific: Falsifiability, Verifiability, Empirical Tests, and Reproducibility

If you ask a scientist what makes a good experiment, you’ll get very specific answers about reproducibility and controls and methods of teasing out causal relationships between variables and observables. If human observations are involved, you may get detailed descriptions of blind and double-blind experimental designs. In contrast, if you ask the very same scientists what makes a theory or explanation scientific, you’ll often get a vague statement about falsifiability. Scientists are usually very good at designing experiments to test theories. We invent theoretical entities and explanations all the time, but very rarely are they stated in ways that are falsifiable. It is also quite rare for anything in science to be stated in the form of a deductive argument. Experiments often aren’t done to falsify theories, but to provide the weight of repeated and varied observations in support of those same theories. Sometimes we’ll even use the words verify or confirm when talking about the results of an experiment. What’s going on? Is falsifiability the standard? Or something else?

The difference between falsifiability and verifiability in science deserves a bit of elaboration. It is not always obvious (even to scientists) what principles they are using to evaluate scientific theories,[1] so we’ll start a discussion of this difference by thinking about Popper’s asymmetry.[2] Consider a scientific theory (T) that predicts an observation (O). There are two ways we could approach adding the weight of experiment to a particular theory. We could attempt to falsify or verify the theory. Only one of these approaches (falsification) is deductively valid:

Falsification              Verification
If T, then O               If T, then O
Not-O                      O
---------------------      ---------------------
Not-T                      T

Deductively Valid          Deductively Invalid

Popper concluded that it is impossible to know that a theory is true based on observations (O); science can tell us only that the theory is false (or that it has yet to be refuted). He concluded that meaningful scientific statements are falsifiable.
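The asymmetry can be checked mechanically by brute force over a truth table. The sketch below is only an illustration: the helper names `is_valid` and `implies` are made up for this example. An argument counts as valid when no assignment of truth values makes every premise true and the conclusion false.

```python
from itertools import product


def is_valid(premises, conclusion, n_vars):
    """An argument is valid if no truth assignment makes every premise
    true while the conclusion is false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


def implies(a, b):
    return (not a) or b


# Variables: t = the theory T is true, o = the observation O is made.
falsification = is_valid(
    premises=[lambda t, o: implies(t, o), lambda t, o: not o],
    conclusion=lambda t, o: not t,
    n_vars=2,
)
verification = is_valid(
    premises=[lambda t, o: implies(t, o), lambda t, o: o],
    conclusion=lambda t, o: t,
    n_vars=2,
)
print("falsification valid:", falsification)
print("verification valid:", verification)
```

Falsification comes out valid (it is modus tollens). Verification does not, because a false theory can still be accompanied by the predicted observation.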

A more realistic picture of scientific theories isn’t this simple. We often base our theories on a set of auxiliary assumptions which we take as postulates for our theories. For example, a theory for liquid dynamics might depend on the whole of classical mechanics being taken as a postulate, or a theory of viral genetics might depend on the Hardy-Weinberg equilibrium. In these cases, classical mechanics (or the Hardy-Weinberg equilibrium) are the auxiliary assumptions for our specific theories.

These auxiliary assumptions can help show that science is often not a deductively valid exercise. The Quine-Duhem thesis[3] recovers the symmetry between falsification and verification when we take into account the role of the auxiliary assumptions (AA) of the theory (T):

Falsification              Verification
If (T and AA), then O      If (T and AA), then O
Not-O                      O
---------------------      ---------------------
Not-T                      T

Deductively Invalid        Deductively Invalid

That is, if the predicted observation (O) turns out to be false, we can deduce only that something is wrong with the conjunction, (T and AA); we cannot determine from the premises that it is T rather than AA that is false. In order to recover the asymmetry, we would need our assumptions (AA) to be independently verifiable:

Falsification              Verification
If (T and AA), then O      If (T and AA), then O
AA                         AA
Not-O                      O
---------------------      ---------------------
Not-T                      T

Deductively Valid          Deductively Invalid

Falsifying a theory requires that the auxiliary assumptions (AA) be demonstrably true. Auxiliary assumptions are often highly theoretical. Remember, they might be statements like “the entirety of classical mechanics is correct” or “the Hardy-Weinberg equilibrium is valid”! It is important to note that if we can’t verify AA, we will not be able to falsify T by using the valid argument above. Contrary to Popper, there really is no asymmetry between falsification and verification. If we cannot verify theoretical statements, then we cannot falsify them either.
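For readers who like to see the escape hatch spelled out, here is a small sketch that enumerates every truth assignment for T, AA, and O and lists the ones that break each argument. The function and variable names are made up for this illustration. An empty list means the argument is deductively valid.

```python
from itertools import product


def counterexamples(premises, conclusion):
    """Return every (T, AA, O) assignment where all premises hold
    but the conclusion fails. An empty list means the argument is valid."""
    found = []
    for t, aa, o in product([True, False], repeat=3):
        if all(p(t, aa, o) for p in premises) and not conclusion(t, aa, o):
            found.append({"T": t, "AA": aa, "O": o})
    return found


def prediction(t, aa, o):
    # If (T and AA), then O
    return (not (t and aa)) or o


def not_o(t, aa, o):
    return not o


def o_seen(t, aa, o):
    return o


def aa_holds(t, aa, o):
    return aa


def t_true(t, aa, o):
    return t


def t_false(t, aa, o):
    return not t


falsify_without_aa = counterexamples([prediction, not_o], t_false)
falsify_with_aa = counterexamples([prediction, aa_holds, not_o], t_false)
verify_with_aa = counterexamples([prediction, aa_holds, o_seen], t_true)

print("falsification, AA unverified:", falsify_without_aa)
print("falsification, AA verified:  ", falsify_with_aa)
print("verification, AA verified:   ", verify_with_aa)
```

The only way to dodge the falsification when AA is unverified is the assignment where T is true and AA is false: the theory survives by blaming the auxiliary assumptions. Once AA is added as a premise, that assignment is ruled out and the falsification goes through, while verification still fails.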

Since verifying a theoretical statement is nearly impossible, and falsification often requires verification of assumptions, where does that leave scientific theories? What is required of a statement to make it scientific?

Carl Hempel came up with one of the more useful statements about the properties of scientific theories:[4] “The statements constituting a scientific explanation must be capable of empirical test.” And this statement about what exactly it means to be scientific brings us right back to things that scientists are very good at: experimentation and experimental design. If I propose a scientific explanation for a phenomenon, it should be possible to subject that theory to an empirical test or experiment. We should also have a reasonable expectation of universality of empirical tests. That is, multiple independent (skeptical) scientists should be able to subject these theories to similar tests in different locations, on different equipment, and at different times and get similar answers. Reproducibility of scientific experiments is therefore going to be required for universality.

So to answer some of the questions we might have about reproducibility:

  • Reproducible by whom? By independent (skeptical) scientists, working elsewhere, and on different equipment, not just by the original researcher.
  • Reproducible to what degree? This would depend on how closely that independent scientist can reproduce the controllable variables, but we should have a reasonable expectation of similar results under similar conditions.
  • Wouldn’t the expense of a particular apparatus make reproducibility very difficult? Good scientific experiments must be reproducible in both a conceptual and an operational sense.[5] If a scientist publishes the results of an experiment, there should be enough of the methodology published with the results that a similarly-equipped, independent, and skeptical scientist could reproduce the results of the experiment in their own lab.

Computational science and reproducibility

If theory and experiment are the two traditional legs of science, simulation is fast becoming the “third leg”. Modern science has come to rely on computer simulations, computational models, and computational analysis of very large data sets. These methods for doing science are all reproducible in principle. For very simple systems, and small data sets this is nearly the same as reproducible in practice. As systems become more complex and the data sets become large, calculations that are reproducible in principle are no longer reproducible in practice without public access to the code (or data). If a scientist makes a claim that a skeptic can only reproduce by spending three decades writing and debugging a complex computer program that exactly replicates the workings of a commercial code, the original claim is really only reproducible in principle. If we really want to allow skeptics to test our claims, we must allow them to see the workings of the computer code that was used. It is therefore imperative for skeptical scientific inquiry that software for simulating complex systems be available in source-code form and that real access to raw data be made available to skeptics.
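As a toy illustration of what “reproducible in practice” can look like when the code is available, the sketch below runs a small seeded simulation and reduces its output to a short fingerprint. Everything here is an assumption made for the example: the random walk merely stands in for a real simulation, and `simulate` and `fingerprint` are hypothetical names. A skeptic holding the same source, the same seed, and the same Python version can rerun it and compare fingerprints.

```python
import hashlib
import random


def simulate(seed, steps=1000):
    """Toy 1-D random walk standing in for a real simulation."""
    rng = random.Random(seed)
    position = 0
    trajectory = []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        trajectory.append(position)
    return trajectory


def fingerprint(trajectory):
    """Short hash a skeptic can compare against a published value."""
    data = ",".join(map(str, trajectory)).encode("ascii")
    return hashlib.sha256(data).hexdigest()[:16]


original = fingerprint(simulate(seed=42))
replication = fingerprint(simulate(seed=42))
print("fingerprints match:", original == replication)
```

Without the source and the seed, the only route to the same fingerprint is to guess at every implementation detail, which is the “reproducible in principle” situation described above.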

Our position on open source and open data in science was arrived at when an increasing number of papers began crossing our desks for review that could not be subjected to reproducibility tests in any meaningful way. Paper A might have used a commercial package that comes with a license that forbids people at university X from viewing the code![6] Paper B might use a code which requires parameter sets that are “trade secrets” and have never been published in the scientific literature. Our view is that it is not healthy for scientific papers to be supported by computations that cannot be reproduced except by a few employees at a commercial software developer. Should this kind of work even be considered Science? It may be research, and it may be important, but unless enough details of the experimental methodology are made available so that it can be subjected to true reproducibility tests by skeptics, it isn’t Science.


  1. This discussion closely follows a treatment of Popper’s asymmetry in: Sober, Elliott. Philosophy of Biology (Boulder: Westview Press, 2000), pp. 50-51.
  2. Popper, Karl R. The Logic of Scientific Discovery, 5th ed. (London: Hutchinson, 1959), pp. 40-41, 46.
  3. Gillies, Donald. “The Duhem Thesis and the Quine Thesis”, in Martin Curd and J.A. Cover, eds., Philosophy of Science: The Central Issues (New York: Norton, 1998), pp. 302-319.
  4. Hempel, Carl. Philosophy of Natural Science (1966), p. 49.
  5. Lett, James. Science, Reason and Anthropology: The Principles of Rational Inquiry (Oxford: Rowman & Littlefield, 1997), p. 47.
  6. See, for example, www.bannedbygaussian.org
Posted in Open Data, open science, Science | 1 Comment