Cheng Soon Ong just emailed me about mloss.org, a community creating a comprehensive open source machine learning environment. Mloss.org is essentially a community portal with lots of detailed information about each of the listed projects. One of the more interesting features of their site is that they’ve tied specific software to publication in an associated journal, the Journal of Machine Learning Research, to make it easy for users of the software to find and maintain a citation trail to the work of the original developers. The journal itself encourages open source submissions and automatically ties publication of papers related to the software to appearance at the portal.
This last bit is a very clever idea. Would a broader electronic journal (perhaps the Journal of Open Science) be a useful way to give open projects (Open Source, Open Data, Open Notebook) more citation currency?
Go read this wonderful manifesto over at arXiv: Astronomical Software Wants To Be Free: A Manifesto by Weiner et al. The authors talk about some of the barriers to astronomical software development that are true in all scientific fields. The chief barrier they see is that there are no incentives (and there are some real disincentives) for authors to release software and documentation to other users. The recommendations are great (modified here only to include all scientific fields):
- We should create an open central repository location at which authors can release software and documentation.
- Software release should be an integral and funded part of projects.
- Software release should become an integral part of the publication process.
- The barriers to publication of methods and descriptive papers should be lower.
- Programming, statistics and data analysis should be an integral part of the curriculum.
- There should be more opportunities to fund grass-roots software projects of use to the wider community.
- We should develop institutional support for science programs that attract and support talented scientists who generate software for public release.
The whole thing is a great read. Check it out!
I just got email from Brandon Wood about an open source project called Quantum Espresso (formerly known as PWSCF), which is a rather extensive open-source project for DFT-based electronic structure calculations. It appears to be a refactoring of some established codes (PWscf, PHONON, CP90, FPMD, Wannier) that have been developed and tested by some of the original authors of novel electronic-structure algorithms – from Car-Parrinello molecular dynamics to density-functional perturbation theory – and applied in the last twenty years by some of the leading materials modeling groups worldwide.
There are definitely some scientific niches which desperately need open source codes (plane wave DFT is one of the ones that comes to mind), so I’m very pleased to learn about this project.
Some new software to point out today:
- In the Tools section, we have a new link to cb2bib, a tool for rapidly extracting unformatted bibliographic references from email alerts, journal web pages, and PDF files.
- In the Atomic & Molecular Physics section we have a new link to FELLA, which stands for Free Electron Laser Atomic, Molecular, and Optical Physics Program Package. FELLA is a joint project of Christian Buth from LSU and Robin Santra at Argonne National Laboratory.
- In the Engineering section, we have two new links: one for View3D, a command-line tool for evaluating radiation view factors for scenes with complex 2D and 3D geometry, and one for OSIV, a program that performs cross-correlation analysis of particle image velocimetry (PIV) images.
Check them out, and as always, be sure to suggest your favorite open source scientific software!
At the debate last night, John McCain brought up (twice!) for special scorn an example of spending on earmarks. His target? The “overhead projector for a planetarium”. It wasn’t the first time he’s brought up this earmark request, either. Bad Astronomy had a good post on how McCain’s comments on planetaria make him “literally antiscience”. The projector in question is hardly your run-of-the-mill overhead projector. The Adler Planetarium in Chicago has a “Sky Theater”, a hemispherical dome on which it can project just about anything if you have the right equipment. Notre Dame (where I teach) has a very similar set-up in our digital visualization theater. The projectors we use were modeled on the current system at the Hayden Planetarium, and just to give you some scope, we have a 50-foot-high domed ceiling over a hexagonal array of chairs that seats 136 students. The system is run with 10 computers, 8 of which do nothing but render 3D objects and transform them for hemispherical projection. It was a million dollar facility that goes a long way toward making all aspects of science visible to our students. In fact, as earmarks go, the planetarium projector at the Adler is a lot less offensive than some other projects (notably a certain bridge in Alaska).
In the past, McCain has also targeted for scorn an expenditure to study the “DNA of bears in Montana”. To be fair, other earmarks have also been his target: The Woodstock museum, and the bridge to nowhere (at least until he picked a running mate who was in favor of that same bridge) have also been the targets of McCain’s anti-pork ire. But last night, he seemed to express a special loathing for earmarks for science.
Now, a good case can be made (and should be made) that using earmarks to fund basic science research or science outreach is just bad policy. In fact, I’d be happier if the budgets for science-related earmarks were turned over to the NSF in order to fund peer-reviewed and merit-based proposals. But if earmarks are the only way to fund science outreach projects like the Adler’s planetarium, then count me in. It is certainly a better use of money than David Vitter’s proposed earmark of $100,000 for a group that promotes “creation science”. In fact, the examples of religious earmarks listed by Americans United for the Separation of Church and State are all worse than the Adler planetarium project.
Posted in Policy, Science
David Karger‘s lab at MIT has developed some neat web software called exhibit, which is designed to let non-ultra-sophisticated individuals publish data in ways that make it immediately accessible and interactive for people encountering it on the web. With exhibit, a scientist with a lot of data doesn’t need to manage a database (mysql, etc.) and program a front end for it. Instead, they can put a data file (as simple as a spreadsheet) and a presentation file (written in basic html) on their web site and they’re done. There are a couple of great examples including an interactive elements table that one of Karger’s undergraduates put together.
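To make the "data file as simple as a spreadsheet" idea concrete, here is a minimal Python sketch that turns spreadsheet-style CSV text into a JSON data file. It assumes Exhibit's JSON layout is a top-level `items` list whose entries each carry a `label` and a `type`; the function name `csv_to_exhibit_json` and the sample columns are made up for illustration and are not part of Exhibit itself.

```python
import csv
import io
import json


def csv_to_exhibit_json(csv_text, label_column, item_type="Item"):
    """Convert CSV text into a JSON string with a top-level "items" list.

    Assumption: the target format wants each item to have a "label"
    and a "type" property; every other column is copied through as-is.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for row in reader:
        # Copy every column except the one we promote to "label".
        item = {key: value for key, value in row.items() if key != label_column}
        item["label"] = row[label_column]
        item["type"] = item_type
        items.append(item)
    return json.dumps({"items": items}, indent=2)


# Hypothetical spreadsheet export: one row per chemical element.
SAMPLE_CSV = "name,symbol,number\nHydrogen,H,1\nHelium,He,2\n"

if __name__ == "__main__":
    print(csv_to_exhibit_json(SAMPLE_CSV, "name", item_type="Element"))
```

The resulting file would sit next to the HTML presentation file on the web server; the HTML side is not sketched here.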
Exhibit is a three-tier web application framework written in Javascript, which you can include like you would include Google Maps. The integration with Google maps is quite impressive. One can imagine using it to display geographic or other spatial data. In fact, here’s an exhibit of Danish monthly weather records since 1874. And here’s a great example of exhibit being used to display a bibliography for the MIT haystack group.
Other useful related projects are Timeplot and Timeline for placing interactive time data on a web page.
Some new software is in our Knowledge Discovery and Data Mining section. I can remember a time when “data mining” was a bit of an epithet in science (like “fishing expedition”), but now it has become an established way of finding links and connectivities in large data sets. Three new open source data mining programs appeared on our radar recently:
- KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models.
- RapidMiner (formerly YALE) – not much detail is known about this package
- Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
I don’t know how I missed this before, but there’s a really interesting article from 2006 up at the Harvard Business School “Working Knowledge” site. It details some of Karim Lakhani’s results from a paper called “The Value of Openness in Scientific Problem Solving”. The paper itself is actual detailed research on different methods of scientific problem solving that is really worth a read for anyone in the Open Science movement. They went looking to see if “Broadcast Search” (i.e. telling the world what problem you are working on) is an effective means of problem solving. My favorite part of the paper:
Our most counter-intuitive finding was the positive and significant impact of the self-assessed distance between the problem and the solver’s field of expertise on the probability of creating a winning solution. This finding implies that the farther the solvers assessed the problem as being from their own field of expertise, the more likely they were to create a winning submission. We reason that the significance of this effect may be due to the ability of “outsiders” from relatively distant fields to see problems with fresh eyes and apply solutions that are novel to the problem domain but well known and understood by them.
So, I like Radiohead. A lot. Kid A has been in permanent rotation in my music collection for a couple of years now. But their new video for House of Cards is something else entirely. It was generated from 3-D data of Thom Yorke’s face collected via a Geometric Informatics scanning system which uses structured light to capture 3D images at close proximity. There’s an official video, but the best part is the completely interactive data viewer. Try it yourself!
The code I’ve been working on has some cool features. If you give it a list of atoms and bonds, it automatically figures out bend and dihedral interactions using simple graph concepts. That is, if the molecule has a bond between atoms i and j and another bond between atoms j and k, you can easily deduce that there’s a bend interaction between i, j, and k. Similar three-bond ideas can be used to automatically determine dihedral interactions: Find bonds i-j, j-k, and k-l, then you can deduce the torsion for i-j-k-l.
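The graph idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the actual code described in this post: atoms are integer indices, bonds are pairs of indices, and the names `build_adjacency`, `find_bends`, and `find_dihedrals` are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(bonds):
    """Map each atom index to the set of atoms it is bonded to."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return adjacency


def find_bends(bonds):
    """Return (i, j, k) triples: every pair of neighbors of a central atom j."""
    adjacency = build_adjacency(bonds)
    bends = []
    for j in sorted(adjacency):
        for i, k in combinations(sorted(adjacency[j]), 2):
            bends.append((i, j, k))
    return bends


def find_dihedrals(bonds):
    """Return (i, j, k, l) tuples built around each central bond j-k."""
    adjacency = build_adjacency(bonds)
    dihedrals = []
    seen = set()
    for a, b in bonds:
        j, k = min(a, b), max(a, b)
        if (j, k) in seen:
            continue  # skip duplicate listings of the same bond
        seen.add((j, k))
        for i in sorted(adjacency[j]):
            if i == k:
                continue
            for l in sorted(adjacency[k]):
                # l == i would mean a three-membered ring, not a torsion
                if l == j or l == i:
                    continue
                dihedrals.append((i, j, k, l))
    return dihedrals


# Carbon backbone of butane: 0-1-2-3
BUTANE_BONDS = [(0, 1), (1, 2), (2, 3)]

if __name__ == "__main__":
    print(find_bends(BUTANE_BONDS))
    print(find_dihedrals(BUTANE_BONDS))
```

Canonicalizing each central bond as (min, max) means every torsion is emitted once rather than once per direction.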
For out-of-plane bends or improper torsions at the sp2 sites, there’s no simple graph theory way to determine an out-of-plane interaction. You actually need to know something about the chemical identity of the central atom. At least, I think this is the case. I’d love to be proven wrong, because keeping track of valences and bond counts is beyond the level of coding I wanted to include.
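To illustrate why the graph alone is not enough, here is a hedged Python sketch: connectivity can pick out atoms with exactly three bonded neighbors, but an element check is still needed to decide which of those are plausibly planar. The default set `{"C"}` is an assumption for the example (trivalent carbon with all hydrogens explicit), and `find_improper_candidates` is a hypothetical name.

```python
from collections import defaultdict


def find_improper_candidates(elements, bonds, planar_elements=frozenset({"C"})):
    """Return (center, n1, n2, n3) tuples for possible out-of-plane terms.

    elements: list of element symbols indexed by atom number.
    bonds: list of (i, j) index pairs.
    planar_elements: assumed set of elements treated as sp2 when they
        have exactly three neighbors; this is the chemical knowledge
        that the bond graph by itself cannot supply.
    """
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    candidates = []
    for center in sorted(adjacency):
        neighbors = sorted(adjacency[center])
        # Graph test (three neighbors) plus chemistry test (element).
        if len(neighbors) == 3 and elements[center] in planar_elements:
            candidates.append((center, *neighbors))
    return candidates


# Formaldehyde (planar carbon) and ammonia (pyramidal nitrogen)
# have the same star-shaped bond graph.
FORMALDEHYDE = (["C", "O", "H", "H"], [(0, 1), (0, 2), (0, 3)])
AMMONIA = (["N", "H", "H", "H"], [(0, 1), (0, 2), (0, 3)])

if __name__ == "__main__":
    print(find_improper_candidates(*FORMALDEHYDE))
    print(find_improper_candidates(*AMMONIA))
```

The two test molecules are indistinguishable by connectivity, so any difference in the result comes entirely from the element filter.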
Posted in Science, Software | Tagged OOPSE