[TriLUG] OT: Standardized Data Crunching

Tom Roche Tom_Roche at pobox.com
Mon Feb 11 12:51:18 EST 2008


Tom Eisenmenger Mon, 11 Feb 2008 7:35:28 -0800
 > I'm preparing a presentation/proposal exploring the application of
 > the open source philosophy to solving some of the world's more
 > daunting problems (the example I'll use is developing economically
 > low-impact, environmentally high-impact policy using computer
 > modeling).

ICBW, but ISTM this is a kinda large topic for trilug. Feel free to
ping me offline about this; better yet, you might find or set up a
list. Given your (and my) interest in policy, you might try hosting @

http://governmentforge.org/

(which, despite the name, appears to be for public-sector NGOs as well
as governments, and is based on Savannah).

 > One of the more vexing "gotchas" is the lack of awareness about what
 > data is out there "floating around" and its apparent lack of
 > standardization.

Ah, yes: a central catalog of data. Good luck with that :-) Phil
Rhodes is quite correct that RDF et al. provide the means to build
standardized global-scale repositories; unfortunately they don't
provide the funding to do that.

 > A second question is whether or not computer models are available
 > online using some kind of standard API.

I'm betting the answers will depend on your domain of interest, and
what you mean by "available." Mine is climate forcing (what's yours?)
about which I know that

* information about many models is available online, e.g.

http://gctm.acd.ucar.edu/mozart/

* the ability to submit jobs or run those models is generally closed
  (to authorized users)

 > Of course, it could just be that I'm naive but having a variety of
 > computer models, but it seems to me that providing public data in a
 > standardized format (XML would probably be fine) and, more
 > importantly, providing computer modeling horsepower (of a wide range
 > of sophistication) through a standardized API, all freely available
 > on the net, would be a no-brainer.

OK, I'll say it: you're naive :-) Not that this isn't a good
idea--it's a great idea, it's just not a no-brainer. There's a
tremendous "legacy" out there, of both tools and data. Just providing
secure open access to what we have now would be a helluva lotta
work--hell, just documenting how to access all that would be a
challenge, given that, always and everywhere, Documentation Comes
Last--and that's gonna require some resources. Then, on top of that,
you want to

* *standardize* access to those tools and data. Try standardizing
   authentication first! Talk to anyone who's ever worked on SSO: it
   ain't easy.

* provide hardware on which to run those tools on that data, and
  generate new data

... we're talking long-term financial commitment. To which the
obvious rejoinder is to point to, e.g., Google/gmail or SourceForge:
free services and hardware to run them on. My response to that is

* Google et al pay for these services by selling advertising. I'm not
  thinking that would work in the modeling/policy domain. Am I missing
  something?

* there's no standard for interoperation even between free mail
  suppliers, and email is a pretty narrow domain compared to all
  scientific data.

Net: great idea, and I'm interested in work that you et al are doing
on this, but IMHO this is, literally, the work of a lifetime. OTOH,
make it happen and you'll be covered in glory. (Or something :-)

FWIW, Tom Roche <Tom_Roche at pobox.com>



