[TriLUG] OT: Standardized Data Crunching

James Tuttle jjtuttle at trilug.org
Tue Feb 12 09:52:20 EST 2008


It's been my experience, processing geospatial data over the course of a
three-year cooperative partnership under the Library of Congress'
National Digital Information Infrastructure and Preservation Program,
that while XML is often seen as the answer to data preservation, reuse,
and portability, it isn't.  For example, ESRI, the 800 lb gorilla in GIS
software, created a vastly complicated format called the geodatabase.
They sort of implemented an XML export feature to aid portability.  It's
become pretty apparent that a pile of XML is pointless without extensive
documentation and software agents that understand it.
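
Just to illustrate that last point, here's a rough sketch in Python (the
file name and the element names in the comments are invented, not ESRI's
actual export schema) of about all you can do with such an export when
there's no documentation:

import xml.etree.ElementTree as ET

# Sketch only: walk an undocumented XML export and report what's there.
# Parsing is the easy part; interpretation is the hard part.
tree = ET.parse("export.xml")      # hypothetical export file
for elem in tree.iter():
    # You can list every tag and attribute in the file...
    print(elem.tag, sorted(elem.attrib))
# ...but deciding what any of it means (projections, feature classes,
# relationships between objects) still requires vendor documentation or
# software that already understands the format.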

Apart from the issue of well-documented formats, we hit a wall in our
ability to automate the processing, metadata extraction, and archive
ingest handling of geospatial data for less technical reasons.  We'd
love to see layer naming conventions, file naming conventions, and
mandatory embedding of all referenced objects.  This will probably never
happen.
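
To make the point concrete: if contributors followed even a simple file
naming convention, an ingest script could pull basic metadata straight
from the name.  A hypothetical sketch in Python (the pattern below is
invented for illustration, not an actual NCGDAP rule):

import re

# Hypothetical convention: <county>_<theme>_<yyyymmdd>.<extension>,
# for example wake_parcels_20080115.shp
NAME_PATTERN = re.compile(
    r"^(?P<county>[a-z]+)_(?P<theme>[a-z]+)_(?P<date>\d{8})\.(?P<ext>\w+)$")

def describe(filename):
    """Return metadata parsed from the file name, or None if the name
    doesn't follow the (hypothetical) convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

print(describe("wake_parcels_20080115.shp"))
print(describe("final_FINAL_v2.shp"))  # the kind of name we actually receive

Without anything like that in place, someone has to look at each
contributor's data by hand before we can extract metadata or ingest it.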

My project, the North Carolina Geospatial Data Archiving Project, has
gained a lot of traction specifically because we don't impose any rules
on data contributors.  Another NDIIPP project, a joint effort by
Stanford and UC Santa Barbara, has taken a different approach, asking
that all data be accompanied by valid and accurate metadata and be
packaged in a documented way.  I think they've had some disappointments
regarding participation.

One of the goals of the NDIIPP is to start developing a registry of
data.  They're also interested in the experiences of data curators
regarding collection, preservation, migration, and providing access to
various types of data.  The most recent NDIIPP funding stage is a
multistate initiative you may find interesting.  See the press release
at http://www.loc.gov/today/pr/2008/08-004.html for details.  It's just
getting going, but it looks to be useful.

Maybe I'm totally off-topic.  If so, I apologize.

Jim
Tom Eisenmenger wrote:
> I'm preparing a presentation/proposal exploring the application of the open source philosophy to solving some of the world's more daunting problems (the example I'll use is developing economically low-impact, environmentally high-impact policy using computer modeling).  
> 
> One of the more vexing "gotchas" is the lack of awareness about what data is out there "floating around" and its apparent lack of standardization.  A second question is whether or not computer models are available online using some kind of standard API.  Any and all suggestions, tips, etc. would be very much appreciated.  If you are aware of initiatives launched to build a networked community of datasets and models communicating through a standardized API, that would be a goldmine.  If not, your opinions as to why not would be appreciated!  
> 
> Of course, it could just be that I'm naive, but it seems to me that having a variety of computer models, providing public data in a standardized format (XML would probably be fine) and, more importantly, providing computer modeling horsepower (of a wide range of sophistication) through a standardized API, all freely available on the net, would be a no-brainer.  


-- 
--
---Jim Tuttle
------------------------------------------------------
http://www.braggtown.com
PGP Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x69B69B08



