[TriLUG] OT: Standardized Data Crunching
timjowers at gmail.com
Tue Feb 12 10:16:39 EST 2008
FWIW, from a different viewpoint, business integration efforts typically
fail as the data model is extremely entrenched in the business operation.
E.g. Accenture recently failed to deliver on integrating four order systems
extant in a major telco because they tried to use an industry standard XML
schema and conform the extant systems into it. Turns out, the four systems
were from the merger of four different companies and encoded a lot of
information about how circuits are installed and configured and I believe
were more expressive than the BOML or whatever the industry standard XML
was. I'm not sure if the failure was due to lack of data expressiveness or
simply an inability to integrate.
My point is that standardization committees often fall short of the real
needs of the customer/app. Data sharing then becomes only surface-level.
I'm also familiar with standard schemas in the auto insurance industry
(ACORD) and in healthcare (HIPAA EDI). Again, each business applies the
schema in a special way. Sure, an ontology can help, but a lot of businesses
have profitable reasons for NOT conforming. E.g., the insurance industry as a
practice tries to reject claims. By having special rules (like dependent DOB
being required even if the service is for the primary insured - a real rule
by one insurance company we encountered), they can reject claims more often.
The longer they can delay paying, the larger their profit on the float.
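To make the point concrete, here is a minimal sketch of how such a carrier-specific "special rule" rejects an otherwise-valid claim. All field names and the rule engine itself are invented for illustration; this is not any real insurer's or clearinghouse's API.

```python
# Hypothetical sketch of insurer-specific claim validation. Field names
# (service_date, subscriber_id, dependent_dob, ...) are invented.

def validate_claim(claim: dict) -> list[str]:
    """Return a list of rejection reasons (empty list = claim accepted)."""
    errors = []
    # Baseline rules any payer would apply: required identifying fields.
    for field in ("service_date", "subscriber_id"):
        if not claim.get(field):
            errors.append(f"missing {field}")
    # Carrier-specific quirk: require the dependent's DOB even when the
    # service is for the primary insured -- claims lacking it get rejected.
    if claim.get("patient_relationship") == "self" and not claim.get("dependent_dob"):
        errors.append("dependent_dob required (carrier-specific rule)")
    return errors

claim = {"service_date": "2008-02-12", "subscriber_id": "A123",
         "patient_relationship": "self"}
print(validate_claim(claim))  # the quirky rule fires even for the primary insured
```

A claim that is complete by any reasonable standard still bounces, which is exactly why a shared schema alone does not guarantee interoperability: the rejection logic lives outside the schema.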
Likewise, ACORD is basically information on your driver record, but we found
the actual ontology was data-dependent. This is no different from the
non-normalized databases you commonly run into at large companies. E.g.,
Oracle's OID and OITType fields are even part of Oracle's base tables!
That's not 2NF. It's the special cases that make the problem extremely hard.
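The generic "object id plus object type" pattern mentioned above can be sketched as follows. The table and column names here are invented to illustrate the anti-pattern, not taken from Oracle's actual catalog tables.

```python
# Sketch of the generic "object id + type discriminator" pattern: one table
# whose rows mean different things depending on a type column. Names invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE objects (
        oid      INTEGER,
        oid_type TEXT,     -- discriminator: 'CUSTOMER', 'POLICY', ...
        attr1    TEXT,     -- meaning depends on oid_type (a name? a policy no.?)
        attr2    TEXT      -- ditto: the schema itself can't say what this is
    )
""")
conn.execute("INSERT INTO objects VALUES (1, 'CUSTOMER', 'Alice', 'NC')")
conn.execute("INSERT INTO objects VALUES (2, 'POLICY', 'HOME-7', '250000')")

# Every query must re-encode the special cases the schema failed to capture:
rows = conn.execute(
    "SELECT attr1 FROM objects WHERE oid_type = 'CUSTOMER'").fetchall()
print(rows)
```

Because attribute meaning depends on `oid_type`, the non-key columns are not functionally dependent on the whole key in any single sense, and every consumer of the data must carry the decoding rules around with it.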
So, I second the opinion that a business case needs to be made for data sharing.
Even with federally mandated formats like HIPAA (some might call it an
ontology), the interpretation and application differ. There are also many,
many legal issues with data standards. For instance, insurance policy
repricing (raising the rates) is controlled by limits set by the state's
insurance commission; so, repricing reasons might be susceptible to lawsuit
if these are published. (Of course, one major home insurer who advertises on
TV a lot has their policy holders sign a form releasing them from the limits
set by the insurance commission; so, when I checked my bill last year it had
gone up about 25%!)
On a final note, consider the ICD (International Classification of Diseases)
codes used by the medical industry. These are a grab bag of symptoms, causes,
and diseases, but they are treated as diseases in most data implementations
(hospital systems, practice management systems). The utter lack of a CompSci
audit has left the healthcare industry making crap diagnoses, such as coding
"Chronic Fatigue" as a disease when in fact it is a symptom. The World
Health Organization defines these codes but obviously lacks basic computer
science training. The rule of efficient markets does not apply to healthcare. So,
the definition of data standards is so very, very important, as today
we have industries doing unacceptable data management and exhibiting
almost NO effort at improvement. The governmental organizations have taken
a stab at it but really need both industry experts and qualified computer
scientists. Overall, I'd say HIPAA as a data format was a success, as it
both allows claims and insurance information management and also lets the
government track healthcare activity and issues almost instantaneously.
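The ICD point above can be sketched as a data-modeling fix: record what kind of thing a code denotes instead of treating every entry as a disease. The specific codes and their symptom/disease assignments below are illustrative, not an authoritative reading of the ICD.

```python
# Sketch: a clinical-code model that carries a "kind" field, so a system
# can distinguish symptoms from diseases. Code values are illustrative.
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    DISEASE = "disease"
    SYMPTOM = "symptom"
    CAUSE = "external cause"

@dataclass(frozen=True)
class ClinicalCode:
    code: str
    label: str
    kind: Kind

CODES = [
    ClinicalCode("R53", "Malaise and fatigue", Kind.SYMPTOM),
    ClinicalCode("E11", "Type 2 diabetes mellitus", Kind.DISEASE),
]

# A system that knows the kind can refuse to record a symptom as a diagnosis:
def diagnosable(c: ClinicalCode) -> bool:
    return c.kind is Kind.DISEASE

print([c.code for c in CODES if diagnosable(c)])
```

Most deployed systems flatten this distinction into a single "diagnosis code" column, which is exactly the audit gap complained about above.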
Anyway, just some thoughts from a non-GIS perspective.
President, United SWE, Inc.
On Feb 12, 2008 9:52 AM, James Tuttle <jjtuttle at trilug.org> wrote:
> It's been my experience working with processing geospatial data over the
> course of a three year cooperative partnership under the Library of
> Congress' National Digital Information Infrastructure Preservation
> Program that while XML is often seen as the answer to data preservation,
> reuse, and portability, it isn't. For example, ESRI, the 800 lb
> gorilla in GIS software, created a vastly complicated format called the
> geospatial database. They sort of implemented an XML export feature to
> aid portability. It's become pretty apparent that a pile of XML is
> pointless without extensive documentation and software agents that
> understand it.
> Apart from the issue of well documented formats, we hit a wall in our
> ability to automate the processing, metadata extraction, and archive
> ingest handling of geospatial data over less technical issues. We'd
> love to see layer naming conventions, file naming conventions, and
> mandatory embedding of all referenced objects. This will probably never
> happen.
> My project, the North Carolina Geospatial Data Archiving Project, has
> gained a lot of traction specifically because we don't impose any rules
> on data contributors. Another NDIIPP project, a joint effort by
> Stanford and UC Santa Barbara, has taken a different approach, I think,
> and asked that all data be accompanied by valid and accurate metadata
> and be packaged in a documented way. I think they've had some
> disappointments regarding participation.
> One of the goals of the NDIIPP is to start developing a registry of
> data. They're also interested in the experiences of data curators
> regarding collection, preservation, migration, and providing access to
> various types of data. The most recent NDIIPP funding stage is a
> multistate initiative you may find interesting. See the press release
> at http://www.loc.gov/today/pr/2008/08-004.html It's just getting
> going, but looks to be useful.
> Maybe I'm totally off-topic. If so, I apologize.
> Tom Eisenmenger wrote:
> > I'm preparing a presentation/proposal exploring the application of the
> open source philosophy to solving some of the world's more daunting problems
> (the example I'll use is developing economically low-impact, environmentally
> high-impact policy using computer modeling).
> > One of the more vexing "gotchas" is the lack of awareness about what
> data is out there "floating around" and its apparent lack of
> standardization. A second question is whether or not computer models are
> available online using some kind of standard API. Any and all suggestions,
> tips, etc. would be very much appreciated. If you are aware of initiatives
> launched to build a networked community of datasets and models communicating
> through a standardized API, that would be a goldmine. If not, your opinions
> as to why not would be appreciated!
> > Of course, it could just be that I'm naive, but it seems to me that
> > providing public data in a standardized format (XML would probably be
> > fine) and, more importantly, providing computer modeling horsepower (of a
> > wide range of sophistication) through a standardized API, all freely
> > available on the net, would be a goldmine.
> ---Jim Tuttle
> PGP Key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x69B69B08
> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ : http://trilug.org/faq/
> TriLUG Member Services FAQ : http://members.trilug.org/services_faq/