Is DATA the first step in intelligent and semantic applications?
The old order is rapidly changing, and a battle is outside raging... The most serious disconnect of the last 10-12 years is a restrictive term called RDBMS (relational database blah), which developers seem to love but business users cannot quite relate to. Business users like their data surrounded by words (English or otherwise). They like visible context and meaning. As a species, they understand this phraseology and exchange data and text this way. They often add it to atomic data after some local computing in Excel. They often keep highly intelligent logic in little bits of Excel (the kind of pruning intelligence that we software folks can never imagine machines doing). In reality, business users want to feed those intelligent Excel macros, not deal with some dumb applications that just seem to get and set data.

Yet, given those same contextual and meaningful text sources (which other business users have flavored and exchanged over the web), most development efforts treat the text as an irritant on the route to intelligent applications, and end up spending the most time and effort on making it clean and usable for business users. Paradoxical. In short, business users want unstructured (or semi-structured) data, but to developers such text is not data. Put simplistically, developers have learned structures but they haven't learned what runs the business. I think little pieces of semantic text run the business. I may be quite alone and wrong in thinking so.

As we enter Web 2.0 (which, to my mind, should stand for a semantic web, not the current mish-mash of social networking with nice visuals) - the ugliness of RDBMS becomes more prominent. Business intelligence projects suffer as 80% of the effort is nothing but a re-alignment of atomic pieces of data. Enter into this story two interesting technologies - Google Gears and Silverlight.
So here now, yon business analyst, take a small chunk of my large centralized data onto your little desktop, and do thy specialized reports stuff. Or here, take my diverse web services and create your own mash-up. The trouble is, the business analyst wants the whole world, not your tiny dataset that has some reference data and some company operations data. Your true business analyst wants the works: the educational profile of your Directors, their experience in a particular field, the tiny details of their compensation, notes. But if you ask him or her today what data is needed, the answer is "I don't know".

Exit database developers. Enter text mining. OK, so you want this flavor, that flavor, and want to mix them up. Here it is, an Excel sheet full of it. Maybe it is still not as pristine as an RDBMS, but you just snip and tailor it a bit. The central database stays, of course, but you also use these little flavorings.

That is what this little text mining tool is for. It is by no means a finished piece, but it does its job on any semi-structured web document with a little configuration effort. I suspect everyone has a bespoke web scraper or miner at hand, but there are few general-purpose ones. They either go off into statistical high notes, or they try to match the HTML tags. Neither truly works in the general-purpose case. I think the proof of the technology pudding is in the eating, when all is said and done. And we probably ought never to forget what Tim Berners-Lee had to say about the semantic web....