Articles on and examples of using Scientio technology.


Introducing Concept Strings

May 29, 2008
A big research interest at Scientio has been what we call Concept Strings. These are data structures that represent the content of an ordinary textual string.
For each word, the data structure holds both Part of Speech (POS) information and the associated concepts found for that word, stored in a two-dimensional sparse array.
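To make the shape of such a structure concrete, here is a minimal sketch in Python. It is an illustration only, not Scientio's implementation: the class name ConceptString, its methods, the POS tags, and the concept labels and weights are all assumptions made up for this example. Rows are word positions, columns are concept labels, and only non-empty cells are stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ConceptString:
    """Hypothetical sketch: one row per word, sparse columns for concepts."""
    words: List[str] = field(default_factory=list)
    pos_tags: List[str] = field(default_factory=list)
    # Sparse 2-D array: (word_index, concept_label) -> weight.
    # Cells that are absent are treated as empty.
    cells: Dict[Tuple[int, str], float] = field(default_factory=dict)

    def add_word(self, word: str, pos: str, concepts: Dict[str, float]) -> int:
        """Append a word with its POS tag and concepts; return its row index."""
        row = len(self.words)
        self.words.append(word)
        self.pos_tags.append(pos)
        for concept, weight in concepts.items():
            self.cells[(row, concept)] = weight
        return row

    def concepts_at(self, row: int) -> Dict[str, float]:
        """Return the stored concepts for the word at the given row."""
        return {c: w for (r, c), w in self.cells.items() if r == row}


# Usage: the concept labels and weights below are invented for illustration.
cs = ConceptString()
cs.add_word("The", "DET", {})
cs.add_word("cat", "NOUN", {"FELINE": 1.0, "PET": 0.6})
cs.add_word("sat", "VERB", {"SIT": 1.0})
print(cs.pos_tags)
print(cs.concepts_at(1))
```

A dictionary keyed by (row, column) is only one way to hold a sparse array; it keeps the sketch short and stores nothing for words, such as 'The', that carry no concept.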
 
Our research is driven by the realisation that there are common structures in two of the products we are currently working on: ScientioBot and our Sentiment mining service.
 
Both take short pieces of text and look up a response of some kind. The current implementations of the two are very different: ScientioBot looks up words with a tree-based mechanism derived from the original AIML implementation, whereas our current Sentiment mining service uses naive Bayes and 'bag of words' approaches.
 
Our plan is to use a single representation of text and a single matching mechanism for both of these applications and, we hope, many more.
 
So why bother with concept strings?
 
Well, it's what you can do with them.
For instance, these two sentences:
 
'The cat sat on the mat'
'The feline squatted on the carpet'
 
are very different lexically, but identical as far as concept strings are concerned.
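As a rough sketch of why that is, assume a toy lexicon that maps each word to a (POS, concept) pair. The lexicon, the concept labels, and the function name to_concepts are all hypothetical and exist only for this example; a real system would draw on a far larger concept database.

```python
# Toy word -> (POS tag, concept label) lexicon. Entirely hypothetical.
LEXICON = {
    "the": ("DET", "DEFINITE"),
    "cat": ("NOUN", "FELINE"),
    "feline": ("NOUN", "FELINE"),
    "sat": ("VERB", "SIT"),
    "squatted": ("VERB", "SIT"),
    "on": ("PREP", "ON_TOP_OF"),
    "mat": ("NOUN", "FLOOR_COVERING"),
    "carpet": ("NOUN", "FLOOR_COVERING"),
}


def to_concepts(sentence):
    """Map each word of a sentence to its (POS, concept) pair."""
    return [LEXICON[word] for word in sentence.lower().split()]


first = "The cat sat on the mat"
second = "The feline squatted on the carpet"

# The word sequences differ, but the concept sequences are the same.
print(first.lower().split() == second.lower().split())
print(to_concepts(first) == to_concepts(second))
```

Under this toy lexicon the two sentences differ as word sequences yet produce the same sequence of (POS, concept) pairs.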
 
Similarly, if you try to measure the distance between two textual strings, you end up with a fairly meaningless measure such as the number of letters in common or the edit distance, whereas for concept strings you get an estimate of how far apart the two strings are in meaning.
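The contrast can be sketched as follows. This is one possible approach under stated assumptions, not the measure we use: it applies a standard Levenshtein edit distance first to characters and then to sequences of concepts, with a reduced substitution cost when two concepts share a parent. The word-to-concept table, the parent table, and the cost values 0.5 and 1.0 are all invented for illustration.

```python
def edit_distance(a, b, sub_cost=lambda x, y: 0.0 if x == y else 1.0):
    """Levenshtein distance over any two sequences, with a pluggable substitution cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [float(i)]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1.0,                 # delete x
                            curr[j - 1] + 1.0,             # insert y
                            prev[j - 1] + sub_cost(x, y)))  # substitute x -> y
        prev = curr
    return prev[-1]


# Hypothetical word -> concept table and concept -> parent table.
CONCEPT_OF = {"cat": "FELINE", "feline": "FELINE", "dog": "CANINE",
              "sat": "SIT", "squatted": "SIT",
              "mat": "FLOOR_COVERING", "carpet": "FLOOR_COVERING"}
PARENT = {"FELINE": "ANIMAL", "CANINE": "ANIMAL"}


def concept_cost(x, y):
    """0 for the same concept, 0.5 for sibling concepts, 1 otherwise (assumed values)."""
    if x == y:
        return 0.0
    if x in PARENT and PARENT.get(x) == PARENT.get(y):
        return 0.5
    return 1.0


def concept_distance(s1, s2):
    """Edit distance over concept sequences; unknown words stand for themselves."""
    c1 = [CONCEPT_OF.get(w, w) for w in s1.lower().split()]
    c2 = [CONCEPT_OF.get(w, w) for w in s2.lower().split()]
    return edit_distance(c1, c2, concept_cost)


print(edit_distance("cat", "feline"))  # character level
print(concept_distance("The cat sat on the mat", "The feline squatted on the carpet"))
print(concept_distance("the cat sat", "the dog sat"))
```

At the character level 'cat' and 'feline' share no letters, so they are as far apart as two words of those lengths can be, while at the concept level the two example sentences are at distance zero and swapping 'cat' for 'dog' costs only a partial substitution.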
 
We're working on tools to build a database of concept strings and to locate substrings within it efficiently.
Watch this space.
 
 
 
Posted by Andrew Edmonds
Tags: ConceptMine
