For several years Scientio has been creating class libraries that implement complex functionality. Before buying these products, potential purchasers want to evaluate them, but because the libraries are complex there are plenty of pitfalls for the unwary. It does us no good if a potential purchaser downloads a demo, runs it on a machine with too little memory, or with the wrong kind of data or the wrong settings, and then comes to an unfair conclusion about our software. We're also very proud of our intellectual property, which represents a huge investment, so giving customers .NET and Java versions that are easily reverse-engineered is not the best way to guard it.
Enter web services. Scientio has taken the strategic decision to offer free web services for all our products. On our sister sites http://www.metarule.com and http://www.chaoskit.com you can access web services for Xml Miner and Chaoskit. You will shortly be able to access ConceptMine here.
The reasons are twofold. Allowing students, academics, hobbyists and casual users to use the services can only help to publicize our products. Allowing potential commercial users to experiment with a web site offers a controlled way for them to try out solutions, while giving us vital feedback on usage patterns and scenarios.
It's win/win. We don't even care if you use the services as part of a commercial mashup. If your mashup becomes popular you're bound to want a more reliable solution than an open web service, and we'll be here to sell you one.
So, we're in the middle of testing an open web service. It uses the db4o object database for back-end storage and can handle multiple users, each with multiple corpora. The service doesn't store the documents themselves, of course; it stores only the indexes you supply for your documents and the signatures generated from them. We're planning to add indexes for major research corpora such as the WT10G and the Reuters data set. Initial trials on reasonably large sets show the expected O(log n) lookup time and the expected linear document processing time. We expect to have the system running with large sets like the WT10G by the new year.
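To make the storage model concrete, here is a minimal Python sketch of the idea: several users, each with several corpora, where only the index you supply and a signature derived from the document are kept. It is an illustration under stated assumptions, not the service's implementation. The names `SignatureStore`, `Corpus` and `toy_signature` are hypothetical, the SHA-256 digest is only a stand-in for the real signature, and we assume here that lookup is by the supplied index, using a binary search to get O(log n) behaviour.

```python
import bisect
import hashlib
from collections import defaultdict


def toy_signature(text: str) -> str:
    """Placeholder signature (SHA-256 digest). NOT the real signature algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Corpus:
    """Keeps (index, signature) pairs sorted by index; the text is never stored."""

    def __init__(self):
        self._keys = []        # user-supplied document indexes, kept sorted
        self._signatures = []  # signature at the same position as its index

    def add(self, doc_index: str, text: str) -> None:
        sig = toy_signature(text)  # work proportional to document length
        pos = bisect.bisect_left(self._keys, doc_index)
        if pos < len(self._keys) and self._keys[pos] == doc_index:
            self._signatures[pos] = sig  # same index again: replace signature
        else:
            # list.insert is O(n); a tree-based index would avoid that cost
            self._keys.insert(pos, doc_index)
            self._signatures.insert(pos, sig)

    def lookup(self, doc_index: str):
        """Binary search over the sorted indexes: O(log n) comparisons."""
        pos = bisect.bisect_left(self._keys, doc_index)
        if pos < len(self._keys) and self._keys[pos] == doc_index:
            return self._signatures[pos]
        return None


class SignatureStore:
    """Multiple users, each owning multiple named corpora."""

    def __init__(self):
        self._users = defaultdict(dict)

    def corpus(self, user: str, name: str) -> Corpus:
        # Create the corpus on first use, return the existing one afterwards
        return self._users[user].setdefault(name, Corpus())


store = SignatureStore()
papers = store.corpus("alice", "papers")
papers.add("doc-2", "Second document text")
papers.add("doc-1", "First document text")
found = papers.lookup("doc-1")    # signature for doc-1
missing = papers.lookup("doc-9")  # None, since doc-9 was never added
```

In this sketch the document text is discarded as soon as its signature has been computed, which mirrors the point above that the service keeps indexes and signatures rather than documents.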