One interesting question that I often think about is the application of ideas from data mining / AI to the wider world. Obviously, for good or ill, data mining has a lot of impact on our lives. Whether we get that loan, get targeted with those mail shots or get offered a particular set of products online is determined by data mining.
But what can discoveries in data mining do to improve the external world? One obvious area to me is the general topic of governance, be it big companies or big government. The systems that are put in place are much like the systems of rules generated by XmlMiner, or that old GP program I designed called Darwin. They are all sets of rules that are incrementally added to, whether they are a corpus of laws or company policies.
The thing that is interesting about many kinds of machine learning is that they incrementally add complexity. So during a training run a neural net algorithm might add more neurons, XmlMiner will add more sub-clauses, and a GP-based system will add more mathematical operators. So they all act as a kind of mini lab for policymakers, not in specifics, but in general characteristics.
All these systems exhibit very similar behaviour as they add complexity, if you measure their performance both on the data they are training on and on other data (the test set) drawn from the same system. Here is a typical graph.

As you can see, the error on the training set keeps going down, but the performance on the fresh data improves and then at some point starts to get worse.
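The shape described above can be reproduced with a toy experiment. The following is only a sketch under stated assumptions, not anything taken from XmlMiner or Darwin: the "system" is a noisy sine curve, the model is a simple binned average, and "adding complexity" means doubling the number of bins. All function names here are hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Noisy samples of a smooth curve on [0, 1). The curve and noise level are assumptions."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def fit_bins(xs, ys, n_bins):
    """Model: the mean training target inside each of n_bins equal-width bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Bins that saw no training data fall back to the overall mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def mse(model, xs, ys):
    """Mean squared error of a binned model on the given points."""
    n_bins = len(model)
    total = 0.0
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        total += (model[b] - y) ** 2
    return total / len(xs)


def complexity_sweep(seed=0, n_train=40, n_test=400, max_doublings=7):
    """Return (n_bins, train_error, test_error) for 1, 2, 4, ... bins."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)
    rows = []
    for k in range(max_doublings + 1):
        n_bins = 2 ** k
        model = fit_bins(*train, n_bins)
        rows.append((n_bins, mse(model, *train), mse(model, *test)))
    return rows


if __name__ == "__main__":
    for n_bins, train_err, test_err in complexity_sweep():
        print(f"{n_bins:4d} bins  train={train_err:.3f}  test={test_err:.3f}")
```

Because each doubling only splits existing bins, the training error can never go up. The test error has no such guarantee, and in runs like this it typically falls for a while and then climbs as the bins start fitting noise.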
This shouldn’t come as any surprise. William of Occam, in the 14th century, is credited with the maxim “Entia non sunt multiplicanda praeter necessitatem”. Basically, this says that elements in an argument should not be expanded unnecessarily or, put another way, that the simplest argument that fits the facts is most likely to be true.
Albert Einstein is often credited with much the same sentiment about theories in physics. I know of formal arguments for this idea from Bayesian statistics, information theory and number theory. I have little doubt that it applies to all systems that add complexity incrementally.
(Incidentally, XmlMiner has several strategies to terminate training at or near this optimum!)
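One common way to stop near that optimum is to watch the error on held-out data and give up once it has stopped improving. The sketch below is a generic illustration of that idea and is not a description of XmlMiner's own strategies; the function name early_stop and the patience parameter are assumptions made for the example.

```python
def early_stop(validation_errors, patience=2):
    """Scan held-out errors in order of increasing complexity.

    Stops once `patience` consecutive steps fail to beat the best error
    seen so far, and returns (best_step, best_error).
    """
    best_step, best_error = None, float("inf")
    waited = 0
    for step, err in enumerate(validation_errors):
        if err < best_error:
            best_step, best_error = step, err
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break  # further complexity is not paying off
    return best_step, best_error


if __name__ == "__main__":
    errors = [0.9, 0.6, 0.5, 0.55, 0.6, 0.4]
    print(early_stop(errors, patience=2))  # → (2, 0.5)
```

Note the trade-off in the example: with a patience of 2 the scan stops before it ever sees the later value of 0.4. A larger patience risks wasted effort, a smaller one risks stopping too soon.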
In the domain of company policy or government, the training set represents all the problems the policymakers think they are solving or trying to solve. The test set represents all the rest of the world: the things they haven’t thought of, and the things they don’t want to impact.
So what does this mean to policymakers? Well, I guess the simplest reading might be: “If you keep on tinkering you’re going to make it worse.” It means that lawmakers should have sunset clauses on all new laws, and that every new law added should result in at least one being removed. It means that complex tax systems are at some point self-defeating, that company rule books should be as simple as possible, and that too much complexity will kill profits.
What this can’t help with is determining at what point a system becomes too complex. This will differ from one system to another.
All over the world there are governments busily adding to the rules that bind their citizens: in the UK, for instance, with its ever-increasing tax system complexity, and in the EU, with the enormous rule book created to govern every area of commerce. In many cases adding rules is how they justify their existence; they don’t feel they are doing their jobs unless they add to the corpus of laws. At some point their work becomes counter-productive. They should be aware of this.