2000-09-28: I guess my views were too pessimistic, so I have decided to withdraw this text. Automatic Filtering appears much more real now than it did 4 years ago.


The following are my thoughts, inspired by discussions with Alexander Chislenko and by his paper "Automated Collaborative Filtering" in late 1996.



Forecasts - relevance and reliability

It is wrong even to try to decode a signal when the noise is stronger than the signal. This is more or less the case with the future: the noise increases exponentially with time, while the interesting signals usually decline.

It is especially wrong to try to predict the behavior of white noise. Everyone would gladly agree with this; the problem is that not everyone agrees on whether the signal in question is white noise or not.

My main claim is that quantitative development is a (more or less) predictable signal, while qualitative development is (more or less) white noise. This means that the statement "the computing power of personal computers will double every 18 months for the rest of the decade" (or, to sound more futuristic, for the rest of the millennium) is a meaningful statement whose veracity we can usefully discuss, while the statement "we will be able to replace object transmission with information transmission" is meaningless, because its realization will require a true scientific as well as technological breakthrough. This is not to imply that it is impossible. Quite the opposite: the concept of a "3d printer" seems to be quite popular in futurist circles, and it sounds at least as realistic as any other concept they play with. In a way, such machines are a reality already: CAD/CAM systems do just that. You draw a part or an assembly on your workstation screen and then send the electronic data to the "intelligent machines" that produce whatever you designed. The problem is that the last step, replacing enormous plants stacked with high-power equipment by a small "3d printer" on your desk, will require an enormous breakthrough in science and technology, and you cannot predict when it will happen.

It is quite easy to say: "Look, newspapers are all but obsolete now: we can download the news from the Internet and print it on our laser printer; similarly, in 10 years (or in a couple of years, depending on the degree of the speaker's optimistic intoxication) we will download furniture and print it out instead of having it delivered to the door by a truck." My answer is simple: one has to know some physics to understand the complexity of the problem. I have yet to hear an optimistic opinion on the issue from a professional (although I do have a PhD in Math, I do not claim to be a specialist in the subject).

It is even easier to recall two famous episodes. One is a German professor telling Max Planck not to major in theoretical physics because it was a finished subject, where everything had already been done and perfected, and nothing of any interest could ever happen. (I assume here that you know who Max Karl Ernst Ludwig Planck (1858-1947) was. If you don't, go back to high school!) The other is an American physicist writing a paper proving the impossibility of heavier-than-air flight a couple of months before the Wright brothers flew their airplane.

The answer to this follows the infamous "Pot Principle". The joke goes like this:

A neighbor X goes to his neighbor Y and complains that the pot he lent Y was broken when Y returned it. Y replies: First, I did not borrow the pot from you. Second, when I took the pot, it was already broken. Third, when I returned the pot to you, it was not broken.

Therefore I reply: first, only the failed negative predictions become famous (while the correct ones abound and are happily forgotten), and, second, these predictions were correct.

The first argument is the flip side of the well-known fact that only successful predictions become famous, while failed predictions abound and are never mentioned. This is true for positive predictions. For negative predictions the situation is reversed: there are zillions of fulfilled negative forecasts, and only the failed ones are ever noticed. This is not a complaint that optimists are treated better than pessimists, but an attempt to put the situation into proper perspective: one should remember these facts when looking at a prediction.

The second statement is even simpler: those two much-derided professors are simply misquoted. They meant (in the case of Planck's mentor) or explicitly said (in the case of heavier-than-air flight) that their statement was predicated on the premise that "nothing new will come up", such as the Michelson-Morley experiment, or internal combustion engines and aluminum.

And this is exactly my point: UNLESS SOMETHING NEW COMES UP, a "3d printer" is just as fantastic as time travel. By something new I mean a breakthrough, and the timing of true breakthroughs (as opposed to "commercial" ones like HDTV and fake ones like Windblows 95) is and always has been white noise. In the last 400 years the intensity of that white noise has been much higher than during the preceding millennia, but true breakthroughs have remained, and in my humble opinion will remain, white noise.

ACF - Automated Collaborative Filtering

Semantic filtering is a nice and important thing, and well worth any effort in terms of time and money, just like almost any meaningful (and not so meaningful) research. That said, the Pot Principle applies here too, in the following form: First, it is unlikely to be developed to the point of being of much practical use on the scale it is being advertised for; and, Second, if it is ever worth even 10% of what it is being billed for now, it will bring more trouble (first and foremost, in terms of lost Freedom and Privacy) than benefits. If you think you have found a contradiction in the above paragraph, read it again!

"Filtering Systems", "Semantic Transport", "Semantic Exchange Protocols"... Sounds very nice indeed. Who would refuse an opportunity to use a WWW search engine that finds what you want, not what you asked for? Or, say, a personal organizer that can find something useful when you ask for "you know, I mean, that thing, ughm, that he, what was his name, told me to look at, it was supposed to be about that thing, everyone seems to be crazy about it nowadays"? There is a subtle problem here.

Slaves of our own Tools

As technology progressed from the stone axe to the computer, we acquiesced in using tools beyond our understanding. This is unfortunate, but there is nothing intrinsically bad about it: after all, we have retained all our decision-making powers and faculties. But now we are being prodded into relegating our decision-making to "intelligent tools". And this is decision-making in the full sense of the word, because if you give up the power to control the information you receive, you effectively give up your independence in making decisions.

What is under the hood?

Although you cannot fix your car, and probably don't have a very clear idea of how it works, it is still important to you that you can open the hood, look at the engine and maybe, with some help from a professional, find out what is wrong. The fact that the hood can be opened at any time is the best possible guarantee against fraud: you don't even think that someone might put something there that you don't want. If your car dealer or manufacturer had put something bad there, like an illegal drug disposer or a radio transmitter that lets someone track your location, you can be confident that some mechanic or just a techno-geek would notice it (this is purely imaginary, of course, but it is imaginary precisely because your hood is not sealed by the manufacturer).

Now look at "commercial" software. It comes in binary format, which a human cannot read directly without a pre-processing step called "disassembling", and software vendors are working on making that illegal. How do you know it does only what you want? It is a well-known episode that, at registration time, the early "beta" version of MS Win95 sent out the directory structure of your hard drive. You may call me paranoid (which I am not! AM NOT!!! AM NOT!!! :-) but I cannot completely trust software for which I do not have the source code. Not that I will necessarily read all the sources myself (just as I will not explore the insides of my car), but I will feel safer. (There is another reason to use free, or open source, software: it is usually better than competing commercial software, better supported, more stable and reliable, and more user-friendly. You might want to look at the Free Software Foundation's homepage and download their excellent GNU software.)

In case you are wondering about the relevance of the above paragraph to my main idea, let me clarify: commercial "Semantic Filtering" systems, which will soon mushroom everywhere, will not be trustworthy enough to be of much use. If I know what I am looking for, a simple "non-semantic" search will suffice. If I am not sure what I need, I cannot entrust the search to Microsoft, Netscape or whoever else joins the crowd fabricating bug-ware.

Conclusion

Attempts at making computers wade through the wealth of information for us are extremely important, and in the near future will be the most lucrative area of software engineering. Their advent will make obsolete the argument that there are limits to our need for bandwidth (it will not be humans, but automated filtering systems, that absorb the information). If you hoped for the opposite, you don't know the market: Microsoft will try to sell the "semantic version" of its junk to every PC user, not just to the maintainers of the main Internet nodes.

A new era will dawn if and when a working ACF system is designed. Whether we will be happy with the changes it brings depends on us, the customers, to a larger degree than on the programmers.



Comments are welcome. Flames and commercial junk mail are not.