Natural Language Search

Powerset looks interesting. At present the company is in “semi-stealth” mode, gathering investment as well as developing its natural language search engine.

Powerset’s Barney Pell has some interesting stuff to say on the topic.

Well worth a read, some interesting concepts with respect to the way that search engines work nowadays.

Search engines are keyword based, and at their heart are really just boolean searches against their index. They take your search term and strip out what are known as stopwords, leaving just the keywords. Stopwords are words such as a, about, from, of, for and the like. These would only complicate the results of a boolean search as they are such common words.

Pell and gang demonstrate that in some searches these words are actually useful. For example, take these three search terms:

  • Books for children
  • Books about children
  • Books by children

When for, about and by are all stripped out we are only left with Books children, and the search engine cannot distinguish between the three quite different purposes of the queries.
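To make that concrete, here is a minimal sketch of stopword stripping. The stopword list and the function name strip_stopwords are my own for illustration, not anything taken from a real search engine, which would use a far longer list and smarter tokenising.

```python
# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "about", "by", "for", "from", "of"}


def strip_stopwords(query):
    """Lowercase the query and drop stopwords, keeping the remaining word order."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


queries = ["Books for children", "Books about children", "Books by children"]
keyword_lists = [strip_stopwords(q) for q in queries]

for query, keywords in zip(queries, keyword_lists):
    print(f"{query!r} -> {keywords}")
```

All three queries come out as the same two keywords, which is exactly the information loss Pell is talking about.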

Pell says that we are all searching with an impoverished pidgin English at present. I for one would welcome a more natural approach at times. I’m sure that, like me, many of you have sometimes come across a particular search that never seems to get the results you’re after, or at best takes a long time to find the right string of keywords and advanced search options. Imagine what searching is like for the less technically minded out there who don’t speak keywordese. NL searching, if it works and is marketed well to that larger group of people, could be very successful.

If though, when it launches, it doesn’t have a toolbar-esque plugin then I will find it very difficult to remember to use it. When I want something my mouse cursor always goes straight for my Google toolbar.

USB Keyboards

Is it just me or are USB keyboards slow? Every one that I have ever used has a little bit of trouble keeping up with my typing sometimes. Is this something that anyone else has experienced?

Perhaps it’s just poor membranes and the completely unpositive feel. That’s it, I’m going to finally bite the bullet and get a nice Cherry M8 mechanically switched keyboard.

There you go, ordered. Matter of fact my boss was kind enough to pay for it.

Cheers Brian!

Windows Live Writer

I’ve just downloaded Windows Live Writer (WLW), Microsoft’s desktop/offline Blog composer/writer software.

Set up was very easy, I simply gave the address and login details of my blog. Not only did WLW determine what my blog was running on, but it has also downloaded and used my CSS.

Editing in WLW is therefore truly WYSIWYG: as I am typing I am seeing everything styled by the CSS from my site. I’ll do a screen shot.

There you go, all I did was ALT-PrintScreen and paste. Looks like it’s inserted a thumbnail with a link. Ah yes it has, there are plenty of handy properties to choose where it links to and how big it is etc.

I’m quite enjoying this, let’s try and publish now.

– edit –

Well that published fine, I forgot categories as always though. I can also do those here, it’s picked them up. Can’t add any new ones though. Still I think I may find this a useful place to store the longer type of heavy thinking articles.

Ahah! I have just discovered you can also open and edit from a list of existing posts. I was wondering how you did that. Overall this is quite a handy utility.

PHP Wiki Software and Skinning

Having recently acquired some unix hosting I’ve been experimenting with various PHP/MySQL based applications: WordPress, for example, which this blog runs under, as well as Joomla for CMS and phpBB for forums.

Having been a long time user of ASP applications some of my experiences have been quite refreshing, especially with respect to ease of installation on shared space in some cases.

Now just the other day I decided I needed a Wiki for a new project. I’ve been using FlexWiki for some time now and have been very happy with its ease of install and the small changes I usually want to make to look and feel. When it came to choosing a unix based option I immediately plumped for MediaWiki. It seemed an obvious choice, being that it runs the most famous Wiki out there, Wikipedia.

Oh how wrong could I have been though. When it came to skinning it to get a look that suited my purpose, it became a complete nightmare. The skinning system is a complete disaster area and it requires more work than I have time for to get your head around it. Yes it can be done, as witnessed by the Mono-project website or the Mozilla Dev site, but it’s a complete bitch. The mixture and muddle of markup and PHP code is just simply unprofessional.

So I had a look around and after a while searching I came up with a beautifully easy to skin Wiki software. PmWiki has a great philosophy behind its code base and a simple skinning system that is a joy to behold after wading through the nightmare depths of MediaWiki.

I shall enjoy designing a nice template later tonight.

Sigma SD14

Sigma’s new D-SLR, the Foveon X3 powered SD14, is looking very nice. The teaser site has some very luscious photography on it, mostly portraiture, that has the look of well scanned transparencies rather than digital pics.


I must admit I find the whole Foveon sensor concept very appealing, yet I’m never likely to go the Sigma route now that I have a collection of EF mount lenses. Now if Sigma were able to build bodies with different mounts as they do with their excellent range of lenses then I expect you’d see many many more people thinking about upgrading to a Sigma body. Especially if the sample images from the SD14 are anything to go by.

Also announced at the same time though is the new Sigma DP1 compact digital camera. This has a similar Foveon X3 sensor to the SD14, with the same 14 megapixels. So if you’re looking for a compact and want to try out the whole Foveon lark then this seems the perfect route for those of us that can’t see ourselves switching to a different mount for our SLRs.


It has a prime lens equivalent to a 28mm. Looking at the pics it also has a large LCD screen and manual controls. Looks promising.


Millennium/Gollancz SF Masterworks List Updated

I’ve updated my Millennium/Gollancz SF Masterworks List to include all the latest releases and even a few that are not quite released yet. They are now up to seventy in number.

Have fun reading.

I’m thinking about creating a dynamic list where you can sign up and select the ones you have purchased, so the collectors out there have a quick way to tell which ones they still need. Anyone think that’s a good idea?

An incoming link made me realise people must actually use the list, so you have them to thank.

4 digit door entry systems

Our office block is protected by a keypad. The entry code is 4 digits long and the pad has the numbers 0 to 9. Now I noticed the other day, and have discovered that this is quite common with these entry systems, that there is no kind of punishment for getting the number wrong. The pad just looks at the last 4 numbers you have typed.

So you can try numbers over and over at quite a high speed till you get the right four, with no delay enforced for getting anything wrong.

So I’m thinking, could you make a small device with 10 solenoids that tried every number in turn? How long would it take?

There are 10,000 four digit combinations, or 40,000 keypresses, but as it only looks at the last four digits we can seriously shorten that.

For example if I type 012345 then I’ve just tested 0123, 1234 and 2345. I looked about a bit and found something called de Bruijn sequences. A de Bruijn sequence is the shortest sequence containing all possible words of a certain length.

So it turns out that the shortest sequence is 10,003 digits long. Googling about some more I discovered this site, which generates de Bruijn sequences for you.
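For anyone who would rather generate the sequence themselves, here is a sketch using the standard recursive Lyndon word construction. This is not the code behind the site mentioned above, and the function names are my own. The construction produces a cyclic sequence of 10,000 digits, so the first three digits are repeated at the end to unroll it for a keypad.

```python
def de_bruijn(k, n):
    """Return a cyclic de Bruijn sequence over k symbols for words of length n."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # Only emit prefixes whose length divides n (Lyndon words).
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def keypad_sequence(digits=10, code_length=4):
    """Unroll the cyclic sequence into one linear string of keypresses."""
    cyclic = de_bruijn(digits, code_length)
    linear = cyclic + cyclic[:code_length - 1]
    return "".join(str(d) for d in linear)


presses = keypad_sequence()
# Every 4-digit window the keypad would see while we type the sequence.
windows = {presses[i:i + 4] for i in range(len(presses) - 3)}
print(len(presses), len(windows))
```

The printed counts are the number of keypresses and the number of distinct codes covered, so it doubles as a check that nothing has been missed.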

Say for example the keypad can manage 10 digits a second, then we need about 1,000 seconds to try every possible combination. That’s roughly 17 minutes. Not bad, down from over an hour, but I’m sure it could be faster.
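The sums are easy enough to check. The rate of 10 presses a second is just my assumption from above, not a measured figure for any real keypad.

```python
PRESSES_PER_SECOND = 10  # assumed keypad speed, not measured


def minutes_needed(keypresses, rate=PRESSES_PER_SECOND):
    """Time in minutes to type the given number of keypresses at a fixed rate."""
    return keypresses / rate / 60


naive_presses = 10_000 * 4       # every code typed out in full
shortest_presses = 10**4 + 3     # de Bruijn sequence unrolled for the keypad

print(f"naive:     {minutes_needed(naive_presses):.1f} minutes")
print(f"de Bruijn: {minutes_needed(shortest_presses):.1f} minutes")
```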

Now, the person who regularly resets the code on my building is obviously fond of history, as the codes are always famous dates. We have not, for example, ever had a code that didn’t start with “1”.

I expect many people come up with a year or a day/month combination when asked to generate a 4 digit code. I wonder how many people’s PIN numbers start with 0, 1, 2 or 3? Given this, is there a way to order the sequence to increase the probability of an earlier win?

Humans are rubbish at being random. If we concentrate on years then it would seem to make sense to check 19[0-9][0-9] first, or just the first 2000. If we think many people choose day/month combinations then we should check for [0-3][0-9][0-1][0-9], or for US month/day [0-1][0-9][0-3][0-9].
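As a rough sketch of that ordering, the patterns above can be used to rank all 10,000 codes so the likely ones come first. The priority order is purely my guess, and the names here are made up for the example. Note too that trying codes one at a time throws away the overlap trick, so this only pays off if the guess about human habits is right.

```python
import re

# Patterns from the text, in the (assumed) order we would try them.
PRIORITY_PATTERNS = [
    r"19[0-9][0-9]",          # years in the 1900s
    r"[0-3][0-9][0-1][0-9]",  # day/month
    r"[0-1][0-9][0-3][0-9]",  # US month/day
]


def rank(code):
    """Index of the first pattern the code matches, or last place if none."""
    for position, pattern in enumerate(PRIORITY_PATTERNS):
        if re.fullmatch(pattern, code):
            return position
    return len(PRIORITY_PATTERNS)


def prioritised_codes():
    """All 4 digit codes, likeliest groups first, numeric order within a group."""
    all_codes = [f"{n:04d}" for n in range(10_000)]
    return sorted(all_codes, key=rank)  # sorted() is stable


codes = prioritised_codes()
print(codes[:5])
```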

To test any of this we really need sample data to work on. I reckon we need a sampling of the kind of 4 digits humans would choose, that would be a good start. How to go about gathering that data though?

Quote Time

Someone just said this to me and it really sums up how I’m feeling today.

“Here I am, brain the size of a planet, and they’ve got me parking cars.”