So I’ve been playing around with how sites respond to the misspelled phrases and typos that are so common in everyday computing, and the handling I found was, well, non-existent.
When I played around with the search features on sites like eBay and Amazon and entered misspelled or phonetic versions of words, such as ‘chare’ for ‘chair’ or 'taybel' for ‘table’, I was given few or no results (AKA ‘The Dead End’), and the information that was displayed gave me no incentive to try again or to look into why the results were so poor. So I started looking at the UX of search and how it could be improved (if possible), and whether a search engine could learn the words and phrases that people actually enter, which otherwise wouldn’t be collected, potentially losing large amounts of useful data and stats.
I first reviewed some sites to see what I could improve so that a user wouldn’t be disheartened when trying to find a product or service, and decided to build a simple word association tool in PHP. This started out by taking a GET variable from the web address bar and returning a bunch of data from a database that was linked to that word. For example, if I were to enter ‘dog’, it would be displayed in the browser as
This type of word association is pretty basic in the world of ‘Did you mean’ techniques employed by most ecommerce sites, so I took it a step further and asked myself the question: ‘What if the user defined these associated words during ‘The Dead End’?’ By this I mean, what if I collected all of these mismatched, error-filled search queries and used them to aid the user? For example, say you searched for a cabinet in an online shop but typed ‘cabenet’. You may very well be referred back to ‘cabinets’ using the LIKE syntax in SQL, but in the case of complete word destruction (‘kabbinets’), which actually does happen, how could I still load a result before reaching the dead end?
My theory was to build a tool that first checked for a result using LIKE and then fell back to a backup plan if nothing could be displayed, which is where ‘The Learning Engine’ began. I built a search box that looked for the matching word in a database and, using a series of checking methods (LIKE, SOUNDEX, PHONO), would resort to displaying the most likely links in ranked order, like so:
Search : “kabbinets”
Result : “cabinets, carbon fibre bike frames, kangol”
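To make the fallback idea concrete, here is a rough sketch of that kind of cascade. It is written in Python rather than the PHP and SQL of the actual tool, and it is not the tool's code: the function names `soundex` and `suggest`, the sample term list, and the use of `difflib` as the last-resort ranking (standing in for the third check) are all my own assumptions for illustration.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, h, w and y have no code.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word):
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def suggest(query, terms, limit=3):
    """Try progressively looser checks and return up to `limit` terms."""
    q = query.lower()
    # Stage 1: substring match, standing in for SQL LIKE '%query%'.
    hits = [t for t in terms if q in t.lower()]
    if hits:
        return hits[:limit]
    # Stage 2: terms that sound alike according to Soundex.
    code = soundex(q)
    hits = [t for t in terms if soundex(t) == code]
    if hits:
        return hits[:limit]
    # Stage 3 (assumed stand-in): rank every term by string similarity.
    ranked = sorted(terms,
                    key=lambda t: SequenceMatcher(None, q, t.lower()).ratio(),
                    reverse=True)
    return ranked[:limit]


# Hypothetical catalogue terms, for illustration only.
TERMS = ["cabinets", "carbon fibre bike frames", "kangol", "chairs", "tables"]
print(suggest("kabbinets", TERMS))
```

With this sample list, ‘cabenet’ is caught by the Soundex stage because it shares a code with ‘cabinets’, while ‘kabbinets’ starts with a different letter, gets a different code, and only reaches ‘cabinets’ through the final similarity ranking. That is exactly the kind of query where a learned reference would help.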
This is where the system would learn and the data would be collected. The script would open a ‘ticket’ in the database when a search returned no result; the ticket would contain the search query and the date for reference. The script would then log the link that the user selected, add it as a reference in the ‘references’ table with a +1 ranking, and finally close and delete the ticket.
The reason for the open ticket was to verify that the user had in fact chosen a match; if they hadn’t, and had navigated away from the page, no false reference would be added to the database. After doing this with a couple of words, I started building a dictionary of reference words and added a function that said: if a zero-result search query has a reference with over 10 rank points, automatically redirect to that reference. This meant that a commonly misspelled word would automatically go to the relevant page, as the ranking would be high enough to prove the word is commonly mistaken. The obvious flaw here would be someone sat typing gibberish and purposefully clicking wrong links to make the reference data invalid, but this could be avoided with simple IP recognition in the script.
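The ticket flow could be sketched roughly like this. Again, this is Python with an in-memory SQLite database rather than the real PHP script, and the column names, the `points` counter and the two function names are assumptions of mine; only the ‘ticket’ idea and the ‘references’ table come from the tool itself.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema: one table for open tickets, one for learned references.
SCHEMA = """
CREATE TABLE tickets (
    id        INTEGER PRIMARY KEY,
    query     TEXT NOT NULL,
    opened_at TEXT NOT NULL
);
CREATE TABLE "references" (
    query  TEXT NOT NULL,
    target TEXT NOT NULL,
    points INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (query, target)
);
"""


def open_ticket(conn, query):
    """Record a zero-result search and return the new ticket's id."""
    opened_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO tickets (query, opened_at) VALUES (?, ?)",
        (query, opened_at))
    return cur.lastrowid


def close_ticket(conn, ticket_id, chosen_target):
    """Credit the link the user picked with +1, then delete the ticket."""
    row = conn.execute(
        "SELECT query FROM tickets WHERE id = ?", (ticket_id,)).fetchone()
    if row is None:
        return False  # no open ticket, so nothing is learned
    query = row[0]
    cur = conn.execute(
        'UPDATE "references" SET points = points + 1 '
        "WHERE query = ? AND target = ?", (query, chosen_target))
    if cur.rowcount == 0:
        # First time this misspelling has been linked to this target.
        conn.execute(
            'INSERT INTO "references" (query, target, points) '
            "VALUES (?, ?, 1)", (query, chosen_target))
    conn.execute("DELETE FROM tickets WHERE id = ?", (ticket_id,))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ticket = open_ticket(conn, "kabbinets")
close_ticket(conn, ticket, "cabinets")
```

If the user never picks a link, `close_ticket` is never called, so the ticket just sits there and no reference is written.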
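As a rough illustration of the redirect rule and the IP check, here is a small Python sketch. The class name `ReferenceBook`, its methods, and the one-vote-per-IP rule are hypothetical choices of mine; the tool only says that over 10 rank points triggers the redirect and that IP recognition guards the data.

```python
from collections import defaultdict

REDIRECT_THRESHOLD = 10  # a reference needs MORE than this many points


class ReferenceBook:
    """In-memory stand-in for the reference data (hypothetical design)."""

    def __init__(self):
        self.points = defaultdict(int)  # (query, target) -> rank points
        self.voters = set()             # (ip, query, target) already counted

    def record_choice(self, ip, query, target):
        """Add +1 for a chosen link, at most once per IP address."""
        key = (ip, query, target)
        if key in self.voters:
            return False  # same IP repeating the same click is ignored
        self.voters.add(key)
        self.points[(query, target)] += 1
        return True

    def redirect_for(self, query):
        """Return the target to redirect to, or None to show suggestions."""
        candidates = [(pts, target)
                      for (q, target), pts in self.points.items()
                      if q == query and pts > REDIRECT_THRESHOLD]
        if not candidates:
            return None
        return max(candidates)[1]  # highest-ranked reference wins


book = ReferenceBook()
for i in range(11):
    # Eleven different (made-up) visitors all pick the same link.
    book.record_choice(f"10.0.0.{i}", "kabbinets", "cabinets")
print(book.redirect_for("kabbinets"))
```

Counting each IP only once per misspelling and link means one person clicking the wrong result over and over can't push it past the threshold on their own, although it wouldn't stop someone who keeps switching addresses.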
I’m currently working on improving this tool to try and make it even better but you can take a look and play around with it from the link below.