Big data applied downunder

The ability of organisations to search vast volumes of data has led to an increasing focus on semantic search technology. Wellington company SYL Semantics is at the forefront of this field and has so far signed three government customers (which it can’t disclose). SYL chief scientist Peter de Vocht, who developed the software, speaks four languages and has a master’s degree in computational linguistics. His parents relocated from Europe to New Zealand when he was in his teens, and he attended Auckland University. He talks to Randal Jackson.

Why did you develop this software?
Ever since I was in my teens, I was inspired by popular culture to look at Artificial Intelligence. By the time I was 16 years old I started writing larger computer programs and tried my hand at writing a natural language system. Very soon I realised I lacked the necessary knowledge to get anywhere. So when it was time to go to university, I enrolled in computer science and began learning all I could.
I realised that the academics of the time didn’t really have any ideas on how to build the system I kept seeing in my mind’s eye. I finished a computer science master’s degree in computational linguistics. Computational linguistics is basically combining computer programming with human languages (also known as natural languages).
I started training as a commercial programmer, writing a variety of systems and learning how to write better code all the time. I had a brief stint in the gaming industry, where I was asked to choose between my job and my family. After choosing family-friendly environments, I ended up in Wellington. I still dabbled with systems to the point where I became a mentor and started teaching.
Suddenly I had time to continue my thoughts and work on this system I kept seeing in my mind. I decided to implement Prolog, one of the popular artificial intelligence computer languages of the time. I thought that if I could write such a logic system, it would help me understand the final pieces of the puzzle. It took me a few attempts to write it and get it working. Once I had written it, I based the first prototype of SYL on this home-grown Prolog language.
I decided to look for funding and approached a few people with my ideas of combining logic programming with human languages. Search seemed to be the most natural and first application. If nothing else, computer science had taught me that the understanding of artificial intelligence was based primarily on search.
I met Sean Wilson, who evaluated the idea of creating a platform for language analysis mixed with search. Sean also had a few innovative ideas about how to construct and run the business and already had a lot of contacts all over town.
SYL’s roadmap is much larger than just search; when I first thought of it, it was meant to be much like Apple’s Siri. The business world, however, has an appetite for data that is unmatched by anything else I had seen up to that point, and I decided that these challenges too would keep me busy for a long time.

What practical problem were you trying to solve?

I’ve always been interested in helping people with knowledge assistants. The long-term, far-away problem I was trying to solve was that I wanted people to have intelligent assistants – perhaps like a virtual personal assistant.

Why is the semantic approach important and compelling?
This assistant would need to be able to communicate with a human being. Our most natural ‘interface’ is face-to-face speech. It’s what people have done for thousands of years. There is also the case for knowledge building when it comes to semantics. For a computer to be able to help a person, it would need to be able to ‘understand’ what that person wanted. We do this with knowledge representation. Our modern computers are good at calculating sums and performing logical steps (usually of the form ‘if this is the case, then do that’).

Human language is complex and hugely ambiguous. When two people communicate, there is a lot of knowledge they share. Two people who grew up in the same culture at roughly the same time still have a lot more in common than, for instance, people from different cultures or times. So when we communicate, all that background knowledge is ‘assumed’ and doesn’t have to be repeated in each sentence.

Computers don’t deal well with background knowledge (since they have no culture, and need to be taught such structures from the ground up). To teach a machine to interact with us more naturally, one would have to introduce a mechanism for it to ‘reason’ at a logical level. Semantics is that last step before building that knowledge representation for a computer.

Why is the enterprise space an ideal space for SYL?

SYL in its current state is a basic language system with semantics, and without this knowledge representation, though this is on the roadmap. The future for SYL is a fascinating one where it would become more sophisticated and able to do more and more with the knowledge put into it. The enterprise space, too, is a smaller problem than ‘everything’ (for example Google’s approach). Dealing with the knowledge in a specific industry enables us to approach the problem of understanding in a more manageable way.

We can find out from our customers what they want the system to do first and work towards a solution that doesn’t boil the ocean.

Where is semantic technology going in the next five years?

As you might have already deduced, the goal of semantic technologies is to bridge that gap between what a human being means and what a computer needs to construct to use that information to ‘learn’, adapt and help that person in a meaningful way. I think that such technologies will proliferate and become commonplace. You will see a lot more products trying to manage large amounts of data more effectively too.

The only way this can be done is through ‘understanding’ what is in this information. Once a computer understands or processes the information, it can easily detect duplicates, find relevant information (as opposed to looking for combinations of letters – keywords), and identify linkages between pieces of information.

Systems like SYL will be able to start answering rudimentary questions: Who did what, when, and where. The ‘why’ and ‘how’ are a lot harder, but they will follow.
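
To give a rough sense of what answering ‘who did what, when, and where’ could look like once text has been reduced to structured facts, here is a minimal Python sketch. The Fact record, the sample facts and the query helper are all hypothetical illustrations and are not taken from SYL.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """One extracted statement: who did what, when, and where (hypothetical schema)."""
    who: str
    what: str
    when: str
    where: str

# Made-up facts, as if they had already been extracted from documents.
FACTS = [
    Fact("Alice", "signed the contract", "2012-03-01", "Wellington"),
    Fact("Bob", "approved the budget", "2012-03-02", "Auckland"),
    Fact("Alice", "approved the budget", "2012-03-05", "Wellington"),
]

def query(facts, who=None, what=None, when=None, where=None):
    """Return the facts that match every field the caller specified."""
    wanted = {"who": who, "what": what, "when": when, "where": where}
    return [
        f for f in facts
        if all(v is None or getattr(f, k) == v for k, v in wanted.items())
    ]

# "Who approved the budget?"
approvers = [f.who for f in query(FACTS, what="approved the budget")]

# "What did Alice do in Wellington?"
alice_actions = [f.what for f in query(FACTS, who="Alice", where="Wellington")]
```

The lookup itself is trivial; the hard part described in the interview is getting from ambiguous human language to facts of this kind in the first place.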

How is this reflected in the way you created SYL?

My initial idea for SYL was to be a logical inference engine. That is – give the computer a clever logic engine first that can deal with the knowledge problem – the answering of who, what, when.

However, once I started writing this system, I soon realised that the language aspect of the system is much more important and has the potential to shape this logic. I came to the realisation that logic is a poor companion for human thought and language.

I think SYL can be much better than that by not relying heavily on logic, as other systems seem to automatically do. Human beings aren’t logic engines.

The other challenge of SYL is scalability. There is little use in having a system that can’t process real-time information, or store more than a few thousand concepts. I’ve always felt that working to the limitations of a technology should never be the main concern when designing a system.

There is of course a practical aspect to this; you do have to know what the limitations are. For decades, computer hardware capacity has been growing exponentially, roughly doubling every 18 months.

Taking that into account and relying on it enables one to make predictions a few years ahead (not too many) on what is possible. My personal advice is “think big”.
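
As a back-of-the-envelope illustration of that kind of prediction, the sketch below treats the 18-month figure mentioned above as a doubling period, which is an assumption made for the example. The function name and parameters are hypothetical.

```python
def projected_capacity(current: float, years: float,
                       doubling_months: float = 18.0) -> float:
    """Project capacity forward, assuming it doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return current * 2 ** doublings

# Relative to today's capacity (1.0):
in_three_years = projected_capacity(1.0, 3)  # two doublings
in_six_years = projected_capacity(1.0, 6)    # four doublings
```

Under that assumption, hardware three years out offers about four times today’s capacity and hardware six years out about sixteen times, which is why designing only for current limits can be short-sighted.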


Favourite mobile device: currently has an iPhone but moving to Android because of its more open platform

Car: Toyota Corolla

Most important technology innovation: Transistor

Who do you most admire? Aram Khachaturian (composer)