Mimicking Humans: Using AI to Build a Better Workplace Search

Gaurav Kotak sits down with Founder and CEO of Qatalog, Tariq Rauf, in this inaugural episode of AI Unveiled. Qatalog offers a knowledge intelligence platform that automates and streamlines workflows for teams across tools, databases, and software. 

Qatalog’s core product is Workplace Search, which helps business users find information faster across all their SaaS apps. This is a treasure trove of unstructured data, so it’s no surprise their solution materially leverages AI and LLMs. 

In this episode, Tariq discusses the broader impact of generative AI on software development and how Qatalog uses AI today. 

The company has taken some novel approaches to AI, with two key examples discussed in today’s episode. First, they don’t store and index data. Instead, in real time, based on the user’s query, they use AI to determine which SaaS apps to query and then synthesize the results back to the user. 

Second, Qatalog decided to sunset their vector database because it wasn’t the ideal solution for vast amounts of data. Instead, they’ve fine-tuned the model and moved from GPT-4 to Llama 2 to have greater control over the core engine of their product.

This episode is a behind-the-scenes look at some of the most cutting-edge AI tools. You won’t want to miss it.

Timestamps:

* 4:34 – Searching without indexing: using AI to mimic how humans search for information

* 6:06 – The limitations of a chat interface and customizing the UI on the fly

* 14:26 – How to approach accuracy

* 16:52 – The decision to move from GPT-4 to Llama 2

* 20:04 – The shift away from vector databases

* 24:31 – How usage-based pricing ensures computation costs don’t eat into your margin

* 29:25 – Impacts from the broader shift from inorganic (deterministic) to organic (probabilistic) software

Highlighted Excerpts:

GAURAV: Can we talk a little bit more about how AI in general makes the experience more personalized or better for the end user?

TARIQ: So prior to this we essentially had lookup engines, keyword search, and to some extent vectors. Vector databases enabled semantic retrieval, but in order to do any of these things you needed to hold a separate copy of the data. You know, if a business had 20 tools and 500 gigabytes worth of data, you would usually have to copy that over to an index, or convert it into embeddings and store it in a vector database.

And you could imagine how complex and sophisticated and costly that is. And this new paradigm enables us to essentially leapfrog that requirement: use AI to non-deterministically find the data sources where the answer to a specific question might be held, use AI to sort of understand, sift, and sort through the tremendous amount of results that come back, and then synthesize that back to the user. All of that is now through gen AI. 
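As a rough illustration of the cost Tariq is describing, here’s a back-of-envelope sketch of what indexing 500 GB of text as embeddings might mean for storage alone. The chunk size, embedding dimension, and float width below are assumptions for the sake of arithmetic, not Qatalog’s numbers.

```python
# Back-of-envelope: storage cost of converting 500 GB of text into
# embeddings. All parameters here are illustrative assumptions.
data_bytes = 500 * 1024**3   # 500 GB of raw text across a company's tools
chunk_bytes = 1024           # assume ~1 KB of text per embedded chunk
dim, float_bytes = 1536, 4   # assume a 1536-dim float32 embedding per chunk

n_chunks = data_bytes // chunk_bytes
index_bytes = n_chunks * dim * float_bytes

print(f"chunks to embed: {n_chunks:,}")
print(f"vector index size: {index_bytes / 1024**4:.2f} TB")
```

Under these assumptions the vector index alone runs to roughly 3 TB, several times larger than the source data, before counting the compute to produce and refresh the embeddings as the underlying tools change.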

GAURAV: Amazing. Elaborate a little bit more on exactly how you were able to accomplish that technically.

TARIQ: Our engine is called Action Query, and we have essentially decomposed the problem of information retrieval and search like how a human would. And we have enabled Gen AI to essentially behave like this. When you search for something, we’re not doing a keyword retrieval and then showing you the results. We’re searching the systems like a human would, if that makes sense. 

And so today, imagine if you’re looking for information, what you would do is say, oh, that looks like something that’s got to do with engineering. Engineering is going to be in Jira, so I’m going to go search in Jira and find the seven tickets. This person is looking for this thing in those seven tickets.

This might have the answer. And within that, here’s a response. So we’re essentially automating that workflow with AI on the fly rather than needing to keep a store of that information. We are essentially a stateless machine that sits on top of somebody’s existing systems, and the AI works in real time to get the answers back for the user.
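The workflow Tariq describes (route the query to the likely tools, search them live, synthesize the results) could be sketched roughly like this. All of the function names and the keyword-based router below are illustrative stand-ins for what would actually be LLM calls and live SaaS API calls, not Qatalog’s real Action Query engine.

```python
# Sketch of stateless, index-free search: route the query to likely
# source systems at request time, search them live, synthesize an answer.
TOOLS = ["jira", "confluence", "slack", "drive"]

def route_query(query: str) -> list[str]:
    """Stand-in for the LLM call that decides which SaaS apps to search."""
    if "ticket" in query or "bug" in query:
        return ["jira"]  # "that looks like engineering, so go look in Jira"
    return TOOLS         # no strong signal: search everything

def search_tool(tool: str, query: str) -> list[str]:
    """Stand-in for a live call to the tool's own search API (no local index)."""
    return [f"{tool}: result for '{query}'"]

def answer(query: str) -> str:
    hits = [r for tool in route_query(query) for r in search_tool(tool, query)]
    # Stand-in for the LLM synthesis step over the raw results.
    return " | ".join(hits)

print(answer("open bug tickets about login"))
```

The key property is that nothing is stored between requests: routing, retrieval, and synthesis all happen on the fly against the systems of record.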

TARIQ: I think vector databases are very meaningful and powerful when you have small data sets, when you can get away with low precision and low accuracy: things like customer support use cases, or knowledge base retrieval where the knowledge base is a shared resource amongst the company and it’s just maybe a few hundred pages or so. Vector databases start to fall apart when the data size is large and the retrieval needs precision and accuracy. 

So when the information set is small enough, it’s meaningful, but when it gets out of hand the recall drops, the performance drops, the scalability drops; a whole bunch of things just don’t fundamentally work with that technology. Vector databases work on something called ANN, which is approximate nearest neighbor search. So maybe taking a step back: when you vectorize information, there’s inherent loss in going from a high-dimensional string to a vector. And when you store these things in a vector database, there is further loss because of the clustering of information; semantic clustering happens when there’s a lot of material that is of a similar vein and matches the query.

And so the conversion is lossy, the retrieval is lossy, and when you query against this database, that mechanism is also an approximation. The whole system is extremely approximate. It works well in situations where the data set is really small, or in use cases where you can get away with 60% or 70% accuracy. 
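A toy example of the approximation Tariq is pointing at: a single random-hyperplane hash (one simple LSH-style ANN scheme) searches only the query’s bucket, so the true nearest neighbor can land in a different bucket and be missed entirely. This is a deliberately crude sketch, not how production vector databases implement ANN.

```python
# Toy LSH-style ANN vs. exact search on random 8-dim vectors,
# illustrating why ANN retrieval has imperfect recall.
import random

random.seed(0)
DIM = 8
vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bucket(v):
    # 4-bit hash: which side of each random hyperplane v falls on
    return tuple(dot(v, p) > 0 for p in planes)

def exact_nn(q):
    # brute-force exact nearest neighbor over all vectors
    return min(range(len(vectors)), key=lambda i: dist(q, vectors[i]))

def ann(q):
    # approximate: only consider vectors sharing the query's hash bucket
    cand = [i for i in range(len(vectors)) if bucket(vectors[i]) == bucket(q)]
    return min(cand, key=lambda i: dist(q, vectors[i])) if cand else None

queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
recall = sum(ann(q) == exact_nn(q) for q in queries) / len(queries)
print(f"ANN recall vs. exact search: {recall:.0%}")
```

Real ANN indexes use many hash tables, multi-probing, or graph structures to push recall much higher, but the underlying point stands: retrieval is an approximation you tune, not an exact lookup.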

GAURAV: That’s amazing. And I think you really laid out some of the trade-offs you need to make. There’s no right answer, right? But my takeaway here is, one, understand your business case and your functional requirements.

And second, you’ve in a way chosen the hard way because it’s so core to your business, right? And the complexity of the problem you’re solving is, again, heterogeneous data, high accuracy, and that copilot nature, which I would imagine the vector database approach probably doesn’t really lend itself to.

GAURAV: Longer term, how important is AI, and specifically generative AI, to your strategic thinking as a product?

TARIQ: I think there’s a much broader shift happening, and we’re a consequence of that much broader shift. With AI, we’re going from essentially inorganic software to organic software. In nature, you have these biotic and abiotic systems. We’re going from inorganic, which is extremely deterministic (a water molecule is H2O wherever you go), to essentially probabilistic systems. So cell division is probabilistic: there’s an amount of variance, it’s never replicated the same way. But a water molecule is replicated exactly the same.

And so we’re moving into a world where software is essentially organic. You can’t put it in a box and say, this is what it does. The world we’re going into is “this is what it might do,” and “this is what it might do” opens up a whole world of possibilities. And the whole stack of software is going to be reinvented.