
Our Company Has Helped More than 400 Customers Explore GenAI Adoption, Part 2

Technical Insights on GenAI, from Agents to BioLLMs



A short while ago I offered three crucial insights about GenAI adoption that Loka has gained through all the conversations and engagements we’ve conducted with a huge variety of clients. Today I offer three more. 

These observations are a bit more technical, aimed at business leaders who’ve embarked on the GenAI journey and find themselves somewhere between POC and production. They’re offered in the spirit of collective improvement, one of Loka’s core principles. 

Go Beyond Training with Text

Most people are familiar with large language models (LLMs) that read and write conversational text, such as ChatGPT and Claude; these are known as natural language models. But LLMs can be trained to read and write many other forms of unstructured data.

Some of today’s most impactful LLMs operate in the languages of biology: DNA, proteins and molecules. These are called BioLLMs, and many are already revolutionizing drug discovery, chipping away at a process that historically takes an average of 10 years and billions of dollars. Loka is currently working on several BioLLM projects, and through them we’re gaining valuable expertise. 

Unlike natural language models, BioLLMs rarely come with hosted APIs, which makes deploying them yourself all the more important for refinement. Additionally, out-of-the-box BioLLMs rarely solve real-world use cases on their own; they need to be integrated into broader architectures. For example, a common use case is drug-target interaction, where one possible architecture involves two BioLLMs: Molformer to encode the molecular data and ESM to encode the protein data. This combined architecture is then fine-tuned for the downstream task with the client’s custom-labeled data.
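To make the two-encoder idea concrete, here’s a minimal, purely illustrative sketch in Python. The encoder functions are toy stand-ins for Molformer and ESM (real encoders return learned embeddings hundreds of dimensions wide), and the fusion head uses fixed pseudo-random weights where a production system would fine-tune a small network on labeled drug-target pairs:

```python
import math
import random

EMB = 8  # toy embedding size; real encoders output hundreds of dimensions

def _toy_embed(text: str, salt: str) -> list[float]:
    """Deterministic stand-in embedding (illustrative only)."""
    rng = random.Random(salt + text)
    return [rng.gauss(0.0, 1.0) for _ in range(EMB)]

def encode_molecule(smiles: str) -> list[float]:
    """Stand-in for a molecular encoder such as Molformer."""
    return _toy_embed(smiles, "mol:")

def encode_protein(sequence: str) -> list[float]:
    """Stand-in for a protein encoder such as ESM."""
    return _toy_embed(sequence, "prot:")

# Fusion head: in practice a small trainable network fine-tuned on the
# client's labeled drug-target pairs; here, fixed pseudo-random weights.
_head_rng = random.Random("head")
_W = [_head_rng.gauss(0.0, 1.0) for _ in range(2 * EMB)]

def interaction_score(smiles: str, sequence: str) -> float:
    """Concatenate both embeddings, apply a linear layer + sigmoid."""
    x = encode_molecule(smiles) + encode_protein(sequence)
    z = sum(w * v for w, v in zip(_W, x))
    return 1.0 / (1.0 + math.exp(-z))
```

The key design point survives the simplification: each modality gets its own specialized encoder, and only the joint head on top needs the client’s task-specific labels.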

Training data doesn’t have to be text-based. For instance, vision learning transcends text entirely. Yann LeCun, chief AI scientist at Meta and a Turing Award winner (the computing equivalent of a Nobel Prize), recently shared his thoughts on visual data versus language data on LinkedIn: “The total amount of visual data seen by a 2-year-old is larger than the amount of data used to train LLMs… by a factor of 30.”

The takeaway is that text-based GenAI is just the tip of the iceberg. Vision, sound, even smell—many potential inputs exist. So much more is coming.

Agents Are Impressive. But They Can’t Do Everything… Yet. 

Agents are one of the most powerful tools in the GenAI space. They act as intermediary systems that interact with their environment through perception, capturing provided context, sensing environmental data and taking action via web search, SQL engine or other tools. Capable of extensive data analysis, generating automatically validated code and acting as chatbots with memory and external tools, agents enable solutions that are too complex for traditional GenAI applications. 
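As a rough sketch of that perceive-decide-act pattern, the toy loop below routes a task to one of two stub tools. The tool names and the keyword-based routing rule are assumptions made for illustration; in a real agent, an LLM chooses the tool and interprets the observation it returns:

```python
from typing import Callable

def web_search(query: str) -> str:
    """Hypothetical web-search tool (stubbed out)."""
    return f"[stub] top result for {query!r}"

def run_sql(query: str) -> str:
    """Hypothetical SQL-engine tool (stubbed out)."""
    return f"[stub] rows for {query!r}"

TOOLS: dict[str, Callable[[str], str]] = {
    "search": web_search,
    "sql": run_sql,
}

def agent_step(task: str) -> str:
    """Pick a tool from the task wording, call it, return the observation.
    A real agent replaces this hard-coded rule with an LLM decision."""
    tool = "sql" if "table" in task.lower() else "search"
    return TOOLS[tool](task)
```

Even at this scale the essential loop is visible: the agent doesn’t just answer, it selects and executes an action against its environment.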

Metaphors abound: Agents are the GenAI version of chefs, architects or orchestra conductors, take your pick. Or if a chatbot is an intern who provides a list of information based on a Google search, then an AI agent is an executive assistant who not only pulls answers from a wider variety of sources but can also order new research materials, arrange a meeting with experts, create a personalized reading list and book travel. The major advantage is their ability to execute based on the information they gather. (We’ll just ignore the fact that the bad guys in The Matrix were also known as agents.)

When it comes to generic agents, no perfect solution or strategy exists. The technology is still maturing, and new approaches and best practices emerge regularly. Agents achieve impressive results in certain scenarios, but they can’t yet be applied generally, and they certainly don’t compare to humans, which makes use case selection essential. Agents are best used to assist humans, not replace them.

That said, agents deployed in some specific scenarios have yielded positive results. Orchestrator strategies like planners or subqueries are constantly evolving; different use cases benefit from different strategies. Agents’ results depend on user prompting experience, data source quality and other factors that are hard to address a priori. Their instability reinforces the importance of extensive evaluation.

Get Your Data Right

LLMs and GenAI in general are unique in their ability to handle unstructured data. But in the eyes of GenAI, not all unstructured data is created equal. 

GenAI tech moves at an astonishing speed, so I won’t address brand-new systems like multimodal LLMs, which are improving by the second, or stateful LLMs, which could be very disruptive. And even limited to established LLMs, better practices can turn up at any moment. This information is intended to be as evergreen as possible in this emerging field. 

Bear with me as we jump up to a 400-level discourse here…

One of the most important parts of any AI system is its connectors to underlying data. The way we query the data is as important as the data itself. Different use cases (or even different queries within the same use case) may require completely different connectors and data structures for proper information retrieval and interpretation.

LLMs face the same garbage in/garbage out principle of ML and other data-reliant systems. Vector databases, knowledge graphs and good old SQL provide connectors to underlying data, and we’ve found that most of our customers want to keep their GenAI solutions within AWS because that's where all their data already resides, often within RDS, Aurora or OpenSearch. Or put it this way: Data exerts gravity, and AWS is the sun of the data universe. 

Good data connectors that facilitate data access enable an LLM-powered system to perform more complex tasks. Pure text is easiest for an LLM to work with, but that doesn’t mean it’s appropriate for every scenario; a graph knowledge base, for example, can make it much easier to extract relevant adjacent or contextual information. Another example is aggregate information, which an improved data connector can assemble from multiple sources. Say you have a set of PDF documents, each specific to a single entity (e.g., contracts). A basic data connector can’t answer a query about the number of distinct contracts unless some document happens to contain that precise figure, whereas an advanced data connector could perform basic aggregation itself.
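A minimal sketch of that contrast, using hypothetical connector functions and a toy in-memory document set in place of real PDFs and retrieval infrastructure:

```python
# Toy document store: each entry stands in for one per-entity PDF contract.
docs = [
    {"id": "contract-001", "party": "Acme", "text": "(full contract text)"},
    {"id": "contract-002", "party": "Globex", "text": "(full contract text)"},
    {"id": "contract-003", "party": "Acme", "text": "(full contract text)"},
]

def basic_connector(query: str) -> list[str]:
    """Keyword retrieval: returns matching passages, nothing more.
    It can never answer 'how many contracts?' unless a passage says so."""
    return [d["text"] for d in docs if query.lower() in d["text"].lower()]

def advanced_connector(query: str):
    """Adds a metadata/aggregation path alongside plain retrieval."""
    if query == "count_contracts":
        return len({d["id"] for d in docs})
    if query.startswith("count_by_party:"):
        party = query.split(":", 1)[1]
        return sum(1 for d in docs if d["party"] == party)
    return basic_connector(query)

print(advanced_connector("count_contracts"))      # 3
print(advanced_connector("count_by_party:Acme"))  # 2
```

The aggregate answers come from document metadata, not from any single passage, which is exactly the kind of question a retrieval-only connector cannot serve.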

Think carefully about the purpose of the system you’re building, the information it will need to fulfill its purpose and the simplest way it can access that information. And think about how you’ll serve the output of your GenAI-based system. You may need to make structural changes to the way you collect and store data. Consider the trade-offs between flexibility, reliability and responsiveness. 

This kind of big-picture technical analysis falls squarely into Loka’s area of expertise. We’re here to help you understand and plan a GenAI offering for your company. Reach out with any questions.
