MinIO gives object storage a GenAI upgrade with Iceberg

Interview: MinIO is building a linking layer between its object store and GenAI, and that involves embracing structure in the form of Iceberg tables. It also entails building linkage between those tables and the hitherto vector-focused GenAI models and agents.

We talked to MinIO co-founder and co-CEO AB Periasamy and Erik Frieberg, his CMO, to explore this topic. It’s fairly complex and we’ll cover it in two parts, with this being part one. The interview has been edited for brevity.

Blocks & Files: What’s the main thing affecting MinIO presently?

AB Periasamy

AB Periasamy: The best thing that happened to us is GenAI. In every enterprise the budget is now going towards GenAI, and the race is starting to shift towards data: who has more data, and how to put data to use? That is directly contributing to our company’s growth, because once you start scaling your data and AI data infrastructure, it points to the object store, and customers are also looking beyond cloud. Cloud is already behind GenAI, and the scale of data is pushing customers to a good private cloud, and the private cloud object store is our backyard. We created that market outside of AWS and that’s contributing to our growth. You can see the company has been aggressively growing, hiring on all sides: engineering, marketing, even sales. We launched a government BU and are launching a partner program. Overall, the company is going through an upgrade and aggressive growth.

Blocks & Files: How many people work at MinIO now?

AB Periasamy: I stopped counting. Erik, how much? It is like 150-plus.

Erik Frieberg: I would say 160 something now. It’ll be 170-ish next week.

Blocks & Files: What’s happening with MinIO marketing?

Erik Frieberg

Erik Frieberg: I joined a little while ago to run marketing. The go-to-market organization was dramatically different a month before I joined in November than it is today. I think it is, no joke, eight or nine times what it was a year ago.

Blocks & Files: Why the strategy to have that built up so quickly?

AB Periasamy: I made the decisions. I’ll just give the high-level view. Even though we are hiring aggressively relative to others, we are very prudent about hiring because, in this AI world, we have to think about hiring very differently. All the junior-level jobs today, machines can do so much better, and most other companies hire aggressively and then fire aggressively.

For us, we treat our team like family, and it’s so important to take your time and be prudent about every one of those you bring on board. Two years from now, five years from now, are they going to be with the family, and are they going to be more productive because they can combine themselves with AI? That’s the only hope. Human plus machine is going to be better than either in isolation.

So be prudent about bringing in the right people who will fill the holes we have inside the organization. That’s how we are looking at it. In many ways I actually look at this as not fast enough, but I would rather hire less, or not hire anybody, than hire aggressively.

Erik Frieberg: I’m going to say three things. One is I would actually say AB probably started hiring too late, so this should have been on the rise already. Second is the difference that having AIStor makes. I come from an open source background. I was at MongoDB, where there was no commercial differentiation, and it makes a big difference. There’s only so much support and services you can sell to customers.

You’ve got to have differentiated production software with features that are a catalyst for growth. And that [AIStor] came out in the October-ish time frame, and I think people are now really understanding it and seeing the difference. You’re going to see some announcements coming in the future on this, so that’s creating more opportunity. And then, like AB said, it’s not a target. It’s not that he’s saying: “Go hire five people in marketing.” He’s saying: “Hey, we need more product marketing now because we have more to talk about.”

Blocks & Files: AB, MinIO is a storage company and for a customer of yours who is using generative AI, large language models and agents, there’s a software stack between the storage layer and their agents and large language models. So where in that stack do you see MinIO’s activities ending? Where’s the border and is it moving up the stack?

AB Periasamy: Yes. In fact, the story we have been telling all along resonates more with customers now than in the past. We always said the MinIO object store is a key-value store. What is a key-value store? It’s a data store. We are closer to Oracle than to the EMC of the past. The data store is the dumbest of all databases. It is a key-value store with get and put of objects, but it needs to process data at massive scale without losing a single object. It has to be transactional, ACID-compliant, all of that, and scale, while keeping simplicity at its roots. So we always thought of ourselves as a database company, except that we are talking about unstructured data. Things are starting to evolve there because of GenAI.

We brought Erik in because he came from MongoDB. Our culture is closer to MongoDB than to Pure or NetApp or MCA software company, and in modern times [with] GenAI, the scale is much larger. The only way you can go is scale-out, and then go software-defined. There are some interesting things happening [with] the object store and the layer above, GenAI, where the integration is going on.

The AI is directly talking to the object store through the MCP server. It’s the agents that are interacting with the object store, both in terms of administration and in discovering and dealing with the data itself. There are some interesting challenges emerging here. If it’s a spreadsheet or a CSV file, is it a single object? Say I have a database file; can I analyze this? We are talking about enterprises having 10-petabyte tables, data sitting in Iceberg tables. In the past it was proprietary data sitting in proprietary databases. They are increasingly moving towards the Iceberg open table format.

How can you have large language models understand that scale of data? That’s a bridge we need to build, and those are areas we are actively investing in as well. The model may still be running on OpenAI or Anthropic, but the enterprise data never leaves the customer’s facility. How do they discover the data? Only the metadata, or an anonymized form of the intelligence, is shared. We are actively working on that part of the stack.
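The metadata-only pattern AB describes can be sketched in a few lines. This is a hypothetical illustration, not MinIO's implementation: the function and field names are invented, and the idea is simply that only a schema-and-statistics profile of a table would ever be sent to an external model, never the raw rows.

```python
# Hypothetical sketch of metadata-only sharing: summarize a table as
# schema plus aggregate statistics so raw cell values never leave the
# facility. Names are illustrative, not MinIO's actual API.

def profile_table(rows: list[dict]) -> dict:
    """Return column names, inferred types, row count, and per-column
    distinct/null counts. No individual cell value appears in the output."""
    if not rows:
        return {"row_count": 0, "columns": {}}
    columns = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        columns[name] = {
            "type": type(values[0]).__name__,
            "distinct": len(set(values)),
            "nulls": sum(v is None for v in values),
        }
    return {"row_count": len(rows), "columns": columns}

# The customer records stay local; only `profile` would be shared.
records = [
    {"customer_id": 1, "region": "EMEA", "spend": 120.5},
    {"customer_id": 2, "region": "APAC", "spend": 80.0},
    {"customer_id": 3, "region": "EMEA", "spend": 15.25},
]
profile = profile_table(records)
```

A real system would add anonymization (hashing identifiers, bucketing numeric ranges), but the principle is the same: the model reasons over the profile while the data stays put.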

There’s also another interesting twist here: if it’s unstructured data like blobs, photos, videos, and documents, then everybody knows how LLMs take advantage of them. The new emerging area is structured data, like Iceberg tables, at large scale. How can LLMs understand structured data?

Comment 

The point AB is making is that Iceberg tables are structured, and it makes no sense to vectorize them because, unlike words or images, their rows don’t stand alone with dimensional aspects you can embed. There needs to be some intervening logic or abstraction between the tables and the GenAI LLMs and agents, bridging the models’ unstructured-data vector focus and the querying of structured data. We’ll explore this more in part two.
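One common shape for that intervening logic, offered here as a hypothetical illustration rather than MinIO's approach, is to expose only the table's schema to the model and have it generate a query, which then runs locally against the table engine. In this sketch, Python's built-in sqlite3 stands in for an Iceberg query engine, and the "LLM-generated" SQL is hard-coded.

```python
import sqlite3

# Illustrative bridge between an LLM and a structured table: instead of
# vectorizing rows, share the schema, let the model write SQL, and run
# the query locally. sqlite3 stands in for an Iceberg engine here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)])

# Step 1: only the schema (metadata) would be shared with the model.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'sales'").fetchone()[0]

# Step 2: for a question like "total sales per region", the model would
# return SQL; a hard-coded stand-in is used here.
llm_generated_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Step 3: the query executes locally, so raw rows never leave the facility.
results = dict(conn.execute(llm_generated_sql).fetchall())
```

The design choice mirrors AB's point: the model never sees the 10-petabyte table, only its shape, and the heavy lifting stays inside the table engine where structured data already lives.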