Hitachi’s reanimated Pentaho business has overhauled its data catalog and optimizer products to make them – and customers’ data – fit for the AI age.
The catalog has been reworked to give “an enhanced Data Marketplace experience” with the stated aim of helping executives, business users, and data scientists to more easily find curated and trusted data sets for daily and strategic efforts.
The product also gets deeper integration with Okta and Active Directory to strengthen access policies and security controls, not least to guard against key data being inadvertently fed into models and then broadcast to the world.
Kunju Kashalikar, senior director for product management at Pentaho, said the aim was to make it easier for data scientists to find information, while ensuring that those datasets are only available to the appropriate people or applications, and that key information – credit details or PII, for example – is suitably masked.
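As a rough illustration of the kind of masking Kashalikar describes, the sketch below transforms sensitive columns before a dataset is handed on. The column names and rules are hypothetical, not Pentaho's actual mechanism:

```python
import re
import pandas as pd

# Illustrative only: mask sensitive columns before releasing a dataset.
# Column names and masking rules are assumptions, not Pentaho's implementation.
def mask_pii(df: pd.DataFrame) -> pd.DataFrame:
    masked = df.copy()
    if "card_number" in masked.columns:
        # Keep only the last four digits of a card number.
        masked["card_number"] = masked["card_number"].astype(str).str.replace(
            r"\d(?=\d{4})", "*", regex=True
        )
    if "email" in masked.columns:
        # Redact the local part of email addresses.
        masked["email"] = masked["email"].astype(str).apply(
            lambda v: re.sub(r"[^@]+@", "***@", v)
        )
    return masked

df = pd.DataFrame({
    "card_number": ["4111111111111111"],
    "email": ["jane.doe@example.com"],
    "balance": [1234.56],
})
print(mask_pii(df))
```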
It has also improved data delivery to Python IDEs and other ML test and deployment tools. This means data scientists can more easily build data workflows that run from data discovery through ingestion and training to review and deployment. All, says Kashalikar, in a governable, auditable manner.
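Pentaho's own delivery API isn't spelled out here, but the shape of such a workflow is simple enough. The Python sketch below is purely illustrative: the catalog lookup is stubbed, the dataset name is invented, and the audit trail is a plain logger rather than anything Pentaho ships.

```python
import logging
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical discovery -> ingestion -> training -> review flow with an audit log.
logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

def discover_dataset(name: str) -> str:
    # In practice this would query a data catalog; here it just returns a path.
    audit.info("discovered dataset %s", name)
    return f"./{name}.csv"

def ingest(path: str) -> pd.DataFrame:
    audit.info("ingesting %s", path)
    return pd.read_csv(path)

def train_and_review(df: pd.DataFrame) -> LogisticRegression:
    # Assumes the dataset has a "label" column to predict.
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    audit.info("model reviewed, holdout accuracy %.3f", score)
    return model

if __name__ == "__main__":
    model = train_and_review(ingest(discover_dataset("loan_applications")))
```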
Data optimization and retiering have been reworked to better handle structured and unstructured data during archiving, migration, and policy-driven lifecycle management.
Dataflow? Pentaho. Moves your data…to and fro
As companies work to get AI into production, establishing data pipelines becomes more and more important, with the likes of Snowflake and Databricks putting significant effort into easing the process for their users.
But, said Kashalikar, “What I have seen, especially in medium to large companies, is no organization has a single data technology. I work with customers who have Databricks and Snowflake, and now they have Apache Iceberg. In various instances, they have still data in Oracle and Salesforce and SAP and AWS.”
Pentaho’s aim, he said, was to provide a single pane of glass across those technologies. “So whether your data is in Snowflake or Databricks, the end user gets the same experience.”
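How Pentaho implements that abstraction isn't described, but the principle is the usual one: put a common interface in front of each engine so downstream code is backend-agnostic. A hypothetical sketch, with connection handling assumed:

```python
from typing import Protocol
import pandas as pd

# Illustrative only: a thin uniform interface over different engines, the idea
# behind a "single pane of glass". Connections and drivers are assumed.
class Warehouse(Protocol):
    def query(self, sql: str) -> pd.DataFrame: ...

class SnowflakeSource:
    def __init__(self, conn):  # e.g. a Snowflake connector session
        self.conn = conn
    def query(self, sql: str) -> pd.DataFrame:
        return pd.read_sql(sql, self.conn)

class DatabricksSource:
    def __init__(self, conn):  # e.g. a Databricks SQL connector session
        self.conn = conn
    def query(self, sql: str) -> pd.DataFrame:
        return pd.read_sql(sql, self.conn)

def monthly_revenue(source: Warehouse) -> pd.DataFrame:
    # The caller neither knows nor cares which engine answers.
    return source.query(
        "SELECT month, SUM(amount) AS revenue FROM sales GROUP BY month"
    )
```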
Beyond the data pipeline issue, he said, the company aimed to improve classification of data and help companies understand their data landscape across structured and unstructured data, so that they can use what is appropriate for AI, but also understand the potential time bombs waiting to go off.
So, for example, in a finance company this could mean understanding that “Your SQL server has this type of data, but we can also tell you on your SharePoint in this particular user’s folder, we are seeing a lot of mortgage documents. Maybe they were correct to be there. Maybe not…”
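A toy version of that kind of scan might look like the snippet below: flag documents on a file share that read like mortgage paperwork. The keywords, threshold, and path are invented for illustration and bear no relation to Pentaho's actual classification rules.

```python
from pathlib import Path

# Crude, hypothetical scan: flag text files that look like mortgage paperwork.
MORTGAGE_TERMS = {"mortgage", "loan-to-value", "amortization", "escrow"}

def looks_like_mortgage_doc(path: Path) -> bool:
    text = path.read_text(errors="ignore").lower()
    return sum(term in text for term in MORTGAGE_TERMS) >= 2

def scan_share(root: str) -> list[Path]:
    flagged = [p for p in Path(root).rglob("*.txt") if looks_like_mortgage_doc(p)]
    for p in flagged:
        print(f"possible mortgage document: {p}")
    return flagged

# scan_share("/mnt/sharepoint/users")  # hypothetical mount point
```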
Hitachi revived the Pentaho brand about a year and a half ago, augmenting its original ETL and analytics business with data catalog, data quality, and data optimization tools. The tools operate across on-prem and cloud estates, and across both documents and structured data.
While it pitches itself as a single platform, customers can pick and mix the tools. “We know that many customers have one of these five already purchased, and they are not willing to throw away that investment,” said Kashalikar.
“So we wanted to be open to integrate with other tools, but also provide first class integration if two or more of our tools are used.”