Don’t Forget the Data Quality with your Self-Service Data Preparation…Announcing Trillium Refine™
"You had me at 'Hello'"

How Trillium Refine™ Makes Your Data Preparation and Data Quality Easier and Faster

by Keith Kohl, VP Product Management, Trillium Software


In my last blog, we announced the launch of Trillium Refine™ and outlined the six-step process for preparing data. In this blog, I will outline how Trillium Refine makes it easier for business and data analysts to prepare their data and trust their analytics.

Everything inside Trillium Refine’s highly indexed repository is searchable. As data sources are read, their metadata is stored and indexed in the repository. At any point in the six-step process, the analyst can search for anything in the repository. The search feels like an internet search engine such as Google: it offers type-ahead and shows results as the user types.
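To make the type-ahead idea concrete, here is a toy prefix index over column names, written in Python. This is a minimal sketch of the general technique (real products like Trillium Refine use a search engine such as Solr for this); the class name and data are hypothetical.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy prefix index over metadata names, mimicking type-ahead search."""

    def __init__(self):
        # Map each prefix to the set of full terms that start with it.
        self._by_prefix = defaultdict(set)

    def add(self, term):
        term = term.lower()
        for i in range(1, len(term) + 1):
            self._by_prefix[term[:i]].add(term)

    def suggest(self, typed):
        """Return all indexed terms matching what the user has typed so far."""
        return sorted(self._by_prefix.get(typed.lower(), set()))

idx = MetadataIndex()
for name in ["customer_id", "customer_name", "order_date"]:
    idx.add(name)

print(idx.suggest("cust"))  # ['customer_id', 'customer_name']
```

Because every prefix is precomputed at indexing time, each keystroke is a single dictionary lookup, which is what makes search-as-you-type feel instant.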

Data sources that arrive without a header (think Excel), or enterprise databases with cryptic column names, need real-world names attached to their metadata. Users can add definitions and business terms to data sources, database columns, and so on. Those definitions are then also searchable, and response time is very fast thanks to the repository’s indexes and the use of Solr.

A business or data analyst understands data types from an Excel perspective, but consider how many distinct data types the various DBMSs have just for dates, times, timestamps, and so on. The product simplifies and normalizes these data types: string (not fixed-length char or varchar), integer and decimal (not bigint, float, etc.), datetime, URL, and more. The original data source data types are preserved in the repository.
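A sketch of what such normalization looks like, assuming a hypothetical lookup table (the actual mapping Trillium Refine uses is not published):

```python
# Hypothetical mapping from DBMS-specific types to simplified, normalized types.
TYPE_MAP = {
    "varchar": "string", "char": "string", "text": "string", "nvarchar": "string",
    "int": "integer", "bigint": "integer", "smallint": "integer",
    "float": "decimal", "double": "decimal", "numeric": "decimal", "real": "decimal",
    "date": "datetime", "time": "datetime", "timestamp": "datetime",
}

def normalize_type(dbms_type):
    """Strip length/precision, e.g. 'VARCHAR(255)' -> 'varchar', then map it."""
    base = dbms_type.lower().split("(")[0].strip()
    return TYPE_MAP.get(base, "string")  # fall back to string when unknown

print(normalize_type("VARCHAR(255)"))  # string
print(normalize_type("BIGINT"))        # integer
print(normalize_type("TIMESTAMP"))     # datetime
```

The analyst only ever sees the right-hand side of the map; the left-hand side stays in the repository in case the original precision is needed later.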

For other data sources, such as JSON or delimited (CSV) files, the product uses heuristics to determine the data type from the data itself.
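A minimal sketch of this kind of heuristic, assuming a small set of trial parsers applied to sample values (the product’s actual heuristics are certainly richer):

```python
from datetime import datetime

def infer_type(values):
    """Guess a normalized type from a sample of string values (toy heuristic)."""

    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def is_dec(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    def is_date(v):
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                datetime.strptime(v, fmt)
                return True
            except ValueError:
                pass
        return False

    sample = [v for v in values if v.strip()]  # ignore blanks
    # Try the most specific type first, fall back to string.
    for check, name in ((is_int, "integer"), (is_dec, "decimal"), (is_date, "datetime")):
        if sample and all(check(v) for v in sample):
            return name
    return "string"

print(infer_type(["12", "7", "305"]))             # integer
print(infer_type(["2016-01-05", "2016-02-10"]))   # datetime
print(infer_type(["hello", "42"]))                # string
```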

There is also a built-in recommendation engine. For instance, the second step (mentioned in the last blog) is to join the data sources, or what we call data sets. Trillium Refine recommends join keys to the user through a variety of mechanisms. If a data source comes from a database, Trillium Refine imports the schema and knows the keys for the table. The product also looks for potential join keys based on the data and the metadata. And it learns from everyone’s behavior over time, so each user benefits from everyone else’s use of the product.
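One common way to find join-key candidates from the data itself is to score column pairs by value overlap. The sketch below is my own toy illustration of that idea, not Trillium Refine’s algorithm; the function, scoring, and sample data are all hypothetical.

```python
def recommend_join_keys(left, right, threshold=0.5):
    """Rank candidate join-key pairs between two data sets by value overlap.

    left/right: dict mapping column name -> list of values (toy data sets).
    Returns (left_col, right_col, score) tuples, best first.
    """
    candidates = []
    for lcol, lvals in left.items():
        for rcol, rvals in right.items():
            lset, rset = set(lvals), set(rvals)
            if not lset or not rset:
                continue
            # Fraction of the smaller column's distinct values found in the other.
            overlap = len(lset & rset) / min(len(lset), len(rset))
            # Small metadata bonus when the column names match.
            score = overlap + (0.1 if lcol.lower() == rcol.lower() else 0.0)
            if score >= threshold:
                candidates.append((lcol, rcol, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])

customers = {"cust_id": [1, 2, 3, 4], "name": ["Ann", "Bob", "Cal", "Dee"]}
orders = {"cust_id": [2, 3, 3, 5], "total": [9.5, 12.0, 3.25, 40.0]}
print(recommend_join_keys(customers, orders))  # [('cust_id', 'cust_id', 0.77)]
```

A database schema makes this trivial (the keys are declared), which is why importing the schema is the first mechanism; data-driven scoring like the above covers the flat files that have no declared keys.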

Lastly, Trillium Refine will automatically create an output file ready for visualization tools like Tableau by generating a Tableau Data Extract (TDE) file.

The future is self-service data preparation and data quality. If you don’t believe us, check out the latest Gartner BI and Analytics Magic Quadrant. “By 2018, most business users and analysts in organizations will have access to self-service tools to prepare data for analysis …”

Until next time…

Check out Trillium Refine at www.trilliumsoftware.com/products/big-data.com