Matt Aslett's Analyst Perspectives

Bigeye Enables Monitoring, Quality and Lineage of Data

Written by Matt Aslett | Nov 20, 2024 5:50:11 PM

I previously explained that data observability software has become a critical component of data-driven decision-making. Data observability addresses one of the most significant impediments to generating value from data by providing an environment for monitoring the quality and reliability of data on a continual basis. Maintaining quality and trust is a perennial data management challenge, the importance of which has come into sharper focus in recent years thanks to the rise of artificial intelligence (AI). As enterprises seek to automate aspects of decision-making processes using AI, it is essential that they have confidence in the data upon which AI depends. This has increased the focus on data observability software providers such as Bigeye and the role they play in ensuring that data meets quality and reliability requirements.   

Bigeye was founded in late 2018 by Chief Executive Officer Kyle Kirwan and Chief Technology Officer Egor Gryaznov. Through their experience with various data-related projects at Uber, Bigeye’s founders had identified reliability as a key concern that impacted the success of data projects. To improve data reliability, enterprises were largely dependent on data-quality tools that required manual effort by data engineers, data architects, data scientists and data analysts. As a result, many data teams were not as productive as they might be, with time and effort spent on manually troubleshooting data-quality issues and testing data pipelines. Additionally, while the tools available at the time enabled data teams to respond to quality issues, they did not provide a way to identify quality thresholds or measure improvement, making it difficult to demonstrate to the business the value of time spent remedying data-quality problems. With the aim of rectifying that situation, Bigeye’s founders set out to build a business around data observability. The company has raised $73.5 million in funding since its formation from the likes of Sequoia Capital, Costanoa Ventures, Coatue, Alteryx and In-Q-Tel, and added a $5 million strategic investment by United Services Automobile Association in October 2024. In addition to expanding its platform through internal research and development, Bigeye also added data lineage capabilities in mid-2023 through the acquisition of Data Advantage Group. Data lineage is now one of three core components of the company’s data observability platform, alongside automated monitoring and anomaly detection. 

Having trust in data is crucial to business decision-making. However, only 28% of participants in ISG’s Data Governance Benchmark Research report that data is well trusted in their organization. Enterprises have traditionally sought to improve trust in data using data quality tools and platforms to ensure that data used in decision-making processes is accurate, complete, consistent, timely and valid. As I previously explained, although data quality and data observability are closely related and complementary, they are separate product categories. Data quality is concerned with the suitability of the data to a given task and data quality software is used to help users identify and resolve data quality problems. In comparison, data observability is concerned with the reliability and health of the overall data environment. Data observability software automates the detection and identification of the causes of data quality problems, potentially enabling users to prevent data quality issues before they occur. Data observability is a key aspect of data operations (DataOps), which focuses on the application of agile development, DevOps and lean manufacturing by data engineering professionals in support of data production. 

Bigeye’s monitoring capabilities start with automated dependency mapping to identify the source of data used in analytic dashboards and data products, as well as a lineage graph of the data pipeline. The ability to monitor and measure improvements in data quality relies on instrumentation. This is handled by Bigeye’s data observability platform via what it calls autometrics, which perform automated data quality checks as well as freshness, volume and schema monitoring. Browser extension-based integration with analytics dashboards provides business and data analysts with instant access to data health information and status alerts. Bigeye’s anomaly detection capabilities rely on the automated generation of data quality thresholds based on machine learning (ML) models fueled by historical data. The company also offers associated alerts delivered to data owners and data consumers, and reinforcement learning to adapt notifications based on user feedback. Bigeye’s data lineage capabilities include connectors to numerous data platforms, data integration tools and business intelligence (BI) dashboards, as well as automated lineage mapping across on-premises and cloud environments. These capabilities are used to identify data pipeline dependencies, perform root cause and impact analysis of data reliability issues, and automatically generate debug queries that aid resolution.
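To illustrate the general idea of deriving data quality thresholds from historical data, the following is a minimal sketch of a volume check. It is not Bigeye's implementation and does not use Bigeye's API: it substitutes a simple statistical rule (mean plus or minus a multiple of the standard deviation) for the ML models described above, and the function names, the `k` parameter and the sample row counts are all hypothetical.

```python
from statistics import mean, stdev


def volume_thresholds(history, k=3.0):
    """Derive lower and upper bounds from historical daily row counts.

    Hypothetical helper: a plain mean +/- k * standard deviation rule,
    used here as a stand-in for a learned threshold model.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical observations")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """Return True when a new observation falls outside the derived bounds."""
    low, high = volume_thresholds(history, k)
    return value < low or value > high


# Illustrative (made-up) row counts for a table over the previous seven days.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

# A load that delivers far fewer rows than usual is flagged;
# a load close to the historical average is not.
print(is_anomalous(400, daily_row_counts))
print(is_anomalous(1003, daily_row_counts))
```

The same pattern could be applied to freshness (hours since the last update) or any other numeric metric. A production system would also need to account for seasonality and trend, which a fixed standard-deviation band does not.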

I assert that through 2026, two-thirds of enterprises will invest in initiatives to improve trust in data through automated data observability tools addressing the detection, resolution and prevention of data reliability issues. Data observability is increasingly significant due to the rise of enterprise AI initiatives that combine enterprise data with AI and generative AI (GenAI) models to automate customer service and business decision-making. Vector search and retrieval-augmented generation can improve trust in GenAI, but they depend on complex data pipelines that need to be monitored and evaluated to ensure the data and processes are reliable. Like other data observability software providers, Bigeye could be making more of its applicability to AI and GenAI use cases. Nevertheless, I recommend that enterprises exploring potential approaches to improving data reliability examine Bigeye to understand how its software can facilitate greater trust in data and accelerate data-driven business decisions.

Regards,

Matt Aslett