The objective of Data Strategy is to empower the business by delivering the right data at the right time to the right users for decision-making and for accelerating innovation across products and services.
Defining a Data Strategy involves developing a comprehensive roadmap and framework for collecting, managing, analysing, and sharing data assets across an enterprise. Since the success of ChatGPT, it has become essential for organisations to develop a Generative Artificial Intelligence (Gen AI) strategy to embed Gen AI across business activities. Establishing the right data foundation for Gen AI applications supplies the business-specific context they need. Data serves as the anchor around which Gen AI applications are developed and implemented; hence, the right data strategy is a prerequisite.
Data Readiness for AI (DRAI) should be a core aspect when defining a data strategy. Three critical areas to consider in defining a future-proof, robust data strategy include:
- Adaptability through flexible data architectures
- Cross-industry collaboration with AI
- Automated data management through AI
Traditional data systems are rigid and siloed; they cannot scale easily and require considerable effort to accommodate the rapid influx of data in various forms from diverse sources. Organisations must adopt flexible data architectures that can adapt to these changes without incurring significant costs or delays.
By leveraging open-source-compatible multi-cloud services, organisations gain the flexibility to customise, expand, and evolve their data ecosystems. Examples include open table formats such as Delta and Iceberg and compute engines such as Spark and Ray. Open-source technologies provide adaptability, while multi-cloud architectures reduce dependence on any single cloud provider, increasing both resilience and scalability. This combination also enables integration of newer technologies and tools without significant re-engineering, as both open-source and multi-cloud environments support broad interoperability and open standards.
A data marketplace acts as a hub where consumers can easily discover, evaluate, and access data assets. A marketplace-driven model enables flexible consumption and fosters collaboration. Additionally, it opens avenues for data monetisation, generating revenue by delivering high-value insights through Gen AI applications.
Making data ready for consumption by Gen AI platforms, with comprehensive governance in place, empowers users to interact with data more intuitively, regardless of their technical skill level. Gen AI can deliver personalised insights based on users’ roles, preferences, and historical interactions, making analytics more relevant and impactful. By embedding Gen AI, organisations create a user-centric data environment where more users can leverage data without depending on the IT team.
Intelligent data wrangling uses AI to automate data preparation, significantly reducing the time and effort required to clean, transform, and structure data for analysis. Embedded AI can detect errors, recommend corrections, and apply those corrections automatically. Intelligent data blending also combines diverse data sources into a unified view to uncover richer insights. Embedding Gen AI in the data preparation process enables data analysts to complete their tasks faster and return new insights to the business sooner.
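The detect, recommend, and correct loop described above can be illustrated with a deliberately simple sketch. It uses plain string similarity from Python's standard `difflib` module as a stand-in for an AI model, and the function names, the list of valid values, and the `cutoff` threshold are all hypothetical choices made for illustration only.

```python
from difflib import get_close_matches


def suggest_corrections(values, valid_values, cutoff=0.8):
    """Map each unrecognised value to its closest valid value (None if no close match)."""
    suggestions = {}
    for value in set(values):
        if value in valid_values:
            continue  # already a recognised value, nothing to correct
        matches = get_close_matches(value, valid_values, n=1, cutoff=cutoff)
        suggestions[value] = matches[0] if matches else None
    return suggestions


def apply_corrections(values, suggestions):
    """Replace values that have a suggested correction; leave all others unchanged."""
    return [suggestions.get(v) or v for v in values]


if __name__ == "__main__":
    valid_cities = ["London", "Paris", "Berlin"]  # hypothetical reference list
    raw = ["London", "Lodnon", "Pariss", "Xyz"]
    proposed = suggest_corrections(raw, valid_cities)
    print(proposed)
    print(apply_corrections(raw, proposed))
```

Keeping the suggestion step separate from the apply step lets a data steward review the proposed corrections before they are written back, which is one plausible way to pair automation with governance.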
As data forms and volumes continue to grow, the complexity of data management increases. Organisations face challenges in data quality, governance, and compliance. Traditional data management approaches are rule-based and cannot keep pace with data growth, leading to delays and increased cost.
DRAI involves preparing data to meet the specific requirements of AI applications, ensuring that the data is relevant, clean, and in an accessible form. DRAI metrics that give insight into data toxicity include bias, data skewness, sample size, mislabels, image quality, and lexical diversity. By establishing DRAI metrics, organisations can systematically assess and track the preparedness of data for AI, enabling efficient and reliable AI model deployment.
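A few of these metrics can be computed with very little code. The sketch below is one possible reading of sample size, class imbalance (a rough proxy for label bias), skewness, and lexical diversity, using only the Python standard library. The function names and the specific formulas chosen (majority-to-minority label ratio, population skewness, type-token ratio) are assumptions for illustration, not a standard DRAI definition.

```python
from collections import Counter
from statistics import mean, pstdev


def class_imbalance_ratio(labels):
    """Count of the most common label divided by the least common (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def skewness(values):
    """Population skewness of a numeric column; 0.0 for a symmetric distribution."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # constant column: no spread, so no skew
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def lexical_diversity(text):
    """Type-token ratio: unique words divided by total words."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def drai_report(labels, numeric_values, text):
    """Bundle the individual metrics into one readiness snapshot."""
    return {
        "sample_size": len(labels),
        "class_imbalance_ratio": class_imbalance_ratio(labels),
        "skewness": skewness(numeric_values),
        "lexical_diversity": lexical_diversity(text),
    }


if __name__ == "__main__":
    print(drai_report(["a", "a", "a", "b"], [1, 2, 3, 10], "the cat the dog"))
```

Tracking such a report over time, per dataset, is one way to make readiness measurable rather than anecdotal.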
Gen AI enhances the data discovery and data quality management process by automatically identifying data relationships, anomaly patterns, missing values, and inconsistencies across large datasets. It also recommends corrective actions and provides descriptive content on the data issues along with the lineage. Measuring the quality of both structured and unstructured data is equally critical, and Gen AI has made this possible. Its ability to handle diverse data types, along with insights on lineage, increases user trust in the data made available to them. Implementations include AI-driven data quality platforms and tools that utilise machine learning algorithms and anomaly detection models.
Organising, tagging, and making data attributes easily accessible across an organisation is essential for efficient data governance. Gen AI enhances metadata management by intelligently tagging, classifying, and adding business contextual descriptions to the data attributes. It enables semantic search using NLP, allowing users to discover data intuitively and efficiently. A well-organised data catalogue promotes data democratisation, empowering both technical and non-technical users to access data seamlessly.
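As a minimal illustration of the anomaly and missing-value checks mentioned above, the sketch below profiles a single numeric column. It uses a classical statistical rule (the modified z-score based on the median absolute deviation) rather than a Gen AI or machine learning model, and the function name and the threshold of 3.5 are assumptions chosen for the example.

```python
from statistics import median


def profile_column(values, threshold=3.5):
    """Return positions of missing values and of outliers by modified z-score."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    if not present:
        return {"missing": missing, "outliers": []}

    med = median(v for _, v in present)
    # Median absolute deviation: a spread measure that outliers barely affect
    mad = median(abs(v - med) for _, v in present)
    if mad == 0:
        return {"missing": missing, "outliers": []}

    outliers = [
        i for i, v in present
        if 0.6745 * abs(v - med) / mad > threshold
    ]
    return {"missing": missing, "outliers": outliers}


if __name__ == "__main__":
    readings = [10, 11, None, 9, 10, 250, 12]  # hypothetical sample column
    print(profile_column(readings))
```

A production data quality platform would typically go further, for example by learning expected ranges per column and attaching lineage to each flagged record, but the output shape (which rows are missing, which look anomalous) is the same idea.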
A data architecture that supports collaboration across different business sectors or industries enables organisations to exchange valuable data insights, leading to innovations that benefit multiple industries.
For instance, during the Covid-19 pandemic, a data-sharing initiative among healthcare and technology companies enabled the development of predictive models for outbreak tracking. Google collaborated with healthcare providers to pool anonymised health data. This collaboration improved response times, highlighting the power of cross-industry data sharing. Organisations benefit from such collaborations by gaining access to richer datasets, faster AI model training, and broader perspectives, which provide competitive advantages and enable new business models.
To achieve cross-industry collaboration, organisations need a flexible and interoperable data architecture with data governance that supports secure data exchange, data privacy compliance, and standardised data formats.
Future-proof data strategies are essential for organisations seeking to thrive in an increasingly AI-driven world. To establish such a data strategy that is AI-ready, it is crucial to address two critical questions: How can we get the data ready for Gen AI? How can we leverage Gen AI to accelerate the data readiness process? The ability to adopt a flexible data architecture that supports collaboration across industries and incorporates Gen AI-driven automation in data management processes will define the future data landscape.
This article is authored by Muneeswara Pandian C, vice president, Data & Analytics, Ascendion.