Improving data management and efficiency through data hubs.
Background:
Since 9/11, the ever-increasing availability of data, together with wide disparities in its format and structure, has challenged organizations of all types, including technology companies. Since then, the U.S. has become acutely aware of the importance of rapidly providing access to as much data as possible, particularly for analytics and artificial intelligence. This focus on data integration and real-time data access for National Security and Law Enforcement has led to a departure from the current Extract, Transform and Load processes toward Data Access on Demand (DAOD), which improves data connectivity, latency and data normalization. DAOD processes resolve critical data management issues in data connectivity, access and normalization. They give organizations the ability to quickly connect to disparate data sources and to conduct multi-nodal querying that can ingest, transform and normalize data from its originating form to the form desired at the destination, i.e. data lake, repository, or application. This automated approach to data normalization provides significant advantages, including access to critical data in real time, immediate alerts on relevant data updates, the ability to quickly connect to and disconnect from new and evolving technologies, a “single pane of glass” view, and the normalization of data from across multiple nodes for artificial intelligence and National Security analytics.
Secondly, enterprises across the U.S. Government are hampered by their users’ inability to quickly access disparate data and to query it in a manner that provides shared access to data sources and data normalization for specific use cases within an organization. Improving data querying and standardization is a critical requirement because users must rely on manual processes to query databases and process the results, which requires them to spend up to 80% of their time on data collection rather than analytics. Users also lack the ability to construct no-code or low-code queries; they must either use static query portals that cannot flex to meet their unique requirements or obtain the services of a developer/coder to build the query.
In essence, organizations continuously invest in new technologies, sensors and data forms to stay abreast or ahead of the rapidly evolving technologies used by criminals, terrorists, and fraudsters, yet they cannot integrate those capabilities into a seamless end-to-end process. Such a process would allow real-time access, facilitated data sharing, and standardization of data returns for a given application or use case, as well as integration with internal or legacy data sets.
Problem:
It is recognized throughout the data and technology industry that rapid access to data sources and automated data normalization are critical to improving organizations’ ability to access available data in “real-time”, whether to identify and aggressively pursue threats or to standardize disparate data for their own purposes.
However, most enterprise applications are not capable of multi-modal data connectivity and data normalization without significant investment in expensive data science or in processes like Extract, Transform and Load (ETL), which require the organization to rely heavily on contracted expertise for both the development and maintenance of these systems, as well as on expensive hardware and licensing[i]. Currently, enterprise data management solutions focus on large-scale migrations of data and are inflexible when connecting to new data sources or adopting new technologies and their data exports. This situation leaves organizations not only without access to their disparate data, but also with no means to normalize it for delivery to or integration into their contracted repositories or visualization tools for analysis. Additionally, processes like ETL provide access to data in near real time rather than real time, so the user only has access to the data that has already been copied into the database instead of having direct access to the data source. Furthermore, these solutions require extensive and costly engineering and customization, depend on proprietary technology or services, and duplicate data storage resources. All of these issues need to be addressed to improve data management.
Proposed Solution
Leverage Data Access on Demand technology to create enterprise-level “data hubs” at any classification level that provide rapid, real-time connectivity to sanctioned data sources, no-code automated data queries, and automated data normalization. Through its patented Data Access on Demand processes, the technology accesses, returns and integrates data to feed the application or destination the customer desires. Once installed in the customer’s designated server environment, this functionality gives users real-time access to the critical information they need, lets them select the data sources to be searched, and lets them query those sources with minimal effort, automatically transforming the data for the organization’s needs. The technology will provide no-code federated searches across the connected disparate data sources, automated or scheduled iterative searches with alerts on new data inputs, and discovery of the connectivity path between two disparate pieces of information, all to improve analyst efficiency and workflow.
Data Access on Demand Overview
Data Access on Demand technology, such as the patented Blue Fusion product, provides rapid connectivity to disparate databases at the same classification level through easy-to-build, reusable data connectors that are agnostic to the form, location and structure of the origin data. DAOD data connectors leave the organization’s data at rest and allow access to that data in real time, reducing the latency normally associated with today’s “near real time” data and analytic solutions.
Besides rapid connectivity to disparate data, the single largest advantage of DAOD technology is its ability to normalize disparate data sets to support federated searches, federated alerts and real-time monitoring of data sets. DAOD technology provides, for the first time, a standardized approach to disparate data and is completely source-agnostic: it connects with any database, including existing ETL or ELT databases and data lakes, while minimizing integration and configuration costs, saving time and resources.
DAOD Advantages:
The unique advantages of Data Access on Demand for first responder and national security analysts are:
1. Automated Data Normalization:
Blue Fusion’s patented approach to data management automates the transformation of queried data for visualization or other application uses. The technology allows for the simple construction of data connectors, even client-side or AI-assisted, that map both the connected data source and the destination database or application. Once queries are created in the Blue Fusion User Interface (UI) or created as a data demand for storage, the data is identified, retrieved and transformed automatically into the desired data form, saving analysts 80% of their time and saving organizations the costs associated with ETL for AI.
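As a rough illustration of what a connector that maps a source to a destination form might look like, the sketch below normalizes records from two differently structured sources into one shared shape. It is not Blue Fusion’s implementation: the `Connector` class, the field names, and the `clean_phone` helper are all hypothetical and chosen only for the example.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping type: destination field -> (source field, converter).
FieldMap = Dict[str, Tuple[str, Callable[[Any], Any]]]


class Connector:
    """Illustrative connector: describes how one source maps to the destination form."""

    def __init__(self, name: str, field_map: FieldMap) -> None:
        self.name = name
        self.field_map = field_map

    def normalize(self, record: Dict[str, Any]) -> Dict[str, Any]:
        """Translate one source record into the destination form."""
        out: Dict[str, Any] = {"source": self.name}
        for dest_field, (src_field, convert) in self.field_map.items():
            value = record.get(src_field)
            # Missing source fields become None instead of raising.
            out[dest_field] = convert(value) if value is not None else None
        return out


def clean_phone(raw: str) -> str:
    """Keep digits only so numbers from different sources compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())


# Two made-up sources that describe the same kind of entity differently.
crm = Connector("crm", {
    "full_name": ("Name", str.strip),
    "phone": ("Tel", clean_phone),
})
case_db = Connector("case_db", {
    "full_name": ("subject_name", str.title),
    "phone": ("contact_number", clean_phone),
})

a = crm.normalize({"Name": " Jane Doe ", "Tel": "(555) 010-1234"})
b = case_db.normalize({"subject_name": "jane doe", "contact_number": "555.010.1234"})
```

After normalization, both records carry the same `full_name` and `phone` values and differ only in their `source` tag, which is what makes them comparable in a single query or visualization.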
2. Real-Time Data Access and Ingestion:
Blue Fusion’s technology allows the Analyst to query databases at rest, meaning that all the data in those databases is available immediately, with no wait for large-scale data migrations. This approach provides significant advantages over existing Extract, Transform, and Load approaches, where the Analyst must wait for data to be uploaded before it can be queried. Blue Fusion leverages the Enterprise Application Integration (EAI) methodology to increase data querying and ingestion 10x over ETL processes at less than 20% of the cost of installing systems that use ETL.
3. Automated Federated Searching Capability:
DAOD technology allows analysts to connect to multiple data sources via API or DAOD connectors and to conduct “one-click” federated searching of those databases with easily constructed visual queries, saving them up to 80% of the time spent on data collection.
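A federated search of this kind can be pictured as one query fanned out to every selected source, with the results merged into a single list. The following is a minimal sketch under that assumption, not the product’s actual mechanism; `make_source`, `federated_search`, the `enabled` set, and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# A hypothetical connector is modeled as a callable: search term in, records out.
Search = Callable[[str], List[dict]]


def make_source(name: str, rows: List[dict]) -> Search:
    """Build an in-memory stand-in for a connected data source."""
    def search(term: str) -> List[dict]:
        return [dict(row, source=name) for row in rows
                if term.lower() in row["full_name"].lower()]
    return search


def federated_search(term: str, sources: Dict[str, Search],
                     enabled: Set[str]) -> List[dict]:
    """Send one query to every enabled source and merge the results."""
    active = {name: fn for name, fn in sources.items() if name in enabled}
    with ThreadPoolExecutor(max_workers=max(1, len(active))) as pool:
        futures = {name: pool.submit(fn, term) for name, fn in active.items()}
        merged: List[dict] = []
        # Collect in source-name order so the output is deterministic.
        for name in sorted(futures):
            merged.extend(futures[name].result())
    return merged


sources = {
    "people_db": make_source("people_db", [
        {"id": 1, "full_name": "Jane Doe"},
        {"id": 2, "full_name": "John Roe"},
    ]),
    "vehicle_db": make_source("vehicle_db", [
        {"id": 7, "full_name": "Jane Doe", "plate": "ABC123"},
    ]),
}

hits = federated_search("jane", sources, enabled={"people_db", "vehicle_db"})
only_people = federated_search("jane", sources, enabled={"people_db"})
```

The `enabled` set stands in for letting the analyst choose which sources a query touches: leaving a source out means it is never queried at all.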
4. Single Pane of Glass Visualization of Disparate Data:
DAOD technology gives the Analyst the ability to query, import, normalize, and ingest data automatically into a data analytics visualization tool like i2 Analyst’s Notebook, bypassing the transform and load processes of ETL. This lets them visualize disparate data that has never been collected together, significantly increasing the Analyst’s effectiveness.
5. 3rd Party Data Alerts of Disparate Data:
With their ability to set iterative searches of any database, some DAOD technologies, like Blue Fusion, are unique in that they can continuously search connected disparate databases for information related to an analyst’s target and alert the analyst within their visualization platform, such as i2 Analyst’s Notebook, when the data is available. This automated approach to disparate data querying provides significant savings in time and resources, as analysts no longer have to conduct multiple manual queries to identify datapoints.
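One simple way to think about an iterative search with alerts is a saved query that is re-run on a schedule and reports only the records it has not returned before. The sketch below assumes that model; the `Watch` class, the record fields, and the in-memory `store` are hypothetical, and a real deployment would invoke `poll` from a scheduler and route the results to the analyst’s tool.

```python
from typing import Callable, List, Set, Tuple


class Watch:
    """Illustrative saved search that reports only records it has not seen before."""

    def __init__(self, term: str, search: Callable[[str], List[dict]]) -> None:
        self.term = term
        self.search = search
        self.seen: Set[Tuple[str, int]] = set()

    def poll(self) -> List[dict]:
        """Re-run the search and return the records that are new since the last poll."""
        new_records = []
        for record in self.search(self.term):
            key = (record["source"], record["id"])
            if key not in self.seen:
                self.seen.add(key)
                new_records.append(record)
        return new_records


# In-memory stand-in for a connected source that receives new data over time.
store = [{"source": "case_db", "id": 1, "full_name": "Jane Doe"}]


def search(term: str) -> List[dict]:
    return [r for r in store if term.lower() in r["full_name"].lower()]


watch = Watch("jane", search)
first = watch.poll()    # the existing record is reported once
second = watch.poll()   # nothing new, so nothing to alert on
store.append({"source": "case_db", "id": 2, "full_name": "Jane Doe"})
third = watch.poll()    # only the newly added record is reported
```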
6. Rapid Technology/Data Output Adoption:
With the technology space continuously evolving and creating new and better data collection technologies, Government Agencies are rapidly changing collection technologies and creating data that must be integrated into a visualization tool like i2 if it is to be useful. While most intelligence systems are not flexible when it comes to ingesting or connecting to new technology, Blue Fusion’s “plug and play” approach to data can connect analysts to any existing database/source while allowing agencies to leverage existing or future investments in technology. DAOD or API connectors are easily constructed using a standardized “Plug and Play” approach with existing schema, and connectors are already built for a multitude of commercial and USG OSINT, Dark Web, geospatial, natural language processing, and machine learning tools and technologies.
7. Simplified Querying:
DAOD technologies like Blue Fusion provide significant relief to analysts in that they are not required to write code or script queries to ask questions of the connected data fabric. Some ETL models require data scientists to create code to query databases, demonstrating the complexity of ETL at the analyst level. With simple visual queries, analysts can ask complex questions of extensive amounts of data in real time.
8. Database Selection:
Most enterprise approaches to data do not allow the Analyst to select the databases they want to search. With Blue Fusion, the Analyst can toggle on or off the databases they want to search, giving them the flexibility to fine-tune their data queries and to save the resources involved in data searches, particularly for subscription databases.
9. Datapoint Connectivity:
Blue Fusion has the unique capability to find the connectivity path between two data points in disparate data. This allows the analyst to identify non-obvious relationships between any two data points within a connected data environment of disparate sources. A process that normally requires in-depth analysis is resolved in seconds: within a few clicks, the analyst selects two known targets and any hidden relationships between them are identified.
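The Datapoint Connectivity capability in this list, finding the path that connects two data points across disparate sources, can be illustrated with a standard breadth-first search over a graph of linked entities. This is a generic textbook technique used here for illustration, not a description of how Blue Fusion does it; the entity labels, source names, and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[Tuple[str, str]]]


def build_graph(links: List[Tuple[str, str, str]]) -> Graph:
    """Build an undirected graph from (entity_a, entity_b, source) links."""
    graph: Graph = defaultdict(set)
    for a, b, source in links:
        graph[a].add((b, source))
        graph[b].add((a, source))
    return graph


def find_path(graph: Graph, start: str,
              goal: str) -> Optional[List[Tuple[str, Optional[str]]]]:
    """Breadth-first search for a shortest chain of links from start to goal.

    Returns (entity, source_of_link_used_to_reach_it) pairs, or None if the
    two entities are not connected.
    """
    previous: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, source in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = (node, source)
                queue.append(neighbor)
    if goal not in previous:
        return None
    # Walk back from the goal to the start, then reverse.
    path: List[Tuple[str, Optional[str]]] = []
    node: Optional[str] = goal
    while node is not None:
        step = previous[node]
        path.append((node, step[1] if step else None))
        node = step[0] if step else None
    return list(reversed(path))


# Made-up links, each discovered in a different connected source.
links = [
    ("person:alice", "phone:5550101", "phone_records"),
    ("phone:5550101", "person:bob", "case_db"),
    ("person:bob", "vehicle:ABC123", "dmv"),
    ("person:carol", "vehicle:XYZ789", "dmv"),
]
graph = build_graph(links)
connected = find_path(graph, "person:alice", "vehicle:ABC123")
unconnected = find_path(graph, "person:alice", "person:carol")
```

Each step in the returned path records which source supplied the link, so a relationship that spans several databases can be traced back to where each piece came from.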
10. Reduced Storage Requirements:
Blue Fusion’s approach to disparate data searching means that only data relevant to the Analyst’s needs is returned for analysis, and only data that is valuable and vetted is sent to a repository for storage. Under most approaches to data storage, entire databases must be copied and ingested, meaning large amounts of data storage are unnecessarily duplicated.