AWS Supply Chain Announces General Availability with New Features

Highlights:

  • AWS Supply Chain includes machine learning models that automatically arrange inventory records in a unified format, making them easier to analyze.

AWS Supply Chain, a cloud-based inventory monitoring and forecasting application, was made generally available by Amazon Web Services Inc. recently. It was first introduced at the cloud giant’s re:Invent conference in December.

Retailers must prevent scenarios where a store lacks inventory to satisfy customer demand. They must also ensure that store owners don’t buy more of a particular item than is required. In some cases, excess inventory must be thrown away, which can reduce a retailer’s profit margins.

AWS Supply Chain is designed to help companies avoid such issues. The cloud-based application can map the locations of a retailer’s stores, logistics facilities, and other distribution centers. Clicking on a facility opens a panel that displays the likelihood of product shortages or overstocking.

In the background, AWS Supply Chain gathers inventory information from a retailer’s logistics applications to track product availability. These applications frequently keep records in multiple formats. To address that problem, AWS Supply Chain includes machine learning models that automatically arrange inventory records in a unified format, making them easier to analyze.

AWS says the application also builds a data lake to store the gathered data. Using that data lake, the models can identify supply chain problems and produce remediation recommendations.

If a store is about to run out of a product, the app can identify a nearby logistics hub where the item is in stock. It occasionally generates not just one but several recommendations for resolving a supply chain problem. The application ranks the recommendations to help supply chain teams choose the best course of action.

In a blog post, Danilo Poccia, the Chief Evangelist for AWS in Europe, the Middle East, and Africa, said, “Recommendation options are scored by the percentage of risk resolved, the distance between facilities, and the sustainability impact. Supply chain managers can also drill down to review the impact each option will have on other distribution centers across the network.”

In addition to assisting retailers with inventory issues, AWS Supply Chain can forecast future product availability. It has a tool for forecasting customer demand based on previous sales records and other information. A retailer can use the application’s forecasts to determine whether it has enough inventory to meet future demand and place additional orders as needed.

AWS Supply Chain is now generally available and includes several improvements that weren’t present in the preview version.

The company has facilitated the application’s integration with SAP SE’s S/4HANA enterprise resource planning system. S/4HANA is used by many businesses to store inventory information that AWS Supply Chain can use to monitor product availability. Additionally, the application is getting interface upgrades and an automation feature to help users prepare their data more quickly.

CelerData Extends Lakehouse Support in StarRocks-backed Analytics Platform

Highlights:

  • The new release is based on a cloud-native architecture that improves workload and resource isolation and allows the creation of distinct warehouses for various use cases.
  • Users can query historical and streaming data in real time without waiting for the streaming data to be batched for analysis.

CelerData Inc., developer of a real-time analytics platform based on the StarRocks open-source massively parallel database, released version 3 of its commercial offering with increased support for data lakehouses, also known as hybrid data warehouse/data lake repositories.

CelerData, formerly known as StarRocks Inc., is the primary creator of StarRocks, an Apache Doris derivative recently donated to the Linux Foundation.

The company said most query engines are not well-suited to real-time analysis: they struggle with ad-hoc queries and crumble under many concurrent users. Li Kang, Vice President of Strategy at CelerData, said, “They may accept streaming data sources, but they don’t support real-time. Enterprises will often build two pipelines — one for batch processing in the data lake or data warehouse and a separate real-time pipeline.”

The new release is based on a cloud-native architecture that improves workload and resource isolation and allows the creation of distinct warehouses for various use cases. Lakehouse users can now conduct high-performance analytics without transferring their data to a central data warehouse. CelerData says its query engine is three times faster than competing query engines and can handle 10,000 queries per second from thousands of concurrent users.

Users can query historical and streaming data in real time without waiting for the streaming data to be batched for analysis. The company’s method, which divides data into distinct segments called tablets, distinguishes itself from the quasi-real-time processing technique known as micro-batching. “Each time we get a new record, we read it from our reader. It’s not micro-batching, but you can think of it that way and combine that data with other tables,” said Kang.

This version includes integration with popular open table formats such as Apache Iceberg and Apache Hudi. Previously, the software could only handle direct-attached storage, that is, local storage on a server or virtual machine. Referring to Amazon Web Services Inc.’s object-storage service, Kang said, “Data can now be stored in S3 or our local storage.”

Performance can be enhanced by multi-table materialized views constructed from multiple joined base tables and by a local caching layer for remote input/output operations.

CelerData Version 3 will be generally available beginning in early April 2023. The company also operates a fully managed cloud service.

Snowflake Purchases Myst to Strengthen Its Time-Series Forecasting

Highlights:

  • Myst’s AI platform offers a workflow so that data science teams can build, deploy, and maintain highly accurate forecasting models based on time series data in minutes.
  • Snowflake hasn’t said much about the acquisition, like when it will close or how it will be integrated into the data cloud, but the deal shows that the company is still committed to the data science community.

Snowflake, a data cloud company, has signed a definitive agreement to buy Myst, a time series forecasting company based in California. Financial terms of the deal were not disclosed.

Snowflake has worked very hard over the past year to grow its data cloud. The Montana-based company has been working on making the platform more open to Machine Learning (ML), and this deal is a step in the right direction.

Why is Myst Important to Snowflake?

Myst has an AI platform with a workflow that allows data science teams to build, deploy, and maintain highly accurate forecasting models based on time series data in minutes.

Time series data, also called time-stamped information, is a list of data points organized over a specific timeframe. These data points are measurements taken at regular intervals from the same source.

One example is the movement of stock prices over time. With this deal, Myst’s team will join Snowflake, which is expected to enhance its data cloud with time-series forecasting so that teams can use their historical data to look ahead.
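
To make the definition concrete, here is a minimal, hypothetical sketch of time series forecasting in Python. The prices are fabricated and the moving-average projection is only a baseline; it is not Myst’s platform or Snowflake’s implementation, which would use far more sophisticated models.

```python
# Minimal illustration of forecasting from time-stamped data (fabricated prices).
import numpy as np
import pandas as pd

# Daily closing prices indexed by business-day timestamps.
dates = pd.date_range("2022-01-03", periods=60, freq="B")
rng = np.random.default_rng(seed=42)
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

# Naive baseline: project the mean of the last 10 observations
# over the next 5 business days.
window_mean = prices.tail(10).mean()
future_dates = pd.bdate_range(prices.index[-1], periods=6)[1:]
forecast = pd.Series(window_mean, index=future_dates)

print(forecast)
```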

In a blog post, Snowflake’s SVP of engineering, Greg Czajkowski, stated, “Time series forecasting is one of the most applied data science techniques in business. It is used extensively in supply chain management, inventory planning, and finance. Accurate forecasts can establish measurements to guide management, facilitate planning and goal setting, and help mitigate risk”.

He said that the application of time series forecasting is widespread across sectors.

No Disclosure from Snowflake

Snowflake hasn’t said much about the acquisition, like when it will close or how it will be integrated into the data cloud, but the deal shows that the company is still committed to the data science community.

Snowflake currently supports eight workloads: data warehouse, data engineering, data lake, data sharing, data applications, data science, Unistore, and cybersecurity.

Before this, the company bought Applica, a document-understanding platform, and Streamlit, an open-source app framework. The company has more than 7,000 business customers.

Logical Data Warehouse: What It Is and Its Benefits

Highlights:

  • One of the prime reasons for the popularity of the LDW is that numerous engines and various data sources from throughout the organization can be conceptually combined in one location.
  • With LDW, organizations can manage and access each of the many data stores as if they were a single logical (or virtual) data store.

The concept of a logical data warehouse has recently been the subject of much discussion. But what is causing this buzz? Why are logical data warehouses considered to be so crucial for data professionals? To get an understanding, it is necessary to examine the function of a logical data warehouse — what is it, and what does it do?

A Logical Data Warehouse (LDW) is an architecture for data management where an architectural layer sits on top of a conventional data warehouse to enable access to several data sources while making them appear to users as a single, “logical” source of data. It essentially consists of an analytical data architecture that optimizes both conventional data sources (such as databases, enterprise data warehouses, data lakes, etc.) and other sources of data (such as applications, big data files, web services, and the cloud) to satisfy every analytics use case. The phrase logical data warehouse was first used in 2009 and is now more in demand as data complexity becomes a bigger issue for many enterprises.

The LDW is often referred to as the data warehouse of the future since it can accommodate the expanding data management requirements of businesses. One of the prime reasons for its popularity is that numerous engines and various data sources from throughout the organization can be conceptually combined in one location. The modern LDW has grown to support the many different data platforms, sources, and business use cases available today. It assists businesses in digitally reinventing themselves, enables real-time streaming analytics, and optimizes operations through better, data-driven judgement.

Why is the logical data warehouse so important?

To understand why a logical data warehouse is important, it is essential to examine how data storage and processing have changed over the past few years and how those changes affect data architecture. Of course, this has to do with the big data explosion, which has led to a surge in interest in advanced analytics platforms.

According to IDC, as data volumes grow more than 50 times in the coming years, 85% of this data will be ‘new data’. New data will be unstructured or multi-structured, data that will not fit nicely into existing databases or data warehouses. This means that the future data architecture will contain many data stores holding different types and formats of data. With all of these data stores, it is very easy to create more and more data silos and fail to get the full value out of all of the data.

A data lake can possibly resolve this issue, but the question remains how many organizations can consolidate everything into a single data lake repository. The way data is used – processed, refined, and aggregated to derive actionable insights – means there will be several data repositories with different qualities before data reaches the ‘authoritative’ store used for regulatory and statutory reporting.

This is where the LDW comes into play. Thanks to the logical data warehouse, organizations can manage and access each of the many data stores as if they were a single logical (or virtual) data store. The differences and complexities of the individual data stores are hidden behind the logical data warehouse. Data can be combined and transformed to provide a common and consistent view of all data through the LDW, enabling full utilization of data assets.
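
As a toy illustration of the “single logical view” idea (not any particular LDW product), the sketch below uses DuckDB purely as a stand-in query engine to expose two physically separate stores through one view. The file names, columns, and figures are fabricated assumptions, and writing the Parquet file assumes pandas with pyarrow installed.

```python
# Illustrative sketch only: two fabricated stores queried through one logical view.
import duckdb
import pandas as pd

# Stand-ins for two separate stores: a data lake file and an application export.
pd.DataFrame({"customer_id": [1, 2, 3], "order_total": [120.0, 80.0, 45.5]}).to_parquet("sales.parquet")
pd.DataFrame({"customer_id": [1, 2, 3], "region": ["EMEA", "AMER", "EMEA"]}).to_csv("crm.csv", index=False)

con = duckdb.connect()

# A single "logical" view hides where the underlying data physically lives.
con.execute("""
    CREATE VIEW unified_customers AS
    SELECT s.customer_id, s.order_total, c.region
    FROM read_parquet('sales.parquet') AS s
    JOIN read_csv_auto('crm.csv') AS c ON s.customer_id = c.customer_id
""")

# Consumers query the view as if it were one consistent data store.
print(con.execute(
    "SELECT region, SUM(order_total) AS revenue FROM unified_customers GROUP BY region"
).fetchdf())
```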

Benefits of Logical Data Warehouse

Logical data warehousing helps meet evolving data demands: The logical data warehouse model enables businesses to address changing data requirements while gaining from existing investments in physical approaches such as data warehouses, data marts, sandboxes, data lakes, and others. Its multi-engine approach allows businesses to fulfil various analytical needs. The various data management elements, such as enterprise data warehouses, data lakes, data marts, etc., are not mutually exclusive and can effectively complement one another.

A logical data warehouse ensures that your analytics strategy is agile and flexible for new data demands. It prevents the data management team from becoming stuck using a single technology or strategy, regardless of how the market evolves in the future. Businesses can choose which components to use for various data management tasks to fulfil their needs. The data virtualization layer can accommodate additional data sources as the business expands and more data is produced without interfering with any already-running operations.

Modernizing data approach with logical data warehousing: By implementing a uniform analytic data management architecture across all data types, technologies, users, and use cases, logical data warehouses enable businesses to update their data approach and analytics architecture. By combining all of the data from different sources, the logical data warehouse gives an organization the ability to examine past performance, forecast future results, and respond to queries about the business. As an organization expands, a logical data warehouse can help it scale its data management strategy by starting with the data it already has and easily adding new data or changing the architecture as priorities change. Any modern solution for data management must use this dynamic approach.

Logical data warehousing empowers data consumers: The LDW technique makes it easier to access data, thus empowering users of all skill levels. By combining all data sources, including streaming sources, into a single comprehensive “logical” source, the logical data warehouse can increase the productivity of all users. This allows shared access to data throughout the organization, enabling various business teams to conduct independent assessments. As a result, businesses are better positioned to decide thanks to a shared understanding of their data across all teams and departments.

The logical data warehouse has become even more essential due to the data boom and the wide range of data now readily available. It provides the technology to collect and consolidate all the data within an organization, including historical data, and to perform unified analysis that no single system could conduct on its own. The LDW also makes dependable, reusable data services available to a wide range of data users. Democratizing access to an organization’s data in this fashion enables self-service analytics while ensuring that the data a business uses is reliable and consistent.

Final Word

The concept of data virtualization has been expanded to enable logical data warehouses in recent years. The logical step for organizations is LDW, which extends data virtualization to integrate, manage and govern enterprise data across a hybrid, multi-cloud environment.

Deriving Insights from Unstructured Data

Highlights:
  • As unstructured data is growing much faster than structured data, it is essential that enterprises learn how to extract value from it, and quickly.
  • AI will be crucial in using unstructured data to resolve business problems and find new opportunities.

With organizations, communities, businesses, and products becoming more intelligent, data-generating endpoints have also rapidly increased. Data significantly impacts our day-to-day activities – at work, at home, and in general. But to make the most of it, we must be able to gain actionable data insights, which rests crucially on our ability to comprehend that data in highly specific and specialized ways. This means that data ought to be organized and structured.

Across industries – from consumer products to healthcare – much of the data being generated today is unstructured. For example, data generated from internal messaging platforms doesn’t fit into traditional analytics models. If its potential is to be realized, we must redefine what it means to access the appropriate information at the right time and use it to improve outcomes.

Unstructured and semi-structured data constitute opportunities worth millions and hold the potential to offer new levels of access, services, and insights. Many organizations are already deploying Artificial Intelligence (AI) across unstructured datasets, which has helped them put vast amounts of unstructured data to good use. The insights gained from the analysis of unstructured data are then used to create recommendation engines, fake-news detection tools, and dynamic pricing models.

Several hurdles must be crossed to truly realize the potential of unstructured data. In this blog, we will cover the following:

  • Which data can be categorized as unstructured data?
  • How can we analyze unstructured data?
  • AI’s role in unstructured data.
  • Limitations of AI and unstructured data.

Making sense of unstructured data

Unstructured data is different from structured data in many ways. Most importantly, while the latter is more organized and formatted, the former does not have a predefined format – it can be stored in the form of sequences, point clouds, images, irregular meshes, and so on. It can also take different shapes, including multi-resolution, multi-channel, non-tabular, and sparse. This makes it difficult to collect, process, and analyze, so traditional methods and tools cannot be used on it. For this reason, unstructured data has historically been underused in BI and analytics.

Analyzing unstructured data

The methods used to analyze unstructured data are broadly statistical. With so many entries, algorithms identify patterns and relationships between them. They may also apply an additional layer of structure to the data source – a process often referred to as embedding the data or building an embedding.

To cite an example, a text can be scanned for its 10,000 most common words, which may have little in common with those of other books or sources. The text can also be broken into different sections. This rough structure forms the base for statistical analysis.

Developing these embeddings is as much an art as it is a science — data scientists involved in this process design and test different strategies to develop a draft embedding.
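
As a minimal sketch of one such rough embedding – a simple bag-of-words built from the most common terms, echoing the 10,000-word example above – the snippet below uses scikit-learn on a few fabricated text sections. Real projects would test much richer representations, such as TF-IDF or neural encoders.

```python
# Bag-of-words "embedding" over fabricated text sections (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer

sections = [
    "shipment delayed at the warehouse, customer asked for a refund",
    "new warehouse opened, shipments arriving on schedule",
    "customer praised fast delivery and easy returns",
]

# Keep only the most frequent terms, capped at 10,000 as in the example above.
vectorizer = CountVectorizer(max_features=10_000, stop_words="english")
matrix = vectorizer.fit_transform(sections)  # rows = sections, columns = word counts

print(vectorizer.get_feature_names_out())
print(matrix.toarray())
```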

Unstructured data in big data environments is analyzed using a variety of techniques and tools, including data mining, machine learning, and predictive analytics.

Using AI on unstructured data

AI will be crucial in using unstructured data to resolve business problems and find new opportunities. Adaptability will be required across system architectures, storage and analytic services to bridge the gap between unstructured data’s inherent problems and AI’s current maturity.

For instance, to deliver better analytic results at the requisite speed and accuracy, technology must do a better job processing varied volumes of data at different scales. This includes the creation of highly specialized services that prioritize performance and scalability. In short, we require a platform that considers the what is, what if, what else, and what could be aspects of search.

Some critical features will be needed for the optimal solution to manage unstructured data at the required scale, size, and complexity level, namely:

  • Capability to host and serve data in many forms.
  • Must allow AI algorithms to search for patterns in the hosted data.
  • Must support a query language for database retrieval (exact search), Machine Learning-based pattern search (approximate search), and user-defined functions (domain-specific search).
  • Provide programming interfaces for database operations that are simple to use.
  • Should run on a variety of new server architectures (shared-memory, distributed-memory, or fabric-attached-memory technologies).
  • Must include high-performance computing frameworks that can mature to manage ever-increasing data quantities and scale up to reduce time to insight.

Early results from firms that have started experimenting with unstructured data look promising. The level of detail with which they comprehend consumers, processes, and the firm as a whole suggests that there is a lot of room for growth. However, the widespread adoption of high-performance systems is yet to occur. It’s critical to rethink existing modalities and interfaces as we move toward deeper integration of AI and unstructured datasets.

What artificial intelligence and unstructured data can’t accomplish

The quality of the data determines how well an algorithm will perform with that data. The data may often fall short of providing enough correlation for a definitive response to a query. This problem is exacerbated by the fact that unstructured data is more likely to contain useless information and much more noise. This makes it even more difficult for the algorithms to sift through data and remove useless parts.

On top of that, even when the algorithms are successful, some unstructured data analysis is ineffective because success is too rare. Detecting an event that occurs infrequently does not generate a lot of profit.

Poorly defined queries can yield unclear findings. Searching unstructured data for insights without well-established definitions will be fruitless, because the results can be just as unclear as the questions. For many unstructured data projects, specifying a goal clear enough for the models to be trained effectively is a major difficulty.

Treeverse About to Launch LakeFS Cloud Data Lake Service

Highlights:
  • The open-source LakeFS technology gets a fully managed cloud service to help organizations better iterate and version cloud data sources for development efforts.
  • The LakeFS cloud will initially be accessible on AWS, with Google Cloud and Microsoft Azure support to be added in the coming months.

Treeverse recently unveiled its LakeFS Cloud service, a managed product that provides companies with access to versioning tools for cloud data lakes. The company said the service is anticipated to be generally available on June 27.

Users can store various sorts of data with a cloud data lake, but there is typically little to no tracking of how the data changes over time and no simple way to go back to an older version.

Similar to how the Git version control system helps developers track and build versions of application code, Treeverse launched the open-source project in 2020 to enable versioning for a data lake. Instead of requiring customers to deploy and manage the software themselves, the vendor now wants to offer a cloud service that it deploys and manages itself.

Treeverse is competing with several rivals, such as the Dremio-run Nessie open-source project and the AWS Lake Formation service, which offers rudimentary versioning and data cataloging features. The LakeFS cloud will initially be accessible only on AWS, with support for Google Cloud and Microsoft Azure planned in the coming months.

Versioned data lakes in the medical field

Healthcare start-up Karius, based in Redwood City, California, is one of the users of the open-source LakeFS technology. Karius has created a solution combining chemistry and AI to diagnose infectious diseases without requiring invasive surgery.

“As you can imagine, such a complex technology is fueled by massive amounts of complex data that comes with every patient,” said Sivan Bercovici, CTO of Karius. “To go from what’s in the tube, to what’s in the cloud, to what’s in a physician report, the chain of custody of data needs to be secured.”

Bercovici believed that in the age of data and precision medicine, many organizations have become used to the idea of never destroying any data.

Karius utilizes LakeFS because managing all the data is increasingly becoming challenging as complexity increases. According to Bercovici, his company is aware that when it updates its essential data on LakeFS, just as it versions its code, it can count on that data being accessible and discoverable.

“LakeFS brings the much-needed focus in the clouded data space, which is the daily reality of pharma and biotech,” Bercovici said. “We went from weeks’ worth of data hunting, and anxiety around whether or not we got the right data version, to simply being able to rely on the availability of the right data, to the right data scientist, at the right time. It is liberating.”

For the present, Karius self-hosts LakeFS and, to ease management, intends to move to the cloud offering in the future.

“As a rapidly growing company, we want to make sure someone who is deeply versed in the specific technology has you covered for uptime and develops efforts while we focus on building our differentiated value,” Bercovici said.

How does LakeFS work to version cloud data lakes?

Co-founder and CEO of Treeverse, Einat Orr, stated that LakeFS aims to enable enterprises to employ engineering best practices used in code development for data lakes.

LakeFS supports several of these practices. Data can have multiple versions, or branches, and users can interact with any branch. The system also supports reversion, allowing users to return to an earlier version if an update causes problems. Another fundamental function LakeFS offers is the capacity to merge different branches.
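
As a hedged sketch of what working against a branch can look like, the snippet below talks to a hypothetical lakeFS deployment through its S3-compatible gateway using boto3. The endpoint, credentials, repository, branch names, and file path are placeholders, and operations such as creating branches, committing, and merging are performed through the lakeFS API or lakectl rather than this gateway.

```python
# Hypothetical example: reading and writing lakeFS branches via the S3 gateway.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",   # placeholder gateway address
    aws_access_key_id="<LAKEFS_ACCESS_KEY>",     # placeholder credentials
    aws_secret_access_key="<LAKEFS_SECRET_KEY>",
)

REPO = "analytics"  # hypothetical repository; the bucket name maps to the repo

# Write a new version of a dataset to an experiment branch, leaving 'main' untouched.
with open("customers.parquet", "rb") as f:  # placeholder local file
    s3.put_object(Bucket=REPO, Key="experiment-1/datasets/customers.parquet", Body=f)

# Readers pinned to 'main' continue to see the last committed version.
obj = s3.get_object(Bucket=REPO, Key="main/datasets/customers.parquet")
print(obj["ContentLength"])
```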

The open-source LakeFS technology requires just a server, a database, and access to storage. Although users can and do set it up independently using a self-hosted approach, maintaining an ideal LakeFS deployment can be difficult and time-consuming.

Here, the new LakeFS cloud service steps in as a managed service that takes care of users’ LakeFS deployment and operation needs. As part of the AWS deployment, the gateway component of the LakeFS cloud enables organizations to securely connect to and access a company’s data lake using AWS PrivateLink.

According to Orr, the ability to version data in a data lake can help development efforts and data quality, which can be challenging to troubleshoot.

“The moment you have the quality of your data questioned, the process is manual, difficult, and hard to manage, and this is where the value of LakeFS shines,” Orr said. “LakeFS allows reproducibility and reversion capabilities, and it can support working in isolation for development and debugging.”
