- Doris: Unifying SQL Dialects for a Seamless Data Query Ecosystem
In the field of big data, different database systems often use different SQL dialects. This is similar to people from different regions speaking different languages, which brings great trouble to data analysts and developers. When an enterprise needs…
6 days ago, 28 Apr 25, 7:00pm
- Apache Doris vs Elasticsearch: An In-Depth Comparative Analysis
In the field of big data analytics, Apache Doris and Elasticsearch (ES) are frequently utilized for real-time analytics and retrieval tasks. However, their design philosophies and technical focuses differ significantly. This article offers a detai…
11 days ago, 23 Apr 25, 7:00pm
- A Modern Stack for Building Scalable Systems
In software engineering, we have a lot of tools—tens or hundreds of different tools, products, and platforms. We have SQL DBs, we have NoSQL DBs with multiple subtypes, we have queues, data streaming platforms, caches, orchestrators, cloud, cloud v…
11 days ago, 23 Apr 25, 12:45pm
- Stateless vs Stateful Stream Processing With Kafka Streams and Apache Flink
In data-driven applications, the rise of stream processing has changed how we handle and act on data. While traditional databases, data lakes, and warehouses are effective for many batch-based use cases, they fall short in scenarios demanding low lat…
13 days ago, 21 Apr 25, 9:00pm
- Enhancing Avro With Semantic Metadata Using Logical Types
Apache Avro is a widely used data format that keeps things compact and efficient while making it easy to evolve schemas over time. By default, it comes with basic data types like int, long, string, and bytes. But what if you need to store something…
18 days ago, 16 Apr 25, 1:00pm
- Securing Parquet Files: Vulnerabilities, Mitigations, and Validation
Apache Parquet in Data Warehousing: Parquet files are becoming the de facto standard for columnar data storage in big data ecosystems. This file format is widely used by both sophisticated in-memory data processing frameworks like Apache Spark and mor…
19 days ago, 15 Apr 25, 7:00pm
- Emerging Data Architectures: The Future of Data Management
In my last article about data architectures, you learned about emerging data architectures like data mesh, generative AI, and quantum-based architectures, along with existing architectures like data fabric. In this article, you will continue to learn about emergin…
19 days ago, 15 Apr 25, 11:00am
- A Deep Dive into Apache Doris Indexes
Developers in the big data field know that quickly retrieving data from a vast amount of information is like searching for a specific star in the constellations — extremely challenging. But don't worry! Database indexes are our “positioning magic…
20 days ago, 14 Apr 25, 9:00pm
- How Doris + Hudi Turned the Impossible Into the Everyday
In the world of big data, there's a legend that goes like this: A data scientist, constantly worried about query performance and working late every night to optimize SQL, suddenly discovered the "perfect match" of Doris and Hudi, and immediately kick…
20 days ago, 14 Apr 25, 7:00pm
- AWS S3 Strategies for Scalable and Secure Data Lake Storage
Amazon S3 is an object storage service that offers scalability, data availability, security, and performance. S3 is the main component of your data lake, and creating buckets with the right strategy and properties can help you consume the data from t…
26 days ago, 8 Apr 25, 7:00pm
- Optimizing Data Storage With Hybrid Partitioned Tables in Oracle 19c
Effective management of large datasets is paramount for both performance and cost optimization. Oracle 19c introduces Hybrid Partitioned Tables (HPT), a feature that allows you to distribute table partitions across multiple storage tiers — from hig…
26 days ago, 8 Apr 25, 4:45pm
- Building a Cost-Effective ELK Stack for Centralized Logging
If your company has budget constraints, purchasing licensed products like Splunk for logging infrastructure may not be feasible. Fortunately, a powerful open-source alternative exists: ELK (Elasticsearch, Logstash, and Kibana). ELK offers robust logg…
30 days ago, 4 Apr 25, 11:15pm
- Integrating Apache Doris and Hudi for Data Querying and Migration
In the field of big data analytics, real-time data availability, query performance, and flexibility are crucial. With the rise of the Lakehouse architecture as a new paradigm in big data, integrating Apache Doris, a high-performance real-time analyti…
31 days ago, 3 Apr 25, 9:30pm
- Bridging OT and IT: IIoT Middleware for Edge and Cloud With Kafka and Flink
As industries continue to adopt digital transformation, the convergence of Operational Technology (OT) and Information Technology (IT) has become essential. The OT/IT Bridge is a key concept in industrial automation to connect real-time operational p…
32 days ago, 2 Apr 25, 11:00am
- Building Scalable Data Lake Using AWS
Data lakes are centralized repositories that facilitate flexible and economical data management, and businesses are using them to store, process, and analyze this data effectively. AWS offers a strong ecosystem for creating a safe and scalable data l…
33 days ago, 1 Apr 25, 5:00pm
- Doris vs Elasticsearch: A Comparison and Practical Cost Case Study
In the domain of big data real-time analytics and log search, enterprises frequently find themselves choosing between Elasticsearch and Apache Doris. Elasticsearch is well-known for its powerful full-text search and flexible aggregation capabilities.…
34 days ago, 31 Mar 25, 6:00pm
- Accurate Quantitative Analysis With ChatGPT and Azure AI Hub
LLMs are not very good at quantitative analysis. For example, when I asked ChatGPT, "Which number is bigger, 9.9 or 9.11?" it incorrectly responded with 9.11. In another example, I have an Excel file containing a large amount of quantitative data. Th…
38 days ago, 27 Mar 25, 9:00pm
- Self-Healing Data Pipelines: The Next Big Thing in Data Engineering?
I'm an enthusiastic data engineer who always looks out for various challenging problems and tries to solve them with a simple POC that everyone can relate to. Recently, I have thought about an issue that most data engineers face daily. I have set ale…
38 days ago, 27 Mar 25, 6:00pm
- A Comprehensive Guide to Protect Data, Models, and Users in the GenAI Era
Editor's Note: The following is an article written for and published in DZone's 2025 Trend Report, Generative AI: The Democratization of Intelligent Systems. Generative AI (GenAI) is transforming how organizations operate, enabling automation, cont…
38 days ago, 27 Mar 25, 10:00am
- Ensuring Data Quality With Great Expectations and Databricks
Data quality checks are critical for any production pipeline. While there are many ways to implement them, the Great Expectations library is a popular one. Great Expectations is a powerful tool for maintaining data quality by defining, managing, an…
39 days ago, 26 Mar 25, 7:00pm