Rockset, ClickHouse, Apache Druid, or Apache Pinot? Which is the best database for customer-facing analytics?

By Rogan Sage on November 20, 2023


When it comes to user-facing analytics, you need a database that supports sub-second query responses, near real-time updates, and high QPS (queries per second) under concurrent load.

In other words, while you may be able to wait a couple of minutes for results to come back for your business intelligence (BI) report to share with internal stakeholders, your end users don’t want to wait more than a few seconds to see results and take action in your platform. Hence, you need a data stack that can keep up with the required pace. 

We’ve analyzed the four best-regarded real-time analytics databases to help you answer the question: What’s the best database for customer-facing analytics? Spoiler alert: It depends on your needs, but we can help you make the call.

Let’s see what Rockset, ClickHouse, Apache Druid, and Apache Pinot have to offer.  

Considering a Headless BI architecture? Find out what headless BI is and how it could benefit your business here.

What is customer-facing analytics? (+ how it’s different from internal BI)

Customer-facing analytics is when you present data to your end-users so they can gain insights from it and take action. It can increase the real and perceived value of your product by presenting metrics in a digestible and immediate way. 

For example, real-time user-facing analytics is critical for things like live trading applications, health monitoring tools, or even delivery apps that show exactly how many minutes it’ll take to get your food. 

Data warehouses optimized for internal BI vs real-time databases for user-facing analytics 

Internal BI, on the other hand, is all about presenting business information to data analysts or managers for company use. So, in the delivery app example, internal BI would show metrics like number of daily users, best-performing restaurants, and average delivery time. 

“There’s a big difference between database types,” says Tom Gardiner, CEO at Embeddable. “Their individual qualities should be carefully considered when planning for different applications—to ensure you can deliver the desired result.” 

Internal BI tools typically use data warehouses for storing and processing large volumes of historical data. They usually process data in batches rather than in real time. A data warehouse therefore has a completely different performance profile from the kind of database you need to build customer-facing analytics (which we’ll discuss later).

Three key differentiating factors between databases for BI and real-time analytics are:

[Figure: comparison chart of key differentiating factors between databases for BI and real-time analytics]
  • Examples of BI data warehouses include: Redshift, Snowflake, and Google BigQuery
  • Examples of real-time databases for analytics include: Rockset, ClickHouse, Cassandra, Apache Druid, and Apache Pinot.

What to look for in a database for real-time analytics

In short, a good database for real-time analytics needs to process data efficiently, scale, load fast, and connect to your tech stack. Here’s each characteristic in more detail:

High-performing and cost-effective

Databases for user-facing analytics need to manage, store, process, and structure data at an affordable price, with scalability taken into account. That’s because the solution needs to keep up with customer demand: delivering fast responses to large numbers of users and handling peaks and troughs of activity from those data consumers.

The best way to achieve this is by using a database that organizes the data efficiently (so the queries can run fast). It should also be able to process large volumes of data at a fair price, keeping your overheads manageable.

Real-time loading

Real-time loading is critical because it ensures a smoother, more efficient user experience and allows people to make decisions faster, based on accurate data. 

“When it comes to customer-facing analytics, the customers we speak to almost invariably want it to load fast,” says Tom. So, look for databases that offer high queries per second (QPS), data freshness, and sub-second query responses. Optimizing for constant batch processing or data streaming will give users access to the most up-to-date version of the data at all times.
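To make those three metrics concrete, here is a minimal Python sketch that computes them from a load-test log. The `QueryRecord` type, the function names, and the way the window is measured are all assumptions made for illustration; none of them come from the databases discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    started_at: float  # seconds since the start of the test
    latency_ms: float  # time until the result came back


def p95_latency_ms(records):
    """Nearest-rank 95th percentile of query latency."""
    latencies = sorted(r.latency_ms for r in records)
    # Integer arithmetic for ceil(0.95 * n) avoids float rounding surprises.
    rank = (95 * len(latencies) + 99) // 100
    return latencies[rank - 1]


def queries_per_second(records):
    """Average QPS over the window spanned by the query start times."""
    starts = [r.started_at for r in records]
    window = max(starts) - min(starts)
    return len(records) / window if window > 0 else float(len(records))


def freshness_lag_s(event_time_s, queryable_time_s):
    """How long an event took to become visible to queries."""
    return queryable_time_s - event_time_s
```

A p95 figure tells you what the slowest 5% of your users experience, which is usually a better target for customer-facing analytics than the average.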

Integrations

You also need to make sure that your data management provider is compatible with your tech stack. You want it to connect to your data sources, embedded analytics solution, and any other analytics tooling you might be using.

If you’re using Embeddable, the toolkit for creating fast, fully bespoke user-facing analytics experiences, it integrates with ClickHouse and Druid. Embeddable will be able to connect with Rockset and Pinot as of Q1 2024.

4 best real-time analytics databases

Choosing the right database for storing and querying data can be a very impactful decision. This can ultimately affect your app’s performance and customer experience.

To help you out, we’ve compared our four favorite real-time analytics databases. Rockset, ClickHouse, Apache Druid, and Apache Pinot can all:

  • Process online analytical processing (OLAP) style queries 
  • Handle high queries per second (QPS) with very fast responses
  • Support customer-facing analytics
  • Perform much faster than data warehouses like BigQuery, Snowflake, and Redshift (but might be less cost-efficient)

Let’s take a look at each one in-depth.

1. Rockset: Best for quick setup

Diagram of how Rockset works
Bring your data to Rockset and connect it to your application for end users to ask questions and get fast responses. Source: Rockset

Rockset is a cloud-based real-time analytics database. It’s also fully managed, which means the company handles the effort of running it and takes it off of your hands. 

Things to consider about Rockset

Ingestion in Rockset works much like the other tools covered here: you can load data from streams, lakes, or operational databases, either in batches or as a stream. But the biggest benefit of Rockset is that it’s optimized for ingest latency. “This means it’s good for reliable freshness as the newly ingested data is immediately available for querying,” says Tom. 

You can query data directly from Rockset’s engine using a simple SQL query that you can turn into an API endpoint and call from your application. It’s easy to use and has the best join support of all the players. A few features that will help reduce effort for your engineers include:

  • It infers your schema from the data on ingest (no need to specify schema upfront)
  • Supports unstructured data like JSON
  • Data is indexed automatically so you don't need to determine query patterns upfront
  • It automates configuration, deployment, and data redundancy
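Schema inference on ingest can be pictured with a short Python sketch. This only illustrates the general idea under simple assumptions and is not Rockset’s implementation: the `infer_schema` helper and the sample documents are hypothetical.

```python
import json
from collections import defaultdict


def infer_schema(json_lines):
    """Map each field path to the sorted list of value types seen for it."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Recurse into nested objects, building dotted paths.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema[path].add(type(value).__name__)

    for line in json_lines:
        walk(json.loads(line), "")
    return {path: sorted(types) for path, types in schema.items()}


# Two documents with different shapes: no schema was declared upfront.
docs = [
    '{"order_id": 1, "customer": {"name": "Ada"}, "total": 19.5}',
    '{"order_id": "A-2", "customer": {"name": "Lin"}, "tags": ["rush"]}',
]
schema = infer_schema(docs)
```

Here `order_id` ends up with two observed types and `tags` only appears in one document, which is the kind of variation a schemaless ingest path has to tolerate.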

According to Rockset, it’s the fastest real-time analytics database, with 90MB/s streaming ingest, 70ms p95 query latency, and 20,000 QPS.

Companies that use Rockset

Many brands choose Rockset as their real-time analytics database. These include:

  • JetBlue
  • WindWard
  • Allianz
  • Meta
  • Sequoia
  • SkyHive
  • Seesaw
  • Clinical ink
  • Command Alkon

What we think of Rockset

We like how fast Rockset is and how easy it is on your engineering team to use. However, it’s one of the newer solutions, so it doesn’t yet have the same ecosystem or community as some of the other players. 

Something else to consider: Some users report that it’s designed for up to tens of terabytes of data, so you’ll need to prioritize what data to ingest (unlike a data warehouse, where you can ingest everything and worry about how to use it later). But maybe it makes sense to think about what data really matters!

2. ClickHouse: Best database with batch ingestion

ClickHouse logo in the center and all the logos of the different compatible data sources around the main logo
ClickHouse ingests data from different types of databases and it’s compatible with multiple programming languages and data visualization tools. Source: ClickHouse

ClickHouse is an open-source, column-oriented, distributed OLAP database that’s very easy to set up and maintain. “Because it’s columnar, it’s the best architectural approach for aggregations and for ‘sort by’ on more than one column. It also means that group by’s are very fast. It’s distributed, replication is asynchronous, and it’s OLAP, which means it’s meant for analytics,” says Tyler Hannan, Senior Director of Developer Advocacy at ClickHouse.

Things to consider about ClickHouse

ClickHouse allows you to query and also perform several million writes per second, making this a very efficient database. Users choose ClickHouse for:

  • Real-time dashboards
  • Real-time analytics
  • Business intelligence (BI)
  • Data warehouse speed layer
  • Logging and metrics
  • Machine learning (ML) and data science 

This database provider uses materialized views for performance. This means you need to know how your query patterns will look upfront to get the best performance. And once you do, it makes ClickHouse a good alternative to data warehouses. “You need to define the relevant materialized views beforehand, but then, you get fantastic performance on those queries,” says Tom. 
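The trade-off behind materialized views can be sketched in a few lines of Python. This is a conceptual toy, not ClickHouse code: the `EventsTable` class and its fields are hypothetical. It keeps a pre-aggregated total per customer up to date on every insert, so the query that was planned for is a dictionary lookup, while any other query pattern falls back to scanning the raw rows.

```python
from collections import defaultdict


class EventsTable:
    def __init__(self):
        self.rows = []
        # The "materialized view": revenue pre-aggregated per customer.
        self.revenue_by_customer = defaultdict(float)

    def insert(self, customer, country, amount):
        self.rows.append((customer, country, amount))
        # The view is maintained at write time, not at query time.
        self.revenue_by_customer[customer] += amount

    def revenue_for_customer(self, customer):
        # Planned query pattern: a single lookup in the view.
        return self.revenue_by_customer.get(customer, 0.0)

    def revenue_for_country(self, country):
        # Unplanned query pattern: a full scan of the raw rows.
        return sum(amount for _, c, amount in self.rows if c == country)
```

The per-country query still works, but its cost grows with the number of rows, which is why knowing the query patterns upfront matters.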

ClickHouse also optimizes ingest throughput by batching data ingestion. This makes it great for high-volume ingest, but doesn’t guarantee that the latest inserted data will be part of a query result. This is because data only becomes queryable once its batch has been written, and materialized views are updated with each batch. 
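The throughput-versus-freshness trade-off of batching can be shown with a small Python sketch. It is a simplified model under stated assumptions, not how ClickHouse is implemented: the `BatchedIngest` class and its `batch_size` parameter are hypothetical.

```python
class BatchedIngest:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []   # rows received but not yet queryable
        self.visible = []  # rows that queries can see
        self.flushes = 0   # number of (expensive) write operations

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One write operation makes a whole batch visible at once.
        self.visible.extend(self.buffer)
        self.buffer.clear()
        self.flushes += 1

    def count(self):
        # Queries only ever see flushed rows.
        return len(self.visible)


ingest = BatchedIngest(batch_size=3)
for row in ["a", "b", "c", "d"]:
    ingest.write(row)
```

After four writes with a batch size of three, only one flush has happened and the fourth row is still waiting in the buffer. A larger batch size means fewer write operations but a longer wait before new data shows up in results.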

Companies that use ClickHouse

These are some of the companies using ClickHouse for real-time dashboards and analytics:

  • Cloudflare
  • Microsoft
  • Contentsquare
  • OpenSea
  • highlight.io
  • Dassana
  • Disney+
  • GraphQL
  • Plausible

What we think of ClickHouse

If you want to use a database with fast batch ingestion instead of streaming data, ClickHouse might be the right option for you. This super-fast database and its columnar approach make it great for aggregations in real-time analytics.

ClickHouse is also very efficient: it runs fast on your own hardware or in the cloud and makes good use of your system resources. However, it doesn’t support full-fledged transactions. It won’t let you edit or delete already inserted data at a high rate and with low latency (you can do this in batches). 

3. Apache Druid: Best database for enterprises

Diagram of how Apache Druid works
Every time a customer makes a query, Druid looks into its data nodes that are properly indexed and gives them an answer in milliseconds. Source: Apache Druid

Apache Druid is a fast and efficient database for real-time analytics applications. It keeps query latency low even in multi-tenant setups, and it can run sub-second queries at a large scale. 

Things to consider about Apache Druid

Druid makes analytics available in real time because it supports stream-based ingestion using Apache Kafka and Amazon Kinesis APIs. Since it also allows batch ingestion from multiple sources and formats, it can access historical data in milliseconds. “It makes the latest ingested data available immediately which is great for data freshness,” says Tom.

This database specializes in ranking, group-by, counting, and time-trend queries. Since Druid is designed for big enterprises, it can handle huge amounts of data and scales very well, but it also requires a dedicated team to run it. 

Apache Druid also has many enterprise features such as allowing you to prioritize particular queries, like jumping important jobs ahead of the queue. Users also like it because it supports high concurrency, which is particularly useful for real-time analytics. 
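Query prioritization is easy to picture as a priority queue. The Python sketch below is a generic illustration, not Druid’s scheduler: the `QueryScheduler` class, its methods, and the priority values are hypothetical.

```python
import heapq
import itertools


class QueryScheduler:
    def __init__(self):
        self._heap = []
        # A running counter keeps first-in, first-out order within a priority.
        self._order = itertools.count()

    def submit(self, sql, priority=0):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), sql))

    def next_query(self):
        _, _, sql = heapq.heappop(self._heap)
        return sql


scheduler = QueryScheduler()
scheduler.submit("nightly report", priority=0)
scheduler.submit("customer dashboard", priority=10)
scheduler.submit("bulk export", priority=0)
```

Even though it was submitted second, the customer dashboard query is the first one handed to a worker; the two low-priority jobs then run in the order they arrived.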

“When you're dealing with highly concurrent environments, you really need an architecture that’s designed for that CPU efficiency to get the most performance out of the smallest hardware footprint—which is another reason why folks like to use Apache Druid,” says David Wang, VP of Product and Corporate Marketing at Imply. (Imply offers Druid as a service.)

Companies that use Apache Druid

These are some of the brands that use Druid for real-time analytics:

  • Airbnb
  • Alibaba
  • Cisco
  • Deep.BI
  • GumGum
  • Nielsen
  • Salesforce
  • Shopify
  • Verizon

What we think of Apache Druid

Druid is a great option for businesses with large user bases as it was built for multi-tenancy, and it’s super fast at high concurrency. 

What we like the most about Druid is that it lets you prioritize queries. However, some say this is a feature you wouldn’t need to use with something like ClickHouse as that aims to make all queries very fast loading (so in theory there’s no reason to prioritize one over the other).

The downside to Apache Druid is that it requires a lot of dev time to operate. So, while you’re saving money on the system, you might need to hire or spend more developer hours just to manage the database.

4. Apache Pinot: Best for indexing data

Diagram of how Apache Pinot works
Connect your SQL or NoSQL databases, data lakes, and preferred sources to Apache Pinot and query your data to gain fast responses. Source: Apache Pinot

Apache Pinot is a tabular, distributed, OLAP datastore for big data real-time analytics. It was built by the LinkedIn engineering team after they outgrew Apache Druid.

Things to consider about Apache Pinot

It ingests data from sources such as operational data, data warehouses, data lakes, and data streams. 

The biggest value behind Apache Pinot is that you can index each column, which allows it to process data at a super fast speed. “It’s like taking a pivot table and saving it to disk. So you can get this highly dimensional data with pre-computed aggregations and pull those out in what seems like supernaturally fast time,” says Tim Berglund, Developer Relations at StarTree. (StarTree.ai offers Pinot as a service.)
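To see why per-column indexes speed up filtering, consider this minimal Python sketch of an inverted index. It is an illustration of the general technique rather than Pinot’s implementation: the `build_inverted_index` helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows, column):
    """Map each value of a column to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


rows = [
    {"country": "UK", "device": "mobile", "views": 3},
    {"country": "US", "device": "mobile", "views": 5},
    {"country": "UK", "device": "desktop", "views": 7},
    {"country": "UK", "device": "mobile", "views": 2},
]
by_country = build_inverted_index(rows, "country")
by_device = build_inverted_index(rows, "device")

# Filter on two columns by intersecting row-id sets instead of scanning rows.
matching = by_country["UK"] & by_device["mobile"]
total_views = sum(rows[i]["views"] for i in matching)
```

The filter touches only the rows that match both conditions. The pre-computed aggregations Tim describes go one step further by also storing the summed values ahead of time.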

So, Apache Pinot is great for enterprises with millions of end-users because it provides fast answers to concurrent queries. Just like Druid, it supports stream and batch ingest, and you can combine the two models. 

Companies that use Apache Pinot

In the list of businesses that use Apache Pinot, you’ll find: 

  • LinkedIn
  • Zoho
  • Uber
  • Microsoft Teams
  • 7-Eleven
  • Hyundai
  • Walmart
  • Target
  • NVIDIA

What we think of Apache Pinot

Apache Pinot is best for massive companies with a huge user base (think: LinkedIn, Instagram, or TikTok). We really like its indexing and tiered storage because it makes queries run very fast. 

Pinot also offers smart data layouts which improve the database performance. The main issue users report with Pinot is that it has limited support for joins and prefers inserts to updates, but the tradeoff is that you can get incredible performance at scale.

Make a decision: Which database is right for your analytics needs?

[Figure: comparison chart including Rockset, ClickHouse, Apache Druid, and Apache Pinot]

These four database providers will help you create real-time analytics experiences for your end users. However, there are some considerations that you need to take into account. Given that all four are very fast:

  • Use Rockset if you want fast streaming ingestion and need to host your database on the cloud. 
  • Use ClickHouse if you want to ingest data in batches and are okay with a minor delay in freshness.
  • Use Apache Druid or Pinot if you have a massive user base and your developers have the time to set up and manage the database. 

If you’re planning to use Embeddable for building custom user-facing analytics and haven’t yet decided which database is best for your needs, we’re always happy to chat and give some advice.

With Embeddable you can build fully bespoke analytics using your own designs in just 10% of the time it would take you to build it from scratch. You control the frontend code, we handle the backend, and your non-developer team can use our no-code builder to make adjustments in seconds.


Frequently asked questions about best databases for analytics

What is the best database for customer-facing analytics?

The best database for customer-facing analytics depends on several factors, including the scale of data, the need for real-time insights, the complexity of the queries, and the integration with other tools and systems. These four are among the fastest and most performant ones: 

  • Rockset 
  • ClickHouse 
  • Apache Druid 
  • Apache Pinot 

What to look for in a database for real-time analytics?

Consider these factors when choosing a database for real-time analytics: 

  • Performance: Does it process data efficiently? Is it scalable? 
  • Loading time: Can it load in real time?
  • Integrations: Does it connect to my tech stack?
