Comparing Import vs DirectQuery Storage Modes in Power BI

When building semantic models in Power BI, you have different options on how to connect and store the data in your models. There are two main Power BI storage modes to choose from when querying data from your data source. These are import mode and DirectQuery mode. Import mode in Power BI allows you to import the data in your local semantic model and use that as a source for your interactive reports. DirectQuery in Power BI on the other hand, involves querying the data source directly. In this post I will go through the functionality between Import mode vs DirectQuery mode and give some considerations on when to use which mode when connecting to your data from Power BI.

import mode and direct query mode in get data dialog in Power BI

Power BI Import mode

Import mode stores data inside of Power BI’s Vertipaq engine. Vertipaq is an in-memory columnar database, which is used as storage and query engine for cached data in Power BI.

Vertipaq is highly performant and optimized for storing and querying analytical data. It is therefore often considered as the preferred choice of storage mode. Data get stored in Vertipaq whenever data is refreshed from the source, which can be triggered manually or on a schedule (as well as programmatically). Vertipaq will then query your cached data storage whenever data needs to be presented in Power BI. Data is queried from the cache whenever an interaction from Power BI is made. This means that the data is only as up to date as it is in the cache. An interaction could be things like filtering a slicer, grouping values, loading a report page or toggling a report bookmark.

Power BI import mode is also the preferred choice of storage mode since it will benefit from Power BI’s very performant Vertipaq engine, which is highly optimized for reading analytical datasets.

Power BI DirectQuery

DirectQuery mode stores no data in Power BI, except for the cache for viewing the data in the Power BI visuals. Data is queried from the source whenever an interaction from Power BI is made. This means that the data is as up to date as it is in the data source. An interaction could be like filtering a slicer, grouping values, loading a report page or toggling a report bookmark.

Powe BI DirectQuery semantic models will query your data source whenever the data is needed for presentation. The performance of Power BI DirectQuery semantic models is therefore highly dependent on the performance of the data source system and the type of query that is executed. The performance can depend on a number of things like data source location, size of data, source system network setup and settings and source system performance. Because the data is brought in from an external data source, Direct Query will most likely offer less performance than using Import that queries data from Power BI’s vertipaq engine.

DirectQuery is only supported for a limited number of data sources. You can see the up to date list of supported data sources on Microsoft learn Power BI data sources

There are also some technical limitations you should be aware of when choosing to use DirectQuery mode. For example, the size or number of rows of data returned from the source query. There are also limitations in both Power Query and DAX. You can find a full list of all limitations of DirectQuery here

Composite model

Composite model allows for both Import datasets and DirectQuery datasets in the same dataset.

A Power BI semantic model becomes a composite model once you have either multiple DirectQuery datasets or a dataset consisting of both Import datasets and DirectQuery datasets. These datasets can have relationships between them and work together in your Power BI dataset and reports. The same characteristics of Import and DirectQuery datasets apply.

Hybrid tables

Hybrid tables allows you to mix scheduled refresh through import mode together with DirectQuery data, within the same table. The feature is used in combination with Power BI incremental refresh and currently requires Premium.

Hybrid tables allows you to get data beyond the incremental refresh setup by creating a DirectQuery partition that will include all data with date/time later than the incremental refresh period.

Live connection

Live connection is a method where you can create reports on top of remote semantic models. It’s essentially a thin report that has no underlying semantic model. When using Live connection, you create reports on top of already existing semantic models in Power BI, Azure Analysis Services (AAS) or SQL Server Analysis Services (SSAS). Because you are not creating a new dataset, data is not stored inside of your local (thin) Power BI report. When using a live connection, you are limited to only using one connection to a remote dataset. This means that you can’t combine a live connection with other data sources.

Power BI Import vs DirectQuery

Import mode is the general recommended way to go when working with Power BI. However, there are scenarios or specific requirements that requires DirectQuery usage.

Look into use DirectQuery mode when

  • You have such a large dataset that importing data is not and option. This can also be combined with import mode for aggregated data.
  • You have sensitive data that is not allowed to be stored in Power BI SaaS
  • You have data that needs to be updated (near) real time or more often than what is possible using scheduled refresh

It’s important to remember that your use case may differ from the points made in this post. There could be other very valid reasons to why you need to use DirectQuery. But hopefully this will give you a good starting point on choosing if and when you need to use DirectQuery for your datasets.

Thank you for reading

If you enjoyed this article or learned something from it, please share it with others or comment below.

Links

Power BI Composite models
Power BI Hybrid tables
Power BI data sources
Power BI semantic model modes

Leave a Comment