When building datasets in Power BI, you have different options on how to connect and store the data in your model. There are two main modes to choose from when querying data from sources. These are import mode and direct query mode. Import mode allows you to import the data in your local dataset, and use that as a source for your interactive reports. Direct query on the other hand, involves querying the data source directly. In this post I will go through the functionality between Import vs Direct query and give some considerations on when to use which mode when connecting to your data from Power BI.
Power BI Import mode
Import mode stores data inside of Power BI’s Vertipaq engine. Vertipaq is an in-memory columnar database, which is used as storage and query engine for cached data in Power BI.
Vertipaq is highly performant and optimized for storing and querying analytical data. It is therefore often considered as the preferred choice of storage mode. Data get stored in Vertipaq whenever data is refreshed from the source, which can be triggered manually or on a schedule (as well as programmatically). Vertipaq will then query your cached data storage whenever data needs to be presented in Power BI. Data is queried from the cache whenever an interaction from Power BI is made. This means that the data is only as up to date as it is in the cache. An interaction could be things like filtering a slicer, grouping values, loading a report page or toggling a report bookmark.
Power BI import mode is also the preferred choice of storage mode since it will benefit from Power BI’s very performant Vertipaq engine, which is highly optimized for reading analytical datasets.
Power BI Direct Query
Direct Query mode stores no data in Power BI, except for the cache for viewing the data in the Power BI visuals. Data is queried from the source whenever an interaction from Power BI is made. This means that the data is as up to date as it is in the data source. An interaction could be like filtering a slicer, grouping values, loading a report page or toggling a report bookmark.
Direct Query will query your data source whenever the data is needed for presentation. The performance of Direct Query is therefore highly dependent on the performance of the data source system and the type of query that is executed. The performance can depend on a number of things like datasource location, size of data, source system network setup and settings and source system performance. Because the data is brought in from an external data source, Direct Query will most likely offer less performance than using Import that queries data from Power BI’s vertipaq engine.
Direct Query is only supported for a limited number of data sources. You can see the up to date list of supported data sources on Microsoft learn Power BI data sources
There are also some technical limitations you should be aware of when choosing to use Direct query mode. For example, the size or number of rows of data returned from the source query. There are also limitations in both Power Query and DAX. You can find a full list of all limitations of Direct Query here
Composite model allows for both Import datasets and Direct Query datasets in the same dataset.
A Power BI dataset becomes a composite model once you have either multiple Direct Query datasets or a model consisting of both Import datasets and Direct Query datasets. These datasets can have relationships between them and work together in your Power BI dataset and reports. The same characteristics of Import and Direct query datasets apply.
Hybrid tables allows you to mix scheduled refresh through import mode together with direct query data, within the same table. The feature is used in combination with Power BI incremental refresh and currently requires Premium.
Hybrid tables allows you to get data beyond the incremental refresh setup by creating a direct query partition that will include all data with date/time later than the incremental refresh period.
Live connection is a method where you can create reports on top of remote datasets. It’s essentially a thin report that has no underlying dataset. When using Live connection, you create reports on top of already existing datasets in Power BI, Azure Analysis Services (AAS) or SQL Server Analysis Services (SSAS). Because you are not creating a new dataset, data is not stored inside of your local (thin) Power BI report. When using a live connection, you are limited to only using one connection to a remote dataset. This means that you can’t combine a live connection with other data sources.
Power BI Import vs Direct Query
Import mode is the general recommended way to go when working with Power BI. However, there are scenarios or specific requirements that requires direct query usage.
Look into use Direct Query mode when
- You have such a large dataset that importing data is not and option. This can also be combined with import mode for aggregated data.
- You have sensitive data that is not allowed to be stored in Power BI SaaS
- You have data that needs to be updated (near) real time or more often than what is possible using scheduled refresh
It’s important to remember that your use case may differ from the points made in this post. There could be other very valid reasons to why you need to use direct query. But hopefully this will give you a good starting point on choosing if and when you need to use direct query for your datasets.
Thank you for reading
I hope you enjoyed this article as much as I did writing it. If you did, please share it with others or comment below.
Power BI Composite models
Power BI Hybrid tables
Power BI data sources
Power BI dataset modes