Working with Power BI DirectQuery datasets, you might sometimes find that performance issues can feel inevitable. Especially with larger datasets where many moving parts can cause potential complications.
In this series, I will guide you towards mastering DirectQuery in Power BI, starting with an overview to help you understand the aspects of DirectQuery performance and identifying performance issues.
In this initial article, I will present the foundation of understanding the DirectQuery behavior and the components it involves. My aim is to equip you with the knowledge to pinpoint where a problem originates, giving you the resources for effective troubleshooting and optimizing your Power BI DirectQuery dataset.
This post aims towards understanding the aspects of Power BI DirectQuery performance. If you are interested in learning more about DirectQuery you can find a comparison between Power BI storage modes here.
Understanding Power BI DirectQuery performance
There are many things that can cause performance issues in Power BI DirectQuery datasets. There are a lot of involved parts that can affect the performance of any model. In particular large datasets with larger data volumes.
Before diving in to DirectQuery specific topics, I want to introduce you to the two different types of engines behind the scenes in Power BI. These are called formula engine and storage engine.
The formula engine is responsible for generating request sent to the storage engine. It also handles data processing and calculations on top of data retrieved by the storage engine. The formula engine by itself can’t retrieve data from a data source.
The storage engine is responsible for retrieving data from an underlying data storage. The Import mode storage engine is called Vertipaq. The Vertipaq engine stores the data cached in-memory in your local dataset. However, when using DirectQuery mode, the data is not stored locally but is instead queried at runtime from your original data source.
“Vertipaq is a database engine used in e.g. Excel Pivot Table, SQL Server Analysis Services Tabular, Azure Analysis Services Tabular and Power BI. It’s an in-memory columnar storage database with capabilities to highly compress data with high read efficiency over large datasets. The Vertipaq storage engine is responsible for both storing data and querying the database when using Import mode in Power BI”
When it comes to DirectQuery datasets, the DirectQuery storage engine immediately forwards any request directly to the original data source. This means the root of any performance issues might lie either on the Power BI side, in your original data source, or both. Therefore, the first step in troubleshooting DirectQuery performance issues is to identify where the issues originates from.
Whenever a DAX calculation occurs, a DirectQuery data source query is executed. Power BI converts this into a query that your original data source can understand, for example SQL. Power BI reports with many visuals send more queries to your data source. Therefore, optimizing the layout of a report can be af wise move in order to improve performance.
“DAX, Data Analysis Expressions, is a functional language used in Analysis Services Tabular and Power BI. It is used to create definitions of calculations on top of a dataset. It has a similar syntax as expressions used in Excel. However, DAX operates over tables and columns, where Excel operates over cells. Even though the language has it similarities they do not function the same way.”
The content of the generated query depends on various aspects, including the type of data requested and the specific tables and relationships included from your dataset. Once your data source processes the query and returns the dataset, Power BI performs calculations on the results. These calculations might include things like basic mathematical operations and summarizations, all depending on the requested type of calculation.
Get that query!
A poorly performing underlying data source will likely have a negative affect on the performance of your Power BI DirectQuery dataset. Therefore, it’s important to understand how well your original data source handles the queries sent from Power BI.
You have several options to view a query generated from Power BI. It all depends on the type of data source that you are using for your data. The most common is using tracing applications which captures queries in real time, or digging through logs. Luckily, Power BI Desktop offers a straightforward method to view queries generated for a single visual.
By using the Performance Analyzer in Power BI Desktop, there is a function which allows you to copy the generated query for a single visual. You can find the Performance Analyzer in the Optimize ribbon in Power BI Desktop.
Using the Performance analyzer in Power BI allow you to copy and paste an underlying query in an external editor of choice to investigate it further. Even better, you have the option to run it against the original data source for yourself in any application you like.
To get the query from Performance analyzer, make sure that you generate a query by refreshing a visual. Either by choosing a filter or choosing “Refresh visual” from the Performance analyzer pane. Once your visual has refreshed. Expand the visual object in the Performance analyzer pane and select “Copy query”
When you copy the query, it goes to your clipboard and you can paste it into any editor you choose. Lets take a look at an example. Below, you can see a query pasted in a SSMS (SQL Server Management Studio) query window.
“SQL Server Management Studio (SSMS) offers a unified management environment for any SQL infrastructure, from SQL Server to Azure SQL Database. SSMS includes tools for configuring, monitoring, and administering SQL Server instances and databases. SSMS can be used to deploy, monitor, and upgrade the data-tier components utilized by your applications, as well as to create queries and scripts.”
The first part of the query shows the generated DAX statement which is used to make calculations on top of the dataset in Power BI, which is data retrieved from your original data source. The second part of the query is the actual generated DirectQuery query that was sent to your underlying data source. In this case the source is a SQL Server database and therefore the query generated is a T-SQL query.
If you are able to run your query against your original datasource directly, e.g. through SSMS or any other application, you will see that the query results are similar as the data used in your Power BI report visual.
When you find slow-running DAX queries not caused by poor performance of underlying data sources, you might want to troubleshoot them to pinpoint the exact issues in your code. However, doing this kind of troubleshooting directly in Power BI can be a tedious task, depending on the problem and the amount of code. I’m sure you have encountered this before. Luckily there are other great tools available that can help you, such as DAX Studio. To learn more, check out how to troubleshoot a DAX query from a Power BI visual.
Power BI on-premise gateway
If your data source is located on-premise, then using a Power BI on-premises data gateway is a requirement for a Power BI DirectQuery dataset to work while hosted in the Power BI service.
The performance of the data gateway also depends on the server or computer which it is hosted on. Hardware such as CPU, RAM and type of disk will have an impact on the performance of the data gateway.
On top of this, there are a lot of points to consider when installing and configuring a gateway. For example, its not recommended to install it on a machine which is using a wireless network. Installing it on a virtualized layer can also affect performance, so it’s not recommended.
Furthermore, you must understand that even if you follow the recommended hardware setup for your machine, other installed applications could consume resources of your machine and therefore impacting the performance of your on-premise data gateway.
1 million row limitation
There’s a fixed limit of 1 million rows for any query being sent to an underlying DirectQuery data source. This limitation exists to prevent queries that runs for too long taking up too much resources. In most practical scenarios, this will never cause an issue. However, depending on the optimization techniques that Power BI applies on your DirectQuery queries, e.g. query folding, and filters applied in the report, there’s a risk that queries sent exceeds the 1 million row limit.
“Query folding is a technique used by Power Query to generate a single query statement on top of source data. Power Query strives to transform any applied transformations back into a query that can be sent back to the underlying data source, without further processing in the Power Query engine.”
Data transformations in Power BI
Practically every action in a Power BI DirectQuery report generates a query to the underlying data source. Separately, it’s essential to remember that Power BI evaluates any data transformations at the time the query is generated. Any transformation made over existing columns in Power Query has to be folded into a query that can be sent back to your original data source. Because of this, there is a limitation in the number of transformations available in Power Query as well as functions in DAX. This is because some functions cannot be translated back into a query or would be too heavy to be calculated during query time.
DirectQuery datasets supports the use of calculated columns. However, Power BI evaluates them at query time. Unlike Import datasets, which store these values locally in the model, DirectQuery datasets do not store the values locally and must calculate them each time a query is sent to your data source. Depending on the type of data and the number of calculated columns, this can have implications on the performance of your model.
Improving data source performance
Have you identified performance issues in your original data source? Then you probably need to focus on improving the performance in your data source first. Remember, if your bottleneck lies before data gets into Power BI, there’s little you can do to mitigate any performance issues on the Power BI side.
Fixing the performance of your data source depends on a variety of things. It’s impossible to list all the optimization techniques here. There are however, a few general things to look for e.g. checking the source data structure and data model. For example, are you querying views which have underlying joins or filters with poor performance? You can also check available indexes in your source tables and create new ones where appropriate. Missing indexes or poor maintenance of existing ones can cause poor performance on any query.
Other areas affecting performance
Other parts can also affect the performance of a Power BI DirectQuery dataset. For example, there can be problems with high latency when moving data between your data source and Power BI. This can depend on the location of your original data source. Keep in mind there’s a difference running Power BI hosted in the service and using Power BI desktop locally. Power BI sends all DirectQuery queries from the location of the Power BI dataset. Datasets stored in the Power BI service resides in an Azure region, which might be far from the location of your underlying data source.
There could also be network configurations or a proxy that have negative impact on performance when moving data between your data source and Power BI dataset.
Summary
In this article, we looked at how Power BI’s DirectQuery mode works. We covered the fundamentals, such as the roles of different engines, understanding specific components of the on-premise data gateway, how network and hardware configurations affect performance, and the importance of a reliable data source.
Understanding how DirectQuery works is useful for more than just resolving technical concerns. It ensures that everyone who uses Power BI will have smoother, faster reports. Remember that it all starts with a good understanding of how DirectQuery works, from the data source to your Power BI report.
As we continue through this series, we’ll build on what you’ve learned here by providing practical suggestions to help you get the most out of Power BI DirectQuery datasets.
In part 2 of this series, we discuss practical ways to enhance your DirectQuery performance in Power BI. This includes understanding the role of Power BI data gateway, exploring options for data streaming, and addressing key aspects of data source query performance. We’ll also look at how thoughtful data modeling, especially using a star schema, and careful application of Power Query and DAX transformations, can make a big difference in your Power BI implementations.
In part 3 of this series, we cover advanced topics such as limited and regular relationships, combining data from multiple data islands, changing query join types, DAX Fusion and parallelism.
Thank you for reading
If you enjoyed this article or learned something from it, please share it with others or comment below.
What is your experience using DirectQuery? Do you have any tips? Please share them below!