In the first part of this series, we explored the essentials of Power BI’s DirectQuery mode. We discussed its core components and initial steps towards optimization. We touched on the topic of formula engine and the storage engine in Power BI, the importance of a well-structured data source, and the basics of optimizing Power BI DirectQuery performance. Furthermore, we explained where potential bottlenecks in performance can occur
Moving forward in Part 2, we are going to expand on these ideas. Focusing on practical ways to enhance your DirectQuery performance in Power BI. This includes understanding the role of Power BI data gateway, exploring options for data streaming, and addressing key aspects of data source query performance. We’ll also look at how thoughtful data modeling, especially using a star schema, and careful application of Power Query and DAX transformations, can make a big difference in your Power BI implementations.
When optimizing Power BI DirectQuery performance, the Power BI on-premise data gateway could have a big part.
A Power BI data gateway is needed whenever you are using on-premise data sources (and in some cases cloud data sources within a private network). When using an on-premise data source there are some options available when it comes to performance monitoring and performance tuning. These can be helpful in cases where you have identified a data gateway as a bottleneck causing performance issues in your DirectQuery semantic model.
In cases where default logs do not contain enough information, you can enable additional logging in the data gateway application. Keep in mind, this will cause the logs to grow significantly in size. It’s recommended that you turn this off whenever you are done troubleshooting and not keep it enabled by default.
Power BI on-premise data gateway is a Microsoft product that is used to connect Power BI reports and semantic models to underlying non-cloud data sources. It acts as a secure bridge between the underlying on-premise data source and the Power BI service.
Optimize performance by streaming data through the data gateway
By default, the data gateway uses a spooling technique of the data before returning the result to the Power BI storage engine. This behavior can potentially cause slower performance for queries in DirectQuery mode.
The setting can be overwritten, causing data to be streamed instead of spooled, potentially causing better performance.
This is an advanced setting that is only accessible from the server where the data gateway is installed. It is not something that you should jump at as your first step to Power BI DirectQuery optimization. However, it is something that could be of use and should be considered as one of many tools in your toolbox.
The spooling technique used in Power BI on-premise data gateway means that data is "cached" on the server where the on-premise data gateway is installed before continuing to be uploaded to the Power BI service.
Data source query performance
Even if you follow all performance tips in this article, you might still have a data source that acts a significant bottleneck for the entire solution, or parts of it. This is something simply out of reach for Power BI and is something that you need to solve on the data source side. A crucial aspect of optimizing Power BI DirectQuery performance is understanding your data source query performance.
This is simply an extra cost of building Power BI DirectQuery solutions. You need to manage and optimize two different engines to get good performance. Both Power BI and your data source.
You might find that altering the type of queries sent from Power BI to your underlying data source is limited. So you will have to focus on minimizing bottlenecks at your source. In many cases this means analyzing query plans, optimizing indexes and materialize your data in a way that is best suited for the type of queries sent from Power BI.
A crucial step towards troubleshooting and optimizing Power BI DirectQuery performance is to understand where any potential bottlenecks lie in your data pipeline. There can be potential issues anywhere between your data source and your final report. Data gateway, network latency, Power Query transformations, data modeling issues, complex DAX logic just to name a couple areas where you might want to dive into.
If you followed the first part in this series you saw that there is a great way to capture the underlying query that is generated by Power BI and sent to your underlying data source. You can then use this query and run it in external applications, such as SQL Server Management Studio, to investigate further.
You can further troubleshoot and investigate DAX queries using DAX Studio. DAX Studio offers some great capabilities such as offline DAX authoring, server timings and query plans. You can find more information about how in part 1 and part 2 of my DAX studio series.
Understanding different concepts around data modeling is important in any BI solution, and Power BI is no different.
It’s recommended that you use star schema modeling in Power BI. Not only will it increase the performance of the model. It will increase the simplicity and minimize complexity of DAX code. Furthermore, it will simplify the type of filtering and relationships needed in your model to get the correct results.
There are also other benefits of using a star schema modeling technique. For example, you can create a model that supports multiple fact tables with different grains, which can be filtered by a single dimension. By using a star schema model, you will also avoid getting into issues such as auto exist in Power BI.
Star schema modeling or Dimensional modeling is a combination of methodologies and modeling techniques aimed towards building data models for business intelligence, data analysis and reporting purposes. Kimball modeling, founded by Ralph Kimball, has been around for decades and has evolved into one, if not the best practice way to build analytical and reporting friendly data models. Power BI is a model based tool. This means that there is a dataset / semantic model beneath the reporting layer. Therefore, Power BI allows for importing star schemas and keeping the model as is, including relationships between tables. Moreover, Power BI supports importing or creating multiple star schemas and multiple fact tables in the same dataset, called fact constellation schemas or a galaxy schemas. This essentially means having multiple fact tables that connects (or can connect) to the same dimension.
When it comes to Power BI DirectQuery semantic models, the shape of your data and more specifically the shape of your data model, will greatly impact the performance of your data model. All requests in a report using a DirectQuery semantic model are forwarded as queries to your underlying data source. The type of request and the amount of data returned will directly impact on the performance of your model.
Therefore, not having your data model in good shape will cause Power BI to be much less forgiving in terms of performance. More specifically, this means not using a star schema model.
Interested about the purpose of dimensional modeling and why it’s important in Power BI?
I’ve written an in-depth article about exactly this topic!
Within the scope of Power BI, data can be transformed using Power Query or DAX.
Generally, Data should be transformed as far upstream as possible, and as far downstream as necessary. The main recommendation in Power BI is to do transformations in Power Query whenever possible. Transformations in DAX should only be done when absolutely necessary. Following the best practice here will allow you to limit technical debt, increase reusability of your code as well as increase the performance of your model.
Power Query is both a data preparation engine and a data transformation query language. It's used in multiple tools e.g. Power BI, Excel, Azure Data Factory, Data Factory in Microsoft Fabric and Dynamics 365.
DAX, Data Analysis Expressions, is a functional language used in Analysis Services Tabular and Power BI. It is used to create definitions of calculations on top of a dataset. It has a similar syntax as expressions used in Excel. However, DAX operates over tables and columns, where Excel operates over cells. Even though the language has it similarities they do not function the same way.
Do you have data stored in a database or a central data warehouse or lakehouse? That’s even better. If possible, you should do the transformations there!
Unfortunately, even if you are consuming data from a database or a data warehouse, the content could be managed by other teams in the organization. Updating or adding new logic to the source can be subject to long and complicated processes. In some cases not even doable in a reasonable timeframe. In those cases, you will most likely need to use Power Query in Power BI to create such a model.
Skill and technical scope
One other thing to consider is the scope of the solution as well as the technical skillset of the developer(s).
Is the Power BI solution an enterprise scale asset which has huge demands on scalability, maintenance and performance?
Or perhaps built for a small niche case with small amounts of data, where lack of reusability and performance is not considered an issue? If it’s the latter, using for example Calculated columns in DAX will not be an issue, even if I would not recommend it. Are you able to build your logic any place more upstream than DAX? Then do that.
In many scenarios you can manage just fine even if you do not have the appropriate skillset in order to follow best practices. However, it’s important to understand that whenever you deviate from these best practices, there will be a time where the consequences eventually catches up with you. And you should prepare yourself for that.
Generally it boils down to avoiding technical debt and increasing performance as well as scalability and simplifying maintenance. And making sure that you use the correct tool for the job!
As mentioned in the earlier section, data transformations should always be done as far upstream as possible. The same goes for logic in DAX measures.
Following the best practices around data modeling in Power BI, your DAX logic will be much simpler to write. Ideally, you should only need to make simple calculations such as sum, average, max, min, count and so on. Of course, it all depends on the shape of your Power BI semantic model and the complexity of your requirements. There are cases where more complex logic is needed, for example time intelligence.
For example, if you have a fact table for sales you probably have a column containing the sales amount. In this case, a simple DAX measure could look something like the following
Perhaps you have a fact table for product ratings and a column containing the rating score of the product. Let’s say you want to display the average rating for a product. In this case, a DAX measure could look something like the following
As mentioned earlier, it’s recommended that you use star schema model in Power BI. One of the reasons is that your DAX logic will be much less complex in order to retrieve the correct calculations. By using a proper data model you will also reduce the risk of having too complex relationships in your model. These complex relationships can cause errors like circular dependencies or the need for bi-directional filters.
Do you not have the possibility to consume a pre-made star schema model in your Power BI solution? Try and create such a model instead using the modeling tools available in Power BI, such as Power Query. By doing this, you will have a much better and easier experience with DAX.
When it comes to queries generated by Power BI, visuals within a single report page is isolated from each other. This means that every visual on a report page will send at least one query to your underlying data source. Many times a single visual can also send multiple queries to load different parts or areas of a single visual.
This means that more visuals on a single report page results in more queries sent to your underlying data source.
Therefore, you should carefully plan your report layout and try to limit the number of visuals on a single page. If possible, try and split your reports into multiple pages or combine data from multiple visuals into single visuals.
For example, let’s say you want to visualize multiple measures in Power BI using the card visual. Try and combine these using a multi row card, the new card visual or a table visual instead. It will effectively mean better performance of the execution of the underlying queries.
This has to to with Power BI’s underlying architecture and how it translates DAX into queries sent to your data source.
A part from limiting the number of visuals on a page, it’s also wise to keep down the number of other graphical objects, such as images and shapes. These can also add to the load times on your report which in the end affects the experience of your end users.
There are some built in features in Power BI that can help you in optimizing your DirectQuery reports. For example by minimizing the numer of unnecessary queries sent to you underlying data source. This can also help you increase the user experience of your reports towards your end users.
Optimizing report for query reduction
When users are interacting your report, every action is triggering new queries to be sent to your underlying data source. One of these actions are cross-highlighting and cross-filtering. These can be turned off in Power BI desktop. Doing that can help in reducing queries sent to your underlying data source and potentially increase performance if your report.
There are multiple ways to change this setting. One way is to open the Optimize tab in Power BI desktop and change the Optimization preset to Query reduction. This will effectively turn off cross-highlighting and cross-filtering in your report as well as add a Apply button to your filters.
After enabling the Query reduction preset, you will be prompted with the following window.
Another way to change this setting is by opening Options in Power BI desktop and enabling Disabling cross highlighting/filtering by default in the Query reduction section.
This setting will also add a Apply button on your filter pane in Power BI. By adding this button, choosing filters in individual filters will not trigger queries to your underlying source. First, you need to press the Apply button. Once that is done, queries will be sent based on the selections made in the filters.
Another option is to alter the functionality of your report filters so that they do not automatically trigger a filter for every interaction. Instead you need to press a button that applies the selection of your filters.
Normally, every selection in a filter triggers a query to your underlying data source. Having lot’s of different visuals on a single page with many users simultaneously interacting with a report could have a huge negative impact on your performance.
Applying this ‘apply all’ button will effectively limit the number of queries sent to your underlying data source. It will potentially improve the performance of your reports and also the experience of your end users.
To add this button, navigate to the Optimize tab and select Apply all slicers button
In this second part of the series, we focused on topics to optimizing Power BI DirectQuery performance. Key topics included the effective use of the Power BI data gateway for managing data sources, optimizing data streaming through the gateway, and the critical role of data source query performance.
We emphasized the benefits of star schema data modeling in Power BI for improving performance and simplifying complexity. The guide also highlighted the importance of sensible use of Power Query and DAX transformations, recommending that transformations be performed as far upstream as possible.
Additionally, we provided insights into optimizing the report experience, discussing how to manage visuals and utilize Power BI’s built-in tools for query reduction and efficient filtering.
Each point presented here is a continuation of the foundational concepts from Part 1, aimed at equipping you with a robust toolkit for optimizing Power BI DirectQuery semantic models in Power BI across various projects.
To summarize, here are some key points that you need to consider
- Try and find the performance bottleneck as a first step in your performance troubleshooting
- Make sure that you understand your data source query performance and how it affects your Power BI performance
- Make sure that you adopt star schema modeling in Power BI
- Limit data transformations in Power BI. Always strive to perform data transformations as far upstream as possible
- Don’t overcomplicate your DAX measures. Try to keep things simple. If you can’t, then it’s likely you need to revise your data model.
- Limit number of visuals in a single Power BI report
Thank you for reading
Want to continue learning? Jump over to part 3 of this series, where we cover advanced topics such as limited and regular relationships, combining data from multiple data islands, changing query join types, DAX Fusion and parallelism. You do not want to miss that!
If you enjoyed this article or learned something from it, please share it with others or comment below.
What is your experience using DirectQuery? Have you encountered any specific challenges and how did you tackle them? Or do you have any other tips?
Please share below!