Why can performing a split by position on a column cause a dataflow to load more data than expected?

Study for the Fabric Analytics Engineer Associate Test. Engage with interactive flashcards and multiple-choice questions complete with hints and explanations to solidify your understanding. Get thoroughly prepared for your certification exam!

Multiple Choice

Why can performing a split by position on a column cause a dataflow to load more data than expected?

Explanation:
Split by position typically can’t be folded back to the data source. In Power Query and Dataflows, query folding translates as many transformation steps as possible into a query that runs at the source, so only the needed rows and columns are retrieved. A split by position usually has no equivalent that the source can run, so folding stops at that step: the data is pulled into Power Query first, the split is performed there, and any filters that come after the split are applied locally instead of at the source. That is why more data is loaded than expected: the rows are retrieved before they are filtered, rather than being filtered at the source.
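The difference can be sketched outside Power Query. The Python snippet below is only an analogy, not Power Query M: it uses an in-memory SQLite table as a stand-in for the data source, and the table name `orders`, the column `order_code`, and the sample values are all hypothetical. It compares how many rows leave the source when the filter runs at the source with how many leave when every row must be pulled before the split and filter can run locally.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_code TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("US-0001",), ("US-0002",), ("DE-0003",), ("FR-0004",), ("DE-0005",)],
)

# Folded case: the filter is part of the query sent to the source,
# so only matching rows are transferred.
folded = conn.execute(
    "SELECT order_code FROM orders WHERE order_code LIKE 'DE-%'"
).fetchall()

# Non-folded case: every row is pulled first, then split by position
# (characters 0-1 and the remainder) and filtered locally.
pulled = conn.execute("SELECT order_code FROM orders").fetchall()
split_rows = [(code[:2], code[2:]) for (code,) in pulled]
filtered = [row for row in split_rows if row[0] == "DE"]

rows_transferred_folded = len(folded)
rows_transferred_unfolded = len(pulled)
print(rows_transferred_folded, rows_transferred_unfolded, len(filtered))
# prints: 2 5 2
```

Both paths end with the same two rows, but the second transfers all five rows from the source to get there. In an actual dataflow, one common way to reduce this cost is to place filters before the step that breaks folding, where the connector allows it.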

