Page tree
Skip to end of metadata
Go to start of metadata

Sorting when working with spreadsheets or tabular data is a common expectation - you'd be hard pressed to have not sorted a worksheet in Excel.

Insight features several reports that are presented as tables, but Insight does not sort the way Excel or similar applications like Google Sheets does. Why is this?

The Naivety of Excel

While Excel does a great job sorting based on a select column or series of columns, it's very naive when it comes to the relationships that might exist between those columns and rows A very important distinction to make between Insight and spreadsheet applications like Excel is that Excel and its similar derivatives are naive when it comes to data. That is, the data presented in rows and columns doesn't really describe a relationship Excel is aware of.

For example, consider this simple table:

Col1Col2Col3Col4
22America

low

23

3462UKlow21
890Americahigh11

At a glance, there are some patterns that stand out to us as humans. Col1 looks like it might correspond to a record ID because the same number never appears multiple times. Col2 strongly suggests a country name and Col3 seems like it's used as a relative value for the frequency or urgency of something. Col4 has the same patterns as Col1, so we may think it could be a reference to another ID and maybe not necessarily just a quantity.

However, when you present this table to Excel it derives no relationships or meaning to the columns and the data types of the rows. It's essentially just flat data. Nothing is stopping you from declaring =A1+B2 in an additional Col5, though you'd get a #VALUE error likely because Excel at least knows you can't add a number and a string. Though in normal practice you'd more likely put in a formula like =A1+D1 into a Column E (AKA a supposed Col5) at row 1 (E1). If you copy and pasted E1 into E2, you'd get the correct formula for =A2+D2

In Excel, relationships are generally first defined when you introduce formulas to do something useful relative to something else - usually in a different worksheet like summing up two columns and placing the result in a new column. But what about rows whose cells aren't used in formulas that are doing things like aggregation? It's just plain, old data to Excel.

This leads us to answering how Excel is able to sort rows very quickly relative to any arbitrary single column or a series of columns in order. Because Excel does not know anything about relationships between the individual cells making up each row, it is able to sort ascending/descending on practically anything. So when we select a column in Excel - for example, Col2 from the above, and sort ascending; our introduced Col5 (or E, whichever you prefer) is going to maintain the correct references to the other cells to add together. The semantics that Excel applies is simply going in alphabetic order ascending or descending.

Insight Searches For and Assigns Meaning

By comparison, Insight examines each row's columns (called fields) and their corresponding values (cells) to piece together some basic relationships at the row level. The most basic relationship Insight will derive is determining whether a column (field) is a dimension or measure.

Dimensions are columns that describe properties of the corresponding row. They're almost always a finite set of discrete values, which generally makes it easy to distinguish them from measures. Measures are what they sound like - columns whose values are quantified and can be aggregated on. Typically measures are going to be numeric, but the general rule of thumb is they typically comprise an entire spectrum of speculative values and having the same value occurring multiple times is fully valid and expected.

So let's take another look at our example data table above the way Insight would interpret it - we've annotated in italics how it would be analyzed:

Col1 

(Measure)

Col2

(Dimension)

Col3

(Dimension)

Col4

(Measure)

 

22America

low

23

3462UKlow21
890Americahigh11

That looks reasonable:

-We can perform aggregation on Col1 because all the values are numeric, so things like summing, totaling, averaging, and min/max work natively.

-Col2 has only two distinct values and neither of them are numeric. We don't think this is a unique identifier because we see America occurring twice. Therefore, this is most likely a dimension describing something about the data row.

-Col3 is similar to Col2 with only two distinct values that are not numeric. A non-specialized system without an interpreter to quantify non-numeric data isn't able to assign meaning to values like "low" and "high". So we have to consider this a dimension as well.

-Col4 is structured similarly to Col1, so it's safe to assume this is also a measure.

Let's pull the curtains back a little and adjust our data table to something we'd more likely encounter:

User ID

Country

Issue Severity

Occurrences

22America

low

23

3462UKlow21
890Americahigh11

With column names to inform us to a degree of what we're really looking at, this changes the relationships of the columns Insight should apply.

-User ID (Col1) should actually be a dimension because IDs while numeric are not intended to be aggregated (taking the average of a bunch of ID numbers produces nothing of value). Also, because they are representing ID numbers, we should expect to only see the same ID occur multiple times if the data row itself is forming a pivot.

-Country (Col2) and Issue Severity (Col3) remain as Insight inferred. The only thing to note is that Country may automatically have latitudinal and longitudinal measures automatically generated to correspond to a geopoint that exists within the country in case we wanted to ever map the values.

-Occurrences (Col4) stays as a measure because we expect to see the same values occur multiple times and semantically from the column name it suggests something we'd want to possibly aggregate.

How Relationships Affect Sorting

With the differences in how Excel and Insight initially perceive data, we can finally answer the question of how Insight sorts relative to what we generally expect with Excel.

The relationships Insight forms at the row level creates a strong likelihood of nesting, which is a relative hierarchy. You encounter logical nesting quite frequently when describing reports in natural language. For example, "Average Call Volume by Extension by Month". Such a report is really informing us of two dimensions and a measure - Extension and Month are dimensions leaving Call Volume a measure (because it's being aggregated by averaging).

In Insight, such a report would place those columns in an order like this: Month, Extension, Average Call Volume. The presentation you'd get in a table would look similarly to this (as an example):

Notice how there is a row dividing line applied at the Ext Number level and not at each individual data row. This is nesting at work. There's a relationship defined here that says "Group the measures first by Ext Number, then by Description, then by Day". Because there is no flat row layout as you'd find with Excel, there's no simple way to sort everything relative to a single arbitrary column. Attempting to disassemble the nesting relationship defined in the visualization (table) and then reconstructing it arbitrarily rearranging the nesting order could very well result in a completely different value on aggregated measures, which would be our Total Calls and Total Minutes.

In "flat tables" like below, we still cannot achieve Excel-like sorting in Insight because Insight is still assigning a nesting hierarchy. Just in this case every "row" is really a nesting group consisting of just that row. When Insight sorts, it's doing so relative to a part of the hierarchy (nesting group) and not on all the individual rows overall. If we tried sorting on this "flat table" we'd either see nothing visibly change or get a wrong sort that doesn't meet our presumed expectations. This occurs because Insight is sorting X individual nesting groups that all consist of one row each - in others, every nesting group is already sorted because it's only comparing to itself being the only row in the nesting group.

Insight can sort, but it's going to be applied to some level of the nesting group. When nesting groups don't occur, you're typically going to end up with a situation as described above with X nesting groups of one row each.

With our call volume example above, we can sort this based on the Ext Number in respect to the Total Calls or Total Minutes. This would cause each nesting group defined by the Ext Number to sort within itself. The result we'd get sorting ascending on Ext Number by Total Calls for Ext Number 279 would be:

DayTotal CallsTotal Minutes
October 7, 201910.17
October 4, 201921.85
October 1, 2019211.52
October 2, 2019418.73
October 3, 201956.25

We'd get a different sorting order for the same Extension if we'd instead sorted it in respect to the Total Minutes.

I Understand but I Still Need to Sort Excel's Way

If you still need to sort tabular data the way Excel would we recommend you create a Crosstab (Excel) export download of the data then perform the sorting in Excel itself. However, do be aware that the Crosstab export will only contain the data as displayed in the report currently and will not retrieve all the available data. Before you create a Crosstab export be sure you've set data filters appropriately - particularly in regard to any grouping based on date/time intervals as measures themselves will generally always be aggregated. 

  • No labels