Variable Interactions in Query Driven Visualization
Luke Gosink, John C. Anderson, Wes Bethel, and Ken Joy
Obstacles hindering scientific research may be broadly categorized into two separate but overlapping groups. The first category, concerned mainly with issues of throughput, includes the challenges inherent in the efficient management and visualization of large-scale datasets. The second category includes difficulties innate to the task of gaining insight from datasets of high complexity.
Query-driven visualization (QDV) is well suited for performing analysis and visualization on datasets that are both large and highly complex. Tools like FastBit leverage highly efficient (in terms of both speed and compression) data management techniques to rapidly identify and visualize regions of interest within a dataset. These regions are user specified and take the form of boolean range queries. Because such regions tend to be smaller subsets of the original dataset, the time and effort spent on analysis, visualization, and interpretation are significantly reduced.
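As a rough illustration of what a boolean range query selects, the following minimal sketch scans a small in-memory table and keeps the records that satisfy every range condition. It is not FastBit's API and does not use FastBit's indexing; the function name `range_query`, the table layout, and the toy values are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Column-oriented toy table: variable name -> one value per record.
Table = Dict[str, List[float]]
# Query: variable name -> inclusive (low, high) range; ranges are AND-ed.
RangeQuery = Dict[str, Tuple[float, float]]


def range_query(table: Table, query: RangeQuery) -> List[int]:
    """Return indices of records whose values lie inside every range.

    This is a plain linear scan, used only to show the semantics of the
    query; it makes no attempt to be fast.
    """
    n_records = len(next(iter(table.values())))
    hits = []
    for i in range(n_records):
        if all(lo <= table[var][i] <= hi for var, (lo, hi) in query.items()):
            hits.append(i)
    return hits


# Hypothetical data: five records with two made-up variables.
table = {
    "temperature": [300.0, 900.0, 1500.0, 1800.0, 2100.0],
    "oxygen": [0.21, 0.18, 0.09, 0.04, 0.01],
}
# (temperature >= 1000) AND (oxygen <= 0.10), written as closed ranges.
query = {"temperature": (1000.0, float("inf")), "oxygen": (0.0, 0.10)}
solution = range_query(table, query)
print(solution)
```

The returned index list plays the role of the query's solution set: a (usually much smaller) subset of records on which later analysis operates.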
The information provided by the generated solution of a boolean range query helps to define spatial regions where specifically defined events occur. Beyond indicating these regions, however, the solution is a black box providing no additional information. Origins and directions of entropy change for chemical reactions, gradient directions and locations of flame fronts, vortex cores, etc. are all examples of phenomena that are broadly characterizable through boolean range queries, but little is understood of the interactions and behavior that lie in the domain of these characterizations.
In such phenomena, the behavioral trends between variables, or groups of variables, provide more insight than the traits of individual variables. Thus, the challenge is to identify these behavioral trends and use them to construct coherent and meaningful visualizations that convey information about the phenomenon of interest.
The novel contributions of this work are new techniques that extend the capabilities of QDV by providing intuitive insight into:
- how relationships between variables interact to generate the phenomenon of interest in complex datasets, and
- what role other variables play in creating/altering these interactions.
We utilize the cumulative distribution functions (CDFs) generated by the solution of a given query. The CDF of a query is an n-dimensional field where each dimension corresponds to one of the n variables in the query. Each of the n fields indicates the population of data from a given variable that satisfies a particular query. Succinctly, the CDF of the query is an aggregate of 1-D histograms (one for each variable).
In QDV, the solution set for a query is a list of records that satisfy a set of variable-dependent, range-restrictive conditions. The CDFs for these variables are formed by integrating over the solution space and accumulating the values given by the respective functional mappings independently as a histogram. Examining the CDFs of the query's variables reveals initial information about statistical regions of interest.
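The "aggregate of 1-D histograms" described above can be sketched as follows: for each variable, count how many records in the solution set fall into each bin. This is a minimal sketch under assumptions, not the authors' implementation; the function name `query_histograms`, the equal-width binning, and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[float]]
# Per-variable binning: (low edge, high edge, number of equal-width bins).
Binning = Dict[str, Tuple[float, float, int]]


def query_histograms(table: Table, solution: List[int],
                     bins: Binning) -> Dict[str, List[int]]:
    """Build one 1-D histogram per variable over the solution records."""
    histograms = {}
    for var, (lo, hi, n_bins) in bins.items():
        counts = [0] * n_bins
        width = (hi - lo) / n_bins
        for i in solution:
            value = table[var][i]
            if lo <= value <= hi:
                # Clamp so that value == hi lands in the last bin.
                b = min(int((value - lo) / width), n_bins - 1)
                counts[b] += 1
        histograms[var] = counts
    return histograms


# Hypothetical data and a hypothetical solution set (record indices).
table = {
    "temperature": [300.0, 900.0, 1500.0, 1800.0, 2100.0],
    "oxygen": [0.21, 0.18, 0.09, 0.04, 0.01],
}
solution = [2, 3, 4]
bins = {"temperature": (1000.0, 2200.0, 3), "oxygen": (0.0, 0.10, 2)}
hists = query_histograms(table, solution, bins)
print(hists)
```

Each histogram is accumulated independently, so the result says how the selected records are distributed along each variable but nothing yet about how the variables relate to one another.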
We extend this analysis further to reveal trends between a query's variables by defining correlation fields between pairs of variables. These mappings exist both for variables expressed in the query and those excluded from the query. The correlation field created by any particular pair of variables is used in conjunction with the CDFs of each of the query’s variables to reveal, both visually and statistically, trends of behavior and interaction between the variables defined in the given query.
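The text here does not say how the correlation field is computed, so the sketch below substitutes one simple stand-in: a Pearson correlation evaluated in a small sliding window around each sample of a 1-D signal. The windowed-Pearson choice, the function names, the `radius` parameter, and the sample values are all assumptions for illustration and should not be read as the paper's definition.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        # Assumption: a locally constant variable is reported as uncorrelated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_field(a: List[float], b: List[float],
                      radius: int = 1) -> List[float]:
    """Local correlation of a and b in a window around each sample."""
    n = len(a)
    field = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        field.append(pearson(a[lo:hi], b[lo:hi]))
    return field


# Made-up samples: b rises with a at first, then falls while a keeps rising.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
field = correlation_field(a, b, radius=1)
print(field)
```

In this toy case the field is positive where the two signals move together, negative where they move in opposite directions, and zero at the turning point between the two regimes. A field of this kind can be evaluated for any pair of variables, whether or not they appear in the query.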
We apply our approach to a dataset that models turbulent combustion for a methane-based V-flame (see Figure 1). The simulated dataset, consisting of 38 variables, is generated from the DRM-19 subset of the GRI-Mech 1.2 methane mechanism for chemical kinetics. This mechanism models the combustion behavior of methane by considering 20 chemical species and 84 fundamental reactions. Our goal is to provide insight into the interactions between these various species.
The following image depicts the iso-surface of temperature at increasing values. The iso-surface is colored by the correlation space constructed from the variables oxygen and ethylene. As the temperature increases (left to right, top to bottom) we observe a full-spectrum color change from red to blue. Red regions indicate high positive correlation between oxygen and ethylene, and blue regions indicate strong negative correlation. Green regions indicate independence between the two variables; they also mark regions of increased entropy where flame-front regions exist.
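The color convention described above (red for strong positive correlation, green for independence, blue for strong negative correlation) can be approximated with a simple piecewise-linear ramp. The exact transfer function used for the figure is not given here, so the mapping below, including the name `correlation_to_rgb`, is only an assumed stand-in.

```python
from typing import Tuple


def correlation_to_rgb(c: float) -> Tuple[float, float, float]:
    """Map a correlation in [-1, 1] to an (r, g, b) triple in [0, 1].

    +1 -> red, 0 -> green, -1 -> blue, with linear blends in between.
    Values outside [-1, 1] are clamped.
    """
    c = max(-1.0, min(1.0, c))
    if c >= 0.0:
        # Blend from green (c = 0) to red (c = 1).
        return (c, 1.0 - c, 0.0)
    # Blend from green (c = 0) to blue (c = -1).
    return (0.0, 1.0 + c, -c)


for value in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(value, correlation_to_rgb(value))
```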
This image depicts a cut-away view of the third time step from the image above where temperature has been rendered through a correlation field constructed from oxygen and ethylene. Here the iso-surface of temperature (in green) is shown to "thread" the highest iso-surface values for ethylene (in blue). This iso-volume formed by temperature is the region where the flame-front regions exist.
- Luke Gosink, John C. Anderson, Wes Bethel, Ken Joy, Variable Interactions in Query Driven Visualization, in: IEEE Transactions on Visualization and Computer Graphics (Proceedings Visualization / Information Visualization 2007), pp. 1400-1407, 2007.