lens.explorer API
Explore a Summary

class lens.explorer.Explorer(summary, plot_renderer=<function _render>)
An explorer to visualise a Lens Summary.
Once a Lens Summary has been generated with lens.summarise.summarise(), this class provides the methods necessary to explore the summary through tables and plots. It is best used from within a Jupyter notebook.
cdf_plot(column)
Plot the empirical cumulative distribution function of a column.
Creates a plotly plot with the empirical CDF of a column.
Parameters: column : str
Name of the column.
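To make the quantity being plotted concrete, here is a minimal pure-Python sketch of an empirical CDF. It is not the Lens implementation; the helper name empirical_cdf and the sample values are made up for illustration.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")

    def cdf(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical sample data, for illustration only.
sample = [3.0, 1.0, 2.0, 2.0, 5.0]
F = empirical_cdf(sample)
print([F(x) for x in (0.0, 1.0, 2.0, 4.0, 5.0)])  # [0.0, 0.2, 0.6, 0.8, 1.0]
```

The result is a step function that rises by 1/n at each observation, which is the shape a CDF plot of a column would be expected to trace.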

column_details(column, sort=False)
Show type-specific column details.
For numeric columns, this method produces a table with summary statistics, including minimum, maximum, mean, and median. For categorical columns, it produces a frequency table for each category sorted in descending order of frequency.
Parameters: column : str
Name of the column.
sort : boolean, optional
Sort frequency tables in categorical variables by category name.
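The two orderings of a categorical frequency table can be sketched with the standard library. This only illustrates the behaviour described above. The function frequency_table is hypothetical, and breaking ties by category name is an assumption, not something the Lens documentation states.

```python
from collections import Counter


def frequency_table(values, sort=False):
    """Return (category, count) pairs for a categorical column."""
    counts = Counter(values)
    if sort:
        # sort=True: order rows by category name.
        return sorted(counts.items())
    # Default: descending frequency. Ties are broken by name (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical column values, for illustration only.
colours = ["red", "blue", "red", "green", "blue", "red"]
print(frequency_table(colours))             # [('red', 3), ('blue', 2), ('green', 1)]
print(frequency_table(colours, sort=True))  # [('blue', 2), ('green', 1), ('red', 3)]
```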

correlation(include=None, exclude=None)
Show the correlation matrix for numeric columns.
Print a Spearman rank-order correlation coefficient matrix in tabular form, showing the correlation between columns. The matrix is reordered to group together columns that have a higher correlation coefficient. The columns to be shown in the table can be selected through either the include or exclude keyword arguments. Only one of them can be given.
Parameters: include : list of str
List of columns to include in the correlation table.
exclude : list of str
List of columns to exclude from the correlation table.
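For readers unfamiliar with the Spearman rank-order coefficient, the sketch below computes it for a single pair of columns as the Pearson correlation of their ranks, with tied values sharing the average rank. It is a textbook illustration in pure Python, not the code Lens uses; the helper names and sample data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman coefficient: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically but non-linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(spearman(x, y), 6))
```

Because only the ranks matter, any monotonically increasing relationship yields a coefficient of 1 (up to floating-point rounding), even when the relationship is not linear.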

correlation_plot(include=None, exclude=None)
Plot the correlation matrix for numeric columns.
Plot a Spearman rank-order correlation coefficient matrix showing the correlation between columns. The matrix is reordered to group together columns that have a higher correlation coefficient. The columns to be plotted in the correlation plot can be selected through either the include or exclude keyword arguments. Only one of them can be given.
Parameters: include : list of str
List of columns to include in the correlation plot.
exclude : list of str
List of columns to exclude from the correlation plot.
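The rule that include and exclude are mutually exclusive can be sketched as a small column-selection helper. Everything here is illustrative. The function select_columns is hypothetical, and both the exception type and the choice to keep the original column order are assumptions, since the documentation does not say how Lens reacts when both arguments are passed.

```python
def select_columns(all_columns, include=None, exclude=None):
    """Pick the columns to show, honouring at most one of include/exclude."""
    if include is not None and exclude is not None:
        # Assumption: a ValueError is raised; Lens may behave differently.
        raise ValueError("only one of include and exclude can be given")
    if include is not None:
        wanted = set(include)
        return [name for name in all_columns if name in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        return [name for name in all_columns if name not in unwanted]
    return list(all_columns)


# Hypothetical column names, for illustration only.
columns = ["age", "height", "weight"]
print(select_columns(columns, include=["weight", "age"]))  # ['age', 'weight']
print(select_columns(columns, exclude=["age"]))            # ['height', 'weight']
```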

crosstab(column1, column2)
Show a contingency table of two categorical columns.
Print a contingency table for two categorical variables showing the multivariate frequency distribution of the columns.
Parameters: column1 : str
First column.
column2 : str
Second column.
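A contingency table counts how often each pair of categories occurs together. The following pure-Python sketch builds one from two equal-length lists. It stands in for the concept only. The function name contingency_table, the alphabetical ordering of rows and columns, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def contingency_table(first, second):
    """Return (row_labels, column_labels, counts) for two categorical columns."""
    if len(first) != len(second):
        raise ValueError("both columns must have the same length")
    pair_counts = Counter(zip(first, second))
    row_labels = sorted(set(first))
    column_labels = sorted(set(second))
    # counts[i][j] is the number of rows with first == row_labels[i]
    # and second == column_labels[j]; missing pairs count as 0.
    counts = [[pair_counts[(r, c)] for c in column_labels] for r in row_labels]
    return row_labels, column_labels, counts


# Hypothetical data, for illustration only.
group = ["f", "m", "f", "m", "f"]
smoker = ["no", "no", "yes", "yes", "no"]
rows, cols, table = contingency_table(group, smoker)
print(rows, cols, table)  # ['f', 'm'] ['no', 'yes'] [[2, 1], [1, 1]]
```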

describe()
General description of the dataset.
Produces a table including the following information about each column:
desc - the type of data: currently categorical or numeric. Lens will calculate different quantities for this column depending on the value of desc.
dtype - the type of data in Pandas.
name - column name
notnulls - number of non-null values in the column
nulls - number of null values in the column
unique - number of unique values in the column
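The counting fields in this table can be illustrated for a single column with a few lines of plain Python. This is a sketch under stated assumptions, not the Lens implementation: None stands in for a null value, nulls are left out of the unique count, and the function describe_column is hypothetical.

```python
def describe_column(name, values):
    """Compute the name, notnulls, nulls and unique fields for one column."""
    notnull = [v for v in values if v is not None]  # assumption: None marks a null
    return {
        "name": name,
        "notnulls": len(notnull),
        "nulls": len(values) - len(notnull),
        "unique": len(set(notnull)),  # assumption: nulls are not counted as a value
    }


# Hypothetical column, for illustration only.
row = describe_column("city", ["Paris", None, "Rome", "Paris"])
print(row)  # {'name': 'city', 'notnulls': 3, 'nulls': 1, 'unique': 2}
```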

distribution(column)
Show properties of the distribution of values in the column.
Parameters: column : str
Name of the column.

distribution_plot(column, bins=None)
Plot the distribution of a numeric column.
Create a plotly plot with a histogram of the values in a column. The number of bins in the histogram is decided according to the Freedman-Diaconis rule unless given by the bins parameter.
Parameters: column : str
Name of the column.
bins : int, optional
Number of bins to use for histogram. If not given, the Freedman-Diaconis rule will be used to estimate the best number of bins. This argument also accepts the formats taken by the bins parameter of matplotlib's matplotlib.pyplot.hist function.
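The Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n the number of observations; the number of bins is then the data range divided by that width, rounded up. The sketch below applies the rule with the standard library. It is an illustration only: the quartile method, the fallback to a single bin when the IQR is zero, and the function name freedman_diaconis_bins are assumptions, and Lens may compute the estimate differently.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Estimate a histogram bin count with the Freedman-Diaconis rule."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Quartiles by linear interpolation over the observed data (an assumption).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return 1  # assumption: fall back to a single bin for degenerate data
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))


# Hypothetical data: the integers 1 to 100.
data = list(range(1, 101))
print(freedman_diaconis_bins(data))  # 5
```

For this sample the IQR is 49.5, so the width is 99 / 100^(1/3), about 21.3, and the range of 99 spans just over 4.6 widths, which rounds up to 5 bins.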

pairwise_density_plot(column1, column2)
Plot the pairwise density between two columns.
This plot is an approximation of a scatterplot through a 2D kernel density estimate (KDE) for two numerical variables. When one of the variables is categorical, a 1D KDE for each of the categories is shown, normalised to the total number of non-null observations. For two categorical variables, the plot produced is a heatmap representation of the contingency table.
Parameters: column1 : str
First column.
column2 : str
Second column.
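The mixed categorical/numeric case can be sketched as one Gaussian KDE per category, each scaled by that category's share of the non-null observations so that the curves together integrate to 1. This is one reading of "normalised to the total number of non-null observations", not a description of the Lens code. The fixed bandwidth, the use of None as the null marker, and the function names are all assumptions.

```python
import math


def gaussian_kde(samples, bandwidth, weight=1.0):
    """Return density(x): a Gaussian KDE over samples, scaled by weight."""
    norm = weight / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def per_category_kde(categories, values, bandwidth=1.0):
    """Build one weighted 1D KDE per category, skipping null observations."""
    pairs = [(c, v) for c, v in zip(categories, values)
             if c is not None and v is not None]
    groups = {}
    for category, value in pairs:
        groups.setdefault(category, []).append(value)
    total = len(pairs)
    # Each curve integrates to the category's share of non-null observations.
    return {c: gaussian_kde(vs, bandwidth, weight=len(vs) / total)
            for c, vs in groups.items()}


# Hypothetical data; the last row has a null category and is dropped.
curves = per_category_kde(["a", "a", "b", None], [0.0, 1.0, 5.0, 2.0])
step = 0.05
grid = [-20 + step * i for i in range(1001)]  # covers -20 to 30
areas = {c: sum(f(x) for x in grid) * step for c, f in curves.items()}
print({c: round(a, 3) for c, a in areas.items()})
```

With two of the three non-null observations in category "a", its curve covers about two thirds of the total area and the curve for "b" covers the remaining third.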
