lens.explorer API
Explore a Summary
-
class lens.explorer.Explorer(summary, plot_renderer=<function _render>)
An explorer to visualise a Lens Summary.
Once a Lens Summary has been generated with lens.summarise.summarise(), this class provides the methods necessary to explore the summary through tables and plots. It is best used from within a Jupyter notebook.
-
cdf_plot(column)
Plot the empirical cumulative distribution function of a column.
Creates a plotly plot with the empirical CDF of a column.
Parameters: column : str
Name of the column.
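As a rough illustration of what an empirical CDF is, the sketch below computes one in plain Python. It is not Lens's implementation; the helper name `empirical_cdf` and the sample data are hypothetical and used only for illustration.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot build an empirical CDF from no data")

    def cdf(x):
        # bisect_right gives the count of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return cdf


sample = [3.0, 1.0, 2.0, 2.0, 5.0]  # illustrative data, not from Lens
F = empirical_cdf(sample)
print(F(2.0))  # fraction of observations less than or equal to 2.0
```

The resulting step function rises by 1/n at each observation, which is the curve a CDF plot draws.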
-
column_details(column, sort=False)
Show type-specific column details.
For numeric columns, this method produces a table with summary statistics, including minimum, maximum, mean, and median. For categorical columns, it produces a frequency table for each category sorted in descending order of frequency.
Parameters: column : str
Name of the column.
sort : boolean, optional
If True, sort frequency tables for categorical columns by category name instead of by frequency.
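The two orderings of a categorical frequency table can be sketched in plain Python as follows. This is an illustration of the idea only, not Lens's code: `frequency_table` is a hypothetical helper, and breaking frequency ties by category name is an assumption.

```python
from collections import Counter


def frequency_table(values, sort=False):
    """Return (category, count) pairs for a categorical column."""
    counts = Counter(values)
    if sort:
        # Ordered by category name
        return sorted(counts.items())
    # Ordered by descending frequency; ties broken by name (an assumption)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


colours = ["red", "blue", "red", "green", "blue", "red"]  # illustrative data
print(frequency_table(colours))
print(frequency_table(colours, sort=True))
```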
-
correlation(include=None, exclude=None)
Show the correlation matrix for numeric columns.
Print a Spearman rank order correlation coefficient matrix in tabular form, showing the correlation between columns. The matrix is reordered to group together columns that have a higher correlation coefficient. The columns to be shown in the table can be selected through either the include or exclude keyword arguments. Only one of them can be given.
Parameters: include : list of str
List of columns to include in the correlation table.
exclude : list of str
List of columns to exclude from the correlation table.
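For readers unfamiliar with the Spearman coefficient, it is the Pearson correlation of the ranks of two columns. The sketch below shows that definition in plain Python, with tied values given the mean of their ranks. The function names are hypothetical, the data is illustrative, and the reordering of the matrix that Lens performs is not shown.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic in x, but not linear
print(spearman(x, y))
```

Because only ranks matter, any strictly increasing relationship between two columns yields a coefficient of 1, even when the relationship is not linear.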
-
correlation_plot(include=None, exclude=None)
Plot the correlation matrix for numeric columns.
Plot a Spearman rank order correlation coefficient matrix showing the correlation between columns. The matrix is reordered to group together columns that have a higher correlation coefficient. The columns to be plotted in the correlation plot can be selected through either the include or exclude keyword arguments. Only one of them can be given.
Parameters: include : list of str
List of columns to include in the correlation plot.
exclude : list of str
List of columns to exclude from the correlation plot.
-
crosstab(column1, column2)
Show a contingency table of two categorical columns.
Print a contingency table for two categorical variables showing the multivariate frequency distribution of the columns.
Parameters: column1 : str
First column.
column2 : str
Second column.
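A contingency table counts how often each pair of categories occurs together. The following plain-Python sketch shows the idea; `contingency_table` is a hypothetical helper, the data is illustrative, and null handling is left out for brevity.

```python
from collections import Counter


def contingency_table(col1, col2):
    """Count co-occurrences of categories in two equal-length columns."""
    counts = Counter(zip(col1, col2))
    rows = sorted(set(col1))
    cols = sorted(set(col2))
    # Counter returns 0 for pairs that never occur together
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


smoker = ["yes", "no", "no", "yes", "no"]  # illustrative data
sex = ["f", "m", "f", "f", "m"]
table = contingency_table(smoker, sex)
for row_label, row in table.items():
    print(row_label, row)
```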
-
describe()
General description of the dataset.
Produces a table including the following information about each column:
desc
- the type of data: currently categorical or numeric. Lens will calculate different quantities for this column depending on the value of desc.
dtype
- the type of data in Pandas.
name
- column name
notnulls
- number of non-null values in the column
nulls
- number of null values in the column
unique
- number of unique values in the column
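To make the count fields concrete, the sketch below derives name, notnulls, nulls and unique for one column in plain Python. It is only an illustration: `describe_column` is a hypothetical helper, `None` stands in for a null value, and excluding nulls from the unique count is an assumption rather than documented Lens behaviour.

```python
def describe_column(name, values):
    """Compute simple per-column counts, treating None as null."""
    present = [v for v in values if v is not None]
    return {
        "name": name,
        "notnulls": len(present),
        "nulls": len(values) - len(present),
        # Assumption: nulls are not counted as a distinct value
        "unique": len(set(present)),
    }


row = describe_column("city", ["Paris", None, "Rome", "Paris"])
print(row)
```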
-
distribution(column)
Show properties of the distribution of values in the column.
Parameters: column : str
Name of the column.
-
distribution_plot(column, bins=None)
Plot the distribution of a numeric column.
Create a plotly plot with a histogram of the values in a column. The number of bins in the histogram is decided according to the Freedman-Diaconis rule unless given by the bins parameter.
Parameters: column : str
Name of the column.
bins : int, optional
Number of bins to use for histogram. If not given, the Freedman-Diaconis rule will be used to estimate the best number of bins. This argument also accepts the formats taken by the bins parameter of matplotlib's matplotlib.pyplot.hist() function.
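The Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n the number of observations. The sketch below applies the rule in plain Python. It is a general illustration, not Lens's implementation: the helper names are hypothetical, quartiles use linear interpolation, and falling back to a single bin when the IQR is zero is an assumption.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def freedman_diaconis_bins(values):
    """Estimate a histogram bin count with the Freedman-Diaconis rule."""
    data = sorted(values)
    iqr = quantile(data, 0.75) - quantile(data, 0.25)
    if iqr == 0:
        return 1  # degenerate spread: fall back to one bin (an assumption)
    width = 2 * iqr / len(data) ** (1 / 3)
    return max(1, math.ceil((data[-1] - data[0]) / width))


n_bins = freedman_diaconis_bins(range(100))  # illustrative, evenly spread data
print(n_bins)
```

Because the width depends on the IQR rather than the standard deviation, the rule is less sensitive to outliers than bin-count rules based on variance.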
-
pairwise_density_plot(column1, column2)
Plot the pairwise density between two columns.
This plot is an approximation of a scatterplot through a 2D Kernel Density Estimate for two numerical variables. When one of the variables is categorical, a 1D KDE for each of the categories is shown, normalised to the total number of non-null observations. For two categorical variables, the plot produced is a heatmap representation of the contingency table.
Parameters: column1 : str
First column.
column2 : str
Second column.
-