Variability at various scales
At the use-case scale of tens of thousands of dwellings, the variability in total energy consumption is low. This can be explained by considering how the stochastic inference processes are designed:
The dwelling inference works by pairing two datasets (census and GIS building data) that have been adjusted to be as close as possible at the district level in terms of numbers of dwellings in the groups formed by the common variables (‘residential_type’ and ‘construction_year_class’). As a consequence, the pairing is almost always successful at the district level (100% for the “Ambert-Livradois-Forez” case, 99.9% for the others). Thus, while the heating system attributed to each building may vary between runs, the overall mix at the district level will remain largely the same.
As for the building and boundary inference, the density of diagnosis data (i.e. the share of buildings with matching energy diagnosis records in the same area) and the ‘minimal_diagnosis_share’ parameter are the main factors influencing the geographic level of inference.
Figures _inf_levels_se, _inf_levels_ro and _inf_levels_am illustrate the disparities in the density of diagnosis data between cases, while showing that most of the inferences still happen at or below the department level, which is only one level higher than the whole use case.
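The way diagnosis density and ‘minimal_diagnosis_share’ could interact to set the geographic level can be sketched as follows. This is only an illustration under assumptions: the function `select_inference_level`, the level names and the share values are hypothetical, and the rule "use the finest level whose diagnosis share reaches the threshold" is an assumed reading of the parameter, not the documented implementation.

```python
def select_inference_level(shares_by_level, minimal_diagnosis_share):
    """Return the finest geographic level with enough diagnosis data.

    shares_by_level: list of (level_name, diagnosis_share) pairs ordered
    from the finest to the coarsest level (hypothetical structure).
    """
    for level, share in shares_by_level:
        if share >= minimal_diagnosis_share:
            return level
    # Assumption: fall back to the coarsest level if no level qualifies.
    return shares_by_level[-1][0]


# Hypothetical diagnosis shares for one building, finest level first.
shares = [("city", 0.02), ("district", 0.08), ("department", 0.15)]

level_low_threshold = select_inference_level(shares, 0.05)
level_high_threshold = select_inference_level(shares, 0.10)
print(level_low_threshold, level_high_threshold)
```

Under this assumed rule, a sparser diagnosis dataset or a higher threshold pushes the inference to a coarser geographic level.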

Inference geographic levels, use case Saint-Etienne

Inference geographic levels, use case “Roannais”

Inference geographic levels, use case “Ambert-Livradois-Forez”
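The earlier point about dwelling pairing, that the heating system attributed to each building may change between runs while the district-level mix stays the same, can be illustrated with a toy example. This is a minimal sketch, not the actual pairing code: the function `pair_dwellings`, the group labels and the records are hypothetical, and the census and building counts are assumed to match exactly within each group.

```python
import random
from collections import Counter, defaultdict


def pair_dwellings(census, buildings, rng):
    """Randomly pair census records with buildings of the same group.

    census: list of (group, heating_system) pairs.
    buildings: list of (building_id, group) pairs.
    Assumes each group has as many census records as buildings.
    """
    pools = defaultdict(list)
    for group, heating in census:
        pools[group].append(heating)
    for pool in pools.values():
        rng.shuffle(pool)  # the only stochastic step
    # Each building takes one census record from its own group.
    return {bid: pools[group].pop() for bid, group in buildings}


# Groups stand for (residential_type, construction_year_class) pairs.
census = [
    (("house", "before_1948"), "gas"),
    (("house", "before_1948"), "oil"),
    (("house", "before_1948"), "wood"),
    (("flat", "1975_1990"), "electricity"),
    (("flat", "1975_1990"), "gas"),
]
buildings = [
    ("b1", ("house", "before_1948")),
    ("b2", ("house", "before_1948")),
    ("b3", ("house", "before_1948")),
    ("b4", ("flat", "1975_1990")),
    ("b5", ("flat", "1975_1990")),
]

run_a = pair_dwellings(census, buildings, random.Random(1))
run_b = pair_dwellings(census, buildings, random.Random(2))
mix_a = Counter(run_a.values())
mix_b = Counter(run_b.values())
print(mix_a == mix_b)  # the aggregate mix does not depend on the seed
```

Because every census record is used exactly once, the aggregate heating mix is fixed by the census data; only its distribution across individual buildings depends on the random seed.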
Logically, the variability observed at lower scales (from a few buildings to a few hundred buildings) is much higher.
We illustrate this by drawing random samples of increasing size and displaying boxplots of the resulting energy consumption by fuel (see variability_levels).
From a qualitative standpoint, we can observe that:
For a few buildings, the variability is so high that only general tendencies of energy use can be obtained.
For a few hundred buildings, energy consumption can be predicted with a reasonable level of uncertainty (relative standard deviations in the 5-25% range).
For a few thousand buildings, energy consumption can be predicted with a low level of uncertainty (relative standard deviations in the 0.1-5% range).
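The general trend in these observations, with relative standard deviation shrinking as the sample grows, can be reproduced on synthetic data. The sketch below does not use the model's outputs: it assumes that each building's consumption is drawn independently from an arbitrary log-normal distribution at every run, which is an idealisation (in the real model, inferences made at the same geographic level are likely correlated). The function name and all parameters are hypothetical.

```python
import random
import statistics


def relative_std_of_totals(sample_size, n_runs, rng):
    """Relative standard deviation of total consumption across runs."""
    totals = []
    for _ in range(n_runs):
        # Assumption: independent log-normal consumption per building.
        total = sum(rng.lognormvariate(9.5, 0.8) for _ in range(sample_size))
        totals.append(total)
    return statistics.stdev(totals) / statistics.mean(totals)


rng = random.Random(0)
rsd_by_size = {
    n: relative_std_of_totals(n, n_runs=100, rng=rng) for n in (5, 500, 5000)
}
for n, rsd in rsd_by_size.items():
    print(f"{n:>5} buildings: relative std = {rsd:.1%}")
```

With independent draws, the relative standard deviation of the total decreases roughly as the inverse square root of the sample size, which is consistent with the qualitative ordering reported above.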

Energy consumption boxplots for various sizes of building samples