Data Sources
GIS building data
For France, the GIS building data comes from the ‘Bâtiment’ table of the ‘Bâti’ section of the BDTOPO V3 database produced by IGN. This table contains polygons representing the footprints of buildings, with heights obtained from aerial images and descriptive attributes obtained by merging the GIS data with cadastre and tax registers. The following attributes are available :
NATURE : architectural type of the building
USAGE1 : main use of the building (residential, services, industrial, agriculture, etc.)
USAGE2 : secondary use of the building
DATE_APP : construction date
LEGER : set to True if the building does not have foundations or is open on one side
NB_LOGTS : number of dwellings
NB_ETAGES : number of floors
MAT_MURS : wall material
MAT_TOITS : roof material
HAUTEUR : height of the building (between ground and bottom of the roof)
Z_MIN_SOL : minimum altitude of the ground
Z_MAX_SOL : maximum altitude of the ground
Z_MIN_TOIT : minimum altitude of the roof
Z_MAX_TOIT : maximum altitude of the roof
The datasets can be downloaded by department from the IGN website.

A view of BDTOPO footprint data with an OpenStreetMap layer
This dataset is central to the buildingmodel methodology: it provides a simplified geometry (3D right prism) of the buildings to simulate, including the adjacencies between them and the number of floors. In addition, the descriptive data is used to pair the buildings with coherent district-level census data and energy diagnosis samples.
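The right-prism simplification can be illustrated with a minimal sketch. It assumes the footprint is a simple polygon whose coordinates are expressed in metres, and it uses hypothetical function names (`footprint_area`, `prism_properties`) that are not part of buildingmodel. The height and floor count stand in for the HAUTEUR and NB_ETAGES attributes.

```python
def footprint_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def prism_properties(ring, height, floor_count):
    """Footprint area, volume and total floor area of a right prism."""
    area = footprint_area(ring)
    return {
        "footprint_area": area,            # m2
        "volume": area * height,           # m3, footprint extruded vertically
        "floor_area": area * floor_count,  # m2, one footprint per floor
    }


# Made-up 10 m x 8 m rectangular footprint, HAUTEUR = 6 m, NB_ETAGES = 2
footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
props = prism_properties(footprint, height=6.0, floor_count=2)
```

For this rectangle the sketch gives a footprint of 80 m2, a volume of 480 m3 and a total floor area of 160 m2.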
Census data
Population and housing censuses are regularly conducted by countries to obtain accurate data on the characteristics, location and housing conditions of their population. Of particular interest for buildingmodel is the data on housing conditions at the local level. In the case of France, this is the “Fichier Détail Logements Ordinaires”, which contains a description of each dwelling, anonymized at the district level. The following attributes of the dwellings are relevant for buildingmodel :
IRIS : the administrative code of the district
ACHL : period of construction of the building
CATL : category of occupancy of the dwelling (main residence, secondary home, vacant, etc.)
CHFL : building-level heating system
CMBL : main heating fuel
INPER : number of occupants of the dwelling
SURF : area class of the dwelling
TYPL : type of dwelling (individual house, apartment, etc.)
The dataset can be downloaded by department from the INSEE website.

The number of dwellings in the census dataset by construction year class and residential type
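A breakdown of this kind can be sketched with a simple counter over the dwelling records. The records below are made up for illustration: the ACHL and TYPL values are readable placeholders, not the actual codes used in the INSEE file, and `count_by_period_and_type` is a hypothetical helper.

```python
from collections import Counter

# Made-up records; real files use INSEE codes for ACHL and TYPL.
dwellings = [
    {"IRIS": "district-1", "ACHL": "before 1946", "TYPL": "house"},
    {"IRIS": "district-1", "ACHL": "before 1946", "TYPL": "apartment"},
    {"IRIS": "district-1", "ACHL": "1971-1990", "TYPL": "apartment"},
    {"IRIS": "district-2", "ACHL": "1971-1990", "TYPL": "apartment"},
    {"IRIS": "district-2", "ACHL": "before 1946", "TYPL": "house"},
]


def count_by_period_and_type(records):
    """Count records per (construction period, dwelling type) pair."""
    return Counter((r["ACHL"], r["TYPL"]) for r in records)


counts = count_by_period_and_type(dwellings)
```

Each record is counted once here; any weighting of the records is left out of the sketch.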
Energy diagnosis data
Energy diagnoses are performed to estimate the energy consumption related to heating, cooling, ventilation and domestic hot water in buildings. Two main methodologies are used, which differ in the way the energy consumption is calculated :
Bill-based : energy bills are collected, analyzed and corrected to estimate the energy consumption
Audit-based : an energy audit is performed to determine the characteristics of the dwelling envelope, ventilation, heating and domestic hot water systems. The result of the audit is then used as input to a conventional calculation method, based on static thermal loss modelling, to estimate the energy consumption of the dwelling.
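To give an idea of what a static thermal loss calculation involves, here is a generic steady-state sketch. It is not the conventional method used for official diagnoses: it only sums transmission losses through the envelope (U-value times area) and multiplies them by heating degree-hours. All names and numbers are illustrative assumptions.

```python
def transmission_coefficient(elements):
    """Sum of U * A over envelope elements, in W/K."""
    return sum(u_value * area for u_value, area in elements)


def annual_heating_need_kwh(elements, degree_hours):
    """Steady-state heating need in kWh: H (W/K) * degree-hours (K.h) / 1000."""
    return transmission_coefficient(elements) * degree_hours / 1000.0


# Made-up envelope: (U-value in W/m2.K, area in m2)
envelope = [
    (1.5, 100.0),  # walls
    (0.5, 80.0),   # roof
    (3.0, 15.0),   # windows
]

h_coeff = transmission_coefficient(envelope)         # 235 W/K
need = annual_heating_need_kwh(envelope, 60000.0)    # 14100 kWh
```

A real conventional method also accounts for ventilation losses, solar and internal gains, and system efficiencies, which this sketch deliberately ignores.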
Additional data describing the dwelling (date of construction, number of occupants, living area, address) are also collected. buildingmodel uses them to match the energy diagnosis records with similar buildings derived from GIS and census data.
In France, energy diagnoses have been mandatory since 2006 whenever a dwelling is sold or rented. ADEME, an agency in charge of improving energy efficiency, collects the records and makes them available on the ADEME website.
While the availability of more than 5 million energy diagnosis records is welcome, great care has to be taken in the exploitation of this database for the following reasons :
Bill-based diagnoses cannot distinguish between occupant behavior (such as heating setpoint, DHW use, presence duration) and the intrinsic energy performance of the dwelling
Audit-based diagnoses require the collection of dozens of characteristics that can be hard to obtain, and auditors are allowed to use default values when they consider that the data cannot be retrieved
Unlike GIS and census data collection, which is supervised by public entities, energy diagnoses are performed by companies commissioned by dwelling owners. This creates a conflict of interest, as the result of the energy diagnosis has a direct impact on the market value of the dwelling.
A selection bias arises because the energy diagnosis is mandatory only when the dwelling is sold or rented

The distribution of primary energy consumption by living area for the complete energy diagnosis database of residential dwellings.
The large discontinuities around the limits of each energy class illustrate the conflict of interest described above : when the estimated consumption is close to but above the lower bound of an energy class, a significant share of auditors will falsify the results to get just below that bound and gain an energy class.
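One simple way to quantify such a discontinuity is to compare the number of records just below a class limit with the number just above it. The sketch below assumes a hypothetical helper `bunching_ratio`, made-up consumption values, and an illustrative class limit of 230 (in the same unit as the consumption values); none of these come from the ADEME dataset.

```python
def bunching_ratio(values, limit, width):
    """Ratio of records within `width` below a class limit to those within `width` above.

    Values equal to the limit are counted on the lower (better class) side.
    """
    below = sum(1 for v in values if limit - width < v <= limit)
    above = sum(1 for v in values if limit < v <= limit + width)
    if above == 0:
        return float("inf")
    return below / above


# Made-up consumption values clustered just below an illustrative limit of 230
consumptions = [221, 225, 228, 229, 230, 230, 231, 236, 250]
ratio = bunching_ratio(consumptions, limit=230, width=10)
```

Over a narrow window, a smooth distribution would give a ratio close to 1; a ratio well above 1 suggests that records pile up just below the limit.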

The number of energy diagnosis records in the ADEME dataset by construction year class and residential type. The comparison with the equivalent data from the census dataset illustrates the selection bias in energy diagnosis.
Climate data
The following climate measurements at a 1-hour resolution are used as input of buildingmodel :
air temperature
direct normal radiation
diffuse horizontal radiation
dew point temperature
opaque sky cover
buildingmodel includes a wrapper around the pvlib read_epw function to allow easy use of the EPW file format.
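The buildingmodel wrapper itself is not shown here. As a hedged sketch, the same five measurements can be extracted by calling `pvlib.iotools.read_epw` directly, which returns a DataFrame and a metadata dictionary. The helper `load_climate_inputs` and the `EPW_COLUMNS` mapping are hypothetical names introduced for illustration; the column names are those produced by pvlib.

```python
# Measurements listed above, mapped to the column names returned by pvlib's read_epw.
EPW_COLUMNS = {
    "air temperature": "temp_air",
    "direct normal radiation": "dni",
    "diffuse horizontal radiation": "dhi",
    "dew point temperature": "temp_dew",
    "opaque sky cover": "opaque_sky_cover",
}


def load_climate_inputs(epw_path):
    """Read an EPW file with pvlib and keep only the five input columns.

    Hypothetical helper, not part of buildingmodel's API.
    """
    # Imported here so the mapping above can be inspected without pvlib installed.
    from pvlib.iotools import read_epw

    data, metadata = read_epw(epw_path)
    return data[list(EPW_COLUMNS.values())], metadata
```

Calling `load_climate_inputs` with the path to a downloaded EPW file would return the hourly values of the five columns together with the location metadata read from the file header.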
Note
When estimating average energy consumption of buildings over long periods of time, it is recommended to use synthetic weather data sets designed to encapsulate the wide range of conditions in one typical or reference year. Such data sets are made available for hundreds of locations at Climate.OneBuilding.Org.