Dwelling Inference

Inference enriches GIS records by matching them with records from non-localized data sources that contain more complete information. For dwelling inference, we use the ‘construction_year_class’, ‘residential_type’ and ‘district’ attributes derived from GIS data to find the corresponding records in the census data, from which we extract the following attributes :

  • ‘occupancy_type’ : (possible values : ‘primary residence’, ‘second home’, ‘vacant’, ‘occasional housing’)

  • ‘occupant_count’ : the number of dwelling occupants, from 0 to 6 (set to 2 for second homes)

  • ‘heating_system’ : the main heating system of the dwelling (possible values : ‘electric_heater’, ‘electric_heat_pump’, ‘oil_boiler’, ‘gas_boiler’, ‘wood_boiler’, ‘district_network’)

  • ‘living_area_class’ : (possible values : ‘Less than 30 m²’, ‘From 30 to 40 m²’, ‘From 40 to 60 m²’, ‘From 60 to 80 m²’, ‘From 80 to 100 m²’, ‘From 100 to 120 m²’, ‘More than 120 m²’)

Algorithm

  1. All the dwellings are grouped according to ‘construction_year_class’, ‘residential_type’ and ‘district’

  2. The records with the same set of attributes in the census database are selected

  3. If there are no records, the operation is repeated by replacing ‘district’ with a higher geographic level (‘city’, ‘city_group’, ‘department’, ‘region’) until records are found

  4. If the total weight of the selected census records is lower than the number of dwellings in the group, the weights are scaled up until their total exceeds that number

  5. A list of census records with unit weight is generated by repeating the records N times, where N is the integer closest to the record’s weight

  6. The matching census records are selected by random sampling without replacement

  7. The inferred data is transferred from the census records to the dwellings

  8. The geographic level of inference is recorded as ‘inference_geo_level’
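The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: dwellings and census records are plain dictionaries, each dwelling carries one key per geographic level, census records carry a hypothetical ‘weight’ key, ties in the rounding of step 5 are rounded up, and the 10% increment used to enlarge the pool when rounding leaves it too small is an arbitrary choice.

```python
import random
from collections import defaultdict

GEO_LEVELS = ("district", "city", "city_group", "department", "region")
MATCH_KEYS = ("construction_year_class", "residential_type")
INFERRED = ("occupancy_type", "occupant_count", "heating_system", "living_area_class")


def build_pool(records, scale):
    """Step 5: repeat each record N times, N being the nearest integer to its scaled weight."""
    pool = []
    for rec in records:
        pool.extend([rec] * int(rec["weight"] * scale + 0.5))
    return pool


def infer_dwellings(dwellings, census, rng=None):
    """Attach one census record to each dwelling (simplified sketch of steps 1 to 8)."""
    rng = rng or random.Random()
    # Step 1: group the dwellings by matching attributes and district.
    groups = defaultdict(list)
    for dwelling in dwellings:
        key = tuple(dwelling[k] for k in MATCH_KEYS) + (dwelling["district"],)
        groups[key].append(dwelling)

    for group in groups.values():
        ref = group[0]
        # Steps 2-3: widen the geographic level until census records are found.
        for level in GEO_LEVELS:
            records = [
                rec for rec in census
                if rec["weight"] > 0
                and rec[level] == ref[level]
                and all(rec[k] == ref[k] for k in MATCH_KEYS)
            ]
            if records:
                break
        else:
            continue  # no record at any level: the group is left untouched

        # Step 4: scale the weights up when they cannot cover the group.
        total = sum(rec["weight"] for rec in records)
        scale = max(1.0, len(group) / total)
        pool = build_pool(records, scale)
        while len(pool) < len(group):  # assumption: enlarge by 10% until the pool suffices
            scale *= 1.1
            pool = build_pool(records, scale)

        # Step 6: random sampling without replacement.
        drawn = rng.sample(pool, len(group))
        # Steps 7-8: transfer the inferred data and record the geographic level.
        for dwelling, rec in zip(group, drawn):
            for attr in INFERRED:
                dwelling[attr] = rec[attr]
            dwelling["inference_geo_level"] = level
    return dwellings
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which helps when comparing the inferred shares with the census totals.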

Note

When inferring data for dwellings whose ‘residential_type’ is ‘apartment’, a consistent ‘heating_system’ must be enforced within each building, which can contain several dwellings. To achieve this, we add ‘building_id’ to the attributes used to group dwellings and select a single census record, whose ‘heating_system’ is applied to the whole group of dwellings. The algorithm above can then be applied with ‘heating_system’ as an additional matching attribute.
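A possible sketch of this building-level step is given below. It assumes that the candidate census records have already been filtered on the matching attributes, that they carry a hypothetical ‘weight’ key, and that the single record is drawn with probability proportional to its weight; the text does not specify how the record is selected, so the weighted draw is an assumption.

```python
import random
from collections import defaultdict


def assign_building_heating(dwellings, census_records, rng=None):
    """Give all apartments of a building the heating system of one drawn census record."""
    rng = rng or random.Random()
    # Group the apartments by building; other dwellings are ignored here.
    by_building = defaultdict(list)
    for dwelling in dwellings:
        if dwelling["residential_type"] == "apartment":
            by_building[dwelling["building_id"]].append(dwelling)

    weights = [rec["weight"] for rec in census_records]
    for group in by_building.values():
        # Assumption: one weighted draw per building.
        record = rng.choices(census_records, weights=weights, k=1)[0]
        for dwelling in group:
            dwelling["heating_system"] = record["heating_system"]
    return dwellings
```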

Living area estimation (experimental)

Estimating the living area of residential buildings is necessary to determine their energy consumption and energy efficiency. However, there is no simple relationship between the living area and the floor area of a building. Below is a scatter plot of living area against floor area, obtained by matching energy diagnosis records (which contain the living area) to BDTOPO building addresses using the ADRESSE PREMIUM database from IGN.

../_images/living_area_vs_floor_area.png

Living area as a function of floor area for buildings with an address-level matching energy diagnosis record in the Rhône department (areas have been filtered on minimal and maximal values)

To circumvent this issue, we use the subset of buildings whose living area can be obtained from a diagnosis record to build probabilistic models of the relationship between living area and floor area for various building configurations.

Individual houses

For individual houses, we only need to find the living area of a single dwelling. To achieve this, we estimate the ratio between living area and floor area (living area share). We start by grouping the buildings into three categories :

  • the building has no other use and no annex

  • the building has no other use and an annex

  • the building has multiple uses

For each category, we define intervals of floor area; the histograms of living area share for each interval are displayed below.

../_images/living_area_share_distribution_house_only.png

Histograms of living area share for the various floor area intervals for the category ‘no other use, no annex’

../_images/living_area_share_distributions_house_with_annex.png

Histograms of living area share for the various floor area intervals for the category ‘no other use, with annex’

../_images/living_area_share_distributions_house_with_other_use.png

Histograms of living area share for the various floor area intervals for the category ‘multiple use’

The marked differences in the living area share distribution across categories and floor area intervals confirm the relevance of these groupings. However, a significant amount of anomalous data is present, such as buildings with no other use and no annex that have a living area share below 30%, or buildings with a living area share close to or above 100%. To accommodate these limitations, we propose the following procedure :

  • fit a metalog distribution to each grouping of living area share

  • group the dwellings according to their ‘residential_only’, ‘has_annex’ and ‘floor_area’ attributes

  • draw the living area share in the corresponding metalog distribution

  • clip the living area share to minimal and maximal values depending on the ‘residential_only’ and ‘has_annex’ attributes

  • calculate the living area

Minimal and maximal values of living area share for individual houses

residential_only  has_annex  min living_area_share  max living_area_share
True              False      70%                    85%
True              True       20%                    70%
False             False      5%                     50%
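The clipping and living area calculation for individual houses can be sketched as follows, using the bounds of the table above. The function name and the `draw_share` argument are hypothetical: `draw_share` stands in for a draw from the fitted metalog distribution, which is not implemented here. The table lists no bounds for the combination ‘residential_only’ = False with ‘has_annex’ = True, so this sketch raises a `KeyError` in that case rather than guessing a value.

```python
# (residential_only, has_annex) -> (min share, max share), taken from the table above.
HOUSE_SHARE_BOUNDS = {
    (True, False): (0.70, 0.85),
    (True, True): (0.20, 0.70),
    (False, False): (0.05, 0.50),
}


def house_living_area(floor_area, residential_only, has_annex, draw_share):
    """Return (living_area, living_area_share) for one individual house.

    draw_share is a zero-argument callable returning a living area share;
    it is a placeholder for sampling the fitted metalog distribution.
    """
    low, high = HOUSE_SHARE_BOUNDS[(residential_only, has_annex)]
    # Clip the drawn share to the bounds of the building category.
    share = min(max(draw_share(), low), high)
    # The living area is the clipped share applied to the floor area.
    return share * floor_area, share
```

For example, a drawn share of 95% for a purely residential house without annex is clipped to 85% before the living area is computed.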

Warning

Coherence between the estimated living area and the inferred census data could be obtained by drawing the census records from a sample with the corresponding living area class. However, where the living area distribution does not match the living area classes at the district level, this creates a selection bias that results in heating system shares inconsistent with district-level census data. Consequently, we currently ignore the living area class when selecting the initial census record, and correct the occupant count afterwards by drawing a new record with the living area class and heating system as additional matching attributes.

Apartments

For collective housing buildings, the objective is to estimate the living area of multiple dwellings, while handling anomalous situations in which the number of dwellings itself might need to be adjusted. The first part of the procedure consists of independently estimating the living area of each dwelling, using the inferred living area class and metalog distributions of living area for each class obtained from a random sample of diagnosis data.

../_images/apartment_living_area_distribution.png

Histogram of living area by class for a random sample of energy diagnoses of apartments

The second part of the procedure is a corrective algorithm for cases where the resulting building-level living area share falls outside bounds that depend on the ‘residential_only’ and ‘has_annex’ attributes of the building. For such cases, the following steps are performed :

  • a living area share target is drawn inside the bounds

  • the areas of the smallest and largest dwellings are multiplied by the ratio of the target living area to the current living area

  • if the values obtained lie in a range of reasonable living areas (10 to 250 m²), the scaling is performed for all living areas

  • if not, randomly selected dwellings are added or removed until the living area share lies inside the bounds

Minimal and maximal values of living area share for apartment buildings

residential_only  has_annex  min living_area_share  max living_area_share
True              False      70%                    90%
True              True       40%                    70%
False             False      5%                     70%
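The corrective algorithm for apartment buildings can be sketched as follows, with the bounds of the table above. Several points are assumptions made for illustration: the target share is drawn uniformly inside the bounds, "adding" a dwelling duplicates the area of a randomly chosen existing dwelling, the building keeps at least one dwelling, and a `max_iter` guard stops the loop if the bounds cannot be reached. The input list is assumed to be non-empty with positive areas.

```python
import random

# (residential_only, has_annex) -> (min share, max share), taken from the table above.
APARTMENT_SHARE_BOUNDS = {
    (True, False): (0.70, 0.90),
    (True, True): (0.40, 0.70),
    (False, False): (0.05, 0.70),
}
MIN_AREA, MAX_AREA = 10.0, 250.0  # range of reasonable dwelling living areas (m²)


def correct_building(areas, floor_area, residential_only, has_annex,
                     rng=None, max_iter=1000):
    """Return dwelling living areas adjusted so the building share respects the bounds."""
    rng = rng or random.Random()
    low, high = APARTMENT_SHARE_BOUNDS[(residential_only, has_annex)]
    areas = list(areas)
    if low <= sum(areas) / floor_area <= high:
        return areas  # nothing to correct

    # Draw a target share inside the bounds (assumption: uniform draw).
    target_share = rng.uniform(low, high)
    ratio = target_share * floor_area / sum(areas)
    # Check the scaling on the smallest and largest dwellings first.
    if MIN_AREA <= min(areas) * ratio and max(areas) * ratio <= MAX_AREA:
        return [area * ratio for area in areas]

    # Otherwise add or remove randomly selected dwellings.
    for _ in range(max_iter):
        share = sum(areas) / floor_area
        if low <= share <= high:
            break
        if share < low:
            areas.append(rng.choice(areas))  # assumption: duplicate a random dwelling
        elif len(areas) > 1:
            areas.pop(rng.randrange(len(areas)))
        else:
            break  # a single dwelling remains; stop removing
    return areas
```

Checking only the smallest and largest dwellings is sufficient because every other scaled area lies between these two values.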

Building level records

Once the inference and living area calculations have been performed, the following attributes are added to the buildings :

  • ‘living_area’

  • ‘living_area_share’

  • ‘heating_system’

Application

We use the “Peuple-Boivin-St-Jacques” district of Saint-Etienne to illustrate the results obtained by the dwelling inference algorithm.

../_images/dwelling_inference_construction_year.png

Building construction year classes of district “Peuple-Boivin-St-Jacques, Saint-Etienne”.

../_images/dwelling_inference_heating_system.png

Building heating system of district “Peuple-Boivin-St-Jacques, Saint-Etienne”.

../_images/living_area_vs_floor_area_results.png

Living area vs floor area for buildings in district “Peuple-Boivin-St-Jacques, Saint-Etienne”.