Python

Pandas, Requests, Beautiful Soup

JUPYTER LAB – Barcelona Real Estate Market – Property Rent Data

BCN_REAL_ESTATE_API_MARKDOWN

JUPYTER LAB¶

BARCELONA: Real Estate Market – Property Rent¶

METHODOLOGY¶

DATA¶

Data can be accessed via APIs. Data is stored by quarter. Each quarter has its own API and contains:
data aggregated at the city level
data aggregated at the neighbourhood level (for all 73 neighbourhoods comprising the city)
Relevant data to extract is at the Neighbourhood level – 73 neighburhoods in total, in particular:
DISTRICT_CODE
DISTRICT_NAME
NEIGHBOURHOOD_CODE
NEIGHBOURHOOD_NAME
RENT_QUARTER_DATE: datetype added by merging RENT_YEAR and RENT_QUARTER
RENT_YEAR
RENT_QUARTER
RENT_PRICE_EUROS_x_MQ: average rent per square metre by neighbourhood
RENT_PRICE_EUROS_x_MONTH: average rent per month by neighbourhood

Source: Adjuntamiento de Barcelona – https://opendata-ajuntament.barcelona.cat/data/es/dataset/est-mercat-immobiliari-lloguer-mitja-mensual

PROCESS¶

API¶
APIs are constructed and added to a list to be iterated later
DATA NORMALISATION:¶
District and Neighbourhood codes need to have a ‘0’ in front of all codes up to 10.
DATA EXPORT TO EXCEL¶
Results are exported onto an Excel file. The prefix enables to differenciate the file and needs to be set at the beginning. The full name of the file, the sheet_name and the index_label are assigned automatically.

SEQUENCE¶

1) API
2) DATA NORMALISATION
3) DATA EXPORT TO EXCEL

BCN_REAL_ESTATE_API_CODE_QUARTER

JUPYTER LAB¶

BARCELONA: Real Estate Market – Property Rent¶

REQUIRED LIBRARIES¶

In [17]:

import pandas as pd
import requests

Average monthly rent (€/month) and per surface (€/m2) of the city of Barcelona¶

CREATE LIST OF RESOURCE IDS¶

In [18]:

#RESOURCE IDs:
R_ID_2022 = 'e96bf614-467b-40ab-91b9-e48a616ea775'
R_ID_2021 = 'cfc45f2b-62eb-4621-8486-1b90e36b4bfe'
R_ID_2020 = '47c9d64d-317a-45d0-8c45-45488df8601c'
R_ID_2019 = '004c76b1-6269-4136-89b2-89fd47046930'
R_ID_2018 = '3dc45b16-42a9-4f57-9863-e6d1a4f5869f'
R_ID_LIST = [R_ID_2022,R_ID_2021,R_ID_2020,R_ID_2019,R_ID_2018]

SET PREFIX AND PERIOD TO NAME COLUMNS AND EXCEL EXPORT FILE¶

In [19]:

#PREFIX for COLUMN NAMES
col_prefix = 'RENT'
#SET PERIOD for EXCEL FILE NAME
start_year = 2018
end_year = 2022

In [23]:

#NAME THE EXPORT EXCEL FILE
excel_file_name = 'T_REAL_ESTATE_{}_YEAR_QUARTER_{}-{}.xlsx'.format(col_prefix,end_year,start_year)
excel_sheet_name = '{}'.format(col_prefix)
excel_index_label = '{}_INDEX'.format(col_prefix)
#NAME COLUMNS
col_DISTRICT_CODE = '{}_DISTRICT_CODE'.format(col_prefix)
col_DISTRICT_NAME = '{}_DISTRICT_NAME'.format(col_prefix)
col_NEIGHBOURHOOD_CODE = '{}_NEIGHBOURHOOD_CODE'.format(col_prefix)
col_NEIGHBOURHOOD_NAME = '{}_NEIGHBOURHOOD_NAME'.format(col_prefix)
col_QUARTER_DATE = '{}_QUARTER_DATE'.format(col_prefix)
col_YEAR = '{}_YEAR'.format(col_prefix)
col_QUARTER = '{}_QUARTER'.format(col_prefix)
col_PRICE_EUROS_x_MONTH = '{}_PRICE_EUROS_x_MONTH'.format(col_prefix)
col_PRICE_EUROS_x_MQ = '{}_PRICE_EUROS_x_MQ'.format(col_prefix)
column_order = [col_DISTRICT_CODE,col_DISTRICT_NAME,col_NEIGHBOURHOOD_CODE,col_NEIGHBOURHOOD_NAME,col_YEAR,col_QUARTER,col_PRICE_EUROS_x_MQ,col_PRICE_EUROS_x_MONTH]
#CREATE API URL List:
API_URL_LIST = []
for R_ID in R_ID_LIST:
    url = 'https://opendata-ajuntament.barcelona.cat/data/api/action/datastore_search_sql?sql=SELECT%20*%20from%20%22{}%22'.format(R_ID)
    API_URL_LIST.append(url)
#CREATE A FINAL DATAFRAME WHERE TO CONCAT TOGETHER ALL SINGLE URL DATAFRAMES
df_final = pd.DataFrame()
#API
for url in API_URL_LIST:
    response = requests.get(url)
    if response.ok:
        data = response.json()
        result = data.get('result')
        records = result.get('records')
    else:
        print('Problem with: ', url)
    df = pd.DataFrame.from_dict(records)
    df = df.drop(df.columns[[3,8]],axis = 1)
    df_EUROS_MONTH = df.loc[df['Lloguer_mitja'] == 'Lloguer mitjà mensual (Euros/mes)']
    df_EUROS_MONTH = df_EUROS_MONTH.rename(columns={'Preu':'Preu_Euros_Month'})
    df_EUROS_MQ = df.loc[df['Lloguer_mitja'] == 'Lloguer mitjà per superfície (Euros/m2 mes)']
    df_merge = pd.merge(df_EUROS_MQ, df_EUROS_MONTH[['Codi_Barri','Any','Preu_Euros_Month']],on=['Codi_Barri','Any'], how='inner')
    df_merge = df_merge.drop(columns=['Lloguer_mitja'])
    #RENAME COLUMNS
    df_merge.columns = [col_DISTRICT_CODE, col_DISTRICT_NAME, col_QUARTER, col_PRICE_EUROS_x_MQ, col_NEIGHBOURHOOD_NAME, col_NEIGHBOURHOOD_CODE, col_YEAR, col_PRICE_EUROS_x_MONTH]
    #REORDER COLUMNS
    df_concat = df_merge[column_order]
    #CONCAT TO FINAL DATAFRAME
    df_final = pd.concat([df_final,df_concat], ignore_index=True)
#DISTRICT AND NEIGHBOURHOOD CODES: INSERT '0' IN FRONT OF DISTRICT CODES AND NEIGHBOURHOOD CODES UP TO 10
df_final[[col_DISTRICT_CODE]] = df_final[[col_DISTRICT_CODE]].apply(lambda x: ['0{}'.format(y) if len(y) == 1 else y for y in x])
df_final[[col_NEIGHBOURHOOD_CODE]] = df_final[[col_NEIGHBOURHOOD_CODE]].apply(lambda x: ['0{}'.format(y) if len(y) == 1 else y for y in x])
#REPLACE NA AND - WITH EMPTY CELL
df_final[col_PRICE_EUROS_x_MQ] = df_final[col_PRICE_EUROS_x_MQ].str.replace('-','').str.replace('NA','')
df_final[col_PRICE_EUROS_x_MONTH] = df_final[col_PRICE_EUROS_x_MONTH].str.replace('-','').str.replace('NA','')
#RESET DATA TYPES
df_final[col_DISTRICT_CODE] = df_final[col_DISTRICT_CODE].astype('string')
df_final[col_DISTRICT_NAME] = df_final[col_DISTRICT_NAME].astype('string')
df_final[col_NEIGHBOURHOOD_CODE] = df_final[col_NEIGHBOURHOOD_CODE].astype('string')
df_final[col_NEIGHBOURHOOD_NAME] = df_final[col_NEIGHBOURHOOD_NAME].astype('string')
df_final[col_YEAR] = df_final[col_YEAR].astype('int64')
df_final[col_QUARTER] = df_final[col_QUARTER].astype('int64')
#INSERT DATE COLUMN DERIVED FROM YEAR AND QUARTER COLUMNS
df_final.insert(4, col_QUARTER_DATE, pd.PeriodIndex(year=df_final[col_YEAR], quarter=df_final[col_QUARTER]).to_timestamp())
#DATAFRAME EXPORT TO EXCEL
df_final.to_excel(excel_file_name, sheet_name= excel_sheet_name, index_label=excel_index_label)

In [24]:

df_final

Out[24]:

	RENT_DISTRICT_CODE	RENT_DISTRICT_NAME	RENT_NEIGHBOURHOOD_CODE	RENT_NEIGHBOURHOOD_NAME	RENT_QUARTER_DATE	RENT_YEAR	RENT_QUARTER	RENT_PRICE_EUROS_x_MQ	RENT_PRICE_EUROS_x_MONTH
0	01	Ciutat Vella	01	el Raval	2022-01-01	2022	1	15.5	917.0
1	01	Ciutat Vella	01	el Raval	2022-01-01	2022	1	15.5	872.0
2	01	Ciutat Vella	01	el Raval	2022-04-01	2022	2	14.7	917.0
3	01	Ciutat Vella	01	el Raval	2022-04-01	2022	2	14.7	872.0
4	01	Ciutat Vella	02	el Barri Gòtic	2022-01-01	2022	1	17.2	1181.6
…	…	…	…	…	…	…	…	…	…
4959	10	Sant Martí	73	la Verneda i la Pau	2018-07-01	2018	3	11.42	756.1
4960	10	Sant Martí	73	la Verneda i la Pau	2018-10-01	2018	4	11.81	715.37
4961	10	Sant Martí	73	la Verneda i la Pau	2018-10-01	2018	4	11.81	719.55
4962	10	Sant Martí	73	la Verneda i la Pau	2018-10-01	2018	4	11.81	767.1
4963	10	Sant Martí	73	la Verneda i la Pau	2018-10-01	2018	4	11.81	756.1

4964 rows × 9 columns

JUPYTER LAB – Barcelona Real Estate Market – Property Sale Data

BCN_REAL_ESTATE_WEBSCRAPING_MARKDOWN

JUPYTER LAB¶

BARCELONA: Real Estate Market – Property Sale¶

METHODOLOGY¶

DATA¶

Data is stored in html pages. Each page refers to a year and contains:
data aggregated at the city level
data aggregated at the neighbourhood level (for all 73 neighbourhoods comprising the city)
Relevant data to extract is at the Neighbourhood level – 73 neighburhoods in total, in particular:
DISTRICT_CODE
NEIGHBOURHOOD_CODE
NEIGHBOURHOOD_NAME
YEAR
NUMBER_NEW: number of newly built houses traded
NUMBER_PROTECTED: number of houses in protected regime traded
NUMBER_USED: number of preexisting houses traded
AVG_MQ_NEW: average size in metre squares of newly built houses traded
AVG_MQ_PROTECTED: average size in metre squares of houses in protected regime traded
AVG_MQ_USED: average size in metre squares of preexisting houses traded
PRICE_EUROS_X1000_NEW: average price in thousand euros of newly built houses traded
PRICE_EUROS_X1000_USED: average price in thousand euros of preexisting houses traded
PRICE_EUROS_MQ_NEW: average price in euros per metre square of newly built houses traded
PRICE_EUROS_MQ_USED: average price in euros per metre square of preexisting houses traded

Source: Adjuntamiento de Barcelona – https://ajuntament.barcelona.cat/estadistica/catala/Estadistiques_per_temes/Habitatge_i_mercat_immobiliari/Mercat_immobiliari/Compravenda_habitatges/index.htm

PROCESS¶

WEB SCRAPING¶
The first column of the html page contains merged data is this order:
District Code
Neighbourhood Code
Neighbourhood Name
These codes and names are extracted and placed into separate columns for later reference in visualizations.
This is done by replacing the 5 spaces xa0 between the two codes by with a to enable a column split to separate the codes into distinct columns.
The character is later removed from the NEIGHBOURHOOD_NAME column.
District and Neighbourhood codes need to have a ‘0’ in front of all codes up to 10.
DATA CLEANING, NORMALISATION AND TRANSLATION:¶
Data is cleaned of spaces (Xa0) and dots (.) in numbers during web scraping.
DATA EXPORT TO EXCEL¶
Results are exported onto an Excel file. The prefix enables to differenciate the file and needs to be set at the beginning. The full name of the file, the sheet_name and the index_label are assigned automatically.

SEQUENCE¶

1) WEB SCRAPING
2) DATA CLEANING, NORMALISATION AND TRANSLATION
3) DISTRICT AND NEIGHBOURHOOD CODES: INSERT ‘0’ IN FRONT OF DISTRICT CODES AND NEIGHBOURHOOD CODES UP TO 10
4) DATA EXPORT TO EXCEL

BCN_REAL_ESTATE_WEBSCRAPING_CODE_QUARTER

JUPYTER LAB¶

BARCELONA: Real Estate Market – Property Sale¶

REQUIRED LIBRARIES¶

In [1]:

import requests
import pandas as pd
from bs4 import BeautifulSoup

SET THE PERIOD¶

In [2]:

start_year = 2018
end_year = 2022

SET PREFIX FOR COLUMNS AND EXCEL FILE NAME¶

In [3]:

#PREFIX for COLUMN NAMES
col_prefix = 'SALE'

WEB SCRAPING AND DATA EXPORT¶

In [6]:

#TO NAME THE EXPORT EXCEL FILE
excel_file_name = 'T_REAL_ESTATE_{}_YEAR_QUARTER_{}-{}.xlsx'.format(col_prefix,end_year,start_year)
excel_sheet_name = '{}'.format(col_prefix)
excel_index_label = '{}_INDEX'.format(col_prefix)
#TO NAME COLUMNS
col_DISTRICT_CODE = '{}_DISTRICT_CODE'.format(col_prefix)
col_NEIGHBOURHOOD_CODE = '{}_NEIGHBOURHOOD_CODE'.format(col_prefix)
col_NEIGHBOURHOOD_NAME = '{}_NEIGHBOURHOOD_NAME'.format(col_prefix)
col_QUARTER_DATE = '{}_QUARTER_DATE'.format(col_prefix)
col_YEAR = '{}_YEAR'.format(col_prefix)
col_QUARTER = '{}_QUARTER'.format(col_prefix)
col_NUMBER_NEW = '{}_NUMBER_NEW'.format(col_prefix)
col_NUMBER_PROTECTED = '{}_NUMBER_PROTECTED'.format(col_prefix)
col_NUMBER_USED = '{}_NUMBER_USED'.format(col_prefix)
col_AVG_MQ_NEW = '{}_AVG_MQ_NEW'.format(col_prefix)
col_AVG_MQ_PROTECTED = '{}_AVG_MQ_PROTECTED'.format(col_prefix)
col_AVG_MQ_USED = '{}_AVG_MQ_USED'.format(col_prefix)
col_PRICE_EUROS_x_1000_NEW = '{}_EUROS_x_1000_NEW'.format(col_prefix)
col_PRICE_EUROS_x_1000_USED = '{}_EUROS_x_1000_USED'.format(col_prefix)
col_PRICE_EUROS_x_MQ_NEW = '{}_PRICE_EUROS_x_MQ_NEW'.format(col_prefix)
col_PRICE_EUROS_x_MQ_USED = '{}_PRICE_EUROS_x_MQ_USED'.format(col_prefix)
#TO CREATE A LIST OF YEARS
years = list(reversed(range(start_year,end_year+1,1)))
#CREATE LIST OF QUARTERS
quarters = list(reversed(range(1,5,1)))
#TO CREATE A FINAL DATAFRAME WHERE TO CONCAT TOGETHER ALL SINGLE URL DATAFRAMES
df_final = pd.DataFrame()
#WEBSCRAPING ALL SINGLE URLs AND CONCAT SINGLE URL DATAFRAMES TO FINAL DATAFRAME
for year in years:
    df_year =pd.DataFrame()
    for quarter in quarters:
        url ='https://ajuntament.barcelona.cat/estadistica/catala/Estadistiques_per_temes/Habitatge_i_mercat_immobiliari/Mercat_immobiliari/Compravenda_habitatges/a{}/t0{}02.htm'.format(year,quarter)
        #DISCART 4th AND 3rd QUARTERS AS DATA NOT AVAILABLE YET
        if (year == 2022 and quarter == 4) | (year == 2022 and quarter == 3):
            continue
        response = requests.get(url).text
        soup = BeautifulSoup(response, 'html.parser')
        tables = soup.find_all('tr')
        columns = []
        for table in tables:
            headers = table.find_all('td', class_='WhadsColVar1')
            for header in headers:
                column = header.getText().strip().replace('xa0xa0xa0xa0xa0','').replace('xa0','').replace(' ','')#trim spaces at the beginning and at the end and remove 5X spaces xa0 in between and other spaces within strings
                columns.append(column)
        data = []
        for table in tables:
            row =[]
            neighbourhoods = table.find_all('td', class_='WhadsRowVar3')
            for n in neighbourhoods:
                neighbourhood = n.getText().strip().replace('xa0xa0xa0xa0xa0','*')#trim spaces at the beginning and at the end and remove 5X spaces xa0 in between and replace with * to use later for split
                row.append(neighbourhood)
            values = table.find_all('td', class_='WhadsDades')
            for v in values:
                    value = v.getText().replace('.','').replace(',','.').replace('-','').replace('NA','').strip()#trim spaces xa0 and remove . dots from numbers (to avoid problems with number conventions later)
                    row.append(value)
            data.append(row)
        #SINGLE URL DATAFRAME
        df_quarter = pd.DataFrame(data, columns = columns)
        #insert columns at the left of the dataframe: NAT_DISTRICT_CODE, NAT_NEIGHBOURHOOD_CODE,NAT_NEIGHBOURHOOD_NAME
        df_quarter.insert(0,'col_NEIGHBOURHOOD_NAME', df_quarter['Dte.Barris'].astype(str).str.replace('*','',regex=True).str.replace('d+', '',regex=True).str.replace('.','',regex=True).str.lstrip())#remove the split character *, digits, dot, and space left at the beginning
        df_quarter[['split1','split2']] = df_quarter['Dte.Barris'].str.split('*',expand=True)
        df_quarter.insert(0,'col_NEIGHBOURHOOD_CODE', df_quarter['split2'].str.extract('(d+)'))
        df_quarter.insert(0,'col_DISTRICT_CODE',df_quarter['split1'])
        #ADD YEAR COLUMN
        df_quarter.insert(3,'col_YEAR', year)
        #ADD QUARTER COLUMN
        df_quarter.insert(4,'col_QUARTER', quarter)
        #keep columns of interest
        df_quarter.drop(['Dte.Barris','Total','','split1','split2'],axis=1, inplace=True)
        #keep rows of interest
        r1 = []
        for i in range(10,83):
            r1.append(i)
        df_quarter = df_quarter.iloc[r1,:]
        #RENAME COLUMNS
        df_quarter.columns = [
            col_DISTRICT_CODE,
            col_NEIGHBOURHOOD_CODE,
            col_NEIGHBOURHOOD_NAME,
            col_YEAR,
            col_QUARTER,
            col_NUMBER_NEW,
            col_NUMBER_PROTECTED,
            col_NUMBER_USED,
            col_AVG_MQ_NEW,
            col_AVG_MQ_PROTECTED,
            col_AVG_MQ_USED,
            col_PRICE_EUROS_x_1000_NEW,
            col_PRICE_EUROS_x_1000_USED,
            col_PRICE_EUROS_x_MQ_NEW,
            col_PRICE_EUROS_x_MQ_USED
            ]
        #CONCAT SINGLE QUARTER URL DATAFRAMES INTO MERGED YEAR DATAFRAME
        df_year = pd.concat((df_year,df_quarter), ignore_index=True)
    #CONCAT SINGLE QUARTER URL DATAFRAMES INTO MERGED YEAR DATAFRAME
    df_final = pd.concat((df_final,df_year), ignore_index=True)
#DISTRICT AND NEIGHBOURHOOD CODES: INSERT '0' IN FRONT OF DISTRICT CODES AND NEIGHBOURHOOD CODES UP TO 10
df_final[[col_DISTRICT_CODE]] = df_final[[col_DISTRICT_CODE]].apply(lambda x: ['0{}'.format(y) if len(y) == 1 else y for y in x])
df_final[[col_NEIGHBOURHOOD_CODE]] = df_final[[col_NEIGHBOURHOOD_CODE]].apply(lambda x: ['0{}'.format(y) if len(y) == 1 else y for y in x])   
#RESET DATA TYPES
df_final[col_DISTRICT_CODE] = df_final[col_DISTRICT_CODE].astype('string')
df_final[col_NEIGHBOURHOOD_CODE] = df_final[col_NEIGHBOURHOOD_CODE].astype('string')
df_final[col_NEIGHBOURHOOD_NAME] = df_final[col_NEIGHBOURHOOD_NAME].astype('string')
df_final[col_YEAR] = df_final[col_YEAR].astype('int64')
df_final[col_QUARTER] = df_final[col_QUARTER].astype('int64')
#INSERT DATE COLUMN DERIVED FROM YEAR AND QUARTER COLUMNS
df_final.insert(3, col_QUARTER_DATE, pd.PeriodIndex(year=df_final[col_YEAR], quarter=df_final[col_QUARTER]).to_timestamp())
#DATAFRAME EXPORT TO EXCEL
df_final.to_excel(excel_file_name, sheet_name= excel_sheet_name, index_label=excel_index_label)

In [7]:

df_final

Out[7]:

	SALE_DISTRICT_CODE	SALE_NEIGHBOURHOOD_CODE	SALE_NEIGHBOURHOOD_NAME	SALE_QUARTER_DATE	SALE_YEAR	SALE_QUARTER	SALE_NUMBER_NEW	SALE_NUMBER_PROTECTED	SALE_NUMBER_USED	SALE_AVG_MQ_NEW	SALE_AVG_MQ_PROTECTED	SALE_AVG_MQ_USED	SALE_EUROS_x_1000_NEW	SALE_EUROS_x_1000_USED	SALE_PRICE_EUROS_x_MQ_NEW	SALE_PRICE_EUROS_x_MQ_USED
0	01	01	el Raval	2022-04-01	2022	2	3		109	66.0		67.0	395.5	254.9	5960.2	3632.6
1	01	02	el Barri Gòtic	2022-04-01	2022	2	6		59	88.0		85.9	326.4	404.1	4297.5	4919.5
2	01	03	la Barceloneta	2022-04-01	2022	2	2		53			44.5		206.4		4704.7
3	01	04	Sant Pere, Santa Caterina i la Ribera	2022-04-01	2022	2	3		66	61.3		66.9	219.2	307.7	3604.1	4371.1
4	02	05	el Fort Pienc	2022-04-01	2022	2	1		56	92.0		90.5		370.6		4208.9
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…
1309	10	69	Diagonal Mar i el Front Marítim del Poblenou	2018-01-01	2018	1			12			95.3		509.7		5032.1
1310	10	70	el Besòs i el Maresme	2018-01-01	2018	1	8		82	102.0		70.4	397.9	236.9	3795.7	2985.9
1311	10	71	Provençals del Poblenou	2018-01-01	2018	1	1		8	72.0		67.9		251.3		3909.4
1312	10	72	Sant Martí de Provençals	2018-01-01	2018	1	1		39	45.0		74.7		270.3		3451.5
1313	10	73	la Verneda i la Pau	2018-01-01	2018	1	1	1	86	78.0	66.0	72.2		153.2		2180.1

1314 rows × 16 columns