Summary for visualization of geospatial data

Here we summarize the geospatial data visualization.

The Synthetic Power Grid Data Set will be used as an example. [Download here]

Before all the things, let’s import some basic tools:

1
2
3
4
5
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

1. Data Preparation

1.1 Import Data

Here, the Location( (Lon, Lat) pairs), Demand and Supply information are loaded by using pandas.read_csv("path").

1
2
3
Bus_Location = pd.read_csv("Gen_WI_Bus_Locations.csv")
Demand_Values = pd.read_csv("Gen_WI_Demand_Values.csv")
Supply_Values = pd.read_csv("Gen_WI_Supply_Values.csv")

1.2 Merge Data

These 3 tables have the same column named _Bus Number_ , we can use reduce() to merge them.

1
2
3
4
5
6
# first compile the list of dataframes we want to merge
data_frames = [Bus_Location, Demand_Values, Supply_Values]
from functools import reduce
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['Bus Number'], how='outer'), data_frames)
#Then call head() method, we can see the first five rows of our merged data
df_merged.head()

head

Moreover, we can call describe() method to summarize our data.

1
df_merged[['Lon', 'Lat', 'Demand (MW)', 'Supply (MW)']].describe()

describe


2. Using geopandas to visualize

We can refer to geopandas docs Click here.

2.1 Data Structures

GeoPandas implements two main data structures, a GeoSeries and a GeoDataFrame. These are subclasses of pandas Series and DataFrame, respectively. Before visualization, we should convert our data into proper type.

1
2
3
4
5
import geopandas as gpd
gdf = gpd.GeoDataFrame(df_merged, geometry = gpd.points_from_xy(df_merged.Lon, df_merged.Lat))
gdf.head()
#And here call type(gdf):
#geopandas.geodataframe.GeoDataFrame

gdfhead

2.2 Coordinate Reference Systems

CRS are important because the geometric shapes in a GeoSeries or GeoDataFrame object are simply a collection of coordinates in an arbitrary space. A CRS tells Python how those coordinates related to places on the Earth.

CRS are referred to using codes called proj4 strings. You can find the codes for most commonly used projections from www.spatialreference.org. Common projections can also be referred to by EPSG codes, so this same projection can also called using the proj4 string "+init=epsg:4326".

geopandas can accept lots of representations of CRS, including the proj4 string itself ("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") or parameters broken out in a dictionary: {'proj': 'latlong', 'ellps': 'WGS84', 'datum': 'WGS84', 'no_defs': True}). In addition, some functions will take EPSG codes directly.

For reference, a few very common projections and their proj4 strings:

  • WGS84 Latitude/Longitude: "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" or "+init=epsg:4326"
  • UTM Zones (North): "+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
  • UTM Zones (South): "+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs +south"

contextily: context geo tiles in Python

contextily is a small Python 3 package to retrieve and write to disk tile maps from the internet into geospatial raster files. Bounding boxes can be passed in both WGS84 (EPSG:4326) and Spheric Mercator (EPSG:3857). See here for usage.


Let’s just dive into code

1
2
3
4
5
import contextily as ctx
gdf.crs = {'init': 'epsg:4326', 'no_defs': True}
gdf = gdf.to_crs(epsg=3857)
ax = gdf.plot(figsize=(10, 10), alpha=0.5, edgecolor='k')
ctx.add_basemap(ax)

first

2.3 Mapping attributes into it

Here we map the Demand values into the map, and try to set some parameters

1
2
3
4
5
ax = gdf.plot(figsize=(10, 10), alpha=0.5, edgecolor='w', column = "Demand (MW)", legend = True, 
legend_kwds={'label': "Demand (MW)",'orientation': "vertical"}) #horizontal
ctx.add_basemap(ax, url=ctx.providers.Stamen.TonerLite)
ax.set_axis_off()
plt.savefig('Demand(MW)')

Second

2.4 Summary

Here we summarize 2.3 as follows:

1
2
3
4
def Attribute_mapping(gdf, column, legend_kwds):
ax = gdf.plot(figsize = (10, 10), alpha = 0.5, edgecolor = 'w', column = column, legend = True, legend_kwds = legend_kwds)
ctx.add_basemap(ax, url=ctx.providers.Stamen.TonerLite)
ax.set_axis_off()

Our input is:

  • gdf : Corresponding geopandas dataframe after processing in 2.2
  • column: Corresponding Attribute column that we want to map
  • legend_kwds: Corresponding legend parameters

Let’s see an example below:

1
2
3
4
5
6
7
8
9
10
11
12
#First get our gdf
import geopandas as gpd
gdf = gpd.GeoDataFrame(df_merged, geometry = gpd.points_from_xy(df_merged.Lon, df_merged.Lat)) #Note: df_merged is a pandas dataframe!
#process as in 2.2
import contextily as ctx
gdf.crs = {'init': 'epsg:4326', 'no_defs': True}
gdf = gdf.to_crs(epsg=3857)
#set our column and legend_kwds
column = "Supply (MW)"
legend_kwds={'label': "Supply (MW)",'orientation': "vertical"}
#call the function above
Attribute_mapping(gdf, column, legend_kwds)

Third


3. Try Something More

3.1 Remove Zero

As we see the describe results in 1.2, more than 75% of Demand and Supply are all zero. So, in this part, we try to deal with them.

Just see the code

1
2
3
None_Zero_Demand = gdf[gdf['Demand (MW)'] != 0.0]
None_Zero_Demand.head()
None_Zero_Demand.describe()

None0DemandDescribe

Here we see:

  1. there are only 3095 none zero Demand compare to 14430 in total.
  2. All the Supply are zero while Demand are none zero

3.2 Visualize the None Zero Part

1
2
3
column = "Demand (MW)"
legend_kwds={'label': "None Zero Demand",'orientation': "vertical"}
Attribute_mapping(None_Zero_Demand, column, legend_kwds)

nonezerodemand

Though the points become less, most of them are near zero (purple color). So, we can try to limit the value to visualize more clear.

3.3 Separate into Three Parts by ordered values

Let’s first order the DataFrame by Demand

1
2
None_Zero_Demand = None_Zero_Demand.sort_values(by = 'Demand (MW)')
None_Zero_Demand.head()

order

Then split it into 3 parts

1
2
3
4
5
6
Split3 = np.array_split(None_Zero_Demand, 3)
Part1, Part2, Part3 = Split3[0], Split3[1], Split3[2]
Part1.head()
Part2.head()
Part3.head()
#Note: here the other two columns not presented in the figures

P1headP2headP3head

Finally, visualize each part

1
2
3
4
5
6
7
8
9
10
column = "Demand (MW)"
#Part1
legend_kwds={'label': "None Zero Demand",'orientation': "vertical"}
Attribute_mapping(Part1, column, legend_kwds)
#Part2
legend_kwds={'label': "None Zero Demand",'orientation': "vertical"}
Attribute_mapping(Part2, column, legend_kwds)
#Part3
legend_kwds={'label': "None Zero Demand",'orientation': "vertical"}
Attribute_mapping(Part3, column, legend_kwds)

P1VP2vP3v

Here we:

  • [x] ​ Visualize three parts separately

  • [ ] ​ Unify the color bars in above three figures.