Fun With Shapefiles
Shapefiles are a format developed by ESRI (the makers of ArcGIS) to store and share geospatial data. Many interesting datasets are freely available in shapefile format. Shapefiles can be viewed with a number of freely available applications, such as ArcGIS Explorer (which requires the .NET framework or Silverlight) or ArcReader (which is multi-platform but closed-source). Open-source GIS packages such as Quantum GIS and GRASS can also view and edit shapefiles, and recent versions can be installed from the UbuntuGis PPA.
However GIS applications can be tricky to use without training. Moreover, sometimes you may want to use shapefile-data for some other purpose, outside of traditional GIS applications. Here are a few useful tricks you can do with shapefiles, and ways to get data out of a shapefile and into another application.
Reprojecting a shapefile
If you want to combine data from different sources, you’ll have to make sure that they use the same coordinate system, reference point (datum), geographic projection and representation of the Earth’s surface (geoid).
These days, it is common to convert everything to WGS84
latitude/longitude coordinates. To convert a given shapefile to WGS84
coordinates, you can use the GDAL utility ogr2ogr. In Ubuntu,
first install the package gdal-bin
and then execute the following
command:
ogr2ogr -t_srs EPSG:4326 output_wgs84.shp input.shp
The parameter -t_srs
specifies the spatial reference system of
the output file, which is usually specified through an EPSG code. In
this case, it tells the program to convert all coordinates in the
shapefile to geographic 2D WGS84 coordinates (EPSG:4326). Use
EPSG:4979 for 3D coordinates.
Converting a shapefile to KML (Google Earth/Maps)
One of the more interesting ways to visualise data stored in a shapefile
is to convert it into a KML file (Keyhole Markup Language), which can
then be used in e.g. Google Earth (or with Google Maps). To do this in
Ubuntu, first install the package gdal-bin
and then execute the
following command:
ogr2ogr -f "KML" -t_srs EPSG:4326 output.kml input.shp
Converting a shapefile to KMZ (compressed Google Earth)
The problem with XML-based formats such as KML is that they can get
rather big if you have a large shapefile with lots of data. To save some
disk-space, you can compress the KML file into a KMZ file. A KMZ file is
simply a ZIP-file that contains a KML-document (doc.kml) and optional
directories with other files that are referenced from the KML-file. To
directly convert a shapefile into KMZ, you can use this shell-script
(shp2kmz). Just copy it to your home-directory or to /usr/local/bin
and make it executable (e.g. chmod a+x shp2kmz
. It uses ogr2ogr
and
zip, so in Ubuntu it requires that the gdal-bin
package is installed.
Usage: shp2kmz input.shp output.kmz
Merging shapefiles
Merging two or more shapefiles is easy with ogr2ogr. A merge of two shapefiles ‘file1.shp’ and ‘file2.shp’ into a new file ‘file_merged.shp’ is performed like this:
ogr2ogr file_merged.shp file1.shp
ogr2ogr -update -append file_merged.shp file2.shp -nln file_merged
The first command just copies the first shapefile to a new file (file_merged.shp).The second command opens file_merged.shp in update mode, tries to find existing layers and appends the features being copied. The -nln option sets the name of the layer to be copied to.
Converting a shapefile to a spreadsheet
You may be able to load some of the shapefile attributes into a spreadsheet, by opening the associated DBF file in a spreadsheet-application that supports dBASE-files, such as Gnumeric. However, often this will exclude interesting data, such as coordinates and some types of attributes. A better way is to load the shapefile into Quantum GIS, open the attribute table, select all the rows and copy the data to the clipboard. You can then paste everything into a spreadsheet. It may get big and messy though…
Converting a shapefile to CSV (textfile)
If your shapefile contains points rather than polygons, you can export
the data as a CSV file using ogr2ogr and the GDAL CSV-driver, e.g:
ogr2ogr -f "CSV" -t_srs EPSG:4979 -lco GEOMETRY=AS_XYZ output input.shp
This will create a directory containing one or more CSV-files containing
attributes and X/Y/Z coordinates (use
ogr2ogr -f "CSV" -t_srs EPSG:4326 -lco GEOMETRY=AS_XY
if you need only
latitude and longitude). Add SEPARATOR=TAB
if you prefer a
tab-separated file rather than commas.
However this won’t export coordinates if your shapefile contains polygons. In that case you can export the entire geometry using:
ogr2ogr -f "CSV" -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT output input.shp
Unfortunately if you just want latitude and longitude of the polygon centroids, you’ll need to compute them yourself. The polygons will be in stored in a WKT-representation. If they are small, you can approximate the centroid by simply averaging the longitude and latitude coordinates. Rather than parsing everything yourself, you can for instance use Python and the shapely module, e.g.:
from shapely.wkt import loads
polygon = loads('POLYGON ((-8.254538773542281 54.511112320794687,-8.25368883970682 54.510785254030836,-8.25331620402474 54.510641858248889))')
centroid = polygon.centroid
lat = centroid.y
lon = centroid.x
For Perl take a look at the Geo-Point modules. If your polygons are big or accuracy is important, it’s probably best to use R and the maptools package (see below), or load the data into a spatialite database and use the CENTROID SQL-function. I might try to put together an example at some point.
An alternative way to convert shapefiles to text is by using the shp2text utility.
Working with shapefiles in Python
Rather than first exporting to CSV, you can easily load shapefiles into Python directly using the ogr module, and then process them with shapely or other modules. For example:
from osgeo import ogr
from shapely.wkb import loads
source = ogr.Open(“testpoly.shp”)
couche = source.GetLayerByName(“testpoly”)
for element in couche:
geom = loads(element.GetGeometryRef().ExportToWkb())
Working with shapefiles in Perl
Of course you can also read shapefiles directly in Perl, have a look at the Geo::ShapeFile module.
Working with shapefiles in R
The readShapePoly function from the maptools package will read a shapefile and create a spatialPolygonsDataFrame object.
The rest I still need to figure out…
Some potentially useful resources:
- http://science.nature.nps.gov/im/monitor/stats/R/
- http://:cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf
- https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles
- http://help.nceas.ucsb.edu/R:_Spatial
- http://spatialanalysis.co.uk/files/2011/01/intro_to_spatial_data.txt
- http://www.statistik.uni-muenchen.de/~hoehle/teaching/stepi2010/visualization_4.pdf
- http://pvanb.wordpress.com/2010/02/27/a-map-of-deforestation-in-africa-using-r-2/
- http://stackoverflow.com/questions/5130860/access-spatial-objects-in-spatial-polygons-data-frames
- http://www.people.fas.harvard.edu/~zhukov/spatial.R
- http://www.nceas.ucsb.edu/files/scicomp/GISSeminar/UseCases/CreateKMLFiles/ConvertShapefileToKML.html
- http://www.est.ufpr.br/aRT/docs/aRTsp.pdf
- http://www.nceas.ucsb.edu/scicomp/usecases/computepolygoncentroids