Utilize ArcGIS Geo-Processing functions in R

3 minutes read

In this post, I will describe the procedure to couple R with ESRI Arcpy system.

Introduction

R is a statistical software with a lot of computational capabilities. Many scientists and researchers choose R as their primary development language, thus leading to plenty of state of art techniques and packages to analyze data. Despite R computational power, it may not be very efficient while working with large datasets due to the R high memory footprint, although numerous solutions exist to overcome this issue. On the other hand GIS data, especially raster data tend to be huge and processing them may require a lot of computations. ArcGIS system comprises a powerful Geo-Processing framework coupled with Arcpy, a python site-package, capable of automating tons of GIS workflows. I am going to present a way to call Arcpy Geo-Processing scripts from R. It is assumed ArcGIS Desktop is installed prior to running the R script. This is important when automation is key in your workflow. Note that the reverse procedure-call R function from ArcGIS-is also possible using R-Bridge. R-bridge consists of an R package arcgisscripting and a set of C++ bindings named rarcproxy; it requires ArcGIS 10.3 and above. For further information please see ESRI R-bridge page on Github.

Scenario

Certain Geo-Processing tasks in R are resource intensive and require a lot of time to complete. This includes but not limited to:

  • Polygonize raster : to convert a raster to polygon feature class
  • DEM to TIN/contour : to convert a digital elevation model in raster format to a TIN i.e. a polyline feature class
  • Buffer : to buffer spatial objects especially linear and areal objects
  • Polygon overlay : spatial predicate like query i.e. to join layers spatially

Here I demonstrate a simple workflow to run an Arcpy script from within R. I present a boilerplate python script to write in the Arcpy Geo-Processing function and the R code to call the python script. I opt for raster2polygonfunction in raster package of R software and compare its performance to RasterToPolygon_conversion Arcpy function. I run the scenario on the R logo jpeg image that resides in the Rgdal installation directory.

Python script

This python template has one Geo-Processing function namely arcpy.RasterToPolygon_conversion; it can be replaced with any other Geo-Processing function in ArcGIS Geo-Processing framework. This script receives the necessary arguments from the command line (note the sys.argv[1]) and streamlines the process of calling the script from the command line in R.

import arcpy
import sys

arcpy.env.overwriteOutput = True

ras = sys.argv[1].replace('\\','\\\\')
shp = sys.argv[2].replace('\\','\\\\')
#ras = r"C:\Program Files\R\R-3.2.2\library\rgdal\pictures\Rlogo.jpg"
#shp = r"D:\arcgis_geoproc\arcgis.shp"

arcpy.RasterToPolygon_conversion(ras, shp, "SIMPLIFY","VALUE")

R run

I start the R part by writing a simple function to read the raster, polygonize it and then writing it to the disk using writeOGR.

# Using R ---------------------------------
raster2polygon <- function(ras,shp){
  
  r1 = raster::brick(ras)
  poly1<-rasterToPolygons(r1, dissolve = T)
  dirr=dirname(shp)
  shpName = basename(shp)
  writeOGR(obj=poly1, dsn=dirr, layer=shpName, driver="ESRI Shapefile")
}

rlog = system.file("pictures/Rlogo.jpg",package = "rgdal")
shpR = "D:/arcgis_geoproc/r"
system.time(raster2polygon(rlog,shp))

and the response time is:

usersystemelapsed
49.190.0649.95

Now I try to run the python script I drafted in the previous section from within R:

# Using Python and ArcGIS -------------------------
pythonScript = "D:/arcgis_geoproc/raster2polygon_sample.py"
shpArc = "D:/arcgis_geoproc/arcgis.shp"
arguments = sprintf('"%1$s" "%2$s" "%3$s"',
                    normalizePath(pythonScript),
                    normalizePath(rlog),
                    normalizePath(shpArc))
system.time(system2('python', args=arguments))

this time, the response time is almost 4 times faster than R:

usersystemelapsed
0.010.0313.01

I tested the procedure on a small raster (r logo) file, yet the difference is more significant on larger and more typical raster imageries that are dealt with in everyday research.

Plot

Now I plot the result of the two approaches:

# Plotting the results --------------------------
r1 = brick(rlog)
rshp <- readOGR(dsn = dirname(shpR), 
                 layer = basename(shpR))
arcshp <- readOGR(dsn = dirname(shpArc), 
                 layer = gsub(x=basename(shpArc),pattern = '.shp',replacement = '',fixed = T) )

op <- par(mfrow=c(1,3))
plotRGB(r1,main="R logo")
plot(rshp,main = "R result")
plot(arcshp,main = "ArcGIS result")
par(op)

Updated: