Reading JSON Data from URLs in R
R reads JSON through add-on packages that parse it into native structures such as vectors, lists, and data frames. For fetching JSON from REST APIs, the jsonlite package is a practical choice: a single function call downloads the response and parses it.
Install and load jsonlite
Install the package once:
install.packages("jsonlite")
Load it in your R session:
library(jsonlite)
Fetch JSON from a REST API
Use fromJSON() to retrieve and parse JSON directly from a URL:
btc <- fromJSON("https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1d&limit=365")
str(btc)
Given a URL, fromJSON() downloads the response and converts it into an R object. With the default simplification, the result is a vector, matrix, data frame, or list, depending on the shape of the JSON.
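To see which R type each JSON shape becomes without calling a live API, you can pass JSON text to fromJSON() directly, since it accepts a string as well as a URL. A minimal sketch; the sample JSON strings are made up for illustration:

```r
library(jsonlite)

# An array of objects with the same keys simplifies to a data frame
people <- fromJSON('[{"name": "Ana", "age": 31}, {"name": "Ben", "age": 27}]')
class(people)    # "data.frame"

# An array of equal-length arrays simplifies to a matrix (one inner array per row)
grid <- fromJSON('[[1, 2, 3], [4, 5, 6]]')
dim(grid)        # 2 3

# An object that contains other objects stays a named list
envelope <- fromJSON('{"status": "ok", "data": {"count": 2}}')
class(envelope)  # "list"
envelope$data$count
```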
Working with the returned data
Many REST APIs return nested structures, so inspect what you received before processing it:
# Check the class and dimensions
class(btc)
head(btc)
# Convert to a data frame if needed
btc_df <- as.data.frame(btc)
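In the Binance example, fromJSON() returns a character matrix (each row mixes numbers and quoted strings, so everything is coerced to character). Each row is one daily candlestick with 12 fields: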
# Name the 12 fields of each kline
colnames(btc) <- c("open_time", "open", "high", "low", "close", "volume",
                   "close_time", "quote_asset_volume", "trades",
                   "taker_buy_base", "taker_buy_quote", "ignore")
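# Convert millisecond timestamps to UTC datetimes and price/volume strings to numeric
btc_df <- data.frame(
  date   = as.POSIXct(as.numeric(btc[, "open_time"]) / 1000, origin = "1970-01-01", tz = "UTC"),
  close  = as.numeric(btc[, "close"]),
  volume = as.numeric(btc[, "volume"])  # base-asset volume (column 6, not quote_asset_volume)
)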
head(btc_df)
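The conversion can be wrapped in a small function and checked offline against a hand-written matrix shaped like Binance kline rows. This is a sketch: `parse_klines()` is a hypothetical helper name, the two sample rows are made-up values, and the column positions assume the 12-field kline layout named above.

```r
# Hypothetical helper: turn a raw kline matrix (as returned by fromJSON) into a tidy data frame
parse_klines <- function(raw) {
  data.frame(
    # Timestamps are milliseconds since the Unix epoch; interpret them in UTC
    date   = as.POSIXct(as.numeric(raw[, 1]) / 1000, origin = "1970-01-01", tz = "UTC"),
    close  = as.numeric(raw[, 5]),  # close price
    volume = as.numeric(raw[, 6])   # base-asset volume
  )
}

# Made-up sample: two rows of 12 fields, all character like the real response
sample_rows <- rbind(
  c("1700006400000", "100.0", "110.0", "95.0", "105.0", "12.5",
    "1700092799999", "1300.0", "42", "6.0", "630.0", "0"),
  c("1700092800000", "105.0", "112.0", "101.0", "108.0", "9.75",
    "1700179199999", "1050.0", "37", "4.5", "480.0", "0")
)

klines <- parse_klines(sample_rows)
klines
```

Keeping the parsing separate from the download makes it easy to test without hitting the API.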
Handle errors and timeouts
Network requests can fail, so production code should catch errors instead of letting the script stop:
data <- tryCatch(
  fromJSON("https://api.example.com/endpoint"),
  error = function(e) {
    message("API request failed: ", conditionMessage(e))
    NULL  # becomes the value of `data`, so callers can check is.null(data)
  }
)
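Transient failures such as rate limits or dropped connections are often worth retrying before giving up. A minimal sketch, assuming a fixed pause between attempts is acceptable for your API; `fetch_json_with_retry()` is a hypothetical name. Because fromJSON() also accepts JSON text, the function can be exercised offline:

```r
library(jsonlite)

# Hypothetical helper: try fromJSON() up to `tries` times, pausing `wait` seconds between attempts.
# Note: a response whose body is JSON null parses to NULL and is treated as a failure here.
fetch_json_with_retry <- function(src, tries = 3, wait = 2) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(fromJSON(src), error = function(e) {
      message("Attempt ", attempt, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
    if (attempt < tries) Sys.sleep(wait)
  }
  NULL  # every attempt failed
}

# With a real endpoint:
# data <- fetch_json_with_retry("https://api.example.com/endpoint")
```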
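fromJSON() has no timeout argument. To cap how long a slow request may take, download the body with the curl package and then parse it:
h <- curl::new_handle(timeout = 10)  # abort the transfer after 10 seconds
res <- curl::curl_fetch_memory("https://api.example.com/endpoint", handle = h)
data <- fromJSON(rawToChar(res$content))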
Working with pagination
Some APIs split results across pages, so collecting everything takes one request per page. This loop stops at the first empty page or after max_pages requests:
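# Assumes base_url already contains a query string (e.g. "...?limit=100"),
# so the page number is appended with "&"
fetch_all_pages <- function(base_url, max_pages = 5) {
  all_data <- list()
  for (page in seq_len(max_pages)) {
    url <- paste0(base_url, "&page=", page)
    response <- fromJSON(url)
    if (length(response) == 0) break  # empty page: no more data
    all_data[[page]] <- response
  }
  do.call(rbind, all_data)  # stack the per-page data frames
}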
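The paging loop above can be tested without a network by passing the page-fetching step in as a function. A sketch under assumptions: the API takes a `page` query parameter and returns an empty JSON array past the last page (fromJSON() turns that into an empty list); `fetch_pages()` and `fake_fetch()` are hypothetical names, and the fake pages are made-up data.

```r
# Hypothetical variant of fetch_all_pages(): `fetch` maps a page number to a data frame,
# or to an empty result once the pages run out
fetch_pages <- function(fetch, max_pages = 5) {
  pages <- list()
  for (page in seq_len(max_pages)) {
    response <- fetch(page)
    if (length(response) == 0) break  # empty page: stop early
    pages[[page]] <- response
  }
  do.call(rbind, pages)
}

# Fake API: two pages of two rows each, then an empty list
fake_fetch <- function(page) {
  if (page > 2) return(list())
  data.frame(id = (page - 1) * 2 + 1:2, page = page)
}

all_rows <- fetch_pages(fake_fetch)
nrow(all_rows)  # 4

# With a real API (assuming a `page` query parameter):
# fetch_pages(function(p) jsonlite::fromJSON(paste0("https://api.example.com/items?page=", p)))
```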
Validate and parse complex responses
APIs often wrap the payload in an envelope with status or error metadata. Check for an error before extracting the data:
response <- fromJSON("https://api.example.com/endpoint")
# Stop early if the API reported an error
if (!is.null(response$error)) {
  stop("API error: ", response$error$message)
}
# The actual records are nested under `data`
data <- response$data
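The same checks can live in a reusable function. A minimal sketch, assuming the API reports failures as `{"error": {"message": ...}}` and wraps results in a `data` field; both field names vary between APIs, and `extract_payload()` is a hypothetical helper. It uses `[[ ]]` rather than `$` because `$` on a list partially matches names (so `$error` would also match a field called `errors`).

```r
library(jsonlite)

# Hypothetical helper: stop on an API-reported error, otherwise return the `data` field
extract_payload <- function(response) {
  if (!is.null(response[["error"]])) {
    stop("API error: ", response[["error"]][["message"]], call. = FALSE)
  }
  if (is.null(response[["data"]])) {
    stop("Response has no 'data' field", call. = FALSE)
  }
  response[["data"]]
}

ok  <- fromJSON('{"data": [{"id": 1}, {"id": 2}]}')
bad <- fromJSON('{"error": {"message": "rate limit exceeded"}}')

extract_payload(ok)  # a 2-row data frame
tryCatch(extract_payload(bad), error = function(e) conditionMessage(e))
# "API error: rate limit exceeded"
```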
Alternatives to jsonlite
For specific use cases, consider these packages:
- httr2 – finer control over HTTP requests: headers, authentication, timeouts, and retries
- arrow – fast reading of large newline-delimited JSON files
- tidyjson – easier wrangling of deeply nested JSON structures
library(httr2)
response <- request("https://api.example.com/endpoint") |>
  req_headers(Authorization = "Bearer token") |>  # replace "token" with your API key
  req_perform()
data <- fromJSON(resp_body_string(response))
Use httr2 when you need custom headers, authentication, or retry logic.
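A sketch of a more defensive httr2 request that adds query parameters, bearer authentication, a timeout, and retries. The endpoint, the query parameters, and the `API_TOKEN` environment variable are placeholders. The request is only built here, so inspecting it needs no network:

```r
library(httr2)
library(jsonlite)

req <- request("https://api.example.com/endpoint") |>
  req_url_query(symbol = "BTCUSDT", limit = 100) |>  # appended as ?symbol=...&limit=...
  req_auth_bearer_token(Sys.getenv("API_TOKEN")) |>  # placeholder environment variable
  req_timeout(10) |>                                 # give up after 10 seconds
  req_retry(max_tries = 3)                           # retry transient failures

req$url

# Performing it requires a real endpoint:
# resp <- req_perform(req)
# data <- fromJSON(resp_body_string(resp))
```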
Performance considerations
For large responses or repeated requests:
- Cache results to avoid redundant API calls
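- Request smaller pages (many APIs accept a limit or page-size parameter) to reduce payload size
- Keep simplifyVector = TRUE (the default) in fromJSON() so JSON arrays become vectors, matrices, and data frames instead of nested lists
- Set flatten = TRUE to turn nested data frames into a single flat data frame with dotted column names (for example user.name)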
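To see what flatten = TRUE does, compare the column names with and without it on a small made-up set of records:

```r
library(jsonlite)

json <- '[{"id": 1, "user": {"name": "Ana", "city": "Oslo"}},
          {"id": 2, "user": {"name": "Ben", "city": "Lima"}}]'

nested <- fromJSON(json)                  # `user` is itself a data frame column
flat   <- fromJSON(json, flatten = TRUE)  # nested columns are pulled up to the top level

names(nested)  # "id" "user"
names(flat)    # "id" "user.name" "user.city"
```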
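# Refetch only when there is no cached copy or it is more than an hour old.
# difftime() with explicit units matters: plain subtraction picks units automatically,
# so comparing it to 3600 could end up comparing hours, not seconds.
if (!exists("cached_data") ||
    difftime(Sys.time(), cached_time, units = "secs") > 3600) {
  cached_data <- fromJSON("https://api.example.com/data")
  cached_time <- Sys.time()
}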
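With several endpoints, the same idea generalizes to a per-URL cache with a time-to-live. A minimal in-memory sketch, assuming results only need to survive the current R session; `make_cached_fetcher()` is a hypothetical helper, and it never evicts old entries:

```r
# Hypothetical helper: wrap a fetch function so each URL's result is reused for `ttl_secs` seconds
make_cached_fetcher <- function(fetch, ttl_secs = 3600) {
  cache <- new.env()  # one entry per URL: list(value, time)
  function(url) {
    entry <- cache[[url]]
    age <- if (is.null(entry)) Inf else
      as.numeric(difftime(Sys.time(), entry$time, units = "secs"))
    if (age > ttl_secs) {
      entry <- list(value = fetch(url), time = Sys.time())
      assign(url, entry, envir = cache)
    }
    entry$value
  }
}

# Usage with jsonlite:
# cached_get <- make_cached_fetcher(jsonlite::fromJSON, ttl_secs = 3600)
# data <- cached_get("https://api.example.com/data")
```

Passing the fetch function in keeps the cache independent of jsonlite, so it can be tested with a stand-in function.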