What is the R Programming Language and How to Use It for Working with the APIs of Google AdWords an Yandex.Direct

4
802

If you run a number of small ad campaigns in a single account, you only need to check the necessary data in Google AdWords and Analytics, or in Yandex.Direct and Yandex.Metrica. However, if you use the same way to roughly estimate the performance of a couple douzens ad campaigns in multiple accounts, you’ll spend at least a few hours. The detailed analysis will take you days and even weeks, while you’ll be spending your marketing budget without knowing if you do it effectively or not.

In this article I’ll tell you how to automate working with the data from Google Analytics and Yandex.Direct, by using two packages that enhance the basic capacities of the R programming language.

Below I explain everything simply and clearly, to be understood by any user that hasn’t ever written any code. To understand the content of the article you only need to know what is a function in the context of Microsoft Excel or Google Spreadsheets. Warning! This article may trigger you to start learning the R language.

Content of the article:

What is R programming language

R is a programming language and free software environment for statistical computing and graphics. This is the R language definition from Wikipedia.

R was created by Robert Gentleman and Ross Ihaka, two employees of the Oakland University. The language was named after the first names of the R creators.

How I learned about the R language: from 1C to R in 10 years

Many people today talk about Big Data, but the Excel or any other similar spreadsheets are still good enough for a lot of companies.

The majority of my work from 2007 to 2009 was writing software requirements specifications for 1C programmers to implement some new reporting features. At that time, we had 1C 7.7., so it’s needless to mention any integrated report builders. To obtain a custom report, I had to write new requirements specifications and the way to implement it for the programmers.

A couple years later I managed to gain insight into Microsoft Excel, and create one of my last SRS for the 1C programmers. It was one big export of raw data about receipts that contained all the possible fields. This is how I moved from 1C to Excel, and the spreadsheets from Microsoft became my wingman for the next few years.

I would probably keep using Excel, if it didn’t require so many resources even for a mid size report. When you need to quickly process a couple dozen lines, Excel will do the job. However, an attentive Microsoft Excel user might notice, that the number of lines on a single page hasn’t changed since 2007, and the maximum lines is 1,048,576, while the maximum columns is 16,384. It’s not that Microsoft doesn’t want to add new lines or columns. It’s just that Excel requires too many resources on such data volume, and the processing of hundred thousands lines will engage all the computer’s operating memory. Should I mention that a million lines table is not a surprise to anyone today?

All of the above mentioned reasons made me start learning SQL in 2012. Using even complicated JOIN queries, it takes up to a minute to process a million lines in any of the popular database management system, considering that the tables were indexed correctly. Thus, I used SQL to collect and store data, while Excel helped me visualize the data.

In 2015, Microsoft presented the Power BI platform, one of the leading BI solutions nowadays. Power BI actually appeared as a pack of extensions for Excel, Power View (for building interactive dashboards), Power Query (for exporting data from different sources), and PowerPivot (for creating data models, and linking different tables). Power BI is in fact a handy tool for exporting, filtering and building up models of data from multiple sources, as well as for visualizing data. Here at Netpeak, we use this platform to analyze our ad campaigns and the performance of the PPC advertising department along with our corporate blog.

As we use SQL and Google BigQuery to store data, and Power BI to visualize up to 80% of our data, the only issue remained the data transfer. Thus, we needed a tool to automate the data transfer along with other routines that we did by hand.

This is how I came to study the R language, which has become my main working tool over the past two years. The R language can actually do the visualizing job, while the language capabilities are only limited by your imagination. In this article, I’ll tell you how to get data from the interface APIs of the most popular ad platforms, which was the main goal for the Netpeak internet marketing agency.

Important terms

There’s a number of terms that are often used in the article. To make things clear, I’ll give a couple definitions in this part of the article.

Function is a grouped number of commands returning some value or functions in the R language. As mentioned above, functions in the R language are similar to those in MS Excel.

Arguments. Almost any function requests some entering values that will be used to make some actions and calculations. You need to attribute the aforementioned values to function arguments. For example, let’s talk about the VLOOKUP Excel function, and check what its syntax looks like:

=VLOOKUP(Value you want to look up, range where you want to look up the value, the column number in the range containing the return value, Exact Match or Approximate Match — indicated as 0/FALSE or 1/TRUE).

The arguments of the function are all the values you indicate inside the brackets after the function name. The VLOOKUP function requires at least 3 arguments: the value you want to look up, also called the lookup value, the range where the lookup value is located, the column number in the range that contains the return value, and, optionally, you can specify TRUE if you want an approximate match or FALSE if you want an exact match of the return value.

Assignment operator "<-" is used to assign a value to an object. For example, to create an X variable that will contain a value of 10, you need to perform the following command.

X <- 10

Vector in the R language is a named single-dimension array that contains a set of the elements of the same type (numeric, logical or text values that can’t be combined). To create a small length vector, you can use the concatenation function c(). The comma-separated list of values in vector stands for the arguments in the function.

For example, a vector camp_name that contains the names of the ad campaigns can be created this way:

camp_name <- c(“CampName1”, ”CampName2”, “CampName3”, “CampName4”)

DataFrame — is the class of objects in the R language that is similar to tables. DataFrame is actually a set of vectors of the same length, but this is exactly how we see tables that we are used to work with.

Here’s an example of the mtcars frame that is embedded in the language R. The data was imported from the Motor Trend US magazine of 1974. The frame includes fuel burn rate, 10 aspects of car design, and different characteristics for 32 cars (models from 1973 to 1974):

Examples of code from the article

The article contains examples of queries to the APIs of Google AdWords and Yandex.Direct in the R language. You can simply copy-paste the examples from the article or download them from the GitHub repository here.

Installing the R Language

To understand the material from the article better, I recommend that you execute all of the provided examples of code on your PC. Let’s start with installing the R language.

R is a cross-platform language. You can install it on Windows, MacOS and Unix operating systems. ​

The installation process is the same as installing a program, and it doesn’t require any special settings. You simply need to follow this link, choose your operating system, then download and install the R language on your PC.

What are packages in R

For now, the R language has obtained quite a big community of users and developers. Many of them create their own R packages to enhance the R language capacities.

Package in R language is a module that contains a number of functions and a set of data. The main repository for the R packages is CRAN. At the moment of writing this paragraph, there are 12,148 packages in R. Along with CRAN, users upload a lot of packages on GitHub, available to the public. As a rule, you can find the latest versions of packages from developers, while the number of repositories in the R language is more than 80,000.

Installing packages in the R language

Before using any functions or data sets from a package, you need to install it. To install a package from CRAN, you simply need to use a function install.packages, specifying the package name.

Let’s install the devtools package.

install.packages("devtools")

The devtools package contains a set of functions to create new functions in the R language, as well as to install packages from third-party repositories including GitHub.

As was mentioned above, the latest versions of packages can be found on GitHub, and not in CRAN. Let’s try to install the RAdwords package from GitHub.

To install a package from GitHub in the devtools package, there’s the install_github function. However, the devtools package is not included in the basic R configuration. That’s why you’ll have to use the library command in order to apply the package to the R session.

#Applying devtools
library(devtools)

#Installing the RAdwords package
install_github(’jburkhardt/RAdwords’)

The function install_github requires just 1 argument: name of the user, author of the repository on GitHub (https://github.com/jburkhardt) and the name of the repository RAdwords (https://github.com/jburkhardt/RAdwords).

Obtaining detailed information on how to work with the package functions

Any package in R and any function of a package have a detailed information describing each argument, working with functions, and the examples of the function use.

To get a list of the package functions, along with the explanation of what the function does, you can use the help command and send the necessary package name in the package argument. For example, to get information on the RAdwords functions, execute the following command:

help(package = «RAdwords»)

You’ll open an html page with this content:

result if the help command(package = "RAdwords")

If you only need info on a definite function, add its name to the help command. For example, to get information on the doAuth function, apply the RAdwords package using the library function and execute help:

help("doAuth")

R packages for working with ad services

RAdwords. A package for working the Google AdWords API

Link to the official RAdwords documentation

Repository on GitHub

Author: Johannes Burkhardt

How to install: devtools::install_github(’jburkhardt/RAdwords’)

How to work with the AdWords API

Google AdWords has a broad interface of 49 reports, for uploading data from your ad account. At first, you may find it a little complicated to find a necessary report. The official documentation provides the following table, to make looking for a necessary report easier:

chart to find reports in AdWords API

Each report consists of its own set of fields which can be attributes, segments and metrics.

Fields of Campaign Performance Report.

Attributes are fields, measurements or parameters, that contain information about some settings or constants that don’t change the grouping of the required data. For example, such account settings as currency (field AccountCurrencyCode) don’t change the number of lines from the Campaign Performance Report.

Segments are also parameters, but they change the grouping structure. For example, if you request data from the Campaign Performance Report with the Date field, the requested data about the campaign efficiency will be grouped by date.

Metrics are quantity indicators such as clicks, views, summary of spent money, etc.

Using the RAdwords package, you can address any of the reports and request the available fields from the report without getting to deep into how to form and send HTTP queries to the API service, as well as without studying every interface detail of the API.

What you need for working with RAdwords

To start working with the RAdwords package, you’ll need access to the Google AdWords API. Here’s a link with more detailed information.

To set up access to the API, you’ll need an identifier and a secret key from the OAuth2 client. By the OAuth2 client I mean a client app, but not the AdWords MCC account AdWords.

  1. Log in to your managing account and navigate to  "Credentials" in the Google Developers Console.
  2. In the dropdown menu, choose "Create project", then name the project and click "Create".
  3. Choose "Create Credentials", next "OAuth client ID".
  4. Click "Configure consent screen" and fill in the necessary fields. Save changes to get back to the "Credentials" screen.
  5. Choose "Other" in the Application type menu and enter a name in the field you see.
  6. Click "Create" and you’ll see the client ID and the secret key of the OAuth2 client. Copy them and save them, as you will need this information in the next step to be added to the configuration file.

Main functions of the RAdwords package

RAdwords doesn’t really have many functions. Here’s the list of main ones:

  • doAuth is the authentication in the API. To run this function, you’ll need the AdWords API developer token, Client ID and Client Secret credentials that you received in Google Console.
  • reports returns the vector of available reports in the API according to your query.
  • metrics returns the vector of available fields in the report according to your query.
  • statement is the preparation of the query to the AdWords API.
  • getData is sending a query and obtaining data from the API.

Examples of code to get data from Google AdWords, using RAdwords

						
#Activating the package
library(RAdwords)

#Authentication
adwords_auth <- doAuth()
account_id   <- “000-000-0000” #ID of your AdWords account

#Preparing the query
#Getting data for the last 30 days
#Function format.Date changes the data from the standard format 2017-01-01 to 20170101 
body <- statement(select=c('Date',
			   'CampaignName',
			   'CampaignId',
			   'AccountCurrencyCode',
			   'Impressions',
			   'VideoViews',
			   'Clicks',
			   'Interactions',
			   'AllConversions',
			   'Cost'),
                  report="CAMPAIGN_PERFORMANCE_REPORT",
                  start=format.Date(Sys.Date() - 31, "%Y%m%d"),
                  end=format.Date(Sys.Date() - 1, "%Y%m%d"))

#Requesting data
AdwData <- getData(clientCustomerId = account_id,
                   google_auth = adwords_auth,
                   statement = body,
                   transformation = T)

						
					

Example of code on GitHub

The statement function forms the body of the query. In the select argument you send the names of all the required fields, with the names presented in the form of a vector. You need to use the report argument to send the name of the report, then the start and end arguments to send the reporting period.

Remember to use the YYYYMMDD format of dates. By default, the date format in R is YYYY-MM-DD. You can send the dates either as lines or transform any date using the format.Date function as in the aforementioned example.

The getData function queries the clientCustomerId argument for the number of the ad account in the format of 000-000-0000.

At the beginning of the script we created an adwords_auth variable. We used the doAuth function to send the credential data for the access to the Google AdWords API. You need to send the saved credential data to the google_auth argument, using the getData function.

Next, you’ll have to send the query body, created using the statement function, to the statement argument.

Having executed all of the aforementioned code, you’ll get the AwData object in your operational environment. The object will contain a table (Data frame) with all of the information queried from the API.

More examples of working with RAdwords

ryandexdirect. A package for working with the APIs of Yandex.Direct and Yandex.Metric

Official documentation for ryandexdirect

Repository on GitHub

Author: Alexey Seleznev

Installing the package on Windows: devtools::install_github("selesnow/ryandexdirect")
Installing the package on MacOs and Linux: devtools::install_github("selesnow/ryandexdirect«, subdir = «utf8»)

How to work with the API of Yandex.Direct

The 5th version of the Yandex.Direct API, unlike the previous version, is really similar to the API of Google AdWords when getting statistic data.

To get statistical data in the 5th version of the API, you’ll need the Reports service. This service, similar to the AdWords API, can request any fields from 7 reports.

How reports look in the Report service

Report type Description
ACCOUNT_PERFORMANCE_REPORT Statistics on the advertiser’s account
CAMPAIGN_PERFORMANCE_REPORT Statistics on campaigns
ADGROUP_PERFORMANCE_REPORT Statistics on ad groupings
AD_PERFORMANCE_REPORT Statistics on ads
CRITERIA_PERFORMANCE_REPORT Statistics on the view requirements
CUSTOM_REPORT Statistics with custom groupings
SEARCH_QUERY_PERFORMANCE_REPORT Statistics on search queries

The most general report type is CUSTOM REPORT. It doesn’t add any extra groupings.

All report fields are divided by type:

  • filter is used to filter only data and can’t be included in the report.
  • segment helps group the data in reports.
  • metric contains a numeric value according to filters and groupings.
  • attribute contains a fixed value according to set groupings. Adding such a field to a report doesn’t lead to a new grouping.

For example, the CampaignId field for the CUSTOM_REPORT type is a segment. If you add the CampaignId field to the report, the data will be grouped by campaigns. However, the CampaignId field is an attribute for the ADGROUP_PERFORMANCE_REPORT type. In such case, the data is already grouped by AdGroupId, and the campaign identifier is a set value for each group.

You can find the list of all the allowed fields along with the field types for each report here.

Some fields can’t be queried at the same time within the same query:

  • Date, Week, Month, Quarter, Year are mutually exclusive fields (only one of them can be included in the report).
  • CriterionType, CriteriaType, AudienceTargetId, DynamicTextAdTargetId, Keyword, SmartBannerFilterId are mutually exclusive fields (only one of them can be used to filter data).
  • ClickType is not compatible with Impressions, Ctr, or AvgImpressionPosition.
  • ImpressionShare is not compatible with AdFormat, AdId, Age, CarrierType, Gender, MobilePlatform, RlAdjustmentId, TargetingLocationId, or TargetingLocationName.
  • Criterion, CriterionId, CriterionType are not compatible with Criteria, CriteriaId, or CriteriaType.

Main functions of the ryandexdirect package

By the moment the article was written, there were 20 functions available for ryandexdirect. Below is the list of main functions that I use when working with statistics:

  • yadirGetToken is getting the token to access the API. You need the token for working with all the other package functions, as it helps you authorize for working with the API of Yandex.Direct.
  • yadirGetReport is getting statistics from the Report service.
  • yadirGetDictionary is getting help information, for example, reference on the regions or currency.
  • yadirGetLogsData is getting raw data from the Logs of the Yandex.Metrica API.
  • yadirGetMetrikaGAData is getting raw data from the Yandex.Metrica API, while the obtained data is compatible with Google Analytics Core Reporting API (v3).

Aside from working with statistics, you can use ryandexdirect to stop and renew views on the level of ads, key_words and ad campaigns. There are 6 functions for that:

  • yadirStartAds means renew views on ads.
  • yadirStopAds means stop views on ads.
  • yadirStartCampaigns means renew views on ad campaigns.
  • yadirStopCampaigns means stop views on ad campaigns.
  • yadirStartKeyWords means renew views on ad campaigns.
  • yadirStopKeyWords means stop views on ad campaigns.

An example of code to get statistics from Yandex.Direct using ryandexdirect.

Example #1: Getting statistics on campaigns across date

Let’s start with the most simple code example that can be used to get statistics on ad campaigns across days.

						
#Activating the package 
library("ryandexdirect")

#Getting a developer’s token
tok <- yadirGetToken()

#Enter the variable containing the user login. Substitute MYLOGIN with your login in Yandex.
my_login <- “MYLOGIN”

#Requesting statistics
data <- yadirGetReport(ReportType = "CAMPAIGN_PERFORMANCE_REPORT",
                       DateRangeType = "LAST_30_DAYS",
                       FieldNames = c("Date",
                                      "CampaignName",
                                      "CampaignId",
                                      "Impressions",
                                      "Clicks",
                                      "Cost"),
                       Login = my_login,
                       Token = tok )
						
					

Example of code on GitHub

I believe the example code is easy enough to understand, but here’s a bit of explanation:

  • Activate the package with the help of the library command.
  • Next, get the developer’s token with the help of yadirGetToken and save it to the tok object. You don’t need to receive a new token every time you address the API. The token will expire 30 days after the last time it was used. Once received, you can use the token as a line in a script. For example, pass the value of the token to the tok object in such a way: tok <- «abcdefg1234567»
  • Request the data from the API using the yadirGetReport function.
  • Argument ReportType requires the name of the report that will be used to request the needed fields, at the input. As we’re planning to get data on the level of ad campaigns, the most suitable report will be CAMPAIGN_PERFORMANCE_REPORT.
  • You need to send the information on the period to the DateRangeType argument. We choose the data for the last 30 days, but you have quite a variety for choosing the time period:
    • TODAY
    • YESTERDAY
    • LAST_3_DAYS, LAST_5_DAYS, LAST_7_DAYS, LAST_14_DAYS, LAST_30_DAYS, LAST_90_DAYS, LAST_365_DAYS
    • THIS_WEEK_MON_TODAY
    • THIS_WEEK_SUN_TODAY
    • LAST_WEEK
    • LAST_BUSINESS_WEEK
    • LAST_WEEK_SUN_SAT
    • THIS_MONTH
    • LAST_MONTH
    • ALL_TIME
    • CUSTOM_DATE. You’ll need to set the date of the beginning and end of the period, using the DateFrom and DateTo parameters.
    • AUTO is a period during which the statistic data could change. The period is chosen automatically depending on the statistic corrections if such. You can find more information here.
  • Send the vector with the fields, requested from the report, to the FieldNames argument. We’ll send the following fields: Date, CampaignName, CampaignId, Impressions, Clicks, Cost. More fields can be found here.

Having executed the aforementioned code, you’ll get a data object in your operational environment. The object will present a table (Data frame) with the data requested from the API.

Example #2: Getting data from the ad account across countries and regions.

Let’s try something more difficult and get the data on views, clicks and spent money from the ad account, across countries and regions.

The LocationOfPresenceName contains the most accurate name of where the user is (village, city or region). If we need to see the data across regions and not cities, we’ll have to export the reference book of regions and make it look like we need, using recursive scanning of the reference book.

Let’s execute the following code:

						
#Activate the package
library("ryandexdirect")

#Get the developer’s token
tok <- yadirGetToken()

#Enter the variable that contains user login and substitute MYLOGIN with yours in Yandex
my_login <- "MYLOGIN"

#Get the gazetteer
geo_dict <- yadirGetDictionary(DictionaryName = "GeoRegions", Language = "en", login = my_login, token = tok )

#Add the field with the country identifier
geo_dict$CountryID <- NA

#Set the country for each of the reference book element
#Start a cycle for each of the reference book elements 
for(gd in 1:nrow(geo_dict)){
  #Check if the current element is a country. If yes, set the ID of the same element, if not, set the ID of a parent element
  if(geo_dict$GeoRegionType[gd] == "Country"){
    geo_dict$CountryID[gd] <- geo_dict$GeoRegionId[gd]
  }else{
    geo_dict$CountryID[gd] <- geo_dict$ParentId[gd]}

  #Check the type of the parent region. If it has a lower value than a country, attribute the country ID to the region ID that is a parent ID for the parent region of the very first line. Keep doing this until you find the country.
  while(geo_dict$GeoRegionType[geo_dict$GeoRegionId == geo_dict$CountryID[gd]] %in% c("Administrative area", "District", "City", "City district", "Village")){

    geo_dict$CountryID[gd] <- geo_dict$ParentId[geo_dict$GeoRegionId == geo_dict$CountryID[gd]]

  }
}


#Separate the reference book of regions
geo_dict <- merge(geo_dict, subset(geo_dict, select = c("GeoRegionId","GeoRegionName")),by.x = "CountryID", by.y = "GeoRegionId", all.x = T)
names(geo_dict) <- c("CountryID", "GeoRegionId", "ParentId", "GeoRegionType", "GeoRegionName", "CountryName")

#Add a field with the region identifier
geo_dict$AreaID <- NA

#Set a region for each of the reference book elements
#Start a cycle for each of the reference book elements

for(gd in 1:nrow(geo_dict)){
    #Check if the current element is a region. If yes, set an ID of this element to a country field. If not, set an ID of a parent element.
  if(geo_dict$GeoRegionType[gd] == "Administrative area"){
    geo_dict$AreaID[gd] <- geo_dict$GeoRegionId[gd]
  }else{
    geo_dict$AreaID[gd] <- geo_dict$ParentId[gd]}

  #Check the type of the parent region. If it has a lower value than a region, attribute the region ID to the district ID that is a parent ID for the parent region of the very first line. Keep doing this until you find the country.
  while(geo_dict$GeoRegionType[geo_dict$GeoRegionId == geo_dict$AreaID[gd]] %in% c("District", "City", "City district", "Village")){

    geo_dict$AreaID[gd] <- geo_dict$ParentId[geo_dict$GeoRegionId == geo_dict$AreaID[gd]]

  }
}

#Separate the reference book of regions
geo_dict <- merge(geo_dict, subset(geo_dict, select = c("GeoRegionId","GeoRegionName")),by.x = "AreaID", by.y = "GeoRegionId", all.x = T)
names(geo_dict) <- c("AreaID","CountryID", "GeoRegionId", "ParentId", "GeoRegionType", "GeoRegionName", "CountryName", "AreaName")

#Request statistics from the Report service
stat <- yadirGetReport(ReportType = "CUSTOM_REPORT",
                       DateRangeType = "CUSTOM_DATE",
                       DateFrom = "2018-01-01",
                       DateTo = "2018-01-31",
                       FieldNames = c("LocationOfPresenceId",
                                      "Impressions",
                                      "Clicks",
                                      "Cost"),
                       FilterList = NULL,
                       IncludeVAT = "YES",
                       IncludeDiscount = "NO",
                       Login =my_login,
                       Token = tok)

#Combine the statistic data with the region reference book
stat <- merge(stat, geo_dict, by.x = "LocationOfPresenceId", by.y = "GeoRegionId", all.x = T)
						
					

Code example on GitHub

First thing we do after activating a package is getting a token and setting the necessary variables: requesting geographic reference book with the help of the yadirGetDictionary function.

There are 4 fields in the reference book that the API returns:

  • GeoRegionId is the region identifier.
  • GeoRegionName is the name of the region.
  • GeoRegionType can be World, Continent, Region, Country, Administrative area, District, City, City district, Village.
  • ParentId is the parent region identifier.
4 fields in the reference book that the API returns

As mentioned, the API will return the most accurate statistic data on the user location, according to the hierarchy in GeoRegionType. This means that the data on the clicks and spent money will be attributed to the village from the reference book where the user is (village is the lower level of hierarchy in the geographic reference book).

Thus, you get the data from the API, containing the information of geographical hierarchy (village, city, region, district, continent, etc.).

To see the data grouped according to administrative districts or countries, we need to find out what each geographic element belongs to. This means, that the reference book should look this way:

The next code part can help you find an administrative district and the country that the element belongs to:

As a result, we’ll make the reference book we’ve obtained before look this way:

  • AreaID
  • CountryID
  • GeoRegionId
  • ParentId
  • GeoRegionType
  • CountryName
  • GeoRegionName
  • AreaName

Having changed the look of the reference book, request the statistic information from the ad account with the information grouped by the LocationOfPresenceId field. Using the merge function, combine the statistic data with the data from the geographical reference book on the LocationOfPresenceId and GeoRegionId fields.

Using the argument all.x = T, you’ll request the table X (statistic data) to be merged with the table Y (geographical reference book), using the LEFT JOIN. This means that we take all of the lines from the statistics table and add new lines from the geographical reference book according to the GeoRegionId key.

Thus, we got a report with the primary indicators of the ad campaign performance across countries and regions.

Conclusions

The R language can help you automate working with the statistic data from many sources including any ad systems and web analytics platforms. There’s a variety of ready-to-use packages to enhance the basic R capacities.

Choosing the correct package will help you avoid weeks of studying documents on the APIs of ad services, as 90% of work has already been done by someone before. The only thing you’ll have to do, is find the necessary functions in the chosen package and study their main arguments.

Good luck with automating your work with statistics and studying the R language! If you have any questions or need a bit of advice, feel free to leave a comment ;)

You might also like