Almost every website uses Google Search Console as the primary source of information about organic traffic performance. However, this tool lacks the filters needed to break your data down in depth, and you can’t even switch the line chart from a daily view to a weekly or monthly one.
Therefore, it is critical to store the raw data somewhere you can access it quickly and visualise it in the format you need. Of course, you can export data into Google Sheets and work there, but that is not the topic of this article; instead, it presents two robust methods of sending data from Google Search Console to BigQuery.
Once the data is there, you can run SQL queries and visualise organic data the way you need, using Google Looker Studio (or Tableau, Power BI) or more advanced tools such as Python visualisation packages (Seaborn, Matplotlib, Plotly). For instance, you can group keywords by their position and track the performance of each group over time.
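To give a flavour of what that looks like once the data is in BigQuery, here is a minimal sketch that buckets queries by position and sums weekly clicks and impressions per bucket. It assumes the searchconsole.queries table produced by Option #2 below (columns: date, query, clicks, impressions, ctr, position); the project ID is a placeholder, and the client relies on your default Google Cloud credentials.
from google.cloud import bigquery

# weekly clicks and impressions per position bucket (sketch)
client = bigquery.Client(project="{{Your BigQuery Project ID}}")  # assumes application default credentials
sql = """
SELECT
  DATE_TRUNC(q.date, WEEK) AS week,
  CASE
    WHEN q.position <= 3 THEN 'top 3'
    WHEN q.position <= 10 THEN 'top 10'
    WHEN q.position <= 20 THEN 'top 20'
    ELSE '20+'
  END AS position_group,
  SUM(q.clicks) AS clicks,
  SUM(q.impressions) AS impressions
FROM `{{Your BigQuery Project ID}}.searchconsole.queries` AS q
GROUP BY week, position_group
ORDER BY week, position_group
"""
for row in client.query(sql).result():
    print(row.week, row.position_group, row.clicks, row.impressions)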
Option #1 (Recommended): New bulk data export of Search Console data to BigQuery
The Search Console team announced the new bulk data export on February 21, 2023, allowing you to send data from Google Search Console to BigQuery using the Search Console interface.
Unlike exporting into Google Sheets or using the second option below, it is not constrained by a daily row limit, so you don’t have to worry that some data won’t be exported or that something will break in the future.
So, to integrate Search Console and BigQuery, you should take the following steps:
- Create a Google Cloud project and enable billing.
- Go to “APIs & Services”, click the “Enable APIs and Services” button, select the BigQuery API and activate it.
- Open the left-side menu and select “IAM & Admin”.
- Click the “Grant access” button at the top of the page and specify the permissions:
1) Principal: search-console-data-export@system.gserviceaccount.com
2) Roles: BigQuery Job User and BigQuery Data Editor
3) Save the changes
- Now, open Google Search Console, go to Settings and select “Bulk data export”.
- Adjust the settings accordingly:
1) Provide your Google Cloud project ID
2) Dataset name. You can leave it as it is if you want to send data from one Search Console property to BigQuery.
3) Dataset location
4) Click “Continue” to save your settings and confirm your setup.
After you click “Continue”, you should be informed relatively quickly if there is an immediately detectable problem (such as an access issue). Otherwise, Search Console should begin the export process within a day, and you will be able to see the first data within the next 48 hours.
After 48 hours, return to your BigQuery project and you will notice a new dataset, “searchconsole”, which includes three tables with the following schemas.
Table “ExportLog”
| Field | Description |
| --- | --- |
| agenda | Which type of data was saved. Currently, the only value is SEARCHDATA. |
| namespace | Which table was saved to. Search Console exports data separately for each table, so each day typically has two export rows. |
| data_date | Date of the click or impression, in Pacific Time. This is the partition date used by the tables. |
| epoch_version | An integer, where 0 is the first time data was saved to this table for this date. In the rare case that Search Console needs to go back and adjust previous data, perhaps because of a data anomaly, this value is incremented by 1 each time the data is updated for that table and date. |
| publish_time | The Pacific Time when the export completed. |
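Once exports begin, a quick way to confirm that the daily data is landing is to check the latest rows of ExportLog. A minimal sketch (the project ID is a placeholder and the client assumes default credentials):
from google.cloud import bigquery

# show the most recent export log entries
client = bigquery.Client(project="{{Your BigQuery Project ID}}")
sql = """
SELECT namespace, data_date, epoch_version, publish_time
FROM `{{Your BigQuery Project ID}}.searchconsole.ExportLog`
ORDER BY publish_time DESC
LIMIT 10
"""
for row in client.query(sql).result():
    print(row.namespace, row.data_date, row.epoch_version, row.publish_time)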
Table “searchdata_site_impression“
| Field | Description |
| --- | --- |
| data_date | The day on which the data in this row was generated |
| site_url | URL of the property |
| query | The user query |
| is_anonymized_query | Rare queries (called anonymized queries) are marked with this bool |
| country | Country from where the query was made |
| search_type | web/image/video/news/discover/googleNews |
| device | The device category from which the query was made |
| impressions | The number of impressions for this row |
| clicks | The number of clicks for this row |
| sum_top_position | The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results |
Table “searchdata_url_impression”
| Field | Description |
| --- | --- |
| data_date | The day on which the data in this row was generated |
| site_url | URL of the property |
| url | URL of the website page |
| query | The user query |
| is_anonymized_query | Rare queries (called anonymized queries) are marked with this bool |
| is_anonymized_discover | Whether the data row is under the Discover anonymization threshold. When under the threshold, some other fields (like URL and country) will be missing to protect user privacy. |
| country | Country from where the query was made |
| search_type | web/image/video/news/discover/googleNews |
| device | The device category from which the query was made |
| is_[search_appearance_type] | There are several boolean fields used to indicate search appearance type, such as is_amp_top_stories, is_job_listing, and is_job_details |
| impressions | The number of impressions for this row |
| clicks | The number of clicks for this row |
| sum_top_position | The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results |
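Note that neither impression table stores an average position directly. To reproduce the average position you see in the Search Console UI, divide sum_top_position by impressions and add 1 (positions in the export are zero-based). A minimal sketch against the URL-level table, with a placeholder project ID and default credentials:
from google.cloud import bigquery

# daily clicks, impressions and reconstructed average position
client = bigquery.Client(project="{{Your BigQuery Project ID}}")
sql = """
SELECT
  data_date,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions,
  SAFE_DIVIDE(SUM(sum_top_position), SUM(impressions)) + 1 AS avg_position
FROM `{{Your BigQuery Project ID}}.searchconsole.searchdata_url_impression`
GROUP BY data_date
ORDER BY data_date
"""
for row in client.query(sql).result():
    print(row.data_date, row.clicks, row.impressions, row.avg_position)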
Although this method is relatively easy to set up because you don’t need a developer to implement it, its core limitation is that it doesn’t load the historical data already available in Search Console (the last 16 months); it only loads new data.
For instance, I activated the linking on March 26, 2023, for a property that had data since September 11, 2022, and I got data only from March 24, 2023 (the two-day gap is the ordinary data-processing delay that Search Console has).
Therefore, if you don’t need all historical data, I recommend using this approach, but if you want to upload historical data into BigQuery, use the second one.
Option #2: Export data using Google Cloud Functions and Search Console API
There are many options and solutions you can use to store historical data in BigQuery, starting even with the Search Console for Google Sheets add-on and uploading it manually. My approach is different: a single solution that works both for historical data and for new data and uploads it directly into BigQuery.
The solution is based on the Search Console API and Python, and requires you to understand code a bit. To automate the process and keep it running in the cloud, I use Google Cloud resources such as Cloud Scheduler and Cloud Functions.
Before you start implementing it, make sure that you have access to BigQuery, the Search Console API is enabled, you have a service account with the proper permissions, and you can create new scheduler jobs and Cloud Functions.
So, the steps are the following:
- Open your Google Cloud project and go to Cloud Scheduler.
- Click on “Create job” and provide the following settings:
1) Name – every_day
2) Region – your BigQuery dataset region
3) Description – “Triggers the export every day at 12:30 AM”
4) Timezone – yours
5) Frequency – 30 0 * * * (i.e., daily at 00:30)
6) Target type – “Pub/Sub”. If you don’t have a topic yet, create a new one.
7) Click the “Create” button.
- Go to “Cloud Functions”, click on “Create function” and configure it:
1) Environment – 1st gen
2) Function name – “search_console_data_to_bigquery”
3) Region – your BigQuery dataset region
4) Trigger type – the same “Pub/Sub” topic you used in Cloud Scheduler
5) Click “Save” and then “Next”
6) Runtime – Python 3.8
7) Entry point – gsc_pages_to_bigquery
- Copy my requirements.txt file (shown below).
- Create a new file, “service_account.json”, and paste your service account key there.
- Copy my main.py file and fill in your data where necessary. If you want to load more than the last 5 days of data, adjust the number in “timedelta(days=5)” in the code below.
- After all of the modifications are done, deploy the function.
main.py
# import the libraries needed to access Google Search Console and BigQuery
from google.oauth2 import service_account
from googleapiclient.discovery import build
import datetime
import pandas as pd
from google.cloud import bigquery

# define the date range: from 5 days ago until today (adjust days=5 to load a longer range)
today = datetime.datetime.today()
today_str = today.strftime("%Y-%m-%d")
start_date = today - datetime.timedelta(days=5)
start_date_str = start_date.strftime("%Y-%m-%d")
# define key to access BigQuery and GSC
key = 'service_account.json'
# schema
schema_pages = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'page', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]
schema_queries = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'query', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]
schema_countries = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'country', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]
schema_devices = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'device', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]
schema_all = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'page', 'type': 'STRING'},
    {'name': 'query', 'type': 'STRING'},
    {'name': 'country', 'type': 'STRING'},
    {'name': 'device', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]

# Scopes
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/bigquery.insertdata'
]
# connect to GSC
def connect(key):
    scope = ['https://www.googleapis.com/auth/webmasters',
             'https://www.googleapis.com/auth/cloud-platform',
             'https://www.googleapis.com/auth/drive',
             'https://www.googleapis.com/auth/bigquery',
             'https://www.googleapis.com/auth/bigquery.insertdata']
    credentials = service_account.Credentials.from_service_account_file(key, scopes=scope)
    service = build(
        'webmasters',
        'v3',
        credentials=credentials
    )
    return service
# query GSC data
def query(service, site_url, payload):
    response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()
    results = []
    # the response may contain no 'rows' key when there is no data for the period
    for row in response.get('rows', []):
        data = {}
        for i in range(len(payload['dimensions'])):
            data[payload['dimensions'][i]] = row['keys'][i]
        data['clicks'] = row['clicks']
        data['impressions'] = row['impressions']
        data['ctr'] = round(row['ctr'] * 100, 2)
        data['position'] = round(row['position'], 2)
        results.append(data)
    return pd.DataFrame.from_dict(results)
# getting GSC data
service = connect(key)
site_url = 'sc-domain:{your GSC property id}'

pages_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "page"],
    'rowLimit': 25000,
    'startRow': 0
}
queries_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "query"],
    'rowLimit': 25000,
    'startRow': 0
}
country_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "country"],
    'rowLimit': 25000,
    'startRow': 0
}
device_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "device"],
    'rowLimit': 25000,
    'startRow': 0
}
all_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "page", "query", "country", "device"],
    'rowLimit': 25000,
    'startRow': 0
}
def gsc_pages_to_bigquery(event, context):
    try:
        gsc_pages = query(service, site_url, pages_payload)
        gsc_queries = query(service, site_url, queries_payload)
        gsc_countries = query(service, site_url, country_payload)
        gsc_devices = query(service, site_url, device_payload)
        gsc_all = query(service, site_url, all_payload)
        # credentials again, this time for BigQuery
        credentials = service_account.Credentials.from_service_account_file(key, scopes=SCOPES)
        # writing GSC data into BigQuery staging (raw) tables
        gsc_pages.to_gbq(destination_table='searchconsole.pages_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_pages, credentials=credentials)
        gsc_queries.to_gbq(destination_table='searchconsole.queries_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_queries, credentials=credentials)
        gsc_countries.to_gbq(destination_table='searchconsole.countries_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_countries, credentials=credentials)
        gsc_devices.to_gbq(destination_table='searchconsole.devices_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_devices, credentials=credentials)
        gsc_all.to_gbq(destination_table='searchconsole.full_detailed_report_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_all, credentials=credentials)
        # working with BigQuery to get non-duplicated data
        project_id = '{{Your BigQuery Project ID}}'
        client = bigquery.Client(credentials=credentials, project=project_id)
        query_string = """
        -- Find and delete rows that have the same dates as the new load --
        -- Page Data --
        DELETE
        FROM `{{Your BigQuery Project ID}}.searchconsole.pages` AS bd
        WHERE DATE(bd.date) IN (SELECT DATE(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.pages_raw` AS bdr);
        -- Insert new data into the final table --
        INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.pages`
        SELECT *
        FROM `{{Your BigQuery Project ID}}.searchconsole.pages_raw` AS pr;
        DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.pages_raw`;
        -- Queries Data --
        DELETE
        FROM `{{Your BigQuery Project ID}}.searchconsole.queries` AS bd
        WHERE DATE(bd.date) IN (SELECT DATE(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.queries_raw` AS bdr);
        -- Insert new data into the final table --
        INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.queries`
        SELECT *
        FROM `{{Your BigQuery Project ID}}.searchconsole.queries_raw` AS pr;
        DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.queries_raw`;
        -- Countries Data --
        DELETE
        FROM `{{Your BigQuery Project ID}}.searchconsole.countries` AS bd
        WHERE DATE(bd.date) IN (SELECT DATE(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.countries_raw` AS bdr);
        -- Insert new data into the final table --
        INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.countries`
        SELECT *
        FROM `{{Your BigQuery Project ID}}.searchconsole.countries_raw` AS pr;
        DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.countries_raw`;
        -- Devices Data --
        DELETE
        FROM `{{Your BigQuery Project ID}}.searchconsole.devices` AS bd
        WHERE DATE(bd.date) IN (SELECT DATE(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.devices_raw` AS bdr);
        -- Insert new data into the final table --
        INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.devices`
        SELECT *
        FROM `{{Your BigQuery Project ID}}.searchconsole.devices_raw` AS pr;
        DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.devices_raw`;
        -- All Data --
        DELETE
        FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report` AS bd
        WHERE DATE(bd.date) IN (SELECT DATE(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw` AS bdr);
        -- Insert new data into the final table --
        INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report`
        SELECT *
        FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw` AS pr;
        DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw`;
        """
        # run the multi-statement script and wait for it to finish
        client.query(query_string).result()
        print("Executed Successfully!")
        return
    except Exception as e:
        print(str(e))
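One caveat about the query helper above: the Search Console API returns at most 25,000 rows per request, and the payloads request exactly that maximum without paginating. If a single day of your property produces more rows than that (most likely for the page/query/country/device report), you can page through the results with startRow. A minimal sketch of such a wrapper, reusing the query function from main.py:
def query_all_rows(service, site_url, payload, page_size=25000):
    # page through the API with startRow until a page comes back
    # smaller than the requested page size
    frames = []
    start_row = 0
    while True:
        page_payload = dict(payload, rowLimit=page_size, startRow=start_row)
        df = query(service, site_url, page_payload)
        if df.empty:
            break
        frames.append(df)
        if len(df) < page_size:
            break
        start_row += page_size
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()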
requirements.txt
pandas==1.4.2
pandas-gbq==0.17.5
numpy==1.19.1
searchconsole==0.0.4
google-api-core==2.8.1
google-api-python-client==2.79.0
google-auth==2.6.6
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.5.1
google-cloud==0.34.0
google-cloud-bigquery==3.1.0
google-cloud-bigquery-storage==2.13.1
google-cloud-core==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.2
regex==2022.4.24
requests==2.27.1
requests-oauthlib==1.3.1
datetime==4.4
pydata-google-auth==1.7.0
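The scheduled function only loads the last few days, so to bring in the historical data that Search Console keeps (roughly the last 16 months), you can run the same helpers once locally over a longer date range before enabling the schedule. Below is a minimal sketch for the pages report only; it reuses key, SCOPES, service, site_url, schema_pages and the query function from main.py, uses a placeholder project ID, and writes straight into the final searchconsole.pages table (to_gbq creates it if it doesn’t exist yet, which the scheduled function’s SQL expects). The same loop works for the other reports with their own dimensions and schemas.
# one-off historical backfill for the pages report (run locally once)
credentials = service_account.Credentials.from_service_account_file(key, scopes=SCOPES)
day = datetime.datetime.today() - datetime.timedelta(days=480)  # roughly 16 months back
end = datetime.datetime.today() - datetime.timedelta(days=2)    # GSC data lags by ~2 days
while day <= end:
    day_str = day.strftime("%Y-%m-%d")
    payload = {
        'startDate': day_str,
        'endDate': day_str,
        'dimensions': ["date", "page"],
        'rowLimit': 25000,
        'startRow': 0
    }
    df = query(service, site_url, payload)
    if not df.empty:
        df.to_gbq(destination_table='searchconsole.pages',
                  project_id='{{Your BigQuery Project ID}}',
                  if_exists='append', table_schema=schema_pages,
                  credentials=credentials)
    day += datetime.timedelta(days=1)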
If you complete all the steps correctly and have all the necessary access, the Cloud Function will start loading data into your BigQuery project. If the function fails to deploy, check the logs to see what the issue is.
Again, this second approach requires Python knowledge. If you don’t know how to code or have only basic knowledge, I recommend going with the first approach or contacting me for a paid service or consultation.
Final Word
Almost every website uses Search Console to monitor organic search traffic performance, and sending data from Google Search Console to BigQuery brings additional benefits: you can build any report you want from the raw data. This article presents two solutions for achieving that. If your website is relatively new or you are interested only in new data, use the first approach; it also creates partitioned tables, which help you optimise BigQuery costs.
However, if you want all historical data in BigQuery, use the second solution; just be prepared to read and write some code.
Frequently Asked Questions
The Search Console team announced the new bulk data export on February 21, 2023, allowing you to send data from Google Search Console to BigQuery using the Search Console and BigQuery interfaces. It’s the best free option available on the market.
You don’t need to use any external connector to connect Search Console and BigQuery. Google introduced a new bulk data export for Search Console that you can use to send GSC data into BigQuery automatically. There is no need for developers.