How to send data from Google Search Console to BigQuery

Almost every website uses Google Search Console as the primary source of information about organic traffic performance. However, the tool lacks many of the filters needed to break your data down in depth, and you can’t even switch the line chart from a daily view to a weekly or monthly one. 

Therefore, it is critical to store the raw data somewhere you can access it quickly and visualise it in the format you need. Of course, you can export data into Google Sheets and work there, but that is beyond the scope of this article; instead, it presents two robust methods of sending data from Google Search Console to BigQuery. 

Once the data is in BigQuery, you can run SQL queries and visualise your organic data however you need, using Google Looker Studio (or Tableau or Power BI) or more advanced tools such as Python visualisation packages (Seaborn, Matplotlib, Plotly). For instance, you can group keywords by their position and track the performance of each group over time, as in the sketch below. 
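
To make that concrete, here is a minimal pandas sketch of such a grouping. It is only an illustration: it assumes you have already loaded the raw data into a dataframe df with date, query, position and clicks columns (for example, from one of the BigQuery tables described below), and the bucket boundaries are arbitrary.

import pandas as pd

def clicks_by_position_group(df: pd.DataFrame) -> pd.DataFrame:
    # bucket each row by its average position
    bins = [0, 3, 10, 20, 100]
    labels = ["top 3", "4-10", "11-20", "21+"]
    df = df.copy()
    df["position_group"] = pd.cut(df["position"], bins=bins, labels=labels)
    # daily clicks per position bucket, one column per group
    return (
        df.groupby(["date", "position_group"], observed=True)["clicks"]
        .sum()
        .unstack("position_group")
    )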

Option #1 (Recommended): New bulk data export of Search Console data to BigQuery

The Search Console team announced the new bulk data export on February 21, 2023, allowing you to send data from Google Search Console to BigQuery using the Search Console interface. 

Unlike exporting into Google Sheets or using the second option below, this export has no daily row limit, so you don’t have to worry that some data won’t be exported or that something will break in the future.

So, to integrate Search Console and BigQuery, you should take the following steps:

  1. Create a Google Cloud project and enable billing.

    Step 1 Create Cloud Project, Enable Billing

  2. Go to “APIs & Services”, click the “Enable APIs and Services” button, then select the BigQuery API and activate it

    Step 2 Enable BigQuery API

  3. Click on the left-side menu and select “IAM & Admin” 

    Step 3 Go to IAM & Roles in Google Cloud

  4. Click on the “Grant access” button at the top of the page and specify the permissions:

    1) Principal: search-console-data-export@system.gserviceaccount.com
    2) Roles: BigQuery Job User and BigQuery Data Editor
    3) Save the changes

    Step 4 Specify roles and permissions

  5. Now, open Google Search Console, go to Settings, and select “Bulk data export”

    Step 5 Open Search Console, go to Settings - Bulk Data Export

  6. Adjust the settings accordingly:

    1) Provide your Google Cloud project ID
    2) Dataset name. You can leave it as it is if you want to send data from one Search Console property to BigQuery
    3) Dataset location 
    4) Click “continue” to save your settings and confirm your setup.

    Step 6 Specify Data Export Settings in Search Console

After you click “continue”, you will be informed relatively quickly if there is an immediately detectable problem (such as an access issue). Otherwise, Search Console should begin the export process within a day, and you will be able to see the first data within the next 48 hours.

New “searchconsole” dataset in BigQuery property

After 48 hours, you can return to your BigQuery project and find an additional dataset, “searchconsole”, which includes three tables with the following schemas. 

Table “ExportLog”

Field – Description
agenda – Which type of data was saved. Currently the only value is SEARCHDATA.
namespace – Which table was saved to. Search Console typically exports data separately for each table, so each day typically has two export rows.
data_date – Date of the click or impression, in Pacific time. This is the partition date used by the tables.
epoch_version – An integer, where 0 is the first time data was saved to this table for this date. In the rare case that Search Console needs to go back and adjust previous data, perhaps because of a data anomaly, this value will be incremented by 1 each time the data is updated for that table and date.
publish_time – The Pacific time when the export completed.
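
As a side note, the ExportLog table is useful for monitoring the export itself. Here is a sketch of a freshness check (the project ID is a placeholder, and the dataset is assumed to keep the default “searchconsole” name):

from google.cloud import bigquery

# uses your default credentials; pass credentials= explicitly if needed
client = bigquery.Client(project="{{Your BigQuery Project ID}}")
freshness_sql = """
SELECT namespace, MAX(data_date) AS latest_data_date, MAX(publish_time) AS last_export
FROM `{{Your BigQuery Project ID}}.searchconsole.ExportLog`
GROUP BY namespace
"""
for row in client.query(freshness_sql).result():
    print(row.namespace, row.latest_data_date, row.last_export)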

Table “searchdata_site_impression”

Field – Description
data_date – The day on which the data in this row was generated
site_url – URL of the property
query – The user query
is_anonymized_query – Rare queries (called anonymized queries) are marked with this bool
country – Country from where the query was made
search_type – web/image/video/news/discover/googleNews
device – The device category from which the query was made
impressions – The number of impressions for this row.
clicks – The number of clicks for this row.
sum_top_position – The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results. 

Table “searchdata_url_impression”

Field – Description
data_date – The day on which the data in this row was generated
site_url – URL of the property
url – URL of the website page
query – The user query
is_anonymized_query – Rare queries (called anonymized queries) are marked with this bool
is_anonymized_discover – Whether the data row is under the Discover anonymization threshold. When under the threshold, some other fields (like URL and country) will be missing to protect user privacy.
country – Country from where the query was made
search_type – web/image/video/news/discover/googleNews
device – The device category from which the query was made
is_[search_appearance_type] – There are several boolean fields used to indicate search appearance type, such as is_amp_top_stories, is_job_listing, and is_job_details
impressions – The number of impressions for this row.
clicks – The number of clicks for this row.
sum_top_position – The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results. 
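
Because sum_top_position is zero-based and summed per impression, the average position for a page is SUM(sum_top_position) / SUM(impressions) + 1. As a hedged example (the project ID is a placeholder and the dataset is assumed to keep the default “searchconsole” name), this is how you could aggregate clicks, impressions, CTR and average position per URL for the last 28 days:

from google.cloud import bigquery

client = bigquery.Client(project="{{Your BigQuery Project ID}}")
avg_position_sql = """
SELECT
  url,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions,
  SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS ctr,
  SAFE_DIVIDE(SUM(sum_top_position), SUM(impressions)) + 1.0 AS avg_position
FROM `{{Your BigQuery Project ID}}.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY url
ORDER BY clicks DESC
LIMIT 100
"""
top_pages = client.query(avg_position_sql).to_dataframe()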

Although this method is relatively easy to set up because you don’t need a developer to implement it, its core limitation is that it doesn’t load the historical data already available in Search Console (the last 16 months); it only loads new data going forward. 

For instance, I activated the linking on March 26, 2023, for a property that had data since September 11, 2022, and I got data only from March 24, 2023 onwards. A two-day lag is the ordinary data processing delay that Search Console has. 

Therefore, if you don’t need all historical data, I recommend using this approach, but if you want to upload historical data into BigQuery, use the second one. 

Option #2: Export data using Google Cloud Functions and Search Console API

There are many options and solutions you can use to store historical data in BigQuery, starting with the Search Console for Google Sheets add-on and uploading it manually. My preferred option is to build a solution that works both for historical data and for new data and uploads it directly into BigQuery.

The solution is based on the Search Console API and Python, and it requires you to understand code a bit. To automate the process and run it in the cloud, I use Google Cloud resources such as Cloud Scheduler and Cloud Functions.

Before you start implementing it, make sure that you have access to BigQuery, the Search Console API is enabled, you have a service account with the proper permissions, and you can create new scheduler jobs and new cloud functions.

So, the steps are the following: 

  1. Open your Google Cloud project and go to Cloud Scheduler.
  2. Click on “Create job” and provide the following settings:
    1. Name – every_day
    2. Region – your BigQuery dataset region
    3. Description – “Triggers the export every day at 12:30 AM”
    4. Timezone – yours
    5. Frequency – 30 0 * * *
    6. Target type – “Pub/Sub”. Choose a topic; if you don’t have one, create a new one.
    7. Click the “Create” button.
  3. Go to “Cloud Functions” and click on “Create function”:
    1. Environment – 1st gen
    2. Function name – “search_console_data_to_bigquery”
    3. Region – your BigQuery dataset region
    4. Trigger type – the same “Pub/Sub” topic you used in Cloud Scheduler
    5. Click “Save” and then “Next”
    6. Runtime – Python 3.8
    7. Entry point – gsc_pages_to_bigquery
    8. Copy my requirements from the requirements.txt below
    9. Create a new file “service_account.json” and put your service account key there
    10. Copy my main.py file below and fill in your data where necessary. If you want to load more than the last 5 days of data, adjust the number in “(days=5)” in the code.
    11. After all of the modifications are done, deploy the function.

main.py

# install necessary libraries to access google search console
from google.oauth2 import service_account
from googleapiclient.discovery import build
import pandas as pd
import datetime
from google.cloud import bigquery


# define the date range: from 5 days ago until today
# (adjust days=5 below if you want to load a longer window on each run)
today = datetime.datetime.today()
today_str = today.strftime("%Y-%m-%d")
start_date = today - datetime.timedelta(days=5)
start_date_str = start_date.strftime("%Y-%m-%d")

# define key to access BigQuery and GSC
key = 'service_account.json'

#schema 

schema_pages = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'page', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]


schema_queries = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'query', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]


schema_countries = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'country', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]

schema_devices = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'device', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]


schema_all = [
    {'name': 'date', 'type': 'DATE'},
    {'name': 'page', 'type': 'STRING'},
    {'name': 'query', 'type': 'STRING'},
    {'name': 'country', 'type': 'STRING'},
    {'name': 'device', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'INTEGER'},
    {'name': 'impressions', 'type': 'INTEGER'},
    {'name': 'ctr', 'type': 'FLOAT64'},
    {'name': 'position', 'type': 'FLOAT64'}
]

# Scopes 
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/bigquery.insertdata'
]


# connect to GSC
def connect(key):

    scope = ['https://www.googleapis.com/auth/webmasters',
             'https://www.googleapis.com/auth/cloud-platform',
             'https://www.googleapis.com/auth/drive',
             'https://www.googleapis.com/auth/bigquery',
             'https://www.googleapis.com/auth/bigquery.insertdata'
            ]
    credentials = service_account.Credentials.from_service_account_file(key, 
                                                                        scopes=scope)
    service = build(
        'webmasters',
        'v3',
        credentials=credentials
    )
    
    return service

# query GSC data
def query(service, site_url, payload):
    
    response = service.searchanalytics().query(siteUrl=site_url, body=payload).execute()
    
    results = []
    
    for row in response.get('rows', []):  # guard against days/properties with no rows
        data = {}
        
        for i in range(len(payload['dimensions'])):
            data[payload['dimensions'][i]] = row['keys'][i]

        data['clicks'] = row['clicks']
        data['impressions'] = row['impressions']
        data['ctr'] = round(row['ctr'] * 100, 2)
        data['position'] = round(row['position'], 2)        
        results.append(data)
    
    return pd.DataFrame.from_dict(results)

# getting GSC data
service = connect(key)
site_url = 'sc-domain:{your GSC property id}'  # domain property; for a URL-prefix property use the full URL instead

pages_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "page"],
    'rowLimit': 25000,
    'startRow': 0
}


queries_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "query"],
    'rowLimit': 25000,
    'startRow': 0
}

country_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "country"],
    'rowLimit': 25000,
    'startRow': 0
}


device_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "device"],
    'rowLimit': 25000,
    'startRow': 0
}

all_payload = {
    'startDate': start_date_str,
    'endDate': today_str,
    'dimensions': ["date", "page", "query", "country", "device"],
    'rowLimit': 25000,
    'startRow': 0
}


def gsc_pages_to_bigquery(event, context):
  try:
    gsc_pages = query(service, site_url, pages_payload)
    gsc_queries = query(service, site_url, queries_payload)
    gsc_countries = query(service, site_url, country_payload)
    gsc_devices = query(service, site_url, device_payload)
    gsc_all = query(service, site_url, all_payload)
    # Credentials again
    credentials = service_account.Credentials.from_service_account_file(key, scopes=SCOPES)
    # writing GSC data into BigQuery
    gsc_pages.to_gbq(destination_table='searchconsole.pages_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_pages, credentials=credentials)
    gsc_queries.to_gbq(destination_table='searchconsole.queries_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_queries, credentials=credentials)
    gsc_countries.to_gbq(destination_table='searchconsole.countries_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_countries, credentials=credentials)
    gsc_devices.to_gbq(destination_table='searchconsole.devices_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_devices, credentials=credentials)
    gsc_all.to_gbq(destination_table='searchconsole.full_detailed_report_raw', project_id='{{Your BigQuery Project ID}}', if_exists='append', table_schema=schema_all, credentials=credentials)
    # working with BigQuery to get non-duplicated data
    project_id = '{{Your BigQuery Project ID}}'
    client = bigquery.Client(credentials=credentials,project=project_id)
    query_string = """
    -- Find and delete rows that have the data of the same dates as a new file --
    -- Page Data --
    DELETE 
    FROM `{{Your BigQuery Project ID}}.searchconsole.pages` AS bd
    WHERE DATE(bd.date) in (select date(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.pages_raw` as bdr);
    -- Insert new data into final dataset 
    INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.pages`
    SELECT *
    FROM `{{Your BigQuery Project ID}}.searchconsole.pages_raw` as pr;
    DROP TABLE`{{Your BigQuery Project ID}}.searchconsole.pages_raw`;
    -- Queries Data --
    DELETE 
    FROM `{{Your BigQuery Project ID}}.searchconsole.queries` AS bd
    WHERE DATE(bd.date) in (select date(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.queries_raw` as bdr);
    -- Insert new data into final dataset 
    INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.queries`
    SELECT *
    FROM `{{Your BigQuery Project ID}}.searchconsole.queries_raw` as pr;
    DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.queries_raw`;
    -- Countries Data --
    DELETE 
    FROM `{{Your BigQuery Project ID}}.searchconsole.countries` AS bd
    WHERE DATE(bd.date) in (select date(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.countries_raw` as bdr);
    -- Insert new data into final dataset 
    INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.countries`
    SELECT *
    FROM `{{Your BigQuery Project ID}}.searchconsole.countries_raw` as pr;
    DROP TABLE`{{Your BigQuery Project ID}}.searchconsole.countries_raw`;
    -- Devices Data --
    DELETE 
    FROM `{{Your BigQuery Project ID}}.searchconsole.devices` AS bd
    WHERE DATE(bd.date) in (select date(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.devices_raw` as bdr);
    -- Insert new data into final dataset 
    INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.devices`
    SELECT *
    FROM `{{Your BigQuery Project ID}}.searchconsole.devices_raw` as pr;
    DROP TABLE`{{Your BigQuery Project ID}}.searchconsole.devices_raw`;
    -- All Data --
    DELETE 
    FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report` AS bd
    WHERE DATE(bd.date) in (select date(bdr.date) FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw` as bdr);
    -- Insert new data into final dataset 
    INSERT INTO `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report`
    SELECT *
    FROM `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw` as pr;
    DROP TABLE `{{Your BigQuery Project ID}}.searchconsole.full_detailed_report_raw`;
    """
    client.query(query_string).result()  # wait for the multi-statement job to finish
    print("Executed Successfully!")
    return
  except Exception as e:
    print(str(e))
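
One limitation worth flagging: the query helper above fetches only the first 25,000 rows (the API’s maximum page size, with startRow fixed at 0), so results for very large properties may be truncated. Below is a small optional sketch, not part of the deployed function, showing how the same call could be paginated with startRow; the returned rows can then be parsed into a dataframe exactly as the query function does.

def query_all_rows(service, site_url, payload, page_size=25000):
    # page through the Search Analytics API until no more rows are returned
    all_rows = []
    start_row = 0
    while True:
        body = dict(payload, rowLimit=page_size, startRow=start_row)
        response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
        rows = response.get('rows', [])
        all_rows.extend(rows)
        if len(rows) < page_size:
            break  # last (possibly empty) page reached
        start_row += page_size
    return all_rows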

requirements.txt

pandas==1.4.2
pandas-gbq==0.17.5
numpy==1.19.1
searchconsole==0.0.4
google-api-core==2.8.1
google-api-python-client==2.79.0
google-auth==2.6.6
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.5.1
google-cloud==0.34.0
google-cloud-bigquery==3.1.0
google-cloud-bigquery-storage==2.13.1
google-cloud-core==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.2
regex==2022.4.24
requests==2.27.1
requests-oauthlib==1.3.1
datetime==4.4
pydata-google-auth==1.7.0

If you complete all the steps correctly and have all the necessary access, the Cloud Function will start loading data into your BigQuery project. Note that the SQL in main.py expects the final tables (searchconsole.pages, queries, countries, devices and full_detailed_report) to already exist with the same schemas as the corresponding _raw tables, so create them before the first run. If the Cloud Function fails to deploy, check the logs to see what the issue is. 
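
To rule out problems in the code itself, you can also run a quick local smoke test before deploying. This is a hypothetical helper, not part of the deployed function; it assumes main.py and service_account.json sit in your working directory and that all placeholders have been filled in:

# local_smoke_test.py – run the entry point once from your machine
from main import gsc_pages_to_bigquery

# the Pub/Sub event and context arguments are not used inside the function, so None works here
gsc_pages_to_bigquery(event=None, context=None)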

Again, this second approach requires Python knowledge. If you don’t know how to code or have only basic knowledge, I recommend going with the first approach or contacting me for a paid service or consultation. 

Final Word 

Almost every website uses Search Console to monitor organic search traffic performance, and sending data from Google Search Console to BigQuery brings additional benefits: you can build any report you want using the raw data. This article presents two solutions for achieving that. If your website is relatively new or you are only interested in new data, use the first approach; it also creates partitioned tables that allow you to optimise BigQuery costs.

However, if you want to have all historical data in BigQuery, use the second solution; just be prepared to read and write code. 


Frequently Asked Questions

How do I connect Google Search Console to BigQuery?

The Search Console team announced the new bulk data export on February 21, 2023, allowing you to send data from Google Search Console to BigQuery using the BigQuery and Search Console interface. It’s the best free option available on the market.

What’s the best connector of Search Console and BigQuery?

You don’t need to use any external connector to connect Search Console and BigQuery. Google introduced a new bulk data export for Search Console that you can use to send GSC data into BigQuery automatically. There is no need for developers. 

3 Replies to “How to send data from Google Search Console to BigQuery”

  1. Hi,

    Thanks for the effort in putting this together,

    quick question, in method 2, is that collecting the same data and in the same table format as method 1.

    I’m thinking of doing method 1, then running method 2 just once to backfill the tables with the missing data.

    Thanks
    John

    1. Hi John,

      Thank you for the question.

      Both of them contain similar columns, but they only match partially.
      You can read more about available metrics in Search Console API here.

      Best regards,
      Ihar
