When an HTML page requests the ga.js file, the HTTP request itself already carries a good deal of data: the IP address, referrer, browser, language, and operating system. No AJAX is needed for that.
However, some data can't be obtained this way, so the GA script inserts an image into the HTML with additional parameters. Take a look at this example:
http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=1464271798&utmhn=www.example.com&utmcs=UTF-8&utmsr=1920x1200&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=10.0%20r22&utmdt=Page title&utmhid=1805038256&utmr=0&utmp=/&utmac=cookie value
This is a blank image, sometimes called a tracking pixel, that GA puts into HTML.
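As a rough sketch of the tracking-pixel technique described above (this is not GA's actual code): `buildPixelUrl` is a hypothetical helper name, and the parameter values are copied from the example URL. It serializes the parameters into the query string of an image URL; in a browser, assigning that URL to an image is what triggers the GET request.

```javascript
// Hypothetical helper: serialize tracking parameters into a pixel URL.
function buildPixelUrl(baseUrl, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  });
  return baseUrl + '?' + pairs.join('&');
}

// A handful of the parameters from the example URL above.
var pixelUrl = buildPixelUrl('http://www.google-analytics.com/__utm.gif', {
  utmhn: 'www.example.com',
  utmcs: 'UTF-8',
  utmsr: '1920x1200',
  utmdt: 'Page title'
});

// In a browser, loading the image is what actually sends the data.
// The guard keeps the sketch runnable outside a browser as well.
if (typeof Image !== 'undefined') {
  new Image(1, 1).src = pixelUrl;
}
```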
Answer from Thinker on Stack Overflow
There are some good answers here, each of which tends to cover one method or another for sending the data. However, a valuable reference that covers all the methods is missing from the other answers.
Google refers to the different methods of sending data as 'transport mechanisms'.
The analytics.js documentation describes the three main transport mechanisms that it uses to send data:
This specifies the transport mechanism with which hits will be sent. The options are 'beacon', 'xhr', or 'image'. By default, analytics.js will try to figure out the best method based on the hit size and browser capabilities. If you specify 'beacon' and the user's browser does not support the navigator.sendBeacon method, it will fall back to 'image' or 'xhr' depending on hit size.
- One of the common and standard ways to send some of the data to Google (which is shown in Thinker's answer) is by adding the data as GET parameters to a tracking pixel. This would fall under the category which Google calls an 'image' transport.
- Secondly, Google can use the 'beacon' transport method if the client's browser supports it. This is often my preferred method because it will attempt to send the information immediately. Or in Google's words:
This is useful in cases where you wish to track an event just before a user navigates away from your site, without delaying the navigation.
- The 'xhr' transport mechanism is the third way that Google Analytics can send data back home, and the particular transport mechanism that is used can depend on things such as the size of the hit. (I'm not sure what other factors go into GA deciding the optimal transport mechanism to use)
In case you are curious how to force GA into using a specific transport mechanism, here is a sample code snippet which forces this event hit to be sent as a 'beacon':
ga('send', 'event', 'click', 'download-me', {transport: 'beacon'});
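To make the fallback rule from the quoted documentation concrete, here is a minimal sketch. It is not the analytics.js implementation: `chooseTransport` is a hypothetical function, the size limit is an assumed value picked for illustration, and honoring an explicit 'xhr' or 'image' request as-is is also an assumption.

```javascript
// Assumed limit for illustration only; not a documented analytics.js constant.
var ASSUMED_MAX_IMAGE_PAYLOAD = 2000;

// Hypothetical sketch of picking a transport for one hit.
function chooseTransport(requested, payloadLength, supportsBeacon) {
  if (requested === 'beacon' && supportsBeacon) {
    return 'beacon';
  }
  if (requested === 'xhr' || requested === 'image') {
    return requested; // assumption: explicit choices are honored as-is
  }
  // Fallback and default: small hits fit in an image URL,
  // larger ones are sent as a POST via XHR.
  return payloadLength <= ASSUMED_MAX_IMAGE_PAYLOAD ? 'image' : 'xhr';
}

// Feature detection that works in browsers and is simply false elsewhere.
var supportsBeacon = typeof navigator !== 'undefined' &&
  typeof navigator.sendBeacon === 'function';
var chosen = chooseTransport('beacon', 120, supportsBeacon);
```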
Hope this helps.
Also, if you are curious about this topic because you'd like to capture and send this data to your own site too, I recommend creating a binding to Google Analytics' send, which allows you to grab the payload and AJAX it to your own server.
ga(function(tracker) {
  // Grab a reference to the default sendHitTask function.
  var originalSendHitTask = tracker.get('sendHitTask');
  // Modify sendHitTask to send a copy of the request to a local server after
  // sending the normal request to www.google-analytics.com/collect.
  tracker.set('sendHitTask', function(model) {
    // The URL-encoded hit that analytics.js is about to send.
    var payload = model.get('hitPayload');
    originalSendHitTask(model);
    // Forward the same payload to our own endpoint.
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/index.php?task=mycollect', true);
    xhr.send(payload);
  });
});
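The payload captured this way is a URL-encoded string of key/value pairs, so whatever receives it has to decode it. Below is a small sketch of that decoding step; `parseHitPayload` is a hypothetical helper, and the example payload is made up for illustration rather than captured from a real hit.

```javascript
// Hypothetical helper: decode a URL-encoded hit payload into a plain object.
function parseHitPayload(payload) {
  var fields = {};
  new URLSearchParams(payload).forEach(function (value, key) {
    fields[key] = value;
  });
  return fields;
}

// Illustrative payload resembling the event hit shown earlier; values are invented.
var examplePayload =
  'v=1&t=event&ec=click&ea=download-me&dl=http%3A%2F%2Fwww.example.com%2F';
var fields = parseHitPayload(examplePayload);
```

If a key appears more than once, this sketch keeps only the last value, which is a simplification.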
... identify what data is actually collected by the default script .... I also have a list of all the possible dimensions and metrics that can be collected
Just to be clear, GA collects more information than what they share with Analytics consumers. While their client-side script may allow for additional data to be collected (like custom query string parameters), most of the data they collect seems to be similar on every site, regardless of what the analytics user chooses to consume (with the exception of a few configuration items such as "anonymizeIp").
Google's policies are cleverly worded to indicate that turning on "Advertising Features" doesn't necessarily change what they collect with GA, other than the fact that a new cookie might be present:
By enabling the Advertising Features, you enable Google Analytics to collect data about your traffic via Google advertising cookies and identifiers
Knowing what GA collects (even when you don't ask it to) is particularly important given the ambiguity around whether GA is really GDPR compliant (GDPR counts IP addresses, cookie identifiers, and GPS locations as "personal data").
Looking at the source code
Google Analytics is a moving target, but there is value in having a snapshot of the identifying information about the client and browser that was being leaked to Google Analytics at a given point in time.
Even though it's a bit outdated, this analysis was done using a manually deobfuscated Google Analytics JavaScript file, snapshot taken Mar 27, 2018.
1. Data available in Document and Window Objects
Some key objects to look for in the analytics JS: DOCUMENT, WINDOW, NAVIGATOR, SCREEN, LOCATION
Here are the items that are utilized by GA (this doesn't necessarily mean the data is sent back to Google in raw form).
Data Utilized | Code Snippet
------------- | ------------
Url | LOCATION.protocol + "//" + LOCATION.hostname + LOCATION.pathname + LOCATION.search
ReferringPage | DOCUMENT.referrer
PageTitle | DOCUMENT.title
HowLongIsPageVisible | DOCUMENT.visibilityState .. DOCUMENT,"visibilitychange"
DocumentSize | DOCUMENT.documentElement .clientWidth && .clientHeight
ScreenResolution | SCREEN.width SCREEN.height
ScreenColors | SCREEN.colorDepth + "-bit"
ClientSize | e = document.body; e.clientWidth && e.clientHeight
ViewportSize | ca = [documentEl.clientWidth .... : ca = [e.clientWidth .... ca.join("x")
FlashVersion | getFlashVersion
Encoding | characterSet || DOCUMENT.charset
JSONAvailable | window.JSON
JavaEnabled | NAVIGATOR.javaEnabled()
Language | NAVIGATOR.language || NAVIGATOR.browserLanguage
UserAgent | NAVIGATOR.userAgent
Timezone/LocalTime | c.getTimezoneOffset(), c.getYear(), c.getDate(), c.getHours(), c.getMinutes()
PerformanceData | WINDOW.performance || WINDOW.webkitPerformance ... loadEventStart,domainLookupEnd,domainLookupStart,connectStart,responseStart,requestStart,responseEnd,responseStart,fetchStart,domInteractive,domContentLoadedEventStart
Plugins | NAVIGATOR.plugins
SignalUserLeaving | navigator.sendBeacon() // how long the user was on the page
HistoryLength | WINDOW.history.length // number of pages viewed with this browser tab
IsTopSiteForUser | navigator.loadPurpose // "Top Sites" section of Safari
NameOfPage (JS) | WINDOW.name
IsFrame | WINDOW.top != WINDOW
IsEmbedded | WINDOW.external
RandomData | WINDOW.crypto.getRandomValues // because of the try/catch, it doesn't appear to leak anything other than random values
ScriptTags | getElementsByTagName("script"); // probably for Ads, AutoLink decorating [https://support.google.com/analytics/answer/4627488?hl=en] and cross-domain tracking [https://developers.google.com/analytics/devguides/collection/analyticsjs/cross-domain]
Cookies (JS) | DOCUMENT.cookie.split(";") // limited to cookies not marked as server only
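To show how a few rows of the table translate into code, here is a minimal sketch that reads some of the same properties. It is not taken from the GA script: `collectClientInfo` and `fakeWindow` are hypothetical names, and the stand-in object exists only so the example runs outside a browser.

```javascript
// Hypothetical sketch: read a few of the properties listed in the table.
function collectClientInfo(win) {
  var doc = win.document;
  var nav = win.navigator;
  var scr = win.screen;
  var loc = win.location;
  return {
    url: loc.protocol + '//' + loc.hostname + loc.pathname + loc.search,
    referrer: doc.referrer,
    title: doc.title,
    screenResolution: scr.width + 'x' + scr.height,
    screenColors: scr.colorDepth + '-bit',
    encoding: doc.characterSet || doc.charset,
    language: nav.language || nav.browserLanguage,
    isFrame: win.top !== win
  };
}

// Stand-in for the browser's window object, with invented values.
var fakeWindow = {
  document: { referrer: '', title: 'Page title', characterSet: 'UTF-8' },
  navigator: { language: 'en-us' },
  screen: { width: 1920, height: 1200, colorDepth: 32 },
  location: { protocol: 'http:', hostname: 'www.example.com', pathname: '/', search: '?q=1' }
};
fakeWindow.top = fakeWindow; // a top-level page is its own top

var info = collectClientInfo(fakeWindow);
```

In a real page you would pass `window` instead of the stand-in object.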
2. Data available from the QueryString and Hash
By default, GA seems to only explicitly collect querystring parameters that are documented as specific to Google Analytics. But keep in mind that they also have the entire URL available to extract this data server-side, querystring and hash included:
_ga
_gac
gclid
gclsrc
dclid
utm_id
utm_campaign
utm_source
utm_medium
utm_term
utm_content
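As an illustration of picking only these parameters out of a URL, here is a short sketch. `extractGaParams` is a hypothetical function and the example URL is invented; it shows one way to do the filtering, not how GA itself does it.

```javascript
// The GA-specific query string parameters listed above.
var GA_PARAMS = [
  '_ga', '_gac', 'gclid', 'gclsrc', 'dclid', 'utm_id',
  'utm_campaign', 'utm_source', 'utm_medium', 'utm_term', 'utm_content'
];

// Hypothetical helper: return only the GA-specific parameters present in a URL.
function extractGaParams(url) {
  var query = new URL(url).searchParams;
  var found = {};
  GA_PARAMS.forEach(function (name) {
    if (query.has(name)) {
      found[name] = query.get(name);
    }
  });
  return found;
}

// Invented example: "page" is ignored because it is not in the list.
var found = extractGaParams(
  'http://www.example.com/?utm_source=newsletter&utm_medium=email&page=2#top'
);
```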
3. Data available in the HTTP Header
They can choose to capture anything on the request header from the browser. Most notably:
Cookies (Google) | for the google analytics domain, to track the user between sites
IP Address | (parameter "anonymizeIp" claims to anonymize the IP address)
Browser w/ version |
Operating system |
Device Type |
Referer | (in this context, only the url of the page the client is currently on)
X-Forwarded-For | Is a proxy being used? And, if not used for privacy, the actual IP address
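For a sense of what any server receiving the request can read from these headers, here is a small sketch. It says nothing about what Google's servers actually do: `summarizeRequestHeaders` is a hypothetical function, the header values are invented, and treating the first X-Forwarded-For entry as the client address is a common convention that a client can spoof.

```javascript
// Hypothetical sketch. "headers" is a plain object with lowercase names,
// which is how Node's http module exposes request headers.
function summarizeRequestHeaders(headers, remoteAddress) {
  var forwarded = (headers['x-forwarded-for'] || '')
    .split(',')
    .map(function (part) { return part.trim(); })
    .filter(Boolean);
  return {
    // By convention the first entry is the original client; not guaranteed.
    clientIp: forwarded.length > 0 ? forwarded[0] : remoteAddress,
    viaProxy: forwarded.length > 0,
    userAgent: headers['user-agent'] || null,
    referer: headers['referer'] || null,
    hasCookies: Boolean(headers['cookie'])
  };
}

// Invented request data for illustration.
var summary = summarizeRequestHeaders({
  'x-forwarded-for': '203.0.113.7, 10.0.0.1',
  'user-agent': 'ExampleBrowser/1.0',
  'referer': 'http://www.example.com/'
}, '10.0.0.1');
```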
4. Other inferred data
Javascript enabled
Cookies enabled
Other identifying information they don't appear to track/utilize
Some other metrics that are readily available, but GA doesn't appear to access:
Canvas Supported
CPU Architecture
CPU Number of cores
AudioContext Supported
Bluetooth Supported
Battery Status
Memory (RAM)
Number of speakers
Number of microphones
Number of webcams
Device Orientation
Device input is Touchscreen
System Fonts
LocalStorage Data
IndexedDB Data
WebRTC Supported
WebGL Supported
WebSocket Supported
Misc Hacks
They don't appear to use any known hacks to extract additional unique user information, such as finding the video card model of the current machine using Canvas and GL. This is not too surprising, since Google can just expose any data they want in chromium/webkit.
However, their control of 70% of the browser market gives them the power to manipulate otherwise innocuous functions (like the random number generator) to leak data for user tracking, if they so desire.
Summary
What you choose to see from the Google Analytics portal does not necessarily impact what they collect.
GA helps Google determine how well a site performs for Search Ranking, and creates a User Fingerprint to track what each internet user looks at and for how long. The latter helps them select ads, which is where they make the bulk of their money. Much of the data they touch in their script doesn't get sent back in raw form, but rather, is used to create said fingerprint.
If you dig deeper you'll find plenty of literature on Google Analytics architecture.
According to the official documentation:
Google Analytics works by the inclusion of a block of JavaScript code on pages in your website. When users to your website view a page, this JavaScript code references a JavaScript file which then executes the tracking operation for Analytics. The tracking operation retrieves data about the page request through various means and sends this information to the Analytics server via a list of parameters attached to a single-pixel image request.
Source: How Does Google Analytics Collect Data?
Additional reading: Google Analytics Features
The official information can be found here
The visitor tracking information that you can get in the Google Analytics reports depends on Javascript code that you include in your website pages, referred to as the Google Analytics Tracking Code (GATC). Initial releases of the GATC used a Javascript file called urchin.js.
That script is then discussed in detail in that blog, and the Google Analytics Help group can also provide some details.
A more detailed list of what that JavaScript collects is listed here.
I found the official Google documentation here:
http://code.google.com/apis/analytics/docs/tracking/gaTrackingTroubleshooting.html
I also found this discussion very useful:
http://www.google.com/support/forum/p/Google%20Analytics/thread?tid=5f11a529100f1d47&hl=en
It helped me find out WTF utmcc actually did.