Security Signals: Making Web Security Posture Measurable At Scale

15 Nov 2024

The Idea

Security Signals is a framework used internally at Google that has helped the company identify and prioritise security issues in its web applications.

It is based on the observation that the HTTP request/response pair is mostly sufficient to understand which attacks might affect an endpoint and which mitigations the application applies to prevent them. For example, CSRF mostly affects POST endpoints, and the presence of the X-Frame-Options response header protects against clickjacking.
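As a rough illustration of this observation, the sketch below derives two signals from a single request/response pair. The `Exchange` type, the `derive_signals` function, and the signal names are hypothetical and not taken from the paper; the sketch only checks X-Frame-Options and treats only POST as CSRF-relevant, mirroring the two examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Exchange:
    """One observed HTTP request/response pair (hypothetical shape)."""
    method: str
    response_headers: dict[str, str] = field(default_factory=dict)


def derive_signals(exchange: Exchange) -> dict[str, bool]:
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v for k, v in exchange.response_headers.items()}
    xfo = headers.get("x-frame-options", "").strip().upper()
    return {
        # Assumption: only POST endpoints are flagged as CSRF-relevant.
        "csrf_relevant": exchange.method.upper() == "POST",
        # DENY and SAMEORIGIN are the values that restrict framing.
        "clickjacking_mitigated": xfo in {"DENY", "SAMEORIGIN"},
    }


signals = derive_signals(Exchange("POST", {"X-Frame-Options": "DENY"}))
print(signals)
```

A real implementation would likely look at more than one header per attack class; this only shows that the raw traffic already carries usable information.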

While this is a good starting point, more data than the request/response is needed to evaluate the security posture meaningfully, for example technology versions or the presence of access control checks. This is collected via custom HTTP response headers (Google uses X-Google-Security-Signals), which the reverse proxy strips before responses are sent to the user. These are termed Synthetic Signals.
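A minimal sketch of the stripping step, assuming the proxy sees response headers as a plain dictionary. The header name comes from the text above, but the `key=value; key=value` value format and the `split_synthetic_signals` helper are assumptions made for illustration; the actual encoding Google uses is not described here.

```python
# Header name from the text; the value format parsed below is assumed.
SIGNAL_HEADER = "X-Google-Security-Signals"


def split_synthetic_signals(
    response_headers: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (headers to forward to the user, parsed synthetic signals)."""
    signals: dict[str, str] = {}
    forwarded: dict[str, str] = {}
    for name, value in response_headers.items():
        if name.lower() == SIGNAL_HEADER.lower():
            # Hypothetical format: "csrf_check=1; framework=example"
            for part in value.split(";"):
                key, sep, val = part.strip().partition("=")
                if sep and key:
                    signals[key.strip()] = val.strip()
        else:
            # Everything else passes through to the client untouched.
            forwarded[name] = value
    return forwarded, signals


forwarded, signals = split_synthetic_signals({
    "Content-Type": "text/html",
    "X-Google-Security-Signals": "csrf_check=1; framework=example",
})
print(forwarded)  # headers the user receives
print(signals)    # data kept for the logging pipeline
```

The point of the split is that the application can report internal facts without exposing them to clients.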

The data is collected at the reverse proxy level because that is the place with the least variation in the entire stack (100s of frameworks/languages vs <10 major reverse proxies), and it fits well with most organisations' existing architecture. Where this is not a good fit, the request logging feature in HTTP servers can be leveraged instead.

Google’s Implementation

The system is described as a distributed map-reduce pipeline which collates data from various sources. There is also a focus on reducing the size of the collected data and removing privacy-sensitive information. After cardinality reduction, the output is stored in a database table that can be queried with SQL for a specific period of time.

Input Sources

  • External HTTP Traffic Logs
  • Internal HTTP Traffic Logs (Help analyse internal/yet-to-be-launched services and can have an increased sampling rate due to lower traffic.)
  • Security Scanner Logs (Logs from Google’s internal security scanner.)

What is Collected?

  • Basic HTTP Request & Response Data (HTTP method, host, redacted path, headers etc.)
  • HTTP Security Headers (Content-Security-Policy, X-Frame-Options, Cross-Origin-Resource-Policy etc)
  • Synthetic Signals (Whether CSRF checks are present, server-side isolation policies are present to prevent XSS)
  • Auxiliary Data (Framework used, metadata about application’s build environment, the method that generated the response; to identify ownership)
  • Risk Signals (traffic volume, sensitivity of the hosting domain; for prioritisation)

How is the Size of the Collected Data Reduced?

  • Path Redaction (Replacement of query parameters with placeholders)
  • User Agent Parsing (parse the user agent information and keep only browser name and major version)
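The two reductions above can be sketched as follows. This is an illustrative approximation, not the pipeline's actual logic: the function names, the `<redacted>` placeholder, and the small set of browser patterns are all assumptions.

```python
import re
from urllib.parse import parse_qsl, urlsplit


def redact_url(url: str, placeholder: str = "<redacted>") -> str:
    """Keep the path and parameter names, replace every parameter value."""
    parts = urlsplit(url)
    keys = [k for k, _ in parse_qsl(parts.query, keep_blank_values=True)]
    query = "&".join(f"{k}={placeholder}" for k in keys)
    return parts.path + ("?" + query if query else "")


# Order matters: Edge and Chrome user agents also contain "Safari/",
# and Edge user agents also contain "Chrome/".
_BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edg/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def reduce_user_agent(user_agent: str) -> str:
    """Collapse a full user agent string to 'Browser/major' or 'Other'."""
    for name, pattern in _BROWSER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return f"{name}/{match.group(1)}"
    return "Other"


print(redact_url("/search?q=secret&page=2"))
print(reduce_user_agent(
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
))
```

Both steps shrink the number of distinct values stored per field, which also removes most of the user-specific detail from the logs.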

How Does Google Use This System?

  • Measure the adoption of web security features in older applications
  • Measure the progress of large-scale security deployments
  • Prioritise security rollouts
  • Automate alerting and bug filing for security issues
  • Identify dependencies between different services
  • Improve the internal scanner's coverage by supplying better targets
  • Provide different dashboards for executives, developers, and security engineers
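To make the adoption-measurement use case concrete, here is a small sketch that computes, per host, the fraction of logged responses carrying a given security header. The record shape (`host`, `response_headers`) and the `header_adoption` function are hypothetical; Google's system exposes this kind of question through SQL over its output table rather than through code like this.

```python
from collections import defaultdict


def header_adoption(records: list[dict], header: str) -> dict[str, float]:
    """Fraction of responses per host that include the given header."""
    totals: dict[str, int] = defaultdict(int)
    with_header: dict[str, int] = defaultdict(int)
    wanted = header.lower()
    for record in records:
        host = record["host"]
        totals[host] += 1
        # Header names are compared case-insensitively.
        if wanted in {name.lower() for name in record["response_headers"]}:
            with_header[host] += 1
    return {host: with_header[host] / totals[host] for host in totals}


# Made-up sample records, for illustration only.
sample = [
    {"host": "a.example", "response_headers": {"Content-Security-Policy": "default-src 'self'"}},
    {"host": "a.example", "response_headers": {}},
    {"host": "b.example", "response_headers": {"content-security-policy": "default-src 'none'"}},
]
print(header_adoption(sample, "Content-Security-Policy"))
```

A per-host ratio like this is the kind of number that could feed a dashboard or a prioritisation list, for instance by combining it with the risk signals described earlier.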

The Paper

Security Signals: Making Web Security Posture Measurable At Scale
