How I Saved Tons of GBs with HTTPS Caching

Simplest Guide for Squid SSL Bumping

Rasika Perera
10 min read · Sep 18, 2021
Photo by Milada Vigerova on Unsplash

“📢 Reader Alert❗If you are using an uncapped Internet package, probably this might not be your cup of tea ☕️…”

If you are still interested, hold my beer 🍺 I am going to tell you how to save your data ( and of course money ! 💰). Your savings are proportional to your repeatable web-behavior.

During my day job, I am working on building a high-performance and scalable machine-learning platform. I love it to the fullest. Then, as always good things come with a price 💵. Building these products involves burning many GBs right in front of my eyes 🔥.

Living in a country with not-so-cheap data, I began exploring alternative ways to save data. So I ended up here, introducing a local HTTP cache, since I did not want to modify the code or bring ad-hoc hacks to the table 🍽.

And this is what it looks like 😎;

How Build Process is Fronted by A Http Cache (Author Produced)

The build process used curl, wget, conda and pip in different stages, and all I wanted was to re-use artifacts wherever possible across version changes. Yes, I know, some of these tools have their own caches as well. But once a request hits the TCP/IP stack, the tool really is fetching from the Internet, at a data cost. That’s exactly where our own HTTP cache (a.k.a. the Squid proxy) is going to sit.

“YES! You heard it correctly, We are using Squid 🐙”

We are using Squid since it is designed to act as a caching proxy for the web supporting HTTP, HTTPS, FTP, and more.

How Build Process Interacts with Our Http Cache (Author Produced)

“Oops! Most web traffic from these tools is HTTPS, so our Squid in-the-middle needs to know what is being transferred 🙇

…This is why we need [SSL bumping] 💡”

When ssl-bump is enabled, Squid decrypts and re-encrypts the TLS traffic using a configurable CA certificate. For each site, it generates a certificate on the fly, signed by that CA, which mimics the properties of the origin server’s certificate. That makes certificate failures easier to trace.

How HTTPS Traffic Flows through the Http Cache (Author Produced)
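Once the proxy from the steps below is up, one way to see bumping in action is to look at who issued the certificate your client receives. Here is a minimal sketch, assuming a curl build backed by OpenSSL (so its verbose output carries an `issuer:` line). The helper name `issued_by` is made up for illustration, and it only filters text:

```shell
# issued_by: read `curl -v` output on stdin and succeed if the certificate
# issuer line mentions the given text (e.g. the Common Name of our own CA).
# Hypothetical helper; the "issuer:" line format is assumed to be the one
# printed by OpenSSL-backed curl builds.
issued_by() {
  grep -i 'issuer:' | grep -q -F -e "$1"
}

# Usage sketch (needs the running proxy and the CA file from the later steps;
# "Rasika Perera" is the Common Name typed in the sample input of STEP 2):
#   curl -sv -x 127.0.0.1:3128 --cacert /etc/ssl/certs/squid-self-signed.pem \
#     -o /dev/null https://example.org 2>&1 | issued_by "Rasika Perera" \
#     && echo "certificate was generated by our Squid CA"
```

If the check fails while the download itself succeeds, the request most likely bypassed the proxy.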

That’s enough! Let’s Check Out Our Steps… 👇

  1. Compile and Install Squid
  2. Configure OpenSSL
  3. Configure Squid
  4. Starting Squid
  5. Forwarding Web Traffic through Squid

Bonus! 🎁

6. Does it work ? — Testing Squid

7. How much Did I Save? —Let’s generate a report

STEP 1: Compile and Install Squid

First, we need a source distribution! 📥

Download and extract Squid 5.1 source distribution.

wget http://www.squid-cache.org/Versions/v5/squid-5.1.tar.gz
tar -xvf squid-5.1.tar.gz

Make sure you have build essentials! 🔌

Run the following terminal commands to install the prerequisites.

sudo apt-get update
sudo apt-get install build-essential openssl libssl-dev pkg-config

Let’s compile and install Squid 🔧

Navigate into the extracted directory and run the following commands to compile Squid with dynamic certificate generation support.

cd squid-5.1
./configure --with-default-user=proxy --with-openssl --enable-ssl-crtd
make
sudo make install
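Before moving on, it may be worth confirming that the binary you just installed really carries the OpenSSL options. `squid -v` prints the configure options it was built with; the sketch below only searches that text. The helper name `has_configure_flag` is hypothetical:

```shell
# has_configure_flag: succeed if the text printed by `squid -v` ($1)
# contains the given configure flag ($2). Hypothetical helper name.
has_configure_flag() {
  printf '%s\n' "$1" | grep -q -F -e "$2"
}

# Usage against the binary installed above (path used throughout this guide):
#   out=$(/usr/local/squid/sbin/squid -v)
#   has_configure_flag "$out" --with-openssl   || echo "rebuild with --with-openssl"
#   has_configure_flag "$out" --enable-ssl-crtd || echo "rebuild with --enable-ssl-crtd"
```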

STEP 2: Configure OpenSSL

Let’s Configure OpenSSL to generate certificates 📜

First, we need to add the keyUsage configuration to OpenSSL. Open the openssl.cnf file using the terminal command below.

sudo vim /etc/ssl/openssl.cnf

Then, add (or uncomment) the keyUsage configuration under the [ v3_ca ] block.

[ v3_ca ]
keyUsage = cRLSign, keyCertSign

Okay, Now we need a self-signed root CA certificate 🔐

Hold on, we need to create a temp folder in which to generate the self-signed certificates.

mkdir /tmp/ssl_cert
cd /tmp/ssl_cert

Now, generate the following self-signed root CA certificate files.

openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -extensions v3_ca -keyout squid-self-signed.key -out squid-self-signed.crt

A sample input would look like this;

# Country Name (2 letter code) [AU]:LK
# State or Province Name (full name) [Some-State]:Western
# Locality Name (eg, city) []:Mt. Lavinia
# Organization Name (eg, company) [Internet Widgits Pty Ltd]:ABC
# Organizational Unit Name (eg, section) []:Eng Team
# Common Name (e.g. server FQDN or YOUR name) []: Rasika Perera
# Email Address []:info.rasika@gmail.com
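To double-check that the v3_ca extension took effect, you can look for the CA flag and the signing key usage in the certificate dump. This is a rough sketch that greps the text printed by `openssl x509 -noout -text`. The helper name `looks_like_ca` is made up, and the two strings it searches for are the ones OpenSSL normally prints for these extensions:

```shell
# looks_like_ca: read `openssl x509 -noout -text` output on stdin and succeed
# only if it shows both CA:TRUE and the certificate-signing key usage.
# Hypothetical helper name.
looks_like_ca() {
  text=$(cat)
  printf '%s\n' "$text" | grep -q 'CA:TRUE' &&
    printf '%s\n' "$text" | grep -q 'Certificate Sign'
}

# Usage, from inside /tmp/ssl_cert:
#   openssl x509 -in squid-self-signed.crt -noout -text | looks_like_ca \
#     && echo "certificate can sign other certificates"
```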

Let’s convert the CRT into DER and PEM formats.

# Convert the cert into a trusted certificate in DER format.
openssl x509 -in squid-self-signed.crt -outform DER -out squid-self-signed.der
# Convert the cert into a trusted certificate in PEM format.
openssl x509 -in squid-self-signed.crt -outform PEM -out squid-self-signed.pem
# Generate the settings file for the Diffie-Hellman algorithm.
openssl dhparam -outform PEM -out squid-self-signed_dhparam.pem 2048

Alright! Let’s copy certificates into Squid 📁

sudo cp -rf /tmp/ssl_cert /usr/local/squid/etc/ssl_cert

Let’s add our CA cert as a trusted CA into the local machine.

sudo cp /usr/local/squid/etc/ssl_cert/squid-self-signed.pem /usr/local/share/ca-certificates/squid-self-signed.crt
# Update CA certificate cache
sudo update-ca-certificates

STEP 3: Configure Squid

Let’s configure Squid Configuration File 📜

sudo vim /usr/local/squid/etc/squid.conf

Add the following directives to the beginning of the file or before the first http_access directive.

acl intermediate_fetching transaction_initiator certificate-fetching
http_access allow intermediate_fetching

Add the following after the acl Safe_ports port 777 directive.

acl Safe_ports port 777  # multiling http
acl CONNECT method CONNECT

Replace the http_port directive with the following. We are enabling ssl-bump on port 3128 with dynamic certificate generation using our self-signed CA. Further, we are configuring a certificate cache of 20MB (which in general can store ~5000 certificates).

http_port 3128 tcpkeepalive=60,30,3 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=20MB tls-cert=/usr/local/squid/etc/ssl_cert/squid-self-signed.crt tls-key=/usr/local/squid/etc/ssl_cert/squid-self-signed.key cipher=HIGH:MEDIUM:!LOW:!RC4:!SEED:!IDEA:!3DES:!MD5:!EXP:!PSK:!DSS options=NO_TLSv1,NO_SSLv3,SINGLE_DH_USE,SINGLE_ECDH_USE tls-dh=prime256v1:/usr/local/squid/etc/ssl_cert/squid-self-signed_dhparam.pem
sslcrtd_program /usr/local/squid/libexec/security_file_certgen -s /usr/local/squid/var/logs/ssl_db -M 20MB
sslcrtd_children 5
ssl_bump server-first all
ssl_bump stare all
sslproxy_cert_error deny all

Replace the commented-out cache_dir directive with the following. When unspecified, maximum_object_size defaults to 4MB, cache_mem to 256MB and cache_dir to 100MB. These are not ideal for caching bigger file objects, so tweak them as per your requirements. Please refer to the Squid memory docs for further details.

#cache_dir ufs /usr/local/squid/var/cache/squid 100 16 256
maximum_object_size 6 GB
cache_mem 8192 MB
cache_dir ufs /usr/local/squid/var/cache/squid 32000 16 256 # 32GB as Cache
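The size field of cache_dir is given in megabytes, so the line above asks for roughly 32GB of disk. Here is a small sanity check you could run before committing to a number. It is a sketch only: `cache_fits` and its two parameters are hypothetical names, and it ignores the extra space Squid needs for its own bookkeeping:

```shell
# cache_fits: succeed if a cache_dir of SIZE_MB megabytes fits into
# AVAIL_KB kilobytes of free disk space. Hypothetical helper name.
cache_fits() {
  avail_kb=$1
  size_mb=$2
  [ "$avail_kb" -ge $((size_mb * 1024)) ]
}

# Usage: free space (in KB) on the filesystem that will hold the cache,
# compared with the 32000 MB configured above:
#   avail=$(df -Pk /usr/local/squid/var | awk 'NR == 2 { print $4 }')
#   cache_fits "$avail" 32000 || echo "not enough free disk space for cache_dir"
```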

Add the following just before the catch-all (dot) refresh_pattern. In this case, we are caching all URLs for jar, zip, wheel, gzip and bzip files, plus everything served from conda.anaconda.org. Please refer to the Squid docs for further details.

refresh_pattern -i \.(jar|zip|whl|gz|bz)  259200 20% 259200 ignore-reload ignore-no-store ignore-private override-expire
refresh_pattern -i conda\.anaconda\.org/.* 259200 20% 259200 ignore-reload ignore-no-store ignore-private override-expire
refresh_pattern . 0 20% 4320
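The two numbers around the percentage in a refresh_pattern line are the minimum and maximum freshness ages, expressed in minutes. A quick conversion makes the values above easier to reason about (the function name `minutes_to_days` is only for illustration):

```shell
# minutes_to_days: convert a refresh_pattern age (minutes) into whole days.
minutes_to_days() {
  echo $(( $1 / 60 / 24 ))
}

minutes_to_days 259200   # prints 180: the artifact patterns keep objects for ~6 months
minutes_to_days 4320     # prints 3: the upper bound of the default catch-all pattern
```

So with these settings a wheel or jar fetched once is considered fresh for about 180 days, which suits immutable, versioned artifacts but would be too aggressive for files that change in place.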

“📢 That’s All ! Please refer [here] for the complete file with above configurations 💯 ”.

Let’s set permissions for the directory 🛡

sudo chown -R proxy:proxy /usr/local/squid

STEP 4: Starting Squid

Before Running Squid for the first-time; ⚠️

As a one-time step, you need to initialize SSL database and cache directories.

sudo -u proxy -- /usr/local/squid/libexec/security_file_certgen -c -s /usr/local/squid/var/logs/ssl_db -M 20MB
sudo -u proxy -- /usr/local/squid/sbin/squid -z

Starting and Shutting Down Squid 🚀

The command below starts Squid as the proxy user. Please note that `-d 10` is optional; it also writes debugging output, up to the given level, to the terminal.

sudo -u proxy -- /usr/local/squid/sbin/squid -d 10

If everything was done as expected, you should see the below terminal output.

kid1| Done scanning /usr/local/squid/var/cache/squid dir (0 entries)
kid1| Finished rebuilding storage from disk.
kid1| 0 Entries scanned
kid1| 0 Invalid entries.
kid1| 0 With invalid flags.
kid1| 0 Objects loaded.
kid1| 0 Objects expired.
kid1| 0 Objects cancelled.
kid1| 0 Duplicate URLs purged.
kid1| 0 Swapfile clashes avoided.
kid1| Took 0.04 seconds ( 0.00 objects/sec).
kid1| Beginning Validation Procedure
kid1| Completed Validation Procedure
kid1| Validated 0 Entries
kid1| store_swap_size = 0.00 KB
kid1| storeLateRelease: released 0 objects

You can shutdown Squid gracefully with the below command;

sudo -u proxy -- /usr/local/squid/sbin/squid -k shutdown

You can also stop it immediately, without any waiting time, with the below command;

sudo -u proxy -- /usr/local/squid/sbin/squid -k interrupt

STEP 5: Forwarding Web Traffic through Squid

If required, you can set proxy settings globally in Linux/Ubuntu by setting the http_proxy and https_proxy environment variables. But it is highly recommended to configure each web-client tool (e.g. wget, curl, conda, pip) individually, for better control and to avoid blind request failures with HTTPS.

“📢 We can also set up the Squid proxy only for specific processes and let other processes interact with the Web directly…💻”.

It is possible to set up Squid for specific processes using the environment variables that the web-clients support. However, how far you can go with this is limited by the web-client tool being used and its support for such configuration.

Setting Process-Specific Environment Variables On Enabling Squid Proxy (Author Produced)
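If you do go the generic http_proxy/https_proxy route, you can still keep it scoped to a single command instead of the whole session. A minimal sketch follows, with a hypothetical wrapper name `via_proxy_env`. Note that the tool must still trust our CA, which is what the tool-specific sections below take care of:

```shell
# via_proxy_env: run one command with the proxy variables set only for it,
# leaving the calling shell (and every other process) untouched.
# Hypothetical wrapper name; address and port are the ones used in this guide.
via_proxy_env() {
  env http_proxy="http://127.0.0.1:3128" \
      https_proxy="http://127.0.0.1:3128" \
      "$@"
}

# Usage (the URL is a placeholder):
#   via_proxy_env wget https://example.org/some-artifact.tar.gz
```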

📌 Setting cURL to Use Squid Proxy

Open ~/.curlrc file and add the below configs.

proxy = 127.0.0.1:3128
cacert=/etc/ssl/certs/squid-self-signed.pem

Please note that we are not using the insecure option; instead, we point to our CA certificate to validate the dynamically generated certificates.

To enable Squid only for a specific process, move the above ~/.curlrc into a new directory called ~/.proxy_curlrc/.

mkdir ~/.proxy_curlrc
mv ~/.curlrc ~/.proxy_curlrc/.curlrc

Then set env variable as below before the process is spawned;

export CURL_HOME=~/.proxy_curlrc

Please refer to curl documentation for further details.

📌 Setting Wget to Use Squid Proxy

Open the ~/.wgetrc file and add the below configs.

use_proxy=yes
http_proxy=127.0.0.1:3128
https_proxy=127.0.0.1:3128
ca_certificate=/etc/ssl/certs/squid-self-signed.pem

Please note that we are not using the check_certificate=off option; instead, we point to our CA certificate to validate the dynamically generated certificates.

To enable Squid only for a specific process, move the above ~/.wgetrc to ~/.proxy_wgetrc. Then set the env variable as below before the process is spawned;

export WGETRC=~/.proxy_wgetrc

Please refer to wget documentation for further details.

📌 Setting Conda to Use Squid Proxy

Open ~/.condarc file and add the below configs.

channels:
  - defaults
show_channel_urls: True
allow_other_channels: True
proxy_servers:
  http: http://127.0.0.1:3128
  https: http://127.0.0.1:3128
ssl_verify: /etc/ssl/certs/squid-self-signed.pem

Please note that we are not using SSL_VERIFY=false; instead, we point to our CA certificate to validate the dynamically generated certificates.

To enable Squid only for a specific process, move the above ~/.condarc to ~/.proxy_condarc. Then set the env variable as below before the process is spawned;

export CONDARC=~/.proxy_condarc

Please refer to conda documentation for further details.

📌 Setting pip to Use Squid Proxy

You can configure pip to use the caching server as below.

Open the ~/.config/pip/pip.conf file and add the below configs.

[global]
proxy = 127.0.0.1:3128
cert = /etc/ssl/certs/squid-self-signed.pem

To enable Squid only for a specific process, move the above ~/.config/pip/pip.conf to ~/.proxy_pip.conf. Then set the env variable as below before the process is spawned;

export PIP_CONFIG_FILE=~/.proxy_pip.conf

Please refer to pip documentation for further details.
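With four tools and four variables, it is easy to forget one. The sketch below prints an export line for each proxy-enabled config file that actually exists, using the file locations from this section. The helper name `proxy_exports` is hypothetical, and it assumes the paths contain no spaces:

```shell
# proxy_exports: print an export line for every per-tool proxy config found
# under the given directory (default: $HOME). Hypothetical helper name;
# file names are the ones chosen in this section.
proxy_exports() {
  base=${1:-$HOME}
  [ -f "$base/.proxy_curlrc/.curlrc" ] && echo "export CURL_HOME=$base/.proxy_curlrc"
  [ -f "$base/.proxy_wgetrc" ] && echo "export WGETRC=$base/.proxy_wgetrc"
  [ -f "$base/.proxy_condarc" ] && echo "export CONDARC=$base/.proxy_condarc"
  [ -f "$base/.proxy_pip.conf" ] && echo "export PIP_CONFIG_FILE=$base/.proxy_pip.conf"
  return 0
}

# Usage, in the shell that spawns the build process:
#   eval "$(proxy_exports)"
```

Tools whose config file is missing are simply left alone, so they keep talking to the Web directly.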

STEP 6: But Does It Work? — Testing Squid

Photo by Rob Wicks on Unsplash

No matter how hard we try, things won’t always work as expected. Many things could have gone wrong.

Let’s Check Whether Our Caching Proxy is Working as Expected 🕵

While traffic flows, you can watch the proxy’s access.log file. It records information about the traffic that passes through the Squid server.

sudo tail -f /usr/local/squid/var/logs/access.log

  • TCP_MEM_HIT/200 indicates that the cached file is read from the memory cache and served. That’s lightning fast, and your Squid just worked!
  • TCP_HIT/200 indicates the cached file is read from the disk and served. This is comparatively slower than MEM_HIT but still significantly faster than the round trip to the server.
  • TCP_MISS/200 indicates that no cached copy was found. A copy of the fetched file will be saved for later usage.
  • TCP_REFRESH_UNMODIFIED/200 means the cached copy was stale, so Squid sent an If-Modified-Since request to the external server, which answered that the file is unmodified. Thus, the client receives an HTTP 200 OK response served from the cache.
  • NONE for unclassified results.

Squid documentation provides the complete list of these tags.
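Watching the log scroll by is fine for a quick check, but a tally per result tag tells you more. Here is a small sketch, assuming Squid’s default native log format, where the fourth field is the result code and status (e.g. TCP_MISS/200). The function name `tally_results` is made up:

```shell
# tally_results: read access.log lines on stdin and print how many requests
# ended with each result tag (the part of field 4 before the slash).
# Assumes the default native access.log format. Hypothetical helper name.
tally_results() {
  awk '{ split($4, part, "/"); n[part[1]]++ }
       END { for (tag in n) printf "%s %d\n", tag, n[tag] }' | sort
}

# Usage:
#   sudo cat /usr/local/squid/var/logs/access.log | tally_results
```

A healthy cache shows the TCP_HIT and TCP_MEM_HIT counts growing as you repeat the same builds.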

“🗞 When referring [docs], you might need to decompose log entries into a series of questions.

eg. TCP_MEM_HIT/200
-> ‘TCP’ code?, ‘MEM’ code? ‘HIT’ code? ‘200’ status?” 💬
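That decomposition is mechanical enough to script. The sketch below splits one result field into its tags and the HTTP status using plain shell parameter expansion (the function name `decompose_result` is hypothetical):

```shell
# decompose_result: split a result field such as TCP_MEM_HIT/200 into one
# tag per line, followed by the HTTP status. Hypothetical helper name.
decompose_result() {
  tags=${1%%/*}     # part before the slash, e.g. TCP_MEM_HIT
  status=${1#*/}    # part after the slash, e.g. 200
  printf '%s\n' "$tags" | tr '_' '\n'
  printf 'status %s\n' "$status"
}

decompose_result TCP_MEM_HIT/200
# prints:
#   TCP
#   MEM
#   HIT
#   status 200
```

Each printed line is then one question to look up in the docs.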

Let’s Check whether proxy configs are parsing 🚥

If you can’t see any logs in access.log, chances are high that Squid failed to start. You can check whether Squid is running with the below command;

ps aux | grep squid

Another useful command parses the config file, as below; this will let you know if there are any configuration-related issues.

sudo -u proxy -- /usr/local/squid/sbin/squid -k parse
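One small gotcha with the process check: a plain `grep squid` can match its own entry in the process list, so you may see a hit even when Squid is down. A common workaround, sketched here with a hypothetical helper name, is a bracket pattern that matches "squid" but not the grep command line itself:

```shell
# squid_processes: filter `ps aux` output on stdin down to lines mentioning
# squid, without matching the grep process that does the filtering.
# Hypothetical helper name. Other commands that mention a squid path
# (e.g. a running `tail` on access.log) will still show up.
squid_processes() {
  grep '[s]quid'
}

# Usage:
#   ps aux | squid_processes || echo "Squid does not seem to be running"
```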

STEP 7: How Much Did I Save? — Let’s generate a report

Photo by Shridhar Vashistha on Unsplash

Last but not least, we need insights into our savings in terms of bandwidth and response times. We can straight away generate a report out of access.log using the Calamaris tool.

Let’s Install Calamaris 📥

sudo apt-get install calamaris

Generate Report using Calamaris 📈

When you pipe the access.log output to calamaris, it generates an intuitive report with the speedups and bandwidth savings you got.

sudo cat /usr/local/squid/var/logs/access.log | calamaris

The command above will print an output like;

...
------------------------------------------ -------------- ----------
Proxy statistics
------------------------------------------ -------------- ----------
Total amount: requests 27504
Total Bandwidth: Byte 109126058K
Proxy efficiency (HIT [kB/sec] / DIRECT [kB/sec]): factor 27.71
Average speed increase: % 561.46
------------------------------------------ -------------- ----------
Cache statistics
------------------------------------------ -------------- ----------
Total amount cached: requests 10133
Request hit rate: % 36.84
Bandwidth savings: Byte 96096551K
Bandwidth savings in Percent (Byte hit rate): % 88.06
------------------------------------------ -------------- ----------
...

Whoa, isn’t it great? 96GB bandwidth savings with 5x average speedup.
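If you want to sanity-check the percentages in such a report, they are just ratios of the totals printed next to them. A tiny sketch using the numbers from the output above (the helper name `percent` is made up):

```shell
# percent: print PART as a percentage of TOTAL, rounded to two decimals.
percent() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

percent 96096551 109126058   # byte hit rate: bandwidth savings / total bandwidth -> 88.06
percent 10133 27504          # request hit rate: cached requests / total requests -> 36.84
```

The gap between the two rates is the interesting part: only about a third of the requests were hits, but they were the big ones.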

Well…that’s not all they got, please refer here for a complete list of other Squid log analysis tools as well. 🔨

“Well, congratulations! We just learned how to save GBs by placing a local Caching Proxy. Enjoy your Day! ✌️”

Thanks for reading. If you enjoyed this article, feel free to hit that clap button 👏 a few times to help others find it.

Rasika Perera

Lead Software Developer @H2O.ai Ex-WSO2, Open-source Contributor, Blogger