How I Saved Tons of GBs with HTTPs Caching
Simplest Guide for Squid SSL Bumping
“📢 Reader Alert❗If you are using an uncapped Internet package, probably this might not be your cup of tea ☕️…”
If you are still interested, hold my beer 🍺 I am going to tell you how to save your data (and of course money! 💰). Your savings are proportional to how repetitive your web behavior is.
In my day job, I am building a high-performance, scalable machine-learning platform. I love it to the fullest. But, as always, good things come with a price 💵. Building these products involves burning many GBs right in front of my eyes 🔥.
Living in a country with not-so-cheap data, I began exploring alternative ways to save data. So I ended up here, introducing a local HTTP cache, since I did not want to modify the code or bring ad-hoc hacks to the table 🍽.
And this is what it looks like 😎;
The build process uses curl, wget, conda and pip in different parts, and all I wanted was to reuse artifacts whenever possible across version changes. Yes, I know, some of these tools have their own caches as well. But when they hit the TCP/IP stack, it means they really want to fetch something from the Internet, at a data cost. That’s exactly where our own HTTP cache (aka Squid proxy) is going to sit.
“YES! You heard it correctly, We are using Squid 🐙”
We are using Squid since it is designed to act as a caching proxy for the web supporting HTTP, HTTPS, FTP, and more.
“Oops! Most web traffic from these tools is HTTPS, so our Squid in the middle needs to see what is being transferred 🙇
…This is why we need [SSL bumping] 💡”
When ssl-bump is enabled, Squid decrypts and re-encrypts the SSL traffic using a configurable CA certificate. The dynamically generated certificates mimic the original server certificates, which makes certificate failures easier to trace.
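To make that less magical, here is a rough, hand-rolled version of what Squid automates for every bumped host: a CA signs a certificate for the host name, and any client that trusts the CA accepts it. This is only a sketch of the idea. The file names, the "Demo Bump CA" subject and the example.org host are made up, and a real Squid-generated certificate copies far more from the original (for example the subject alternative names).

```shell
# Illustration only: do by hand what Squid automates during ssl-bump.
# All file names, subjects and the host name are made up for this demo.
demo_dir="$(mktemp -d)"

# Minimal OpenSSL config so the demo does not depend on /etc/ssl/openssl.cnf.
cat > "$demo_dir/demo.cnf" <<'EOF'
[ req ]
distinguished_name = dn
[ dn ]
commonName = Common Name
[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, cRLSign, keyCertSign
subjectKeyIdentifier = hash
EOF

# 1. The proxy owns a CA key pair (the "configurable CA certificate").
openssl req -new -newkey rsa:2048 -days 1 -nodes -x509 \
  -config "$demo_dir/demo.cnf" -extensions v3_ca -subj "/CN=Demo Bump CA" \
  -keyout "$demo_dir/ca.key" -out "$demo_dir/ca.crt" 2>/dev/null

# 2. For a bumped host, it needs a key and a request carrying the host name.
openssl req -new -newkey rsa:2048 -nodes \
  -config "$demo_dir/demo.cnf" -subj "/CN=example.org" \
  -keyout "$demo_dir/host.key" -out "$demo_dir/host.csr" 2>/dev/null

# 3. The CA signs the request, producing the certificate shown to the client.
openssl x509 -req -in "$demo_dir/host.csr" -days 1 \
  -CA "$demo_dir/ca.crt" -CAkey "$demo_dir/ca.key" -CAcreateserial \
  -out "$demo_dir/host.crt" 2>/dev/null

# 4. A client that trusts ca.crt accepts the freshly signed certificate.
verify_result="$(openssl verify -CAfile "$demo_dir/ca.crt" "$demo_dir/host.crt")"
echo "$verify_result"
```

This is also why the CA certificate has to be installed as trusted on the client side later on: without it, step 4 fails and every HTTPS request through the proxy ends in a certificate error.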
That’s enough! Let’s Check Out Our Steps… 👇
1. Compile and Install Squid
2. Configure OpenSSL
3. Configure Squid
4. Starting Squid
5. Forwarding Web Traffic through Squid
Bonus! 🎁
6. Does it work ? — Testing Squid
7. How much Did I Save? —Let’s generate a report
STEP 1: Compile and Install Squid
First, we need a source distribution! 📥
Download and extract Squid 5.1 source distribution.
wget http://www.squid-cache.org/Versions/v5/squid-5.1.tar.gz
tar -xvf squid-5.1.tar.gz
Make sure you have build essentials! 🔌
Run the following terminal commands to install the prerequisites.
sudo apt-get update
sudo apt-get install build-essential openssl libssl-dev pkg-config
Let’s compile and install Squid 🔧
Navigate into the extracted directory and run the following commands to compile Squid with dynamic certificate generation support.
cd squid-5.1
./configure --with-default-user=proxy --with-openssl --enable-ssl-crtd
make
sudo make install
STEP 2: Configure OpenSSL
Let’s Configure OpenSSL to generate certificates 📜
First, we need to add the keyUsage configuration to OpenSSL. Open the openssl.cnf file using the below terminal command.
sudo vim /etc/ssl/openssl.cnf
Then, add (or uncomment) the keyUsage configuration under the [ v3_ca ] block.
[ v3_ca ]
keyUsage = cRLSign, keyCertSign
Okay, Now we need a self-signed root CA certificate 🔐
Hold on, We need to create a temp folder to generate self-signed certificates.
mkdir /tmp/ssl_cert
cd /tmp/ssl_cert
Now, generate the following self-signed root CA certificate files.
openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -extensions v3_ca -keyout squid-self-signed.key -out squid-self-signed.crt
A sample input would look like this;
# Country Name (2 letter code) [AU]:LK
# State or Province Name (full name) [Some-State]:Western
# Locality Name (eg, city) []:Mt. Lavinia
# Organization Name (eg, company) [Internet Widgits Pty Ltd]:ABC
# Organizational Unit Name (eg, section) []:Eng Team
# Common Name (e.g. server FQDN or YOUR name) []: Rasika Perera
# Email Address []:info.rasika@gmail.com
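If you would rather not answer the prompts by hand (say, in a provisioning script), the same answers can be passed with -subj. The sketch below is one way to do that, not the only one. To stay self-contained it assumes a throwaway directory and its own small config file (ca.cnf is a made-up name) instead of /etc/ssl/openssl.cnf, and it leaves out the e-mail address.

```shell
# Sketch only: a throwaway directory and config keep /etc/ssl untouched.
ca_dir="$(mktemp -d)"
cat > "$ca_dir/ca.cnf" <<'EOF'
[ req ]
distinguished_name = dn
[ dn ]
commonName = Common Name
[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = cRLSign, keyCertSign
subjectKeyIdentifier = hash
EOF

# Same request as above, with the answers passed via -subj instead of prompts.
openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 \
  -config "$ca_dir/ca.cnf" -extensions v3_ca \
  -subj "/C=LK/ST=Western/L=Mt. Lavinia/O=ABC/OU=Eng Team/CN=Rasika Perera" \
  -keyout "$ca_dir/squid-self-signed.key" \
  -out "$ca_dir/squid-self-signed.crt" 2>/dev/null

# Confirm the subject and that the CA extensions made it into the certificate.
cert_text="$(openssl x509 -in "$ca_dir/squid-self-signed.crt" -noout -text)"
echo "$cert_text" | grep -E "Subject:|CA:TRUE|Certificate Sign"
```

Whichever way you generate it, the thing to look for in the output is CA:TRUE together with the Certificate Sign key usage, since Squid needs this certificate to sign others.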
Let’s convert the CRT into DER and PEM formats.
# Convert the cert into a trusted certificate in DER format.
openssl x509 -in squid-self-signed.crt -outform DER -out squid-self-signed.der
# Convert the cert into a trusted certificate in PEM format.
openssl x509 -in squid-self-signed.crt -outform PEM -out squid-self-signed.pem
# Generate the settings file for the Diffie-Hellman algorithm.
openssl dhparam -outform PEM -out squid-self-signed_dhparam.pem 2048
Alright! Let’s copy certificates into Squid 📁
sudo cp -rf /tmp/ssl_cert /usr/local/squid/etc/ssl_cert
Let’s add our CA cert as a trusted CA into the local machine.
sudo cp /usr/local/squid/etc/ssl_cert/squid-self-signed.pem /usr/local/share/ca-certificates/squid-self-signed.crt
# Update CA certificate cache
sudo update-ca-certificates
STEP 3: Configure Squid
Let’s configure Squid Configuration File 📜
sudo vim /usr/local/squid/etc/squid.conf
Add the following directives to the beginning of the file or before the first http_access directive.
acl intermediate_fetching transaction_initiator certificate-fetching
http_access allow intermediate_fetching
Add the line below after the acl Safe_ports port 777 directive.
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
Replace the http_port directive with the following. We are enabling ssl-bump on port 3128 with dynamic certificate generation using our self-signed CA. Further, we are configuring a certificate cache of 20MB (which can generally store ~5000 certificates).
http_port 3128 tcpkeepalive=60,30,3 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=20MB tls-cert=/usr/local/squid/etc/ssl_cert/squid-self-signed.crt tls-key=/usr/local/squid/etc/ssl_cert/squid-self-signed.key cipher=HIGH:MEDIUM:!LOW:!RC4:!SEED:!IDEA:!3DES:!MD5:!EXP:!PSK:!DSS options=NO_TLSv1,NO_SSLv3,SINGLE_DH_USE,SINGLE_ECDH_USE tls-dh=prime256v1:/usr/local/squid/etc/ssl_cert/squid-self-signed_dhparam.pem
sslcrtd_program /usr/local/squid/libexec/security_file_certgen -s /usr/local/squid/var/logs/ssl_db -M 20MB
sslcrtd_children 5
ssl_bump server-first all
ssl_bump stare all
sslproxy_cert_error deny all
Replace the commented-out cache_dir directive with the following. When unspecified, maximum_object_size defaults to 4MB and cache_mem to 256MB, and the commented-out cache_dir example is only 100MB. These are not ideal for caching bigger files, so tweak them as per your requirements. Please refer to the Squid memory docs for further details.
#cache_dir ufs /usr/local/squid/var/cache/squid 100 16 256
maximum_object_size 6 GB
cache_mem 8192 MB
cache_dir ufs /usr/local/squid/var/cache/squid 32000 16 256 # 32GB as Cache
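Before handing Squid 32GB of disk, it may be worth checking that the disk actually has that much free. Below is a minimal sketch, assuming a POSIX df. The helper names free_mb and has_room_for_cache, and the /tmp path, are hypothetical and only there for illustration.

```shell
# Hypothetical pre-flight check: is there room for the cache_dir size you chose?
free_mb() {  # usage: free_mb DIRECTORY -> prints free space in MB
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 }'
}

has_room_for_cache() {  # usage: has_room_for_cache DIRECTORY SIZE_MB
  [ "$(free_mb "$1")" -ge "$2" ]
}

cache_parent="/tmp"      # assumption: replace with the parent of your cache_dir
cache_size_mb=32000      # the first number after the path in cache_dir

if has_room_for_cache "$cache_parent" "$cache_size_mb"; then
  echo "enough free space for a ${cache_size_mb} MB cache"
else
  echo "only $(free_mb "$cache_parent") MB free; lower the cache_dir size"
fi
```

The same reasoning applies to cache_mem: 8192 MB only makes sense on a machine that can spare that much RAM.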
Add the following just before the catch-all (dot) refresh_pattern. Here we cache every URL pointing at jar, zip, wheel, gzip and bzip files, plus everything served from conda.anaconda.org. Please refer to the Squid docs for further details.
refresh_pattern -i \.(jar|zip|whl|gz|bz) 259200 20% 259200 ignore-reload ignore-no-store ignore-private override-expire
refresh_pattern -i conda\.anaconda\.org\/.* 259200 20% 259200 ignore-reload ignore-no-store ignore-private override-expire
refresh_pattern . 0 20% 4320
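Two details in these rules are easy to get wrong: a dot in the regex matches any character unless it is escaped, and the two numbers around the percentage are minutes. The sketch below previews both. It uses grep -Ei as a stand-in for Squid's case-insensitive matching, which is only an approximation, and the URLs and the url_matches helper are made up.

```shell
# Sketch: preview which URLs a refresh_pattern regex would select.
# grep -Ei only approximates Squid's case-insensitive regex matching.
escaped='\.(jar|zip|whl|gz|bz)'
unescaped='.(jar|zip|whl|gz|bz)'

url_matches() {  # usage: url_matches REGEX URL
  printf '%s\n' "$2" | grep -Eiq -- "$1"
}

for url in \
  "https://files.example.org/pkg/demo-1.0-py3-none-any.whl" \
  "https://example.org/docs/gzip-guide.html"
do
  for regex in "$escaped" "$unescaped"; do
    if url_matches "$regex" "$url"; then verdict="match"; else verdict="no match"; fi
    printf '%-22s %-9s %s\n' "$regex" "$verdict" "$url"
  done
done

# min and max are given in minutes: 259200 minutes expressed in days.
cache_days=$((259200 / 60 / 24))
echo "259200 minutes = ${cache_days} days"
```

With the unescaped dot, the HTML page also matches (any character followed by "gz"), so it would be cached for months along with the real artifacts. Escaping the dot keeps the rule focused on file extensions.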
“📢 That’s All ! Please refer [here] for the complete file with above configurations 💯 ”.
Let’s set permissions for the directory 🛡
sudo chown -R proxy:proxy /usr/local/squid
STEP 4: Starting Squid
Before Running Squid for the first-time; ⚠️
As a one-time step, you need to initialize SSL database and cache directories.
sudo -u proxy -- /usr/local/squid/libexec/security_file_certgen -c -s /usr/local/squid/var/logs/ssl_db -M 20MB
sudo -u proxy -- /usr/local/squid/sbin/squid -z
Starting and Shutting Down Squid 🚀
Below command starts squid as proxy user. Please note that `-d 10` is optional and it will set the log level for debugging purposes.
sudo -u proxy -- /usr/local/squid/sbin/squid -d 10
If everything was done as expected, you should see the below terminal output.
kid1| Done scanning /usr/local/squid/var/cache/squid dir (0 entries)
kid1| Finished rebuilding storage from disk.
kid1| 0 Entries scanned
kid1| 0 Invalid entries.
kid1| 0 With invalid flags.
kid1| 0 Objects loaded.
kid1| 0 Objects expired.
kid1| 0 Objects cancelled.
kid1| 0 Duplicate URLs purged.
kid1| 0 Swapfile clashes avoided.
kid1| Took 0.04 seconds ( 0.00 objects/sec).
kid1| Beginning Validation Procedure
kid1| Completed Validation Procedure
kid1| Validated 0 Entries
kid1| store_swap_size = 0.00 KB
kid1| storeLateRelease: released 0 objects
You can shutdown Squid gracefully with the below command;
sudo -u proxy -- /usr/local/squid/sbin/squid -k shutdown
You can also stop it immediately, without any waiting time, with the below command;
sudo -u proxy -- /usr/local/squid/sbin/squid -k interrupt
STEP 5: Forwarding Web Traffic through Squid
If required, you can set the proxy globally in Linux/Ubuntu by setting the http_proxy and https_proxy environment variables. However, it is highly recommended to configure each web-client tool (e.g. wget, curl, conda, pip) individually, for better control and to avoid blind request failures with HTTPS.
“📢 We can also set up the Squid proxy only for specific processes and let other processes interact with the Web directly…💻”.
It is possible to enable Squid for specific processes only, using the environment variables that the web clients understand. How far you can take this depends on the web-client tool being used and its support for such configuration.
📌 Setting cURL to Use Squid Proxy
Open ~/.curlrc file and add the below configs.
proxy = 127.0.0.1:3128
cacert=/etc/ssl/certs/squid-self-signed.pem
Please note that we are not using the insecure option; instead, we point curl at our CA certificate so it can validate the dynamically generated certificates.
To enable Squid only for a specific process, move above ~/.curlrc into a new directory called ~/.proxy_curlrc/.
mkdir ~/.proxy_curlrc
mv ~/.curlrc ~/.proxy_curlrc/.curlrc
Then set env variable as below before the process is spawned;
export CURL_HOME=~/.proxy_curlrc
Please refer to curl documentation for further details.
📌 Setting Wget to Use Squid Proxy
Open the ~/.wgetrc file and add the below configs.
use_proxy=yes
http_proxy=127.0.0.1:3128
https_proxy=127.0.0.1:3128
ca_certificate=/etc/ssl/certs/squid-self-signed.pem
Please note that we are not using check_certificate=off option, instead we point our CA certificate to validate dynamic certificates.
To enable Squid only for a specific process, move above ~/.wgetrc into ~/.proxy_wgetrc. Then set the env variable as below before the process is spawned;
export WGETRC=~/.proxy_wgetrc
Please refer to wget documentation for further details.
📌 Setting Conda to Use Squid Proxy
Open ~/.condarc file and add the below configs.
channels:
  - defaults
show_channel_urls: True
allow_other_channels: True
proxy_servers:
  http: http://127.0.0.1:3128
  https: http://127.0.0.1:3128
ssl_verify: /etc/ssl/certs/squid-self-signed.pem
Please note that we are not using ssl_verify: false; instead, we point conda at our CA certificate so it can validate the dynamically generated certificates.
To enable Squid only for a specific process, move above ~/.condarc into ~/.proxy_condarc. Then set the env variable as below before the process is spawned;
export CONDARC=~/.proxy_condarc
Please refer to conda documentation for further details.
📌 Setting pip to Use Squid Proxy
You can configure pip to use the caching server as well. Open the ~/.config/pip/pip.conf file and add the below configs.
[global]
proxy = 127.0.0.1:3128
cert = /etc/ssl/certs/squid-self-signed.pem
To enable Squid only for a specific process, move above ~/.config/pip/pip.conf into ~/.proxy_pip.conf. Then set the env variable as below before the process is spawned;
export PIP_CONFIG_FILE=~/.proxy_pip.conf
Please refer to pip documentation for further details.
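If you moved all four config files as described above, a tiny wrapper can switch the proxy on for a single command and leave the rest of your shell alone. This is a sketch under that assumption. The with_squid name is hypothetical, and the paths are simply the ones suggested in the previous sections.

```shell
# Hypothetical helper: run one command with the proxy-specific config files.
# The four paths are the ones suggested in the sections above.
with_squid() {
  env \
    CURL_HOME="$HOME/.proxy_curlrc" \
    WGETRC="$HOME/.proxy_wgetrc" \
    CONDARC="$HOME/.proxy_condarc" \
    PIP_CONFIG_FILE="$HOME/.proxy_pip.conf" \
    "$@"
}

# Only the child process started by the wrapper sees the variables.
inside="$(with_squid sh -c 'printf %s "$WGETRC"')"
echo "inside the wrapper: $inside"
echo "in this shell:      ${WGETRC:-<unset>}"
```

After that, something like `with_squid pip install requests` goes through Squid, while a plain `pip install requests` in the same terminal talks to the Web directly.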
STEP 6: But Does It Work? — Testing Squid
No matter how hard we try, things won’t always work as expected. Many things could have gone wrong.
Let’s Check Whether Our Caching Proxy is Working as Expected 🕵
While traffic flows, you can watch the proxy’s access.log file. It records every request that passes through the Squid server.
sudo tail -f /usr/local/squid/var/logs/access.log
- TCP_MEM_HIT/200 indicates that the cached file is read from the memory cache and served. That’s lightning fast, and your squid just worked!
- TCP_HIT/200 indicates that the cached file was read from disk and served. This is slower than MEM_HIT but still significantly faster than a round trip to the server.
- TCP_MISS/200 indicates that no cached copy was found. A copy of the fetched file will be saved for later use.
- TCP_REFRESH_UNMODIFIED/200 means an If-Modified-Since request was sent to the external server, which reported the object as unmodified. Thus, the client receives an HTTP 200 OK response.
- NONE for unclassified results.
Squid documentation provides the complete list of these tags.
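Staring at a scrolling log gets old quickly, so a small tally per result code can help. The sketch below assumes Squid's native access.log format, where the fourth field is the result code and status and the fifth is the reply size in bytes. The summarize_access_log helper and the sample lines are made up for illustration.

```shell
# Sketch: tally result codes and bytes from Squid's native access.log format,
# where field 4 is "RESULT_CODE/STATUS" and field 5 is the reply size in bytes.
summarize_access_log() {  # usage: summarize_access_log FILE...
  awk '{
         split($4, parts, "/")
         requests[parts[1]]++
         bytes[parts[1]] += $5
       }
       END {
         for (code in requests)
           printf "%s %d %d\n", code, requests[code], bytes[code]
       }' "$@" | sort
}

# Made-up sample lines, for illustration only.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
1633024800.100    950 127.0.0.1 TCP_MISS/200 20480 GET https://example.org/demo.whl - HIER_DIRECT/192.0.2.10 application/octet-stream
1633024805.200      4 127.0.0.1 TCP_HIT/200 20480 GET https://example.org/demo.whl - HIER_NONE/- application/octet-stream
1633024809.300      1 127.0.0.1 TCP_MEM_HIT/200 20480 GET https://example.org/demo.whl - HIER_NONE/- application/octet-stream
1633024812.400      2 127.0.0.1 TCP_MEM_HIT/200 1024 GET https://example.org/index.json - HIER_NONE/- application/json
EOF

summary="$(summarize_access_log "$sample_log")"
echo "$summary"   # columns: result code, requests, bytes
```

On a real setup you would point it at the log instead, for example `sudo cat /usr/local/squid/var/logs/access.log | summarize_access_log`. A growing share of HIT lines is the sign that the cache is earning its keep.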
“🗞 When referring [docs], you might need to decompose log entries into a series of questions.
eg. TCP_MEM_HIT/200
-> ‘TCP’ code?, ‘MEM’ code? ‘HIT’ code? ‘200’ status?” 💬
Let’s Check whether proxy configs are parsing 🚥
If you can’t see any entries in access.log, chances are high that Squid failed to start. You can check whether Squid is running with the below command;
ps aux | grep squid
Another useful command parses the config file, as below; it will let you know if there are any configuration issues.
sudo -u proxy -- /usr/local/squid/sbin/squid -k parse
STEP 7: How Much Did I Save? — Let’s generate a report
Last but not least, we need insights into our savings in terms of bandwidth and response times. We can generate a report straight from access.log using the Calamaris tool.
Let’s Install Calamaris 📥
sudo apt-get install calamaris
Generate Report using Calamaris 📈
When you pipe the access.log output to calamaris, it generates an intuitive report with speedups and bandwidth savings you got.
sudo cat /usr/local/squid/var/logs/access.log | calamaris
The command above will print an output like;
...
------------------------------------------ -------------- ----------
Proxy statistics
------------------------------------------ -------------- ----------
Total amount: requests 27504
Total Bandwidth: Byte 109126058K
Proxy efficiency (HIT [kB/sec] / DIRECT [kB/sec]): factor 27.71
Average speed increase: % 561.46
------------------------------------------ -------------- ----------
Cache statistics
------------------------------------------ -------------- ----------
Total amount cached: requests 10133
Request hit rate: % 36.84
Bandwidth savings: Byte 96096551K
Bandwidth savings in Percent (Byte hit rate): % 88.06
------------------------------------------ -------------- ----------
...
Whoa, isn’t it great? That is about 96GB of bandwidth saved, with a 5x average speedup.
Well… that’s not all there is. Please refer here for a complete list of other Squid log analysis tools as well. 🔨
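If you like to double-check such numbers, the two hit rates can be recomputed from the raw counters in the report. This is just arithmetic on the figures printed above, and the variable names are mine.

```shell
# Recompute the two hit rates from the counters printed in the report above.
total_requests=27504
cached_requests=10133
total_kbytes=109126058
saved_kbytes=96096551

request_hit_rate="$(LC_ALL=C awk -v hit="$cached_requests" -v all="$total_requests" \
  'BEGIN { printf "%.2f", hit / all * 100 }')"
byte_hit_rate="$(LC_ALL=C awk -v hit="$saved_kbytes" -v all="$total_kbytes" \
  'BEGIN { printf "%.2f", hit / all * 100 }')"

echo "request hit rate: ${request_hit_rate}%"
echo "byte hit rate:    ${byte_hit_rate}%"
```

Only about a third of the requests were served from the cache, yet they account for most of the bytes. That fits the workload: the cached objects are the big archives and wheels, while the misses are mostly small requests.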
“Well, congratulations! We just learned how to save GBs by placing a local Caching Proxy. Enjoy your Day! ✌️”