Home Data Center Series CloudFlare Tutorial (VI) CF Cache Rules Function Introduction and Detailed Configuration Tutorial
This article was last updated 163 days ago. The information in it may have developed or changed. If it is invalid, please leave a message in the comment section.

Preface

In the completed CF series tutorials "One" to "Five", in addition to the foundation-building trilogy from "One" to "Three", in the subsequent "Four" and "Five" I introduced CF WAF and CF DDoS protection respectively, because I think these two functions are the two most valuable key functions in the CF traffic sequence and deserve my detailed introduction in a separate article.

However, since everyone's actual environment and needs are different, there are many other functions in the CF traffic sequence, which can be said to be valuable key functions for those who need them, such as Cache Rules.

In fact, the CDN function provided by Cache Rules should have been the most popular function of CF. However, due to the particularity of the domestic network environment, after users of the Free plan use CF's CDN, no matter the access address assigned to the visitor or the location where the back-to-source request is initiated, it is likely to be assigned to the CF data center in the United States (with the most in the San Jose data center in the western United States). This results in the visitor experience being likely to be worse than direct access without using CF CDN (after all, the round trip requires crossing the Pacific Ocean), so it is known as "negative optimization" for domestic access.

However, except for domestic visitors and a few similar special countries, the access experience in other parts of the world is still good, so it is still necessary to configure CF's CDN function (especially for those whose strategy is global rather than just domestic).

In addition, for some content (such as images in image hosting), the slowdown is not noticeable, so CF's Cache Rules is indeed one of the important and most valuable features.

Cache Rules

Introduction to Cache Rules

By default (when no cache rules are configured), CF only caches static resources with certain file extensions (the TTL is generally 2 hours). These static resources include but are not limited to:

  1. Image File:
    .jpg, .jpeg, .png, .gif, .ico, .svg, etc.
  2. Font Files:
    .woff, .woff2, .ttf, .otf, .eot, etc.
  3. CSS Files:
    .css
  4. JavaScript Files:
    .js
  5. Multimedia files:
    .mp3, .mp4, .avi, .mkv, .webm, .ogg, etc.
  6. Documentation Files:
    .pdf, .doc, etc.

This is very inconvenient. I may have other normal needs, such as:

  • I don't want to cache video files with .mkv extensions, because there are many video files of this format on my website and they are all very large. If I cache them and the number of visits is large, it is likely that I will be banned from CF.
  • I want to cache HTML files, but the default static resources do not include HTML
  • The default 2 hours is too short, I want to cache for a longer time
  • ……..

Custom requirements such as the above are too many and too common. In order to solve these normal and common custom requirements of users for cache, CF launched Cache Rules.
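To make the default behavior described above concrete, here is a minimal Python sketch that checks whether a URL would fall under the default extension-based caching. It only knows the extensions listed in this article (CF's real default list is longer), and the function name is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Only the extensions named in this article; Cloudflare's actual list is longer.
DEFAULT_CACHED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".ico", ".svg",
    ".woff", ".woff2", ".ttf", ".otf", ".eot",
    ".css", ".js",
    ".mp3", ".mp4", ".avi", ".mkv", ".webm", ".ogg",
    ".pdf", ".doc",
}


def cached_by_default(url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlsplit(url).path  # drops the query string
    return PurePosixPath(path).suffix.lower() in DEFAULT_CACHED_EXTENSIONS


print(cached_by_default("https://image.tangwudi.com/pics/logo.PNG"))
print(cached_by_default("https://blog.tangwudi.com/post.html"))
```

With this list, the .mkv and .html wishes above are exactly the two cases where the default is wrong in opposite directions: .mkv is cached although I do not want it to be, and .html is not cached although I do.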

CF's Cache Rules feature lets users customize cache behavior based on specific conditions and requirements. It controls what is stored in the CDN and browser caches and for how long, which improves the loading speed and cache efficiency of website content.

The main functions and features of Cache Rules are as follows:

  1. Flexible cache control: Users can set rules to define what content should be cached and what should not. For example, specific paths, request headers, query parameters, etc. can be specified to control caching behavior.
  2. Support for complex conditions: Cache Rules supports matching on multiple conditions such as URL path, HTTP header, query parameters, etc. Users can create complex caching strategies, such as caching specific file types or applying region-based caching strategies.
  3. Priority management: Users can set priorities for different cache rules to ensure that high-priority rules are applied first, which helps achieve more accurate cache management.
  4. Distinguishing between dynamic and static content: Users can set different caching strategies according to content type (static or dynamic) to optimize cache hit rate and server load.
  5. Real-time validation and debugging: Rule changes take effect in real time, and users can manage and debug them through Cloudflare's dashboard or API, making it easy to adjust and test caching strategies quickly.
  6. Integration with other Cloudflare services: Cache Rules is integrated with other Cloudflare services (such as WAF, Rate Limiting, Argo Smart Routing, etc.) to provide more comprehensive security and performance optimization.

Getting Started with Cache Rules

Cache Rules

A Cache Rules rule consists of two parts: incoming request matching and cache eligibility.

Incoming request matching

Allows users to set specific conditions for incoming requests to match, which serves as a precondition for subsequent caching.
Common matching conditions are as follows:

  1. URL Path
    • Matching based on the URL path, such as a specific folder or file extension.
    • Supports wildcard and regular expression matching.
  2. Hostname
    • Match based on the requested host name.
  3. Query Parameters
    • Matching based on the request's query parameters allows cache control for requests with specific parameters.
  4. HTTP Request Methods
    • Match based on HTTP request methods such as GET, POST, etc.
  5. HTTP Request Header
    • Match based on specific request headers, such as User-Agent, Accept-Language, etc.
  6. Cookie
    • Matches against specific cookies included in the request.

Other optional matching conditions include: referrer, SSL/HTTPS, user agent, X-Forwarded-For, etc.
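The matching conditions above correspond to fields in the rule expression language that appears later in this article. The following Python sketch maps the labels to field names as I understand them and joins two conditions with "and". The helper names (condition, all_of) are made up for illustration, and the field list is not exhaustive, so check it against the current CF documentation before relying on it.

```python
# Label used in this article -> field name in the rule expression language.
# Assumption: names follow Cloudflare's Rules language field reference.
FIELDS = {
    "URL path": "http.request.uri.path",
    "Hostname": "http.host",
    "Query string": "http.request.uri.query",
    "Request method": "http.request.method",
    "Cookie": "http.cookie",
    "Referer": "http.referer",
    "User agent": "http.user_agent",
    "X-Forwarded-For": "http.x_forwarded_for",
}


def condition(label: str, operator: str, value: str) -> str:
    """Render one comparison, e.g. http.host eq "example.com"."""
    return f'{FIELDS[label]} {operator} "{value}"'


def all_of(*conditions: str) -> str:
    """Combine conditions with 'and' and wrap them in parentheses."""
    return "(" + " and ".join(conditions) + ")"


example = all_of(
    condition("Hostname", "eq", "blog.tangwudi.com"),
    condition("URL path", "contains", "wp-admin"),
)
print(example)
```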

Cache Eligibility

For requests that match the incoming request conditions, choose whether to bypass the cache or make them eligible for caching. There are two options:

  1. Bypass cache
    • Tells CF not to cache the origin server's response for matching requests. This is usually used when the request is for dynamic content or for content that requires a login.
    • The only available setting is Browser TTL.
    image.png
  2. Eligible for cache
    • Lets CF cache the origin server's response for matching requests. This is usually used for requests for static content.
    • There are 6 settings, of which Edge TTL and Browser TTL are the most commonly used. This article uses these two settings in its examples:
    image.png

Note: Edge TTL is how long static content stays in CF's edge cache, and Browser TTL is how long it stays in the browser's local cache. Generally speaking, to keep the displayed page content consistent, the Browser TTL should be shorter than the Edge TTL.

Cache Rules configuration logic

In the Free plan, there are 10 Cache Rules, which is enough for most people. However, there are a few points to note in the configuration logic.

1. The smaller the rule number, the higher the priority. Therefore, it is necessary to sort out the logical order of cache rules and put the ones with higher priority in front. Generally, the ones that need to bypass the cache are in front, as shown in the following figure:

image.png

2. The static parts of a dynamic website can also be cached, but the dynamic parts must first be excluded by higher-priority "bypass cache" rules.
3. Cache Rules are processed relatively early in the traffic sequence. Therefore, if a request needs to be handled by a feature that comes later in the traffic sequence (such as Workers), make sure that the request hits a "bypass cache" rule rather than an "eligible for cache" rule.
4. If several subdomain websites are hosted under the same second-level domain and they contain both dynamic and static content, it is best to add the host name as a condition when creating cache rules, so that the cache logic stays clear.

Cache Rules Configuration Example

Static Website

Take my group site navigation page, www.tangwudi.com, as an example. Assume that the image hosting URL is image.tangwudi.com and that the API path is www.tangwudi.com/api*. Following the configuration logic of the cache rules described above, the rules should be set as follows:
1. API is dynamic content and needs to bypass cache

image.png

2. Cache the images in the image hosting service
image.png

image.png

Note: Edge TTL and Browser TTL are optional. The default is 2 hours. For static content such as images, whether to set a TTL and how long to make it depends on the type of website, mainly on how often the images are updated. On my group site navigation homepage the images basically never change, so the TTL can be as long as you like; for example, setting the edge TTL directly to 1 year is fine. The browser TTL could in theory also be set to 1 year, but that is unnecessary. To be on the safe side, 1 month is enough. The same applies to static personal-blog type websites.
3. Caching of other static content (mainly HTML)
image.png

The edge TTL and browser TTL are the same as for the image cache:
image.png

Final result:
image.png


Note 1: The TTL (edge TTL and browser TTL) of the image cache and HTML cache in this example are the same. In theory, the two rules can be combined into one. However, this is only due to the particularity of this example. Generally speaking, the possibility that the TTL of the image cache is the same as the TTL of the HTML cache is not great, so it is reasonable to use two "cache-eligible" rules separately. Different TTL cache content needs to be set with different rules. As for how many cache rules are ultimately needed, it depends on the sophistication of the website content in terms of cache control.
Note 2: Please replace the host names used in the rules (such as "blog.tangwudi.com") with your actual hostname.
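For readers who prefer to manage rules through the API instead of the dashboard, the three static-site rules above could be written down roughly as follows. This is only a sketch under assumptions: the expressions are my guess at what the screenshots contain, and the field names (set_cache_settings, cache, edge_ttl, browser_ttl, override_origin) follow my understanding of the Cloudflare Rulesets API for cache rules, so verify them against the current API documentation. The code only builds and prints the JSON; it does not call the API.

```python
import json

YEAR = 365 * 24 * 3600   # 1 year in seconds
MONTH = 30 * 24 * 3600   # 1 month (30 days) in seconds


def bypass_rule(description: str, expression: str) -> dict:
    """A rule that tells CF not to cache matching requests."""
    return {
        "description": description,
        "expression": expression,
        "action": "set_cache_settings",
        "action_parameters": {"cache": False},
    }


def cache_rule(description: str, expression: str,
               edge_ttl: int, browser_ttl: int) -> dict:
    """A rule that makes matching requests eligible for cache with fixed TTLs."""
    return {
        "description": description,
        "expression": expression,
        "action": "set_cache_settings",
        "action_parameters": {
            "cache": True,
            "edge_ttl": {"mode": "override_origin", "default": edge_ttl},
            "browser_ttl": {"mode": "override_origin", "default": browser_ttl},
        },
    }


# Same order as in the article; the expressions are assumed, not copied.
STATIC_SITE_RULES = [
    bypass_rule(
        "Bypass API",
        '(http.host eq "www.tangwudi.com" and '
        'starts_with(http.request.uri.path, "/api"))',
    ),
    cache_rule("Cache images", '(http.host eq "image.tangwudi.com")', YEAR, MONTH),
    cache_rule("Cache HTML", '(http.host eq "www.tangwudi.com")', YEAR, MONTH),
]

print(json.dumps({"rules": STATIC_SITE_RULES}, indent=2))
```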


Dynamic Website

My blog is built with WordPress, a typical representative of dynamic blogs, so I will use my blog, blog.tangwudi.com, as the example. In this example it is assumed that images are uploaded to WordPress's own media library and that no external image hosting is used. Because of how WordPress handles dynamic content, requests for specific URIs (paths starting with "/wp-admin", "/wp-login", "/wp-comment", etc.) and requests carrying specific cookie fields (wp-, logged_in, wordpress, comment_, woocommerce_) cannot be cached. To ensure accuracy, a layered rule-matching approach is therefore used. The recommended rules are as follows:

1. Bypass the cache for requests whose URI points to dynamic content (matched at the URI level)

Pay attention to the operators in the rule. For convenience, all of the URI conditions use the "contains" operator (you can also choose "starts with" and write, for example, "starts with" followed by "/wp-admin"). Because the path field does not include the query string, the search ("?s=") and preview ("preview=true") conditions are matched against the full URI (http.request.uri) rather than http.request.uri.path:
image.png

The expression is:

(http.host eq "blog.tangwudi.com" and http.request.uri.path contains "wp-admin") or (http.host eq "blog.tangwudi.com" and http.request.uri.path contains "wp-login") or (http.host eq "blog.tangwudi.com" and http.request.uri.path contains "wp-comment") or (http.host eq "blog.tangwudi.com" and http.request.uri contains "?s=") or (http.host eq "blog.tangwudi.com" and http.request.uri.path contains "xmlrpc") or (http.host eq "blog.tangwudi.com" and http.request.uri contains "preview=true")
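Long expressions like this are easy to break with a stray space when copied by hand. As a minimal sketch, the expression can be generated from a short list instead. The function name is made up for illustration; note that this sketch matches the two query-string fragments ("?s=" and "preview=true") on http.request.uri, on the assumption that http.request.uri.path does not contain the query string.

```python
HOST = "blog.tangwudi.com"

# (field, substring) pairs, in the same order as the rule in this article.
BYPASS_FRAGMENTS = [
    ("http.request.uri.path", "wp-admin"),
    ("http.request.uri.path", "wp-login"),
    ("http.request.uri.path", "wp-comment"),
    ("http.request.uri", "?s="),            # query string: match on full URI
    ("http.request.uri.path", "xmlrpc"),
    ("http.request.uri", "preview=true"),   # query string: match on full URI
]


def build_bypass_expression(host: str, fragments) -> str:
    """Join one '(host and field contains fragment)' clause per entry with 'or'."""
    clauses = [
        f'(http.host eq "{host}" and {field} contains "{needle}")'
        for field, needle in fragments
    ]
    return " or ".join(clauses)


expression = build_bypass_expression(HOST, BYPASS_FRAGMENTS)
print(expression)
```

To adapt the rule to another site, change HOST and edit the list, then paste the printed expression into the expression editor.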

2. Bypass the cache for requests to WordPress whose cookies contain specific strings

Note that the operator for "Cookie" in each condition is "contains":
image.png

The expression is:

(http.host eq "blog.tangwudi.com" and http.cookie contains "_logged_in_") or (http.host eq "blog.tangwudi.com" and http.cookie contains "wordpress") or (http.host eq "blog.tangwudi.com" and http.cookie contains "comment_") or (http.host eq "blog.tangwudi.com" and http.cookie contains "woocommerce_")
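To see which visitors this cookie rule would affect, the "contains" checks can be imitated locally. The sketch below treats "contains" as a plain case-sensitive substring test on the Cookie header, which is my assumption about how the operator behaves; the function name and the sample cookie strings are only for illustration.

```python
# The four substrings used in the cookie bypass rule above.
BYPASS_COOKIE_FRAGMENTS = ("_logged_in_", "wordpress", "comment_", "woocommerce_")


def cookie_triggers_bypass(cookie_header: str) -> bool:
    """Return True if any bypass substring occurs in the Cookie header."""
    return any(fragment in cookie_header for fragment in BYPASS_COOKIE_FRAGMENTS)


samples = [
    "wordpress_logged_in_abc123=admin%7C1700000000",   # logged-in user
    "wordpress_test_cookie=WP%20Cookie%20check",       # set by the login page
    "comment_author_abc123=visitor",                   # someone who commented
    "_ga=GA1.2.345.678",                               # analytics only
]
for sample in samples:
    print(cookie_triggers_bypass(sample), sample)
```

The second sample is worth noticing: the broad substring "wordpress" also matches the test cookie that WordPress sets when the login page is opened, so a browser that has merely visited the login page bypasses the cache as well.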

3. Bypass pages ending with .php

image.png

Note: In fact, the two rules above are sufficient to cover dynamic content. This one is only a safety net.

4. Cache image content

image.png

Note: For the settings of edge TTL and browser TTL in this part, refer to the description in the previous part of the article.

5. Cache HTML
image.png

The expression is:

(http.host eq "blog.tangwudi.com" and ends_with(http.request.uri.path, ".html") and not http.cookie contains "_logged_in_" and not http.cookie contains "wordpress" and not http.cookie contains "comment_" and not http.cookie contains "woocommerce_") or (http.host eq "blog.tangwudi.com" and ends_with(http.request.uri.path, ".htm") and not http.cookie contains "_logged_in_" and not http.cookie contains "wordpress" and not http.cookie contains "comment_" and not http.cookie contains "woocommerce_")

Note: For the settings of edge TTL and browser TTL in this part, refer to the description in the previous part of the article.
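The HTML expression repeats the host and cookie conditions in both halves. Logically it has the shape (H and A and N) or (H and B and N), which is the same as H and (A or B) and N. The short Python check below confirms that equivalence by brute force; the shorter expression string at the end is only a suggestion that I have not tested in the dashboard.

```python
from itertools import product


def original(host, html, htm, no_cookie):
    """Shape of the expression as written in the article."""
    return (host and html and no_cookie) or (host and htm and no_cookie)


def factored(host, html, htm, no_cookie):
    """Same logic with the shared conditions written once."""
    return host and (html or htm) and no_cookie


# Compare both forms for all 16 combinations of True/False.
equivalent = all(
    original(*values) == factored(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)

FACTORED_EXPRESSION = (
    'http.host eq "blog.tangwudi.com" '
    'and (ends_with(http.request.uri.path, ".html") '
    'or ends_with(http.request.uri.path, ".htm")) '
    'and not http.cookie contains "_logged_in_" '
    'and not http.cookie contains "wordpress" '
    'and not http.cookie contains "comment_" '
    'and not http.cookie contains "woocommerce_"'
)
```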

6. Cache other content

image.png

This is mainly for content other than HTML, such as css files.

Note: For the settings of edge TTL and browser TTL in this part, refer to the description in the previous part of the article.

The final configuration is as follows:
image.png

Note 1: In fact, of the 4th, 5th and 6th rules, you can just keep the 6th one and delete the 4th and 5th ones directly. The effect is the same. However, the reason for separating them is to reduce the risk and allow the edge TTL and browser TTL to set their own appropriate values according to different types of cache.

Note 2: The URI part and the cookie part of the bypass rules overlap. Since I don't know the internals of WordPress well enough, I wrote two rules to be safe, each with its own focus. There is certainly some redundancy inside the rules as well, for example "_logged_in_" and "wordpress" in the cookie rule. Some sources I read say the cookie is "wordpress_logged_in" and others say "wp_logged_in". I did not look into it further, so I simply included both patterns. At worst this is a little redundant, but it is safe.

Note 3: There are many types of websites built using WordPress. The dynamic content of different types of sites may have their own unique URI or unique cookie value, so you may need to modify the content of the URI part and the cookie part according to the characteristics of your own site.

Note 4: Although the above configuration takes WordPress as an example, the configuration ideas of other dynamic blogs are the same. Just modify them according to the actual situation.

Note 5: Please replace the host name "blog.tangwudi.com" with your actual hostname.


In addition to Cache Rules, there is another feature that can control CF's cache behavior: "Page Rules". In the traffic sequence, Page Rules are processed with a higher priority than Cache Rules:
image.png

However, it is generally not recommended to use "page rules" for caching for the following reasons:

1. Page rules only have 3 quotas (Free plan), which is too few. In addition to the caching function, page rules can also achieve many other functions, as follows:

image.png

image.png

image.png

Therefore, page rules should be used to implement more complex and compound requirements, or for temporary processing of some special traffic that requires higher priority. It would be a waste to only use them to implement functions that can be completed by Cache Rules (after all, Cache Rules has a limit of 10 rules, which is much more relaxed than the 3 rules of page rules).

2. The matching condition of the page rule can only use the URL, which is too simple:

image.png

By contrast, the incoming request matching in Cache Rules supports multiple conditions combined with "and" and "or".

3. The maximum edge TTL is one month, and the Cache Rules can be set to 1 year:
image.png

4. Another reason is uncertainty about how CF will treat this feature in the future. Around June, it was actually announced that Page Rules would be retired, but for reasons I don't know the notice was later withdrawn, and it is unclear what will happen next. So avoid Page Rules if you can. If you must use them, try to use them only for temporary handling, and otherwise rely on the features that already exist in the traffic sequence. That is the safest approach.


Other options in the Cache section

In the "Cache" section, there are several options that you can configure according to your needs.

Clear Cache

image.png

Clearing cache files forces CF to pull the latest versions of these files from your web server.

You can select "Custom Cleanup" to clear specific URLs (the Free plan can only clear resources whose URL matches exactly and does not support the * wildcard):

image.png

You can also choose to "clear everything":
image.png
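Both operations can also be done through the API. The sketch below builds (but does not send) the request, assuming the zone purge endpoint POST /zones/{zone_id}/purge_cache with a "files" list for specific URLs or "purge_everything" for the whole zone. The zone ID and API token are placeholders, and the function name is made up for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id: str, api_token: str, urls=None) -> urllib.request.Request:
    """Build a purge request: exact URLs if given, otherwise everything."""
    body = {"files": list(urls)} if urls else {"purge_everything": True}
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholders: replace with your own zone ID and API token.
request = build_purge_request(
    "YOUR_ZONE_ID", "YOUR_API_TOKEN",
    urls=["https://www.tangwudi.com/index.html"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```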

Cache Levels

image.png

The cache level determines how much of the website's static content CF should cache. CF's CDN caches static content according to the following levels:

Unless you have special requirements, just keep the default "Standard".

Browser TTL

image.png

This is the default browser TTL. It is used when a request does not match any cache rule that sets a browser TTL, or when the browser TTL set by the origin server is shorter than this value. Set it according to your needs.

Crawler Hints

image.png

After careful study, I found this feature to be quite interesting: when the content of a website hosted on CF changes, this feature will record these changes, and then provide prompts to search engines and other legitimate crawlers about the frequency and importance of content changes, helping crawlers to arrange crawling plans more intelligently, giving priority to crawling frequently changing or important content, ensuring that the information indexed by the search engine is the latest. Because of the improved crawling efficiency, unnecessary crawling of the source site is reduced, indirectly reducing the load on the source site server.

As an aside: I can't help feeling that if Blog Garden had used CF with this feature turned on, the Baidu spider would not have crawled it until it crashed.

This feature is turned off by default and is not dangerous, so you can turn it on and leave it on.

Always Online™

image.png

Simply put, when the source site fails, if the content of the link page accessed by the user is in the cache, CF will directly return the content to the user and display a message at the top of the page: There is a problem with the source site, this is just cached content.

The key problem is that this feature is only effective for static content that can be cached. Dynamic content and interactive features (such as user login, shopping cart, etc.) will not work properly when the source server is unavailable. So this feature is very useful for static websites, but it is better than nothing for dynamic websites (I have encountered problems with the WordPress source station. Some pages can be opened, but the layout and color do not look normal, so it is better than nothing).

This feature is not dangerous and can be left enabled.

Development Mode

image.png

When this mode is turned on, all requests bypass CF's cache and pass directly to the origin server. This feature is very useful when you need to see the changes you make instantly. Once enabled, if not turned off manually, development mode will last for three hours and then turn off automatically.

Tiered Cache

image.png

Simply put, if this function is not enabled, any CF data center in the world can directly initiate a back-to-origin request to the source server, even if the data center is far away from the source server.

After enabling this function, CF will divide the global data centers into two layers, upper and lower, and then use Argo performance and routing data to dynamically find a single best upper layer for the source server (the best upper layer is closest to the source server and theoretically has the highest back-to-source efficiency). Only the upper layer can initiate back-to-source requests to the source server, while the lower layer can only query the upper layer.

This feature is not dangerous and can be left on.

How to verify cache status

Verification of static sites

To verify the cache status of a general static website (taking www.tangwudi.com as an example), you can follow the following process to verify.

Open the Chrome browser (or Edge), press "F12" to enter the developer tools, click the "Network" tab, and select "All" for type:

image.png

Type www.tangwudi.com in the browser address bar and press Enter, then right-click the "Reload" button and select "Clear Cache and Hard Reload":
image.png

Select www.tangwudi.com in the name list on the left. In the "Headers" tab on the right, under "Response Headers", check the value of "cf-cache-status" to see whether the content was served from cache:
image.png
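The same check can be scripted instead of done in the browser. The sketch below reads the cf-cache-status response header and prints a short explanation. The one-line meanings are my own summary of the common values, not official wording, and the function names are made up for illustration. The network call is kept in its own function so that the header logic can be tried without touching a real site.

```python
import urllib.request

# Short, unofficial summaries of common cf-cache-status values.
STATUS_MEANING = {
    "HIT": "served from Cloudflare's cache",
    "MISS": "not in the cache yet, fetched from the origin",
    "DYNAMIC": "not treated as cacheable, fetched from the origin",
    "BYPASS": "caching was bypassed for this request",
}


def describe_cache_status(headers: dict) -> str:
    """Explain the cf-cache-status header of a response, if present."""
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("cf-cache-status")
    if status is None:
        return "no cf-cache-status header"
    meaning = STATUS_MEANING.get(status.upper(), "see the Cloudflare documentation")
    return f"{status}: {meaning}"


def fetch_cache_status(url: str) -> str:
    """Request the URL (network access required) and describe its cache status."""
    with urllib.request.urlopen(url) as response:
        return describe_cache_status(dict(response.headers.items()))


print(describe_cache_status({"CF-Cache-Status": "HIT"}))
# Example with a real request: print(fetch_cache_status("https://www.tangwudi.com/"))
```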

Dynamic site verification

Using my temporary test site blog.tangwudi.xyz as an example:
image.png

Note 1: In general, when Cache Rules (or Page Rules) are used to control CDN caching, cf-cache-status usually shows one of three results: HIT (the response was served from CF's cache), DYNAMIC (CF did not treat the response as cacheable and fetched it from the origin; this usually means that no specific Cache Rule is configured and the content is not covered by CF's default caching rules, or that after logging in to a dynamic website the keywords in the cookie matched a "bypass cache" rule), and BYPASS (a cache rule bypassed the cache for this address). The first request for a cacheable resource may also return MISS, which turns into HIT once the resource has been cached.

Note 2: The above verification method is only suitable for the verification of cache effects implemented by conventional cache strategies set by Cache Rules or page rules. It is not suitable for the verification of intelligent cache effects implemented by Workers or APO. The verification of intelligent cache effects will be discussed in the relevant tutorials later.

Note 3: For dynamic websites (such as WordPress), the regular caching effect achieved by using Cache Rules or page rules is similar to the "common people's shooting method" of Sakuragi Hanamichi in Slam Dunk. The effect is average and cannot take advantage of some advanced features of CF Enterprise Edition users. Therefore, it can only be used as a makeshift method. To achieve an effect similar to "Slam Dunk", it is necessary to rely on Workers or APO, which will be written in subsequent articles.

Afterword

I originally planned to cover "Cache Rules, Redirection, Page Rules" together in this article, because I thought there was not much to say about Cache Rules alone and it could be finished in a few sentences. However, the more I wrote, the more there was to write, so it ended up as a separate article.

But if you think about it carefully, this is as it should be: for most people, the three functions of WAF, DDoS and Cache Rules combined can solve the most common access and security problems. So, if the first, second and third steps of the tutorial are the "foundation-building" trilogy, then the fourth, fifth and sixth steps can be called the "golden elixir" trilogy. Knowing how to use them is enough to protect yourself when you venture into the world.

The content of this blog is original. Please indicate the source when reprinting! For more blog articles, see the Sitemap. The RSS address of the blog is https://blog.tangwudi.com/feed; you are welcome to subscribe. If necessary, you can join the Telegram Group to discuss problems together.

Comments

  1. 404
    Windows Chrome 131.0.0.0
    3 weeks ago
    2025-1-04 9:50:54

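    I think the cookie match could be changed to wordpress_sec, so that the blog owner can also see cached pages after logging out. With the settings above, once the blog owner has opened the login page, nothing is cached for them anymore.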

    • Owner
      404
      Macintosh Chrome 131.0.0.0
      3 weeks ago
      2025-1-04 14:46:58

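      I have not tested this in detail, but it may well be the case. When I was testing caching WordPress with cache rules, I did notice that access to the admin backend was not fast and sometimes had problems. I just did not pay much attention, because I normally manage WordPress by accessing it directly rather than through the CDN cache (I have always recommended separating the way a WordPress site is served to the public from the way it is managed day to day), which also allows stricter WAF policies. Thank you for the suggestion. I will optimize the cache rule strategy when I have time.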
