Django Source Code Dissected (02) -- Caching

The code behind Django's @cache_page decorator and its cache middleware is quite interesting and worth reading.

1. The cache_page decorator

The source code:

```python
def cache_page(*args, **kwargs):
    if len(args) != 1 or callable(args[0]):
        raise TypeError("cache_page has a single mandatory positional argument: timeout")
    cache_timeout = args[0]  # 300
    cache_alias = kwargs.pop('cache', None)  # None
    key_prefix = kwargs.pop('key_prefix', None)  # None
    if kwargs:
        raise TypeError("cache_page has two optional keyword arguments: cache and key_prefix")
    return decorator_from_middleware_with_args(CacheMiddleware)(
        cache_timeout=cache_timeout, cache_alias=cache_alias, key_prefix=key_prefix
    )
```

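Personally I don't think the variadic positional parameter *args is a good fit here: whatever positional arguments are passed, only one of them, the timeout, is ever used, as cache_timeout = args[0] makes obvious. On top of that, the argument count has to be checked first, which feels rather clumsy. (The callable(args[0]) check does serve a purpose: it rejects a bare @cache_page written without a timeout, where the view function itself would arrive as the first argument.)
The rest just pulls the wanted options out of the kwargs dict; the important part is what gets returned.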
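For comparison, here is a minimal sketch of the same argument rules written with a Python 3 keyword-only signature. cache_page_options is a hypothetical name: it only validates and returns the options instead of building Django's decorator, so it runs without Django installed.

```python
def cache_page_options(timeout, *, cache=None, key_prefix=None):
    """Hypothetical stand-in for cache_page's argument handling.

    The bare '*' makes cache and key_prefix keyword-only, so Python itself
    rejects extra positional arguments and unknown keywords with TypeError.
    """
    if callable(timeout):
        # Mirrors cache_page's callable(args[0]) check: catches a bare
        # @cache_page, where the view function arrives as the first argument
        raise TypeError("cache_page has a single mandatory positional argument: timeout")
    return {"cache_timeout": timeout, "cache_alias": cache, "key_prefix": key_prefix}


print(cache_page_options(60 * 5))
```

Python 2, which older Django releases still supported, has no keyword-only parameters, which may be one reason the *args form was used.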

2. decorator_from_middleware_with_args

decorator_from_middleware_with_args doesn't do much on its own:

```python
def decorator_from_middleware_with_args(middleware_class):
    return make_middleware_decorator(middleware_class)
```

3. make_middleware_decorator

Next up is make_middleware_decorator, a genuinely fascinating function:

```python
def make_middleware_decorator(middleware_class):
    def _make_decorator(*m_args, **m_kwargs):
        middleware = middleware_class(*m_args, **m_kwargs)
        # m_kwargs = {'key_prefix': None, 'cache_timeout': 300, 'cache_alias': None}
        def _decorator(view_func):
            @wraps(view_func, assigned=available_attrs(view_func))
            def _wrapped_view(request, *args, **kwargs):
                if hasattr(middleware, 'process_request'):
                    result = middleware.process_request(request)
                    if result is not None:
                        return result
                if hasattr(middleware, 'process_view'):
                    result = middleware.process_view(request, view_func, args, kwargs)
                    if result is not None:
                        return result
                try:
                    response = view_func(request, *args, **kwargs)
                except Exception as e:
                    if hasattr(middleware, 'process_exception'):
                        result = middleware.process_exception(request, e)
                        if result is not None:
                            return result
                    raise
                if hasattr(response, 'render') and callable(response.render):
                    if hasattr(middleware, 'process_template_response'):
                        response = middleware.process_template_response(request, response)
                    if hasattr(middleware, 'process_response'):
                        def callback(response):
                            return middleware.process_response(request, response)
                        response.add_post_render_callback(callback)
                else:
                    if hasattr(middleware, 'process_response'):
                        return middleware.process_response(request, response)
                return response
            return _wrapped_view
        return _decorator
    return _make_decorator
```

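At first glance this "giant closure" is hard to make sense of, so I stepped through it with a debugger at runtime.
Let's peel it one layer at a time. First comes the call decorator_from_middleware_with_args(CacheMiddleware), which returns make_middleware_decorator(middleware_class). Executing make_middleware_decorator returns the function object _make_decorator, so decorator_from_middleware_with_args(CacheMiddleware) evaluates to _make_decorator.
That function is then called: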

```python
_make_decorator(cache_timeout=cache_timeout, cache_alias=cache_alias, key_prefix=key_prefix)
```

_make_decorator looks like this:

```python
def _make_decorator(*m_args, **m_kwargs):
    middleware = middleware_class(*m_args, **m_kwargs)
    # m_kwargs = {'key_prefix': None, 'cache_timeout': 300, 'cache_alias': None}
    def _decorator(view_func):
        ...
        return _wrapped_view
    return _decorator
```

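It instantiates middleware_class, which here is CacheMiddleware; m_args is an empty tuple and m_kwargs holds the keyword arguments passed to _make_decorator. The function finally returns the function object _decorator.
So in our own view code: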

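```python
import json

from django.http import HttpResponse, HttpResponseNotAllowed
from django.views.decorators.cache import cache_page

response_dict = {
    "code": 0,
    "msg": "success",
    "data": [
        {
            "title": "Testing the cache code"
        },
    ]
}


@cache_page(60 * 5)
def test_cache(request):
    if request.method == "GET":
        return HttpResponse(json.dumps(response_dict))
    # A view must always return a response; reject other methods explicitly
    return HttpResponseNotAllowed(["GET"])

# test_cache = _decorator(test_cache)
```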

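Next, _decorator(test_cache) runs and yields the function object _wrapped_view. The @wraps decorator preserves the original view function's metadata (its __name__, __doc__ and so on).
At this point the decorator's job is done: decoration is complete and all that remains is handling requests. The test_cache view has effectively become _wrapped_view.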
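To see what @wraps keeps, here is a standard-library-only sketch. log_calls is a hypothetical decorator that wraps a view the same way _decorator does, using plain functools.wraps with its default attribute list rather than Django's available_attrs helper.

```python
import functools


def log_calls(view_func):
    """Hypothetical decorator; wraps a view the way _decorator does."""
    @functools.wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        return view_func(request, *args, **kwargs)
    return _wrapped_view


@log_calls
def test_cache(request):
    """Original view docstring."""
    return "response for %s" % request


print(test_cache.__name__)     # test_cache, not _wrapped_view
print(test_cache.__doc__)      # the original docstring survives
print(test_cache.__wrapped__)  # the undecorated function is still reachable
```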

django.core.handlers.base.BaseHandler has a private method, _get_response(self, request), which contains:

```python
resolver_match = resolver.resolve(request.path_info)
callback, callback_args, callback_kwargs = resolver_match
request.resolver_match = resolver_match

if response is None:
    wrapped_callback = self.make_view_atomic(callback)
    try:
        # The view function runs here, receiving the request object plus the URL arguments
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    except Exception as e:
        response = self.process_exception_by_middleware(e, request)
```

But by now test_cache is no longer the test_cache it used to be: it has changed, it has swollen up, and it is now _wrapped_view. Executing it:

```python
def _wrapped_view(request, *args, **kwargs):
    if hasattr(middleware, 'process_request'):
        # Run process_request
        result = middleware.process_request(request)
        if result is not None:
            return result
    if hasattr(middleware, 'process_view'):
        # Run process_view
        result = middleware.process_view(request, view_func, args, kwargs)
        if result is not None:
            return result
    try:
        # Run the view function
        response = view_func(request, *args, **kwargs)
    except Exception as e:
        if hasattr(middleware, 'process_exception'):
            result = middleware.process_exception(request, e)
            if result is not None:
                return result
        raise
    if hasattr(response, 'render') and callable(response.render):
        ...  # This branch is mainly about templates and is out of scope here
    else:
        if hasattr(middleware, 'process_response'):
            return middleware.process_response(request, response)
    return response
```
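The control flow of _wrapped_view can be reproduced without Django. The sketch below is a trimmed, hypothetical copy of make_middleware_decorator (only process_request and process_response; no process_view, exception or template handling) driving a toy dict-backed middleware. ToyCacheMiddleware and make_toy_middleware_decorator are illustrative names, not Django APIs.

```python
import functools


def make_toy_middleware_decorator(middleware_class):
    """Hypothetical, trimmed copy of make_middleware_decorator."""
    def _make_decorator(*m_args, **m_kwargs):
        middleware = middleware_class(*m_args, **m_kwargs)  # built once, at decoration time

        def _decorator(view_func):
            @functools.wraps(view_func)
            def _wrapped_view(request, *args, **kwargs):
                if hasattr(middleware, "process_request"):
                    result = middleware.process_request(request)
                    if result is not None:
                        return result  # short-circuit: the view never runs
                response = view_func(request, *args, **kwargs)
                if hasattr(middleware, "process_response"):
                    return middleware.process_response(request, response)
                return response
            return _wrapped_view
        return _decorator
    return _make_decorator


class ToyCacheMiddleware:
    """Hypothetical middleware: a dict keyed by the request string."""
    def __init__(self, cache_timeout=None):
        self.cache_timeout = cache_timeout  # stored only, never enforced here
        self.store = {}

    def process_request(self, request):
        return self.store.get(request)  # None on a miss

    def process_response(self, request, response):
        self.store[request] = response
        return response


calls = []
toy_cache_page = make_toy_middleware_decorator(ToyCacheMiddleware)


@toy_cache_page(cache_timeout=300)
def view(request):
    calls.append(request)
    return "page for " + request


print(view("/cache"), view("/cache"), calls)  # the second call is served from the dict
```

The middleware instance is created once per decoration inside _make_decorator, so every request to the decorated view shares it; that is why the dict survives between calls.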

Now for the CacheMiddleware code, which really does little more than assign some instance attributes:

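```python
class CacheMiddleware(UpdateCacheMiddleware, FetchFromCacheMiddleware):
    def __init__(self, get_response=None, cache_timeout=None, **kwargs):
        self.get_response = get_response
        # Simplified: cache_page always passes both keys, and None falls back to
        # '' for key_prefix and to DEFAULT_CACHE_ALIAS ('default') for cache_alias
        self.key_prefix = kwargs['key_prefix'] or ''
        self.cache_alias = kwargs['cache_alias'] or DEFAULT_CACHE_ALIAS
        self.cache_timeout = cache_timeout
        self.cache = caches[self.cache_alias]
```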
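How one class ends up with both the fetching and the updating behavior can be shown with stand-in classes. The method bodies below are hypothetical placeholders; only the class names and the inheritance order mirror Django's cache middleware.

```python
class MiddlewareMixin:
    """Stand-in for Django's MiddlewareMixin (hypothetical, heavily reduced)."""
    def __init__(self, get_response=None):
        self.get_response = get_response


class FetchFromCacheMiddleware(MiddlewareMixin):
    def process_request(self, request):
        return "fetch side handles the request"


class UpdateCacheMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        return "update side handles the response"


class CacheMiddleware(UpdateCacheMiddleware, FetchFromCacheMiddleware):
    pass


# Method resolution order: each hook is found on exactly one parent
print([cls.__name__ for cls in CacheMiddleware.__mro__])
```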

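The CacheMiddleware code above is simplified; in short, it just assigns attributes. The parts that really matter are UpdateCacheMiddleware and FetchFromCacheMiddleware.
Both inherit from MiddlewareMixin, which is nothing special and just defines a few basic methods.
The UML diagram:
[Figure: UML class diagram of the cache middleware classes]
It shows clearly that Fetch handles the request and Update handles the response. Now compare this with the caching strategy given in the official documentation: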

```
given a URL, try finding that page in the cache
if the page is in the cache:
    return the cached page
else:
    generate the page
    save the generated page in the cache (for next time)
    return the generated page
```

The code follows this strategy exactly.
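The documented strategy maps onto a few lines of Python. This is a sketch under stated assumptions: an in-process dict stands in for the cache backend, expiry is checked by hand (Django leaves that to the backend's timeout), and get_page, generate_page and PAGE_CACHE are hypothetical names.

```python
import time

PAGE_CACHE = {}  # hypothetical in-process cache: url -> (expires_at, page)


def get_page(url, generate_page, timeout=300, now=time.monotonic):
    """Sketch of the documented strategy; generate_page is any callable url -> page."""
    entry = PAGE_CACHE.get(url)
    if entry is not None and entry[0] > now():
        return entry[1]                        # page is in the cache: return it
    page = generate_page(url)                  # otherwise generate the page,
    PAGE_CACHE[url] = (now() + timeout, page)  # save it for next time,
    return page                                # and return it
```

The now parameter exists only so the clock can be replaced in a quick experiment; a real backend tracks expiry itself.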

  • process_request
```python
def process_request(self, request):
    # Requests that modify data (POST, PUT, PATCH, DELETE, ...) are not cached;
    # they go straight through to the view function
    if request.method not in ('GET', 'HEAD'):
        request._cache_update_cache = False
        return None  # Don't bother checking the cache.

    # try and get the cached GET response
    # The cache key is obtained here. Note that get_cache_key itself reads the
    # cache, roughly like this:
    #     headerlist = cache.get(cache_key)
    #     if headerlist is not None:
    #         return _generate_cache_key(request, method, headerlist, key_prefix)
    #     else:
    #         return None
    # i.e. it looks up the stored header list and returns None if there is none
    cache_key = get_cache_key(request, self.key_prefix, 'GET', cache=self.cache)
    if cache_key is None:
        # On the first request cache_key is None, so the flag is set to True;
        # process_request ends here and control returns to _wrapped_view
        request._cache_update_cache = True
        return None  # No cache information available, need to rebuild.
    response = self.cache.get(cache_key)
    # if it wasn't found and we are looking for a HEAD, try looking just for that
    if response is None and request.method == 'HEAD':
        cache_key = get_cache_key(request, self.key_prefix, 'HEAD', cache=self.cache)
        response = self.cache.get(cache_key)

    if response is None:
        request._cache_update_cache = True
        return None  # No cache information available, need to rebuild.

    # hit, return cached response
    request._cache_update_cache = False
    return response
```

As the UML diagram shows, the middleware has no process_view method, so the view function runs directly and produces a response, which is then handled by process_response.
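The _cache_update_cache flag is how the two halves talk to each other: process_response only stores when process_request recorded a miss (Django checks this in UpdateCacheMiddleware._should_update_cache). The sketch below reduces both hooks to plain functions over a dict keyed by URL; the names, the SimpleNamespace request and the missing header-key step are all simplifications.

```python
from types import SimpleNamespace  # used to build fake requests when trying it out

CACHE = {}  # hypothetical stand-in for self.cache, keyed by URL only


def process_request(request):
    """Simplified fetch side: no header-key indirection, the URL is the key."""
    if request.method not in ("GET", "HEAD"):
        request._cache_update_cache = False
        return None
    response = CACHE.get(request.url)
    request._cache_update_cache = response is None  # miss: ask the update side to store
    return response


def process_response(request, response):
    """Simplified update side: store only when the fetch side asked for it."""
    if getattr(request, "_cache_update_cache", False):
        CACHE[request.url] = response
    return response


def handle(request, view):
    # Same order as _wrapped_view: request hook, then the view, then the response hook
    cached = process_request(request)
    if cached is not None:
        return cached
    return process_response(request, view(request))


page = handle(SimpleNamespace(method="GET", url="/cache"), lambda req: "page")
```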

  • process_response
```python
def process_response(self, request, response):
    """Sets the cache, if needed."""
    if not self._should_update_cache(request, response):
        # We don't need to update the cache, just return.
        return response
    # Streaming responses and status codes other than 200/304 are returned
    # as-is; they are not worth caching
    if response.streaming or response.status_code not in (200, 304):
        return response

    # Don't cache responses that set a user-specific (and maybe security
    # sensitive) cookie in response to a cookie-less request.
    if not request.COOKIES and response.cookies and has_vary_header(response, 'Cookie'):
        return response

    # Try to get the timeout from the "max-age" section of the "Cache-
    # Control" header before reverting to using the default cache_timeout
    # length.
    timeout = get_max_age(response)
    if timeout is None:
        timeout = self.cache_timeout
    elif timeout == 0:
        # max-age was set to 0, don't bother caching.
        return response
    patch_response_headers(response, timeout)
    if timeout and response.status_code == 200:
        cache_key = learn_cache_key(request, response, timeout, self.key_prefix, cache=self.cache)
        if hasattr(response, 'render') and callable(response.render):
            ...  # Template-related; skipped here
        else:
            # Store the entry: key, value, expiry time
            self.cache.set(cache_key, response, timeout)
    return response
```
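The chain of early returns in process_response boils down to one question: will this response be stored, and for how long? storable_timeout is a hypothetical helper condensing those checks; max_age stands in for the value get_max_age(response) would return, and the cookie-less-request check is left out.

```python
def storable_timeout(status_code, streaming, max_age, default_timeout):
    """Hypothetical helper condensing process_response's checks.

    Returns the timeout the page would be stored with, or None if it would
    not be stored. max_age is None when Cache-Control has no usable max-age.
    """
    if streaming or status_code not in (200, 304):
        return None                  # streaming body or uncacheable status
    timeout = default_timeout if max_age is None else max_age
    if timeout == 0:
        return None                  # max-age=0: don't bother caching
    if timeout and status_code == 200:
        return timeout               # only 200 responses are actually stored
    return None                      # e.g. 304: headers patched, nothing stored
```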

Our final cache_key:

```python
'views.decorators.cache.cache_page..GET.15e585e58b05970a7be785828893971e.d41d8cd98f00b204e9800998ecf8427e.en-us.UTC'
```

What do those two hex strings stand for?
Let's start with learn_cache_key:

```python
def learn_cache_key(request, response, cache_timeout=None, key_prefix=None, cache=None):
    # Initial setup omitted
    # The key part:
    cache_key = _generate_cache_header_key(key_prefix, request)
    if response.has_header('Vary'):
        ...  # This branch can be ignored for now
    else:
        # The header key is cached here too, just with an empty list as its value
        cache.set(cache_key, [], cache_timeout)
        # The key for the actual page data is generated here
        return _generate_cache_key(request, request.method, [], key_prefix)
```
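The two-level scheme (a header key that stores the header list, and a page key derived from that list) can be sketched with a dict. header_key, page_key, learn_key and get_key are hypothetical names; they follow the key formats shown in this article but drop the language/time-zone suffix and assume key_prefix is ''.

```python
import hashlib

PREFIX = ""  # cache_page's key_prefix=None ends up as ''


def header_key(url):
    # Same shape as _generate_cache_header_key, minus the i18n/time-zone suffix
    return "views.decorators.cache.cache_header.%s.%s" % (
        PREFIX, hashlib.md5(url.encode("utf-8")).hexdigest())


def page_key(method, url, headerlist, meta):
    # Same shape as _generate_cache_key, minus the suffix
    ctx = hashlib.md5()
    for header in headerlist:
        value = meta.get(header)
        if value is not None:
            ctx.update(value.encode("utf-8"))
    return "views.decorators.cache.cache_page.%s.%s.%s.%s" % (
        PREFIX, method, hashlib.md5(url.encode("utf-8")).hexdigest(), ctx.hexdigest())


def learn_key(cache, url, method, meta):
    """Store side (no Vary header): remember an empty header list, return the page key."""
    cache[header_key(url)] = []
    return page_key(method, url, [], meta)


def get_key(cache, url, method, meta):
    """Fetch side: None until learn_key has run for this URL."""
    headerlist = cache.get(header_key(url))
    if headerlist is None:
        return None
    return page_key(method, url, headerlist, meta)
```

This is why the very first request always misses: until a response has been stored, there is no header list to rebuild the page key from.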

  • _generate_cache_header_key
```python
def _generate_cache_header_key(key_prefix, request):
    """Returns a cache key for the header cache."""
    url = hashlib.md5(force_bytes(iri_to_uri(request.build_absolute_uri())))
    cache_key = 'views.decorators.cache.cache_header.%s.%s' % (
        key_prefix, url.hexdigest())
    return _i18n_cache_key_suffix(request, cache_key)
```

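iri_to_uri converts an IRI, which may contain non-ASCII Unicode characters, into a pure-ASCII URI by percent-encoding those characters as UTF-8: '/I ♥ Django/' becomes '/I%20%E2%99%A5%20Django/'. request.build_absolute_uri() returns the full request URL, here http://localhost:6060/cache.
At this point cache_key is: views.decorators.cache.cache_header..15e585e58b05970a7be785828893971e
The final cache_key is: 'views.decorators.cache.cache_header..15e585e58b05970a7be785828893971e.en-us.UTC'
A small tail has been appended: _i18n_cache_key_suffix adds the active language code (when internationalization is enabled) and the current time zone name (when USE_TZ is enabled). The double dot comes from the empty key_prefix.
Next, _generate_cache_key builds the cache_key for the page data. It works much like the header key, concatenating MD5 hex digests, and yields this cache_key:
views.decorators.cache.cache_page..GET.15e585e58b05970a7be785828893971e.d41d8cd98f00b204e9800998ecf8427e.en-us.UTC
d41d8cd98f00b204e9800998ecf8427e is hashlib.md5().hexdigest(), the MD5 of empty input: the header list is empty, so nothing is fed into that hash. The string before it is the MD5 of the URL.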
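The empty-input digest and the suffix are both easy to check. add_i18n_suffix is a hypothetical, simplified version of _i18n_cache_key_suffix that takes the language code and time zone name as arguments instead of reading them from the request and settings.

```python
import hashlib

# An MD5 object that never receives data hashes the empty input
print(hashlib.md5().hexdigest())     # d41d8cd98f00b204e9800998ecf8427e
print(hashlib.md5(b"").hexdigest())  # same value


def add_i18n_suffix(cache_key, language_code, tz_name, use_i18n=True, use_tz=True):
    """Hypothetical, simplified _i18n_cache_key_suffix."""
    if use_i18n:
        cache_key += ".%s" % language_code
    if use_tz:
        cache_key += ".%s" % tz_name.replace(" ", "_")  # keep the key free of spaces
    return cache_key


key = add_i18n_suffix(
    "views.decorators.cache.cache_header..15e585e58b05970a7be785828893971e", "en-us", "UTC")
print(key)
```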
And finally, in Redis:

```
127.0.0.1:6379[1]> keys *
1) ":1:views.decorators.cache.cache_header..15e585e58b05970a7be785828893971e.en-us.UTC"
2) ":1:views.decorators.cache.cache_page..GET.15e585e58b05970a7be785828893971e.d41d8cd98f00b204e9800998ecf8427e.en-us.UTC"
```

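Note that the [1] in the redis-cli prompt only means database 1 is selected. Which Redis database the cache uses is set by the LOCATION of the cache backend in the CACHES setting (presumably ending in /1 in this setup), not by a Django default. The :1: at the start of each key is something else: Django's default key function prefixes every key with KEY_PREFIX (empty by default) and the cache VERSION (1 by default).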
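A sketch of where the :1: comes from, assuming the django-redis backend configured as below (the LOCATION is an assumption, not taken from the original project). default_key_func reproduces the formula of Django's default key function; KEY_PREFIX defaults to '' and VERSION to 1.

```python
# Assumed settings (django-redis backend): the trailing /1 selects Redis database 1
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    }
}


def default_key_func(key, key_prefix, version):
    # Same formula as django.core.cache.backends.base.default_key_func
    return "%s:%s:%s" % (key_prefix, version, key)


# Empty KEY_PREFIX and VERSION 1 produce the leading ":1:"
print(default_key_func(
    "views.decorators.cache.cache_header..15e585e58b05970a7be785828893971e.en-us.UTC", "", 1))
```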
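On the next request, process_request gets a non-None cache_key, finds the cached response and returns it straight to the client.

That completes the simplest flow for storing and retrieving a cached page. The overall flow is clear, and the code involved is neither especially complex nor hard to follow.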


4. Session

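Take an e-commerce system (e-commerce really does make a handy example...). Users' order information is read often but modified rarely, so the Cache-Aside pattern can be used to update the cache and help keep the cached data consistent with the database.
In this scenario the cache has to be keyed per user, based on the user's Cookie or Token.
Imagine we had to build this ourselves: how would we do it?
Cookies and tokens both travel in the request headers, so we could generate a dedicated cache_key from the request's Cookie or Token header. On fetch, requests carrying different credentials would then get different cache entries, which separates public caches from private ones.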
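The do-it-yourself idea can be sketched in a few lines. private_cache_key is hypothetical: it hashes the URL together with whichever request headers are listed in vary_on, so requests carrying different cookies or tokens land on different keys, while requests without those headers share one entry.

```python
import hashlib


def private_cache_key(url, headers, vary_on=("Cookie",)):
    """Hypothetical per-user key: hash the URL plus the listed request headers.

    headers is a plain dict of request headers; a request without any of the
    listed headers gets the same key as the public (shared) entry.
    """
    ctx = hashlib.md5()
    for name in vary_on:
        value = headers.get(name)
        if value is not None:
            ctx.update(value.encode("utf-8"))
    url_hash = hashlib.md5(url.encode("utf-8")).hexdigest()
    return "page.%s.%s" % (url_hash, ctx.hexdigest())
```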
Django's implementation works in much the same way; the key function is patch_vary_headers.

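```python
import json

from django.http import HttpResponse, HttpResponseNotAllowed
from django.utils.cache import patch_vary_headers
from django.views.decorators.cache import cache_page

response_dict = {
    "code": 0,
    "msg": "success",
    "data": [
        {
            "title": "Testing the cache code"
        },
    ]
}


@cache_page(60 * 5)
def test_cache(request):
    if request.method == "GET":
        response = HttpResponse(json.dumps(response_dict))
        # Add "Vary: Cookie" so the cache key depends on the request's cookie
        patch_vary_headers(response, ["Cookie"])
        return response
    return HttpResponseNotAllowed(["GET"])
```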

Here is what that function does:

```python
# cc_delim_re = re.compile(r'\s*,\s*') is defined at the top of django/utils/cache.py
def patch_vary_headers(response, newheaders):
    if response.has_header('Vary'):
        vary_headers = cc_delim_re.split(response['Vary'])
    else:
        vary_headers = []
    existing_headers = set(header.lower() for header in vary_headers)
    additional_headers = [newheader for newheader in newheaders
                          if newheader.lower() not in existing_headers]
    response['Vary'] = ', '.join(vary_headers + additional_headers)
```

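Stepping through with a debugger shows that the default response has no Vary header. The newheaders argument is the ["Cookie"] list we passed in.
The join call then turns the list into a string, joining the items with ", ".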
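The merge logic does not depend on the response object, so it can be replayed on a plain dict. patch_vary is a hypothetical copy of patch_vary_headers that takes a dict of headers; note that a real HttpResponse looks headers up case-insensitively, which a dict does not.

```python
import re

cc_delim_re = re.compile(r"\s*,\s*")  # same pattern django.utils.cache uses


def patch_vary(headers, newheaders):
    """patch_vary_headers logic applied to a plain dict instead of a response."""
    vary_headers = cc_delim_re.split(headers["Vary"]) if "Vary" in headers else []
    existing = {h.lower() for h in vary_headers}
    additional = [h for h in newheaders if h.lower() not in existing]
    headers["Vary"] = ", ".join(vary_headers + additional)
    return headers


print(patch_vary({"Vary": "Accept-Encoding"}, ["Cookie"]))
```

Existing entries are compared case-insensitively, so adding "cookie" to a response that already varies on "Cookie" changes nothing.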
So where does the Vary header get used? From the analysis above, cache_key is generated in learn_cache_key, so let's look at that code again:

```python
# Inside learn_cache_key
if response.has_header('Vary'):
    is_accept_language_redundant = settings.USE_I18N or settings.USE_L10N
    headerlist = []
    for header in cc_delim_re.split(response['Vary']):
        header = header.upper().replace('-', '_')
        if header == 'ACCEPT_LANGUAGE' and is_accept_language_redundant:
            continue
        # Prefix with `HTTP_` to normalize the name into a request.META key
        headerlist.append('HTTP_' + header)
    headerlist.sort()  # ['HTTP_COOKIE']; only one entry in this example, so nothing to sort
    cache.set(cache_key, headerlist, cache_timeout)
    return _generate_cache_key(request, request.method, headerlist, key_prefix)
else:
    # if there is no Vary header, we still need a cache key
    # for the request.build_absolute_uri()
    cache.set(cache_key, [], cache_timeout)
    return _generate_cache_key(request, request.method, [], key_prefix)


# The _generate_cache_key function
def _generate_cache_key(request, method, headerlist, key_prefix):
    """Returns a cache key from the headers given in the header list."""
    ctx = hashlib.md5()
    for header in headerlist:
        # headerlist = ['HTTP_COOKIE']
        # Look up each listed header in request.META. This doubles as a check:
        # if the request doesn't carry the header at all, nothing is hashed and
        # the entry is effectively the shared (public) one.
        # In this example the value retrieved is the cookie.
        value = request.META.get(header)
        if value is not None:
            ctx.update(force_bytes(value))
    url = hashlib.md5(force_bytes(iri_to_uri(request.build_absolute_uri())))
    cache_key = 'views.decorators.cache.cache_page.%s.%s.%s.%s' % (
        key_prefix, method, url.hexdigest(), ctx.hexdigest())  # The cookie's MD5 becomes part of cache_key, which is what separates users
    return _i18n_cache_key_suffix(request, cache_key)
```
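The two steps above, turning Vary into a header list and hashing the matching request.META values, can be tried in isolation. vary_to_headerlist and header_digest are hypothetical names that reproduce that logic on plain strings and dicts; i18n_enabled stands in for settings.USE_I18N or settings.USE_L10N.

```python
import hashlib
import re

cc_delim_re = re.compile(r"\s*,\s*")


def vary_to_headerlist(vary, i18n_enabled=True):
    """The header-list normalization from learn_cache_key, on a plain string."""
    headerlist = []
    for header in cc_delim_re.split(vary):
        header = header.upper().replace("-", "_")
        if header == "ACCEPT_LANGUAGE" and i18n_enabled:
            continue  # the language is already part of the i18n key suffix
        headerlist.append("HTTP_" + header)
    return sorted(headerlist)


def header_digest(meta, headerlist):
    """The ctx part of _generate_cache_key: MD5 over the listed META values."""
    ctx = hashlib.md5()
    for header in headerlist:
        value = meta.get(header)
        if value is not None:
            ctx.update(value.encode("utf-8"))
    return ctx.hexdigest()
```

A request without a Cookie header hashes nothing and therefore gets d41d8cd98f00b204e9800998ecf8427e, the same digest as the shared entry; vary_to_headerlist("Authorization") gives ["HTTP_AUTHORIZATION"], which is what varying on a token-carrying header would produce.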

[Figure: response headers, now containing a Vary field]

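The response headers now carry an extra Vary field. Note that the browser does not send Vary back: when the response is cached, learn_cache_key stores the normalized header list (['HTTP_COOKIE']) under the header key. On the next request, process_request calls get_cache_key, which reads that list back and hashes the Cookie value of the new request. Since the store side and the fetch side both build the key with _generate_cache_key, the same cookie produces the same cache_key.
The code above also shows that the design extends nicely: with token authentication, vary on the header that carries the token (for example Authorization) instead of Cookie.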