thundering herd best practices


Mateusz Zajakala
Hi everyone,

I'd like to get some insight into how I can configure and fine-tune ATS to keep it from flooding the origin server with requests on TCP_MISS, and to make sure I understand what I'm doing.

I hope this is the right place to ask :)

Case: we have origin server serving HLS video chunks + playlists. What this means for ATS is:
- we know that every request is cacheable
- expiry time for playlists is very short (10s), video chunks a little longer (this is set by origin)
- we know the size of objects (1-2MB per video file)
- we do all of our caching in RAM

We use ATS as a reverse proxy with the following records.config:
CONFIG proxy.config.http.cache.required_headers INT 0
- does this make ATS cache everything?
CONFIG proxy.config.cache.enable_read_while_writer INT 1
- we don't want to wait until a chunk has been served to one client; we want to serve clients in parallel
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
- according to the docs, these let a download finish filling the cache even if the client that initiated it disconnects
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
- this is KEY - we need collapsed forwarding!
CONFIG proxy.config.cache.ram_cache.size INT 20G
- put everything in RAM

All other settings are defaults. With these settings we are getting a respectable 99.1% hit ratio. However, there are cases when an increase in incoming requests causes ATS to flood the origin on TCP_MISS (the origin responds with 200, so If-Modified-Since is not part of the request).

Now, I would imagine that setting max_open_read_retries + open_read_retry_time would make all clients requesting a file (except the first) wait until the first one retrieves the headers, after which, thanks to enable_read_while_writer, they would be served the retrieved file. However, I'm seeing in squid.blog that sometimes, within 100 ms or more, there are multiple TCP_MISS entries and origin requests for the same file. I tried tweaking the open_read retry time and retry count, but without success.
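For what it's worth, the collapse behaviour I'm expecting can be sketched as a toy simulation (hypothetical code, not ATS internals; names like CollapsingCache are made up):

```python
import threading
import time

# Toy model of collapsed forwarding: the first client to miss becomes
# the "writer"; later clients retry the cache read
# (max_open_read_retries x open_read_retry_time) instead of going to
# origin themselves.

class CollapsingCache:
    def __init__(self, max_open_read_retries=5, open_read_retry_time=0.05):
        self.lock = threading.Lock()
        self.store = {}                 # url -> cached body
        self.writers = set()            # urls currently being fetched
        self.max_retries = max_open_read_retries
        self.retry_time = open_read_retry_time
        self.origin_requests = 0

    def _fetch_origin(self, url):
        with self.lock:
            self.origin_requests += 1
        time.sleep(0.02)                # simulated origin latency
        return "body-of-" + url

    def get(self, url):
        with self.lock:
            if url in self.store:
                return "TCP_HIT"
            is_writer = url not in self.writers
            if is_writer:
                self.writers.add(url)   # register as writer before fetching
        if is_writer:
            body = self._fetch_origin(url)
            with self.lock:
                self.store[url] = body
                self.writers.discard(url)
            return "TCP_MISS"
        # open-read retry loop: wait for the writer to fill the cache
        for _ in range(self.max_retries):
            time.sleep(self.retry_time)
            with self.lock:
                if url in self.store:
                    return "TCP_HIT"
        # retries exhausted: this request leaks to origin
        self._fetch_origin(url)
        return "TCP_MISS"

cache = CollapsingCache()
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("/seg1.ts")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.origin_requests, results.count("TCP_MISS"))  # 1 1
```

In this model, only when the retry budget (retries x retry time) is shorter than the origin fetch do extra requests leak, which is why it's puzzling that leaks happen even with a generous budget.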

Request serving time on TCP_MISS is usually less than 10ms. We have a good link to origin.

My goal would be to have "perfect" collapsed forwarding. I don't care about latency (I can make a client wait even 5 s if necessary), but I don't want to hit origin. Is this possible? Do I need to adjust the settings? Or is there some reason this cannot be achieved at high request rates?

I would greatly appreciate any suggestions!

Thanks
Mateusz

PS: We are using CentOS 6 + the official EPEL 6 ATS 3.0.4 (ancient!) on a 40-core, 64 GB RAM machine with 2x10 Gbps NICs. No observable load problems at >1K requests/s.
Re: thundering herd best practices

Sudheer Vinukonda
There's no way to completely avoid multiple concurrent requests to the origin, without using something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core

ATS 5.3.x+ supports an almost-SWR like solution with TS-3549. A complete SWR solution (in the core ATS) is planned to be implemented with [TS-3587] Support stale-while-revalidate in the core - ASF JIRA. There are a number of timers and other settings that are relevant to the issues you mentioned (e.g TS-3622).

If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin. I haven't used it myself (we have a more efficient internal version of the same plugin), but I've heard that the plugin doesn't work as desired.
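For reference, the experimental plugin is loaded via plugin.config, and the origin opts in per response with the RFC 5861 Cache-Control extension; the exact plugin name and option handling vary by build, so treat this as a sketch:

```
# plugin.config -- load the experimental plugin (name may vary by build)
stale_while_revalidate.so
```

The origin then needs to emit something like `Cache-Control: max-age=10, stale-while-revalidate=30` so the proxy can serve the stale copy while a single request revalidates in the background.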

(PS: you may need to be careful: with read-while-writer, we've seen requests taking longer than 60 seconds without the above optimizations, which is absolutely ridiculous for any kind of request, let alone the HLS use case.)

Thanks,

Sudheer









Re: thundering herd best practices

Miles Libbey
Thanks Sudheer-
I read through the comments in TS-3549, but I don't grok what we are supposed to do in ATS 5.3.x+ to get the almost-stale-while-revalidate behaviour configured. Seems like this would be a great place to document it -- HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation (and probably also any new options in records.config — Apache Traffic Server 6.0.0 documentation).


miles





Re: thundering herd best practices

Sudheer Vinukonda
I've updated the settings and the feature description in the relevant places. Also, it looks like these are available in 6.0.0 (and not in 5.3.x).




Thanks,

Sudheer







Re: thundering herd best practices

Mateusz Zajakala
Thanks for the explanation. While SWR does seem like a very useful feature, I don't think it can help in my specific case.

In HLS the only object that expires often is the playlist manifest, which is very small (hundreds of bytes). I don't think we have a problem with revalidation of those files. However, we sometimes see the origin flooded with requests for video segments (1-2 MB). These are never revalidations; according to squid.blog they are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -

As you can see, we have three consecutive requests for the same file. Each takes a short time to process and they are separated in time, yet all of them are TCP_MISS. With my settings I'd expect a TCP_MISS on the first retrieval and clean TCP_HITs afterwards. And this is how it usually works (even under high load); only once in a while do we see more requests getting through to origin. When this happens the origin slows down, processing time grows, more requests become TCP_MISS, and very soon we're killing the origin with enormous traffic.
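To quantify the leakage, the TCP_MISS entries can be counted per URL straight from the log; a quick sketch (field positions assume the squid log format shown above, with the result code in field 4 and the URL in field 7):

```shell
# For each URL, count how many requests actually went to origin (TCP_MISS).
# Any count > 1 for the same segment means collapsed forwarding leaked.
awk '$4 ~ /^TCP_MISS/ {print $7}' squid.blog | sort | uniq -c | sort -rn | head
```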

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something.

Thanks
Mat


Re: thundering herd best practices

Sudheer Vinukonda
You may want to read through the following:


"While some other HTTP proxies permit clients to begin reading the response immediately upon the proxy receiving data from the origin server, ATS does not begin allowing clients to read until after the complete HTTP response headers have been read and processed. This is a side-effect of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique to different client requests, ATS will not enable read-while-writer functionality until it has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't kick in until the response headers for an object have been received and validated. In a live-streaming scenario this leaves a window which, given the large number of concurrent requests, is large enough to leak more than a single request to the origin despite read-while-writer being enabled.

The open-read retry settings do help reduce this problem to a large extent by retrying the cache read. There's also a setting, proxy.config.http.cache.max_open_write_retries, that can be tuned to further improve the situation.



However, despite all the above tuning, we still noticed multiple requests leaking (although significantly fewer than without the tuning). Hence the need for the new feature, Open Write Fail Action. With this setting you can configure ATS to return a 502 error on a cache miss when there's an ongoing concurrent request for the same object. This lets the client (player) reattempt the request, by which time the original concurrent request will have filled the cache. With this feature we no longer see TCP_MISS more than once at any given instant for the same object.
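As a sketch, the tuning described above would look something like this in records.config (the exact value semantics of open_write_fail_action vary by ATS version, so check the docs for your release before copying):

```
# Retry acquiring the cache-write lock before treating the request as a miss
CONFIG proxy.config.http.cache.max_open_write_retries INT 5
# On open-write failure (another request is already filling the object),
# return an error to the client instead of contacting origin.
# Value semantics are version-dependent -- consult your release's docs.
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
```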

Let me know if you have more questions.


Thanks,

Sudheer









On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks for the explanation. While SWR does seem like a very useful feaure I don't think this can help in my specific case.

In HLS the only object that expires often is the playlist manifest with very small size (hundreds of bytes). I don't think we're having a problem with revalidation of these files. However sometimes we are seeing origin flooded with requests for video segments (1-2 MB). These are never revalidatoins, according to squid.blog these are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -

As you can see we have three following requests for the same file. Each of them takes a short time to process, they are separated in time, however all of them are TCP_MISS. With my setting I'd expect a TCP_MISS on the first retrieval, and then clean TCP_HITs. And this is how it usually works (even with high loads), only once in a while we see more requests getting through to origin. When this happens origin slows down, procesing time is longer, more requests are TCP_MISS and very soon we're killing origin with enormous traffic.

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something..

Thanks
Mat

On Fri, Jul 10, 2015 at 4:36 AM, Sudheer Vinukonda <[hidden email]> wrote:
I've updated the settings and the feature description in the relevant places. Also, it looks like these are available in 6.0.0 (and are not in 5.3.x).




Thanks,

Sudheer






On Thursday, July 9, 2015 10:44 AM, Miles Libbey <[hidden email]> wrote:


Thanks Sudheer-
I read through the comments in TS-3549, but I don't grok what we are supposed to do in ATS 5.3x+ to get the almost Stale While Revalidate configured. Seems like this would be a great place to modify -- HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation (and probably also need any new options in records.config — Apache Traffic Server 6.0.0 documentation
 
 
 
 
 
 
HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation
Fuzzy Revalidation¶ Traffic Server can be set to attempt to revalidate an object before it becomes stale in cache. records.config contains the settings:
Preview by Yahoo
 

 
 
 
 
 
 
records.config — Apache Traffic Server 6.0.0 documentation
records.config The records.config file (by default, located in /usr/local/etc/trafficserver/) is a list of configurable variables used by the Traffic Server software.
Preview by Yahoo
 

miles




On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <[hidden email]> wrote:


There's no way to completely avoid multiple concurrent requests to the origin, without using something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core

ATS 5.3.x+ supports an almost-SWR like solution with TS-3549. A complete SWR solution (in the core ATS) is planned to be implemented with [TS-3587] Support stale-while-revalidate in the core - ASF JIRA. There are a number of timers and other settings that are relevant to the issues you mentioned (e.g TS-3622).

If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin. I've not used it myself (we have an internal more efficient version of the same plugin) but, I've heard that, the plugin doesn't work as desired.

(PS: you may need to be careful, since with read-while-write, we've experienced requests taking longer than 60 sec+, without the above optimizations, which is absolutely ridiculous for any kind of request, let alone the HLS use case).

Thanks,

Sudheer








On Thursday, July 9, 2015 4:17 AM, Mateusz Zajakala <[hidden email]> wrote:


Hi everyone,

I'd like to get some insight into how I can configure and fine-tune ATS to eliminate flooding origin server with requests on TCP_MISS and to make sure I undestand what I'm doing.

I hope this is the right place to ask :)

Case: we have origin server serving HLS video chunks + playlists. What this means for ATS is:
- we know that exactly every request is cacheable
- expiry time for playlists is very short (10s), video chunks a little longer (this is set by origin)
- we know the size of objects (1-2MB per video file)
- we do all of our caching in RAM

We use ATS as reverse proxy with the following records config:
CONFIG proxy.config.http.cache.required_headers INT 0
- does this make ATS cache everything?
CONFIG proxy.config.cache.enable_read_while_writer INT 1
- we don't want to wait until chunk is served to one client, we want to serve them in parallel
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
- accoring to docs these allow download to cache to finish if client that initiated disconnects
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
- this is KEY - we need to have collapsed forwarding!
CONFIG proxy.config.cache.ram_cache.size INT 20G
- put everything in RAM

All others are defaults. Now with these settings we are getting a respectable 99,1% hit ratio. However there are cases when increasing the number of incoming requests to ATS causes it to flood origin on TCP_MISS (origin responds with 200, so if-modified-since is not part of the request).

Now, I would imagine that setting max_open_read_retries + open_read_retry_time would make ALL clients requesting a file (but the first one) wait until the first one retrieves headers and because of enable_read_while_writer they would then serve the retrieved file. However I'm seeing in squid.blog that sometimes during 100ms or more there are multiple TCP_MISS and origin server requests for the same file. I tried tweaking values of open_read timeout and retries but without sucess.

Request serving time on TCP_MISS is usually less than 10ms. We have a good link to origin.

My goal would be to have a "perfect" collapsed forwarding. I don't care about latency (I can make client wait even 5s if necessary), but I don't want to hit origin. Is this possible? Do I need to adjust the settings? Or is there some reason that this cannot be achieved on high number of requests?

I would greatly appreciate any suggestions!

Thanks
Mateusz

PS: We are using CentOS 6 + EPEL 6 official ATS 3.0.4 (ancient!) on a 40-core, 64 GB RAM machine with 2x10 Gbps NICs. No observable load problems at >1K requests/s.










Re: thundering herd best practises

Mateusz Zajakala
Thanks Sudheer!

However, I'm still not sure about what happens under the hood. Let's say we have 2 clients requesting a file for the first time.

1) client 1, TCP_MISS, go to origin
2) very soon after - client 2, TCP_MISS. Now, if 1) has already managed to get the headers, we can serve the file via read-while-writer. But if NOT, there should be an open-read retry, so we wait retries x timeout (I tried setting it as high as 20 x 200 ms). During this time 1) should finish downloading the file, or at least get the headers to enable read-while-writer.
3) same scenario as in 2) should apply to any other incoming client requests for the same file.
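The arithmetic behind step 2) is worth making explicit; my mental model of the retry loop (a sketch of the expected behaviour, not ATS internals) is roughly:

```python
def open_read_with_retries(headers_ready, max_retries=20, retry_ms=200):
    """Mental model of the expected open-read retry behaviour (a sketch,
    not ATS code). headers_ready: callable returning True once the first
    transaction has cached the response headers."""
    for _ in range(max_retries):
        if headers_ready():
            return "read-while-writer"  # serve from the in-flight copy
        # real ATS would re-schedule the read after retry_ms milliseconds
    return "origin"  # retry budget exhausted -> another origin fetch

# 20 retries x 200 ms gives a 4000 ms wait budget, which dwarfs the
# ~10 ms origin fetch time, so in theory no request after the first
# should ever reach the origin.
print(open_read_with_retries(lambda: True))   # read-while-writer
```

With a 4 s budget against a ~10 ms fetch, the waiting clients should essentially never fall through to the origin - which is exactly why the observed leakage is surprising.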

Is this not the expected behaviour? Maybe I'm missing something, but it seems that after one connection starts retrieving origin data, the others should not repeat it. However, under very high load I still see requests leaking to the origin, and I'm not sure exactly how that happens.

Could it happen because client 2 arrives after client 1, but before client 1 has managed to open its read session to the origin, so "open read" does not kick in? I have no idea how synchronization is done between multiple requests for the same file, but I imagine one of them has to be the first to start reading, and that this would be visible to the others trying to read (which would then be held back by open_read_retry)?




On Fri, Jul 10, 2015 at 5:12 PM, Sudheer Vinukonda <[hidden email]> wrote:
You may want to read through the below:


"While some other HTTP proxies permit clients to begin reading the response immediately upon the proxy receiving data from the origin server, ATS does not begin allowing clients to read until after the complete HTTP response headers have been read and processed. This is a side-effect of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique to different client requests, ATS will not enable read-while-writer functionality until it has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't kick in until the response headers for an object have been received and validated. For a live streaming scenario, this leaves a window that, given the large number of concurrent requests, is large enough to leak more than a single request to the origin despite read-while-writer being enabled.

The open read retry settings do help to reduce this problem to a large extent, by attempting to retry the read. There's also a setting <proxy.config.http.cache.max_open_write_retries> that can be tuned to further improve this situation.



However, despite all the above tuning, we still noticed multiple requests leaking (although significantly fewer than without the tuning). Hence the need for the new feature Open Write Fail Action. With this setting, you can configure ATS to return a 502 error on a cache miss when there's an ongoing concurrent request for the same object. This lets the client (player) retry the request, by which time the original concurrent request will have filled the cache. With this feature, we no longer see TCP_MISS more than once at any given instant for the same object.
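For reference, the feature described above is driven by a single records.config line along these lines (the exact value semantics are my reading of the 6.0 docs, so verify against your ATS version; 1 should be the "return error on cache miss" action):

```
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
```

Other values return a stale copy on revalidate, or combinations of the two; check the records.config documentation for the full list.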

Let me know if you have more questions.


Thanks,

Sudheer









On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks for the explanation. While SWR does seem like a very useful feature, I don't think it can help in my specific case.

In HLS the only object that expires often is the playlist manifest, which is very small (hundreds of bytes). I don't think we're having a problem with revalidation of those files. However, we sometimes see the origin flooded with requests for video segments (1-2 MB). These are never revalidations; according to squid.blog they are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -

As you can see, there are three consecutive requests for the same file. Each takes a short time to process and they are separated in time, yet all of them are TCP_MISS. With my settings I'd expect a TCP_MISS on the first retrieval, and then clean TCP_HITs. And this is how it usually works (even under high load); only once in a while do we see more requests getting through to the origin. When this happens the origin slows down, processing time grows, more requests become TCP_MISS, and very soon we're killing the origin with enormous traffic.
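A quick way to quantify this leakage from logs like the above: count TCP_MISS origin fetches per URL (a sketch assuming the default Squid log format shown above, where the fourth field is the result code and the seventh the URL; the log path is whatever your install uses):

```python
from collections import Counter

def leaked_misses(lines):
    """Return {url: fetch_count} for every URL fetched from the origin
    (TCP_MISS) more than once -- i.e. requests that escaped collapsing.
    Assumes the default Squid log format: field 4 = result code,
    field 7 = URL."""
    misses = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > 6 and fields[3].startswith("TCP_MISS"):
            misses[fields[6]] += 1
    return {url: n for url, n in misses.items() if n > 1}

# usage (path is install-specific):
# with open("/var/log/trafficserver/squid.blog") as f:
#     print(leaked_misses(f))
```

Running this over the excerpt above would report the .ts URL with a count of 3 - three origin fetches for a single object.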

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something..

Thanks
Mat

On Fri, Jul 10, 2015 at 4:36 AM, Sudheer Vinukonda <[hidden email]> wrote:
I've updated the settings and the feature description in the relevant places. Also, it looks like these are available in 6.0.0 (and are not in 5.3.x).




Thanks,

Sudheer






On Thursday, July 9, 2015 10:44 AM, Miles Libbey <[hidden email]> wrote:


Thanks Sudheer-
I read through the comments in TS-3549, but I don't grok what we are supposed to do in ATS 5.3.x+ to get the almost-Stale-While-Revalidate configured. Seems like this would be a great place to document it -- "HTTP Proxy Caching" in the Apache Traffic Server 6.0.0 documentation (and probably also any new options in the records.config documentation).
 
 
 
 
 
 

miles




On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <[hidden email]> wrote:


There's no way to completely avoid multiple concurrent requests to the origin, without using something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core

ATS 5.3.x+ supports an almost-SWR-like solution with TS-3549. A complete SWR solution (in the ATS core) is planned to be implemented under TS-3587. There are a number of timers and other settings relevant to the issues you mentioned (e.g. TS-3622).

If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin. I've not used it myself (we have a more efficient internal version of the same plugin), but I've heard that the plugin doesn't work as desired.

(PS: you may need to be careful; without the above optimizations we've experienced read-while-writer requests taking 60+ seconds, which is absolutely ridiculous for any kind of request, let alone the HLS use case.)

Thanks,

Sudheer









Re: thundering herd best practises

Sudheer Vinukonda
Here's my understanding based on what I've noticed in my code reading and tests:

When a request is received, the Txn (transaction) associated with it first tries a cache open-read (basically, a simple lookup for the dirent). If the open-read fails (on a cache miss), the Txn tries an open-write (basically, takes the write lock for the object) and goes to the origin to download the object. At this point the dirent for the object has been created and the write lock is held by this Txn.

If a second request comes in at this point, the Txn associated with it tries an open-read, and it doesn't fail (since the dirent is already available). However, the object in cache is not yet in a state for read-while-writer to kick in. Without the write lock, the Txn then simply disables the cache and goes to the origin. The logic for a cache-stale is more or less similar.
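The race described above can be illustrated with a toy model (pure illustration under my own assumptions, not ATS code): once the first transaction has created the dirent but before the headers are cached, every concurrent transaction falls through to the origin.

```python
import threading
import time

class ToyCache:
    """Toy model of the open-read/open-write race (not real ATS code)."""
    def __init__(self):
        self.dirent = False          # directory entry exists
        self.headers_ready = False   # headers cached -> read-while-writer OK
        self.lock = threading.Lock()
        self.origin_fetches = 0

    def request(self):
        with self.lock:
            first = not self.dirent
            self.dirent = True       # first txn: open-write creates the dirent
        if first or not self.headers_ready:
            # open-read "succeeds" (the dirent exists) but read-while-writer
            # can't start yet, so this txn also goes to origin: the leak.
            with self.lock:
                self.origin_fetches += 1
            time.sleep(0.01)         # simulate the origin round trip
            self.headers_ready = True
        # else: served via read-while-writer from the in-flight copy

cache = ToyCache()
threads = [threading.Thread(target=cache.request) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.origin_fetches)  # > 1 when the requests arrive together
```

Three near-simultaneous requests all hit the window before headers_ready flips, so more than one origin fetch happens even though only one transaction holds the write lock.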

This is where the new feature "open_write_fail_action" comes into play, to either return an error (or a stale copy, if one is available). We haven't experimented with cache_open_fail_max_write_retries; perhaps that might make things better too.


Thanks,

Sudheer

Disclaimer: I'm *not* an expert on ATS cache internals, so, I could well be stating something that may not be entirely accurate.





Re: thundering herd best practises

Mateusz Zajakala
Still, your comments are very helpful and much appreciated! Your explanation is interesting, but it runs contrary to my expectations of "open read retry".

Docs state:
"While an object is being fetched from the origin server, subsequent requests would wait proxy.config.http.cache.open_read_retry_time milliseconds before checking if the object can be served from cache. If the object is still being fetched, the subsequent requests will retry proxy.config.http.cache.max_open_read_retries times."

So I'd expect the second Txn to see that there is a write lock (meaning the object is being fetched) and WAIT - not go to origin. You say, however, that the second Txn will be successful in obtaining the read lock (because the "dirent" is available - what is a dirent?). This could explain the leakage, but then I don't understand under what circumstances "open_read_retry" would kick in (if at all)...

On Fri, Jul 10, 2015 at 6:07 PM, Sudheer Vinukonda <[hidden email]> wrote:
Here's my understanding based on what I've noticed in my code reading and tests:

When a request is received, the Txn (transaction) associated with it, first tries a cache open read (basically, a simple lookup for the dirent). If the open read fails (on a cache miss), the Txn tries a open write (basically, gets the write lock for the object) and goes onto the origin to download the object. At this point the dirent for the object is created and the write lock held by this Txn. 

If a second request comes in at this point, the Txn associated with it tries an open read, and, it doesn't fail (since, the dirent is already available). However, then the object in cache is not in a state to kick read-while-writer in yet. Without the write lock, the Txn would then, simply disable cache and goes to the origin.  The logic for a cache stale is more or less similar. 

This is where the new feature "open_write_fail_action" comes into play, to either return an error (or a stale copy, if it's available). We haven't experimented with the cache_open_fail_max_write_retries and perhaps, that might make things better too.


Thanks,

Sudheer

Disclaimer: I'm *not* an expert on ATS cache internals, so, I could well be stating something that may not be entirely accurate.




On Friday, July 10, 2015 8:37 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks Sudheer!

However, I'm still not sure about what happens under the hood. Let's say we have 2 clients requesting a file for the first time.

1) client 1, TCP_MISS, go to origin
2) very soon after - client 2, TCP_MISS. Now, if 1) already managed to get the headers, then we can serve the file ( read-while-writer ). But if NOT, then there should be open read, so we wait retry x timeout (I tried setting it to as much as 20 x 200 ms). During this time 1) should finish download of the file, or at least get the headers to allow read-while-writer.
3) same scenario as in 2) should apply to any other incoming client requests for the same file.

Is this not the expected behaviour? Maybe I'm missing something, but it seems that after one connection starts retrieval of origin data others should not repeat this. However, with very high loads I still see leakage of requests to origin, and I'm not sure how exactly this happens.

Could it happen because client 2 arrives after client 1, but still before client 1 managed to open read session to origin, so "open read" does not kick in? I have no idea how synchronization is done between multiple requests for the same file, but I imagine one of them has to start reading as the first one and this info would be available to others trying to read (and they would then be stopped on open_read_retry)? 




On Fri, Jul 10, 2015 at 5:12 PM, Sudheer Vinukonda <[hidden email]> wrote:
You may want to read through the below:


"While some other HTTP proxies permit clients to begin reading the response immediately upon the proxy receiving data from the origin server, ATS does not begin allowing clients to read until after the complete HTTP response headers have been read and processed. This is a side-effect of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique to different client requests, ATS will not enable read-while-writer functionality until it has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't get kicked in until the response headers for an object are received and validated. For a live streaming scenario, this leaves a tiny window large enough (due to the large number of concurrent requests) to leak more than a single request to the origin, despite enabling read-while-writer. 

The open read retry settings do help to reduce this problem to a large extent, by attempting to retry the read. There's also a setting <proxy.config.http.cache.max_open_write_retries> that can be tuned to further improve this situation.



However, despite all the above tuning, we still noticed multiple requests leaking (although significantly lower than without the tuning). Hence the need for the new feature Open Write Fail Action. With this setting, you can configure to return a 502 error on a cache miss, but, when there's an ongoing concurrent request for the same object. This lets the client (player) reattempt the request, by when the original concurrent request would have filled the cache. With this feature, we don't see TCP_MISS more than once at any given instant for the same object anymore.

Let me know if you have more questions.


Thanks,

Sudheer









On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks for the explanation. While SWR does seem like a very useful feaure I don't think this can help in my specific case.

In HLS the only object that expires often is the playlist manifest with very small size (hundreds of bytes). I don't think we're having a problem with revalidation of these files. However sometimes we are seeing origin flooded with requests for video segments (1-2 MB). These are never revalidatoins, according to squid.blog these are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -

As you can see we have three following requests for the same file. Each of them takes a short time to process, they are separated in time, however all of them are TCP_MISS. With my setting I'd expect a TCP_MISS on the first retrieval, and then clean TCP_HITs. And this is how it usually works (even with high loads), only once in a while we see more requests getting through to origin. When this happens origin slows down, procesing time is longer, more requests are TCP_MISS and very soon we're killing origin with enormous traffic.

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something..

Thanks
Mat

On Fri, Jul 10, 2015 at 4:36 AM, Sudheer Vinukonda <[hidden email]> wrote:
I've updated the settings and the feature description in the relevant places. Also, it looks like these are available in 6.0.0 (and are not in 5.3.x).




Thanks,

Sudheer






On Thursday, July 9, 2015 10:44 AM, Miles Libbey <[hidden email]> wrote:


Thanks Sudheer-
I read through the comments in TS-3549, but I don't grok what we are supposed to do in ATS 5.3x+ to get the almost Stale While Revalidate configured. Seems like this would be a great place to modify -- HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation (and probably also need any new options in records.config — Apache Traffic Server 6.0.0 documentation
 
 
 
 
 
 
HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation
Fuzzy Revalidation¶ Traffic Server can be set to attempt to revalidate an object before it becomes stale in cache. records.config contains the settings:
Preview by Yahoo
 

 
 
 
 
 
 
records.config — Apache Traffic Server 6.0.0 documentation
records.config The records.config file (by default, located in /usr/local/etc/trafficserver/) is a list of configurable variables used by the Traffic Server software.
Preview by Yahoo
 

miles




On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <[hidden email]> wrote:


There's no way to completely avoid multiple concurrent requests to the origin without using something like an SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core.

ATS 5.3.x+ supports an almost-SWR-like solution with TS-3549. A complete SWR solution (in the core ATS) is planned to be implemented with [TS-3587] Support stale-while-revalidate in the core - ASF JIRA. There are a number of timers and other settings that are relevant to the issues you mentioned (e.g. TS-3622).

If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin. I've not used it myself (we have a more efficient internal version of the same plugin), but I've heard that the plugin doesn't work as desired.

(PS: you may need to be careful; without the above optimizations, we've seen read-while-writer requests take 60+ seconds, which is absolutely ridiculous for any kind of request, let alone the HLS use case.)

Thanks,

Sudheer








On Thursday, July 9, 2015 4:17 AM, Mateusz Zajakala <[hidden email]> wrote:


Hi everyone,

I'd like to get some insight into how I can configure and fine-tune ATS to avoid flooding the origin server with requests on TCP_MISS, and to make sure I understand what I'm doing.

I hope this is the right place to ask :)

Case: we have origin server serving HLS video chunks + playlists. What this means for ATS is:
- we know that every request is cacheable
- expiry time for playlists is very short (10s), video chunks a little longer (this is set by origin)
- we know the size of objects (1-2MB per video file)
- we do all of our caching in RAM

We use ATS as reverse proxy with the following records config:
CONFIG proxy.config.http.cache.required_headers INT 0
- does this make ATS cache everything?
CONFIG proxy.config.cache.enable_read_while_writer INT 1
- we don't want to wait until a chunk is fully served to one client; we want to serve clients in parallel
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
- according to the docs, these allow the download into cache to finish even if the client that initiated it disconnects
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
- this is KEY - we need to have collapsed forwarding!
CONFIG proxy.config.cache.ram_cache.size INT 20G
- put everything in RAM

All others are defaults. With these settings we get a respectable 99.1% hit ratio. However, there are cases where an increase in incoming requests causes ATS to flood the origin with TCP_MISS requests (the origin responds with 200, so If-Modified-Since is not part of the request).

Now, I would imagine that setting max_open_read_retries + open_read_retry_time would make ALL clients requesting a file (except the first one) wait until the first one retrieves the headers, and that because of enable_read_while_writer they would then be served the retrieved file. However, I'm seeing in squid.blog that sometimes, within a window of 100ms or more, there are multiple TCP_MISS origin requests for the same file. I tried tweaking the open_read timeout and retry values, but without success.

Request serving time on TCP_MISS is usually less than 10ms. We have a good link to origin.
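For a sanity check on these numbers: the open-read retry settings bound how long a waiting client polls the cache before falling through to the origin, so the retry budget should comfortably exceed the origin fetch time (a quick sketch using the values from the records.config above):

```python
# Worst-case time a second client spends retrying the cache open read
# before it gives up and contacts the origin itself.
max_open_read_retries = 5    # proxy.config.http.cache.max_open_read_retries
open_read_retry_time = 100   # proxy.config.http.cache.open_read_retry_time (ms)

budget_ms = max_open_read_retries * open_read_retry_time
print(budget_ms)  # 500 ms of retry budget vs. a ~10 ms origin fetch
```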

My goal would be to have "perfect" collapsed forwarding. I don't care about latency (I can make a client wait even 5s if necessary), but I don't want to hit the origin. Is this possible? Do I need to adjust the settings? Or is there some reason this cannot be achieved at high request rates?

I would greatly appreciate any suggestions!

Thanks
Mateusz

PS: We are using the official CentOS 6 + EPEL 6 ATS 3.0.4 (ancient!) on a 40-core, 64-GB RAM machine with 2x10Gbps NICs. No observable load problems at >1K requests/s.














Re: thundering herd best practises

Sudheer Vinukonda

A "dirent" (proxy.process.cache.direntries.used) is basically the index for the object's location in the cache (similar to inode).

The settings for open_read_retry come into play only when open read fails (i.e. before the dirent for the cache object is created).

The behavior you described ("I'd expect the second Txn to see that there is a write lock (so the object is being fetched) and WAIT - not go to origin") is precisely what read-while-writer (rww) does, but, like I wrote in the last email, it doesn't kick in until the object's response headers are validated. There's a small window before rww kicks in, during which one of the following could occur for multiple concurrent requests for the same object:

  a) open read fails --> open_read_retry helps in this case
  b) open read succeeds, but open write fails:
       *) rww has not kicked in yet --> use open_write_fail_action (max_open_write_retries may also help (not sure))
       *) rww kicks in --> rww collapses the connections
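Putting the two failure windows above together, a records.config sketch might look like the following (hedged: open_write_fail_action and max_open_write_retries are the 6.0.0-era knobs discussed in this thread, and the meaning of the action values should be verified against the records.config docs for your ATS version):

```
# Window (a): open read fails -> retry instead of going straight to origin
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
# Window (b): open read succeeds but open write fails before rww kicks in;
# fail fast (e.g. a 502) so the player retries against a now-filled cache.
# NOTE: action values differ by version - check the docs before using 1.
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
CONFIG proxy.config.http.cache.max_open_write_retries INT 1
# Once response headers are validated, rww collapses readers onto the writer
CONFIG proxy.config.cache.enable_read_while_writer INT 1
```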


Thanks,

Sudheer



On Friday, July 10, 2015 9:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Still, your comments are very helpful and much appreciated! Your explanation is interesting, though contrary to my expectations of "open read retry".

Docs state:
"While an object is being fetched from the origin server, subsequent requests would wait proxy.config.http.cache.open_read_retry_time milliseconds before checking if the object can be served from cache. If the object is still being fetched, the subsequent requests will retry proxy.config.http.cache.max_open_read_retries times."

So I'd expect the second Txn to see that there is a write lock (so the object is being fetched) and WAIT - not go to origin. You say, however, that the second Txn will be successful in obtaining the read lock (because the "dirent" is available - what is a dirent?). This could explain the leakage, but then I don't understand under what circumstances "open_read_retry" would kick in (if at all)...

On Fri, Jul 10, 2015 at 6:07 PM, Sudheer Vinukonda <[hidden email]> wrote:
Here's my understanding based on what I've noticed in my code reading and tests:

When a request is received, the Txn (transaction) associated with it first tries a cache open read (basically, a simple lookup for the dirent). If the open read fails (on a cache miss), the Txn tries an open write (basically, grabbing the write lock for the object) and goes on to the origin to download the object. At this point the dirent for the object is created and the write lock is held by this Txn.

If a second request comes in at this point, the Txn associated with it tries an open read, and it doesn't fail (since the dirent is already available). However, the object in cache is not yet in a state for read-while-writer to kick in. Without the write lock, the Txn then simply disables the cache and goes to the origin. The logic for a stale cache entry is more or less similar.

This is where the new feature "open_write_fail_action" comes into play, to return either an error or a stale copy (if one is available). We haven't experimented with cache_open_fail_max_write_retries; perhaps that might make things better too.


Thanks,

Sudheer

Disclaimer: I'm *not* an expert on ATS cache internals, so, I could well be stating something that may not be entirely accurate.




On Friday, July 10, 2015 8:37 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks Sudheer!

However, I'm still not sure about what happens under the hood. Let's say we have 2 clients requesting a file for the first time.

1) client 1, TCP_MISS, go to origin
2) very soon after - client 2, TCP_MISS. Now, if 1) already managed to get the headers, then we can serve the file (read-while-writer). But if NOT, then the open read should fail, so we wait up to retries x timeout (I tried setting it to as much as 20 x 200 ms). During this time 1) should finish downloading the file, or at least get the headers to allow read-while-writer.
3) same scenario as in 2) should apply to any other incoming client requests for the same file.

Is this not the expected behaviour? Maybe I'm missing something, but it seems that after one connection starts retrieval of origin data others should not repeat this. However, with very high loads I still see leakage of requests to origin, and I'm not sure how exactly this happens.

Could it happen that client 2 arrives after client 1, but still before client 1 manages to open its read session to the origin, so "open read" does not kick in? I have no idea how synchronization is done between multiple requests for the same file, but I imagine one of them has to start reading first, and this info would be available to the others trying to read (and they would then be held back by open_read_retry)?




On Fri, Jul 10, 2015 at 5:12 PM, Sudheer Vinukonda <[hidden email]> wrote:
You may want to read through the below:


"While some other HTTP proxies permit clients to begin reading the response immediately upon the proxy receiving data from the origin server, ATS does not begin allowing clients to read until after the complete HTTP response headers have been read and processed. This is a side-effect of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique to different client requests, ATS will not enable read-while-writer functionality until it has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't kick in until the response headers for an object have been received and validated. For a live streaming scenario, given the large number of concurrent requests, this small window is enough to leak more than a single request to the origin despite read-while-writer being enabled.

The open read retry settings do help reduce this problem to a large extent, by retrying the read. There's also a setting, proxy.config.http.cache.max_open_write_retries, that can be tuned to further improve the situation.



However, despite all the above tuning, we still noticed multiple requests leaking (although significantly fewer than without the tuning). Hence the need for the new feature, Open Write Fail Action. With this setting, you can configure ATS to return a 502 error on a cache miss when there's an ongoing concurrent request for the same object. This lets the client (player) reattempt the request, by which time the original concurrent request will have filled the cache. With this feature, we no longer see TCP_MISS more than once at any given instant for the same object.

Let me know if you have more questions.


Thanks,

Sudheer









On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks for the explanation. While SWR does seem like a very useful feature, I don't think it can help in my specific case.

In HLS the only object that expires often is the playlist manifest, which is very small (hundreds of bytes). I don't think we're having a problem with revalidation of these files. However, sometimes we see the origin flooded with requests for video segments (1-2 MB). These are never revalidations; according to squid.blog they are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
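A quick way to quantify this leakage from the log itself is to count misses per URL; a minimal sketch (`miss_counts` is a hypothetical helper, and the field positions assume the default squid log layout shown in the excerpt above):

```python
from collections import Counter

def miss_counts(lines):
    """Count TCP_MISS entries per URL in squid-format log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # fields[3] is the result code (e.g. TCP_MISS/200), fields[6] the URL
        if len(fields) > 6 and fields[3].startswith("TCP_MISS"):
            counts[fields[6]] += 1
    return counts

# Sample lines modeled on the excerpt above
log = [
    "1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin/seg1.ts - DIRECT/origin video/m2pt -",
    "1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin/seg1.ts - DIRECT/origin video/m2pt -",
    "1436442292.133 17 10.10.99.112 TCP_HIT/200 668669 GET http://origin/seg1.ts - DIRECT/origin video/m2pt -",
]
print(miss_counts(log))  # any URL with a count > 1 leaked to the origin
```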

As you can see, we have three consecutive requests for the same file. Each takes a short time to process and they are separated in time, yet all of them are TCP_MISS. With my settings I'd expect a TCP_MISS on the first retrieval and then clean TCP_HITs. And this is how it usually works (even under high load); only once in a while do we see more requests getting through to the origin. When this happens the origin slows down, processing time gets longer, more requests become TCP_MISS, and very soon we're killing the origin with enormous traffic.

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something..

Thanks
Mat


Re: thundering herd best practises

Sudheer Vinukonda
"The settings for open_read_retry come into play only when open read fails (i.e. before the dirent for the cache object is created)."

Correction/Clarification - before a dirent is created, the scenario is just a regular cache miss, and open read retry is not performed on a regular cache miss.

The scenario where open read retry applies is slightly more subtle: the dirent has been created, but the open read fails because the write_vc is still not closed. TS-3622 and TS-3767 are important related fixes to include for these scenarios. Without them, we've observed that read-while-writer can get stuck indefinitely (until an inactivity timeout at the txn level eventually fires, which could be quite far away).

Thanks,

Sudheer 




Re: thundering herd best practises

Torluemke, Mark
Hi Mat,

I don’t think I saw it mentioned below, so I have to bring it up — read_while_writer does not work properly in ATS v3.0.x. Additionally, you really want the other concurrency fixes that I believe just went into ATS v5.3.0. All the other advice Sudheer gave is good, but we’ve found that RWW functioning properly makes the biggest difference.

Cheers,
Mark

From: Sudheer Vinukonda <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Wednesday, July 15, 2015 at 2:11 PM
To: "[hidden email]" <[hidden email]>
Subject: Re: thundering herd best practises

"The settings for open_read_retry come into play only when open read fails (i.e. before the dirent for the cache object is created)."

Correction/clarification: before a dirent is created, the scenario is just a regular cache miss, and open read retry is not performed on a regular cache miss.

The scenario where open read retry applies is slightly more subtle: the dirent has been created, but the open read fails because the write_vc is not yet closed. TS-3622 and TS-3767 are important fixes related to these scenarios. Without them, we've observed read-while-writer getting stuck indefinitely (until an eventual inactivity timeout fires at the txn level, which could be quite far away).

Thanks,

Sudheer 



On Friday, July 10, 2015 9:51 AM, Sudheer Vinukonda <[hidden email]> wrote:



A "dirent" (proxy.process.cache.direntries.used) is basically the index for the object's location in the cache (similar to an inode).

The settings for open_read_retry come into play only when open read fails (i.e. before the dirent for the cache object is created).

The behavior you described ("I'd expect the second Txn to see that there is a write lock (so the object is being fetched) and WAIT - not go to origin") is precisely what read-while-writer (rww) does, but, as I wrote in the last email, it doesn't kick in until the object's response headers are validated. There's a small window before rww kicks in, during which one of the following can occur for multiple concurrent requests for the same object:

  a) open read fails --> open_read_retry helps in this case
  b) open read succeeds, but open write fails:
       *) rww has not kicked in yet --> use open_write_fail_action (max_open_write_retries may also help (not sure))
       *) rww has kicked in --> rww collapses the connections
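Pulled together, (a) and (b) map to a handful of records.config knobs. A sketch (the values are illustrative, not tuned; open_write_fail_action only exists in newer ATS releases, and its value-to-action mapping is version-specific, so check your release's docs before copying the `1` here, which assumes "return an error on a concurrent miss"):

```
# Cover the pre-RWW window with open read retries:
CONFIG proxy.config.http.cache.max_open_read_retries INT 20
CONFIG proxy.config.http.cache.open_read_retry_time INT 200
# Retry the open write instead of immediately going to origin:
CONFIG proxy.config.http.cache.max_open_write_retries INT 5
# Newer ATS only: fail the extra writers instead of leaking them to origin
# (value meaning is version-specific -- check your docs):
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
CONFIG proxy.config.cache.enable_read_while_writer INT 1
```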


Thanks,

Sudheer



On Friday, July 10, 2015 9:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Still, your comments are very helpful and much appreciated! Your explanation is interesting, though it runs contrary to my expectations of "open read retry".

Docs state:
"While an object is being fetched from the origin server, subsequent requests would wait proxy.config.http.cache.open_read_retry_time milliseconds before checking if the object can be served from cache. If the object is still being fetched, the subsequent requests will retry proxy.config.http.cache.max_open_read_retries times."

So I'd expect the second Txn to see that there is a write lock (so the object is being fetched) and WAIT - not go to origin. You say, however, that the second Txn will be successful in obtaining the read lock (because the "dirent" is available - what is a dirent?). This could explain the leakage, but then I don't understand under what circumstances "open_read_retry" would kick in (if at all)...

On Fri, Jul 10, 2015 at 6:07 PM, Sudheer Vinukonda <[hidden email]> wrote:
Here's my understanding based on what I've noticed in my code reading and tests:

When a request is received, the Txn (transaction) associated with it first tries a cache open read (basically, a simple lookup of the dirent). If the open read fails (a cache miss), the Txn tries an open write (basically, takes the write lock for the object) and goes to the origin to download the object. At this point the dirent for the object has been created and the write lock is held by this Txn.

If a second request comes in at this point, the Txn associated with it tries an open read, and it doesn't fail (since the dirent is already available). However, the object in cache is not yet in a state for read-while-writer to kick in. Without the write lock, the Txn simply disables cache and goes to the origin. The logic for a stale cache object is more or less similar.
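As a sanity check on my understanding, here is that flow reduced to a toy function (my paraphrase of this thread, not ATS source; the state names are assumptions):

```python
# Toy model of the per-transaction cache decision described above.
# The inputs and outcomes are taken from this mail thread, not ATS code.

def txn_decision(dirent_exists, write_lock_free, rww_ready):
    """Return where a request gets served from, per the flow in the mail."""
    if not dirent_exists:
        # First Txn: regular cache miss, takes the write lock, goes to origin.
        return "origin (regular miss, takes write lock)"
    if rww_ready:
        # Response headers validated: later Txns read while the writer writes.
        return "cache (read-while-writer)"
    if write_lock_free:
        # Writer already finished; the object is a normal hit.
        return "cache (normal hit)"
    # Dirent exists, writer still holds the lock, headers not validated yet:
    # the Txn disables cache and goes to origin.
    return "origin (leaked request!)"

# First Txn on a cold cache:
print(txn_decision(dirent_exists=False, write_lock_free=True, rww_ready=False))
# Second Txn arriving inside the pre-RWW window:
print(txn_decision(dirent_exists=True, write_lock_free=False, rww_ready=False))
# Third Txn after the response headers are validated:
print(txn_decision(dirent_exists=True, write_lock_free=False, rww_ready=True))
```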

This is where the new feature "open_write_fail_action" comes into play, to either return an error or a stale copy (if one is available). We haven't experimented with max_open_write_retries, and perhaps that might make things better too.


Thanks,

Sudheer

Disclaimer: I'm *not* an expert on ATS cache internals, so, I could well be stating something that may not be entirely accurate.




On Friday, July 10, 2015 8:37 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks Sudheer!

However, I'm still not sure about what happens under the hood. Let's say we have 2 clients requesting a file for the first time.

1) client 1, TCP_MISS, go to origin
2) very soon after - client 2, TCP_MISS. Now, if 1) has already managed to get the headers, then we can serve the file (read-while-writer). But if NOT, then there should be an open read retry, so we wait retries x timeout (I tried setting it to as much as 20 x 200 ms). During this time 1) should finish downloading the file, or at least get the headers to allow read-while-writer.
3) same scenario as in 2) should apply to any other incoming client requests for the same file.
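For scale, the retry budget in 2) dwarfs the typical fetch time mentioned earlier (TCP_MISS usually serves in under 10ms), so in theory the window should be amply covered:

```python
# Worst-case wait budget from max_open_read_retries x open_read_retry_time,
# versus the typical TCP_MISS serving time from earlier in the thread.
retries, retry_ms = 20, 200     # the most aggressive values I tried
fetch_ms = 10                   # typical origin fetch time on TCP_MISS
print(retries * retry_ms)              # -> 4000 ms of retry budget
print(retries * retry_ms // fetch_ms)  # -> 400x the typical fetch time
```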

Is this not the expected behaviour? Maybe I'm missing something, but it seems that after one connection starts retrieving origin data, the others should not repeat it. However, with very high loads I still see leakage of requests to origin, and I'm not sure exactly how this happens.

Could it happen because client 2 arrives after client 1, but still before client 1 has managed to open the read session to origin, so "open read" does not kick in? I have no idea how synchronization is done between multiple requests for the same file, but I imagine one of them has to start reading first, and this info would be available to the others trying to read (and they would then be stopped on open_read_retry)?




On Fri, Jul 10, 2015 at 5:12 PM, Sudheer Vinukonda <[hidden email]> wrote:
You may want to read through the below:


"While some other HTTP proxies permit clients to begin reading the response immediately upon the proxy receiving data from the origin server, ATS does not begin allowing clients to read until after the complete HTTP response headers have been read and processed. This is a side-effect of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique to different client requests, ATS will not enable read-while-writer functionality until it has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't kick in until the response headers for an object are received and validated. For a live streaming scenario, this leaves a tiny window, but one large enough (due to the large number of concurrent requests) to leak more than a single request to the origin, despite read-while-writer being enabled.

The open read retry settings do help to reduce this problem to a large extent, by attempting to retry the read. There's also a setting <proxy.config.http.cache.max_open_write_retries> that can be tuned to further improve this situation.



However, despite all the above tuning, we still noticed multiple requests leaking (although significantly fewer than without the tuning). Hence the need for the new feature Open Write Fail Action. With this setting, you can configure ATS to return a 502 error on a cache miss when there's an ongoing concurrent request for the same object. This lets the client (player) reattempt the request, by which time the original concurrent request will have filled the cache. With this feature, we no longer see TCP_MISS more than once at any given instant for the same object.

Let me know if you have more questions.


Thanks,

Sudheer









On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <[hidden email]> wrote:


Thanks for the explanation. While SWR does seem like a very useful feature, I don't think it can help in my specific case.

In HLS the only object that expires often is the playlist manifest, which is very small (hundreds of bytes). I don't think we're having a problem with revalidation of these files. However, sometimes we see origin flooded with requests for video segments (1-2 MB). These are never revalidations; according to squid.blog they are all TCP_MISS.

Take for example the following log:

1436442291.878 60 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.095 12 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -
1436442292.133 17 10.10.99.112 TCP_MISS/200 668669 GET http://origin-server.example.com/ehls/video/20150703T123156-01-143602692.ts - DIRECT/origin-server.example.com video/m2pt -

As you can see, we have three consecutive requests for the same file. Each of them takes a short time to process and they are separated in time, yet all of them are TCP_MISS. With my settings I'd expect a TCP_MISS on the first retrieval, and then clean TCP_HITs. And this is how it usually works (even under high load); only once in a while do we see more requests getting through to origin. When that happens origin slows down, processing times grow, more requests become TCP_MISS, and very soon we're killing origin with enormous traffic.

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I've misunderstood something.

Thanks
Mat

On Fri, Jul 10, 2015 at 4:36 AM, Sudheer Vinukonda <[hidden email]> wrote:
I've updated the settings and the feature description in the relevant places. Also, it looks like these are available in 6.0.0 (and not in 5.3.x).




Thanks,

Sudheer






On Thursday, July 9, 2015 10:44 AM, Miles Libbey <[hidden email]> wrote:


Thanks Sudheer-
I read through the comments in TS-3549, but I don't grok what we are supposed to do in ATS 5.3.x+ to get the almost-Stale-While-Revalidate configured. Seems like this would be a great place to modify -- HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation (and probably also any new options in records.config — Apache Traffic Server 6.0.0 documentation).
miles




On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <[hidden email]> wrote:


There's no way to completely avoid multiple concurrent requests to the origin without using something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core (https://cwiki.apache.org/confluence/display/TS/Stale-While-Revalidate+in+the+core).

ATS 5.3.x+ supports an almost-SWR-like solution with TS-3549. A complete SWR solution (in core ATS) is planned to be implemented in [TS-3587] Support stale-while-revalidate in the core - ASF JIRA. There are a number of timers and other settings relevant to the issues you mentioned (e.g. TS-3622).
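For anyone unfamiliar with SWR, here it is in miniature (a generic Python sketch of the idea, not the ATS plugin or core implementation): stale answers are served immediately while exactly one background fetch revalidates the entry.

```python
import threading
import time

class SWRCache:
    """Minimal stale-while-revalidate sketch: a stale entry is served
    instantly while a single background thread refreshes it."""

    def __init__(self, fetch, ttl):
        self._fetch, self._ttl = fetch, ttl
        self._lock = threading.Lock()
        self._entries = {}          # key -> (value, expiry)
        self._refreshing = set()    # keys with a refresh in flight

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        with self._lock:
            entry = self._entries.get(key)
            if entry and now < entry[1]:
                return entry[0]                  # fresh hit
            if entry:
                # Stale: serve it anyway, and kick off at most one refresh.
                if key not in self._refreshing:
                    self._refreshing.add(key)
                    threading.Thread(target=self._refresh, args=(key,)).start()
                return entry[0]
        # Cold miss: nothing to serve, fetch synchronously.
        value = self._fetch(key)
        with self._lock:
            self._entries[key] = (value, now + self._ttl)
        return value

    def _refresh(self, key):
        value = self._fetch(key)                 # the lone revalidation
        with self._lock:
            self._entries[key] = (value, time.monotonic() + self._ttl)
            self._refreshing.discard(key)
```

Latency-wise this is the opposite trade to open_write_fail_action's 502: clients never wait, at the cost of briefly serving a stale playlist.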

If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin. I've not used it myself (we have a more efficient internal version of the same plugin), but I've heard that the plugin doesn't work as desired.

(PS: you may need to be careful; with read-while-writer, we've seen requests take 60+ seconds without the above optimizations, which is absolutely ridiculous for any kind of request, let alone the HLS use case).

Thanks,

Sudheer








On Thursday, July 9, 2015 4:17 AM, Mateusz Zajakala <[hidden email]> wrote:




Re: thundering herd best practises

Mateusz Zajakala

Ok thx. From what I read it seems clear that first we need to upgrade to 5.3.0 and then do the testing.

Thanks to all for advice, I'll follow up with our results on latest ATS.

On Jul 17, 2015 3:28 PM, "Torluemke, Mark" <[hidden email]> wrote: