Slow responses under concurrent access to ASP.NET Core may be caused by thread pool behavior.


Introduction

Responses from an ASP.NET Core app were strangely slow under concurrent access. When I investigated, the cause turned out to be how the thread pool starts threads, and more specifically how asynchronous processing should be used in the .NET environment, so I am writing it down here.

Experiment environment

macOS 10.16.6
ASP.NET Core 3.1
Web API created with the dotnet new webapi command
Measured the time for 100 simultaneous accesses with the ab (Apache Bench) command: `ab -n 100 -c 100 URL`

For synchronous processing

Prepare the following simple API endpoint.

[HttpGet]
[Route("sync")]
public string Sync()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Thread.Sleep(1000);
    sw.Stop();
    this._logger.LogInformation(sw.ElapsedMilliseconds + "ms");
    return "sync";
}

Since it just sleeps for 1 second, each response should come back in about 1 second.

Result

ab -n 100 -c 100 http://localhost:5000/test/sync
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        Kestrel
Server Hostname:        localhost
Server Port:            5000

Document Path:          /test/sync
Document Length:        4 bytes

Concurrency Level:      100
Time taken for tests:   15.031 seconds
Complete requests:      100
Failed requests:        39
   (Connect: 0, Receive: 0, Length: 39, Exceptions: 0)
Total transferred:      8357 bytes
HTML transferred:       244 bytes
Requests per second:    6.65 [#/sec] (mean)
Time per request:       15031.445 [ms] (mean)
Time per request:       150.314 [ms] (mean, across all concurrent requests)
Transfer rate:          0.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   1.8      5       8
Processing:  1020 11970 2007.4  13046   14009
Waiting:        0 7868 6438.3  13046   14009
Total:       1020 11975 2006.7  13050   14011

Percentage of the requests served within a certain time (ms)
  50%  13050
  66%  13052
  75%  13053
  80%  13053
  90%  13055
  95%  13100
  98%  13943
  99%  14011
 100%  14011 (longest request)

14 seconds in the worst case. Slow. On the other hand, the log shows that each individual request completes in about 1 second.

1004ms
1003ms
...
1000ms
1004ms

In other words, there is a long wait before each request starts processing.
Looking into what it was waiting for, I found it was waiting for a thread to be added to the thread pool.

For a description of the thread pool, see the Managed Thread Pool documentation.

Regardless of ASP.NET, in the .NET environment it is recommended to reuse threads from the thread pool via tasks rather than creating and managing threads yourself. A number of threads wait in the pool; when a work item (such as an HTTP request in ASP.NET) is queued, it is assigned to a free thread, and when the work completes, the thread returns to the waiting state.
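As a minimal illustration of this model (a sketch in a plain console program, not ASP.NET), Task.Run queues a delegate to the thread pool, where an idle pool thread picks it up:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Example
{
    static void Main()
    {
        // Task.Run queues the delegate to the thread pool; an idle pool
        // thread runs it, and the thread returns to the pool afterwards.
        Task task = Task.Run(() =>
            Console.WriteLine($"Running on pool thread {Thread.CurrentThread.ManagedThreadId}"));
        task.Wait();
    }
}
```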

The question here is how many threads are pooled, which is described below.

Minimum thread pool

The thread pool provides new worker threads or I/O completion threads on demand until the minimum specified for each category is reached.
Once the minimum is reached, further work waits until additional threads are created or existing tasks complete. Starting with the .NET Framework 4, the thread pool creates and destroys worker threads to optimize throughput, defined as the number of tasks completed per unit time.

In summary:

- Threads are started immediately up to the specified minimum.
- Work beyond the minimum has to wait.
- Thread creation is adjusted to optimize the number of tasks completed per unit time.

Threads start quickly up to the minimum, but beyond that, work waits until other processing completes or the .NET runtime creates a new thread.

This minimum can be obtained with the ThreadPool.GetMinThreads method. Checking it on my Mac gave:

Worker threads: 4
I/O completion threads: 4

The documentation states that by default the minimum is set to the number of processors on the system, and in other environments I checked it matched the vCPU count. So if a server has 2 vCPUs, only up to 2 threads are started immediately. Two threads is not very many, is it?
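The check above can be reproduced with a small console program like this (values will differ per machine):

```csharp
using System;
using System.Threading;

class Example
{
    static void Main()
    {
        // GetMinThreads reports how many worker and I/O completion threads
        // the pool will create on demand before throttling new threads.
        ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
        Console.WriteLine($"Worker threads: {workerMin}");
        Console.WriteLine($"I/O completion threads: {ioMin}");
    }
}
```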

Log the start time and thread ID of each request with the following code.

[HttpGet]
[Route("sync")]
public string Sync()
{
    var startTime = DateTimeOffset.Now;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Thread.Sleep(1000);
    sw.Stop();
    this._logger.LogInformation("{0,2}: {1:D2}:{2:D2}:{3:D2}:{4:D3} {5,5}ms",  Thread.CurrentThread.ManagedThreadId, startTime.Hour, startTime.Minute, startTime.Second, startTime.Millisecond, sw.ElapsedMilliseconds);
    return "sync";
}

Here are the results. The number of threads running at roughly the same time is added in parentheses.

   9: 18:31:40:893  1000ms (1 thread)
   9: 18:31:41:912  1004ms
   8: 18:31:41:912  1004ms
   4: 18:31:41:911  1004ms
  11: 18:31:41:912  1004ms
   6: 18:31:41:917  1004ms (5 threads)
  12: 18:31:42:907  1001ms
   4: 18:31:42:918  1000ms
   8: 18:31:42:918  1000ms
  11: 18:31:42:918  1000ms
   9: 18:31:42:919  1000ms
   6: 18:31:42:921  1004ms (6 threads)
   4: 18:31:43:919  1004ms
   8: 18:31:43:919  1004ms
  11: 18:31:43:919  1004ms
   9: 18:31:43:919  1004ms
   6: 18:31:43:926  1004ms (5 threads)
   4: 18:31:44:942  1003ms
  11: 18:31:44:943  1003ms
   9: 18:31:44:942  1003ms
   6: 18:31:44:945  1004ms
   8: 18:31:44:944  1004ms
  12: 18:31:44:958  1000ms (6 threads)
  13: 18:31:45:933  1004ms
   8: 18:31:45:956  1004ms
   9: 18:31:45:956  1005ms
  11: 18:31:45:958  1003ms
   6: 18:31:45:957  1003ms
  12: 18:31:45:961  1003ms
   4: 18:31:45:985  1001ms (7 threads)
  13: 18:31:46:940  1002ms
  14: 18:31:46:961  1004ms
   6: 18:31:46:968  1003ms
  11: 18:31:46:968  1003ms
  12: 18:31:46:968  1003ms
   9: 18:31:46:968  1004ms
   4: 18:31:46:988  1001ms
   8: 18:31:46:992  1003ms (8 threads)
  15: 18:31:47:940  1002ms
  13: 18:31:47:944  1004ms
  14: 18:31:47:968  1004ms
   6: 18:31:47:977  1004ms
   9: 18:31:47:979  1004ms
  12: 18:31:47:980  1003ms
   4: 18:31:47:990  1004ms
   8: 18:31:47:996  1004ms
  11: 18:31:48:078  1002ms (9 threads)
  13: 18:31:48:950  1000ms
  14: 18:31:48:973  1002ms
   6: 18:31:48:982  1003ms
  12: 18:31:48:984  1002ms
   9: 18:31:48:984  1003ms
   4: 18:31:48:995  1001ms
   8: 18:31:49:001  1004ms
  15: 18:31:49:081  1003ms
  11: 18:31:49:081  1003ms (9 threads)
  13: 18:31:49:951  1003ms
  16: 18:31:49:953  1003ms
  14: 18:31:49:977  1004ms
   6: 18:31:49:985  1004ms
   9: 18:31:49:987  1004ms
  12: 18:31:49:987  1004ms
   4: 18:31:49:997  1000ms
   8: 18:31:50:006  1000ms
  15: 18:31:50:085  1003ms
  11: 18:31:50:085  1003ms (10 threads)
  16: 18:31:50:957  1001ms
  14: 18:31:50:981  1004ms
   6: 18:31:50:990  1004ms
   9: 18:31:50:992  1004ms
  12: 18:31:50:992  1004ms
   4: 18:31:50:998  1004ms
   8: 18:31:51:007  1005ms
  13: 18:31:51:015  1003ms
  15: 18:31:51:089  1000ms
  11: 18:31:51:089  1000ms (10 threads)
  16: 18:31:51:962  1004ms
  14: 18:31:51:987  1001ms
   6: 18:31:51:995  1003ms
  12: 18:31:51:997  1003ms
   9: 18:31:51:996  1003ms
   4: 18:31:52:003  1002ms
  13: 18:31:52:021  1001ms
   8: 18:31:52:021  1001ms
  11: 18:31:52:090  1001ms
  15: 18:31:52:090  1001ms (10 threads)
  16: 18:31:52:967  1005ms
  14: 18:31:52:989  1002ms
   6: 18:31:52:999  1004ms
   9: 18:31:53:000  1004ms
  12: 18:31:53:000  1004ms
   4: 18:31:53:006  1003ms
   8: 18:31:53:026  1000ms
  13: 18:31:53:026  1000ms
  11: 18:31:53:092  1003ms
  15: 18:31:53:092  1003ms
  17: 18:31:53:092  1003ms (11 threads)
  16: 18:31:53:972  1002ms
  14: 18:31:53:992  1000ms
   6: 18:31:54:003  1000ms (3 threads)

You can see that the number of available threads increases only gradually, roughly one new thread per second.

Increase the minimum number of threads

The simplest solution is to increase the number of threads that can be started immediately.
To do this, use the ThreadPool.SetMinThreads method.
For now, try setting the minimum number of worker threads to 100 as shown below.

public static void Main(string[] args)
{
    // Raise the worker-thread minimum to 100; keep I/O completion threads at 4.
    ThreadPool.SetMinThreads(100, 4);
    CreateHostBuilder(args).Build().Run();
}

Running the ab command again:

ab -n 100 -c 100 http://localhost:5000/test/sync
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        Kestrel
Server Hostname:        localhost
Server Port:            5000

Document Path:          /test/sync
Document Length:        4 bytes

Concurrency Level:      100
Time taken for tests:   2.082 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      13700 bytes
HTML transferred:       400 bytes
Requests per second:    48.04 [#/sec] (mean)
Time per request:       2081.569 [ms] (mean)
Time per request:       20.816 [ms] (mean, across all concurrent requests)
Transfer rate:          6.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   11   3.0     12      13
Processing:  1020 1055   5.1   1056    1061
Waiting:     1007 1054   6.3   1055    1060
Total:       1020 1066   7.0   1068    1074

Percentage of the requests served within a certain time (ms)
  50%   1068
  66%   1069
  75%   1070
  80%   1070
  90%   1072
  95%   1073
  98%   1073
  99%   1074
 100%   1074 (longest request)

It finished in roughly 1 second. This confirms that the slow responses under concurrent access were caused by waiting for threads to start. However, as the note below warns, raising the thread count carelessly may not be a good idea.

Minimum thread pool

The minimum number of idle threads can be increased using the ThreadPool.SetMinThreads method. However, raising these values more than necessary can cause performance problems: if too many tasks start at the same time, all of them may slow down. In most cases, the pool performs better using its own algorithm for allocating threads.
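If you do raise the minimum despite this warning, one small refinement (a sketch, not from the original article) is to read the current values first so that only the worker minimum changes, instead of hardcoding 4 for the I/O side:

```csharp
public static void Main(string[] args)
{
    // Keep the pool's current I/O completion minimum; raise only workers.
    ThreadPool.GetMinThreads(out _, out int ioMin);
    ThreadPool.SetMinThreads(100, ioMin);
    CreateHostBuilder(args).Build().Run();
}
```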

Waiting synchronously for I/O

Let’s also look at the case of waiting for I/O.
Write an API, as follows, that calls another API (listening at http://localhost:5001) that waits 1 second before responding. The asynchronous GetAsync method is made to block synchronously via Result.

[HttpGet]
[Route("io")]
public string Io()
{
    var startTime = DateTimeOffset.Now;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    // client is an HttpClient instance held by the controller
    var response = client.GetAsync("http://localhost:5001/test/sync").Result;
    sw.Stop();
    this._logger.LogInformation("{0,2}: {1:D2}:{2:D2}:{3:D2}:{4:D3} {5,5}ms",  Thread.CurrentThread.ManagedThreadId, startTime.Hour, startTime.Minute, startTime.Second, startTime.Millisecond, sw.ElapsedMilliseconds);
    return "iosy";
}

What happens if we start the app with 4 worker threads and 4 I/O completion threads and again hit it with 100 simultaneous requests from ab?

ab -n 100 -c 100 -s 300 http://localhost:5000/test/io
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        Kestrel
Server Hostname:        localhost
Server Port:            5000

Document Path:          /test/io
Document Length:        4 bytes

Concurrency Level:      100
Time taken for tests:   65.697 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      13700 bytes
HTML transferred:       400 bytes
Requests per second:    1.52 [#/sec] (mean)
Time per request:       65696.601 [ms] (mean)
Time per request:       656.966 [ms] (mean, across all concurrent requests)
Transfer rate:          0.20 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   1.5      6       7
Processing:  1021 63970 6359.5  64624   64674
Waiting:     1014 63970 6360.2  64623   64674
Total:       1022 63975 6360.0  64630   64681

Percentage of the requests served within a certain time (ms)
  50%  64630
  66%  64640
  75%  64642
  80%  64648
  90%  64662
  95%  64673
  98%  64675
  99%  64681
 100%  64681 (longest request)

It was extremely slow. The default ab timeout is 30 seconds, so I had to extend it. The log looks like this (sorted by start time):

   6: 21:09:51:730  1080ms
  11: 21:09:54:925  1006ms
   4: 21:09:55:949 63195ms
   6: 21:09:55:949 63129ms
   8: 21:09:55:949 62054ms
   9: 21:09:55:949 63110ms
  11: 21:09:55:954 63080ms
  12: 21:09:57:038 62084ms
  13: 21:09:57:540 61490ms
  14: 21:09:58:050 61037ms
  15: 21:09:59:057 59987ms
  16: 21:10:00:068 58999ms
  17: 21:10:01:073 58031ms
  18: 21:10:01:581 57523ms
  19: 21:10:02:584 56543ms
  20: 21:10:03:589 55471ms
  21: 21:10:04:607 54465ms
  22: 21:10:05:610 53450ms
  23: 21:10:06:616 52488ms
  24: 21:10:07:613 51514ms
  25: 21:10:08:620 50446ms
  26: 21:10:09:127 49940ms
  27: 21:10:10:145 48959ms
  28: 21:10:11:151 47928ms
  29: 21:10:12:147 46873ms
  30: 21:10:13:151 45952ms
  31: 21:10:13:658 45388ms
  32: 21:10:14:662 44427ms
  33: 21:10:15:665 43401ms
  34: 21:10:16:170 42933ms
  35: 21:10:16:678 42382ms
  36: 21:10:17:184 41877ms
  37: 21:10:17:685 41405ms
  38: 21:10:18:200 40873ms
  39: 21:10:19:223 39843ms
  40: 21:10:20:260 38832ms
  41: 21:10:21:219 37814ms
  42: 21:10:21:722 37305ms
  43: 21:10:22:227 36798ms
  44: 21:10:22:729 36313ms
  45: 21:10:23:284 35757ms
  46: 21:10:24:291 34757ms
  47: 21:10:25:298 33782ms
  48: 21:10:26:288 32781ms
  49: 21:10:27:264 31802ms
  50: 21:10:28:267 30822ms
  51: 21:10:28:769 30322ms
  52: 21:10:29:271 29777ms
  53: 21:10:29:776 29286ms
  54: 21:10:30:282 28779ms
  55: 21:10:30:787 28279ms
  56: 21:10:31:292 27757ms
  57: 21:10:31:797 27272ms
  58: 21:10:32:302 26748ms
  59: 21:10:32:808 26290ms
  60: 21:10:33:321 25733ms
  61: 21:10:34:337 24696ms
  62: 21:10:35:329 23731ms
  63: 21:10:35:834 23264ms
  64: 21:10:36:337 22761ms
  65: 21:10:36:841 22256ms
  66: 21:10:37:346 21757ms
  67: 21:10:37:851 21213ms
  68: 21:10:38:356 20671ms
  69: 21:10:38:860 20202ms
  70: 21:10:39:361 19700ms
  71: 21:10:39:866 19208ms
  72: 21:10:40:370 18694ms
  73: 21:10:40:871 18244ms
  74: 21:10:41:376 17740ms
  75: 21:10:41:877 17238ms
  76: 21:10:42:879 16245ms
  77: 21:10:43:381 15650ms
  78: 21:10:43:885 15195ms
  79: 21:10:44:390 14671ms
  80: 21:10:44:894 14229ms
  81: 21:10:45:398 13730ms
  82: 21:10:45:903 13225ms
  83: 21:10:46:408 12634ms
  84: 21:10:46:913 12210ms
  85: 21:10:47:418 11649ms
  86: 21:10:47:920 11154ms
  87: 21:10:48:422 10635ms
  88: 21:10:48:933 10199ms
  89: 21:10:49:433  9699ms
  90: 21:10:49:938  9195ms
  91: 21:10:50:443  8689ms
  92: 21:10:50:948  8142ms
  93: 21:10:51:453  7640ms
  94: 21:10:51:956  7183ms
  95: 21:10:52:460  6644ms
  96: 21:10:52:965  6052ms
  97: 21:10:53:470  5670ms
  98: 21:10:53:972  5143ms
  99: 21:10:54:477  4592ms
 100: 21:10:54:982  4158ms
 101: 21:10:55:486  3654ms
 102: 21:10:55:990  3155ms
 103: 21:10:56:492  2541ms
 104: 21:10:56:997  2149ms
 105: 21:10:57:501  1644ms

A new thread is spawned roughly every 500 ms to 1 second, almost 100 threads end up created (98 to be exact), and they all finish at about the same time. Access to http://localhost:5000/test/io is processed gradually, while the requests to http://localhost:5001/test/sync all arrive at about the same time.

The result is about the same even if the I/O completion thread minimum is raised to 100. Setting the worker thread minimum to 100 brings it back down to about 1 second.

At the time of access, 100 work items pile up in the thread pool queue and threads are created one by one to process them, but none of the requests seem to make I/O progress until the queue drains. I am not certain of the exact mechanism, but it appears that code blocking on Wait or Result occupies a worker thread, and until a worker thread is assigned, the asynchronous part cannot proceed. In this case, too, you would have to provision worker threads for the expected number of simultaneous requests. Since this gives the worst results, it seems better not to block synchronously on asynchronous operations.

Waiting asynchronously for I/O

So let’s wait asynchronously with await instead. The code becomes:

[HttpGet]
[Route("asyncio")]
public async Task<string> AsyncIo()
{
    var startTime = DateTimeOffset.Now;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    var response = await client.GetAsync("http://localhost:5001/test/sync");
    sw.Stop();
    this._logger.LogInformation("{0,2}: {1:D2}:{2:D2}:{3:D2}:{4:D3} {5,5}ms",  Thread.CurrentThread.ManagedThreadId, startTime.Hour, startTime.Minute, startTime.Second, startTime.Millisecond, sw.ElapsedMilliseconds);
    return "ioas";
}

Add async to the controller method, change the return type to Task<string>, and await GetAsync.

As before, I started the app with 4 worker threads and 4 I/O completion threads and hit it with 100 simultaneous requests from ab.

ab -n 100 -c 100 http://localhost:5000/test/asyncio
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        Kestrel
Server Hostname:        localhost
Server Port:            5000

Document Path:          /test/asyncio
Document Length:        4 bytes

Concurrency Level:      100
Time taken for tests:   2.206 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      13700 bytes
HTML transferred:       400 bytes
Requests per second:    45.32 [#/sec] (mean)
Time per request:       2206.289 [ms] (mean)
Time per request:       22.063 [ms] (mean, across all concurrent requests)
Transfer rate:          6.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5   2.0      4       9
Processing:  1015 1149  26.1   1156    1165
Waiting:     1015 1148  26.6   1155    1164
Total:       1023 1153  25.5   1159    1169

Percentage of the requests served within a certain time (ms)
  50%   1159
  66%   1162
  75%   1164
  80%   1164
  90%   1166
  95%   1167
  98%   1168
  99%   1169
 100%   1169 (longest request)

Even with a small number of threads, all requests were processed in about 1 second. I believe this is the correct approach for ASP.NET Core.

An article I read about I/O waits explains that no thread is consumed while waiting for an I/O result. When execution reaches await client.GetAsync, the thread is released at that point and becomes available for the next request. (The thread is not released if you block with something like Result.)

When the asynchronous wait completes, the continuation is assigned to a thread and processing resumes; since no thread is consumed during the wait itself, a small number of pool threads suffices. When the workload is dominated by I/O waits such as API calls and DB access, calling with await lets a small number of threads handle a large number of requests. This should be the default way to write such code.
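The thread handoff described above can be observed directly by logging the managed thread ID before and after an await. In a console sketch like the following (and likewise in ASP.NET Core, which has no SynchronizationContext), the two IDs often differ because the continuation resumes on whatever pool thread is free:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Example
{
    static async Task Main()
    {
        int before = Thread.CurrentThread.ManagedThreadId;
        await Task.Delay(1000); // no thread is blocked during this wait
        int after = Thread.CurrentThread.ManagedThreadId;
        // The continuation may run on a different pool thread than "before".
        Console.WriteLine($"before: {before}, after: {after}");
    }
}
```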

Conclusion

I learned about asynchronous processing in C# and .NET Core: threads, the thread pool, tasks, and async/await.
At first glance it is hard to tell what async/await is doing, but it is very useful once you understand it. Use it proactively wherever it applies.