Please don't rewrite your network stack unless you can afford to dedicate a team to support it full time. Twice in my career I have been on teams where we decided to rewrite IP or TCP stacks. The justifications were different each time, though never perf. The projects were filled with lots of early confidence and successes. We shipped versions that worked, with high confidence and enthusiasm. "So much faster" and "wow, my code is a lot simpler than the kernel equivalent, I am smart!" We could rewrite core Internet protocol implementations and be better!

Our clean implementations started to get cluttered with nuances in the spec we didn't appreciate. We wasted weeks chasing implementation bugs in other network stacks that were de facto but undocumented parts of the internet's "real" spec. Accommodating these cluttered that pretty code further. In both cases, after about a year, we found ourselves wishing we had not rewritten the network stack. We started making plans to eliminate the dependency, now much more complicated because we had to transition active deployments away.

If you are Google, Facebook or another internet behemoth that is optimizing for efficiency at scale and can afford to dedicate a team to the problem, do it. But if you are a startup trying to get a product off the ground, this is premature optimization.

Also, it likely makes you and your company a bad Internet citizen to roll your own, for the reasons you've mentioned. My first company sold proxy servers, and I can't count the number of times a buggy router stack or other embedded thing broke the web for some users some of the time. If you make an off-by-one mistake in your PMTU discovery code, you're gonna waste somebody's day. If you don't deal with encapsulation right, you're going to waste someone's day. If you respond incorrectly to ICMP messages, you're going to waste someone's day. The list of things that can go wrong is endless. I kinda feel like it's comparable to rolling your own encryption.

ΔΆ million requests a second sounds amazing. Yes, most of the standards are well-written and well-defined, and you can spend a couple weeks really grokking Stevens' book (or whatever the modern equivalent is, I dunno, as I don't implement anything at that level anymore), but you're gonna spend years becoming bug-compatible with the rest of the Internet (or coming to realize your interpretation and the rest of the world's interpretation of the spec differ). As you note, if you have a team dedicated to it, cool. And, if you want to do it for fun and experience, that's also cool.

There's like a gazillion commits over its life that have fixed various networking bugs and quirks.

Select doesn't have such an inherent limit. The limit is an artifact of the way the fd_set structure and macros are defined and implemented. Most unix-like kernels, including Linux, will accept a much larger fd_set. The userspace libraries (including glibc) often even permit you to redefine FD_SETSIZE at compile time, which makes it even easier. That can cause problems when sharing fd_sets between different libraries, though, so it isn't a good idea.

IME once you get past an FD_SETSIZE of 8192, the overhead of preparing 3KB of data on every single select call begins to show appreciable CPU load. CPUs are fast enough that you'll rarely get more than a handful of pending events per call even with thousands of active connections, so the cost isn't amortized well. Nonetheless, if you use epoll without actually making use of and benefitting from persistent events (as some naive applications do, including a much-touted "high-performance" platform), select can even be faster than epoll under load. Under load, a naive application which resets events on a descriptor every time it processes that descriptor is pushing much more data through syscalls than select would.

The core functionality required is a "zero copy" networking lib: DPDK, netmap. Normally, when the kernel receives data from the network, it allocates a block in the kernel and copies the data into it. Then your read operation copies that data into your user space. The "zero copy" networking stacks avoid the data copy. The way it works, as it was explained to me, is that they use a shared memory-mapped zone. This zone is organized as a pool of blocks managed with non-blocking lists. Blocks have a fixed size big enough to hold the ~1500-byte IP packets. I never used it so I don't know the details.

When data arrives, it is written directly in place in a block of the memory-mapped zone. In user space you use select/epoll/kqueue, or polling if the waiting time is very small. Once you have a block you can process it. This block contains raw network data received from the network card. So it's up to you to code and decode the TCP/IP headers or use an existing lib like mTCP that does that for you.
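To make the block-pool description in the zero-copy comment a bit more concrete, here is a toy sketch in Python. It is not how DPDK or netmap are actually implemented. The names `BlockPool`, `producer_write`, `consumer_next` and `release`, the 2048-byte block size, and the use of a plain `deque` in place of real non-blocking lists are all assumptions made for illustration. The only point it shows is that the consumer gets a view into the shared region rather than a copy.

```python
import mmap
from collections import deque

BLOCK_SIZE = 2048   # assumption: roomy enough for a ~1500-byte packet
NUM_BLOCKS = 8      # assumption: tiny pool, just for the demo


class BlockPool:
    """Toy fixed-size block pool over one anonymous memory-mapped region."""

    def __init__(self, num_blocks=NUM_BLOCKS, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.region = mmap.mmap(-1, num_blocks * block_size)  # anonymous mapping
        self.view = memoryview(self.region)
        # Stand-ins for the non-blocking lists mentioned in the comment.
        self.free = deque(range(num_blocks))
        self.ready = deque()  # entries are (block_index, payload_length)

    def producer_write(self, payload):
        """Play the role of the NIC: write a packet in place into a free block."""
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        idx = self.free.popleft()
        start = idx * self.block_size
        self.view[start:start + len(payload)] = payload
        self.ready.append((idx, len(payload)))
        return idx

    def consumer_next(self):
        """Return (block_index, view) for the next packet, without copying it."""
        idx, length = self.ready.popleft()
        start = idx * self.block_size
        return idx, self.view[start:start + length]

    def release(self, idx):
        """Hand the block back to the pool once it has been processed."""
        self.free.append(idx)


pool = BlockPool()
written = pool.producer_write(b"raw frame bytes")
block, packet = pool.consumer_next()
```

Here `packet` is a `memoryview` slice of the mapped region, so parsing headers out of it touches the shared memory directly. Calling `pool.release(block)` afterwards makes the block reusable.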
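The 3KB figure in the select comment above can be checked with a little arithmetic: an fd_set is a bitmap with one bit per descriptor, and select takes up to three of them (read, write, except). The helper name `select_bytes_per_call` is made up for this sketch, and it gives an upper bound that assumes all three sets are passed and span the full FD_SETSIZE.

```python
def select_bytes_per_call(fd_setsize, sets=3):
    """Upper bound on fd_set bytes prepared per select call.

    Assumes one bit per descriptor and `sets` bitmaps (read/write/except).
    """
    bits_per_byte = 8
    bytes_per_set = (fd_setsize + bits_per_byte - 1) // bits_per_byte
    return sets * bytes_per_set


default_cost = select_bytes_per_call(1024)  # the common default FD_SETSIZE
large_cost = select_bytes_per_call(8192)    # the size mentioned in the comment
```

With an FD_SETSIZE of 8192 each bitmap is 1024 bytes, so three of them come to 3072 bytes, which matches the 3KB mentioned.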
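As a rough illustration of the "persistent events" point in the select/epoll comment, the sketch below registers a descriptor once and then waits on it repeatedly without re-registering. It uses Python's standard `selectors` module, which picks epoll on Linux and kqueue on the BSDs. The function name `read_rounds` and the socketpair setup are assumptions for the demo, not anything from the thread.

```python
import selectors
import socket


def read_rounds(rounds=3):
    """Register interest once, then wait for readiness several times."""
    reader, writer = socket.socketpair()
    sel = selectors.DefaultSelector()
    register_calls = 0
    received = []
    try:
        # One registration up front; it stays in effect across waits.
        sel.register(reader, selectors.EVENT_READ)
        register_calls += 1
        for i in range(rounds):
            writer.send(bytes([i]))
            for key, mask in sel.select(timeout=1.0):
                if mask & selectors.EVENT_READ:
                    received.append(key.fileobj.recv(16))
    finally:
        sel.close()
        reader.close()
        writer.close()
    return received, register_calls


received, register_calls = read_rounds()
```

The naive pattern the comment criticizes would modify or re-register the descriptor inside the loop on every pass, which on epoll means an extra syscall per processed event.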