Purpose and Goal

07 March 2005

The purpose of this document is to outline the design of the Kylimar MUD Server. It shall be refined over time as the design is developed. The goal is to design a server that can support between 10,000 and 20,000 simultaneous connections on a machine with roughly one to two gigahertz of processing power, one to two gigabytes of main memory, and a network connection that can support the required bandwidth.

References

07 March 2005

The following references have been and will continue to be useful as the design moves forward:

Terms and Definitions

08 March 2005

These are terms and definitions used in this document.

Platform

07 March 2005

The initial platform for development will be Linux with a 2.4 kernel. It is thought that upgrading to the 2.6 kernel will not present any problems. It is hoped that the architecture and implementation will make the server very portable to other platforms. However, since the initial destination platform is Linux 2.4, some decisions will be slightly tailored to that.

Decisions

07 March 2005

Throughout this project, the following libraries and APIs will be used: the Vstr string library and the epoll event API.

The Vstr library was chosen because it has excellent support for network I/O. Bundling a string library with the MUD Server also means that we are not reliant on the host system's default C string library, which may or may not be broken.

The epoll API was chosen because it demonstrates superior performance over other similar APIs as of this writing. Asynchronous I/O was avoided because it is just threading hidden by the GNU libc library. Also, it was decided that epoll should be used in an Edge Triggered way to prevent epoll from constantly returning a socket that was already being processed. However, it is still possible for an event to arrive on a socket that is already being processed. For details, see the Threading section below.
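As an illustration, here is a minimal sketch of registering a Client socket for Edge Triggered events. The function and variable names are hypothetical, not part of the design.

    #include <sys/epoll.h>

    /* A sketch: register a Client socket with an existing epoll instance
       for Edge Triggered input events.  (epoll reports EPOLLERR and
       EPOLLHUP whether or not they are requested.) */
    int register_client(int epoll_fd, int client_fd)
    {
        struct epoll_event ev;

        ev.events = EPOLLIN | EPOLLPRI | EPOLLET;
        ev.data.fd = client_fd;
        return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev);
    }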

Configuration

Data Storage

07 March 2005

In designing a data storage system for the Kylimar MUD Server, we first consider a database. It has many useful features:

However, databases have some downsides:

The downsides of a complete database system seem fairly substantial. Therefore another idea was put forth: bundle the Kylimar MUD Server with its own Data Management Backend Server. A server like this would have these benefits:

The biggest issues facing such a system would be transactions, both for data integrity across system hiccups and for locking. These issues have yet to be addressed.

Threading

07 March 2005

The Kylimar MUD Server will be multithreaded in order to handle the required number of simultaneous Client connections. However, it will not have one thread dedicated to each client connection, as that is not scalable. The system cannot handle more than a few hundred threads, fewer if the threads are very active. There will be at least three threads, configurable to many more: the worker threads, the sweeper thread, and the maintenance thread, each described below.

Of the worker threads, only one may be polling for events at a time. This is enforced by a pthread mutex. Normally, a Client connection is polled for EPOLLIN, EPOLLPRI, EPOLLERR, and EPOLLHUP, where EPOLLPRI, EPOLLERR, and EPOLLHUP indicate a connection needing to be closed. When there is data to be sent to a Client, the thread determining that there is data to be sent will call epoll_ctl and add the EPOLLOUT event, despite the fact that another thread MAY be waiting for events. This is not a problem. Once all data is sent to the Client, epoll_ctl is called again and the EPOLLOUT event is removed. Remember, we are using epoll in an Edge Triggered paradigm.
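For instance, enabling and disabling EPOLLOUT might look like the following sketch (names are illustrative; error handling is omitted):

    #include <sys/epoll.h>

    /* A sketch: when queued output appears for a Client, add EPOLLOUT to
       the set of events being waited for; once the queue drains, remove
       it again. */
    static void set_want_write(int epoll_fd, int client_fd, int want_write)
    {
        struct epoll_event ev;

        ev.events = EPOLLIN | EPOLLPRI | EPOLLET;
        if (want_write)
            ev.events |= EPOLLOUT;
        ev.data.fd = client_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_MOD, client_fd, &ev);
    }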

It is possible for events to happen in this way:

  1. Thread # 1 begins to wait for events on socket S by calling epoll_wait.
  2. A data packet arrives on socket S.
  3. Thread # 1 sees the event and stops waiting, but has not yet begun processing.
  4. Thread # 2 begins to wait for events on socket S by calling epoll_wait; it does not see the event from before even though it has not yet been processed because we are in Edge Triggered mode.
  5. A new packet arrives on socket S.
  6. Thread # 2 sees this event and stops waiting.
  7. Thread # 1 begins processing.
  8. Thread # 2 begins processing.
  9. Since thread # 1 and thread # 2 are now both processing events from socket S, albeit different events, we have a problem.

The solution to this problem is that once an event is seen, the processing thread must still acquire a lock on the Client connection before processing occurs. Once the lock is acquired, the Client connection is processed, even though the data causing the original event to be fired may have been eaten by the previous lock holder. This is why all I/O must be marked as non-blocking. If the event is noticed but the read returns EAGAIN, no real processing is done, aside from the wasted locking and unlocking of the Client connection just to determine that there is nothing to do. Hopefully, event processing will occur fast enough, and events will arrive infrequently enough, that in practice this situation hardly ever happens.
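A sketch of this per-connection locking follows. The struct client, process_input, and mark_for_close names are assumptions for illustration only.

    #include <errno.h>
    #include <pthread.h>
    #include <unistd.h>

    struct client {
        int             fd;
        pthread_mutex_t lock;
    };

    void process_input(struct client *c, const char *buf, size_t n); /* hypothetical */
    void mark_for_close(struct client *c);                           /* hypothetical */

    /* A sketch: acquire the Client connection lock before processing.  If
       a previous lock holder already consumed the data, the non-blocking
       read returns EAGAIN and nothing is done. */
    static void handle_readable(struct client *c)
    {
        char    buf[4096];
        ssize_t n;

        pthread_mutex_lock(&c->lock);
        while ((n = read(c->fd, buf, sizeof(buf))) > 0)
            process_input(c, buf, (size_t)n);
        if (n == 0 || (n < 0 && errno != EAGAIN))
            mark_for_close(c);       /* EOF or a real error */
        pthread_mutex_unlock(&c->lock);
    }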

08 March 2005

The sweeper thread handles memory deallocation. It runs occasionally, sleeping between runs. When it runs, it deallocates any memory that has not been used for some considerable time. For details about what "some considerable time" is, see the Memory Management section below. The sweeper thread will probably run once every five seconds or so (sleep(5)), but that interval will be tunable by the Administrator.
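The sweeper's main loop might be as simple as this sketch (sweep_pools and the interval handling are illustrative):

    #include <unistd.h>

    void sweep_pools(void);   /* hypothetical; see Memory Management below */

    /* A sketch: the sweeper thread sleeps for a tunable interval, then
       reclaims memory that has gone unused. */
    static void *sweeper_thread(void *arg)
    {
        unsigned int interval = *(unsigned int *)arg;   /* default 5 seconds */

        for (;;) {
            sleep(interval);
            sweep_pools();
        }
        return NULL;
    }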

Worker threads will also be involved in listening for connections. When the worker notices an event on the listening socket, it will call accept() to retrieve the new connection. It will then perform some minor initialization on the connection before adding it to the list of connections that the workers are waiting for events on.

To minimize context switches, we allow only as many threads to be doing work (and thus competing for processor time) as there are processors (or perhaps one more). This is tunable by the Administrator; we do not attempt to detect the number of processors. This is enforced by a counting semaphore. Any thread performing "work" (i.e. not sleeping or waiting for an event) must have acquired the semaphore by calling sem_wait(). When any thread has finished doing "work" and is about to sleep or wait for an event, it will release the semaphore by calling sem_post().
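A sketch of this work-gating semaphore (the names and the surrounding function are illustrative):

    #include <semaphore.h>

    /* A sketch: a counting semaphore, initialized at startup to the
       Administrator-tuned processor count, gates how many threads may do
       "work" at once:
           sem_init(&work_gate, 0, num_processors);
    */
    sem_t work_gate;

    static void do_unit_of_work(void)
    {
        sem_wait(&work_gate);    /* block until a work slot is free      */
        /* ... perform the actual work here ... */
        sem_post(&work_gate);    /* release the slot before sleeping or
                                    waiting for events again             */
    }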

09 March 2005

There will be another thread, a maintenance thread. It will run at regular intervals, serving as the heartbeat or tick of the server. The Administrator will tune the interval, but the default (and suggested) value will be thirty seconds. This thread will be devoted almost entirely to tasks specific to the application. Some example tasks might be:

Memory Management

08 March 2005

It is expensive to call the memory allocator. Therefore, we will try to avoid the allocator as much as reasonably possible. When a thread needs more memory, it will request it from a list of unused objects.

The following are object types that will be pooled and cleaned by the sweeper.

Object Type                 When Requested                                When Released
Client Connection Object    Upon initial client accept()                  Upon client close()
User Data Object            Upon first associated Client authentication   Upon last associated Client close()

String objects are not contained in the list because the Vstr library maintains its own internal pool.

Pooled objects will be contained in two lists. When an object is requested, it is first pulled off the "A" list. If none exist on the "A" list, then it is pulled off the "B" list. If none exist on the "B" list, then it is allocated from the system memory. When an object is released, it is added to the "A" list. When the sweeper runs, it creates a new "A" list, moves the old "A" list to the "B" list, and deallocates the memory associated with the old "B" list.
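A sketch of the two-list scheme follows, using a singly linked free list. The names are illustrative, and locking is omitted for brevity; the real pool would need it.

    #include <stdlib.h>

    /* Pooled objects carry a "next" pointer so that free objects can be
       chained together into the "A" and "B" lists. */
    struct pooled_object {
        struct pooled_object *next;
        /* ... object payload ... */
    };

    static struct pooled_object *list_a;   /* released since last sweep */
    static struct pooled_object *list_b;   /* survived one sweep        */

    struct pooled_object *pool_request(void)
    {
        struct pooled_object *obj;

        if (list_a != NULL) {              /* 1. try the "A" list       */
            obj = list_a;
            list_a = obj->next;
        } else if (list_b != NULL) {       /* 2. then the "B" list      */
            obj = list_b;
            list_b = obj->next;
        } else {                           /* 3. fall back to malloc    */
            obj = malloc(sizeof(*obj));
        }
        return obj;
    }

    void pool_release(struct pooled_object *obj)
    {
        obj->next = list_a;                /* released objects go to "A" */
        list_a = obj;
    }

    void sweep_pools(void)
    {
        struct pooled_object *old_b = list_b;

        list_b = list_a;                   /* old "A" becomes the new "B" */
        list_a = NULL;                     /* fresh, empty "A"            */
        while (old_b != NULL) {            /* free objects that sat
                                              through two sweeps          */
            struct pooled_object *next = old_b->next;

            free(old_b);
            old_b = next;
        }
    }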

Creating / Closing Connections

08 March 2005

Steps for handling a new connection:

  1. accept() is called and returns a new connection.
  2. Check to see if the socket is marked O_NONBLOCK. If it is not, set that flag.
  3. A new Client Connection object is requested from the pool.
  4. Initialization of the object occurs, including any initial processing that might happen on a connection (such as sending a Welcome screen).
  5. Client connection socket is added to the list of sockets that are being waited on with a call to epoll_ctl.
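A sketch of steps 1, 2, and 5 above (error handling abbreviated; names are illustrative):

    #include <fcntl.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* A sketch: accept a new connection, ensure it is non-blocking, and
       hand it to epoll in Edge Triggered mode. */
    int accept_client(int listen_fd, int epoll_fd)
    {
        struct epoll_event ev;
        int flags;
        int client_fd = accept(listen_fd, NULL, NULL);    /* step 1 */

        if (client_fd < 0)
            return -1;

        flags = fcntl(client_fd, F_GETFL, 0);             /* step 2 */
        if (!(flags & O_NONBLOCK))
            fcntl(client_fd, F_SETFL, flags | O_NONBLOCK);

        /* steps 3 and 4 (pool request, initialization) go here */

        ev.events = EPOLLIN | EPOLLPRI | EPOLLET;         /* step 5 */
        ev.data.fd = client_fd;
        return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev);
    }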

Steps for handling the termination of a connection, initiated by the server:

  1. Queue any desired outgoing goodbye message.
  2. Mark the client as ready for closing.
  3. If there is any pending output, quit processing at this point and resume after all pending output is sent. This requires a check for "close desired" after each output; if "close desired" is true and there is no pending output, the termination continues from this point. It also requires a check for "close desired" when input is received; if "close desired" is true, then no processing is done on that input -- it is thrown away. (A sketch of these checks follows this list.)
  4. close() the Client connection socket.
  5. Release the Client Connection object.
  6. If that was the last remaining authenticated Client Connection for a User Data object, save and release the User Data object.
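A sketch of the "close desired" checks described in step 3. The struct fields and helper functions are assumptions for illustration only.

    #include <stddef.h>

    struct client_state {
        int    close_desired;    /* set in step 2                     */
        size_t pending_output;   /* bytes still queued for the Client */
    };

    void finish_close(struct client_state *c);          /* hypothetical: steps 4-6 */
    void process_input_data(struct client_state *c,
                            const char *buf, size_t n); /* hypothetical */

    /* After each completed output, decide whether the deferred close can
       now proceed. */
    static void after_output_sent(struct client_state *c)
    {
        if (c->close_desired && c->pending_output == 0)
            finish_close(c);
    }

    /* Once a close is desired, arriving input is thrown away. */
    static void on_input(struct client_state *c, const char *buf, size_t n)
    {
        if (c->close_desired)
            return;
        process_input_data(c, buf, n);
    }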

Steps for handling the termination of a connection, initiated by the remote end:

  1. close() the Client connection socket.
  2. Release the Client Connection object.
  3. If that was the last remaining authenticated Client Connection for a User Data object, save and release the User Data object.

Compression

Logging

Questions still to be discussed and addressed

07 March 2005

08 March 2005

09 March 2005

Comments and Suggestions

07 March 2005

I would greatly appreciate any comments and suggestions you might have. This is definitely an ongoing project. My mailing address is ben at this domain.

Copyright

Copyright (C) 2005 Ben Love. All Rights Reserved.