
Building an Operating System, Chapter 2

Wow, you are still with me? That's pretty impressive. You made it through context switching, or more likely you skipped it in hopes of finding something a little easier to wrap your brain around? I cannot say I blame you; context switching is bloody hard.

Just to start things off nice and light, here is an example I just found of a good reason why you need context switching. This is from The IBM 7094 and CTSS:

IBM had been very generous to MIT in the fifties and sixties, donating its biggest scientific computers. When a new top of the line came out, MIT expected to get one. In the early sixties, the deal was that MIT got one 8-hour shift [on the 7094], all the other New England colleges and universities got a shift, and the third shift was available to IBM for its own use. One use IBM made of it was yacht handicapping: the president of IBM raced big yachts on Long Island Sound, and these boats were assigned handicap points by a complicated formula. There was a special job deck kept at the MIT Computation Center, and if a request came in to run it, operators were to stop whatever was running on the machine and do the yacht handicapping job immediately.

Of course an Operating System is more than just context switching, so there is still lots to learn if you want to warm your brain up on something else and come back to context switching later. So, with no beating around the bush, let's consider the C function malloc().

I can almost hear the novice programmers cringe. malloc() is a function in C that basically tells the Operating System that this program needs some space in memory; malloc() returns the address of the space the OS has given to the program. Just in case anyone is interested, here is an example of the use of malloc(). Note that the only parameter malloc() needs is the size of the memory area to be allocated. Also note that to free up the allocated memory one uses the function free().

#include <stdlib.h>	/* for malloc() and free() */

int bar = 3;
int *foo;		/* The * means that foo points to an address and is not, itself, an */
			/* integer; rather foo is a pointer - pointing to an integer. */
foo = malloc(sizeof(int));
*foo = 6;
free(foo);
foo = &bar;		/* *foo == 3 */
What of this sizeof()? sizeof is actually a C operator, not a function, and it gives you, in bytes, the size of any data type or object. For example, on a typical 32-bit system sizeof(int) would return 4.

What happens if we remove the line with the call to malloc()? Then we basically have a pointer pointing at some random location in memory, and suddenly we overwrite whatever is at that random location with the value 6. What was there before? Let's hope it's not a copy of your resume! What if we remove the call to free()? Well, if there is only one malloc() in the whole program it is not a big deal. But if we are allocating memory all over the place, it is not impossible that we will run out of memory. One should therefore use malloc() with as much caution as possible.(1)
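For the sceptical, here are those two pitfalls as a small, self-contained program. This is purely my own illustration for this page, not code from any real system; the wild write stays commented out because it is undefined behaviour, and the leak is capped so the demonstration stays polite.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Pitfall 1: forgetting the malloc(). The write below would scribble on
     * some random address (maybe your resume), so it stays commented out.
     *
     *     int *wild;
     *     *wild = 6;
     */

    /* Pitfall 2: forgetting the free(). Each pass leaks 1 MB. The loop is
     * capped at 256 MB here; remove the cap on a machine you do not care
     * about and malloc() will eventually start returning NULL. */
    int leaked_mb;
    for (leaked_mb = 0; leaked_mb < 256; leaked_mb++) {
        void *p = malloc(1024 * 1024);
        if (p == NULL) {
            printf("malloc() gave up after leaking %d MB\n", leaked_mb);
            return 1;
        }
        /* no free(p) - that is the whole point */
    }
    printf("leaked %d MB and nothing has complained... yet\n", leaked_mb);
    return 0;
}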

What about the &bar at the end of the first example? The & operator gives the compiler the address of bar, and we set foo to that address. In other words, since bar lives at some memory address, we are saying: foo, point at bar, just as if I had done a malloc().

Does this sound confusing? Not to worry: if you are building an operating system, you cannot use malloc().

All right Cole, why did you just put me through that confusing romp? Because if you are at all an experienced programmer you have probably used malloc(), or maybe you called it new. But you used something. Well, in Operating Systems you do NOT have dynamic memory. Let me say that again. You build an OS? Kiss your dynamic memory bye-bye!

Except you actually can use dynamic memory, sort of. You see, the problem with dynamic memory is not the requests: you can call malloc() ten thousand times and each call should take about as long to return a result. It is the free() or delete operation that is the real killer. The first execution might take only three or four CPU cycles, but the second call might take three million, and while 3 million clocks is not that long to an outside observer, it can be awfully long if the temperature monitor process has exceeded its threshold and has just sent a message to the coolant pump monitor, but now you are waiting for the CPU to finish sorting out a call to free() before you can prevent a meltdown in your nuclear reactor.
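The usual way around this, if a kernel really must hand out memory, is a fixed-block pool: every block is the same size, both operations are a couple of assignments, and the cost is identical on call one and call ten thousand. Here is a minimal sketch of the idea (my own illustration, not code from FEMOS or anywhere else):

#include <stddef.h>

#define BLOCK_SIZE  64          /* every allocation is exactly this big */
#define NUM_BLOCKS  128         /* fixed at build time - no heap anywhere */

static unsigned char pool[NUM_BLOCKS][BLOCK_SIZE];
static int free_list[NUM_BLOCKS];   /* index of the next free block, or -1 */
static int free_head = -1;

void pool_init(void) {
    int i;
    for (i = 0; i < NUM_BLOCKS; i++)
        free_list[i] = i + 1;
    free_list[NUM_BLOCKS - 1] = -1;
    free_head = 0;
}

/* Constant time: pop the head of the free list. */
void *pool_alloc(void) {
    int i = free_head;
    if (i == -1)
        return NULL;             /* pool exhausted - the caller must cope */
    free_head = free_list[i];
    return pool[i];
}

/* Constant time: push the block back onto the free list. */
void pool_free(void *p) {
    int i = (int)(((unsigned char *)p - &pool[0][0]) / BLOCK_SIZE);
    free_list[i] = free_head;
    free_head = i;
}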

When it comes to OS design, you want all calls to the OS to be serviced in constant time. This does not mean that you need to service a call and return to the original process in constant time; that only holds in Real Time systems. But you don't want to be fooling around in Kernel mode while your Lunar Lander is making its descent.

All right, so where in all this talk do dynamic memory, malloc() and all that stuff come into play? Well, let's consider the queue of processes that are ready to run, often called the "ready queue".


Now there are some really important points about the above diagram. First, I removed the tail pointer from the list at priority level 1; I did this to make the diagram easier to read. Second, there are no tasks at priority level n, because I was getting sick of drawing those little boxes. Finally, those boxes represent the PCB, or Task Descriptor, of tasks that are ready to run. Each task's PCB is put in the queue at the appropriate priority level. For example, suppose a task, t, with PCB P is ready to run at priority level i, and later a task, u, with PCB Q is ready to run, also at priority i. When the first task, t, is ready to run it is added to the end of the i'th list; that is, P is now pointed to by the tail pointer at priority level i. Now P, like all other PCBs, will have a next_to_run (or something similar) field. At first this field is null; then, when u is ready to run, P's next_to_run is reset to point to Q. In addition, the tail pointer is set to point to Q as well. If anyone wants to make an animated gif of the above idea I would be more than happy to put it here!
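If prose is not your thing, here is that same bookkeeping as a sketch in C. The name next_to_run comes from the text above; everything else (the struct name, the array sizes) is made up for illustration.

#define NUM_PRIORITIES  8     /* "n" in the diagram - pick whatever you like */

struct PCB {
    /* ... saved registers, state, and so on ... */
    struct PCB *next_to_run;  /* next PCB at the same priority, or NULL */
};

static struct PCB *head[NUM_PRIORITIES];   /* front of each priority's queue */
static struct PCB *tail[NUM_PRIORITIES];   /* back of each priority's queue  */

/* A task becomes ready at priority i: append its PCB to the i'th list. */
void ready_enqueue(struct PCB *p, int i) {
    p->next_to_run = NULL;
    if (head[i] == NULL)              /* queue was empty: p is now the head  */
        head[i] = p;
    else                              /* otherwise chain p behind the tail   */
        tail[i]->next_to_run = p;
    tail[i] = p;
}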

What happens when it is time to run a new task? Well, to start with, the OS will go to the above array, or ready queue, and look at priority level 0. If there are no PCBs at priority level 0, the OS will look at priority level 1, then 2, and so on up to n. Eventually, at say priority level i, where i is between 0 and n, one of the queues will not be empty. The OS will run the task whose PCB is pointed to by the head pointer at priority level i; suppose this is the PCB for the foo process. The OS will keep a pointer to foo's PCB handy and set the head pointer equal to the next_to_run in foo's PCB. There are special cases, like what if foo is the only task ready to run at a priority level, but these can all be handled without too much creativity. What if there are no tasks to run? If you are writing a standard monolithic kernel, like Linux, you would probably just keep looping around the ready queue waiting for something to change. In FEMOS, as in other micro-kernel OSs, the object is to be in kernel mode for the shortest time possible. So what you do is run an idle task, which is a task that runs in user mode and does nothing. A good idle task is provided below.

int main(void) {
   while(1) ;
}
Curiously in CS 452 most idle tasks seemed to do cool things like SETI or RSA decryption, but really the idle task I provided should be quite satisfactory.

There is one thing I should note here: if you are building a Real Time Operating System you should probably have enough priority levels that every task runs at its own unique priority level. This is actually pretty easy to do. You assert that your system will support no more than k processes, where k is usually 32 or maybe 64, sometimes less, seldom much more. Then you have k priority levels and tell your users that every new process must be created at a unique priority level. End users may feel kind of silly assigning higher priorities to some processes over others, but it actually makes sense from a theoretical point of view.

Now remember, you have no dynamic memory, so how do you generate data structures like the ready queue? Well, you have an array of every single PCB, or Task Descriptor; hence you have a Task Descriptor Table. Each next_to_run, head and tail pointer is just an integer indexing into your array. What if you run out of PCBs? I can hear the Java programmers snicker: use a vector class! Well, all you Java programmers, eat some dirt class and invoke the upset stomach method. The vector class is a cheap hack and will not work here. All the vector class does is see that you are trying to add an entry to a full array, so it simply creates a new array that is twice the size of the old and deletes the old array. So what is wrong with that? Well, let's look at that last sentence. The vector class simply creates a new array that is twice the size of the old and deletes the old array. Oops, Houston, we have a problem. So what if we run out of PCBs? Then we say too bad whenever a user tries to create a new process. This does not actually happen all that often: a general purpose OS will have support for thousands of PCBs (and remember we can reuse PCBs), and an embedded OS will only run a couple of processes, maybe a thousand at the theoretical most (really an embedded OS will probably only run 32 or 64 processes, as I said above). Remember also, no matter how many PCBs we allocate space for, we are almost certain to run out of memory before we run out of PCBs.
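To make the "no dynamic memory" point concrete, here is the ready queue from before redone with nothing but static arrays and integer indices. This is a hypothetical layout, not FEMOS's actual tables; -1 plays the role of the null pointer.

#define MAX_TASKS       64
#define NUM_PRIORITIES  8
#define NO_TASK         (-1)          /* our stand-in for a null pointer */

struct TaskDescriptor {
    int next_to_run;                  /* index of the next ready TD, or NO_TASK */
    int priority;
    /* ... everything else the Task Descriptor holds ... */
};

static struct TaskDescriptor TDTable[MAX_TASKS];   /* every TD we will ever have */
static int head[NUM_PRIORITIES];   /* index of the first ready TD at each level  */
static int tail[NUM_PRIORITIES];   /* index of the last ready TD at each level   */
                                   /* (initialise every entry to NO_TASK at boot) */

/* Scan from priority 0 down; return the index of the next task to run,
 * or NO_TASK if everything is blocked (time to run the idle task). */
int pick_next(void) {
    int pri;
    for (pri = 0; pri < NUM_PRIORITIES; pri++) {
        int td = head[pri];
        if (td != NO_TASK) {
            head[pri] = TDTable[td].next_to_run;   /* pop it off the queue */
            if (head[pri] == NO_TASK)
                tail[pri] = NO_TASK;               /* the queue is now empty */
            return td;
        }
    }
    return NO_TASK;
}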

All right, so let's consider: we have this array, and since we are building an embedded OS we might support, say, 64 processes, which is probably about 32 processes more than we really need, but I digress. So we have this struct TaskDescriptor TDTable[64]; array. What is in each Task Descriptor?

What is all this stuff for and why is there this empty space at the bottom?

Well, the empty space is there because later on I will want to add more details to this picture. ESP register is the value in the Extended Stack Pointer register when we context switch out of this process (or the default value assigned by the kernel when space was allocated for this process). SS register is the value in the Stack Segment Selector register for this process. Note that SS will never change for a process; that is, when a process is loaded it will be assigned a value for SS that never ever changes. Why? (At this point you should be able to answer my Why? on your own.) Also, because the SS register is only 16 bits across, only 16 bits need to be stored. The remaining grey space is probably unused; at least GCC will not use that space by default (2).

Under the SS register there are Process ID and Parent PID. Process ID, or PID, is a unique identifier for this process; in order for other processes to communicate with this process we need some identifier, and this is it. Parent PID tells us who created this process and is used chiefly for debugging. Priority tells us where in the ready queue this process goes; it is information we already have, kept in an easy-to-find place. Next_To_Run was explained above and tells us the next process to run at this priority level. (A true Real Time OS will not even have this field since it will always be null. Why? - You should know the answer.) Receive_Queue is needed for InterProcess Communication (IPC), which will be explained later. State is the current state of this process: is it ready to run, running, or waiting for something to happen (blocked)? There are several other states that are needed for IPC, as we will see later.
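Pulling those fields together, a Task Descriptor might look something like this. The field names and types are my guesses at a reasonable layout based on the description above, not a dump of FEMOS's actual structure.

enum task_state { READY, RUNNING, BLOCKED /* plus the IPC states, later */ };

struct TaskDescriptor {
    unsigned int    esp;            /* saved ESP at the last context switch      */
    unsigned short  ss;             /* stack segment selector - never changes    */
    int             pid;            /* unique id other tasks use to reach us     */
    int             parent_pid;     /* who created us - mostly for debugging     */
    int             priority;       /* which ready queue we live on              */
    int             next_to_run;    /* index of the next TD at this priority, or -1 */
    int             receive_queue;  /* head of the queue of would-be senders (IPC)  */
    enum task_state state;
};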

Intel Architecture: 32 bits of nastiness.

Note that those of you who are not experienced programmers will have a lot of difficulty with this section. Not to worry, you do not need to understand it all that clearly; in fact you can probably skip it and move on to Chapter 3 and still have a pretty good idea as to what is going on. If you actually do want to make an Operating System that runs on an x86, this section is crucial. If you are working on some other platform (besides x86) this section should be interesting, but not very useful.

Okay, so now we know how processes are organized, but there are two really important data structures that I had better hit on that I have so far avoided: the Global Descriptor Table (GDT) and the Interrupt Descriptor Table (IDT). Before I discuss the GDT and IDT you may want to download the Intel Processor Documents that are mentioned on the OS home page.

Now, the really important document for building an operating system is that third manual, the System Programming Guide. You will probably want to download a copy of it right away; you can ask Intel to send you a printed copy, but at 300 or so pages that is an awful waste of paper since you can see all the relevant stuff right on your computer screen, and Intel does run out of stock from time to time.

Now, in Chapter 1 I mentioned that data and code are divided into different segments. This logical division is facilitated by the processor and the operating system using a data structure called the Global Descriptor Table. The GDT is basically a large array, or table, where each entry is defined, down to the location of the last bit, on page 73 (chapter 3.4.3) of the System Programming Guide. Now take a close look at Figure 3-8 on page 74 (page 3-10); this figure represents the arrangement of the bits in each entry that makes up the GDT. Notice that there are two rows; each row is four bytes, or 32 bits, across, thus each entry is eight bytes, 64 bits, in size.

The figure titled "Typical GDT Entry" is the same as what you see in Figure 3-8, only my figure is at a higher resolution and marks off each bit.



Now there are a few things to note. One, the GDT entry is broken into two bars. Why? Because an Intel x86 is a 32-bit chip(3), so each x86 word(4) is 32 bits across; thus the 64-bit GDT entry spans two words, and it makes more sense to represent it as two bars. Now, if you look more carefully than it is worth (trust me, I got a headache drawing this thing) you should notice that both bars are divided into 32 even sections. The dividers are either the solid lines that go from top to bottom of the bar, or the little notches that mark off just a bit of the top of the bar. I made those divisions to reflect each bit. That's why the Type field, which is four bits long, has three notches - remember to count the spaces between the notches.

Now you will notice that Segment Limit and Base Address have those two numbers with a colon delimiter. What for? Well, consider Segment Limit, a 20-bit number: the lowest 16 bits (0 to 15) of the segment limit are also the lowest 16 bits of the GDT entry. The last four bits (16 to 19) are between AVL and P. Why not defragment the whole damned mess and give the base address its own word? Because that would make life easier, and the sick soul who designed this obviously did not want to make life easy for anyone!(5) I should note that there is logic in making the Base Address part of the entry - it belongs there - but I don't see why the Base Address could not be one word, the Segment Limit could be the lowest 20 bits of the next word, and the rest of the space could hold things like Type, G, D/B and so on.

Now I will try to give a simple explanation of what all the different fields are. A more detailed explanation is offered in the System Programming Guide, the Intel document, on pages 74 - 76 (3-10 through 3-12 of chapter 3.4.3); of course, I am much easier to read, I hope!

Segment Limit
This is the size of our segment; the value is 20 bits long. If the granularity flag is clear, the segment size can be from 1 byte (segment limit is all zeros) to as much as 1 megabyte (segment limit is all ones). Put another way, if the granularity flag is clear the segment limit measures in bytes. If the granularity flag is set, the segment limit measures in 4-kilobyte blocks, so the segment size varies from 4 kilobytes to 4 gigabytes. (There is a small sketch of this arithmetic right after these field descriptions.)

Base address
This field marks the start of the segment; in other words, byte zero of the data or stack or code segment is at this address. Intel recommends that this value be aligned to a 16-byte boundary; in other words, this number should divide by 16 with no remainder. This field is 32 bits long, so the segment can start anywhere in the entire 4-gigabyte address space, accurate to the byte.

Type
This four-bit field tells the system what the segment is: code, stack, or data. There are 2^4, or 16 (of course), possible values, and all of them are listed on page 77 (or 3-13), in table 3-1 of chapter 3.4.3.1. The type field works differently depending on whether the system flag is set or cleared. If the system flag is set then look at table 3-1. If the system flag is clear then you are working with a system segment in protected mode and things get really ugly really fast. Read chapter 3.5 of the Intel book for the nasty details.

System flag
See the Type field above; also see chapter 3.5 of the Intel document. Note that, as a personal convention, all flags are 1 bit.

DPL (Descriptor Privilege Level)
This two-bit field stores the task privilege level. If you are building an embedded OS everything will probably run at level zero, for a number of reasons I will not get into here. If you want to learn more about privilege levels, read chapter 4.5 on page 111 (or 4-7). You are probably best off leaving privilege levels alone if you can; they are very, very hard to get right, and when I was taking CS 452 only one group out of about 20 actually made a go of using multiple privilege levels.

P (segment Present flag)
Indicates whether the segment is present (set) or not yet loaded (clear). If the segment is not yet loaded the system will generate an exception and then you can load the segment, so you can have demand loading of segments. Chances are, if this is an embedded OS, you will set the value to 1 and load the segment right away before execution. If this is a general purpose OS you will need to learn a lot about paging first, and then this will seem redundant anyway!

D/B (Default operation size OR default stack pointer size OR upper bound flag)
Note that this flag should always be set (to 1) for 32-bit code and data segments and cleared (set to 0) for 16-bit code and data. Now, this flag does different things depending on whether it is a code segment, an expand-down data segment or a stack segment. (See the Type field for more details on how to set up code, expand-down data or stack segments.) If this is a code segment, set the flag (to 1) for 32-bit code, or, if you are silly enough to support 16-bit code, clear it (set it to 0) when you have a 16-bit program.

If this is a Stack Segment (the descriptor is pointed to by the SS register - more on that later) the flag specifies if the stack pointer is 32-bit (flag is set to 1) or 16-bit (flag is cleared to 0).

If this is an expand-down segment, then a cleared (set to 0) flag means the highest possible address is the 16-bit address 0xFFFF. A set (to 1) flag means the highest possible address is the 32-bit address 0xFFFFFFFF.

In short this flag should always be set (to 1) for 32-bit code and data segments and cleared (set to 0) for 16-bit code and data.

G (Granularity flag)
See Segment Limit above. Also see page 76 (or 3-12) of the Intel document.

Note that there is also a flag marked AVL; this is free space, and you can use it or ignore it as you see fit. Also, there is a bit which must be zero, between AVL and D/B.
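Since the Segment Limit and the G flag only make sense together, here is the arithmetic from the field descriptions above as a tiny helper (purely illustrative, the function name is mine):

/* Byte size of a segment, given its 20-bit limit and the granularity flag.
 * G clear: the limit counts bytes,       size = limit + 1          (1 B .. 1 MB)
 * G set:   the limit counts 4 KB blocks, size = (limit + 1) * 4096 (4 KB .. 4 GB)
 */
unsigned long long segment_size(unsigned long limit, int g_flag) {
    limit &= 0xfffff;                     /* only 20 bits are meaningful */
    if (g_flag)
        return ((unsigned long long)limit + 1) * 4096;
    return (unsigned long long)limit + 1;
}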

Gosh, now was that not an exciting experience? There are two last things before we move on to the Interrupt Descriptor Table: how to set and get the values in the GDT, and how the GDT interrelates to the segment selector registers and the GDTR.

In order to get and set the GDT we need to define a GDT. If you look in the file segments.h you will find the following type definition:

struct segment_descriptor {
	unsigned sd_lolimit:16;		/* segment extent (lsb) */
	unsigned sd_lobase:24;		/* segment base address (lsb) */
	unsigned sd_type:5;		/* segment type */
	unsigned sd_dpl:2;			/* segment descriptor priority level */
	unsigned sd_p:1;			/* segment descriptor present */
	unsigned sd_hilimit:4;		/* segment extent (msb) */
	unsigned sd_xx:2;			/* unused */
	unsigned sd_def32:1;		/* default 32 vs 16 bit size */
	unsigned sd_gran:1;		/* limit granularity (byte/page) */
	unsigned sd_hibase:8;		/* segment base address (msb) */
};
Now, if you look at the diagram I had above (it is also right below for comparison), things might make some sense.



Note that sd_lolimit is Segment Limit 15:00 and sd_hibase is Base 31:24. You should be able to fill in the rest in an obvious manner. A few points: the low 24 bits of the base address are split across the two words - base bits 15:00 sit in the low word and base bits 23:16 sit in the first eight bits of the high word - hence sd_lobase is 24 bits. Since Type depends on the S flag, in this definition Type and S are merged. Also, since we do not need AVL and the must-be-zero bit, those two entries are merged into sd_xx. Finally, D/B has been renamed into something a little more logical, sd_def32, since all D/B really tells us is whether this segment is for a 16-bit or 32-bit program.

Now, we are allowed to have a GDT that spans 64 kilobytes, for reasons I will explain shortly. Each entry in the GDT is 64 bits, or 8 bytes, so 64 kilobytes (= 2^16 bytes) of table divided by 8 bytes (= 2^3 bytes) per descriptor gives 2^16 / 2^3 = 2^(16-3) = 2^13 = 8192 descriptors. Except Intel requires that we leave the first descriptor blank, so that the system will General Protection Fault (GPF) if we try to dereference a NULL descriptor; thus we have 8191 usable descriptors.
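The same arithmetic, spelled out in code (the macro names are mine, not anything from gdt.c):

#define GDT_MAX_BYTES     65536                               /* 2^16: the GDT size limit      */
#define GDT_ENTRY_BYTES   8                                   /* 2^3:  one descriptor          */
#define GDT_MAX_ENTRIES   (GDT_MAX_BYTES / GDT_ENTRY_BYTES)   /* 2^13 = 8192 descriptors       */
#define GDT_USABLE        (GDT_MAX_ENTRIES - 1)               /* entry 0 must stay null: 8191  */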

Let's take a look at the file gdt.c. First of all, the first 8 entries in the GDT are used by the system loader and by the OS kernel. (Actually, the first entry is blank, as I said above; the next 7 are used by the loader and the system.) Now notice the line:

struct segment_descriptor SegDesc[NumSelectors];
This is where we define a new GDT that we, the system programmers, can access like an array. In fact there was a simple GDT created by the loader, but it is not big enough for us, so we have defined a new GDT to replace the old one. What do we do with the space used by the old one? Nothing! Unless you really need that memory - and it is only about 64 bytes - it's not worth trying to reclaim.
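How does the processor find out about the replacement table? You point the GDTR at it with the lgdt instruction. FEMOS's actual loading code is not shown here, but the idea looks roughly like this sketch (GCC inline assembly, assuming SegDesc and its size are visible; lgdt takes a packed 6-byte operand of 16-bit limit plus 32-bit base):

/* The 6-byte operand lgdt expects: 16-bit limit (size - 1), then 32-bit base. */
struct gdt_pseudo_descriptor {
    unsigned short limit;
    unsigned long  base;
} __attribute__((packed));

void load_new_gdt(void) {
    struct gdt_pseudo_descriptor gdtr;

    gdtr.limit = sizeof(SegDesc) - 1;           /* size of our table, minus one */
    gdtr.base  = (unsigned long)&SegDesc[0];    /* where our table lives        */

    __asm__ __volatile__("lgdt %0" : : "m"(gdtr));
    /* From here on, segment selectors index into SegDesc, not the loader's
     * table. (The segment registers themselves still have to be reloaded.) */
}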

Now look at the two functions, getCodeSelector(int tid) and getDataSelector(int tid). A close examination should reveal that for each user process the code segment comes first in the GDT, then comes the data segment. Note also that there are only two segments per process; the stack is shared with the general data. Things do not need to be this way, but to change things would be hard and not very useful for us. Like Linux, Insop and I did not bother to take full advantage of the GDT. Note also that the selNum variable is multiplied by 8 because the segment selector registers are bit shifted three places. Make sure you understand the one line that assigns the selNum variable. (I should note that the kernel has TID 0 and the first user process has TID 1.)
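I will not reproduce the real functions here, but given the layout just described (system descriptors first, then a code/data pair per user process, kernel at TID 0) they have to do something along these lines. This is a plausible reconstruction, not FEMOS's exact arithmetic:

#define SYSTEM_DESCRIPTORS 8    /* null entry plus the loader and kernel segments */

/* Selector for task tid's code segment: user descriptors come in (code, data)
 * pairs after the system ones, and a selector is the GDT index multiplied by 8,
 * i.e. shifted left three places past the RPL and TI bits. */
int getCodeSelector(int tid) {
    int selNum = SYSTEM_DESCRIPTORS + 2 * (tid - 1);      /* index of the code descriptor */
    return selNum << 3;
}

int getDataSelector(int tid) {
    int selNum = SYSTEM_DESCRIPTORS + 2 * (tid - 1) + 1;  /* data comes right after code */
    return selNum << 3;
}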

Now let's take a look at how to access data from a descriptor; consider the function int getBaseS( int i ). This function gets the starting address of the segment - in other words, it gets the Base Address.

int getBaseS( int i ) {
    return (SegDesc[i].sd_hibase << 24) | SegDesc[i].sd_lobase;
}
I hope you are up on your bit twiddling! Note that all we are doing here is taking the high part of the address, shifting it to the left and then bitwise ORing in the low portion of the address. The result is the full address. Any other data that is fragmented (the segment limit) can be reassembled in a similar manner.
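For example, the segment limit could be put back together the same way. This is a hypothetical companion to getBaseS, not a function taken from gdt.c:

/* Reassemble the 20-bit segment limit from its two fragments. */
int getLimitS( int i ) {
    return (SegDesc[i].sd_hilimit << 16) | SegDesc[i].sd_lolimit;
}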

To assign values to the GDT, consider the function setGDTVal().

int setGDTVal(unsigned long limit, unsigned long base, int type) {
    /* Set SegDesc[freeGDT] to have the limit, base and type given in the
     * calling parameters. Return the GDT index of the entry modified.
     */

    SegDesc[freeGDT].sd_lolimit = limit&0xffff;
    SegDesc[freeGDT].sd_lobase = base&0xffffff;
    SegDesc[freeGDT].sd_type = type;
    /* Suggested value in the course notes */
    SegDesc[freeGDT].sd_dpl = 0;            
    /* Suggested value in the course notes */
    SegDesc[freeGDT].sd_p = 1;              
    SegDesc[freeGDT].sd_hilimit = (limit >> 16)&0xf;
    /* Suggested value in the course notes */
    SegDesc[freeGDT].sd_def32 = 1;          
    /* Don't need paging - so gran is by byte */
    SegDesc[freeGDT].sd_gran = 0;           
    SegDesc[freeGDT].sd_hibase = (base >> 24)&0xff;

    freeGDT++;
    return (freeGDT - 1);
}
See how we had to separate the low part of the address from the high part? By bitwise ANDing with a mask we let through only the low-order bits; by bit shifting and then ANDing we let through only the high-order bits. Of course we also have to increase the count of used descriptors, and then the function returns the array index that this descriptor resides in.

Now, the fact is, we supported the GDT because we had to. Intel clearly documents that while you can turn paging on and off, you cannot turn the GDT off; the processor requires the GDT for context switching to work. So how does the GDT fit in? Well, first there is the GDTR (Global Descriptor Table Register; see figure 2-4 on page 48, or 2-10, for details), which points to the start of the GDT and stores things like the size of the GDT. Next there are the segment selector registers: cs, ss, ds, es, fs and gs. Now the compiler, GCC, requires that ds, es, fs and gs all be equal. I cannot remember if ss has to be equal to the other segment selectors for GCC to work, but generally most system programmers never want to use more than a few segments per process anyway. The way the x86 works is that the value in a segment selector register contains the offset into the GDT where the desired descriptor is located. Put very simply, the value in the CS register would be added to the value in the GDTR to get the location of a process's code segment descriptor. In fact it is much more complicated.

If you look at section 3.4.1 of the Intel document you can see what data is supposed to go into the segment selector registers. If you look at the function loadTask() in the file kernelMain.cc you can actually see the segment selector registers being set. What happens in loadTask()? Let's look at just the relevant sections.

cs = setGDTVal(
    (pHeader->a_text + pHeader->a_data + sizeof(struct exec) + pHeader->a_bss),
    (pEntry->offset + aAoutHeader.kernelBaseAddress), GDTTypeCode);
ds = setGDTVal(
    (pHeader->a_data + pHeader->a_bss + pEntry->stackSize),
    aKernel.getStackTop(), GDTTypeData);
...
userCS = getCodeSelector( pTask->getTid() );
userDS = getDataSelector( pTask->getTid() );
userSS = userDS;
userSP = pTask->dataTotalSize - 4;
Now, we have already seen the function setGDTVal(); we know that it takes all the values that go into the descriptor for this task, assigns them, and returns the index into the GDT array where the entry resides. We have also seen the getCodeSelector() and getDataSelector() functions; we know that both apply a simple mathematical function to the process's Task IDentifier (TID) to get the location in the GDT (already bit shifted for the segment selectors) of the desired descriptor. So what happened? Well, it turns out that assigning a value to a segment selector in FEMOS is really easy, because FEMOS does not use the Local Descriptor Table (LDT) and FEMOS does not use privilege levels. So we just take the index into the GDT, which tells us how far the descriptor we want is from the start of the table, bit shift the result, put the shifted value into the segment selector and we are done.

Each segment selector is 16 bits long and stores locations as GDT indices. Now, there are three bits in each selector that are reserved: two bits for the privilege level and one bit for determining whether the descriptor is in the LDT or the GDT. So we have 16 - 3 = 13 bits of index in each selector, which gives 2^13 = 8192 possible entries. If you look on page 71 (or 3-7) of the Intel System Programming Guide you will notice that "The processor multiplies the index value [in the segment selector] by 8". Thus 2^13 indices * 2^3 bytes per descriptor = 2^16 bytes = 64 kilobytes. In short, a GDT cannot be more than 64 kilobytes in size.
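In code, taking a 16-bit selector apart looks like this (my own illustration of the format described in section 3.4.1; the function names are made up):

/* A segment selector: | 13-bit descriptor index | TI | RPL (2 bits) | */
unsigned int selector_index(unsigned short sel) { return sel >> 3; }        /* which descriptor     */
unsigned int selector_ti(unsigned short sel)    { return (sel >> 2) & 1; }  /* 0 = GDT, 1 = LDT     */
unsigned int selector_rpl(unsigned short sel)   { return sel & 3; }         /* requested privilege  */

/* Byte offset of the descriptor from the start of the table: the "multiply
 * by 8" the Intel manual mentions is just the index shifted back up by 3. */
unsigned int selector_offset(unsigned short sel) { return (sel >> 3) << 3; }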

Now, I should add that the processor's job is not done. Look at figure 3-7 of the Intel document: it turns out that when you change a segment selector value the processor goes and caches whatever is in the GDT at that location in a hidden register; this is done to improve performance.

All the above said, if you are not using the IA-32, that is, an x86 chip, then all of this was irrelevant except for general interest. If you are working with an x86 I should point out that Linux, for example, tries to avoid segmentation as much as possible, as does FEMOS. General purpose OSs use paging, and I only hope that I have the time to explain paging later. Paging provides more flexibility than segmentation does (traditionally OS designers use both, but the segmentation is done purely in software), produces more robust code and just makes a lot more sense than this nonsense does. But if you work with x86 chips you NEED segmentation; you don't have a choice. I believe Intel made a bad design choice years ago and now we are all stuck with it.

The Interrupt Descriptor Table


1
Okay, using malloc() all over the place will not cause a system to run out of memory, at least not with swapping. But all those memory allocations still are not very good, because they force the OS to thrash, a topic beyond the scope of this web page. What if this is a Real Time OS? Then you don't swap. Don't swap? Then you really do run out of memory if you overuse malloc().

2
Actually, there are ways of forcing GCC to stick data in that space, except that on an Intel 32-bit chip, like any other 32-bit chip, there is really no point. You want all the data to be word aligned - that is, you want every new piece of data to start at a byte whose address is a multiple of four - for speed reasons. And even if we wanted to save those whole 16 bits of memory (at the cost of several CPU clocks every time we shifted data out of that particular structure), the fact is that unless we can convert some other 32-bit word in the above data structure into a 16-bit word, forget it - the C compiler will force an alignment at the end of the structure, no matter whether we use GCC or Visual C++.
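A tiny illustration of the point (hypothetical struct, any 32-bit target):

struct td_tail {
    unsigned int   esp;   /* 4 bytes */
    unsigned short ss;    /* 2 bytes - only 16 bits really needed */
};
/* sizeof(struct td_tail) is 8, not 6: the compiler pads the end so an array
 * of these stays 4-byte aligned. Packing it would save 2 bytes per entry
 * and cost you unaligned accesses. */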

3
Okay, only the 80386 and later chips are 32-bit; the 80286 was 16-bit, and if you go back far enough, the 8080 was 8-bit and the 4004 was only 4 bits. But that's being pedantic, since I am assuming we are working with at least an 80386 processor.

4
A word is the basic unit the processor executes on: roughly, an instruction together with the data that the instruction works with. For example, the assembly pseudo-code add r1, r1, r2 would be one word; the registers r1 and r2 are the data and add is the instruction.

5
More knowledgeable programmers might note that it does not really matter all that much if the data is fragmented, but my retort to them would be: do you want to twiddle bits more than you need to? Although you need to create each entry in the GDT only once, and as I will show it's not really that hard, I fail to see why things cannot be made as easy as possible. Besides, the easier the code is, the shorter it is, and the shorter the code is, the faster the execution time.