Short Contents
**************

GNU libavl 2.0.1
Preface
1 Introduction
2 The Table ADT
3 Search Algorithms
4 Binary Search Trees
5 AVL Trees
6 Red-Black Trees
7 Threaded Binary Search Trees
8 Threaded AVL Trees
9 Threaded Red-Black Trees
10 Right-Threaded Binary Search Trees
11 Right-Threaded AVL Trees
12 Right-Threaded Red-Black Trees
13 BSTs with Parent Pointers
14 AVL Trees with Parent Pointers
15 Red-Black Trees with Parent Pointers
Appendix A References
Appendix B Supplementary Code
Appendix C Glossary
Appendix D Answers to All the Exercises
Appendix E Catalogue of Algorithms
Appendix F Index

Table of Contents
*****************

GNU libavl 2.0.1
Preface
  Acknowledgements
  Contacting the Author
1 Introduction
  1.1 Audience
  1.2 Reading the Code
  1.3 Code Conventions
  1.4 License
2 The Table ADT
  2.1 Informal Definition
  2.2 Identifiers
  2.3 Comparison Function
  2.4 Item and Copy Functions
  2.5 Memory Allocation
  2.6 Creation and Destruction
  2.7 Count
  2.8 Insertion and Deletion
  2.9 Assertions
  2.10 Traversers
    2.10.1 Constructors
    2.10.2 Manipulators
  2.11 Table Headers
  2.12 Additional Exercises
3 Search Algorithms
  3.1 Sequential Search
  3.2 Sequential Search with Sentinel
  3.3 Sequential Search of Ordered Array
  3.4 Sequential Search of Ordered Array with Sentinel
  3.5 Binary Search of Ordered Array
  3.6 Binary Search Tree in Array
  3.7 Dynamic Lists
4 Binary Search Trees
  4.1 Vocabulary
    4.1.1 Aside: Differing Definitions
  4.2 Data Types
    4.2.1 Node Structure
    4.2.2 Tree Structure
    4.2.3 Maximum Height
  4.3 Rotations
  4.4 Operations
  4.5 Creation
  4.6 Search
  4.7 Insertion
    4.7.1 Aside: Root Insertion
  4.8 Deletion
    4.8.1 Aside: Deletion by Merging
  4.9 Traversal
    4.9.1 Traversal by Recursion
    4.9.2 Traversal by Iteration
      4.9.2.1 Improving Convenience
    4.9.3 Better Iterative Traversal
      4.9.3.1 Starting at the Null Node
      4.9.3.2 Starting at the First Node
      4.9.3.3 Starting at the Last Node
      4.9.3.4 Starting at a Found Node
      4.9.3.5 Starting at an Inserted Node
      4.9.3.6 Initialization by Copying
      4.9.3.7 Advancing to the Next Node
      4.9.3.8 Backing Up to the Previous Node
      4.9.3.9 Getting the Current Item
      4.9.3.10 Replacing the Current Item
  4.10 Copying
    4.10.1 Recursive Copying
    4.10.2 Iterative Copying
    4.10.3 Error Handling
  4.11 Destruction
    4.11.1 Destruction by Rotation
    4.11.2 Aside: Recursive Destruction
    4.11.3 Aside: Iterative Destruction
  4.12 Balance
    4.12.1 From Tree to Vine
    4.12.2 From Vine to Balanced Tree
      4.12.2.1 General Trees
      4.12.2.2 Implementation
      4.12.2.3 Implementing Compression
  4.13 Aside: Joining BSTs
  4.14 Testing
    4.14.1 Testing BSTs
      4.14.1.1 BST Verification
      4.14.1.2 Displaying BST Structures
    4.14.2 Test Set Generation
    4.14.3 Testing Overflow
    4.14.4 Memory Manager
    4.14.5 User Interaction
    4.14.6 Utility Functions
    4.14.7 Main Program
  4.15 Additional Exercises
5 AVL Trees
  5.1 Balancing Rule
    5.1.1 Analysis
  5.2 Data Types
  5.3 Operations
  5.4 Insertion
    5.4.1 Step 1: Search
    5.4.2 Step 2: Insert
    5.4.3 Step 3: Update Balance Factors
    5.4.4 Step 4: Rebalance
    5.4.5 Symmetric Case
    5.4.6 Example
    5.4.7 Aside: Recursive Insertion
  5.5 Deletion
    5.5.1 Step 1: Search
    5.5.2 Step 2: Delete
    5.5.3 Step 3: Update Balance Factors
    5.5.4 Step 4: Rebalance
    5.5.5 Step 5: Finish Up
    5.5.6 Symmetric Case
  5.6 Traversal
  5.7 Copying
  5.8 Testing
6 Red-Black Trees
  6.1 Balancing Rule
    6.1.1 Analysis
  6.2 Data Types
  6.3 Operations
  6.4 Insertion
    6.4.1 Step 1: Search
    6.4.2 Step 2: Insert
    6.4.3 Step 3: Rebalance
    6.4.4 Symmetric Case
    6.4.5 Aside: Initial Black Insertion
      6.4.5.1 Symmetric Case
  6.5 Deletion
    6.5.1 Step 2: Delete
    6.5.2 Step 3: Rebalance
    6.5.3 Step 4: Finish Up
    6.5.4 Symmetric Case
  6.6 Testing
7 Threaded Binary Search Trees
  7.1 Threads
  7.2 Data Types
  7.3 Operations
  7.4 Creation
  7.5 Search
  7.6 Insertion
  7.7 Deletion
  7.8 Traversal
    7.8.1 Starting at the Null Node
    7.8.2 Starting at the First Node
    7.8.3 Starting at the Last Node
    7.8.4 Starting at a Found Node
    7.8.5 Starting at an Inserted Node
    7.8.6 Initialization by Copying
    7.8.7 Advancing to the Next Node
    7.8.8 Backing Up to the Previous Node
  7.9 Copying
  7.10 Destruction
  7.11 Balance
    7.11.1 From Tree to Vine
    7.11.2 From Vine to Balanced Tree
  7.12 Testing
8 Threaded AVL Trees
  8.1 Data Types
  8.2 Rotations
  8.3 Operations
  8.4 Insertion
    8.4.1 Steps 1 and 2: Search and Insert
    8.4.2 Step 4: Rebalance
    8.4.3 Symmetric Case
  8.5 Deletion
    8.5.1 Step 1: Search
    8.5.2 Step 2: Delete
    8.5.3 Step 3: Update Balance Factors
    8.5.4 Step 4: Rebalance
    8.5.5 Symmetric Case
    8.5.6 Finding the Parent of a Node
  8.6 Copying
  8.7 Testing
9 Threaded Red-Black Trees
  9.1 Data Types
  9.2 Operations
  9.3 Insertion
    9.3.1 Steps 1 and 2: Search and Insert
    9.3.2 Step 3: Rebalance
    9.3.3 Symmetric Case
  9.4 Deletion
    9.4.1 Step 1: Search
    9.4.2 Step 2: Delete
    9.4.3 Step 3: Rebalance
    9.4.4 Step 4: Finish Up
    9.4.5 Symmetric Case
  9.5 Testing
10 Right-Threaded Binary Search Trees
  10.1 Data Types
  10.2 Operations
  10.3 Search
  10.4 Insertion
  10.5 Deletion
    10.5.1 Right-Looking Deletion
    10.5.2 Left-Looking Deletion
    10.5.3 Aside: Comparison of Deletion Algorithms
  10.6 Traversal
    10.6.1 Starting at the First Node
    10.6.2 Starting at the Last Node
    10.6.3 Starting at a Found Node
    10.6.4 Advancing to the Next Node
    10.6.5 Backing Up to the Previous Node
  10.7 Copying
  10.8 Destruction
  10.9 Balance
  10.10 Testing
11 Right-Threaded AVL Trees
  11.1 Data Types
  11.2 Operations
  11.3 Rotations
  11.4 Insertion
    11.4.1 Steps 1-2: Search and Insert
    11.4.2 Step 4: Rebalance
  11.5 Deletion
    11.5.1 Step 1: Search
    11.5.2 Step 2: Delete
    11.5.3 Step 3: Update Balance Factors
    11.5.4 Step 4: Rebalance
  11.6 Copying
  11.7 Testing
12 Right-Threaded Red-Black Trees
  12.1 Data Types
  12.2 Operations
  12.3 Insertion
    12.3.1 Steps 1 and 2: Search and Insert
    12.3.2 Step 3: Rebalance
  12.4 Deletion
    12.4.1 Step 2: Delete
    12.4.2 Step 3: Rebalance
    12.4.3 Step 4: Finish Up
  12.5 Testing
13 BSTs with Parent Pointers
  13.1 Data Types
  13.2 Operations
  13.3 Insertion
  13.4 Deletion
  13.5 Traversal
    13.5.1 Starting at the First Node
    13.5.2 Starting at the Last Node
    13.5.3 Starting at a Found Node
    13.5.4 Starting at an Inserted Node
    13.5.5 Advancing to the Next Node
    13.5.6 Backing Up to the Previous Node
  13.6 Copying
  13.7 Balance
  13.8 Testing
14 AVL Trees with Parent Pointers
  14.1 Data Types
  14.2 Rotations
  14.3 Operations
  14.4 Insertion
    14.4.1 Steps 1 and 2: Search and Insert
    14.4.2 Step 3: Update Balance Factors
    14.4.3 Step 4: Rebalance
    14.4.4 Symmetric Case
  14.5 Deletion
    14.5.1 Step 2: Delete
    14.5.2 Step 3: Update Balance Factors
    14.5.3 Step 4: Rebalance
    14.5.4 Symmetric Case
  14.6 Traversal
  14.7 Copying
  14.8 Testing
15 Red-Black Trees with Parent Pointers
  15.1 Data Types
  15.2 Operations
  15.3 Insertion
    15.3.1 Step 2: Insert
    15.3.2 Step 3: Rebalance
    15.3.3 Symmetric Case
  15.4 Deletion
    15.4.1 Step 2: Delete
    15.4.2 Step 3: Rebalance
    15.4.3 Step 4: Finish Up
    15.4.4 Symmetric Case
  15.5 Testing
Appendix A References
Appendix B Supplementary Code
  B.1 Option Parser
  B.2 Command-Line Parser
Appendix C Glossary
Appendix D Answers to All the Exercises
  Chapter 2
  Chapter 3
  Chapter 4
  Chapter 5
  Chapter 6
  Chapter 7
  Chapter 8
  Chapter 9
  Chapter 10
  Chapter 11
  Chapter 13
  Chapter 14
Appendix E Catalogue of Algorithms
  Binary Search Tree Algorithms
  AVL Tree Algorithms
  Red-Black Tree Algorithms
  Threaded Binary Search Tree Algorithms
  Threaded AVL Tree Algorithms
  Threaded Red-Black Tree Algorithms
  Right-Threaded Binary Search Tree Algorithms
  Right-Threaded AVL Tree Algorithms
  Right-Threaded Red-Black Tree Algorithms
  Binary Search Tree with Parent Pointers Algorithms
  AVL Tree with Parent Pointers Algorithms
  Red-Black Tree with Parent Pointers Algorithms
Appendix F Index

GNU libavl 2.0.1
****************

Preface
*******

Early in 1998, I wanted an AVL tree library for use in writing GNU PSPP.  At the time, few of these were available on the Internet.  Those that were had licenses that were not entirely satisfactory for inclusion in GNU software.  I resolved to write my own.  I sat down with Knuth's `The Art of Computer Programming' and did so.  The result was the earliest version of libavl.
As I wrote it, I learned valuable lessons about implementing algorithms for binary search trees, and covered many notebook pages with scribbled diagrams.  Later, I decided that what I really wanted was a similar library for threaded AVL trees, so I added an implementation to libavl.  Along the way, I ended up having to relearn many of the lessons I'd already painstakingly uncovered in my earlier work.  Even later, I had much the same experience in writing code for right-threaded AVL trees and red-black trees, which was done as much for my own education as from any intention of using the code in real software.

In late 1999, I contributed a chapter on binary search trees and balanced trees to a book on programming in C.  This again required a good deal of duplication of effort as I rediscovered old techniques.  By now I was beginning to see the pattern, so I decided to document once and for all the algorithms I had chosen and the tradeoffs I had made.  Along the way, the project expanded in scope several times.  You are looking at the results.

I hope you find this book as useful for reading and reference as I found writing it enjoyable.  As I wrote later chapters, I referred less and less to my other reference books and more and more to my own earlier chapters, so I already know that it can come in handy for me.  (On the other hand, GNU PSPP, the program that started off the whole saga, has long been neglected, and development may never resume.  It would need to be rewritten from the top anyhow.)

Please feel free to copy and distribute this book, in accordance with the license agreement.  If you make multiple printed copies, consider contacting me by email first to check whether there are any late-breaking corrections or new editions in the pipeline.

Acknowledgements
================

libavl has grown into its current state over a period of years.  During that time, many people have contributed advice, bug reports, and occasional code fragments.
I have attempted to acknowledge each of these people individually, along with their contributions, in the `NEWS' and `ChangeLog' files included with the libavl source distribution.  Without their help, libavl would not be what it is today.  If you believe that you should be listed in one of these files, but are not, please contact me.

Many people have contributed indirectly by providing computer science background and software infrastructure, without which libavl would not have been possible at all.  For a partial list, please see `THANKS' in the libavl source distribution.

Special thanks are due to Erik Goodman of the A. H. Case Center for Computer-Aided Engineering and Manufacturing at Michigan State University for making it possible for me to receive MSU honors credit for rewriting libavl as a literate program, and to Dann Corbit for his invaluable suggestions during development.

Contacting the Author
=====================

libavl, including this book, the source code, the TexiWEB software, and related programs, was written by Ben Pfaff, who welcomes your feedback.  Please send libavl-related correspondence, including bug reports and suggestions for improvement, to him at .

Ben received his B.S. in electrical engineering from Michigan State University in May 2001.  He is now studying for a Ph.D. in computer science at Stanford University as a Stanford Graduate Fellow.  Ben's personal webpage is at `http://benpfaff.org/', where you can find a list of his current projects, including the status of libavl test releases.  You can also find him hanging out in the Internet newsgroup comp.lang.c.

1 Introduction
**************

libavl is a library in ANSI C for manipulation of various types of binary trees.  This book provides an introduction to binary tree techniques and presents all of libavl's source code, along with annotations and exercises for the reader.
It also includes practical information on how to use libavl in your programs, and discussion of the larger issues of how to choose efficient data structures and libraries.  The book concludes with suggestions for further reading, answers to all the exercises, a glossary, and an index.

1.1 Audience
============

This book is intended both for novices interested in finding out about binary search trees and for practicing programmers looking for a cookbook of algorithms.  It has several features that will be appreciated by both groups:

   * Tested code: With the exception of code presented as counterexamples, which is clearly marked, all code presented has been tested.  Most code comes with a working program for testing or demonstrating it.

   * No pseudo-code: Pseudo-code can be confusing, so it is not used.

   * Motivation: An important goal is to demonstrate general methods for programming, not just the particular algorithms being examined.  As a result, the rationale for design choices is explained carefully.

   * Exercises and answers: To clarify issues raised within the text, many sections conclude with exercises.  All exercises come with complete answers in an appendix at the back of the book.  Some exercises are marked with one or more stars (*).  Exercises without stars are recommended for all readers, but starred exercises deal with particularly obscure topics or make reference to topics covered later.  Experienced programmers should find the exercises particularly interesting, because many of them present alternatives to choices made in the main text.

   * Asides: Occasionally a section is marked as an "aside".  Like exercises, asides often highlight alternatives to techniques in the main text, but asides are more extensive than most exercises.  Asides are not essential to comprehension of the main text, so readers not interested in them may safely skip ahead to the following section.
   * Minimal C knowledge assumed: Basic familiarity with the C language is assumed, but obscure constructions are briefly explained the first time they occur.  Readers who wish for a review of C language features before beginning should consult [Summit 1999].  This is especially recommended for novices who feel uncomfortable with pointer and array concepts.

   * References: When appropriate, other texts that cover the same or related material are referenced at the end of sections.

   * Glossary: Terms are "emphasized" and defined the first time they are used.  Definitions for these terms and more are collected into a glossary at the back of the book.

   * Catalogue of algorithms: *Note Catalogue of Algorithms::, for a handy list of all the algorithms implemented in this book.

1.2 Reading the Code
====================

This book contains all the source code to libavl.  Conversely, much of the source code presented in this book is part of libavl.

libavl is written in ANSI/ISO C89 using TexiWEB, a "literate programming" system.  Literate programming is a philosophy that regards software as a kind of literature.  The ideas behind literate programming have been around for a long time, but the term itself was invented in 1984 by computer scientist Donald Knuth, who wrote two of his most famous programs (TeX and METAFONT) with a literate programming system of his own design.  That system, called WEB, inspired the form and much of the syntax of TexiWEB.

A TexiWEB document is a C program that has been cut into sections, rearranged, and annotated, with the goal of making the program as a whole as comprehensible as possible to a reader who starts at the beginning and reads the entire program in order.  Of course, understanding large, complex programs can never be trivial, but TexiWEB tries to make it as easy as possible.

Each section of a TexiWEB program is assigned both a number and a name.
Section numbers are assigned sequentially, starting from 1 with the first section, and they are used for cross-references between sections.  Section names are words or phrases assigned by the TexiWEB program's author to describe the role of the section's code.

Here's a sample TexiWEB section:

19. <Clear hash table entries 19> =
  for (i = 0; i < hash->m; i++)
    hash->entry[i] = NULL;

This code is included in 15.

The first line of a section, as shown here, gives the section's name and its number within angle brackets.  The section number is also given at the left margin to make individual sections easy to find.

Code segments often contain references to other code segments, shown as a section name and number within angle brackets.  These act something like macros, in that they stand for the corresponding replacement text.  For instance, consider the following segment:

15. <Initialize hash table 15> =
  hash->m = 13;
  <Clear hash table entries 19>

See also 16.

This means that the code for `Clear hash table entries' should be inserted as part of `Initialize hash table'.  Because the name of a section explains what it does, it's often unnecessary to know anything more.  If you do want more detail, the section number 19 in <Clear hash table entries 19> can easily be used to find the full text and annotations for `Clear hash table entries'.  At the bottom of section 19 you will find a note reading `This code is included in 15.', making it easy to move back to section 15, which includes it.

There's also a note following the code in the section above: `See also 16.'.  This demonstrates how TexiWEB handles multiple sections that have the same name.  When a name that corresponds to multiple sections is referenced, code from all the sections with that name is substituted, in order of appearance.  The first section with the name ends with a note listing the numbers of all other same-named sections.  Later sections show their own numbers in the left margin, but the number of the first section within angle brackets, to make the first section easy to find.  For example, here's another line of code for <Initialize hash table 15>:

16.
+=
  hash->n = 0;

Code segment references have one more feature: the ability to perform special macro replacements within the referenced code.  These replacements are made on all words within the referenced code segment, and recursively within the code segments that it references, and so on.  Word prefixes as well as full words are replaced, even within comments in the referenced code.  Replacements take place regardless of case, and the case of the replacement mirrors the case of the replaced text.  This odd-sounding feature is useful for adapting a section of code written for one library, having a particular identifier prefix, for use in a different library with another identifier prefix.  For instance, the reference `<BST types; bst => avl>' inserts the contents of the segment named `BST types', replacing `bst' by `avl' wherever the former appears at the beginning of a word.

When a TexiWEB program is converted to C, conversion conceptually begins from sections named for files; e.g., <`foo.c' 37>.  Within these sections, all section references are expanded, then references within those sections are expanded, and so on.  When expansion is complete, the specified files are written out.

A final resource in reading a TexiWEB program is the index, which contains an entry for the point of declaration of every section name, function, type, structure, union, global variable, and macro.  Declarations within functions are not indexed.

See also: [Knuth 1992], "How to read a WEB".

1.3 Code Conventions
====================

Where possible, the libavl source code complies with the requirements imposed by ANSI/ISO C89 and C99.  Features present only in C99 are not used.  In addition, most of the GNU Coding Standards are followed.  Indentation style is an exception to the latter: in print, to conserve vertical space, K&R indentation style is used instead of GNU style.

See also: [ISO 1990]; [ISO 1999]; [FSF 2001], "Writing C".
1.4 License
===========

This book, including the code in it, is subject to the following license:

1. =
  /* GNU libavl - library for manipulation of binary trees.
     Copyright (C) 1998--2002 Free Software Foundation, Inc.

     This program is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published by the
     Free Software Foundation; either version 2 of the License, or (at your
     option) any later version.

     This program is distributed in the hope that it will be useful, but
     WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
     General Public License for more details.

     You should have received a copy of the GNU General Public License along
     with this program; if not, write to the Free Software Foundation, Inc.,
     59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

     The author may be contacted at on the Internet, or write to Ben Pfaff,
     Stanford University, Computer Science Dept., 353 Serra Mall, Stanford
     CA 94305, USA. */

This code is included in 24, 25, 97, 98, 99, 142, 143, 186, 192, 193, 238, 247, 248, 290, 297, 298, 330, 333, 334, 368, 372, 373, 411, 415, 416, 449, 452, 453, 482, 486, 487, 515, 519, 520, 548, 551, 552, 583, 595, 599, 617, and 649.

2 The Table ADT
***************

Most of the chapters in this book implement a table structure as some kind of binary tree, so it is important to understand what a table is before we begin.  That is this chapter's purpose.

This chapter begins with a brief definition of the meaning of "table" for the purposes of this book, then moves on to describe in a more formal way the interface of a table used by all of the tables in this book.  The next chapter motivates the basic idea of a binary tree, starting from simple, everyday concepts.  Experienced programmers may skip these chapters after skimming through the definitions below.
2.1 Informal Definition
=======================

If you've written even a few programs, you've probably noticed the necessity for searchable collections of data.  Compilers search their symbol tables for identifiers, and network servers often search tables to match up data with users.  Many applications with graphical user interfaces deal with mouse and keyboard activity by searching a table of possible actions.  In fact, just about every nontrivial program, regardless of application domain, needs to maintain and search tables of some kind.

In this book, the term "table" does not refer to any particular data structure.  Rather, it is the name for an abstract data type, or ADT, defined in terms of the operations that can be performed on it.  A table ADT can be implemented in any number of ways.  Later chapters will show how to implement tables in terms of various binary tree data structures.

The purpose of a table is to keep track of a collection of items, all of the same type.  Items can be inserted into and deleted from a table, with no arbitrary limit on the number of items in the table.  We can also search a table for items that match a given item.

Other operations are supported, too.  Traversal is the most important of these: all of the items in a table can be visited, in sorted order from smallest to largest, or from largest to smallest.  Traversals can also start from an item in the middle, or from a newly inserted item, and move in either direction.

The data in a table may be of any C type, but all the items in a table must be of the same type.  Structure types are common.  Often, only part of each data item is used in item lookup, with the rest used for storage of auxiliary information.  A table that contains two-part data items like this is called a "dictionary" or an "associative array".  The part of the table data used for lookup, whether the table is a dictionary or not, is the "key".  In a dictionary, the remainder is the "value".

Our tables cannot contain duplicates.
An attempt to insert an item into a table that already contains a matching item will fail.

Exercises:

1. Suggest a way to simulate the ability to insert duplicate items in a table.

2.2 Identifiers
===============

In C programming it is necessary to be careful if we expect to avoid clashes between our own names and those used by others.  Any identifiers that we pick might also be used by others.  The usual solution is to adopt a prefix that is applied to the beginning of every identifier that can be visible in code outside a single source file.  In particular, most identifiers in a library's public header files must be prefixed.

libavl is a collection of mostly independent modules, each of which implements the table ADT.  Each module has its own, different identifier prefix.  Identifiers that begin with this prefix are reserved for any use in source files that #include the module header file.  Also reserved (for use as macro names) are identifiers that begin with the all-uppercase version of the prefix.  Both sets of identifiers are also reserved as external names(1) throughout any program that uses the module.

In addition, all identifiers that begin with libavl_ or LIBAVL_ are reserved for any use in source files that #include any libavl module.  Likewise, these identifiers are reserved as external names in any program that uses any libavl module.  This is primarily to allow for future expansion, but see *Note Memory Allocation:: and Exercise 2.5-1 for a sample use.

The prefix used in code samples in this chapter is tbl_, short for "table".  This can be considered a generic substitute for the prefix used by any of the table implementations.  All of the statements about these functions here apply equally to all of the table implementations in later chapters, except that the tbl_ prefix must be replaced by the prefix used by the chapter's table implementation.

Exercises:

1. The following kinds of identifiers are among those that might appear in a header file.
Which of them can safely appear unprefixed?  Why?

  a. Parameter names within function prototypes.
  b. Macro parameter names.
  c. Structure and union tags.
  d. Structure and union member names.

2. Suppose that we create a module for reporting errors.  Why is err_ a poorly chosen prefix for the module's identifiers?

---------- Footnotes ----------

(1) External names are identifiers visible outside a single source file.  These are, mainly, non-static functions and variables declared outside a function.

2.3 Comparison Function
=======================

The C language provides the void * generic pointer for dealing with data of unknown type.  We will use this type to allow our tables to contain a wide range of data types.  This flexibility does keep the table from working directly with its data.  Instead, the table's user must provide means to operate on data items.  This section describes the user-provided functions for comparing items, and the next section describes two other kinds of user-provided functions.

There is more than one kind of generic algorithm for searching.  We can search by comparison of keys, by digital properties of the keys, or by computing a function of the keys.  In this book, we are only interested in the first possibility, so we need a way to compare data items.  This is done with a user-provided function compatible with tbl_comparison_func, declared as follows:

2. =
  /* Function types. */
  typedef int tbl_comparison_func (const void *tbl_a, const void *tbl_b,
                                   void *tbl_param);

See also 4.

This code is included in 14.

A comparison function takes two pointers to data items, here called a and b, and compares their keys.  It returns a negative value if a < b, zero if a == b, or a positive value if a > b.  It takes a third parameter, here called param, which is user-provided.

A comparison function must work more or less like an arithmetic comparison within the domain of the data.
This could be alphabetical ordering for strings, a set of nested sort orders (e.g., sort first by last name, with duplicates ordered by first name), or any other comparison function that behaves in a "natural" way.  A comparison function in the exact class of those acceptable is called a "strict weak ordering", for which the exact rules are explained in Exercise 5.

Here's a function that can be used as a comparison function for the case that the void * pointers point to single ints:

3. =
  /* Comparison function for pointers to ints.
     param is not used. */
  int
  compare_ints (const void *pa, const void *pb, void *param)
  {
    const int *a = pa;
    const int *b = pb;

    if (*a < *b)
      return -1;
    else if (*a > *b)
      return +1;
    else
      return 0;
  }

This code is included in 134.

Here's another comparison function, for data items that point to ordinary C strings:

  /* Comparison function for strings.
     param is not used. */
  int
  compare_strings (const void *pa, const void *pb, void *param)
  {
    return strcmp (pa, pb);
  }

See also: [FSF 1999], node "Defining the Comparison Function"; [ISO 1998], section 25.3, "Sorting and related operations"; [SGI 1993], section "Strict Weak Ordering".

Exercises:

1. In C, integers may be cast to pointers, including void *, and vice versa.  Explain why it is not a good idea to use an integer cast to void * as a data item.  When would such a technique be acceptable?

2. When would the following be an acceptable alternate definition for compare_ints()?

  int
  compare_ints (const void *pa, const void *pb, void *param)
  {
    return *((int *) pa) - *((int *) pb);
  }

3. Could strcmp(), suitably cast, be used in place of compare_strings()?

4. Write a comparison function for data items that, in any particular table, are character arrays of fixed length.  Among different tables, the length may differ, so the third parameter to the function points to a size_t specifying the length for a given table.

*5.
For a comparison function f() to be a strict weak ordering, the following must hold for all possible data items a, b, and c:

   * _Irreflexivity:_ For every a, f(a, a) == 0.

   * _Antisymmetry:_ If f(a, b) > 0, then f(b, a) < 0.

   * _Transitivity:_ If f(a, b) > 0 and f(b, c) > 0, then f(a, c) > 0.

   * _Transitivity of equivalence:_ If f(a, b) == 0 and f(b, c) == 0, then f(a, c) == 0.

Consider the following questions that explore the definition of a strict weak ordering.

  a. Explain how compare_ints() above satisfies each point of the definition.

  b. Can the standard C library function strcmp() be used for a strict weak ordering?

  c. Propose an irreflexive, antisymmetric, transitive function that lacks transitivity of equivalence.

*6. libavl uses a ternary comparison function that returns a negative value for <, zero for ==, positive for >.  Other libraries use binary comparison functions that return nonzero for < or zero for >=.  Consider these questions about the differences:

  a. Write a C expression, in terms of a binary comparison function f() and two items a and b, that is nonzero if and only if a == b as defined by f().  Write a similar expression for a > b.

  b. Write a binary comparison function "wrapper" for a libavl comparison function.

  c. Rewrite bst_find() based on a binary comparison function.  (You can use the wrapper from above to simulate a binary comparison function.)

2.4 Item and Copy Functions
===========================

Besides tbl_comparison_func, there are two kinds of functions used in libavl to manipulate item data:

4.
+=
  typedef void tbl_item_func (void *tbl_item, void *tbl_param);
  typedef void *tbl_copy_func (void *tbl_item, void *tbl_param);

Both of these function types receive a table item as their first argument tbl_item and the tbl_param associated with the table as their second argument.  This tbl_param is the same one passed as the third argument to tbl_comparison_func.  libavl will never pass a null pointer as tbl_item to either kind of function.

A tbl_item_func performs some kind of action on tbl_item.  The particular action that it should perform depends on the context in which it is used and the needs of the calling program.

A tbl_copy_func creates and returns a new copy of tbl_item.  If copying fails, then it returns a null pointer.

2.5 Memory Allocation
=====================

The standard C library functions malloc() and free() are the usual way to obtain and release memory for dynamic data structures like tables.  Most users will be satisfied if libavl uses these routines for memory management.  On the other hand, some users will want to supply their own methods for allocating and freeing memory, perhaps even different methods from table to table.  For these users' benefit, each table is associated with a memory allocator, which provides functions for memory allocation and deallocation.  This allocator has the same form in each table implementation.  It looks like this:

5. =
  #ifndef LIBAVL_ALLOCATOR
  #define LIBAVL_ALLOCATOR
  /* Memory allocator. */
  struct libavl_allocator
    {
      void *(*libavl_malloc) (struct libavl_allocator *, size_t libavl_size);
      void (*libavl_free) (struct libavl_allocator *, void *libavl_block);
    };
  #endif

This code is included in 14, 99, and 649.

Members of struct libavl_allocator have the same interfaces as the like-named standard C library functions, except that each is additionally passed a pointer to the struct libavl_allocator itself as its first argument.
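As a concrete illustration of this interface, here is a minimal sketch of a custom allocator that counts blocks allocated but not yet freed, which a test harness could use to check for leaks.  The names count_malloc, count_free, and count_allocator, and the counting behavior itself, are invented for this example; they are not part of libavl.

```c
#include <stdlib.h>

/* The allocator interface shown above, repeated so that this
   example is self-contained. */
#ifndef LIBAVL_ALLOCATOR
#define LIBAVL_ALLOCATOR
/* Memory allocator. */
struct libavl_allocator
  {
    void *(*libavl_malloc) (struct libavl_allocator *, size_t libavl_size);
    void (*libavl_free) (struct libavl_allocator *, void *libavl_block);
  };
#endif

/* Number of blocks allocated but not yet freed through this allocator.
   (Hypothetical: used here only to demonstrate a custom allocator.) */
static size_t count_outstanding;

/* Allocates size bytes with malloc() and counts the new block. */
static void *
count_malloc (struct libavl_allocator *allocator, size_t size)
{
  void *block = malloc (size);
  (void) allocator;
  if (block != NULL)
    count_outstanding++;
  return block;
}

/* Frees block and uncounts it. */
static void
count_free (struct libavl_allocator *allocator, void *block)
{
  (void) allocator;
  count_outstanding--;
  free (block);
}

/* Counting allocator, usable wherever a table creation function
   accepts a struct libavl_allocator *. */
static struct libavl_allocator count_allocator = {count_malloc, count_free};
```

A table created with &count_allocator as its allocator argument would route all of its node allocations through these functions; after the table is destroyed, count_outstanding should return to its previous value.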
The table implementations never call tbl_malloc() with a zero size or tbl_free() with a null pointer block. The struct libavl_allocator type is shared between all of libavl's modules, so its name begins with libavl_, not with the specific module prefix that we've been representing generically here as tbl_. This makes it possible for a program to use a single allocator with multiple libavl table modules, without the need to declare instances of different structures. The default allocator is just a wrapper around malloc() and free(). Here it is:

6. =
/* Allocates size bytes of space using malloc().
   Returns a null pointer if allocation fails. */
void *
tbl_malloc (struct libavl_allocator *allocator, size_t size)
{
  assert (allocator != NULL && size > 0);
  return malloc (size);
}

/* Frees block. */
void
tbl_free (struct libavl_allocator *allocator, void *block)
{
  assert (allocator != NULL && block != NULL);
  free (block);
}

/* Default memory allocator that uses malloc() and free(). */
struct libavl_allocator tbl_allocator_default =
  {
    tbl_malloc,
    tbl_free
  };

This code is included in 29, 145, 196, 251, 300, 336, 375, 418, 455, 489, 522, 554, and 649.

The default allocator comes along with header file declarations:

7. =
/* Default memory allocator. */
extern struct libavl_allocator tbl_allocator_default;
void *tbl_malloc (struct libavl_allocator *, size_t);
void tbl_free (struct libavl_allocator *, void *);

This code is included in 14 and 649.

See also: [FSF 1999], nodes "Malloc Examples" and "Changing Block Size".

Exercises:
1. This structure is named with a libavl_ prefix because it is shared among all of libavl's modules. Other types are shared among libavl modules, too, such as tbl_item_func. Why don't the names of these other types also begin with libavl_?
2. Supply an alternate allocator, still using malloc() and free(), that prints an error message to stderr and aborts program execution when memory allocation fails.
*3.
Some kinds of allocators may need additional arguments. For instance, if memory for each table is taken from a separate Apache-style "memory pool", then a pointer to the pool structure is needed. Show how this can be done without modifying existing types. 2.6 Creation and Destruction ============================ This section describes the functions that create and destroy tables. 8.
=
/* Table functions. */
struct tbl_table *tbl_create (tbl_comparison_func *, void *,
                              struct libavl_allocator *);
struct tbl_table *tbl_copy (const struct tbl_table *, tbl_copy_func *,
                            tbl_item_func *, struct libavl_allocator *);
void tbl_destroy (struct tbl_table *, tbl_item_func *);

This code is included in 15.

* tbl_create(): Creates and returns a new, empty table as a struct tbl_table *. The table is associated with the given arguments. The void * argument is passed as the third argument to the comparison function when it is called. If the allocator is a null pointer, then tbl_allocator_default is used.

* tbl_destroy(): Destroys a table. During destruction, the tbl_item_func provided, if non-null, is called once for every item in the table, in no particular order. The function, if provided, must not invoke any table function or macro on the table being destroyed.

* tbl_copy(): Creates and returns a new table with the same contents as the existing table passed as its first argument. Its other three arguments may all be null pointers. If a tbl_copy_func is provided, then it is used to make a copy of each table item as it is inserted into the new table, in no particular order (a "deep copy"). Otherwise, the void * table items are copied verbatim (a "shallow copy"). If the table copy fails, either due to memory allocation failure or a null pointer returned by the tbl_copy_func, tbl_copy() returns a null pointer. In this case, any provided tbl_item_func is called once for each new item already copied, in no particular order. By default, the new table uses the same memory allocator as the existing one. If non-null, the struct libavl_allocator * given is used instead as the new memory allocator. To use the tbl_allocator_default allocator, specify &tbl_allocator_default explicitly.

2.7 Count
=========

This function returns the number of items currently in a table.

9.
=
size_t tbl_count (const struct tbl_table *);

The actual tables instead use a macro for implementation.

Exercises:
1. Implement tbl_count() as a macro, on the assumption that struct tbl_table keeps the number of items in the table in a size_t member named tbl_count.

2.8 Insertion and Deletion
==========================

These functions insert and delete items in tables. There is also a function for searching a table without modifying it. The design behind the insertion functions takes into account a couple of important issues:

* What should happen if there is a matching item already in the tree? If the items contain only keys and no values, then there's no point in doing anything. If the items do contain values, then we might want to leave the existing item or replace it, depending on the particular circumstances. The tbl_insert() and tbl_replace() functions are handy in simple cases like these.

* Occasionally it is convenient to insert one item into a table, then immediately replace it by a different item that has identical key data. For instance, if there is a good chance that a data item already exists within a table, then it might make sense to insert data allocated as a local variable into a table, then replace it by a dynamically allocated copy if it turned out that the item wasn't already in the table. That way, we save the time required to make an additional copy of the item to insert. The tbl_probe() function allows for this kind of flexibility.

10.
=
void **tbl_probe (struct tbl_table *, void *);
void *tbl_insert (struct tbl_table *, void *);
void *tbl_replace (struct tbl_table *, void *);
void *tbl_delete (struct tbl_table *, const void *);
void *tbl_find (const struct tbl_table *, const void *);

This code is included in 15.

Each of these functions takes a table to manipulate as its first argument and a table item as its second argument, here called table and item, respectively. Both arguments must be non-null in all cases. All but tbl_probe() return a table item or a null pointer.

* tbl_probe(): Searches in table for an item matching item. If found, a pointer to the void * data item is returned. Otherwise, item is inserted into the table and a pointer to the copy within the table is returned. Memory allocation failure causes a null pointer to be returned. The pointer returned can be used to replace the item found or inserted by a different item. This must only be done if the replacement item has the same position relative to the other items in the table as did the original item. That is, for existing item e, replacement item r, and the table's comparison function f(), the return values of f(e, x) and f(r, x) must have the same sign for every other item x currently in the table. Calling any other table function invalidates the pointer returned and it must not be referenced subsequently.

* tbl_insert(): Inserts item into table, but not if a matching item exists. Returns a null pointer if successful or if a memory allocation error occurs. If a matching item already exists in the table, returns that item.

* tbl_replace(): Inserts item into table, replacing and returning any matching item. Returns a null pointer if the item was inserted but there was no matching item to replace, or if a memory allocation error occurs.

* tbl_delete(): Removes from table and returns an item matching item. Returns a null pointer if no matching item exists in the table.
* tbl_find(): Searches table for an item matching item and returns any item found. Returns a null pointer if no matching item exists in the table.

Exercises:
1. Functions tbl_insert() and tbl_replace() return NULL in two very different situations: an error or successful insertion. Why is this not necessarily a design mistake?
2. Suggest a reason for disallowing insertion of a null item.
3. Write generic implementations of tbl_insert() and tbl_replace() in terms of tbl_probe().

2.9 Assertions
==============

Sometimes an insertion or deletion must succeed because it is known in advance that there is no way that it can fail. For instance, we might be inserting into a table from a list of items known to be unique, using a memory allocator that cannot return a null pointer. In this case, we want to make sure that the operation succeeded, and abort if not, because that indicates a program bug. We also would like to be able to turn off these tests for success in our production versions, because we don't want them slowing down the code.

11.
=
void tbl_assert_insert (struct tbl_table *, void *);
void *tbl_assert_delete (struct tbl_table *, void *);

This code is included in 15.

These functions provide assertions for tbl_insert() and tbl_delete(). They expand, via macros, directly into calls to those functions when NDEBUG, the same symbol used to turn off assert() checks, is declared. As for the standard C header <assert.h>, header files for tables may be included multiple times in order to turn these assertions on or off.

Exercises:
1. Write a set of preprocessor directives for a table header file that implement the behavior described in the final paragraph above.
2. Write a generic implementation of tbl_assert_insert() and tbl_assert_delete() in terms of existing table functions. Consider the base functions carefully. Why must we make sure that assertions are always enabled for these functions?
3. Why must tbl_assert_insert() not be used if the table's memory allocator can fail? (See also Exercise 2.8-1.)

2.10 Traversers
===============

A struct tbl_traverser is a table "traverser" that allows the items in a table to be examined. With a traverser, the items within a table can be enumerated in sorted ascending or descending order, starting from either end or from somewhere in the middle. The user of the traverser declares its own instance of struct tbl_traverser, typically as a local variable. One of the traverser constructor functions described below can be used to initialize it. Until then, the traverser is invalid. An invalid traverser must not be passed to any traverser function other than a constructor. Seen from the viewpoint of a table user, a traverser has only one attribute: the current item. The current item is either an item in the table or the "null item", represented by a null pointer and not associated with any item. Traversers continue to work when their tables are modified.
Any number of insertions and deletions may occur in the table without affecting the current item selected by a traverser, with only a few exceptions:

* Deleting a traverser's current item from its table invalidates the traverser (even if the item is later re-inserted).

* Using the return value of tbl_probe() to replace an item in the table invalidates all traversers with that item current, unless the replacement item has the same key data as the original item (that is, the table's comparison function returns 0 when the two items are compared).

* Similarly, tbl_t_replace() invalidates all _other_ traversers with the same item selected, unless the replacement item has the same key data.

* Destroying a table with tbl_destroy() invalidates all of that table's traversers.

There is no need to destroy a traverser that is no longer needed. An unneeded traverser can simply be abandoned.

2.10.1 Constructors
-------------------

These functions initialize traversers. A traverser must be initialized with one of these functions before it is passed to any other traverser function.

12. =
/* Table traverser functions. */
void tbl_t_init (struct tbl_traverser *, struct tbl_table *);
void *tbl_t_first (struct tbl_traverser *, struct tbl_table *);
void *tbl_t_last (struct tbl_traverser *, struct tbl_table *);
void *tbl_t_find (struct tbl_traverser *, struct tbl_table *, void *);
void *tbl_t_insert (struct tbl_traverser *, struct tbl_table *, void *);
void *tbl_t_copy (struct tbl_traverser *, const struct tbl_traverser *);

This code is included in 15.

All of these functions take a traverser to initialize as their first argument, and most take a table to associate the traverser with as their second argument. These arguments are here called trav and table. All, except tbl_t_init(), return the item to which trav is initialized, using a null pointer to represent the null item. None of the arguments to these functions may ever be a null pointer.
* tbl_t_init(): Initializes trav to the null item in table.

* tbl_t_first(): Initializes trav to the least-valued item in table. If the table is empty, then trav is initialized to the null item.

* tbl_t_last(): Same as tbl_t_first(), for the greatest-valued item in table.

* tbl_t_find(): Searches table for an item matching the one given. If one is found, initializes trav with it. If none is found, initializes trav to the null item.

* tbl_t_insert(): Attempts to insert the given item into table. If it is inserted successfully, trav is initialized to its location. If it cannot be inserted because of a duplicate, the duplicate item is set as trav's current item. If there is a memory allocation error, trav is initialized to the null item.

* tbl_t_copy(): Initializes trav to the same table and item as a second valid traverser. It is valid for both arguments to point to the same traverser, in which case neither is changed.

2.10.2 Manipulators
-------------------

These functions manipulate valid traversers.

13. =
void *tbl_t_next (struct tbl_traverser *);
void *tbl_t_prev (struct tbl_traverser *);
void *tbl_t_cur (struct tbl_traverser *);
void *tbl_t_replace (struct tbl_traverser *, void *);

This code is included in 15.

Each of these functions takes a valid traverser, here called trav, as its first argument, and returns a data item. All but tbl_t_replace() can also return a null pointer that represents the null item. All arguments to these functions must be non-null pointers.

* tbl_t_next(): Advances trav to the next larger item in its table. If trav was at the null item in a nonempty table, then the smallest item in the table becomes current. If trav was already at the greatest item in its table or the table is empty, the null item becomes current. Returns the new current item.

* tbl_t_prev(): Advances trav to the next smaller item in its table. If trav was at the null item in a nonempty table, then the greatest item in the table becomes current.
If trav was already at the lowest item in the table or the table is empty, the null item becomes current. Returns the new current item. * tbl_t_cur(): Returns trav's current item. * tbl_t_replace(): Replaces the data item currently selected in trav by the one provided. The replacement item is subject to the same restrictions as for the same replacement using tbl_probe(). The item replaced is returned. If the null item is current, the behavior is undefined. Seen from the outside, the traverser treats the table as a circular arrangement of items, with the null item at the top of the circle and the least-valued item just clockwise of it, then the next-lowest-valued item, and so on until the greatest-valued item is just counterclockwise of the null item. Moving clockwise in the circle is equivalent, under our traverser, to moving to the next item with tbl_t_next(). Moving counterclockwise is equivalent to moving to the previous item with tbl_t_prev(). An equivalent view is that the traverser treats the table as a linear arrangement of nodes: .-> 1 <-> 2 <-> 3 <-> 4 <-> 5 <-> 6 <-> 7 <-> 8 <-. | | `---------------------> NULL <--------------------' From this perspective, nodes are arranged from least to greatest in left to right order, and the null node lies in the middle as a connection between the least and greatest nodes. Moving to the next node is the same as moving to the right and moving to the previous node is motion to the left, except where the null node is concerned. 2.11 Table Headers ================== Here we gather together in one place all of the types and prototypes for a generic table. 14.
=
This code is included in 24, 142, 192, 247, 297, 333, 372, 415, 452, 486, 519, and 551. 15.
=
This code is included in 24, 142, 192, 247, 297, 333, 372, 415, 452, 486, 519, and 551. All of our tables fit the specification given in Exercise 2.7-1, so
the tbl_count() macro is directly included above.

2.12 Additional Exercises
=========================

Exercises:
*1. Compare and contrast the design of libavl's tables with that of the set container in the C++ Standard Template Library.
2. What is the smallest set of table routines such that all of the other routines can be implemented in terms of the interfaces of that set as defined above?

3 Search Algorithms
*******************

In libavl, we are primarily concerned with binary search trees and balanced binary trees. If you're already familiar with these concepts, then you can move right into the code, starting from the next chapter. But if you're not, then a little motivation and an explanation of exactly what a binary search tree is can't hurt. That's the goal of this chapter. More particularly, this chapter concerns itself with algorithms for searching. Searching is one of the core problems in organizing a table. As it will turn out, arranging a table for fast searching also facilitates some other table features.

3.1 Sequential Search
=====================

Suppose that you have a bunch of things (books, magazines, CDs, ...) in a pile, and you're looking for one of them. You'd probably start by looking at the item at the top of the pile to check whether it was the one you were looking for. If it wasn't, you'd check the next item down the pile, and so on, until you either found the one you wanted or ran out of items. In computer science terminology, this is a "sequential search". It is easy to implement sequential search for an array or a linked list. If, for the moment, we limit ourselves to items of type int, we can write a function to sequentially search an array like this:

16. =
/* Returns the smallest i such that array[i] == key,
   or -1 if key is not in array[].
   array[] must be an array of n ints. */
int
seq_search (int array[], int n, int key)
{
  int i;

  for (i = 0; i < n; i++)
    if (array[i] == key)
      return i;
  return -1;
}

This code is included in 595 and 600.
We can hardly hope to improve on the data requirements, space, or complexity of simple sequential search, as they're about as good as we can want. But the speed of sequential search leaves something to be desired. The next section describes a simple modification of the sequential search algorithm that can sometimes lead to big improvements in performance. See also: [Knuth 1998b], algorithm 6.1S; [Kernighan 1976], section 8.2; [Cormen 1990], section 11.2; [Bentley 2000], sections 9.2 and 13.2, appendix 1. Exercises: 1. Write a simple test framework for seq_search(). It should read sample data from stdin and collect them into an array, then search for each item in the array in turn and compare the results to those expected, reporting any discrepancies on stdout and exiting with an appropriate return value. You need not allow for the possibility of duplicate input values and may limit the maximum number of input values. 3.2 Sequential Search with Sentinel =================================== Try to think of some ways to improve the speed of sequential search. It should be clear that, to speed up a program, it pays to concentrate on the parts that use the most time to begin with. In this case, it's the loop. Consider what happens each time through the loop: 1. The loop counter i is incremented and compared against n. 2. array[i] is compared against key. If we could somehow eliminate one of these comparisons, the loop might be a lot faster. So, let's try... why do we need step 1? It's because, otherwise, we might run off the end of array[], causing undefined behavior, which is in turn because we aren't sure that key is in array[]. If we knew that key was in array[], then we could skip step 1. But, hey! we _can_ ensure that the item we're looking for is in the array. How? By putting a copy of it at the end of the array. This copy is called a "sentinel", and the search technique as a whole is called "sequential search with sentinel". Here's the code: 17. 
=
/* Returns the smallest i such that array[i] == key,
   or -1 if key is not in array[].
   array[] must be a modifiable array of n ints
   with room for an (n + 1)th element. */
int
seq_sentinel_search (int array[], int n, int key)
{
  int *p;

  array[n] = key;
  for (p = array; *p != key; p++)
    /* Nothing to do. */;
  return p - array < n ? p - array : -1;
}

This code is included in 600.

Notice how the code above uses a pointer, int *p, rather than a counter i as in seq_search() earlier. For the most part, this is simply a style preference: for iterating through an array, C programmers usually prefer pointers to array indexes. Under older compilers, code using pointers often compiled into faster code as well, but modern C compilers usually produce the same code whether pointers or indexes are used. The return statement in this function uses two somewhat advanced features of C: the conditional or "ternary" operator ?: and pointer arithmetic. The former is a bit like an expression form of an if statement. The expression a ? b : c first evaluates a. Then, if a != 0, b is evaluated and the expression takes that value. Otherwise, when a == 0, c is evaluated, and the result is the expression's value. Pointer arithmetic is used in two ways here. First, the expression p++ acts to advance p to point to the next int in array. This is analogous to the way that i++ would increase the value of an integer or floating point variable i by one. Second, the expression p - array results in the "difference" between p and array, i.e., the number of int elements between the locations to which they point. For more information on these topics, please consult a good C reference, such as [Kernighan 1988]. Searching with a sentinel requires that the array be modifiable and large enough to hold an extra element. Sometimes these are inherently problematic--the array may not be modifiable or it might be too small--and sometimes they are problems because of external circumstances.
For instance, a program with more than one concurrent "thread" cannot modify a shared array for sentinel search without expensive locking. Sequential sentinel search is an improvement on ordinary sequential search, but as it turns out there's still room for improvement--especially in the runtime for unsuccessful searches, which still always take n comparisons. In the next section, we'll see one technique that can reduce the time required for unsuccessful searches, at the cost of longer runtime for successful searches.

See also: [Knuth 1998b], algorithm 6.1Q; [Cormen 1990], section 11.2; [Bentley 2000], section 9.2.

3.3 Sequential Search of Ordered Array
======================================

Let's jump back to the pile-of-things analogy from the beginning of this chapter (*note Sequential Search::). This time, suppose that instead of being in random order, the pile you're searching through is ordered on the property that you're examining; e.g., magazines sorted by publication date, if you're looking for, say, the July 1988 issue. Think about how this would simplify searching through the pile. Now you can sometimes tell that the magazine you're looking for isn't in the pile before you get to the bottom, because it's not between the magazines that it otherwise would be. On the other hand, you still might have to go through the entire pile if the magazine you're looking for is newer than the newest magazine in the pile (or older than the oldest, depending on the ordering that you chose). Back in the world of computers, we can apply the same idea to searching a sorted array:

18. =
/* Returns the smallest i such that array[i] == key,
   or -1 if key is not in array[].
   array[] must be an array of n ints sorted in ascending order. */
int
seq_sorted_search (int array[], int n, int key)
{
  int i;

  for (i = 0; i < n; i++)
    if (key <= array[i])
      return key == array[i] ? i : -1;
  return -1;
}

This code is included in 600.
At first it might be a little tricky to see exactly how seq_sorted_search() works, so we'll work through a few examples. Suppose that array[] has the four elements {3, 5, 6, 8}, so that n is 4. If key is 6, then the first time through the loop the if condition is 6 <= 3, or false, so the loop repeats with i == 1. The second time through the loop we again have a false condition, 6 <= 5, and the loop repeats again. The third time the if condition, 6 <= 6, is true, so control passes to the if statement's dependent return. This return verifies that 6 == 6 and returns i, or 2, as the function's value. On the other hand, suppose key is 4, a value not in array[]. For the first iteration, when i is 0, the if condition, 4 <= 3, is false, but in the second iteration we have 4 <= 5, which is true. However, this time key == array[i] is 4 == 5, or false, so -1 is returned.

See also: [Sedgewick 1998], program 12.4.

3.4 Sequential Search of Ordered Array with Sentinel
====================================================

When we implemented sequential search in a sorted array, we lost the benefits of having a sentinel. But we can reintroduce a sentinel in the same way we did before, and obtain some of the same benefits. It's pretty clear how to proceed:

19. =
/* Returns the smallest i such that array[i] == key,
   or -1 if key is not in array[].
   array[] must be a modifiable array of n ints,
   sorted in ascending order,
   with room for an (n + 1)th element at the end. */
int
seq_sorted_sentinel_search (int array[], int n, int key)
{
  int *p;

  array[n] = key;
  for (p = array; *p < key; p++)
    /* Nothing to do. */;
  return p - array < n && *p == key ? p - array : -1;
}

This code is included in 600.

With a bit of additional cleverness we can eliminate one objection to this sentinel approach. Suppose that instead of using the value being searched for as the sentinel value, we used the maximum possible value for the type in question.
If we did this, then we could use almost the same code for searching the array. The advantage of this approach is that there would be no need to modify the array in order to search for different values, because the sentinel is the same value for all searches. This eliminates the potential problem of searching an array in multiple contexts, due to nested searches, threads, or signals, for instance. (In the code below, we will still put the sentinel into the array, because our generic test program won't know to put it in for us in advance, but in real-world code we could avoid the assignment.) We can easily write code implementing this technique:

20. =
/* Returns the smallest i such that array[i] == key,
   or -1 if key is not in array[].
   array[] must be an array of n ints,
   sorted in ascending order,
   with room for an (n + 1)th element set to INT_MAX. */
int
seq_sorted_sentinel_search_2 (int array[], int n, int key)
{
  int *p;

  array[n] = INT_MAX;
  for (p = array; *p < key; p++)
    /* Nothing to do. */;
  return p - array < n && *p == key ? p - array : -1;
}

This code is included in 600.

Exercises:
1. When can't the largest possible value for the type be used as a sentinel?
Suppose that array[] has n elements in sorted order, without duplicates, that array[j] contains key, and that we are trying to learn the value j. In sequential search, we learn only a little about the data set from each comparison with array[i]: either key == array[i] so that i == j, or key != array[i] so that i != j and therefore j > i. As a result, we eliminate only one possibility at each step. Suppose that we haven't made any comparisons yet, so that we know nothing about the contents of array[]. If we compare key to array[i] for arbitrary i such that 0 <= i < n, what do we learn? There are three possibilities: * key < array[i]: Now we know that key < array[i] < array[i + 1] < ... < array[n - 1].(1) Therefore, 0 <= j < i. * key == array[i]: We're done: j == i. * key > array[i]: Now we know that key > array[i] > array[i - 1] > ... > array[0]. Therefore, i < j < n. So, after one step, if we're not done, we know that j > i or that j < i. If we're equally likely to be looking for each element in array[], then the best choice of i is n / 2: for that value, we eliminate about half of the possibilities either way. (If n is odd, we'll round down.) After the first step, we're back to essentially the same situation: we know that key is in array[j] for some j in a range of about n / 2. So we can repeat the same process. Eventually, we will either find key and thus j, or we will eliminate all the possibilities. Let's try an example. For simplicity, let array[] contain the values 100 through 114 in numerical order, so that array[i] is 100 + i and n is 15. Suppose further that key is 110. The steps that we'd go through to find j are described below. At each step, the facts are listed: the known range that j can take, the selected value of i, the results of comparing key to array[i], and what was learned from the comparison. 1. 0 <= j <= 14: i becomes (0 + 14) / 2 == 7. 110 > array[i] == 107, so now we know that j > 7. 2. 8 <= j <= 14: i becomes (8 + 14) / 2 == 11. 
110 < array[i] == 111, so now we know that j < 11. 3. 8 <= j <= 10: i becomes (8 + 10) / 2 == 9. 110 > array[i] == 109, so now we know that j > 9. 4. 10 <= j <= 10: i becomes (10 + 10) / 2 == 10. 110 == array[i] == 110, so we're done and i == j == 10. In case you hadn't yet figured it out, this technique is called "binary search". We can make an initial C implementation pretty easily:

21. =
/* Returns the offset within array[] of an element equal to key,
   or -1 if key is not in array[].
   array[] must be an array of n ints sorted in ascending order. */
int
binary_search (int array[], int n, int key)
{
  int min = 0;
  int max = n - 1;

  while (max >= min)
    {
      int i = (min + max) / 2;
      if (key < array[i])
        max = i - 1;
      else if (key > array[i])
        min = i + 1;
      else
        return i;
    }

  return -1;
}

This code is included in 600.

The maximum number of comparisons for a binary search in an array of n elements is about log2(n), as opposed to a maximum of n comparisons for sequential search. For moderate to large values of n, this is a lot better. On the other hand, for small values of n, binary search may actually be slower because it is more complicated than sequential search. We also have to put our array in sorted order before we can use binary search. Efficiently sorting an n-element array takes time proportional to n * log2(n) for large n. So binary search is preferred if n is large enough (see the answer to Exercise 4 for one typical value) and if we are going to do enough searches to justify the cost of the initial sort. Further small refinements are possible on binary search of an ordered array. Try some of the exercises below for more information.

See also: [Knuth 1998b], algorithm 6.2.1B; [Kernighan 1988], section 3.3; [Bentley 2000], chapters 4 and 5, section 9.3, appendix 1; [Sedgewick 1998], program 12.6.

Exercises:
1. Function binary_search() above uses three local variables: min and max for the ends of the remaining search range and i for its midpoint.
Write and test a binary search function that uses only two variables: i for the midpoint as before and m representing the width of the range on either side of i. You may require the existence of a dummy element just before the beginning of the array. Be sure, if so, to specify what its value should be. 2. The standard C library provides a function, bsearch(), for searching ordered arrays. Commonly, bsearch() is implemented as a binary search, though ANSI C does not require it. Do the following: a. Write a function compatible with the interface for binary_search() that uses bsearch() "under the hood." You'll also have to write an additional callback function for use by bsearch(). b. Write and test your own version of bsearch(), implementing it using a binary search. (Use a different name to avoid conflicts with the C library.) 3. An earlier exercise presented a simple test framework for seq_search(), but now we have more search functions. Write a test framework that will handle all of them presented so far. Add code for timing successful and unsuccessful searches. Let the user specify, on the command line, the algorithm to use, the size of the array to search, and the number of search iterations to run. 4. Run the test framework from the previous exercise on your own system for each algorithm. Try different array sizes and compiler optimization levels. Be sure to use enough iterations to make the searches take at least a few seconds each. Analyze the results: do they make sense? Try to explain any apparent discrepancies. ---------- Footnotes ---------- (1) This sort of notation means very different things in C and mathematics. In mathematics, writing a < b < c asserts both of the relations a < b and b < c, whereas in C, it expresses the evaluation of a < b, then the comparison of the 0 or 1 result to the value of c. In mathematics this notation is invaluable, but in C it is rarely meaningful. As a result, this book uses this notation only in the mathematical sense. 
3.6 Binary Search Tree in Array
===============================

Binary search is pretty fast. Suppose that we wish to speed it up anyhow. Then, the obvious speed-up targets in the code above are the while condition and the calculations determining values of i, min, and max. If we could eliminate these, we'd have an incrementally faster technique, all else being equal. And, as it turns out, we _can_ eliminate both of them, the former by use of a sentinel and the latter by precalculation.

Let's consider precalculating i, min, and max first. Think about the nature of the choices that binary search makes at each step. Specifically, in the code above, consider the dependence of min and max upon i. Is it ever possible for min and max to have different values for the same i and n? The answer is no. For any given i and n, min and max are fixed. This is important because it means that we can represent the entire "state" of a binary search of an n-element array by the single variable i. In other words, if we know i and n, we know all the choices that have been made to this point and we know the two possible choices of i for the next step.

This is the key insight in eliminating calculations. We can use an array in which the items are labeled with the next two possible choices. An example is given below. Let's continue with our example of an array containing the 15 integers 100 to 114. We define an entry in the array to contain the item value and the array index of the item to examine next for search values smaller and larger than the item:

22. =
/* One entry in a binary search tree stored in an array. */
struct binary_tree_entry
  {
    int value;          /* This item in the binary search tree. */
    int smaller;        /* Array index of next item for smaller targets. */
    int larger;         /* Array index of next item for larger targets. */
  };

This code is included in 617.

Of course, it's necessary to fill in the values for smaller and larger. A few moments' reflection should allow you to figure out one method for doing so.
Here's the full array, for reference:

const struct binary_tree_entry bins[16] =
  {
    {100, 15, 15}, {101, 0, 2}, {102, 15, 15}, {103, 1, 5},
    {104, 15, 15}, {105, 4, 6}, {106, 15, 15}, {107, 3, 11},
    {108, 15, 15}, {109, 8, 10}, {110, 15, 15}, {111, 9, 13},
    {112, 15, 15}, {113, 12, 14}, {114, 15, 15}, {0, 0, 0},
  };

For now, consider only bins[]'s first 15 rows. Within these rows, the first column is value, the item value, and the second and third columns are smaller and larger, respectively. Values 0 through 14 for smaller and larger indicate the index of the next element of bins[] to examine. Value 15 indicates "element not found". Element bins[15] is not used for storing data.

Try searching for key == 110 in bins[], starting from element 7, the midpoint:

1. i == 7: 110 > bins[i].value == 107, so let i = bins[i].larger, or 11.
2. i == 11: 110 < bins[i].value == 111, so let i = bins[i].smaller, or 10.
3. i == 10: 110 == bins[i].value == 110, so we're done.

We can implement this search in C code. The function uses the common C idiom of writing for (;;) for an "infinite" loop:

23. =
/* Returns i such that array[i].value == key, or -1 if key is not in array[].
   array[] is an array of n elements forming a binary search tree,
   with its root at array[n / 2], and space for an (n + 1)th value
   at the end. */
int
binary_search_tree_array (struct binary_tree_entry array[], int n, int key)
{
  int i = n / 2;

  array[n].value = key;
  for (;;)
    if (key > array[i].value)
      i = array[i].larger;
    else if (key < array[i].value)
      i = array[i].smaller;
    else
      return i != n ? i : -1;
}

This code is included in 617.

Examination of the code above should reveal the purpose of bins[15]. It is used as a sentinel value, allowing the search to always terminate without the use of an extra test on each loop iteration.

The result of augmenting binary search with "pointer" values like smaller and larger is called a "binary search tree".

Exercises:

1.
Write a function to automatically initialize smaller and larger within bins[]. 2. Write a simple automatic test program for binary_search_tree_array(). Let the user specify the size of the array to test on the command line. You may want to use your results from the previous exercise. 3.7 Dynamic Lists ================= Up until now, we've considered only lists whose contents are fixed and unchanging, that is, "static" lists. But in real programs, many lists are "dynamic", with their contents changing rapidly and unpredictably. For the case of dynamic lists, we need to reconsider some of the attributes of the types of lists that we've examined.(1) Specifically, we want to know how long it takes to insert a new element into a list and to remove an existing element from a list. Think about it for each type of list examined so far: Unordered array Adding items to the list is easy and fast, unless the array grows too large for the block and has to be copied into a new area of memory. Just copy the new item to the end of the list and increase the size by one. Removing an item from the list is almost as simple. If the item to delete happens to be located at the very end of the array, just reduce the size of the list by one. If it's located at any other spot, you must also copy the element that is located at the very end onto the location that the deleted element used to occupy. Ordered array In terms of inserting and removing elements, ordered arrays are mechanically the same as unordered arrays. The difference is that insertions and deletions can only be at one end of the array if the item in question is the largest or smallest in the list. The practical upshot is that dynamic ordered arrays are only efficient if items are added and removed in sorted order. Binary search tree Insertions and deletions are where binary search trees have their chance to shine. 
Insertions and deletions are efficient in binary search trees whether they're made at the beginning, middle, or end of the lists. Clearly, binary search trees are superior to ordered or unordered arrays in situations that require insertion and deletion in random positions. But insertion and deletion operations in binary search trees require a bit of explanation if you've never seen them before. This is what the next chapter is for, so read on.

---------- Footnotes ----------

(1) These uses of the words "static" and "dynamic" are different from their meanings in the phrases "static allocation" and "dynamic allocation." *Note Glossary::, for more details.

4 Binary Search Trees
*********************

The previous chapter motivated the need for binary search trees. This chapter implements a table ADT backed by a binary search tree. Along the way, we'll see how binary search trees are constructed and manipulated in abstract terms as well as in concrete C code.

The library includes a header file <bst.h> and an implementation file <bst.c>, outlined below. We borrow most of the header file from the generic table headers designed a couple of chapters back, simply replacing tbl by bst, the prefix used in this table module.

24. =
#ifndef BST_H
#define BST_H 1

#include <stddef.h>
bst 14>
bst 15> #endif /* bst.h */
bst 593> 25. = #include #include #include #include #include "bst.h" Exercises: 1. What is the purpose of #ifndef BST_H ... #endif in above? 4.1 Vocabulary ============== When binary search trees, or BSTs, were introduced in the previous chapter, the reason that they were called binary search trees wasn't explained. The diagram below should help to clear up matters, and incidentally let us define some BST-related vocabulary: 107 ____....---' `---...___ 103 111 __..-' `._ __..-' `._ 101 105 109 113 _' \ _' \ _' \ _' \ 100 102 104 106 108 110 112 114 This diagram illustrates the binary search tree example from the previous chapter. The circle or "node" at the top, labeled 107, is the starting point of any search. As such, it is called the "root" of the tree. The node connected to it below to the left, labeled 103, is the root's "left child", and node 111 to its lower right is its "right child". A node's left child corresponds to smaller from the array-based BST of the previous chapter, and a right child corresponds to larger. Some nodes, such as 106 here, don't have any children. Such a node is called a "leaf" or "terminal node". Although not shown here, it's also possible for a node to have only one child, either on the left or the right side. A node with at least one child is called a "nonterminal node". Each node in a binary search tree is, conceptually, the root of its own tree. Such a tree is called a "subtree" of the tree that contains it. The left child of a node and recursively all of that child's children is a subtree of the node, called the "left subtree" of the node. The term "right subtree" is defined similarly for the right side of the node. For instance, above, nodes 104, 105, and 106 are the right subtree of node 103, with 105 as the subtree's root. A BST without any nodes is called an "empty tree". Both subtrees of all even-numbered nodes in the BST above are empty trees. 
In a binary search tree, the left child of a node, if it exists, has a smaller value than the node, and the right child of a node has a larger value. The more general term "binary tree", on the other hand, refers to a data structure with the same form as a binary search tree, but which does not necessarily share this property. There are also related, but different, structures simply called "trees". In this book, all our binary trees are binary search trees, and this book will not discuss plain trees at all. As a result, we will often be a bit loose in terminology and use the term "binary tree" or "tree" when "binary search tree" is the proper term. Although this book discusses binary search trees exclusively, it is instructive to occasionally display, as a counterexample, a diagram of a binary tree whose nodes are out of order and therefore not a BST. Such diagrams are marked ** to reinforce their non-BST nature to the casual browser. See also: [Knuth 1997], section 2.3; [Knuth 1998b], section 6.2.2; [Cormen 1990], section 13.1; [Sedgewick 1998], section 5.4. 4.1.1 Aside: Differing Definitions ---------------------------------- The definitions in the previous section are the ones used in this book. They are the definitions that programmers often use in designing and implementing real programs. However, they are slightly different from the definitions used in formal computer science textbooks. This section gives these formal definitions and contrasts them against our own. The most important difference is in the definition of a binary tree itself. Formally, a binary tree is either an "external node" or an "internal node" connected to a pair of binary trees called the internal node's left subtree and right subtree. Internal nodes correspond to our notion of nodes, and external nodes correspond roughly to nodes' empty left or right subtrees. The generic term "node" includes both internal and external nodes. 
Every internal node always has exactly two children, although those children may be external nodes, so we must also revise definitions that depend on a node's number of children. Then, a "leaf" is an internal node with two external node children and a "nonterminal node" is an internal node at least one of whose children is an internal node. Finally, an "empty tree" is a binary tree that consists of only an external node.

Tree diagrams in books that use these formal definitions show both internal and external nodes. Typically, internal nodes are shown as circles, external nodes as square boxes. Here's an example BST in the format used in this book, shown alongside an identical BST in the format used in formal computer science books: 4 4 __..-' `_ / \ 2 5 2 5 _.' `_ / \ ^ 1 3 [] [] 1 3 / \ / \ [] [] [] []

See also: [Sedgewick 1998], section 5.4.

4.2 Data Types
==============

The types for memory allocation and managing data as void * pointers were discussed previously (*note The Table ADT::), but to build a table implementation using BSTs we must define some additional types. In particular, we need struct bst_node to represent an individual node and struct bst_table to represent an entire table. The following sections take care of this.

4.2.1 Node Structure
--------------------

When binary search trees were introduced in the last chapter, we used indexes into an array to reference items' smaller and larger values. But in C, BSTs are usually constructed using pointers. This is a more general technique, because pointers aren't restricted to references within a single array.

26. =
/* A binary search tree node. */
struct bst_node
  {
    struct bst_node *bst_link[2];       /* Subtrees. */
    void *bst_data;                     /* Pointer to data. */
  };

This code is included in 24.

In struct bst_node, bst_link[0] takes the place of smaller, and bst_link[1] takes the place of larger.
If, in our array implementation of binary search trees, either of these would have pointed to the sentinel, it instead is assigned NULL, the null pointer constant. In addition, bst_data replaces value. We use a void * generic pointer here, instead of int as used in the last chapter, to let any kind of data be stored in the BST. *Note Comparison Function::, for more information on void * pointers. 4.2.2 Tree Structure -------------------- The struct bst_table structure ties together all of the data needed to keep track of a table implemented as a binary search tree: 27. = /* Tree data structure. */ struct bst_table { struct bst_node *bst_root; /* Tree's root. */ bst_comparison_func *bst_compare; /* Comparison function. */ void *bst_param; /* Extra argument to bst_compare. */ struct libavl_allocator *bst_alloc; /* Memory allocator. */ size_t bst_count; /* Number of items in tree. */ unsigned long bst_generation; /* Generation number. */ }; This code is included in 24, 142, and 192. Most of struct bst_table's members should be familiar. Member bst_root points to the root node of the BST. Together, bst_compare and bst_param specify how items are compared (*note Item and Copy Functions::). The members of bst_alloc specify how to allocate memory for the BST (*note Memory Allocation::). The number of items in the BST is stored in bst_count (*note Count::). The final member, bst_generation, is a "generation number". When a tree is created, it starts out at zero. After that, it is incremented every time the tree is modified in a way that might disturb a traverser. We'll talk more about the generation number later (*note Better Iterative Traversal::). Exercises: *1. Why is it a good idea to include bst_count in struct bst_table? Under what circumstances would it be better to omit it? 4.2.3 Maximum Height -------------------- For efficiency, some of the BST routines use a stack of a fixed maximum height. 
This maximum height affects the maximum number of nodes that can be fully supported by libavl in any given tree, because a binary tree of height n contains at most 2**n - 1 nodes.

The BST_MAX_HEIGHT macro sets the maximum height of a BST. The default value of 32 allows for trees with up to 2**32 - 1 = 4,294,967,295 nodes. On today's common 32-bit computers that support only 4 GB of memory at most, this is hardly a limit, because memory would be exhausted long before the tree became too big.

The BST routines that use fixed stacks also detect stack overflow and call a routine to "balance" or restructure the tree in order to reduce its height to the permissible range. The limit on the BST height is therefore not a severe restriction.

28. =
/* Maximum BST height. */
#ifndef BST_MAX_HEIGHT
#define BST_MAX_HEIGHT 32
#endif

This code is included in 24, 142, 297, 415, and 519.

Exercises:

1. Suggest a reason why the BST_MAX_HEIGHT macro is defined conditionally. Are there any potential pitfalls?

4.3 Rotations
=============

Soon we'll jump right in and start implementing the table functions for BSTs. But before that, there's one more topic to discuss, because it will keep coming up from time to time throughout the rest of the book. This topic is the concept of a "rotation". A rotation is a simple transformation of a binary tree that looks like this: | | Y X / \ / \ X c a Y ^ ^ a b b c

In this diagram, X and Y represent nodes and a, b, and c are arbitrary binary trees that may be empty. A rotation that changes a binary tree of the form shown on the left to the form shown on the right is called a "right rotation" on Y. Going the other way, it is a "left rotation" on X.

This figure also introduces new graphical conventions. First, the line leading vertically down to the root explicitly shows that the BST may be a subtree of a larger tree.
Also, the use of both uppercase and lowercase letters emphasizes the distinction between individual nodes and subtrees: uppercase letters are nodes, lowercase letters represent (possibly empty) subtrees.

A rotation changes the local structure of a binary tree without changing its ordering as seen through inorder traversal. That's a subtle statement, so let's dissect it bit by bit. Rotations have the following properties:

Rotations change the structure of a binary tree. In particular, rotations can often, depending on the tree's shape, be used to change the height of a part of a binary tree.

Rotations change the local structure of a binary tree. Any given rotation only affects the node rotated and its immediate children. The node's ancestors and its children's children are unchanged.

Rotations do not change the ordering of a binary tree. If a binary tree is a binary search tree before a rotation, it is a binary search tree after a rotation. So, we can safely use rotations to rearrange a BST-based structure, without concerns about upsetting its ordering.

See also: [Cormen 1990], section 14.2; [Sedgewick 1998], section 12.8.

Exercises:

1. For each of the binary search trees below, perform a right rotation at node 4. 4 4 4 / \ / / \ 2 5 2 2 6 ^ / ^ ^ 1 3 1 1 3 5 7

2. Write a pair of functions, one to perform a right rotation at a given BST node, one to perform a left rotation. What should be the type of the functions' parameter?

4.4 Operations
==============

Now we can start to implement the operations that we'll want to perform on BSTs. Here's the outline of the functions we'll implement. We use the generic table insertion convenience functions from Exercise 2.8-3 to implement bst_insert() and bst_replace(), as well as the generic assertion function implementations from Exercise 2.9-2 to implement bst_assert_insert() and bst_assert_delete(). We also include a copy of the default memory allocation functions for use with BSTs:

29. =
bst 592> bst 6>
bst 594>

This code is included in 25.

4.5 Creation
============

We need to write bst_create() to create an empty BST. All it takes is a little bit of memory allocation and initialization:

30. =
struct bst_table *
bst_create (bst_comparison_func *compare, void *param,
            struct libavl_allocator *allocator)
{
  struct bst_table *tree;

  assert (compare != NULL);

  if (allocator == NULL)
    allocator = &bst_allocator_default;

  tree = allocator->libavl_malloc (allocator, sizeof *tree);
  if (tree == NULL)
    return NULL;

  tree->bst_root = NULL;
  tree->bst_compare = compare;
  tree->bst_param = param;
  tree->bst_alloc = allocator;
  tree->bst_count = 0;
  tree->bst_generation = 0;

  return tree;
}

This code is included in 29, 145, and 196.

4.6 Search
==========

Searching a binary search tree works just the same way as it did before when we were doing it inside an array. We can implement bst_find() immediately:

31. =
void *
bst_find (const struct bst_table *tree, const void *item)
{
  const struct bst_node *p;

  assert (tree != NULL && item != NULL);
  for (p = tree->bst_root; p != NULL; )
    {
      int cmp = tree->bst_compare (item, p->bst_data, tree->bst_param);

      if (cmp < 0)
        p = p->bst_link[0];
      else if (cmp > 0)
        p = p->bst_link[1];
      else /* cmp == 0 */
        return p->bst_data;
    }

  return NULL;
}

This code is included in 29, 145, 196, 489, 522, and 554.

See also: [Knuth 1998b], section 6.2.2; [Cormen 1990], section 13.2; [Kernighan 1988], section 3.3; [Bentley 2000], chapters 4 and 5, section 9.3, appendix 1; [Sedgewick 1998], program 12.7.

4.7 Insertion
=============

Inserting new nodes into a binary search tree is easy. To start out, we work the same way as in a search, traversing the tree from the top down, as if we were searching for the item that we're inserting. If we find one, the item is already in the tree, and we need not insert it again. But if the new item is not in the tree, eventually we "fall off" the bottom of the tree. At this point we graft the new item as a child of the node that we last examined.
An example is in order. Consider this binary search tree: 5 / `_ 3 8 ^ / 2 4 6

Suppose that we wish to insert a new item, 7, into the tree. 7 is greater than 5, so examine 5's right child, 8. 7 is less than 8, so examine 8's left child, 6. 7 is greater than 6, but 6 has no right child. So, make 7 the right child of 6: 5 / `._ 3 8 ^ _' 2 4 6 \ 7

We cast this in a form compatible with the abstract description as follows:

32. =
void **
bst_probe (struct bst_table *tree, void *item)
{
  struct bst_node *p, *q;  /* Current node in search and its parent. */
  int dir;                 /* Side of q on which p is located. */
  struct bst_node *n;      /* Newly inserted node. */

  assert (tree != NULL && item != NULL);

  for (q = NULL, p = tree->bst_root; p != NULL; q = p, p = p->bst_link[dir])
    {
      int cmp = tree->bst_compare (item, p->bst_data, tree->bst_param);
      if (cmp == 0)
        return &p->bst_data;
      dir = cmp > 0;
    }

  n = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof *p);
  if (n == NULL)
    return NULL;

  tree->bst_count++;
  n->bst_link[0] = n->bst_link[1] = NULL;
  n->bst_data = item;
  if (q != NULL)
    q->bst_link[dir] = n;
  else
    tree->bst_root = n;

  return &n->bst_data;
}

This code is included in 29.

See also: [Knuth 1998b], algorithm 6.2.2T; [Cormen 1990], section 13.3; [Bentley 2000], section 13.3; [Sedgewick 1998], program 12.7.

Exercises:

1. Explain the expression p = (struct bst_node *) &tree->bst_root. Suggest an alternative.

2. Rewrite bst_probe() to use only a single local variable of type struct bst_node **.

3. Suppose we want to make a new copy of an existing binary search tree, preserving the original tree's shape, by inserting items into a new, currently empty tree. What constraints are there on the order of item insertion?

4. Write a function that calls a provided bst_item_func for each node in a provided BST in an order suitable for reproducing the original BST, as discussed in Exercise 3.
4.7.1 Aside: Root Insertion --------------------------- One side effect of the usual method for BST insertion, implemented in the previous section, is that items inserted more recently tend to be farther from the root, and therefore it takes longer to find them than items inserted longer ago. If all items are equally likely to be requested in a search, this is unimportant, but this is regrettable for some common usage patterns, where recently inserted items tend to be searched for more often than older items. In this section, we examine an alternative scheme for insertion that addresses this problem, called "insertion at the root" or "root insertion". An insertion with this algorithm always places the new node at the root of the tree. Following a series of such insertions, nodes inserted more recently tend to be nearer the root than other nodes. As a first attempt at implementing this idea, we might try simply making the new node the root and assigning the old root as one of its children. Unfortunately, this and similar approaches will not work because there is no guarantee that nodes in the existing tree have values all less than or all greater than the new node. An approach that will work is to perform a conventional insertion as a leaf node, then use a series of rotations to move the new node to the root. For example, the diagram below illustrates rotations to move node 4 to the root. A left rotation on 3 changes the first tree into the second, a right rotation on 5 changes the second into the third, and finally a left rotation on 1 moves 4 into the root position: 1 1 1 4 `._ `-..__ `-._ _.' \ 5 5 4 1 5 / \ / \ / \ => `_ \ 3 6 => 4 6 => 3 5 3 6 ^ / / \ / 2 4 3 2 6 2 / 2 The general rule follows the pattern above. If we moved down to the left from a node x during the insertion search, we rotate right at x. If we moved down to the right, we rotate left. The implementation is straightforward. 
As we search for the insertion point we keep track of the nodes we've passed through, then after the insertion we return to each of them in reverse order and perform a rotation:

33. =
void **
bst_probe (struct bst_table *tree, void *item)
{
  bst 198>
  return &n->bst_data;
}

34. =
pa[0] = (struct bst_node *) &tree->bst_root;
da[0] = 0;
k = 1;
for (p = tree->bst_root; p != NULL; p = p->bst_link[da[k - 1]])
  {
    int cmp = tree->bst_compare (item, p->bst_data, tree->bst_param);
    if (cmp == 0)
      return &p->bst_data;

    if (k >= BST_MAX_HEIGHT)
      {
        bst_balance (tree);
        return bst_probe (tree, item);
      }

    pa[k] = p;
    da[k++] = cmp > 0;
  }

This code is included in 33.

35. =
n = pa[k - 1]->bst_link[da[k - 1]] =
  tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof *n);
if (n == NULL)
  return NULL;

n->bst_link[0] = n->bst_link[1] = NULL;
n->bst_data = item;
tree->bst_count++;
tree->bst_generation++;

This code is included in 33.

36. =
for (; k > 1; k--)
  {
    struct bst_node *q = pa[k - 1];

    if (da[k - 1] == 0)
      {
        q->bst_link[0] = n->bst_link[1];
        n->bst_link[1] = q;
      }
    else /* da[k - 1] == 1 */
      {
        q->bst_link[1] = n->bst_link[0];
        n->bst_link[0] = q;
      }
    pa[k - 2]->bst_link[da[k - 2]] = n;
  }

This code is included in 33, 622, and 627.

See also: [Sedgewick 1998], section 12.8.

Exercises:

1. Root insertion will prove useful later when we write a function to join a pair of disjoint BSTs (*note Joining BSTs::). For that purpose, we need to be able to insert a preallocated node as the root of an arbitrary tree that may be a subtree of some other tree. Write a function to do this matching the following prototype:

static int root_insert (struct bst_table *tree, struct bst_node **root, struct bst_node *new_node);

Your function should insert new_node at *root using root insertion, storing new_node into *root, and return nonzero only if successful. The subtree at *root is in tree. You may assume that no node matching new_node exists within subtree root.

2.
Now implement a root insertion as in Exercise 1, except that the function is not allowed to fail, and rebalancing the tree is not acceptable either. Use the same prototype with the return type changed to void. *3. Suppose that we perform a series of root insertions in an initially empty BST. What kinds of insertion orders require a large amount of stack? 4.8 Deletion ============ Deleting an item from a binary search tree is a little harder than inserting one. Before we write any code, let's consider how to delete nodes from a binary search tree in an abstract fashion. Here's a BST from which we can draw examples during the discussion: 5 _.-' `-.__ 2 9 / \ / 1 3 8 \ _' 4 6 \ 7 It is more difficult to remove some nodes from this tree than to remove others. Here, we recognize three distinct cases (Exercise 4.8-1 offers a fourth), described in detail below in terms of the deletion of a node designated p. Case 1: p has no right child ............................ It is trivial to delete a node with no right child, such as node 1, 4, 7, or 8 above. We replace the pointer leading to p by p's left child, if it has one, or by a null pointer, if not. In other words, we replace the deleted node by its left child. For example, the process of deleting node 8 looks like this: 5 _.-' `-..__ 5 2 9 _.-' `._ / \ _' 2 9 1 3 8,p => / \ _' \ _' 1 3 6 4 6 \ \ \ 4 7 7 This diagram shows the convention of separating multiple labels on a single node by a comma: node 8 is also node p. Case 2: p's right child has no left child ......................................... This case deletes any node p with a right child r that itself has no left child. Nodes 2, 3, and 6 in the tree above are examples. In this case, we move r into p's place, attaching p's former left subtree, if any, as the new left subtree of r. For instance, to delete node 2 in the tree above, we can replace it by its right child 3, giving node 2's left child 1 to node 3 as its new left child. 
The process looks like this: 5 5 ___..--' `-.__ _.-' `-.__ 2,p 9 3,r 9 / \ / / \ / 1 3,r 8 => 1 4 8 \ _' _' 4 6 6 \ \ 7 7

Case 3: p's right child has a left child
........................................

This is the "hard" case, where p's right child r has a left child. But if we approach it properly we can make it make sense. Let p's "inorder successor", that is, the node with the smallest value greater than p, be s. Then, our strategy is to detach s from its position in the tree, which is always an easy thing to do, and put it into the spot formerly occupied by p, which disappears from the tree. In our example, to delete node 5, we move inorder successor node 6 into its place, like this: 5,p _.-' `--..__ 6,s 2 9 _.-' `-._ / \ / 2 9 1 3 8 => / \ / \ _.-' 1 3 8 4 6,s \ / \ 4 7 7

But how do we know that node s exists and that we can delete it easily? We know that it exists because otherwise this would be case 1 or case 2 (consider their conditions). We can easily detach s from its position for a more subtle reason: s is the inorder successor of p and therefore has the smallest value in p's right subtree, so s cannot have a left child. (If it did, then this left child would have a smaller value than s, so it, rather than s, would be p's inorder successor.) Because s doesn't have a left child, we can simply replace it by its right child, if any. This is the mirror image of case 1.

Implementation
..............

The code for BST deletion closely follows the above discussion. Let's start with an outline of the function:

37. =
void *
bst_delete (struct bst_table *tree, const void *item)
{
  struct bst_node *p, *q;  /* Node to delete and its parent. */
  int cmp;                 /* Comparison between p->bst_data and item. */
  int dir;                 /* Side of q on which p is located. */

  assert (tree != NULL && item != NULL);

}

This code is included in 29.

We begin by finding the node to delete, in much the same way that bst_find() did.
But, in every case above, we replace the link leading to the node being deleted by another node or a null pointer. To do so, we have to keep track of the pointer that led to the node to be deleted. This is the purpose of q and dir in the code below. 38. = p = (struct bst_node *) &tree->bst_root; for (cmp = -1; cmp != 0; cmp = tree->bst_compare (item, p->bst_data, tree->bst_param)) { dir = cmp > 0; q = p; p = p->bst_link[dir]; if (p == NULL) return NULL; } item = p->bst_data; This code is included in 37. Now we can actually delete the node. Here is the code to distinguish between the three cases: 39. = if (p->bst_link[1] == NULL) { } else { struct bst_node *r = p->bst_link[1]; if (r->bst_link[0] == NULL) { } else { } } This code is included in 37. In case 1, we simply replace the node by its left subtree: 40. = q->bst_link[dir] = p->bst_link[0]; This code is included in 39. In case 2, we attach the node's left subtree as its right child r's left subtree, then replace the node by r: 41. = r->bst_link[0] = p->bst_link[0]; q->bst_link[dir] = r; This code is included in 39. We begin case 3 by finding p's inorder successor as s, and the parent of s as r. Recall that p's inorder successor is the node with the smallest value in p's right subtree, and that the smallest value in a tree can be found by descending to the left until a node with no left child is found: 42. = struct bst_node *s; for (;;) { s = r->bst_link[0]; if (s->bst_link[0] == NULL) break; r = s; } See also 43. This code is included in 39. Case 3 wraps up by adjusting pointers so that s moves into p's place: 43. += r->bst_link[0] = s->bst_link[1]; s->bst_link[0] = p->bst_link[0]; s->bst_link[1] = p->bst_link[1]; q->bst_link[dir] = s; As the final step, we decrement the number of nodes in the tree, free the node, and return its data: 44. = tree->bst_alloc->libavl_free (tree->bst_alloc, p); tree->bst_count--; tree->bst_generation++; return (void *) item; This code is included in 37.
See also: [Knuth 1998b], algorithm 6.2.2D; [Cormen 1990], section 13.3. Exercises: 1. Write code for a case 1.5 which handles deletion of nodes with no left child. 2. In the code presented above for case 3, we update pointers to move s into p's position, then free p. An alternate approach is to replace p's data by s's and delete s. Write code to use this approach. Can a similar modification be made to either of the other cases? *3. The code in the previous exercise is a few lines shorter than that in the main text, so it would seem to be preferable. Explain why the revised code, and other code based on the same idea, cannot be used in libavl. (Hint: consider the semantics of libavl traversers.) 4.8.1 Aside: Deletion by Merging -------------------------------- The libavl algorithm for deletion is commonly used, but it is also seemingly ad-hoc and arbitrary in its approach. In this section we'll take a look at another algorithm that may seem a little more uniform. Unfortunately, though it is conceptually simpler in some ways, in practice this algorithm is both slower and more difficult to properly implement. The idea behind this algorithm is to consider deletion as breaking the links between the deleted node and its parent and children. In the most general case, we end up with three disconnected BSTs, one that contains the deleted node's parent and two corresponding to the deleted node's former subtrees. The diagram below shows how this idea works out for the deletion of node 5 from the tree on the left: 2 2 / `._ / 1 5 1 _' `._ 3 9 => 3 9 \ / \ / 4 7 4 7 ^ ^ 6 8 6 8 Of course, the problem then becomes to reassemble the pieces into a single binary search tree. We can do this by merging the two former subtrees of the deleted node and attaching them as the right child of the parent subtree. 
As the first step in merging the subtrees, we take the minimum node r in the former right subtree and repeatedly perform a right rotation on its parent, until it is the root of its subtree. The process up to this point looks like this for our example, showing only the subtree containing r: 9 6,r 9 __..-' `._ _' 6,r 9 7 => \ => _' _' \ 7 7 6,r 8 \ \ 8 8 Now, because r is the root and the minimum node in its subtree, r has no left child. Also, all the nodes in the opposite subtree are smaller than r. So to merge these subtrees, we simply link the opposite subtree as r's left child and connect r in place of the deleted node: 2 / 2 1 / `._ 1 6,r 3 6,r _' `._ \ `._ => 3 9 4 9 \ _' _' 4 7 7 \ \ 8 8 The function outline is straightforward: 45. = void * bst_delete (struct bst_table *tree, const void *item) { struct bst_node *p; /* The node to delete, or a node part way to it. */ struct bst_node *q; /* Parent of p. */ int cmp, dir; /* Result of comparison between item and p. */ assert (tree != NULL && item != NULL); return (void *) item; } First we search for the node to delete, storing it as p and its parent as q: 46. = p = (struct bst_node *) &tree->bst_root; for (cmp = -1; cmp != 0; cmp = tree->bst_compare (item, p->bst_data, tree->bst_param)) { dir = cmp > 0; q = p; p = p->bst_link[dir]; if (p == NULL) return NULL; } This code is included in 45. The actual deletion process is not as simple. We handle specially the case where p has no right child. This is unfortunate for uniformity, but simplifies the rest of the code considerably. The main case starts off with a loop on variable r to build a stack of the nodes in the right subtree of p that will need to be rotated. After the loop, r is the minimum value in p's right subtree. This will be the new root of the merged subtrees after the rotations, so we set r as q's child on the appropriate side and r's left child as p's former left child. 
After that the only remaining task is the rotations themselves, so we perform them and we're done: 47. = if (p->bst_link[1] != NULL) { struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[BST_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k = 0; /* Stack height. */ struct bst_node *r; /* Iterator; final value is minimum node in subtree. */ for (r = p->bst_link[1]; r->bst_link[0] != NULL; r = r->bst_link[0]) { if (k >= BST_MAX_HEIGHT) { bst_balance (tree); return bst_delete (tree, item); } pa[k] = r; da[k++] = 0; } q->bst_link[dir] = r; r->bst_link[0] = p->bst_link[0]; for (; k > 0; k--) { struct bst_node *y = pa[k - 1]; struct bst_node *x = y->bst_link[0]; y->bst_link[0] = x->bst_link[1]; x->bst_link[1] = y; if (k > 1) pa[k - 2]->bst_link[da[k - 2]] = x; } } else q->bst_link[dir] = p->bst_link[0]; This code is included in 45. Finally, there's a bit of obligatory bookkeeping: 48. = item = p->bst_data; tree->bst_alloc->libavl_free (tree->bst_alloc, p); tree->bst_count--; tree->bst_generation++; This code is included in 45. See also: [Sedgewick 1998], section 12.9. 4.9 Traversal ============= After we've been manipulating a binary search tree for a while, we will want to know what items are in it. The process of enumerating the items in a binary search tree is called "traversal". libavl provides the bst_t_* functions for a particular kind of traversal called "inorder traversal", so-called because items are enumerated in sorted order. In this section we'll implement three algorithms for traversal. Each of these algorithms is based on and in some way improves upon the previous algorithm. The final implementation is the one used in libavl, so we will implement all of the bst_t_* functions for it. Before we start looking at particular algorithms, let's consider some criteria for evaluating traversal algorithms. 
The following are not the only criteria that could be used, but they are indeed important:(1)

complexity
     Is it difficult to describe or to correctly implement the algorithm? Complex algorithms also tend to take more code than simple ones.

efficiency
     Does the algorithm make good use of time and memory? The ideal traversal algorithm would require time proportional to the number of nodes traversed and a constant amount of space. In this chapter we will meet this ideal time criterion and come close on the space criterion for the average case. In future chapters we will be able to do better even in the worst case.

convenience
     Is it easy to integrate the traversal functions into other code? Callback functions are not as easy to use as other methods that can be used from for loops (*note Improving Convenience::).

reliability
     Are there pathological cases where the algorithm breaks down? If so, is it possible to fix these problems using additional time or space?

generality
     Does the algorithm only allow iteration in a single direction? Can we begin traversal at an arbitrary node, or just at the least or greatest node?

resilience
     If the tree is modified during a traversal, is it possible to continue traversal, or does the modification invalidate the traverser?

The first algorithm we will consider uses recursion. This algorithm is worthwhile primarily for its simplicity. In C, such an algorithm cannot be made as efficient, convenient, or general as other algorithms without unacceptable compromises. It is possible to make it both reliable and resilient, but we won't bother because of its other drawbacks. We arrive at our second algorithm through a literal transformation of the recursion in the first algorithm into iteration. The use of iteration lets us improve the algorithm's memory efficiency, and, on many machines, its time efficiency as well. The iterative algorithm also lets us improve the convenience of using the traverser.
We could also add reliability and resilience to an implementation of this algorithm, but we'll save that for later. The only problem with this algorithm, in fact, lies in its generality: it works best for moving only in one direction and starting from the least or greatest node. The importance of generality is what draws us to the third algorithm. This algorithm is based on ideas from the previous iterative algorithm along with some simple observations. This algorithm is no more complex than the previous one, but it is more general, allowing easily for iteration in either direction starting anywhere in the tree. This is the algorithm used in libavl, so we build an efficient, convenient, reliable, general, resilient implementation. ---------- Footnotes ---------- (1) Some of these terms are not generic BST vocabulary. Rather, they have been adopted for these particular uses in this text. You can consider the above to be our working definitions of these terms. 4.9.1 Traversal by Recursion ---------------------------- To figure out how to traverse a binary search tree in inorder, think about a BST's structure. A BST consists of a root, a left subtree, and right subtree. All the items in the left subtree have smaller values than the root and all the items in the right subtree have larger values than the root. That's good enough right there: we can traverse a BST in inorder by dealing with its left subtree, then doing with the root whatever it is we want to do with each node in the tree (generically, "visit" the root node), then dealing with its right subtree. But how do we deal with the subtrees? Well, they're BSTs too, so we can do the same thing: traverse its left subtree, then visit its root, then traverse its right subtree, and so on. Eventually the process terminates because at some point the subtrees are null pointers, and nothing needs to be done to traverse an empty tree. Writing the traversal function is almost trivial. 
We use bst_item_func to visit a node (*note Item and Copy Functions::): 49. = static void traverse_recursive (struct bst_node *node, bst_item_func *action, void *param) { if (node != NULL) { traverse_recursive (node->bst_link[0], action, param); action (node->bst_data, param); traverse_recursive (node->bst_link[1], action, param); } } See also 50. We also want a wrapper function to insulate callers from the existence of individual tree nodes: 50. += void walk (struct bst_table *tree, bst_item_func *action, void *param) { assert (tree != NULL && action != NULL); traverse_recursive (tree->bst_root, action, param); } See also: [Knuth 1997], section 2.3.1; [Cormen 1990], section 13.1; [Sedgewick 1998], program 12.8. Exercises: 1. Instead of checking for a null node at the top of traverse_recursive(), would it be better to check before calling in each place that the function is called? Why or why not? 2. Some languages, such as Pascal, support the concept of "nested functions", that is, functions within functions, but C does not. Some algorithms, including recursive tree traversal, can be expressed much more naturally with this feature. Rewrite walk(), in a hypothetical C-like language that supports nested functions, as a function that calls an inner, recursively defined function. The nested function should only take a single parameter. (The GNU C compiler supports nested functions as a language extension, so you may want to use it to check your code.) 4.9.2 Traversal by Iteration ---------------------------- The recursive approach of the previous section is one valid way to traverse a binary search tree in sorted order. This method has the advantages of being simple and "obviously correct". But it does have problems with efficiency, because each call to traverse_recursive() receives its own duplicate copies of arguments action and param, and with convenience, because writing a new callback function for each traversal is unpleasant. 
It has other problems, too, as already discussed, but these are the ones to be addressed immediately. Unfortunately, neither problem can be solved acceptably in C using a recursive method, the first because the traversal function has to somehow know the action function and the parameter to pass to it, and the second because there is simply no way to jump out of and then back into recursive calls in C.(1) Our only option is to use an algorithm that does not involve recursion. The simplest way to eliminate recursion is by a literal conversion of the recursion to iteration. This is the topic of this section. Later, we will consider a slightly different, and in some ways superior, iterative solution. Converting recursion into iteration is an interesting problem. There are two main ways to do it:

tail recursion elimination
     If a recursive call is the last action taken in a function, then it is equivalent to a goto back to the beginning of the function, possibly after modifying argument values. (If the function has a return value then the recursive call must be a return statement returning the value received from the nested call.) This form of recursion is called "tail recursion".

save-and-restore recursion elimination
     In effect, a recursive function call saves a copy of argument values and local variables, modifies the arguments, then executes a goto to the beginning of the function. Accordingly, the return from the nested call is equivalent to restoring the saved arguments and local variables, then executing a goto back to the point where the call was made.

We can make use of both of these rules in converting traverse_recursive() to iterative form. First, does traverse_recursive() ever call itself as its last action? The answer is yes, so we can convert that to an assignment plus a goto statement: 51.
= static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { start: if (node != NULL) { traverse_iterative (node->bst_link[0], action, param); action (node->bst_data, param); node = node->bst_link[1]; goto start; } } Sensible programmers are not fond of goto. Fortunately, it is easy to eliminate by rephrasing in terms of a while loop: 52. = static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { while (node != NULL) { traverse_iterative (node->bst_link[0], action, param); action (node->bst_data, param); node = node->bst_link[1]; } } This still leaves another recursive call, one that is not tail recursive. This one must be eliminated by saving and restoring values. A stack is ideal for this purpose. For now, we use a stack of fixed size BST_MAX_HEIGHT and deal with stack overflow by aborting. Later, we'll handle overflow more gracefully. Here's the code: 53. = static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { struct bst_node *stack[BST_MAX_HEIGHT]; size_t height = 0; start: while (node != NULL) { if (height >= BST_MAX_HEIGHT) { fprintf (stderr, "tree too deep\n"); exit (EXIT_FAILURE); } stack[height++] = node; node = node->bst_link[0]; goto start; resume: action (node->bst_data, param); node = node->bst_link[1]; } if (height > 0) { node = stack[--height]; goto resume; } } This code, an ugly mash of statements, is a prime example of why goto statements are discouraged, but its relationship with the earlier code is clear. To make it acceptable for real use, we must rephrase it. First, we can eliminate label resume by recognizing that it can only be reached from the corresponding goto statement, then moving its code appropriately: 54. 
= static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { struct bst_node *stack[BST_MAX_HEIGHT]; size_t height = 0; start: while (node != NULL) { if (height >= BST_MAX_HEIGHT) { fprintf (stderr, "tree too deep\n"); exit (EXIT_FAILURE); } stack[height++] = node; node = node->bst_link[0]; goto start; } if (height > 0) { node = stack[--height]; action (node->bst_data, param); node = node->bst_link[1]; goto start; } } The first remaining goto statement can be eliminated without any other change, because it is redundant; the second, by enclosing the whole function body in an "infinite loop": 55. = static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { struct bst_node *stack[BST_MAX_HEIGHT]; size_t height = 0; for (;;) { while (node != NULL) { if (height >= BST_MAX_HEIGHT) { fprintf (stderr, "tree too deep\n"); exit (EXIT_FAILURE); } stack[height++] = node; node = node->bst_link[0]; } if (height == 0) break; node = stack[--height]; action (node->bst_data, param); node = node->bst_link[1]; } } This initial iterative version takes care of the efficiency problem. Exercises: 1. Function traverse_iterative() relies on stack[], a stack of nodes yet to be visited, which as allocated can hold up to BST_MAX_HEIGHT nodes. Consider the following questions concerning stack[]: a. What is the maximum height this stack will attain in traversing a binary search tree containing n nodes, if the binary tree has minimum possible height? b. What is the maximum height this stack can attain in traversing any binary tree of n nodes? The minimum height? c. Under what circumstances is it acceptable to use a fixed-size stack as in the example code? d. Rewrite traverse_iterative() to dynamically expand stack[] in case of overflow. e. Does traverse_recursive() also have potential for running out of "stack" or "memory"? If so, more or less than traverse_iterative() as modified by the previous part? 
---------- Footnotes ---------- (1) This is possible in some other languages, such as Scheme, that support "coroutines" as well as subroutines. 4.9.2.1 Improving Convenience ............................. Now we can work on improving the convenience of our traversal function. But, first, perhaps it's worthwhile to demonstrate how inconvenient it really can be to use walk(), regardless of how it's implemented internally. Suppose that we have a BST of character strings and, for whatever reason, want to know the total length of all the strings in it. We could do it like this using walk(): 56. = static void process_node (void *data, void *param) { const char *string = data; size_t *total = param; *total += strlen (string); } size_t total_length (struct bst_table *tree) { size_t total = 0; walk (tree, process_node, &total); return total; } With the functions first_item() and next_item() that we'll write in this section, we can rewrite these functions as the single function below: 57. = size_t total_length (struct bst_table *tree) { struct traverser t; const char *string; size_t total = 0; for (string = first_item (tree, &t); string != NULL; string = next_item (&t)) total += strlen (string); return total; } You're free to make your own assessment, of course, but many programmers prefer the latter because of its greater brevity and fewer "unsafe" conversions to and from void pointers. Now to actually write the code. Our task is to modify traverse_iterative() so that, instead of calling action, it returns node->bst_data. But first, some infrastructure. We define a structure to contain the state of the traversal, equivalent to the relevant argument and local variables in traverse_iterative(). To emphasize that this is not our final version of this structure or the related code, we will call it struct traverser, without any name prefix: 58. = struct traverser { struct bst_table *table; /* Tree being traversed. */ struct bst_node *node; /* Current node in tree. 
*/ struct bst_node *stack[BST_MAX_HEIGHT]; /* Parent nodes to revisit. */ size_t height; /* Number of nodes in stack. */ }; See also 59 and 60. Function first_item() just initializes a struct traverser and returns the first item in the tree, deferring most of its work to next_item(): 59. += /* Initializes trav for tree. Returns data item in tree with the smallest value, or NULL if tree is empty. In the former case, next_item() may be called with trav to retrieve additional data items. */ void * first_item (struct bst_table *tree, struct traverser *trav) { assert (tree != NULL && trav != NULL); trav->table = tree; trav->node = tree->bst_root; trav->height = 0; return next_item (trav); } Function next_item() is, for the most part, a simple modification of traverse_iterative(): 60. += /* Returns the next data item in inorder within the tree being traversed with trav, or if there are no more data items returns NULL. In the former case next_item() may be called again to retrieve the next item. */ void * next_item (struct traverser *trav) { struct bst_node *node; assert (trav != NULL); node = trav->node; while (node != NULL) { if (trav->height >= BST_MAX_HEIGHT) { fprintf (stderr, "tree too deep\n"); exit (EXIT_FAILURE); } trav->stack[trav->height++] = node; node = node->bst_link[0]; } if (trav->height == 0) return NULL; node = trav->stack[--trav->height]; trav->node = node->bst_link[1]; return node->bst_data; } See also: [Knuth 1997], algorithm 2.3.1T; [Knuth 1992], p. 50-54, section "Recursion Elimination" within article "Structured Programming with go to statements". Exercises: 1. Make next_item() reliable by providing alternate code to execute on stack overflow. This code will work by calling bst_balance() to "balance" the tree, reducing its height such that it can be traversed with the small stack that we use. We will develop bst_balance() later. For now, consider it a "black box" that simply needs to be invoked with the tree to balance as an argument. 
Don't forget to adjust the traverser structure so that later calls will work properly, too. 2. Without modifying next_item() or first_item(), can a function prev_item() be written that will move to and return the previous item in the tree in inorder? 4.9.3 Better Iterative Traversal -------------------------------- We have developed an efficient, convenient function for traversing a binary tree. In the exercises, we made it reliable, and it is possible to make it resilient as well. But its algorithm makes it difficult to add generality. In order to do that in a practical way, we will have to use a new algorithm. Let us start by considering how to find the successor or predecessor of any node in general, as opposed to just blindly transforming code as we did in the previous section. Back when we wrote bst_delete(), we already solved half of the problem, by figuring out how to find the successor of a node that has a right child: take the least-valued node in the right subtree of the node (*note Deletion Case 3: successor.). The other half is the successor of a node that doesn't have a right child. Take a look at the code for one of the previous traversal functions--recursive or iterative, whichever you better understand--and mentally work out the relationship between the current node and its successor for a node without a right child. What happens is that we move up the tree, from a node to its parent, one node at a time, until it turns out that we moved up to the right (as opposed to up to the left) and that is the successor node. Think of it this way: if we move up to the left, then the node we started at has a lesser value than where we ended up, so we've already visited it, but if we move up to the right, then we're moving to a node with a greater value, so we've found the successor. Using these instructions, we can find the predecessor of a node, too, just by exchanging "left" and "right".
This suggests that all we have to do in order to generalize our traversal function is to keep track of all the nodes above the current node, not just the ones that are up and to the left. This in turn suggests our final implementation of struct bst_traverser, with appropriate comments: 61. = /* BST traverser structure. */ struct bst_traverser { struct bst_table *bst_table; /* Tree being traversed. */ struct bst_node *bst_node; /* Current node in tree. */ struct bst_node *bst_stack[BST_MAX_HEIGHT]; /* All the nodes above bst_node. */ size_t bst_height; /* Number of nodes in bst_stack. */ unsigned long bst_generation; /* Generation number. */ }; This code is included in 24, 142, and 192. Because user code is expected to declare actual instances of struct bst_traverser, struct bst_traverser must be defined in <bst.h> and therefore all of its member names are prefixed by bst_ for safety. The only surprise in struct bst_traverser is member bst_generation, the traverser's generation number. This member is set equal to its namesake in struct bst_table when a traverser is initialized. After that, the two values are compared whenever the stack of parent pointers must be accessed. Any change in the tree that could disturb the action of a traverser will cause their generation numbers to differ, which in turn triggers an update to the stack. This is what allows this final implementation to be resilient. We need a utility function to actually update the stack of parent pointers when differing generation numbers are detected. This is easy to write: 62. = /* Refreshes the stack of parent pointers in trav and updates its generation number.
*/ static void trav_refresh (struct bst_traverser *trav) { assert (trav != NULL); trav->bst_generation = trav->bst_table->bst_generation; if (trav->bst_node != NULL) { bst_comparison_func *cmp = trav->bst_table->bst_compare; void *param = trav->bst_table->bst_param; struct bst_node *node = trav->bst_node; struct bst_node *i; trav->bst_height = 0; for (i = trav->bst_table->bst_root; i != node; ) { assert (trav->bst_height < BST_MAX_HEIGHT); assert (i != NULL); trav->bst_stack[trav->bst_height++] = i; i = i->bst_link[cmp (node->bst_data, i->bst_data, param) > 0]; } } } This code is included in 63 and 178. The following sections will implement all of the traverser functions bst_t_*(). *Note Traversers::, for descriptions of the purpose of each of these functions. The traversal functions are collected together into : 63. = This code is included in 29. Exercises: 1. The bst_probe() function doesn't change the tree's generation number. Why not? *2. The main loop in trav_refresh() contains the assertion assert (trav->bst_height < BST_MAX_HEIGHT); Prove that this assertion is always true. 3. In trav_refresh(), it is tempting to avoid calls to the user-supplied comparison function by comparing the nodes on the stack to the current state of the tree; e.g., move up the stack, starting from the bottom, and for each node verify that it is a child of the previous one on the stack, falling back to the general algorithm at the first mismatch. Why won't this work? 4.9.3.1 Starting at the Null Node ................................. The bst_t_init() function just initializes a traverser to the null item, indicated by a null pointer for bst_node. 64. = void bst_t_init (struct bst_traverser *trav, struct bst_table *tree) { trav->bst_table = tree; trav->bst_node = NULL; trav->bst_height = 0; trav->bst_generation = tree->bst_generation; } This code is included in 63 and 178. 4.9.3.2 Starting at the First Node ..................................
To initialize a traverser to start at the least valued node, we simply descend from the root as far down and left as possible, recording the parent pointers on the stack as we go. If the stack overflows, then we balance the tree and start over. 65. = void * bst_t_first (struct bst_traverser *trav, struct bst_table *tree) { struct bst_node *x; assert (tree != NULL && trav != NULL); trav->bst_table = tree; trav->bst_height = 0; trav->bst_generation = tree->bst_generation; x = tree->bst_root; if (x != NULL) while (x->bst_link[0] != NULL) { if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (tree); return bst_t_first (trav, tree); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[0]; } trav->bst_node = x; return x != NULL ? x->bst_data : NULL; } This code is included in 63. Exercises: *1. Show that bst_t_first() will never make more than one recursive call to itself at a time. 4.9.3.3 Starting at the Last Node ................................. The code to start from the greatest node in the tree is analogous to that for starting from the least node. The only difference is that we descend to the right instead: 66. = void * bst_t_last (struct bst_traverser *trav, struct bst_table *tree) { struct bst_node *x; assert (tree != NULL && trav != NULL); trav->bst_table = tree; trav->bst_height = 0; trav->bst_generation = tree->bst_generation; x = tree->bst_root; if (x != NULL) while (x->bst_link[1] != NULL) { if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (tree); return bst_t_last (trav, tree); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[1]; } trav->bst_node = x; return x != NULL ? x->bst_data : NULL; } This code is included in 63. 4.9.3.4 Starting at a Found Node ................................ Sometimes it is convenient to begin a traversal at a particular item in a tree. This function works in the same way as bst_find(), but records parent pointers in the traverser structure as it descends the tree. 67.
= void * bst_t_find (struct bst_traverser *trav, struct bst_table *tree, void *item) { struct bst_node *p, *q; assert (trav != NULL && tree != NULL && item != NULL); trav->bst_table = tree; trav->bst_height = 0; trav->bst_generation = tree->bst_generation; for (p = tree->bst_root; p != NULL; p = q) { int cmp = tree->bst_compare (item, p->bst_data, tree->bst_param); if (cmp < 0) q = p->bst_link[0]; else if (cmp > 0) q = p->bst_link[1]; else /* cmp == 0 */ { trav->bst_node = p; return p->bst_data; } if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (trav->bst_table); return bst_t_find (trav, tree, item); } trav->bst_stack[trav->bst_height++] = p; } trav->bst_height = 0; trav->bst_node = NULL; return NULL; } This code is included in 63. 4.9.3.5 Starting at an Inserted Node .................................... Another operation that can be useful is to insert a new node and construct a traverser to the inserted node in a single operation. The following code does this: 68. = void * bst_t_insert (struct bst_traverser *trav, struct bst_table *tree, void *item) { struct bst_node **q; assert (tree != NULL && item != NULL); trav->bst_table = tree; trav->bst_height = 0; q = &tree->bst_root; while (*q != NULL) { int cmp = tree->bst_compare (item, (*q)->bst_data, tree->bst_param); if (cmp == 0) { trav->bst_node = *q; trav->bst_generation = tree->bst_generation; return (*q)->bst_data; } if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (tree); return bst_t_insert (trav, tree, item); } trav->bst_stack[trav->bst_height++] = *q; q = &(*q)->bst_link[cmp > 0]; } trav->bst_node = *q = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof **q); if (*q == NULL) { trav->bst_node = NULL; trav->bst_generation = tree->bst_generation; return NULL; } (*q)->bst_link[0] = (*q)->bst_link[1] = NULL; (*q)->bst_data = item; tree->bst_count++; trav->bst_generation = tree->bst_generation; return (*q)->bst_data; } This code is included in 63. 
4.9.3.6 Initialization by Copying ................................. This function copies one traverser to another. It only copies the stack of parent pointers if they are up-to-date: 69. = void * bst_t_copy (struct bst_traverser *trav, const struct bst_traverser *src) { assert (trav != NULL && src != NULL); if (trav != src) { trav->bst_table = src->bst_table; trav->bst_node = src->bst_node; trav->bst_generation = src->bst_generation; if (trav->bst_generation == trav->bst_table->bst_generation) { trav->bst_height = src->bst_height; memcpy (trav->bst_stack, (const void *) src->bst_stack, sizeof *trav->bst_stack * trav->bst_height); } } return trav->bst_node != NULL ? trav->bst_node->bst_data : NULL; } This code is included in 63 and 178. Exercises: 1. Without the check that trav != src before copying src into trav, what might happen? 4.9.3.7 Advancing to the Next Node .................................. The algorithm of bst_t_next(), the function for finding a successor, divides neatly into three cases. Two of these are the ones that we discussed earlier in the introduction to this kind of traverser (*note Better Iterative Traversal::). The third case occurs when the last node returned was NULL, in which case we return the least node in the table, in accordance with the semantics for libavl tables. The function outline is this: 70. = void * bst_t_next (struct bst_traverser *trav) { struct bst_node *x; assert (trav != NULL); if (trav->bst_generation != trav->bst_table->bst_generation) trav_refresh (trav); x = trav->bst_node; if (x == NULL) { return bst_t_first (trav, trav->bst_table); } else if (x->bst_link[1] != NULL) { } else { } trav->bst_node = x; return x->bst_data; } This code is included in 63. The case where the current node has a right child is accomplished by stepping to the right, then to the left until we can't go any farther, as discussed in detail earlier. The only difference is that we must check for stack overflow. 
When stack overflow does occur, we recover by calling bst_balance(), then restarting bst_t_next() using a tail-recursive call. The tail recursion will never happen more than once, because bst_balance() ensures that the tree's height is small enough that the stack cannot overflow again: 71. = if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (trav->bst_table); return bst_t_next (trav); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[1]; while (x->bst_link[0] != NULL) { if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (trav->bst_table); return bst_t_next (trav); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[0]; } This code is included in 70. In the case where the current node has no right child, we move upward in the tree based on the stack of parent pointers that we saved, as described before. When the stack underflows, we know that we've run out of nodes in the tree: 72. = struct bst_node *y; do { if (trav->bst_height == 0) { trav->bst_node = NULL; return NULL; } y = x; x = trav->bst_stack[--trav->bst_height]; } while (y == x->bst_link[1]); This code is included in 70. 4.9.3.8 Backing Up to the Previous Node ....................................... Moving to the previous node is analogous to moving to the next node. The only difference, in fact, is that directions are reversed from left to right. 73. 
= void * bst_t_prev (struct bst_traverser *trav) { struct bst_node *x; assert (trav != NULL); if (trav->bst_generation != trav->bst_table->bst_generation) trav_refresh (trav); x = trav->bst_node; if (x == NULL) { return bst_t_last (trav, trav->bst_table); } else if (x->bst_link[0] != NULL) { if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (trav->bst_table); return bst_t_prev (trav); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[0]; while (x->bst_link[1] != NULL) { if (trav->bst_height >= BST_MAX_HEIGHT) { bst_balance (trav->bst_table); return bst_t_prev (trav); } trav->bst_stack[trav->bst_height++] = x; x = x->bst_link[1]; } } else { struct bst_node *y; do { if (trav->bst_height == 0) { trav->bst_node = NULL; return NULL; } y = x; x = trav->bst_stack[--trav->bst_height]; } while (y == x->bst_link[0]); } trav->bst_node = x; return x->bst_data; } This code is included in 63. 4.9.3.9 Getting the Current Item ................................ 74. = void * bst_t_cur (struct bst_traverser *trav) { assert (trav != NULL); return trav->bst_node != NULL ? trav->bst_node->bst_data : NULL; } This code is included in 63, 178, 268, 395, 502, and 546. 4.9.3.10 Replacing the Current Item ................................... 75. = void * bst_t_replace (struct bst_traverser *trav, void *new) { void *old; assert (trav != NULL && trav->bst_node != NULL && new != NULL); old = trav->bst_node->bst_data; trav->bst_node->bst_data = new; return old; } This code is included in 63, 178, 268, 395, 502, and 546. 4.10 Copying ============ In this section, we're going to write function bst_copy() to make a copy of a binary tree. This is the most complicated function of all those needed for BST functionality, so pay careful attention as we proceed. 4.10.1 Recursive Copying ------------------------ The "obvious" way to copy a binary tree is recursive. Here's a basic recursive copy, hard-wired to allocate memory with malloc() for simplicity: 76. 
= /* Makes and returns a new copy of tree rooted at x. */ static struct bst_node * bst_copy_recursive_1 (struct bst_node *x) { struct bst_node *y; if (x == NULL) return NULL; y = malloc (sizeof *y); if (y == NULL) return NULL; y->bst_data = x->bst_data; y->bst_link[0] = bst_copy_recursive_1 (x->bst_link[0]); y->bst_link[1] = bst_copy_recursive_1 (x->bst_link[1]); return y; } But, again, it would be nice to rewrite this iteratively, both because the iterative version is likely to be faster and for the sheer mental exercise of it. Recall, from our earlier discussion of inorder traversal, that tail recursion (recursion where a function calls itself as its last action) is easier to convert to iteration than other types. Unfortunately, neither of the recursive calls above is tail-recursive. Fortunately, we can rewrite it so that it is, if we change the way we allocate data: 77. = /* Copies tree rooted at x to y, which latter is allocated but not yet initialized. */ static void bst_copy_recursive_2 (struct bst_node *x, struct bst_node *y) { y->bst_data = x->bst_data; if (x->bst_link[0] != NULL) { y->bst_link[0] = malloc (sizeof *y->bst_link[0]); bst_copy_recursive_2 (x->bst_link[0], y->bst_link[0]); } else y->bst_link[0] = NULL; if (x->bst_link[1] != NULL) { y->bst_link[1] = malloc (sizeof *y->bst_link[1]); bst_copy_recursive_2 (x->bst_link[1], y->bst_link[1]); } else y->bst_link[1] = NULL; } Exercises: 1. When malloc() returns a null pointer, bst_copy_recursive_1() fails "silently", that is, without notifying its caller about the error, and the output is a partial copy of the original tree. Without removing the recursion, implement two different ways to propagate such errors upward to the function's caller: a. Change the function's prototype to: static int bst_robust_copy_recursive_1 (struct bst_node *, struct bst_node **); b. Without changing the function's prototype. (Hint: use a statically declared struct bst_node). 
In each case make sure that any allocated memory is safely freed if an allocation error occurs. 2. bst_copy_recursive_2() is even worse than bst_copy_recursive_1() at handling allocation failure. It actually invokes undefined behavior when an allocation fails. Fix this, changing it to return an int, with nonzero return values indicating success. Be careful not to leak memory. 4.10.2 Iterative Copying ------------------------ Now we can factor out the recursion, starting with the tail recursion. This process is very similar to what we did with the traversal code, so the details are left for Exercise 1. Let's look at the results part by part: 78. = /* Copies org to a newly created tree, which is returned. */ struct bst_table * bst_copy_iterative (const struct bst_table *org) { struct bst_node *stack[2 * (BST_MAX_HEIGHT + 1)]; /* Stack. */ int height = 0; /* Stack height. */ See also 79, 80, and 81. This time, our stack will have two pointers added to it at a time, one from the original tree and one from the copy. Thus, the stack needs to be twice as big. In addition, we'll see below that there'll be an extra item on the stack representing the pointer to the tree's root, so our stack needs room for an extra pair of items, which is the reason for the "+ 1" in stack[]'s size. 79. += struct bst_table *new; /* New tree. */ const struct bst_node *x; /* Node currently being copied. */ struct bst_node *y; /* New node being copied from x. */ new = bst_create (org->bst_compare, org->bst_param, org->bst_alloc); new->bst_count = org->bst_count; if (new->bst_count == 0) return new; x = (const struct bst_node *) &org->bst_root; y = (struct bst_node *) &new->bst_root; This is the same kind of "dirty trick" already described in Exercise 4.7-1. 80. 
+= for (;;) { while (x->bst_link[0] != NULL) { y->bst_link[0] = org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[0]); stack[height++] = (struct bst_node *) x; stack[height++] = y; x = x->bst_link[0]; y = y->bst_link[0]; } y->bst_link[0] = NULL; This code moves x down and to the left in the tree until it runs out of nodes, allocating space in the new tree for left children and pushing nodes from the original tree and the copy onto the stack as it goes. The cast on x suppresses a warning or error due to x, a pointer to a const structure, being stored into a non-constant pointer in stack[]. We won't ever try to store into the pointer that we store in there, so this is legitimate. We've switched from using malloc() to using the allocation function provided by the user. This is easy now because we have the tree structure to work with. To do this earlier, we would have had to somehow pass the tree structure to each recursive call of the copy function, wasting time and space. 81. += for (;;) { y->bst_data = x->bst_data; if (x->bst_link[1] != NULL) { y->bst_link[1] = org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[1]); x = x->bst_link[1]; y = y->bst_link[1]; break; } else y->bst_link[1] = NULL; if (height <= 2) return new; y = stack[--height]; x = stack[--height]; } } } We do not pop the bottommost pair of items off the stack because these items contain the fake struct bst_node pointer that is actually the address of bst_root. When we get down to these items, we're done copying and can return the new tree. See also: [Knuth 1997], algorithm 2.3.1C; [ISO 1990], section 6.5.2.1. Exercises: 1. Suggest a step between bst_copy_recursive_2() and bst_copy_iterative(). 4.10.3 Error Handling --------------------- So far, outside the exercises, we've ignored the question of handling memory allocation errors during copying. 
In our other routines, we've been careful to handle allocation failures by cleaning up and returning an error indication to the caller. Now we will apply this same policy to tree copying, as libavl semantics require (*note Creation and Destruction::): a memory allocation error causes the partially copied tree to be destroyed and a null pointer to be returned to the caller. This is a little harder to do than recovering after a single operation, because there are potentially many nodes that have to be freed, and each node might include additional user data that also has to be freed. The new BST might have as-yet-uninitialized pointer fields as well, and we must be careful to avoid reading from these fields as we destroy the tree. We could use a number of strategies to destroy the partially copied tree while avoiding uninitialized pointers. The strategy that we will actually use is to initialize these pointers to NULL, then call the general tree destruction routine bst_destroy(). We haven't yet written bst_destroy(), so for now we'll treat it as a "black box" that does what we want, even if we don't understand how. Next question: _which_ pointers in the tree are not initialized? The answer is simple: during the copy, we will not revisit nodes not currently on the stack, so only pointers in the current node (y) and on the stack can be uninitialized. For its part, depending on what we're doing to it, y might not have any of its fields initialized. As for the stack, nodes are pushed onto it because we have to come back later and build their right subtrees, so we must set their right child pointers to NULL. We will need this error recovery code in a number of places, so it is worth making it into a small helper function: 82. 
= static void copy_error_recovery (struct bst_node **stack, int height, struct bst_table *new, bst_item_func *destroy) { assert (stack != NULL && height >= 0 && new != NULL); for (; height > 2; height -= 2) stack[height - 1]->bst_link[1] = NULL; bst_destroy (new, destroy); } This code is included in 83 and 185. Another problem that can arise in copying a binary tree is stack overflow. We will handle stack overflow by destroying the partial copy, balancing the original tree, and then restarting the copy. The balanced tree is guaranteed to have small enough height that it will not overflow the stack. The code below for our final version of bst_copy() takes three new parameters: two function pointers and a memory allocator. The meaning of these parameters was explained earlier (*note Creation and Destruction::). Their use within the function should be self-explanatory. 83. = struct bst_table * bst_copy (const struct bst_table *org, bst_copy_func *copy, bst_item_func *destroy, struct libavl_allocator *allocator) { struct bst_node *stack[2 * (BST_MAX_HEIGHT + 1)]; int height = 0; struct bst_table *new; const struct bst_node *x; struct bst_node *y; assert (org != NULL); new = bst_create (org->bst_compare, org->bst_param, allocator != NULL ? 
allocator : org->bst_alloc); if (new == NULL) return NULL; new->bst_count = org->bst_count; if (new->bst_count == 0) return new; x = (const struct bst_node *) &org->bst_root; y = (struct bst_node *) &new->bst_root; for (;;) { while (x->bst_link[0] != NULL) { if (height >= 2 * (BST_MAX_HEIGHT + 1)) { y->bst_data = NULL; y->bst_link[0] = y->bst_link[1] = NULL; copy_error_recovery (stack, height, new, destroy); bst_balance ((struct bst_table *) org); return bst_copy (org, copy, destroy, allocator); } y->bst_link[0] = new->bst_alloc->libavl_malloc (new->bst_alloc, sizeof *y->bst_link[0]); if (y->bst_link[0] == NULL) { if (y != (struct bst_node *) &new->bst_root) { y->bst_data = NULL; y->bst_link[1] = NULL; } copy_error_recovery (stack, height, new, destroy); return NULL; } stack[height++] = (struct bst_node *) x; stack[height++] = y; x = x->bst_link[0]; y = y->bst_link[0]; } y->bst_link[0] = NULL; for (;;) { if (copy == NULL) y->bst_data = x->bst_data; else { y->bst_data = copy (x->bst_data, org->bst_param); if (y->bst_data == NULL) { y->bst_link[1] = NULL; copy_error_recovery (stack, height, new, destroy); return NULL; } } if (x->bst_link[1] != NULL) { y->bst_link[1] = new->bst_alloc->libavl_malloc (new->bst_alloc, sizeof *y->bst_link[1]); if (y->bst_link[1] == NULL) { copy_error_recovery (stack, height, new, destroy); return NULL; } x = x->bst_link[1]; y = y->bst_link[1]; break; } else y->bst_link[1] = NULL; if (height <= 2) return new; y = stack[--height]; x = stack[--height]; } } } This code is included in 29. 4.11 Destruction ================ Eventually, we'll want to get rid of the trees we've spent all this time constructing. When this happens, it's time to destroy them by freeing their memory. 4.11.1 Destruction by Rotation ------------------------------ The method actually used in libavl for destruction of binary trees is somewhat novel. This section will cover this method. 
Later sections will cover more conventional techniques using recursive or iterative "postorder traversal". To destroy a binary tree, we must visit and free each node. We have already covered one way to traverse a tree (inorder traversal) and used this technique for traversing and copying a binary tree. But, both times before, we were subject to both the explicit constraint that we had to visit the nodes in sorted order and the implicit constraint that we were not to change the structure of the tree, or at least not to change it for the worse. Neither of these constraints holds for destruction of a binary tree. As long as the tree finally ends up freed, it doesn't matter how much it is mangled in the process. In this case, "the end justifies the means" and we are free to do it however we like. So let's consider why we needed a stack before. It was to keep track of nodes whose left subtree we were currently visiting, in order to go back later and visit them and their right subtrees. Hmm...what if we rearranged nodes so that they _didn't have_ any left subtrees? Then we could just descend to the right, without need to keep track of anything on a stack. We can do this. For the case where the current node p has a left child q, consider the transformation below where we rotate right at p: p q / \ / \ q c => a p ^ ^ a b b c where a, b, and c are arbitrary subtrees or even empty trees. This transformation shifts nodes from the left to the right side of the root (which is now q). If it is performed enough times, the root node will no longer have a left child. After the transformation, q becomes the current node. For the case where the current node has no left child, we can just destroy the current node and descend to its right. Because the transformation used does not change the tree's ordering, we end up destroying nodes in inorder. It is instructive to verify this by simulating with paper and pencil the destruction of a few trees this way. 
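The simulation can also be run as a throwaway program instead of on paper. The sketch below uses a simplified int-keyed node (illustrative names, not libavl's types) and records the order in which nodes are freed, confirming that the rotation method destroys nodes in inorder; it omits the item-destruction callback and custom allocator of the real bst_destroy():

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified int-keyed node standing in for struct bst_node. */
struct node { struct node *link[2]; int data; };

/* Destroys the tree rooted at root by the rotation method described
   above, appending each freed node's key to out[]; returns the number
   of nodes freed.  Nodes are assumed to come from malloc(). */
static int
destroy_by_rotation (struct node *root, int out[])
{
  struct node *p, *q;
  int n = 0;
  for (p = root; p != NULL; p = q)
    if (p->link[0] == NULL)
      {
        q = p->link[1];       /* no left child: free p, descend right */
        out[n++] = p->data;
        free (p);
      }
    else
      {
        q = p->link[0];       /* rotate right at p */
        p->link[0] = q->link[1];
        q->link[1] = p;
      }
  return n;
}
```

Running this on any small tree shows the keys arriving in out[] in sorted order, which is the inorder-destruction property claimed above.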
The code to implement destruction in this manner is brief and straightforward: 84. = void bst_destroy (struct bst_table *tree, bst_item_func *destroy) { struct bst_node *p, *q; assert (tree != NULL); for (p = tree->bst_root; p != NULL; p = q) if (p->bst_link[0] == NULL) { q = p->bst_link[1]; if (destroy != NULL && p->bst_data != NULL) destroy (p->bst_data, tree->bst_param); tree->bst_alloc->libavl_free (tree->bst_alloc, p); } else { q = p->bst_link[0]; p->bst_link[0] = q->bst_link[1]; q->bst_link[1] = p; } tree->bst_alloc->libavl_free (tree->bst_alloc, tree); } This code is included in 29, 145, 196, 489, 522, and 554. See also: [Stout 1986], tree_to_vine procedure. Exercises: 1. Before calling destroy() above, we first test that we are not passing it a NULL pointer, because we do not want destroy() to have to deal with this case. How can such a pointer get into the tree in the first place, since bst_probe() refuses to insert such a pointer into a tree? 4.11.2 Aside: Recursive Destruction ----------------------------------- The algorithm used in the previous section is easy and fast, but it is not the most common method for destroying a tree. The usual way is to perform a traversal of the tree, in much the same way we did for tree traversal and copying. Once again, we'll start from a recursive implementation, because these are so easy to write. The only tricky part is that subtrees have to be freed _before_ the root. This code is hard-wired to use free() for simplicity: 85. = static void bst_destroy_recursive (struct bst_node *node) { if (node == NULL) return; bst_destroy_recursive (node->bst_link[0]); bst_destroy_recursive (node->bst_link[1]); free (node); } 4.11.3 Aside: Iterative Destruction ----------------------------------- As we've done before for other algorithms, we can factor the recursive destruction algorithm into an equivalent iteration. In this case, neither recursive call is tail recursive, and we can't easily modify the code so that it is. 
We could still factor out the recursion by our usual methods, although it would be more difficult, but this problem is simple enough to figure out from first principles. Let's do it that way, instead, this time. The idea is that, for the tree's root, we traverse its left subtree, then its right subtree, then free the root. This pattern is called a "postorder traversal". Let's think about how much state we need to keep track of. When we're traversing the root's left subtree, we still need to remember the root, in order to come back to it later. The same is true while traversing the root's right subtree, because we still need to come back to free the root. What's more, we need to keep track of what state we're in: have we traversed the root's left subtree or not, have we traversed the root's right subtree or not? This naturally suggests a stack that holds two-part items (root, state), where root is the root of the tree or subtree and state is the state of the traversal at that node. We start by selecting the tree's root as our current node p, then pushing (p, 0) onto the stack and moving down to the left as far as we can, pushing as we go. Then we start popping off the stack into (p, state) and notice that state is 0, which tells us that we've traversed p's left subtree but not its right. So, we push (p, 1) back onto the stack, then we traverse p's right subtree. When, later, we pop off that same node back off the stack, the 1 tells us that we've already traversed both subtrees, so we free the node and keep popping. The pattern follows as we continue back up the tree. That sounds pretty complicated, so let's work through an example to help clarify. Consider this binary search tree: 4 / \ 2 5 ^ 1 3 Abstractly speaking, we start with 4 as p and an empty stack. First, we work our way down the left-child pointers, pushing onto the stack as we go. We push (4, 0), then (2, 0), then (1, 0), and then p is NULL and we've fallen off the bottom of the tree. 
We pop the top item off the stack into (p, state), getting (1, 0). Noticing that we have 0 for state, we push (1, 1) on the stack and traverse 1's right subtree, but it is empty so there is nothing to do. We pop again and notice that state is 1, meaning that we've fully traversed 1's subtrees, so we free node 1. We pop again, getting 2 for p and 0 for state. Because state is 0, we push (2, 1) and traverse 2's right subtree, which means that we push (3, 0). We traverse 3's null right subtree (again, it is empty so there is nothing to do), pushing and popping (3, 1), then free node 3, then move back up to 2. Because we've traversed 2's right subtree, state is 1 and p is 2, and we free node 2. You should be able to figure out how 4 and 5 get freed. A straightforward implementation of this approach looks like this: 86. = void bst_destroy (struct bst_table *tree, bst_item_func *destroy) { struct bst_node *stack[BST_MAX_HEIGHT]; unsigned char state[BST_MAX_HEIGHT]; int height = 0; struct bst_node *p; assert (tree != NULL); p = tree->bst_root; for (;;) { while (p != NULL) { if (height >= BST_MAX_HEIGHT) { fprintf (stderr, "tree too deep\n"); exit (EXIT_FAILURE); } stack[height] = p; state[height] = 0; height++; p = p->bst_link[0]; } for (;;) { if (height == 0) { tree->bst_alloc->libavl_free (tree->bst_alloc, tree); return; } height--; p = stack[height]; if (state[height] == 0) { state[height++] = 1; p = p->bst_link[1]; break; } else { if (destroy != NULL && p->bst_data != NULL) destroy (p->bst_data, tree->bst_param); tree->bst_alloc->libavl_free (tree->bst_alloc, p); } } } } See also: [Knuth 1997], exercise 13 in section 2.3.1. 4.12 Balance ============ Sometimes binary trees can grow to become much taller than their optimum height. 
For example, the following binary tree was one of the tallest from a sample of 100 15-node trees built by inserting nodes in random order: 0 `-..__ 5 _' `---...___ 3 12 _' \ __..--' \ 1 4 7 13 \ / `-.__ \ 2 6 11 14 _.-' 8 `_ 10 / 9 The average number of comparisons required to find a random node in this tree is (1 + 2 + (3 * 2) + (4 * 4) + (5 * 4) + 6 + 7 + 8) / 15 = 4.4 comparisons. In contrast, the corresponding optimal binary tree, shown below, requires only (1 + (2 * 2) + (3 * 4) + (4 * 8))/15 = 3.3 comparisons, on average. Moreover, the optimal tree requires a maximum of 4, as opposed to 8, comparisons for any search: 7 _.' `-..__ 3 11 / \ _.' `_ 1 5 9 13 ^ ^ / \ / \ 0 2 4 6 8 10 12 14 Besides this inefficiency in time, trees that grow too tall can cause inefficiency in space, leading to an overflow of the stack in bst_t_next(), bst_copy(), or other functions. For both reasons, it is helpful to have a routine to rearrange a tree to its minimum possible height, that is, to "balance" the tree. The algorithm we will use for balancing proceeds in two stages. In the first stage, the binary tree is "flattened" into a pathological, linear binary tree, called a "vine." In the second stage, binary tree structure is restored by repeatedly "compressing" the vine into a minimal-height binary tree. Here's a top-level view of the balancing function: 87. = void bst_balance (struct bst_table *tree) { assert (tree != NULL); tree_to_vine (tree); vine_to_tree (tree); tree->bst_generation++; } This code is included in 29. 88. = /* Special BST functions. */ void bst_balance (struct bst_table *tree); This code is included in 24, 247, 372, and 486. See also: [Stout 1986], rebalance procedure. 4.12.1 From Tree to Vine ------------------------ The first stage of balancing converts a binary tree into a linear structure resembling a linked list, called a "vine". The vines we will create have the greatest value in the binary tree at the root and decrease descending to the left. 
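The target shape can be pinned down as a predicate. This sketch (illustrative names, simplified int-keyed nodes rather than libavl's types) checks that no node has a right child and that keys decrease descending to the left:

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *link[2]; int data; };

/* Returns nonzero if the tree rooted at root is a vine of the kind
   described above: every node's right child is empty, and each left
   child holds a smaller key than its parent. */
static int
is_left_vine (const struct node *root)
{
  const struct node *p;
  for (p = root; p != NULL; p = p->link[0])
    {
      if (p->link[1] != NULL)
        return 0;
      if (p->link[0] != NULL && p->link[0]->data >= p->data)
        return 0;
    }
  return 1;
}
```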
Any binary search tree that contains a particular set of values, no matter its shape, corresponds to the same vine of this type. For instance, all binary search trees of the integers 0...4 will be transformed into the following vine: 4 / 3 / 2 / 1 / 0 The method for transforming a tree into a vine of this type is similar to that used for destroying a tree by rotation (*note Destroying a BST by Rotation::). We step pointer p through the tree, starting at the root of the tree, maintaining pointer q as p's parent. (Because we're building a vine, p is always the left child of q.) At each step, we do one of two things: * If p has no right child, then this part of the tree is already the shape we want it to be. We step p and q down to the left and continue. * If p has a right child r, then we rotate left at p, performing the following transformation: | | q q _.' _' p r / \ => / \ a r p c ^ ^ b c a b where a, b, and c are arbitrary subtrees or empty trees. Node r then becomes the new p. If c is an empty tree, then, in the next step, we will continue down the tree. Otherwise, the right subtree of p is smaller (contains fewer nodes) than previously, so we're on the right track. This is all it takes: 89. = /* Converts tree into a vine. */ static void tree_to_vine (struct bst_table *tree) { struct bst_node *q, *p; q = (struct bst_node *) &tree->bst_root; p = tree->bst_root; while (p != NULL) if (p->bst_link[1] == NULL) { q = p; p = p->bst_link[0]; } else { struct bst_node *r = p->bst_link[1]; p->bst_link[1] = r->bst_link[0]; r->bst_link[0] = p; p = r; q->bst_link[0] = r; } } This code is included in 87, 511, and 679. See also: [Stout 1986], tree_to_vine procedure. 4.12.2 From Vine to Balanced Tree --------------------------------- Converting the vine, once we have it, into a balanced tree is the interesting and clever part of the balancing operation. However, at first it may be somewhat less than obvious how this is actually done. 
We will tackle the subject by presenting an example, then the generalized form. Suppose we have a vine, as above, with 2**n - 1 nodes for positive integer n. For the sake of example, take n = 4, corresponding to a tree with 15 nodes. We convert this vine into a balanced tree by performing three successive "compression" operations. To perform the first compression, move down the vine, starting at the root. Conceptually assign each node a "color", alternating between red and black and starting with red at the root.(1) Then, take each red node, except the bottommost, and remove it from the vine, making it the child of its black former child node. After this transformation, we have something that looks a little more like a tree. Instead of a 15-node vine, we have a 7-node black vine with a 7-node red vine as its right children and a single red node as its left child. Graphically, this first compression step on a 15-node vine looks like this: 14 __..-' \ 15 12 15 _' __..-' \ 14 10 13 _' __..-' \ 13 8 11 _' => __..-' \ ... 6 9 _' 2 __..-' \ 4 7 _' 1 __..-' \ 2 5 _' \ 1 3 To perform the second compression, recolor all the red nodes to white, then change the color of alternate black nodes to red, starting at the root. As before, extract each red node, except the bottommost, and reattach it as the child of its black former child node. Attach each black node's right subtree as the left subtree of the corresponding red node. Thus, we have the following: 14 __.-' \ 12 15 12 __.-' \ 10 13 ___...---' `_ 8 14 _.-' \ 8 11 ___...--' `_ / \ 4 10 13 15 _.-' \ => 6 9 _.-' `_ / \ 2 6 9 11 _.-' \ 4 7 / \ / \ 1 3 5 7 _.-' \ 2 5 / \ 1 3 The third compression is the same as the first two. 
Nodes 12 and 4 are recolored red, then node 12 is removed and reattached as the right child of its black former child node 8, receiving node 8's right subtree as its left subtree: 12 8 ___...--' `_ 8 14 __.-' `--..__ / \ 4 12 __.-' `_ 13 15 4 10 => / \ _.-' `_ / \ 2 6 10 14 / \ 9 11 ^ ^ / \ / \ 2 6 1 3 5 7 9 11 13 15 ^ ^ 1 3 5 7 The result is a fully balanced tree. ---------- Footnotes ---------- (1) These colors are for the purpose of illustration only. They are not stored in the nodes and are not related to those used in a "red-black tree". 4.12.2.1 General Trees ...................... A compression is the repeated application of a right rotation, called in this context a "compression transformation", once for each black node, like so: | | R B _.-' \ / `_ B c => a R / \ / \ a b b c So far, all of the compressions we've performed have involved all 2**k - 1 nodes composing the "main vine." This works out well for an initial vine of exactly 2**n - 1 nodes. In this case, a total of n - 1 compressions are required, where for successive compressions k = n, n - 1, ..., 2. For trees that do not have exactly one fewer than a power of two nodes, we need to begin with a compression that does not involve all of the nodes in the vine. Suppose that our vine has m nodes, where 2**n - 1 < m < 2**(n+1) - 1 for some value of n. Then, by applying the compression transformation shown above m - (2**n - 1) times, we reduce the length of the main vine to exactly 2**n - 1 nodes. After that, we can treat the problem in the same way as the former case. The result is a balanced tree with n full levels of nodes, and a bottom level containing m - (2**n - 1) nodes and (2**(n + 1) - 1) - m vacancies. An example is indicated. Suppose that the vine contains m == 9 nodes numbered from 1 to 9. Then n == 3 since we have 2**3 - 1 = 7 < 9 < 15 = 2**4 - 1, and we must perform the compression transformation shown above 9 - (2**3 - 1) = 2 times initially, reducing the main vine's length to 7 nodes. 
Afterward, we treat the problem the same way as for a tree that started off with only 7 nodes, performing one compression with k == 3 and one with k == 2. The entire sequence, omitting the initial vine, looks like this: 8 6 4 _.-' \ _.-' \ 6 9 4 8 / `_ ^ 2 6 _' \ _.-' \ 7 9 => ^ / \ 5 7 => 2 5 1 3 5 8 ^ _' / \ 7 9 ... 1 3 _' 1 Now we have a general technique that can be applied to a vine of any size. 4.12.2.2 Implementation ....................... Implementing this algorithm is more or less straightforward. Let's start from an outline: 90. = /* Converts tree, which must be in the shape of a vine, into a balanced tree. */ static void vine_to_tree (struct bst_table *tree) { unsigned long vine; /* Number of nodes in main vine. */ unsigned long leaves; /* Nodes in incomplete bottom level, if any. */ int height; /* Height of produced balanced tree. */ } This code is included in 87. The first step is to calculate the number of compression transformations necessary to reduce the general case of a tree with m nodes to the special case of exactly 2**n - 1 nodes, i.e., calculate m - (2**n - 1), and store it in variable leaves. We are given only the value of m, as tree->bst_count. Rewriting the calculation as the equivalent m + 1 - 2**n, one way to calculate it is evident from looking at the pattern in binary: m n m + 1 2**n m + 1 - 2**n 1 1 2 = 00010 2 = 00010 0 = 00000 2 1 3 = 00011 2 = 00010 1 = 00001 3 2 4 = 00100 4 = 00100 0 = 00000 4 2 5 = 00101 4 = 00100 1 = 00001 5 2 6 = 00110 4 = 00100 2 = 00010 6 2 7 = 00111 4 = 00100 3 = 00011 7 3 8 = 01000 8 = 01000 0 = 00000 8 3 9 = 01001 8 = 01000 1 = 00001 9 3 10 = 01010 8 = 01000 2 = 00010 See the pattern? It's simply that m + 1 - 2**n is m + 1 with the leftmost 1-bit turned off. So, if we can find the leftmost 1-bit in m + 1, we can figure out the number of leaves. In turn, there are numerous ways to find the leftmost 1-bit in a number. 
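For instance, one way is to repeatedly clear the rightmost 1-bit until only a single bit remains. A standalone sketch of that idea (the function name is illustrative, not libavl code):

```c
#include <assert.h>

/* Returns the value of the leftmost (most significant) 1-bit of x,
   or 0 if x is 0.  Clearing the rightmost 1-bit with x & (x - 1)
   until one bit remains leaves exactly the leftmost bit. */
static unsigned long
leftmost_bit (unsigned long x)
{
  while ((x & (x - 1)) != 0)
    x &= x - 1;
  return x;
}
```

For a 9-node vine, leftmost_bit (9 + 1) is 8, so the number of leaves is 10 - 8 = 2, matching the worked example above.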
The one used here is based on the principle that, if x is a positive integer, then x & (x - 1) is x with its rightmost 1-bit turned off.

Here's the code that calculates the number of leaves and stores it in leaves:

91. =
leaves = tree->bst_count + 1;
for (;;)
  {
    unsigned long next = leaves & (leaves - 1);
    if (next == 0)
      break;
    leaves = next;
  }
leaves = tree->bst_count + 1 - leaves;

This code is included in 90, 285, 512, and 680.

Once we have the number of leaves, we perform a compression composed of leaves compression transformations. That's all it takes to reduce the general case to the 2**n - 1 special case. We'll write the compress() function itself later:

92. =
compress ((struct bst_node *) &tree->bst_root, leaves);

This code is included in 90, 512, and 680.

The heart of the function is the compression of the vine into the tree. Before each compression, vine contains the number of nodes in the main vine of the tree. The number of compression transformations necessary for the compression is vine / 2; e.g., when the main vine contains 7 nodes, 7 / 2 = 3 transformations are necessary. The number of nodes in the vine afterward is the same number (*note Transforming a Vine into a Balanced BST::).

At the same time, we keep track of the height of the balanced tree. The final tree always has height at least 1. Each compression step means that it is one level taller than that. If the tree needed general-to-special-case transformations, that is, leaves > 0, then it's one more than that.

93. =
vine = tree->bst_count - leaves;
height = 1 + (leaves > 0);
while (vine > 1)
  {
    compress ((struct bst_node *) &tree->bst_root, vine / 2);
    vine /= 2;
    height++;
  }

This code is included in 90, 512, and 680.

Finally, we make sure that the height of the tree is within range for what the functions that use stacks can handle.
Otherwise, we could end up with an infinite loop, with bst_t_next() (for example) calling bst_balance() repeatedly to balance the tree in order to reduce its height to the acceptable range.

94. =
if (height > BST_MAX_HEIGHT)
  {
    fprintf (stderr, "libavl: Tree too big (%lu nodes) to handle.",
             (unsigned long) tree->bst_count);
    exit (EXIT_FAILURE);
  }

This code is included in 90.

4.12.2.3 Implementing Compression
.................................

The final bit of code we need is that for performing a compression. The following code performs a compression consisting of count applications of the compression transformation starting at root:

95. =
/* Performs a compression transformation count times, starting at root. */
static void
compress (struct bst_node *root, unsigned long count)
{
  assert (root != NULL);

  while (count--)
    {
      struct bst_node *red = root->bst_link[0];
      struct bst_node *black = red->bst_link[0];

      root->bst_link[0] = black;
      red->bst_link[0] = black->bst_link[1];
      black->bst_link[1] = red;
      root = black;
    }
}

This code is included in 90 and 512.

The operation of compress() should be obvious, given the discussion earlier. *Note Balancing General Trees::, above, for a review.

See also: [Stout 1986], vine_to_tree procedure.

4.13 Aside: Joining BSTs
========================

Occasionally we may want to take a pair of BSTs and merge or "join" their contents, forming a single BST that contains all the items in the two original BSTs. It's easy to do this with a series of calls to bst_insert(), but we can optimize the process if we write a function exclusively for the purpose. We'll write such a function in this section.

There are two restrictions on the trees to be joined. First, the BSTs' contents must be disjoint. That is, no item in one may match any item in the other. Second, the BSTs must have compatible comparison functions. Typically, they are the same.
Speaking more precisely, if f() and g() are the comparison functions, p and q are nodes in either BST, and r and s are the BSTs' user-provided extra comparison parameters, then the expressions f(p, q, r), f(p, q, s), g(p, q, r), and g(p, q, s) must all have the same value for all possible choices of p and q. Suppose we're trying to join the trees shown below: 4,a 7,b _' \ / \ 1 6 3 8 \ \ ^ 2 9 0 5 Our first inclination is to try a "divide and conquer" approach by reducing the problem of joining a and b to the subproblems of joining a's left subtree with b's left subtree and joining a's right subtree with b's right subtree. Let us postulate for the moment that we are able to solve these subproblems and that the solutions that we come up with are the following: 3 8 _.-' \ ^ 0 5 6 9 \ 1 \ 2 To convert this partial solution into a full solution we must combine these two subtrees into a single tree and at the same time reintroduce the nodes a and b into the combined tree. It is easy enough to do this by making a (or b) the root of the combined tree with these two subtrees as its children, then inserting b (or a) into the combined tree. Unfortunately, in neither case will this actually work out properly for our example. The diagram below illustrates one possibility, the result of combining the two subtrees as the child of node 4, then inserting node 7 into the final tree. As you can see, nodes 4 and 5 are out of order:(1) 4 _' `._ 3 8 _.-' \ _' \ 0 5 6 9 ** \ \ 1 7 \ 2 Now let's step back and analyze why this attempt failed. It was essentially because, when we recombined the subtrees, a node in the combined tree's left subtree had a value larger than the root. If we trace it back to the original trees to be joined, we see that this was because node 5 in the left subtree of b is greater than a. (If we had chosen 7 as the root of the combined tree we would have found instead node 6 in the right subtree of b to be the culprit.) 
On the other hand, if every node in the left subtree of a had a value less than b's value, and every node in the right subtree of a had a value greater than b's value, there would be no problem. Hey, wait a second... we can force that condition. If we perform a root insertion (*note Root Insertion in a BST::) of b into subtree a, then we end up with one pair of subtrees whose node values are all less than 7 (the new and former left subtrees of node 7) and one pair of subtrees whose node values are all greater than 7 (the new and former right subtrees of node 7). Conceptually it looks like this, although in reality we would need to remove node 7 from the tree on the right as we inserted it into the tree on the left: 7 7 _' \ / \ 4 9 3 8 _' \ ^ 1 6 0 5 \ 2 We can then combine the two subtrees with values less than 7 with each other, and similarly for the ones with values greater than 7, using the same algorithm recursively, and safely set the resulting subtrees as the left and right subtrees of node 7, respectively. The final product is a correctly joined binary tree: 7 _.' \ 3 8 _.-' \ \ 0 5 9 \ ^ 1 4 6 \ 2 Of course, since we've defined a join recursively in terms of itself, there must be some maximum depth to the recursion, some simple case that can be defined without further recursion. This is easy: the join of an empty tree with another tree is the second tree. Implementation .............. It's easy to implement this algorithm recursively. The only nonobvious part of the code below is the treatment of node b. We want to insert node b, but not b's children, into the subtree rooted at a. However, we still need to keep track of b's children. So we temporarily save b's children as b0 and b1 and set its child pointers to NULL before the root insertion. This code makes use of root_insert() from . 96. = /* Joins a and b, which are subtrees of tree, and returns the resulting tree. 
*/
static struct bst_node *
join (struct bst_table *tree, struct bst_node *a, struct bst_node *b)
{
  if (b == NULL)
    return a;
  else if (a == NULL)
    return b;
  else
    {
      struct bst_node *b0 = b->bst_link[0];
      struct bst_node *b1 = b->bst_link[1];
      b->bst_link[0] = b->bst_link[1] = NULL;
      root_insert (tree, &a, b);

      a->bst_link[0] = join (tree, b0, a->bst_link[0]);
      a->bst_link[1] = join (tree, b1, a->bst_link[1]);
      return a;
    }
}

/* Joins a and b, which must be disjoint and have compatible
   comparison functions. b is destroyed in the process. */
void
bst_join (struct bst_table *a, struct bst_table *b)
{
  a->bst_root = join (a, a->bst_root, b->bst_root);
  a->bst_count += b->bst_count;
  free (b);
}

See also: [Sedgewick 1998], program 12.16.

Exercises:

1. Rewrite bst_join() to avoid use of recursion.

---------- Footnotes ----------

(1) The ** notation in the diagram emphasizes that this is a counterexample.

4.14 Testing
============

Whew! We're finally done with building functions for performing BST operations. But we haven't tested any of our code. Testing is an essential step in writing programs, because untested software cannot be assumed to work.

Let's build a test program that exercises all of the functions we wrote. We'll also do our best to make parts of it generic, so that we can reuse test code in later chapters when we want to test other BST-based structures.

The first step is to figure out how to test the code. One goal in testing is to exercise as much of the code as possible. Ideally, every line of code would be executed sometime during testing. Often, this is difficult or impossible, but the principle remains valid, with the goal modified to testing as much of the code as possible.

In applying this principle to the BST code, we have to consider why each line of code is executed. If we look at the code for most functions in , we can see that, if we execute them for any BST of reasonable size, most or all of their code will be tested.

This is encouraging.
It means that we can just construct some trees and try out the BST functions on them, check that the results make sense, and have a pretty good idea that they work. Moreover, if we build trees in a random fashion, and delete their nodes in a random order, and do it several times, we'll even have a good idea that the bst_probe() and bst_delete() cases have all come up and worked properly. (If you want to be sure, then you can insert printf() calls for each case to record when they trip.) This is not the same as a proof of correctness, but proofs of correctness can only be constructed by computer scientists with fancy degrees, not by mere clever programmers.

There are three notably missing pieces of code coverage if we just do the above. These are stack overflow handling, memory allocation failure handling, and traverser code to deal with modified trees. But we can mop up these extra problems with a little extra effort:(1)

   * Stack overflow handling can be tested by forcing the stack to overflow. Stack overflow can occur in many places, so for best effect we must test each possible spot. We will write special tests for these problems.

   * Memory allocation failure handling can be tested by simulating memory allocation failures. We will write a replacement memory allocator that "fails" after a specified number of calls. This allocator will also allow for memory leak detection.

   * Traverser code to deal with modified trees. This can be tested by modifying trees during traversal and making sure that the traversal functions still work as expected.

The testing code can be broken into the following groups of functions:

Testing and verification
     These functions actually try out the BST routines and do their best to make sure that their results are correct.

Test set generation
     Generates the order of node insertion and deletion, for use during testing.

Memory manager
     Handles memory issues, including memory leak detection and failure simulation.
User interaction
     Figures out what the user wants to test in this run.

Main program
     Glues everything else together by calling functions in the proper order.

Utilities
     Miscellaneous routines that don't fit comfortably into another category.

Most of the test code will also work nicely for testing other binary tree-based structures. This code is grouped into a single file, , which has the following structure:

97. =
#include #include #include #include #include #include #include #include "test.h"
avl 14> avl 28> avl 27> avl 61>
avl 15>
#endif /* avl.h */

143. =
#include #include #include #include #include "avl.h"

See also: [Knuth 1998b], sections 6.2.2 and 6.2.3; [Cormen 1990], section 13.4.

---------- Footnotes ----------

(1) This seems true intuitively, but there are some difficult mathematics in this area. For details, refer to [Knuth 1998b] theorem 6.2.2H, [Knuth 1977], and [Knuth 1978].

5.1 Balancing Rule
==================

A binary search tree is an AVL tree if the difference in height between the subtrees of each of its nodes is between -1 and +1. Said another way, a BST is an AVL tree if it is an empty tree or if its subtrees are AVL trees and the difference in height between its left and right subtree is between -1 and +1.

Here are some AVL trees:

     3         4        2
    / \       / \       ^
   2   4     2   5     1 3
  /          ^
 1          1 3

These binary search trees are not AVL trees:

     3      4
    /      /
   2      2
  /       ^
 1       1 3

In an AVL tree, the height of a node's right subtree minus the height of its left subtree is called the node's "balance factor". Balance factors are always -1, 0, or +1. They are often represented as one of the single characters -, 0, or +. Because of their importance in AVL trees, balance factors will often be shown in this chapter in AVL tree diagrams along with or instead of data items. In tree diagrams, balance factors are enclosed in angle brackets: `<->', `<0>', `<+>'.

Here are the AVL trees from above, but with balance factors shown in place of data values:

   <->           <->         <0>
   _' \      __..-' \        _' \
 <->  <0>    <0>     <0>   <0>  <0>
 _'          _' \
<0>        <0>   <0>

See also: [Knuth 1998b], section 6.2.3.

5.1.1 Analysis
--------------

How good is the AVL balancing rule? That is, before we consider how much complication it adds to BST operations, what does this balancing rule guarantee about performance? This is a simple question only if you're familiar with the mathematics behind computer science. For our purposes, it suffices to state the results:

An AVL tree with n nodes has height between log2 (n + 1) and 1.44 * log2 (n + 2) - 0.328.
An AVL tree with height h has between pow (2, (h + .328) / 1.44) - 2 and pow (2, h) - 1 nodes.

For comparison, an optimally balanced BST with n nodes has height ceil (log2 (n + 1)). An optimally balanced BST with height h has between pow (2, h - 1) and pow (2, h) - 1 nodes.(1)

The average speed of a search in a binary tree depends on the tree's height, so the results above are quite encouraging: an AVL tree will never be more than about 50% taller than the corresponding optimally balanced tree. Thus, we have a guarantee of good performance even in the worst case, and optimal performance in the best case.

See also: [Knuth 1998b], theorem 6.2.3A.

---------- Footnotes ----------

(1) Here log2 is the standard C base-2 logarithm function, pow is the exponentiation function, and ceil is the "ceiling" or "round up" function. For more information, consult a C reference guide, such as [Kernighan 1988].

5.2 Data Types
==============

We need to define data types for AVL trees like we did for BSTs. AVL tree nodes contain all the fields that a BST node does, plus a field recording its balance factor:

144. =
/* An AVL tree node. */
struct avl_node
  {
    struct avl_node *avl_link[2]; /* Subtrees. */
    void *avl_data;               /* Pointer to data. */
    signed char avl_balance;      /* Balance factor. */
  };

This code is included in 142.

We're using avl_ as the prefix for all AVL-related identifiers. The other data structures for AVL trees are the same as for BSTs.

5.3 Operations
==============

Now we'll implement for AVL trees all the operations that we did for BSTs. Here's the outline. Creation and search of AVL trees is exactly like that for plain BSTs, and the generic table functions for insertion convenience, assertion, and memory allocation are still relevant, so we just reuse the code. Of the remaining functions, we will write new implementations of the insertion and deletion functions and revise the traversal and copy functions.

145. =
avl 30> avl 31>
avl 592> avl 84> avl 6>
avl 594> This code is included in 143. 5.4 Insertion ============= The insertion function for unbalanced BSTs does not maintain the AVL balancing rule, so we have to write a new insertion function. But before we get into the nitty-gritty details, let's talk in generalities. This is time well spent because we will be able to apply many of the same insights to AVL deletion and insertion and deletion in red-black trees. Conceptually, there are two stages to any insertion or deletion operation in a balanced tree. The first stage may lead to violation of the tree's balancing rule. If so, we fix it in the second stage. The insertion or deletion itself is done in the first stage, in much the same way as in an unbalanced BST, and we may also do a bit of additional bookkeeping work, such as updating balance factors in an AVL tree, or swapping node "colors" in red-black trees. If the first stage of the operation does not lead to a violation of the tree's balancing rule, nothing further needs to be done. But if it does, the second stage rearranges nodes and modifies their attributes to restore the tree's balance. This process is said to "rebalance" the tree. The kinds of rebalancing that might be necessary depend on the way the operation is performed and the tree's balancing rule. A well-chosen balancing rule helps to minimize the necessity for rebalancing. When rebalancing does become necessary in an AVL or red-black tree, its effects are limited to the nodes along or near the direct path from the inserted or deleted node up to the root of the tree. Usually, only one or two of these nodes are affected, but, at most, one simple manipulation is performed at each of the nodes along this path. This property ensures that balanced tree operations are efficient (see Exercise 1 for details). That's enough theory for now. Let's return to discussing the details of AVL insertion. There are four steps in libavl's implementation of AVL insertion: 1. 
*Search* for the location to insert the new item.

  2. *Insert* the item as a new leaf.

  3. *Update* balance factors in the tree that were changed by the insertion.

  4. *Rebalance* the tree, if necessary.

Steps 1 and 2 are the same as for insertion into a BST. Step 3 performs the additional bookkeeping alluded to above in the general description of balanced tree operations. Finally, step 4 rebalances the tree, if necessary, to restore the AVL balancing rule.

The following sections will cover all the details of AVL insertion. For now, here's an outline of avl_probe():

146. =
void **
avl_probe (struct avl_table *tree, void *item)
{
  assert (tree != NULL && item != NULL);
}

This code is included in 145.

147. =
struct avl_node *y, *z; /* Top node to update balance factor, and parent. */
struct avl_node *p, *q; /* Iterator, and parent. */
struct avl_node *n;     /* Newly inserted node. */
struct avl_node *w;     /* New root of rebalanced subtree. */
int dir;                /* Direction to descend. */

unsigned char da[AVL_MAX_HEIGHT]; /* Cached comparison results. */
int k = 0;                        /* Number of cached results. */

This code is included in 146, 301, and 419.

See also: [Knuth 1998b], algorithm 6.2.3A.

Exercises:

*1. When rebalancing manipulations are performed on the chain of nodes from the inserted or deleted node to the root, no manipulation takes more than a fixed amount of time. In other words, individual manipulations do not involve any kind of iteration or loop. What can you conclude about the speed of an individual insertion or deletion in a large balanced tree, compared to the best-case speed of an operation for unbalanced BSTs?

5.4.1 Step 1: Search
--------------------

The search step is an extended version of the corresponding code for BST insertion in . The earlier code had only two variables to maintain: the current node p and the direction to descend from p. The AVL code does this, but it maintains some other variables, too.
During each iteration of the for loop, p is the node we are examining, q is p's parent, y is the most recently examined node with nonzero balance factor, z is y's parent, and elements 0...k - 1 of array da[] record each direction descended, starting from z, in order to arrive at p. The purposes for many of these variables are surely uncertain right now, but they will become clear later.

148. =
z = (struct avl_node *) &tree->avl_root;
y = tree->avl_root;
dir = 0;
for (q = z, p = y; p != NULL; q = p, p = p->avl_link[dir])
  {
    int cmp = tree->avl_compare (item, p->avl_data, tree->avl_param);
    if (cmp == 0)
      return &p->avl_data;

    if (p->avl_balance != 0)
      z = q, y = p, k = 0;
    da[k++] = dir = cmp > 0;
  }

This code is included in 146.

5.4.2 Step 2: Insert
--------------------

Following the search loop, q is the last non-null node examined, so it is the parent of the node to be inserted. The code below creates and initializes a new node as a child of q on side dir, and stores a pointer to it into n. Compare this code for insertion to that within .

149. =
n = q->avl_link[dir] =
  tree->avl_alloc->libavl_malloc (tree->avl_alloc, sizeof *n);
if (n == NULL)
  return NULL;

tree->avl_count++;
n->avl_data = item;
n->avl_link[0] = n->avl_link[1] = NULL;
n->avl_balance = 0;
if (y == NULL)
  return &n->avl_data;

This code is included in 146.

Exercises:

1. How can y be NULL? Why is this special-cased?

5.4.3 Step 3: Update Balance Factors
------------------------------------

When we add a new node n to an AVL tree, the balance factor of n's parent must change, because the new node increases the height of one of the parent's subtrees. The balance factor of n's parent's parent may need to change, too, depending on the parent's balance factor, and in fact the change can propagate all the way up the tree to its root.

At each stage of updating balance factors, we are in a similar situation. First, we are examining a particular node p that is one of n's direct ancestors.
The first time around, p is n's parent, the next time, if necessary, p is n's grandparent, and so on. Second, the height of one of p's subtrees has increased, and which one can be determined using da[].

In general, if the height of p's left subtree increases, p's balance factor decreases. On the other hand, if the right subtree's height increases, p's balance factor increases. If we account for the three possible starting balance factors and the two possible sides, there are six possibilities. The three of these corresponding to an increase in one subtree's height are symmetric with the others that go along with an increase in the other subtree's height. We treat these three cases below.

Case 1: p has balance factor 0
..............................

If p had balance factor 0, its new balance factor is - or +, depending on the side of the root to which the node was added. After that, the change in height propagates up the tree to p's parent (unless p is the tree's root) because the height of the subtree rooted at p's parent has also increased.

The example below shows a new node n inserted as the left child of a node with balance factor 0. On the far left is the original tree before insertion; in the middle left is the tree after insertion but before any balance factors are adjusted; in the middle right is the tree after the first adjustment, with p as n's parent; on the far right is the tree after the second adjustment, with p as n's grandparent. Only in the trees on the far left and far right are all of the balance factors correct.

<0> <0> p _' \ _' \ <-> <0> <0> <0> p <0> _' \ _' \ => _' => <-> => <-> <0> <0> <0> n _' _' <0> n n <0> <0>

Case 2: p's shorter subtree has increased in height
...................................................

If the new node was added to p's shorter subtree, then the subtree has become more balanced and its balance factor becomes 0. If p started out with balance factor +, this means the new node is in p's left subtree.
If p had a - balance factor, this means the new node is in the right subtree. Since tree p has the same height as it did before, the change does not propagate up the tree any farther, and we are done. Here's an example that shows pre-insertion and post-balance factor updating views:

<0> <0> __..-' `._ __..-' \ <+> p <+> <+> => \ <0> \ \ <0> _' \ <0> <0> n <0> <0>

Case 3: p's taller subtree has increased in height
..................................................

If the new node was added on the taller side of a subtree with nonzero balance factor, the balance factor becomes +2 or -2. This is a problem, because balance factors in AVL trees must be between -1 and +1. We have to rebalance the tree in this case. We will cover rebalancing later. For now, take it on faith that rebalancing does not increase the height of subtree p as a whole, so there is no need to propagate changes any farther up the tree.

Here's an example of an insertion that leads to rebalancing. On the left is the tree before insertion; in the middle is the tree after insertion and updating balance factors; on the right is the tree after rebalancing. The -2 balance factor is shown as two minus signs (--). The rebalanced tree is the same height as the original tree before insertion.

<--> <-> _' <0> _' <-> _' \ <0> => _' => n <0> n <0> <0>

As another demonstration that the height of a rebalanced subtree does not change after insertion, here's a similar example that has one more layer of nodes. The trees below follow the same pattern as the ones above, but the rebalanced subtree has a parent. Even though the tree's root has the wrong balance factor in the middle diagram, it turns out to be correct after rebalancing.

<-> <-> _.' \ <-> _' \ <--> <0> __..-' \ <-> <0> _' <0> <0> _' => <-> => _' \ <0> _' n <0> n <0> <0>

Implementation
..............
Looking at the rules above, we can see that only in case 1, where p's balance factor is 0, do changes to balance factors continue to propagate upward in the tree. So we can start from n's parent and move upward in the tree, handling case 1 each time, until we hit a nonzero balance factor, handle case 2 or case 3 at that node, and we're done (except for possible rebalancing afterward).

Wait a second--there is no efficient way to move upward in a binary search tree!(1)

Fortunately, there is another approach we can use. Remember the extra code we put into ? This code kept track of the last node we'd passed through that had a nonzero balance factor as y. We can use y to move downward, instead of upward, through the nodes whose balance factors are to be updated. Node y itself is the topmost node to be updated; when we arrive at node n, we know we're done. We also kept track of the directions we moved downward in da[].

Suppose that we've got a node p whose balance factor is to be updated and a direction d that we moved from it. We know that if we moved down to the left (d == 0) then the balance factor must be decreased, and that if we moved down to the right (d == 1) then the balance factor must be increased.

Now we have enough knowledge to write the code to update balance factors. The results are almost embarrassingly short:

150. =
for (p = y, k = 0; p != n; p = p->avl_link[da[k]], k++)
  if (da[k] == 0)
    p->avl_balance--;
  else
    p->avl_balance++;

This code is included in 146, 301, and 419.

Now p points to the new node as a consequence of the loop's exit condition. Variable p will not be modified again in this function, so it is used in the function's final return statement to take the address of the new node's avl_data member (see above).

Exercises:

1. Can case 3 be applied to the parent of the newly inserted node?

2. For each of the AVL trees below, add a new node with a value smaller than any already in the tree and update the balance factors of the existing nodes.
For each balance factor that changes, indicate the numbered case above that applies. Which of the trees require rebalancing after the insertion?

<0> <+> __..-' `._ _' `._ <-> <+> <-> <0> <0> _' \ _' _' \ <0> <0> <0> <0> <0>

3. Earlier versions of libavl used chars, not unsigned chars, to cache the results of comparisons, as the elements of da[] are used here. At some warning levels, this caused the GNU C compiler to emit the warning "array subscript has type `char'" when it encountered expressions like q->avl_link[da[k]]. Explain why this can be a useful warning message.

4. If our AVL trees won't ever have a height greater than 32, then we can portably use the bits in a single unsigned long to compactly store what the entire da[] array does. Write a new version of step 3 to use this form, along with any necessary modifications to other steps and avl_probe()'s local variables.

---------- Footnotes ----------

(1) We could make a list of the nodes as we move down the tree and reuse it on the way back up. We'll do that for deletion, but there's a simpler way for insertion, so keep reading.

5.4.4 Step 4: Rebalance
-----------------------

We've covered steps 1 through 3 so far. Step 4, rebalancing, is somewhat complicated, but it's the key to the entire insertion procedure. It is also similar to, but simpler than, other rebalancing procedures we'll see later. As a result, we're going to discuss it in detail. Follow along carefully and it should all make sense.

Before proceeding, let's briefly review the circumstances under which we need to rebalance. Looking back a few sections, we see that there is only one case where this is required: case 3, when the new node is added in the taller subtree of a node with nonzero balance factor. Case 3 is the case where y has a -2 or +2 balance factor after insertion. For now, we'll just consider the -2 case, because we can write code for the +2 case later in a mechanical way by applying the principle of symmetry.
In accordance with this idea, step 4 branches into three cases immediately, one for each rebalancing case and a third that just returns from the function if no rebalancing is necessary:

151. =
if (y->avl_balance == -2)
  {
  }
else if (y->avl_balance == +2)
  {
  }
else
  return &n->avl_data;

See also 153 and 154.

This code is included in 146.

We will call y's left child x. The new node is somewhere in the subtrees of x. There are now only two cases of interest, distinguished on whether x has a + or - balance factor. These cases are almost entirely separate:

152. =
struct avl_node *x = y->avl_link[0];
if (x->avl_balance == -1)
  {
  }
else
  {
  }

This code is included in 151 and 162.

In either case, w receives the root of the rebalanced subtree, which is used to update the parent's pointer to the subtree root (recall that z is the parent of y):

153. +=
z->avl_link[y != z->avl_link[0]] = w;

Finally, we increment the generation number, because the tree's structure has changed. Then we're done and we return to the caller:

154. +=
tree->avl_generation++;
return &n->avl_data;

Case 1: x has - balance factor
..............................

For a - balance factor, we just rotate right at y. Then the entire process, including insertion and rebalancing, looks like this:

       |                 |                 |
       y                 y                 x
      <->              <-->               <0>
  _.-' \            _.-' \               / `_
  x     c     =>    x     c     =>     a*    y
 <0>               <->                      <0>
 / \               / \                      / \
a   b             a*  b                    b   c

This figure also introduces a new graphical convention. The change in subtree a between the first and second diagrams is indicated by an asterisk (*).(1) In this case, it indicates that the new node was inserted in subtree a.

The code here is similar to rotate_right() in the solution to Exercise 4.3-2:

155. =
w = x;
y->avl_link[0] = x->avl_link[1];
x->avl_link[1] = y;
x->avl_balance = y->avl_balance = 0;

This code is included in 152 and 529.

Case 2: x has + balance factor
..............................

This case is just a little more intricate. First, let x's right child be w.
Either w is the new node, or the new node is in one of w's subtrees. To restore balance, we rotate left at x, then rotate right at y (this is a kind of "double rotation"). The process, starting just after the insertion and showing the results of each rotation, looks like this:

| | y y | <--> <--> w __.-' \ _' \ <0> x d w d => / \ <+> => / \ x y / \ x c ^ ^ a w ^ a b c d ^ a b b c

At the beginning, the figure does not show the balance factor of w. This is because there are three possibilities:

*Case 2.1:* w has balance factor 0.
     This means that w is the new node. a, b, c, and d have height 0. After the rotations, x and y have balance factor 0.

*Case 2.2:* w has balance factor -.
     a, b, and d have height h > 0, and c has height h - 1.

*Case 2.3:* w has balance factor +.
     a, c, and d have height h > 0, and b has height h - 1.

156. =
assert (x->avl_balance == +1);
w = x->avl_link[1];
x->avl_link[1] = w->avl_link[0];
w->avl_link[0] = x;
y->avl_link[0] = w->avl_link[1];
w->avl_link[1] = y;
if (w->avl_balance == -1)
  x->avl_balance = 0, y->avl_balance = +1;
else if (w->avl_balance == 0)
  x->avl_balance = y->avl_balance = 0;
else /* w->avl_balance == +1 */
  x->avl_balance = -1, y->avl_balance = 0;
w->avl_balance = 0;

This code is included in 152, 177, 307, 427, and 530.

Exercises:

1. Why can't the new node be x rather than a node in x's subtrees?

2. Why can't x have a 0 balance factor?

3. For each subcase of case 2, draw a figure like that given for generic case 2 that shows the specific balance factors at each step.

4. Explain the expression z->avl_link[y != z->avl_link[0]] = w in the second part of above. Why would it be a bad idea to substitute the apparent equivalent z->avl_link[y == z->avl_link[1]] = w?

5. Suppose that we wish to make a copy of an AVL tree, preserving the original tree's shape, by inserting nodes from the original tree into a new tree, using avl_probe().
Will inserting the original tree's nodes in level order (see the answer to Exercise 4.7-4) have the desired effect? ---------- Footnotes ---------- (1) A "prime" (') is traditional, but primes are easy to overlook. 5.4.5 Symmetric Case -------------------- Finally, we need to write code for the case that we chose not to discuss earlier, where the insertion occurs in the right subtree of y. All we have to do is invert the signs of balance factors and switch avl_link[] indexes between 0 and 1. The results are this: 157. = struct avl_node *x = y->avl_link[1]; if (x->avl_balance == +1) { } else { } This code is included in 151 and 162. 158. = w = x; y->avl_link[1] = x->avl_link[0]; x->avl_link[0] = y; x->avl_balance = y->avl_balance = 0; This code is included in 157 and 532. 159. = assert (x->avl_balance == -1); w = x->avl_link[0]; x->avl_link[0] = w->avl_link[1]; w->avl_link[1] = x; y->avl_link[1] = w->avl_link[0]; w->avl_link[0] = y; if (w->avl_balance == +1) x->avl_balance = 0, y->avl_balance = -1; else if (w->avl_balance == 0) x->avl_balance = y->avl_balance = 0; else /* w->avl_balance == -1 */ x->avl_balance = +1, y->avl_balance = 0; w->avl_balance = 0; This code is included in 157, 174, 310, 428, and 533. 5.4.6 Example ------------- We're done with writing the code. Now, for clarification, let's run through an example designed to need lots of rebalancing along the way. Suppose that, starting with an empty AVL tree, we insert 6, 5, and 4, in that order. The first two insertions do not require rebalancing. After inserting 4, rebalancing is needed because the balance factor of node 6 would otherwise become -2, an invalid value. This is case 1, so we perform a right rotation on 6. 
So far, the AVL tree has evolved this way:

  6   =>    6    =>     6     =>     5
           /           /            / \
          5           5            4   6
                     /
                    4

If we now insert 1, then 3, a double rotation (case 2.1) becomes necessary, in which we rotate left at 1, then rotate right at 4:

     5              5              5
    / \            / \            / \
   4   6    =>    4   6    =>    3   6
  /              /              / \
 1              3              1   4
  \            /
   3          1

Inserting a final item, 2, requires a right rotation (case 1) on 5:

       5
      / \                 3
     3   6              /   \
    / \         =>     1     5
   1   4                \   / \
    \                    2 4   6
     2

5.4.7 Aside: Recursive Insertion
--------------------------------

In previous sections we first looked at recursive approaches because they were simpler and more elegant than iterative solutions. As it happens, the reverse is true for insertion into an AVL tree. But just for completeness, we will now design a recursive implementation of avl_probe().

Our first task in such a design is to figure out what arguments and return value the recursive core of the insertion function will have. We'll begin by considering AVL insertion in the abstract. Our existing function avl_probe() works by first moving down the tree, from the root to a leaf, then back up the tree, from leaf to root, as necessary to adjust balance factors or rebalance. In the existing iterative version, down and up movement are implemented by pushing nodes onto and popping them off from a stack. In a recursive version, moving down the tree becomes a recursive call, and moving up the tree becomes a function return.

While descending the tree, the important pieces of information are the tree itself (to allow for comparisons to be made), the current node, and the data item we're inserting. The latter two items need to be modifiable by the function, the former because the tree rooted at the node may need to be rearranged during a rebalance, and the latter because of avl_probe()'s return value. While ascending the tree, we'll still have access to all of this information, but, to allow for adjustment of balance factors and rebalancing, we also need to know whether the subtree visited in a nested call became taller.
We can use the function's return value for this. Finally, we know to stop moving down and start moving up when we find a null pointer in the tree, which is the place for the new node to be inserted. This suggests itself naturally as the test used to stop the recursion. Here is an outline of a recursive insertion function directly corresponding to these considerations: 160. = static int probe (struct avl_table *tree, struct avl_node **p, void ***data) { struct avl_node *y; /* The current node; shorthand for *p. */ assert (tree != NULL && p != NULL && data != NULL); y = *p; if (y == NULL) { } else /* y != NULL */ { } } See also 163. Parameter p is declared as a double pointer (struct avl_node **) and data as a triple pointer (void ***). In both cases, this is because C passes arguments by value, so that a function modifying one of its arguments produces no change in the value seen in the caller. As a result, to allow a function to modify a scalar, a pointer to it must be passed as an argument; to modify a pointer, a double pointer must be passed; to modify a double pointer, a triple pointer must be passed. This can result in difficult-to-understand code, so it is often advisable to copy the dereferenced argument into a local variable for read-only use, as *p is copied into y here. When the insertion point is found, a new node is created and a pointer to it stored into *p. Because the insertion causes the subtree to increase in height (from 0 to 1), a value of 1 is then returned: 161. = y = *p = tree->avl_alloc->libavl_malloc (tree->avl_alloc, sizeof *y); if (y == NULL) { *data = NULL; return 0; } y->avl_data = **data; *data = &y->avl_data; y->avl_link[0] = y->avl_link[1] = NULL; y->avl_balance = 0; tree->avl_count++; tree->avl_generation++; return 1; This code is included in 160. When we're not at the insertion point, we move down, then back up. 
Whether to move down to the left or the right depends on the value of the item to insert relative to the value in the current node y. Moving down is the domain of the recursive call to probe(). If the recursive call doesn't increase the height of a subtree of y, then there's nothing further to do, so we return immediately. Otherwise, on the way back up, it is necessary to at least adjust y's balance factor, and possibly to rebalance as well. If only adjustment of the balance factor is necessary, it is done and the return value is based on whether this subtree has changed height in the process. Rebalancing is accomplished using the same code used in iterative insertion. A rebalanced subtree has the same height as before insertion, so the value returned is 0. The details are in the code itself: 162. = struct avl_node *w; /* New root of this subtree; replaces *p. */ int cmp; cmp = tree->avl_compare (**data, y->avl_data, tree->avl_param); if (cmp < 0) { if (probe (tree, &y->avl_link[0], data) == 0) return 0; if (y->avl_balance == +1) { y->avl_balance = 0; return 0; } else if (y->avl_balance == 0) { y->avl_balance = -1; return 1; } else { } } else if (cmp > 0) { struct avl_node *r; /* Right child of y, for rebalancing. */ if (probe (tree, &y->avl_link[1], data) == 0) return 0; if (y->avl_balance == -1) { y->avl_balance = 0; return 0; } else if (y->avl_balance == 0) { y->avl_balance = +1; return 1; } else { } } else /* cmp == 0 */ { *data = &y->avl_data; return 0; } *p = w; return 0; This code is included in 160. Finally, we need a wrapper function to start the recursion off correctly and deal with passing back the results: 163. += /* Inserts item into tree and returns a pointer to item's address. If a duplicate item is found in the tree, returns a pointer to the duplicate without inserting item. Returns NULL in case of memory allocation failure. 
*/ void ** avl_probe (struct avl_table *tree, void *item) { void **ret = &item; probe (tree, &tree->avl_root, &ret); return ret; } 5.5 Deletion ============ Deletion in an AVL tree is remarkably similar to insertion. The steps that we go through are analogous: 1. *Search* for the item to delete. 2. *Delete* the item. 3. *Update* balance factors. 4. *Rebalance* the tree, if necessary. 5. *Finish up* and return. The main difference is that, after a deletion, we may have to rebalance at more than one level of a tree, starting from the bottom up. This is a bit painful, because it means that we have to keep track of all the nodes that we visit as we search for the node to delete, so that we can then move back up the tree. The actual updating of balance factors and rebalancing steps are similar to those used for insertion. The following sections cover deletion from an AVL tree in detail. Before we get started, here's an outline of the function. 164. = void * avl_delete (struct avl_table *tree, const void *item) { /* Stack of nodes. */ struct avl_node *pa[AVL_MAX_HEIGHT]; /* Nodes. */ unsigned char da[AVL_MAX_HEIGHT]; /* avl_link[] indexes. */ int k; /* Stack pointer. */ struct avl_node *p; /* Traverses tree to find node to delete. */ int cmp; /* Result of comparison between item and p. */ assert (tree != NULL && item != NULL); } This code is included in 145. See also: [Knuth 1998b], pages 473-474; [Pfaff 1998]. 5.5.1 Step 1: Search -------------------- The only difference between this search and an ordinary search in a BST is that we have to keep track of the nodes above the one we're deleting. We do this by pushing them onto the stack defined above. Each iteration through the loop compares item to p's data, pushes the node onto the stack, moves down in the proper direction. The first trip through the loop is something of an exception: we hard-code the comparison result to -1 so that the pseudo-root node is always the topmost node on the stack. 
When we find a match, we set item to the actual data item found, so that we can return it later.

165. =
k = 0;
p = (struct avl_node *) &tree->avl_root;
for (cmp = -1; cmp != 0;
     cmp = tree->avl_compare (item, p->avl_data, tree->avl_param))
  {
    int dir = cmp > 0;

    pa[k] = p;
    da[k++] = dir;

    p = p->avl_link[dir];
    if (p == NULL)
      return NULL;
  }
item = p->avl_data;

This code is included in 164 and 220.

5.5.2 Step 2: Delete
--------------------

At this point, we've identified p as the node to delete. The node on the top of the stack, pa[k - 1], is p's parent node. There are the same three cases we saw in deletion from an ordinary BST (*note Deleting from a BST::), with the addition of code to copy balance factors and update the stack. The code for selecting cases is the same as for BSTs:

166. =
if (p->avl_link[1] == NULL)
  { <case 1: 168> }
else
  {
    struct avl_node *r = p->avl_link[1];
    if (r->avl_link[0] == NULL)
      { <case 2: 169> }
    else
      { <case 3: 170> }
  }

See also 167. This code is included in 164.

Regardless of the case, we are in the same situation after the deletion: node p has been removed from the tree and the stack contains k nodes at which rebalancing may be necessary. Later code may change p to point elsewhere, so we free the node immediately. A pointer to the item data has already been saved in item (*note avldelsaveitem::):

167. +=
tree->avl_alloc->libavl_free (tree->avl_alloc, p);

Case 1: p has no right child
............................

If p has no right child, then we can replace it with its left child, the same as for BSTs (*note bstdelcase1::).

168. =
pa[k - 1]->avl_link[da[k - 1]] = p->avl_link[0];

This code is included in 166.

Case 2: p's right child has no left child
.........................................

If p has a right child r, which in turn has no left child, then we replace p by r, attaching p's left child to r, as we would in an unbalanced BST (*note bstdelcase2::). In addition, r acquires p's balance factor, and r must be added to the stack of nodes above the deleted node.

169.
= r->avl_link[0] = p->avl_link[0]; r->avl_balance = p->avl_balance; pa[k - 1]->avl_link[da[k - 1]] = r; da[k] = 1; pa[k++] = r; This code is included in 166. Case 3: p's right child has a left child ........................................ If p's right child has a left child, then this is the third and most complicated case. On the other hand, as a modification from the third case in an ordinary BST deletion (*note bstdelcase3::), it is rather simple. We're deleting the inorder successor of p, so we push the nodes above it onto the stack. The only trickery is that we do not know in advance the node that will replace p, so we reserve a spot on the stack for it (da[j]) and fill it in later: 170. = struct avl_node *s; int j = k++; for (;;) { da[k] = 0; pa[k++] = r; s = r->avl_link[0]; if (s->avl_link[0] == NULL) break; r = s; } s->avl_link[0] = p->avl_link[0]; r->avl_link[0] = s->avl_link[1]; s->avl_link[1] = p->avl_link[1]; s->avl_balance = p->avl_balance; pa[j - 1]->avl_link[da[j - 1]] = s; da[j] = 1; pa[j] = s; This code is included in 166. Exercises: 1. Write an alternate version of that moves data instead of pointers, as in Exercise 4.8-2. 2. Why is it important that the item data was saved earlier? (Why couldn't we save it just before freeing the node?) 5.5.3 Step 3: Update Balance Factors ------------------------------------ When we updated balance factors in insertion, we were lucky enough to know in advance which ones we'd need to update. Moreover, we never needed to rebalance at more than one level in the tree for any one insertion. These two factors conspired in our favor to let us do all the updating of balance factors at once from the top down. Everything is not quite so simple in AVL deletion. We don't have any easy way to figure out during the search process which balance factors will need to be updated, and for that matter we may need to perform rebalancing at multiple levels. Our strategy must change. 
This new approach is not fundamentally different from the previous one. We work from the bottom up instead of from the top down. We potentially look at each of the nodes along the direct path from the deleted node to the tree's root, starting at pa[k - 1], the parent of the deleted node. For each of these nodes, we adjust its balance factor and possibly perform rebalancing. After that, if we're lucky, this was enough to restore the tree's balancing rule, and we are finished with updating balance factors and rebalancing. Otherwise, we look at the next node, repeating the process.

Here is the loop itself with the details abstracted out:

171. =
assert (k > 0);
while (--k > 0)
  {
    struct avl_node *y = pa[k];

    if (da[k] == 0)
      { <left-side case: 172> }
    else
      { <right-side case: 177> }
  }

This code is included in 164.

This works because of a loop invariant: each time we look at a node in order to update its balance factor, the situation is the same. In particular, if we're looking at a node pa[k], then we know that it's because the height of its subtree on side da[k] decreased, so that the balance factor of node pa[k] needs to be updated. The rebalancing operations we choose reflect this invariant: there are sometimes multiple valid ways to rebalance at a given node and propagate the results up the tree, but only one way to do this while maintaining the invariant. (This is especially true in red-black trees, for which we will develop code for two possible invariants under insertion and deletion.)

Updating the balance factor of a node after deletion from its left side and from its right side are symmetric, so we'll discuss only the left-side case here and construct the code for the right-side case later. Suppose we have a node y whose left subtree has decreased in height. In general, this increases its balance factor, because the balance factor of a node is the height of its right subtree minus the height of its left subtree. More specifically, there are three cases, treated individually below.
Case 1: y has - balance factor
..............................

If y started with a - balance factor, then its left subtree was taller than its right subtree. Its left subtree has decreased in height, so the two subtrees must now be the same height and we set y's balance factor to 0. This is between -1 and +1, so there is no need to rebalance at y. However, tree y has itself decreased in height, which means that the balance factors above y must be updated as well, so we continue to the next iteration of the loop.

The diagram below may help in visualization. On the left is shown the original configuration of a subtree, where subtree a has height h and subtree b has height h - 1. The height of a nonempty binary tree is one plus the larger of its subtrees' heights, so tree y has height h + 1. The diagram on the right shows the situation after a node has been deleted from a, reducing that subtree's height. The new height of tree y is (h - 1) + 1 == h.

     |                |
     y                y
    <->              <0>
   /   \     =>     /   \
  a     b          a*    b
  h    h-1        h-1   h-1

Case 2: y has 0 balance factor
..............................

If y started with a 0 balance factor, and its left subtree decreased in height, then the result is that its right subtree is now taller than its left subtree, so the new balance factor is +. However, the overall height of binary tree y has not changed, so no balance factors above y need to be changed, and we are done; hence, we break to exit the loop.

Here's the corresponding diagram, similar to the one for the previous case. The height of tree y on both sides of the diagram is h + 1, since y's taller subtree in both cases has height h.

     |                |
     y                y
    <0>              <+>
   /   \     =>     /   \
  a     b          a*    b
  h     h         h-1    h

Case 3: y has + balance factor
..............................

Otherwise, y started with a + balance factor, so the decrease in height of its left subtree, which was already shorter than its right subtree, causes a violation of the AVL constraint with a +2 balance factor. We need to rebalance.
After rebalancing, we may or may not have to rebalance further up the tree. Here's a diagram of the kind of deletion that forces rebalancing:

     |                |
     y                y
    <+>             <++>
   /   \     =>     /   \
  a     b          a*    b
 h-1    h         h-2    h

Implementation
..............

The implementation is straightforward:

172. =
y->avl_balance++;
if (y->avl_balance == +1)
  break;
else if (y->avl_balance == +2)
  { <see 173> }

This code is included in 171.

5.5.4 Step 4: Rebalance
-----------------------

Now we have to write code to rebalance when it becomes necessary. We'll use rotations to do this, as before. Again, we'll distinguish the cases on the basis of x's balance factor, where x is y's right child:

173. =
struct avl_node *x = y->avl_link[1];
if (x->avl_balance == -1)
  { <case 1: 174> }
else
  { <case 2: 175> }

This code is included in 172.

Case 1: x has - balance factor
..............................

If x has a - balance factor, we handle rebalancing in a manner analogous to case 2 for insertion. In fact, we reuse the code. We rotate right at x, then left at y. w is the left child of x. The two rotations look like this:

     |                |                  |
     y                y                  w
   <++>             <++>               <0>
   /   \            /   \             /    \
  a     x    =>    a     w     =>    y      x
       <->              / \         / \    / \
      /   \            b   x       a   b  c   d
     w     d              / \
    /  \                 c   d
   b    c

174. =
struct avl_node *w;
<rotate right at x, then left at y; reuses 159>
pa[k - 1]->avl_link[da[k - 1]] = w;

This code is included in 173.

Case 2: x has + or 0 balance factor
...................................

When x's balance factor is +, the needed treatment is analogous to Case 1 for insertion. We simply rotate left at y and update the pointer to the subtree, then update balance factors. The deletion and rebalancing then look like this:

    |               |                 |
    y               y                 x
   <+>            <++>               <0>
  /   \           /   \             /   \
 a     x    =>   a*    x     =>    y     c
      <+>             <+>        <0>
     /   \           /   \      /    \
    b     c         b     c    a*     b

When x's balance factor is 0, we perform the same rotation, but the height of the overall subtree does not change, so we're done and can exit the loop with break. Here's what the deletion and rebalancing look like for this subcase:

    |               |                 |
    y               y                 x
   <+>            <++>               <->
  /   \           /   \             /   \
 a     x    =>   a*    x     =>    y     c
      <0>             <0>        <+>
     /   \           /   \      /    \
    b     c         b     c    a*     b

175.
=
y->avl_link[1] = x->avl_link[0];
x->avl_link[0] = y;
pa[k - 1]->avl_link[da[k - 1]] = x;
if (x->avl_balance == 0)
  {
    x->avl_balance = -1;
    y->avl_balance = +1;
    break;
  }
else
  x->avl_balance = y->avl_balance = 0;

This code is included in 173.

Exercises:

1. In 173, we refer to fields in x, the right child of y, without checking that y has a non-null right child. Why can we assume that node x is non-null?

2. Describe the shape of a tree that might require rebalancing at every level above a particular node. Give an example.

5.5.5 Step 5: Finish Up
-----------------------

176. =
tree->avl_count--;
tree->avl_generation++;
return (void *) item;

This code is included in 164.

5.5.6 Symmetric Case
--------------------

Here's the code for the symmetric case, where the deleted node was in the right subtree of its parent.

177. =
y->avl_balance--;
if (y->avl_balance == -1)
  break;
else if (y->avl_balance == -2)
  {
    struct avl_node *x = y->avl_link[0];
    if (x->avl_balance == +1)
      {
        struct avl_node *w;
        <rotate left at x, then right at y; reuses 156>
        pa[k - 1]->avl_link[da[k - 1]] = w;
      }
    else
      {
        y->avl_link[0] = x->avl_link[1];
        x->avl_link[1] = y;
        pa[k - 1]->avl_link[da[k - 1]] = x;
        if (x->avl_balance == 0)
          {
            x->avl_balance = +1;
            y->avl_balance = -1;
            break;
          }
        else
          x->avl_balance = y->avl_balance = 0;
      }
  }

This code is included in 171.

5.6 Traversal
=============

Traversal is largely unchanged from BSTs. However, we can be confident that the tree won't easily exceed the maximum stack height, because of the AVL balance condition, so we can omit checking for stack overflow.

178. = avl 62> avl 64> avl 69> avl 74> avl 75>

This code is included in 145 and 196.

We do need to make a new implementation of the insertion traverser initializer. Because insertion into an AVL tree is so complicated, we just write this as a wrapper to avl_probe(). There probably wouldn't be much of a speed improvement by inlining the code anyhow:

179.
= void * avl_t_insert (struct avl_traverser *trav, struct avl_table *tree, void *item) { void **p; assert (trav != NULL && tree != NULL && item != NULL); p = avl_probe (tree, item); if (p != NULL) { trav->avl_table = tree; trav->avl_node = ((struct avl_node *) ((char *) p - offsetof (struct avl_node, avl_data))); trav->avl_generation = tree->avl_generation - 1; return *p; } else { avl_t_init (trav, tree); return NULL; } } This code is included in 178. We will present the rest of the modified functions without further comment. 180. = void * avl_t_first (struct avl_traverser *trav, struct avl_table *tree) { struct avl_node *x; assert (tree != NULL && trav != NULL); trav->avl_table = tree; trav->avl_height = 0; trav->avl_generation = tree->avl_generation; x = tree->avl_root; if (x != NULL) while (x->avl_link[0] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[0]; } trav->avl_node = x; return x != NULL ? x->avl_data : NULL; } This code is included in 178. 181. = void * avl_t_last (struct avl_traverser *trav, struct avl_table *tree) { struct avl_node *x; assert (tree != NULL && trav != NULL); trav->avl_table = tree; trav->avl_height = 0; trav->avl_generation = tree->avl_generation; x = tree->avl_root; if (x != NULL) while (x->avl_link[1] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[1]; } trav->avl_node = x; return x != NULL ? x->avl_data : NULL; } This code is included in 178. 182. 
= void * avl_t_find (struct avl_traverser *trav, struct avl_table *tree, void *item) { struct avl_node *p, *q; assert (trav != NULL && tree != NULL && item != NULL); trav->avl_table = tree; trav->avl_height = 0; trav->avl_generation = tree->avl_generation; for (p = tree->avl_root; p != NULL; p = q) { int cmp = tree->avl_compare (item, p->avl_data, tree->avl_param); if (cmp < 0) q = p->avl_link[0]; else if (cmp > 0) q = p->avl_link[1]; else /* cmp == 0 */ { trav->avl_node = p; return p->avl_data; } assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = p; } trav->avl_height = 0; trav->avl_node = NULL; return NULL; } This code is included in 178. 183. = void * avl_t_next (struct avl_traverser *trav) { struct avl_node *x; assert (trav != NULL); if (trav->avl_generation != trav->avl_table->avl_generation) trav_refresh (trav); x = trav->avl_node; if (x == NULL) { return avl_t_first (trav, trav->avl_table); } else if (x->avl_link[1] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[1]; while (x->avl_link[0] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[0]; } } else { struct avl_node *y; do { if (trav->avl_height == 0) { trav->avl_node = NULL; return NULL; } y = x; x = trav->avl_stack[--trav->avl_height]; } while (y == x->avl_link[1]); } trav->avl_node = x; return x->avl_data; } This code is included in 178. 184. 
= void * avl_t_prev (struct avl_traverser *trav) { struct avl_node *x; assert (trav != NULL); if (trav->avl_generation != trav->avl_table->avl_generation) trav_refresh (trav); x = trav->avl_node; if (x == NULL) { return avl_t_last (trav, trav->avl_table); } else if (x->avl_link[0] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[0]; while (x->avl_link[1] != NULL) { assert (trav->avl_height < AVL_MAX_HEIGHT); trav->avl_stack[trav->avl_height++] = x; x = x->avl_link[1]; } } else { struct avl_node *y; do { if (trav->avl_height == 0) { trav->avl_node = NULL; return NULL; } y = x; x = trav->avl_stack[--trav->avl_height]; } while (y == x->avl_link[0]); } trav->avl_node = x; return x->avl_data; } This code is included in 178. Exercises: 1. Explain the meaning of this ugly expression, used in avl_t_insert(): (struct avl_node *) ((char *) p - offsetof (struct avl_node, avl_data)) 5.7 Copying =========== Copying an AVL tree is similar to copying a BST. The only important difference is that we have to copy the AVL balance factor between nodes as well as node data. We don't check our stack height here, either. 185. = avl 82> struct avl_table * avl_copy (const struct avl_table *org, avl_copy_func *copy, avl_item_func *destroy, struct libavl_allocator *allocator) { struct avl_node *stack[2 * (AVL_MAX_HEIGHT + 1)]; int height = 0; struct avl_table *new; const struct avl_node *x; struct avl_node *y; assert (org != NULL); new = avl_create (org->avl_compare, org->avl_param, allocator != NULL ? 
allocator : org->avl_alloc); if (new == NULL) return NULL; new->avl_count = org->avl_count; if (new->avl_count == 0) return new; x = (const struct avl_node *) &org->avl_root; y = (struct avl_node *) &new->avl_root; for (;;) { while (x->avl_link[0] != NULL) { assert (height < 2 * (AVL_MAX_HEIGHT + 1)); y->avl_link[0] = new->avl_alloc->libavl_malloc (new->avl_alloc, sizeof *y->avl_link[0]); if (y->avl_link[0] == NULL) { if (y != (struct avl_node *) &new->avl_root) { y->avl_data = NULL; y->avl_link[1] = NULL; } copy_error_recovery (stack, height, new, destroy); return NULL; } stack[height++] = (struct avl_node *) x; stack[height++] = y; x = x->avl_link[0]; y = y->avl_link[0]; } y->avl_link[0] = NULL; for (;;) { y->avl_balance = x->avl_balance; if (copy == NULL) y->avl_data = x->avl_data; else { y->avl_data = copy (x->avl_data, org->avl_param); if (y->avl_data == NULL) { y->avl_link[1] = NULL; copy_error_recovery (stack, height, new, destroy); return NULL; } } if (x->avl_link[1] != NULL) { y->avl_link[1] = new->avl_alloc->libavl_malloc (new->avl_alloc, sizeof *y->avl_link[1]); if (y->avl_link[1] == NULL) { copy_error_recovery (stack, height, new, destroy); return NULL; } x = x->avl_link[1]; y = y->avl_link[1]; break; } else y->avl_link[1] = NULL; if (height <= 2) return new; y = stack[--height]; x = stack[--height]; } } } This code is included in 145 and 196. 5.8 Testing =========== Our job isn't done until we can demonstrate that our code works. We'll do this with a test program built using the framework from the previous chapter (*note Testing BST Functions::). All we have to do is produce functions for AVL trees that correspond to each of those in . This just involves making small changes to the functions used there. They are presented below without additional comment. 186. = #include #include #include #include "avl.h" #include "test.h" avl 119> avl 104> avl 100> avl 122> 187. 
= static int compare_trees (struct avl_node *a, struct avl_node *b) { int okay; if (a == NULL || b == NULL) { assert (a == NULL && b == NULL); return 1; } if (*(int *) a->avl_data != *(int *) b->avl_data || ((a->avl_link[0] != NULL) != (b->avl_link[0] != NULL)) || ((a->avl_link[1] != NULL) != (b->avl_link[1] != NULL)) || a->avl_balance != b->avl_balance) { printf (" Copied nodes differ: a=%d (bal=%d) b=%d (bal=%d) a:", *(int *) a->avl_data, a->avl_balance, *(int *) b->avl_data, b->avl_balance); if (a->avl_link[0] != NULL) printf ("l"); if (a->avl_link[1] != NULL) printf ("r"); printf (" b:"); if (b->avl_link[0] != NULL) printf ("l"); if (b->avl_link[1] != NULL) printf ("r"); printf ("\n"); return 0; } okay = 1; if (a->avl_link[0] != NULL) okay &= compare_trees (a->avl_link[0], b->avl_link[0]); if (a->avl_link[1] != NULL) okay &= compare_trees (a->avl_link[1], b->avl_link[1]); return okay; } This code is included in 186. 188. = /* Examines the binary tree rooted at node. Zeroes *okay if an error occurs. Otherwise, does not modify *okay. Sets *count to the number of nodes in that tree, including node itself if node != NULL. Sets *height to the tree's height. All the nodes in the tree are verified to be at least min but no greater than max. */ static void recurse_verify_tree (struct avl_node *node, int *okay, size_t *count, int min, int max, int *height) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subheight[2]; /* Heights of subtrees. */ if (node == NULL) { *count = 0; *height = 0; return; } d = *(int *) node->avl_data; recurse_verify_tree (node->avl_link[0], okay, &subcount[0], min, d - 1, &subheight[0]); recurse_verify_tree (node->avl_link[1], okay, &subcount[1], d + 1, max, &subheight[1]); *count = 1 + subcount[0] + subcount[1]; *height = 1 + (subheight[0] > subheight[1] ? subheight[0] : subheight[1]); } This code is included in 186. 189. 
= if (subheight[1] - subheight[0] != node->avl_balance) { printf (" Balance factor of node %d is %d, but should be %d.\n", d, node->avl_balance, subheight[1] - subheight[0]); *okay = 0; } else if (node->avl_balance < -1 || node->avl_balance > +1) { printf (" Balance factor of node %d is %d.\n", d, node->avl_balance); *okay = 0; } This code is included in 188, 332, 451, and 550. 190. = static int verify_tree (struct avl_table *tree, int array[], size_t n) { int okay = 1; bst_count is correct; bst => avl 110> if (okay) { } if (okay) { avl 115> } if (okay) { avl 116> } if (okay) { avl 117> } if (okay) { avl 118> } return okay; } This code is included in 186, 330, 449, and 548. 191. = /* Recursively verify tree structure. */ size_t count; int height; recurse_verify_tree (tree->avl_root, &okay, &count, 0, INT_MAX, &height); This code is included in 190. 6 Red-Black Trees ***************** The last chapter saw us implementing a library for one particular type of balanced trees. Red-black trees were invented by R. Bayer and studied at length by L. J. Guibas and R. Sedgewick. This chapter will implement a library for another kind of balanced tree, called a "red-black tree". For brevity, we'll often abbreviate "red-black" to RB. Insertion and deletion operations on red-black trees are more complex to describe or to code than the same operations on AVL trees. Red-black trees also have a higher maximum height than AVL trees for a given number of nodes. The primary advantage of red-black trees is that, in AVL trees, deleting one node from a tree containing n nodes may require log2 (n) rotations, but deletion in a red-black tree never requires more than three rotations. The functions for RB trees in this chapter are analogous to those that we developed for use with AVL trees in the previous chapter. Here's an outline of the red-black code: 192. = #ifndef RB_H #define RB_H 1 #include
rb 14> rb 27> rb 61>
rb 15> #endif /* rb.h */ 193. = #include #include #include #include #include "rb.h" See also: [Cormen 1990], chapter 14, "Chapter notes." 6.1 Balancing Rule ================== To most clearly express the red-black balancing rule, we need a few new vocabulary terms. First, define a "non-branching node" as a node that does not "branch" the binary tree in different directions, i.e., a node with exactly zero or one children. Second, a "path" is a list of one or more nodes in a binary tree where every node in the list (except the last node, of course) is "adjacent" in the tree to the one after it. Two nodes in a tree are considered to be adjacent for this purpose if one is the child of the other. Furthermore, a "simple path" is a path that does not contain any given node more than once. Finally, a node p is a "descendant" of a second node q if both p and q are the same node, or if p is located in one of the subtrees of q. With these definitions in mind, a red-black tree is a binary search tree in which every node has been labeled with a "color", either "red" or "black", with those colors distributed according to these two simple rules, which are called the "red-black balancing rules" and often referenced by number: 1. No red node has a red child. 2. Every simple path from a given node to one of its non-branching node descendants contains the same number of black nodes. Any binary search tree that conforms to these rules is a red-black tree. Additionally, all red-black trees in libavl share a simple additional property: their roots are black. This property is not essential, but it does slightly simplify insertion and deletion operations. To aid in digestion of all these definitions, here are some red-black trees that might be produced by libavl: 4 1 4 ___..--' \ _' `---...___ 1 5 __..-' `._ 0 6 3 6 _' `._ __..-' \ 0 3 _' \ _' \ 4 7 1 2 5 7 _' _' \ 2 3 5 In this book, black nodes are marked `b' and red nodes marked `r', as shown here. 
The three colored BSTs below are *not* red-black trees. The one on the left violates rule 1, because red node 2 is a child of red node 4. The one in the middle violates rule 2, because one path from the root has two black nodes (4-2-3) and the other paths from the root down to a non-branching node (4-2-1, 4-5, 4-5-6) have only one black node. The one on the right violates rule 2, because the path consisting of only node 1 has only one black node but path 1-2 has two black nodes.

4 4 1 __..-' \ __..-' \ 2 5 2 5 \ 2 _' \ _' \ \ 1 3 1 3 6

See also: [Cormen 1990], section 14.1; [Sedgewick 1998], definitions 13.3 and 13.4.

Exercises:

*1. A red-black tree contains only black nodes. Describe the tree's shape.

2. Suppose that a red-black tree's root is red. How can it be transformed into an equivalent red-black tree with a black root? Does a similar procedure work for changing an RB tree's root from black to red?

3. Suppose we have a perfectly balanced red-black tree with exactly pow (2, n) - 1 nodes and a black root. Is it possible there is another way to arrange colors in a tree of the same shape that obeys the red-black rules while keeping the root black? Is it possible if we drop the requirement that the tree be balanced?

6.1.1 Analysis
--------------

As we were for AVL trees, we're interested in what the red-black balancing rule guarantees about performance. Again, we'll simply state the results:

A red-black tree with n nodes has height at least log2 (n + 1) but no more than 2 * log2 (n + 1). A red-black tree with height h has at least pow (2, h / 2) - 1 nodes but no more than pow (2, h) - 1.

For comparison, an optimally balanced BST with n nodes has height ceil (log2 (n + 1)). An optimally balanced BST with height h has between pow (2, h - 1) and pow (2, h) - 1 nodes.

See also: [Cormen 1990], lemma 14.1; [Sedgewick 1998], property 13.8.

6.2 Data Types
==============

Red-black trees need their own data structure.
Otherwise, there's no appropriate place to store each node's color. Here's a C type for a color and a structure for an RB node, using the rb_ prefix that we've adopted for this module:

194. =
/* Color of a red-black node. */
enum rb_color
  {
    RB_BLACK,                     /* Black. */
    RB_RED                        /* Red. */
  };

/* A red-black tree node. */
struct rb_node
  {
    struct rb_node *rb_link[2];   /* Subtrees. */
    void *rb_data;                /* Pointer to data. */
    unsigned char rb_color;       /* Color. */
  };

This code is included in 192.

The maximum height for an RB tree is higher than for an AVL tree, because in the worst case RB trees store nodes less efficiently:

195. =
/* Maximum RB height. */
#ifndef RB_MAX_HEIGHT
#define RB_MAX_HEIGHT 48
#endif

This code is included in 192, 333, 452, and 551.

The other data structures for RB trees are the same as for BSTs or AVL trees.

Exercises:

1. Why is it okay to have both an enumeration type and a structure member named rb_color?

6.3 Operations
==============

Now we'll implement for RB trees all the operations that we did for BSTs. Everything but the insertion and deletion functions can be borrowed either from our BST or AVL tree functions. The copy function is an unusual case: we need it to copy colors, instead of balance factors, between nodes, so we replace avl_balance by rb_color in the macro expansion.

196. = rb 30> rb 31>
rb 592> rb 178> rb; avl_balance => rb_color 185> rb 84> rb 6>
rb 594> This code is included in 193. 6.4 Insertion ============= The steps for insertion into a red-black tree are similar to those for insertion into an AVL tree: 1. *Search* for the location to insert the new item. 2. *Insert* the item. 3. *Rebalance* the tree as necessary to satisfy the red-black balance condition. Red-black node colors don't need to be updated in the way that AVL balance factors do, so there is no separate step for updating colors. Here's the outline of the function, expressed as code: 197. = void ** rb_probe (struct rb_table *tree, void *item) { return &n->rb_data; } This code is included in 196. 198. = struct rb_node *pa[RB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[RB_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct rb_node *p; /* Traverses tree looking for insertion point. */ struct rb_node *n; /* Newly inserted node. */ assert (tree != NULL && item != NULL); This code is included in 33, 197, and 210. See also: [Cormen 1990], section 14.3; [Sedgewick 1998], program 13.6. 6.4.1 Step 1: Search -------------------- The first thing to do is to search for the point to insert the new node. In a manner similar to AVL deletion, we keep a stack of nodes tracking the path followed to arrive at the insertion point, so that later we can move up the tree in rebalancing. 199. = pa[0] = (struct rb_node *) &tree->rb_root; da[0] = 0; k = 1; for (p = tree->rb_root; p != NULL; p = p->rb_link[da[k - 1]]) { int cmp = tree->rb_compare (item, p->rb_data, tree->rb_param); if (cmp == 0) return &p->rb_data; pa[k] = p; da[k++] = cmp > 0; } This code is included in 197 and 210. 6.4.2 Step 2: Insert -------------------- 200. = n = pa[k - 1]->rb_link[da[k - 1]] = tree->rb_alloc->libavl_malloc (tree->rb_alloc, sizeof *n); if (n == NULL) return NULL; n->rb_data = item; n->rb_link[0] = n->rb_link[1] = NULL; n->rb_color = RB_RED; tree->rb_count++; tree->rb_generation++; This code is included in 197 and 210. Exercises: 1. 
Why are new nodes colored red, instead of black? 6.4.3 Step 3: Rebalance ----------------------- The code in step 2 that inserts a node always colors the new node red. This means that rule 2 is always satisfied afterward (as long as it was satisfied before we began). On the other hand, rule 1 is broken if the newly inserted node's parent was red. In this latter case we must rearrange or recolor the BST so that it is again an RB tree. This is what rebalancing does. At each step in rebalancing, we have the invariant that we just colored a node p red and that p's parent, the node at the top of the stack, is also red, a rule 1 violation. The rebalancing step may either clear up the violation entirely, without introducing any other violations, in which case we are done, or, if that is not possible, it reduces the violation to a similar violation of rule 1 higher up in the tree, in which case we go around again. In no case can we allow the rebalancing step to introduce a rule 2 violation, because the loop is not prepared to repair that kind of problem: it does not fit the invariant. If we allowed rule 2 violations to be introduced, we would have to write additional code to recognize and repair those violations. This extra code would be a waste of space, because we can do just fine without it. (Incidentally, there is nothing magical about using a rule 1 violation as our rebalancing invariant. We could use a rule 2 violation as our invariant instead, and in fact we will later write an alternate implementation that does that, in order to show how it would be done.) Here is the rebalancing loop. At each rebalancing step, it checks that we have a rule 1 violation by checking the color of pa[k - 1], the node on the top of the stack, and then divides into two cases, one for rebalancing an insertion in pa[k - 1]'s left subtree and a symmetric case for the right subtree. After rebalancing it recolors the root of the tree black just in case the loop changed it to red: 201. 
= while (k >= 3 && pa[k - 1]->rb_color == RB_RED) { if (da[k - 2] == 0) { } else { } } tree->rb_root->rb_color = RB_BLACK; This code is included in 197. Now for the real work. We'll look at the left-side insertion case only. Consider the node that was just recolored red in the last rebalancing step, or if this is the first rebalancing step, the newly inserted node n. The code does not name this node, but we will refer to it here as q. We know that q is red and, because the loop condition was met, that its parent pa[k - 1] is red. Therefore, due to rule 1, q's grandparent, pa[k - 2], must be black. After this, we have three cases, distinguished by the following code: 202. = struct rb_node *y = pa[k - 2]->rb_link[1]; if (y != NULL && y->rb_color == RB_RED) { } else { struct rb_node *x; if (da[k - 1] == 0) y = pa[k - 1]; else { } break; } This code is included in 201. Case 1: q's uncle is red ........................ If q has an "uncle" y, that is, its grandparent has a child on the side opposite q, and y is red, then rearranging the tree's color scheme is all that needs to be done, like this: | | pa[k-2] pa[k-2] ___..--' `_ ___..--' `_ pa[k-1] y pa[k-1] y => _.-' \ / \ _.-' \ / \ q c d e q c d e / \ / \ a b a b Notice the neat way that this preserves the "black-height", or the number of black nodes in any simple path from a given node down to a node with 0 or 1 children, at pa[k - 2]. This ensures that rule 2 is not violated. After the transformation, if node pa[k - 2]'s parent exists and is red, then we have to move up the tree and try again. The while loop condition takes care of this test, so adjusting the stack is all that has to be done in this code segment: 203. = pa[k - 1]->rb_color = y->rb_color = RB_BLACK; pa[k - 2]->rb_color = RB_RED; k -= 2; This code is included in 202, 207, 342, and 462. Case 2: q is the left child of pa[k - 1] ........................................ 
If q is the left child of its parent, then we can perform a right rotation at q's grandparent, which we'll call x, and recolor a couple of nodes. Then we're all done, because we've satisfied both rules. Here's a diagram of what's happened:

| pa[k-2],x | y ___...---' \ pa[k-1],y d _.-' `_ => q x _.-' \ q c / \ / \ a b c d / \ a b

There's no need to progress farther up the tree, because neither the subtree's black-height nor its root's color have changed. Here's the corresponding code. Bear in mind that the break statement is in the enclosing code segment:

204. =
x = pa[k - 2];
x->rb_color = RB_RED;
y->rb_color = RB_BLACK;

x->rb_link[0] = y->rb_link[1];
y->rb_link[1] = x;
pa[k - 3]->rb_link[da[k - 3]] = y;

This code is included in 202, 343, and 464.

Case 3: q is the right child of pa[k - 1]
.........................................

The final case, where q is a right child, is really just a small variant of case 2, so we can handle it by transforming it into case 2 and sharing code for that case. To transform case 3 into case 2, we just rotate left at q's parent, which is then treated as q. The diagram below shows the transformation from case 3 into case 2. After this transformation, x is relabeled q and y's parent is labeled x, then rebalancing continues as shown in the diagram for case 2, with the exception that pa[k - 1] is not updated to correspond to y as shown in that diagram. That's okay because variable y has already been set to point to the proper node.

| | pa[k-2] _.-' \ _____.....----' \ y d pa[k-1],x d => _.-' \ / `_ x c a q,y / \ / \ a b b c

205. =
x = pa[k - 1];
y = x->rb_link[1];
x->rb_link[1] = y->rb_link[0];
y->rb_link[0] = x;
pa[k - 2]->rb_link[0] = y;

This code is included in 202, 344, and 466.

Exercises:

1. Why is the test k >= 3 on the while loop valid? (Hint: read the code for step 4, below, first.)

2. Consider rebalancing case 2 and, in particular, what would happen if the root of subtree d were red.
Wouldn't the rebalancing transformation recolor x as red and thus cause a rule 1 violation? 6.4.4 Symmetric Case -------------------- 206. = struct rb_node *y = pa[k - 2]->rb_link[0]; if (y != NULL && y->rb_color == RB_RED) { } else { struct rb_node *x; if (da[k - 1] == 1) y = pa[k - 1]; else { } break; } This code is included in 201. 207. = This code is included in 206, 346, and 463. 208. = x = pa[k - 2]; x->rb_color = RB_RED; y->rb_color = RB_BLACK; x->rb_link[1] = y->rb_link[0]; y->rb_link[0] = x; pa[k - 3]->rb_link[da[k - 3]] = y; This code is included in 206, 347, and 465. 209. = x = pa[k - 1]; y = x->rb_link[0]; x->rb_link[0] = y->rb_link[1]; y->rb_link[1] = x; pa[k - 2]->rb_link[1] = y; This code is included in 206, 348, and 467. 6.4.5 Aside: Initial Black Insertion ------------------------------------ The traditional algorithm for insertion in an RB tree colors new nodes red. This is a good choice, because it often means that no rebalancing is necessary, but it is not the only possible choice. This section implements an alternate algorithm for insertion into an RB tree that colors new nodes black. The outline is the same as for initial-red insertion. We change the newly inserted node from red to black and replace the rebalancing algorithm: 210. = void ** rb_probe (struct rb_table *tree, void *item) { RB_BLACK 200> return &n->rb_data; } The remaining task is to devise the rebalancing algorithm. Rebalancing is always necessary, unless the tree was empty before insertion, because insertion of a black node into a nonempty tree always violates rule 2. Thus, our invariant is that we have a rule 2 violation to fix. More specifically, the invariant, as implemented, is that at the top of each trip through the loop, stack pa[] contains the chain of ancestors of a node that is the black root of a subtree whose black-height is 1 more than it should be. We give that node the name q. 
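The claim that inserting a black node into a nonempty tree always violates rule 2 can be made concrete. The sketch below uses a hypothetical minimal node type and checker, not libavl's types: hanging a black leaf off a lone black root breaks rule 2, while the same leaf colored red obeys both rules, which is exactly the easy recoloring fix available when the new node's parent is black:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal node type and rule checker, for illustration. */
enum color { BLACK, RED };
struct node { struct node *link[2]; enum color color; };

/* Black-height of NODE's tree, or -1 on a rule 1 or rule 2 violation
   (rule 2 checked as equal subtree black-heights). */
static int
check_rb (const struct node *node)
{
  int bh[2], i;
  if (node == NULL)
    return 0;
  for (i = 0; i < 2; i++)
    {
      if (node->color == RED
          && node->link[i] != NULL && node->link[i]->color == RED)
        return -1;              /* Rule 1 broken. */
      bh[i] = check_rb (node->link[i]);
      if (bh[i] < 0)
        return -1;
    }
  return bh[0] == bh[1] ? bh[0] + (node->color == BLACK) : -1;
}

/* Attaches CHILD, colored C, as the left child of a lone black ROOT
   and reports the checker's verdict on the result. */
static int
insert_under_black_root (enum color c, struct node *root, struct node *child)
{
  child->link[0] = child->link[1] = NULL;
  child->color = c;
  root->link[0] = child;
  root->link[1] = NULL;
  root->color = BLACK;
  return check_rb (root);
}
```

The black insertion fails because the path through the new leaf has one more black node than the path ending at the root's empty right subtree.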
There is one easy rebalancing special case: if node q has a black parent, we can just recolor q as red, and we're done. Here's the loop:

211. =
while (k >= 2)
  {
    struct rb_node *q = pa[k - 1]->rb_link[da[k - 1]];

    if (pa[k - 1]->rb_color == RB_BLACK)
      {
        q->rb_color = RB_RED;
        break;
      }

    if (da[k - 2] == 0)
      { }
    else
      { }
  }

This code is included in 210.

Consider rebalancing where insertion was on the left side of q's grandparent. We know that q is black and its parent pa[k - 1] is red. Then, we can divide rebalancing into three cases, described below in detail. (For additional insight, compare these cases to the corresponding cases for initial-red insertion.)

212. =
struct rb_node *y = pa[k - 2]->rb_link[1];
if (y != NULL && y->rb_color == RB_RED)
  { }
else
  {
    struct rb_node *x;
    if (da[k - 1] == 0)
      y = pa[k - 1];
    else
      { }
  }

This code is included in 211.

Case 1: q's uncle is red
........................

If q has a red "uncle" y, then we recolor q red and pa[k - 1] and y black. This fixes the immediate problem, making the black-height of q equal to its sibling's, but increases the black-height of pa[k - 2], so we must repeat the rebalancing process farther up the tree:

| | pa[k-2] pa[k-2] ___..--' `_ ___..--' `_ pa[k-1] y pa[k-1] y => _.-' \ / \ _.-' \ / \ q c d e q c d e / \ / \ a b a b

213. =
pa[k - 1]->rb_color = y->rb_color = RB_BLACK;
q->rb_color = RB_RED;
k -= 2;

This code is included in 212 and 217.

Case 2: q is the left child of pa[k - 1]
........................................

If q is a left child, then call q's parent y and its grandparent x, rotate right at x, and recolor q, y, and x. The effect is that the black-heights of all three subtrees are the same as before q was inserted, so we're done, and break out of the loop.

| pa[k-2],x | y ___...---' \ pa[k-1],y d _.-' `_ => q x _.-' \ q c / \ / \ a b c d / \ a b

214.
= x = pa[k - 2]; x->rb_color = q->rb_color = RB_RED; y->rb_color = RB_BLACK; x->rb_link[0] = y->rb_link[1]; y->rb_link[1] = x; pa[k - 3]->rb_link[da[k - 3]] = y; break; This code is included in 212. Case 3: q is the right child of pa[k - 1] ......................................... If q is a right child, then we rotate left at its parent, which we here call x. The result is in the form for application of case 2, so after the rotation, we relabel the nodes to be consistent with that case. | | pa[k-2] pa[k-2] _____.....----' \ _.-' \ pa[k-1],x d q d => / `_ _.-' \ a q x c / \ / \ b c a b 215. = x = pa[k - 1]; y = pa[k - 2]->rb_link[0] = q; x->rb_link[1] = y->rb_link[0]; q = y->rb_link[0] = x; This code is included in 212. 6.4.5.1 Symmetric Case ...................... 216. = struct rb_node *y = pa[k - 2]->rb_link[0]; if (y != NULL && y->rb_color == RB_RED) { } else { struct rb_node *x; if (da[k - 1] == 1) y = pa[k - 1]; else { } } This code is included in 211. 217. = This code is included in 216. 218. = x = pa[k - 2]; x->rb_color = q->rb_color = RB_RED; y->rb_color = RB_BLACK; x->rb_link[1] = y->rb_link[0]; y->rb_link[0] = x; pa[k - 3]->rb_link[da[k - 3]] = y; break; This code is included in 216. 219. = x = pa[k - 1]; y = pa[k - 2]->rb_link[1] = q; x->rb_link[0] = y->rb_link[1]; q = y->rb_link[1] = x; This code is included in 216. 6.5 Deletion ============ The process of deletion from an RB tree is very much in line with the other algorithms for balanced trees that we've looked at already. This time, the steps are: 1. *Search* for the item to delete. 2. *Delete* the item. 3. *Rebalance* the tree as necessary. 4. *Finish up* and return. Here's an outline of the code. Step 1 is already done for us, because we can reuse the search code from AVL deletion. 220. = void * rb_delete (struct rb_table *tree, const void *item) { struct rb_node *pa[RB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[RB_MAX_HEIGHT]; /* Directions moved from stack nodes. 
*/ int k; /* Stack height. */ struct rb_node *p; /* The node to delete, or a node part way to it. */ int cmp; /* Result of comparison between item and p. */ assert (tree != NULL && item != NULL); rb 165> }

This code is included in 196.

See also: [Cormen 1990], section 14.4.

6.5.1 Step 2: Delete
--------------------

At this point, p is the node to be deleted and the stack contains all of the nodes on the simple path from the tree's root down to p. The immediate task is to delete p. We break deletion down into the familiar three cases (*note Deleting from a BST::), but before we dive into the code, let's think about the situation.

In red-black insertion, we were able to limit the kinds of violation that could occur to rule 1 or rule 2, at our option, by choosing the new node's color. No such luxury is available in deletion, because colors have already been assigned to all of the nodes. In fact, a naive approach to deletion can lead to multiple violations in widely separated parts of a tree. Consider the effects of deletion of node 3 from the following red-black tree, supposing that it is a subtree of some larger tree:

| 3 __..-' `----...._____ 1 8 _' \ __..-' \ 0 2 6 9 __..-' \ 4 7 \ 5

If we performed this deletion in a literal-minded fashion, we would end up with the tree below, with the following violations: rule 1, between node 6 and its child; rule 2, at node 6; rule 2, at node 4, because the black-height of the subtree as a whole has increased (ignoring the rule 2 violation at node 6); and rule 1, at node 4, only if the subtree's parent is red. The result is difficult to rebalance in general because we have two problem areas to deal with, one at node 4, one at node 6.

| 4 __..-' `---...___ 1 8 _' \ __..-' \ 0 2 6 9 _' \ 5 7

Fortunately, we can make things easier for ourselves. We can eliminate the problem area at node 4 simply by recoloring it red, the same color as the node it replaced, as shown below.
Then all we have to deal with are the violations at node 6: | 4 __..-' `---...___ 1 8 _' \ __..-' \ 0 2 6 9 _' \ 5 7 This idea holds in general. So, when we replace the deleted node p by a different node q, we set q's color to p's. Besides that, as an implementation detail, we need to keep track of the color of the node that was moved, i.e., node q's former color. We do this here by saving it temporarily in p. In other words, when we replace one node by another during deletion, we swap their colors. Now we know enough to begin the implementation. While reading this code, keep in mind that after deletion, regardless of the case selected, the stack contains a list of the nodes where rebalancing may be required, and da[k - 1] indicates the side of pa[k - 1] from which a node of color p->rb_color was deleted. Here's an outline of the meat of the code: 221. = if (p->rb_link[1] == NULL) { } else { enum rb_color t; struct rb_node *r = p->rb_link[1]; if (r->rb_link[0] == NULL) { } else { } } This code is included in 220. Case 1: p has no right child ............................ In case 1, p has no right child, so we replace it by its left subtree. As a very special case, there is no need to do any swapping of colors (see Exercise 1 for details). 222. = pa[k - 1]->rb_link[da[k - 1]] = p->rb_link[0]; This code is included in 221. Case 2: p's right child has no left child ......................................... In this case, p has a right child r, which in turn has no left child. We replace p by r, swap the colors of nodes p and r, and add r to the stack because we may need to rebalance there. Here's a pre- and post-deletion diagram that shows one possible set of colors out of the possibilities. Node p is shown detached after deletion to make it clear that the colors are swapped: | | p r p / \ / \ a r => a x \ x 223. 
= r->rb_link[0] = p->rb_link[0]; t = r->rb_color; r->rb_color = p->rb_color; p->rb_color = t; pa[k - 1]->rb_link[da[k - 1]] = r; da[k] = 1; pa[k++] = r; This code is included in 221. Case 3: p's right child has a left child ........................................ In this case, p's right child has a left child. The code here is basically the same as for AVL deletion. We replace p by its inorder successor s and swap their node colors. Because they may require rebalancing, we also add all of the nodes we visit to the stack. Here's a diagram to clear up matters, again with arbitrary colors: | | p s / `----....____ / `---...___ a a _' \ _' \ p ... c ... c _.-' => _.-' r r _.-' \ / \ s b x b \ x 224. = struct rb_node *s; int j = k++; for (;;) { da[k] = 0; pa[k++] = r; s = r->rb_link[0]; if (s->rb_link[0] == NULL) break; r = s; } da[j] = 1; pa[j] = s; pa[j - 1]->rb_link[da[j - 1]] = s; s->rb_link[0] = p->rb_link[0]; r->rb_link[0] = s->rb_link[1]; s->rb_link[1] = p->rb_link[1]; t = s->rb_color; s->rb_color = p->rb_color; p->rb_color = t; This code is included in 221. Exercises: *1. In case 1, why is it unnecessary to swap the colors of p and the node that replaces it? 2. Rewrite to replace the deleted node's rb_data by its successor, then delete the successor, instead of shuffling pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) 6.5.2 Step 3: Rebalance ----------------------- At this point, node p has been removed from tree and p->rb_color indicates the color of the node that was removed from the tree. Our first step is to handle one common special case: if we deleted a red node, no rebalancing is necessary, because deletion of a red node cannot violate either rule. Here is the code to avoid rebalancing in this special case: 225. = if (p->rb_color == RB_BLACK) { } This code is included in 220. On the other hand, if a black node was deleted, then we have more work to do. At the least, we have a violation of rule 2. 
If the deletion brought together two red nodes, as happened in the example in the previous section, there is also a violation of rule 1. We must now fix both of these problems by rebalancing.

This time, the rebalancing loop invariant is that the black-height of pa[k - 1]'s subtree on side da[k - 1] is 1 less than the black-height of its other subtree, a rule 2 violation. There may also be a rule 1 violation, in that pa[k - 1] and its child on side da[k - 1], which we will call x, are both red. (In the first iteration of the rebalancing loop, node x is the node labeled as such in the diagrams in the previous section.) If this is the case, then the fix for rule 2 is simple: just recolor x black. This increases the black-height and fixes any rule 1 violation as well. If we can do this, we're all done. Otherwise, we have more work to do.

Here's the rebalancing loop:

226. =
for (;;)
  {
    struct rb_node *x = pa[k - 1]->rb_link[da[k - 1]];
    if (x != NULL && x->rb_color == RB_RED)
      {
        x->rb_color = RB_BLACK;
        break;
      }

    if (k < 2)
      break;

    if (da[k - 1] == 0)
      { }
    else
      { }

    k--;
  }

This code is included in 225.

Now we'll take a detailed look at the rebalancing algorithm. As before, we'll only examine the case where the deleted node was in its parent's left subtree, that is, where da[k - 1] is 0. The other case is similar.

Recall that x is pa[k - 1]->rb_link[da[k - 1]] and that it may be a null pointer. In the left-side deletion case, x is pa[k - 1]'s left child. We now designate x's "sibling", the right child of pa[k - 1], as w. Jumping right in, here's an outline of the rebalancing code:

227. =
struct rb_node *w = pa[k - 1]->rb_link[1];
if (w->rb_color == RB_RED)
  { }

if ((w->rb_link[0] == NULL || w->rb_link[0]->rb_color == RB_BLACK)
    && (w->rb_link[1] == NULL || w->rb_link[1]->rb_color == RB_BLACK))
  { }
else
  {
    if (w->rb_link[1] == NULL || w->rb_link[1]->rb_color == RB_BLACK)
      { }
    break;
  }

This code is included in 226.

Case Reduction: Ensure w is black
.................................
We know, at this point, that x is a black node or an empty tree. Node w may be red or black. If w is red, we perform a left rotation at the common parent of x and w, labeled A in the diagram below, and recolor A and its own newly acquired parent C. Then we reassign w as the new sibling of x. The effect is to ensure that w is also black, in order to reduce the number of cases:

| | A,pa[k-1] C,pa[k-2] / `--..__ _____.....----' `_ x C,w A,pa[k-1] D => _.-' `_ / `_ / \ B D x B,w c d / \ / \ / \ a b c d a b

Node w must have children because x is black, in order to satisfy rule 2, and w's children must be black because of rule 1. Here is the code corresponding to this transformation. Because the ancestors of node x change, pa[] and da[] are updated as well as w.

228. =
w->rb_color = RB_BLACK;
pa[k - 1]->rb_color = RB_RED;

pa[k - 1]->rb_link[1] = w->rb_link[0];
w->rb_link[0] = pa[k - 1];
pa[k - 2]->rb_link[da[k - 2]] = w;

pa[k] = pa[k - 1];
da[k] = 0;
pa[k - 1] = w;
k++;

w = pa[k - 1]->rb_link[1];

This code is included in 227, 358, and 475.

Now we can take care of the three rebalancing cases one by one. Remember that the situation is a deleted black node in the subtree designated x and the goal is to correct a rule 2 violation. Although subtree x may be an empty tree, the diagrams below show it as a black node. That's okay because the code itself never refers to x. The label is supplied for the reader's benefit only.

Case 1: w has no red children
.............................

If w doesn't have any red children, then it can be recolored red. When we do that, the black-height of the subtree rooted at w has decreased, so we must move up the tree, with pa[k - 1] becoming the new x, to rebalance at w and x's parent. The parent, labeled B in the diagram below, may be red or black. Its color is not changed within the code for this case. If it is red, then the next iteration of the rebalancing loop will recolor it as black immediately and exit.
In particular, B will be red if the transformation to make w black was performed earlier. If, on the other hand, B is black, the loop will continue as usual.

| | B,pa[k-1] B,x _.-' `_ _.-' `_ A,x C,w => A C / \ / \ / \ / \ a b c d a b c d

229. =
w->rb_color = RB_RED;

This code is included in 227, 359, 475, and 574.

Case 2: w's right child is red
..............................

If w's right child is red, we can perform a left rotation at pa[k - 1] and recolor some nodes, and thereby satisfy both of the red-black rules. The loop is then complete. The transformation looks like this:

| | B,pa[k-1] C _.-' `_ _.-' `_ A,x C,w B D => / \ / `_ _.-' \ / \ a b c D A c d e / \ / \ d e a b

The corresponding code is below. The break is supplied by the enclosing code segment:

230. =
w->rb_color = pa[k - 1]->rb_color;
pa[k - 1]->rb_color = RB_BLACK;
w->rb_link[1]->rb_color = RB_BLACK;

pa[k - 1]->rb_link[1] = w->rb_link[0];
w->rb_link[0] = pa[k - 1];
pa[k - 2]->rb_link[da[k - 2]] = w;

This code is included in 227, 360, and 477.

Case 3: w's left child is red
.............................

Because the conditions for neither case 1 nor case 2 apply, the only remaining possibility is that w has a red left child. When this is the case, we can transform it into case 2 by rotating right at w. This causes w to move to the node that was previously w's left child, in this way:

| | B,pa[k-1] B,pa[k-1] _.-' `--..__ _.-' `_ A,x D,w A,x C,w => / \ _.-' \ / \ / `_ a b C e a b c D / \ / \ c d d e

231. =
struct rb_node *y = w->rb_link[0];
y->rb_color = RB_BLACK;
w->rb_color = RB_RED;

w->rb_link[0] = y->rb_link[1];
y->rb_link[1] = w;
w = pa[k - 1]->rb_link[1] = y;

This code is included in 227, 361, and 479.

6.5.3 Step 4: Finish Up
-----------------------

All that's left to do is free the node, update counters, and return the deleted item:

232. =
tree->rb_alloc->libavl_free (tree->rb_alloc, p);
tree->rb_count--;
tree->rb_generation++;
return (void *) item;

This code is included in 220.
6.5.4 Symmetric Case -------------------- 233. = struct rb_node *w = pa[k - 1]->rb_link[0]; if (w->rb_color == RB_RED) { } if ((w->rb_link[0] == NULL || w->rb_link[0]->rb_color == RB_BLACK) && (w->rb_link[1] == NULL || w->rb_link[1]->rb_color == RB_BLACK)) { } else { if (w->rb_link[0] == NULL || w->rb_link[0]->rb_color == RB_BLACK) { } break; } This code is included in 226. 234. = w->rb_color = RB_BLACK; pa[k - 1]->rb_color = RB_RED; pa[k - 1]->rb_link[0] = w->rb_link[1]; w->rb_link[1] = pa[k - 1]; pa[k - 2]->rb_link[da[k - 2]] = w; pa[k] = pa[k - 1]; da[k] = 1; pa[k - 1] = w; k++; w = pa[k - 1]->rb_link[0]; This code is included in 233, 364, and 476. 235. = w->rb_color = RB_RED; This code is included in 233, 365, and 476. 236. = struct rb_node *y = w->rb_link[1]; y->rb_color = RB_BLACK; w->rb_color = RB_RED; w->rb_link[1] = y->rb_link[0]; y->rb_link[0] = w; w = pa[k - 1]->rb_link[0] = y; This code is included in 233, 367, and 480. 237. = w->rb_color = pa[k - 1]->rb_color; pa[k - 1]->rb_color = RB_BLACK; w->rb_link[0]->rb_color = RB_BLACK; pa[k - 1]->rb_link[0] = w->rb_link[1]; w->rb_link[1] = pa[k - 1]; pa[k - 2]->rb_link[da[k - 2]] = w; This code is included in 233, 366, and 478. 6.6 Testing =========== Now we'll present a test program to demonstrate that our code works, using the same framework that has been used in past chapters. The additional code needed is straightforward: 238. = #include #include #include #include "rb.h" #include "test.h" rb 119> rb 104> rb 100> rb 122> 239. = static int compare_trees (struct rb_node *a, struct rb_node *b) { int okay; if (a == NULL || b == NULL) { assert (a == NULL && b == NULL); return 1; } if (*(int *) a->rb_data != *(int *) b->rb_data || ((a->rb_link[0] != NULL) != (b->rb_link[0] != NULL)) || ((a->rb_link[1] != NULL) != (b->rb_link[1] != NULL)) || a->rb_color != b->rb_color) { printf (" Copied nodes differ: a=%d%c b=%d%c a:", *(int *) a->rb_data, a->rb_color == RB_RED ? 
'r' : 'b', *(int *) b->rb_data, b->rb_color == RB_RED ? 'r' : 'b'); if (a->rb_link[0] != NULL) printf ("l"); if (a->rb_link[1] != NULL) printf ("r"); printf (" b:"); if (b->rb_link[0] != NULL) printf ("l"); if (b->rb_link[1] != NULL) printf ("r"); printf ("\n"); return 0; } okay = 1; if (a->rb_link[0] != NULL) okay &= compare_trees (a->rb_link[0], b->rb_link[0]); if (a->rb_link[1] != NULL) okay &= compare_trees (a->rb_link[1], b->rb_link[1]); return okay; } This code is included in 238. 240. = /* Examines the binary tree rooted at node. Zeroes *okay if an error occurs. Otherwise, does not modify *okay. Sets *count to the number of nodes in that tree, including node itself if node != NULL. Sets *bh to the tree's black-height. All the nodes in the tree are verified to be at least min but no greater than max. */ static void recurse_verify_tree (struct rb_node *node, int *okay, size_t *count, int min, int max, int *bh) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subbh[2]; /* Black-heights of subtrees. */ if (node == NULL) { *count = 0; *bh = 0; return; } d = *(int *) node->rb_data; recurse_verify_tree (node->rb_link[0], okay, &subcount[0], min, d - 1, &subbh[0]); recurse_verify_tree (node->rb_link[1], okay, &subcount[1], d + 1, max, &subbh[1]); *count = 1 + subcount[0] + subcount[1]; *bh = (node->rb_color == RB_BLACK) + subbh[0]; } This code is included in 238. 241. = if (node->rb_color != RB_RED && node->rb_color != RB_BLACK) { printf (" Node %d is neither red nor black (%d).\n", d, node->rb_color); *okay = 0; } This code is included in 240, 370, 484, and 585. 242. = /* Verify compliance with rule 1. 
*/ if (node->rb_color == RB_RED) { if (node->rb_link[0] != NULL && node->rb_link[0]->rb_color == RB_RED) { printf (" Red node %d has red left child %d\n", d, *(int *) node->rb_link[0]->rb_data); *okay = 0; } if (node->rb_link[1] != NULL && node->rb_link[1]->rb_color == RB_RED) { printf (" Red node %d has red right child %d\n", d, *(int *) node->rb_link[1]->rb_data); *okay = 0; } } This code is included in 240 and 585. 243. = /* Verify compliance with rule 2. */ if (subbh[0] != subbh[1]) { printf (" Node %d has two different black-heights: left bh=%d, " "right bh=%d\n", d, subbh[0], subbh[1]); *okay = 0; } This code is included in 240, 370, 484, and 585. 244. = static int verify_tree (struct rb_table *tree, int array[], size_t n) { int okay = 1; bst_count is correct; bst => rb 110> if (okay) { } if (okay) { } if (okay) { rb 115> } if (okay) { rb 116> } if (okay) { rb 117> } if (okay) { rb 118> } return okay; } This code is included in 238, 368, 482, and 583. 245. = if (tree->rb_root != NULL && tree->rb_root->rb_color != RB_BLACK) { printf (" Tree's root is not black.\n"); okay = 0; } This code is included in 244. 246. = /* Recursively verify tree structure. */ size_t count; int bh; recurse_verify_tree (tree->rb_root, &okay, &count, 0, INT_MAX, &bh); This code is included in 244. 7 Threaded Binary Search Trees ****************************** Traversal in inorder, as done by libavl traversers, is a common operation in a binary tree. To do this efficiently in an ordinary binary search tree or balanced tree, we need to maintain a list of the nodes above the current node, or at least a list of nodes still to be visited. This need leads to the stack used in struct bst_traverser and friends. It's really too bad that we need such stacks for traversal. First, they take up space. Second, they're fragile: if an item is inserted into or deleted from the tree during traversal, or if the tree is balanced, we have to rebuild the traverser's stack. 
In addition, it can sometimes be difficult to know in advance how tall the stack will need to be, as demonstrated by the code that we wrote to handle stack overflow. These problems are important enough that, in this book, we'll look at two different solutions. This chapter looks at the first of these, which adds special pointers, each called a "thread", to nodes, producing what is called a threaded binary search tree, "threaded tree", or simply a TBST.(1) Later in the book, we'll examine an alternate and more general solution using a "parent pointer" in each node. Here's the outline of the TBST code. We're using the prefix tbst_ this time: 247. = #ifndef TBST_H #define TBST_H 1 #include
tbst 14>
tbst 15> tbst 88> #endif /* tbst.h */

248. = #include #include #include #include "tbst.h"

---------- Footnotes ----------

(1) This usage of "thread" has nothing to do with the idea of a program with multiple "threads of execution", a form of multitasking within a single program.

7.1 Threads
===========

In an ordinary binary search tree or balanced tree, a lot of the pointer fields go more-or-less unused. Instead of pointing somewhere useful, they are used to store null pointers. In a sense, they're wasted. What if we were to instead use these fields to point elsewhere in the tree? This is the idea behind a threaded tree. In a threaded tree, a node's left child pointer field, if it would otherwise be a null pointer, is used to point to the node's inorder predecessor. An otherwise-null right child pointer field points to the node's successor. The least-valued node in a threaded tree has a null pointer for its left thread, and the greatest-valued node similarly has a null right thread. These two are the only null pointers in a threaded tree. Here's a sample threaded tree:

3 _.-' `---....____ 2 6 _.-' \ ___..--' `--..___ 1 [3] 4 8 / \ _' `._ _.-' `._ [] [2] [3] 5 7 9 _' \ _' \ _' \ [4] [6] [6] [8] [8] []

This diagram illustrates the convention used for threads in text: thread links are designated by surrounding the node name or value with square brackets. Null threads in the least and greatest nodes are shown as `[]', which is also used to show threads up to nodes not shown in the diagram. This notation is unfortunate, but less visually confusing than trying to include additional arrows in text art tree diagrams.

There are some disadvantages to threaded trees.
Each node in an unthreaded tree has only one pointer that leads to it, either from the tree structure or its parent node, but in a threaded tree some nodes have as many as three pointers leading to them: one from the root or parent, one from its predecessor's right thread, and one from its successor's left thread. This means that, although traversing a threaded tree is simpler, building and maintaining a threaded tree is more complicated.

As we learned earlier, any node that has a right child has a successor in its right subtree, and that successor has no left child. So, a node in a threaded tree has a left thread pointing back to it if and only if the node has a right child. Similarly, a node has a right thread pointing to it if and only if the node has a left child. Take a look at the sample tree above and check these statements for yourself for some of its nodes.

See also: [Knuth 1997], section 2.3.1.

7.2 Data Types
==============

We need two extra fields in the node structure to keep track of whether each link is a child pointer or a thread. Each of these fields is called a "tag". The revised struct tbst_node, along with enum tbst_tag for tags, looks like this:

249. = /* Characterizes a link as a child pointer or a thread. */ enum tbst_tag { TBST_CHILD, /* Child pointer. */ TBST_THREAD /* Thread. */ }; /* A threaded binary search tree node. */ struct tbst_node { struct tbst_node *tbst_link[2]; /* Subtrees. */ void *tbst_data; /* Pointer to data. */ unsigned char tbst_tag[2]; /* Tag fields. */ }; This code is included in 247.

Each element of tbst_tag[] is set to TBST_CHILD if the corresponding tbst_link[] element is a child pointer, or to TBST_THREAD if it is a thread. The other members of struct tbst_node should be familiar. We also want a revised table structure, because traversers in threaded trees do not need a generation number:

250. = /* Tree data structure. */ struct tbst_table { struct tbst_node *tbst_root; /* Tree's root.
*/ tbst_comparison_func *tbst_compare; /* Comparison function. */ void *tbst_param; /* Extra argument to tbst_compare. */ struct libavl_allocator *tbst_alloc; /* Memory allocator. */ size_t tbst_count; /* Number of items in tree. */ }; This code is included in 247, 297, 333, 372, 415, 452, 486, 519, and 551. There is no need to define a maximum height for TBST trees because none of the TBST functions use a stack. Exercises: 1. We defined enum tbst_tag for distinguishing threads from child pointers, but declared the actual tag members as unsigned char instead. Why? 7.3 Operations ============== Now that we've changed the basic form of our binary trees, we have to rewrite most of the tree functions. A function designed for use with unthreaded trees will get hopelessly lost in a threaded tree, because it will follow threads that it thinks are child pointers. The only functions we can keep are the totally generic functions defined in terms of other table functions. 251. =
tbst 592> tbst 6>
tbst 594> This code is included in 248.

7.4 Creation
============

Function tbst_create() is the same as bst_create() except that a struct tbst_table has no generation number to fill in.

252. = struct tbst_table * tbst_create (tbst_comparison_func *compare, void *param, struct libavl_allocator *allocator) { struct tbst_table *tree; assert (compare != NULL); if (allocator == NULL) allocator = &tbst_allocator_default; tree = allocator->libavl_malloc (allocator, sizeof *tree); if (tree == NULL) return NULL; tree->tbst_root = NULL; tree->tbst_compare = compare; tree->tbst_param = param; tree->tbst_alloc = allocator; tree->tbst_count = 0; return tree; } This code is included in 251, 300, 336, 375, 418, 455, 489, 522, and 554.

7.5 Search
==========

In searching a TBST we just have to be careful to distinguish threads from child pointers. If we hit a thread link, then we've run off the bottom of the tree and the search is unsuccessful. Other than that, a search in a TBST works the same as in any other binary search tree.

253. = void * tbst_find (const struct tbst_table *tree, const void *item) { const struct tbst_node *p; assert (tree != NULL && item != NULL); p = tree->tbst_root; if (p == NULL) return NULL; for (;;) { int cmp, dir; cmp = tree->tbst_compare (item, p->tbst_data, tree->tbst_param); if (cmp == 0) return p->tbst_data; dir = cmp > 0; if (p->tbst_tag[dir] == TBST_CHILD) p = p->tbst_link[dir]; else return NULL; } } This code is included in 251, 300, and 336.

7.6 Insertion
=============

It takes a little more effort to insert a new node into a threaded BST than into an unthreaded one, but not much more. The only difference is that we now have to set up the new node's left and right threads to point to its predecessor and successor, respectively. Fortunately, these are easy to figure out. Suppose that new node n is the right child of its parent p (the other case is symmetric). This means that p is n's predecessor, because n is the least node in p's right subtree.
Moreover, n's successor is the node that was p's successor before n was inserted, that is to say, it is the same as p's former right thread. Here's an example that may help to clear up the description. When new node 3 is inserted as the right child of 2, its left thread points to 2 and its right thread points where 2's right thread formerly did, to 4: 6 6 ___..--' \ ___..--' \ 4 [] 4 [] __..-' `._ ____....---' `._ 2,p 5 => 2,p 5 _.-' \ _' \ _.-' `._ _' \ 1 [4] [4] [6] 1 3,n [4] [6] / \ / \ _' \ [] [2] [] [2] [2] [4] The following code unifies the left-side and right-side cases using dir, which takes the value 1 for a right-side insertion, 0 for a left-side insertion. The side opposite dir can then be expressed simply as !dir. 254. = void ** tbst_probe (struct tbst_table *tree, void *item) { struct tbst_node *p; /* Traverses tree to find insertion point. */ struct tbst_node *n; /* New node. */ int dir; /* Side of p on which n is inserted. */ assert (tree != NULL && item != NULL); return &n->tbst_data; } This code is included in 251. 255. = if (tree->tbst_root != NULL) for (p = tree->tbst_root; ; p = p->tbst_link[dir]) { int cmp = tree->tbst_compare (item, p->tbst_data, tree->tbst_param); if (cmp == 0) return &p->tbst_data; dir = cmp > 0; if (p->tbst_tag[dir] == TBST_THREAD) break; } else { p = (struct tbst_node *) &tree->tbst_root; dir = 0; } This code is included in 254 and 668. 256. = n = tree->tbst_alloc->libavl_malloc (tree->tbst_alloc, sizeof *n); if (n == NULL) return NULL; tree->tbst_count++; n->tbst_data = item; n->tbst_tag[0] = n->tbst_tag[1] = TBST_THREAD; n->tbst_link[dir] = p->tbst_link[dir]; if (tree->tbst_root != NULL) { p->tbst_tag[dir] = TBST_CHILD; n->tbst_link[!dir] = p; } else n->tbst_link[1] = NULL; p->tbst_link[dir] = n; This code is included in 254, 303, and 339. See also: [Knuth 1997], algorithm 2.3.1I. Exercises: 1. What happens if we reverse the order of the final if statement above and the following assignment? 
7.7 Deletion
============

When we delete a node from a threaded tree, we have to update one or two more pointers than if it were an unthreaded BST. What's more, we sometimes have to go to a bit of effort to track down what pointers these are, because they are in the predecessor and successor of the node being deleted. The outline is the same as for deleting a BST node:

257. = void * tbst_delete (struct tbst_table *tree, const void *item) { struct tbst_node *p; /* Node to delete. */ struct tbst_node *q; /* Parent of p. */ int dir; /* Index into q->tbst_link[] that leads to p. */ assert (tree != NULL && item != NULL); } This code is included in 251.

We search down the tree to find the item to delete, p. As we do so, we keep track of its parent q and the direction dir that we descended from it. The initial values of q and dir use the trick seen originally in copying a BST (*note Copying a BST Iteratively::). There are nicer ways to do the same thing, though they are not necessarily as efficient. See the exercises for one possibility.

258. = if (tree->tbst_root == NULL) return NULL; p = tree->tbst_root; q = (struct tbst_node *) &tree->tbst_root; dir = 0; for (;;) { int cmp = tree->tbst_compare (item, p->tbst_data, tree->tbst_param); if (cmp == 0) break; dir = cmp > 0; if (p->tbst_tag[dir] == TBST_THREAD) return NULL; q = p; p = p->tbst_link[dir]; } item = p->tbst_data; This code is included in 257.

The cases for deletion from a threaded tree are a bit different from those for an unthreaded tree. The key point to keep in mind is that a node with n children has n threads pointing to it that must be updated when it is deleted. Let's look at the cases in detail now. Here's the outline:

259. = if (p->tbst_tag[1] == TBST_THREAD) { if (p->tbst_tag[0] == TBST_CHILD) { } else { } } else { struct tbst_node *r = p->tbst_link[1]; if (r->tbst_tag[0] == TBST_THREAD) { } else { } } This code is included in 257.
Case 1: p has a right thread and a left child ............................................. If p has a right thread and a left child, then we replace it by its left child. We also replace its predecessor t's right thread by p's right thread. In the most general subcase, the whole operation looks something like this: | q | _.-' \ q p [] ___...---' \ ___...---' \ s [] s [q] / `_ / `_ => r u r u / `_ / `_ t x t x / \ / \ v [q] v [p] On the other hand, it can be as simple as this: | q | _.-' \ q p [] _.-' \ _.-' \ => x [] x [q] / \ / \ [] [q] [] [p] Both of these subcases, and subcases in between them in complication, are handled by the same code: 260. = struct tbst_node *t = p->tbst_link[0]; while (t->tbst_tag[1] == TBST_CHILD) t = t->tbst_link[1]; t->tbst_link[1] = p->tbst_link[1]; q->tbst_link[dir] = p->tbst_link[0]; This code is included in 259 and 314. Case 2: p has a right thread and a left thread .............................................. If p is a leaf, then no threads point to it, but we must change its parent q's pointer to p to a thread, pointing to the same place that the corresponding thread of p pointed. This is easy, and typically looks something like this: | q | / `._ q [] p => / \ _' \ [] [] [q] [] There is one special case, which comes up when q is the pseudo-node used for the parent of the root. We can't access tbst_tag[] in this "node". Here's the code: 261. = q->tbst_link[dir] = p->tbst_link[dir]; if (q != (struct tbst_node *) &tree->tbst_root) q->tbst_tag[dir] = TBST_THREAD; This code is included in 259 and 315. Case 3: p's right child has a left thread ......................................... If p has a right child r, and r itself has a left thread, then we delete p by moving r into its place. 
Here's an example where the root node is deleted: 2,p _.-' `._ 3,r 1 3,r _.-' `--..___ / \ _' `--..___ 1 5 [] [2] [2] 5 => / \ _.-' `._ _.-' `._ [] [3] 4 6 4 6 _' \ _' \ _' \ _' \ [3] [5] [5] [] [3] [5] [5] [] This just involves changing q's right link to point to r, copying p's left link and tag into r, and fixing any thread that pointed to p so that it now points to r. The code is straightforward: 262. = r->tbst_link[0] = p->tbst_link[0]; r->tbst_tag[0] = p->tbst_tag[0]; if (r->tbst_tag[0] == TBST_CHILD) { struct tbst_node *t = r->tbst_link[0]; while (t->tbst_tag[1] == TBST_CHILD) t = t->tbst_link[1]; t->tbst_link[1] = r; } q->tbst_link[dir] = r; This code is included in 259 and 316. Case 4: p's right child has a left child ........................................ If p has a right child, which in turn has a left child, we arrive at the most complicated case. It corresponds to case 3 in deletion from an unthreaded BST. The solution is to find p's successor s and move it in place of p. In this case, r is s's parent node, not necessarily p's right child. There are two subcases here. In the first, s has a right child. In that subcase, s's own successor's left thread already points to s, so we need not adjust any threads. Here's an example of this subcase. Notice how the left thread of node 3, s's successor, already points to s. 1,p _.-' `------......._______ 2,s 0 5 _.-' `----....._____ / \ __..-' \ 0 5 [] [1] 4,r [] / \ __..-' \ ___...---' \ => [] [2] 4,r [] 2,s [5] _.-' \ _' `._ 3 [5] [1] 3 _' \ _' \ [2] [4] [2] [4] The second subcase comes up when s has a right thread. Because s also has a left thread, this means that s is a leaf. This subcase requires us to change r's left link to a thread to its predecessor, which is now s. 
Here's a continuation of the previous example, showing deletion of the new root, node 2: 2,p _.-' `-----.....______ 3,s 0 5 _.-' `---...___ / \ __..-' \ 0 5 [] [2] 4,r [] => / \ __..-' \ __..-' \ [] [3] 4,r [] 3,s [5] _' \ _' \ [3] [5] [2] [4] The first part of the code handles finding r and s: 263. = struct tbst_node *s; for (;;) { s = r->tbst_link[0]; if (s->tbst_tag[0] == TBST_THREAD) break; r = s; } See also 264 and 265. This code is included in 259 and 317. Next, we update r, handling each of the subcases: 264. += if (s->tbst_tag[1] == TBST_CHILD) r->tbst_link[0] = s->tbst_link[1]; else { r->tbst_link[0] = s; r->tbst_tag[0] = TBST_THREAD; } Finally, we copy p's links and tags into s and chase down and update any right thread in s's left subtree, then replace the pointer from q down to s: 265. += s->tbst_link[0] = p->tbst_link[0]; if (p->tbst_tag[0] == TBST_CHILD) { struct tbst_node *t = p->tbst_link[0]; while (t->tbst_tag[1] == TBST_CHILD) t = t->tbst_link[1]; t->tbst_link[1] = s; s->tbst_tag[0] = TBST_CHILD; } s->tbst_link[1] = p->tbst_link[1]; s->tbst_tag[1] = TBST_CHILD; q->tbst_link[dir] = s; We finish up by deallocating the node, decrementing the tree's item count, and returning the deleted item's data: 266. = tree->tbst_alloc->libavl_free (tree->tbst_alloc, p); tree->tbst_count--; return (void *) item; This code is included in 257. Exercises: *1. In a threaded BST, there is an efficient algorithm to find the parent of a given node. Use this algorithm to reimplement . 2. In case 2, we must handle q as the pseudo-root as a special case. Can we rearrange the TBST data structures to avoid this? 3. Rewrite case 4 to replace the deleted node's tbst_data by its successor and actually delete the successor, instead of moving around pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) *4. Many of the cases in deletion from a TBST require searching down the tree for the nodes with threads to the deleted node. 
Show that this adds only a constant number of operations to the deletion of a randomly selected node, compared to a similar deletion in an unthreaded tree.

7.8 Traversal
=============

Traversal in a threaded BST is much simpler than in an unthreaded one. This is, indeed, much of the point to threading our trees. This section implements all of the libavl traverser functions for threaded trees.

Suppose we wish to find the successor of an arbitrary node in a threaded tree. If the node has a right child, then the successor is the smallest item in the node's right subtree. Otherwise, the node has a right thread, and its successor is simply the node to which the right thread points. If the right thread is a null pointer, then the node is the largest in the tree. We can find the node's predecessor in a similar manner.

We don't ever need to know the parent of a node to traverse the threaded tree, so there's no need to keep a stack. Moreover, because a traverser has no stack to be corrupted by changes to its tree, there is no need to keep or compare generation numbers. Therefore, this is all we need for a TBST traverser structure:

267. = /* TBST traverser structure. */ struct tbst_traverser { struct tbst_table *tbst_table; /* Tree being traversed. */ struct tbst_node *tbst_node; /* Current node in tree. */ }; This code is included in 247, 297, 333, 372, 415, 452, 486, 519, and 551.

The traversal functions are collected together here. A few of the functions are implemented directly in terms of their unthreaded BST counterparts, but most must be reimplemented:

268. = tbst 74> tbst 75> This code is included in 251, 300, and 336. See also: [Knuth 1997], algorithm 2.3.1S.

7.8.1 Starting at the Null Node
-------------------------------

269. = void tbst_t_init (struct tbst_traverser *trav, struct tbst_table *tree) { trav->tbst_table = tree; trav->tbst_node = NULL; } This code is included in 268, 395, 502, and 546.
7.8.2 Starting at the First Node -------------------------------- 270. = void * tbst_t_first (struct tbst_traverser *trav, struct tbst_table *tree) { assert (tree != NULL && trav != NULL); trav->tbst_table = tree; trav->tbst_node = tree->tbst_root; if (trav->tbst_node != NULL) { while (trav->tbst_node->tbst_tag[0] == TBST_CHILD) trav->tbst_node = trav->tbst_node->tbst_link[0]; return trav->tbst_node->tbst_data; } else return NULL; } This code is included in 268. 7.8.3 Starting at the Last Node ------------------------------- 271. = void * tbst_t_last (struct tbst_traverser *trav, struct tbst_table *tree) { assert (tree != NULL && trav != NULL); trav->tbst_table = tree; trav->tbst_node = tree->tbst_root; if (trav->tbst_node != NULL) { while (trav->tbst_node->tbst_tag[1] == TBST_CHILD) trav->tbst_node = trav->tbst_node->tbst_link[1]; return trav->tbst_node->tbst_data; } else return NULL; } This code is included in 268. 7.8.4 Starting at a Found Node ------------------------------ The code for this function is derived with few changes from . 272. = void * tbst_t_find (struct tbst_traverser *trav, struct tbst_table *tree, void *item) { struct tbst_node *p; assert (trav != NULL && tree != NULL && item != NULL); trav->tbst_table = tree; trav->tbst_node = NULL; p = tree->tbst_root; if (p == NULL) return NULL; for (;;) { int cmp, dir; cmp = tree->tbst_compare (item, p->tbst_data, tree->tbst_param); if (cmp == 0) { trav->tbst_node = p; return p->tbst_data; } dir = cmp > 0; if (p->tbst_tag[dir] == TBST_CHILD) p = p->tbst_link[dir]; else return NULL; } } This code is included in 268. 7.8.5 Starting at an Inserted Node ---------------------------------- This implementation is a trivial adaptation of . In particular, management of generation numbers has been removed. 273. 
= void * tbst_t_insert (struct tbst_traverser *trav, struct tbst_table *tree, void *item) { void **p; assert (trav != NULL && tree != NULL && item != NULL); p = tbst_probe (tree, item); if (p != NULL) { trav->tbst_table = tree; trav->tbst_node = ((struct tbst_node *) ((char *) p - offsetof (struct tbst_node, tbst_data))); return *p; } else { tbst_t_init (trav, tree); return NULL; } } This code is included in 268, 395, and 546. 7.8.6 Initialization by Copying ------------------------------- 274. = void * tbst_t_copy (struct tbst_traverser *trav, const struct tbst_traverser *src) { assert (trav != NULL && src != NULL); trav->tbst_table = src->tbst_table; trav->tbst_node = src->tbst_node; return trav->tbst_node != NULL ? trav->tbst_node->tbst_data : NULL; } This code is included in 268, 395, 502, and 546. 7.8.7 Advancing to the Next Node -------------------------------- Despite the earlier discussion (*note Traversing a TBST::), there are actually three cases, not two, in advancing within a threaded binary tree. The extra case turns up when the current node is the null item. We deal with that case by calling out to tbst_t_first(). Notice also that, below, in the case of following a thread we must check for a null node, but not in the case of following a child pointer. 275. = void * tbst_t_next (struct tbst_traverser *trav) { assert (trav != NULL); if (trav->tbst_node == NULL) return tbst_t_first (trav, trav->tbst_table); else if (trav->tbst_node->tbst_tag[1] == TBST_THREAD) { trav->tbst_node = trav->tbst_node->tbst_link[1]; return trav->tbst_node != NULL ? trav->tbst_node->tbst_data : NULL; } else { trav->tbst_node = trav->tbst_node->tbst_link[1]; while (trav->tbst_node->tbst_tag[0] == TBST_CHILD) trav->tbst_node = trav->tbst_node->tbst_link[0]; return trav->tbst_node->tbst_data; } } This code is included in 268. See also: [Knuth 1997], algorithm 2.3.1S. 7.8.8 Backing Up to the Previous Node ------------------------------------- 276. 
= void * tbst_t_prev (struct tbst_traverser *trav) { assert (trav != NULL); if (trav->tbst_node == NULL) return tbst_t_last (trav, trav->tbst_table); else if (trav->tbst_node->tbst_tag[0] == TBST_THREAD) { trav->tbst_node = trav->tbst_node->tbst_link[0]; return trav->tbst_node != NULL ? trav->tbst_node->tbst_data : NULL; } else { trav->tbst_node = trav->tbst_node->tbst_link[0]; while (trav->tbst_node->tbst_tag[1] == TBST_CHILD) trav->tbst_node = trav->tbst_node->tbst_link[1]; return trav->tbst_node->tbst_data; } } This code is included in 268. 7.9 Copying =========== We can use essentially the same algorithm to copy threaded BSTs as unthreaded (see ). Some modifications are necessary, of course. The most obvious change is that the threads must be set up. This is not hard. We can do it the same way that tbst_probe() does. Less obvious is the way to get rid of the stack. In bst_copy(), the stack was used to keep track of as yet incompletely processed parents of the current node. When we came back to one of these nodes, we did the actual copy of the node data, then visited the node's right subtree, if non-empty. In a threaded tree, we can replace the use of the stack by the use of threads. Instead of popping an item off the stack when we can't move down in the tree any further, we follow the node's right thread. This brings us up to an ancestor (parent, grandparent, ...) of the node, which we can then deal with in the same way as before. This diagram shows the threads that would be followed to find parents in copying a couple of different threaded binary trees. Of course, the TBSTs would have complete sets of threads, but only the ones that are followed are shown: 5 4 ___...---' `--..__ __..-' `-.__ 2 8 2 6 _.-' `-.__ __..-' \ _.-' \ _.-' \ 1 4 6 9 1 3 5 7 _.-' \ _.-' \ \ \ \ \ \ \ 0 [2] 3 [5] 7 [] [2] [4] [6] [] \ \ \ [1] [4] [8] Why does following the right thread from a node bring us to one of the node's ancestors? 
Consider the algorithm for finding the successor of a node with no right child, described earlier (*note Better Iterative Traversal::). This algorithm just moves up the tree from a node to its parent, grandparent, etc., guaranteeing that the successor will be an ancestor of the original node.

How do we know that following the right thread won't take us too far up the tree and skip copying some subtree? Because we only move up to the right one time using that same algorithm. When we move up to the left, we're going back to some binary tree whose right subtree we've already dealt with (we are currently in the right subtree of that binary tree, so of course we've dealt with it).

In conclusion, following the right thread always takes us to just the node whose right subtree we want to copy next. Of course, if that node happens to have an empty right subtree, then there is nothing to do, so we just continue along the next right thread, and so on.

The first step is to build a function to copy a single node. The following function copy_node() does this, creating a new node as the child of an existing node:

277. = /* Creates a new node as a child of dst on side dir. Copies data from src into the new node, applying copy(), if non-null. Returns nonzero only if fully successful. Regardless of success, integrity of the tree structure is assured, though failure may leave a null pointer in a tbst_data member.
*/ static int copy_node (struct tbst_table *tree, struct tbst_node *dst, int dir, const struct tbst_node *src, tbst_copy_func *copy) { struct tbst_node *new = tree->tbst_alloc->libavl_malloc (tree->tbst_alloc, sizeof *new); if (new == NULL) return 0; new->tbst_link[dir] = dst->tbst_link[dir]; new->tbst_tag[dir] = TBST_THREAD; new->tbst_link[!dir] = dst; new->tbst_tag[!dir] = TBST_THREAD; dst->tbst_link[dir] = new; dst->tbst_tag[dir] = TBST_CHILD; if (copy == NULL) new->tbst_data = src->tbst_data; else { new->tbst_data = copy (src->tbst_data, tree->tbst_param); if (new->tbst_data == NULL) return 0; } return 1; } This code is included in 278. Using the node copy function above, constructing the tree copy function is easy. In fact, the code is considerably easier to read than our original function to iteratively copy an unthreaded binary tree (*note Handling Errors in Iterative BST Copying::), because this function is not as heavily optimized. One tricky part is getting the copy started. We can't use the dirty trick from bst_copy() of casting the address of a bst_root to a node pointer, because we need access to the first tag as well as the first link (see Exercise 2 for a way to sidestep this problem). So instead we use a couple of "pseudo-root" nodes rp and rq, allocated locally. 278. = This code is included in 251. 279. = struct tbst_table * tbst_copy (const struct tbst_table *org, tbst_copy_func *copy, tbst_item_func *destroy, struct libavl_allocator *allocator) { struct tbst_table *new; const struct tbst_node *p; struct tbst_node *q; struct tbst_node rp, rq; assert (org != NULL); new = tbst_create (org->tbst_compare, org->tbst_param, allocator != NULL ? 
allocator : org->tbst_alloc); if (new == NULL) return NULL; new->tbst_count = org->tbst_count; if (new->tbst_count == 0) return new; p = &rp; rp.tbst_link[0] = org->tbst_root; rp.tbst_tag[0] = TBST_CHILD; q = &rq; rq.tbst_link[0] = NULL; rq.tbst_tag[0] = TBST_THREAD; for (;;) { if (p->tbst_tag[0] == TBST_CHILD) { if (!copy_node (new, q, 0, p->tbst_link[0], copy)) { copy_error_recovery (rq.tbst_link[0], new, destroy); return NULL; } p = p->tbst_link[0]; q = q->tbst_link[0]; } else { while (p->tbst_tag[1] == TBST_THREAD) { p = p->tbst_link[1]; if (p == NULL) { q->tbst_link[1] = NULL; new->tbst_root = rq.tbst_link[0]; return new; } q = q->tbst_link[1]; } p = p->tbst_link[1]; q = q->tbst_link[1]; } if (p->tbst_tag[1] == TBST_CHILD) if (!copy_node (new, q, 1, p->tbst_link[1], copy)) { copy_error_recovery (rq.tbst_link[0], new, destroy); return NULL; } } } This code is included in 278 and 329. A sensitive issue in the code above is treatment of the final thread. The initial call to copy_node() causes a right thread to point to rq, but it needs to be a null pointer. We need to perform this kind of transformation: rq rq ______......-----' _____.....-----' 2 2 _.-' `--..___ _.-' `--..___ 1 4 => 1 4 / \ _.-' `._ / \ _.-' `._ [] [2] 3 5 [] [2] 3 5 _' \ _' \ _' \ _' \ [2] [4] [4] [rq] [2] [4] [4] [] When the copy is successful, this is just a matter of setting the final q's right child pointer to NULL, but when it is unsuccessful we have to find the pointer in question, which is in the greatest node in the tree so far (to see this, try constructing a few threaded BSTs by hand on paper). Function copy_error_recovery() does this, as well as destroying the tree. It also handles the case of failure when no nodes have yet been added to the tree: 280. 
= static void copy_error_recovery (struct tbst_node *p, struct tbst_table *new, tbst_item_func *destroy) { new->tbst_root = p; if (p != NULL) { while (p->tbst_tag[1] == TBST_CHILD) p = p->tbst_link[1]; p->tbst_link[1] = NULL; } tbst_destroy (new, destroy); } This code is included in 278 and 329.

Exercises: 1. In the diagram above that shows examples of threads followed while copying a TBST, all right threads in the TBSTs are shown. Explain how this is not just a coincidence. 2. Suggest some optimization possibilities for tbst_copy().

7.10 Destruction
================

Destroying a threaded binary tree is easy. We can simply traverse the tree in inorder in the usual way. We always have a way to get to the next node without having to go back up to any of the nodes we've already destroyed. (We do, however, have to make sure to go find the next node before destroying the current one, in order to avoid reading data from freed memory.) Here's all it takes:

281. = void tbst_destroy (struct tbst_table *tree, tbst_item_func *destroy) { struct tbst_node *p; /* Current node. */ struct tbst_node *n; /* Next node. */ p = tree->tbst_root; if (p != NULL) while (p->tbst_tag[0] == TBST_CHILD) p = p->tbst_link[0]; while (p != NULL) { n = p->tbst_link[1]; if (p->tbst_tag[1] == TBST_CHILD) while (n->tbst_tag[0] == TBST_CHILD) n = n->tbst_link[0]; if (destroy != NULL && p->tbst_data != NULL) destroy (p->tbst_data, tree->tbst_param); tree->tbst_alloc->libavl_free (tree->tbst_alloc, p); p = n; } tree->tbst_alloc->libavl_free (tree->tbst_alloc, tree); } This code is included in 251, 300, and 336.

7.11 Balance
============

Just like their unthreaded cousins, threaded binary trees can become degenerate, leaving their good performance characteristics behind. When this happened in an unthreaded BST, stack overflow often made it necessary to rebalance the tree. This doesn't happen in our implementation of threaded BSTs, because none of the routines uses a stack.
It is still useful to have a rebalance routine for performance reasons, so we will implement one in this section anyway. There is no need to change the basic algorithm. As before, we convert the tree to a linear "vine", then the vine to a balanced binary search tree. *Note Balancing a BST::, for a review of the balancing algorithm. Here is the outline and prototype for tbst_balance(). 282. = This code is included in 251. 283. = /* Balances tree. */ void tbst_balance (struct tbst_table *tree) { assert (tree != NULL); tree_to_vine (tree); vine_to_tree (tree); } This code is included in 282 and 408. 7.11.1 From Tree to Vine ------------------------ We could transform a threaded binary tree into a vine in the same way we did for unthreaded binary trees, by use of rotations (*note Transforming a BST into a Vine::). But one of the reasons we did it that way was to avoid use of a stack, which is no longer a problem. It's now simpler to rearrange nodes by inorder traversal. We start by finding the minimum node in the tree as p, which will step through the tree in inorder. During each trip through the main loop, we find p's successor as q and make p the left child of q. We also have to make sure that p's right thread points to q. That's all there is to it. 284. = static void tree_to_vine (struct tbst_table *tree) { struct tbst_node *p; if (tree->tbst_root == NULL) return; p = tree->tbst_root; while (p->tbst_tag[0] == TBST_CHILD) p = p->tbst_link[0]; for (;;) { struct tbst_node *q = p->tbst_link[1]; if (p->tbst_tag[1] == TBST_CHILD) { while (q->tbst_tag[0] == TBST_CHILD) q = q->tbst_link[0]; p->tbst_tag[1] = TBST_THREAD; p->tbst_link[1] = q; } if (q == NULL) break; q->tbst_tag[0] = TBST_CHILD; q->tbst_link[0] = p; p = q; } tree->tbst_root = p; } This code is included in 282. Sometimes one trip through the main loop above will put the TBST into an inconsistent state, where two different nodes are the parent of a third node.
Such an inconsistency is always corrected in the next trip through the loop. An example is warranted. Suppose the original threaded binary tree looks like this, with nodes p and q for the initial iteration of the loop as marked: 3 ____....---' \ 1,p [] / `._ [] 2,q _' \ [1] [3] The first trip through the loop makes p, 1, the child of q, 2, but p's former parent's left child pointer still points to p. We now have a situation where node 1 has two parents: both 2 and 3. This diagram tries to show the situation by omitting the line that would otherwise lead down from 3 to 2: 3 \ 2,q [] __..-' \ 1,p [3] / \ [] [2] On the other hand, node 2's right thread still points to 3, so on the next trip through the loop there is no trouble finding the new p's successor. Node 3 is made the parent of 2 and all is well. This diagram shows the new p and q, then the fixed-up vine. The only difference is that node 3 now, correctly, has 2 as its left child: 3,q 3,q \ __..-' \ 2,p [] 2,p [] _.-' \ => _.-' \ 1 [3] 1 [3] / \ / \ [] [2] [] [2] 7.11.2 From Vine to Balanced Tree --------------------------------- Transforming a vine into a balanced threaded BST is similar to the same operation on an unthreaded BST. We can use the same algorithm, adjusting it for presence of the threads. The following outline is similar to . In fact, we entirely reuse , just changing bst to tbst. We omit the final check on the tree's height, because none of the TBST functions are height-limited. 285. = static void vine_to_tree (struct tbst_table *tree) { unsigned long vine; /* Number of nodes in main vine. */ unsigned long leaves; /* Nodes in incomplete bottom level, if any. */ int height; /* Height of produced balanced tree. */ tbst 91> } This code is included in 282 and 408. Not many changes are needed to adapt the algorithm to handle threads. 
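Before we look at the thread adjustments in detail, it may help to see the tree-to-vine pass of the previous section run in isolation. This sketch restates the logic of tree_to_vine() on a stripped-down node type (the type and function names here are illustrative, not libavl's) so that it can be applied to the three-node example from the diagrams above:

```c
#include <assert.h>
#include <stddef.h>

enum tag { CHILD, THREAD };

struct node
  {
    struct node *link[2];     /* link[0] = left, link[1] = right. */
    unsigned char tag[2];     /* CHILD or THREAD for each link. */
    int key;
  };

/* The same inorder-based transformation as tree_to_vine() above:
   p steps through the nodes in inorder; each trip through the loop
   makes p the left child of its successor q.  Returns the new root,
   which is the greatest node in the tree. */
static struct node *
to_vine (struct node *root)
{
  struct node *p = root;
  if (p == NULL)
    return NULL;
  while (p->tag[0] == CHILD)
    p = p->link[0];
  for (;;)
    {
      struct node *q = p->link[1];
      if (p->tag[1] == CHILD)
        {
          while (q->tag[0] == CHILD)
            q = q->link[0];
          p->tag[1] = THREAD;
          p->link[1] = q;
        }
      if (q == NULL)
        return p;
      q->tag[0] = CHILD;
      q->link[0] = p;
      p = q;
    }
}
```

Building the walkthrough's tree (3 at the root, 1 as its left child, 2 as 1's right child) and converting it yields the vine 3-2-1, with every right link a thread to the node's successor, passing through the temporary two-parent state along the way.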
Consider the basic right rotation transformation used during a compression: | | R B _.-' \ / `_ B c => a R / \ / \ a b b c The rotation does not disturb a or c, so the only node that can cause trouble is b. If b is a real child node, then there's no need to do anything differently. But if b is a thread, then we have to swap around the direction of the thread, like this: | | R B __..-' \ / `._ B c => a R / \ _' \ a [R] [B] c After a rotation that involves a thread, the next rotation on B will not involve a thread. So after we perform a rotation that adjusts a thread in one place, the next one in the same place will not require a thread adjustment. Every node in the vine we start with has a thread as its right link. This means that during the first pass along the main vine we must perform thread adjustments at every node, but subsequent passes along the vine must not perform any adjustments. This simple idea is complicated by the initial partial compression pass in trees that do not have exactly one fewer than a power of two nodes. After a partial compression pass, the nodes at the top of the main vine no longer have right threads, but the ones farther down still do. We deal with this complication by defining the compress() function so it can handle a mixture of rotations with and without right threads. The rotations that need thread adjustments will always be below the ones that do not, so this function simply takes a pair of parameters, the first specifying how many rotations without thread adjustment to perform, the next how many with thread adjustment. Compare this code to that for unthreaded BSTs: 286. = /* Performs a nonthreaded compression operation nonthread times, then a threaded compression operation thread times, starting at root. 
*/ static void compress (struct tbst_node *root, unsigned long nonthread, unsigned long thread) { assert (root != NULL); while (nonthread--) { struct tbst_node *red = root->tbst_link[0]; struct tbst_node *black = red->tbst_link[0]; root->tbst_link[0] = black; red->tbst_link[0] = black->tbst_link[1]; black->tbst_link[1] = red; root = black; } while (thread--) { struct tbst_node *red = root->tbst_link[0]; struct tbst_node *black = red->tbst_link[0]; root->tbst_link[0] = black; red->tbst_link[0] = black; red->tbst_tag[0] = TBST_THREAD; black->tbst_tag[1] = TBST_CHILD; root = black; } } This code is included in 282. When we reduce the general case to the 2**n - 1 special case, all of the rotations adjust threads: 287. = compress ((struct tbst_node *) &tree->tbst_root, 0, leaves); This code is included in 285. We deal with the first compression specially, in order to clean up any remaining unadjusted threads: 288. = vine = tree->tbst_count - leaves; height = 1 + (leaves > 0); if (vine > 1) { unsigned long nonleaves = vine / 2; leaves /= 2; if (leaves > nonleaves) { leaves = nonleaves; nonleaves = 0; } else nonleaves -= leaves; compress ((struct tbst_node *) &tree->tbst_root, leaves, nonleaves); vine /= 2; height++; } See also 289. This code is included in 285. After this, all the remaining compressions use only rotations without thread adjustment, and we're done: 289. += while (vine > 1) { compress ((struct tbst_node *) &tree->tbst_root, vine / 2, 0); vine /= 2; height++; } 7.12 Testing ============ There's little new in the testing code. We do add a test for tbst_balance(), because none of the existing tests exercise it. This test doesn't check that tbst_balance() actually balances the tree; it just verifies that afterwards the tree contains the items it should, so to be certain that balancing is correct, turn up the verbosity and look at the trees printed. Function print_tree_structure() prints thread node numbers preceded by `>', with null threads indicated by `>>'.
This notation is compatible with the plain text output format of the `texitree' program used to draw the binary trees in this book. (It will cause errors for PostScript output because it omits node names.) 290. = #include #include #include #include "tbst.h" #include "test.h" tbst 104> tbst 122> 291. = void print_tree_structure (struct tbst_node *node, int level) { int i; if (level > 16) { printf ("[...]"); return; } if (node == NULL) { printf (""); return; } printf ("%d(", node->tbst_data ? *(int *) node->tbst_data : -1); for (i = 0; i <= 1; i++) { if (node->tbst_tag[i] == TBST_CHILD) { if (node->tbst_link[i] == node) printf ("loop"); else print_tree_structure (node->tbst_link[i], level + 1); } else if (node->tbst_link[i] != NULL) printf (">%d", (node->tbst_link[i]->tbst_data ? *(int *) node->tbst_link[i]->tbst_data : -1)); else printf (">>"); if (i == 0) fputs (", ", stdout); } putchar (')'); } void print_whole_tree (const struct tbst_table *tree, const char *title) { printf ("%s: ", title); print_tree_structure (tree->tbst_root, 0); putchar ('\n'); } This code is included in 290, 330, and 368. 292. = static int compare_trees (struct tbst_node *a, struct tbst_node *b) { int okay; if (a == NULL || b == NULL) { if (a != NULL || b != NULL) { printf (" a=%d b=%d\n", a ? *(int *) a->tbst_data : -1, b ? 
*(int *) b->tbst_data : -1); assert (0); } return 1; } assert (a != b); if (*(int *) a->tbst_data != *(int *) b->tbst_data || a->tbst_tag[0] != b->tbst_tag[0] || a->tbst_tag[1] != b->tbst_tag[1]) { printf (" Copied nodes differ: a=%d b=%d a:", *(int *) a->tbst_data, *(int *) b->tbst_data); if (a->tbst_tag[0] == TBST_CHILD) printf ("l"); if (a->tbst_tag[1] == TBST_CHILD) printf ("r"); printf (" b:"); if (b->tbst_tag[0] == TBST_CHILD) printf ("l"); if (b->tbst_tag[1] == TBST_CHILD) printf ("r"); printf ("\n"); return 0; } if (a->tbst_tag[0] == TBST_THREAD) assert ((a->tbst_link[0] == NULL) != (a->tbst_link[0] != b->tbst_link[0])); if (a->tbst_tag[1] == TBST_THREAD) assert ((a->tbst_link[1] == NULL) != (a->tbst_link[1] != b->tbst_link[1])); okay = 1; if (a->tbst_tag[0] == TBST_CHILD) okay &= compare_trees (a->tbst_link[0], b->tbst_link[0]); if (a->tbst_tag[1] == TBST_CHILD) okay &= compare_trees (a->tbst_link[1], b->tbst_link[1]); return okay; } This code is included in 290. 293. = static void recurse_verify_tree (struct tbst_node *node, int *okay, size_t *count, int min, int max) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ if (node == NULL) { *count = 0; return; } d = *(int *) node->tbst_data; subcount[0] = subcount[1] = 0; if (node->tbst_tag[0] == TBST_CHILD) recurse_verify_tree (node->tbst_link[0], okay, &subcount[0], min, d - 1); if (node->tbst_tag[1] == TBST_CHILD) recurse_verify_tree (node->tbst_link[1], okay, &subcount[1], d + 1, max); *count = 1 + subcount[0] + subcount[1]; } This code is included in 290. 294. = static int verify_tree (struct tbst_table *tree, int array[], size_t n) { int okay = 1; bst_count is correct; bst => tbst 110> if (okay) { tbst 111> } if (okay) { tbst 115> } if (okay) { tbst 116> } if (okay) { tbst 117> } if (okay) { tbst 118> } return okay; } This code is included in 290. 295. 
= int test_correctness (struct libavl_allocator *allocator, int insert[], int delete[], int n, int verbosity) { struct tbst_table *tree; int okay = 1; int i; tbst 102> tbst 103> tbst 105> tbst 108> return okay; } This code is included in 290, 411, and 515. 296. = /* Test tbst_balance(). */ if (verbosity >= 2) printf (" Testing balancing...\n"); tree = tbst_create (compare_ints, NULL, allocator); if (tree == NULL) { if (verbosity >= 0) printf (" Out of memory creating tree.\n"); return 1; } for (i = 0; i < n; i++) { void **p = tbst_probe (tree, &insert[i]); if (p == NULL) { if (verbosity >= 0) printf (" Out of memory in insertion.\n"); tbst_destroy (tree, NULL); return 1; } if (*p != &insert[i]) printf (" Duplicate item in tree!\n"); } if (verbosity >= 4) print_whole_tree (tree, " Pre-balance"); tbst_balance (tree); if (verbosity >= 4) print_whole_tree (tree, " Post-balance"); if (!verify_tree (tree, insert, n)) return 0; tbst_destroy (tree, NULL); This code is included in 295. 8 Threaded AVL Trees ******************** The previous chapter introduced a new concept in BSTs, the idea of threads. Threads allowed us to simplify traversals and eliminate the use of stacks. On the other hand, threaded trees can still grow tall enough that they reduce the program's performance unacceptably, the problem that balanced trees were meant to solve. Ideally, we'd like to add threads to balanced trees, to produce threaded balanced trees that combine the best of both worlds. We can do this, and it's not even very difficult. This chapter will show how to add threads to AVL trees. The next will show how to add them to red-black trees. Here's an outline of the table implementation for threaded AVL or "TAVL" trees that we'll develop in this chapter. Note the usage of prefix tavl_ for these functions. 297. = #ifndef TAVL_H #define TAVL_H 1 #include
tavl 14> tavl 28> tavl 250> tavl 267>
tavl 15> #endif /* tavl.h */ 298. = #include #include #include #include "tavl.h" 8.1 Data Types ============== The TAVL node structure takes the basic fields for a BST and adds a balance factor for AVL balancing and a pair of tag fields to allow for threading. 299. = /* Characterizes a link as a child pointer or a thread. */ enum tavl_tag { TAVL_CHILD, /* Child pointer. */ TAVL_THREAD /* Thread. */ }; /* A TAVL tree node. */ struct tavl_node { struct tavl_node *tavl_link[2]; /* Subtrees. */ void *tavl_data; /* Pointer to data. */ unsigned char tavl_tag[2]; /* Tag fields. */ signed char tavl_balance; /* Balance factor. */ }; This code is included in 297. Exercises: 1. struct avl_node contains three pointer members and a single character member, whereas struct tavl_node additionally contains an array of two characters. Is struct tavl_node necessarily larger than struct avl_node? 8.2 Rotations ============= Rotations are just as useful in threaded BSTs as they are in unthreaded ones. We do need to re-examine the idea, though, to see how the presence of threads affects rotations. A generic rotation looks like this diagram taken from *Note BST Rotations::: | | Y X / \ / \ X c a Y ^ ^ a b b c Any of the subtrees labeled a, b, and c may in fact be threads. In the most extreme case, all of them are threads, and the rotation looks like this: Y X _.-' \ / `._ X [] [] Y / \ _' \ [] [Y] [X] [] As you can see, the thread from X to Y, represented by subtree b, reverses direction and becomes a thread from Y to X following a right rotation. This has to be handled as a special case in code for rotation. See Exercise 1 for details. On the other hand, there is no need to do anything special with threads originating in subtrees of a rotated node. This is a direct consequence of the locality and order-preserving properties of a rotation (*note BST Rotations::). Here's an example diagram to demonstrate.
Note in particular that the threads from A, B, and C point to the same nodes in both trees: Y X ___..--' `._ _.-' `--..___ X C A Y _.-' `._ _' \ / \ _.-' `._ A B [Y] [] [] [X] B C / \ _' \ _' \ _' \ [] [X] [X] [Y] [X] [Y] [Y] [] Exercises: 1. Write functions for right and left rotations in threaded BSTs, analogous to those for unthreaded BSTs developed in Exercise 4.3-2. 8.3 Operations ============== Now we'll implement all the usual operations for TAVL trees. We can reuse everything from TBSTs except insertion, deletion, and copy functions. Most of the copy function code will in fact be reused also. Here's the outline: 300. = tavl 252> tavl 253>
tavl 592> tavl 268> tavl 281> tavl 6>
tavl 594> This code is included in 298. 8.4 Insertion ============= Insertion into an AVL tree is not complicated much by the need to update threads. The outline is the same as before, and the code for step 3 and the local variable declarations can be reused entirely: 301. = void ** tavl_probe (struct tavl_table *tree, void *item) { tavl 147> assert (tree != NULL && item != NULL); tavl 150> } This code is included in 300. 8.4.1 Steps 1 and 2: Search and Insert -------------------------------------- The first step is a lot like the unthreaded AVL version in . There is an unfortunate special case for an empty tree, because a null pointer for tavl_root indicates an empty tree but in a nonempty tree we must seek a thread link. After we're done, p, not q as before, is the node below which a new node should be inserted, because the test for stepping outside the binary tree now comes before advancing p. 302. = z = (struct tavl_node *) &tree->tavl_root; y = tree->tavl_root; if (y != NULL) { for (q = z, p = y; ; q = p, p = p->tavl_link[dir]) { int cmp = tree->tavl_compare (item, p->tavl_data, tree->tavl_param); if (cmp == 0) return &p->tavl_data; if (p->tavl_balance != 0) z = q, y = p, k = 0; da[k++] = dir = cmp > 0; if (p->tavl_tag[dir] == TAVL_THREAD) break; } } else { p = z; dir = 0; } This code is included in 301. The insertion adds to the TBST code by setting the balance factor of the new node and handling the first insertion into an empty tree as a special case: 303. = tavl 256> n->tavl_balance = 0; if (tree->tavl_root == n) return &n->tavl_data; This code is included in 301. 8.4.2 Step 4: Rebalance ----------------------- Now we're finally to the interesting part, the rebalancing step. We can tell whether rebalancing is necessary based on the balance factor of y, the same as in unthreaded AVL insertion: 304. 
= if (y->tavl_balance == -2) { } else if (y->tavl_balance == +2) { } else return &n->tavl_data; z->tavl_link[y != z->tavl_link[0]] = w; return &n->tavl_data; This code is included in 301. We will examine the case of insertion in the left subtree of y, the node at which we must rebalance. We take x as y's child on the side of the new node, then, as for unthreaded AVL insertion, we distinguish two cases based on the balance factor of x: 305. = struct tavl_node *x = y->tavl_link[0]; if (x->tavl_balance == -1) { } else { } This code is included in 304. Case 1: x has - balance factor .............................. As for unthreaded insertion, we rotate right at y (*note Rebalancing AVL Trees::). Notice the resemblance of the following code to rotate_right() in the solution to Exercise 8.2-1. 306. = w = x; if (x->tavl_tag[1] == TAVL_THREAD) { x->tavl_tag[1] = TAVL_CHILD; y->tavl_tag[0] = TAVL_THREAD; y->tavl_link[0] = x; } else y->tavl_link[0] = x->tavl_link[1]; x->tavl_link[1] = y; x->tavl_balance = y->tavl_balance = 0; This code is included in 305. Case 2: x has + balance factor .............................. When x has a + balance factor, we perform the transformation shown below, which consists of a left rotation at x followed by a right rotation at y. This is the same transformation used in unthreaded insertion: | y | <--> w __.-' \ <0> x d / \ <+> => x y / \ ^ ^ a w a b c d ^ b c We could simply apply the standard code from Exercise 8.2-1 in each rotation (see Exercise 1), but it is just as straightforward to do both of the rotations together, then clean up any threads. Subtrees a and d cannot cause thread-related trouble, because they are not disturbed during the transformation: a remains x's left child and d remains y's right child. The children of w, subtrees b and c, do require handling. If subtree b is a thread, then after the rotation and before fix-up x's right link points to itself, and, similarly, if c is a thread then y's left link points to itself. 
These links must be changed into threads to w instead, and w's links must be tagged as child pointers. If both b and c are threads then the transformation looks like the diagram below, showing pre-rebalancing and post-rebalancing, post-fix-up views. The AVL balance rule implies that if b and c are threads then a and d are also: | y | <--> w ___...---' \ <0> x [] _.-' `._ <+> => x y / `._ / \ _' \ [] w [] [w] [w] [] _' \ [x] [y] The required code is heavily based on the corresponding code for unthreaded AVL rebalancing: 307. = tavl 156> if (w->tavl_tag[0] == TAVL_THREAD) { x->tavl_tag[1] = TAVL_THREAD; x->tavl_link[1] = w; w->tavl_tag[0] = TAVL_CHILD; } if (w->tavl_tag[1] == TAVL_THREAD) { y->tavl_tag[0] = TAVL_THREAD; y->tavl_link[0] = w; w->tavl_tag[1] = TAVL_CHILD; } This code is included in 305, 324, and 667. Exercises: 1. Rewrite in terms of the routines from Exercise 8.2-1. 8.4.3 Symmetric Case -------------------- Here is the corresponding code for the case where insertion occurs in the right subtree of y. 308. = struct tavl_node *x = y->tavl_link[1]; if (x->tavl_balance == +1) { } else { } This code is included in 304. 309. = w = x; if (x->tavl_tag[0] == TAVL_THREAD) { x->tavl_tag[0] = TAVL_CHILD; y->tavl_tag[1] = TAVL_THREAD; y->tavl_link[1] = x; } else y->tavl_link[1] = x->tavl_link[0]; x->tavl_link[0] = y; x->tavl_balance = y->tavl_balance = 0; This code is included in 308. 310. = tavl 159> if (w->tavl_tag[0] == TAVL_THREAD) { y->tavl_tag[1] = TAVL_THREAD; y->tavl_link[1] = w; w->tavl_tag[0] = TAVL_CHILD; } if (w->tavl_tag[1] == TAVL_THREAD) { x->tavl_tag[0] = TAVL_THREAD; x->tavl_link[0] = w; w->tavl_tag[1] = TAVL_CHILD; } This code is included in 308, 320, and 666. 8.5 Deletion ============ Deletion from a TAVL tree can be accomplished by combining our knowledge about AVL trees and threaded trees. From one perspective, we add rebalancing to TBST deletion. From the other perspective, we add thread handling to AVL tree deletion. 
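The rebalancing chunks above fold their rotations and thread fix-ups together, but the underlying primitive is the thread-aware rotation of Exercise 8.2-1. As a standalone illustration (a sketch on a simplified node type, not libavl's exercise solution verbatim), a right rotation that handles the special case where the middle subtree b is a thread might look like this:

```c
#include <assert.h>
#include <stddef.h>

enum tag { CHILD, THREAD };

struct node
  {
    struct node *link[2];     /* link[0] = left, link[1] = right. */
    unsigned char tag[2];     /* CHILD or THREAD for each link. */
    int key;
  };

/* Rotates right at *yp.  Subtrees a and c need no attention, but if
   the middle subtree b is a thread from x to y, it must reverse into
   a thread from y back to x; otherwise b simply changes parents. */
static void
rotate_right (struct node **yp)
{
  struct node *y = *yp;
  struct node *x = y->link[0];

  if (x->tag[1] == THREAD)
    {
      x->tag[1] = CHILD;      /* x's right link now really holds y... */
      y->tag[0] = THREAD;     /* ...and y's left link threads to x. */
      y->link[0] = x;
    }
  else
    y->link[0] = x->link[1];  /* Ordinary case: b changes parents. */
  x->link[1] = y;
  *yp = x;
}
```

The left rotation is the mirror image, exchanging the roles of link[0]/link[1] and tag[0]/tag[1] throughout.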
The function outline is about the same as usual. We do add a helper function for finding the parent of a TAVL node: 311. = tavl 327> void * tavl_delete (struct tavl_table *tree, const void *item) { struct tavl_node *p; /* Traverses tree to find node to delete. */ struct tavl_node *q; /* Parent of p. */ int dir; /* Index into q->tavl_link[] to get p. */ int cmp; /* Result of comparison between item and p. */ assert (tree != NULL && item != NULL); } This code is included in 300. 8.5.1 Step 1: Search -------------------- We use p to search down the tree and keep track of p's parent with q. We keep the invariant at the beginning of the loop here that q->tavl_link[dir] == p. As the final step, we record the item deleted and update the tree's item count. 312. = if (tree->tavl_root == NULL) return NULL; p = (struct tavl_node *) &tree->tavl_root; for (cmp = -1; cmp != 0; cmp = tree->tavl_compare (item, p->tavl_data, tree->tavl_param)) { dir = cmp > 0; q = p; if (p->tavl_tag[dir] == TAVL_THREAD) return NULL; p = p->tavl_link[dir]; } item = p->tavl_data; This code is included in 311 and 670. 8.5.2 Step 2: Delete -------------------- The cases for deletion are the same as for a TBST (*note Deleting from a TBST::). The difference is that we have to copy around balance factors and keep track of where balancing needs to start. After the deletion, q is the node at which balance factors must be updated and possible rebalancing occurs and dir is the side of q from which the node was deleted. For cases 1 and 2, q need not change from its current value as the parent of the deleted node. For cases 3 and 4, q will need to be changed. 313. = if (p->tavl_tag[1] == TAVL_THREAD) { if (p->tavl_tag[0] == TAVL_CHILD) { } else { } } else { struct tavl_node *r = p->tavl_link[1]; if (r->tavl_tag[0] == TAVL_THREAD) { } else { } } tree->tavl_alloc->libavl_free (tree->tavl_alloc, p); This code is included in 311. 
Case 1: p has a right thread and a left child ............................................. If p has a right thread and a left child, then we replace it by its left child. Rebalancing must begin right above p, which is already set as q. There's no need to change the TBST code: 314. = tavl 260> This code is included in 313. Case 2: p has a right thread and a left thread .............................................. If p is a leaf, then we change q's pointer to p into a thread. Again, rebalancing must begin at the node that's already set up as q and there's no need to change the TBST code: 315. = tavl 261> This code is included in 313. Case 3: p's right child has a left thread ......................................... If p has a right child r, which in turn has no left child, then we move r in place of p. In this case r, having replaced p, acquires p's former balance factor and rebalancing must start from there. The deletion in this case is always on the right side of the node. 316. = tavl 262> r->tavl_balance = p->tavl_balance; q = r; dir = 1; This code is included in 313. Case 4: p's right child has a left child ........................................ The most general case comes up when p's right child has a left child, where we replace p by its successor s. In that case s acquires p's former balance factor and rebalancing begins from s's parent r. Node s is always the left child of r. 317. = tavl 263> s->tavl_balance = p->tavl_balance; q = r; dir = 0; This code is included in 313. Exercises: 1. Rewrite to replace the deleted node's tavl_data by its successor, then delete the successor, instead of shuffling pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) 8.5.3 Step 3: Update Balance Factors ------------------------------------ Rebalancing begins from node q, from whose side dir a node was deleted. 
Node q at the beginning of the iteration becomes node y, the root of the balance factor update and rebalancing, and dir at the beginning of the iteration is used to separate the left-side and right-side deletion cases. The loop also updates the values of q and dir for rebalancing and for use in the next iteration of the loop, if any. These new values can only be assigned after the old ones are no longer needed, but must be assigned before any rebalancing so that the parent link to y can be changed. For q this is after y receives q's old value and before rebalancing. For dir, it is after the branch point that separates the left-side and right-side deletion cases, so the dir assignment is duplicated in each branch. The code used to update q is discussed later. 318. = while (q != (struct tavl_node *) &tree->tavl_root) { struct tavl_node *y = q; q = find_parent (tree, y); if (dir == 0) { dir = q->tavl_link[0] != y; y->tavl_balance++; if (y->tavl_balance == +1) break; else if (y->tavl_balance == +2) { } } else { } } tree->tavl_count--; return (void *) item; This code is included in 311. 8.5.4 Step 4: Rebalance ----------------------- Rebalancing after deletion in a TAVL tree divides into three cases. The first of these is analogous to case 1 in unthreaded AVL deletion, the other two to case 2 (*note Inserting into a TBST::). The cases are distinguished, as usual, based on the balance factor of right child x of the node y at which rebalancing occurs: 319. = struct tavl_node *x = y->tavl_link[1]; assert (x != NULL); if (x->tavl_balance == -1) { } else { q->tavl_link[dir] = x; if (x->tavl_balance == 0) { break; } else /* x->tavl_balance == +1 */ { } } This code is included in 318. Case 1: x has - balance factor .............................. This case is just like case 2 in TAVL insertion. In fact, we can even reuse the code: 320. = struct tavl_node *w; q->tavl_link[dir] = w; This code is included in 319. Case 2: x has 0 balance factor .............................. 
If x has a 0 balance factor, then we perform a left rotation at y. The transformation looks like this, with subtree heights listed under their labels: | | s r <++> <-> _' `_ _.-' \ a r => s c h-1 <0> <+> h / \ _' \ b c a b h h h-1 h Subtree b is taller than subtree a, so even if h takes its minimum value of 1, then subtree b has height h == 1 and, therefore, it must contain at least one node and there is no need to do any checking for threads. The code is simple: 321. = y->tavl_link[1] = x->tavl_link[0]; x->tavl_link[0] = y; x->tavl_balance = -1; y->tavl_balance = +1; This code is included in 319 and 443. Case 3: x has + balance factor .............................. If x has a + balance factor, we perform a left rotation at y, same as for case 2, and the transformation looks like this: | | s s <++> <0> _' `._ __..-' \ a r => r c h-1 <+> <0> h _' \ _' \ b c a b h-1 h h-1 h-1 One difference from case 2 is in the resulting balance factors. The other is that if h == 1, then subtrees a and b have height h - 1 == 0, so a and b may actually be threads. In that case, the transformation must be done this way: | s | <++> r / `._ <0> [] r __..-' `._ <+> => s c _' `._ <0> <0> [s] c / \ _' \ <0> [] [r] [r] [] _' \ [r] [] This code handles both possibilities: 322. = if (x->tavl_tag[0] == TAVL_CHILD) y->tavl_link[1] = x->tavl_link[0]; else { y->tavl_tag[1] = TAVL_THREAD; x->tavl_tag[0] = TAVL_CHILD; } x->tavl_link[0] = y; y->tavl_balance = x->tavl_balance = 0; This code is included in 319. 8.5.5 Symmetric Case -------------------- Here's the code for the symmetric case. 323. = dir = q->tavl_link[0] != y; y->tavl_balance--; if (y->tavl_balance == -1) break; else if (y->tavl_balance == -2) { struct tavl_node *x = y->tavl_link[0]; assert (x != NULL); if (x->tavl_balance == +1) { } else { q->tavl_link[dir] = x; if (x->tavl_balance == 0) { break; } else /* x->tavl_balance == -1 */ { } } } This code is included in 318. 324. 
= struct tavl_node *w; q->tavl_link[dir] = w; This code is included in 323. 325. = y->tavl_link[0] = x->tavl_link[1]; x->tavl_link[1] = y; x->tavl_balance = +1; y->tavl_balance = -1; This code is included in 323 and 444. 326. = if (x->tavl_tag[1] == TAVL_CHILD) y->tavl_link[0] = x->tavl_link[1]; else { y->tavl_tag[0] = TAVL_THREAD; x->tavl_tag[1] = TAVL_CHILD; } x->tavl_link[1] = y; y->tavl_balance = x->tavl_balance = 0; This code is included in 323. 8.5.6 Finding the Parent of a Node ---------------------------------- The last component of tavl_delete() left undiscussed is the implementation of its helper function find_parent(), which requires an algorithm for finding the parent of an arbitrary node in a TAVL tree. If there were no efficient algorithm for this purpose, we would have to keep a stack of parent nodes as we did for unthreaded AVL trees. (This is still an option, as shown in Exercise 3.) We are fortunate that such an algorithm does exist. Let's discover it. Because child pointers always lead downward in a BST, the only way that we're going to get from one node to another one above it is by following a thread. Almost directly from our definition of threads, we know that if a node q has a right child p, then there is a left thread in the subtree rooted at p that points back to q. Because a left thread points from a node to its predecessor, this left thread to q must come from q's successor, which we'll call s. The situation looks like this: q / `--...___ a p _' \ ... b _' s _' \ [q] c This leads immediately to an algorithm to find q given p, if p is q's right child. We simply follow left links starting at p until we reach a thread, then we follow that thread. On the other hand, it doesn't help if p is q's left child, but there's an analogous situation with q's predecessor in that case. Will this algorithm work for any node in a TBST? It won't work for the root node, because no thread points above the root (see Exercise 2).
It will work for any other node, because any node other than the root has its successor or predecessor as its parent. Here is the actual code, which finds and returns the parent of node. It traverses both the left and right subtrees of node at once, using x to move down to the left and y to move down to the right. When it hits a thread on one side, it checks whether it leads to node's parent. If it does, then we're done. If it doesn't, then we continue traversing along the other side, which is guaranteed to lead to node's parent. 327. = /* Returns the parent of node within tree, or a pointer to tbst_root if node is the root of the tree. */ static struct tbst_node * find_parent (struct tbst_table *tree, struct tbst_node *node) { if (node != tree->tbst_root) { struct tbst_node *x, *y; for (x = y = node; ; x = x->tbst_link[0], y = y->tbst_link[1]) if (y->tbst_tag[1] == TBST_THREAD) { struct tbst_node *p = y->tbst_link[1]; if (p == NULL || p->tbst_link[0] != node) { while (x->tbst_tag[0] == TBST_CHILD) x = x->tbst_link[0]; p = x->tbst_link[0]; } return p; } else if (x->tbst_tag[0] == TBST_THREAD) { struct tbst_node *p = x->tbst_link[0]; if (p == NULL || p->tbst_link[1] != node) { while (y->tbst_tag[1] == TBST_CHILD) y = y->tbst_link[1]; p = y->tbst_link[1]; } return p; } } else return (struct tbst_node *) &tree->tbst_root; } This code is included in 311, 668, and 670. See also: [Knuth 1997], exercise 2.3.1-19. Exercises: *1. Show that finding the parent of a given node using this algorithm, averaged over all the nodes within a TBST, requires only a constant number of links to be followed. 2. The structure of threads in our TBSTs forces finding the parent of the root node to be special-cased. Suggest a modification to the tree structure to avoid this. 3. It can take several steps to find the parent of an arbitrary node in a TBST, even though the operation is "efficient" in the sense of Exercise 7.7-4.
On the other hand, finding the parent of a node is very fast with a stack, but it costs time to construct the stack. Rewrite tavl_delete() to use a stack instead of the parent node algorithm. 8.6 Copying =========== We can use the tree copy function for TBSTs almost verbatim here. The one necessary change is that copy_node() must copy node balance factors. Here's the new version: 328. = static int copy_node (struct tavl_table *tree, struct tavl_node *dst, int dir, const struct tavl_node *src, tavl_copy_func *copy) { struct tavl_node *new = tree->tavl_alloc->libavl_malloc (tree->tavl_alloc, sizeof *new); if (new == NULL) return 0; new->tavl_link[dir] = dst->tavl_link[dir]; new->tavl_tag[dir] = TAVL_THREAD; new->tavl_link[!dir] = dst; new->tavl_tag[!dir] = TAVL_THREAD; dst->tavl_link[dir] = new; dst->tavl_tag[dir] = TAVL_CHILD; new->tavl_balance = src->tavl_balance; if (copy == NULL) new->tavl_data = src->tavl_data; else { new->tavl_data = copy (src->tavl_data, tree->tavl_param); if (new->tavl_data == NULL) return 0; } return 1; } This code is included in 329. 329. = tavl 280> tavl 279> This code is included in 300 and 336. 8.7 Testing =========== The testing code harbors no surprises. 330. = #include <assert.h> #include <limits.h> #include <stdio.h> #include "tavl.h" #include "test.h" tavl 291> tavl 104> tavl 190> tavl 100> tavl 122> 331. = static int compare_trees (struct tavl_node *a, struct tavl_node *b) { int okay; if (a == NULL || b == NULL) { if (a != NULL || b != NULL) { printf (" a=%d b=%d\n", a ? *(int *) a->tavl_data : -1, b ?
*(int *) b->tavl_data : -1); assert (0); } return 1; } assert (a != b); if (*(int *) a->tavl_data != *(int *) b->tavl_data || a->tavl_tag[0] != b->tavl_tag[0] || a->tavl_tag[1] != b->tavl_tag[1] || a->tavl_balance != b->tavl_balance) { printf (" Copied nodes differ: a=%d (bal=%d) b=%d (bal=%d) a:", *(int *) a->tavl_data, a->tavl_balance, *(int *) b->tavl_data, b->tavl_balance); if (a->tavl_tag[0] == TAVL_CHILD) printf ("l"); if (a->tavl_tag[1] == TAVL_CHILD) printf ("r"); printf (" b:"); if (b->tavl_tag[0] == TAVL_CHILD) printf ("l"); if (b->tavl_tag[1] == TAVL_CHILD) printf ("r"); printf ("\n"); return 0; } if (a->tavl_tag[0] == TAVL_THREAD) assert ((a->tavl_link[0] == NULL) != (a->tavl_link[0] != b->tavl_link[0])); if (a->tavl_tag[1] == TAVL_THREAD) assert ((a->tavl_link[1] == NULL) != (a->tavl_link[1] != b->tavl_link[1])); okay = 1; if (a->tavl_tag[0] == TAVL_CHILD) okay &= compare_trees (a->tavl_link[0], b->tavl_link[0]); if (a->tavl_tag[1] == TAVL_CHILD) okay &= compare_trees (a->tavl_link[1], b->tavl_link[1]); return okay; } This code is included in 330. 332. = static void recurse_verify_tree (struct tavl_node *node, int *okay, size_t *count, int min, int max, int *height) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subheight[2]; /* Heights of subtrees. */ if (node == NULL) { *count = 0; *height = 0; return; } d = *(int *) node->tavl_data; subcount[0] = subcount[1] = 0; subheight[0] = subheight[1] = 0; if (node->tavl_tag[0] == TAVL_CHILD) recurse_verify_tree (node->tavl_link[0], okay, &subcount[0], min, d - 1, &subheight[0]); if (node->tavl_tag[1] == TAVL_CHILD) recurse_verify_tree (node->tavl_link[1], okay, &subcount[1], d + 1, max, &subheight[1]); *count = 1 + subcount[0] + subcount[1]; *height = 1 + (subheight[0] > subheight[1] ? subheight[0] : subheight[1]); tavl 189> } This code is included in 330. 
9 Threaded Red-Black Trees ************************** In the last two chapters, we introduced the idea of a threaded binary search tree, then applied that idea to AVL trees to produce threaded AVL trees. In this chapter, we will apply the idea of threading to red-black trees, resulting in threaded red-black or "TRB" trees. Here's an outline of the table implementation for threaded RB trees, which uses a trb_ prefix. 333. = #ifndef TRB_H #define TRB_H 1 #include <stddef.h>
trb 14> trb 195> trb 250> trb 267>
trb 15> #endif /* trb.h */ 334. = #include <assert.h> #include <stdio.h> #include <stdlib.h> #include "trb.h" 9.1 Data Types ============== To make an RB tree node structure into a threaded RB tree node structure, we just add a pair of tag fields. We also reintroduce a maximum height definition here. It is not used by traversers, only by the default versions of trb_probe() and trb_delete(), for maximum efficiency. 335. = /* Color of a red-black node. */ enum trb_color { TRB_BLACK, /* Black. */ TRB_RED /* Red. */ }; /* Characterizes a link as a child pointer or a thread. */ enum trb_tag { TRB_CHILD, /* Child pointer. */ TRB_THREAD /* Thread. */ }; /* A TRB tree node. */ struct trb_node { struct trb_node *trb_link[2]; /* Subtrees. */ void *trb_data; /* Pointer to data. */ unsigned char trb_color; /* Color. */ unsigned char trb_tag[2]; /* Tag fields. */ }; This code is included in 333. 9.2 Operations ============== Now we'll implement all the usual operations for TRB trees. Here's the outline. We can reuse everything from TBSTs except insertion, deletion, and copy functions. The copy function is implemented by reusing the version for TAVL trees, but copying colors instead of balance factors. 336. = trb 252> trb 253>
trb 592> trb 268> trb; tavl_balance => trb_color 329> trb 281> trb 6>
trb 594> This code is included in 334. 9.3 Insertion ============= The structure of the insertion routine is predictable: 337. = void ** trb_probe (struct trb_table *tree, void *item) { struct trb_node *pa[TRB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[TRB_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct trb_node *p; /* Traverses tree looking for insertion point. */ struct trb_node *n; /* Newly inserted node. */ int dir; /* Side of p on which n is inserted. */ assert (tree != NULL && item != NULL); return &n->trb_data; } This code is included in 336. 9.3.1 Steps 1 and 2: Search and Insert -------------------------------------- As usual, we search the tree from the root and record parents as we go. 338. = da[0] = 0; pa[0] = (struct trb_node *) &tree->trb_root; k = 1; if (tree->trb_root != NULL) { for (p = tree->trb_root; ; p = p->trb_link[dir]) { int cmp = tree->trb_compare (item, p->trb_data, tree->trb_param); if (cmp == 0) return &p->trb_data; pa[k] = p; da[k++] = dir = cmp > 0; if (p->trb_tag[dir] == TRB_THREAD) break; } } else { p = (struct trb_node *) &tree->trb_root; dir = 0; } This code is included in 337. The code for insertion is included within the loop for easy access to the dir variable. 339. = trb 256> n->trb_color = TRB_RED; This code is included in 337 and 668. 9.3.2 Step 3: Rebalance ----------------------- The basic rebalancing loop is unchanged from . 340. = while (k >= 3 && pa[k - 1]->trb_color == TRB_RED) { if (da[k - 2] == 0) { } else { } } tree->trb_root->trb_color = TRB_BLACK; This code is included in 337. The cases for rebalancing are the same as in , too. We do need to check for threads, instead of null pointers. 341. = struct trb_node *y = pa[k - 2]->trb_link[1]; if (pa[k - 2]->trb_tag[1] == TRB_CHILD && y->trb_color == TRB_RED) { } else { struct trb_node *x; if (da[k - 1] == 0) y = pa[k - 1]; else { } break; } This code is included in 340. 
The rest of this section deals with the individual rebalancing cases, the same as in unthreaded RB insertion (*note Inserting an RB Node Step 3 - Rebalance::). Each iteration deals with a node whose color has just been changed to red, which is the newly inserted node n in the first trip through the loop. In the discussion, we'll call this node q. Case 1: q's uncle is red ........................ If node q has a red "uncle", then only recoloring is required. Because no links are changed, no threads need to be updated, and we can reuse the code for RB insertion without change: 342. = trb 203> This code is included in 341. Case 2: q is the left child of its parent ......................................... If q is the left child of its parent, we rotate right at q's grandparent, and recolor a few nodes. Here's the transformation:

            |                            |
       pa[k-2],x                         y
 ___...---'     \                    _.-' `_
 pa[k-1],y       d        =>        q       x
   _.-' \                          / \     / \
  q      c                        a   b   c   d
 / \
a   b

This transformation can only cause thread problems with subtree c, since the other subtrees stay firmly in place. If c is a thread, then we need to make adjustments after the transformation to account for the difference between threaded and unthreaded rotation, so that the final operation looks like this:

            |                            |
       pa[k-2],x                         y
 ____....---'   \                    _.-' `._
 pa[k-1],y       d        =>        q        x
   _.-' \                          / \     _' \
  q     [x]                       a   b  [y]   d
 / \
a   b

343. = trb 204> if (y->trb_tag[1] == TRB_THREAD) { y->trb_tag[1] = TRB_CHILD; x->trb_tag[0] = TRB_THREAD; x->trb_link[0] = y; } This code is included in 341. Case 3: q is the right child of its parent .......................................... The modification to case 3 is the same as the modification to case 2, but it applies to a left rotation instead of a right rotation. The adjusted case looks like this:

               |                            |
            pa[k-2]                      pa[k-2]
 _____.....-----' \                     _.-'   \
 pa[k-1],x         d         =>        y        d
  __..-' \                           /  `._
 a       w,y                        x       c
          _' \                     / \
        [x]   c                   a   [y]

344.
= trb 205> if (y->trb_tag[0] == TRB_THREAD) { y->trb_tag[0] = TRB_CHILD; x->trb_tag[1] = TRB_THREAD; x->trb_link[1] = y; } This code is included in 341. 9.3.3 Symmetric Case -------------------- 345. = struct trb_node *y = pa[k - 2]->trb_link[0]; if (pa[k - 2]->trb_tag[0] == TRB_CHILD && y->trb_color == TRB_RED) { } else { struct trb_node *x; if (da[k - 1] == 1) y = pa[k - 1]; else { } break; } This code is included in 340. 346. = trb 207> This code is included in 345. 347. = trb 208> if (y->trb_tag[0] == TRB_THREAD) { y->trb_tag[0] = TRB_CHILD; x->trb_tag[1] = TRB_THREAD; x->trb_link[1] = y; } This code is included in 345. 348. = trb 209> if (y->trb_tag[1] == TRB_THREAD) { y->trb_tag[1] = TRB_CHILD; x->trb_tag[0] = TRB_THREAD; x->trb_link[0] = y; } This code is included in 345. Exercises: 1. It could be argued that the algorithm here is "impure" because it uses a stack, when elimination of the need for a stack is one of the reasons originally given for using threaded trees. Write a version of trb_probe() that avoids the use of a stack. You can use find_parent() from the previous chapter as a substitute. 9.4 Deletion ============ The outline for the deletion function follows the usual pattern. 349. = void * trb_delete (struct trb_table *tree, const void *item) { struct trb_node *pa[TRB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[TRB_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k = 0; /* Stack height. */ struct trb_node *p; int cmp, dir; assert (tree != NULL && item != NULL); } This code is included in 336. 9.4.1 Step 1: Search -------------------- There's nothing new or interesting in the search code. 350. = if (tree->trb_root == NULL) return NULL; p = (struct trb_node *) &tree->trb_root; for (cmp = -1; cmp != 0; cmp = tree->trb_compare (item, p->trb_data, tree->trb_param)) { dir = cmp > 0; pa[k] = p; da[k++] = dir; if (p->trb_tag[dir] == TRB_THREAD) return NULL; p = p->trb_link[dir]; } item = p->trb_data; This code is included in 349 and 659.
9.4.2 Step 2: Delete -------------------- The code for node deletion is a combination of RB deletion (*note Deleting an RB Node Step 2 - Delete::) and TBST deletion (*note Deleting from a TBST::). The node to delete is p, and after deletion the stack contains all the nodes down to where rebalancing begins. The cases are the same as for TBST deletion: 351. = if (p->trb_tag[1] == TRB_THREAD) { if (p->trb_tag[0] == TRB_CHILD) { } else { } } else { enum trb_color t; struct trb_node *r = p->trb_link[1]; if (r->trb_tag[0] == TRB_THREAD) { } else { } } This code is included in 349. Case 1: p has a right thread and a left child ............................................. If the node to delete p has a right thread and a left child, then we replace it by its left child. We also have to chase down the right thread that pointed to p. The code is almost the same as , but we use the stack here instead of a single parent pointer. 352. = struct trb_node *t = p->trb_link[0]; while (t->trb_tag[1] == TRB_CHILD) t = t->trb_link[1]; t->trb_link[1] = p->trb_link[1]; pa[k - 1]->trb_link[da[k - 1]] = p->trb_link[0]; This code is included in 351. Case 2: p has a right thread and a left thread .............................................. Deleting a leaf node is the same process as for a TBST. The changes from are again due to the use of a stack. 353. = pa[k - 1]->trb_link[da[k - 1]] = p->trb_link[da[k - 1]]; if (pa[k - 1] != (struct trb_node *) &tree->trb_root) pa[k - 1]->trb_tag[da[k - 1]] = TRB_THREAD; This code is included in 351. Case 3: p's right child has a left thread ......................................... The code for case 3 merges with . First, the node is deleted in the same way used for a TBST. Then the colors of p and r are swapped, and r is added to the stack, in the same way as for RB deletion. 354. 
= r->trb_link[0] = p->trb_link[0]; r->trb_tag[0] = p->trb_tag[0]; if (r->trb_tag[0] == TRB_CHILD) { struct trb_node *t = r->trb_link[0]; while (t->trb_tag[1] == TRB_CHILD) t = t->trb_link[1]; t->trb_link[1] = r; } pa[k - 1]->trb_link[da[k - 1]] = r; t = r->trb_color; r->trb_color = p->trb_color; p->trb_color = t; da[k] = 1; pa[k++] = r; This code is included in 351. Case 4: p's right child has a left child ........................................ Case 4 is a mix of and . It follows the outline of TBST deletion, but updates the stack. After the deletion it also swaps the colors of p and s as in RB deletion. 355. = struct trb_node *s; int j = k++; for (;;) { da[k] = 0; pa[k++] = r; s = r->trb_link[0]; if (s->trb_tag[0] == TRB_THREAD) break; r = s; } da[j] = 1; pa[j] = s; if (s->trb_tag[1] == TRB_CHILD) r->trb_link[0] = s->trb_link[1]; else { r->trb_link[0] = s; r->trb_tag[0] = TRB_THREAD; } s->trb_link[0] = p->trb_link[0]; if (p->trb_tag[0] == TRB_CHILD) { struct trb_node *t = p->trb_link[0]; while (t->trb_tag[1] == TRB_CHILD) t = t->trb_link[1]; t->trb_link[1] = s; s->trb_tag[0] = TRB_CHILD; } s->trb_link[1] = p->trb_link[1]; s->trb_tag[1] = TRB_CHILD; t = s->trb_color; s->trb_color = p->trb_color; p->trb_color = t; pa[j - 1]->trb_link[da[j - 1]] = s; This code is included in 351. Exercises: 1. Rewrite to replace the deleted node's trb_data by its successor, then delete the successor, instead of shuffling pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) 9.4.3 Step 3: Rebalance ----------------------- The outline for rebalancing after threaded RB deletion is the same as for the unthreaded case (*note Deleting an RB Node Step 3 - Rebalance::): 356.
= if (p->trb_color == TRB_BLACK) { for (; k > 1; k--) { if (pa[k - 1]->trb_tag[da[k - 1]] == TRB_CHILD) { struct trb_node *x = pa[k - 1]->trb_link[da[k - 1]]; if (x->trb_color == TRB_RED) { x->trb_color = TRB_BLACK; break; } } if (da[k - 1] == 0) { } else { } } if (tree->trb_root != NULL) tree->trb_root->trb_color = TRB_BLACK; } This code is included in 349. The rebalancing cases are the same, too. We need to check for thread tags, not for null pointers, though, in some places: 357. = struct trb_node *w = pa[k - 1]->trb_link[1]; if (w->trb_color == TRB_RED) { } if ((w->trb_tag[0] == TRB_THREAD || w->trb_link[0]->trb_color == TRB_BLACK) && (w->trb_tag[1] == TRB_THREAD || w->trb_link[1]->trb_color == TRB_BLACK)) { } else { if (w->trb_tag[1] == TRB_THREAD || w->trb_link[1]->trb_color == TRB_BLACK) { } break; } This code is included in 356. Case Reduction: Ensure w is black ................................. This transformation does not move around any subtrees that might be threads, so there is no need for it to change. 358. = trb 228> This code is included in 357. Case 1: w has no red children ............................. This transformation just recolors nodes, so it also does not need any changes. 359. = trb 229> This code is included in 357. Case 2: w's right child is red .............................. If w has a red right child and a left thread, then it is necessary to adjust tags and links after the left rotation at w and recoloring, as shown in this diagram:

         |                               |
    pa[k-1],B                            C
     _.-'  `._                     __..-'  `_
   x,A      w,C          =>       B          D
   / \     _'  `_              _.-' \       / \
  a   b  [B]    D             A     [C]    d   e
               / \           / \
              d   e         a   b

360. = trb 230> if (w->trb_tag[0] == TRB_THREAD) { w->trb_tag[0] = TRB_CHILD; pa[k - 1]->trb_tag[1] = TRB_THREAD; pa[k - 1]->trb_link[1] = w; } This code is included in 357. Case 3: w's left child is red .............................
If w has a red left child, which has a right thread, then we again need to adjust tags and links after right rotation at w and recoloring, as shown here:

        |                                  |
   pa[k-1],B                          pa[k-1],B
   _.-'  `--...___                     _.-'  `_
 x,A             w,D       =>        x,A      w,C
 / \         __..-'  \               / \     /  `._
a   b       C         e             a   b   c      D
           / \                                   _' \
          c  [D]                               [C]   e

361. = trb 231> if (w->trb_tag[1] == TRB_THREAD) { w->trb_tag[1] = TRB_CHILD; w->trb_link[1]->trb_tag[0] = TRB_THREAD; w->trb_link[1]->trb_link[0] = w; } This code is included in 357. 9.4.4 Step 4: Finish Up ----------------------- All that's left to do is free the node, update the count, and return the deleted item: 362. = tree->trb_alloc->libavl_free (tree->trb_alloc, p); tree->trb_count--; return (void *) item; This code is included in 349. 9.4.5 Symmetric Case -------------------- 363. = struct trb_node *w = pa[k - 1]->trb_link[0]; if (w->trb_color == TRB_RED) { } if ((w->trb_tag[0] == TRB_THREAD || w->trb_link[0]->trb_color == TRB_BLACK) && (w->trb_tag[1] == TRB_THREAD || w->trb_link[1]->trb_color == TRB_BLACK)) { } else { if (w->trb_tag[0] == TRB_THREAD || w->trb_link[0]->trb_color == TRB_BLACK) { } break; } This code is included in 356. 364. = trb 234> This code is included in 363. 365. = trb 235> This code is included in 363. 366. = trb 237> if (w->trb_tag[1] == TRB_THREAD) { w->trb_tag[1] = TRB_CHILD; pa[k - 1]->trb_tag[0] = TRB_THREAD; pa[k - 1]->trb_link[0] = w; } This code is included in 363. 367. = trb 236> if (w->trb_tag[0] == TRB_THREAD) { w->trb_tag[0] = TRB_CHILD; w->trb_link[0]->trb_tag[1] = TRB_THREAD; w->trb_link[0]->trb_link[1] = w; } This code is included in 363. Exercises: 1. Write another version of trb_delete() that does not use a stack. You can use the find_parent() function to find the parent of a node. 9.5 Testing =========== The testing code harbors no surprises. 368. = #include <assert.h> #include <limits.h> #include <stdio.h> #include "trb.h" #include "test.h" trb 291> trb 104> trb 244> trb 100> trb 122> 369.
= static int compare_trees (struct trb_node *a, struct trb_node *b) { int okay; if (a == NULL || b == NULL) { if (a != NULL || b != NULL) { printf (" a=%d b=%d\n", a ? *(int *) a->trb_data : -1, b ? *(int *) b->trb_data : -1); assert (0); } return 1; } assert (a != b); if (*(int *) a->trb_data != *(int *) b->trb_data || a->trb_tag[0] != b->trb_tag[0] || a->trb_tag[1] != b->trb_tag[1] || a->trb_color != b->trb_color) { printf (" Copied nodes differ: a=%d%c b=%d%c a:", *(int *) a->trb_data, a->trb_color == TRB_RED ? 'r' : 'b', *(int *) b->trb_data, b->trb_color == TRB_RED ? 'r' : 'b'); if (a->trb_tag[0] == TRB_CHILD) printf ("l"); if (a->trb_tag[1] == TRB_CHILD) printf ("r"); printf (" b:"); if (b->trb_tag[0] == TRB_CHILD) printf ("l"); if (b->trb_tag[1] == TRB_CHILD) printf ("r"); printf ("\n"); return 0; } if (a->trb_tag[0] == TRB_THREAD) assert ((a->trb_link[0] == NULL) != (a->trb_link[0] != b->trb_link[0])); if (a->trb_tag[1] == TRB_THREAD) assert ((a->trb_link[1] == NULL) != (a->trb_link[1] != b->trb_link[1])); okay = 1; if (a->trb_tag[0] == TRB_CHILD) okay &= compare_trees (a->trb_link[0], b->trb_link[0]); if (a->trb_tag[1] == TRB_CHILD) okay &= compare_trees (a->trb_link[1], b->trb_link[1]); return okay; } This code is included in 368. 370. = static void recurse_verify_tree (struct trb_node *node, int *okay, size_t *count, int min, int max, int *bh) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subbh[2]; /* Black-heights of subtrees. 
*/ if (node == NULL) { *count = 0; *bh = 0; return; } d = *(int *) node->trb_data; subcount[0] = subcount[1] = 0; subbh[0] = subbh[1] = 0; if (node->trb_tag[0] == TRB_CHILD) recurse_verify_tree (node->trb_link[0], okay, &subcount[0], min, d - 1, &subbh[0]); if (node->trb_tag[1] == TRB_CHILD) recurse_verify_tree (node->trb_link[1], okay, &subcount[1], d + 1, max, &subbh[1]); *count = 1 + subcount[0] + subcount[1]; *bh = (node->trb_color == TRB_BLACK) + subbh[0]; trb 241> trb 243> } This code is included in 368. 371. = /* Verify compliance with rule 1. */ if (node->trb_color == TRB_RED) { if (node->trb_tag[0] == TRB_CHILD && node->trb_link[0]->trb_color == TRB_RED) { printf (" Red node %d has red left child %d\n", d, *(int *) node->trb_link[0]->trb_data); *okay = 0; } if (node->trb_tag[1] == TRB_CHILD && node->trb_link[1]->trb_color == TRB_RED) { printf (" Red node %d has red right child %d\n", d, *(int *) node->trb_link[1]->trb_data); *okay = 0; } } This code is included in 370. 10 Right-Threaded Binary Search Trees ************************************* We originally introduced threaded trees to allow for traversal without maintaining a stack explicitly. This worked out well, so we implemented tables using threaded BSTs and AVL and RB trees. However, maintaining the threads can take some time. It would be nice if we could have the advantages of threads without so much of the overhead. In one common special case, we can. Threaded trees are symmetric: there are left threads for moving to node predecessors and right threads for moving to node successors. But traversals are not symmetric: many algorithms traverse table entries only from least to greatest, never backing up. This suggests a matching asymmetric tree structure that has only right threads. We can do this.
In this chapter, we will develop a table implementation for a new kind of binary tree, called a right-threaded binary search tree, "right-threaded tree", or simply "RTBST", that has threads only on the right side of nodes. Construction and modification of such trees can be faster and simpler than threaded trees because there is no need to maintain the left threads. There isn't anything fundamentally new here, but just for completeness, here's an example of a right-threaded tree:

               3
          _.-'  `--..__
         2             6
      _.-' \      __..-'  `-.__
     1     [3]   4             8
      \           \         _.-' \
      [2]          5       7      9
                    \       \      \
                    [6]     [8]     []

Keep in mind that although it is not efficient, it is still possible to traverse a right-threaded tree in order from greatest to least.(1) If it were not possible at all, then we could not build a complete table implementation based on right-threaded trees, because the definition of a table includes the ability to traverse it in either direction (*note Manipulators::). Here's the outline of the RTBST code, which uses the prefix rtbst_: 372. = #ifndef RTBST_H #define RTBST_H 1 #include <stddef.h>
rtbst 14> rtbst 250> rtbst 267>
rtbst 15> rtbst 88> #endif /* rtbst.h */ 373. = #include <assert.h> #include <stdio.h> #include <stdlib.h> #include "rtbst.h" See also: [Knuth 1997], section 2.3.1. Exercises: 1. We can define a "left-threaded tree" in a way analogous to a right-threaded tree, as a binary search tree with threads only on the left sides of nodes. Is this a useful thing to do? ---------- Footnotes ---------- (1) It can be efficient if we use a stack to do it, but that kills the advantage of threading the tree. It would be possible to implement two sets of traversers for right-threaded trees, one with a stack, one without, but in that case it's probably better to just use a threaded tree. 10.1 Data Types =============== 374. = /* Characterizes a link as a child pointer or a thread. */ enum rtbst_tag { RTBST_CHILD, /* Child pointer. */ RTBST_THREAD /* Thread. */ }; /* A right-threaded binary search tree node. */ struct rtbst_node { struct rtbst_node *rtbst_link[2]; /* Subtrees. */ void *rtbst_data; /* Pointer to data. */ unsigned char rtbst_rtag; /* Tag field. */ }; This code is included in 372. 10.2 Operations =============== 375. = rtbst 252>
rtbst 592> rtbst 6>
rtbst 594> This code is included in 373. 10.3 Search =========== A right-threaded tree is inherently asymmetric, so many of the algorithms on it will necessarily be asymmetric as well. The search function is the simplest demonstration of this. For descent to the left, we test for a null left child with rtbst_link[0]; for descent to the right, we test for a right thread with rtbst_rtag. Otherwise, the code is familiar: 376. = void * rtbst_find (const struct rtbst_table *tree, const void *item) { const struct rtbst_node *p; int dir; assert (tree != NULL && item != NULL); if (tree->rtbst_root == NULL) return NULL; for (p = tree->rtbst_root; ; p = p->rtbst_link[dir]) { int cmp = tree->rtbst_compare (item, p->rtbst_data, tree->rtbst_param); if (cmp == 0) return p->rtbst_data; dir = cmp > 0; if (dir == 0) { if (p->rtbst_link[0] == NULL) return NULL; } else /* dir == 1 */ { if (p->rtbst_rtag == RTBST_THREAD) return NULL; } } } This code is included in 375, 418, and 455. 10.4 Insertion ============== Regardless of the kind of binary tree we're dealing with, adding a new node requires setting three pointer fields: the parent pointer and the two child pointers of the new node. On the other hand, we do save a tiny bit on tags: we set either 1 or 2 tags here as opposed to a constant of 3 in . Here is the outline: 377. = void ** rtbst_probe (struct rtbst_table *tree, void *item) { struct rtbst_node *p; /* Current node in search. */ int dir; /* Side of p on which to insert the new node. */ struct rtbst_node *n; /* New node. */ } This code is included in 375. The code to search for the insertion point is not unusual: 378. 
= if (tree->rtbst_root != NULL) for (p = tree->rtbst_root; ; p = p->rtbst_link[dir]) { int cmp = tree->rtbst_compare (item, p->rtbst_data, tree->rtbst_param); if (cmp == 0) return &p->rtbst_data; dir = cmp > 0; if (dir == 0) { if (p->rtbst_link[0] == NULL) break; } else /* dir == 1 */ { if (p->rtbst_rtag == RTBST_THREAD) break; } } else { p = (struct rtbst_node *) &tree->rtbst_root; dir = 0; } This code is included in 377. Now for the insertion code. An insertion to the left of a node p in a right-threaded tree replaces the left link by the new node n. The new node in turn has a null left child and a right thread pointing back to p:

 |                  |
 p                  p
  \       =>     _.-' \
   a            n      a
                 \
                 [p]

An insertion to the right of p replaces the right thread by the new child node n. The new node has a null left child and a right thread that points where p's right thread formerly pointed:

        |                         |
        s                         s
 ____...---'               ___...--'
 ...                       ...
   `_           =>           `_
     p                         p
    / \                       / \
   a  [s]                    a   n
                                  \
                                  [s]

We can handle both of these cases in one code segment. The difference is in the treatment of n's right child and p's right tag. Insertion into an empty tree is handled as a special case as well: 379. = n = tree->rtbst_alloc->libavl_malloc (tree->rtbst_alloc, sizeof *n); if (n == NULL) return NULL; tree->rtbst_count++; n->rtbst_data = item; n->rtbst_link[0] = NULL; if (dir == 0) { if (tree->rtbst_root != NULL) n->rtbst_link[1] = p; else n->rtbst_link[1] = NULL; } else /* dir == 1 */ { p->rtbst_rtag = RTBST_CHILD; n->rtbst_link[1] = p->rtbst_link[1]; } n->rtbst_rtag = RTBST_THREAD; p->rtbst_link[dir] = n; return &n->rtbst_data; This code is included in 377. 10.5 Deletion ============= Deleting a node from an RTBST can be done using the same ideas as for other kinds of trees we've seen. However, as it turns out, a variant of this usual technique allows for faster code. In this section, we will implement the usual method, then the improved version. The latter is actually used in libavl. Here is the outline of the function.
Step 2 is the only part that varies between versions: 380. = void * rtbst_delete (struct rtbst_table *tree, const void *item) { struct rtbst_node *p; /* Node to delete. */ struct rtbst_node *q; /* Parent of p. */ int dir; /* Index into q->rtbst_link[] that leads to p. */ assert (tree != NULL && item != NULL); } This code is included in 375. The first step just finds the node to delete. After it executes, p is the node to delete and q and dir are set such that q->rtbst_link[dir] == p. 381. = if (tree->rtbst_root == NULL) return NULL; p = tree->rtbst_root; q = (struct rtbst_node *) &tree->rtbst_root; dir = 0; if (p == NULL) return NULL; for (;;) { int cmp = tree->rtbst_compare (item, p->rtbst_data, tree->rtbst_param); if (cmp == 0) break; dir = cmp > 0; if (dir == 0) { if (p->rtbst_link[0] == NULL) return NULL; } else /* dir == 1 */ { if (p->rtbst_rtag == RTBST_THREAD) return NULL; } q = p; p = p->rtbst_link[dir]; } item = p->rtbst_data; This code is included in 380. The final step is also common. We just clean up and return: 382. = tree->rtbst_alloc->libavl_free (tree->rtbst_alloc, p); tree->rtbst_count--; return (void *) item; This code is included in 380. 10.5.1 Right-Looking Deletion ----------------------------- Our usual algorithm for deletion looks at the right subtree of the node to be deleted, so we call it "right-looking." The outline for this kind of deletion is the same as in TBST deletion (*note Deleting from a TBST::): 383. = if (p->rtbst_rtag == RTBST_THREAD) { if (p->rtbst_link[0] != NULL) { } else { } } else { struct rtbst_node *r = p->rtbst_link[1]; if (r->rtbst_link[0] == NULL) { } else { } } Each of the four cases, presented below, is closely analogous to the same case in TBST deletion. Case 1: p has a right thread and a left child ............................................. In this case, node p has a right thread and a left child. 
As in a TBST, this means that after deleting p we must update the right thread in p's former left subtree to point to p's replacement. The only difference from is in structure members: 384. = struct rtbst_node *t = p->rtbst_link[0]; while (t->rtbst_rtag == RTBST_CHILD) t = t->rtbst_link[1]; t->rtbst_link[1] = p->rtbst_link[1]; q->rtbst_link[dir] = p->rtbst_link[0]; This code is included in 383. Case 2: p has a right thread and no left child .............................................. If node p is a leaf, then there are two subcases, according to whether p is a left child or a right child of its parent q. If dir is 0, then p is a left child and the pointer from its parent must be set to NULL. If dir is 1, then p is a right child and the link from its parent must be changed to a thread to its successor. In either of these cases we must set q->rtbst_link[dir]: if dir is 0, we set it to NULL, otherwise dir is 1 and we set it to p->rtbst_link[1]. However, we know that p->rtbst_link[0] is NULL, because p is a leaf, so we can instead unconditionally assign p->rtbst_link[dir]. In addition, if dir is 1, then we must tag q's right link as a thread. If q is the pseudo-root, then dir is 0 and everything works out fine with no need for a special case. 385. = q->rtbst_link[dir] = p->rtbst_link[dir]; if (dir == 1) q->rtbst_rtag = RTBST_THREAD; This code is included in 383. Case 3: p's right child has no left child ......................................... Code for this case, where p has a right child r that itself has no left child, is almost identical to . There is no left tag to copy, but it is still necessary to chase down the right thread in r's new left subtree (the same as p's former left subtree): 386. = r->rtbst_link[0] = p->rtbst_link[0]; if (r->rtbst_link[0] != NULL) { struct rtbst_node *t = r->rtbst_link[0]; while (t->rtbst_rtag == RTBST_CHILD) t = t->rtbst_link[1]; t->rtbst_link[1] = r; } q->rtbst_link[dir] = r; This code is included in 383. 
Case 4: p's right child has a left child ........................................ Code for case 4, the most general case, is very similar to . The only notable difference is in the subcase where s has a right thread: in that case we just set r's left link to NULL instead of having to set it up as a thread. 387. = struct rtbst_node *s; for (;;) { s = r->rtbst_link[0]; if (s->rtbst_link[0] == NULL) break; r = s; } if (s->rtbst_rtag == RTBST_CHILD) r->rtbst_link[0] = s->rtbst_link[1]; else r->rtbst_link[0] = NULL; s->rtbst_link[0] = p->rtbst_link[0]; if (p->rtbst_link[0] != NULL) { struct rtbst_node *t = p->rtbst_link[0]; while (t->rtbst_rtag == RTBST_CHILD) t = t->rtbst_link[1]; t->rtbst_link[1] = s; } s->rtbst_link[1] = p->rtbst_link[1]; s->rtbst_rtag = RTBST_CHILD; q->rtbst_link[dir] = s; This code is included in 383. Exercises: 1. Rewrite to replace the deleted node's rtbst_data by its successor, then delete the successor, instead of shuffling pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) 10.5.2 Left-Looking Deletion ---------------------------- The previous section implemented the "right-looking" form of deletion used elsewhere in libavl. Compared to deletion in a fully threaded binary tree, the benefits to using an RTBST with this kind of deletion are minimal:

* Cases 1 and 2 are similar code in both TBST and RTBST deletion.

* Case 3 in an RTBST avoids one tag copy required in TBST deletion.

* One subcase of case 4 in an RTBST avoids one tag assignment required in the same subcase of TBST deletion.

This is hardly worth it. We saved at most one assignment per call. We need something better if it's ever going to be worthwhile to use right-threaded trees. Fortunately, there is a way that we can save a little more. This is by changing our right-looking deletion into left-looking deletion, by switching the use of left and right children in the algorithm.
In a BST or TBST, this symmetrical change in the algorithm would have no
effect, because the BST and TBST node structures are themselves
symmetric.  But in an asymmetric RTBST even a symmetric change can have
a significant effect on an algorithm, as we'll see.

The cases for left-looking deletion are outlined in the same way as for
right-looking deletion:

388. =
  if (p->rtbst_link[0] == NULL)
    {
      if (p->rtbst_rtag == RTBST_CHILD)
        {
        }
      else
        {
        }
    }
  else
    {
      struct rtbst_node *r = p->rtbst_link[0];
      if (r->rtbst_rtag == RTBST_THREAD)
        {
        }
      else
        {
        }
    }

This code is included in 380.

Case 1: p has a right child but no left child
.............................................

If the node to delete p has a right child but no left child, we can just
replace it by its right child.  There is no right thread to update in
p's left subtree because p has no left child, and there is no left
thread to update because a right-threaded tree has no left threads.  The
deletion looks like this if p's right child is designated x:

     |              |
     p              x
      \      =>    ^
       x          a b
      ^
     a b

389. =
  q->rtbst_link[dir] = p->rtbst_link[1];

This code is included in 388.

Case 2: p has a right thread and no left child
..............................................

This case is analogous to case 2 in right-looking deletion covered
earlier.  The same discussion applies.

390. =
  q->rtbst_link[dir] = p->rtbst_link[dir];
  if (dir == 1)
    q->rtbst_rtag = RTBST_THREAD;

This code is included in 388.

Case 3: p's left child has a right thread
.........................................

If p has a left child r that itself has a right thread, then we replace
p by r.  Node r receives p's former right link, as shown here:

        |                 |
        p                 r
     _.-' \      =>      / \
    r      b            a   b
   / \
  a  [p]

There is no need to fiddle with threads.  If r has a right thread then
it gets replaced by p's right child or thread anyhow.  Any right thread
within r's left subtree either points within that subtree or to r.
Finally, r's right subtree cannot cause problems.

391.
=
  r->rtbst_link[1] = p->rtbst_link[1];
  r->rtbst_rtag = p->rtbst_rtag;
  q->rtbst_link[dir] = r;

This code is included in 388.

Case 4: p's left child has a right child
........................................

The final case handles deletion of a node p with a left child r that in
turn has a right child.  The code here follows the same pattern as case
4 for right-looking deletion (see the discussion there for details).
The first step is to find the predecessor s of node p:

392. =
  struct rtbst_node *s;

  for (;;)
    {
      s = r->rtbst_link[1];
      if (s->rtbst_rtag == RTBST_THREAD)
        break;

      r = s;
    }

See also 393 and 394.  This code is included in 388.

Next, we update r, handling two subcases depending on whether s has a
left child:

393. +=
  if (s->rtbst_link[0] != NULL)
    r->rtbst_link[1] = s->rtbst_link[0];
  else
    {
      r->rtbst_link[1] = s;
      r->rtbst_rtag = RTBST_THREAD;
    }

The final step is to copy p's fields into s, then set q's child pointer
to point to s instead of p.  There is no need to chase down any threads.

394. +=
  s->rtbst_link[0] = p->rtbst_link[0];
  s->rtbst_link[1] = p->rtbst_link[1];
  s->rtbst_rtag = p->rtbst_rtag;

  q->rtbst_link[dir] = s;

Exercises:

1. Rewrite case 4 to replace the deleted node's rtbst_data by its
predecessor, then delete the predecessor, instead of shuffling pointers.
(Refer back to Exercise 4.8-3 for an explanation of why this approach
cannot be used in libavl.)

10.5.3 Aside: Comparison of Deletion Algorithms
-----------------------------------------------

This book has presented algorithms for deletion from BSTs, TBSTs, and
RTBSTs.  In fact, we implemented two algorithms for RTBSTs.  Each of
these four algorithms has slightly different performance
characteristics.  The following table summarizes the behavior of all of
the cases in these algorithms.  Each cell describes the actions that
take place: "link" is the number of link fields set, "tag" the number of
tag fields set, and "succ/pred" the number of general successors or
predecessors found during the case.
              BST*          TBST          Right-Looking  Left-Looking
                                          RTBST          RTBST
  Case 1      1 link        2 links       2 links        1 link
                            1 succ/pred   1 succ/pred

  Case 2      1 link        1 link        1 link         1 link
                            1 tag         1 tag          1 tag

  Case 3      2 links       3 links       3 links        2 links
                            1 tag                        1 tag
                            1 succ/pred   1 succ/pred

  Case 4      4 links       5 links       5 links        4 links
  subcase 1                 2 tags        1 tag          1 tag
              1 succ/pred   2 succ/pred   2 succ/pred    1 succ/pred

  Case 4      4 links       5 links       5 links        4 links
  subcase 2                 2 tags        1 tag          1 tag
              1 succ/pred   2 succ/pred   2 succ/pred    1 succ/pred

* Listed cases 1 and 2 both correspond to BST deletion case 1, and
listed cases 3 and 4 to BST deletion cases 2 and 3, respectively.  BST
deletion does not have any subcases in its case 3 (listed case 4), so it
also saves a test to distinguish subcases.

As you can see, the penalty for left-looking deletion from an RTBST,
compared to a plain BST, is at most one tag assignment in any given
case, except for the need to distinguish subcases of case 4.  In this
sense at least, left-looking deletion from an RTBST is considerably
faster than deletion from a TBST or right-looking deletion from an
RTBST.  This means that it can indeed be worthwhile to implement
right-threaded trees instead of BSTs or TBSTs.

10.6 Traversal
==============

Traversal in an RTBST is unusual due to its asymmetry.  Moving from
smaller nodes to larger nodes is easy: we do it with the same algorithm
used in a TBST.  Moving the other way is more difficult and inefficient
besides: we have neither a stack of parent nodes to fall back on nor
left threads to short-circuit.

RTBSTs use the same traversal structure as TBSTs, so we can reuse some
of the functions from TBST traversers.  We also get a few directly from
the implementations for BSTs.  Other than that, everything has to be
written anew here:

395. =
  rtbst 269>
  rtbst 273>
  rtbst 274>
  rtbst 74>
  rtbst 75>

This code is included in 375, 418, and 455.
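The easy forward direction can be seen in a free-standing sketch.  The
node structure and names below are simplified stand-ins for libavl's
(illustrative assumptions, not its real API): a right thread leads
straight to the in-order successor, while a right child pointer leads to
the leftmost node of the right subtree.

```c
#include <assert.h>
#include <stddef.h>

enum tag { CHILD, THREAD };

/* Simplified right-threaded node; not libavl's real layout. */
struct node
{
  struct node *link[2];
  enum tag rtag;
  int data;
};

/* In-order successor: follow a right thread directly, or descend to
   the leftmost node of the right subtree.  Returns NULL after the
   greatest node. */
static struct node *
successor (struct node *p)
{
  if (p->rtag == THREAD)
    return p->link[1];
  p = p->link[1];
  while (p->link[0] != NULL)
    p = p->link[0];
  return p;
}
```

Either branch takes time proportional only to the distance moved, which
is why forward traversal of an entire RTBST is linear overall.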
10.6.1 Starting at the First Node
---------------------------------

To find the first (least) item in the tree, we just descend all the way
to the left, as usual.  In an RTBST, as in a BST, this involves checking
for null pointers.

396. =
  void *
  rtbst_t_first (struct rtbst_traverser *trav, struct rtbst_table *tree)
  {
    assert (tree != NULL && trav != NULL);

    trav->rtbst_table = tree;
    trav->rtbst_node = tree->rtbst_root;
    if (trav->rtbst_node != NULL)
      {
        while (trav->rtbst_node->rtbst_link[0] != NULL)
          trav->rtbst_node = trav->rtbst_node->rtbst_link[0];
        return trav->rtbst_node->rtbst_data;
      }
    else
      return NULL;
  }

This code is included in 395.

10.6.2 Starting at the Last Node
--------------------------------

To start at the last (greatest) item in the tree, we descend all the way
to the right.  In an RTBST, as in a TBST, this involves checking for
thread links.

397. =
  void *
  rtbst_t_last (struct rtbst_traverser *trav, struct rtbst_table *tree)
  {
    assert (tree != NULL && trav != NULL);

    trav->rtbst_table = tree;
    trav->rtbst_node = tree->rtbst_root;
    if (trav->rtbst_node != NULL)
      {
        while (trav->rtbst_node->rtbst_rtag == RTBST_CHILD)
          trav->rtbst_node = trav->rtbst_node->rtbst_link[1];
        return trav->rtbst_node->rtbst_data;
      }
    else
      return NULL;
  }

This code is included in 395.

10.6.3 Starting at a Found Node
-------------------------------

To start from an item found in the tree, we use the same algorithm as
rtbst_find().

398.
=
  void *
  rtbst_t_find (struct rtbst_traverser *trav, struct rtbst_table *tree,
                void *item)
  {
    struct rtbst_node *p;

    assert (trav != NULL && tree != NULL && item != NULL);

    trav->rtbst_table = tree;
    trav->rtbst_node = NULL;

    p = tree->rtbst_root;
    if (p == NULL)
      return NULL;

    for (;;)
      {
        int cmp = tree->rtbst_compare (item, p->rtbst_data,
                                       tree->rtbst_param);
        if (cmp == 0)
          {
            trav->rtbst_node = p;
            return p->rtbst_data;
          }

        if (cmp < 0)
          {
            p = p->rtbst_link[0];
            if (p == NULL)
              return NULL;
          }
        else
          {
            if (p->rtbst_rtag == RTBST_THREAD)
              return NULL;
            p = p->rtbst_link[1];
          }
      }
  }

This code is included in 395.

10.6.4 Advancing to the Next Node
---------------------------------

We use the same algorithm to advance an RTBST traverser as for TBST
traversers.  The only important difference between this code and the
TBST version is the substitution of rtbst_rtag for tbst_tag[1].

399. =
  void *
  rtbst_t_next (struct rtbst_traverser *trav)
  {
    assert (trav != NULL);

    if (trav->rtbst_node == NULL)
      return rtbst_t_first (trav, trav->rtbst_table);
    else if (trav->rtbst_node->rtbst_rtag == RTBST_THREAD)
      {
        trav->rtbst_node = trav->rtbst_node->rtbst_link[1];
        return trav->rtbst_node != NULL
               ? trav->rtbst_node->rtbst_data : NULL;
      }
    else
      {
        trav->rtbst_node = trav->rtbst_node->rtbst_link[1];
        while (trav->rtbst_node->rtbst_link[0] != NULL)
          trav->rtbst_node = trav->rtbst_node->rtbst_link[0];
        return trav->rtbst_node->rtbst_data;
      }
  }

This code is included in 395.

10.6.5 Backing Up to the Previous Node
--------------------------------------

Moving an RTBST traverser backward has the same cases as in the other
ways of finding an inorder predecessor that we've already discussed.
The two main cases are distinguished on whether the current item has a
left child; the third case comes up when there is no current item,
implemented simply by delegation to rtbst_t_last():

400.
=
  void *
  rtbst_t_prev (struct rtbst_traverser *trav)
  {
    assert (trav != NULL);

    if (trav->rtbst_node == NULL)
      return rtbst_t_last (trav, trav->rtbst_table);
    else if (trav->rtbst_node->rtbst_link[0] == NULL)
      {
      }
    else
      {
      }
  }

This code is included in 395.

The novel case is where the node p whose predecessor we want has no left
child.  In this case, we use a modified version of the algorithm
originally specified for finding a node's successor in an unthreaded
tree (*note Better Iterative Traversal::).  We take the idea of moving
up until we've moved up to the left, and turn it upside down (to avoid
need for a parent stack) and reverse it (to find the predecessor instead
of the successor).

The idea here is to trace p's entire direct ancestral line.  Starting
from the root of the tree, we repeatedly compare each node's data with
p's and use the result to move downward, until we encounter node p
itself.  Each time we move down from a node x to its right child, we
record x as the potential predecessor of p.  When we finally arrive at
p, the last node so selected is the actual predecessor, or if none was
selected then p is the least node in the tree and we select the null
item as its predecessor.

Consider this algorithm in the context of the tree shown here:

                     3
              _..-'     `-----..._
             1                     9
           _.-' \           ____..-' \
          0      2         5          []
           \      \      _.-' `-.__
           [1]    [3]   4           7
                         \        _.-' \
                         [5]     6      8
                                  \      \
                                  [7]    [9]

To find the predecessor of node 8, we trace the path from the root down
to it: 3-9-5-7-8.  The last time we move down to the right is from 7 to
8, so 7 is node 8's predecessor.  To find the predecessor of node 6, we
trace the path 3-9-5-7-6 and notice that we last move down to the right
from 5 to 7, so 5 is node 6's predecessor.  Finally, node 0 has the null
item as its predecessor because path 3-1-0 does not involve any
rightward movement.

Here is the code to implement this case:

401.
=
  rtbst_comparison_func *cmp = trav->rtbst_table->rtbst_compare;
  void *param = trav->rtbst_table->rtbst_param;
  struct rtbst_node *node = trav->rtbst_node;
  struct rtbst_node *i;

  trav->rtbst_node = NULL;
  for (i = trav->rtbst_table->rtbst_root; i != node; )
    {
      int dir = cmp (node->rtbst_data, i->rtbst_data, param) > 0;
      if (dir == 1)
        trav->rtbst_node = i;
      i = i->rtbst_link[dir];
    }

  return trav->rtbst_node != NULL ? trav->rtbst_node->rtbst_data : NULL;

This code is included in 400.

The other case, where the node whose predecessor we want has a left
child, is nothing new.  We just find the largest node in the node's left
subtree:

402. =
  trav->rtbst_node = trav->rtbst_node->rtbst_link[0];
  while (trav->rtbst_node->rtbst_rtag == RTBST_CHILD)
    trav->rtbst_node = trav->rtbst_node->rtbst_link[1];
  return trav->rtbst_node->rtbst_data;

This code is included in 400.

10.7 Copying
============

The algorithm that we used for copying a TBST makes use of threads, but
only right threads, so we can apply this algorithm essentially
unmodified to RTBSTs.

We will make one change that superficially simplifies and improves the
elegance of the algorithm.  Function tbst_copy() uses a pair of local
variables rp and rq to store pointers to the original and new tree's
root, because accessing the tag field of a cast "pseudo-root" pointer
produces undefined behavior.  However, in an RTBST there is no tag for a
node's left subtree.  During a TBST copy, only the left tags of the root
nodes are accessed, so this means that we can use the pseudo-roots in
the RTBST copy, with no need for rp or rq.

403. =
  struct rtbst_table *
  rtbst_copy (const struct rtbst_table *org, rtbst_copy_func *copy,
              rtbst_item_func *destroy, struct libavl_allocator *allocator)
  {
    struct rtbst_table *new;

    const struct rtbst_node *p;
    struct rtbst_node *q;

    assert (org != NULL);
    new = rtbst_create (org->rtbst_compare, org->rtbst_param,
                        allocator != NULL ? allocator : org->rtbst_alloc);
    if (new == NULL)
      return NULL;

    new->rtbst_count = org->rtbst_count;
    if (new->rtbst_count == 0)
      return new;

    p = (struct rtbst_node *) &org->rtbst_root;
    q = (struct rtbst_node *) &new->rtbst_root;
    for (;;)
      {
        if (p->rtbst_link[0] != NULL)
          {
            if (!copy_node (new, q, 0, p->rtbst_link[0], copy))
              {
                copy_error_recovery (new, destroy);
                return NULL;
              }

            p = p->rtbst_link[0];
            q = q->rtbst_link[0];
          }
        else
          {
            while (p->rtbst_rtag == RTBST_THREAD)
              {
                p = p->rtbst_link[1];
                if (p == NULL)
                  {
                    q->rtbst_link[1] = NULL;
                    return new;
                  }

                q = q->rtbst_link[1];
              }

            p = p->rtbst_link[1];
            q = q->rtbst_link[1];
          }

        if (p->rtbst_rtag == RTBST_CHILD)
          if (!copy_node (new, q, 1, p->rtbst_link[1], copy))
            {
              copy_error_recovery (new, destroy);
              return NULL;
            }
      }
  }

This code is included in 406 and 447.

The code to copy a node must be modified to deal with the asymmetrical
nature of insertion in an RTBST:

404. =
  static int
  copy_node (struct rtbst_table *tree, struct rtbst_node *dst, int dir,
             const struct rtbst_node *src, rtbst_copy_func *copy)
  {
    struct rtbst_node *new =
      tree->rtbst_alloc->libavl_malloc (tree->rtbst_alloc, sizeof *new);
    if (new == NULL)
      return 0;

    new->rtbst_link[0] = NULL;
    new->rtbst_rtag = RTBST_THREAD;
    if (dir == 0)
      new->rtbst_link[1] = dst;
    else
      {
        new->rtbst_link[1] = dst->rtbst_link[1];
        dst->rtbst_rtag = RTBST_CHILD;
      }
    dst->rtbst_link[dir] = new;

    if (copy == NULL)
      new->rtbst_data = src->rtbst_data;
    else
      {
        new->rtbst_data = copy (src->rtbst_data, tree->rtbst_param);
        if (new->rtbst_data == NULL)
          return 0;
      }

    return 1;
  }

This code is included in 406.

The error recovery function for copying is a bit simpler now, because
the use of the pseudo-root means that no assignment to the new tree's
root need take place, eliminating the need for one of the function's
parameters:

405.
=
  static void
  copy_error_recovery (struct rtbst_table *new, rtbst_item_func *destroy)
  {
    struct rtbst_node *p = new->rtbst_root;
    if (p != NULL)
      {
        while (p->rtbst_rtag == RTBST_CHILD)
          p = p->rtbst_link[1];
        p->rtbst_link[1] = NULL;
      }
    rtbst_destroy (new, destroy);
  }

This code is included in 406 and 447.

406. =

This code is included in 375.

10.8 Destruction
================

The destruction algorithm for TBSTs makes use only of right threads, so
we can easily adapt it for RTBSTs.

407. =
  void
  rtbst_destroy (struct rtbst_table *tree, rtbst_item_func *destroy)
  {
    struct rtbst_node *p;   /* Current node. */
    struct rtbst_node *n;   /* Next node. */

    p = tree->rtbst_root;
    if (p != NULL)
      while (p->rtbst_link[0] != NULL)
        p = p->rtbst_link[0];

    while (p != NULL)
      {
        n = p->rtbst_link[1];
        if (p->rtbst_rtag == RTBST_CHILD)
          while (n->rtbst_link[0] != NULL)
            n = n->rtbst_link[0];

        if (destroy != NULL && p->rtbst_data != NULL)
          destroy (p->rtbst_data, tree->rtbst_param);
        tree->rtbst_alloc->libavl_free (tree->rtbst_alloc, p);

        p = n;
      }

    tree->rtbst_alloc->libavl_free (tree->rtbst_alloc, tree);
  }

This code is included in 375, 418, and 455.

10.9 Balance
============

As for so many other operations, we can reuse most of the TBST balancing
code to rebalance RTBSTs.  Some of the helper functions can be
completely recycled:

408. =
  rtbst 285>
  rtbst 283>

This code is included in 375.

The only substantive difference for the remaining two functions is that
there is no need to set nodes' left tags (since they don't have any):

409.
=
  static void
  tree_to_vine (struct rtbst_table *tree)
  {
    struct rtbst_node *p;

    if (tree->rtbst_root == NULL)
      return;

    p = tree->rtbst_root;
    while (p->rtbst_link[0] != NULL)
      p = p->rtbst_link[0];

    for (;;)
      {
        struct rtbst_node *q = p->rtbst_link[1];
        if (p->rtbst_rtag == RTBST_CHILD)
          {
            while (q->rtbst_link[0] != NULL)
              q = q->rtbst_link[0];
            p->rtbst_rtag = RTBST_THREAD;
            p->rtbst_link[1] = q;
          }

        if (q == NULL)
          break;

        q->rtbst_link[0] = p;
        p = q;
      }

    tree->rtbst_root = p;
  }

This code is included in 408.

410. =
  /* Performs a compression transformation count times,
     starting at root. */
  static void
  compress (struct rtbst_node *root,
            unsigned long nonthread, unsigned long thread)
  {
    assert (root != NULL);

    while (nonthread--)
      {
        struct rtbst_node *red = root->rtbst_link[0];
        struct rtbst_node *black = red->rtbst_link[0];

        root->rtbst_link[0] = black;
        red->rtbst_link[0] = black->rtbst_link[1];
        black->rtbst_link[1] = red;
        root = black;
      }

    while (thread--)
      {
        struct rtbst_node *red = root->rtbst_link[0];
        struct rtbst_node *black = red->rtbst_link[0];

        root->rtbst_link[0] = black;
        red->rtbst_link[0] = NULL;
        black->rtbst_rtag = RTBST_CHILD;
        root = black;
      }
  }

This code is included in 408.

10.10 Testing
=============

There's nothing new or interesting in the test code.

411. =
  #include <assert.h>
  #include <limits.h>
  #include <stdio.h>
  #include "rtbst.h"
  #include "test.h"

  rtbst 104>
  rtbst 109>
  rtbst 295>
  rtbst 122>

412. =
  void
  print_tree_structure (struct rtbst_node *node, int level)
  {
    if (level > 16)
      {
        printf ("[...]");
        return;
      }

    if (node == NULL)
      {
        printf ("<nil>");
        return;
      }

    printf ("%d(", node->rtbst_data ? *(int *) node->rtbst_data : -1);

    if (node->rtbst_link[0] != NULL)
      print_tree_structure (node->rtbst_link[0], level + 1);

    fputs (", ", stdout);

    if (node->rtbst_rtag == RTBST_CHILD)
      {
        if (node->rtbst_link[1] == node)
          printf ("loop");
        else
          print_tree_structure (node->rtbst_link[1], level + 1);
      }
    else if (node->rtbst_link[1] != NULL)
      printf (">%d", (node->rtbst_link[1]->rtbst_data
                      ? *(int *) node->rtbst_link[1]->rtbst_data : -1));
    else
      printf (">>");

    putchar (')');
  }

  void
  print_whole_tree (const struct rtbst_table *tree, const char *title)
  {
    printf ("%s: ", title);
    print_tree_structure (tree->rtbst_root, 0);
    putchar ('\n');
  }

This code is included in 411, 449, and 482.

413. =
  static int
  compare_trees (struct rtbst_node *a, struct rtbst_node *b)
  {
    int okay;

    if (a == NULL || b == NULL)
      {
        if (a != NULL || b != NULL)
          {
            printf (" a=%d b=%d\n",
                    a ? *(int *) a->rtbst_data : -1,
                    b ? *(int *) b->rtbst_data : -1);
            assert (0);
          }
        return 1;
      }
    assert (a != b);

    if (*(int *) a->rtbst_data != *(int *) b->rtbst_data
        || a->rtbst_rtag != b->rtbst_rtag)
      {
        printf (" Copied nodes differ: a=%d b=%d a:",
                *(int *) a->rtbst_data, *(int *) b->rtbst_data);
        if (a->rtbst_rtag == RTBST_CHILD)
          printf ("r");
        printf (" b:");
        if (b->rtbst_rtag == RTBST_CHILD)
          printf ("r");
        printf ("\n");
        return 0;
      }

    if (a->rtbst_rtag == RTBST_THREAD)
      assert ((a->rtbst_link[1] == NULL)
              != (a->rtbst_link[1] != b->rtbst_link[1]));

    okay = compare_trees (a->rtbst_link[0], b->rtbst_link[0]);
    if (a->rtbst_rtag == RTBST_CHILD)
      okay &= compare_trees (a->rtbst_link[1], b->rtbst_link[1]);
    return okay;
  }

This code is included in 411.

414. =
  static void
  recurse_verify_tree (struct rtbst_node *node, int *okay, size_t *count,
                       int min, int max)
  {
    int d;                /* Value of this node's data. */
    size_t subcount[2];   /* Number of nodes in subtrees. */

    if (node == NULL)
      {
        *count = 0;
        return;
      }
    d = *(int *) node->rtbst_data;

    subcount[0] = subcount[1] = 0;
    recurse_verify_tree (node->rtbst_link[0], okay, &subcount[0],
                         min, d - 1);
    if (node->rtbst_rtag == RTBST_CHILD)
      recurse_verify_tree (node->rtbst_link[1], okay, &subcount[1],
                           d + 1, max);
    *count = 1 + subcount[0] + subcount[1];
  }

This code is included in 411.
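The min/max bounds idea behind recurse_verify_tree() works for any
binary search tree.  The free-standing sketch below uses a plain
unthreaded node and illustrative names, not libavl's: each node's value
must fall within bounds that narrow at every step of the recursion.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Plain unthreaded BST node, for illustration only. */
struct node
{
  struct node *link[2];
  int data;
};

/* Returns nonzero iff the subtree rooted at |node| is a valid BST
   whose values all lie within [min, max].  The bounds narrow as the
   recursion descends, which catches out-of-place values anywhere in
   the tree, not just direct parent/child violations. */
static int
verify (const struct node *node, int min, int max)
{
  if (node == NULL)
    return 1;
  if (node->data < min || node->data > max)
    return 0;
  return verify (node->link[0], min, node->data - 1)
         && verify (node->link[1], node->data + 1, max);
}
```

A top-level call passes INT_MIN and INT_MAX as the initial bounds.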
11 Right-Threaded AVL Trees
***************************

In the same way that we can combine threaded trees with AVL trees to
produce threaded AVL trees, we can combine right-threaded trees with AVL
trees to produce right-threaded AVL trees.  This chapter explores this
combination, producing another table implementation.

Here's the form of the source and header files.  Notice the use of
rtavl_ as the identifier prefix.  Likewise, we will often refer to
right-threaded AVL trees as "RTAVL trees".

415. =
  #ifndef RTAVL_H
  #define RTAVL_H 1

  #include <stddef.h>

  rtavl 14>
  rtavl 28>
  rtavl 250>
  rtavl 267>
  rtavl 15>

  #endif /* rtavl.h */

416. =
  #include <assert.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include "rtavl.h"

11.1 Data Types
===============

Besides the members needed for any BST, an RTAVL node structure needs a
tag to indicate whether the right link is a child pointer or a thread,
and a balance factor to facilitate AVL balancing.  Here's what we end up
with:

417. =
  /* Characterizes a link as a child pointer or a thread. */
  enum rtavl_tag
    {
      RTAVL_CHILD,    /* Child pointer. */
      RTAVL_THREAD    /* Thread. */
    };

  /* A right-threaded binary search tree node. */
  struct rtavl_node
    {
      struct rtavl_node *rtavl_link[2];  /* Subtrees. */
      void *rtavl_data;                  /* Pointer to data. */
      unsigned char rtavl_rtag;          /* Tag field. */
      signed char rtavl_balance;         /* Balance factor. */
    };

This code is included in 415.

11.2 Operations
===============

Most of the operations for RTAVL trees can come directly from their
RTBST implementations.  The notable exceptions are, as usual, the
insertion and deletion functions.  The copy function will also need a
small tweak.  Here's the list of operations:

418. =
  rtavl 252>
  rtavl 376>
  rtavl 592>
  rtavl 395>
  rtavl 407>
  rtavl 6>
  rtavl 594>

This code is included in 416.

11.3 Rotations
==============

We will use rotations in right-threaded trees in the same way as for
other kinds of trees that we have already examined.  As always, a
generic rotation looks like this:

        |                   |
        Y                   X
       / \                 / \
      X   c      <=>      a   Y
     ^                       ^
    a b                     b c

On the left side of this diagram, a may be an empty subtree and b and c
may be threads.  On the right side, a and b may be empty subtrees and c
may be a thread.  If none of them in fact represent actual nodes, then
we end up with the following pathological case:

        |                |
        Y                X
     _.-' \               \
    X      []              Y
     \                      \
     [Y]                    []

Notice the asymmetry here: in a right rotation the right thread from X
to Y becomes a null left child of Y, but in a left rotation this is
reversed and a null subtree b becomes a right thread from X to Y.
Contrast this to the corresponding rotation in a threaded tree (*note
TBST Rotations::), where either way the same kind of change occurs: the
thread from X to Y, or vice versa, simply reverses direction.

As with other kinds of rotations we've seen, there is no need to make
any changes in subtrees of a, b, or c, because of rotations' locality
and order-preserving properties (*note BST Rotations::).  In particular,
nodes a and c, if they exist, need no adjustments, as implied by the
diagram above, which shows no changes to these subtrees on opposite
sides.

Exercises:

1. Write functions for right and left rotations in right-threaded BSTs,
analogous to those for unthreaded BSTs developed in Exercise 4.3-2.

11.4 Insertion
==============

Insertion into an RTAVL tree follows the same pattern as insertion into
other kinds of balanced tree.  The outline is straightforward:

419. =
  void **
  rtavl_probe (struct rtavl_table *tree, void *item)
  {
    rtavl 147>

    assert (tree != NULL && item != NULL);

    rtavl 150>
  }

This code is included in 418.

11.4.1 Steps 1-2: Search and Insert
-----------------------------------

The basic insertion step itself follows the same steps as insertion into
a plain RTBST.
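The asymmetric thread handling described above can be sketched directly.
The node structure and names below are simplified illustrations, not
libavl's real types: rotate_right() turns a right thread from X to Y
into a null left child of Y, exactly the pathological case in the
diagram.

```c
#include <assert.h>
#include <stddef.h>

enum tag { CHILD, THREAD };

/* Simplified right-threaded node, for illustration only. */
struct node
{
  struct node *link[2];
  enum tag rtag;
};

/* Rotate right at y.  y's left child x may carry a right thread back
   to y; if so, that thread becomes y's null left child.  Returns the
   new subtree root, x. */
static struct node *
rotate_right (struct node *y)
{
  struct node *x = y->link[0];
  if (x->rtag == THREAD)
    {
      /* x's right thread pointed at y; it turns into a child link. */
      x->rtag = CHILD;
      y->link[0] = NULL;
    }
  else
    y->link[0] = x->link[1];
  x->link[1] = y;
  return x;
}
```

A left rotation would be the mirror image, except that it creates a
thread (when Y's left subtree is empty) rather than destroying one.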
We do keep track of the directions moved on stack da[] and the last-seen
node with nonzero balance factor, in the same way as for unthreaded AVL
trees.

420. =
  z = (struct rtavl_node *) &tree->rtavl_root;
  y = tree->rtavl_root;
  if (tree->rtavl_root != NULL)
    for (q = z, p = y; ; q = p, p = p->rtavl_link[dir])
      {
        int cmp = tree->rtavl_compare (item, p->rtavl_data,
                                       tree->rtavl_param);
        if (cmp == 0)
          return &p->rtavl_data;

        if (p->rtavl_balance != 0)
          z = q, y = p, k = 0;
        da[k++] = dir = cmp > 0;

        if (dir == 0)
          {
            if (p->rtavl_link[0] == NULL)
              break;
          }
        else /* dir == 1 */
          {
            if (p->rtavl_rtag == RTAVL_THREAD)
              break;
          }
      }
  else
    {
      p = (struct rtavl_node *) &tree->rtavl_root;
      dir = 0;
    }

This code is included in 419.

421. =
  n = tree->rtavl_alloc->libavl_malloc (tree->rtavl_alloc, sizeof *n);
  if (n == NULL)
    return NULL;

  tree->rtavl_count++;
  n->rtavl_data = item;
  n->rtavl_link[0] = NULL;
  if (dir == 0)
    n->rtavl_link[1] = p;
  else /* dir == 1 */
    {
      p->rtavl_rtag = RTAVL_CHILD;
      n->rtavl_link[1] = p->rtavl_link[1];
    }
  n->rtavl_rtag = RTAVL_THREAD;
  n->rtavl_balance = 0;
  p->rtavl_link[dir] = n;
  if (y == NULL)
    {
      n->rtavl_link[1] = NULL;
      return &n->rtavl_data;
    }

This code is included in 419.

11.4.2 Step 4: Rebalance
------------------------

Unlike all of the AVL rebalancing algorithms we've seen so far,
rebalancing of a right-threaded AVL tree is not symmetric.  This means
that we cannot single out left-side rebalancing or right-side
rebalancing as we did before, hand-waving the rest of it as a symmetric
case.  But both cases are very similar, if not exactly symmetric, so we
will present the corresponding cases together.  The theory is exactly
the same as before (*note Rebalancing AVL Trees::).  Here is the code to
choose between left-side and right-side rebalancing:

422. =
  if (y->rtavl_balance == -2)
    {
    }
  else if (y->rtavl_balance == +2)
    {
    }
  else
    return &n->rtavl_data;
  z->rtavl_link[y != z->rtavl_link[0]] = w;

  return &n->rtavl_data;

This code is included in 419.
The code to choose between the two subcases within the left-side and
right-side rebalancing cases follows below.  As usual during
rebalancing, y is the node at which rebalancing occurs, x is its child
on the same side as the inserted node, and cases are distinguished on
the basis of x's balance factor:

423. =
  struct rtavl_node *x = y->rtavl_link[0];
  if (x->rtavl_balance == -1)
    {
    }
  else
    {
    }

This code is included in 422.

424. =
  struct rtavl_node *x = y->rtavl_link[1];
  if (x->rtavl_balance == +1)
    {
    }
  else
    {
    }

This code is included in 422.

Case 1: x has taller subtree on side of insertion
.................................................

If node x's taller subtree is on the same side as the inserted node,
then we perform a rotation at y in the opposite direction.  That is, if
the insertion occurred in the left subtree of y and x has a - balance
factor, we rotate right at y, and if the insertion was to the right and
x has a + balance factor, we rotate left at y.  This changes the balance
of both x and y to zero.  None of this is a change from unthreaded or
fully threaded rebalancing.  The difference is in the handling of empty
subtrees, that is, in the rotation itself (*note RTBST Rotations::).

Here is a diagram of left-side rebalancing for the interesting case
where x has a right thread.  Taken along with x's - balance factor, this
means that n, the newly inserted node, must be x's left child.
Therefore, subtree x has height 2, so y has no right child (because it
has a -2 balance factor).  This chain of logic means that we know
exactly what the tree looks like in this particular subcase:

          |                       |
          y                       x
         <-->                    <0>
      __..-' \                _.-'  `._
     x        []      =>     n          y
    <->                     <0>        <0>
   _.-' \                     \
  n      [y]                  [x]
 <0>
   \
   [x]

425. =
  w = x;
  if (x->rtavl_rtag == RTAVL_THREAD)
    {
      x->rtavl_rtag = RTAVL_CHILD;
      y->rtavl_link[0] = NULL;
    }
  else
    y->rtavl_link[0] = x->rtavl_link[1];
  x->rtavl_link[1] = y;
  x->rtavl_balance = y->rtavl_balance = 0;

This code is included in 423.
Here is the diagram and code for the similar right-side case:

     |                        |
     y                        x
    <++>                     <0>
       \                 _.-'   `._
        x        =>     y           n
       <+>             <0>         <0>
          \               \           \
           n              [x]         []
          <0>
            \
            []

426. =
  w = x;
  if (x->rtavl_link[0] == NULL)
    {
      y->rtavl_rtag = RTAVL_THREAD;
      y->rtavl_link[1] = x;
    }
  else
    y->rtavl_link[1] = x->rtavl_link[0];
  x->rtavl_link[0] = y;
  x->rtavl_balance = y->rtavl_balance = 0;

This code is included in 424.

Case 2: x has taller subtree on side opposite insertion
.......................................................

If node x's taller subtree is on the side opposite the newly inserted
node, then we perform a double rotation: first rotate at x in the same
direction as the inserted node, then in the opposite direction at y.
This is the same as in a threaded or unthreaded tree, and indeed we can
reuse much of the code.  The case where the details differ is, as usual,
where threads or null child pointers are moved around.

In the most extreme case for insertion to the left, where w is a leaf,
we know that x has no left child and y no right child, and the situation
looks like the diagram below before and after the rebalancing step:

          |                       |
          y                       w
         <-->                    <0>
   ___..-' \                 _.-'  `._
  x         []       =>     x          y
 <+>                       <0>        <0>
    \                        \           \
     w                       [w]         []
    <0>
      \
      [y]

427. =
  rtavl 156>
  if (x->rtavl_link[1] == NULL)
    {
      x->rtavl_rtag = RTAVL_THREAD;
      x->rtavl_link[1] = w;
    }
  if (w->rtavl_rtag == RTAVL_THREAD)
    {
      y->rtavl_link[0] = NULL;
      w->rtavl_rtag = RTAVL_CHILD;
    }

This code is included in 423 and 442.

Here is the code and diagram for right-side insertion rebalancing:

       |                        |
       y                        w
      <++>                     <0>
         `--..__           _.-'  `._
                x    =>   y          x
               <->       <0>        <0>
          __..-' \          \          \
         w        []        [w]        []
        <0>
          \
          [x]

428. =
  rtavl 159>
  if (y->rtavl_link[1] == NULL)
    {
      y->rtavl_rtag = RTAVL_THREAD;
      y->rtavl_link[1] = w;
    }
  if (w->rtavl_rtag == RTAVL_THREAD)
    {
      x->rtavl_link[0] = NULL;
      w->rtavl_rtag = RTAVL_CHILD;
    }

This code is included in 424 and 441.

11.5 Deletion
=============

Deletion in an RTAVL tree takes the usual pattern.

429. =
  void *
  rtavl_delete (struct rtavl_table *tree, const void *item)
  {
    /* Stack of nodes. */
    struct rtavl_node *pa[RTAVL_MAX_HEIGHT];  /* Nodes. */
    unsigned char da[RTAVL_MAX_HEIGHT];       /* rtavl_link[] indexes. */
    int k;                                    /* Stack pointer. */

    struct rtavl_node *p;  /* Traverses tree to find node to delete. */

    assert (tree != NULL && item != NULL);

    return (void *) item;
  }

This code is included in 418.

11.5.1 Step 1: Search
---------------------

There's nothing new in searching an RTAVL tree for a node to delete.  We
use p to search the tree, and push its chain of parent nodes onto stack
pa[] along with the directions da[] moved down from them, including the
pseudo-root node at the top.

430. =
  k = 1;
  da[0] = 0;
  pa[0] = (struct rtavl_node *) &tree->rtavl_root;
  p = tree->rtavl_root;
  if (p == NULL)
    return NULL;

  for (;;)
    {
      int cmp, dir;

      cmp = tree->rtavl_compare (item, p->rtavl_data, tree->rtavl_param);
      if (cmp == 0)
        break;

      dir = cmp > 0;
      if (dir == 0)
        {
          if (p->rtavl_link[0] == NULL)
            return NULL;
        }
      else /* dir == 1 */
        {
          if (p->rtavl_rtag == RTAVL_THREAD)
            return NULL;
        }

      pa[k] = p;
      da[k++] = dir;
      p = p->rtavl_link[dir];
    }
  tree->rtavl_count--;
  item = p->rtavl_data;

This code is included in 429 and 468.

11.5.2 Step 2: Delete
---------------------

As demonstrated in the previous chapter, left-looking deletion, where we
examine the left subtree of the node to be deleted, is more efficient
than right-looking deletion in an RTBST (*note Left-Looking Deletion in
an RTBST::).  This holds true in an RTAVL tree, too.

431. =
  if (p->rtavl_link[0] == NULL)
    {
      if (p->rtavl_rtag == RTAVL_CHILD)
        {
        }
      else
        {
        }
    }
  else
    {
      struct rtavl_node *r = p->rtavl_link[0];
      if (r->rtavl_rtag == RTAVL_THREAD)
        {
        }
      else
        {
        }
    }
  tree->rtavl_alloc->libavl_free (tree->rtavl_alloc, p);

This code is included in 429.

Case 1: p has a right child but no left child
.............................................

If the node to be deleted, p, has a right child but not a left child,
then we replace it by its right child.

432. =
  pa[k - 1]->rtavl_link[da[k - 1]] = p->rtavl_link[1];

This code is included in 431 and 470.
Case 2: p has a right thread and no left child
..............................................

If we are deleting a leaf, then we replace it by a null pointer if it's
a left child, or by a pointer to its own former right thread if it's a
right child.  Refer back to the commentary on the corresponding RTBST
case for further explanation.

433. =
  pa[k - 1]->rtavl_link[da[k - 1]] = p->rtavl_link[da[k - 1]];
  if (da[k - 1] == 1)
    pa[k - 1]->rtavl_rtag = RTAVL_THREAD;

This code is included in 431 and 471.

Case 3: p's left child has a right thread
.........................................

If p has a left child r, and r has a right thread, then we replace p by
r and transfer p's former right link to r.  Node r also receives p's
balance factor.

434. =
  r->rtavl_link[1] = p->rtavl_link[1];
  r->rtavl_rtag = p->rtavl_rtag;
  r->rtavl_balance = p->rtavl_balance;
  pa[k - 1]->rtavl_link[da[k - 1]] = r;
  da[k] = 0;
  pa[k++] = r;

This code is included in 431.

Case 4: p's left child has a right child
........................................

The final case, where node p's left child r has a right child, is also
the most complicated.  We find p's predecessor s first:

435. =
  struct rtavl_node *s;
  int j = k++;

  for (;;)
    {
      da[k] = 1;
      pa[k++] = r;
      s = r->rtavl_link[1];
      if (s->rtavl_rtag == RTAVL_THREAD)
        break;

      r = s;
    }

See also 436 and 437.  This code is included in 431.

Then we move s into p's place, not forgetting to update links and tags
as necessary:

436. +=
  da[j] = 0;
  pa[j] = pa[j - 1]->rtavl_link[da[j - 1]] = s;

  if (s->rtavl_link[0] != NULL)
    r->rtavl_link[1] = s->rtavl_link[0];
  else
    {
      r->rtavl_rtag = RTAVL_THREAD;
      r->rtavl_link[1] = s;
    }

Finally, we copy p's old information into s, except for the actual data:

437. +=
  s->rtavl_balance = p->rtavl_balance;
  s->rtavl_link[0] = p->rtavl_link[0];
  s->rtavl_link[1] = p->rtavl_link[1];
  s->rtavl_rtag = p->rtavl_rtag;

11.5.3 Step 3: Update Balance Factors
-------------------------------------

Updating balance factors works exactly the same way as in unthreaded AVL
deletion (*note Deleting an AVL Node Step 3 - Update::).

438. =
  assert (k > 0);
  while (--k > 0)
    {
      struct rtavl_node *y = pa[k];

      if (da[k] == 0)
        {
          y->rtavl_balance++;
          if (y->rtavl_balance == +1)
            break;
          else if (y->rtavl_balance == +2)
            {
            }
        }
      else
        {
          y->rtavl_balance--;
          if (y->rtavl_balance == -1)
            break;
          else if (y->rtavl_balance == -2)
            {
            }
        }
    }

This code is included in 429.

11.5.4 Step 4: Rebalance
------------------------

Rebalancing in an RTAVL tree after deletion is not completely symmetric
between left-side and right-side rebalancing, but there are pairs of
similar subcases on each side.  The outlines are similar, too.  Either
way, rebalancing occurs at node y, and cases are distinguished based on
the balance factor of x, the child of y on the side opposite the
deletion.

439. =
  struct rtavl_node *x = y->rtavl_link[1];
  assert (x != NULL);
  if (x->rtavl_balance == -1)
    {
    }
  else
    {
      pa[k - 1]->rtavl_link[da[k - 1]] = x;
      if (x->rtavl_balance == 0)
        {
          break;
        }
      else /* x->rtavl_balance == +1 */
        {
        }
    }

This code is included in 438.

440. =
  struct rtavl_node *x = y->rtavl_link[0];
  assert (x != NULL);
  if (x->rtavl_balance == +1)
    {
    }
  else
    {
      pa[k - 1]->rtavl_link[da[k - 1]] = x;
      if (x->rtavl_balance == 0)
        {
          break;
        }
      else /* x->rtavl_balance == -1 */
        {
        }
    }

This code is included in 438.

Case 1: x has taller subtree on same side as deletion
.....................................................

If the taller subtree of x is on the same side as the deletion, then we
rotate at x in the opposite direction from the deletion, then at y in
the same direction as the deletion.
This is the same as case 2 for RTAVL insertion (*note rtavlinscase2::), which in turn performs the general transformation described for AVL deletion case 1 (*note avldelcase1::), and we can reuse the code. 441. = struct rtavl_node *w; pa[k - 1]->rtavl_link[da[k - 1]] = w; This code is included in 439. 442. = struct rtavl_node *w; pa[k - 1]->rtavl_link[da[k - 1]] = w; This code is included in 440. Case 2: x's subtrees are equal height ..................................... If x's two subtrees are of equal height, then we perform a rotation at y toward the deletion. This rotation cannot be troublesome, for the same reason discussed for rebalancing in TAVL trees (*note tavldelcase2::). We can even reuse the code: 443. = rtavl 321> This code is included in 439. 444. = rtavl 325> This code is included in 440. Case 3: x has taller subtree on side opposite deletion ...................................................... When x's taller subtree is on the side opposite the deletion, we rotate at y toward the deletion, same as case 2. If the deletion was on the left side of y, then the general form is the same as for TAVL deletion (*note tavldelcase3::). The special case for left-side deletion, where x lacks a left child, and the general form of the code, are shown here: | y | <++> x \ <0> x __..-' \ <+> => y c \ <0> <0> c \ \ <0> [x] [] \ [] 445. = if (x->rtavl_link[0] != NULL) y->rtavl_link[1] = x->rtavl_link[0]; else y->rtavl_rtag = RTAVL_THREAD; x->rtavl_link[0] = y; y->rtavl_balance = x->rtavl_balance = 0; This code is included in 439. The special case for right-side deletion, where x lacks a right child, and the general form of the code, are shown here: | y | <--> x __..-' \ <0> x [] __..-' \ <-> => a y __..-' \ <0> <0> a [y] \ \ <0> [x] [] \ [x] 446. 
= if (x->rtavl_rtag == RTAVL_CHILD) y->rtavl_link[0] = x->rtavl_link[1]; else { y->rtavl_link[0] = NULL; x->rtavl_rtag = RTAVL_CHILD; } x->rtavl_link[1] = y; y->rtavl_balance = x->rtavl_balance = 0; This code is included in 440. Exercises: 1. In the chapter about TAVL deletion, we offered two implementations of deletion: one using a stack () and one using an algorithm to find node parents (). For RTAVL deletion, we offer only a stack-based implementation. Why? 2. The introduction to this section states that left-looking deletion is more efficient than right-looking deletion in an RTAVL tree. Confirm this by writing a right-looking alternate implementation of and comparing the two sets of code. 3. Rewrite to replace the deleted node's rtavl_data by its successor, then delete the successor, instead of shuffling pointers. (Refer back to Exercise 4.8-3 for an explanation of why this approach cannot be used in libavl.) 11.6 Copying ============ We can reuse most of the RTBST copying functionality for copying RTAVL trees, but we must modify the node copy function to copy the balance factor into the new node as well. 447. = rtavl 405> rtavl 403> This code is included in 418 and 455. 448. = static int copy_node (struct rtavl_table *tree, struct rtavl_node *dst, int dir, const struct rtavl_node *src, rtavl_copy_func *copy) { struct rtavl_node *new = tree->rtavl_alloc->libavl_malloc (tree->rtavl_alloc, sizeof *new); if (new == NULL) return 0; new->rtavl_link[0] = NULL; new->rtavl_rtag = RTAVL_THREAD; if (dir == 0) new->rtavl_link[1] = dst; else { new->rtavl_link[1] = dst->rtavl_link[1]; dst->rtavl_rtag = RTAVL_CHILD; } dst->rtavl_link[dir] = new; new->rtavl_balance = src->rtavl_balance; if (copy == NULL) new->rtavl_data = src->rtavl_data; else { new->rtavl_data = copy (src->rtavl_data, tree->rtavl_param); if (new->rtavl_data == NULL) return 0; } return 1; } This code is included in 447. 11.7 Testing ============ 449. 
= #include #include #include #include "rtavl.h" #include "test.h" rtavl 412> rtavl 104> rtavl 190> rtavl 100> rtavl 122> 450. = static int compare_trees (struct rtavl_node *a, struct rtavl_node *b) { int okay; if (a == NULL || b == NULL) { if (a != NULL || b != NULL) { printf (" a=%d b=%d\n", a ? *(int *) a->rtavl_data : -1, b ? *(int *) b->rtavl_data : -1); assert (0); } return 1; } assert (a != b); if (*(int *) a->rtavl_data != *(int *) b->rtavl_data || a->rtavl_rtag != b->rtavl_rtag || a->rtavl_balance != b->rtavl_balance) { printf (" Copied nodes differ: a=%d (bal=%d) b=%d (bal=%d) a:", *(int *) a->rtavl_data, a->rtavl_balance, *(int *) b->rtavl_data, b->rtavl_balance); if (a->rtavl_rtag == RTAVL_CHILD) printf ("r"); printf (" b:"); if (b->rtavl_rtag == RTAVL_CHILD) printf ("r"); printf ("\n"); return 0; } if (a->rtavl_rtag == RTAVL_THREAD) assert ((a->rtavl_link[1] == NULL) != (a->rtavl_link[1] != b->rtavl_link[1])); okay = compare_trees (a->rtavl_link[0], b->rtavl_link[0]); if (a->rtavl_rtag == RTAVL_CHILD) okay &= compare_trees (a->rtavl_link[1], b->rtavl_link[1]); return okay; } This code is included in 449. 451. = static void recurse_verify_tree (struct rtavl_node *node, int *okay, size_t *count, int min, int max, int *height) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subheight[2]; /* Heights of subtrees. */ if (node == NULL) { *count = 0; *height = 0; return; } d = *(int *) node->rtavl_data; subcount[0] = subcount[1] = 0; subheight[0] = subheight[1] = 0; recurse_verify_tree (node->rtavl_link[0], okay, &subcount[0], min, d - 1, &subheight[0]); if (node->rtavl_rtag == RTAVL_CHILD) recurse_verify_tree (node->rtavl_link[1], okay, &subcount[1], d + 1, max, &subheight[1]); *count = 1 + subcount[0] + subcount[1]; *height = 1 + (subheight[0] > subheight[1] ? subheight[0] : subheight[1]); rtavl 189> } This code is included in 449. 
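The height computation in recurse_verify_tree() feeds the AVL balance check included from Chapter 5. Stripped of threads and error reporting, the essence of that check is the following sketch; the names are invented, not libavl's:

```c
#include <assert.h>
#include <stddef.h>

/* A plain BST node, for illustration only. */
struct check_node
  {
    struct check_node *link[2];
  };

/* Returns nonzero iff the tree rooted at node satisfies the AVL
   balance rule, that the heights of every node's subtrees differ by
   at most one.  Stores the tree's height in *height. */
static int
check_avl (const struct check_node *node, int *height)
{
  int lh, rh;

  if (node == NULL)
    {
      *height = 0;
      return 1;
    }
  if (!check_avl (node->link[0], &lh) || !check_avl (node->link[1], &rh))
    return 0;
  *height = 1 + (lh > rh ? lh : rh);
  return rh - lh >= -1 && rh - lh <= +1;
}
```

The real verifier also checks the binary search tree ordering and the node count, but the height recursion above is the part that the balance rule depends on.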
12 Right-Threaded Red-Black Trees ********************************* This chapter is this book's final demonstration of right-threaded trees, carried out by using them in a red-black tree implementation of tables. The chapter, and the code, follow the pattern that should now be familiar, using rtrb_ as the naming prefix and often referring to right-threaded red-black trees as "RTRB trees". 452. = #ifndef RTRB_H #define RTRB_H 1 #include <stddef.h>
rtrb 14> rtrb 195> rtrb 250> rtrb 267>
rtrb 15> #endif /* rtrb.h */ 453. = #include #include #include #include "rtrb.h" 12.1 Data Types =============== Like any right-threaded tree node, an RTRB node has a right tag, and like any red-black tree node, an RTRB node has a color, either red or black. The combination is straightforward, as shown here. 454. = /* Color of a red-black node. */ enum rtrb_color { RTRB_BLACK, /* Black. */ RTRB_RED /* Red. */ }; /* Characterizes a link as a child pointer or a thread. */ enum rtrb_tag { RTRB_CHILD, /* Child pointer. */ RTRB_THREAD /* Thread. */ }; /* A threaded binary search tree node. */ struct rtrb_node { struct rtrb_node *rtrb_link[2]; /* Subtrees. */ void *rtrb_data; /* Pointer to data. */ unsigned char rtrb_color; /* Color. */ unsigned char rtrb_rtag; /* Tag field. */ }; This code is included in 452. 12.2 Operations =============== Most of the operations on RTRB trees can be borrowed from the corresponding operations on TBSTs, RTBSTs, or RTAVL trees, as shown below. 455. = rtrb 252> rtrb 376>
rtrb 592> rtrb 395> rtrb; rtavl_balance => rtrb_color 447> rtrb 407> rtrb 6>
rtrb 594> This code is included in 453. 12.3 Insertion ============== Insertion is, as usual, one of the operations that must be newly implemented for our new type of tree. There is nothing surprising in the function's outline: 456. = void ** rtrb_probe (struct rtrb_table *tree, void *item) { struct rtrb_node *pa[RTRB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[RTRB_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct rtrb_node *p; /* Current node in search. */ struct rtrb_node *n; /* New node. */ int dir; /* Side of p on which n is located. */ assert (tree != NULL && item != NULL); return &n->rtrb_data; } This code is included in 455. 12.3.1 Steps 1 and 2: Search and Insert --------------------------------------- The process of search and insertion proceeds as usual. Stack pa[], with pa[k - 1] at top of stack, records the parents of the node p currently under consideration, with corresponding stack da[] indicating the direction moved. We use the standard code for insertion into an RTBST. When the loop exits, p is the node under which a new node should be inserted on side dir. 457. = da[0] = 0; pa[0] = (struct rtrb_node *) &tree->rtrb_root; k = 1; if (tree->rtrb_root != NULL) for (p = tree->rtrb_root; ; p = p->rtrb_link[dir]) { int cmp = tree->rtrb_compare (item, p->rtrb_data, tree->rtrb_param); if (cmp == 0) return &p->rtrb_data; pa[k] = p; da[k++] = dir = cmp > 0; if (dir == 0) { if (p->rtrb_link[0] == NULL) break; } else /* dir == 1 */ { if (p->rtrb_rtag == RTRB_THREAD) break; } } else { p = (struct rtrb_node *) &tree->rtrb_root; dir = 0; } This code is included in 456. 458.
= n = tree->rtrb_alloc->libavl_malloc (tree->rtrb_alloc, sizeof *n); if (n == NULL) return NULL; tree->rtrb_count++; n->rtrb_data = item; n->rtrb_link[0] = NULL; if (dir == 0) { if (tree->rtrb_root != NULL) n->rtrb_link[1] = p; else n->rtrb_link[1] = NULL; } else /* dir == 1 */ { p->rtrb_rtag = RTRB_CHILD; n->rtrb_link[1] = p->rtrb_link[1]; } n->rtrb_rtag = RTRB_THREAD; n->rtrb_color = RTRB_RED; p->rtrb_link[dir] = n; This code is included in 456. 12.3.2 Step 3: Rebalance ------------------------ The rebalancing outline follows . 459. = while (k >= 3 && pa[k - 1]->rtrb_color == RTRB_RED) { if (da[k - 2] == 0) { } else { } } tree->rtrb_root->rtrb_color = RTRB_BLACK; This code is included in 456. The choice of case for insertion on the left side is made in the same way as in , except that of course right-side tests for non-empty subtrees are made using rtrb_rtag instead of rtrb_link[1], and similarly for insertion on the right side. In short, we take q (which is not a real variable) as the new node n if this is the first time through the loop, or a node whose color has just been changed to red otherwise. We know that both q and its parent pa[k - 1] are red, violating rule 1 for red-black trees, and that q's grandparent pa[k - 2] is black. Here is the code to distinguish cases: 460. = struct rtrb_node *y = pa[k - 2]->rtrb_link[1]; if (pa[k - 2]->rtrb_rtag == RTRB_CHILD && y->rtrb_color == RTRB_RED) { } else { struct rtrb_node *x; if (da[k - 1] == 0) y = pa[k - 1]; else { } break; } This code is included in 459. 461. = struct rtrb_node *y = pa[k - 2]->rtrb_link[0]; if (pa[k - 2]->rtrb_link[0] != NULL && y->rtrb_color == RTRB_RED) { } else { struct rtrb_node *x; if (da[k - 1] == 1) y = pa[k - 1]; else { } break; } This code is included in 459. Case 1: q's uncle is red ........................ If node q's uncle is red, then no links need be changed. Instead, we will just recolor nodes. We reuse the code for RB insertion (*note rbinscase1::): 462. 
= rtrb 203> This code is included in 460. 463. = rtrb 207> This code is included in 461. Case 2: q is on same side of parent as parent is of grandparent ............................................................... If q is a left child of its parent y and y is a left child of its own parent x, or if both q and y are right children, then we rotate at x away from y. This is the same that we would do in an unthreaded RB tree (*note rbinscase2::). However, as usual, we must make sure that threads are fixed up properly in the rotation. In particular, for case 2 in left-side rebalancing, we must convert a right thread of y, after rotation, into a null left child pointer of x, like this: | pa[k-2],x | y ____....---' \ pa[k-1],y d _.-' \ => q x _.-' \ q [x] / \ \ a b d / \ a b 464. = rtrb 204> if (y->rtrb_rtag == RTRB_THREAD) { y->rtrb_rtag = RTRB_CHILD; x->rtrb_link[0] = NULL; } This code is included in 460. For the right-side rebalancing case, we must convert a null left child of y, after rotation, into a right thread of x: | pa[k-2] | y / \ a pa[k-1],y __..-' `_ => x q `_ q / \ / \ a [y] c d / \ c d 465. = rtrb 208> if (x->rtrb_link[1] == NULL) { x->rtrb_rtag = RTRB_THREAD; x->rtrb_link[1] = y; } This code is included in 461. Case 3: q is on opposite side of parent as parent is of grandparent ................................................................... If q is a left child and its parent is a right child, or vice versa, then we have an instance of case 3, and we rotate at q's parent in the direction from q to its parent. We handle this case as seen before for unthreaded RB trees (*note rbinscase3::), with the addition of fix-ups for threads during rotation. The left-side fix-up and the code to do it look like this: | | pa[k-2] _.-' \ _____....----' \ y d pa[k-1],x d => __..-' \ / \ x c a y,q / \ \ a [y] c 466. = rtrb 205> if (x->rtrb_link[1] == NULL) { x->rtrb_rtag = RTRB_THREAD; x->rtrb_link[1] = y; } This code is included in 460. 
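Both thread fix-ups shown in this section are instances of a single rule: after a rotation in a right-threaded tree, a node left without a right child must carry a thread to its in-order successor, and a thread that gains a real child must become a child pointer. Here is a sketch of a left rotation that obeys the rule, using invented names rather than libavl's rotation fragments:

```c
#include <assert.h>
#include <stddef.h>

enum sketch_tag { SKETCH_CHILD, SKETCH_THREAD };

/* A right-threaded node, for illustration only. */
struct rt_sketch_node
  {
    struct rt_sketch_node *link[2];
    unsigned char rtag;
    int data;
  };

/* Rotates left at y, whose right link must be a real child x.
   Returns x, the new subtree root.  If x had no left child, then y
   ends up with no right child, so y's right link becomes a thread to
   x, its in-order successor. */
static struct rt_sketch_node *
rotate_left (struct rt_sketch_node *y)
{
  struct rt_sketch_node *x = y->link[1];

  assert (y->rtag == SKETCH_CHILD);
  if (x->link[0] != NULL)
    y->link[1] = x->link[0];
  else
    {
      /* y->link[1] already points to x, so only the tag changes. */
      y->rtag = SKETCH_THREAD;
    }
  x->link[0] = y;
  return x;
}
```

The mirror-image right rotation applies the rule the other way around: a right thread of the left child must become a child pointer to the old subtree root, whose left link becomes null, as in the fragments above.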
Here's the right-side fix-up and code: | | pa[k-2] / `_ / `--...___ a y a pa[k-1],x => / \ __..-' \ b x q,y d \ / \ d b [x] 467. = rtrb 209> if (y->rtrb_rtag == RTRB_THREAD) { y->rtrb_rtag = RTRB_CHILD; x->rtrb_link[0] = NULL; } This code is included in 461. 12.4 Deletion ============= The process of deletion from an RTRB tree is the same that we've seen many times now. Code for the first step is borrowed from RTAVL deletion: 468. = void * rtrb_delete (struct rtrb_table *tree, const void *item) { struct rtrb_node *pa[RTRB_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[RTRB_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct rtrb_node *p; assert (tree != NULL && item != NULL); rtrb 430> } This code is included in 455. 12.4.1 Step 2: Delete --------------------- We use left-looking deletion. At this point, p is the node to delete. After the deletion, x is the node that replaced p, or a null pointer if the node was deleted without replacement. The cases are distinguished in the usual way: 469. = if (p->rtrb_link[0] == NULL) { if (p->rtrb_rtag == RTRB_CHILD) { } else { } } else { enum rtrb_color t; struct rtrb_node *r = p->rtrb_link[0]; if (r->rtrb_rtag == RTRB_THREAD) { } else { } } This code is included in 468. Case 1: p has a right child but no left child ............................................. If p, the node to be deleted, has a right child but no left child, then we replace it by its right child. This is the same as . 470. = rtrb 432> This code is included in 469. Case 2: p has a right thread and no left child .............................................. Similarly, case 2 is the same as , with the addition of an assignment to x. 471. = rtrb 433> This code is included in 469. Case 3: p's left child has a right thread ......................................... If p has a left child r, and r has a right thread, then we replace p by r and transfer p's former right link to r. Node r also receives p's color; the code actually exchanges the two nodes' colors, so that the later test of p's color in step 3 sees the color that was removed from the tree. 472.
= r->rtrb_link[1] = p->rtrb_link[1]; r->rtrb_rtag = p->rtrb_rtag; t = r->rtrb_color; r->rtrb_color = p->rtrb_color; p->rtrb_color = t; pa[k - 1]->rtrb_link[da[k - 1]] = r; da[k] = 0; pa[k++] = r; This code is included in 469. Case 4: p's left child has a right child ........................................ The fourth case, where p has a left child that itself has a right child, uses the same algorithm as , except that instead of setting the balance factor of s, we swap the colors of p and s as in . 473. = struct rtrb_node *s; int j = k++; for (;;) { da[k] = 1; pa[k++] = r; s = r->rtrb_link[1]; if (s->rtrb_rtag == RTRB_THREAD) break; r = s; } da[j] = 0; pa[j] = pa[j - 1]->rtrb_link[da[j - 1]] = s; if (s->rtrb_link[0] != NULL) r->rtrb_link[1] = s->rtrb_link[0]; else { r->rtrb_rtag = RTRB_THREAD; r->rtrb_link[1] = s; } s->rtrb_link[0] = p->rtrb_link[0]; s->rtrb_link[1] = p->rtrb_link[1]; s->rtrb_rtag = p->rtrb_rtag; t = s->rtrb_color; s->rtrb_color = p->rtrb_color; p->rtrb_color = t; This code is included in 469. 12.4.2 Step 3: Rebalance ------------------------ The rebalancing step's outline is much like that for deletion in a symmetrically threaded tree, except that we must check for a null child pointer on the left side of x versus a thread on the right side: 474. = if (p->rtrb_color == RTRB_BLACK) { for (; k > 1; k--) { struct rtrb_node *x; if (da[k - 1] == 0 || pa[k - 1]->rtrb_rtag == RTRB_CHILD) x = pa[k - 1]->rtrb_link[da[k - 1]]; else x = NULL; if (x != NULL && x->rtrb_color == RTRB_RED) { x->rtrb_color = RTRB_BLACK; break; } if (da[k - 1] == 0) { } else { } } if (tree->rtrb_root != NULL) tree->rtrb_root->rtrb_color = RTRB_BLACK; } This code is included in 468. As for RTRB insertion, rebalancing on either side of the root is not symmetric because the tree structure itself is not symmetric, but again the rebalancing steps are very similar. The outlines of the left-side and right-side rebalancing code are below.
The code for ensuring that w is black and for case 1 on each side are the same as the corresponding unthreaded RB code, because none of that code needs to check for empty trees: 475. = struct rtrb_node *w = pa[k - 1]->rtrb_link[1]; if (w->rtrb_color == RTRB_RED) { rtrb 228> } if ((w->rtrb_link[0] == NULL || w->rtrb_link[0]->rtrb_color == RTRB_BLACK) && (w->rtrb_rtag == RTRB_THREAD || w->rtrb_link[1]->rtrb_color == RTRB_BLACK)) { rtrb 229> } else { if (w->rtrb_rtag == RTRB_THREAD || w->rtrb_link[1]->rtrb_color == RTRB_BLACK) { } break; } This code is included in 474. 476. = struct rtrb_node *w = pa[k - 1]->rtrb_link[0]; if (w->rtrb_color == RTRB_RED) { rtrb 234> } if ((w->rtrb_link[0] == NULL || w->rtrb_link[0]->rtrb_color == RTRB_BLACK) && (w->rtrb_rtag == RTRB_THREAD || w->rtrb_link[1]->rtrb_color == RTRB_BLACK)) { rtrb 235> } else { if (w->rtrb_link[0] == NULL || w->rtrb_link[0]->rtrb_color == RTRB_BLACK) { } break; } This code is included in 474. Case 2: w's child opposite the deletion is red .............................................. If the deletion was on the left side of w and w's right child is red, we rotate left at pa[k - 1] and perform some recolorings, as we did for unthreaded RB trees (*note rbdelcase2::). There is a special case when w has no left child. This must be transformed into a thread leading to w following the rotation: | | pa[k-1],B w,C _.-' \ __..-' `_ x,A w,C B D => / \ `_ _.-' \ / \ a b D x,A [C] d e / \ / \ d e a b 477. = rtrb 230> if (w->rtrb_link[0]->rtrb_link[1] == NULL) { w->rtrb_link[0]->rtrb_rtag = RTRB_THREAD; w->rtrb_link[0]->rtrb_link[1] = w; } This code is included in 475. Alternately, if the deletion was on the right side of w and w's left child is red, we rotate right at pa[k - 1] and recolor. There is an analogous special case: | | pa[k-1],C w,B __..-' `_ _.-' \ w,B x,D A C => _.-' \ / \ / \ `_ A [C] d e a b x,D / \ / \ a b d e 478.
= rtrb 237> if (w->rtrb_rtag == RTRB_THREAD) { w->rtrb_rtag = RTRB_CHILD; pa[k - 1]->rtrb_link[0] = NULL; } This code is included in 476. Case 3: w's child on the side of the deletion is red .................................................... If the deletion was on the left side of w and w's left child is red, then we rotate right at w and recolor, as in case 3 for unthreaded RB trees (*note rbdelcase3::). There is a special case when w's left child has a right thread. This must be transformed into a null left child of w's right child following the rotation: | | pa[k-1],B pa[k-1],B _.-' `--...___ _.-' `_ x,A w,D x,A w,C => / \ __..-' \ / \ / \ a b C e a b c D / \ \ c [D] e 479. = rtrb 231> if (w->rtrb_rtag == RTRB_THREAD) { w->rtrb_rtag = RTRB_CHILD; w->rtrb_link[1]->rtrb_link[0] = NULL; } This code is included in 475. Alternately, if the deletion was on the right side of w and w's right child is red, we rotate left at w and recolor. There is an analogous special case: | | pa[k-1],C pa[k-1],C ___..--' `_ _.-' `_ w,A x,D w,B x,D => / \ / \ __..-' \ / \ a B d e A c d e \ / \ c a [B] 480. = rtrb 236> if (w->rtrb_link[0]->rtrb_link[1] == NULL) { w->rtrb_link[0]->rtrb_rtag = RTRB_THREAD; w->rtrb_link[0]->rtrb_link[1] = w; } This code is included in 476. 12.4.3 Step 4: Finish Up ------------------------ 481. = tree->rtrb_alloc->libavl_free (tree->rtrb_alloc, p); return (void *) item; This code is included in 468. 12.5 Testing ============ 482. = #include #include #include #include "rtrb.h" #include "test.h" rtrb 412> rtrb 104> rtrb 244> rtrb 100> rtrb 122> 483. = static int compare_trees (struct rtrb_node *a, struct rtrb_node *b) { int okay; if (a == NULL || b == NULL) { if (a != NULL || b != NULL) { printf (" a=%d b=%d\n", a ? *(int *) a->rtrb_data : -1, b ? 
*(int *) b->rtrb_data : -1); assert (0); } return 1; } assert (a != b); if (*(int *) a->rtrb_data != *(int *) b->rtrb_data || a->rtrb_rtag != b->rtrb_rtag || a->rtrb_color != b->rtrb_color) { printf (" Copied nodes differ: a=%d%c b=%d%c a:", *(int *) a->rtrb_data, a->rtrb_color == RTRB_RED ? 'r' : 'b', *(int *) b->rtrb_data, b->rtrb_color == RTRB_RED ? 'r' : 'b'); if (a->rtrb_rtag == RTRB_CHILD) printf ("r"); printf (" b:"); if (b->rtrb_rtag == RTRB_CHILD) printf ("r"); printf ("\n"); return 0; } if (a->rtrb_rtag == RTRB_THREAD) assert ((a->rtrb_link[1] == NULL) != (a->rtrb_link[1] != b->rtrb_link[1])); okay = compare_trees (a->rtrb_link[0], b->rtrb_link[0]); if (a->rtrb_rtag == RTRB_CHILD) okay &= compare_trees (a->rtrb_link[1], b->rtrb_link[1]); return okay; } This code is included in 482. 484. = static void recurse_verify_tree (struct rtrb_node *node, int *okay, size_t *count, int min, int max, int *bh) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subbh[2]; /* Black-heights of subtrees. */ if (node == NULL) { *count = 0; *bh = 0; return; } d = *(int *) node->rtrb_data; subcount[0] = subcount[1] = 0; subbh[0] = subbh[1] = 0; recurse_verify_tree (node->rtrb_link[0], okay, &subcount[0], min, d - 1, &subbh[0]); if (node->rtrb_rtag == RTRB_CHILD) recurse_verify_tree (node->rtrb_link[1], okay, &subcount[1], d + 1, max, &subbh[1]); *count = 1 + subcount[0] + subcount[1]; *bh = (node->rtrb_color == RTRB_BLACK) + subbh[0]; rtrb 241> rtrb 243> } This code is included in 482. 485. = /* Verify compliance with rule 1. 
*/ if (node->rtrb_color == RTRB_RED) { if (node->rtrb_link[0] != NULL && node->rtrb_link[0]->rtrb_color == RTRB_RED) { printf (" Red node %d has red left child %d\n", d, *(int *) node->rtrb_link[0]->rtrb_data); *okay = 0; } if (node->rtrb_rtag == RTRB_CHILD && node->rtrb_link[1]->rtrb_color == RTRB_RED) { printf (" Red node %d has red right child %d\n", d, *(int *) node->rtrb_link[1]->rtrb_data); *okay = 0; } } This code is included in 484. 13 BSTs with Parent Pointers **************************** The preceding six chapters introduced two different forms of threaded trees, which simplified traversal by eliminating the need for a stack. There is another way to accomplish the same purpose: add to each node a "parent pointer", a link from the node to its parent. A binary search tree so augmented is called a BST with parent pointers, or PBST for short.(1) In this chapter, we show how to add parent pointers to binary trees. The next two chapters will add them to AVL trees and red-black trees. Parent pointers and threads have equivalent power. That is, given a node within a threaded tree, we can find the node's parent, and given a node within a tree with parent pointers, we can determine the targets of any threads that the node would have in a similar threaded tree. Parent pointers have some advantages over threads. In particular, parent pointers let us more efficiently eliminate the stack for insertion and deletion in balanced trees. Rebalancing during these operations requires us to locate the parents of nodes. In our implementations of threaded balanced trees, we wrote code to do this, but it took a relatively complicated and slow helper function. Parent pointers make it much faster and easier. It is also easier to search a tree with parent pointers than a threaded tree, because there is no need to check tags. Outside of purely technical issues, many people find the use of parent pointers more intuitive than threads. 
On the other hand, to traverse a tree with parent pointers in inorder we may have to follow several parent pointers instead of a single thread. What's more, parent pointers take extra space for a third pointer field in every node, whereas the tag fields in threaded balanced trees often fit into node structures without taking up additional room (see Exercise 8.1-1). Finally, maintaining parent pointers on insertion and deletion takes time. In fact, we'll see that it takes more operations (and thus, all else being equal, time) than maintaining threads. In conclusion, a general comparison of parent pointers with threads reveals no clear winner. Further discussion of the merits of parent pointers versus those of threads will be postponed until later in this book. For now, we'll stick to the problems of parent pointer implementation. Here's the outline of the PBST code. We're using the prefix pbst_ this time: 486. = #ifndef PBST_H #define PBST_H 1 #include <stddef.h>
pbst 14> pbst 250> pbst 267>
pbst 15> pbst 88> #endif /* pbst.h */ 487. = #include #include #include #include "pbst.h" ---------- Footnotes ---------- (1) This abbreviation might be thought of as expanding to "parented BST" or "parental BST", but those are not proper terms. 13.1 Data Types =============== For PBSTs we reuse TBST table and traverser structures. In fact, the only data type that needs revision is the node structure. We take the basic form of a node and add a member pbst_parent to point to its parent node: 488. = /* A binary search tree with parent pointers node. */ struct pbst_node { struct pbst_node *pbst_link[2]; /* Subtrees. */ struct pbst_node *pbst_parent; /* Parent. */ void *pbst_data; /* Pointer to data. */ }; This code is included in 486. There is one special case: what should be the value of pbst_parent for a node that has no parent, that is, in the tree's root? There are two reasonable choices. First, pbst_parent could be NULL in the root. This makes it easy to check whether a node is the tree's root. On the other hand, we often follow a parent pointer in order to change the link down from the parent, and NULL as the root node's pbst_parent requires a special case. We can eliminate this special case if the root's pbst_parent is the tree's pseudo-root node, that is, (struct pbst_node *) &tree->pbst_root. The downside of this choice is that it becomes uglier, and perhaps slower, to check whether a node is the tree's root, because a comparison must be made against a non-constant expression instead of simply NULL. In this book, we make the former choice, so pbst_parent is NULL in the tree's root node. See also: [Cormen 1990], section 11.4. 13.2 Operations =============== When we added parent pointers to BST nodes, we did not change the interpretation of any of the node members. This means that any function that examines PBSTs without modifying them will work without change. We take advantage of that for tree search. 
We also get away with it for destruction, since there's no problem with failing to update parent pointers in that case. Although we could, technically, do the same for traversal, that would negate much of the advantage of parent pointers, so we reimplement them. Here is the overall outline: 489. = pbst 252> pbst 31>
pbst 592> pbst 84> pbst 6>
pbst 594> This code is included in 487. 13.3 Insertion ============== The only difference between this code and is that we set n's parent pointer after insertion. 490. = void ** pbst_probe (struct pbst_table *tree, void *item) { struct pbst_node *p, *q; /* Current node in search and its parent. */ int dir; /* Side of q on which p is located. */ struct pbst_node *n; /* Newly inserted node. */ assert (tree != NULL && item != NULL); return &n->pbst_data; } This code is included in 489. 491. = for (q = NULL, p = tree->pbst_root; p != NULL; q = p, p = p->pbst_link[dir]) { int cmp = tree->pbst_compare (item, p->pbst_data, tree->pbst_param); if (cmp == 0) return &p->pbst_data; dir = cmp > 0; } This code is included in 490 and 555. 492. = n = tree->pbst_alloc->libavl_malloc (tree->pbst_alloc, sizeof *p); if (n == NULL) return NULL; tree->pbst_count++; n->pbst_link[0] = n->pbst_link[1] = NULL; n->pbst_parent = q; n->pbst_data = item; if (q != NULL) q->pbst_link[dir] = n; else tree->pbst_root = n; This code is included in 490, 525, and 556. See also: [Cormen 1990], section 13.3. 13.4 Deletion ============= The new aspect of deletion in a PBST is that we must properly adjust parent pointers. The outline is the same as usual: 493. = void * pbst_delete (struct pbst_table *tree, const void *item) { struct pbst_node *p; /* Traverses tree to find node to delete. */ struct pbst_node *q; /* Parent of p. */ int dir; /* Side of q on which p is linked. */ assert (tree != NULL && item != NULL); } This code is included in 489. We find the node to delete by using p to search for item. For the first time in implementing a deletion routine, we do not keep track of the current node's parent, because we can always find it out later with little effort: 494. 
= if (tree->pbst_root == NULL) return NULL; p = tree->pbst_root; for (;;) { int cmp = tree->pbst_compare (item, p->pbst_data, tree->pbst_param); if (cmp == 0) break; dir = cmp > 0; p = p->pbst_link[dir]; if (p == NULL) return NULL; } item = p->pbst_data; See also 495. This code is included in 493, 534, and 566. Now we've found the node to delete, p. The first step in deletion is to find the parent of p as q. Node p is q's child on side dir. Deletion of the root is a special case: 495. += q = p->pbst_parent; if (q == NULL) { q = (struct pbst_node *) &tree->pbst_root; dir = 0; } The remainder of the deletion follows the usual outline: 496. = if (p->pbst_link[1] == NULL) { } else { struct pbst_node *r = p->pbst_link[1]; if (r->pbst_link[0] == NULL) { } else { } } This code is included in 493. Case 1: p has no right child ............................ If p has no right child, then we can replace it by its left child, if any. If p does have a left child then we must update its parent to be p's former parent. 497. = q->pbst_link[dir] = p->pbst_link[0]; if (q->pbst_link[dir] != NULL) q->pbst_link[dir]->pbst_parent = p->pbst_parent; This code is included in 496, 536, and 568. Case 2: p's right child has no left child ......................................... When we delete a node with a right child that in turn has no left child, the operation looks like this: | | p r / \ ^ a r => a b \ b The key points to notice are that node r's parent changes and so does the parent of r's new left child, if there is one. We update these in deletion: 498. = r->pbst_link[0] = p->pbst_link[0]; q->pbst_link[dir] = r; r->pbst_parent = p->pbst_parent; if (r->pbst_link[0] != NULL) r->pbst_link[0]->pbst_parent = r; This code is included in 496, 537, and 569. Case 3: p's right child has a left child ........................................ If p's right child has a left child, then we replace p by its successor, as usual. 
Finding the successor s and its parent r is a little simpler than usual, because we can move up the tree so easily. We know that s has a non-null parent so there is no need to handle that special case:

499. =
struct pbst_node *s = r->pbst_link[0];
while (s->pbst_link[0] != NULL)
  s = s->pbst_link[0];
r = s->pbst_parent;
See also 500. This code is included in 496, 538, and 570.

The only other change here is that we must update parent pointers. It is easy to pick out the ones that must be changed by looking at a diagram of the deletion:

      |                  |
      p                  s
     / \                / \
    a   x              a   x
       /                  /
     ...      =>        ...
     /                  /
    d                  d
   /                  /
  r                  r
 / \                / \
s   c              b   c
 \
  b

Node s's parent changes, as do the parents of its new right child x and, if it has one, its left child a. Perhaps less obviously, if s originally had a right child, it becomes the new left child of r, so its new parent is r:

500. +=
r->pbst_link[0] = s->pbst_link[1];
s->pbst_link[0] = p->pbst_link[0];
s->pbst_link[1] = p->pbst_link[1];
q->pbst_link[dir] = s;
if (s->pbst_link[0] != NULL)
  s->pbst_link[0]->pbst_parent = s;
s->pbst_link[1]->pbst_parent = s;
s->pbst_parent = p->pbst_parent;
if (r->pbst_link[0] != NULL)
  r->pbst_link[0]->pbst_parent = r;

Finally, we free the deleted node p and return its data:

501. =
tree->pbst_alloc->libavl_free (tree->pbst_alloc, p);
tree->pbst_count--;
return (void *) item;
This code is included in 493.

See also: [Cormen 1990], section 13.3.

Exercises:
1. In case 1, can we change the right side of the assignment in the if statement's consequent from p->pbst_parent to q?

13.5 Traversal
==============

The traverser for a PBST is just like that for a TBST, so we can reuse a couple of the TBST functions. Besides that and a couple of completely generic functions, we have to reimplement the traversal functions.

502. = pbst 269> pbst 274> pbst 74> pbst 75> This code is included in 489.
13.5.1 Starting at the First Node --------------------------------- Finding the smallest node in the tree is just a matter of starting from the root and descending as far to the left as we can. 503. = void * pbst_t_first (struct pbst_traverser *trav, struct pbst_table *tree) { assert (tree != NULL && trav != NULL); trav->pbst_table = tree; trav->pbst_node = tree->pbst_root; if (trav->pbst_node != NULL) { while (trav->pbst_node->pbst_link[0] != NULL) trav->pbst_node = trav->pbst_node->pbst_link[0]; return trav->pbst_node->pbst_data; } else return NULL; } This code is included in 502 and 546. 13.5.2 Starting at the Last Node -------------------------------- This is the same as starting from the least item, except that we descend to the right. 504. = void * pbst_t_last (struct pbst_traverser *trav, struct pbst_table *tree) { assert (tree != NULL && trav != NULL); trav->pbst_table = tree; trav->pbst_node = tree->pbst_root; if (trav->pbst_node != NULL) { while (trav->pbst_node->pbst_link[1] != NULL) trav->pbst_node = trav->pbst_node->pbst_link[1]; return trav->pbst_node->pbst_data; } else return NULL; } This code is included in 502 and 546. 13.5.3 Starting at a Found Node ------------------------------- To start from a particular item, we search for it in the tree. If it exists then we initialize the traverser to it. Otherwise, we initialize the traverser to the null item and return a null pointer. There are no surprises here. 505. = void * pbst_t_find (struct pbst_traverser *trav, struct pbst_table *tree, void *item) { struct pbst_node *p; int dir; assert (trav != NULL && tree != NULL && item != NULL); trav->pbst_table = tree; for (p = tree->pbst_root; p != NULL; p = p->pbst_link[dir]) { int cmp = tree->pbst_compare (item, p->pbst_data, tree->pbst_param); if (cmp == 0) { trav->pbst_node = p; return p->pbst_data; } dir = cmp > 0; } trav->pbst_node = NULL; return NULL; } This code is included in 502 and 546. 
13.5.4 Starting at an Inserted Node
-----------------------------------

This function combines the functionality of search and insertion with initialization of a traverser.

506. =
void *
pbst_t_insert (struct pbst_traverser *trav, struct pbst_table *tree, void *item)
{
  struct pbst_node *p, *q; /* Current node in search and its parent. */
  int dir;                 /* Side of q on which p is located. */
  struct pbst_node *n;     /* Newly inserted node. */

  assert (trav != NULL && tree != NULL && item != NULL);

  trav->pbst_table = tree;
  for (q = NULL, p = tree->pbst_root; p != NULL; q = p, p = p->pbst_link[dir])
    {
      int cmp = tree->pbst_compare (item, p->pbst_data, tree->pbst_param);
      if (cmp == 0)
        {
          trav->pbst_node = p;
          return p->pbst_data;
        }
      dir = cmp > 0;
    }

  trav->pbst_node = n = tree->pbst_alloc->libavl_malloc (tree->pbst_alloc, sizeof *p);
  if (n == NULL)
    return NULL;

  tree->pbst_count++;
  n->pbst_link[0] = n->pbst_link[1] = NULL;
  n->pbst_parent = q;
  n->pbst_data = item;
  if (q != NULL)
    q->pbst_link[dir] = n;
  else
    tree->pbst_root = n;

  return item;
}
This code is included in 502.

13.5.5 Advancing to the Next Node
---------------------------------

There are the same three cases for advancing a traverser as for the other types of binary trees that we've already looked at. Two of the cases, the ones where we're starting from the null item or a node that has a right child, are unchanged. The third case, where the node that we're starting from has no right child, is the one that must be revised.

We can use the same algorithm that we did for ordinary BSTs without threads or parent pointers, described earlier (*note Better Iterative Traversal::). Simply put, we move upward in the tree until we move up to the right (or until we move off the top of the tree). The code uses q to move up the tree and p as q's child, so the termination condition is when p is q's left child or q becomes a null pointer. There is a non-null successor in the former case, where the situation looks like this:

      |
      q
     / \
    p   c
   / \
  a   b

507.
= void * pbst_t_next (struct pbst_traverser *trav) { assert (trav != NULL); if (trav->pbst_node == NULL) return pbst_t_first (trav, trav->pbst_table); else if (trav->pbst_node->pbst_link[1] == NULL) { struct pbst_node *q, *p; /* Current node and its child. */ for (p = trav->pbst_node, q = p->pbst_parent; ; p = q, q = q->pbst_parent) if (q == NULL || p == q->pbst_link[0]) { trav->pbst_node = q; return trav->pbst_node != NULL ? trav->pbst_node->pbst_data : NULL; } } else { trav->pbst_node = trav->pbst_node->pbst_link[1]; while (trav->pbst_node->pbst_link[0] != NULL) trav->pbst_node = trav->pbst_node->pbst_link[0]; return trav->pbst_node->pbst_data; } } This code is included in 502 and 546. See also: [Cormen 1990], section 13.2. 13.5.6 Backing Up to the Previous Node -------------------------------------- This is the same as advancing a traverser, except that we reverse the directions. 508. = void * pbst_t_prev (struct pbst_traverser *trav) { assert (trav != NULL); if (trav->pbst_node == NULL) return pbst_t_last (trav, trav->pbst_table); else if (trav->pbst_node->pbst_link[0] == NULL) { struct pbst_node *q, *p; /* Current node and its child. */ for (p = trav->pbst_node, q = p->pbst_parent; ; p = q, q = q->pbst_parent) if (q == NULL || p == q->pbst_link[1]) { trav->pbst_node = q; return trav->pbst_node != NULL ? trav->pbst_node->pbst_data : NULL; } } else { trav->pbst_node = trav->pbst_node->pbst_link[0]; while (trav->pbst_node->pbst_link[1] != NULL) trav->pbst_node = trav->pbst_node->pbst_link[1]; return trav->pbst_node->pbst_data; } } This code is included in 502 and 546. See also: [Cormen 1990], section 13.2. 13.6 Copying ============ To copy BSTs with parent pointers, we use a simple adaptation of our original algorithm for copying BSTs, as implemented in . That function used a stack to keep track of the nodes that need to be revisited to have their right subtrees copies. We can eliminate that by using the parent pointers. 
Instead of popping a pair of nodes off the stack, we ascend the tree until we moved up to the left: 509. = struct pbst_table * pbst_copy (const struct pbst_table *org, pbst_copy_func *copy, pbst_item_func *destroy, struct libavl_allocator *allocator) { struct pbst_table *new; const struct pbst_node *x; struct pbst_node *y; assert (org != NULL); new = pbst_create (org->pbst_compare, org->pbst_param, allocator != NULL ? allocator : org->pbst_alloc); if (new == NULL) return NULL; new->pbst_count = org->pbst_count; if (new->pbst_count == 0) return new; x = (const struct pbst_node *) &org->pbst_root; y = (struct pbst_node *) &new->pbst_root; for (;;) { while (x->pbst_link[0] != NULL) { y->pbst_link[0] = new->pbst_alloc->libavl_malloc (new->pbst_alloc, sizeof *y->pbst_link[0]); if (y->pbst_link[0] == NULL) { if (y != (struct pbst_node *) &new->pbst_root) { y->pbst_data = NULL; y->pbst_link[1] = NULL; } copy_error_recovery (y, new, destroy); return NULL; } y->pbst_link[0]->pbst_parent = y; x = x->pbst_link[0]; y = y->pbst_link[0]; } y->pbst_link[0] = NULL; for (;;) { if (copy == NULL) y->pbst_data = x->pbst_data; else { y->pbst_data = copy (x->pbst_data, org->pbst_param); if (y->pbst_data == NULL) { y->pbst_link[1] = NULL; copy_error_recovery (y, new, destroy); return NULL; } } if (x->pbst_link[1] != NULL) { y->pbst_link[1] = new->pbst_alloc->libavl_malloc (new->pbst_alloc, sizeof *y->pbst_link[1]); if (y->pbst_link[1] == NULL) { copy_error_recovery (y, new, destroy); return NULL; } y->pbst_link[1]->pbst_parent = y; x = x->pbst_link[1]; y = y->pbst_link[1]; break; } else y->pbst_link[1] = NULL; for (;;) { const struct pbst_node *w = x; x = x->pbst_parent; if (x == NULL) { new->pbst_root->pbst_parent = NULL; return new; } y = y->pbst_parent; if (w == x->pbst_link[0]) break; } } } } This code is included in 489. Recovering from an error changes in the same way. 
We ascend from the node where we were copying when memory ran out and set the right children of the nodes where we ascended to the right to null pointers, then destroy the fixed-up tree:

510. =
static void
copy_error_recovery (struct pbst_node *q, struct pbst_table *new, pbst_item_func *destroy)
{
  assert (q != NULL && new != NULL);
  for (;;)
    {
      struct pbst_node *p = q;
      q = q->pbst_parent;
      if (q == NULL)
        break;

      if (p == q->pbst_link[0])
        q->pbst_link[1] = NULL;
    }

  pbst_destroy (new, destroy);
}
This code is included in 509 and 547.

13.7 Balance
============

We can balance a PBST in the same way that we would balance a BST without parent pointers. In fact, we'll use the same code, the only change being to omit the maximum height check. This code doesn't set parent pointers, so afterward we traverse the tree to take care of that.

Here are the pieces of the core code that need to be repeated:

511. =
pbst 89>
void
pbst_balance (struct pbst_table *tree)
{
  assert (tree != NULL);
  tree_to_vine (tree);
  vine_to_tree (tree);
  update_parents (tree);
}
This code is included in 489.

512. =
pbst 95>
static void
vine_to_tree (struct pbst_table *tree)
{
  unsigned long vine;   /* Number of nodes in main vine. */
  unsigned long leaves; /* Nodes in incomplete bottom level, if any. */
  int height;           /* Height of produced balanced tree. */

  pbst 91>
  pbst 92>
  pbst 93>
}
This code is included in 511.

513. =
/* Special PBST functions. */
void pbst_balance (struct pbst_table *tree);

Updating Parent Pointers
........................

The procedure for rebalancing a binary tree leaves the nodes' parent pointers pointing every which way. Now we'll fix them. Incidentally, this is a general procedure, so the same code could be used in other situations where we have a tree to which we want to add parent pointers.

The procedure takes the same form as an inorder traversal, except that there is nothing to do in the place where we would normally visit the node.
Instead, every time we move down to the left or the right, we set the parent pointer of the node we move to. The code is straightforward enough. The basic strategy is to always move down to the left when possible; otherwise, move down to the right if possible; otherwise, repeatedly move up until we've moved up to the left to arrive at a node with a right child, then move to that right child. 514. = static void update_parents (struct pbst_table *tree) { struct pbst_node *p; if (tree->pbst_root == NULL) return; tree->pbst_root->pbst_parent = NULL; for (p = tree->pbst_root; ; p = p->pbst_link[1]) { for (; p->pbst_link[0] != NULL; p = p->pbst_link[0]) p->pbst_link[0]->pbst_parent = p; for (; p->pbst_link[1] == NULL; p = p->pbst_parent) { for (;;) { if (p->pbst_parent == NULL) return; if (p == p->pbst_parent->pbst_link[0]) break; p = p->pbst_parent; } } p->pbst_link[1]->pbst_parent = p; } } This code is included in 511. Exercises: 1. There is another approach to updating parent pointers: we can do it during the compressions. Implement this approach. Make sure not to miss any pointers. 13.8 Testing ============ 515. = #include #include #include #include "pbst.h" #include "test.h" pbst 119> pbst 104> pbst 109> pbst 295> pbst 122> 516. = static int compare_trees (struct pbst_node *a, struct pbst_node *b) { int okay; if (a == NULL || b == NULL) { assert (a == NULL && b == NULL); return 1; } if (*(int *) a->pbst_data != *(int *) b->pbst_data || ((a->pbst_link[0] != NULL) != (b->pbst_link[0] != NULL)) || ((a->pbst_link[1] != NULL) != (b->pbst_link[1] != NULL)) || ((a->pbst_parent != NULL) != (b->pbst_parent != NULL)) || (a->pbst_parent != NULL && b->pbst_parent != NULL && a->pbst_parent->pbst_data != b->pbst_parent->pbst_data)) { printf (" Copied nodes differ:\n" " a: %d, parent %d, %s left child, %s right child\n" " b: %d, parent %d, %s left child, %s right child\n", *(int *) a->pbst_data, a->pbst_parent != NULL ? *(int *) a->pbst_parent : -1, a->pbst_link[0] != NULL ? 
"has" : "no", a->pbst_link[1] != NULL ? "has" : "no", *(int *) b->pbst_data, b->pbst_parent != NULL ? *(int *) b->pbst_parent : -1, b->pbst_link[0] != NULL ? "has" : "no", b->pbst_link[1] != NULL ? "has" : "no"); return 0; } okay = 1; if (a->pbst_link[0] != NULL) okay &= compare_trees (a->pbst_link[0], b->pbst_link[0]); if (a->pbst_link[1] != NULL) okay &= compare_trees (a->pbst_link[1], b->pbst_link[1]); return okay; } This code is included in 515. 517. = static void recurse_verify_tree (struct pbst_node *node, int *okay, size_t *count, int min, int max) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int i; if (node == NULL) { *count = 0; return; } d = *(int *) node->pbst_data; recurse_verify_tree (node->pbst_link[0], okay, &subcount[0], min, d - 1); recurse_verify_tree (node->pbst_link[1], okay, &subcount[1], d + 1, max); *count = 1 + subcount[0] + subcount[1]; } This code is included in 515. 518. = for (i = 0; i < 2; i++) { if (node->pbst_link[i] != NULL && node->pbst_link[i]->pbst_parent != node) { printf (" Node %d has parent %d (should be %d).\n", *(int *) node->pbst_link[i]->pbst_data, (node->pbst_link[i]->pbst_parent != NULL ? *(int *) node->pbst_link[i]->pbst_parent->pbst_data : -1), d); *okay = 0; } } This code is included in 517, 550, and 585. 14 AVL Trees with Parent Pointers ********************************* This chapter adds parent pointers to AVL trees. The result is a data structure that combines the strengths of AVL trees and trees with parent pointers. Of course, there's no free lunch: it combines their disadvantages, too. The abbreviation we'll use for the term "AVL tree with parent pointers" is "PAVL tree", with corresponding prefix pavl_. Here's the outline for the PAVL table implementation: 519. = #ifndef PAVL_H #define PAVL_H 1 #include
pavl 14> pavl 28> pavl 250> pavl 267>
pavl 15> #endif /* pavl.h */

520. =
#include #include #include #include "pavl.h"

14.1 Data Types
===============

A PAVL tree node has a parent pointer and an AVL balance field in addition to the usual members needed for any binary search tree:

521. =
/* A PAVL tree node. */
struct pavl_node
  {
    struct pavl_node *pavl_link[2]; /* Subtrees. */
    struct pavl_node *pavl_parent;  /* Parent node. */
    void *pavl_data;                /* Pointer to data. */
    signed char pavl_balance;       /* Balance factor. */
  };
This code is included in 519.

The other data structures are the same as the corresponding ones for TBSTs.

14.2 Rotations
==============

Let's consider how rotations work in PBSTs. Here's the usual illustration of a rotation:

      |                |
      Y                X
     / \              / \
    X   c    <=>     a   Y
   / \                  / \
  a   b                b   c

As we move from the left side to the right side, rotating right at Y, the parents of up to three nodes change. In any case, Y's former parent becomes X's new parent and X becomes Y's new parent. In addition, if b is not an empty subtree, then the parent of subtree b's root node becomes Y. Moving from right to left, the situation is reversed.

See also: [Cormen 1990], section 14.2.

Exercises:
1. Write functions for right and left rotations in BSTs with parent pointers, analogous to those for plain BSTs developed in Exercise 4.3-2.

14.3 Operations
===============

As usual, we must reimplement the item insertion and deletion functions. The tree copy function and some of the traversal functions also need to be rewritten.

522. = pavl 252> pavl 31>
pavl 592> pavl 84> pavl 6>
pavl 594> This code is included in 520.

14.4 Insertion
==============

The same basic algorithm has been used for insertion in all of our AVL tree variants so far. (In fact, all three functions share the same set of local variables.) For PAVL trees, we will slightly modify our approach. In particular, until now we have cached comparison results on the way down in order to quickly adjust balance factors after the insertion. Parent pointers let us avoid this caching but still efficiently update balance factors.

Before we look closer, here is the function's outline:

523. =
void **
pavl_probe (struct pavl_table *tree, void *item)
{
  struct pavl_node *y;     /* Top node to update balance factor, and parent. */
  struct pavl_node *p, *q; /* Iterator, and parent. */
  struct pavl_node *n;     /* Newly inserted node. */
  struct pavl_node *w;     /* New root of rebalanced subtree. */
  int dir;                 /* Direction to descend. */

  assert (tree != NULL && item != NULL);

}
This code is included in 522.

14.4.1 Steps 1 and 2: Search and Insert
---------------------------------------

We search much as before. Despite use of the parent pointers, we preserve the use of q as the parent of p because the termination condition is a value of NULL for p, and NULL has no parent. (Thus, q is not, strictly speaking, always p's parent, but rather the last node examined before p.) Because of parent pointers, there is no need for variable z, used in earlier implementations of AVL insertion to maintain y's parent.

524. =
y = tree->pavl_root;
for (q = NULL, p = tree->pavl_root; p != NULL; q = p, p = p->pavl_link[dir])
  {
    int cmp = tree->pavl_compare (item, p->pavl_data, tree->pavl_param);
    if (cmp == 0)
      return &p->pavl_data;
    dir = cmp > 0;

    if (p->pavl_balance != 0)
      y = p;
  }
This code is included in 523.

The code to create and insert the new node is based on that for PBSTs. There is a special case for a node inserted into an empty tree:

525.
= pavl 492>
n->pavl_balance = 0;
if (tree->pavl_root == n)
  return &n->pavl_data;
This code is included in 523.

14.4.2 Step 3: Update Balance Factors
-------------------------------------

Until now, in step 3 of insertion into AVL trees we've always updated balance factors from the top down, starting at y and working our way down to n (see, e.g., the corresponding step in plain AVL insertion). This approach was somewhat unnatural, but it worked. The original reason we did it this way was that it was either impossible, as for AVL and RTAVL trees, or slow, as for TAVL trees, to efficiently move upward in a tree. That's not a consideration anymore, so we can do it from the bottom up and in the process eliminate the cache used before.

At each step, we need to know the node to update and, for that node, on which side of its parent it is a child. In the code below, q is the node and dir is the side.

526. =
for (p = n; p != y; p = q)
  {
    q = p->pavl_parent;
    dir = q->pavl_link[0] != p;
    if (dir == 0)
      q->pavl_balance--;
    else
      q->pavl_balance++;
  }
This code is included in 523.

Exercises:
1. Does this step 3 update the same set of balance factors as would a literal adaptation of the top-down approach?
2. Would it be acceptable to substitute q->pavl_link[1] == p for q->pavl_link[0] != p in the code segment above?

14.4.3 Step 4: Rebalance
------------------------

The changes needed to the rebalancing code for parent pointers resemble the changes for threads in that we can reuse most of the code from plain AVL trees. We just need to add a few new statements to each rebalancing case to adjust the parent pointers of nodes whose parents have changed.

The outline of the rebalancing code should be familiar by now. The code to update the link to the root of the rebalanced subtree is the only change. It needs a special case for the root, because the parent pointer of the root node is a null pointer, not the pseudo-root node. The other choice would simplify this piece of code, but complicate other pieces (*note PBST Data Types::).

527.
= if (y->pavl_balance == -2) { } else if (y->pavl_balance == +2) { } else return &n->pavl_data; if (w->pavl_parent != NULL) w->pavl_parent->pavl_link[y != w->pavl_parent->pavl_link[0]] = w; else tree->pavl_root = w; return &n->pavl_data; This code is included in 523. As usual, the cases for rebalancing are distinguished based on the balance factor of the child of the unbalanced node on its taller side: 528. = struct pavl_node *x = y->pavl_link[0]; if (x->pavl_balance == -1) { } else { } This code is included in 527. Case 1: x has - balance factor .............................. The added code here is exactly the same as that added to BST rotation to handle parent pointers (in Exercise 14.2-1), and for good reason since this case simply performs a right rotation in the PAVL tree. 529. = pavl 155> x->pavl_parent = y->pavl_parent; y->pavl_parent = x; if (y->pavl_link[0] != NULL) y->pavl_link[0]->pavl_parent = y; This code is included in 528. Case 2: x has + balance factor .............................. When x has a + balance factor, we need a double rotation, composed of a right rotation at x followed by a left rotation at y. The diagram below show the effect of each of the rotations: | | y y | <--> <--> w __.-' \ _' \ <0> x d w d => / \ <+> => / \ x y / \ x c ^ ^ a w ^ a b c d ^ a b b c Along with this double rotation comes a small bulk discount in parent pointer assignments. The parent of w changes in both rotations, but we only need assign to it its final value once, ignoring the intermediate value. 530. = pavl 156> w->pavl_parent = y->pavl_parent; x->pavl_parent = y->pavl_parent = w; if (x->pavl_link[1] != NULL) x->pavl_link[1]->pavl_parent = x; if (y->pavl_link[0] != NULL) y->pavl_link[0]->pavl_parent = y; This code is included in 528 and 544. 14.4.4 Symmetric Case --------------------- 531. = struct pavl_node *x = y->pavl_link[1]; if (x->pavl_balance == +1) { } else { } This code is included in 527. 532. 
= pavl 158> x->pavl_parent = y->pavl_parent; y->pavl_parent = x; if (y->pavl_link[1] != NULL) y->pavl_link[1]->pavl_parent = y; This code is included in 531. 533. = pavl 159> w->pavl_parent = y->pavl_parent; x->pavl_parent = y->pavl_parent = w; if (x->pavl_link[0] != NULL) x->pavl_link[0]->pavl_parent = x; if (y->pavl_link[1] != NULL) y->pavl_link[1]->pavl_parent = y; This code is included in 531 and 541. 14.5 Deletion ============= Deletion from a PAVL tree is a natural outgrowth of algorithms we have already implemented. The basic algorithm is the one originally used for plain AVL trees. The search step is taken verbatim from PBST deletion. The deletion step combines PBST and TAVL tree code. Finally, the rebalancing strategy is the same as used in TAVL deletion. The function outline is below. As noted above, step 1 is borrowed from PBST deletion. The other steps are implemented in the following sections. 534. = void * pavl_delete (struct pavl_table *tree, const void *item) { struct pavl_node *p; /* Traverses tree to find node to delete. */ struct pavl_node *q; /* Parent of p. */ int dir; /* Side of q on which p is linked. */ assert (tree != NULL && item != NULL); pavl 494> } This code is included in 522. 14.5.1 Step 2: Delete --------------------- The actual deletion step is derived from that for PBSTs. We add code to modify balance factors and set up for rebalancing. After the deletion, q is the node at which balance factors must be updated and possible rebalancing occurs and dir is the side of q from which the node was deleted. This follows the pattern already seen in TAVL deletion (*note Deleting a TAVL Node Step 2 - Delete::). 535. = if (p->pavl_link[1] == NULL) { } else { struct pavl_node *r = p->pavl_link[1]; if (r->pavl_link[0] == NULL) { } else { } } tree->pavl_alloc->libavl_free (tree->pavl_alloc, p); This code is included in 534. Case 1: p has no right child ............................ No changes are needed for case 1. 
No balance factors need change and q and dir are already set up correctly. 536. = pavl 497> This code is included in 535. Case 2: p's right child has no left child ......................................... See the commentary on for details. 537. = pavl 498> r->pavl_balance = p->pavl_balance; q = r; dir = 1; This code is included in 535. Case 3: p's right child has a left child ........................................ See the commentary on for details. 538. = pavl 499> s->pavl_balance = p->pavl_balance; q = r; dir = 0; This code is included in 535. 14.5.2 Step 3: Update Balance Factors ------------------------------------- Step 3, updating balance factors, is taken straight from TAVL deletion (*note Deleting a TAVL Node Step 3 - Update::), with the call to find_parent() replaced by inline code that uses pavl_parent. 539. = while (q != (struct pavl_node *) &tree->pavl_root) { struct pavl_node *y = q; if (y->pavl_parent != NULL) q = y->pavl_parent; else q = (struct pavl_node *) &tree->pavl_root; if (dir == 0) { dir = q->pavl_link[0] != y; y->pavl_balance++; if (y->pavl_balance == +1) break; else if (y->pavl_balance == +2) { } } else { } } tree->pavl_count--; return (void *) item; This code is included in 534. 14.5.3 Step 4: Rebalance ------------------------ The two cases for PAVL deletion are distinguished based on x's balance factor, as always: 540. = struct pavl_node *x = y->pavl_link[1]; if (x->pavl_balance == -1) { } else { } This code is included in 539. Case 1: x has - balance factor .............................. The same rebalancing is needed here as for a - balance factor in PAVL insertion, and the same code is used. 541. = struct pavl_node *w; q->pavl_link[dir] = w; This code is included in 540. Case 2: x has + or 0 balance factor ................................... If x has a + or 0 balance factor, we rotate left at y and update parent pointers as for any left rotation (*note PBST Rotations::). We also update balance factors. 
If x started with balance factor 0, then we're done. Otherwise, x becomes the new y for the next loop iteration, and rebalancing continues. *Note avldel2::, for details on this rebalancing case.

542. =
y->pavl_link[1] = x->pavl_link[0];
x->pavl_link[0] = y;
x->pavl_parent = y->pavl_parent;
y->pavl_parent = x;
if (y->pavl_link[1] != NULL)
  y->pavl_link[1]->pavl_parent = y;
q->pavl_link[dir] = x;

if (x->pavl_balance == 0)
  {
    x->pavl_balance = -1;
    y->pavl_balance = +1;
    break;
  }
else
  {
    x->pavl_balance = y->pavl_balance = 0;
    y = x;
  }
This code is included in 540.

14.5.4 Symmetric Case
---------------------

543. =
dir = q->pavl_link[0] != y;
y->pavl_balance--;
if (y->pavl_balance == -1)
  break;
else if (y->pavl_balance == -2)
  {
    struct pavl_node *x = y->pavl_link[0];
    if (x->pavl_balance == +1)
      { }
    else
      { }
  }
This code is included in 539.

544. =
struct pavl_node *w;
q->pavl_link[dir] = w;
This code is included in 543.

545. =
y->pavl_link[0] = x->pavl_link[1];
x->pavl_link[1] = y;
x->pavl_parent = y->pavl_parent;
y->pavl_parent = x;
if (y->pavl_link[0] != NULL)
  y->pavl_link[0]->pavl_parent = y;
q->pavl_link[dir] = x;

if (x->pavl_balance == 0)
  {
    x->pavl_balance = +1;
    y->pavl_balance = -1;
    break;
  }
else
  {
    x->pavl_balance = y->pavl_balance = 0;
    y = x;
  }
This code is included in 543.

14.6 Traversal
==============

The only difference between the PAVL and PBST traversal functions is the insertion initializer. We use the TBST implementation here, which performs a call to pavl_probe(), instead of the PBST implementation, which inserts the node directly without handling balance factors.

546. = pavl 269> pavl 503> pavl 504> pavl 505> pavl 273> pavl 274> pavl 507> pavl 508> pavl 74> pavl 75> This code is included in 522 and 554.

14.7 Copying
============

The copy function is the same as the PBST copy function, except that it copies pavl_balance between copied nodes.

547.
= pavl 510> struct pavl_table * pavl_copy (const struct pavl_table *org, pavl_copy_func *copy, pavl_item_func *destroy, struct libavl_allocator *allocator) { struct pavl_table *new; const struct pavl_node *x; struct pavl_node *y; assert (org != NULL); new = pavl_create (org->pavl_compare, org->pavl_param, allocator != NULL ? allocator : org->pavl_alloc); if (new == NULL) return NULL; new->pavl_count = org->pavl_count; if (new->pavl_count == 0) return new; x = (const struct pavl_node *) &org->pavl_root; y = (struct pavl_node *) &new->pavl_root; for (;;) { while (x->pavl_link[0] != NULL) { y->pavl_link[0] = new->pavl_alloc->libavl_malloc (new->pavl_alloc, sizeof *y->pavl_link[0]); if (y->pavl_link[0] == NULL) { if (y != (struct pavl_node *) &new->pavl_root) { y->pavl_data = NULL; y->pavl_link[1] = NULL; } copy_error_recovery (y, new, destroy); return NULL; } y->pavl_link[0]->pavl_parent = y; x = x->pavl_link[0]; y = y->pavl_link[0]; } y->pavl_link[0] = NULL; for (;;) { y->pavl_balance = x->pavl_balance; if (copy == NULL) y->pavl_data = x->pavl_data; else { y->pavl_data = copy (x->pavl_data, org->pavl_param); if (y->pavl_data == NULL) { y->pavl_link[1] = NULL; copy_error_recovery (y, new, destroy); return NULL; } } if (x->pavl_link[1] != NULL) { y->pavl_link[1] = new->pavl_alloc->libavl_malloc (new->pavl_alloc, sizeof *y->pavl_link[1]); if (y->pavl_link[1] == NULL) { copy_error_recovery (y, new, destroy); return NULL; } y->pavl_link[1]->pavl_parent = y; x = x->pavl_link[1]; y = y->pavl_link[1]; break; } else y->pavl_link[1] = NULL; for (;;) { const struct pavl_node *w = x; x = x->pavl_parent; if (x == NULL) { new->pavl_root->pavl_parent = NULL; return new; } y = y->pavl_parent; if (w == x->pavl_link[0]) break; } } } } This code is included in 522 and 554. 14.8 Testing ============ The testing code harbors no surprises. 548. = #include #include #include #include "pavl.h" #include "test.h" pavl 119> pavl 104> pavl 190> pavl 100> pavl 122> 549. 
= /* Compares binary trees rooted at a and b, making sure that they are identical. */ static int compare_trees (struct pavl_node *a, struct pavl_node *b) { int okay; if (a == NULL || b == NULL) { assert (a == NULL && b == NULL); return 1; } if (*(int *) a->pavl_data != *(int *) b->pavl_data || ((a->pavl_link[0] != NULL) != (b->pavl_link[0] != NULL)) || ((a->pavl_link[1] != NULL) != (b->pavl_link[1] != NULL)) || ((a->pavl_parent != NULL) != (b->pavl_parent != NULL)) || (a->pavl_parent != NULL && b->pavl_parent != NULL && a->pavl_parent->pavl_data != b->pavl_parent->pavl_data) || a->pavl_balance != b->pavl_balance) { printf (" Copied nodes differ:\n" " a: %d, bal %+d, parent %d, %s left child, %s right child\n" " b: %d, bal %+d, parent %d, %s left child, %s right child\n", *(int *) a->pavl_data, a->pavl_balance, a->pavl_parent != NULL ? *(int *) a->pavl_parent : -1, a->pavl_link[0] != NULL ? "has" : "no", a->pavl_link[1] != NULL ? "has" : "no", *(int *) b->pavl_data, b->pavl_balance, b->pavl_parent != NULL ? *(int *) b->pavl_parent : -1, b->pavl_link[0] != NULL ? "has" : "no", b->pavl_link[1] != NULL ? "has" : "no"); return 0; } okay = 1; if (a->pavl_link[0] != NULL) okay &= compare_trees (a->pavl_link[0], b->pavl_link[0]); if (a->pavl_link[1] != NULL) okay &= compare_trees (a->pavl_link[1], b->pavl_link[1]); return okay; } This code is included in 548. 550. = static void recurse_verify_tree (struct pavl_node *node, int *okay, size_t *count, int min, int max, int *height) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subheight[2]; /* Heights of subtrees. */ int i; if (node == NULL) { *count = 0; *height = 0; return; } d = *(int *) node->pavl_data; recurse_verify_tree (node->pavl_link[0], okay, &subcount[0], min, d - 1, &subheight[0]); recurse_verify_tree (node->pavl_link[1], okay, &subcount[1], d + 1, max, &subheight[1]); *count = 1 + subcount[0] + subcount[1]; *height = 1 + (subheight[0] > subheight[1] ? 
subheight[0] : subheight[1]); pavl 189> pavl 518> } This code is included in 548. 15 Red-Black Trees with Parent Pointers *************************************** As our twelfth and final example of a table data structure, this chapter will implement a table as a red-black tree with parent pointers, or "PRB" tree for short. We use prb_ as the prefix for identifiers. Here's the outline: 551. = #ifndef PRB_H #define PRB_H 1 #include
prb 14> prb 195> prb 250> prb 267>
prb 15> #endif /* prb.h */ 552. = #include #include #include #include "prb.h" 15.1 Data Types =============== The PRB node structure adds a color and a parent pointer to the basic binary tree data structure. The other PRB data structures are the same as the ones used for TBSTs. 553. = /* Color of a red-black node. */ enum prb_color { PRB_BLACK, /* Black. */ PRB_RED /* Red. */ }; /* A red-black tree with parent pointers node. */ struct prb_node { struct prb_node *prb_link[2]; /* Subtrees. */ struct prb_node *prb_parent; /* Parent. */ void *prb_data; /* Pointer to data. */ unsigned char prb_color; /* Color. */ }; This code is included in 551. See also: [Cormen 1990], section 14.1. 15.2 Operations =============== Most of the PRB operations use the same implementations as did PAVL trees in the last chapter. The PAVL copy function is modified to copy colors instead of balance factors. The item insertion and deletion functions must be newly written, of course. 554. = prb 252> prb 31>
prb 592> prb 546> prb; pavl_balance => prb_color 547> prb 84> prb 6>
prb 594> This code is included in 552. 15.3 Insertion ============== Inserting into a red-black tree is a problem whose form of solution should by now be familiar to the reader. We must now update parent pointers, of course, but the major difference here is that it is fast and easy to find the parent of any given node, eliminating any need for a stack. Here's the function outline. The code for finding the insertion point is taken directly from the PBST code: 555. = void ** prb_probe (struct prb_table *tree, void *item) { struct prb_node *p; /* Traverses tree looking for insertion point. */ struct prb_node *q; /* Parent of p; node at which we are rebalancing. */ struct prb_node *n; /* Newly inserted node. */ int dir; /* Side of q on which n is inserted. */ assert (tree != NULL && item != NULL); prb 491> return &n->prb_data; } This code is included in 554. See also: [Cormen 1990], section 14.3. 15.3.1 Step 2: Insert --------------------- The code to do the insertion is based on that for PBSTs. We need only add initialization of the new node's color. 556. = prb 492> n->prb_color = PRB_RED; This code is included in 555. 15.3.2 Step 3: Rebalance ------------------------ When we rebalanced ordinary RB trees, we used the expressions pa[k - 1] and pa[k - 2] to refer to the parent and grandparent, respectively, of the node at which we were rebalancing, and we called that node q, though that wasn't a variable name (*note Inserting an RB Node Step 3 - Rebalance::). Now that we have parent pointers, we use a real variable q to refer to the node where we're rebalancing. This means that we could refer to its parent and grandparent as q->prb_parent and q->prb_parent->prb_parent, respectively, but there's a small problem with that. During rebalancing, we will need to move nodes around and modify parent pointers. That means that q->prb_parent and q->prb_parent->prb_parent will be changing under us as we work. This makes writing correct code hard, and reading it even harder. 
It is much easier to use a pair of new variables to hold q's parent and grandparent. That's exactly the role that f and g, respectively, play in the code below. If you compare this code to , you'll also notice the way that checking that f and g are non-null corresponds to checking that the stack height is at least 3 (see Exercise 6.4.3-1 for an explanation of the reason this is a valid test). 557. = q = n; for (;;) { struct prb_node *f; /* Parent of q. */ struct prb_node *g; /* Grandparent of q. */ f = q->prb_parent; if (f == NULL || f->prb_color == PRB_BLACK) break; g = f->prb_parent; if (g == NULL) break; if (g->prb_link[0] == f) { } else { } } tree->prb_root->prb_color = PRB_BLACK; This code is included in 555. After replacing pa[k - 1] by f and pa[k - 2] by g, the cases for PRB rebalancing are distinguished on the same basis as those for RB rebalancing (see ). One addition: cases 2 and 3 need to work with q's great-grandparent, so they stash it into a new variable h. 558. = struct prb_node *y = g->prb_link[1]; if (y != NULL && y->prb_color == PRB_RED) { } else { struct prb_node *h; /* Great-grandparent of q. */ h = g->prb_parent; if (h == NULL) h = (struct prb_node *) &tree->prb_root; if (f->prb_link[1] == q) { } break; } This code is included in 557. Case 1: q's uncle is red ........................ In this case, as before, we need only rearrange colors (*note rbinscase1::). Instead of popping the top two items off the stack, we directly set up q, the next node at which to rebalance, to be the (former) grandparent of the original q. | | g g _.-' `_ _.-' `_ f y f y => _.-' \ / \ _.-' \ / \ q c d e q c d e / \ / \ a b a b 559. = f->prb_color = y->prb_color = PRB_BLACK; g->prb_color = PRB_RED; q = g; This code is included in 558. Case 2: q is the left child of its parent ......................................... 
If q is the left child of its parent, we rotate right at g: | g | f _.-' \ f d _.-' `_ => q g _.-' \ q c / \ / \ a b c d / \ a b The result satisfies both RB balancing rules. Refer back to the discussion of the same case in ordinary RB trees for more details (*note rbinscase2::). 560. = g->prb_color = PRB_RED; f->prb_color = PRB_BLACK; g->prb_link[0] = f->prb_link[1]; f->prb_link[1] = g; h->prb_link[h->prb_link[0] != g] = f; f->prb_parent = g->prb_parent; g->prb_parent = f; if (g->prb_link[0] != NULL) g->prb_link[0]->prb_parent = g; This code is included in 558. Case 3: q is the right child of its parent .......................................... If q is a right child, then we transform it into case 2 by rotating left at f: | | g g ___...--' \ _.-' \ f d q d => / `_ _.-' \ a q f c / \ / \ b c a b Afterward we relabel q as f and treat the result as case 2. There is no need to properly set q itself because case 2 never uses variable q. For more details, refer back to case 3 in ordinary RB trees (*note rbinscase3::). 561. = f->prb_link[1] = q->prb_link[0]; q->prb_link[0] = f; g->prb_link[0] = q; f->prb_parent = q; if (f->prb_link[1] != NULL) f->prb_link[1]->prb_parent = f; f = q; This code is included in 558. 15.3.3 Symmetric Case --------------------- 562. = struct prb_node *y = g->prb_link[0]; if (y != NULL && y->prb_color == PRB_RED) { } else { struct prb_node *h; /* Great-grandparent of q. */ h = g->prb_parent; if (h == NULL) h = (struct prb_node *) &tree->prb_root; if (f->prb_link[0] == q) { } break; } This code is included in 557. 563. = f->prb_color = y->prb_color = PRB_BLACK; g->prb_color = PRB_RED; q = g; This code is included in 562. 564. = g->prb_color = PRB_RED; f->prb_color = PRB_BLACK; g->prb_link[1] = f->prb_link[0]; f->prb_link[0] = g; h->prb_link[h->prb_link[0] != g] = f; f->prb_parent = g->prb_parent; g->prb_parent = f; if (g->prb_link[1] != NULL) g->prb_link[1]->prb_parent = g; This code is included in 562. 565. 
= f->prb_link[0] = q->prb_link[1]; q->prb_link[1] = f; g->prb_link[1] = q; f->prb_parent = q; if (f->prb_link[0] != NULL) f->prb_link[0]->prb_parent = f; f = q; This code is included in 562. 15.4 Deletion ============= The RB item deletion algorithm needs the same kind of changes to handle parent pointers that the RB item insertion algorithm did. We can reuse the code from PBST trees for finding the node to delete. The rest of the code will be presented in the following sections. 566. = void * prb_delete (struct prb_table *tree, const void *item) { struct prb_node *p; /* Node to delete. */ struct prb_node *q; /* Parent of p. */ struct prb_node *f; /* Node at which we are rebalancing. */ int dir; /* Side of q on which p is a child; side of f from which node was deleted. */ assert (tree != NULL && item != NULL); prb 494> } This code is included in 554. See also: [Cormen 1990], section 14.4. 15.4.1 Step 2: Delete --------------------- The goal of this step is to remove p from the tree and set up f as the node where rebalancing should start. Secondarily, we set dir as the side of f from which the node was deleted. Together, f and dir fill the role that the top-of-stack entries in pa[] and da[] took in ordinary RB deletion. 567. = if (p->prb_link[1] == NULL) { } else { enum prb_color t; struct prb_node *r = p->prb_link[1]; if (r->prb_link[0] == NULL) { } else { } } This code is included in 566. Case 1: p has no right child ............................ If p has no right child, then rebalancing should start at its parent, q, and dir is already the side that p is on. The rest is the same as PBST deletion (*note pbstdel1::). 568. = prb 497> f = q; This code is included in 567. Case 2: p's right child has no left child ......................................... In case 2, we swap the colors of p and r as for ordinary RB deletion (*note rbcolorswap::). We set up f and dir in the same way that set up the top of stack. The rest is the same as PBST deletion (*note pbstdel2::). 
569. = prb 498> t = p->prb_color; p->prb_color = r->prb_color; r->prb_color = t; f = r; dir = 1; This code is included in 567. Case 3: p's right child has a left child ........................................ Case 3 swaps the colors of p and s the same way as in ordinary RB deletion (*note rbcolorswap::), and sets up f and dir in the same way that set up the stack. The rest is borrowed from PBST deletion (*note pbstdel3::). 570. = prb 499> t = p->prb_color; p->prb_color = s->prb_color; s->prb_color = t; f = r; dir = 0; This code is included in 567. 15.4.2 Step 3: Rebalance ------------------------ The rebalancing code is easily related to the analogous code for ordinary RB trees in . As we carefully set up in step 2, we use f as the top of stack node and dir as the side of f from which a node was deleted. These variables f and dir were formerly represented by pa[k - 1] and da[k - 1], respectively. Additionally, variable g is used to represent the parent of f. Formerly the same node was referred to as pa[k - 2]. The code at the end of the loop simply moves f and dir up one level in the tree. It has the same effect as did popping the stack with k--. 571. = if (p->prb_color == PRB_BLACK) { for (;;) { struct prb_node *x; /* Node we want to recolor black if possible. */ struct prb_node *g; /* Parent of f. */ struct prb_node *t; /* Temporary for use in finding parent. */ x = f->prb_link[dir]; if (x != NULL && x->prb_color == PRB_RED) { x->prb_color = PRB_BLACK; break; } if (f == (struct prb_node *) &tree->prb_root) break; g = f->prb_parent; if (g == NULL) g = (struct prb_node *) &tree->prb_root; if (dir == 0) { } else { } t = f; f = f->prb_parent; if (f == NULL) f = (struct prb_node *) &tree->prb_root; dir = f->prb_link[0] != t; } } This code is included in 566. The code to distinguish rebalancing cases in PRB trees is almost identical to . 572.
= struct prb_node *w = f->prb_link[1]; if (w->prb_color == PRB_RED) { } if ((w->prb_link[0] == NULL || w->prb_link[0]->prb_color == PRB_BLACK) && (w->prb_link[1] == NULL || w->prb_link[1]->prb_color == PRB_BLACK)) { } else { if (w->prb_link[1] == NULL || w->prb_link[1]->prb_color == PRB_BLACK) { } break; } This code is included in 571. Case Reduction: Ensure w is black ................................. The case reduction code is much like that for plain RB trees (*note rbdcr::), with pa[k - 1] replaced by f and pa[k - 2] replaced by g. Instead of updating the stack, we change g. Node f need not change because it's already what we want it to be. We also need to update parent pointers for the rotation. | | A,f C,g / `--..__ ___...--' `_ x C,w A,f D => _.-' `_ / `_ / \ B D x B,w c d / \ / \ / \ a b c d a b 573. = w->prb_color = PRB_BLACK; f->prb_color = PRB_RED; f->prb_link[1] = w->prb_link[0]; w->prb_link[0] = f; g->prb_link[g->prb_link[0] != f] = w; w->prb_parent = f->prb_parent; f->prb_parent = w; g = w; w = f->prb_link[1]; w->prb_parent = f; This code is included in 572. Case 1: w has no red children ............................. Case 1 is trivial. No changes from ordinary RB trees are necessary (*note rbdelcase1::). 574. = prb 229> This code is included in 572. Case 2: w's right child is red .............................. The changes from ordinary RB trees (*note rbdelcase2::) for case 2 follow the same pattern. 575. = w->prb_color = f->prb_color; f->prb_color = PRB_BLACK; w->prb_link[1]->prb_color = PRB_BLACK; f->prb_link[1] = w->prb_link[0]; w->prb_link[0] = f; g->prb_link[g->prb_link[0] != f] = w; w->prb_parent = f->prb_parent; f->prb_parent = w; if (f->prb_link[1] != NULL) f->prb_link[1]->prb_parent = f; This code is included in 572. Case 3: w's left child is red ............................. 
The code for case 3 in ordinary RB trees (*note rbdelcase3::) needs slightly more intricate changes than case 1 or case 2, so the diagram below may help to clarify: | | B,f B,f _.-' `--..__ _.-' `_ A,x D,w A,x C,w => / \ _.-' \ / \ / `_ a b C e a b c D / \ / \ c d d e 576. = struct prb_node *y = w->prb_link[0]; y->prb_color = PRB_BLACK; w->prb_color = PRB_RED; w->prb_link[0] = y->prb_link[1]; y->prb_link[1] = w; if (w->prb_link[0] != NULL) w->prb_link[0]->prb_parent = w; w = f->prb_link[1] = y; w->prb_link[1]->prb_parent = w; This code is included in 572. 15.4.3 Step 4: Finish Up ------------------------ 577. = tree->prb_alloc->libavl_free (tree->prb_alloc, p); tree->prb_count--; return (void *) item; This code is included in 566. 15.4.4 Symmetric Case --------------------- 578. = struct prb_node *w = f->prb_link[0]; if (w->prb_color == PRB_RED) { } if ((w->prb_link[0] == NULL || w->prb_link[0]->prb_color == PRB_BLACK) && (w->prb_link[1] == NULL || w->prb_link[1]->prb_color == PRB_BLACK)) { } else { if (w->prb_link[0] == NULL || w->prb_link[0]->prb_color == PRB_BLACK) { } break; } This code is included in 571. 579. = w->prb_color = PRB_BLACK; f->prb_color = PRB_RED; f->prb_link[0] = w->prb_link[1]; w->prb_link[1] = f; g->prb_link[g->prb_link[0] != f] = w; w->prb_parent = f->prb_parent; f->prb_parent = w; g = w; w = f->prb_link[0]; w->prb_parent = f; This code is included in 578. 580. = w->prb_color = PRB_RED; This code is included in 578. 581. = w->prb_color = f->prb_color; f->prb_color = PRB_BLACK; w->prb_link[0]->prb_color = PRB_BLACK; f->prb_link[0] = w->prb_link[1]; w->prb_link[1] = f; g->prb_link[g->prb_link[0] != f] = w; w->prb_parent = f->prb_parent; f->prb_parent = w; if (f->prb_link[0] != NULL) f->prb_link[0]->prb_parent = f; This code is included in 578. 582. 
= struct prb_node *y = w->prb_link[1]; y->prb_color = PRB_BLACK; w->prb_color = PRB_RED; w->prb_link[1] = y->prb_link[0]; y->prb_link[0] = w; if (w->prb_link[1] != NULL) w->prb_link[1]->prb_parent = w; w = f->prb_link[0] = y; w->prb_link[0]->prb_parent = w; This code is included in 578. 15.5 Testing ============ No comment is necessary. 583. = #include #include #include #include "prb.h" #include "test.h" prb 119> prb 104> prb 244> prb 100> prb 122> 584. = static int compare_trees (struct prb_node *a, struct prb_node *b) { int okay; if (a == NULL || b == NULL) { assert (a == NULL && b == NULL); return 1; } if (*(int *) a->prb_data != *(int *) b->prb_data || ((a->prb_link[0] != NULL) != (b->prb_link[0] != NULL)) || ((a->prb_link[1] != NULL) != (b->prb_link[1] != NULL)) || a->prb_color != b->prb_color) { printf (" Copied nodes differ: a=%d%c b=%d%c a:", *(int *) a->prb_data, a->prb_color == PRB_RED ? 'r' : 'b', *(int *) b->prb_data, b->prb_color == PRB_RED ? 'r' : 'b'); if (a->prb_link[0] != NULL) printf ("l"); if (a->prb_link[1] != NULL) printf ("r"); printf (" b:"); if (b->prb_link[0] != NULL) printf ("l"); if (b->prb_link[1] != NULL) printf ("r"); printf ("\n"); return 0; } okay = 1; if (a->prb_link[0] != NULL) okay &= compare_trees (a->prb_link[0], b->prb_link[0]); if (a->prb_link[1] != NULL) okay &= compare_trees (a->prb_link[1], b->prb_link[1]); return okay; } This code is included in 583. 585. = /* Examines the binary tree rooted at node. Zeroes *okay if an error occurs. Otherwise, does not modify *okay. Sets *count to the number of nodes in that tree, including node itself if node != NULL. Sets *bh to the tree's black-height. All the nodes in the tree are verified to be at least min but no greater than max. */ static void recurse_verify_tree (struct prb_node *node, int *okay, size_t *count, int min, int max, int *bh) { int d; /* Value of this node's data. */ size_t subcount[2]; /* Number of nodes in subtrees. */ int subbh[2]; /* Black-heights of subtrees. 
*/ int i; if (node == NULL) { *count = 0; *bh = 0; return; } d = *(int *) node->prb_data; recurse_verify_tree (node->prb_link[0], okay, &subcount[0], min, d - 1, &subbh[0]); recurse_verify_tree (node->prb_link[1], okay, &subcount[1], d + 1, max, &subbh[1]); *count = 1 + subcount[0] + subcount[1]; *bh = (node->prb_color == PRB_BLACK) + subbh[0]; prb 241> prb 242> prb 243> prb 518> } This code is included in 583. Appendix A References ********************* [Aho 1986]. Aho, A. V., R. Sethi, and J. D. Ullman, `Compilers: Principles, Techniques, and Tools'. Addison-Wesley, 1986. ISBN 0-201-10088-6. [Bentley 2000]. Bentley, J., `Programming Pearls', 2nd ed. Addison-Wesley, 2000. ISBN 0-201-65788-0. [Brown 2001]. Brown, S., "Identifiers NOT To Use in C Programs". Oak Road Systems, Feb. 15, 2001. `http://www.oakroadsystems.com/tech/c-predef.htm'. [Cormen 1990]. Cormen, T. H., C. E. Leiserson, and R. L. Rivest, `Introduction to Algorithms'. McGraw-Hill, 1990. ISBN 0-262-03141-8. [FSF 1999]. Free Software Foundation, `GNU C Library Reference Manual', version 0.08, 1999. [FSF 2001]. Free Software Foundation, "GNU Coding Standards", ed. of March 23, 2001. [ISO 1990]. International Organization for Standardization, `ANSI/ISO 9899-1990: American National Standard for Programming Languages--C', 1990. Reprinted in `The Annotated ANSI C Standard', ISBN 0-07-881952-0. [ISO 1998]. International Organization for Standardization, `ISO/IEC 14882:1998(E): Programming languages--C++', 1998. [ISO 1999]. International Organization for Standardization, `ISO/IEC 9899:1999: Programming Languages--C', 2nd ed., 1999. [Kernighan 1976]. Kernighan, B. W., and P. J. Plauger, `Software Tools'. Addison-Wesley, 1976. ISBN 0-201-03669-X. [Kernighan 1988]. Kernighan, B. W., and D. M. Ritchie, `The C Programming Language', 2nd ed. Prentice-Hall, 1988. ISBN 0-13-110362-8. [Knuth 1997]. Knuth, D. E., `The Art of Computer Programming, Volume 1: Fundamental Algorithms', 3rd ed. Addison-Wesley, 1997.
ISBN 0-201-89683-4. [Knuth 1998a]. Knuth, D. E., `The Art of Computer Programming, Volume 2: Seminumerical Algorithms', 3rd ed. Addison-Wesley, 1998. ISBN 0-201-89684-2. [Knuth 1998b]. Knuth, D. E., `The Art of Computer Programming, Volume 3: Sorting and Searching', 2nd ed. Addison-Wesley, 1998. ISBN 0-201-89685-0. [Knuth 1977]. Knuth, D. E., "Deletions that Preserve Randomness", `IEEE Trans. on Software Eng.' SE-3 (1977), pp. 351-9. Reprinted in [Knuth 2000]. [Knuth 1978]. Knuth, D. E., "A Trivial Algorithm Whose Analysis Isn't", `Journal of Algorithms' 6 (1985), pp. 301-22. Reprinted in [Knuth 2000]. [Knuth 1992]. Knuth, D. E., `Literate Programming', CSLI Lecture Notes Number 27. Center for the Study of Language and Information, Leland Stanford Junior University, 1992. ISBN 0-9370-7380-6. [Knuth 2000]. Knuth, D. E., `Selected Papers on Analysis of Algorithms', CSLI Lecture Notes Number 102. Center for the Study of Language and Information, Leland Stanford Junior University, 2000. ISBN 1-57586-212-3. [Pfaff 1998]. Pfaff, B. L., "An Iterative Algorithm for Deletion from AVL-Balanced Binary Trees". Presented July 1998, annual meeting of Pi Mu Epsilon, Toronto, Canada. `http://benpfaff.org/avl/'. [Sedgewick 1998]. Sedgewick, R., `Algorithms in C, Parts 1-4', 3rd ed. Addison-Wesley, 1998. ISBN 0-201-31452-5. [SGI 1993]. Silicon Graphics, Inc., `Standard Template Library Programmer's Guide'. `http://www.sgi.com/tech/stl/'. [Stout 1986]. Stout, Q. F. and B. L. Warren, "Tree Rebalancing in Optimal Time and Space", `Communications of the ACM' 29 (1986), pp. 902-908. [Summit 1999]. Summit, S., "comp.lang.c Answers to Frequently Asked Questions", version 3.5. `http://www.eskimo.com/~scs/C-faq/top.html'. ISBN 0-201-84519-9. Appendix B Supplementary Code ***************************** This appendix contains code too long for the exposition or too far from the main topic of the book.
B.1 Option Parser ================= The BST test program contains an option parser for handling command-line options. *Note User Interaction::, for an introduction to its public interface. This section describes the option parser's implementation. The option parsing state is kept in struct option_state: 586.
= #define tbl_count(table) ((size_t) (table)->tbl_count) This code is included in 15. Another way to get the same effect is to use the unary + operator, like this: #define tbl_count(table) (+(table)->tbl_count) See also: [ISO 1990], section 6.3.4; [Kernighan 1988], section A7.5. Section 2.8 ----------- 1. If a memory allocation function that never returns a null pointer is used, then it is reasonable to use these functions. For instance, tbl_allocator_abort from Exercise 2.5-2 is such an allocator. 2. Among other reasons, tbl_find() returns a null pointer to indicate that no matching item was found in the table. Null pointers in the table could therefore lead to confusing results. It is better to entirely prevent them from being inserted. 3. 592.
= void * tbl_insert (struct tbl_table *table, void *item) { void **p = tbl_probe (table, item); return p == NULL || *p == item ? NULL : *p; } void * tbl_replace (struct tbl_table *table, void *item) { void **p = tbl_probe (table, item); if (p == NULL || *p == item) return NULL; else { void *r = *p; *p = item; return r; } } This code is included in 29, 145, 196, 251, 300, 336, 375, 418, 455, 489, 522, and 554. Section 2.9 ----------- 1. Keep in mind that these directives have to be processed every time the header file is included. (Typical header files are designed to be "idempotent", i.e., processed by the compiler only on first inclusion and skipped on any later inclusions, because some C constructs cause errors if they are encountered twice during a compilation.) 593.
= /* Table assertion functions. */ #ifndef NDEBUG #undef tbl_assert_insert #undef tbl_assert_delete #else #define tbl_assert_insert(table, item) tbl_insert (table, item) #define tbl_assert_delete(table, item) tbl_delete (table, item) #endif This code is included in 24. See also: [Summit 1999], section 10.7. 2. tbl_assert_insert() must be based on tbl_probe(), because tbl_insert() does not distinguish in its return value between successful insertion and memory allocation errors. Assertions must be enabled for these functions because we want them to verify success if assertions were enabled at the point from which they were called, not if assertions were enabled when the table was compiled. Notice the parentheses around the assertion function names below. The parentheses prevent the macros by the same name from being expanded. A function-like macro is only expanded when its name is followed by a left parenthesis, and the extra set of parentheses prevents this from being the case. Alternatively, #undef directives could be used to achieve the same effect. 594.
= #undef NDEBUG #include void (tbl_assert_insert) (struct tbl_table *table, void *item) { void **p = tbl_probe (table, item); assert (p != NULL && *p == item); } void * (tbl_assert_delete) (struct tbl_table *table, void *item) { void *p = tbl_delete (table, item); assert (p != NULL); return p; } This code is included in 29, 145, 196, 251, 300, 336, 375, 418, 455, 489, 522, and 554. 3. The assert() macro is meant for testing for design errors and "impossible" conditions, not runtime errors like disk input/output errors or memory allocation failures. If the memory allocator can fail, then the assert() call in tbl_assert_insert() effectively does this. See also: [Summit 1999], section 20.24b. Section 2.12 ------------ 1. Both tables and sets store sorted arrangements of unique items. Both require a strict weak ordering on the items that they contain. libavl uses ternary comparison functions whereas the STL uses binary comparison functions (see Exercise 2.3-6). The description of tables here doesn't list any particular speed requirements for operations, whereas STL sets are constrained in the complexity of their operations. It's worth noting, however, that the libavl implementation of AVL and RB trees meet all of the STL complexity requirements, for their equivalent operations, except one. The exception is that set methods begin() and rbegin() must have constant-time complexity, whereas the equivalent libavl functions *_t_first() and *_t_last() on AVL and RB trees have logarithmic complexity. libavl traversers and STL iterators have similar semantics. Both remain valid if new items are inserted, and both remain valid if old items are deleted, unless it's the iterator's current item that's deleted. The STL has a more complete selection of methods than libavl does of table functions, but many of the additional ones (e.g., distance() or erase() each with two iterators as arguments) can be implemented easily in terms of existing libavl functions. 
These might benefit from optimization possible with specialized implementations, but may not be worth it. The SGI/HP implementation of the STL does not contain any such optimization. See also: [ISO 1998], sections 23.1, 23.1.2, and 23.3.3. 2. The nonessential functions are: * tbl_probe(), tbl_insert(), and tbl_replace(), which can be implemented in terms of tbl_t_insert() and tbl_t_replace(). * tbl_find(), which can be implemented in terms of tbl_t_find(). * tbl_assert_insert() and tbl_assert_delete(). * tbl_t_first() and tbl_t_last(), which can be implemented with tbl_t_init() and tbl_t_next(). If we allow it to know what allocator was used for the original table, which is, strictly speaking, cheating, then we can also implement tbl_copy() in terms of tbl_create(), tbl_t_insert(), and tbl_destroy(). Under similar restrictions we can also implement tbl_t_prev() and tbl_t_copy() in terms of tbl_t_init() and tbl_t_next(), though in a very inefficient way. Chapter 3 ========= Section 3.1 ----------- 1. The following program can be improved in many ways. However, we will implement a much better testing framework later, so this is fine for now. 595. = #include #define MAX_INPUT 1024 int main (void) { int array[MAX_INPUT]; int n, i; for (n = 0; n < MAX_INPUT; n++) if (scanf ("%d", &array[n]) != 1) break; for (i = 0; i < n; i++) { int result = seq_search (array, n, array[i]); if (result != i) printf ("seq_search() returned %d looking for %d - expected %d\n", result, array[i], i); } return 0; } Section 3.4 ----------- 1. Some types don't have a largest possible value; e.g., arbitrary-length strings. Section 3.5 ----------- 1. Knuth's name for this procedure is "uniform binary search." The code below is an almost-literal implementation of his Algorithm U. The fact that Knuth's arrays are 1-based, but C arrays are 0-based, accounts for most of the differences. The code below uses for (;;) to assemble an "infinite" loop, a common C idiom. 596. 
= /* Returns the offset within array[] of an element equal to key, or -1 if key is not in array[]. array[] must be an array of n ints sorted in ascending order, with array[-1] modifiable. */ int uniform_binary_search (int array[], int n, int key) { int i = (n + 1) / 2 - 1; int m = n / 2; array[-1] = INT_MIN; for (;;) { if (key < array[i]) { if (m == 0) return -1; i -= (m + 1) / 2; m /= 2; } else if (key > array[i]) { if (m == 0) return -1; i += (m + 1) / 2; m /= 2; } else return i >= 0 ? i : -1; } } This code is included in 600. See also: [Knuth 1998b], section 6.2.1, Algorithm U. 2a. This actually uses blp_bsearch(), implemented in part (b) below, in order to allow that function to be tested. You can replace the reference to blp_bsearch() by bsearch() without problem. 597. = /* Compares the ints pointed to by pa and pb and returns positive if *pa > *pb, negative if *pa < *pb, or zero if *pa == *pb. */ static int compare_ints (const void *pa, const void *pb) { const int *a = pa; const int *b = pb; if (*a > *b) return 1; else if (*a < *b) return -1; else return 0; } /* Returns the offset within array[] of an element equal to key, or -1 if key is not in array[]. array[] must be an array of n ints sorted in ascending order. */ static int binary_search_bsearch (int array[], int n, int key) { int *p = blp_bsearch (&key, array, n, sizeof *array, compare_ints); return p != NULL ? p - array : -1; } This code is included in 600. 2b. This function is named using the author of this book's initials. Note that the implementation below assumes that count, a size_t, won't exceed the range of an int. Some systems provide a type called ssize_t for this purpose, but we won't assume that here. (long is perhaps a better choice than int.) 598. = /* Plug-compatible with standard C library bsearch(). 
*/ static void * blp_bsearch (const void *key, const void *array, size_t count, size_t size, int (*compare) (const void *, const void *)) { int min = 0; int max = count - 1; while (max >= min) { int i = (min + max) / 2; void *item = ((char *) array) + size * i; int cmp = compare (key, item); if (cmp < 0) max = i - 1; else if (cmp > 0) min = i + 1; else return item; } return NULL; } This code is included in 597. 3. Here's an outline of the entire program: 599. = #include #include #include #include We need to include all the search functions we're going to use: 600. = This code is included in 599. We need to make a list of the search functions. We start by defining the array's element type: 601. = /* Description of a search function. */ struct search_func { const char *name; int (*search) (int array[], int n, int key); }; See also 602. This code is included in 599. Then we define the list as an array: 602. += /* Array of all the search functions we know. */ struct search_func search_func_tab[] = { {"seq_search()", seq_search}, {"seq_sentinel_search()", seq_sentinel_search}, {"seq_sorted_search()", seq_sorted_search}, {"seq_sorted_sentinel_search()", seq_sorted_sentinel_search}, {"seq_sorted_sentinel_search_2()", seq_sorted_sentinel_search_2}, {"binary_search()", binary_search}, {"uniform_binary_search()", uniform_binary_search}, {"binary_search_bsearch()", binary_search_bsearch}, {"cheat_search()", cheat_search}, }; /* Number of search functions. */ const size_t n_search_func = sizeof search_func_tab / sizeof *search_func_tab; We've added previously unseen function cheat_search() to the array. This is a function that "cheats" on the search because it knows that we are only going to search in an array such that array[i] == i. The purpose of cheat_search() is to allow us to find out how much of the search time is overhead imposed by the framework and the function calls and how much is actual search time. Here's cheat_search(): 603.
= /* Cheating search function that knows that array[i] == i. n must be the array size and key the item to search for. array[] is not used. Returns the index in array[] where key is found, or -1 if key is not in array[]. */ int cheat_search (int array[], int n, int key) { return key >= 0 && key < n ? key : -1; } This code is included in 600. We're going to need some functions for timing operations. First, a function to "start" a timer: 604. = /* ``Starts'' a timer by recording the current time in *t. */ static void start_timer (clock_t *t) { clock_t now = clock (); while (now == clock ()) /* Do nothing. */; *t = clock (); } See also 605. This code is included in 599. Function start_timer() waits for the value returned by clock() to change before it records the value. On systems with a slow timer (such as PCs running MS-DOS, where the clock ticks only 18.2 times per second), this gives more stable timing results because it means that timing always starts near the beginning of a clock tick. We also need a function to "stop" the timer and report the results: 605. += /* Prints the elapsed time since start, set by start_timer(). */ static void stop_timer (clock_t start) { clock_t end = clock (); printf ("%.2f seconds\n", ((double) (end - start)) / CLOCKS_PER_SEC); } The value reported by clock() can "wrap around" to zero from a large value. stop_timer() does not allow for this possibility. We will write three tests for the search functions. The first of these just checks that the search function works properly: 606. = /* Tests that f->search returns expect when called to search for key within array[], which has n elements such that array[i] == i. 
*/ static void test_search_func_at (struct search_func *f, int array[], int n, int key, int expect) { int result = f->search (array, n, key); if (result != expect) printf ("%s returned %d looking for %d - expected %d\n", f->name, result, key, expect); } /* Tests searches for each element in array[] having n elements such that array[i] == i, and some unsuccessful searches too, all using function f->search. */ static void test_search_func (struct search_func *f, int array[], int n) { static const int shouldnt_find[] = {INT_MIN, -20, -1, INT_MAX}; int i; printf ("Testing integrity of %s... ", f->name); fflush (stdout); /* Verify that the function finds values that it should. */ for (i = 0; i < n; i++) test_search_func_at (f, array, n, i, i); /* Verify that the function doesn't find values it shouldn't. */ for (i = 0; i < (int) (sizeof shouldnt_find / sizeof *shouldnt_find); i++) test_search_func_at (f, array, n, shouldnt_find[i], -1); printf ("done\n"); } See also 607 and 608. This code is included in 599. The second test function finds the time required for searching for elements in the array: 607. += /* Times a search for each element in array[] having n elements such that array[i] == i, repeated n_iter times, using function f->search. */ static void time_successful_search (struct search_func *f, int array[], int n, int n_iter) { clock_t timer; printf ("Timing %d sets of successful searches... ", n_iter); fflush (stdout); start_timer (&timer); while (n_iter-- > 0) { int i; for (i = 0; i < n; i++) f->search (array, n, i); } stop_timer (timer); } The last test function finds the time required for searching for values that don't appear in the array: 608. += /* Times n searches for elements not in array[] having n elements such that array[i] == i, repeated n_iter times, using function f->search. */ static void time_unsuccessful_search (struct search_func *f, int array[], int n, int n_iter) { clock_t timer; printf ("Timing %d sets of unsuccessful searches... 
", n_iter); fflush (stdout); start_timer (&timer); while (n_iter-- > 0) { int i; for (i = 0; i < n; i++) f->search (array, n, -i); } stop_timer (timer); } Here's the main program: 609. = int main (int argc, char *argv[]) { struct search_func *f; /* Search function. */ int *array, n; /* Array and its size. */ int n_iter; /* Number of iterations. */ return 0; } This code is included in 599. 610. = if (argc != 4) usage (); { long algorithm = stoi (argv[1]) - 1; if (algorithm < 0 || algorithm > (long) n_search_func) usage (); f = &search_func_tab[algorithm]; } n = stoi (argv[2]); n_iter = stoi (argv[3]); if (n < 1 || n_iter < 1) usage (); This code is included in 609. 611. = /* s should point to a decimal representation of an integer. Returns the value of s, if successful, or 0 on failure. */ static int stoi (const char *s) { long x = strtol (s, NULL, 10); return x >= INT_MIN && x <= INT_MAX ? x : 0; } This code is included in 609 and 617. When reading the code below, keep in mind that some of our algorithms use a sentinel at the end and some use a sentinel at the beginning, so we allocate two extra integers and take the middle part. 612. = array = malloc ((n + 2) * sizeof *array); if (array == NULL) { fprintf (stderr, "out of memory\n"); exit (EXIT_FAILURE); } array++; { int i; for (i = 0; i < n; i++) array[i] = i; } This code is included in 609. 613. = test_search_func (f, array, n); time_successful_search (f, array, n, n_iter); time_unsuccessful_search (f, array, n, n_iter); This code is included in 609. 614. = free (array - 1); This code is included in 609. 615. = /* Prints a message to the console explaining how to use this program. 
*/ static void usage (void) { size_t i; fputs ("usage: srch-test &lt;algorithm&gt; &lt;array-size&gt; &lt;n-iterations&gt;\n" "where &lt;algorithm&gt; is one of the following:\n", stdout); for (i = 0; i < n_search_func; i++) printf (" %u for %s\n", (unsigned) i + 1, search_func_tab[i].name); fputs (" &lt;array-size&gt; is the size of the array to search, and\n" " &lt;n-iterations&gt; is the number of times to iterate.\n", stdout); exit (EXIT_FAILURE); } This code is included in 609. 4. Here are the results on the author's computer, a Pentium II at 233 MHz, using GNU C 2.95.2, for 1024 iterations using arrays of size 1024 with no optimization. All values are given in seconds rounded to tenths.

*Function*                      *Successful searches*  *Unsuccessful searches*
seq_search()                                     18.4                     36.3
seq_sentinel_search()                            16.5                     32.8
seq_sorted_search()                              18.6                      0.1
seq_sorted_sentinel_search()                     16.4                      0.2
seq_sorted_sentinel_search_2()                   16.6                      0.2
binary_search()                                   1.3                      1.2
uniform_binary_search()                           1.1                      1.1
binary_search_bsearch()                           2.6                      2.4
cheat_search()                                    0.1                      0.1

Results of similar tests using full optimization were as follows:

*Function*                      *Successful searches*  *Unsuccessful searches*
seq_search()                                      6.3                     12.4
seq_sentinel_search()                             4.8                      9.4
seq_sorted_search()                               9.3                      0.1
seq_sorted_sentinel_search()                      4.8                      0.2
seq_sorted_sentinel_search_2()                    4.8                      0.2
binary_search()                                   0.7                      0.5
uniform_binary_search()                           0.7                      0.6
binary_search_bsearch()                           1.5                      1.2
cheat_search()                                    0.1                      0.1

Observations:
* In general, the times above are about what we might expect them to be: they decrease as we go down the table.
* Within sequential searches, the sentinel-based searches have better search times than non-sentinel searches, and other search characteristics (whether the array was sorted, for instance) had little impact on performance.
* Unsuccessful searches were very fast for sorted sequential searches, but the particular test set used always allowed such searches to terminate after a single comparison. For other test sets one might expect these numbers to be similar to those for unordered sequential search. 
* Either of the first two forms of binary search had the best overall performance. They also have the best performance for successful searches and might be expected to have the best performance for unsuccessful searches in other test sets, for the reason given before.
* Binary search using the general interface bsearch() was significantly slower than either of the other binary searches, probably because of the cost of the extra function calls. Items that are more expensive to compare (for instance, long text strings) might be expected to show less of a penalty.

Here are the results on the same machine for 1,048,576 iterations on arrays of size 8 with full optimization:

*Function*                      *Successful searches*  *Unsuccessful searches*
seq_search()                                      1.7                      2.0
seq_sentinel_search()                             1.7                      2.0
seq_sorted_search()                               2.0                      1.1
seq_sorted_sentinel_search()                      1.9                      1.1
seq_sorted_sentinel_search_2()                    1.8                      1.2
binary_search()                                   2.5                      1.9
uniform_binary_search()                           2.4                      2.3
binary_search_bsearch()                           4.5                      3.9
cheat_search()                                    0.7                      0.7

For arrays this small, simple algorithms are the clear winners. The additional complications of binary search make it slower. Similar patterns can be expected on most architectures, although the "break even" array size where binary search and sequential search are equally fast can be expected to differ. Section 3.6 ----------- 1. Here is one easy way to do it: 616. = /* Initializes larger and smaller within range min...max of array[], which has n real elements plus an (n + 1)th sentinel element. */ int init_binary_tree_array (struct binary_tree_entry array[], int n, int min, int max) { if (min <= max) { /* The `+ 1' is necessary because the tree root must be at n / 2, and on the first call we have min == 0 and max == n - 1. */ int i = (min + max + 1) / 2; array[i].larger = init_binary_tree_array (array, n, i + 1, max); array[i].smaller = init_binary_tree_array (array, n, min, i - 1); return i; } else return n; } This code is included in 617. 2. 617. = #include &lt;limits.h&gt; #include &lt;stdio.h&gt; #include &lt;stdlib.h&gt;
618.
= int main (int argc, char *argv[]) { struct binary_tree_entry *array; int n, i; /* Parse command line. */ if (argc != 2) usage (); n = stoi (argv[1]); if (n < 1) usage (); /* Allocate memory. */ array = malloc ((n + 1) * sizeof *array); if (array == NULL) { fprintf (stderr, "out of memory\n"); return EXIT_FAILURE; } /* Initialize array. */ for (i = 0; i < n; i++) array[i].value = i; init_binary_tree_array (array, n, 0, n - 1); /* Test successful and unsuccessful searches. */ for (i = -1; i < n; i++) { int result = binary_search_tree_array (array, n, i); if (result != i) printf ("Searching for %d: expected %d, but received %d\n", i, i, result); } /* Clean up. */ free (array); return EXIT_SUCCESS; } This code is included in 617. 619. = /* Print a helpful usage message and abort execution. */ static void usage (void) { fputs ("Usage: bin-ary-test &lt;array-size&gt;\n" "where &lt;array-size&gt; is the size of the array to test.\n", stdout); exit (EXIT_FAILURE); } This code is included in 617. Chapter 4 ========= 1. This construct makes &lt;bst.h&gt; "idempotent", that is, including it many times has the same effect as including it once. This is important because some C constructs, such as type definitions with typedef, are erroneous if included in a program multiple times. Of course,
&lt;assert.h&gt; is included outside the #ifndef-protected part of &lt;bst.h&gt;. This is intentional (see Exercise 2.9-1 for details). Section 4.2.2 ------------- 1. We often want to know how many items are in a binary tree. In these cases it's cheaper to keep track of item counts as we go than to count them each time, which requires a full binary tree traversal. It would be better to omit the count if we never needed to know how many items were in the tree, or if we needed to know only very seldom. Section 4.2.3 ------------- 1. The purpose of the conditional definition of BST_MAX_HEIGHT is not to keep it from being redefined if the header file is included multiple times. There's a higher-level "include guard" for that (see Exercise 4-1), and, besides, identical definitions of a macro are okay in C. Instead, it is to allow the user to set the maximum height of binary trees by defining that macro before &lt;bst.h&gt; is #included. The limit can be adjusted upward for larger computers or downward for smaller ones. The main pitfall is that a user program could use different values of BST_MAX_HEIGHT in different source files, which leads to undefined behavior. Less of a problem are definitions to invalid values, which the compiler will catch at compile time. Section 4.3 ----------- 1. 2 2 / `_ / \ 2 1 4 1 4 ^ / \ ^ 1 4 3 6 3 5 ^ 5 7 2. The functions need to adjust the pointer from the rotated subtree's parent, so they take a double pointer, struct bst_node **. An alternative would be to accept two parameters: the rotated subtree's parent node and the bst_link[] index of the subtree. /* Rotates right at *yp. */ static void rotate_right (struct bst_node **yp) { struct bst_node *y = *yp; struct bst_node *x = y->bst_link[0]; y->bst_link[0] = x->bst_link[1]; x->bst_link[1] = y; *yp = x; } /* Rotates left at *xp. 
*/ static void rotate_left (struct bst_node **xp) { struct bst_node *x = *xp; struct bst_node *y = x->bst_link[1]; x->bst_link[1] = y->bst_link[0]; y->bst_link[0] = x; *xp = y; } Section 4.7 ----------- 1. This is a dirty trick. The bst_root member of struct bst_table is not a struct bst_node, but we are pretending that it is by casting its address to struct bst_node *. We can get away with this only because the first member of struct bst_node is bst_link, whose first element bst_link[0] is a struct bst_node *, the same type as bst_root. ANSI C guarantees that a pointer to a structure, suitably converted, is a pointer to the structure's first member, so this is fine as long as we never try to access any member of *p except bst_link[0]. Trying to access other members would result in undefined behavior. The reason that we want to do this at all is that it means that the tree's root is not a special case. Otherwise, we would have to deal with the root separately from the rest of the nodes in the tree, because of its special status as the only node in the tree not pointed to by the bst_link[] member of a struct bst_node. It is a good idea to get used to these kinds of pointer casts, because they are common in libavl. As an alternative, we can declare an actual instance of struct bst_node, store the tree's bst_root into its bst_link[0], and copy its possibly updated value back into bst_root when done. This isn't very elegant, but it works. This technique is used much later in this book, in . A different kind of alternative approach is used in Exercise 2. 2. Here, pointer-to-pointer q traverses the tree, starting with a pointer to the root, comparing each node found against item while looking for a null pointer. If an item equal to item is found, it returns a pointer to the item's data. Otherwise, q receives the address of the NULL pointer that becomes the new node, the new node is created, and a pointer to its data is returned. 620. 
= void ** bst_probe (struct bst_table *tree, void *item) { struct bst_node **q; int cmp; assert (tree != NULL && item != NULL); for (q = &tree->bst_root; *q != NULL; q = &(*q)->bst_link[cmp > 0]) { cmp = tree->bst_compare (item, (*q)->bst_data, tree->bst_param); if (cmp == 0) return &(*q)->bst_data; } *q = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof **q); if (*q == NULL) return NULL; (*q)->bst_link[0] = (*q)->bst_link[1] = NULL; (*q)->bst_data = item; tree->bst_count++; return &(*q)->bst_data; } 3. The first item to be inserted must have the value of the original tree's root. After that, at each step, we can insert an item with the value of either child x of any node in the original tree corresponding to a node y already in the copy tree, as long as x's value is not already in the copy tree. 4. The function below traverses tree in "level order". That is, it visits the root, then the root's children, then the children of the root's children, and so on, so that all the nodes at a particular level in the tree are visited in sequence. See also: [Sedgewick 1998], Program 5.16. 621. = /* Calls visit for each of the nodes in tree in level order. Returns nonzero if successful, zero if out of memory. */ static int bst_traverse_level_order (struct bst_table *tree, bst_item_func *visit) { struct bst_node **queue; size_t head, tail; if (tree->bst_count == 0) return 1; queue = tree->bst_alloc->libavl_malloc (tree->bst_alloc, sizeof *queue * tree->bst_count); if (queue == NULL) return 0; head = tail = 0; queue[head++] = tree->bst_root; while (head != tail) { struct bst_node *cur = queue[tail++]; visit (cur->bst_data, tree->bst_param); if (cur->bst_link[0] != NULL) queue[head++] = cur->bst_link[0]; if (cur->bst_link[1] != NULL) queue[head++] = cur->bst_link[1]; } tree->bst_alloc->libavl_free (tree->bst_alloc, queue); return 1; } Section 4.7.1 ------------- 1. 622. = /* Performs root insertion of n at root within tree. Subtree root must not contain a node matching n. 
Returns nonzero only if successful. */ static int root_insert (struct bst_table *tree, struct bst_node **root, struct bst_node *n) { struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[BST_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ struct bst_node *p; /* Traverses tree looking for insertion point. */ assert (tree != NULL && n != NULL); return 1; } 623. = pa[0] = (struct bst_node *) root; da[0] = 0; k = 1; for (p = *root; p != NULL; p = p->bst_link[da[k - 1]]) { int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param); assert (cmp != 0); if (k >= BST_MAX_HEIGHT) return 0; pa[k] = p; da[k++] = cmp > 0; } This code is included in 622. 624. = pa[k - 1]->bst_link[da[k - 1]] = n; This code is included in 622 and 625. 2. The idea is to optimize for the common case but allow for fallback to a slower algorithm that doesn't require a stack when necessary. 625. = /* Performs root insertion of n at root within tree. Subtree root must not contain a node matching n. Never fails and will not rebalance tree. */ static void root_insert (struct bst_table *tree, struct bst_node **root, struct bst_node *n) { struct bst_node *pa[BST_MAX_HEIGHT]; /* Nodes on stack. */ unsigned char da[BST_MAX_HEIGHT]; /* Directions moved from stack nodes. */ int k; /* Stack height. */ int overflow = 0; /* Set nonzero if stack overflowed. */ struct bst_node *p; /* Traverses tree looking for insertion point. */ assert (tree != NULL && n != NULL); } If the stack overflows while we're searching for the insertion point, we stop keeping track of any nodes but the last one and set overflow so that later we know that overflow occurred: 626. 
= pa[0] = (struct bst_node *) root; da[0] = 0; k = 1; for (p = *root; p != NULL; p = p->bst_link[da[k - 1]]) { int cmp = tree->bst_compare (n->bst_data, p->bst_data, tree->bst_param); assert (cmp != 0); if (k >= BST_MAX_HEIGHT) { overflow = 1; k--; } pa[k] = p; da[k++] = cmp > 0; } This code is included in 625. Once we've inserted the node, we deal with the rotation in the same way as before if there was no overflow. If overflow occurred, we instead do the rotations one by one, with a full traversal from *root every time: 627. = if (!overflow) { } else { while (*root != n) { struct bst_node **r; /* Link to node to rotate. */ struct bst_node *q; /* Node to rotate. */ int dir; for (r = root; ; r = &q->bst_link[dir]) { q = *r; dir = 0 < tree->bst_compare (n->bst_data, q->bst_data, tree->bst_param); if (q->bst_link[dir] == n) break; } if (dir == 0) { q->bst_link[0] = n->bst_link[1]; n->bst_link[1] = q; } else { q->bst_link[1] = n->bst_link[0]; n->bst_link[0] = q; } *r = n; } } This code is included in 625. 3. One insertion order that does _not_ require much stack is ascending order. If we insert 1...4 at the root in ascending order, for instance, we get a BST that looks like this: 4 / 3 / 2 / 1 If we then insert node 5, it will immediately be inserted as the right child of 4, and then a left rotation will make it the root, and we're back where we started without ever using more than one stack entry. Other obvious pathological orders such as descending order and "zig-zag" order behave similarly. One insertion order that does require an arbitrary amount of stack space is to first insert 1...n in ascending order, then the single item 0. Each of the first group of insertions requires only one stack entry (except the first, which does not use any), but the final insertion uses n - 1. 
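The shapes described above are easy to check with a toy version of root insertion. The sketch below is not libavl code: struct node and the recursive helper root_insert_r() are simplified stand-ins invented for this illustration (libavl's root_insert() works iteratively with an explicit stack). Each key is inserted as a leaf and then rotated up to the root on the way back out of the recursion, so inserting 1...4 in ascending order builds the left spine shown above, and inserting 5 then reaches the root after a single rotation.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified node type; the book's struct bst_node carries a data
   pointer instead of an inline key. */
struct node { int key; struct node *link[2]; };

/* Hypothetical recursive root insertion: insert key as a new leaf,
   then rotate it up over each ancestor as the recursion unwinds.
   key must not already be in the subtree rooted at p. */
static struct node *
root_insert_r (struct node *p, int key)
{
  if (p == NULL)
    {
      struct node *n = malloc (sizeof *n);
      assert (n != NULL);
      n->key = key;
      n->link[0] = n->link[1] = NULL;
      return n;
    }
  else
    {
      int dir = key > p->key;
      struct node *n = root_insert_r (p->link[dir], key);
      /* Rotate n up over p: p adopts n's subtree on the opposite
         side, then takes that side's place under n. */
      p->link[dir] = n->link[!dir];
      n->link[!dir] = p;
      return n;
    }
}
```

Inserting 1 through 5 in ascending order with this helper leaves 5 at the root and 4, 3, 2, 1 in a left spine beneath it, matching the description above.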
If we're interested in high average consumption of stack space, the pattern consisting of a series of ascending insertions (n / 2 + 1)...n followed by a second ascending series 1...(n / 2), for even n, is most effective. For instance, each insertion for insertion order 6, 7, 8, 9, 10, 1, 2, 3, 4, 5 requires 0, 1, 1, 1, 1, 5, 6, 6, 6, 6 stack entries, respectively, for a total of 33. These are, incidentally, the best possible results in each category, as determined by exhaustive search over the 10! == 3,628,800 possible root insertion orders for trees of 10 nodes. (Thanks to Richard Heathfield for suggesting exhaustive search.) Section 4.8 ----------- 1. Add this before the top-level else clause in : 628. = else if (p->bst_link[0] == NULL) q->bst_link[dir] = p->bst_link[1]; 2. Be sure to look at Exercise 3 before actually making this change. 629. = struct bst_node *s = r->bst_link[0]; while (s->bst_link[0] != NULL) { r = s; s = r->bst_link[0]; } p->bst_data = s->bst_data; r->bst_link[0] = s->bst_link[1]; p = s; We could, indeed, make similar changes to the other cases, but for these cases the code would become more complicated, not simpler. 3. The semantics for libavl traversers only invalidate traversers with the deleted item selected, but the revised code would actually free the node of the successor to that item. Because struct bst_traverser keeps a pointer to the struct bst_node of the current item, attempts to use a traverser that had selected the successor of the deleted item would result in undefined behavior. Some other binary tree libraries have looser semantics on their traversers, so they can afford to use this technique. Section 4.9.1 ------------- 1. It would probably be faster to check before each call rather than after, because this way many calls would be avoided. However, it might be more difficult to maintain the code, because we would have to remember to check for a null pointer before every call. 
For instance, the call to traverse_recursive() within walk() might easily be overlooked. Which is "better" is therefore a toss-up, dependent on a program's goals and the programmer's esthetic sense. 2. 630. = void walk (struct bst_table *tree, bst_item_func *action, void *param) { void traverse_recursive (struct bst_node *node) { if (node != NULL) { traverse_recursive (node->bst_link[0]); action (node->bst_data, param); traverse_recursive (node->bst_link[1]); } } assert (tree != NULL && action != NULL); traverse_recursive (tree->bst_root); } Section 4.9.2 ------------- 1a. First of all, a minimal-height binary tree of n nodes has a "height" of about log2(n), that is, starting from the root and moving only downward, you can visit at most about log2(n) nodes (including the root) without running out of nodes. Examination of the code should reveal to you that only moving down to the left pushes nodes on the stack and only moving upward pops nodes off. What's more, the first thing the code does is move as far down to the left as it can. So, the maximum height of the stack in a minimum-height binary tree of n nodes is the binary tree's height, or, again, about log2(n). 1b. If a binary tree has only left children, as does the BST on the left below, the stack will grow as tall as the tree, to a height of n. Conversely, if a binary tree has only right children, as does the BST on the right below, no nodes will be pushed onto the stack at all.

      4    1
     /      \
    3        2
   /          \
  2            3
 /              \
1                4

1c. It's only acceptable if it's known that the stack will not exceed the fixed maximum height (or if the program aborting with an error is itself acceptable). Otherwise, you should use a recursive method (but see part (e) below), or a dynamically extended stack, or a balanced binary tree library. 1d. Keep in mind this is not the only way or necessarily the best way to handle stack overflow. Our final code for tree traversal will rebalance the tree when it grows too tall. 631. 
= static void traverse_iterative (struct bst_node *node, bst_item_func *action, void *param) { struct bst_node **stack = NULL; size_t height = 0; size_t max_height = 0; for (;;) { while (node != NULL) { if (height >= max_height) { max_height = max_height * 2 + 8; stack = realloc (stack, sizeof *stack * max_height); if (stack == NULL) { fprintf (stderr, "out of memory\n"); exit (EXIT_FAILURE); } } stack[height++] = node; node = node->bst_link[0]; } if (height == 0) break; node = stack[--height]; action (node->bst_data, param); node = node->bst_link[1]; } free (stack); } 1e. Yes, traverse_recursive() can run out of memory, because its arguments must be stored somewhere by the compiler. Given typical compilers, it will consume more memory per call than traverse_iterative() will per item on the stack, because each call includes two arguments not pushed on traverse_iterative()'s stack, plus any needed compiler-specific bookkeeping information. Section 4.9.2.1 --------------- 1. After calling bst_balance(), the structure of the binary tree may have changed completely, so we need to "find our place" again by setting up the traverser structure as if the traversal had been done on the rebalanced tree all along. Specifically, members node, stack[], and height of struct traverser need to be updated. It is easy to set up struct traverser in this way, given the previous node in inorder traversal, which we'll call prev. Simply search the tree from the new root to find this node. Along the way, because the stack is used to record nodes whose left subtree we are examining, push nodes onto the stack as we move left down the tree. Member node receives prev->bst_link[1], just as it would have if no overflow had occurred. A small problem with this approach is that it requires knowing the previous node in inorder, which is neither explicitly noted in struct traverser nor easy to find out. 
But it _is_ easy to find out the next node: it is the smallest-valued node in the binary tree rooted at the node we were considering when the stack overflowed. (If you need convincing, refer to the code for next_item() above: the while loop descends to the left, pushing nodes as it goes, until it hits a NULL pointer, then the node pushed last is popped and returned.) So we can return this as the next node in inorder while setting up the traverser to return the nodes after it. Here's the code: 632. = struct bst_node *prev, *iter; prev = node; while (prev->bst_link[0] != NULL) prev = prev->bst_link[0]; bst_balance (trav->table); trav->height = 0; for (iter = trav->table->bst_root; iter != prev; ) if (trav->table->bst_compare (prev->bst_data, iter->bst_data, trav->table->bst_param) < 0) { trav->stack[trav->height++] = iter; iter = iter->bst_link[0]; } else iter = iter->bst_link[1]; trav->node = iter->bst_link[1]; return prev->bst_data; Without this code, it is not necessary to have member table in struct traverser. 2. It is possible to write prev_item() given our current next_item(), but the result is not very efficient, for two reasons, both related to the way that struct traverser is used. First, the structure doesn't contain a pointer to the current item. Second, its stack doesn't contain pointers to trees that must be descended to the left to find a predecessor node, only those that must be descended to the right to find a successor node. The next section will develop an alternate, more general method for traversal that avoids these problems. Section 4.9.3 ------------- 1. The bst_probe() function can't disturb any traversals. A change in the tree is only problematic for a traverser if it deletes the currently selected node (which is explicitly undefined: *note Traversers::) or if it shuffles around any of the nodes that are on the traverser's stack. 
An insertion into a tree only creates new leaves, so it can't cause either of those problems, and there's no need to increment the generation number. The same logic applies to bst_t_insert(), presented later. On the other hand, an insertion into the AVL and red-black trees discussed in the next two chapters can cause restructuring of the tree and thus potentially disturb ongoing traversals. For this reason, the insertion functions for AVL and red-black trees _will_ increment the tree's generation number. 2. First, trav_refresh() is only called from bst_t_next() and bst_t_prev(), and these functions are mirrors of each other, so we need only show it for one of them. Second, all of the traverser functions check the stack height, so these will not cause an item to be initialized at too high a height, nor will bst_t_next() or bst_t_prev() increase the stack height above its limit. Since the traverser functions won't force a too-tall stack directly, this leaves the other functions. Only functions that modify the tree could cause problems, by pushing an item farther down in the tree. There are only four functions that modify a tree. The insertion functions bst_probe() and bst_t_insert() can't cause problems, because they add leaves but never move around nodes. The deletion function bst_delete() does move around nodes in case 3, but it always moves them higher in the tree, never lower. Finally, bst_balance() always ensures that all nodes in the resultant tree are within the tree's height limit. 3. This won't work because the stack may contain pointers to nodes that have been deleted and whose memory has been freed. In ANSI C89 and C99, any use of a pointer to an object after the end of its lifetime results in undefined behavior, even seemingly innocuous uses such as pointer comparisons. What's worse, the memory for the node may already have been recycled for use for another, different node elsewhere in the tree. 
This approach does work if there are never any deletions in the tree, or if we use some kind of generation number for each node that we store along with each stack entry. The latter would be overkill unless comparisons are very expensive and the traversals in changing trees are common. Another possibility would be to somehow only select this behavior if there have been no deletions in the binary tree since the traverser was last used. This could be done, for instance, with a second generation number in the binary tree incremented only on deletions, with a corresponding number kept in the traverser. The following reimplements trav_refresh() to include this optimization. As noted, it will not work if there are any deletions in the tree. It does work for traversers that must be refreshed due to, e.g., rebalancing. 633. = /* Refreshes the stack of parent pointers in trav and updates its generation number. Will *not* work if any deletions have occurred in the tree. */ static void trav_refresh (struct bst_traverser *trav) { assert (trav != NULL); trav->bst_generation = trav->bst_table->bst_generation; if (trav->bst_node != NULL) { bst_comparison_func *cmp = trav->bst_table->bst_compare; void *param = trav->bst_table->bst_param; struct bst_node *node = trav->bst_node; struct bst_node *i = trav->bst_table->bst_root; size_t height = 0; if (trav->bst_height > 0 && i == trav->bst_stack[0]) for (; height < trav->bst_height; height++) { struct bst_node *next = trav->bst_stack[height + 1]; if (i->bst_link[0] != next && i->bst_link[1] != next) break; i = next; } while (i != node) { assert (height < BST_MAX_HEIGHT); assert (i != NULL); trav->bst_stack[height++] = i; i = i->bst_link[cmp (node->bst_data, i->bst_data, param) > 0]; } trav->bst_height = height; } } Section 4.9.3.2 --------------- 1. It only calls itself if it runs out of stack space. 
Its call to bst_balance() right before the recursive call ensures that the tree is short enough to fit within the stack, so the recursive call cannot overflow. Section 4.9.3.6 --------------- 1. The assignment statements are harmless, but memcpy() of overlapping regions produces undefined behavior. Section 4.10.1 -------------- 1a. Notice the use of & instead of && below. This ensures that both link fields get initialized, so that deallocation can be done in a simple way. If && were used instead then we wouldn't have any way to tell whether (*y)->bst_link[1] had been initialized. 634. = /* Stores in *y a new copy of tree rooted at x. Returns nonzero if successful, or zero if memory was exhausted.*/ static int bst_robust_copy_recursive_1 (struct bst_node *x, struct bst_node **y) { if (x != NULL) { *y = malloc (sizeof **y); if (*y == NULL) return 0; (*y)->bst_data = x->bst_data; if (!(bst_robust_copy_recursive_1 (x->bst_link[0], &(*y)->bst_link[0]) & bst_robust_copy_recursive_1 (x->bst_link[1], &(*y)->bst_link[1]))) { bst_deallocate_recursive (*y); *y = NULL; return 0; } } else *y = NULL; return 1; } Here's a needed auxiliary function: 635. = static void bst_deallocate_recursive (struct bst_node *node) { if (node == NULL) return; bst_deallocate_recursive (node->bst_link[0]); bst_deallocate_recursive (node->bst_link[1]); free (node); } 1b. 636. = static struct bst_node error_node; /* Makes and returns a new copy of tree rooted at x. If an allocation error occurs, returns &error_node. */ static struct bst_node * bst_robust_copy_recursive_2 (struct bst_node *x) { struct bst_node *y; if (x == NULL) return NULL; y = malloc (sizeof *y); if (y == NULL) return &error_node; y->bst_data = x->bst_data; y->bst_link[0] = bst_robust_copy_recursive_2 (x->bst_link[0]); y->bst_link[1] = bst_robust_copy_recursive_2 (x->bst_link[1]); if (y->bst_link[0] == &error_node || y->bst_link[1] == &error_node) { bst_deallocate_recursive (y); return &error_node; } return y; } 2. 
Here's one way to do it, which is simple but perhaps not the fastest possible. Note that if allocating the right child fails, the left subtree already copied must be deallocated before returning, or it would leak. 637. = /* Copies the tree rooted at x to y, which is allocated but not yet initialized. Returns one if successful, zero if memory was exhausted. In the latter case y is not freed but any partially allocated subtrees are. */ static int bst_robust_copy_recursive_3 (struct bst_node *x, struct bst_node *y) { y->bst_data = x->bst_data; if (x->bst_link[0] != NULL) { y->bst_link[0] = malloc (sizeof *y->bst_link[0]); if (y->bst_link[0] == NULL) return 0; if (!bst_robust_copy_recursive_3 (x->bst_link[0], y->bst_link[0])) { free (y->bst_link[0]); return 0; } } else y->bst_link[0] = NULL; if (x->bst_link[1] != NULL) { y->bst_link[1] = malloc (sizeof *y->bst_link[1]); if (y->bst_link[1] == NULL) { bst_deallocate_recursive (y->bst_link[0]); return 0; } if (!bst_robust_copy_recursive_3 (x->bst_link[1], y->bst_link[1])) { bst_deallocate_recursive (y->bst_link[0]); free (y->bst_link[1]); return 0; } } else y->bst_link[1] = NULL; return 1; } Section 4.10.2 -------------- 1. Here is one possibility. 638. = /* Copies org to a newly created tree, which is returned. 
*/ struct bst_table * bst_copy_iterative (const struct bst_table *org) { struct bst_node *stack[2 * (BST_MAX_HEIGHT + 1)]; int height = 0; struct bst_table *new; const struct bst_node *x; struct bst_node *y; new = bst_create (org->bst_compare, org->bst_param, org->bst_alloc); new->bst_count = org->bst_count; if (new->bst_count == 0) return new; x = (const struct bst_node *) &org->bst_root; y = (struct bst_node *) &new->bst_root; for (;;) { while (x->bst_link[0] != NULL) { y->bst_link[0] = org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[0]); stack[height++] = (struct bst_node *) x; stack[height++] = y; x = x->bst_link[0]; y = y->bst_link[0]; } y->bst_link[0] = NULL; for (;;) { y->bst_data = x->bst_data; if (x->bst_link[1] != NULL) { y->bst_link[1] = org->bst_alloc->libavl_malloc (org->bst_alloc, sizeof *y->bst_link[1]); x = x->bst_link[1]; y = y->bst_link[1]; break; } else y->bst_link[1] = NULL; if (height <= 2) return new; y = stack[--height]; x = stack[--height]; } } } Section 4.11.1 -------------- 1. bst_copy() can set bst_data to NULL when memory allocation fails. Section 4.13 ------------ 1. Factoring out recursion is troublesome in this case. Writing the loop with an explicit stack exposes more explicitly the issue of stack overflow. Failure on stack overflow is not acceptable, because it would leave both trees in disarray, so we handle it by dropping back to a slower algorithm that does not require a stack. This code also makes use of root_insert() from . 639. = /* Adds to tree all the nodes in the tree rooted at p. */ static void fallback_join (struct bst_table *tree, struct bst_node *p) { struct bst_node *q; for (; p != NULL; p = q) if (p->bst_link[0] == NULL) { q = p->bst_link[1]; p->bst_link[0] = p->bst_link[1] = NULL; root_insert (tree, &tree->bst_root, p); } else { q = p->bst_link[0]; p->bst_link[0] = q->bst_link[1]; q->bst_link[1] = p; } } /* Joins a and b, which must be disjoint and have compatible comparison functions. 
b is destroyed in the process. */ void bst_join (struct bst_table *ta, struct bst_table *tb) { size_t count = ta->bst_count + tb->bst_count; if (ta->bst_root == NULL) ta->bst_root = tb->bst_root; else if (tb->bst_root != NULL) { struct bst_node **pa[BST_MAX_HEIGHT]; struct bst_node *qa[BST_MAX_HEIGHT]; int k = 0; pa[k] = &ta->bst_root; qa[k++] = tb->bst_root; while (k > 0) { struct bst_node **a = pa[--k]; struct bst_node *b = qa[k]; for (;;) { struct bst_node *b0 = b->bst_link[0]; struct bst_node *b1 = b->bst_link[1]; b->bst_link[0] = b->bst_link[1] = NULL; root_insert (ta, a, b); if (b1 != NULL) { if (k < BST_MAX_HEIGHT) { pa[k] = &(*a)->bst_link[1]; qa[k] = b1; if (*pa[k] != NULL) k++; else *pa[k] = qa[k]; } else { int j; fallback_join (ta, b0); fallback_join (ta, b1); for (j = 0; j < k; j++) fallback_join (ta, qa[j]); ta->bst_count = count; free (tb); bst_balance (ta); return; } } a = &(*a)->bst_link[0]; b = b0; if (*a == NULL) { *a = b; break; } else if (b == NULL) break; } } } ta->bst_count = count; free (tb); } Section 4.14.1 -------------- 1. Functions not used at all are bst_insert(), bst_replace(), bst_t_replace(), bst_malloc(), and bst_free(). Functions used explicitly within test() or functions that it calls are bst_create(), bst_find(), bst_probe(), bst_delete(), bst_t_init(), bst_t_first(), bst_t_last(), bst_t_insert(), bst_t_find(), bst_t_copy(), bst_t_next(), bst_t_prev(), bst_t_cur(), bst_copy(), and bst_destroy(). The trav_refresh() function is called indirectly by modifying the tree during traversal. The copy_error_recovery() function is called if a memory allocation error occurs during bst_copy(). The bst_balance() function, and therefore also tree_to_vine(), vine_to_tree(), and compress(), are called if a stack overflow occurs. It is possible to force both these behaviors with command-line options to the test program. 2. Some kinds of errors mean that we can keep going and test other parts of the code. 
Other kinds of errors mean that something is deeply wrong, and returning without cleanup is the safest action short of terminating the program entirely. The third category is memory allocation errors. In our test program these are always caused intentionally in order to test out the BST functions' error recovery abilities, so a memory allocation error is not really an error at all, and we clean up and return successfully. (A real memory allocation error will cause the program to abort in the memory allocator. See the definition of mt_allocate() within .) Section 4.14.1.1 ---------------- 1. The definition of size_t differs from one compiler to the next. All we know about it for sure is that it's an unsigned type appropriate for representing the size of an object. So we must convert it to some known type in order to pass it to printf(), because printf(), having a variable number of arguments, does not know what type to convert it into. Incidentally, C99 solves this problem by providing a `z' modifier for printf() conversions, so that we could use "%zu" to print out size_t values without the need for a cast. See also: [ISO 1999], section 7.19.6.1. 2. Yes. Section 4.14.2 -------------- 1. 640. = /* Fills the n elements of array[] with a random permutation of the integers between 0 and n - 1. */ static void permuted_integers (int array[], size_t n) { size_t i; for (i = 0; i < n; i++) array[i] = i; for (i = 0; i < n; i++) { size_t j = i + (unsigned) rand () / (RAND_MAX / (n - i) + 1); int t = array[j]; array[j] = array[i]; array[i] = t; } } This code is included in 642. 2. All it takes is a preorder traversal. If the code below is confusing, try looking back at . 641. = /* Generates a list of integers that produce a balanced tree when inserted in order into a binary tree in the usual way. min and max inclusively bound the values to be inserted. Output is deposited starting at *array. 
*/ static void gen_balanced_tree (int min, int max, int **array) { int i; if (min > max) return; i = (min + max + 1) / 2; *(*array)++ = i; gen_balanced_tree (min, i - 1, array); gen_balanced_tree (i + 1, max, array); } This code is included in 642. 3. 642. = /* Generates a permutation of the integers 0 to n - 1 into insert[] according to insert_order. */ static void gen_insertions (size_t n, enum insert_order insert_order, int insert[]) { size_t i; switch (insert_order) { case INS_RANDOM: permuted_integers (insert, n); break; case INS_ASCENDING: for (i = 0; i < n; i++) insert[i] = i; break; case INS_DESCENDING: for (i = 0; i < n; i++) insert[i] = n - i - 1; break; case INS_BALANCED: gen_balanced_tree (0, n - 1, &insert); break; case INS_ZIGZAG: for (i = 0; i < n; i++) if (i % 2 == 0) insert[i] = i / 2; else insert[i] = n - i / 2 - 1; break; case INS_ASCENDING_SHIFTED: for (i = 0; i < n; i++) { insert[i] = i + n / 2; if ((size_t) insert[i] >= n) insert[i] -= n; } break; case INS_CUSTOM: for (i = 0; i < n; i++) if (scanf ("%d", &insert[i]) == 0) fail ("error reading insertion order from stdin"); break; default: assert (0); } } /* Generates a permutation of the integers 0 to n - 1 into delete[] according to delete_order and insert[]. */ static void gen_deletions (size_t n, enum delete_order delete_order, const int *insert, int *delete) { size_t i; switch (delete_order) { case DEL_RANDOM: permuted_integers (delete, n); break; case DEL_REVERSE: for (i = 0; i < n; i++) delete[i] = insert[n - i - 1]; break; case DEL_SAME: for (i = 0; i < n; i++) delete[i] = insert[i]; break; case DEL_CUSTOM: for (i = 0; i < n; i++) if (scanf ("%d", &delete[i]) == 0) fail ("error reading deletion order from stdin"); break; default: assert (0); } } This code is included in 97. 4. The function below is carefully designed. It uses time() to obtain the current time. The alternative clock() is a poor choice because it measures CPU time used, which is often more or less constant among runs. 
The actual value of a time_t is not portable, so it computes a "hash" of the bytes in it using a multiply-and-add technique.  The factor used for multiplication normally comes out as 257, a prime and therefore a good candidate.

See also: [Knuth 1998a], section 3.2.1; [Aho 1986], section 7.6.

643. =
/* Chooses and returns an initial random seed based on the current time.
   Based on code by Lawrence Kirby. */
unsigned
time_seed (void)
{
  time_t timeval;       /* Current time. */
  unsigned char *ptr;   /* Type-punned pointer into timeval. */
  unsigned seed;        /* Generated seed. */
  size_t i;

  timeval = time (NULL);
  ptr = (unsigned char *) &timeval;
  seed = 0;
  for (i = 0; i < sizeof timeval; i++)
    seed = seed * (UCHAR_MAX + 2u) + ptr[i];

  return seed;
}

This code is included in 97.

Section 4.14.3
--------------

1.
644. +=
static int
test_bst_t_last (struct bst_table *tree, int n)
{
  struct bst_traverser trav;
  int *last;

  last = bst_t_last (&trav, tree);
  if (last == NULL || *last != n - 1)
    {
      printf (" Last item test failed: expected %d, got %d\n",
              n - 1, last != NULL ? *last : -1);
      return 0;
    }
  return 1;
}

static int
test_bst_t_find (struct bst_table *tree, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      struct bst_traverser trav;
      int *iter;

      iter = bst_t_find (&trav, tree, &i);
      if (iter == NULL || *iter != i)
        {
          printf (" Find item test failed: looked for %d, got %d\n",
                  i, iter != NULL ? *iter : -1);
          return 0;
        }
    }
  return 1;
}

static int
test_bst_t_insert (struct bst_table *tree, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      struct bst_traverser trav;
      int *iter;

      iter = bst_t_insert (&trav, tree, &i);
      if (iter == NULL || iter == &i || *iter != i)
        {
          printf (" Insert item test failed: inserted dup %d, got %d\n",
                  i, iter != NULL ?
*iter : -1); return 0; } } return 1; } static int test_bst_t_next (struct bst_table *tree, int n) { struct bst_traverser trav; int i; bst_t_init (&trav, tree); for (i = 0; i < n; i++) { int *iter = bst_t_next (&trav); if (iter == NULL || *iter != i) { printf (" Next item test failed: expected %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int test_bst_t_prev (struct bst_table *tree, int n) { struct bst_traverser trav; int i; bst_t_init (&trav, tree); for (i = n - 1; i >= 0; i--) { int *iter = bst_t_prev (&trav); if (iter == NULL || *iter != i) { printf (" Previous item test failed: expected %d, got %d\n", i, iter != NULL ? *iter : -1); return 0; } } return 1; } static int test_bst_copy (struct bst_table *tree, int n) { struct bst_table *copy = bst_copy (tree, NULL, NULL, NULL); int okay = compare_trees (tree->bst_root, copy->bst_root); bst_destroy (copy, NULL); return okay; } Section 4.14.4 -------------- 1. Attempting to apply an allocation policy to allocations of zero-byte blocks is silly. How could a failure be indicated, given that one of the successful results for an allocation of 0 bytes is NULL? At any rate, libavl never calls bst_allocate() with a size argument of 0. See also: [ISO 1990], section 7.10.3. Section 4.15 ------------ 1. We'll use bsts_, short for "binary search tree with sentinel", as the prefix for these functions. First, we need node and tree structures: 645. = /* Node for binary search tree with sentinel. */ struct bsts_node { struct bsts_node *link[2]; int data; }; /* Binary search tree with sentinel. */ struct bsts_tree { struct bsts_node *root; struct bsts_node sentinel; struct libavl_allocator *alloc; }; This code is included in 649. Searching is simple: 646. = /* Returns nonzero only if item is in tree. 
*/
int
bsts_find (struct bsts_tree *tree, int item)
{
  const struct bsts_node *node;

  tree->sentinel.data = item;
  node = tree->root;
  while (item != node->data)
    if (item < node->data)
      node = node->link[0];
    else
      node = node->link[1];
  return node != &tree->sentinel;
}

See also 647.

This code is included in 649.

Insertion is just a little more complex, because we have to keep track of the link that we just came from (alternately, we could divide the function into multiple cases):

647. +=
/* Inserts item into tree, if it is not already present. */
void
bsts_insert (struct bsts_tree *tree, int item)
{
  struct bsts_node **q = &tree->root;
  struct bsts_node *p = tree->root;

  tree->sentinel.data = item;
  while (item != p->data)
    {
      int dir = item > p->data;
      q = &p->link[dir];
      p = p->link[dir];
    }

  if (p == &tree->sentinel)
    {
      *q = tree->alloc->libavl_malloc (tree->alloc, sizeof **q);
      if (*q == NULL)
        {
          fprintf (stderr, "out of memory\n");
          exit (EXIT_FAILURE);
        }
      (*q)->link[0] = (*q)->link[1] = &tree->sentinel;
      (*q)->data = item;
    }
}

Our test function will just insert a collection of integers, then make sure that all of them are in the resulting tree.  This is not as thorough as it could be, and it doesn't bother to free what it allocates, but it is good enough for now:

648. =
/* Tests BSTS functions.
   insert and delete must contain some permutation of values 0...n - 1. */
int
test_correctness (struct libavl_allocator *alloc, int *insert,
                  int *delete, int n, int verbosity)
{
  struct bsts_tree tree;
  int okay = 1;
  int i;

  tree.root = &tree.sentinel;
  tree.alloc = alloc;

  for (i = 0; i < n; i++)
    bsts_insert (&tree, insert[i]);

  for (i = 0; i < n; i++)
    if (!bsts_find (&tree, i))
      {
        printf ("%d should be in tree, but isn't\n", i);
        okay = 0;
      }

  return okay;
}

/* Not supported. */
int
test_overflow (struct libavl_allocator *alloc, int order[], int n,
               int verbosity)
{
  return 0;
}

This code is included in 649.

Function test() doesn't free allocated nodes, resulting in a memory leak.
You should fix this if you are concerned about it. Here's the whole program: 649. = #include #include #include #include "test.h" bsts 5> bsts 7> bsts 6> See also: [Bentley 2000], exercise 7 in chapter 13. Chapter 5 ========= Section 5.4 ----------- 1. In a BST, the time for an insertion or deletion is the time required to visit each node from the root down to the node of interest, plus some time to perform the operation itself. Functions bst_probe() and bst_delete() contain only a single loop each, which iterates once for each node examined. As the tree grows, the time for the actual operation loses significance and the total time for the operation becomes essentially proportional to the height of the tree, which is approximately log2 (n) in the best case (*note Analysis of AVL Balancing Rule::). We were given that the additional work for rebalancing an AVL or red-black tree is at most a constant amount multiplied by the height of the tree. Furthermore, the maximum height of an AVL tree is 1.44 times the maximum height for the corresponding perfectly balanced binary tree, and a red-black tree has a similar bound on its height. Therefore, for trees with many nodes, the worst-case time required to insert or delete an item in a balanced tree is a constant multiple of the time required for the same operation on an unbalanced BST in the best case. In the formal terms of computer science, insertion and deletion in a balanced tree are O(log n) operations, where n is the number of nodes in the tree. In practice, operations on balanced trees of reasonable size are, at worst, not much slower than operations on unbalanced binary trees and, at best, much faster. Section 5.4.2 ------------- 1. Variable y is only modified within . If y is set during the loop, it is set to p, which is always a non-null pointer within the loop. So y can only be NULL if it is last set before the loop begins. If that is true, it will be NULL only if tree->avl_root == NULL. 
So, variable y can only be NULL if the AVL tree was empty before the insertion. A NULL value for y is a special case because later code assumes that y points to a node. Section 5.4.3 ------------- 1. No. Suppose that n is the new node, that p is its parent, and that p has a - balance factor before n's insertion (a similar argument applies if p's balance factor is +). Then, for n's insertion to decrease p's balance factor to -2, n would have to be the left child of p. But if p had a - balance factor before the insertion, it already had a left child, so n cannot be the new left of p. This is a contradiction, so case 3 will never be applied to the parent of a newly inserted node. 2. <0> <0> <--> __..-' `._ _' `._ _' <0> <-> <-> <0> <-> _' \ _' _' _' \ _' <0> <0> <0> <0> <0> <0> <0> In the leftmost tree, case 2 applies to the root's left child and the root's balance factor does not change. In the middle tree, case 1 applies to the root's left child and case 2 applies to the root. In the rightmost tree, case 1 applies to the root's left child and case 3 applies to the root. The tree on the right requires rebalancing, and the others do not. 3. Type char may be signed or unsigned, depending on the C compiler and/or how the C compiler is run. Also, a common use for subscripting an array with a character type is to translate an arbitrary character to another character or a set of properties. For example, this is a common way to implement the standard C functions from ctype.h. This means that subscripting such an array with a char value can have different behavior when char changes between signed and unsigned with different compilers (or with the same compiler invoked with different options). See also: [ISO 1990], section 6.1.2.5; [Kernighan 1988], section A4.2. 4. Here is one possibility: 650. 
= for (p = y; p != n; p = p->avl_link[cache & 1], cache >>= 1) if ((cache & 1) == 0) p->avl_balance--; else p->avl_balance++; Also, replace the declarations of da[] and k by these: unsigned long cache = 0; /* Cached comparison results. */ int k = 0; /* Number of cached comparison results. */ and replace the second paragraph of code within the loop in step 1 by this: if (p->avl_balance != 0) z = q, y = p, cache = 0, k = 0; dir = cmp > 0; if (dir) cache |= 1ul << k; k++; It is interesting to note that the speed difference between this version and the standard version was found to be negligible, when compiled with full optimization under GCC (both 2.95.4 and 3.0.3) on x86. Section 5.4.4 ------------- 1. Because then y's right subtree would have height 1, so there's no way that y could have a +2 balance factor. 2. The value of y is set during the search for item to point to the closest node above the insertion point that has a nonzero balance factor, so any node below y along this search path, including x, must have had a 0 balance factor originally. All such nodes are updated to have a nonzero balance factor later, during step 3. So x must have either a - or + balance factor at the time of rebalancing. 3.1. | | y y | <--> <--> w __..-' _' <0> x => w => _' \ <+> <-> x y \ _' <0> <0> w x <0> <0> 3.2. | | y y | <--> <--> w ____...---' \ __..--' \ <0> x d w d _.-' `._ <+> h <--> h x y / `_ => _.-' \ => <0> <+> a w x c / \ _' \ h <-> <0> h-1 a b c d / \ / \ h h h-1 h b c a b h h-1 h h 3.3. | | y y | <--> <--> w ___...---' \ _.-' \ <0> x d w d __..-' `_ <+> h <-> h x y / `._ => __..-' \ => <-> <0> a w x c / \ / \ h <+> <-> h a b c d _' \ / \ h h-1 h h b c a b h-1 h h h-1 4. w should replace y as the left or right child of z. y != z->avl_link[0] has the value 1 if y is the right child of z, or 0 if y is the left child. So the overall expression replaces y with w as a child of z. 
The suggested substitution is a poor choice because if z == (struct avl_node *) &tree->root, z->avl_link[1] is undefined. 5. Yes. Section 5.5.2 ------------- 1. This approach cannot be used in libavl (see Exercise 4.8-3). 651. = struct avl_node *s; da[k] = 1; pa[k++] = p; for (;;) { da[k] = 0; pa[k++] = r; s = r->avl_link[0]; if (s->avl_link[0] == NULL) break; r = s; } p->avl_data = s->avl_data; r->avl_link[0] = s->avl_link[1]; p = s; 2. We could, if we use the standard libavl code for deletion case 3. The alternate version in Exercise 1 modifies item data, which would cause the wrong value to be returned later. Section 5.5.4 ------------- 1. Tree y started out with a + balance factor, meaning that its right subtree is taller than its left. So, even if y's left subtree had height 0, its right subtree has at least height 1, meaning that y must have at least one right child. 2. Rebalancing is required at each level if, at every level of the tree, the deletion causes a +2 or -2 balance factor at a node p while there is a +1 or -1 balance factor at p's child opposite the deletion. For example, consider the AVL tree below: 20 _____.....-----' `----....._____ 12 28 ___...--' `--...___ __.-' `-._ 7 17 25 31 _.' `-._ _.-' `_ _.-' `_ / \ 4 10 15 19 23 27 30 32 _' `_ / \ / \ / / \ / / 2 6 9 11 14 16 18 22 24 26 29 / \ / / / / 1 3 5 8 13 21 / 0 Deletion of node 32 in this tree leads to a -2 balance factor on the left side of node 31, causing a right rotation at node 31. This shortens the right subtree of node 28, causing it to have a -2 balance factor, leading to a right rotation there. This shortens the right subtree of node 20, causing it to have a -2 balance factor, forcing a right rotation there, too. Here is the final tree: 12 ___...--' `----....._____ 7 20 _.' 
`-._ __.-' `--...___ 4 10 17 25 _' `_ / \ _.-' `_ _.-' `-._ 2 6 9 11 15 19 23 28 / \ / / / \ / / \ / `_ 1 3 5 8 14 16 18 22 24 27 30 / / / / / 0 13 21 26 29

Incidentally, our original tree was an example of a "Fibonacci tree", a kind of binary tree whose form is defined recursively, as follows.  A Fibonacci tree of order 0 is an empty tree and a Fibonacci tree of order 1 is a single node.  A Fibonacci tree of order n >= 2 is a node whose left subtree is a Fibonacci tree of order n - 1 and whose right subtree is a Fibonacci tree of order n - 2.  Our example is a Fibonacci tree of order 7.  Any big-enough Fibonacci tree will exhibit this pathological behavior upon AVL deletion of its maximum node.

Section 5.6
-----------

1. At this point in the code, p points to the avl_data member of a struct avl_node.  We want a pointer to the struct avl_node itself.  To do this, we just subtract the offset of the avl_data member within the structure.  A cast to char * is necessary before the subtraction, because offsetof returns a count of bytes, and a cast to struct avl_node * afterward, to make the result the right type.

Chapter 6
=========

Section 6.1
-----------

1. It must be a "complete binary tree" of exactly pow (2, n) - 1 nodes.  If a red-black tree contains only red nodes, on the other hand, it cannot have more than one node, because of rule 1.

2. If a red-black tree's root is red, then we can transform it into an equivalent red-black tree with a black root simply by recoloring the root.  This cannot violate rule 1, because it does not introduce a red node.  It cannot violate rule 2 because it only affects the number of black nodes along paths that pass through the root, and it affects all of those paths equally, by increasing the number of black nodes along them by one.  If, on the other hand, a red-black tree has a black root, we cannot in general recolor it to red, because this causes a violation of rule 1 if the root has a red child.

3.
Yes and yes: _' \ _' \ __..-' \ __..-' \ _' \ _' \

Section 6.2
-----------

1. C has a number of different namespaces.  One of these is the namespace that contains struct, union, and enum tags.  Names of structure members are in a namespace separate from this tag namespace, so it is okay to give an enum and a structure member the same name.  On the other hand, it would be an error to give, e.g., a struct and an enum the same name.

Section 6.4.2
-------------

1. Inserting a red node can sometimes be done without breaking any rules.  Inserting a black node will always break rule 2.

Section 6.4.3
-------------

1. We can't have k == 1, because then the new node would be the root, and the root doesn't have a parent that could be red.  We don't need to rebalance k == 2, because the new node is a direct child of the root, and the root is always black.

2. Yes, it would, but if d has a red node as its root, case 1 will be selected instead.

Section 6.5.1
-------------

1. If p has no left child, that is, it is a leaf, then obviously we cannot swap colors.  Now consider only the case where p does have a non-null left child x.  Clearly, x must be red, because otherwise rule 2 would be violated at p.  This means that p must be black to avoid a rule 1 violation.  So the deletion will eliminate a black node, causing a rule 2 violation.  This is exactly the sort of problem that the rebalancing step is designed to deal with, so we can rebalance starting from node x.

2. There are two cases in this algorithm, which uses a new struct rb_node * variable named x.  Regardless of which one is chosen, x has the same meaning afterward: it is the node that replaced one of the children of the node at top of stack, and may be NULL if the node removed was a leaf.

Case 1: If one of p's child pointers is NULL, then p can be replaced by the other child, or by NULL if both children are NULL:

652.
=
if (p->rb_link[0] == NULL || p->rb_link[1] == NULL)
  {
    x = p->rb_link[0];
    if (x == NULL)
      x = p->rb_link[1];
  }

See also 653 and 654.

Case 2: If both of p's child pointers are non-null, then we find p's successor and replace p's data by the successor's data, then delete the successor instead:

653. +=
else
  {
    struct rb_node *y;

    pa[k] = p;
    da[k++] = 1;
    y = p->rb_link[1];
    while (y->rb_link[0] != NULL)
      {
        pa[k] = y;
        da[k++] = 0;
        y = y->rb_link[0];
      }
    x = y->rb_link[1];
    p->rb_data = y->rb_data;
    p = y;
  }

In either case, we need to update the node above the deleted node to point to x.

654. +=
pa[k - 1]->rb_link[da[k - 1]] = x;

See also: [Cormen 1990], section 14.4.

Chapter 7
=========

Section 7.2
-----------

1. An enumerated type is compatible with some C integer type, but the particular type is up to the C compiler.  Many C compilers will always pick int as the type of an enumeration type.  But we want to conserve space in the structure, so we specify unsigned char explicitly as the type.

See also: [ISO 1990], section 6.5.2.2; [ISO 1999], section 6.7.2.2.

Section 7.6
-----------

1. When we add a node to a formerly empty tree, this statement will set tree->tbst_root, thereby breaking the if statement's test.

Section 7.7
-----------

1. *Note Finding the Parent of a TBST Node::.  Function find_parent() is implemented in .

655. =
p = tree->tbst_root;
if (p == NULL)
  return NULL;

for (;;)
  {
    int cmp = tree->tbst_compare (item, p->tbst_data, tree->tbst_param);
    if (cmp == 0)
      break;
    p = p->tbst_link[cmp > 0];
  }

q = find_parent (tree, p);
dir = q->tbst_link[0] != p;

See also: [Knuth 1997], exercise 2.3.1-19.

2. Yes.  We can bind a pointer and a tag into a single structure, then use that structure for our links and for the root in the table structure.

/* A tagged link. */
struct tbst_link
  {
    struct tbst_node *tbst_ptr;  /* Child pointer or thread. */
    unsigned char tbst_tag;      /* Tag. */
  };

/* A threaded binary search tree node.
*/ struct tbst_node { struct tbst_link tbst_link[2]; /* Links. */ void *tbst_data; /* Pointer to data. */ }; /* Tree data structure. */ struct tbst_table { struct tbst_link tbst_root; /* Tree's root; tag is unused. */ tbst_comparison_func *tbst_compare; /* Comparison function. */ void *tbst_param; /* Extra argument to tbst_compare. */ struct libavl_allocator *tbst_alloc; /* Memory allocator. */ size_t tbst_count; /* Number of items in tree. */ }; The main disadvantage of this approach is in storage space: many machines have alignment restrictions for pointers, so the nonadjacent unsigned chars cause space to be wasted. Alternatively, we could keep the current arrangement of the node structure and change tbst_root in struct tbst_table from a pointer to an instance of struct tbst_node. 3. Much simpler than the implementation given before: 656. = struct tbst_node *s = r->tbst_link[0]; while (s->tbst_tag[0] == TBST_CHILD) { r = s; s = r->tbst_link[0]; } p->tbst_data = s->tbst_data; if (s->tbst_tag[1] == TBST_THREAD) { r->tbst_tag[0] = TBST_THREAD; r->tbst_link[0] = p; } else { q = r->tbst_link[0] = s->tbst_link[1]; while (q->tbst_tag[0] == TBST_CHILD) q = q->tbst_link[0]; q->tbst_link[0] = p; } p = s; This code is included in 658. 4. If all the possible deletions from a given TBST are considered, then no link will be followed more than once to update a left thread, and similarly for right threads. Averaged over all the possible deletions, this is a constant. For example, take the following TBST: 6 ____....----' `._ 3 7 ____....----' `--..___ _' \ 0 5 [6] [] / `--..___ _.-' \ [] 2 4 [6] _.-' \ _' \ 1 [3] [3] [5] _' \ [0] [2] Consider right threads that must be updated on deletion. Nodes 2, 3, 5, and 6 have right threads pointing to them. To update the right thread to node 2, we follow the link to node 1; to update node 3's, we move to 0, then 2; for node 5, we move to node 4; and for node 6, we move to 3, then 5. No link is followed more than once. 
Here's a summary table:

Node  Right Thread Follows  Left Thread Follows
 0:   (none)                2, 1
 1:   (none)                (none)
 2:   1                     (none)
 3:   0, 2                  5, 4
 4:   (none)                (none)
 5:   4                     (none)
 6:   3, 5                  7
 7:   (none)                (none)

The important point here is that no number appears twice within a column.

Section 7.9
-----------

1. Suppose a node has a right thread.  If the node has no left subtree, then the thread will be followed immediately when the node is reached.  If the node does have a left subtree, then the left subtree will be traversed, and when the traversal is finished the node's predecessor's right thread will be followed back to the node, then its right thread will be followed.  The node cannot be skipped, because all the nodes in its left subtree are less than it, so none of the right threads in its left subtree can skip beyond it.

2. The biggest potential for optimization probably comes from tbst_copy()'s habit of always keeping the TBST fully consistent as it builds it, which causes repeated assignments to link fields in order to keep threads correct at all times.  The unthreaded BST copy function bst_copy() waited to initialize fields until it was ready for them.  It may be possible, though difficult, to do this in tbst_copy() as well.  Inlining and specializing copy_node() is a cheaper potential speedup.

Chapter 8
=========

Section 8.1
-----------

1. No: the compiler may insert padding between or after structure members.  For example, today (2002) the most common desktop computers have 32-bit pointers and 8-bit chars.  On these systems, most compilers will pad out structures to a multiple of 32 bits.  Under these circumstances, struct tavl_node is no larger than struct avl_node, because (32 + 32 + 8) and (32 + 32 + 8 + 8 + 8) both round up to the same multiple of 32 bits, or 96 bits.

Section 8.2
-----------

1. We just have to special-case the possibility that subtree b is a thread.

/* Rotates right at *yp.
*/ static void rotate_right (struct tavl_node **yp) { struct tavl_node *y = *yp; struct tavl_node *x = y->tavl_link[0]; if (x->tavl_tag[1] == TAVL_THREAD) { x->tavl_tag[1] = TAVL_CHILD; y->tavl_tag[0] = TAVL_THREAD; y->tavl_link[0] = x; } else y->tavl_link[0] = x->tavl_link[1]; x->tavl_link[1] = y; *yp = x; } /* Rotates left at *xp. */ static void rotate_left (struct tavl_node **xp) { struct tavl_node *x = *xp; struct tavl_node *y = x->tavl_link[1]; if (y->tavl_tag[0] == TAVL_THREAD) { y->tavl_tag[0] = TAVL_CHILD; x->tavl_tag[1] = TAVL_THREAD; x->tavl_link[1] = y; } else x->tavl_link[1] = y->tavl_link[0]; y->tavl_link[0] = x; *xp = y; } Section 8.4.2 ------------- 1. Besides this change, the statement z->tavl_link[y != z->tavl_link[0]] = w; must be removed from , and copies added to the end of and . 657. = w = x->tavl_link[1]; rotate_left (&y->tavl_link[0]); rotate_right (&z->tavl_link[y != z->tavl_link[0]]); if (w->tavl_balance == -1) x->tavl_balance = 0, y->tavl_balance = +1; else if (w->tavl_balance == 0) x->tavl_balance = y->tavl_balance = 0; else /* w->tavl_balance == +1 */ x->tavl_balance = -1, y->tavl_balance = 0; w->tavl_balance = 0; Section 8.5.2 ------------- 1. We can just reuse the alternate implementation of case 4 for TBST deletion, following it by setting up q and dir as the rebalancing step expects them to be. 658. = tavl 656> q = r; dir = 0; Section 8.5.6 ------------- 1. Our argument here is similar to that in Exercise 7.7-4. Consider the links that are traversed to successfully find the parent of each node, besides the root, in the tree shown below. Do not include links followed on the side that does not lead to the node's parent. Because there are never more of these than on the successful side, they add only a constant time to the algorithm and can be ignored. 6 ____....----' `._ 3 7 ____....----' `--..___ _' \ 0 5 [6] [] / `--..___ _.-' \ [] 2 4 [6] _.-' \ _' \ 1 [3] [3] [5] _' \ [0] [2] The table below lists the links followed. 
The important point is that no link is listed twice.

     Node   Links Followed to Node's Parent
     0      0->2, 2->3
     1      1->2
     2      2->1, 1->0
     3      3->5, 5->6
     4      4->5
     5      5->4, 4->3
     6      (root)
     7      7->6

This generalizes to all TBSTs.  Because a TBST with n nodes contains only 2n links, this means we have an upper bound on finding the parent of every node in a TBST of at most 2n successful link traversals plus 2n unsuccessful link traversals.  Averaging 4n over n nodes, we get an upper bound of 4n/n == 4 link traversals, on average, to find the parent of a given node.

This upper bound applies only to the average case, not to the case of any individual node.  In particular, it does not say that the usage of the algorithm in tavl_delete() will exhibit average behavior.  In practice, however, the performance of this algorithm in tavl_delete() seems quite acceptable.  See Exercise 3 for an alternative with more certain behavior.

2. Instead of storing a null pointer in the left thread of the least node and the right thread of the greatest node, store a pointer to a node "above the root".  To make this work properly, tavl_root will have to become an actual node, not just a node pointer, because otherwise trying to find its right child would invoke undefined behavior.  Also, both of tavl_root's children would have to be the root node.  This is probably not worth it.  On the surface it seems like a good idea, but ugliness lurks beneath.

3. The necessary changes are pervasive, so the complete code for the modified function is presented below.  The search step is borrowed from TRB deletion, presented in the next chapter.

659. =
     void *
     tavl_delete (struct tavl_table *tree, const void *item)
     {
       /* Stack of nodes. */
       struct tavl_node *pa[TAVL_MAX_HEIGHT]; /* Nodes. */
       unsigned char da[TAVL_MAX_HEIGHT];     /* tavl_link[] indexes. */
       int k = 0;                             /* Stack pointer. */

       struct tavl_node *p; /* Traverses tree to find node to delete. */
       int cmp;             /* Result of comparison between item and p. */
       int dir;             /* Child of p to visit next.
     */

       assert (tree != NULL && item != NULL);

        tavl 350>

       return (void *) item;
     }

660. =
     if (p->tavl_tag[1] == TAVL_THREAD)
       {
         if (p->tavl_tag[0] == TAVL_CHILD)
           {
           }
         else
           {
           }
       }
     else
       {
         struct tavl_node *r = p->tavl_link[1];
         if (r->tavl_tag[0] == TAVL_THREAD)
           {
           }
         else
           {
           }
       }

     tree->tavl_count--;
     tree->tavl_alloc->libavl_free (tree->tavl_alloc, p);

This code is included in 659.

661. =
     struct tavl_node *r = p->tavl_link[0];
     while (r->tavl_tag[1] == TAVL_CHILD)
       r = r->tavl_link[1];
     r->tavl_link[1] = p->tavl_link[1];
     pa[k - 1]->tavl_link[da[k - 1]] = p->tavl_link[0];

This code is included in 660.

662. =
     pa[k - 1]->tavl_link[da[k - 1]] = p->tavl_link[da[k - 1]];
     if (pa[k - 1] != (struct tavl_node *) &tree->tavl_root)
       pa[k - 1]->tavl_tag[da[k - 1]] = TAVL_THREAD;

This code is included in 660.

663. =
     r->tavl_link[0] = p->tavl_link[0];
     r->tavl_tag[0] = p->tavl_tag[0];
     r->tavl_balance = p->tavl_balance;
     if (r->tavl_tag[0] == TAVL_CHILD)
       {
         struct tavl_node *x = r->tavl_link[0];
         while (x->tavl_tag[1] == TAVL_CHILD)
           x = x->tavl_link[1];
         x->tavl_link[1] = r;
       }
     pa[k - 1]->tavl_link[da[k - 1]] = r;
     da[k] = 1;
     pa[k++] = r;

This code is included in 660.

664. =
     struct tavl_node *s;
     int j = k++;

     for (;;)
       {
         da[k] = 0;
         pa[k++] = r;
         s = r->tavl_link[0];
         if (s->tavl_tag[0] == TAVL_THREAD)
           break;
         r = s;
       }

     da[j] = 1;
     pa[j] = pa[j - 1]->tavl_link[da[j - 1]] = s;
     if (s->tavl_tag[1] == TAVL_CHILD)
       r->tavl_link[0] = s->tavl_link[1];
     else
       {
         r->tavl_link[0] = s;
         r->tavl_tag[0] = TAVL_THREAD;
       }

     s->tavl_balance = p->tavl_balance;

     s->tavl_link[0] = p->tavl_link[0];
     if (p->tavl_tag[0] == TAVL_CHILD)
       {
         struct tavl_node *x = p->tavl_link[0];
         while (x->tavl_tag[1] == TAVL_CHILD)
           x = x->tavl_link[1];
         x->tavl_link[1] = s;
         s->tavl_tag[0] = TAVL_CHILD;
       }

     s->tavl_link[1] = p->tavl_link[1];
     s->tavl_tag[1] = TAVL_CHILD;

This code is included in 660.

665.
=
     assert (k > 0);
     while (--k > 0)
       {
         struct tavl_node *y = pa[k];

         if (da[k] == 0)
           {
             y->tavl_balance++;
             if (y->tavl_balance == +1)
               break;
             else if (y->tavl_balance == +2)
               {
               }
           }
         else
           {
           }
       }

This code is included in 659.

666. =
     struct tavl_node *x = y->tavl_link[1];
     assert (x != NULL);
     if (x->tavl_balance == -1)
       {
         struct tavl_node *w;

         pa[k - 1]->tavl_link[da[k - 1]] = w;
       }
     else if (x->tavl_balance == 0)
       {
         y->tavl_link[1] = x->tavl_link[0];
         x->tavl_link[0] = y;
         x->tavl_balance = -1;
         y->tavl_balance = +1;
         pa[k - 1]->tavl_link[da[k - 1]] = x;
         break;
       }
     else /* x->tavl_balance == +1 */
       {
         if (x->tavl_tag[0] == TAVL_CHILD)
           y->tavl_link[1] = x->tavl_link[0];
         else
           {
             y->tavl_tag[1] = TAVL_THREAD;
             x->tavl_tag[0] = TAVL_CHILD;
           }
         x->tavl_link[0] = y;
         x->tavl_balance = y->tavl_balance = 0;
         pa[k - 1]->tavl_link[da[k - 1]] = x;
       }

This code is included in 665.

667. =
     y->tavl_balance--;
     if (y->tavl_balance == -1)
       break;
     else if (y->tavl_balance == -2)
       {
         struct tavl_node *x = y->tavl_link[0];
         assert (x != NULL);
         if (x->tavl_balance == +1)
           {
             struct tavl_node *w;

             pa[k - 1]->tavl_link[da[k - 1]] = w;
           }
         else if (x->tavl_balance == 0)
           {
             y->tavl_link[0] = x->tavl_link[1];
             x->tavl_link[1] = y;
             x->tavl_balance = +1;
             y->tavl_balance = -1;
             pa[k - 1]->tavl_link[da[k - 1]] = x;
             break;
           }
         else /* x->tavl_balance == -1 */
           {
             if (x->tavl_tag[1] == TAVL_CHILD)
               y->tavl_link[0] = x->tavl_link[1];
             else
               {
                 y->tavl_tag[0] = TAVL_THREAD;
                 x->tavl_tag[1] = TAVL_CHILD;
               }
             x->tavl_link[1] = y;
             x->tavl_balance = y->tavl_balance = 0;
             pa[k - 1]->tavl_link[da[k - 1]] = x;
           }
       }

This code is included in 665.

Chapter 9
=========

Section 9.3.3
-------------

1. For a brief explanation of an algorithm similar to the one here, see *Note Inserting into a PRB Tree::.

668. =
      trb 327>
     void **
     trb_probe (struct trb_table *tree, void *item)
     {
       struct trb_node *p; /* Traverses tree looking for insertion point. */
       struct trb_node *n; /* Newly inserted node. */
       int dir;            /* Side of p on which n is inserted.
     */

       assert (tree != NULL && item != NULL);

        trb 255>

       p = n;
       for (;;)
         {
           struct trb_node *f, *g;

           f = find_parent (tree, p);
           if (f == (struct trb_node *) &tree->trb_root
               || f->trb_color == TRB_BLACK)
             break;

           g = find_parent (tree, f);
           if (g == (struct trb_node *) &tree->trb_root)
             break;

           if (g->trb_link[0] == f)
             {
               struct trb_node *y = g->trb_link[1];
               if (g->trb_tag[1] == TRB_CHILD && y->trb_color == TRB_RED)
                 {
                   f->trb_color = y->trb_color = TRB_BLACK;
                   g->trb_color = TRB_RED;
                   p = g;
                 }
               else
                 {
                   struct trb_node *c, *x;

                   if (f->trb_link[0] == p)
                     y = f;
                   else
                     {
                       x = f;
                       y = x->trb_link[1];
                       x->trb_link[1] = y->trb_link[0];
                       y->trb_link[0] = x;
                       g->trb_link[0] = y;

                       if (y->trb_tag[0] == TRB_THREAD)
                         {
                           y->trb_tag[0] = TRB_CHILD;
                           x->trb_tag[1] = TRB_THREAD;
                           x->trb_link[1] = y;
                         }
                     }

                   c = find_parent (tree, g);
                   c->trb_link[c->trb_link[0] != g] = y;

                   x = g;
                   x->trb_color = TRB_RED;
                   y->trb_color = TRB_BLACK;

                   x->trb_link[0] = y->trb_link[1];
                   y->trb_link[1] = x;

                   if (y->trb_tag[1] == TRB_THREAD)
                     {
                       y->trb_tag[1] = TRB_CHILD;
                       x->trb_tag[0] = TRB_THREAD;
                       x->trb_link[0] = y;
                     }
                   break;
                 }
             }
           else
             {
               struct trb_node *y = g->trb_link[0];
               if (g->trb_tag[0] == TRB_CHILD && y->trb_color == TRB_RED)
                 {
                   f->trb_color = y->trb_color = TRB_BLACK;
                   g->trb_color = TRB_RED;
                   p = g;
                 }
               else
                 {
                   struct trb_node *c, *x;

                   if (f->trb_link[1] == p)
                     y = f;
                   else
                     {
                       x = f;
                       y = x->trb_link[0];
                       x->trb_link[0] = y->trb_link[1];
                       y->trb_link[1] = x;
                       g->trb_link[1] = y;

                       if (y->trb_tag[1] == TRB_THREAD)
                         {
                           y->trb_tag[1] = TRB_CHILD;
                           x->trb_tag[0] = TRB_THREAD;
                           x->trb_link[0] = y;
                         }
                     }

                   c = find_parent (tree, g);
                   c->trb_link[c->trb_link[0] != g] = y;

                   x = g;
                   x->trb_color = TRB_RED;
                   y->trb_color = TRB_BLACK;

                   x->trb_link[1] = y->trb_link[0];
                   y->trb_link[0] = x;

                   if (y->trb_tag[0] == TRB_THREAD)
                     {
                       y->trb_tag[0] = TRB_CHILD;
                       x->trb_tag[1] = TRB_THREAD;
                       x->trb_link[1] = y;
                     }
                   break;
                 }
             }
         }
       tree->trb_root->trb_color = TRB_BLACK;

       return &n->trb_data;
     }

Section 9.4.2
-------------

1.

669.
=
     struct trb_node *s;

     da[k] = 1;
     pa[k++] = p;
     for (;;)
       {
         da[k] = 0;
         pa[k++] = r;
         s = r->trb_link[0];
         if (s->trb_tag[0] == TRB_THREAD)
           break;
         r = s;
       }

     p->trb_data = s->trb_data;

     if (s->trb_tag[1] == TRB_THREAD)
       {
         r->trb_tag[0] = TRB_THREAD;
         r->trb_link[0] = p;
       }
     else
       {
         struct trb_node *t = r->trb_link[0] = s->trb_link[1];
         while (t->trb_tag[0] == TRB_CHILD)
           t = t->trb_link[0];
         t->trb_link[0] = p;
       }

     p = s;

Section 9.4.5
-------------

1. The code used in the rebalancing loop is related to .  Variable x is initialized by step 2 here, though, because otherwise the pseudo-root node would be required to have a trb_tag[] member.

670. =
      trb 327>
     void *
     trb_delete (struct trb_table *tree, const void *item)
     {
       struct trb_node *p; /* Node to delete. */
       struct trb_node *q; /* Parent of p. */
       struct trb_node *x; /* Node we might want to recolor red (maybe NULL). */
       struct trb_node *f; /* Parent of x. */
       struct trb_node *g; /* Parent of f. */
       int dir, cmp;

       assert (tree != NULL && item != NULL);

        trb 312>

       if (p->trb_tag[1] == TRB_THREAD)
         {
           if (p->trb_tag[0] == TRB_CHILD)
             {
               struct trb_node *t = p->trb_link[0];
               while (t->trb_tag[1] == TRB_CHILD)
                 t = t->trb_link[1];
               t->trb_link[1] = p->trb_link[1];
               x = q->trb_link[dir] = p->trb_link[0];
             }
           else
             {
               q->trb_link[dir] = p->trb_link[dir];
               if (q != (struct trb_node *) &tree->trb_root)
                 q->trb_tag[dir] = TRB_THREAD;
               x = NULL;
             }
           f = q;
         }
       else
         {
           enum trb_color t;
           struct trb_node *r = p->trb_link[1];

           if (r->trb_tag[0] == TRB_THREAD)
             {
               r->trb_link[0] = p->trb_link[0];
               r->trb_tag[0] = p->trb_tag[0];
               if (r->trb_tag[0] == TRB_CHILD)
                 {
                   struct trb_node *t = r->trb_link[0];
                   while (t->trb_tag[1] == TRB_CHILD)
                     t = t->trb_link[1];
                   t->trb_link[1] = r;
                 }
               q->trb_link[dir] = r;
               x = r->trb_tag[1] == TRB_CHILD ?
                 r->trb_link[1] : NULL;
               t = r->trb_color;
               r->trb_color = p->trb_color;
               p->trb_color = t;
               f = r;
               dir = 1;
             }
           else
             {
               struct trb_node *s;

               for (;;)
                 {
                   s = r->trb_link[0];
                   if (s->trb_tag[0] == TRB_THREAD)
                     break;
                   r = s;
                 }

               if (s->trb_tag[1] == TRB_CHILD)
                 x = r->trb_link[0] = s->trb_link[1];
               else
                 {
                   r->trb_link[0] = s;
                   r->trb_tag[0] = TRB_THREAD;
                   x = NULL;
                 }

               s->trb_link[0] = p->trb_link[0];
               if (p->trb_tag[0] == TRB_CHILD)
                 {
                   struct trb_node *t = p->trb_link[0];
                   while (t->trb_tag[1] == TRB_CHILD)
                     t = t->trb_link[1];
                   t->trb_link[1] = s;
                   s->trb_tag[0] = TRB_CHILD;
                 }

               s->trb_link[1] = p->trb_link[1];
               s->trb_tag[1] = TRB_CHILD;

               t = s->trb_color;
               s->trb_color = p->trb_color;
               p->trb_color = t;

               q->trb_link[dir] = s;
               f = r;
               dir = 0;
             }
         }

       if (p->trb_color == TRB_BLACK)
         {
           for (;;)
             {
               if (x != NULL && x->trb_color == TRB_RED)
                 {
                   x->trb_color = TRB_BLACK;
                   break;
                 }
               if (f == (struct trb_node *) &tree->trb_root)
                 break;

               g = find_parent (tree, f);

               if (dir == 0)
                 {
                   struct trb_node *w = f->trb_link[1];

                   if (w->trb_color == TRB_RED)
                     {
                       w->trb_color = TRB_BLACK;
                       f->trb_color = TRB_RED;
                       f->trb_link[1] = w->trb_link[0];
                       w->trb_link[0] = f;
                       g->trb_link[g->trb_link[0] != f] = w;
                       g = w;
                       w = f->trb_link[1];
                     }

                   if ((w->trb_tag[0] == TRB_THREAD
                        || w->trb_link[0]->trb_color == TRB_BLACK)
                       && (w->trb_tag[1] == TRB_THREAD
                           || w->trb_link[1]->trb_color == TRB_BLACK))
                     w->trb_color = TRB_RED;
                   else
                     {
                       if (w->trb_tag[1] == TRB_THREAD
                           || w->trb_link[1]->trb_color == TRB_BLACK)
                         {
                           struct trb_node *y = w->trb_link[0];
                           y->trb_color = TRB_BLACK;
                           w->trb_color = TRB_RED;
                           w->trb_link[0] = y->trb_link[1];
                           y->trb_link[1] = w;
                           w = f->trb_link[1] = y;

                           if (w->trb_tag[1] == TRB_THREAD)
                             {
                               w->trb_tag[1] = TRB_CHILD;
                               w->trb_link[1]->trb_tag[0] = TRB_THREAD;
                               w->trb_link[1]->trb_link[0] = w;
                             }
                         }

                       w->trb_color = f->trb_color;
                       f->trb_color = TRB_BLACK;
                       w->trb_link[1]->trb_color = TRB_BLACK;

                       f->trb_link[1] = w->trb_link[0];
                       w->trb_link[0] = f;
                       g->trb_link[g->trb_link[0] != f] = w;

                       if (w->trb_tag[0] == TRB_THREAD)
                         {
                           w->trb_tag[0] = TRB_CHILD;
                           f->trb_tag[1] = TRB_THREAD;
                           f->trb_link[1] = w;
                         }
                       break;
                     }
                 }
               else
                 {
                   struct trb_node *w = f->trb_link[0];

                   if (w->trb_color == TRB_RED)
                     {
                       w->trb_color = TRB_BLACK;
                       f->trb_color = TRB_RED;
                       f->trb_link[0] = w->trb_link[1];
                       w->trb_link[1] = f;
                       g->trb_link[g->trb_link[0] != f] = w;
                       g = w;
                       w = f->trb_link[0];
                     }

                   if ((w->trb_tag[0] == TRB_THREAD
                        || w->trb_link[0]->trb_color == TRB_BLACK)
                       && (w->trb_tag[1] == TRB_THREAD
                           || w->trb_link[1]->trb_color == TRB_BLACK))
                     w->trb_color = TRB_RED;
                   else
                     {
                       if (w->trb_tag[0] == TRB_THREAD
                           || w->trb_link[0]->trb_color == TRB_BLACK)
                         {
                           struct trb_node *y = w->trb_link[1];
                           y->trb_color = TRB_BLACK;
                           w->trb_color = TRB_RED;
                           w->trb_link[1] = y->trb_link[0];
                           y->trb_link[0] = w;
                           w = f->trb_link[0] = y;

                           if (w->trb_tag[0] == TRB_THREAD)
                             {
                               w->trb_tag[0] = TRB_CHILD;
                               w->trb_link[0]->trb_tag[1] = TRB_THREAD;
                               w->trb_link[0]->trb_link[1] = w;
                             }
                         }

                       w->trb_color = f->trb_color;
                       f->trb_color = TRB_BLACK;
                       w->trb_link[0]->trb_color = TRB_BLACK;

                       f->trb_link[0] = w->trb_link[1];
                       w->trb_link[1] = f;
                       g->trb_link[g->trb_link[0] != f] = w;

                       if (w->trb_tag[1] == TRB_THREAD)
                         {
                           w->trb_tag[1] = TRB_CHILD;
                           f->trb_tag[0] = TRB_THREAD;
                           f->trb_link[0] = w;
                         }
                       break;
                     }
                 }

               x = f;
               f = find_parent (tree, x);
               if (f == (struct trb_node *) &tree->trb_root)
                 break;
               dir = f->trb_link[0] != x;
             }
         }

       tree->trb_alloc->libavl_free (tree->trb_alloc, p);
       tree->trb_count--;
       return (void *) item;
     }

Chapter 10
==========

1. If we already have right-threaded trees, then we can get the benefits of a left-threaded tree just by reversing the sense of the comparison function, so there is no additional benefit to left-threaded trees.

Section 10.5.1
--------------

1.

671. =
     struct rtbst_node *s = r->rtbst_link[0];
     while (s->rtbst_link[0] != NULL)
       {
         r = s;
         s = r->rtbst_link[0];
       }
     p->rtbst_data = s->rtbst_data;
     if (s->rtbst_rtag == RTBST_THREAD)
       r->rtbst_link[0] = NULL;
     else
       r->rtbst_link[0] = s->rtbst_link[1];
     p = s;

Section 10.5.2
--------------

1.
This alternate version is not really an improvement: it runs up against the same problem as right-looking deletion, so it sometimes needs to search for a predecessor.

672. =
     struct rtbst_node *s = r->rtbst_link[1];
     while (s->rtbst_rtag == RTBST_CHILD)
       {
         r = s;
         s = r->rtbst_link[1];
       }
     p->rtbst_data = s->rtbst_data;

     if (s->rtbst_link[0] != NULL)
       {
         struct rtbst_node *t = s->rtbst_link[0];
         while (t->rtbst_rtag == RTBST_CHILD)
           t = t->rtbst_link[1];
         t->rtbst_link[1] = p;
         r->rtbst_link[1] = s->rtbst_link[0];
       }
     else
       {
         r->rtbst_link[1] = p;
         r->rtbst_rtag = RTBST_THREAD;
       }

     p = s;

Chapter 11
==========

Section 11.3
------------

1.

     /* Rotates right at *yp. */
     static void
     rotate_right (struct rtbst_node **yp)
     {
       struct rtbst_node *y = *yp;
       struct rtbst_node *x = y->rtbst_link[0];

       if (x->rtbst_rtag == RTBST_THREAD)
         {
           x->rtbst_rtag = RTBST_CHILD;
           y->rtbst_link[0] = NULL;
         }
       else
         y->rtbst_link[0] = x->rtbst_link[1];
       x->rtbst_link[1] = y;
       *yp = x;
     }

     /* Rotates left at *xp. */
     static void
     rotate_left (struct rtbst_node **xp)
     {
       struct rtbst_node *x = *xp;
       struct rtbst_node *y = x->rtbst_link[1];

       if (y->rtbst_link[0] == NULL)
         {
           x->rtbst_rtag = RTBST_THREAD;
           x->rtbst_link[1] = y;
         }
       else
         x->rtbst_link[1] = y->rtbst_link[0];
       y->rtbst_link[0] = x;
       *xp = y;
     }

Section 11.5.4
--------------

1. There is no general efficient algorithm to find the parent of a node in an RTAVL tree.  The lack of left threads means that half the time we must do a full search from the top of the tree.  This would increase the execution time for deletion unacceptably.

2.

673. =
     if (p->rtavl_rtag == RTAVL_THREAD)
       {
         if (p->rtavl_link[0] != NULL)
           {
           }
         else
           {
           }
       }
     else
       {
         struct rtavl_node *r = p->rtavl_link[1];
         if (r->rtavl_link[0] == NULL)
           {
           }
         else
           {
           }
       }

     tree->rtavl_alloc->libavl_free (tree->rtavl_alloc, p);

674.
=
     struct rtavl_node *t = p->rtavl_link[0];
     while (t->rtavl_rtag == RTAVL_CHILD)
       t = t->rtavl_link[1];
     t->rtavl_link[1] = p->rtavl_link[1];
     pa[k - 1]->rtavl_link[da[k - 1]] = p->rtavl_link[0];

This code is included in 673.

675. =
     pa[k - 1]->rtavl_link[da[k - 1]] = p->rtavl_link[da[k - 1]];
     if (da[k - 1] == 1)
       pa[k - 1]->rtavl_rtag = RTAVL_THREAD;

This code is included in 673.

676. =
     r->rtavl_link[0] = p->rtavl_link[0];
     if (r->rtavl_link[0] != NULL)
       {
         struct rtavl_node *t = r->rtavl_link[0];
         while (t->rtavl_rtag == RTAVL_CHILD)
           t = t->rtavl_link[1];
         t->rtavl_link[1] = r;
       }
     pa[k - 1]->rtavl_link[da[k - 1]] = r;
     r->rtavl_balance = p->rtavl_balance;
     da[k] = 1;
     pa[k++] = r;

This code is included in 673.

677. =
     struct rtavl_node *s;
     int j = k++;

     for (;;)
       {
         da[k] = 0;
         pa[k++] = r;
         s = r->rtavl_link[0];
         if (s->rtavl_link[0] == NULL)
           break;
         r = s;
       }

     da[j] = 1;
     pa[j] = pa[j - 1]->rtavl_link[da[j - 1]] = s;

     if (s->rtavl_rtag == RTAVL_CHILD)
       r->rtavl_link[0] = s->rtavl_link[1];
     else
       r->rtavl_link[0] = NULL;

     if (p->rtavl_link[0] != NULL)
       {
         struct rtavl_node *t = p->rtavl_link[0];
         while (t->rtavl_rtag == RTAVL_CHILD)
           t = t->rtavl_link[1];
         t->rtavl_link[1] = s;
       }

     s->rtavl_link[0] = p->rtavl_link[0];
     s->rtavl_link[1] = p->rtavl_link[1];
     s->rtavl_rtag = RTAVL_CHILD;
     s->rtavl_balance = p->rtavl_balance;

This code is included in 673.

3.

678. =
     struct rtavl_node *s;

     da[k] = 0;
     pa[k++] = p;
     for (;;)
       {
         da[k] = 1;
         pa[k++] = r;
         s = r->rtavl_link[1];
         if (s->rtavl_rtag == RTAVL_THREAD)
           break;
         r = s;
       }

     if (s->rtavl_link[0] != NULL)
       {
         struct rtavl_node *t = s->rtavl_link[0];
         while (t->rtavl_rtag == RTAVL_CHILD)
           t = t->rtavl_link[1];
         t->rtavl_link[1] = p;
       }

     p->rtavl_data = s->rtavl_data;

     if (s->rtavl_link[0] != NULL)
       r->rtavl_link[1] = s->rtavl_link[0];
     else
       {
         r->rtavl_rtag = RTAVL_THREAD;
         r->rtavl_link[1] = p;
       }

     p = s;

Chapter 13
==========

Section 13.4
------------

1. No.  It would work, except for the important special case where q is the pseudo-root but p->pbst_parent is NULL.
Section 13.7
------------

1.

679. =
      pbst 89>
     void
     pbst_balance (struct pbst_table *tree)
     {
       assert (tree != NULL);

       tree_to_vine (tree);
       vine_to_tree (tree);
     }

680. =
     static void
     vine_to_tree (struct pbst_table *tree)
     {
       unsigned long vine;      /* Number of nodes in main vine. */
       unsigned long leaves;    /* Nodes in incomplete bottom level, if any. */
       int height;              /* Height of produced balanced tree. */
       struct pbst_node *p, *q; /* Current visited node and its parent. */

        pbst 91>
        pbst 92>
        pbst 93>
     }

This code is included in 679.

681. =
     for (q = NULL, p = tree->pbst_root; p != NULL;
          q = p, p = p->pbst_link[0])
       p->pbst_parent = q;

This code is included in 680.

682. =
     static void
     compress (struct pbst_node *root, unsigned long count)
     {
       assert (root != NULL);

       while (count--)
         {
           struct pbst_node *red = root->pbst_link[0];
           struct pbst_node *black = red->pbst_link[0];

           root->pbst_link[0] = black;
           red->pbst_link[0] = black->pbst_link[1];
           black->pbst_link[1] = red;
           red->pbst_parent = black;
           if (red->pbst_link[0] != NULL)
             red->pbst_link[0]->pbst_parent = red;
           root = black;
         }
     }

This code is included in 680.

Chapter 14
==========

Section 14.2
------------

1.

     /* Rotates right at *yp. */
     static void
     rotate_right (struct pbst_node **yp)
     {
       struct pbst_node *y = *yp;
       struct pbst_node *x = y->pbst_link[0];

       y->pbst_link[0] = x->pbst_link[1];
       x->pbst_link[1] = y;
       *yp = x;

       x->pbst_parent = y->pbst_parent;
       y->pbst_parent = x;
       if (y->pbst_link[0] != NULL)
         y->pbst_link[0]->pbst_parent = y;
     }

     /* Rotates left at *xp. */
     static void
     rotate_left (struct pbst_node **xp)
     {
       struct pbst_node *x = *xp;
       struct pbst_node *y = x->pbst_link[1];

       x->pbst_link[1] = y->pbst_link[0];
       y->pbst_link[0] = x;
       *xp = y;

       y->pbst_parent = x->pbst_parent;
       x->pbst_parent = y;
       if (x->pbst_link[1] != NULL)
         x->pbst_link[1]->pbst_parent = x;
     }

Section 14.4.2
--------------

1. Yes.  Both code segments update the nodes along the direct path from y down to n, including node y but not node n.
The plain AVL code excluded node n by updating nodes as it moved down to them and making arrival at node n the loop's termination condition.  The PAVL code excludes node n by starting at it but updating the parent of each visited node instead of the node itself.

There still could be a problem at the edge case where no nodes' balance factors were to be updated, but there is no such case.  There is always at least one balance factor to update, because every inserted node has a parent whose balance factor is affected by its insertion.  The one exception would be the first node inserted into an empty tree, but that was already handled as a special case.

2. Sure.  There is no parallel to Exercise 5.4.4-4 because q is never the pseudo-root.

Appendix E Catalogue of Algorithms
**********************************

This appendix lists all of the algorithms described and implemented in this book, along with page number references.  Each algorithm is listed under the least-specific type of tree to which it applies, which is not always the same as the place where it is introduced.  For instance, rotations on threaded trees can be used in any threaded tree, so they appear under "Threaded Binary Search Tree Algorithms", despite their formal introduction later within the threaded AVL tree chapter.

Sometimes multiple algorithms for accomplishing the same task are listed.  In this case, the different algorithms are qualified by a few descriptive words.  For the algorithm used in libavl, the description is enclosed by parentheses, and the description of each alternative algorithm is set off by a comma.
Binary Search Tree Algorithms ============================= Advancing a traverser: *Note catalogue-entry-bst-17:: Backing up a traverser: *Note catalogue-entry-bst-18:: Balancing: *Note catalogue-entry-bst-27:: Copying (iterative; robust): *Note catalogue-entry-bst-23:: Copying, iterative: *Note catalogue-entry-bst-22:: Copying, recursive: *Note catalogue-entry-bst-21:: Copying, recursive; robust, version 1: *Note catalogue-entry-bst-46:: Copying, recursive; robust, version 2: *Note catalogue-entry-bst-47:: Copying, recursive; robust, version 3: *Note catalogue-entry-bst-48:: Creation: *Note catalogue-entry-bst-0:: Deletion (iterative): *Note catalogue-entry-bst-4:: Deletion, by merging: *Note catalogue-entry-bst-5:: Deletion, special case for no left child: *Note catalogue-entry-bst-40:: Deletion, with data modification: *Note catalogue-entry-bst-41:: Destruction (by rotation): *Note catalogue-entry-bst-24:: Destruction, iterative: *Note catalogue-entry-bst-26:: Destruction, recursive: *Note catalogue-entry-bst-25:: Getting the current item in a traverser: *Note catalogue-entry-bst-19:: Initialization of traverser as copy: *Note catalogue-entry-bst-15:: Initialization of traverser to found item: *Note catalogue-entry-bst-13:: Initialization of traverser to greatest item: *Note catalogue-entry-bst-12:: Initialization of traverser to inserted item: *Note catalogue-entry-bst-14:: Initialization of traverser to least item: *Note catalogue-entry-bst-11:: Initialization of traverser to null item: *Note catalogue-entry-bst-10:: Insertion (iterative): *Note catalogue-entry-bst-2:: Insertion, as root: *Note catalogue-entry-bst-3:: Insertion, as root, of existing node in arbitrary subtree: *Note catalogue-entry-bst-38:: Insertion, as root, of existing node in arbitrary subtree, robustly: *Note catalogue-entry-bst-39:: Insertion, using pointer to pointer: *Note catalogue-entry-bst-36:: Join, iterative: *Note catalogue-entry-bst-49:: Join, recursive: *Note 
catalogue-entry-bst-31:: Refreshing of a traverser (general): *Note catalogue-entry-bst-9:: Refreshing of a traverser, optimized: *Note catalogue-entry-bst-45:: Replacing the current item in a traverser: *Note catalogue-entry-bst-20:: Rotation, left: *Note catalogue-entry-bst-35:: Rotation, left double: *Note catalogue-entry-bst-32:: Rotation, right: *Note catalogue-entry-bst-34:: Rotation, right double: *Note catalogue-entry-bst-33:: Search: *Note catalogue-entry-bst-1:: Traversal (iterative; convenient, reliable): *Note catalogue-entry-bst-16:: Traversal, iterative: *Note catalogue-entry-bst-7:: Traversal, iterative; convenient: *Note catalogue-entry-bst-8:: Traversal, iterative; convenient, reliable: *Note catalogue-entry-bst-44:: Traversal, iterative; with dynamic stack: *Note catalogue-entry-bst-43:: Traversal, level order: *Note catalogue-entry-bst-37:: Traversal, recursive: *Note catalogue-entry-bst-6:: Traversal, recursive; with nested function: *Note catalogue-entry-bst-42:: Vine compression: *Note catalogue-entry-bst-30:: Vine from tree: *Note catalogue-entry-bst-28:: Vine to balanced tree: *Note catalogue-entry-bst-29:: AVL Tree Algorithms =================== Advancing a traverser: *Note catalogue-entry-avl-7:: Backing up a traverser: *Note catalogue-entry-avl-8:: Copying (iterative): *Note catalogue-entry-avl-9:: Deletion (iterative): *Note catalogue-entry-avl-2:: Deletion, with data modification: *Note catalogue-entry-avl-11:: Initialization of traverser to found item: *Note catalogue-entry-avl-6:: Initialization of traverser to greatest item: *Note catalogue-entry-avl-5:: Initialization of traverser to inserted item: *Note catalogue-entry-avl-3:: Initialization of traverser to least item: *Note catalogue-entry-avl-4:: Insertion (iterative): *Note catalogue-entry-avl-0:: Insertion, recursive: *Note catalogue-entry-avl-1:: Insertion, with bitmask: *Note catalogue-entry-avl-10:: Red-Black Tree Algorithms ========================= Deletion (iterative): 
*Note catalogue-entry-rb-2:: Deletion, with data modification: *Note catalogue-entry-rb-3:: Insertion (iterative): *Note catalogue-entry-rb-0:: Insertion, initial black: *Note catalogue-entry-rb-1:: Threaded Binary Search Tree Algorithms ====================================== Advancing a traverser: *Note catalogue-entry-tbst-10:: Backing up a traverser: *Note catalogue-entry-tbst-11:: Balancing: *Note catalogue-entry-tbst-15:: Copying: *Note catalogue-entry-tbst-13:: Copying a node: *Note catalogue-entry-tbst-12:: Creation: *Note catalogue-entry-tbst-0:: Deletion (parent tracking): *Note catalogue-entry-tbst-3:: Deletion, with data modification: *Note catalogue-entry-tbst-21:: Deletion, with parent node algorithm: *Note catalogue-entry-tbst-20:: Destruction: *Note catalogue-entry-tbst-14:: Initialization of traverser as copy: *Note catalogue-entry-tbst-9:: Initialization of traverser to found item: *Note catalogue-entry-tbst-7:: Initialization of traverser to greatest item: *Note catalogue-entry-tbst-6:: Initialization of traverser to inserted item: *Note catalogue-entry-tbst-8:: Initialization of traverser to least item: *Note catalogue-entry-tbst-5:: Initialization of traverser to null item: *Note catalogue-entry-tbst-4:: Insertion: *Note catalogue-entry-tbst-2:: Parent of a node: *Note catalogue-entry-tbst-19:: Rotation, left: *Note catalogue-entry-tbst-23:: Rotation, right: *Note catalogue-entry-tbst-22:: Search: *Note catalogue-entry-tbst-1:: Vine compression: *Note catalogue-entry-tbst-18:: Vine from tree: *Note catalogue-entry-tbst-16:: Vine to balanced tree: *Note catalogue-entry-tbst-17:: Threaded AVL Tree Algorithms ============================ Copying a node: *Note catalogue-entry-tavl-4:: Deletion (without stack): *Note catalogue-entry-tavl-3:: Deletion, with data modification: *Note catalogue-entry-tavl-6:: Deletion, with stack: *Note catalogue-entry-tavl-7:: Insertion: *Note catalogue-entry-tavl-0:: Rotation, left double, version 1: *Note 
catalogue-entry-tavl-1:: Rotation, left double, version 2: *Note catalogue-entry-tavl-5:: Rotation, right double: *Note catalogue-entry-tavl-2:: Threaded Red-Black Tree Algorithms ================================== Deletion (with stack): *Note catalogue-entry-trb-1:: Deletion, with data modification: *Note catalogue-entry-trb-3:: Deletion, without stack: *Note catalogue-entry-trb-4:: Insertion (with stack): *Note catalogue-entry-trb-0:: Insertion, without stack: *Note catalogue-entry-trb-2:: Right-Threaded Binary Search Tree Algorithms ============================================ Advancing a traverser: *Note catalogue-entry-rtbst-7:: Backing up a traverser: *Note catalogue-entry-rtbst-8:: Balancing: *Note catalogue-entry-rtbst-12:: Copying: *Note catalogue-entry-rtbst-10:: Copying a node: *Note catalogue-entry-rtbst-9:: Deletion (left-looking): *Note catalogue-entry-rtbst-3:: Deletion, right-looking: *Note catalogue-entry-rtbst-2:: Deletion, with data modification, left-looking: *Note catalogue-entry-rtbst-16:: Deletion, with data modification, right-looking: *Note catalogue-entry-rtbst-15:: Destruction: *Note catalogue-entry-rtbst-11:: Initialization of traverser to found item: *Note catalogue-entry-rtbst-6:: Initialization of traverser to greatest item: *Note catalogue-entry-rtbst-5:: Initialization of traverser to least item: *Note catalogue-entry-rtbst-4:: Insertion: *Note catalogue-entry-rtbst-1:: Rotation, left: *Note catalogue-entry-rtbst-18:: Rotation, right: *Note catalogue-entry-rtbst-17:: Search: *Note catalogue-entry-rtbst-0:: Vine compression: *Note catalogue-entry-rtbst-14:: Vine from tree: *Note catalogue-entry-rtbst-13:: Right-Threaded AVL Tree Algorithms ================================== Copying: *Note catalogue-entry-rtavl-2:: Copying a node: *Note catalogue-entry-rtavl-3:: Deletion (left-looking): *Note catalogue-entry-rtavl-1:: Deletion, right-looking: *Note catalogue-entry-rtavl-4:: Deletion, with data modification: *Note 
catalogue-entry-rtavl-5:: Insertion: *Note catalogue-entry-rtavl-0:: Right-Threaded Red-Black Tree Algorithms ======================================== Deletion: *Note catalogue-entry-rtrb-1:: Insertion: *Note catalogue-entry-rtrb-0:: Binary Search Tree with Parent Pointers Algorithms ================================================== Advancing a traverser: *Note catalogue-entry-pbst-6:: Backing up a traverser: *Note catalogue-entry-pbst-7:: Balancing (with later parent updates): *Note catalogue-entry-pbst-9:: Balancing, with integrated parent updates: *Note catalogue-entry-pbst-12:: Copying: *Note catalogue-entry-pbst-8:: Deletion: *Note catalogue-entry-pbst-1:: Initialization of traverser to found item: *Note catalogue-entry-pbst-4:: Initialization of traverser to greatest item: *Note catalogue-entry-pbst-3:: Initialization of traverser to inserted item: *Note catalogue-entry-pbst-5:: Initialization of traverser to least item: *Note catalogue-entry-pbst-2:: Insertion: *Note catalogue-entry-pbst-0:: Rotation, left: *Note catalogue-entry-pbst-16:: Rotation, right: *Note catalogue-entry-pbst-15:: Update parent pointers: *Note catalogue-entry-pbst-11:: Vine compression (with parent updates): *Note catalogue-entry-pbst-14:: Vine to balanced tree (without parent updates): *Note catalogue-entry-pbst-10:: Vine to balanced tree, with parent updates: *Note catalogue-entry-pbst-13:: AVL Tree with Parent Pointers Algorithms ======================================== Copying: *Note catalogue-entry-pavl-2:: Deletion: *Note catalogue-entry-pavl-1:: Insertion: *Note catalogue-entry-pavl-0:: Red-Black Tree with Parent Pointers Algorithms ============================================== Deletion: *Note catalogue-entry-prb-1:: Insertion: *Note catalogue-entry-prb-0:: Appendix F Index **************** aborting allocator: See ``Chapter 2''. array of search functions: See ``Chapter 3''. AVL copy function: See 5.7. AVL functions: See 5.3. AVL item deletion function: See 5.5. 
AVL item insertion function: See 5.4. AVL node structure: See 5.2. AVL traversal functions: See 5.6. AVL traverser advance function: See 5.6. AVL traverser back up function: See 5.6. AVL traverser greatest-item initializer: See 5.6. AVL traverser insertion initializer: See 5.6. AVL traverser least-item initializer: See 5.6. AVL traverser search initializer: See 5.6. AVL tree verify function: See 5.8. avl-test.c: See 5.8. avl.c: See 5. avl.h: See 5. avl_copy function: See 5.7. avl_delete function: See 5.5. AVL_H macro: See 5. avl_node structure: See 5.2. avl_probe function: See 5.4. avl_probe() local variables: See 5.4. avl_t_find function: See 5.6. avl_t_first function: See 5.6. avl_t_insert function: See 5.6. avl_t_last function: See 5.6. avl_t_next function: See 5.6. avl_t_prev function: See 5.6. bin-ary-test.c: See ``Chapter 3''. bin_cmp function: See ``Chapter 2''. binary search of ordered array: See 3.5. binary search tree entry: See 3.6. binary search using bsearch(): See ``Chapter 3''. binary_tree_entry structure: See 3.6. block structure: See 4.14.4. blp's implementation of bsearch(): See ``Chapter 3''. blp_bsearch function: See ``Chapter 3''. BST balance function: See 4.12. BST compression function: See 4.12.2.3. BST copy error helper function: See 4.10.3. BST copy function: See 4.10.3. BST creation function: See 4.5. BST destruction function: See 4.11.1. BST extra function prototypes: See 4.12. BST item deletion function: See 4.8. BST item deletion function, by merging: See 4.8.1. BST item insertion function: See 4.7. BST item insertion function, alternate version: See ``Chapter 4''. BST item insertion function, root insertion version: See 4.7.1. BST join function, iterative version: See ``Chapter 4''. BST join function, recursive version: See 4.13. BST maximum height: See 4.2.3. BST node structure: See 4.2.1. BST operations: See 4.4. BST overflow test function: See 4.14.3. BST print function: See 4.14.1.2. BST search function: See 4.6. 
BST table structure: See 4.2.2.
BST test function: See 4.14.1.
BST to vine function: See 4.12.1.
BST traversal functions: See 4.9.3.
BST traverser advance function: See 4.9.3.7.
BST traverser back up function: See 4.9.3.8.
BST traverser check function: See 4.14.1.
BST traverser copy initializer: See 4.9.3.6.
BST traverser current item function: See 4.9.3.9.
BST traverser greatest-item initializer: See 4.9.3.3.
BST traverser insertion initializer: See 4.9.3.5.
BST traverser least-item initializer: See 4.9.3.2.
BST traverser null initializer: See 4.9.3.1.
BST traverser refresher: See 4.9.3.
BST traverser refresher, with caching: See ``Chapter 4''.
BST traverser replacement function: See 4.9.3.10.
BST traverser search initializer: See 4.9.3.4.
BST traverser structure: See 4.9.3.
BST verify function: See 4.14.1.1.
bst-test.c: See 4.14.
bst.c: See 4.
bst.h: See 4.
bst_balance function: See 4.12.
bst_copy function: See 4.10.3.
bst_copy_iterative function <1>: See ``Chapter 4''.
bst_copy_iterative function: See 4.10.2.
bst_copy_recursive_1 function: See 4.10.1.
bst_create function: See 4.5.
bst_deallocate_recursive function: See ``Chapter 4''.
bst_delete function <1>: See 4.8.1.
bst_delete function: See 4.8.
bst_destroy function <1>: See 4.11.3.
bst_destroy function: See 4.11.1.
bst_destroy_recursive function: See 4.11.2.
bst_find function <1>: See ``Chapter 2''.
bst_find function: See 4.6.
BST_H macro: See 4.
BST_MAX_HEIGHT macro: See 4.2.3.
bst_node structure: See 4.2.1.
bst_probe function <1>: See ``Chapter 4''.
bst_probe function <2>: See 4.7.1.
bst_probe function: See 4.7.
bst_robust_copy_recursive_1 function: See ``Chapter 4''.
bst_robust_copy_recursive_2 function: See ``Chapter 4''.
bst_t_copy function: See 4.9.3.6.
bst_t_cur function: See 4.9.3.9.
bst_t_find function: See 4.9.3.4.
bst_t_first function: See 4.9.3.2.
bst_t_init function: See 4.9.3.1.
bst_t_insert function: See 4.9.3.5.
bst_t_last function: See 4.9.3.3.
bst_t_next function: See 4.9.3.7.
bst_t_prev function: See 4.9.3.8.
bst_t_replace function: See 4.9.3.10.
bst_table structure: See 4.2.2.
bst_traverse_level_order function: See ``Chapter 4''.
bst_traverser structure: See 4.9.3.
BSTS functions: See ``Chapter 4''.
BSTS structures: See ``Chapter 4''.
BSTS test: See ``Chapter 4''.
bsts.c: See ``Chapter 4''.
bsts_find function: See ``Chapter 4''.
bsts_insert function: See ``Chapter 4''.
bsts_node structure: See ``Chapter 4''.
bsts_tree structure: See ``Chapter 4''.
calculate leaves: See 4.12.2.2.
case 1 in AVL deletion: See 5.5.2.
case 1 in BST deletion: See 4.8.
case 1 in left-looking RTBST deletion: See 10.5.2.
case 1 in left-side initial-black RB insertion rebalancing: See 6.4.5.
case 1 in left-side PRB deletion rebalancing: See 15.4.2.
case 1 in left-side PRB insertion rebalancing: See 15.3.2.
case 1 in left-side RB deletion rebalancing: See 6.5.2.
case 1 in left-side RB insertion rebalancing: See 6.4.3.
case 1 in left-side RTRB insertion rebalancing: See 12.3.2.
case 1 in left-side TRB deletion rebalancing: See 9.4.3.
case 1 in left-side TRB insertion rebalancing: See 9.3.2.
case 1 in PAVL deletion: See 14.5.1.
case 1 in PBST deletion: See 13.4.
case 1 in PRB deletion: See 15.4.1.
case 1 in RB deletion: See 6.5.1.
case 1 in right-looking RTBST deletion: See 10.5.1.
case 1 in right-side initial-black RB insertion rebalancing: See 6.4.5.
case 1 in right-side PRB deletion rebalancing: See 15.4.4.
case 1 in right-side PRB insertion rebalancing: See 15.3.3.
case 1 in right-side RB deletion rebalancing: See 6.5.4.
case 1 in right-side RB insertion rebalancing: See 6.4.4.
case 1 in right-side RTRB insertion rebalancing: See 12.3.2.
case 1 in right-side TRB deletion rebalancing: See 9.4.5.
case 1 in right-side TRB insertion rebalancing: See 9.3.3.
case 1 in RTAVL deletion: See 11.5.2.
case 1 in RTAVL deletion, right-looking: See ``Chapter 11''.
case 1 in RTRB deletion: See 12.4.1.
case 1 in TAVL deletion: See 8.5.2.
case 1 in TAVL deletion, with stack: See ``Chapter 8''.
case 1 in TBST deletion: See 7.7.
case 1 in TRB deletion: See 9.4.2.
case 1.5 in BST deletion: See ``Chapter 4''.
case 2 in AVL deletion: See 5.5.2.
case 2 in BST deletion: See 4.8.
case 2 in left-looking RTBST deletion: See 10.5.2.
case 2 in left-side initial-black RB insertion rebalancing: See 6.4.5.
case 2 in left-side PRB deletion rebalancing: See 15.4.2.
case 2 in left-side PRB insertion rebalancing: See 15.3.2.
case 2 in left-side RB deletion rebalancing: See 6.5.2.
case 2 in left-side RB insertion rebalancing: See 6.4.3.
case 2 in left-side RTRB deletion rebalancing: See 12.4.2.
case 2 in left-side RTRB insertion rebalancing: See 12.3.2.
case 2 in left-side TRB deletion rebalancing: See 9.4.3.
case 2 in left-side TRB insertion rebalancing: See 9.3.2.
case 2 in PAVL deletion: See 14.5.1.
case 2 in PBST deletion: See 13.4.
case 2 in PRB deletion: See 15.4.1.
case 2 in RB deletion: See 6.5.1.
case 2 in right-looking RTBST deletion: See 10.5.1.
case 2 in right-side initial-black RB insertion rebalancing: See 6.4.5.
case 2 in right-side PRB deletion rebalancing: See 15.4.4.
case 2 in right-side PRB insertion rebalancing: See 15.3.3.
case 2 in right-side RB deletion rebalancing: See 6.5.4.
case 2 in right-side RB insertion rebalancing: See 6.4.4.
case 2 in right-side RTRB deletion rebalancing: See 12.4.2.
case 2 in right-side RTRB insertion rebalancing: See 12.3.2.
case 2 in right-side TRB deletion rebalancing: See 9.4.5.
case 2 in right-side TRB insertion rebalancing: See 9.3.3.
case 2 in RTAVL deletion: See 11.5.2.
case 2 in RTAVL deletion, right-looking: See ``Chapter 11''.
case 2 in RTRB deletion: See 12.4.1.
case 2 in TAVL deletion: See 8.5.2.
case 2 in TAVL deletion, with stack: See ``Chapter 8''.
case 2 in TBST deletion: See 7.7.
case 2 in TRB deletion: See 9.4.2.
case 3 in AVL deletion: See 5.5.2.
case 3 in AVL deletion, alternate version: See ``Chapter 5''.
case 3 in BST deletion: See 4.8.
case 3 in BST deletion, alternate version: See ``Chapter 4''.
case 3 in left-looking RTBST deletion: See 10.5.2.
case 3 in left-side initial-black RB insertion rebalancing: See 6.4.5.
case 3 in left-side PRB insertion rebalancing: See 15.3.2.
case 3 in left-side RB insertion rebalancing: See 6.4.3.
case 3 in left-side RTRB insertion rebalancing: See 12.3.2.
case 3 in left-side TRB insertion rebalancing: See 9.3.2.
case 3 in PAVL deletion: See 14.5.1.
case 3 in PBST deletion: See 13.4.
case 3 in PRB deletion: See 15.4.1.
case 3 in RB deletion: See 6.5.1.
case 3 in right-looking RTBST deletion: See 10.5.1.
case 3 in right-side initial-black RB insertion rebalancing: See 6.4.5.
case 3 in right-side PRB insertion rebalancing: See 15.3.3.
case 3 in right-side RB insertion rebalancing: See 6.4.4.
case 3 in right-side RTRB insertion rebalancing: See 12.3.2.
case 3 in right-side TRB insertion rebalancing: See 9.3.3.
case 3 in RTAVL deletion: See 11.5.2.
case 3 in RTAVL deletion, right-looking: See ``Chapter 11''.
case 3 in RTRB deletion: See 12.4.1.
case 3 in TAVL deletion: See 8.5.2.
case 3 in TAVL deletion, with stack: See ``Chapter 8''.
case 3 in TBST deletion: See 7.7.
case 3 in TRB deletion: See 9.4.2.
case 4 in left-looking RTBST deletion: See 10.5.2.
case 4 in left-looking RTBST deletion, alternate version: See ``Chapter 10''.
case 4 in right-looking RTBST deletion: See 10.5.1.
case 4 in right-looking RTBST deletion, alternate version: See ``Chapter 10''.
case 4 in RTAVL deletion: See 11.5.2.
case 4 in RTAVL deletion, alternate version: See ``Chapter 11''.
case 4 in RTAVL deletion, right-looking: See ``Chapter 11''.
case 4 in RTRB deletion: See 12.4.1.
case 4 in TAVL deletion: See 8.5.2.
case 4 in TAVL deletion, alternate version: See ``Chapter 8''.
case 4 in TAVL deletion, with stack: See ``Chapter 8''.
case 4 in TBST deletion: See 7.7.
case 4 in TBST deletion, alternate version: See ``Chapter 7''.
case 4 in TRB deletion: See 9.4.2.
case 4 in TRB deletion, alternate version: See ``Chapter 9''.
cheat_search function: See ``Chapter 3''.
cheating search: See ``Chapter 3''.
check AVL tree structure: See 5.8.
check BST structure: See 4.14.1.1.
check counted nodes: See 4.14.1.1.
check for tree height in range: See 4.12.2.2.
check RB tree structure: See 6.6.
check root is black: See 6.6.
check that backward traversal works: See 4.14.1.1.
check that forward traversal works: See 4.14.1.1.
check that the tree contains all the elements it should: See 4.14.1.1.
check that traversal from the null element works: See 4.14.1.1.
check tree->bst_count is correct: See 4.14.1.1.
check_traverser function: See 4.14.1.
clean up after search tests: See ``Chapter 3''.
command line parser: See ``B.2 Command-Line Parser''.
compare two AVL trees for structure and content: See 5.8.
compare two BSTs for structure and content: See 4.14.1.
compare two PAVL trees for structure and content: See 14.8.
compare two PBSTs for structure and content: See 13.8.
compare two PRB trees for structure and content: See 15.5.
compare two RB trees for structure and content: See 6.6.
compare two RTAVL trees for structure and content: See 11.7.
compare two RTBSTs for structure and content: See 10.10.
compare two RTRB trees for structure and content: See 12.5.
compare two TAVL trees for structure and content: See 8.7.
compare two TBSTs for structure and content: See 7.12.
compare two TRB trees for structure and content: See 9.5.
compare_fixed_strings function: See ``Chapter 2''.
compare_ints function <1>: See ``Chapter 3''.
compare_ints function <2>: See ``Chapter 2''.
compare_ints function: See 2.3.
compare_trees function <1>: See 15.5.
compare_trees function <2>: See 14.8.
compare_trees function <3>: See 13.8.
compare_trees function <4>: See 12.5.
compare_trees function <5>: See 11.7.
compare_trees function <6>: See 10.10.
compare_trees function <7>: See 9.5.
compare_trees function <8>: See 8.7.
compare_trees function <9>: See 7.12.
compare_trees function <10>: See 6.6.
compare_trees function: See 5.8.
comparison function for ints: See 2.3.
compress function <1>: See ``Chapter 13''.
compress function: See 7.11.2.
copy_error_recovery function <1>: See 13.6.
copy_error_recovery function <2>: See 10.7.
copy_error_recovery function <3>: See 7.9.
copy_error_recovery function: See 4.10.3.
copy_node function <1>: See 11.6.
copy_node function <2>: See 10.7.
copy_node function <3>: See 8.6.
copy_node function: See 7.9.
default memory allocation functions: See 2.5.
default memory allocator header: See 2.5.
delete BST node: See 4.8.
delete BST node by merging: See 4.8.1.
delete item from AVL tree: See 5.5.2.
delete item from PAVL tree: See 14.5.1.
delete item from PRB tree: See 15.4.1.
delete item from RB tree: See 6.5.1.
delete item from RB tree, alternate version: See ``Chapter 6''.
delete item from TAVL tree: See 8.5.2.
delete item from TAVL tree, with stack: See ``Chapter 8''.
delete item from TRB tree: See 9.4.2.
delete PBST node: See 13.4.
delete RTAVL node: See 11.5.2.
delete RTAVL node, right-looking: See ``Chapter 11''.
delete RTBST node, left-looking: See 10.5.2.
delete RTBST node, right-looking: See 10.5.1.
delete RTRB node: See 12.4.1.
delete TBST node: See 7.7.
delete_order enumeration: See 4.14.2.
destroy a BST iteratively: See 4.11.3.
destroy a BST recursively: See 4.11.2.
ensure w is black in left-side PRB deletion rebalancing: See 15.4.2.
ensure w is black in left-side RB deletion rebalancing: See 6.5.2.
ensure w is black in left-side TRB deletion rebalancing: See 9.4.3.
ensure w is black in right-side PRB deletion rebalancing: See 15.4.4.
ensure w is black in right-side RB deletion rebalancing: See 6.5.4.
ensure w is black in right-side TRB deletion rebalancing: See 9.4.5.
error_node variable: See ``Chapter 4''.
fail function: See 4.14.6.
fallback_join function: See ``Chapter 4''.
find BST node to delete: See 4.8.
find BST node to delete by merging: See 4.8.1.
find parent of a TBST node: See 8.5.6.
find PBST node to delete: See 13.4.
find predecessor of RTBST node with left child: See 10.6.5.
find predecessor of RTBST node with no left child: See 10.6.5.
find RTBST node to delete: See 10.5.
find TBST node to delete: See 7.7.
find TBST node to delete, with parent node algorithm: See ``Chapter 7''.
find_parent function: See 8.5.6.
finish up after BST deletion by merging: See 4.8.1.
finish up after deleting BST node: See 4.8.
finish up after deleting PBST node: See 13.4.
finish up after deleting RTBST node: See 10.5.
finish up after deleting TBST node: See 7.7.
finish up after PRB deletion: See 15.4.3.
finish up after RB deletion: See 6.5.3.
finish up after RTRB deletion: See 12.4.3.
finish up after TRB deletion: See 9.4.4.
finish up and return after AVL deletion: See 5.5.5.
first_item function: See 4.9.2.1.
found insertion point in recursive AVL insertion: See 5.4.7.
gen_balanced_tree function: See ``Chapter 4''.
gen_deletions function: See ``Chapter 4''.
gen_insertions function: See ``Chapter 4''.
generate permutation for balanced tree: See ``Chapter 4''.
generate random permutation of integers: See ``Chapter 4''.
handle case where x has a right child: See 4.9.3.7.
handle case where x has no right child: See 4.9.3.7.
handle stack overflow during BST traversal: See ``Chapter 4''.
handle_long_option function: See ``B.1 Option Parser''.
handle_short_option function: See ``B.1 Option Parser''.
initialize search test array: See ``Chapter 3''.
initialize smaller and larger within binary search tree: See ``Chapter 3''.
insert AVL node: See 5.4.2.
insert n into arbitrary subtree: See ``Chapter 4''.
insert new BST node, root insertion version: See 4.7.1.
insert new node into RTBST tree: See 10.4.
insert PAVL node: See 14.4.1.
insert PBST node: See 13.3.
insert PRB node: See 15.3.1.
insert RB node: See 6.4.2.
insert RTAVL node: See 11.4.1.
insert RTRB node: See 12.3.1.
insert TAVL node: See 8.4.1.
insert TBST node: See 7.6.
insert TRB node: See 9.3.1.
insert_order enumeration: See 4.14.2.
insertion and deletion order generation: See ``Chapter 4''.
intermediate step between bst_copy_recursive_2() and bst_copy_iterative(): See ``Chapter 4''.
iter variable: See ``Chapter 4''.
iterative copy of BST: See 4.10.2.
iterative traversal of BST, take 1: See 4.9.2.
iterative traversal of BST, take 2: See 4.9.2.
iterative traversal of BST, take 3: See 4.9.2.
iterative traversal of BST, take 4: See 4.9.2.
iterative traversal of BST, take 5: See 4.9.2.
iterative traversal of BST, take 6: See 4.9.2.1.
iterative traversal of BST, with dynamically allocated stack: See ``Chapter 4''.
left-side rebalancing after initial-black RB insertion: See 6.4.5.
left-side rebalancing after PRB deletion: See 15.4.2.
left-side rebalancing after PRB insertion: See 15.3.2.
left-side rebalancing after RB deletion: See 6.5.2.
left-side rebalancing after RB insertion: See 6.4.3.
left-side rebalancing after RTRB deletion: See 12.4.2.
left-side rebalancing after RTRB insertion: See 12.3.2.
left-side rebalancing after TRB deletion: See 9.4.3.
left-side rebalancing after TRB insertion: See 9.3.2.
left-side rebalancing case 1 in AVL deletion: See 5.5.4.
left-side rebalancing case 1 in PAVL deletion: See 14.5.3.
left-side rebalancing case 2 in AVL deletion: See 5.5.4.
left-side rebalancing case 2 in PAVL deletion: See 14.5.3.
level-order traversal: See ``Chapter 4''.
LIBAVL_ALLOCATOR macro: See 2.5.
libavl_allocator structure: See 2.5.
license: See 1.4.
main function <1>: See ``Chapter 3''.
main function: See 4.14.7.
main program to test binary_search_tree_array(): See ``Chapter 3''.
make special case TBST vine into balanced tree and count height: See 7.11.2.
make special case vine into balanced tree and count height: See 4.12.2.2.
MAX_INPUT macro: See ``Chapter 3''.
memory allocator: See 2.5.
memory tracker: See 4.14.4.
move BST node to root: See 4.7.1.
move down then up in recursive AVL insertion: See 5.4.7.
mt_allocate function: See 4.14.4.
mt_allocator function: See 4.14.4.
mt_allocator structure: See 4.14.4.
mt_arg_index enumeration: See 4.14.4.
mt_create function: See 4.14.4.
mt_free function: See 4.14.4.
mt_policy enumeration: See 4.14.4.
new_block function: See 4.14.4.
option parser: See ``B.1 Option Parser''.
option structure: See 4.14.5.
option_get function: See ``B.1 Option Parser''.
option_init function: See ``B.1 Option Parser''.
option_state structure: See ``B.1 Option Parser''.
overflow testers <1>: See ``Chapter 4''.
overflow testers: See 4.14.3.
parse search test command line: See ``Chapter 3''.
parse_command_line function: See ``B.2 Command-Line Parser''.
PAVL copy function: See 14.7.
PAVL functions: See 14.3.
PAVL item deletion function: See 14.5.
PAVL item insertion function: See 14.4.
PAVL node structure: See 14.1.
PAVL traversal functions: See 14.6.
pavl-test.c: See 14.8.
pavl.c: See 14.
pavl.h: See 14.
pavl_copy function: See 14.7.
pavl_delete function: See 14.5.
PAVL_H macro: See 14.
pavl_node structure: See 14.1.
pavl_probe function: See 14.4.
PBST balance function: See 13.7.
PBST balance function, with integrated parent updates: See ``Chapter 13''.
PBST compression function: See ``Chapter 13''.
PBST copy error helper function: See 13.6.
PBST copy function: See 13.6.
PBST extra function prototypes: See 13.7.
PBST functions: See 13.2.
PBST item deletion function: See 13.4.
PBST item insertion function: See 13.3.
PBST node structure: See 13.1.
PBST traversal functions: See 13.5.
PBST traverser advance function: See 13.5.5.
PBST traverser back up function: See 13.5.6.
PBST traverser first initializer: See 13.5.1.
PBST traverser insertion initializer: See 13.5.4.
PBST traverser last initializer: See 13.5.2.
PBST traverser search initializer: See 13.5.3.
pbst-test.c: See 13.8.
pbst.c: See 13.
pbst.h: See 13.
pbst_balance function <1>: See ``Chapter 13''.
pbst_balance function: See 13.7.
pbst_copy function: See 13.6.
pbst_delete function: See 13.4.
PBST_H macro: See 13.
pbst_node structure: See 13.1.
pbst_probe function: See 13.3.
pbst_t_find function: See 13.5.3.
pbst_t_first function: See 13.5.1.
pbst_t_insert function: See 13.5.4.
pbst_t_last function: See 13.5.2.
pbst_t_next function: See 13.5.5.
pbst_t_prev function: See 13.5.6.
permuted_integers function: See ``Chapter 4''.
pgm_name variable: See 4.14.7.
pool_allocator structure: See ``Chapter 2''.
pool_allocator_free function: See ``Chapter 2''.
pool_allocator_malloc function: See ``Chapter 2''.
pool_allocator_tbl_create function: See ``Chapter 2''.
PRB functions: See 15.2.
PRB item deletion function: See 15.4.
PRB item insertion function: See 15.3.
PRB node structure: See 15.1.
prb-test.c: See 15.5.
prb.c: See 15.
prb.h: See 15.
prb_color enumeration: See 15.1.
prb_delete function: See 15.4.
PRB_H macro: See 15.
prb_node structure: See 15.1.
prb_probe function: See 15.3.
print_tree_structure function <1>: See 10.10.
print_tree_structure function: See 7.12.
print_whole_tree function <1>: See 10.10.
print_whole_tree function <2>: See 7.12.
print_whole_tree function: See 4.14.1.2.
probe function: See 5.4.7.
process_node function: See 4.9.2.1.
random number seeding: See ``Chapter 4''.
RB functions: See 6.3.
RB item deletion function: See 6.5.
RB item insertion function: See 6.4.
RB item insertion function, initial black: See 6.4.5.
RB maximum height: See 6.2.
RB node structure: See 6.2.
RB tree verify function: See 6.6.
rb-test.c: See 6.6.
rb.c: See 6.
rb.h: See 6.
rb_color enumeration: See 6.2.
rb_delete function: See 6.5.
RB_H macro: See 6.
RB_MAX_HEIGHT macro: See 6.2.
rb_node structure: See 6.2.
rb_probe function <1>: See 6.4.5.
rb_probe function: See 6.4.
rb_probe() local variables: See 6.4.
rebalance + balance in TAVL insertion in left subtree, alternate version: See ``Chapter 8''.
rebalance after AVL deletion: See 5.5.4.
rebalance after AVL insertion: See 5.4.4.
rebalance after initial-black RB insertion: See 6.4.5.
rebalance after PAVL deletion: See 14.5.3.
rebalance after PAVL insertion: See 14.4.3.
rebalance after PRB insertion: See 15.3.2.
rebalance after RB deletion: See 6.5.2.
rebalance after RB insertion: See 6.4.3.
rebalance after RTAVL deletion in left subtree: See 11.5.4.
rebalance after RTAVL deletion in right subtree: See 11.5.4.
rebalance after RTAVL insertion: See 11.4.2.
rebalance after RTRB deletion: See 12.4.2.
rebalance after RTRB insertion: See 12.3.2.
rebalance after TAVL deletion: See 8.5.4.
rebalance after TAVL deletion, with stack: See ``Chapter 8''.
rebalance after TAVL insertion: See 8.4.2.
rebalance after TRB insertion: See 9.3.2.
rebalance AVL tree after insertion in left subtree: See 5.4.4.
rebalance AVL tree after insertion in right subtree: See 5.4.5.
rebalance for + balance factor after left-side RTAVL deletion: See 11.5.4.
rebalance for + balance factor after right-side RTAVL deletion: See 11.5.4.
rebalance for + balance factor after TAVL deletion in left subtree: See 8.5.4.
rebalance for + balance factor after TAVL deletion in right subtree: See 8.5.5.
rebalance for + balance factor in PAVL insertion in left subtree: See 14.4.3.
rebalance for + balance factor in PAVL insertion in right subtree: See 14.4.4.
rebalance for + balance factor in RTAVL insertion in left subtree: See 11.4.2.
rebalance for + balance factor in RTAVL insertion in right subtree: See 11.4.2.
rebalance for + balance factor in TAVL insertion in left subtree: See 8.4.2.
rebalance for + balance factor in TAVL insertion in right subtree: See 8.4.3.
rebalance for - balance factor after left-side RTAVL deletion: See 11.5.4.
rebalance for - balance factor after right-side RTAVL deletion: See 11.5.4.
rebalance for - balance factor after TAVL deletion in left subtree: See 8.5.4.
rebalance for - balance factor after TAVL deletion in right subtree: See 8.5.5.
rebalance for - balance factor in PAVL insertion in left subtree: See 14.4.3.
rebalance for - balance factor in PAVL insertion in right subtree: See 14.4.4.
rebalance for - balance factor in RTAVL insertion in left subtree: See 11.4.2.
rebalance for - balance factor in RTAVL insertion in right subtree: See 11.4.2.
rebalance for - balance factor in TAVL insertion in left subtree: See 8.4.2.
rebalance for - balance factor in TAVL insertion in right subtree: See 8.4.3.
rebalance for 0 balance factor after left-side RTAVL deletion: See 11.5.4.
rebalance for 0 balance factor after right-side RTAVL deletion: See 11.5.4.
rebalance for 0 balance factor after TAVL deletion in left subtree: See 8.5.4.
rebalance for 0 balance factor after TAVL deletion in right subtree: See 8.5.5.
rebalance PAVL tree after insertion in left subtree: See 14.4.3.
rebalance PAVL tree after insertion in right subtree: See 14.4.4.
rebalance RTAVL tree after insertion to left: See 11.4.2.
rebalance RTAVL tree after insertion to right: See 11.4.2.
rebalance TAVL tree after insertion in left subtree: See 8.4.2.
rebalance TAVL tree after insertion in right subtree: See 8.4.3.
rebalance tree after PRB deletion: See 15.4.2.
rebalance tree after RB deletion: See 6.5.2.
rebalance tree after TRB deletion: See 9.4.3.
recurse_verify_tree function <1>: See 15.5.
recurse_verify_tree function <2>: See 14.8.
recurse_verify_tree function <3>: See 13.8.
recurse_verify_tree function <4>: See 12.5.
recurse_verify_tree function <5>: See 11.7.
recurse_verify_tree function <6>: See 10.10.
recurse_verify_tree function <7>: See 9.5.
recurse_verify_tree function <8>: See 8.7.
recurse_verify_tree function <9>: See 7.12.
recurse_verify_tree function <10>: See 6.6.
recurse_verify_tree function <11>: See 5.8.
recurse_verify_tree function: See 4.14.1.1.
recursive copy of BST, take 1: See 4.10.1.
recursive copy of BST, take 2: See 4.10.1.
recursive deallocation function: See ``Chapter 4''.
recursive insertion into AVL tree: See 5.4.7.
recursive traversal of BST: See 4.9.1.
recursive traversal of BST, using nested function: See ``Chapter 4''.
recursively verify AVL tree structure: See 5.8.
recursively verify BST structure: See 4.14.1.1.
recursively verify PAVL tree structure: See 14.8.
recursively verify PBST structure: See 13.8.
recursively verify PRB tree structure: See 15.5.
recursively verify RB tree structure: See 6.6.
recursively verify RTAVL tree structure: See 11.7.
recursively verify RTBST structure: See 10.10.
recursively verify RTRB tree structure: See 12.5.
recursively verify TAVL tree structure: See 8.7.
recursively verify TBST structure: See 7.12.
recursively verify TRB tree structure: See 9.5.
reduce TBST vine general case to special case: See 7.11.2.
reduce vine general case to special case: See 4.12.2.2.
reject_request function: See 4.14.4.
right-side rebalancing after initial-black RB insertion: See 6.4.5.
right-side rebalancing after PRB deletion: See 15.4.4.
right-side rebalancing after PRB insertion: See 15.3.3.
right-side rebalancing after RB deletion: See 6.5.4.
right-side rebalancing after RB insertion: See 6.4.4.
right-side rebalancing after RTRB deletion: See 12.4.2.
right-side rebalancing after RTRB insertion: See 12.3.2.
right-side rebalancing after TRB deletion: See 9.4.5.
right-side rebalancing after TRB insertion: See 9.3.3.
right-side rebalancing case 1 in PAVL deletion: See 14.5.4.
right-side rebalancing case 2 in PAVL deletion: See 14.5.4.
robust recursive copy of BST, take 1: See ``Chapter 4''.
robust recursive copy of BST, take 2: See ``Chapter 4''.
robust recursive copy of BST, take 3: See ``Chapter 4''.
robust root insertion of existing node in arbitrary subtree: See ``Chapter 4''.
robustly move BST node to root: See ``Chapter 4''.
robustly search for insertion point in arbitrary subtree: See ``Chapter 4''.
root insertion of existing node in arbitrary subtree: See ``Chapter 4''.
root_insert function: See ``Chapter 4''.
rotate left at x then right at y in AVL tree: See 5.4.4.
rotate left at y in AVL tree: See 5.4.5.
rotate right at x then left at y in AVL tree: See 5.4.5.
rotate right at y in AVL tree: See 5.4.4.
rotate_left function <1>: See ``Chapter 14''.
rotate_left function <2>: See ``Chapter 11''.
rotate_left function <3>: See ``Chapter 8''.
rotate_left function: See ``Chapter 4''.
rotate_right function <1>: See ``Chapter 14''.
rotate_right function <2>: See ``Chapter 11''.
rotate_right function <3>: See ``Chapter 8''.
rotate_right function: See ``Chapter 4''.
RTAVL copy function: See 11.6.
RTAVL functions: See 11.2.
RTAVL item deletion function: See 11.5.
RTAVL item insertion function: See 11.4.
RTAVL node copy function: See 11.6.
RTAVL node structure: See 11.1.
rtavl-test.c: See 11.7.
rtavl.c: See 11.
rtavl.h: See 11.
rtavl_delete function: See 11.5.
RTAVL_H macro: See 11.
rtavl_node structure: See 11.1.
rtavl_probe function: See 11.4.
rtavl_tag enumeration: See 11.1.
RTBST balance function: See 10.9.
RTBST copy error helper function: See 10.7.
RTBST copy function: See 10.7.
RTBST destruction function: See 10.8.
RTBST functions: See 10.2.
RTBST item deletion function: See 10.5.
RTBST item insertion function: See 10.4.
RTBST main copy function: See 10.7.
RTBST node copy function: See 10.7.
RTBST node structure: See 10.1.
RTBST print function: See 10.10.
RTBST search function: See 10.3.
RTBST traversal functions: See 10.6.
RTBST traverser advance function: See 10.6.4.
RTBST traverser back up function: See 10.6.5.
RTBST traverser first initializer: See 10.6.1.
RTBST traverser last initializer: See 10.6.2.
RTBST traverser search initializer: See 10.6.3.
RTBST tree-to-vine function: See 10.9.
RTBST vine compression function: See 10.9.
rtbst-test.c: See 10.10.
rtbst.c: See 10.
rtbst.h: See 10.
rtbst_copy function: See 10.7.
rtbst_delete function: See 10.5.
rtbst_destroy function: See 10.8.
rtbst_find function: See 10.3.
RTBST_H macro: See 10.
rtbst_node structure: See 10.1.
rtbst_probe function: See 10.4.
rtbst_t_find function: See 10.6.3.
rtbst_t_first function: See 10.6.1.
rtbst_t_last function: See 10.6.2.
rtbst_t_next function: See 10.6.4.
rtbst_t_prev function: See 10.6.5.
rtbst_tag enumeration: See 10.1.
RTRB functions: See 12.2.
RTRB item deletion function: See 12.4.
RTRB item insertion function: See 12.3.
RTRB node structure: See 12.1.
rtrb-test.c: See 12.5.
rtrb.c: See 12.
rtrb.h: See 12.
rtrb_color enumeration: See 12.1.
rtrb_delete function: See 12.4.
RTRB_H macro: See 12.
rtrb_node structure: See 12.1.
rtrb_probe function: See 12.3.
rtrb_tag enumeration: See 12.1.
run search tests: See ``Chapter 3''.
s variable <1>: See ``Chapter 11''.
s variable <2>: See ``Chapter 9''.
s variable: See ``Chapter 5''.
search AVL tree for insertion point: See 5.4.1.
search AVL tree for item to delete: See 5.5.1.
search BST for insertion point, root insertion version: See 4.7.1.
search for insertion point in arbitrary subtree: See ``Chapter 4''.
search functions: See ``Chapter 3''.
search of binary search tree stored as array: See 3.6.
search PAVL tree for insertion point: See 14.4.1.
search PBST tree for insertion point: See 13.3.
search RB tree for insertion point: See 6.4.1.
search RTAVL tree for insertion point: See 11.4.1.
search RTAVL tree for item to delete: See 11.5.1.
search RTBST for insertion point: See 10.4.
search RTRB tree for insertion point: See 12.3.1.
search TAVL tree for insertion point: See 8.4.1.
search TAVL tree for item to delete: See 8.5.1.
search TBST for insertion point: See 7.6.
search test functions: See ``Chapter 3''.
search test main program: See ``Chapter 3''.
search TRB tree for insertion point: See 9.3.1.
search TRB tree for item to delete: See 9.4.1.
search_func structure: See ``Chapter 3''.
seq-test.c: See ``Chapter 3''.
sequentially search a sorted array of ints: See 3.3.
sequentially search a sorted array of ints using a sentinel: See 3.4.
sequentially search a sorted array of ints using a sentinel (2): See 3.4.
sequentially search an array of ints: See 3.1.
sequentially search an array of ints using a sentinel: See 3.2.
set parents of main vine: See ``Chapter 13''.
show bin-ary-test usage message: See ``Chapter 3''.
srch-test.c: See ``Chapter 3''.
start_timer function: See ``Chapter 3''.
stoi function <1>: See ``Chapter 3''.
stoi function: See ``B.2 Command-Line Parser''.
stop_timer function: See ``Chapter 3''.
string to integer function stoi(): See ``Chapter 3''.
summing string lengths with next_item(): See 4.9.2.1.
summing string lengths with walk(): See 4.9.2.1.
symmetric case in PAVL deletion: See 14.5.4.
symmetric case in TAVL deletion: See 8.5.5.
symmetric case in TAVL deletion, with stack: See ``Chapter 8''.
table assertion function control directives: See ``Chapter 2''.
table assertion function prototypes: See 2.9.
table assertion functions: See ``Chapter 2''.
table count function prototype: See 2.7.
table count macro: See ``Chapter 2''.
table creation function prototypes: See 2.6.
table function prototypes: See 2.11.
table function types <1>: See 2.4.
table function types: See 2.3.
table insertion and deletion function prototypes: See 2.8.
table insertion convenience functions: See ``Chapter 2''.
table types: See 2.11.
TAVL copy function: See 8.6.
TAVL functions: See 8.3.
TAVL item deletion function: See 8.5.
TAVL item deletion function, with stack: See ``Chapter 8''.
TAVL item insertion function: See 8.4.
TAVL node copy function: See 8.6.
TAVL node structure: See 8.1.
tavl-test.c: See 8.7.
tavl.c: See 8.
tavl.h: See 8.
tavl_delete function <1>: See ``Chapter 8''.
tavl_delete function: See 8.5.
TAVL_H macro: See 8.
tavl_node structure: See 8.1.
tavl_probe function: See 8.4.
tavl_tag enumeration: See 8.1.
tbl_allocator_default variable: See 2.5.
tbl_assert_delete function: See ``Chapter 2''.
tbl_assert_delete macro: See ``Chapter 2''.
tbl_assert_insert function: See ``Chapter 2''.
tbl_assert_insert macro: See ``Chapter 2''.
tbl_comparison_func type: See 2.3.
tbl_copy_func type: See 2.4.
tbl_count macro: See ``Chapter 2''.
tbl_free function: See 2.5.
tbl_insert function: See ``Chapter 2''.
tbl_item_func type: See 2.4.
tbl_malloc_abort function: See ``Chapter 2''.
tbl_replace function: See ``Chapter 2''.
TBST balance function: See 7.11.
TBST copy error helper function: See 7.9.
TBST copy function: See 7.9.
TBST creation function: See 7.4.
TBST destruction function: See 7.10.
TBST functions: See 7.3.
TBST item deletion function: See 7.7.
TBST item insertion function: See 7.6.
TBST main balance function: See 7.11.
TBST main copy function: See 7.9.
TBST node copy function: See 7.9.
TBST node structure: See 7.2.
TBST print function: See 7.12.
TBST search function: See 7.5.
TBST table structure: See 7.2.
TBST test function: See 7.12.
TBST traversal functions: See 7.8.
TBST traverser advance function: See 7.8.7.
TBST traverser back up function: See 7.8.8.
TBST traverser copy initializer: See 7.8.6.
TBST traverser first initializer: See 7.8.2.
TBST traverser insertion initializer: See 7.8.5.
TBST traverser last initializer: See 7.8.3.
TBST traverser null initializer: See 7.8.1.
TBST traverser search initializer: See 7.8.4.
TBST traverser structure: See 7.8.
TBST tree-to-vine function: See 7.11.1.
TBST verify function: See 7.12.
TBST vine compression function: See 7.11.2.
TBST vine-to-tree function: See 7.11.2.
tbst-test.c: See 7.12.
tbst.c: See 7.
tbst.h: See 7.
tbst_balance function: See 7.11.
tbst_copy function: See 7.9.
tbst_create function: See 7.4.
tbst_delete function: See 7.7.
tbst_destroy function: See 7.10.
tbst_find function: See 7.5.
TBST_H macro: See 7.
tbst_link structure: See ``Chapter 7''.
tbst_node structure <1>: See ``Chapter 7''.
tbst_node structure: See 7.2.
tbst_probe function: See 7.6.
tbst_t_copy function: See 7.8.6.
tbst_t_find function: See 7.8.4.
tbst_t_first function: See 7.8.2.
tbst_t_init function: See 7.8.1.
tbst_t_insert function: See 7.8.5.
tbst_t_last function: See 7.8.3.
tbst_t_next function: See 7.8.7.
tbst_t_prev function: See 7.8.8.
tbst_table structure <1>: See ``Chapter 7''.
tbst_table structure: See 7.2.
tbst_tag enumeration: See 7.2.
tbst_traverser structure: See 7.8.
test BST traversal during modifications: See 4.14.1.
test creating a BST and inserting into it: See 4.14.1.
test declarations <1>: See 4.14.7.
test declarations <2>: See 4.14.5.
test declarations <3>: See 4.14.4.
test declarations: See 4.14.2.
test deleting from an empty tree: See 4.14.1.
test deleting nodes from the BST and making copies of it: See 4.14.1.
test destroying the tree: See 4.14.1.
test enumeration: See 4.14.7.
test main program: See 4.14.7.
test prototypes <1>: See 4.14.6.
test prototypes <2>: See 4.14.3.
test prototypes: See 4.14.1.
test TBST balancing: See 7.12.
test utility functions: See 4.14.6.
test.c: See 4.14.
test.h: See 4.14.
test_bst_copy function: See ``Chapter 4''.
test_bst_t_find function: See ``Chapter 4''.
test_bst_t_first function: See 4.14.3.
test_bst_t_insert function: See ``Chapter 4''.
test_bst_t_last function: See ``Chapter 4''.
test_bst_t_next function: See ``Chapter 4''.
test_bst_t_prev function: See ``Chapter 4''.
test_correctness function <1>: See ``Chapter 4''.
test_correctness function <2>: See 7.12.
test_correctness function: See 4.14.1.
TEST_H macro: See 4.14.
test_options enumeration: See 4.14.7.
test_overflow function <1>: See ``Chapter 4''.
test_overflow function: See 4.14.3.
time_seed function: See ``Chapter 4''.
time_successful_search function: See ``Chapter 3''.
time_unsuccessful_search function: See ``Chapter 3''.
timer functions: See ``Chapter 3''.
total_length function: See 4.9.2.1.
transform left-side PRB deletion rebalancing case 3 into case 2: See 15.4.2.
transform left-side RB deletion rebalancing case 3 into case 2: See 6.5.2.
transform left-side RTRB deletion rebalancing case 3 into case 2: See 12.4.2.
transform left-side TRB deletion rebalancing case 3 into case 2: See 9.4.3.
transform right-side PRB deletion rebalancing case 3 into case 2: See 15.4.4.
transform right-side RB deletion rebalancing case 3 into case 2: See 6.5.4.
transform right-side RTRB deletion rebalancing case 3 into case 2: See 12.4.2.
transform right-side TRB deletion rebalancing case 3 into case 2: See 9.4.5.
trav_refresh function <1>: See ``Chapter 4''.
trav_refresh function: See 4.9.3.
traverse_iterative function <1>: See ``Chapter 4''.
traverse_iterative function: See 4.9.2.
traverse_recursive function: See 4.9.1.
traverser constructor function prototypes: See 2.10.1.
traverser manipulator function prototypes: See 2.10.2.
traverser structure: See 4.9.2.1.
TRB functions: See 9.2.
TRB item deletion function: See 9.4.
TRB item deletion function, without stack: See ``Chapter 9''.
TRB item insertion function: See 9.3.
TRB item insertion function, without stack: See ``Chapter 9''.
TRB node structure: See 9.1.
trb-test.c: See 9.5.
trb.c: See 9.
trb.h: See 9.
trb_color enumeration: See 9.1.
trb_delete function <1>: See ``Chapter 9''.
trb_delete function: See 9.4.
TRB_H macro: See 9.
trb_node structure: See 9.1.
trb_probe function <1>: See ``Chapter 9''.
trb_probe function: See 9.3.
trb_tag enumeration: See 9.1.
tree_to_vine function <1>: See 10.9.
tree_to_vine function <2>: See 7.11.1.
tree_to_vine function: See 4.12.1.
uniform binary search of ordered array: See ``Chapter 3''.
update balance factors after AVL insertion: See 5.4.3.
update balance factors after AVL insertion, with bitmasks: See ``Chapter 5''.
update balance factors after PAVL insertion: See 14.4.2.
update balance factors and rebalance after AVL deletion: See 5.5.3.
update balance factors and rebalance after PAVL deletion: See 14.5.2.
update balance factors and rebalance after RTAVL deletion: See 11.5.3.
update balance factors and rebalance after TAVL deletion: See 8.5.3.
update balance factors and rebalance after TAVL deletion, with stack: See ``Chapter 8''.
update parent pointers function: See 13.7.
update y's balance factor after left-side AVL deletion: See 5.5.3.
update y's balance factor after right-side AVL deletion: See 5.5.6.
update_parents function: See 13.7.
usage function <1>: See ``Chapter 3''.
usage function: See ``B.2 Command-Line Parser''.
usage printer for search test program: See ``Chapter 3''.
verify AVL node balance factor: See 5.8.
verify binary search tree ordering: See 4.14.1.1.
verify PBST node parent pointers: See 13.8.
verify RB node color: See 6.6.
verify RB node rule 1 compliance: See 6.6.
verify RB node rule 2 compliance: See 6.6.
verify RTRB node rule 1 compliance: See 12.5.
verify TRB node rule 1 compliance: See 9.5.
verify_tree function <1>: See 7.12.
verify_tree function <2>: See 6.6.
verify_tree function <3>: See 5.8.
verify_tree function: See 4.14.1.1.
vine to balanced BST function: See 4.12.2.2.
vine to balanced PBST function: See 13.7.
vine to balanced PBST function, with parent updates: See ``Chapter 13''.
vine_to_tree function <1>: See ``Chapter 13''.
vine_to_tree function <2>: See 13.7.
vine_to_tree function: See 7.11.2.
walk function <1>: See ``Chapter 4''.
walk function: See 4.9.1.
xmalloc function: See 4.14.6.