Book Information

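数据库系统实现 (Database System Implementation), English Edition
  • Authors: Hector Garcia-Molina, Jennifer Widom, Jeffrey D. Ullman (USA)
  • Publisher: Beijing: China Machine Press
  • ISBN: 9787111288602
  • Publication year: 2010
  • Stated page count: 1184 pages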
  • Subject: Database systems (English)

Table of Contents

1 The Worlds of Database Systems1

1.1 The Evolution of Database Systems1

1.1.1 Early Database Management Systems2

1.1.2 Relational Database Systems3

1.1.3 Smaller and Smaller Systems3

1.1.4 Bigger and Bigger Systems4

1.1.5 Information Integration4

1.2 Overview of a Database Management System5

1.2.1 Data-Definition Language Commands5

1.2.2 Overview of Query Processing5

1.2.3 Storage and Buffer Management7

1.2.4 Transaction Processing8

1.2.5 The Query Processor9

1.3 Outline of Database-System Studies10

1.4 References for Chapter 112

Ⅰ Relational Database Modeling15

2 The Relational Model of Data17

2.1 An Overview of Data Models17

2.1.1 What is a Data Model?17

2.1.2 Important Data Models18

2.1.3 The Relational Model in Brief18

2.1.4 The Semistructured Model in Brief19

2.1.5 Other Data Models20

2.1.6 Comparison of Modeling Approaches21

2.2 Basics of the Relational Model21

2.2.1 Attributes22

2.2.2 Schemas22

2.2.3 Tuples22

2.2.4 Domains23

2.2.5 Equivalent Representations of a Relation23

2.2.6 Relation Instances24

2.2.7 Keys of Relations25

2.2.8 An Example Database Schema26

2.2.9 Exercises for Section 2.228

2.3 Defining a Relation Schema in SQL29

2.3.1 Relations in SQL29

2.3.2 Data Types30

2.3.3 Simple Table Declarations31

2.3.4 Modifying Relation Schemas33

2.3.5 Default Values34

2.3.6 Declaring Keys34

2.3.7 Exercises for Section 2.336

2.4 An Algebraic Query Language38

2.4.1 Why Do We Need a Special Query Language?38

2.4.2 What is an Algebra?38

2.4.3 Overview of Relational Algebra39

2.4.4 Set Operations on Relations39

2.4.5 Projection41

2.4.6 Selection42

2.4.7 Cartesian Product43

2.4.8 Natural Joins43

2.4.9 Theta-Joins45

2.4.10 Combining Operations to Form Queries47

2.4.11 Naming and Renaming49

2.4.12 Relationships Among Operations50

2.4.13 A Linear Notation for Algebraic Expressions51

2.4.14 Exercises for Section 2.452

2.5 Constraints on Relations58

2.5.1 Relational Algebra as a Constraint Language59

2.5.2 Referential Integrity Constraints59

2.5.3 Key Constraints60

2.5.4 Additional Constraint Examples61

2.5.5 Exercises for Section 2.562

2.6 Summary of Chapter 263

2.7 References for Chapter 265

3 Design Theory for Relational Databases67

3.1 Functional Dependencies67

3.1.1 Definition of Functional Dependency68

3.1.2 Keys of Relations70

3.1.3 Superkeys71

3.1.4 Exercises for Section 3.171

3.2 Rules About Functional Dependencies72

3.2.1 Reasoning About Functional Dependencies72

3.2.2 The Splitting/Combining Rule73

3.2.3 Trivial Functional Dependencies74

3.2.4 Computing the Closure of Attributes75

3.2.5 Why the Closure Algorithm Works77

3.2.6 The Transitive Rule79

3.2.7 Closing Sets of Functional Dependencies80

3.2.8 Projecting Functional Dependencies81

3.2.9 Exercises for Section 3.283

3.3 Design of Relational Database Schemas85

3.3.1 Anomalies86

3.3.2 Decomposing Relations86

3.3.3 Boyce-Codd Normal Form88

3.3.4 Decomposition into BCNF89

3.3.5 Exercises for Section 3.392

3.4 Decomposition:The Good,Bad,and Ugly93

3.4.1 Recovering Information from a Decomposition94

3.4.2 The Chase Test for Lossless Join96

3.4.3 Why the Chase Works99

3.4.4 Dependency Preservation100

3.4.5 Exercises for Section 3.4102

3.5 Third Normal Form102

3.5.1 Definition of Third Normal Form102

3.5.2 The Synthesis Algorithm for 3NF Schemas103

3.5.3 Why the 3NF Synthesis Algorithm Works104

3.5.4 Exercises for Section 3.5105

3.6 Multivalued Dependencies105

3.6.1 Attribute Independence and Its Consequent Redundancy106

3.6.2 Definition of Multivalued Dependencies107

3.6.3 Reasoning About Multivalued Dependencies108

3.6.4 Fourth Normal Form110

3.6.5 Decomposition into Fourth Normal Form111

3.6.6 Relationships Among Normal Forms113

3.6.7 Exercises for Section 3.6113

3.7 An Algorithm for Discovering MVD's115

3.7.1 The Closure and the Chase115

3.7.2 Extending the Chase to MVD's116

3.7.3 Why the Chase Works for MVD's118

3.7.4 Projecting MVD's119

3.7.5 Exercises for Section 3.7120

3.8 Summary of Chapter 3121

3.9 References for Chapter 3122

4 High-Level Database Models125

4.1 The Entity/Relationship Model126

4.1.1 Entity Sets126

4.1.2 Attributes126

4.1.3 Relationships127

4.1.4 Entity-Relationship Diagrams127

4.1.5 Instances of an E/R Diagram128

4.1.6 Multiplicity of Binary E/R Relationships129

4.1.7 Multiway Relationships130

4.1.8 Roles in Relationships131

4.1.9 Attributes on Relationships134

4.1.10 Converting Multiway Relationships to Binary134

4.1.11 Subclasses in the E/R Model135

4.1.12 Exercises for Section 4.1138

4.2 Design Principles140

4.2.1 Faithfulness140

4.2.2 Avoiding Redundancy141

4.2.3 Simplicity Counts142

4.2.4 Choosing the Right Relationships142

4.2.5 Picking the Right Kind of Element144

4.2.6 Exercises for Section 4.2145

4.3 Constraints in the E/R Model148

4.3.1 Keys in the E/R Model148

4.3.2 Representing Keys in the E/R Model149

4.3.3 Referential Integrity150

4.3.4 Degree Constraints151

4.3.5 Exercises for Section 4.3151

4.4 weak Entity Sets152

4.4.1 Causes of Weak Entity Sets152

4.4.2 Requirements for Weak Entity Sets153

4.4.3 Weak Entity Set Notation155

4.4.4 Exercises for Section 4.4156

4.5 From E/R Diagrams to Relational Designs157

4.5.1 From Entity Sets to Relations157

4.5.2 From E/R Relationships to Relations158

4.5.3 Combining Relations160

4.5.4 Handling Weak Entity Sets161

4.5.5 Exercises for Section 4.5163

4.6 Converting Subclass Structures to Relations165

4.6.1 E/R-Style Conversion166

4.6.2 An Object-Oriented Approach167

4.6.3 Using Null Values to Combine Relations168

4.6.4 Comparison of Approaches169

4.6.5 Exercises for Section 4.6171

4.7 Unified Modeling Language171

4.7.1 UML Classes172

4.7.2 Keys for UML classes173

4.7.3 Associations173

4.7.4 Self-Associations175

4.7.5 Association Classes175

4.7.6 Subclasses in UML176

4.7.7 Aggregations and Compositions177

4.7.8 Exercises for Section 4.7179

4.8 From UML Diagrams to Relations179

4.8.1 UML-to-Relations Basics179

4.8.2 From UML Subclasses to Relations180

4.8.3 From Aggregations and Compositions to Relations181

4.8.4 The UML Analog of Weak Entity Sets181

4.8.5 Exercises for Section 4.8183

4.9 Object Definition Language183

4.9.1 Class Declarations184

4.9.2 Attributes in ODL184

4.9.3 Relationships in ODL185

4.9.4 Inverse Relationships186

4.9.5 Multiplicity of Relationships186

4.9.6 Types in ODL188

4.9.7 Subclasses in ODL190

4.9.8 Declaring Keys in ODL191

4.9.9 Exercises for Section 4.9192

4.10 From ODL Designs to Relational Designs193

4.10.1 From ODL Classes to Relations193

4.10.2 Complex Attributes in Classes194

4.10.3 Representing Set-Valued Attributes195

4.10.4 Representing Other Type Constructors196

4.10.5 Representing ODL Relationships198

4.10.6 Exercises for Section 4.10198

4.11 Summary of Chapter 4200

4.12 References for Chapter 4202

Ⅱ Relational Database Programming203

5 Algebraic and Logical Query Languages205

5.1 Relational Operations on Bags205

5.1.1 Why Bags?206

5.1.2 Union,Intersection,and Difference of Bags207

5.1.3 Projection of Bags208

5.1.4 Selection on Bags209

5.1.5 Product of Bags210

5.1.6 Joins of Bags210

5.1.7 Exercises for Section 5.1212

5.2 Extended Operators of Relational Algebra213

5.2.1 Duplicate Elimination214

5.2.2 Aggregation Operators214

5.2.3 Grouping215

5.2.4 The Grouping Operator216

5.2.5 Extending the Projection Operator217

5.2.6 The Sorting Operator219

5.2.7 Outerjoins219

5.2.8 Exercises for Section 5.2222

5.3 A Logic for Relations222

5.3.1 Predicates and Atoms223

5.3.2 Arithmetic Atoms223

5.3.3 Datalog Rules and Queries224

5.3.4 Meaning of Datalog Rules225

5.3.5 Extensional and Intensional Predicates228

5.3.6 Datalog Rules Applied to Bags228

5.3.7 Exercises for Section 5.3230

5.4 Relational Algebra and Datalog230

5.4.1 Boolean Operations231

5.4.2 Projection232

5.4.3 Selection232

5.4.4 Product235

5.4.5 Joins235

5.4.6 Simulating Multiple Operations with Datalog236

5.4.7 Comparison Between Datalog and Relational Algebra238

5.4.8 Exercises for Section 5.4238

5.5 Summary of Chapter 5240

5.6 References for Chapter 5241

6 The Database Language SQL243

6.1 Simple Queries in SQL244

6.1.1 Projection in SQL246

6.1.2 Selection in SQL248

6.1.3 Comparison of Strings250

6.1.4 Pattern Matching in SQL250

6.1.5 Dates and Times251

6.1.6 Null Values and Comparisons Involving NULL252

6.1.7 The Truth-Value UNKNOWN253

6.1.8 Ordering the Output255

6.1.9 Exercises for Section 6.1256

6.2 Queries Involving More Than One Relation258

6.2.1 Products and Joins in SQL259

6.2.2 Disambiguating Attributes260

6.2.3 Tuple Variables261

6.2.4 Interpreting Multirelation Queries262

6.2.5 Union,Intersection,and Difference of Queries265

6.2.6 Exercises for Section 6.2267

6.3 Subqueries268

6.3.1 Subqueries that Produce Scalar Values269

6.3.2 Conditions Involving Relations270

6.3.3 Conditions Involving Tuples271

6.3.4 Correlated Subqueries273

6.3.5 Subqueries in FROM Clauses274

6.3.6 SQL Join Expressions275

6.3.7 Natural Joins276

6.3.8 Outerjoins277

6.3.9 Exercises for Section 6.3279

6.4 Full-Relation Operations281

6.4.1 Eliminating Duplicates281

6.4.2 Duplicates in Unions,Intersections,and Differences282

6.4.3 Grouping and Aggregation in SQL283

6.4.4 Aggregation Operators284

6.4.5 Grouping285

6.4.6 Grouping,Aggregation,and Nulls287

6.4.7 HAVING Clauses288

6.4.8 Exercises for Section 6.4289

6.5 Database Modifications291

6.5.1 Insertion291

6.5.2 Deletion292

6.5.3 Updates294

6.5.4 Exercises for Section 6.5295

6.6 Transactions in SQL296

6.6.1 Serializability296

6.6.2 Atomicity298

6.6.3 Transactions299

6.6.4 Read-Only Transactions300

6.6.5 Dirty Reads302

6.6.6 Other Isolation Levels304

6.6.7 Exercises for Section 6.6306

6.7 Summary of Chapter 6307

6.8 References for Chapter 6308

7 Constraints and Triggers311

7.1 Keys and Foreign Keys311

7.1.1 Declaring Foreign-Key Constraints312

7.1.2 Maintaining Referential Integrity313

7.1.3 Deferred Checking of Constraints315

7.1.4 Exercises for Section 7.1318

7.2 Constraints on Attributes and Tuples319

7.2.1 Not-Null Constraints319

7.2.2 Attribute-Based CHECK Constraints320

7.2.3 Tuple-Based CHECK Constraints321

7.2.4 Comparison of Tuple-and Attribute-Based Constraints323

7.2.5 Exercises for Section 7.2323

7.3 Modification of Constraints325

7.3.1 Giving Names to Constraints325

7.3.2 Altering Constraints on Tables326

7.3.3 Exercises for Section 7.3327

7.4 Assertions328

7.4.1 Creating Assertions328

7.4.2 Using Assertions329

7.4.3 Exercises for Section 7.4330

7.5 Triggers332

7.5.1 Triggers in SQL332

7.5.2 The Options for Trigger Design334

7.5.3 Exercises for Section 7.5337

7.6 Summary of Chapter 7339

7.7 References for Chapter 7339

8 Views and Indexes341

8.1 Virtual Views341

8.1.1 Declaring Views341

8.1.2 Querying Views343

8.1.3 Renaming Attributes343

8.1.4 Exercises for Section 8.1344

8.2 Modifying Views344

8.2.1 View Removal345

8.2.2 Updatable Views345

8.2.3 Instead-Of Triggers on Views347

8.2.4 Exercises for Section 8.2349

8.3 Indexes in SQL350

8.3.1 Motivation for Indexes350

8.3.2 Declaring Indexes351

8.3.3 Exercises for Section 8.3352

8.4 Selection of Indexes352

8.4.1 A Simple Cost Model352

8.4.2 Some Useful Indexes353

8.4.3 Calculating the Best Indexes to Create355

8.4.4 Automatic Selection of Indexes to Create357

8.4.5 Exercises for Section 8.4359

8.5 Materialized Views359

8.5.1 Maintaining a Materialized View360

8.5.2 Periodic Maintenance of Materialized Views362

8.5.3 Rewriting Queries to Use Materialized Views362

8.5.4 Automatic Creation of Materialized Views364

8.5.5 Exercises for Section 8.5365

8.6 Summary of Chapter 8366

8.7 References for Chapter 8367

9 SQL in a Server Environment369

9.1 The Three-Tier Architecture369

9.1.1 The Web-Server Tier370

9.1.2 The Application Tier371

9.1.3 The Database Tier372

9.2 The SQL Environment372

9.2.1 Environments373

9.2.2 Schemas374

9.2.3 Catalogs375

9.2.4 Clients and Servers in the SQL Environment375

9.2.5 Connections376

9.2.6 Sessions377

9.2.7 Modules378

9.3 The SQL/Host-Language Interface378

9.3.1 The Impedance Mismatch Problem380

9.3.2 Connecting SQL to the Host Language380

9.3.3 The DECLARE Section381

9.3.4 Using Shared Variables382

9.3.5 Single-Row Select Statements383

9.3.6 Cursors383

9.3.7 Modifications by Cursor386

9.3.8 Protecting Against Concurrent Updates387

9.3.9 Dynamic SQL388

9.3.10 Exercises for Section 9.3390

9.4 Stored Procedures391

9.4.1 Creating PSM Functions and Procedures391

9.4.2 Some Simple Statement Forms in PSM392

9.4.3 Branching Statements394

9.4.4 Queries in PSM395

9.4.5 Loops in PSM396

9.4.6 For-Loops398

9.4.7 Exceptions in PSM400

9.4.8 Using PSM Functions and Procedures402

9.4.9 Exercises for Section 9.4402

9.5 Using a call-Level Interface404

9.5.1 Introduction to SQL/CLI405

9.5.2 Processing Statements407

9.5.3 Fetching Data From a Query Result408

9.5.4 Passing Parameters to Queries410

9.5.5 Exercises for Section 9.5412

9.6 JDBC412

9.6.1 Introduction to JDBC412

9.6.2 Creating Statements in JDBC413

9.6.3 Cursor Operations in JDBC415

9.6.4 Parameter Passing416

9.6.5 Exercises for Section 9.6416

9.7 PHP416

9.7.1 PHP Basics417

9.7.2 Arrays418

9.7.3 The PEAR DB Library419

9.7.4 Creating a Database Connection Using DB419

9.7.5 Executing SQL Statements419

9.7.6 Cursor Operations in PHP420

9.7.7 Dynamic SQL in PHP421

9.7.8 Exercises for Section 9.7422

9.8 Summary of Chapter 9422

9.9 References for Chapter 9423

10 Advanced Topics in Relational Databases425

10.1 Security and User Authorization in SQL425

10.1.1 Privileges426

10.1.2 Creating Privileges427

10.1.3 The Privilege-Checking Process428

10.1.4 Granting Privileges430

10.1.5 Grant Diagrams431

10.1.6 Revoking Privileges433

10.1.7 Exercises for Section 10.1436

10.2 Recursion in SQL437

10.2.1 Defining Recursive Relations in SQL437

10.2.2 Problematic Expressions in Recursive SQL440

10.2.3 Exercises for Section 10.2443

10.3 The Object-Relational Model445

10.3.1 From Relations to Object-Relations445

10.3.2 Nested Relations446

10.3.3 References447

10.3.4 Object-Oriented Versus Object-Relational449

10.3.5 Exercises for Section 10.3450

10.4 User-Defined Types in SQL451

10.4.1 Defining Types in SQL451

10.4.2 Method Declarations in UDT's452

10.4.3 Method Definitions453

10.4.4 Declaring Relations with a UDT454

10.4.5 References454

10.4.6 Creating Object ID's for Tables455

10.4.7 Exercises for Section 10.4457

10.5 Operations on Object-Relational Data457

10.5.1 Following References457

10.5.2 Accessing Components of Tuples with a UDT458

10.5.3 Generator and Mutator Functions460

10.5.4 Ordering Relationships on UDT's461

10.5.5 Exercises for Section 10.5463

10.6 On-Line Analytic Processing464

10.6.1 OLAP and Data Warehouses465

10.6.2 OLAP Applications465

10.6.3 A Multidimensional View of OLAP Data466

10.6.4 Star Schemas467

10.6.5 Slicing and Dicing469

10.6.6 Exercises for Section 10.6472

10.7 Data Cubes473

10.7.1 The Cube Operator473

10.7.2 The Cube Operator in SQL475

10.7.3 Exercises for Section 10.7477

10.8 Summary of Chapter 10478

10.9 References for Chapter 10480

Ⅲ Modeling and Programming for semistructured Data481

11 The Semistructured-Data Model483

11.1 Semistructured Data483

11.1.1 Motivation for the Semistructured-Data Model483

11.1.2 Semistructured Data Representation484

11.1.3 Information Integration Via Semistructured Data486

11.1.4 Exercises for Section 11.1487

11.2 XML488

11.2.1 Semantic Tags488

11.2.2 XML With and Without a Schema489

11.2.3 Well-Formed XML489

11.2.4 Attributes490

11.2.5 Attributes That Connect Elements491

11.2.6 Namespaces493

11.2.7 XML and Databases493

11.2.8 Exercises for Section 11.2495

11.3 Document Type Definitions495

11.3.1 The Form of a DTD495

11.3.2 Using a DTD499

11.3.3 Attribute Lists499

11.3.4 Identifiers and References500

11.3.5 Exercises for Section 11.3502

11.4 XML Schema502

11.4.1 The Form of an XML Schema502

11.4.2 Elements503

11.4.3 Complex Types504

11.4.4 Attributes506

11.4.5 Restricted Simple Types507

11.4.6 Keys in XML Schema509

11.4.7 Foreign Keys in XML Schema510

11.4.8 Exercises for Section 11.4512

11.5 Summary of Chapter 11514

11.6 References for Chapter 11515

12 Programming Languages for XML517

12.1 XPath517

12.1.1 The XPath Data Model518

12.1.2 Document Nodes519

12.1.3 Path Expressions519

12.1.4 Relative Path Expressions521

12.1.5 Attributes in Path Expressions521

12.1.6 Axes521

12.1.7 Context of Expressions522

12.1.8 Wildcards523

12.1.9 Conditions in Path Expressions523

12.1.10 Exercises for Section 12.1526

12.2 XQuery528

12.2.1 XQuery Basics530

12.2.2 FLWR Expressions530

12.2.3 Replacement of Variables by Their Values534

12.2.4 Joins in XQuery536

12.2.5 XQuery Comparison Operators537

12.2.6 Elimination of Duplicates538

12.2.7 Quantification in XQuery539

12.2.8 Aggregations540

12.2.9 Branching in XQuery Expressions540

12.2.10 Ordering the Result of a Query541

12.2.11 Exercises for Section 12.2543

12.3 Extensible Stylesheet Language544

12.3.1 XSLT Basics544

12.3.2 Templates544

12.3.3 Obtaining Values From XML Data545

12.3.4 Recursive Use of Templates546

12.3.5 Iteration in XSLT549

12.3.6 Conditionals in XSLT551

12.3.7 Exercises for Section 12.3551

12.4 Summary of Chapter 12553

12.5 References for Chapter 12554

Ⅳ Database System Implementation555

13 Secondary Storage Management557

13.1 The Memory Hierarchy557

13.1.1 The Memory Hierarchy557

13.1.2 Transfer of Data Between Levels560

13.1.3 Volatile and Nonvolatile Storage560

13.1.4 Virtual Memory560

13.1.5 Exercises for Section 13.1561

13.2 Disks562

13.2.1 Mechanics of Disks562

13.2.2 The Disk Controller564

13.2.3 Disk Access Characteristics564

13.2.4 Exercises for Section 13.2567

13.3 Accelerating Access to Secondary Storage568

13.3.1 The I/O Model of Computation568

13.3.2 Organizing Data by Cylinders569

13.3.3 Using Multiple Disks570

13.3.4 Mirroring Disks571

13.3.5 Disk Scheduling and the Elevator Algorithm571

13.3.6 Prefetching and Large-Scale Buffering573

13.3.7 Exercises for Section 13.3573

13.4 Disk Failures575

13.4.1 Intermittent Failures576

13.4.2 Checksums576

13.4.3 Stable Storage577

13.4.4 Error-Handling Capabilities of Stable Storage578

13.4.5 Recovery from Disk Crashes578

13.4.6 Mirroring as a Redundancy Technique579

13.4.7 Parity Blocks580

13.4.8 An Improvement:RAID 5583

13.4.9 Coping With Multiple Disk Crashes584

13.4.10 Exercises for Section 13.4587

13.5 Arranging Data on Disk590

13.5.1 Fixed-Length Records590

13.5.2 Packing Fixed-Length Records into Blocks592

13.5.3 Exercises for Section 13.5593

13.6 Representing Block and Record Addresses593

13.6.1 Addresses in Client-Server Systems593

13.6.2 Logical and Structured Addresses595

13.6.3 Pointer Swizzling596

13.6.4 Returning Blocks to Disk600

13.6.5 Pinned Records and Blocks600

13.6.6 Exercises for Section 13.6602

13.7 Variable-Length Data and Records603

13.7.1 Records With Variable-Length Fields604

13.7.2 Records With Repeating Fields605

13.7.3 Variable-Format Records607

13.7.4 Records That Do Not Fit in a Block608

13.7.5 BLOBs608

13.7.6 Column Stores609

13.7.7 Exercises for Section 13.7610

13.8 Record Modifications612

13.8.1 Insertion612

13.8.2 Deletion614

13.8.3 Update615

13.8.4 Exercises for Section 13.8615

13.9 Summary of Chapter 13615

13.10 References for Chapter 13617

14 Index Structures619

14.1 Index-Structure Basics620

14.1.1 Sequential Files621

14.1.2 Dense Indexes621

14.1.3 Sparse Indexes622

14.1.4 Multiple Levels of Index623

14.1.5 Secondary Indexes624

14.1.6 Applications of Secondary Indexes625

14.1.7 Indirection in Secondary Indexes626

14.1.8 Document Retrieval and Inverted Indexes628

14.1.9 Exercises for Section 14.1631

14.2 B-Trees633

14.2.1 The Structure of B-trees634

14.2.2 Applications of B-trees637

14.2.3 Lookup in B-Trees639

14.2.4 Range Queries639

14.2.5 Insertion Into B-Trees640

14.2.6 Deletion From B-Trees642

14.2.7 Efficiency of B-Trees645

14.2.8 Exercises for Section 14.2646

14.3 Hash Tables648

14.3.1 Secondary-Storage Hash Tables649

14.3.2 Insertion Into a Hash Table649

14.3.3 Hash-Table Deletion650

14.3.4 Efficiency of Hash Table Indexes651

14.3.5 Extensible Hash Tables652

14.3.6 Insertion Into Extensible Hash Tables653

14.3.7 Linear Hash Tables655

14.3.8 Insertion Into Linear Hash Tables657

14.3.9 Exercises for Section 14.3659

14.4 Multidimensional Indexes661

14.4.1 Applications of Multidimensional Indexes661

14.4.2 Executing Range Queries Using Conventional Indexes663

14.4.3 Executing Nearest-Neighbor Queries Using Conventional Indexes664

14.4.4 Overview of Multidimensional Index Structures664

14.5 Hash Structures for Multidimensional Data665

14.5.1 Grid Files665

14.5.2 Lookup in a Grid File666

14.5.3 Insertion Into Grid Files667

14.5.4 Performance of Grid Files669

14.5.5 Partitioned Hash Functions671

14.5.6 Comparison of Grid Files and Partitioned Hashing673

14.5.7 Exercises for Section 14.5673

14.6 Tree Structures for Multidimensional Data675

14.6.1 Multiple-Key Indexes675

14.6.2 Performance of Multiple-Key Indexes676

14.6.3 kd-Trees677

14.6.4 Operations on kd-Trees679

14.6.5 Adapting kd-Trees to Secondary Storage681

14.6.6 Quad Trees681

14.6.7 R-Trees683

14.6.8 Operations on R-Trees684

14.6.9 Exercises for Section 14.6686

14.7 Bitmap Indexes688

14.7.1 Motivation for Bitmap Indexes689

14.7.2 Compressed Bitmaps691

14.7.3 Operating on Run-Length-Encoded Bit-Vectors693

14.7.4 Managing Bitmap Indexes693

14.7.5 Exercises for Section 14.7695

14.8 Summary of Chapter 14695

14.9 References for Chapter 14697

15 Query Execution701

15.1 Introduction to Physical-Query-Plan Operators703

15.1.1 Scanning Tables703

15.1.2 Sorting While Scanning Tables704

15.1.3 The Computation Model for Physical Operators704

15.1.4 Parameters for Measuring Costs705

15.1.5 I/O Cost for Scan Operators706

15.1.6 Iterators for Implementation of Physical Operators707

15.2 One-Pass Algorithms709

15.2.1 Ohe-Pass Algorithms for Tuple-at-a-Time Operations711

15.2.2 One-Pass Algorithms for Unary,Full-Relation Operations712

15.2.3 One-Pass Algorithms for Binary Operations715

15.2.4 Exercises for Section 15.2718

15.3 Nested-Loop Joins718

15.3.1 Tuple-Based Nested-Loop Join719

15.3.2 An Iterator for Tuple-Based Nested-Loop Join719

15.3.3 Block-Based Nested-Loop Join Algorithm719

15.3.4 Analysis of Nested-Loop Join721

15.3.5 Summary of Algorithms so Far722

15.3.6 Exercises for Section 15.3722

15.4 Two-Pass Algorithms Based on Sorting723

15.4.1 Two-Phase,Multiway Merge-Sort723

15.4.2 Duplicate Elimination Using Sorting725

15.4.3 Grouping and Aggregation Using Sorting726

15.4.4 A Sort-Based Union Algorithm726

15.4.5 Sort-Based Intersection and Diffefence727

15.4.6 A Simple Sort-Based Join Algorithm728

15.4.7 Analysis of Simple Sort-Join729

15.4.8 A More Efficient Sort-Based Join729

15.4.9 Summary of Sort-Based Algorithms730

15.4.10 Exercises for Section 15.4730

15.5 Two-Pass Algorithms Based on Hashing732

15.5.1 Partitioning Relations by Hashing732

15.5.2 A Hash-Based Algorithm for Duplicate Elimination732

15.5.3 Hash-Based Grouping and Aggregation733

15.5.4 Hash-Based Union,Intersection,and Difference734

15.5.5 The Hash-Join Algorithm734

15.5.6 Saving Some Disk I/O's735

15.5.7 Summary of Hash-Based Algorithms737

15.5.8 Exercises for Section 15.5738

15.6 Index-Based Algorithms739

15.6.1 Clustering and Nonclustering Indexes739

15.6.2 Index-Based Selection740

15.6.3 Joining by Using an Index742

15.6.4 Joins Using a Sorted Index743

15.6.5 Exercises for Section 15.6745

15.7 Buffer Management746

15.7.1 Buffer Management Architecture746

15.7.2 Buffer Management Strategies747

15.7.3 The Relationship Between Physical Operator Selection and Buffer Management750

15.7.4 Exercises for Section 15.7751

15.8 Algorithms Using More Than Two Passes752

15.8.1 Multipass Sort-Based Algorithms752

15.8.2 Performance of Multipass,Sort-Based Algorithms753

15.8.3 Multipass Hash-Based Algorithms754

15.8.4 Performance of Multipass Hash-Based Algorithms754

15.8.5 Exercises for Section 15.8755

15.9 Summary of Chapter 15756

15.10 References for Chapter 15757

16 The Query Compiler759

16.1 Parsing and Preprocessing760

16.1.1 Syntax Analysis and Parse Trees760

16.1.2 A Grammar for a Simple Subset of SQL761

16.1.3 The Preprocessor764

16.1.4 Preprocessing Queries Involving Views765

16.1.5 Exercises for Section 16.1767

16.2 Algebraic Laws for Improving Query Plans768

16.2.1 Commutative and Associative Laws768

16.2.2 Laws Involving Selection770

16.2.3 Pushing Selections772

16.2.4 Laws Involving Projection774

16.2.5 Laws About Joins and Products776

16.2.6 Laws Involving Duplicate Elimination777

16.2.7 Laws Involving Grouping and Aggregation777

16.2.8 Exercises for Section 16.2780

16.3 From Parse Trees to Logical Query Plans781

16.3.1 Conversion to Relational Algebra782

16.3.2 Removing Subqueries From Conditions783

16.3.3 Improving the Logical Query Plan788

16.3.4 Grouping Associative/Commutative Operators790

16.3.5 Exercises for Section 16.3791

16.4 Estimating the Cost of Operations792

16.4.1 Estimating Sizes of Intermediate Relations793

16.4.2 Estimating the Size of a Projection794

16.4.3 Estimating the Size of a Selection794

16.4.4 Estimating the Size of a Join797

16.4.5 Natural Joins With Multiple Join Attributes799

16.4.6 Joins of Many Relations800

16.4.7 Estimating Sizes for Other Operations801

16.4.8 Exercises for Section 16.4802

16.5 Introduction to Cost-Based Plan Selection803

16.5.1 Obtaining Estimates for Size Parameters804

16.5.2 Computation of Statistics807

16.5.3 Heuristics for Reducing the Cost of Logical Query Plans808

16.5.4 Approaches to Enumerating Physical Plans810

16.5.5 Exercises for Section 16.5813

16.6 Choosing an Order for Joins814

16.6.1 Significance of Left and Right Join Arguments815

16.6.2 Join Trees815

16.6.3 Left-Deep Join Trees816

16.6.4 Dynamic Programming to Select a Join Order and Grouping819

16.6.5 Dynamic Programming With More Detailed Cost Functions823

16.6.6 A Greedy Algorithm for Selecting a Join Order824

16.6.7 Exercises for Section 16.6825

16.7 Completing the Physical-Query-Plan826

16.7.1 Choosing a Selection Method827

16.7.2 Choosing a Join Method829

16.7.3 Pipelining Versus Materialization830

16.7 4 Pipelining Unary Operations830

16.7.5 Pipelining Binary Operations830

16.7.6 Notation for Physical Query Plans834

16.7.7 Ordering of Physical Operations837

16.7.8 Exercises for Section 16.7838

16.8 Summary of Chapter 16839

16.9 References for Chapter 16841

17 Coping With System Failures843

17.1 Issues and Models for Resilient Operation843

17.1.1 Failure Modes844

17.1.2 More About Transactions845

17.1.3 Correct Execution of Transactions846

17.1.4 The Primitive Operations of Transactions848

17.1.5 Exercises for Section 17.1851

17.2 Undo Logging851

17.2.1 Log Records851

17.2.2 The Undo-Logging Rules853

17.2.3 Recovery Using Undo Logging855

17.2.4 Checkpointing857

17.2.5 Nonquiescent Checkpointing858

17.2.6 Exercises for Section 17.2862

17.3 Redo Logging863

17.3.1 The Redo-Logging Rule863

17.3.2 Recovery With Redo Logging864

17.3.3 Checkpointing a Redo Log866

17.3.4 Recovery With a Checkpointed Redo Log867

17.3.5 Exercises for Section 17.3868

17.4 Undo/Redo Logging869

17.4.1 The Undo/Redo Rules870

17.4.2 Recovery With Undo/Redo Logging870

17.4.3 Checkpointing an Undo/Redo Log872

17.4.4 Exercises for Section 17.4874

17.5 Protecting Against Media Failures875

17.5.1 The Archive875

17.5.2 Nonquiescent Archiving875

17.5.3 Recovery Using an Archive and Log878

17.5.4 Exercises for Section 17.5879

17.6 Summary of Chapter 17879

17.7 References for Chapter 17881

18 Concurrency Control883

18.1 Serial and Serializable Schedules884

18.1.1 Schedules884

18.1.2 Serial Schedules885

18.1.3 Serializable Schedules886

18.1.4 The Effect of Transaction Semantics887

18.1.5 A Notation for Transactions and Schedules889

18.1.6 Exercises for Section 18.1889

18.2 Conflict-Serializability890

18.2.1 Conflicts890

18.2.2 Precedence Graphs and a Test for Conflict-Serializability892

18.2.3 Why the Precedence-Graph Test Works894

18.2.4 Exercises for Section 18.2895

18.3 Enforcing Serializability by Locks897

18.3.1 Locks898

18.3.2 The Locking Scheduler900

18.3.3 Two-Phase Locking900

18.3.4 Why Two-Phase Locking Works901

18.3.5 Exercises for Section 18.3903

18.4 Locking Systems With Several Lock Modes905

18.4.1 Shared and Exclusive Locks905

18.4.2 Compatibility Matrices907

18.4.3 Upgrading Locks908

18.4.4 Update Locks909

18.4.5 Increment Locks911

18.4.6 Exercises for Section 18.4913

18.5 An Architecture for a Locking Scheduler915

18.5.1 A Scheduler That Inserts Lock Actions915

18.5.2 The Lock Table918

18.5.3 Exercises for Section 18.5921

18.6 Hierarchies of Database Elements921

18.6.1 Locks With Multiple Granularity921

18.6.2 Warning Locks922

18.6.3 Phantoms and Handling Insertions Correctly926

18.6.4 Exercises for Section 18.6927

18.7 The Tree Protocol927

18.7.1 Motivation for Tree-Based Locking927

18.7.2 Rules for Access to Tree-Structured Data928

18.7.3 Why the Tree Protocol Works929

18.7.4 Exercises for Section 18.7932

18.8 Concurrency Control by Timestamps933

18.8.1 Timestamps934

18.8.2 Physically Unrealizable Behaviors934

18.8.3 Problems With Dirty Data935

18.8.4 The Rules for Timestamp-Based Scheduling937

18.8.5 Multiversion Timestamps939

18.8.6 Timestamps Versus Locking941

18.8.7 Exercises for Section 18.8942

18.9 Concurrency Control by Validation942

18.9.1 Architecture of a Validation-Based Scheduler942

18.9.2 The Validation Rules943

18.9.3 Comparison of Three Concurrency-Control Mechanisms946

18.9.4 Exercises for Section 18.9948

18.10 Summary of Chapter 18948

18.11 References for Chapter 18950

19 More About Transaction Management953

19.1 Serializability and Recoverability953

19.1.1 The Dirty-Data Problem954

19.1.2 Cascading Rollback955

19.1.3 Recoverable Schedules956

19.1.4 Schedules That Avoid Cascading Rollback957

19.1.5 Managing Rollbacks Using Locking957

19.1.6 Group Commit959

19.1.7 Logical Logging960

19.1.8 Recovery From Logical Logs963

19.1.9 Exercises for Section 19.1965

19.2 Deadlocks966

19.2.1 Deadlock Detection by Timeout967

19.2.2 The Waits-For Graph967

19.2.3 Deadlock Prevention by Ordering Elements970

19.2.4 Detecting Deadlocks by Timestamps970

19.2.5 Comparison of Deadlock-Management Methods972

19.2.6 Exercises for Section 19.2974

19.3 Long-Duration Transactions975

19.3.1 Problems of Long Transactions976

19.3.2 Sagas978

19.3.3 Compensating Transactions979

19.3.4 Why Compensating Transactions Work980

19.3.5 Exercises for Section 19.3981

19.4 Summary of Chapter 19982

19.5 References for Chapter 19983

20 Parallel and Distributed Databases

20.1 Parallel Algorithms on Relations

20.1.1 Models of Parallelism

20.1.2 Tuple-at-a-Time Operations in Parallel

20.1.3 Parallel Algorithms for Full-Relation Operations

20.1.4 Performance of Parallel Algorithms

20.1.5 Exercises for Section 20.1

20.2 The Map-Reduce Parallelism Framework

20.2.1 The Storage Model

20.2.2 The Map Function

20.2.3 The Reduce Function

20.2.4 Exercises for Section 20.2

20.3 Distributed Databases

20.3.1 Distribution of Data

20.3.2 Distributed Transactions

20.3.3 Data Replication

20.3.4 Exercises for Section 20.3

20.4 Distributed Query Processing

20.4.1 The Distributed Join Problem

20.4.2 Semijoin Reductions

20.4.3 Joins of Many Relations

20.4.4 Acyclic Hypergraphs

20.4.5 Full Reducers for Acyclic Hypergraphs

20.4.6 Why the Full-Reducer Algorithm Works

20.4.7 Exercises for Section 20.4

20.5 Distributed Commit

20.5.1 Supporting Distributed Atomicity

20.5.2 Two-Phase Commit

20.5.3 Recovery of Distributed Transactions

20.5.4 Exercises for Section 20.5

20.6 Distributed Locking

20.6.1 Centralized Lock Systems

20.6.2 A Cost Model for Distributed Locking Algorithms

20.6.3 Locking Replicated Elements

20.6.4 Primary-Copy Locking

20.6.5 Global Locks From Local Locks

20.6.6 Exercises for Section 20.6

20.7 Peer-to-Peer Distributed Search

20.7.1 Peer-to-Peer Networks

20.7.2 The Distributed-Hashing Problem

20.7.3 Centralized Solutions for Distributed Hashing

20.7.4 Chord Circles

20.7.5 Links in Chord Circles

20.7.6 Search Using Finger Tables

20.7.7 Adding New Nodes

20.7.8 When a Peer Leaves the Network

20.7.9 When a Peer Fails

20.7.10 Exercises for Section 20.7

20.8 Summary of Chapter 20

20.9 References for Chapter 20

Ⅴ Other Issues in Management of Massive Data

21 Information Integration

21.1 Introduction to Information Integration

21.1.1 Why Information Integration?

21.1.2 The Heterogeneity Problem

21.2 Modes of Information Integration

21.2.1 Federated Database Systems

21.2.2 Data Warehouses

21.2.3 Mediators

21.2.4 Exercises for Section 21.2

21.3 Wrappers in Mediator-Based Systems

21.3.1 Templates for Query Patterns

21.3.2 Wrapper Generators

21.3.3 Filters

21.3.4 Other Operations at the Wrapper

21.3.5 Exercises for Section 21.3

21.4 Capability-Based Optimization

21.4.1 The Problem of Limited Source Capabilities

21.4.2 A Notation for Describing Source Capabilities

21.4.3 Capability-Based Query-Plan Selection

21.4.4 Adding Cost-Based Optimization

21.4.5 Exercises for Section 21.4

21.5 Optimizing Mediator Queries

21.5.1 Simplified Adornment Notation

21.5.2 Obtaining Answers for Subgoals

21.5.3 The Chain Algorithm

21.5.4 Incorporating Union Views at the Mediator

21.5.5 Exercises for Section 21.5

21.6 Local-as-View Mediators

21.6.1 Motivation for LAV Mediators

21.6.2 Terminology for LAV Mediation

21.6.3 Expanding Solutions

21.6.4 Containment of Conjunctive Queries

21.6.5 Why the Containment-Mapping Test Works

21.6.6 Finding Solutions to a Mediator Query

21.6.7 Why the LMSS Theorem Holds

21.6.8 Exercises for Section 21.6

21.7 Entity Resolution

21.7.1 Deciding Whether Records Represent a Common Entity

21.7.2 Merging Similar Records

21.7.3 Useful Properties of Similarity and Merge Functions

21.7.4 The R-Swoosh Algorithm for ICAR Records

21.7.5 Why R-Swoosh Works

21.7.6 Other Approaches to Entity Resolution

21.7.7 Exercises for Section 21.7

21.8 Summary of Chapter 21

21.9 References for Chapter 21

22 Data Mining

22.1 Frequent-Itemset Mining

22.1.1 The Market-Basket Model

22.1.2 Basic Definitions

22.1.3 Association Rules

22.1.4 The Computation Model for Frequent Itemsets

22.1.5 Exercises for Section 22.1

22.2 Algorithms for Finding Frequent Itemsets

22.2.1 The Distribution of Frequent Itemsets

22.2.2 The Naive Algorithm for Finding Frequent Itemsets

22.2.3 The A-Priori Algorithm

22.2.4 Implementation of the A-Priori Algorithm

22.2.5 Making Better Use of Main Memory

22.2.6 When to Use the PCY Algorithm

22.2.7 The Multistage Algorithm

22.2.8 Exercises for Section 22.2

22.3 Finding Similar Items

22.3.1 The Jaccard Measure of Similarity

22.3.2 Applications of Jaccard Similarity

22.3.3 Minhashing

22.3.4 Minhashing and Jaccard Distance

22.3.5 Why Minhashing Works

22.3.6 Implementing Minhashing

22.3.7 Exercises for Section 22.3

22.4 Locality-Sensitive Hashing

22.4.1 Entity Resolution as an Example of LSH

22.4.2 Locality-Sensitive Hashing of Signatures

22.4.3 Combining Minhashing and Locality-Sensitive Hashing

22.4.4 Exercises for Section 22.4

22.5 Clustering of Large-Scale Data

22.5.1 Applications of Clustering

22.5.2 Distance Measures

22.5.3 Agglomerative Clustering

22.5.4 k-Means Algorithms

22.5.5 k-Means for Large-Scale Data

22.5.6 Processing a Memory Load of Points

22.5.7 Exercises for Section 22.5

22.6 Summary of Chapter 22

22.7 References for Chapter 22

23 Database Systems and the Internet

23.1 The Architecture of a Search Engine

23.1.1 Components of a Search Engine

23.1.2 Web Crawlers

23.1.3 Query Processing in Search Engines

23.1.4 Ranking Pages

23.2 PageRank for Identifying Important Pages

23.2.1 The Intuition Behind PageRank

23.2.2 Recursive Formulation of PageRank—First Try

23.2.3 Spider Traps and Dead Ends

23.2.4 PageRank Accounting for Spider Traps and Dead Ends

23.2.5 Exercises for Section 23.2

23.3 Topic-Specific PageRank

23.3.1 Teleport Sets

23.3.2 Calculating A Topic-Specific PageRank

23.3.3 Link Spam

23.3.4 Topic-Specific PageRank and Link Spam

23.3.5 Exercises for Section 23.3

23.4 Data Streams

23.4.1 Data-Stream-Management Systems

23.4.2 Stream Applications

23.4.3 A Data-Stream Data Model

23.4.4 Converting Streams Into Relations

23.4.5 Converting Relations Into Streams

23.4.6 Exercises for Section 23.4

23.5 Data Mining of Streams

23.5.1 Motivation

23.5.2 Counting Bits

23.5.3 Counting the Number of Distinct Elements

23.5.4 Exercises for Section 23.5

23.6 Summary of Chapter 23

23.7 References for Chapter 23
