SPSS® Programming and Data Management, 2nd Edition
A Guide for SPSS® and SAS® Users
Raynald Levesque
For more information about SPSS® software products, please visit our Web site at http://www.spss.com
or contact
SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668
SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary
computer software. No material describing such software may be produced or distributed without the written
permission of the owners of the trademark and license rights in the software and the copyrights in the published
materials.
The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by
the Government is subject to restrictions as set forth in subdivision (c)(1)(ii) of The Rights in Technical Data and
Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th
Floor, Chicago, IL 60606-6412.
General notice: Other product names mentioned herein are used for identification purposes only and may be
trademarks of their respective companies.
SAS is a registered trademark of SAS Institute Inc.
Windows is a registered trademark of Microsoft Corporation. Microsoft® Access, Microsoft® Excel, and Microsoft®
Word are products of Microsoft Corporation.
DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies.
Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc.
ALL RIGHTS RESERVED.
LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc.
Portions of this product were based on the work of the FreeType Team (http://www.freetype.org).
A portion of the SPSS software contains zlib technology. Copyright © 1995–2002 by Jean-loup Gailly and Mark Adler.
The zlib software is provided “as-is,” without express or implied warranty. In no event shall the authors of zlib be held
liable for any damages arising from the use of this software.
A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All
rights reserved. The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the
libraries are licensed from IBM and are available at http://oss.software.ibm.com/icu4j/. Sun makes no warranties to the
software of any kind.
Sax Basic is a trademark of Sax Software Corporation. Copyright © 1993–2004 by Polar Engineering and Consulting.
All rights reserved.
SPSS® Programming and Data Management, 2nd Edition: A Guide for SPSS® and SAS® Users
Copyright © 2005 by SPSS Inc.
All rights reserved.
Printed in the United States of America.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
1234567890 06 05 04 03
ISBN 1-56827-355-X
Preface
Experienced data analysts know that a successful analysis or meaningful report often
requires more work in acquiring, merging, and transforming data than in specifying the
analysis or report itself. SPSS contains powerful tools for accomplishing and
automating these tasks. While much of this capability is available through the graphical
user interface, many of the most powerful features are available only through command
syntax, the macro facility that extends the power of command syntax, and the scripting
facility. Until now, no book or other documentation has focused on those features, and
many potential users have been unaware of the power available to them or have not
exploited it for lack of examples. This book fills that void.
Using This Book
The contents of this book and the accompanying CD are discussed in Chapter 1. In
particular, see the section “Using This Book” if you plan to run the examples on the
CD. The CD also contains additional command files, macros, and scripts that are
mentioned but not discussed in the book and that can be useful for solving specific
problems.
This edition has been updated to include numerous enhanced data management
features introduced in SPSS 13.0. Many examples will work with earlier versions, but
some examples rely on features not available prior to SPSS 13.0.
For SAS Users
If you have more experience with SAS than with SPSS for data management, see
Chapter 10 for comparisons of the different approaches to handling various types of
data management tasks. Quite often, there is not a simple command-for-command
relationship between the two programs, although each accomplishes the desired end.
Send Me Comments
I welcome feedback from readers. Please send your suggestions and comments about
the book (not the software) to [email protected]. Check my Web site at
www.spsstools.net for possible errata and generalizations or improvements of code
included on the companion CD.
Acknowledgments
First of all, I wish to thank the SPSS Senior Director of Publications, Bob Gruen, for
giving me the opportunity to work on this challenging project. In addition to providing
general guidance, Bob reviewed the macros chapter. Jon Peck reviewed and
contributed to the scripting chapter. Richard Cohen provided a new chapter on scoring.
In addition to reviewing all of the remaining chapters, Rick Oliver wrote the sections
on importing data from sources other than text files, data transformations, and the new
SPSS Output Management System. I enjoyed working with these gentlemen; the book
greatly benefited from their technical expertise and communications skills.
I also wish to thank Stephanie Schaller, who provided many sample SAS jobs and
helped to define what the SAS user would want to see, as well as Marsha Hollar and
Brian Teasley, the authors of the chapter “SPSS for SAS Programmers.”
On the nontechnical side, I am grateful to my spouse, Nicole Tousignant, who
demonstrated patience and provided support and encouragement during those
months when I was handling two jobs and working seven days a week. I dedicate this
book to her.
Raynald Levesque
Contents

1 Overview
  Data Management Tasks
  Using SPSS Data Management Facilities
  Graphical User Interface
  Command Language
  Macro Facility
  Scripting Facility
  Working with Command Syntax
  Creating Command Syntax Files
  Running SPSS Commands
  Syntax Rules
  Using This Book
  Documentation Resources

2 Best Practices and Efficiency Tips
  Introduction
  Customizing the Programming Environment
  Displaying Commands in the Log
  Displaying the Status Bar in Command Syntax Windows
  Customizing the Toolbars
  Protecting the Original Data
  Do Not Overwrite Original Variables
  Using Temporary Transformations
  Using Temporary Variables
  Using Command Syntax to Document Work
  Creating Command Syntax Files
  Use EXECUTE Sparingly
  Lag Functions
  Using $CASENUM to Select Cases
  MISSING VALUES Command
  WRITE and XSAVE Commands
  Using Comments
  Using SET SEED to Reproduce Random Samples or Values
  Divide and Conquer
  Using INSERT with a Master Command Syntax File
  Defining Global Settings

3 Getting Data into SPSS
  Getting Data from Databases
  Installing Database Drivers
  Database Wizard
  Reading a Single Database Table
  Reading Multiple Tables
  Reading Excel Files
  Reading a “Typical” Worksheet
  Reading Multiple Worksheets
  Reading Text Data Files
  Simple Text Data Files
  Delimited Text Data
  Fixed-Width Text Data
  Text Data Files with Very Wide Records
  Reading Different Types of Text Data
  Reading Complex Text Data Files
  Mixed Files
  Grouped Files
  Nested (Hierarchical) Files
  Repeating Data
  Reading SAS Data Files

4 Basic Data Management
  Variable Properties
  Variable Labels
  Value Labels
  Missing Values
  Measurement Level
  Using Variable Properties As Templates
  Cleaning and Validating Data
  Finding and Displaying Invalid Values
  Excluding Invalid Data from Analysis
  Finding and Filtering Duplicates
  Merging Data Files
  Merging Files with the Same Cases but Different Variables
  Merging Files with the Same Variables but Different Cases
  Updating Data Files by Merging New Values from Transaction Files
  Aggregating Data
  Aggregate Summary Functions
  Weighting Data
  Changing File Structure
  Transposing Cases and Variables
  Cases to Variables
  Variables to Cases
  Transforming Data Values
  Recoding Categorical Variables
  Banding Scale Variables
  Simple Numeric Transformations
  Arithmetic and Statistical Functions
  Random Value and Distribution Functions
  String Manipulation
  Working with Dates and Times
  Date Input and Display Formats
  Date and Time Functions

5 Advanced Programming Features
  Command Syntax Programming Structures
  Indenting Commands in Programming Structures
  DO REPEAT
  VECTOR
  LOOP
  Self-Adjusting Command Syntax
  Using Command Syntax to Write Command Syntax
  Auto-Adjusting Command Syntax Based on Data Conditions
  Executing Selective Portions of Command Syntax
  Excluding Variables from Analysis
  Debugging Command Syntax
  Errors Caused by Different Syntax Rules for Different Operational Modes
  Calculations Affected by Low Default MXLOOPS Setting
  Missing Values in DO IF-ELSE IF-END IF Structures
  Disappearing Vectors
  Locale-Sensitive Decimal Indicators

6 Macros
  A Very Basic Macro
  Macro Arguments
  Positional Arguments
  Tokens
  Conditional Processing
  Looping Constructs
  Macro Expansion
  Doing Arithmetic with Macro Variables
  Macro Examples
  Importing from MS Access
  Defining a List of Variables between Two Variables
  Changing Variable Formats
  Reducing a String to Minimum Length
  Including a Procedure in a Loop
  Counting Distinct Values across Variables
  Recursive Macro (Macro Calling Itself)
  Random Samples and Selections
  Generating Simulated Data
  Working with Many Files
  Finding All Combinations of Three Letters Out of N
  Creating Variables Containing Bounds of the CI for the Mean
  Debugging Macros
  Printback of the Expanded Syntax
  Print Arguments
  Examples of Error Messages
  Other Macro Examples Included with SPSS

7 Scripting
  Introduction
  Scripting or OMS?
  Tasks for Scripting
  Automation Objects
  Script Window
  Global Scripts
  Invoking a Script
  Debugging a Script
  Scripts Included with SPSS
  Sample Scripts
  Add File Date to Filename
  Run Simple Statistics on All Variables
  Using a Parameter in the Script Command
  An Autoscript That Accepts a Parameter from Syntax
  Set Data Editor Column Width to Match Data
  Set the Length of All String Variables to the Maximum Length of the Data
  Modify Page Title in Left Pane of Output Window
  Print Syntax with Path, Date, and Page Numbers
  Create PowerPoint Presentation
  Utilities
  Empty Designated Output Window
  Count Number of Errors
  Find String in the Viewer Outline
  Check Viewer for Errors
  A Challenge: Missing Labels
  Synchronizing Scripts and Syntax
  Illustration of the Problem
  Synchronizing without the IsBusy Method
  Other Scripts Included on the CD

8 Scoring Data with Predictive Models
  The Basics of Scoring Data
  Command Syntax for Scoring
  Mapping Model Variables to SPSS Variables
  Missing Values in Scoring
  Using Predictive Modeling to Identify Potential Customers
  Building and Saving Predictive Models
  Commands for Scoring Your Data
  Including Post-Scoring Transformations
  Getting Data and Saving Results
  Running Your Scoring Job Using the SPSS Batch Facility

9 Exporting Data and Results
  Output Management System
  Using Output Results as Input Data
  Transforming OXML with XSLT
  Exporting Data to Other Applications and Formats
  Saving Data in SAS Format
  Saving Data in Excel Format
  Writing Data Back to a Database
  Saving Data in Text Format
  Exporting Results to Word, Excel, and PowerPoint
  Customizing HTML

10 SPSS for SAS Programmers
  Reading Data
  Reading Database Tables
  Reading Excel Files
  Reading Text Data
  Merging Data Files
  Merging Files with the Same Cases but Different Variables
  Merging Files with the Same Variables but Different Cases
  Aggregating Data
  Assigning Variable Properties
  Variable Labels
  Value Labels
  Cleaning and Validating Data
  Finding and Displaying Invalid Values
  Finding and Filtering Duplicates
  Transforming Data Values
  Recoding Data
  Banding Data
  Numeric Functions
  Random Number Functions
  String Concatenation
  String Parsing
  Working with Dates and Times
  Calculating and Converting Date and Time Intervals
  Adding to or Subtracting from One Date to Find Another Date
  Extracting Date and Time Information

Index
Chapter 1
Overview
Most researchers and others who work regularly with data recognize that much more
of their time goes into various stages of acquiring and preparing data than into
building models and producing reports. SPSS offers a rich set of tools for carrying out
those data management tasks. This book offers many examples of how these tools can
be used to bring in data from almost any source, clean it, transform it, merge it with
other data, and get it into the kind of shape required to produce reliable models and
informative reports. It is intended for use with other documentation resources that go
into more detail about specific features but have fewer extended examples.
For readers who may be more familiar with the data management commands in the
SAS system, Chapter 10 provides examples that demonstrate how some common data
management tasks are handled in both SAS and SPSS.
Data Management Tasks
The data management, or data preparation, tasks that you need to perform may be
quite simple or quite complex. They will typically involve some or all of the
following:
Get and define the data. Getting data requires reading it from a source, such as a
database, spreadsheet, text file, or file saved by another analysis program. Defining it
means providing the information that SPSS needs to analyze it correctly and present
meaningful reports and analyses. In many cases, that information comes directly from
the source, but you may want to provide additional metadata that describes the data,
such as descriptive value labels, missing value codes, and level of measurement for
selected variables.
Combine data from various sources. You can read data from multiple database tables
directly into SPSS. You can also combine multiple SPSS data files to add cases or add
information to each case.
Clean the data. Data often come with duplicate records, missing information, and
impossible (or highly unlikely) values or combinations of values. Checking for these
anomalies helps to ensure valid results in analyses.
Aggregate, select, sort, and weight cases. Often, you want to work with just a sample
or selection of the data, or you want to aggregate the data so that each case represents
a subgroup of a large original file. By saving aggregated files and merging them back
to the original data, you can compare individual values to group means or other
statistics. Weighting cases allows you to give some cases more influence than others in
analyses.
Transform data. Often, the variables that you want to test or report aren’t actually in
your original data but are functions of existing variables—ratios between variables,
ages calculated from birth dates, counts of positive responses or missing responses
across multiple questions, last names when you have names such as “Harold B.
Williams,” and so on. Or, variables may not be coded consistently. You might also want
to collapse a lot of infrequently used values into one category. SPSS offers a powerful
set of facilities for transforming data values and selecting which cases should be
analyzed.
Restructure data for analysis. Various reports and analytical procedures require that the
data be organized in a particular way. For example, independent samples tests typically
require that all of the measured values be in one variable and that one or more
classifying variables indicate which sample each value belongs to; if you have one
variable for each sample (such as one column for those who accepted an offer and
another column for those who did not), you need to restructure your data. The opposite
may be true if you want to compare two or more measurements on the same cases.
Export data and results. After preparing the data and running reports and/or analyses,
you can export both the data and the results to other applications. You can even export
results as data for further analysis in SPSS or other applications.
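As a rough illustration of how a few of these tasks might look in command syntax, the sketch below assumes an open data file with hypothetical variables region, gender, and income and a hypothetical missing-value code of -9. It also assumes SPSS 13.0 or later, because it uses AGGREGATE to add the group mean directly to each case rather than saving and merging a separate aggregated file.

```spss
* Define the data by adding descriptive metadata (hypothetical variables and codes).
VARIABLE LABELS income 'Annual household income'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES income (-9).
VARIABLE LEVEL income (SCALE) /gender (NOMINAL).

* Aggregate by region and add the regional mean of income to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=region
  /mean_income = MEAN(income).

* Transform the data by comparing each case with its group mean.
COMPUTE income_diff = income - mean_income.
EXECUTE.
```

Each of these commands is treated in more detail in later chapters; the variable names and codes here are placeholders only.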
Using SPSS Data Management Facilities
SPSS provides facilities for performing all of the tasks mentioned in the previous
section and a good deal more.
Graphical User Interface
Many SPSS data management tasks are most easily performed through the graphical
user interface that provides dialog boxes and wizards to aid with specifications.
• The File menu contains the options for reading data into the system.
• The Data menu provides options for file-level tasks, such as merging two or more
  data files together, aggregating data, restructuring data files, and selecting subsets
  of cases.
• The Transform menu, as shown in Figure 1-1, provides options for case-level
  transformations, such as recoding data values and computing new data values (see
  Figure 1-2).
Figure 1-1. Transform menu
Figure 1-2. Recoding scale values into banded categories
This book does not discuss the graphical user interface in any great detail. The Help
system provides detailed tutorials about using the graphical user interface, and almost
all dialog boxes have Help buttons that display dialog-box-specific Help topics.
Command Language
The primary emphasis of this book is on using the SPSS command language, or
command syntax, to write programs to solve data management problems. Although the
command language may not be as user friendly as the graphical user interface, it has
several distinct advantages:
• You can save command files and run them repeatedly and in unattended batch
  mode.
• Some data management facilities are available only through the command
  language and are not available via the menus and dialog boxes.
• Command syntax provides documentation of your work, making it clear how you
  obtained your results and making it possible to reproduce them.
Macro Facility
SPSS has a macro facility that can be used to streamline the coding of repetitive
commands and build command streams that can be run many times with varied
parameters. You could, for example, create a complex command stream in which a
variable or filename appears multiple times, define that stream as a macro with the name
as an argument, and then call that stream as part of a job simply by naming the macro
and specifying the name as an argument. Or, you could define a stream that iterates
across a list of names. See Chapter 6 for more information about the macro facility.
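A minimal sketch of the idea follows, assuming a hypothetical macro name (!stats) and hypothetical variables (income and age). The macro takes a single variable name as a keyword argument and can then be called repeatedly with different names.

```spss
* Define a macro with one keyword argument (hypothetical macro name).
DEFINE !stats (var = !TOKENS(1))
DESCRIPTIVES VARIABLES = !var
  /STATISTICS = MEAN STDDEV MIN MAX.
!ENDDEFINE.

* Call the macro with different variable names (hypothetical variables).
!stats var = income.
!stats var = age.
```

Each call expands into an ordinary DESCRIPTIVES command with the argument substituted for !var.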
Scripting Facility
In addition to the command language and the macro facility, you can automate many
tasks with the SPSS scripting facility using standard programming languages, such as
Visual Basic and C++.
Working with Command Syntax
If you haven’t worked with SPSS command syntax before, there are a few things you
should know. A detailed introduction to SPSS command syntax is available in the
“Universals” section in the SPSS Command Syntax Reference.
Creating Command Syntax Files
An SPSS command file is a simple text file. You can use any text editor to create a
command syntax file, but SPSS provides a number of tools to make your job easier.
Most features available in the graphical user interface have command syntax
equivalents, and there are several ways to reveal this underlying command syntax:
• Use the Paste button. Make selections from the menus and dialog boxes, and then
  click the Paste button instead of the OK button. This will paste the underlying
  commands into a command syntax window.
• Record commands in the log. Select Display commands in the log on the Viewer tab
  in the Options dialog box (Edit menu, Options). As you run analyses, the
  commands for your dialog box selections will be recorded and displayed in the log
  in the Viewer window. You can then copy and paste the commands from the
  Viewer into a syntax window or text editor.
• Retrieve commands from the journal file. Most actions that you perform in the
  graphical user interface (and all commands that you run from a command syntax
  window) are automatically recorded in the journal file in the form of command
  syntax. The default name of the journal file is spss.jnl. The default location varies,
  depending on your operating system. Both the name and location of the journal file
  are displayed on the General tab in the Options dialog box (Edit menu, Options).
Running SPSS Commands
Once you have a set of commands, you can run the commands in a number of ways:
• Highlight the commands that you want to run in a command syntax window and
  click the Run button.
• Invoke one command file from another with the INCLUDE or INSERT command (see
  Chapter 2 for more information).
• Use the Production Facility to create production jobs that can run unattended and
  even start unattended (and automatically) using common scheduling software. See
  the Help system for more information about the Production Facility.
• Use SPSSB (available only with the server version) to run command files from a
  command line and automatically route results to different output destinations in
  different formats. See the SPSSB documentation supplied with the SPSS server
  software for more information.
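For instance, invoking one command file from another might look like the sketch below. The file paths are hypothetical, and the INSERT command assumes SPSS 13.0 or later.

```spss
* Run a hypothetical command file that follows interactive syntax rules.
INSERT FILE='/examples/commands/prepare_data.sps' SYNTAX=INTERACTIVE ERROR=STOP.

* Run a hypothetical command file that follows batch syntax rules.
INCLUDE FILE='/examples/commands/run_reports.sps'.
```

Here ERROR=STOP asks INSERT to stop processing the inserted file at the first error instead of continuing.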
Figure 1-3. Command syntax in a syntax window
Syntax Rules
Commands run from a command syntax window during a typical SPSS session must
follow the interactive command syntax rules:
• Each command must start on a new line.
• Each command must end with a period (.).
Command files run via SPSSB or the Production Facility or invoked via the INCLUDE
command must follow the batch command syntax rules:
• Each command must start in the first column of a new line.
• Each command continuation line must be indented at least one space.
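As a sketch, the following command is written to satisfy all four rules listed above, so it should run under either set of rules. The variable names are hypothetical.

```spss
* The command starts in column 1, continuation lines are indented,
  and the command ends with a period.
FREQUENCIES VARIABLES = gender region
  /BARCHART.
```

Writing commands this way is one option for keeping a single command file usable in both modes.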

