INFORMS Journal on Computing
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


INFORMS JOURNAL ON COMPUTING
Vol. 19, No. 4, Fall 2007, pp. 520-533
DOI: 10.1287/ijoc.1060.0185
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Castro, J.
Right arrow Search for Related Content

A Shortest-Paths Heuristic for Statistical Data Protection in Positive Tables

Jordi Castro

Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, 08034 Barcelona, Catalonia, Spain
jordi.castro{at}upc.edu

National statistical agencies (NSAs) routinely release large amounts of tabular information. Prior to dissemination, tabular data need to be processed to avoid disclosure of individual confidential information. Cell suppression is one of the most widely used techniques by NSAs. Optimal procedures for cell suppression are computationally expensive with large real-world data sets, so heuristic procedures are used. Most heuristics for positive tables (i.e., cell values are nonnegative) rely on the solution of minimum-cost network-flows subproblems. A very efficient heuristic based on shortest paths already exists, but it is only appropriate for general tables (i.e., cell values can be either positive or negative), whereas in practice most tables are positive. We present a method that sensibly combines and improves previous approaches, overcoming some of their drawbacks: it is designed for positive tables and requires only the solution of shortest-path subproblems—therefore being much more efficient than other network-flows heuristics. We report extensive computational experience in the solution of randomly generated and real-world instances, comparing the heuristic with alternative procedures. The results show that the method, currently included in a software package for statistical data protection, fits NSA needs: it is extremely efficient and provides good solutions.

Key words: statistical disclosure control; cell-suppression problem; linear programming; network optimization; shortest paths
History: received November 2004; revised November 2005; accepted March 2006.







HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2007 by INFORMS.