What is a Data Model?
The data model describes everything connected with the handling and management of data, including generation by detector components, calibration, I/O buses, readout, triggering, filtering and processing by online computers, data format, data compression, local and wide-area networks, hardware for temporary and permanent storage, procedures for copying and moving, data skims, databases, search algorithms, and interfaces between offline analysis and data.
Why do we need a data model? The experience of CLEO and other major high energy physics experiments has demonstrated quite conclusively that changes to complex data systems, e.g., interfaces, data structures, file format, generally cause tremendous headaches for users. While the particular headache always seems to have a particular cause (limitations of Fortran, failing to rebuild libraries, forgetting to notify users, loss of ability to read old data), the problem could almost always have been avoided (or at least the system could have crashed "gracefully") if the change had been made inside a well-defined system. Since change is the one constant in any experiment, an important reason for having a data model is to manage change.
A data model provides other benefits as well. A large system described by a well-defined data model behaves consistently from one subsystem to another because it follows an overall philosophy. This reduces the time it takes a new user to learn how to analyze data, or an experienced user to learn a new subsystem. A good data model also decouples higher-level functions at the user level from the lower-level systems that provide data. Such decoupling allows new analysis frameworks to be constructed without worrying about the details of how the data is brought in (the users see one interface). Conversely, necessary changes to the data format are not visible at the user analysis level. The separation of data analysers from data providers has the additional benefit of allowing simple standalone tools to be constructed, since they look at data through the same interfaces.
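The decoupling described above can be illustrated with a minimal sketch. It is not CLEO code: the names `EventSource`, `TextFileSource`, `InMemorySource` and `mean_energy`, the `key=value` text format, and the `energy` quantity are all hypothetical, chosen only to show analysis code that depends on one interface while the providers behind it differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Event = Dict[str, float]


class EventSource(ABC):
    """Hypothetical provider interface: the only thing analysis code sees."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        """Yield events one at a time as a mapping of quantity name to value."""


class TextFileSource(EventSource):
    """Provider for an illustrative 'key=value key=value' line format."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def events(self) -> Iterator[Event]:
        for line in self._lines:
            pairs = (item.split("=", 1) for item in line.split())
            yield {key: float(value) for key, value in pairs}


class InMemorySource(EventSource):
    """Provider that serves records already held in memory."""

    def __init__(self, records: List[Event]) -> None:
        self._records = records

    def events(self) -> Iterator[Event]:
        yield from self._records


def mean_energy(source: EventSource) -> float:
    """Analysis code: written against the interface, not a data format."""
    energies = [event["energy"] for event in source.events()]
    return sum(energies) / len(energies)


# The same analysis runs unchanged on two different providers.
from_text = TextFileSource(["energy=1.0 ntracks=2", "energy=3.0 ntracks=4"])
from_memory = InMemorySource([{"energy": 1.0}, {"energy": 3.0}])
print(mean_energy(from_text), mean_energy(from_memory))
```

If the text format changed, only `TextFileSource` would need to be rewritten; `mean_energy`, and any standalone tool built on `EventSource`, would be unaffected.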
Components of the Data Model
The breakdown shown below is my first attempt to classify the various pieces making up the CLEO data model. Where possible, I have grouped components together in hierarchies, although the placement of some items is not clear (some belong in more than one place). In any case, I expect this breakdown to go through some changes over the next few weeks as people provide more input.
Over time we will need to associate people's names with these components. In particular, it will be necessary to involve the online system developers.
Points to ponder
People involved in data model development