The CLEO Data Model Home Page

What is a Data Model?

The data model describes everything connected with the handling and management of data, including generation by detector components, calibration, I/O buses, readout, triggering, filtering and processing by online computers, data format, data compression, local and wide-area networks, hardware for temporary and permanent storage, procedures for copying and moving, data skims, databases, search algorithms, and interfaces between offline analysis and data.

Why do we need a data model?

The experience of CLEO and other major high energy physics experiments has demonstrated conclusively that changes to complex data systems (e.g., interfaces, data structures, file formats) generally cause tremendous headaches for users. While each headache seems to have its own particular cause (limitations of Fortran, failing to rebuild libraries, forgetting to notify users, loss of ability to read old data), the problem could almost always have been avoided (or at least the system could have crashed "gracefully") if the change had been made inside a well-defined system. Since change is the one constant in any experiment, an important reason for having a data model is to manage change.

A data model provides other benefits as well. A large system described by a well-defined data model behaves consistently from one subsystem to another because it follows an overall philosophy. This reduces the time it takes a new user to learn how to analyze data, or an old user to learn a new subsystem. A good data model also decouples higher-level functions at the user level from the lower-level systems that provide data. Such decoupling allows new analysis frameworks to be constructed without worrying about the details of how the data is brought in (the users see one interface). Conversely, necessary changes made to the data format are not visible at the user analysis level. The separation of data analysers from data providers has the additional benefit of allowing simple standalone tools to be constructed, since they look at data through the same interfaces.
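As a rough illustration of this decoupling, the sketch below shows analysis code that depends on a single interface while two interchangeable providers supply the data. Everything in it is hypothetical and chosen only for illustration: the names DataSource, InMemorySource, KeyValueTextSource and mean_energy, the dictionary event layout, and the "name=value" text format are assumptions, not part of any CLEO software.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Event = Dict[str, float]  # hypothetical event record: named quantities


class DataSource(ABC):
    """The single interface that analysis code sees (hypothetical name)."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        """Yield events one at a time, hiding the storage format."""


class InMemorySource(DataSource):
    """Provider backed by a list, standing in for a file or data server."""

    def __init__(self, records: List[Event]) -> None:
        self._records = records

    def events(self) -> Iterator[Event]:
        return iter(self._records)


class KeyValueTextSource(DataSource):
    """Provider that parses 'name=value,name=value' lines (a made-up format)."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def events(self) -> Iterator[Event]:
        for line in self._lines:
            fields = (item.split("=") for item in line.split(","))
            yield {name: float(value) for name, value in fields}


def mean_energy(source: DataSource) -> float:
    """Analysis code: depends only on DataSource, never on the format."""
    energies = [event["energy"] for event in source.events()]
    return sum(energies) / len(energies)


# The same analysis runs unchanged against either provider.
memory = InMemorySource([{"energy": 1.0}, {"energy": 3.0}])
text = KeyValueTextSource(["energy=1.0,ntracks=2", "energy=3.0,ntracks=4"])
same_result = mean_energy(memory) == mean_energy(text)
```

Changing the stored format here would mean writing a new provider class; mean_energy and any standalone tool built on the same interface would not need to change.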

Components of the Data Model

The breakdown shown below is my first attempt to classify the various pieces making up the CLEO data model. Where possible, I have grouped components together in hierarchies, although the placement of some items is not clear (some belong in more than one place). In any case, I expect that this breakdown will go through some changes over the next few weeks as people provide more input.

Over time we will need to have people's names associated with these components. In particular, it will be necessary to involve the online system developers.

  • Online
    • Flow of data from detector to crates
    • Triggers
    • Level 3 processing
    • Readout
    • Writing to disk/tape
    • Calibrations
    • Test data generation
    • Mapping of electronic channels to hits (time dependencies)
    • Mapping of electronic channels to detector elements
  • Permanent data format(s)
    • Event, BR, ER data
    • Calibrations
    • Hardware records
    • Geometry, constants, etc.
    • Electronic channels
  • Data architecture
    • Organization of data (streams, records, frames, etc.)
    • Data servers (e.g., Karp, ROAR, Zebra, database, POM)
    • Interfacing data servers to analysis frameworks (e.g., glue layer)
    • Analysis frameworks (Fortran, C++, physics oriented)
      • Data structures in these frameworks
      • Reading data into structures (faulting, "get" routines, etc.)
      • Saving user defined info
      • Writing data (e.g., skims)
    • Interface philosophy
  • Tools
    • Data monitoring (e.g., usage statistics, locations)
    • Disk pool management
    • Java or Motif based
  • Storage, including disk and tape
    • Hierarchical storage management
    • Data location service
    • Resource management
    • User data
  • Managing change
    • Versioning
    • Adding, deleting items from structures
    • Exception handling
  • Implementation
    • Benchmark speed tests of data servers
    • Benchmark physics analyses
  • Schedule
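
To make the "Managing change" items in the list above more concrete, here is a minimal sketch of how versioning, adding an item to a structure, and exception handling might fit together. It is not a description of any existing CLEO code: the record layout, the field names (momentum, charge, n_hits), the version numbers, and the rule that unversioned data counts as version 1 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict

CURRENT_VERSION = 2  # hypothetical schema version understood by this reader


class UnsupportedVersionError(Exception):
    """Raised when a record is newer than this reader understands."""


@dataclass
class TrackRecord:
    momentum: float
    charge: int
    n_hits: int  # item assumed to have been added in version 2


def read_track(raw: Dict[str, Any]) -> TrackRecord:
    """Convert a stored record of any known version to the current structure."""
    version = raw.get("version", 1)  # assumption: unversioned data is version 1
    if version > CURRENT_VERSION:
        # Fail with a clear message instead of misreading the record.
        raise UnsupportedVersionError(
            f"record version {version} is newer than reader version {CURRENT_VERSION}"
        )
    # Version 1 records predate n_hits, so supply a sentinel value.
    n_hits = raw["n_hits"] if version >= 2 else -1
    return TrackRecord(raw["momentum"], raw["charge"], n_hits)


# Old data stays readable after the structure gains a new item.
old = read_track({"momentum": 1.2, "charge": -1})
new = read_track({"version": 2, "momentum": 0.8, "charge": 1, "n_hits": 37})

# Data from a future version is rejected explicitly.
try:
    read_track({"version": 3, "momentum": 0.5, "charge": 1})
    failed_gracefully = False
except UnsupportedVersionError:
    failed_gracefully = True
```

The point of the sketch is that the version check lives in one place, so old data remains readable and unknown data produces a clear error rather than a silent misread.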

Points to ponder

  • Some thoughts on versioning for managing change, by Mike Athanas and Greg Sharp.
  • Where does CLEO job control fit into the discussion?
  • Conceptual model vs. system model
  • Conceptual model vs. implementation
  • CLEO III vs. CLEO II, II.5

Meetings, papers and writeups

People involved in data model development

Related information

Last Updated: December 5, 2000