4th Quattor Workshop Summary (October 2007)

Written : 29 October 2007
Author : Michel Jouvin

4th Quattor Workshop - UAM (Madrid) - 29-30/10/07

Agenda

Site Reports

CERN

Main instance : ~6100 nodes (+ 1K nodes in 6 months)

  • Template “deoptimisation” : had to get rid of most value(/xxx/yyy.tpl) calls for the move to namespaces. The result is performance slower by a factor of 2.
  • Still using panc v6 : performance was not satisfactory in first tests with panc v7, and there has been no time for extensive testing. Current compilation time is 20 minutes on a machine with 2 dual-core CPUs (16 GB of memory).
  • CDB : still using a flat namespace, moving slowly to namespaces. Optimising for multi-core machines…
  • SPMA + SWrep : extended authentication added (Kerberos 5), many bug fixes and improvements
  • CDB2SQL : no progress
  • CCM : authenticated mode added
  • Initial installation : still using PrepareInstall instead of AII
    • Interacts with both CDB and CDBSQL to get machine parameters and with some CERN-specific components (LAN DB, SINDES…). Installation done by AIMS. Very successful but a little ugly to maintain.

Quattor development activities

  • Release management related projects (ETICS integration, Savannah…) on hold
  • Default template set has been moved to namespace and build tools updated
  • Maintenance of a few core modules : CDB2SQL, SPMA, SWrep, CDB…

Other activities :

  • Xen-based virtualisation based on ncm-xen and ncm-filesystems : development of a Xen Hypervisor.
  • Integration between Quattor and SLS in progress

LAL

About using Squid to cache RPMs at every site : Grid-Ireland's experience is that there are some drawbacks if something outdated sits in the cache, and it can be difficult to clear all caches. They prefer to use rsync at deployment time to ensure that every site repository is up to date.

NIKHEF

NIKHEF-ELPROD : ~150 nodes, strong increase expected

  • Also an installation test-bed with 15 nodes

Currently using AII for initial installation and Quattor for configuration of OS and MW

  • OS configured with NCM components
  • MW configured with ncm-yaim : frequent patches required to YAIM

Quattor components used :

  • Moved to panc v7 : faster than v6.
  • AII, SPMA
  • SCDB without Subversion (using CVS for versioning)… Only Ant tools are used.

Issues :

  • PAN compiler performance, as a significant increase in the number of nodes is expected
  • Scaling problems when deploying to 130 nodes simultaneously : try to monitor update results with Ganglia.

Future :

  • XEN virtualisation
  • Nagios setup under Quattor

GRID-Ireland

Quattor managing 18 sites with a total of 400 nodes

  • Single CVS repository, moving to SVN
  • Replicated SW repositories at each site (rsync)
  • 3 deployment servers : production, tests, e-Learning

Recent developments :

  • 17 sites re-installed with Quattor/Xen : fully automated PXE installation of hosts and guests
  • Migration to SVN : integration with the local web deployment tool in progress. Realised how many unused, obsolete templates there were.
  • New compute resources : ~140 Condor VMs

QWG usage :

  • “Pointer” to a particular QWG revision (by revision number), checked out semi-automatically. Plan to pull QWG templates via…
  • Real sites containing clusters, ability to select sites. Local target “compile.sites”.
  • Plan to use Stijn’s dummy WN template

Issues :

  • Desperately need monitoring integrated to detect failed deployments, or at least documentation about currently existing solutions.
  • int.eu.grid site backed out of Quattor due to conflicts with APT/YAIM
  • Getting started with Quattor is still difficult

PIC

Several Quattor servers :

  • Quattor01 : still Quattor 1.1, CDB
  • Quattor02 : Quattor 1.3, SCDB, QWG glite-3.1
  • Quattor03 : will replace Quattor01/02 soon with QWG glite-3.0 and glite-3.1 (mainly worker nodes and user interfaces at first).

Local developments :

  • ncm-snmp : needed for Nagios monitoring
  • Local command to hide SCDB/CDB differences : gettpl, …

UAM

2 grid clusters : UAM-LCG2 (150 nodes) and GVMUAM-LCG2 (300 nodes)

  • Quattor 1.2, SCDB, QWG 3.0.2-x (not the most recent), AII, SWrep
  • Still configuring d-Cache manually : considering moving to Quattor

1 non grid cluster still managed using CDB

  • Different set of people managing this cluster, used to CDB

Morgan-Stanley (Nick Williams)

Not yet using Quattor; looking at it as one possible solution for Morgan Stanley's needs.

Currently using a home-made product called “Aurora”

  • Quite good but designed at the beginning of the 90s
  • Defines all distributed services and host configuration : how a machine is configured, how to access applications, …
  • Designed with the ability to restore a machine to a previous state in 20 minutes
  • Based on a homogeneous view of the resources, in particular through the use of AFS : nothing installed locally

Current configuration is 20K Unix servers, including “grid systems” (pools of nodes running the same application) with home-grown MW.

  • Aurora designed for 5K machines
  • Need to integrate risk management, an important part of financial business (failover…)
  • Users asking for custom configurations, delivered more quickly and with more agility.
  • Machines organised into “buckets” : ~2500 machines sharing most of their configuration settings.
  • A campus is made up of multiple buckets; a region is made up of data centres.
  • Significant work done at night to synchronise configurations

Core of the new Aurora must be :

  • A configuration system
  • An entitlements system : control of rights to do an action
  • Move off AFS if possible : RPM ?
  • Virtualisation
  • Work done by Unix Engineering group (20 people, based in London)
  • Support for Linux and Solaris

Philips

Restarting work on Quattor : new people found.

Current effort is to transfer Quattor knowledge to system management to get more people involved.

  • Quattor 1.3, SCDB, QWG templates 3.1

Issues

  • How to revert to a previous tag
  • What actions to perform when new QWG templates are out

CNAF

Still using Quattor 1.2 without SCDB/QWG : lack of time to look at it.

  • SLC 4 support has been added, mainly 32-bit
  • Quattor 1.3 update still planned, will move to namespaces at the same time (lots of CNAF-specific templates to update)
  • Using the latest AII version : no problem so far…

Evaluating Xen : some issues using pypxeboot

Training of CNAF staff still a problem : Quattor is often blamed for deleting what was done manually…

  • Very time consuming

Candidate for hosting next Quattor workshop…

Core Components

PAN Compiler - C. Loomis

Current status

  • v6 deprecated and frozen : no bug fixes or enhancements. Still used by default by CDB.
  • v7 is the production version (7.2.6) : (almost) 100% backward compatible, very limited enhancements. Used by default by SCDB.
  • v8 : development version. First version with language-level changes compared to v6. Expect a production version before Christmas

v7 performance

  • Faster than v6 for almost everybody, except CERN. Rudimentary profiling added to v7 to help investigate the problem.
  • Better memory management : can make huge differences with a large number of templates
  • Generic performance tests done :
    • v7 significantly faster than v6 (x5-10) for almost every operation, except the variable “self” reference test, where v7.2.5 was 50x slower than v6. Fixed in 7.2.6, where performance is comparable with v6.
    • Many operations are slower on multi-core than on single-core machines : requires some investigation.
  • Performance on CERN profiles is better with panc v7.2.6 than with v6 (x1.5 to x2). Memory use is around 2 GB for 920 profiles (probably much less than panc v6 but difficult to get exact consumption for v6).
  • Possibility to explicitly set the number of threads to use, but all tests showed that the best performance is obtained when the number of threads matches the number of cores (default behaviour).

Planned language changes

  • v7 : foreach statement, bind statement (replacement for one form of type), bit and misc. functions, some deprecated keywords (see the sketch after this list)
  • v8 :
    • Enhancements : format function, i18n support, performance and include logging, automatic variable for the current template name, simple DML and copy-on-write optimisation (only copy changed parts), final for structure templates.
    • Deprecated : lower-case automatic variables, type as a synonym for bind.
    • Unsupported : define, description and descro keywords, delete statement
  • v9 : deprecate include with literal name (important simplification of the grammar), removal of lower-case automatic variables and of type as a synonym for bind
  • v10 : removal of include with literal name
  • v11 : include DML without braces
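
For illustration, here is a minimal, hypothetical pan sketch (template and path names are invented) of the constructs mentioned above: the include form with a literal name that is slated for deprecation, the DML include form already accepted by panc v7, and the bind and foreach statements.

    object template profile_example;

    # Old form: include with a literal template name (to be deprecated, then removed)
    include site_config;

    # DML form with braces, supported since panc v7
    include { "site/global" };

    # bind statement (v7), replacing one form of the type statement
    bind "/system/cluster/name" = string;

    # foreach statement (v7): iterate over a (hypothetical) list of service names
    variable MY_SERVICES = list("sshd", "ntpd");
    "/system/services" = {
        result = nlist();
        foreach (i; name; MY_SERVICES) {
            result[name] = nlist("enabled", true);
        };
        result;
    };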

Other requests

  • Cast operator : brings a lot of complications. Delayed for now.
  • Forced assignment : changes the type of a variable. Risk of undetected errors. Delayed for now.

Roadmap discussion

  • Deprecate include with literal name in v8 : the change required is compatible with panc v6. Agreed.
  • Introduce include DML without braces at the same time the former include syntax becomes unsupported (v9).
  • CERN / CNAF agree to migrate to panc v7 asap. For CERN, it is mainly a matter of confirming that the performance problems have been fixed.
  • Interval of 6 months minimum between major releases. Support version N-1.

CDB

Changes in the last 6 months

  • cdb-cli : support for user-defined procedures, minimal support for disconnected operations, preparation for namespace support
  • CDB deployer : ease setup of a test configuration on a set of nodes.
    • Previously involved a lot of manual operations (mainly copy back and forth from production templates).
    • Takes advantage of namespaces and loadpaths : a “stage” (area) is prepended to the template name by defining the appropriate loadpath (see the sketch after this list)
    • A (non-staged) meta-template defines the appropriate loadpath
    • The CDB deploy command (template replication) allows moving a configuration from one stage to another (e.g. test, preprod, prod)
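
A rough sketch of the idea, with hypothetical stage and template names, relying on the standard pan LOADPATH variable that prefixes include lookups:

    # Hypothetical (non-staged) meta-template selecting the "test" stage
    template stage_selection;

    # Template lookups will first try the test/ prefix, then the plain name
    variable LOADPATH = list("test");

    # Resolves to test/cluster/config if that template exists in the "test"
    # stage, otherwise falls back to cluster/config (the production copy)
    include { "cluster/config" };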

Not done yet

  • Visualisation/navigation tool : PAN parser available but not yet integrated. Another option is to wait for panc v8.
  • SOAP eating too much memory…
  • Fine-grained locking
  • Common authentication service
  • Reverting committed changes

Other issues :

  • ME leaving CERN in a couple of months…

SCDB

See slides.

Discussion

  • Add a script for generating hardware templates from a CSV file to the SCDB utilities
  • Allow Google to index Trac LCGQWG site

Other Core Modules

NotD

Framework which allows users to authenticate using different mechanisms and to configure notifications using wassh-ssm

CDB2SQL

No progress foreseen.

  • Plan is to rewrite with a direct XML DB backend to avoid re-parsing XML

Generic, plug-in based, login and authentication library

No real progress

  • Mainly for CDB, SWrep

CCM

  • Support for fetching foreign profiles added
  • Unprivileged CCM execution still pending

NCM

Lots of pending issues/requests. No progress.

Configuration Dispatch Daemon (cdispd)

  • Global activation switch
  • Restart after changes in modules
  • Improve handling of pre-dependencies (Savannah 23814)

Node Configuration Dispatcher (ncm-ncd)

  • Generic ‘requirement’ on a package manager, to allow a site to use its preferred package manager (Savannah 17681)
  • NOACTION_SUPPORTED to allow a dry run of a component (Savannah 20040)
  • --skip option to run all components except a few (Savannah 20619)
  • Support for RPM-less components (Savannah 21204)

NCM components

  • ncm-filesystems by Luis
    • Used at CERN for Xen-based virtualisation
    • lib/blockdevices : generic support for managing block devices in other components
  • ncm-rac for managing Oracle Database Clusters

PAN templates

  • New schema for enclosures released
  • Fixes in set_partitions() for logical partitions
  • Support gLite WMS in schema
  • Support for md (Multiple Device) devices
  • Document Quattor standard PAN functions (push, npush, …) on a wiki page (a brief sketch follows this list).
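
As a starting point for that documentation, a brief hedged sketch of typical push/npush usage (hypothetical paths; it assumes the standard function library template is included, whose exact name may differ between releases):

    object template example_functions;

    include { "pan/functions" };  # assumed location of the standard function library

    # push() appends values to the list at the current path
    "/system/example/groups" = list("users");
    "/system/example/groups" = push("admins");

    # npush() adds (or replaces) a named entry in the nlist at the current path
    "/system/example/options" = nlist("timeout", 30);
    "/system/example/options" = npush("retries", 3);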

SPMA/rpmt

  • SPMA should upgrade, and not uninstall/install, a package if the arch or version changes (Savannah 19934)
  • SPMA gives up at the first IO error (Savannah 29029) : let rpmt run until the end to get the full list of errors

Other Developments

QWG Templates

See slides.

Xen Support

Main tool for Xen support is ncm-xen :

  • Several updates since March
  • Actual configuration done through several variables (to be documented on the QWG wiki; see the sketch after this list) :
    • XEN_GUESTS : list of VMs to manage, corresponding to profiles
    • XEN_BOOTLOADER : pypxeboot, pygrub
    • XEN_BOOTARGS : e.g. virtual interface strings required to boot with pypxeboot
    • XEN_RAM
  • Future plans : delegate filesystem management to ncm-filesystems, support for other use cases (downloading images, using copy-on-write…)
  • Any need for supporting other Virtual Machine Managers (VMware…) ? Or a generic component ncm-virt ?
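
A hypothetical sketch of how such variables might be set in a site template (guest names and exact value formats are invented; the authoritative list is the one to be documented on the QWG wiki):

    template site/xen_config;

    # Guest VMs to manage; each entry is assumed to match a node profile name
    variable XEN_GUESTS = list("vm-wn01", "vm-wn02");

    # Bootloader used to start the guests (pypxeboot or pygrub)
    variable XEN_BOOTLOADER = "pypxeboot";

    # Bootloader arguments, e.g. the virtual interface string needed by pypxeboot
    variable XEN_BOOTARGS = "vifname=vif-wn01";

    # Memory allocated to a guest (assumed to be in MB)
    variable XEN_RAM = 1024;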

Grid-Ireland experience :

  • PXE/KS install of host + guests : install_server machine type added. To be committed into standard templates

Enclosures

Enclosure description is required to address more complex system configurations : multiple motherboards per case, blades, VMs…

  • Required to manage interventions : all nodes within an enclosure must be powered down
  • Generation of KS config file for all systems into the enclosure
  • ‘/hardware/enclosure’ used to describe enclosure : CERN uses ‘/system/enclosure’. Agreement to use /system rather than /hardware as this is not necessarily real hardware.

2 options :

  • parent references children : already in use for Xen VMs (see the sketch after this list)
  • children reference parent
  • Both options are more or less incompatible : need to identify the main use cases.
  • For VMs, parent has to know about children : generate Xen config files, set up local backing store for VM filesystem…
  • For VMs, children don’t care about their parents
  • The same applies to AII
  • “parent references children” seems to be the appropriate approach : CERN also uses it. Is there any missed use case for the other option ?
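
A purely illustrative sketch of the “parent references children” approach (paths and field names are invented, not the agreed schema):

    object template profile_xen_host01;

    # The parent (physical host) lists the guests it hosts; each entry is
    # assumed to be the profile name of a child node
    "/system/enclosure/children" = list("profile_vm-wn01", "profile_vm-wn02");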

Filesystems and Block Devices

See slides.

Current layout (organised per hardware disk) has several limitations :

  • Poor control of advanced options (tuning, raid spanning several disks)
  • SW raid is very difficult to describe with current schema
  • Hardware raid is impossible to control

Design for proposed new model (Tim Bell/Andras Horvath)

  • Separation between filesystems and block devices descriptions
  • Filesystems reference block devices
  • Several kinds of block devices : block, hardware_raid
  • Partitions are described inside block devices
  • LVM and SW raid defined separately from block devices, referencing block devices

Drawbacks and problems of proposed new model

  • Too complex to implement : bi-directional inter-dependencies
  • Too close to “natural approach”

New proposal (Luis)

Pure top-down model

  • Filesystems referencing block devices
  • Block devices can be LVM volume groups, software raid, hardware raid, partitions, disks… Defined under /system/blockdevices. Several types of block devices are available (see the sketch after this list).
  • A software raid or LVM device refers to another set of block devices (partitions, hardware raid, disks…)
  • Partitions are a kind of block device, referencing the hardware devices they sit on.
  • Very flexible and easy to extend
  • Currently the allocated size is not checked against the (declared) disk size. Needs to be added.
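
A purely illustrative sketch of this top-down layout (field names are assumptions, not the final schema):

    object template profile_disk_layout;

    # Block devices are described under /system/blockdevices; a partition
    # references the physical disk it sits on
    "/system/blockdevices/physical_devs/sda" = nlist("label", "msdos");
    "/system/blockdevices/partitions/sda1" = nlist(
        "holding_dev", "sda",
        "size", 20480
    );

    # Filesystems reference block devices by name: the model is strictly
    # top-down, nothing at the block-device level points back to filesystems
    "/system/filesystems" = list(nlist(
        "mountpoint", "/",
        "block_device", "partitions/sda1",
        "type", "ext3"
    ));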

This model has been implemented in ncm-filesystems and AII v2. Ready for integration into the main schema.

  • Need to confirm agreement from Tim and Andras

ncm-filesystems

Handles creation and modification of filesystems after installation

  • Required for several use cases, in particular VM management
  • Currently filesystems and block devices are described in the component configuration : this will change as soon as they have been included in the standard schema under /system.

Uses a Perl class library (lib/blockdevices) to do the real work.

  • Same library is used by AII v2

Future plans

  • Quota management
  • Resize operations
  • Hardware raid management

AII

AII 1.x status and problems

  • KS support embedded into AII : difficult to support other installation systems (like JumpStart)
  • SW raid difficult
  • LVM support buggy
  • Current schema structure limits possibility for any improvement

AII 2.0 design goals

  • Support for installation methods other than KS+PXE
  • Easier maintenance of Kickstart templates : allows easy integration of site customisations
  • Support for complex device schemes
  • Command line (aii-shellfe) interface unchanged (only --notify has not been re-implemented)
  • AII is now just a plug-in loader (ncm-ncd model). 3 available plug-ins :
    • osinstall plug-in : KS installation
    • NBP plug-in : PXE configuration
    • DHCP plug-in : DHCP configuration
  • AII plug-ins inherit from NCM::Component and fetch profiles directly from (S)CDB
  • KS generator :
    • Uses ncm-lib-blockdevices; block devices/filesystems are configured during the %pre installation phase (better control of what is created/destroyed)
    • Uses hooks for local site customisation : no more KS template. Ability to use and combine an arbitrary number of hooks, organised by Kickstart phase (pre_install, post_install, post_reboot). Hooks must be implemented as new methods of an NCM object subclass.
  • PXE generator : replaces the PXE template and nbp-aii. Backward compatible, configuration relocated to /system/aii/nbp/pxelinux (see the sketch below).
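
For the PXE generator, a small hedged sketch of what the relocated configuration might look like under /system/aii/nbp/pxelinux (field names and values are illustrative, not the exact AII 2.0 schema):

    object template profile_install_node;

    # PXE boot parameters for this node, consumed by the AII NBP plug-in
    "/system/aii/nbp/pxelinux/kernel" = "vmlinuz";
    "/system/aii/nbp/pxelinux/initrd" = "initrd.img";
    "/system/aii/nbp/pxelinux/append" =
        "ks=http://install.example.org/ks/node01.ks ksdevice=eth0";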

Future plans

  • Privilege dropping : AII doesn’t need to run as root for the most part.
  • Enclosure support : --recursive ?

First production version may be ready by end of 2007.

Release Process and Component Review

Time for a new release ? 1.4 ?

  • Many changes, especially AII and block devices, but not yet mature
  • Status page for components needs to be updated : no dynamic page yet.
  • Wait for stability of AII v2 and the new filesystem/block-device work before going for a new release (Quattor releases are not used for updating components, in particular in QWG).

Component review

  • checklines should probably be merged with filecopy (maintainer no longer there)
  • sshkeys : usefulness to be reviewed, may be quite tightly coupled with the grid
  • postgresql : still in very early stages

Code maturity tags

  • STABLE-x.y.z for each component
  • OBSOLETE tag : several formats allow indicating a replacement or printing a message
  • Processed by the Quattor build tools : an OBSOLETE component cannot be built, and a Quattor release contains only STABLE components
  • Expose the information in HTML

Documentation

Automatic refreshing would be nice

  • Idea is to run the tool gencompswebdoc to update Web documentation from CVS
  • Link to open Savannah bugs not implementable : only one category for all components

Developers’ convention document

  • Focus on the component writer’s guide : give recommendations, describe the API
  • Look at enforcing some checks before commit, but don’t put too much effort into it

Review the possibility to support other languages for writing NCM components

Would be very good for acceptance by new adopters

  • Main API required is a getTree() implementation
  • Need to investigate how to integrate into the existing ncm-ncd framework
  • Discuss in more detail at the next meeting : somebody needs to begin looking at the requirements… (Cal is ready to contribute)

Quattor Monitoring

Need to improve monitoring of Quattor operations

  • In fact, scripts/sensors already exist for both LEMON (CERN) and Nagios (GRIF/LAL)
  • Start by documenting what exists before developing something new
  • May use the “agnostic” format developed by the Grid Monitoring Working Group, which can be imported by all main monitoring frameworks

Wrap Up

Quattor paper : try to resubmit the LISA paper to other journals and conferences

  • Main candidate is “Concurrency and Computation: Practice and Experience” journal
  • Need to decide between being more technical or more accessible to non-specialists
  • Need to enhance/refine tool comparison
  • Be more detailed about use cases
  • Also prepare a new version for LISA08
  • New coordination : ME/Luis

Next workshop : CNAF, probably March 17-18.