7th Quattor Workshop Summary (March 2009)

Written
11 March 2009
Author
Michel Jouvin

7th Quattor Workshop - London - 11-13/3/09

See Agenda.

Site Reports

LAL - G. Philippon

Grid Ireland - S. Childs

Not much changes in the number of admins or resources

  • Virtualization of most nodes (Xen)

Developments in monitoring:

  • NAgios + Ganglia
  • MonAMI to feed Ganglia from DPM and Torque
  • LEMON : upgraded to lemon-web on SL5

Many tools upgraded : checkdeps, quatview

  • ncm-accounts massive speedup

SCDB : merge “hierarchical site” model into trunk

Issues:

  • Get network and file sytems fully under Quattor control
  • Consistent scheme for monitoring in Quattor
  • Dummy WN speedup trick integrated into the compiler

NIKHEF - R. Starink

4 clusters: 300 machines

  • Currently 5 people involved with Quattor

SCDB + local changes: deployment not done by SCDB bu by local tools to allow deployment of a specific machine

  • Related to historical way of managing systems at NIKHEF but has the disadventage that a postponed change deployment may break something later on some other nodes…

Xen: 7 hosts, 38 guests

  • Based on QWG but issues with host and guests in different clusters: workaround found

Monitoring entirely based on Nagios with 1 master and 3 slaves

  • Based on QWG but some mods to handle hierarchy of servers that are willing to share

panc v7 to v8 transition: no problem but no performance improvement observed

  • Very happy of new logging features

NCM components:

  • ncm-openvpn to configure server and clients
  • ncm-yaim: complete refactoring/rewriting, new features, some backward incompatible changes

Issues:

  • WN compilation speed-up has some pb with compile-time dependency
  • Strength of community with increase usage: what about the ability to support everybody?

LAPP - E. Fede

Quattor server is running on a VMware virtual machine

  • 110 profiles
  • 4 people using it

Running autobuild of RPMs for NCM components and other core components from SourceForge

  • trunk and tags/latest
  • repodata available from YUM

CERN - V. Lefébure

Main instance: 7500 profiles in 139 clusters

  • +1200 increase
  • 1900 profiles corresponding to machines not managed by Quattor
  • Running the last version of CDB, panc v8 ready
  • v8: 20% improvement in compile time but not yet in production. all know issues solved
  • Problem of use of RECORD type by some components (ncm-httpd, ncm-tomcat….)

Xen-based virtualization: support for SLC5 hypervisors ready

Issues of number of users: 65 ACL groups

Package list templates: working on automation

  • Use of comps.xml
  • Automatic detection of missing dependencies

CDB2SQL: Python version fast but buggy, no manpower to fix it, reverted to previous version

Morgan & Stanley - N. Williams

In production now: AQB, AQDB, LEMON

  • 7500 nodes, compile at 10 minutes (8-core machine) but aii-shellfe –notify at 1h (with patches not yet committed) !
  • 5 template admins
  • New building just commissionned and expect to double the number of machine in the next months, plan to keep one server (+1 for redundancy)

Issues:

  • Format change of XML profiles painful: dropping LINK support forces “big-bang” changes
  • Configuration success feedback: thinking at implementing a DB of last time a component was run, updated by ncm-ncd/ncm-cidspd that could be compared with the timestamp of the last configuration

Will submit code now via SourceForge

  • Waiting for approval to open source : AQDB, FUSE interface to configuration browsing, AII, CCM patches

UAM - Laura del Caño

Luis left, Laura is his replacement.

Proposal of tasks that UAM could handle:

  • Maintenance of monitoring tools
  • openvz support
  • AII

5 clusters

  • Use of ant local tasks for template management
  • Performance tests of new machines configured with Quattor

New component in progress:

  • ncm-amanda to configure Amanda backup SW
  • ncm-pnp4nagios

Some local developments:

  • Postgresql DB to store machine info and group them into categories + ant local task to generate the profile and some other templates (monitoring) for the machine
  • SinDes alternative used to manage secured access to profile (AII hook)

Greek Grid - D. Zilaskos

1 Quattor server to manage 2 clusters representing 133 machines spanning 13 subnets

  • 4 Xen hosts, 19 guests
  • SVN server installed with Trac
  • 3 admins + 2 new people who recently joined

Developpements and issues:

  • Wiki guides for Quattor newbies
  • New components: still in progress… some services like Hydra evolving very quickly
  • Involvement in OAT benefits as work is implemented and tested locally with Quattor
  • Thinking at an administration model for Southern Europe based on GRIF/GRID Ireland experience
  • Lot of small sites, very limited effort available…

CNAF - A. Chierici

90% of the templates adapted to the new schema

  • Inspired by QWG
  • Next step is migration of gLite nodes to SLC5

Xen used on 3 servers providing 16 cores each

  • LHCb T2 running on Xen
  • Quattor used to configure the profiles of guests but Dom0 managed by hand
  • Investigating KVM

Planning to install soon new ncm-yaim and ncm-accounts

CDB vs. SCDB: still thinking about migration but need to investigate the impact for the users

  • Is CDB still supported after ME left ?
  • Who will take care of new core release ?

Presenting a poster about Quattor at CHEP: anyone who could help to present it ?

One new people to help with Quattor at CNAF (Elisabetha)

  • All new sysadmins teached to use Quattor

Aquilion Drive Through - W. Hertlein

Aquilon is an architecture to address system management at M&S, in particular scalability, management delegation, application-centric management…

  • Goal is to install hundred of machines without any manual intervention

AQDB is the CDB replacement: no direct interaction between users and templates, everything goes through the Aquilon broker (AQB)

  • Aquilon configuration stored into AQDB

Workflow of provisionning machines:

  • Rack is the unit of work: racks delivered cabled
  • Limited number of vendors and models
  • When racks is powered on, top of rack (tor_switch) switch send a DHCP request and receive a temporary address that will allow to configure it after discoverin its type
  • Switch entered in AQDB
  • DNS integration : all servers configured with the same information
  • DNS DB built from a periodic dump of AQDB (every 3h)
  • DHCP integration: a DHCP server set to server a specific set of machines
  • Configuration built from a periodic dump of AQDB
  • Discovering machine: done by scripts scanning the tor_switch an dquerying it with snmp get
  • Create a machine entry in AQDB for each discovered MAC entry
  • After a machine has been entered in AQDB, create a machine plenary (PAN) template describing the HW
  • Machine plenary template is an equivalent of SCDB hardware/machine + the service/personnality
  • Creating hosts: logical entry for the host created and associated with a plenary template
  • IP is derived by choosing an available IP from the tor_switch subnet and put in the plenary template to avoid an IP address change triggers recompilation of another host
  • Wait for the propagation of previous information, in particular DNS: may take a couple of hours
  • Next day build out the hosts binding host to required services and use panc to compile the template
  • Invoke aii-shellfe to switch over the install pxe image (done through AQB)
  • DNS + Krb propagation delay: looking for some optimizations in the future
  • Finishing the build process: a script runs on the newly built host to update its information in AQDB
  • Record any management interfaces found and if successful update the status information
  • Grid hand-off: after successful build of a host, it is transferred to an application groups
  • Deployment schedules are set up in advance
  • Use personality to set up a host with application defaults
  • Spread hosts for a group over several racks for better resilience

Managing services: a service may have several instance

  • 1 service instance is bind to a host

Monitoring hosts to meet SLA

  • Hosts are monitored by a daemon that does periodic snmp sweep of all the hosts
  • Hosts are grouped by personality with a threshold of hosts allowed to be offline
  • Host personality also defines a reboot schedule: potential previous threshold violation will prevent the reboot
  • Include some draining capabilities
  • LEMON is enabled as a service to provide visualization of aggregated metrics
  • LEMON configured from Aquilon to guarantee consistency

Updating personalities involve working on a local copy of the personality and creating a new temporary one (AQ domain), compiling it with a fake profile and putting the change back into AQDB

  • Currently no check enforced by ‘aq put’ that the new version compiles. Rely on conventions…
  • A test host can be associated with the new personality and reconfigured to validate the changes
  • AQ then allows to merge changes into a production personality and reconfigure all nodes using it

Monitoring

Monitoring Templates in QWG - S. Kenny

LEMON:

  • NCM components: fmonagent, oramonserver
  • Templates: standard/monitoring/lemon
  • Web front-end via filecopy

Nagios:

  • NCM components: nagios, ncg
  • Templates: standard/monitoring/nagios
  • A few changes to support Nagios 3
  • Hosts created from HW DB
  • Services defied as separate templates and added to NAGIOS_SERVICE_TEMPLATES variable but not very scalable
  • ncm-ncg currently being developped to produce an input file for WLCG NCG which generates the service definition: looks promising

Ganglia: configured with filecopy

  • Would be better to have a component generating the required config file on client and server from the site hierarchi description

MonAMI: configured with filecopy

Currently, every monitoring tool has its own configuration. Ideally monitoring schema should be mostly tool-independant. Proposed model based on current LEMON config:

  • Host: coming from DB_MACHINE
  • Cluster: group of nodes sharing the same node types
  • Super-cluster:
  • Need to be part of the information in the node profile so that it can be used by several components
  • May be connected with some representation of M&S personalities into the schema

Core Components

PAN Compiler - C. Loomis

Code hosted on SourceForge since 8.2.6

  • Bug tracking moved to SF too

Production version is v8

  • v7 deprecated
  • 8.2.3 : CDB compatibility, annotation
  • 8.2.4 : selective debugging
  • 8.2.7 introduces prepend/append functions to replace push/npush

Outstanding bugs:

  • Race condition in validation
  • Corner cases with unintuitive behaviour
  • Enforce final flag in structure templates: no real request…

Enhancement requests (in priority order):

  1. Restricted include (aka entitlements)
  2. Add perf tips to documentation
  3. XInclude directive to replace Embedded
  4. Add OBJECT to debug() and error() output
  5. Add prefix/define statements to shorten literal paths
  6. Internationalize error messages
  7. Enable/disable debugging from within pan
  8. Allow include to take a list (from M&S)

Other ideas/wishes:

  • Better Eclipse integration: editor, debugger, dialog boxes to select options for ant/pan
  • Ability to include a file which is not a template inside a template as an alternative to <<EOF
  • File name should be relative to the loadpath as for other templates
  • Rework string escaping handling for automatic escaping by the compiler and a prefix in the string to know if it has been escaped or not without ambiguity

Restricted include statements: design questions

  • Allow the scope of configuration tree changes to be specified when including a template ?
    • : require modification of templates to do entitlement
  • Specified allowed areas of tree ? or restricted (disallowed) areas ?
  • Need both, discusse if exclude before include or the opposite
  • Can ‘/a’ be fixed but ‘/a/b/c’ be changed ?
  • Enforce entitlement on loadpaths ?
  • What about variables ?
  • Allow wildcards and/or regular expressions on the template names ?

NCM Components - N. Williams

ncm-cron: more smearing

  • frequency may be replaced by timing, a nlist allowing to specify a targert (eg. interval into hours) + a smear interval
  • hours/days/month/… are strings that allow expressions and are validated
  • If present timing takes precedence over frequency

ncm-download: retrieve URL

  • funtionality similar to filecopy but contents download with curl form a URL
  • Support proxies
  • Can use spnego support into curl for authenticated access to URL, using Krb principal for example
  • Support global definition of http server to use and relative URLs
  • Subclassable: a method allow to tell the component the configuration path to use, instead of the default, allowing the component to be called by another component for one specific file
  • An idea to reuse in other components like ncm-filecopy
  • A problem with ncm-ncd which might not match exception

CCM: support for local CCM DB format

  • Specified with dbformat: can be GDBM_File, DB_File, CDB_File (compact DB, nothing to do with Quattor CDB!)
  • Format is stored in file .fmt close to .db file: only the writers need to take care of it, read is properly handled by CCM library
  • Download ‘‘if-modified-since’’ local disk version, ignoring the mtime in CDP notification message
  • ccm-fetch now uses the library coming from ccm-fetch.new (3 years old!)

ncm-cdispd/ncd: would like to add configuration status feedback

  • /var/lib/ncm contains an entry per component:
  • If the file is empty, component must run
  • If run successfully, delete the files
  • If failed, write the exception into the file
  • If a component is inactive, ncm-cdispd remove the file for the component in var/lib/ncm unconditionnaly (without checking its contents)

spma/aii-ks: randomization of proxy choosen when multiple available

  • If the AII server processing the kickstart file is in the list of proxies, use it
  • Support for a list of “installack” servers, requested in // to ensure consistency when several AII servers are used in //
  • Require ncm-nscd start before starting configuration to ensure name service can be restarted without impact on the configuration process

AII: –notify, –firmware, use_fqdn

  • --notify: automatically configures anything thant needs configurating, based on profiles-info.xml
  • --firmwaresets alternate pxe boot target to boot relevant firmware installer based on HW information from the profile
  • Schema change required
  • Reset done by another external tool
  • Global lock no longer requested for aii-shellfe, only for aii-dhcp.
  • Only a lock on the specific client host
  • Fixed caching, created multi-level cache hierarchy where first level is the DNS domain name

QWG

SourceForge Migration - S. Child

Main source repository now on SF as a snapshot (without history)

  • 22 developpers registered: many active developpers still missing
  • Send your SF account to Stephen
  • Build tools basically working but unmaintanable now that ME left

quattor.org now is a wiki (MediaWiki)

  • An account is required to edit pages
  • Feel free to add new pages for experimental stuff without linking it to other pages

Issue tracker used only by panc so far.

  • Would be better if we could use the Trac one as it seems to be available on SF

Download menu: mainly pan currently, difficult to automate updates

Documentation menu is unused as this is only plain web pages

Mailing lists still hosted at CERN: next candidate for migration

  • Create new lists with current subcribers
  • Block submission on mailing lists at CERN

QWG code and documentation still hosted at LAL

  • Trac now available on SF, test it as it would be much easier
  • Check if Trac DB can be reimported and how to migrate SVN (snapshot would be acceptable).

User branches: no problem, create them in branches, currently unused.

Announcing new tags on the mailing list: no consensus, not sure it will be manageable

  • Would be preferable to have a dashboard, let’s discuss it as part of QBT future

Quattor Datawarehousing - N. Williams

Quattor and databases:

  • Managing relations between objects and metadata having integrity constraints should involve a relational model and generally connected to enterprise
  • Managing configuration behaviours should be the task of an appropriate language like PAN
  • What is configured where ? require a DB like CDB2SQL
  • Not only display of existing configuration with different views but also historical views

QuatView: uploads selected information from profiles into MySQL, displays with a web browser

  • Big limitation : single timestamp
  • Currently only MySQL but should be easy to use other back-ends
  • Recent additions/enhancements to have the web client much more flexible and usable, eg. search on every column

CDB2SQL: similar features to QuatView but without the web tool to browse the DB

  • Use DBI API
  • Use DB bulk insert
  • Implemented as a server module that can run anywhere
  • Detect changed profiles
  • -ora version is in fact not Oracle specific (MySQL is still the default) but has a lot of Oracle specific additions (mainly views)
  • -dist version more recent and faster but currently Orcale specific, although not hard to change
  • cdb2sql just schedule uploads of XML that are done by a multi-threaded process

Server modules:

  • CDB and SCDB send CDP notification that profiles have changed
  • SCDB uses a CCM CDP notification sent to the host affected
  • CDB send a message to a server in charge of taking appropriate actions for the affected clients
  • Involves calling cdb2sql-sync

’'’Proposal’’’: use cdb2sql with Quatview web interface

  • Extend schema to provide versioned data : from underlying SCM ? from XML ?
  • Build into the profile by panc ?
  • Include revision tag in data
  • Publish datawarehouse to AII instead of notifying datawarehouse and AII in //

Quattor and Inventory Management

CERN - V. Lefébure

CDB2SQL to retrieve information from Quattor CDB. All other applications using DB produced by CDB2SQL

CERN specific schema extension to describe the HW.

  • Include the template name describing the HW
  • Track warranty contract ID
  • Track power consumption
  • Also used for machines not managed by Quattor to keep track of them in the inventory

CC Tracker: displays a geographical view of the machines managed by Quattor

HMW: application to handle installation, move, rename, retire… of a machine

  • Form-based application with lot of parameters pre-filled with the spread sheet delivered with the machines
  • Interconnect with other DB/apps involved: Remedy, LAN DB

Request for new HW done by users using a Web form

Morgan & Stanley Asset Management - S. d’Aquila

'’Note: M&S interested to hear feedback on how this work may be useful for the community.’’

Aurora was relying heavily on AFS

  • Configuration lives into hand-crafted files, prone to human error, impossible to browse in a reasonnable amount of time
  • Poor data quality of data about hosts
  • No fine-grained entitlement

Aquilon design goals:

  • Model systems by what make them similar rather than different
  • Manage resource at a higher level with no manual intervention and application-centric view
  • Avoid NIH syndrom
  • Vendor neutral with the option to open source
  • Build a system it is easy to interact with
  • Mix of declarative configuration (configure this machine as a mail server) and proscriptive configuration wher the entire configuration is under the control of a configuration management tool.
  • Ensure consistency between server and clients configuration

Why use a DB:

  • Fast efficient retrieval of data with referential integrity
  • Concurrency control, transaction, fine-grain locking for free
  • Be a source of information for legacy systems

Technology choices:

  • Twisted Python: event driven network application framework
  • SqlAlchemy: object relational mapper and SQL toolkit

Assets managed:

  • Hardware: machines and their components
  • Real estate they occupy
  • Namespaces: DNS domains, resetved tcp/udp, users, groups
  • Services

Lot of effort put in architecture definition and taxonomy:

  • Personality: what is machine role ?
  • Archetype: philosophical basis behind the build process
  • Examples: Aquilon, Aurora, Aegis, Windows
  • Future: clusters of VMware, network gear, SAN/NAS devices…
  • Location:a flexible way of hierarchically organizing our stuff (regions, campus, building…)
  • Configuration assumption: it is usually better to use a service instance local rather than remote or at least as closest as possible

A system configuration is made of:

  • Hostname (and associated interfaces)
  • Archetype and its requirements
  • Personality
  • Services it uses structured as an ordered list of templates
  • For each service, define instances: a host providing the service, its location and the template associated
  • Service mapping responsible for automatic selection of instance based on requirements defined somewhere else like archetype, personality

Future directions:

  • Same kind of requirements for archetypes to personalities
  • Advanced entitlement and audit capabilities

Quattor in Amazon Cloud - C. Loomis

AWS vs. Xen:

  • Network ocnfiguration: all machines have private and public IP addresses but users cannot predict or allocate them before starting the machine.
  • Network interface uses the private address for configuration
  • DNS contains only public address, not the private one
  • IP address can be changed on the fly when using Elastic IP
  • hostname command doesn’t return the DNS name associated with the public IP address
  • Installation: PXE not supported for installation of the machine, must start from an existing machine image
  • Must use limited list of supported kernels: only RHEL5 kernels

Current config:

  • Quattor server in the cloud: quattor.stratuslab.org
  • Only packages and profiles, no AII. Only httpd
  • SVN at SixSq
  • Base VM image used already has the basic quattor client installed + a script run during first boot as part of init.d to do the first ccm-fetch and ncm-ncd

Issues and questions:

  • Multiple machines can use the same profile: easy and clean way to define only one WN per site.
  • Machine names not known at compile time : how to link batch server and clients, nfs server and clients ? How to handle late binding ?
  • Change notifications fail : no link between profile name and machine name
  • Allow machines to register for changes ?
  • Move to “chat room” (Jabber?) messaging for changes ?
  • Workflow: how to manage image disks, IP addresses, machine lifecycle ?
  • Should Quattor manage only image instead of machines ?

Virtualization Update - S. Child

Xen:

  • ncm-xen pretty stable but needs some work
  • Should delete managed configuration files when removed from profile
  • Filesystems code should be removed and use ncm-filesystmes
  • QWG helper code: “database” mapping guests to hosts, xen/configure_guests() function
  • configure_guests() populates /software/components/xen/domains from guest templates, automatically set /hardware/location to the DOM0 host name
  • guests and host in different cluster: a SCDB workaround implemented by adding the guest clusters in cluster.pan.includes

Current virtualization deployment:

  • TCD and NIKHEF: Xen + QWG
  • NIKHEF has some local tricks for cross-cluster HW handling
  • CERN: Xen + enclosures
  • Enclosures are close to XEN_DB : a nlist of parents with their children as the value
  • M&S: VMware + XML injection from config into hypervisor
  • No way to manage VMware server with Quattor (black box)

openvz: used by UAM, currently in testing, configured with Quattor

VM migration: only proposal from Luis, on http://quattor.org

  • Is it within the scope of Quattor or should it be integrated in some new lifecycle workflow manager ?
  • Need to find concrete use case for VM migration ?

Enclosures: generic model for host/guests dependencies

  • VM, blades…
  • Move to it ?

European FP7 Quattor Project - C. Loomis

Goal: get additional manpower to make some significant changes to Quattor toolkit and do the documentation

  • Currently a strong community with several developpers
  • But no dedicated people: mainly fixing urgent problems

FP7 Infastructure calls in Sept. 09 related to EGI

  • One of them related to middleware, including tools and services for deployment

Institutional requirements:

  • Need to identify a lead member, CNRS could act
  • Must involve typically, at least 3-5 different countries
  • Partners must be legal bodies, including JRU
  • Both academic and commercial partners

Project requirements:

  • Scale of funding must fit in the call guidelines.
  • Usually significant matching effort required: european funding only covers 50-80% of the project cost
  • Matching resources may be people and manpower
  • Must have JRA, NA and SA componenents
  • Must include dissemination and sustainability plans: should fit easily with the SF move

Project outline proposed:

  • NA: management, quality control, dissemination/training
  • Pre-packaged appliance to ease starting with Quattor
  • SA: build and test infrastructure, release management
  • Focus on Quattor use for gLite, QWG may be a good link with production infrastructure
  • JRA: improvement of current tools, control of cloud and virtualized resources, full lifecycle management, IPv6 support
  • EU has a strong push on IPv6
  • For current tools, focus on move to API allowing language independance
  • Must take into account that SA activities are better funded than others

Concrete tasks for building the project:

  • Need to identify partners and the project activities they are interested in: must cover all major parts of the Quattor toolkit
  • Must define the roadmap for the new features we want
  • Determine plans for dissemination and sustainability

Timeline for preparing the project:

  • June: identify the partners and lead partner
  • Detailed outline of project by Sept.
  • Finalize project description and financial aspects by Dec.

Improving Quattor’s Accessibility - C. Loomis

Quattor jas a steep and difficult learning curve

  • Both for users and developpers
  • Some is linked to the comprehensive nature of the toolkit but not all
  • Some reasons include:
  • Inadequate documentation, despite the improvements
  • Treating Quattor as an all or nothing affairs
  • Leftovers from old projects

Inconsistent branding: Quattor name is very seldom seen in the individual tools

  • Very difficult for new users/admins to identify Quattor parts: daemons, config file names…
  • Move all config files to /etc/quattor
  • Review acronyms and change those which are not useful
  • Need to decide what we want to change and what is the roadmap for a non disruptive change

Documentation:

  • Lack of a good overview document
  • Incomplete and often outdated
  • Difficult to relate to a particular release tool
  • Automate documentation as much as possible
  • For example, use annotation in templates to document ‘‘public’’ variables

Tutorials:

  • Very important but material often outdated
  • Generally focused on managing a whole site with Quattor
  • Need some tutorials on how to incrementally add Quattor to an existing management infrastructure
  • Explore Amazon AWS as a way to provide testing resources to people who want to evaluate Quattor

Multiple APIs/Tools

  • Better document strength and weakness of similar tools (eg. SCDB/CDB)
  • Have a good justification for multiple tools
  • Need a way to mark deprecated components and APIs

Coding practices: have a set a coding practices (at least for Perl-based components) with a good agreement but nearly completely unenforced

Missing functionalities

  • PAN: Eclipse-based editor and debugger
  • Components: lots of duplicated code for skeleton of component, cannot run component as standalone script
  • Hooks for monitoring Quattor status
  • Messaging infrastructure inside Quattor

MEthodology:

  • Agree on a list of problems
  • Evaluate te impact of fixing each of the problems
  • Implement them along with other developments
  • Automate as much as possible: prevent regressing problems that have been fixed

Quattor SW Process - C. Loomis

Current situation:

  • Quattor build tools mostly standardized with some adhoc checks for conventions
  • Nightly build performed from trunk
  • No dashboard of current state of code base
  • Few automated tests of Quattor tools
  • Little gathering of documentation: script exists but need to be integrated into SF
  • No quality checks of the Perl code
  • panc is using Java findbugs, similar things exists for Perl

Continous build: suggest moving to system like Hudson

  • Language agnostic
  • Build when there is a code change
  • Can define multiple builds and hierarchies of builds
  • Provides a dashboard of current and past results
  • Must be done outside SF

Coding standards and development guidelines

  • Good documents from Luis but nothing to enforce them
  • Should start at running Perl::Critic: highly configurable, large set of code checks, possibility to include project specific requirements

Unit tests:

  • Having a good set of unit tests makes refactoring and changing code more reliable
  • Many choices for framework for Perl: Perl’s own may be easiest
  • Need to implement in a way that many tests come along for free
  • Not easy for NCM components as the test should include the action done: need to add ability in underlying libraries to use another root
  • May be used for other purposes like configuring images

Documentation: something like Perl::Tidy may help to generate documentation in a format easily usable in a central place

  • Can tidy perl code extracting POD documentation and producing HTML pages

Quattor build tools: contain many thing relating to code checking that should be moved to more appropriate tools

  • Try to design a new generation of QBT starting with the core features of the current ones (probably tagging releases) and adding later checks with appropriate tools

Future Features

Working groups of people collaborating in developments of new features, before a final, complete release

  • Monitoring, visualization/datawarehousing, virtualization, messaging infrastructure…

Template viewer: evaluate how to move forward based on panc output

  • tplview still based on direct parsing of pan templates

Conclusions

’'’Decisions’’’:

  • Software contribution and fixes: code maintenance is a common responsability, everybody is allowed to commit a change in any component
  • No need to get permission from the maintainer
  • Maintainer is a person with a global overview able to help other peoples with their contributions and who may rev iew the changes

’'’Actions’’’:

  • Merge monitoring nagios3/ templates into the main nagios/ templates: Guillaume/Michel
  • Short term: remove nagios3/ from branches
  • Migrate mailing lists to SF
  • Setup of Trac on SF: Michel
  • Migration of QWG/SCDB to SF: Stephen/Michel, medium priority
  • Roadmap and strategy for consistent naming of command names, config files…
  • Perl module namespace: N. William
  • Command names: Stephen, use quattor- as a common prefix
  • Config files: Michel, move to /etc/quattor
  • Proposal for a new generation QBT : Cal
  • Evaluate possibility of a chat room in SF for support and other real-time discussions
  • Software tools:
  • Autobuild: Eric to look at Hudson and other continous build system
  • Perl::Critic: Cal
  • Test frameworks: Nick
  • Perl::Tidy or similar tools: Stephen
  • Review “dummy WN” hack and try to reimplement in a more sensible way: Stephen
  • Prepare FP7 project: coordinated by Cal, expression of interest by others
  • Make schema more modular if possible: Cal

’'’Next workshop’’’:

  • Proposal: Brussel for the next one, Greece for winter 2010
  • Fix the dates with a Doodle as soon as possible, target: end of October
  • Factor-out specific topics, like gLite templates (1/2 day before or after)
  • // sessions for specific working groups or topics
  • Plan 1 for FP7 project
  • Consider day or 2 tutorial
  • Co-located to allow contacts between developpers and users ?
  • Tutorials at LISA or same
  • Announce widely
  • Fill agenda early: name programe chairs, identify main topics