Summary of 21th Quattor workshop (2016-03-22 to 2016-03-24, Bruxelles)
Site News
Bruxelles
- Moving the grid site to HTCondor
- Very old SCDB desynchronized with mainstream SCDB: will restart a new one from scratch
- Or maybe restart with Aquilon if it is ready for production with the template library
RAL
- Work on Ceph: configured with Aquilon + ncm-ceph
- Reworked configuration description for Quattor to disentangle feature configuration and OS configuration
- Effort to get new people using Aquilon, in other groups
- Aquilon: 75% of resources, SCDB: 25%
panc compiler
Not much work in the last month
Threading issue seen at MS mainly: so far never succeeded to find a way to reproduce it in an helpful way
- Clojure migration may have helped but will no append in a foreseeable feature
Possibility to have a documentation explaining the panc architecture and internals to help contributions
- Cal agrees to do it
Suppression of escaping: basically ready in the compiler, need to add the metadata in the profile about it
- /metadata/panc
- Will require a new major version
.tpl removal: also basically ready
- Also require a new major version
Next releases: James will try to learn the panc release process
- Soon: a next minor release with stuff currently merged in master
- Later: a next major release with new metadata for escaping and the .tpl removal
Configuration module status
ncm-metaconfig: support for many services
- No list available in the doc : look at the source tree…
- See how we can improve exposure of available modules
- Plan to extract annotation from schema
- moab: obsoletes ncm-maui and ncm-moab
- One missing feature: ability to run a service-specific validation command on the configuration built before restarting the daemon.
- See https://github.com/quattor/configuration-modules-core/issues/739.
Aquilon
Aquilon and template library for grid components
- No major problem foreseen
- Some work required to turn current personality in the template library into features
Aquilon release: no real progress
- RAL still working on a proper documentation to setup a new Aquilon box
- A package exists but is untested and difficult to test in an existing installation
Aquilon documentation
- Duplication installation documentation: repository README and quattor.org documentation
- Merge contents in documentation, remove it from repo README
- Ensure that the documentation is complete enough to get a working broken usable with the documentation ‘Using Aquilon’
Aquilon configuration development organized around sandboxes which are Git clone of the configuration
- Each sandbox is a branch in the broker Git repo: ability to rebase on production branch or push (publish) changes to production (
aq deploy
)- Git commands used to rebase a snapshot on something else (e.g. production)
- You can also deploy a node from a sandbox for testing (
aq manage --sandbox
) - Changes cannot be made directly into the production branch
aq show_sandbox --all
archetype: used to define disconnected subsets of the configuration, must provide a base
and a final
template
base
included at the very beginning of the configurationfinal
included at the end- Ability to share the template library between archetypes putting the template library in a separate Git repo
- Included in the load path
- Cannot be modified but templates in the template library can be replaced by one in the sandbox/main repo
Quattor version defined as a final
variable in one template: ensures that all nodes are running the same version
- New version testing: create a new snapshot, associate a few test nodes with the snapshot and deploy on them only
clusters: set of different machines sharing the same configuration
features: Aquilon looks for a template config.pan
in the template directory (can contains several directories)
- features are located in the
features/
directory - Order of feature inclusions is not guaranteed but 2 stages (pre and post) in the inclusion process
- Create a meta-feature that includes other features in the appropriate order if necessary and the 2 stages are not enough
- Including features in others features means that their usage cannot be tracked by Aquilon: normally they are added under /metadata in the profile
- Things included by features are generally placed in
common/
directory
service: defines relationship between hosts, for example DNS server to used, based on a “service map”
- A service map defines the criteria to select the host belonging to one location, for example the network
Hardware Templates
For network card, need a manufacturer part in the namespace as in for CPUs
- Agreement to do the necessary changes and cleanup the template names to be devices rather than driver names
In Aquilon, this information is improperly as vendor (–vendor): suggest deprecated this option and adding a –manufacturer option
FreeIPA support
Goal: replace SINDES
FreeIPA will bring Kerberos support.
- Uses LDAP as its db backend
- Provides a one-time password to allows initialization of keytab: no longer usable as soon as the machine has its keytab
- FreeIPA has its own Kerberos infrastructure/realm: to use interact with another Kerberos realm, the recommended approach is a bi-directional trust (machines are put in the FreeIPA realm)
Impact: CCM for profile download, ncm-download for file download, AII for initial configuration
- ncm-download already Kerberos-ready
- AII : already has a hook to initialize/register in IPA
- CCM : need to support the new authentication methods
Proposal: move the download logic to CAF::Download that could be used by all the components that need it (including CCM).
- Stijn working on it, no timeline yet for the full support but 16.4 should bring the basic support allowing FreeIPA initialization and downloading profiles from trusted Kerberized web server
- No support for FreeIPA vault in this initial version (no use of future CAF::Download)
- A PR is already open with preliminary code (not doing the actual download) for CAF::Download
Server-side: any Kerberized web server can act as a server for downloading profiles and files, no need to set up a IPA server (but this is also possible to do it)
- Need to document how to set up a properly configured server for serving profiles
- freeIPA server is required to do the secret management
Ceph support
Examples about to be added to template-library-examples repo (currently in [PR[(https://github.com/quattor/template-library-examples/pull/29))
- Part of these templates could become part of template-library-standard
- Support defining different pools corresponding to different crushmap buckets
- All disks on the machine except sda are considered OSDs
- Component currently don’t use ceph-disk but it may be a useful addition for disks dedicated to Ceph
- Disk properly initialized with the Ceph-related data (like LVM, MD…)
- Would allow to keep disk management outside Quattor: easier for Ceph admins
- Control by Quattor allows to use only a disk partition rather than a full disk for Ceph: useful when using SSD shared between OS and Ceph journal for example
ncm-ceph runs on only one node and connects to all Ceph nodes through ssh to do the appropriate actions
- Not optimal but no easy other way…
- May look at Jewell coming release to see new possibilities: ideally run the component on each Ceph node and do only what is relevant to the node
- In the short term, would be good to allow to disable some part of the configuration in the profile (for example configuration of a specific OSD)
- New ceph-prepare could also help, allowing to assign ID on OSDs rather than trying to retrieve the assigned ID
Cloud
OpenStack support
- Support for all core services available, including Cinder, Ceilometer
- Everything managed with ncm-metacopy but TT files delivered with filecopy: need to add unit tests before putting as a standard metaconfig service
- No specific OpenStack tree in the configuration
VM management with Quattor
- LAL way: a very basic image with a contextualization script that triggers Quattor installation and configuration
- No impact on Quattor except removing network information from the profile
- RAL doing a similar approach: goal is to hide the configuration system from the user provisioning the resource
- UGent: profile contains the information that the profile is a VM and aii-configure instantiate and configure the new VM with the profile
Schema extension for VMs
- Currently used by UGent to pass information to AII about some aspects of the VM configuration: networks to use, datastore, graphics…
- Proposal: /system/cloud defined as an open dict in the schema and binded to something more specific based on the cloud MW used
- Discuss if an abstract description common to all cloud MW make sense
Cloud resource provisioning: RAL integrated it into their cloud dashboard
- Interaction with Aquilon behind the scene
CAF
Proposal to add support for event tracking (CAF::History, See PR)
- Objective: makes components using CAF modules to be able to report about what was done by CAF
- For example track/report changed files
- Will also allow to remove files created by Quattor when a configuration module (or a metaconfig service) is removed, if desirable
- CAF::History allows to build a list of event that can be processed later
- ncm-ncd PR demonstrates the use of CAF::History to report all the touched files at the end of a component run
- Could imagine to flush the event list to a database at the end of a ncm-ncd run to keep track of modifications made by Quattor
- Nathan would prefer immediate reporting of the events rather than a deferred one (in case a component crashes before reporting occurred)
- Improve current implementation to track if an event has already been reported and makes possible to call the report method several times still reporting event once.
- A global property could be taken into account to decide when an event is added to the list whether immediate reporting should occur.
CAF::Check: a wrapper around LC:Check that allows proper reporting, suppresses raising exceptions and implements CAF::NoAction
- May help moving out of LC::Check at some point
Miscellanous issues
See discuss@workshop issues in GitHub
ncm-query: ccm
is a functional replacement and has a --format ncmquery
that is mostly compatible with ncm-query
- Pb: ncm-query format is undefined so difficult to assess compatibility
- Discussion about
--format ncmquery
: agreement to remove it as it creates a dependency on this undefined format in the future- In script, it is recommended to move to the json output of
ccm
- In script, it is recommended to move to the json output of
ncm-spma:
- PR open to move YUM repo definition to a non standard directory (/etc/yum.quattor.repo.d), defined as the default one in /etc/yum.conf, to avoid problems due to RPMs installing repo file that will interfere with Quattor deployment later
- yumng plugin: agreement to merge it as soon as MS has merged the last changes
- yum plugin: run
yum clean
for a repo about to be removed (else it will not be clean by later runs ofyum clean
)
push() imporperly works in for loop in some circumstances: see https://github.com/quattor/template-library-core/issues/104
- Agreement that the behavior exhibited in the issue is incorrect
- Test the PR against the template library and checks that it produces no difference (that nothing depends on the wrong behavior
- PR uses append internally
ncm-fstab/ncm-filesystems: duplicated code that cannot be easily removed (https://github.com/quattor/configuration-modules-core/pull/600)
- Proposal: incorporate ncm-filesystems features into ncm-fstab as an optional feature, disabled by default
- Agreement that something should be done, proposed approach reasonable but ncm-fstab may become quite misleading
- Need to ensure that the new component addresses all the corner cases in term of /etc/fstab management
RPMs more compliant with rpmlint (https://github.com/quattor/maven-tools/pull/78): will require a ChangeLog but agreement that the real value of the ChangeLog will be very limited
- One minimalistic option would be to put a link to the web site but is it adding any real value? Nobody considering that it adds any value, may just ship an empty one
Duplication between aii-dhcp (from aii-core) and aii-dhcp dhcp.pm module (not used anywhere): https://github.com/quattor/aii/issues/166
- Probably some uncompleted work: need more investigation
Template library PR testing: explore using Travis instead of Jenkins
- Check the test setup in template-library-core (pan linter)
Documentation
- Complete the development documentation
ReadtheDocs documentation
- latest contains the last version of configuration-modules-core, ccm and a few other things
- Need to add configuration-modules-grid
- schema parsed to retrieved type of properties and whether they are required or optional (may want to add the default value in the future)
- Attempt to integrate it into the release process for next release
- When this is done, update web site documentation to redirect to ReadTheDocs
Next release
16.4 planned end of April
- RC expected to start around the 25th