Quattor Workshop - Strasbourg - 12-14/10/2011
SINDES2 - V. Lefebure
Main new features (see presentations at previous workshops)
- Based on Krb rather than a private CA/certificates.
- Unique API for human clients and machines
Jan left CERN in July but is still doing some follow-up to help with the move to production at CERN
- 1st production cluster expected at Christmas
- Migration beginning of next year
Sources will be put in SF Quattor as soon as the code is considered ready for production.
Stijn: still using v1 at UGant.
- Same for Bruxelles grid site where it is mainly used for certificate delivery…
MS: not interested in PKI, already have a production Krb infrastructure
- Chose to add Krb encryption support in ncm-ccm and ncm-download
- Profile encrypted before sending it but stored plain text on the server
RAL: gave up on SINDES
- Will use ncm-download, with its new Krb support, as a replacement
Quattor Status at MS - N. Williams
MS is doing Quattor its own way… not using Quattor directly: Aquilon is a layer above Quattor
- Quattor used underneath
- Aquilon providing many advanced features not found in other config DB implementation: personalities, archetype, template sandboxes for development/testing without interfering with production, multiple panc version support…
- archetype implemented as a pan loadpath
- 20K real machines managed with Aquilon
- 20K virtual machines, including VMware ESX servers using Quattor Remote Deployer (QRD)
- 50 NetApp servers using QRD too
- Some Windows management : pre-configuration (host name; IP config…) passed to Windows team
- No plan to go really further: Windows team relies on its own tools
- Everything managed from the same server
- Reusing QWG machine type concept but extending it (personalities)
- Working on Hadoop configuration
Still wanting to open-source Aquilon and QRD but lack of time to do the code clean up.
Basically forked AII to fix the issues found.
Need to discuss how to streamline feeding MS contribution into the code mainstream.
Start an Aquilon section on Trac wiki…
Aquilon Appliance Walk-Through - N. Williams
Goal: enable an easier evaluation of Aquilon
- Installing Aquilon to evaluate it can be painful
- Can also be used to do development validation
- Allows importing an existing Quattor configuration
- Zero requirement on existing infrastructure
Based on Ubuntu Turnkey distribution (intended to create appliances) + sqlite
- Provides interconnection with a datawarehouse using cdb2sql
- Web-based management interface to help create/manage machines
- Written in Pylons
- ssh connection to appliance where Aquilon command line interface can be used
Several modes implemented using URLs
/reset: resets the Aquilon configuration and reinitializes it
- 11 steps before installing the 1st machine : add an archetype, personality, os, define network parameters…
- Machine management
- 3 steps per machine: select machine HW, machine network config (including algorithm to allocate IP address), host associating a machine, an archetype, a personality, an OS version, build status…
Aquilon privilege management: 6 possible roles
Last version of the appliance not yet on SF… but soon!
QRC / QRD - N. Williams
Goal: manage non-Linux machines like appliances, ESX clusters…
- QRC = Quattor Remote Configure (equivalent of
- QRD = Quattor Remote Dispatcher (equivalent of
Idea is to have a machine that acts as a proxy and listens for changes in profiles to implement them on a remote device using a management API for the device.
- Use components in a namespace specific to the device instead of NCM::Component
- QRD is the complex part in charge of doing profile change analysis: support failover between 2 QRD servers
- QRD synchronization rewritten using Apache Zookeeper
QRD listens to CDB notifications rather than CDP: the notification is sent whenever there is a profile change, no matter which profile was modified.
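A minimal sketch of the proxy idea described above, not the actual QRD code. It assumes that a notification does not say which profile changed, so the proxy compares a digest of every profile with the last one it saw and calls a device-specific connector only for the profiles that differ. `ProfileWatcher`, `on_notification` and the connector callables are hypothetical names used for illustration.

```python
import hashlib


class ProfileWatcher:
    """Hypothetical helper: work out which profiles changed after a generic notification."""

    def __init__(self):
        self._seen = {}  # profile name -> digest of the content last seen

    def changed_profiles(self, profiles):
        """Return the names of profiles whose content differs from the previous call."""
        changed = []
        for name, content in sorted(profiles.items()):
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self._seen.get(name) != digest:
                changed.append(name)
                self._seen[name] = digest
        return changed


def on_notification(watcher, profiles, connectors):
    """Push every changed profile through the connector registered for its device.

    Profiles without a registered connector are skipped.
    """
    applied = []
    for name in watcher.changed_profiles(profiles):
        connector = connectors.get(name)
        if connector is not None:
            connector(profiles[name])  # e.g. a call to the device management API
            applied.append(name)
    return applied
```

In a real deployment the connector would talk to the management API of the device (ESX cluster, NetApp server…); here it is just a callable receiving the profile text.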
QRC in a Git repository on SF.
- QRD currently being rewritten in Python, not yet in SF.
MS currently managing mainly ESX clusters with QRD/QRC. Planning to manage network switches.
We may discuss in the future the possibility of a common schema for some appliances/devices and a library of connectors for widely used devices like CISCO switches…
Pan Compiler Update - C. Loomis
v9 is the actively developed version
- Currently 8.4.7 with deprecated features removed
- Main change is new options to fix annotation issues with namespaces
- Documentation reorganized: just one book merging previous documentations
- Simplified, streamlined code, no new feature (yet?)
- Migration to clojure: improved support for multithreading
- Is clojure licence (EPL 1.0, Eclipse) an issue?
Migration v8 to v9: use 8.4.7 with deprecated feature warnings to identify problems
- Use relevant option to turn them into fatal errors
v9 release candidate available but not yet in SF (soon)
- Will be tagged as a release as soon as there is some feedback
QWG Templates Update - M. Jouvin
StratusLab templates: not really part of the QWG templates (yet) but feedback from the community is welcome.
Monitoring Templates - R. Starink
Work in progress behind the scenes…
Documentation available on the wiki.
UGant is using icinga and started with a previous version of Nagios templates: another fork…
- Need to see if we can avoid the cost of another future merge…
- icinga, as a Nagios fork, has a very similar configuration: check if there is a need for a new configuration component
- Need to identify the use cases that cannot be implemented by current QWG templates
Discuss on quattor-devel
See [/wiki/Meetings/Workshops/20110316#DevelopmentProcessDiscussion Michel’s presentation] at last workshop: basically the same questions after almost 1 year of experience with SCRUM.
Scrum / standup meeting
- Agreement this is rather positive
- Weekly meeting should happen every week, whether Michel is available or not
- Weekly meeting should be on time: let’s move it to 2:15 pm to increase the chance to start immediately
- Meetings should remain short for everybody, not only for the late comers…!!!
- Add EVO connection into the reminder
- Have more formal planning/review of sprints and advertise sprint outcome on the mailing list
- Let’s try 1.5-month sprints
People working on some developments need to let others know through the quattor-devel list or participation in standup meetings.
- Ask the general mailing list about the interest of the community
- If yes, propose to have a larger meeting for the sprint review meeting
- Plan them in advance and add an event in Quattor Indico area (and reference it on the wiki)
- Send a reminder 2 weeks in advance
- Difficult to use with SVN + old build tools: need to put some effort to make progress on the migration to new build tools and Git
- Not really a problem for QWG where trunk is really for work in progress
- As discussed at the last workshop, good candidates for migration are CCM-related things (ccm, ncm-cdispd, ncm-ncd, QuattorFS), AII and its plugins
- Encourage people who are contributing major changes to make a branch, as it has been done for
- Use stand-up/review meetings to decide merging into trunk
=== Kickstart / AII === #AII
Disk partitioning: current scheme has advantages (full control) but doesn’t allow to format large boot disks (> 2 TB)
- Support for GPT partitions: need to check if Anaconda supports it
- Using standard Anaconda partitioning features would help taking advantage of new features, but means accepting to give up some control
- Need to check that reinstallation works in all configurations
Kickstart configuration is done by an AII plugin: could if necessary replace the plugin or add a hook
- NIKHEF already ignores pre-script and produces the kickstart instructions required to use standard Anaconda formatting features
- NIKHEF has written a hook disklayout that could be used as an alternative to aii-ks call to lib-blockdevices
- In fact lib-blockdevices already generates (almost) the required information for Anaconda, just with
Need to add, as a standard AII function, a function that adds a new hook regardless of the hooks already defined
- Such a helper function already exists in the hook for SINDES
aii-ks: really fix problems mentioned in ticket #235 (See
aii-pxelinux: see if it is possible to specify the MAC address to use rather than the interface name
- From http://wiki.centos.org/TipsAndTricks/KickStart:
A third method works if you are doing PXE based installations. Then you add IPAPPEND 2 to the PXE configuration file and use ksdevice=bootif. In this case anaconda will use the interface that did the PXE boot (this does not necessarily needs to be the first one with a active link).
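A minimal sketch of the method quoted above, assuming an Ethernet interface: with `IPAPPEND 2`, pxelinux appends `BOOTIF=01-xx-xx-xx-xx-xx-xx` (hardware type `01`, then the MAC address in lowercase hex separated by dashes) to the kernel command line, and `ksdevice=bootif` tells Anaconda to use that interface. The function names, file names and URL are hypothetical; this is not what aii-pxelinux currently generates.

```python
def bootif_value(mac):
    """Return the BOOTIF value pxelinux appends for an Ethernet MAC address."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError("not a MAC address: %r" % mac)
    int("".join(octets), 16)  # raises ValueError if a digit is not hexadecimal
    return "01-" + "-".join(octets)


def pxelinux_entry(label, kernel, initrd, ks_url):
    """Build a pxelinux entry that installs over the interface used for the PXE boot."""
    lines = [
        "default %s" % label,
        "label %s" % label,
        "  kernel %s" % kernel,
        "  append initrd=%s ks=%s ksdevice=bootif" % (initrd, ks_url),
        "  ipappend 2",
    ]
    return "\n".join(lines) + "\n"
```

The same `01-…` string is also the name pxelinux looks for under `pxelinux.cfg/` when searching for a per-host configuration file, so one helper can serve both purposes.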
=== SL/RHEL6 Support === #RHEL6
MAC address binding to interfaces: now done by udev rather than the interface config file.
- May be a source of problems as udev config is done by Kickstart before interface config files are configured by Quattor, with some risk of inconsistency
- 1 possibility to explore: use non standard name in Quattor config
- Check impact on apps and operations
On compute nodes running a RHEL6 derivative, kernel option (nohz=off) in Grub config to get better performance
- Scheduler option to make it more responsive on desktops
- Define by default for RHEL6 derivatives?
=== SPMA vs. YUM === #YUM
See Steve Traylen’s https://twiki.cern.ch/twiki/bin/view/Main/SteveTraylen/Spma2Yum.
- All SPMA features can now be implemented with YUM
- Main drawback is the cost of the migration…
Several open questions
- ncm-yum requires some extensions not handled by anybody
- What is the right level of detail in package list: proposed use of repository snapshots will probably not work across sites to implement things like OS errata
- We need to implement some kind of metadependency for a packager: something like a component alias name implemented in
- This may replace the need for SPMA as an abstract layer to the packager, something that doesn’t work properly as the package description is quite tightly coupled with the actual packager, making a common schema difficult.
=== Change Scheduling === #ChangeScheduling
MS use cases
- spma not allowed to run on a running host, can only run at reboot
- ncm-spma runs but with the flag to run spma disabled, to avoid preventing other components from running
- Some actions cannot be scheduled without reinstalling: would be great to let the user know that his changes will not be applied immediately
- May be handled at Aquilon (SCDB?) level: will require a partial diff of XML profiles: need to find a tool to do this
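A minimal sketch of what a partial diff of two XML profiles could look like, using only the Python standard library. It assumes a profile layout where named elements carry a `name` attribute and list items are unnamed (they are keyed by position here); `flatten` and `partial_diff` are hypothetical names, not an existing Quattor or Aquilon tool.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Map every leaf of a profile to a path -> text value dictionary."""
    result = {}

    def walk(elem, path):
        children = list(elem)
        if not children:
            result[path] = (elem.text or "").strip()
        for index, child in enumerate(children):
            # Named children are keyed by their name, list items by their position.
            key = child.get("name", str(index))
            walk(child, path + "/" + key)

    walk(ET.fromstring(xml_text), "")
    return result


def partial_diff(old_xml, new_xml, prefix="/"):
    """Return {path: (old, new)} for the leaves under prefix whose value changed."""
    old, new = flatten(old_xml), flatten(new_xml)
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path.startswith(prefix) and old.get(path) != new.get(path):
            changes[path] = (old.get(path), new.get(path))
    return changes
```

Restricting the diff to a prefix such as `/software` is one way to tell a user that a pending change touches a subtree that is only applied at reboot or reinstallation.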
Partial configuration changes look difficult as Quattor is really designed to enforce consistency on an “all or nothing” basis.
The most important feature would be the implementation of changes at controlled time.
For SPMA, it’d be nice to establish a black list of packages that should not be upgraded except at boot time.
ncm-ncd: add a mechanism to blacklist some components if some conditions are not met.
List of possible conditions/states (local time, boot context…) should be well defined and a library/common method should be available for all components to evaluate the state/condition the same way.
- Exact actions done if a condition is met will remain the responsibility of the components or
Would be useful to let a component advertise that a change will require a reboot
- Could be implemented through the status file produced when the component is run
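A minimal sketch of the common condition library discussed above, under the assumption that each condition is a function of a shared state (local time, boot context…) and that a component is skipped for the current run when one of its conditions is not met. All names (`in_time_window`, `at_boot`, `runnable_components`, the state keys) are hypothetical; nothing like this exists in ncm-ncd yet.

```python
import datetime


def in_time_window(start_hour, end_hour):
    """Condition factory: true when the local hour is in [start_hour, end_hour)."""
    def check(state):
        return start_hour <= state["now"].hour < end_hour
    return check


def at_boot(state):
    """Condition: true when the run happens in the boot context."""
    return bool(state.get("boot_context", False))


def runnable_components(components, conditions, state):
    """Return the components whose conditions all hold; the others are blacklisted."""
    allowed = []
    for name in components:
        checks = conditions.get(name, [])
        if all(check(state) for check in checks):
            allowed.append(name)
    return allowed
```

Because every component evaluates the same state through the same functions, two components cannot disagree on whether the host is "at boot" or "inside the maintenance window".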
=== Network Configuration === #NetworkConfig
MS main requirements
- policy routing
- ethertools parameters
Some patches contributed for current ncm-network to handle this but would be good to rethink the schema
- In particular the component configuration should be moved to /software/components and most of the things currently under /system/network should be moved there to allow an easier support of appliances and non Linux boxes
- In particular most things currently under
- Some children of interfaces in the configuration should probably be at the same level (bonds, vlans…) in the same way we have blockdevices and filesystems
- Gabor will initiate the discussion on the mailing list with a proposal
ncm-network rewrite: a basic version is available in SVN, ready for
- Try to deploy on test machines after initial code review
=== QWG Global Variables === #QWGVars
Hot discussion about use of global variables in QWG
- Contents unvalidated
- Request to put them in the profile to get them validated
- Michel strongly opposed as the state value at one point in time may be misleading: what is important is the value in the context of a given template, something that cannot be tracked in the XML profile
- Side discussion: filecopy unvalidated contents
Until we have a pan debugger (no concrete project anymore), some possibilities:
- Ensure there are appropriate debug statements in the templates
- Consider as bugs templates using a variable without any content validation (when it makes sense) and fix them!
- See with Cal if a dump_global_variables() function would make sense in Pan to dump all global variables at one point in time
- If only a subset of the variables is needed, can probably be done
- dump_global_variables() could accept as an argument a regexp matching the variable names, to restrict the dump to the variables used in the template
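A minimal sketch, in Python rather than Pan, of the behaviour such a function could have: dump every global variable, or only those whose name matches a regexp. It only illustrates the proposal; `dump_global_variables()` is not an existing Pan built-in and the variable names in the usage note are hypothetical.

```python
import re


def dump_global_variables(variables, pattern=None):
    """Return 'NAME = value' lines for all variables, or only those matching pattern."""
    regexp = re.compile(pattern) if pattern else None
    lines = []
    for name in sorted(variables):
        if regexp is None or regexp.search(name):
            lines.append("%s = %r" % (name, variables[name]))
    return lines
```

For example, calling it with the pattern `^NFS_` on a set of variables would list only the variables of a hypothetical `NFS_` namespace, which keeps the output readable when a template uses a handful of variables out of several hundred.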
Discuss with Cal the possibility to display, with a Pan option, the name of the template entered and whether it is executed or ignored because it
Document the global variable “namespaces” used in QWG variables.
=== Configuration Component Logging === #CompLogging
Document/agree on what is the expected level of default verbosity for a component (log/info) and what is the debug level to use for what.
=== Writing Components in Python === #CompPython
ncm-ncd should fork a process to run a component rather than running the component inside ncm-ncd: not a very big change…
- Implication on locking?
- CCM API in Python: almost done by MS as part of Quattor FS
- No locking implemented yet to access the profile
- Python version of CAF to ensure consistency of some low-level operations: more work…
- Also the idea of using a standard library to access config files and ensure they use a consistent format
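A minimal sketch of the "one process per component" idea, written in Python for illustration only (ncm-ncd itself is not written in Python). It assumes a component can be started from a command line and reports success through its exit code; `run_component`, `python_component_command` and the timeout value are hypothetical.

```python
import subprocess
import sys


def python_component_command(script_path, profile_path):
    """Hypothetical command line for a component written in Python."""
    return [sys.executable, script_path, profile_path]


def run_component(command, timeout=600):
    """Run one component in a child process and report its outcome."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "returncode": None, "output": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "output": proc.stdout,
    }
```

Running the component in its own process means the dispatcher does not care which language the component is written in, and a crash in one component cannot take the dispatcher down. The locking question raised above is left open in this sketch.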
Add to future sprints a proof of concept development and attempt to
address the required changes in
Sprint 3 Review
See [/wiki/Development/Scrum/Sprint-2011-03 sprint backlog].
New sprint is [/wiki/Development/Scrum/Sprint-2011-04 sprint 2011-04], due December 1st.