[heroes] Offsite Meeting Minutes 2016-12-02 to 2016-12-04

4 Dec 2016

What
====
openSUSE Heroes offsite team meeting minutes

Where
=====
At the SUSE headquarters

When
====
Friday, 2016-12-02 until Sunday, 2016-12-04

Who
===
* All time
  * Daniel Maslowski <info@orangecms.org>
  * Sarah Julia Kriesch <sarah-julia.kriesch@gmx.de>
  * Christian Boltz <opensuse@cboltz.de>
  * Theo Chatzimichos <tampakrap@opensuse.org>
  * Gerhard Schlotter <gschlotter@suse.de>
  * Markus Rückert <mrueckert@suse.de>
  * Lars Vogdt <lrupp@suse.de>
  * Christian Müller <cmueller@suse.de>
  * Thorsten Bro <tbro@suse.de>
* Temporary
  * Christoph Wickert <cwickert@suse.de> (only Friday 11:30)
  * Richard Brown <rbrown@suse.de> (only Sunday)

Topics
======
 * SaltStack training  => Friday
 * SUSE Cloud training => Saturday
 * Packaging workshop  => Skipped
 * Ticket wrangling
 * Securing our infrastructure
 * Documenting our infrastructure
 * admin / infrastructure policy
 * contact persons and their responsibilities (service map)
 * list of machines
 * Mailing lists, their setup and policies
 * Kiwi images
 * build.opensuse.org => infrastructure project
 * Sponsoring update
 * Mirrors and the mirror infrastructure
 * external services => giving them to the right (external) people ?
 * gitlab => waits for freeIPA
 * tasks to take with you (home work :-)  => please update your personal pages
   in the wiki and on connect.o.o
 * monitoring   => done, see below
 * log analysis => discuss deeper during next meeting
 * mediawiki
 * we need to check with Klaas if he wants to maintain hermes AI: tbro
 * key management and access control => see FreeIPA
 * next meeting => regular meeting dates/time
 * key signing party
 * internal key signing party within all the heroes
 * DNSSEC and more use of IPv6 => next meeting (CloudFlare as starting point)
 * reinstall all openSUSE machines every 6 months from scratch?
 * connect.opensuse.org => we have been blocked by the board for more than 12
   months now.  What to do with that?

Agenda
======
1. Introduction round
2. Server room tour
3. SaltStack training
 * meanwhile lrupp updated the progress opensuse-admin wiki page a lot
4. Ticket wrangling
 * AI Theo: Migrate the documentation from the redmine wiki to the public one
 * AI Theo: The opensuse-admin wiki will be migrated to a private subproject
   (right now it is public and contains sensitive data)
 * AI Theo: The provo project will be moved to opensuse-admin to proper
   categories (after confirmation (or no objections) from the heroes ml)
 * See these to get more information:
   * https://en.wikipedia.org/wiki/Comparison_of_issue-tracking_systems
   * https://en.wikipedia.org/wiki/Comparison_of_help_desk_issue_tracking_softwar...
 * Lars and Darix will provide testing instances of:
   * Darix: Zammad is recommended => https://zammad.com/
   * Lars: Request Tracker => https://bestpractical.com/request-tracker
   * cmueller OTRS => https://www.otrs.com/
   * tbro BRIMIR => https://getbrimir.com/
5. Documenting our infrastructure
 * admin / infrastructure policy: https://en.opensuse.org/openSUSE:Infrastructure_policy
 * We are going for a different root password per host
 * Default user authentication method would be through user ssh key via normal
   user (via FreeIPA)
 * We are going to use password-store as the tool to store the passwords
 * We need to implement a wiki page with Machinelist and appropriate people per
   service (the page needs to be moved from the internal SUSE redmine wiki)
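
 A minimal sketch of the "different root password per host" idea, assuming
 random passwords are generated once per host and then stored with
 password-store (for example via its `pass insert` command). The host names,
 the password length and the "root/<host>" entry naming are hypothetical and
 were not decided in the meeting.

```python
import secrets
import string

# Characters used for generated passwords (an assumption, not a policy).
ALPHABET = string.ascii_letters + string.digits


def generate_root_passwords(hosts, length=32):
    """Return a dict mapping each host name to its own random password."""
    return {
        host: "".join(secrets.choice(ALPHABET) for _ in range(length))
        for host in hosts
    }


def pass_entry_name(host, prefix="root"):
    """Hypothetical naming scheme for the password-store entry of a host."""
    return f"{prefix}/{host}"


if __name__ == "__main__":
    hosts = ["host1.example.org", "host2.example.org"]  # placeholder names
    for host, password in generate_root_passwords(hosts).items():
        # The password itself would go into password-store, so only the
        # entry name and the password length are printed here.
        print(pass_entry_name(host), len(password))
```

 The `secrets` module is used instead of `random` because it is meant for
 security-sensitive values.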
6. Contact persons and responsibilities
 * Instead of trying to provide a static list of services at
   https://en.opensuse.org/index.php?title=openSUSE:Services_help we will
   provide an automatic approach by listing services that are in the monitoring
   setup on http://status.opensuse.org/ and point people to it. This will
   answer the question about the list of machines for our customers
 * Our team page was improved: https://en.opensuse.org/openSUSE:Heroes
   (openSUSE:infrastructure and openSUSE:Services_team redirect there). We
   removed deprecated info, updated the rest accordingly, and added video
   presentations from past oSC.
7. Who is still using static.opensuse.org
8. Sponsoring update
 * There is an offer of a free CDN for opensuse.org <http://opensuse.org/> by
   the UK-based IT company CDN77 - we should try it out and send them feedback.
   Contact person is Oskar Gottlieb <oskar.gottlieb@cdn77.com> AI: darix, theo
   to do some testing
 * Currently in contact with some external companies who want to sponsor
   hardware for the build service. Our problem here: we need uniform hardware to
   make administration easier. We can provide some specs of our requirements
   and they want to buy the hardware for us. Our main problem might be that
   there is still no foundation, so companies/people can not get a donation
   receipt back from us: they really spend the money/hardware without getting
   anything back from us.
   * The build service always needs build power aka machines that act as build
     workers
   * we might also think about some test systems that can be used for testing
     service deployments (that can be deployed in the openSUSE Cloud later)
   * openQA got some new hardware recently, but is also always searching for
     more "workers"
   * new switches or other small infrastructure hardware might also be an option
   * additional storage is also a good idea, but this starts around 7k EUR and
     has an open end. Problem here: we like to get compatible hardware that
     might be used as JBOD for example
   * As a future idea: ask our local vendors for a "donation pool", which
     includes some pre-configured machines that can be paid by sponsors
   * AI: SUSE-IT team (especially cmueller and tbro) to provide a list of
     hardware that is planned for openSUSE as a wishlist
   * AI: cmueller to join the donation@o.o list to answer quickly on questions
     around hardware sponsoring
9. Packages and Kiwi images
 * We have a specific project called openSUSE:Infrastructure on
   build.opensuse.org, which contains packages that are not included in the
   base system. The packages should be linkpac'd from Factory if applicable,
   otherwise from the devel repo, with the revision locked.
 * In a sub-project there we build special JeOS images that can be used in the
   openSUSE cluster in NUE at the moment. Theo is preparing new images for the
   openSUSE Cloud in Provo. In case of doubt: this will be 42.2 images. Notes:
   * we need to use a newer kiwi than the one that is shipped with 42.2, that
     is why we added the Virtualization:Appliances:Builder project
   * the boot code did not change from 42.1 to 42.2 - so we use
     oemboot/suse-leap42.1 as boot code
   * the image is very minimal - it only contains a base system (so you can
     run zypper), network tools and a salt-minion. It is missing cloud
     specific packages/tools, which need to be added for the openSUSE Cloud
   * Provisioning is not automated: this means that the Salt key is not
     automatically integrated/accepted. So this will stay manual work for now
     unless someone wants to help Theo with it. As Cloud has some automatism
     in this area ("cloud init"), this might be an easy step - but needs
     someone to work on it. cmueller will join Theo to get this fixed.
10. Infrastructure repository
 * The openSUSE:Infrastructure repository is used as "production" repository.
   So there is no room for testing packages inside.
 * If you want to test packages or prepare an update, please do this in any
   other repository and submit (or update the linkref of) the package in the
   Infrastructure repository once you are done
 * For testing and production, we might use the Salt feature to lock a package
   on a specific version.
 * If you have more than 2 source packages for a machine/service that need to
   go in the Infrastructure repository, use a sub-project for this. The name
   of the subproject AND the description should note and describe the use case
   of the packages in this repository (include the DNS entry of the machine in
   case of doubt).
 * Please consider moving/integrating your packages into the main distribution
   to make our life easier and to keep the infrastructure repository small.
 * If there is a need to update packages that are in the official openSUSE
   repository, we will NOT put the updated version in the core/main
   Infrastructure project. Instead: these packages need to end up in a
   separate sub-project below the Infrastructure project - so only the
   machines that need those separate packages can add the repository.
 * The Heroes team will review the packages in the Infrastructure repository
   (including sub-projects) every 6 months and clean up (including a check for
   orphaned packages on the machines).
 * TODO: enhance the current policy with the things listed above
 * A lot of effort is spent these days by Theo to move existing VMs to Leap
   and to move packages to the openSUSE:infrastructure repository.
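
 The idea of locking a package on a specific version with Salt could look
 roughly like the sketch below. It is written for Salt's "py" renderer, where
 a `run()` function returns the state data, and uses the `pkg.installed`
 state with its `version` and `hold` arguments. The package name and version
 are hypothetical, and whether `hold` works depends on the Salt version and
 the package provider in use, so treat this as an illustration only.

```python
#!py
# Sketch of a Salt state file for the "py" renderer (assumed setup).

# Hypothetical package name and version; replace with real values.
PINNED_PACKAGES = {"example-package": "1.2.3-1.1"}


def run():
    """Return state data installing each package at its pinned version."""
    states = {}
    for name, version in sorted(PINNED_PACKAGES.items()):
        states[f"pin_{name}"] = {
            "pkg.installed": [
                {"name": name},
                {"version": version},
                # Ask the package manager to keep this version in place.
                {"hold": True},
            ]
        }
    return states
```

 Keeping the pinned versions in one dictionary makes it easy to see which
 packages deviate from what the repository would install by default.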
11. FreeIPA demonstration by darix
 * the plan is to implement freeIPA at least for the internal DNS and account
   management
 * administrators will get access to a machine based on LDAP groups (and
   kerberos tickets), which makes it easier to track who is who and who gets
   access to which machine
 * As the FreeIPA account will not be connected to the general authentication
   mechanism we use for openSUSE, only openSUSE Administrators will get a
   freeIPA account.
 * Administrators will be able to access the FreeIPA instances via OpenVPN (we
   need policy regarding access)
 * gitlab.opensuse.org (and probably also other services provided just for
   administrators) will also switch over from local authentication or from
   the openSUSE (bugzilla) account to freeIPA
 * The openSUSE (bugzilla) and the FreeIPA usernames will need to match - this
   should become a policy ;-)
 * there will be two freeIPA instances for high availability and access reasons:
   * one instance will run in Nuremberg on the normal cluster
   * one instance will run on an Admin node in Provo (outside the Cloud, to be
     able to reach the Cloud DMZ network)
   * these instances will be connected to each other through an OpenVPN tunnel
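
 The group-based access model and the "usernames need to match" rule can be
 illustrated with a small sketch. This is plain Python and does not talk to
 FreeIPA or LDAP; the user, group and machine names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Admin:
    freeipa_user: str
    bugzilla_user: str
    groups: frozenset = field(default_factory=frozenset)


def usernames_match(admin):
    """Check the proposed policy: both usernames must be identical."""
    return admin.freeipa_user == admin.bugzilla_user


def may_access(admin, machine, machine_groups):
    """Grant access if the admin is in at least one group of the machine."""
    allowed = machine_groups.get(machine, frozenset())
    return bool(admin.groups & allowed)


if __name__ == "__main__":
    # Hypothetical mapping of machines to the groups allowed to log in.
    machine_groups = {"wiki.example.org": frozenset({"wiki-admins"})}
    alice = Admin("alice", "alice", frozenset({"wiki-admins"}))
    print(usernames_match(alice))
    print(may_access(alice, "wiki.example.org", machine_groups))
```

 In a real setup the group membership would come from FreeIPA rather than
 from a dictionary in the code.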
12. Cloud training by cmueller
 * Cloud tries to hide its functionality behind some fancy names to keep
   normal admins from understanding what is going on. But we are so smooth
   that we understand both sides after this training. :-)
 * Hardware nodes have different names and functionality:
   * Crowbar    => the admin node, used only for managing bare metal stuff
   * Controller => they run services that control the cloud service itself
   * Network    => special controller nodes, that have an additional network
     functionality to provide routing functionality for the virtual machines
     (the "instances" in cloud speech)
   * Storage    => providing only raw block devices that can be used by
     instances
   * Compute    => hosts that really run the virtual machines ("instances" - as
     said above)
   * Instances  => that's what you really want: the virtual machines that
     provide your services
 * The openSUSE Cloud in Provo consists of the following bare metal machines:
   * 1x Admin node => will run the following VMs:
     * Crowbar (also running a Chef server to manage the other Cloud machines)
     * openVPN for Cloud Admins
     * FreeIPA, including an openVPN setup for Heroes (so they can access
       their VMs)
     * Logging server
     * Monitoring server
     * SMT server
   * 2x Controller nodes
     * will become an HA cluster running all needed cloud services
   * 3x Compute nodes
     * will also be network controllers
   * 2x 10Gb switches
   * 1x 1Gb switch
 * [[Place the Image with the overview here]]
 * Network setup:
   * fixed/intern network : this network is statically linked to an instance;
     all instances inside this network can "talk" to each other
   * float network : this is our "provider" network (the one with external
     IPs). No Cloud instance has direct access to an IP of this network - only
     via NAT on the network service side
   * transport/SDN network : we use this network for inter-compute
     communication between our instances. If an instance on compute node 1
     wants to "talk" to an instance on compute node 2, they will use this
     network. This network allows more fine tuning (hence the name SDN),
     but for the moment, this is not used.
   * DMZ : the DMZ network provides a web interface for "customers" who want
     to manage their machines (start/stop/reset, get console access, setup new
     machines). We will not expose this DMZ network to the outside, but instead
     provide a dedicated VPN server that allows the openSUSE Heroes to connect
     to it to reach the WebUI (which should not be needed very often).
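
 To illustrate the relation between the fixed and the float network: an
 instance only owns an address in the fixed network, and an external address
 reaches it via a NAT mapping on the network service side. The sketch below
 models just that mapping. The address ranges are placeholders (the floating
 range is a documentation range), not the real openSUSE Cloud networks.

```python
import ipaddress

# Placeholder ranges, chosen only for illustration.
FIXED_NET = ipaddress.ip_network("10.0.0.0/24")
FLOATING_NET = ipaddress.ip_network("203.0.113.0/24")


class NatTable:
    """Maps floating (external) addresses to fixed (internal) addresses."""

    def __init__(self):
        self._floating_to_fixed = {}

    def associate(self, floating, fixed):
        floating_ip = ipaddress.ip_address(floating)
        fixed_ip = ipaddress.ip_address(fixed)
        if floating_ip not in FLOATING_NET:
            raise ValueError(f"{floating_ip} is not in the floating network")
        if fixed_ip not in FIXED_NET:
            raise ValueError(f"{fixed_ip} is not in the fixed network")
        self._floating_to_fixed[floating_ip] = fixed_ip

    def translate(self, floating):
        """Return the fixed address behind a floating one, or None."""
        return self._floating_to_fixed.get(ipaddress.ip_address(floating))


if __name__ == "__main__":
    nat = NatTable()
    nat.associate("203.0.113.10", "10.0.0.5")
    print(nat.translate("203.0.113.10"))
```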
13. Salt topology
 * We will set up separate Syndics in each location, and a Master of Masters in
   the near future in Nuremberg
 * The codebase will be the same for all locations (the same git repository
   will be used for both the states and pillars)
 * HA setup is currently secondary
 * The salt masters will use the same OpenVPN as FreeIPA to communicate with
   each other
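
 As a rough sketch of the planned topology: in Salt, the Master of Masters
 sets `order_masters: True`, and each Syndic's master configuration points to
 it via `syndic_master`. The helper below only generates these configuration
 snippets as text. The host name is a placeholder, and the real setup will
 need more options than shown here.

```python
def master_of_masters_config():
    """Config fragment for the Master of Masters."""
    return {"order_masters": True}


def syndic_config(master_of_masters):
    """Config fragment for the master running a Syndic in one location."""
    return {"syndic_master": master_of_masters}


def render(config):
    """Render a flat config dict as simple 'key: value' lines."""
    lines = [f"{key}: {value}" for key, value in sorted(config.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    mom = "mom.example.org"  # placeholder for the Master of Masters
    print("# Master of Masters")
    print(render(master_of_masters_config()))
    for location in ("Nuremberg", "Provo"):
        print(f"# Syndic in {location}")
        print(render(syndic_config(mom)))
```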
14. Monitoring
 * there will be a non-public monitoring installation in the future
 * we will forward community people to http://status.opensuse.org/ to get the
   "user overview" of our services
 * we need to find a solution for the instances in NUE and PRV: they are
   running in internal, not-routed, firewalled networks - but we want to
   provide one unique WebUI for our Admins to get "the big picture" at once.
   => to be checked/implemented by the monitoring admins
15. Mirrors and the mirror infrastructure
 * In general, we have the following machines behind "download.opensuse.org":
   * mirrordb{3,4} => a PostgreSQL cluster containing the database (85GB size)
   * pontifex3 => the VM behind download.opensuse.org, using mirrorbrain (and
     providing a lot of other vhosts, btw)
   * scanner-opensuse => a VM that is constantly scanning the external mirror
     servers. Currently this VM is inside the SUSE network and should be moved
     to the external cluster. Problem might be, that the external cluster uses
     another IP address, so scans might fail on some mirrors who only allow the
     current IP address. But this is fixable.
 * https://github.com/openSUSE/mirrorpinky should become a WebUI for
   mirrorbrain - but the current status is "sleeping" (no time to work on it).
   If you find someone who wants to help out, ping us ;-)
 * AI Theo: We need to set up a virtual machine as the mirrorbrain management
   machine, so that community people can also do mirror administration
 * The same machine will serve as a secondary scanner machine (the main one
   being behind the SUSE network now). Mirror admins will be notified of the
   new IP, and after six months we will shut down the old scanner.
 * Mirror related tickets need some love
 * The mirror documentation might need updating as well; it would be nice if
   our mirror experts could go through it.
   * https://en.opensuse.org/openSUSE:Mirror_infrastructure for mirror admins
   * https://progress.opensuse.org/projects/opensuse-admin for mirrorbrain admins
16. Discussion with rbrown (openSUSE Board)
 * progress.opensuse.org => openSUSE Heroes will become Admins on the redmine
   instance there to be able to help with the general administration
 * Sponsoring => the board will redirect sponsoring offers around hardware or
   mirroring to donations@opensuse.org, where our admins can take over. Please
   note that we want to "loan" the hardware and not get it completely handed
   over to us.
 * connect.opensuse.org => cleanup of the openSUSE members should be done by
   the new board election early next year. So we can start with the Email
   migration to Heinlein once this is finished.
   * What to do with the application itself? We currently have no maintainer
     for the application - but on the other hand the version in use is very
     old and needs to be updated. As this tool contains the user database
     (especially the membership information), it becomes a critical part of the
     infrastructure.
   * As the maintainer is not acting on it any more, we need to find a
     solution. We need a meeting between the openSUSE board, the admin and some
     members of the Heroes.
 * freeIPA => JFYI: for the administration of the openSUSE infrastructure, the
   Heroes will introduce and use FreeIPA for managing a couple of things that
   are currently distributed in terms of management (User accounts and access
   restrictions, certificates and probably also DNS).
 * What about separating openSUSE user accounts from the current Microfocus / 
   Novell / SUSE authentication mechanism? A benefit would be that we might see
   more users/contributors as it would become more clear to them who is getting
   the data (requested once you create an account) and who is using it. That is
   currently a major point why this question here comes up over and over again.
   We see problems with tools like bugzilla that should be able to use other
   authentication systems than the current ones in that case. There is a high
   risk of duplicated log in data, if other authentication systems come into
   play. We might hit other problems in the authentication area and it will
   take probably a long time to get this solved. But if we can not get any
   better solution, we might focus on this.
 * When do we get a foundation? A major benefit from our point was the
   sponsoring, that would be possible with a foundation. This is now partly
   solved (see above). The board itself was blocked during the year with all
   the technical changes (Tumbleweed, Leap, etc.) inside the distribution and
   does not see this question as critical (that's a main reason why this was
   not driven with high priority). Other arguments are legal and generic
   problems coming with such a foundation - which will introduce a lot of
   additional organization overhead. In the end: the main reason behind this
   question (easy sponsoring) is somehow solved meanwhile, so we hopefully only
   need to do a good marketing about "how to sponsor openSUSE" now and can
   finally get rid of the question....
17. opensuse.org services running on non-infra managed machines
 * planet and paste run on non-infra managed machines, which violates our
   policy. Furthermore, people ask us to fix tickets on services we can't
   reach.
   * planet.o.o needs a meeting to discuss the current legal issues that might
     be there (or not anymore).
   * Same for paste.o.o
 * we need a meeting with the board to talk about: paste.o.o, planet and
   connect  AI: Theo to set this up
18. Team meetings
 * next in-person meeting will take place during the openSUSE Conference 2017
   in May (in Nuremberg)
 * IRC meetings should take place once a month on the first Sunday at 19:00 CET
   (18:00 UTC); Topics:
   * ticket wrangling
   * status round:
   * special projects (like the wiki migration)
   * changes since last meeting
   * being available for questions from others
 * We will take meeting minutes after every meeting, rotating the role of the
   minute taker alphabetically based on username
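
 The meeting schedule and the rotation can be worked out with a few lines of
 Python. This is only a sketch: the usernames are examples, and the starting
 point of the rotation is an assumption that was not fixed in the meeting.

```python
import datetime


def first_sunday(year, month):
    """Return the date of the first Sunday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6
    offset = (6 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def minute_taker(usernames, meeting_index):
    """Pick the minute taker, rotating alphabetically by username."""
    ordered = sorted(usernames)
    return ordered[meeting_index % len(ordered)]


if __name__ == "__main__":
    heroes = ["tbro", "cmueller", "darix", "lrupp"]  # example usernames
    for index, month in enumerate(range(1, 5)):
        print(first_sunday(2017, month), minute_taker(heroes, index))
```

 The meeting time (19:00 CET) is left out on purpose, since the UTC offset
 changes with daylight saving time.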
-- 
Theo Chatzimichos <tampakrap@opensuse.org> <tchatzimichos@suse.com>
System Administrator
SUSE Operations and Services Team