[heroes] Offsite Meeting Minutes 2016-12-02 to 2016-12-04
What
====
openSUSE Heroes offsite team meeting minutes

Where
=====
At the SUSE headquarters

When
====
Friday, 2016-12-02 until Sunday, 2016-12-04

Who
===
* All time
* Daniel Maslowski <info@xxxxxxxxxxxxx>
* Sarah Julia Kriesch <sarah-julia.kriesch@xxxxxx>
* Christian Boltz <opensuse@xxxxxxxxx>
* Theo Chatzimichos <tampakrap@xxxxxxxxxxxx>
* Gerhard Schlotter <gschlotter@xxxxxxx>
* Markus Rückert <mrueckert@xxxxxxx>
* Lars Vogdt <lrupp@xxxxxxx>
* Christian Müller <cmueller@xxxxxxx>
* Thorsten Bro <tbro@xxxxxxx>
* Temporary
* Christoph Wickert <cwickert@xxxxxxx> (only Friday 11:30)
* Richard Brown <rbrown@xxxxxxx> (only Sunday)

Topics
======
* SaltStack training => Friday
* SUSE Cloud training => Saturday
* Packaging workshop => Skipped
* Ticket wrangling
* Securing our infrastructure
* Documenting our infrastructure
* admin / infrastructure policy
* contact persons and their responsibilities (service map)
* list of machines
* Mailing lists, their setup and policies
* Kiwi images
* build.opensuse.org => infrastructure project
* Sponsoring update
* Mirrors and the mirror infrastructure
* external services => giving them to the right (external) people ?
* gitlab => waits for freeIPA
* tasks to take with you (home work :-) => please update your personal pages
in the wiki and on connect.o.o
* monitoring => done, see below
* log analysis => discuss deeper during next meeting
* mediawiki
* we need to check with Klaas whether he wants to maintain hermes. AI: tbro
* key management and access control => see FreeIPA
* next meeting => regular meeting dates/time
* key signing party
* internal key signing party within all the heroes
* DNSSEC and more use of IPv6 => next meeting (CloudFlare as starting point)
* reinstall all openSUSE machines every 6 months from scratch?
* connect.opensuse.org => we have been blocked by the board for more than 12
months now. What to do with that?

Agenda
======
1. Introduction round
2. Server room tour
3. SaltStack training
* meanwhile lrupp updated the progress opensuse-admin wiki page a lot
4. Ticket wrangling
* AI Theo: Migrate the documentation from the redmine wiki to the public one
* AI Theo: The opensuse-admin wiki will be migrated to a private subproject
(right now it is public and contains sensitive data)
* AI Theo: The provo project will be moved into the proper categories under
opensuse-admin (after confirmation (or no objections) from the heroes ml)
* See these to get more information:
* https://en.wikipedia.org/wiki/Comparison_of_issue-tracking_systems
* https://en.wikipedia.org/wiki/Comparison_of_help_desk_issue_tracking_software
* Lars and Darix will provide testing instances of:
* Darix: Zammad is recommended => https://zammad.com/
* Lars: Request Tracker => https://bestpractical.com/request-tracker
* cmueller: OTRS => https://www.otrs.com/
* tbro: BRIMIR => https://getbrimir.com/
5. Documenting our infrastructure
* admin / infrastructure policy:
https://en.opensuse.org/openSUSE:Infrastructure_policy
* We are going for a different root password per host
* The default user authentication method will be the user's ssh key, logging
in as a normal user (via FreeIPA)
* We are going to use password-store as the tool to store the passwords
* We need to create a wiki page with the machine list and the appropriate
contact people per service (the page needs to be moved from the internal SUSE
redmine wiki)
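As a rough illustration of the "different root password per host" rule, the following Python sketch generates one random password per machine. The host names, the `root/<host>` entry layout, the alphabet and the length are all assumptions made for this example; they are not taken from the policy.

```python
import secrets
import string

# Characters to draw from; alphabet and default length are assumptions, not policy.
ALPHABET = string.ascii_letters + string.digits


def generate_root_passwords(hosts, length=32):
    """Map a hypothetical entry name per host to a fresh random password."""
    return {
        f"root/{host}": "".join(secrets.choice(ALPHABET) for _ in range(length))
        for host in hosts
    }


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = ["host1.example.org", "host2.example.org"]
    for entry, password in generate_root_passwords(hosts).items():
        # Print only the length, never the secret itself.
        print(entry, len(password))
```

Each generated entry could then be stored in password-store, for example by hand with `pass insert`.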
6. Contact persons and responsibilities
* Instead of trying to provide a static list of services at
https://en.opensuse.org/index.php?title=openSUSE:Services_help we will
take an automatic approach: we list the services that are in the monitoring
setup on http://status.opensuse.org/ and point people to it. This will also
answer the question about the list of machines for our customers
* Our team page was improved: https://en.opensuse.org/openSUSE:Heroes
(openSUSE:infrastructure and openSUSE:Services_team redirect there). We
removed deprecated info, updated the rest accordingly, and added video
presentations from past oSC events.
7. Who is still using static.opensuse.org?
8. Sponsoring update
* There is an offer of a free CDN for opensuse.org by the UK based IT company
CDN77 - we should try it out and send them feedback. Contact person is
Oskar Gottlieb <oskar.gottlieb@xxxxxxxxx>. AI: darix, theo to do some
testing
* Currently in contact with some external companies who want to sponsor
hardware for the build service. Our problem here: we need uniform hardware to
make administration easier. We can provide specs of our requirements and
they would buy the hardware for us. Our main problem might be that there is
still no foundation, so companies/people can not get a donation receipt from
us: they really give the money/hardware without getting anything back from
us.
* The build service always needs build power, aka machines that act as build
workers
* we might also think about some test systems that can be used for testing
service deployments (that can be deployed in the openSUSE Cloud later)
* openQA got some new hardware recently, but is also always searching for
more "workers"
* new switches or other small infrastructure hardware might also be an option
* additional storage is also a good idea, but this starts around 7k EUR and
has an open end. Problem here: we like to get compatible hardware that
might be used as JBOD for example
* As a future idea: ask our local vendors for a "donation pool", which
includes some pre-configured machines that can be paid for by sponsors
* AI: SUSE-IT team (especially cmueller and tbro) to provide a list of
hardware that is planned for openSUSE as a wishlist
* AI: cmueller to join the donation@o.o list to answer quickly on questions
around hardware sponsoring
9. Packages and Kiwi images
* We have a specific project called openSUSE:Infrastructure on
build.opensuse.org, which contains packages that are not included in the
base system. The packages should be linkpac'd from Factory if applicable,
otherwise from the devel repo, with the revision locked.
* In a sub-project there we build special JeOS images that can be used in the
openSUSE cluster in NUE at the moment. Theo is preparing new images for the
openSUSE Cloud in Provo. In case of doubt: this will be 42.2 images. Notes:
* we need to use a newer kiwi than the one that is shipped with 42.2, that
is why we added the Virtualization:Appliances:Builder project
* the boot code did not change from 42.1 to 42.2 - so we use
oemboot/suse-leap42.1 as boot code
* the image is very minimal - it only contains a base system (so you can
run zypper), network tools and a salt-minion. It is missing cloud
specific packages/tools, which need to be added for the openSUSE Cloud
* Provisioning is not automated: this means that the Salt key is not
automatically integrated/accepted. So this will stay manual work for now
unless someone wants to help Theo with it. As the Cloud has some automation
in this area ("cloud init"), this might be an easy step - but it needs
someone to work on it. cmueller will work with Theo to get this fixed.
10. Infrastructure repository
* The openSUSE:Infrastructure repository is used as "production" repository.
So there is no room for testing packages inside.
* If you want to test packages or prepare an update, please do this in any
other repository and submit (or update the linkref of) the package in the
Infrastructure repository once you are done
* For testing and production, we might use the Salt feature to lock a package
on a specific version.
* If you have more than 2 source packages for a machine/service that need to
go in the Infrastructure repository, use a sub-project for this. The name
of the subproject AND the description should note and describe the use case
of the packages in this repository (include the DNS entry of the machine in
case of doubt).
* Please consider moving/integrating your packages into the main distribution
to make our life easier and to keep the infrastructure repository small.
* If there is a need to update packages that are in the official openSUSE
repository, we will NOT put the updated version in the core/main
Infrastructure project. Instead: these packages need to end up in a
separate sub-project below the Infrastructure project - so only the
machines that need those separate packages can add the repository.
* The Heroes team will review the packages in the Infrastructure repository
(including sub-projects) every 6 months and clean up (including a check for
orphaned packages on the machines).
* TODO: enhance the current policy with the things listed above
* A lot of effort is spent these days by Theo to move existing VMs to Leap
and to move packages to the openSUSE:infrastructure repository.
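One way to keep a package on a specific version with Salt is a `pkg.installed` state with a `version` argument. The sketch below uses Salt's Python (`#!py`) renderer; the package name and version string are placeholders, and whether an additional lock/hold on the package manager side is needed depends on the actual setup.

```python
#!py
# Sketch of a Salt state written with the Python renderer.
# PACKAGE and VERSION are placeholders, not real infrastructure values.

PACKAGE = "example-package"
VERSION = "1.2.3-1.1"


def run():
    """Return state data that installs PACKAGE at exactly VERSION."""
    return {
        PACKAGE: {
            "pkg.installed": [
                {"version": VERSION},
            ],
        },
    }
```

A testing machine and a production machine could then get different values for the version, for example from pillar data.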
11. FreeIPA demonstration by darix
* the plan is to implement freeIPA at least for the internal DNS and account
management
* administrators will get access to a machine based on LDAP groups (and
kerberos tickets), which makes it easier to track who is who and who gets
access to which machine
* As the FreeIPA account will not be connected to the general authentication
mechanism we use for openSUSE, only openSUSE Administrators will get a
freeIPA account.
* Administrators will be able to access the FreeIPA instances via OpenVPN (we
need policy regarding access)
* gitlab.opensuse.org (and probably also other services provided just for
administrators) will also switch over from local authentication or from
the openSUSE (bugzilla) account to freeIPA
* The openSUSE (bugzilla) and the FreeIPA usernames will need to match - this
should become a policy ;-)
* there will be two freeIPA instances for high availability and access reasons:
* one instance will run in Nuremberg on the normal cluster
* one instance will run on an Admin node in Provo (outside the Cloud, to be
able to reach the Cloud DMZ network)
* these instances will be connected to each other through an OpenVPN tunnel
12. Cloud training by cmueller
* Cloud tries to hide its functionality behind some fancy names so that
normal admins do not understand what is going on. But we are so smooth that
we understand both sides after this training. :-)
* Hardware nodes have different names and functionality:
* Crowbar => the admin node, used only for managing bare metal stuff
* Controller => they run services that control the cloud service itself
* Network => special controller nodes with additional network functionality
that provide routing for the virtual machines (the "instances" in cloud
speech)
* Storage => providing only raw block devices that can be used by
instances
* Compute => hosts that really run the virtual machines ("instances" - as
said above)
* Instances => that's what you really want: the virtual machines that
provide your services
* The openSUSE Cloud in Provo consists of the following bare metal machines:
* 1x Admin node => will run the following VMs:
* Crowbar (also running a Chef server to manage the other Cloud machines)
* openVPN for Cloud Admins
* FreeIPA, including an openVPN setup for Heroes (so they can access
their VMs)
* Logging server
* Monitoring server
* SMT server
* 2x Controller nodes
* will become an HA cluster running all needed cloud services
* 3x Compute nodes
* will also be network controllers
* 2x 10Gb switches
* 1x 1Gb switch
* [[Place the Image with the overview here]]
* Network setup:
* fixed/internal network : this network is statically linked to an instance;
all instances inside this network can "talk" to each other
* float network : this is our "provider" network (the one with external
IPs). No Cloud instance has direct access to an IP of this network - only
via NAT on the network service side
* transport/SDN network : we use this network for inter-compute
communication between our instances. If an instance on compute node 1
wants to "talk" to an instance on compute node 2, they will use this
network. This network allows more fine tuning (hence the name SDN),
but for the moment this is not used.
* DMZ : the DMZ network provides a web interface for "customers" who want
to manage their machines (start/stop/reset, get console access, set up new
machines). We will not expose this DMZ network to the outside, but instead
provide a dedicated VPN server that allows the openSUSE Heroes to connect
to it to reach the WebUI (which should not be needed very often).
13. Salt topology
* We will set up separate Syndics in each location, and a Master of Masters in
the near future in Nuremberg
* The codebase will be the same for all locations (the same git repository
will be used for both the states and pillars)
* HA setup is currently secondary
* The salt masters will use the same OpenVPN tunnel as the FreeIPA instances
to communicate with each other
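For orientation, a Syndic / Master of Masters setup in Salt mainly comes down to two master options: `order_masters: True` on the top-level master and `syndic_master` on each syndic node. The Python sketch below only renders those two fragments; the host name is a made-up placeholder and the real configuration will contain more than this.

```python
# Hypothetical address of the Master of Masters; not taken from the minutes.
MASTER_OF_MASTERS = "salt-mom.example.org"


def master_of_masters_config():
    """Options for /etc/salt/master on the top-level master."""
    return {"order_masters": True}


def syndic_config(master_address=MASTER_OF_MASTERS):
    """Options for /etc/salt/master on a per-location syndic node."""
    return {"syndic_master": master_address}


def to_yaml(options):
    """Render a flat dict of scalar options as simple YAML lines."""
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "True" if value else "False"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("# Master of Masters")
    print(to_yaml(master_of_masters_config()))
    print("# Syndic (one per location)")
    print(to_yaml(syndic_config()))
```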
14. Monitoring
* there will be a non-public monitoring installation in the future
* we will forward community people to http://status.opensuse.org/ to get the
"user overview" of our services
* we need to find a solution for the instances in NUE and PRV: they are
running in internal, non-routed, firewalled networks - but we want to
provide one single WebUI for our Admins to get "the big picture" at once.
=> to be checked/implemented by the monitoring admins
15. Mirrors and the mirror infrastructure
* In general, we have the following machines behind "download.opensuse.org":
* mirrordb{3,4} => a PostgreSQL cluster containing the database (85GB size)
* pontifex3 => the VM behind download.opensuse.org, using mirrorbrain (and
providing a lot of other vhosts, btw)
* scanner-opensuse => a VM that is constantly scanning the external mirror
servers. Currently this VM is inside the SUSE network and should be moved
to the external cluster. One problem might be that the external cluster
uses another IP address, so scans might fail on mirrors that only allow the
current IP address. But this is fixable.
* https://github.com/openSUSE/mirrorpinky should become a WebUI for
mirrorbrain - but the current status is "sleeping" (no time to work on it).
If you find someone who wants to help out, ping us ;-)
* AI Theo: We need to set up a virtual machine as the mirrorbrain management
machine, so that community people can also do mirror administration
* The same machine will serve as a secondary scanner machine (the main one
being behind the SUSE network now). Mirror admins will be notified of the
new IP, and after six months we will shut down the old scanner.
* Mirror related tickets need some love
* The mirror documentation might need updating as well; it would be nice if
our mirror experts could go through it.
* https://en.opensuse.org/openSUSE:Mirror_infrastructure for mirror admins
* https://progress.opensuse.org/projects/opensuse-admin for mirrorbrain
admins
16. Discussion with rbrown (openSUSE Board)
* progress.opensuse.org => openSUSE Heroes will become Admins on the redmine
instance there to be able to help with the general administration
* Sponsoring => the board will redirect sponsoring offers around hardware or
mirroring to donations@xxxxxxxxxxxx, where our admins can take over. Please
note that we want to "loan" the hardware and not get it completely handed
over to us.
* connect.opensuse.org => cleanup of the openSUSE members should be done by
the new board election early next year. So we can start with the Email
migration to Heinlein once this is finished.
* What to do with the application itself? We currently have no maintainer
for the application - but on the other hand the version in use is very old
and needs to be updated. As this tool contains the user database
(especially the membership information), it is a critical part of the
infrastructure.
* As the maintainer is not acting on it any more, we need to find a
solution. We need a meeting between the openSUSE board, the admin and some
members of the Heroes.
* freeIPA => JFYI: for the administration of the openSUSE infrastructure, the
Heroes will introduce and use FreeIPA for managing a couple of things that
are currently distributed in terms of management (User accounts and access
restrictions, certificates and probably also DNS).
* What about separating openSUSE user accounts from the current Micro Focus /
Novell / SUSE authentication mechanism? A benefit would be that we might see
more users/contributors, as it would become clearer to them who is getting
the data (requested when you create an account) and who is using it. That is
currently a major reason why this question comes up over and over again.
We see problems with tools like bugzilla that should be able to use other
authentication systems than the current ones in that case. There is a high
risk of duplicated log in data, if other authentication systems come into
play. We might hit other problems in the authentication area and it will
take probably a long time to get this solved. But if we can not get any
better solution, we might focus on this.
* When do we get a foundation? A major benefit from our point of view was the
sponsoring that would be possible with a foundation. This is now partly
solved (see above). The board itself was blocked during the year with all
the technical changes (Tumbleweed, Leap, etc.) inside the distribution and
does not see this question as critical (that's a main reason why this was
not driven with high priority). Other arguments are the legal and general
problems coming with such a foundation - which would introduce a lot of
additional organizational overhead. In the end: the main reason behind this
question (easy sponsoring) is somehow solved meanwhile, so we hopefully only
need to do some good marketing about "how to sponsor openSUSE" now and can
finally get rid of the question.
17. opensuse.org services running on non-infra managed machines
* planet and paste run on non-infra managed machines, which violates our
policy. Furthermore, people ask us to fix tickets on services we can't
reach.
* planet.o.o needs a meeting to discuss the current legal issues that might
be there (or not anymore).
* Same for paste.o.o
* we need a meeting with the board to talk about: paste.o.o, planet and
connect AI: Theo to set this up
18. Team meetings
* next in-person meeting will take place during the openSUSE Conference 2017
in May (in Nuremberg)
* IRC meetings should take place once a month, on the first Sunday at 19:00
CET (18:00 UTC); Topics:
* ticket wrangling
* status round:
* special projects (like the wiki migration)
* changes since last meeting
* being available for questions from others
* We will take meeting minutes at every meeting, rotating the role of the
minute taker alphabetically based on username
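
To make the "first Sunday of the month" rule and the alphabetical rotation concrete, here is a small Python sketch that computes the meeting dates for a year and assigns a minute taker to each. The function names are made up for this example, and the list of usernames passed in is only sample input, not the agreed rotation.

```python
import calendar
from datetime import date
from itertools import cycle


def first_sunday(year, month):
    """Return the date of the first Sunday in the given month."""
    first_weekday, _ = calendar.monthrange(year, month)  # Monday == 0
    offset = (calendar.SUNDAY - first_weekday) % 7
    return date(year, month, 1 + offset)


def meeting_schedule(year, usernames):
    """Pair each monthly meeting date with a minute taker, rotating alphabetically."""
    takers = cycle(sorted(usernames))
    return [(first_sunday(year, month), next(takers)) for month in range(1, 13)]


if __name__ == "__main__":
    # Sample usernames, for illustration only.
    for day, taker in meeting_schedule(2017, ["tbro", "darix", "lrupp"]):
        print(day.isoformat(), taker)
```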
--
Theo Chatzimichos <tampakrap@xxxxxxxxxxxx> <tchatzimichos@xxxxxxxx>
System Administrator
SUSE Operations and Services Team