What
====
openSUSE Heroes offsite team meeting minutes

Where
=====
At the SUSE headquarters

When
====
Friday, 2016-12-02 until Sunday, 2016-12-04

Who
===
* All time
  * Daniel Maslowski <info@orangecms.org>
  * Sarah Julia Kriesch <sarah-julia.kriesch@gmx.de>
  * Christian Boltz <opensuse@cboltz.de>
  * Theo Chatzimichos <tampakrap@opensuse.org>
  * Gerhard Schlotter <gschlotter@suse.de>
  * Markus Rückert <mrueckert@suse.de>
  * Lars Vogdt <lrupp@suse.de>
  * Christian Müller <cmueller@suse.de>
  * Thorsten Bro <tbro@suse.de>
* Temporary
  * Christoph Wickert <cwickert@suse.de> (only Friday 11:30)
  * Richard Brown <rbrown@suse.de> (only Sunday)

Topics
======
* SaltStack training => Friday
* SUSE Cloud training => Saturday
* Packaging workshop => skipped
* Ticket wrangling
* Securing our infrastructure
* Documenting our infrastructure
  * admin / infrastructure policy
  * contact persons and their responsibilities (service map)
  * list of machines
* Mailing lists, their setup and policies
* Kiwi images
* build.opensuse.org => infrastructure project
* Sponsoring update
* Mirrors and the mirror infrastructure
* external services => giving them to the right (external) people?
* gitlab => waits for FreeIPA
* tasks to take home with you (homework :-) => please update your personal pages in the wiki and on connect.o.o
* monitoring => done, see below
* log analysis => discuss in more depth during the next meeting
* mediawiki
* we need to check with Klaas if he wants to maintain hermes (AI: tbro)
* key management and access control => see FreeIPA
* next meeting => regular meeting dates/times
* key signing party
  * internal key signing party among all the Heroes
* DNSSEC and more use of IPv6 => next meeting (CloudFlare as starting point)
* reinstall all openSUSE machines from scratch every 6 months?
* connect.opensuse.org => we have been blocked by the board for more than 12 months now. What to do with that?

Agenda
======
1. Introduction round

2. Server room tour

3. SaltStack training
   * meanwhile, lrupp has updated the opensuse-admin wiki page on progress a lot

4. Ticket wrangling
   * AI Theo: migrate the documentation from the redmine wiki to the public one
   * AI Theo: the opensuse-admin wiki will be migrated to a private subproject (right now it is public and contains sensitive data)
   * AI Theo: the Provo project will be moved into opensuse-admin, into proper categories (after confirmation (or no objections) from the heroes ML)
   * See these to get more information:
     * https://en.wikipedia.org/wiki/Comparison_of_issue-tracking_systems
     * https://en.wikipedia.org/wiki/Comparison_of_help_desk_issue_tracking_softwar...
   * Lars and Darix will provide testing instances of:
     * Darix: Zammad is recommended => https://zammad.com/
     * Lars: Request Tracker => https://bestpractical.com/request-tracker
     * cmueller: OTRS => https://www.otrs.com/
     * tbro: Brimir => https://getbrimir.com/

5. Documenting our infrastructure
   * admin / infrastructure policy: https://en.opensuse.org/openSUSE:Infrastructure_policy
   * We are going for a different root password per host (see the sketch below)
   * The default user authentication method will be the user's SSH key via a normal user account (via FreeIPA)
   * We are going to use password-store as the tool to store the passwords
   * We need to create a wiki page with the machine list and the appropriate people per service (the page needs to be moved from the internal SUSE redmine wiki)
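   To illustrate the "different root password per host" point above, a minimal Salt sketch could look like the following. This is only an illustration, not our actual state tree; the state ID and pillar key are made-up placeholders:

     # hypothetical sketch: set the root password from a per-host pillar value
     root-password:
       user.present:
         - name: root
         - password: {{ pillar['root_password_hash'] }}  # pre-hashed, different per host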
6. Contact persons and responsibilities
   * Instead of trying to provide a static list of services at https://en.opensuse.org/index.php?title=openSUSE:Services_help we will provide an automatic approach by listing the services that are in the monitoring setup on http://status.opensuse.org/ and point people to it. This will answer the question about the list of machines for our customers.
   * Our team page was improved: https://en.opensuse.org/openSUSE:Heroes (openSUSE:infrastructure and openSUSE:Services_team redirect there). We removed deprecated info, updated it accordingly, and added video presentations from past oSCs.

7. Who is still using static.opensuse.org?

8. Sponsoring update
   * There is an offer of a free CDN for opensuse.org by the UK-based IT company CDN77 - we should try it out and send them feedback. Contact person is Oskar Gottlieb <oskar.gottlieb@cdn77.com>. AI: darix, theo to do some testing
   * We are currently in contact with some external companies who want to sponsor hardware for the build service. Our problem here: we need unique hardware to make administration easier. We can provide some specs of our requirements and they want to buy the hardware for us. Our main problem might be that there is still no foundation, so companies/people cannot get a donation receipt back from us: they really spend the money/hardware without getting anything back from us.
   * The build service always needs build power, i.e. machines that act as build workers.
   * We might also think about some test systems that can be used for testing service deployments (which can be deployed in the openSUSE Cloud later).
   * openQA got some new hardware recently, but is also always searching for more "workers".
   * New switches or other small infrastructure hardware might also be an option.
   * Additional storage is also a good idea, but this starts at around 7k EUR and is open-ended. Problem here: we would like to get compatible hardware that can be used as JBOD, for example.
   * As a future idea: ask our local vendors for a "donation pool" that includes some pre-configured machines which can be paid for by sponsors.
   * AI: SUSE-IT team (especially cmueller and tbro) to provide a list of hardware that is planned for openSUSE as a wishlist
   * AI: cmueller to join the donation@o.o list to answer questions around hardware sponsoring quickly

9. Packages and Kiwi images
   * We have a specific project called openSUSE:Infrastructure on build.opensuse.org, which contains packages that are not included in the base system. The packages should be linkpac'd from Factory if applicable, otherwise from the devel project, with the revision locked.
   * In a sub-project there we build special JeOS images that can currently be used in the openSUSE cluster in NUE. Theo is preparing new images for the openSUSE Cloud in Provo. In case of doubt: these will be 42.2 images. Notes:
     * we need to use a newer kiwi than the one shipped with 42.2, which is why we added the Virtualization:Appliances:Builder project
     * the boot code did not change from 42.1 to 42.2, so we use oemboot/suse-leap42.1 as boot code
     * the image is very minimal - it only contains a base system (so you can run zypper), network tools and a salt-minion. It is missing cloud-specific packages/tools, which need to be added for the openSUSE Cloud
     * Provisioning is not automated: this means that the Salt key is not automatically integrated/accepted, so this will stay manual work for now unless someone wants to help Theo with it. As the Cloud has some automation in this area ("cloud-init"), this might be an easy step - but it needs someone to work on it. cmueller will join Theo to get this fixed. (See the sketch below.)
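   Regarding the cloud-init note above: one possible (untested) approach would be to pre-seed the minion key pair through cloud-init's salt minion module, so that the key no longer has to be accepted by hand. A rough cloud-config sketch - the master address and keys are placeholders, not our real setup:

     #cloud-config
     salt_minion:
       conf:
         master: salt.example.org          # placeholder master address
       # pre-generated minion key pair; the public key would additionally
       # need to be pre-accepted on the master (e.g. with salt-key)
       public_key: |
         -----BEGIN PUBLIC KEY-----
         ...
       private_key: |
         -----BEGIN RSA PRIVATE KEY-----
         ...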
10. Infrastructure repository
   * The openSUSE:Infrastructure repository is used as the "production" repository, so there is no room for testing packages in it.
   * If you want to test packages or prepare an update, please do this in any other repository and submit (or update the linkref of) the package in the Infrastructure repository once you are done.
   * For testing and production, we might use the Salt feature to lock a package to a specific version (see the sketch after this list).
   * If you have more than 2 source packages for a machine/service that need to go into the Infrastructure repository, use a sub-project for this. The name of the sub-project AND its description should note and describe the use case of the packages in this repository (include the DNS entry of the machine in case of doubt).
   * Please consider moving/integrating your packages into the main distribution to make our life easier and to keep the Infrastructure repository small.
   * If there is a need to update packages that are in the official openSUSE repository, we will NOT put the updated version into the core/main Infrastructure project. Instead, these packages need to end up in a separate sub-project below the Infrastructure project - so only the machines that need those separate packages can add the repository.
   * The Heroes team will review the packages in the Infrastructure repository (including sub-projects) every 6 months and clean up (including a check for orphaned packages on the machines).
   * TODO: enhance the current policy with the things listed above
   * A lot of effort is currently being spent by Theo on moving existing VMs to Leap and moving packages to the openSUSE:Infrastructure repository.
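   A minimal example of the version lock mentioned above (package name and version are placeholders, not a real state from our tree):

     # pin a package from the Infrastructure repository to a tested version
     example-package:
       pkg.installed:
         - version: 1.2.3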
11. FreeIPA demonstration by darix
   * The plan is to implement FreeIPA at least for the internal DNS and account management.
   * Administrators will get access to a machine based on LDAP groups (and Kerberos tickets), which makes it easier to track who is who and who gets access to which machine.
   * As the FreeIPA account will not be connected to the general authentication mechanism we use for openSUSE, only openSUSE administrators will get a FreeIPA account.
   * Administrators will be able to access the FreeIPA instances via OpenVPN (we need a policy regarding access).
   * gitlab.opensuse.org (and probably also other services provided just for administrators) will also switch over from local authentication or from the openSUSE (Bugzilla) account to FreeIPA.
   * The openSUSE (Bugzilla) and the FreeIPA usernames will need to match - this should become a policy ;-)
   * There will be two FreeIPA instances for high availability and access reasons:
     * one instance will run in Nuremberg on the normal cluster
     * one instance will run on an Admin node in Provo (outside the Cloud, to be able to reach the Cloud DMZ network)
     * these instances will be connected to each other through an OpenVPN tunnel

12. Cloud training by cmueller
   * Cloud tries to hide its functionality behind some fancy names so that normal admins do not understand what is going on. But we are smart enough to understand both sides after this training. :-)
   * Hardware nodes have different names and functionality:
     * Crowbar => the admin node, used only for managing bare metal stuff
     * Controller => runs the services that control the cloud service itself
     * Network => special controller nodes with additional network functionality that provide routing for the virtual machines (the "instances" in cloud speak)
     * Storage => provides only raw block devices that can be used by instances
     * Compute => the hosts that actually run the virtual machines ("instances", as said above)
     * Instances => that's what you really want: the virtual machines that provide your services
   * The openSUSE Cloud in Provo consists of the following bare metal machines:
     * 1x Admin node => will run the following VMs:
       * Crowbar (also running a Chef server to manage the other Cloud machines)
       * OpenVPN for Cloud admins
       * FreeIPA, including an OpenVPN setup for Heroes (so they can access their VMs)
       * logging server
       * monitoring server
       * SMT server
     * 2x Controller nodes
       * will become an HA cluster running all needed cloud services
     * 3x Compute nodes
       * will also be network controllers
     * 2x 10Gb switches
     * 1x 1Gb switch
   * [[Place the image with the overview here]]
   * Network setup:
     * fixed/internal network: this network is statically linked to an instance; all instances inside this network can "talk" to each other
     * float network: this is our "provider" network (the one with external IPs). No Cloud instance has direct access to an IP of this network - only via NAT on the network service side
     * transport/SDN network: we use this network for inter-compute communication between our instances. If an instance on compute node 1 wants to "talk" to an instance on compute node 2, they will use this network. This network allows more fine-tuning (hence the name SDN), but for the moment this is not used.
     * DMZ: the DMZ network provides a web interface for "customers" who want to manage their machines (start/stop/reset, get console access, set up new machines). We will not expose this DMZ network to the outside, but instead provide a dedicated VPN server that allows the openSUSE Heroes to connect to it to reach the WebUI (which should not be needed very often).

13. Salt topology
   * We will set up separate syndics in each location, and a Master of Masters in Nuremberg in the near future (see the sketch below).
   * The codebase will be the same for all locations (the same git repository will be used for both the states and the pillars).
   * An HA setup is currently secondary.
   * The Salt masters will use the same OpenVPN as FreeIPA to communicate with each other.
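   For reference, the usual Salt syndic wiring would look roughly like this (hostnames are placeholders; this is only a sketch, not our final configuration):

     # /etc/salt/master on the Master of Masters (Nuremberg)
     order_masters: True

     # /etc/salt/master on each per-location syndic master
     # (these machines also run the salt-syndic daemon)
     syndic_master: saltmaster.example.org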
14. Monitoring
   * There will be a non-public monitoring installation in the future.
   * We will forward community people to http://status.opensuse.org/ to get the "user overview" of our services.
   * We need to find a solution for the instances in NUE and PRV: they are running in internal, non-routed, firewalled networks - but we want to provide a single WebUI for our admins to get "the big picture" at once. => to be checked/implemented by the monitoring admins

15. Mirrors and the mirror infrastructure
   * In general, we have the following machines behind "download.opensuse.org":
     * mirrordb{3,4} => a PostgreSQL cluster containing the database (85GB in size)
     * pontifex3 => the VM behind download.opensuse.org, using mirrorbrain (and providing a lot of other vhosts, btw)
     * scanner-opensuse => a VM that is constantly scanning the external mirror servers. Currently this VM is inside the SUSE network and should be moved to the external cluster. The problem might be that the external cluster uses a different IP address, so scans might fail on some mirrors that only allow the current IP address. But this is fixable.
   * https://github.com/openSUSE/mirrorpinky should become a WebUI for mirrorbrain - but the current status is "sleeping" (no time to work on it). If you find someone who wants to help out, ping us ;-)
   * AI Theo: we need to set up a virtual machine as the mirrorbrain management machine, so that community people can also do mirror administration.
   * The same machine will serve as a secondary scanner machine (the main one currently being behind the SUSE network). Mirror admins will be notified about the new IP, and after six months we will shut down the old scanner.
   * Mirror-related tickets need some love.
   * The mirror documentation might need updating as well; it would be nice if our mirror experts could go through it:
     * https://en.opensuse.org/openSUSE:Mirror_infrastructure for mirror admins
     * https://progress.opensuse.org/projects/opensuse-admin for mirrorbrain admins

16. Discussion with rbrown (openSUSE Board)
   * progress.opensuse.org => openSUSE Heroes will become admins on the redmine instance there to be able to help with the general administration.
   * Sponsoring => the board will redirect sponsoring offers around hardware or mirroring to donations@opensuse.org, where our admins can take over. Please note that we want to "loan" the hardware and not get it completely handed over to us.
   * connect.opensuse.org => the cleanup of the openSUSE members should be done before the new board election early next year, so we can start with the email migration to Heinlein once this is finished.
     * What to do with the application itself? We currently have no maintainer for the application - but on the other hand the version in use is very old and needs to be updated. As this tool contains the user database (especially the membership information), it is a critical part of the infrastructure.
     * As the maintainer is not acting on it any more, we need to find a solution. We need a meeting between the openSUSE board, the admin and some members of the Heroes.
   * FreeIPA => JFYI: for the administration of the openSUSE infrastructure, the Heroes will introduce and use FreeIPA to manage a couple of things whose management is currently scattered (user accounts and access restrictions, certificates and probably also DNS).
   * What about separating openSUSE user accounts from the current Micro Focus / Novell / SUSE authentication mechanism? A benefit would be that we might see more users/contributors, as it would become clearer to them who is getting the data (requested once you create an account) and who is using it. That is currently a major reason why this question comes up over and over again. In that case we see problems with tools like Bugzilla, which would have to be able to use other authentication systems than the current ones. There is a high risk of duplicated login data if other authentication systems come into play. We might hit other problems in the authentication area, and it will probably take a long time to get this solved. But if we cannot find any better solution, we might focus on this.
   * When do we get a foundation? A major benefit from our point of view was the sponsoring that would be possible with a foundation. This is now partly solved (see above). The board itself was blocked during the year by all the technical changes (Tumbleweed, Leap, etc.) inside the distribution and does not see this question as critical (that's a main reason why this was not driven with high priority). Other arguments are the legal and general problems coming with such a foundation, which would introduce a lot of additional organizational overhead. In the end: the main reason behind this question (easy sponsoring) has been somewhat solved in the meantime, so we hopefully only need to do good marketing about "how to sponsor openSUSE" now and can finally put the question to rest.

17. opensuse.org services running on non-infra-managed machines
   * planet and paste run on non-infra-managed machines, which violates our policy. Furthermore, people ask us to fix tickets for services we can't reach.
   * planet.o.o needs a meeting to discuss the legal issues that might (still) be there.
   * Same for paste.o.o.
   * We need a meeting with the board to talk about paste.o.o, planet and connect. AI: Theo to set this up

18. Team meetings
   * The next in-person meeting will take place during the openSUSE Conference 2017 in May (in Nuremberg).
   * IRC meetings should take place once a month, on the first Sunday at 19:00 CET (18:00 UTC); topics:
     * ticket wrangling
     * status round:
       * special projects (like the wiki migration)
       * changes since the last meeting
     * being available for questions from others
   * We will write meeting minutes after every meeting, rotating the role of the minutes taker alphabetically based on username.

--
Theo Chatzimichos <tampakrap@opensuse.org> <tchatzimichos@suse.com>
System Administrator
SUSE Operations and Services Team