Hi guys,

I managed to sneak into the arm mini summit at kernel summit. Here are a few notes I and others took throughout:

- openSUSE got recognized for working on ARM
- we should report bugs / problems to the respective MLs (panda crash, origen issues, ...)
- generic zImage coming (one kernel for all SoCs); some technical / political roadblocks remain, but it should show up soonish


----- etherpad notes -------

 
Info: https://sites.google.com/site/kernelsummit2012/workshops/arm
Agenda: https://docs.google.com/spreadsheet/ccc?key=0AqglsllX3y3fdGoxY2lTc3RoTE9zTFcxSGZoR21WYWc#gid=0
IRC: #armsummit on Freenode
WIFI: "Linux Foundation", password "linux2012"
 
Monday
Secure Monitor APIs [WD]
- OMAP calls: http://git.kernel.org/?p=linux/kernel/git/tmlind/linux-omap.git;a=blob;f=arch/arm/mach-omap2/include/mach/omap-secure.h;hb=HEAD
- Calxeda copied the SMC calls from OMAP
- Samsung has reset and boot CPU calls:
    http://thread.gmane.org/gmane.linux.ports.arm.kernel/183607/
    http://thread.gmane.org/gmane.linux.ports.arm.kernel/183608/
- On hypervisor guests, HVC instead of SMC might be used, currently none on KVM, Xen uses paravirtualization
- Some SMCs (e.g. L2 cache setup) might be called very early; this is necessary since it affects boot time. For any future SMCs we would like to avoid this, preferably pushing them to device drivers, although for some errata it might be unavoidable
- Another level of APIs is used for secure software on OMAP (audio/video encryption paths)
- We need to agree on which services exist and how to access them, at least define them in the device tree
- Also secure monitor calls can be redefined by dynamically-loaded software (e.g. "PPA") - these are not simply defined by the boot ROM
- ux500 pushed all secure calls behind the TEE (Trusted Execution Environment) APIs: http://git.linaro.org/gitweb?p=bsp/st-ericsson/linux-3.0-ux500.git;a=tree;f=drivers/tee;hb=HEAD
- We have two classes of problems:
 - Really early stuff that needs to be done before the MMU is turned on
 - Less early stuff that can be handled in a more generic way using device tree and all
- Samsung suggested high-level abstraction:
    http://lists.infradead.org/pipermail/linux-arm-kernel/2012-August/115785.html
 - Useful for everybody apart from OMAP?
 
 
Stale Platform Deprecation [OJ]
 
- Easy to build-test, but do the older platforms still boot?
- Deprecate non-DT platforms after some time of inactivity? (1-2yrs?)
- OMAP1 not a burden, mainline still actively used
- Some (many) drivers still to be updated for DT...
- Appended dtb blob to deal with legacy (non-DT) bootloaders
- Tony to post list of OMAP board files marked for deprecation
- Magnus to write git plugin to identify dead boards
- In-tree defconfigs should be superset of options => shmobile needs to do this!!
- mother of all defconfigs for single zImage
- A patch to delete pnx-4008 is in the pipe
- What to do with mach-bcmring? Broken for 2 years, but hobbyists now working on it. Horrible OS abstraction layer. Move to staging?
- Dead platforms: mach-ks8695, mach-h720x, mach-l7200, mach-netx, mach-w90x900
- Long term deprecation: mach-ixp4xx,
- mach-msm/<some board files> ...tricky.
 
 
Virtualisation (Xen vs KVM)
 
- Xen and KVM under active development for ARM, although both currently out-of-tree
- Virtualisation extensions introduced in latest revision of ARMv7 (Cortex-A7, A15)
    * HYP mode (PL2), non-secure only, higher privilege level than svc (OS) but different system interface (translation, registers etc)
    * VMID to identify virtual machine (guest), used to tag TLBs and caches
    * Some peripherals virtualised in hardware (GIC, generic timers)
    * Second-stage translation: IPA => PA
- KVM doesn't support nesting or CPU emulation (ie. guest CPU matches the real hardware)
- Paul: What if HYP is used for other purposes (power management etc)? KVM: Tough. Mutual exclusion of privilege level (one or the other)
- SMC interface should be used for power management / cluster switching instead of HVC:
    https://silver.arm.com/download/download.tm?pv=1303201    [free registration required...]
- Boot protocol: kernel must be entered in hyp mode in order to make use of virtualisation extensions
- Major performance issue is due to QEMU (probably due to the traps)
    ? possibility of creating a new subarch that uses virtio
        - may be best to simply strip down the Versatile Express platform
- Development done on the kvm-arm mailing list
- Magnus: what about virtualized IOMMU development?
Xen
    * no QEMU used, so no emulated hardware - only minimal set of devices supported (serial)
        - serial needed due to kernel assembly language code that touches UART
    * extensive use of DT
        - very concerned about ACPI on ARM
            - 50% of Xen codebase for x86 is ACPI parser - so big it does not fit into the hypervisor
            - ACPI generator also required; would be bloated
        - concerned also about DT format changes across kernel releases, since Xen is modifying this
            - Stephen: when will DT device bindings be declared as concrete & stable?
                - Olof/Arnd: once the main platforms are completely described in DT
    * bootloaders need modification to load hypervisor, etc.
        - grub for ARM?
    * two different Xen ports for ARM
        - one relies on the hypervisor instructions and so only works on A7/A15
        - the other (Samsung) can run on older ARM cores
    * patch status:
        - hypervisor side almost fully upstream
        - Linux side: sent to the lists
        - devicetree patches sent to the DT mailing list
    * console is paravirtualized, rather than emulated hardware
 
 
DMA Mapping [MS]
 
- Many changes in last year, most of it merged for 3.5
- Conversion to dma_map_ops-based implementation
- Core DMA mapping code is architecture-independent, but the integration layer is currently only implemented for ARM
    - not very difficult to add integration support for other architectures
- CMA stuck in catch-22 situation of not being enabled / not being used
    * Could enable by default with warnings on allocation failures => see how many bug reports we receive
    * Convert carve-out (?) code to CMA for all platforms
        - i.mx?, OMAP, SH/Mobile, msm?
        - also get rid of GFP_DMA tricks for carve-outs
- Use of highmem with CMA still needs work
- Arnd: for callers who require mappings at specific virtual addresses, IOMMU API calls should be used
    - Stephen: needed because nVidia chips use address 0 for some boot-related code
    - Marek: our codec device requires firmware to be loaded at the lowest address, we just assume that the first allocated buffer for codec device gets the lowest address
 
 
Hacking
 
- Virtualisation (KVM / Xen guys)
- randconfig
- SMP ops
- PMU (PaulW!)
- Timer-based delay loop
- Platform data testing
- Common clock
 
 
Tuesday
 
arm-soc Process Review
 
- Keeping everybody happy is impossible!
- work split between arnd and ojn working well
    * Arnd dislikes rejecting patches
    * Late patches/pull requests also a problem
- multiple branches vs single linear history
    * Although some (Tegra) prefer the latter, can live with the current approach
- Used to have many conflicts and dependencies, although this seems to have died down (why?)
    * Moving stuff to drivers/ may have helped
- Signed-tags are `really nice' => allows some commentary for the merge
    * Most pull requests should use them [upgrade your git!]
    http://git-blame.blogspot.com/2012/01/using-signed-tag-in-pull-requests.html
- shmobile needs to follow same topic branch style as everyone else
- Requirements for merging new platform code into arm-soc are somehow now coming from customers too (!)
 
 
single zImage (again)
 
- Work done to eliminate duplicate header files across platforms (mostly done)
    * Problem now device drivers including platform-specific headers (mach/*.h)
    * Including these headers is not something we *want* to do, but unfortunately it happens
    * Rename mach/* headers to include platform name as prefix (at least fixes the build)
    * rmk: break drivers including mach/ headers as incentive for people to fix their code
    * Magnus to extend checkpatch.pl to flag up code including these headers
- Three remaining problematic headers: uncompress.h, gpio.h, timex.h
- CLOCK_TICK_RATE problematic (timex.h) but not required for NO_HZ...
    * tlindgren: not required for platforms using clocksource/clockevents (? -- needs to be checked)
    * arnd: value is usually different and usually wrong
- Rob: highbank, picoxcell, socfpga, vexpress, (zynq) can all be built together without the header prefixing
- Arnd: Samsung platforms huge problem with mach/ includes due to earlier work to reduce the amount of code
- After header prefixing, drivers including mach files hit largely in mach-pxa, mach-sa1100
    * To make it worse, many of these are common...
    * for plat- includes, omap the main one, although largely platform data
- First cleanup pass should be straightforward
    * Remaining drivers can be annotated with depends on !ARM_MULTIPLATFORM
    * tlindgren can do first pass for omap in next week or so for drivers/
- Frameworks required for single zImage: common clock, sparse IRQ
- Arnd will fix Marc's smp_ops patch series and re-post
- Kconfig changes add new option `allow multiple platforms to be selected'
    * Architecture level {v4,v4T,v5}, {v6,v7}
    * Choice of SoC family with default selection
    * linker barfs if no machine selected (empty mach_info)
    * Choice of whether to expose individual board files in Kconfig left to the platform
 
 
AArch64
 
- Catalin has a talk at LinuxCon to introduce the architecture in more depth (please attend!)
- 2nd version of patches on LKML
- DT-only, etc
- Potentially ACPI coming from server folks / our friends in Washington
- Initial platform code for very simple vexpress model
- Aim as high as we can wrt late initialisation of platform code (as a module?) for the moment
- timers, gic under drivers/ -- could also be used by arch/arm/ with ifdefs for the inline asm
- device initialisation via DT as opposed to early hooks in the platform code
- Push back on platforms with things like broken secure interface, etc
    * Olof: difficult if popular platform turns up with another OS but let's start from this position and see
- Not clear when platform code will turn up from silicon vendors but not expected within the next year
- v8 CPUs may be drop-in replacements for v7 CPUs on existing SoCs
    * Server guys more likely to start from scratch
- Arnd: vexpress could be pushed as it's the only `hardware' available at the moment
- Catalin: prefer to push only the core code initially, then cleanup SoC code prior to posting (use as an example)
- Upstreaming: aiming to be in -next for 3.7-rc1
    * Arnd: nah, just stick it in -next now and aim for 3.7 in mainline
    * Action: Arnd to Ack all the patches
- personality discussion ongoing...
    * Arnd: just copy x86
    * Catalin: no. <long discussion about PER_LINUX>
- 2MB limit on size of dtb
    * Rob: 1MB limit for AArch32 already
    * Mark Brown: multi-megabyte coefficient data for baseband processors could also come via device-tree (for example)
    * Turns out dtb header contains size information, so let's just use that and be done with it
- DMA operations
    * Single set of ops, currently untested (model doesn't have any DMA-capable devices)
    * DMA32 likely to be required
    * swiotlb currently implemented but will be dropped for initial posting as untested
- Stuff to share between arch/arm and arch/arm64
    * dts files
    * perf
    * generic timer
    * gic
    * kvm / Xen    [if/when merged]
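
On the dtb size point above: the flattened device tree header starts with a big-endian magic word (0xd00dfeed) followed by a big-endian totalsize field, so a loader can read the blob size from the header instead of assuming a fixed limit. A minimal sketch of that check, not kernel code; the function name and the header-only example blob are made up for illustration:

```python
import struct

FDT_MAGIC = 0xD00DFEED  # big-endian magic word at offset 0 of a dtb


def dtb_total_size(blob: bytes) -> int:
    """Return the totalsize field (offset 4) from a dtb header.

    Hypothetical helper for illustration only.
    """
    if len(blob) < 8:
        raise ValueError("blob too short to hold a dtb header")
    # First two header fields: magic, totalsize (both 32-bit big-endian).
    magic, totalsize = struct.unpack_from(">II", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a dtb: bad magic 0x%08x" % magic)
    return totalsize


# Made-up blob: only magic and totalsize are filled in, the rest is zero padding.
example = struct.pack(">II", FDT_MAGIC, 40) + bytes(32)
print(dtb_total_size(example))
```

A real loader would also want to check that totalsize fits in the memory it actually has available before copying the blob.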
 
 
big.LITTLE
 
- Linux Plumbers scheduling micro-conference
- big.LITTLE => DVFS by other means
    * big cores and little cores, architecturally identical
    * Different approaches
- Switching: switch between big and little cores transparently
    * Complications where cluster size not identical (i.e. more of one type than the other)
- MP: kernel aware of all cores
    * Paul Turner has patches to track per-task load history => can be used to pick appropriate target core for a given task
    * v3 posted to LKML, will publish git tree
    * `more accurate load tracking at lower cost'
- Morten Rasmussen has scheduler patches based on the above
    * Reduces wakeups on the big cores
- spreading vs race-to-idle -- related discussion on LKML
- tglx making CPU hotplug suck less
    * parking/unparking kthreads rather than killing/creating
    * worst-case scenario an order of magnitude faster with new approach
    * stop_machine() needs to be removed when CPU is offlined
- topology static in header files, cache-level descriptions
    * need more for big.LITTLE description as micro-architecture radically different
 
 
DMA Engine
 
- no device-tree bindings
    * many things currently blocked on it
    * Patches from Jon Hunter
- Other Patches from Vinod Koul
    * Arnd doesn't think good idea (the DT information in the client should be enough information for the DMAEngine driver)
    * The controversy is over the client not needing to know any of the information.
- Discussing patches from Jon
    * The managing of these properties should be hidden within the DMA subsystem.  The clients and DMA driver shouldn't have to deal with these properties.
    * (It should be like regulators, etc do it)
    * Propose binding uses integer flags field for describing things like direction
        * Mark Brown: `not exactly a readability triumph'
        * Suggestion: why not use a fixed index for read/write/ etc
          But, it is more complex.  We want the request API to be simple.
          It can't be just an index. <Arnd lists bunch of implementation problems with DT>
        * Stephen Warren: we could use multiple properties?  One property is hard to read.
          (Mark Brown argues we really need preprocessor support for dtc)
          Arnd suggests 'dmas' lists all of them, and another 'dma-names' that names them. (key,value) pairs like we do with irqs and regs.
          Arnd: having fixed property names allows something to scan the whole tree to find properties.  If the property has a different name for each device, it is hard to find the dma properties.
          Stephen: You would have to scan for prefixes anyway, as well as regulator.
          The flags are only needed if we want to name the channels.  Makes it more complex for devices with only one channel.
          Paul W: By forcing naming to be consistent, it helps with HW changes introducing additional channels for existing IP.  Consensus on having name.
        * Q: Which direction are 'read' and 'write'? A: Don't say read or write. :)  See macros in dmaengine.h (enum dma_transfer_direction) [clarification from Vinod Koul - Intel]
        * What about a virtual channel manager that handles the mappings?  Vinod: Drivers shouldn't need to know about channels, etc.  Argues for platform handling this.  Allocating and filtering channels should be handled by generic code, not individual drivers.
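
To make Arnd's 'dmas' / 'dma-names' suggestion concrete: the two properties are parallel lists, with entry i of 'dma-names' naming entry i of 'dmas', the same way names are paired with irqs and regs. A rough sketch of the lookup a generic layer could do on behalf of a client. The binding was still under discussion, so everything here (the function name, the node contents, the "sdma" controller, the request numbers, the "tx"/"rx" names) is made up for illustration:

```python
def find_dma_by_name(dmas, dma_names, name):
    """Return the 'dmas' entry whose position matches `name` in 'dma-names'.

    Hypothetical helper: both arguments model device-tree properties as lists.
    Returns None if the node has no channel with that name.
    """
    if len(dmas) != len(dma_names):
        raise ValueError("'dmas' and 'dma-names' must have the same length")
    for spec, entry_name in zip(dmas, dma_names):
        if entry_name == name:
            return spec
    return None


# Made-up client node: each 'dmas' entry is (controller, request number).
node = {
    "dmas": [("sdma", 61), ("sdma", 62)],
    "dma-names": ["tx", "rx"],
}

print(find_dma_by_name(node["dmas"], node["dma-names"], "rx"))
```

Because the property names are fixed, a tool can find every DMA user by scanning for 'dmas', and a client asks for a channel by name without knowing which controller or request line sits behind it.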


Alex