On Mon, Mar 30, 2020 at 10:07:53PM +0200, Josef Reidinger wrote:
> Hi, I am currently working on research into how to improve the XML parser in YaST. What we have nowadays is a libxml2-based C++ parser (which almost no one uses directly) and the XML module (a module as in code, not a YaST module :). I checked the usage of the XML module, and the main usage is converting data to XML and back (with variants: XML as a string or XML as a file). There are just two pieces of additional functionality. One is checking for XML errors (almost no one uses it), and the other is setting metadata for the generated XML (a bad API, as it should be part of the data-to-XML method).
Most importantly, we have the initial concepts wrong. This is not about a "YaST parser" for "XML". What YaST parses and writes is a specific subset of XML; let's call it YaST-XML:

1) It has a 1-to-1 correspondence* to YCP/Ruby data types (maps, lists, booleans, symbols, integers, strings).
2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns".
3) It uses config:type attributes for (1), where xmlns:config="http://www.suse.com/1.0/configns" is a different namespace (WTF).
4) Arrays are tagged "listitem" in the generic case, but we have a long list of specific tags for specific arrays.

*: There are corner cases, like having trouble distinguishing a missing value from an empty value.

One exception to YaST-XML is the one-click installer, which uses a non-YCP XML schema.
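To make that 1-to-1 correspondence concrete, here is a minimal sketch (not the real Yast::XML API; the helper name yast_xml_to_ruby is made up for illustration) of how YaST-XML with config:type attributes could map to Ruby data, using only stdlib REXML:

```ruby
require "rexml/document"

# Hypothetical sketch: convert a YaST-XML element tree to Ruby data,
# honoring config:type attributes. Not the real Yast::XML API.
def yast_xml_to_ruby(element)
  case element.attributes["config:type"]
  when "integer" then element.text.to_i
  when "boolean" then element.text == "true"
  when "symbol"  then element.text.to_sym
  when "list"    then element.elements.to_a.map { |child| yast_xml_to_ruby(child) }
  else
    if element.elements.empty?
      element.text.to_s
    else
      # No type attribute, but child elements present: treat it as a map.
      element.elements.to_a.each_with_object({}) do |child, map|
        map[child.name] = yast_xml_to_ruby(child)
      end
    end
  end
end

doc = REXML::Document.new(<<~XML)
  <profile xmlns="http://www.suse.com/1.0/yast2ns"
           xmlns:config="http://www.suse.com/1.0/configns">
    <enabled config:type="boolean">true</enabled>
    <count config:type="integer">42</count>
    <names config:type="list">
      <listitem>alice</listitem>
      <listitem>bob</listitem>
    </names>
  </profile>
XML

data = yast_xml_to_ruby(doc.root)
# data => {"enabled" => true, "count" => 42, "names" => ["alice", "bob"]}
```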
> So my question is: what would we like to have done better? One thing that hits us often, for sure, is optional schema validation (some XML is pre-validated, like control files for products or roles, but AutoYaST profiles are user generated/written).
Yes, validation is good.
> Also, some nicer error reporting would be nice, because the current XMLError method is almost never used (and yes, you should read "nicer" as using exceptions that can/have to be caught; otherwise it reports a popup with an internal error, or causes some strange error later).
Better error handling is also good.
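A minimal sketch of what exception-based error reporting could look like, assuming stdlib REXML underneath; XMLParseError is a hypothetical name, not an existing YaST class:

```ruby
require "rexml/document"

# Hypothetical sketch: wrap the underlying parser error in a dedicated
# exception class instead of the current XMLError return-value style.
class XMLParseError < RuntimeError; end

def parse_yast_xml(string)
  REXML::Document.new(string)
rescue REXML::ParseException => e
  # Re-raise with a message that callers can show to the user.
  raise XMLParseError, "Malformed XML: #{e.message}"
end

begin
  parse_yast_xml("<a><b></a>")   # mismatched tags
rescue XMLParseError => e
  puts e.message
end
```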
> Do you think that it makes sense at all to have our own module? Ruby, Perl, and also Python, for which we currently have bindings, all have their own good (good as in better than ours) parsers. So does it make sense to keep our own XML parser, beyond backward compatibility? For new code, as already seen in some places, should we just use REXML or Nokogiri, which e.g. already support RELAX NG validation[1]? Or do we have some functionality that we would like to have on top of the standard parsers?
As explained at the top, we must have a special library because we have a special kind of XML.
> The only thing the current parser has on top of generic XML parsers is understanding of the type attribute, which does automatic type conversion, so `magic` happens. This magic
It is not magic. Calling things magic makes people avoid understanding them, which is bad.
> is also a source of some bugs: e.g. a hash does not have this type attribute, and the result is that `<a><key>b</key>c</a>` is returned as `"c"` and not as a hash, which has caused many recent failures with typos in AutoYaST profiles.
Let's have test cases for these to ensure that the schemas can distinguish them and the error reports are helpful.
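As a starting point, here is a sketch (hypothetical, using stdlib REXML; the class and helper names are invented) of a check that would reject the ambiguous mixed content from the example above instead of silently returning "c":

```ruby
require "rexml/document"

# Hypothetical sketch: refuse elements that mix text with child elements,
# since YaST-XML maps elements to either a scalar or a map/list, never both.
class MixedContentError < RuntimeError; end

def check_no_mixed_content!(element)
  has_children = !element.elements.empty?
  has_text = element.texts.any? { |t| !t.value.strip.empty? }
  if has_children && has_text
    raise MixedContentError, "Element <#{element.name}> mixes text and child elements"
  end
  element.elements.each { |child| check_no_mixed_content!(child) }
end

doc = REXML::Document.new("<a><key>b</key>c</a>")
begin
  check_no_mixed_content!(doc.root)
rescue MixedContentError => e
  puts e.message   # the typo is reported instead of silently returning "c"
end
```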
> And as a bonus, we do not specify these types in the schema, so during validation, if you omit the type, the XML is still valid, but the code crashes because it expects a different type.
We must use the correct terms:

WELL-FORMED XML means, roughly, syntactically correct, disregarding the DTD or schema.
VALID XML means obeying the DTD or schema (in addition to being well formed).

For example, any XML parser can check for well-formedness; otherwise it is not worth being called an XML parser. We do not get bugs about malformed profiles; people are competent enough not to use them. The bug-reported profiles are invalid, either in the sense of not obeying the AutoYaST schema, or even violating some of the common properties of YaST-XML.
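The distinction can be demonstrated with any stock parser; a sketch with stdlib REXML (element names invented for illustration):

```ruby
require "rexml/document"

# Malformed: missing </enabled>; any XML parser rejects this.
malformed = "<profile><enabled>true</profile>"
# Well formed, but would be invalid against a schema expecting a boolean.
well_formed_but_invalid = "<profile><enabled>maybe</enabled></profile>"

begin
  REXML::Document.new(malformed)
rescue REXML::ParseException
  puts "malformed: rejected by the parser itself"
end

# This parses fine; only schema validation (e.g. RELAX NG) could reject
# "maybe" where a boolean is expected.
doc = REXML::Document.new(well_formed_but_invalid)
puts doc.root.elements["enabled"].text
```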
> I would welcome any suggestions or ideas about how your ideal XML parser should look.

-- 
Martin Vidner, YaST Team
http://en.opensuse.org/User:Mvidner