Re: [yast-devel] YaST XML Parser

8 Apr 2020

On Fri, Apr 03, 2020 at 05:21:21PM +0200, Josef Reidinger wrote:
...
On Fri, 3 Apr 2020 16:05:04 +0200
Martin Vidner  wrote:
...
This is not about a "YaST parser" for "XML". What YaST parses and
writes is a specific subset of XML, let's call it YaST-XML:
1) It has a 1 to 1 correspondence* to YCP/Ruby data types (maps, lists,
booleans, symbols, integers, strings)
2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"
Can we set the namespace if it is not defined in the XML? What puzzles me most about the format (not the parser) is not that XML is hard to read, but that it is very hard and error-prone to write or modify. And a mandatory namespace is unnecessary from my POV.
Can we simply assume this namespace if none is defined?
The quick answer is no. Just like with other programming languages,
for throwaway scripts working without namespaces is fine but as soon
as you start building anything bigger or longer lasting, namespaces
are needed to resolve conflicts and organize things.

If you insist, I can look deeper but now I estimate that allowing no
namespace would make things harder rather than easier.
...
...
3) It uses config:type attributes for (1)
where xmlns:config="http://www.suse.com/1.0/configns" is a different
namespace (WTF).
Same here. Do we need a namespace in our own subset of XML for a single attribute?
This is a different case :)
A fun half-hour reading the standard* has revealed that attributes do
not inherit the namespaces of their elements etc etc, so we can fix
a common bug by allowing type="boolean" alongside
config:type="boolean".

*) https://www.w3.org/TR/REC-xml-names/#scoping-defaulting
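A minimal sketch of that fix, using REXML rather than the actual YaST parser. The helper name declared_type and the sample profile are hypothetical. An unprefixed attribute is in no namespace, so it is matched by its bare name; a prefixed one is accepted only if its prefix resolves to the configns URI.

```ruby
require "rexml/document"

CONFIG_NS = "http://www.suse.com/1.0/configns"

# Hypothetical helper: returns the declared YaST-XML type of an element,
# or nil if it has none.
def declared_type(element)
  element.attributes.each_attribute do |attr|
    next unless attr.name == "type"
    # Unprefixed attributes are in no namespace, whatever xmlns= says.
    return attr.value if attr.prefix.empty?
    # Prefixed: the prefix must resolve to the config namespace.
    return attr.value if element.namespace(attr.prefix) == CONFIG_NS
  end
  nil
end

doc = REXML::Document.new(<<~XML)
  <profile xmlns="http://www.suse.com/1.0/yast2ns"
           xmlns:config="http://www.suse.com/1.0/configns">
    <a config:type="boolean">true</a>
    <b type="boolean">false</b>
    <c>text</c>
  </profile>
XML

doc.root.children.grep(REXML::Element).each do |e|
  puts "#{e.name}: #{declared_type(e).inspect}"
end
```

With this, <a> and <b> should both report "boolean", while <c> has no declared type.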
...
...
4) Arrays are tagged "listitem" in the generic case but we have a
long list of specific tags for specific arrays.
Yes, I have already run into it, and it is defined programmatically (so you need to modify code if you add a new list). It is also used only for writing. Reading does not care about the name.
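To illustrate the asymmetry: when reading, the names of a list's child tags are ignored; only the writer needs to know which tag to emit. A sketch with REXML, not the actual YaST code. The names read_list, write_list and the ITEM_TAGS table (including its single entry) are hypothetical stand-ins for the programmatically defined mapping.

```ruby
require "rexml/document"

# Hypothetical writer-side table: list name => tag used for its items.
ITEM_TAGS = { "packages" => "package" }.freeze

# Reading: collect the text of every child element, whatever its name.
def read_list(element)
  element.children.grep(REXML::Element).map { |child| child.text.to_s }
end

# Writing: pick the specific item tag, falling back to "listitem".
def write_list(name, items)
  list = REXML::Element.new(name)
  list.add_attribute("config:type", "list")
  tag = ITEM_TAGS.fetch(name, "listitem")
  items.each { |item| list.add_element(tag).text = item }
  list
end

mixed = REXML::Document.new("<x><listitem>1</listitem><foo>2</foo></x>").root
p read_list(mixed)
p write_list("packages", ["vim"]).children.map(&:name)
```

Adding a new specific list means touching ITEM_TAGS, which is the "modify code" cost mentioned above; the reader is unaffected.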
...
*: there are corner cases, like having trouble distinguishing a
missing value from an empty value
One exception to YaST-XML is the one-click installer which uses a
non-YCP XML schema.
The question is: should we use the YaST XML parser for a non-YaST/YCP
XML schema? I think that, e.g., for SCC we use a generic XML parser
because the YaST one does not bring any value, and maybe that is also
true in this case.
For YaST-XML, make a library layer, but for other XML simply use a
regular XML parser.
...
...
...
So my question is: what would we like to improve?
One thing that certainly hits us often is optional schema validation
(some XML is prevalidated, like control files for products or roles,
but AutoYaST profiles are user generated/written).
Yes, validation is good.
...
Also, some nicer error reporting would be good, because the current
XMLError method is almost never used (and yes, you should read
"nicer" as using an exception that can/has to be caught; otherwise it
reports a popup with an internal error, rather than causing some
strange error later).
Better error handling is also good.
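A sketch of what exception-based reporting could look like, assuming REXML as the underlying parser. The class name XMLDeserializationError and the method parse_xml! are hypothetical, not existing YaST API.

```ruby
require "rexml/document"

# Hypothetical exception class; the name is not taken from YaST.
class XMLDeserializationError < StandardError; end

# Parses a string and raises on malformed input, so the caller has to
# handle the failure right away instead of hitting a strange error later.
def parse_xml!(source)
  REXML::Document.new(source)
rescue REXML::ParseException => e
  raise XMLDeserializationError,
        "cannot parse XML: #{e.message.lines.first.to_s.strip}"
end

begin
  parse_xml!("<a><b></a>")
rescue XMLDeserializationError => e
  warn e.message
end
```

An uncaught XMLDeserializationError would surface as an internal error at the point of parsing, which is the behaviour asked for above.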
...
Do you think that it makes sense at all to have our own module, given
that Ruby, Perl and also Python, for which we currently have
bindings, all have their own good (good as in better than ours)
parsers? So does it make sense to have our own XML parser, beyond
backward compatibility, or should new code, as already seen in some
places, just use rexml or nokogiri, which e.g. already have support
for Relax NG validation[1]? Or do we have some functionality that we
would like to have on top of standard parsers?
As explained at the top, we must have a special library because we
have a special kind of XML.
But do we need that special kind of XML? Why can't we use common XML? Or something that supports types, like YAML or JSON?
It is common XML. It's not like we are using curly braces {} instead
of angle brackets <>.

Maybe the best term for it is a language binding, yast-xml-bindings
;-)
It is an intermediate layer: it does not have meaning like a specific
schema, but it maps XML concepts to YCP and Ruby concepts.

If you "used common XML" you would either have 3 different XML
parsers for each part of YaST, or you would end up with YaST-XML-NG,
just like the old one, only different. (@)

About YAML or JSON, I don't get your point.
...
...
...
The only thing that the current parser has on top of generic XML
parsers is its understanding of the type attribute, which does
automatic type conversion so `
It is not magic. Calling things magic will make people avoid
understanding them which is bad.

It is magic for people that work with common XML.

"It is magic" means "I don't understand it". Ask and I will explain.

MAGIC! IS! FORBIDDEN!

...
XML as a whole is just structured strings. Structured into elements with names, attributes and values.
See my point (@) above.
...
...
...
is also a source of some bugs, as e.g. a hash does not have this type
attribute, and the result is that `<a><key>b</key>c</a>` is returned
as `"c"` and not as a hash, which causes many of the recent failures
we get with typos in AutoYaST profiles.
Let's have test cases for these to ensure that the schemas can
distinguish them and the error reports are helpful.
I think the source of this is that we use typed XML, but omit types for string and hash and just guess them. As usual, we stop in the middle of the road.
Let's write the existing rules:

No type attribute:
   Contains just elements (and whitespace) => Hash
   Contains no elements => String
   Contains elements AND string (<a><key>b</key>c</a>) => WTF

We should make "WTF" an explicit error in YaST-XML, not silently
converting it to a string as we do now(?)
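The three rules, with the mixed case turned into an explicit error, can be sketched as follows. This uses REXML and is not the actual YaST code; MixedContentError and untyped_value are hypothetical names, and the sketch looks only at untyped elements (nested type attributes are ignored).

```ruby
require "rexml/document"

# Hypothetical exception for content that is neither a hash nor a string.
class MixedContentError < StandardError; end

# Applies the rules for an element WITHOUT a type attribute.
def untyped_value(element)
  children = element.children.grep(REXML::Element)
  text = element.texts.map(&:value).join
  if children.empty?
    text # contains no elements => String
  elsif text.strip.empty?
    # contains just elements (and whitespace) => Hash
    children.to_h { |child| [child.name, untyped_value(child)] }
  else
    raise MixedContentError,
          "<#{element.name}> mixes elements with text #{text.strip.inspect}"
  end
end

hash_doc = REXML::Document.new("<a>\n  <key>b</key>\n</a>")
p untyped_value(hash_doc.root)

begin
  untyped_value(REXML::Document.new("<a><key>b</key>c</a>").root)
rescue MixedContentError => e
  warn e.message
end
```

The first call should give a one-entry hash; the second raises instead of silently returning "c".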
...
...
...
And as a bonus, we do not specify these types in the schema, so
during validation, if you omit the type, it is still valid XML, but
it crashes in code that expects a different type.
We must use the correct terms:
WELL-FORMED XML means, roughly, syntactically correct disregarding
the DTD or schema
VALID XML means, obeying the DTD or schema (in addition to being
well formed)
For example, any XML parser can check for well-formedness otherwise
it is not worth being called an XML parser. We do not get bugs about
malformed profiles, people are competent enough not to use them.
The bug-reported profiles are invalid, either in the sense of not
obeying the autoyast schema, or even violating some of the common
properties of YaST-XML.
My problem here is that types are mandatory in YaST-XML, with the exception of string and hash, but the Relax NG schema we are using does not require them.
AFAIK the schema does require the types. See
/usr/share/YaST2/schema/autoyast/rng/common.rng
Do you have a test case?
...
Also, what I do not like is that we specify types in multiple
places. It is kind of in the schema (where it validates the value,
but does not check that the attribute is set), we have it in the XML
itself, and sometimes in code.
I don't understand this, please give an example.
...
So from what you write, I understand that it makes sense to have a
specific XML parser for the subset of XML we use. My question is
whether we should try to improve that subset somehow? Like
...
not having types in a namespace
agree
...
not needing to define a global namespace
disagree
...
and maybe, if we use validation, trying to get the type from the
schema if it is not defined.
I don't understand this part.

-- 
Martin Vidner, YaST Team
http://en.opensuse.org/User:Mvidner