[yast-devel] YaST XML Parser

josef Reidinger

30 Mar 2020

20:07

Hi,

I am currently researching how to improve the XML parser in YaST. What we have nowadays is a libxml2-based C++ parser (which almost no one uses directly) and the XML module (a module in the code sense, not a YaST module :). I checked how the XML module is used, and the main usage is converting data to XML and back (with the XML either as a string or as a file). There are just two additional pieces of functionality: checking for an XML error (almost no one uses it) and setting metadata for the generated XML (a bad API, as it should be part of the data-to-XML method).

So my question is: what would we like to have better?

One thing that hits us often for sure is the lack of optional schema validation (some XML is prevalidated, like control files for products or roles, but AutoYaST profiles are user generated/written).

Nicer error reporting would also be good, because the current XMLError method is almost never used (and yes, you should read "nicer" as using an exception that can/has to be caught; otherwise it results in a popup with an internal error, rather than causing some strange error later).

Do you think it makes sense at all to have our own module, given that Ruby, Perl and also Python, for which we currently have bindings, all have their own good (good as in better than ours) parsers? So does it make sense to have our own XML parser beyond backward compatibility, or should new code, as already seen in some places, just use rexml or nokogiri, which e.g. already has support for RELAX NG validation[1]? Or do we have some functionality that we would like to have on top of standard parsers?

The only thing the current parser has on top of generic XML parsers is its understanding of the type attribute, which does automatic type conversion, so `<a type="boolean">true</a>` is returned as `true` and not `"true"`. But this magic is also a source of some bugs, as e.g. a hash does not have this type attribute, and the result is that `<a><key>b</key>c</a>` is returned as `"c"` and not as a hash, which causes many of the recent failures we get with typos in AutoYaST profiles. And as a bonus, we do not specify these types in the schema, so if you omit the type, the XML is still valid during validation, but the code crashes because it expects a different type.

I would welcome any suggestions or ideas on what your ideal XML parser should look like.

Thanks
Josef

[1] https://stackoverflow.com/questions/913489/how-do-i-validate-xml-via-relax-n...
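To make the type conversion described above concrete, here is a minimal sketch using REXML. It is not the YaST implementation: the helper name `typed_value` and the set of handled types are assumptions for illustration, and it looks only at a plain `type` attribute as in the example in the message.

```ruby
require "rexml/document"

# Hypothetical helper (not part of Yast::XML): convert the text of one
# element according to its plain "type" attribute.
def typed_value(element)
  text = element.text.to_s
  case element.attributes["type"]
  when "boolean" then text == "true"
  when "integer" then Integer(text)
  when "symbol"  then text.to_sym
  else text # no type attribute: the value stays a string
  end
end

doc = REXML::Document.new(
  '<root><a type="boolean">true</a><b type="integer">15</b><c>true</c></root>'
)

# Map each child element name to its converted value.
values = doc.root.elements.to_a.map { |e| [e.name, typed_value(e)] }.to_h
```

With this sketch, `a` becomes the boolean `true` while the untyped `c` stays the string `"true"`, which is the difference the message describes.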


Ancor Gonzalez Sosa

1 Apr

08:21

On 2020-03-30 22:07, josef Reidinger wrote:

...

Hi,

...

Or do we have some functionality that we would like to have on top of standard parsers?

I think it makes sense to offer our own parser. Otherwise, we could end up with different parts of YaST using different approaches (rexml vs nokogiri vs next-ruby-thing vs whatever-python provides). That's especially bad because each parser comes with its own mindset.

...

Only thing that current parser have on top of generic xml parsers is understanding of type attribute

Which is something that is, in itself, enough to justify the creation of a YaST parser (internally based on an existing solution, of course).

Cheers.

-- Ancor González Sosa
YaST Team at SUSE Linux GmbH

josef Reidinger

2 Apr

11:26

On Wed, 1 Apr 2020 10:21:48 +0200 Ancor Gonzalez Sosa <ancor@suse.de> wrote:

...

On 2020-03-30 22:07, josef Reidinger wrote:

...
Hi,

...
Or do we have some functionality that we would like to have on top of standard parsers?

I think it makes sense to offer our own parser. Otherwise, we could end up with different parts of YaST using different approaches (rexml vs nokogiri vs next-ruby-thing vs whatever-python provides). That's specially bad because each parser comes with its own mindset.

...
Only thing that current parser have on top of generic xml parsers is understanding of type attribute

Which is something that is, on itself, enough to justify the creation of a YaST parser (internally based on an existing solution, of course).

OK, the question is: do we want more features? Or better, which features are you missing in the current XML parser? Optional schema validation? Stricter parsing? Better logging? Extending the existing types, so that the type attribute can hold more types (e.g. a class name whose instance is then constructed via e.g. a from_xml method)?

Josef

...

Cheers.


Martin Vidner

3 Apr

14:05

On Mon, Mar 30, 2020 at 10:07:53PM +0200, josef Reidinger wrote:

...

Hi, I am currently working on research how to improve XML parser in YaST. What we have nowadays is libxml2 based c++ parser ( that almost noone use directly ) and XML module ( module as a code, not YaST module :). I check usage of XML module and main usage is data to XML and back ( with variant xml as string or xml as file ). There is just two additional functionality. One is checking xml error ( almost noone use it ) and setting metadata for generated xml ( bad API as it should be part of that data to XML method ).

Most importantly, we have the initial concepts wrong.

This is not about a "YaST parser" for "XML". What YaST parses and writes is a specific subset of XML, let's call it YaST-XML:

1) It has a 1 to 1 correspondence* to YCP/Ruby data types (maps, lists, booleans, symbols, integers, strings)

2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

3) It uses config:type attributes for (1) where xmlns:config="http://www.suse.com/1.0/configns" is a different namespace (WTF).

4) Arrays are tagged "listitem" in the generic case but we have a long list of specific tags for specific arrays.

*: there are corner cases, like having trouble distinguishing a missing value from an empty value

One exception to YaST-XML is the one-click installer which uses a non-YCP XML schema.
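As an illustration of the four points above, the following sketch maps a small YaST-XML-like document to Ruby data with REXML. The sample profile, its element names and the `to_ruby` helper are made up for this example; only the two namespace URIs and the `config:type` attribute come from the description. For brevity it matches the literal `config:` prefix rather than resolving the namespace URI, which a real implementation would need to do.

```ruby
require "rexml/document"

# Invented sample document; only the namespace URIs are taken from the thread.
PROFILE = <<~XML
  <profile xmlns="http://www.suse.com/1.0/yast2ns"
           xmlns:config="http://www.suse.com/1.0/configns">
    <general>
      <confirm config:type="boolean">false</confirm>
      <timeout config:type="integer">10</timeout>
    </general>
    <packages config:type="list">
      <package>vim</package>
      <package>zsh</package>
    </packages>
  </profile>
XML

# Hypothetical converter: lists become Arrays, untyped elements with child
# elements become Hashes, everything else is a scalar.
def to_ruby(element)
  children = element.elements.to_a
  text = element.text.to_s.strip
  case element.attributes["config:type"]
  when "list"    then children.map { |child| to_ruby(child) }
  when "boolean" then text == "true"
  when "integer" then Integer(text)
  when "symbol"  then text.to_sym
  else
    if children.empty?
      element.text.to_s
    else
      children.map { |child| [child.name, to_ruby(child)] }.to_h
    end
  end
end

profile = to_ruby(REXML::Document.new(PROFILE).root)
```

Note that the list items here use the specific tag `package` instead of the generic `listitem`; as in point 4, the sketch ignores the item tag name when reading.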

...

So my question is what we would like to have better? One thing for sure that hit us often is optional schema validation ( as some XML is prevalidated like control files for products of roles, but autoyast is user generated/written ).

Yes, validation is good.

...

Also some nicer error reporting would be nice because current XMLError method is almost never used (and yes, you should read nicer as using exception that can/have to be catched otherwise it report popup with internal error and not cause some strange error later ).

Better error handling is also good.

...

Do you think that it makes sense at all to have own module as ruby, perl and also python, for whose we currently have bindings, all have own good ( good as better then our ) parser. So does it makes sense to have own XML parser beside backward compatibility and for new stuff as already seen on some places just use rexml or nokogiri that e.g. already have support for relax ng validation[1]? Or do we have some functionality that we would like to have on top of standard parsers?

As explained at the top, we must have a special library because we have a special kind of XML.

...

Only thing that current parser have on top of generic xml parsers is understanding of type attribute that do automatic type conversion so `<a type="boolean>true</a>` is returned as `true` and not `"true"`. But this magic

It is not magic. Calling things magic will make people avoid understanding them which is bad.

...

is also source of some bugs as e.g. hash does not have this type attribute and result is that `<a><key>b</key>c</a>` is returned as `"c"` and not hash, which cause many recent failures we get with typos in autoyast profiles.

Let's have test cases for these to ensure that the schemas can distinguish them and the error reports are helpful.

...

And as bonus we do not specify this types in schema, so during validation if you omit type it is still valid xml, but it crashes in code as it expect different type.

We must use the correct terms:

WELL-FORMED XML means, roughly, syntactically correct disregarding the DTD or schema

VALID XML means, obeying the DTD or schema (in addition to being well formed)

For example, any XML parser can check for well-formedness otherwise it is not worth being called a XML parser. We do not get bugs about malformed profiles, people are competent enough not to use them.

The bug-reported profiles are invalid, either in the sense of not obeying the autoyast schema, or even violating some of the common properties of YaST-XML.
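The distinction between the two terms can be sketched in a few lines. The well-formedness check below relies on REXML raising `REXML::ParseException` for syntactically broken input. The validity check is only a stand-in: instead of a real DTD or RELAX NG schema it uses one hand-written, hypothetical rule (a `timeout` element must carry `type="integer"`), just to show a document that is well formed but not valid.

```ruby
require "rexml/document"

# Well-formedness: the parser accepts the document syntactically.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

# Stand-in for schema validation (assumption, not a real schema):
# the root must contain a <timeout> element with type="integer".
def valid?(xml)
  return false unless well_formed?(xml)

  timeout = REXML::Document.new(xml).root.elements["timeout"]
  !timeout.nil? && timeout.attributes["type"] == "integer"
end

malformed = "<profile><timeout>10</profile>"
untyped   = "<profile><timeout>10</timeout></profile>"
typed     = '<profile><timeout type="integer">10</timeout></profile>'
```

Here `malformed` fails the first check, `untyped` passes the first check but fails the second, and `typed` passes both.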

...

I would welcome any suggestions or ideas how your ideal xml parser should look like.

-- Martin Vidner, YaST Team http://en.opensuse.org/User:Mvidner

josef Reidinger

15:21

On Fri, 3 Apr 2020 16:05:04 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...

On Mon, Mar 30, 2020 at 10:07:53PM +0200, josef Reidinger wrote:

...
Hi, I am currently working on research how to improve XML parser in YaST. What we have nowadays is libxml2 based c++ parser ( that almost noone use directly ) and XML module ( module as a code, not YaST module :). I check usage of XML module and main usage is data to XML and back ( with variant xml as string or xml as file ). There is just two additional functionality. One is checking xml error ( almost noone use it ) and setting metadata for generated xml ( bad API as it should be part of that data to XML method ).

Most importantly, we have the initial concepts wrong.

This is not about a "YaST parser" for "XML". What YaST parses and writes is a specific subset of XML, let's call it YaST-XML:

1) It has a 1 to 1 correspondence* to YCP/Ruby data types (maps, lists, booleans, symbols, integers, strings)

2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set the namespace if it is not defined in the XML? What puzzles me most about this format (not the parser) is not that the XML is hard to read, but that it is very hard and error-prone to write or modify. And a mandatory namespace is unnecessary from my POV. Can we simply assume this namespace if none is defined?

...

3) It uses config:type attributes for (1) where xmlns:config="http://www.suse.com/1.0/configns" is a different namespace (WTF).

Same here. Do we need a namespace of our own, in our own subset of XML, for a single attribute?

...

4) Arrays are tagged "listitem" in the generic case but we have a long list of specific tags for specific arrays.

Yes, I have already run into this. It is defined programmatically (so you need to modify code if you add a new list). Also, it is used only for writing; reading does not care about the name.

...

*: there are corner cases, like having trouble distinguishing a missing value from an empty value

One exception to YaST-XML is the one-click installer which uses a non-YCP XML schema.

The question is whether we should use the YaST XML parser for a non-YaST/YCP XML schema. I think that e.g. for SCC we use a generic XML parser, as the YaST one does not bring any value, and maybe that is also true in this case.

...

...
So my question is what we would like to have better? One thing for sure that hit us often is optional schema validation ( as some XML is prevalidated like control files for products of roles, but autoyast is user generated/written ).

Yes, validation is good.

...
Also some nicer error reporting would be nice because current XMLError method is almost never used (and yes, you should read nicer as using exception that can/have to be catched otherwise it report popup with internal error and not cause some strange error later ).

Better error handling is also good.

...
Do you think that it makes sense at all to have own module as ruby, perl and also python, for whose we currently have bindings, all have own good ( good as better then our ) parser. So does it makes sense to have own XML parser beside backward compatibility and for new stuff as already seen on some places just use rexml or nokogiri that e.g. already have support for relax ng validation[1]? Or do we have some functionality that we would like to have on top of standard parsers?

As explained at the top, we must have a special library because we have a special kind of XML.

But do we need that special kind of XML? Why can't we use common XML, or something that supports types, like YAML or JSON?

...

...
Only thing that current parser have on top of generic xml parsers is understanding of type attribute that do automatic type conversion so `<a type="boolean>true</a>` is returned as `true` and not `"true"`. But this magic

It is not magic. Calling things magic will make people avoid understanding them which is bad.

It is magic for people who work with common XML. XML as a whole is just structured strings, structured into elements with names, attributes and values.

...

...
is also source of some bugs as e.g. hash does not have this type attribute and result is that `<a><key>b</key>c</a>` is returned as `"c"` and not hash, which cause many recent failures we get with typos in autoyast profiles.

Let's have test cases for these to ensure that the schemas can distinguish them and the error reports are helpful.

I think the source of this is that we use typed XML but omit the types for string and hash and just guess them. As usual, we stopped in the middle of the road.

...

...
And as bonus we do not specify this types in schema, so during validation if you omit type it is still valid xml, but it crashes in code as it expect different type.

We must use the correct terms:

WELL-FORMED XML means, roughly, syntactically correct disregarding the DTD or schema

VALID XML means, obeying the DTD or schema (in addition to being well formed)

For example, any XML parser can check for well-formedness otherwise it is not worth being called a XML parser. We do not get bugs about malformed profiles, people are competent enough not to use them.

The bug-reported profiles are invalid, either in the sense of not obeying the autoyast schema, or even violating some of the common properties of YaST-XML.

My problem here is that types are mandatory in YaST-XML, with the exception of string and hash, but the RELAX NG schema we are using does not require them. What I also do not like is that we specify types in multiple places: kind of in the schema (where it validates the value but does not check that the attribute is set), in the XML itself, and sometimes in the code.

So from what you write, I understand that it makes sense to have a specific XML parser for the subset of XML we use. My question is whether we should try to improve that subset somehow. For example: not having the types in a namespace, not needing to define the global namespace, and, if we use validation, maybe trying to get the type from the schema if it is not defined. What do you think?

...

...
I would welcome any suggestions or ideas how your ideal xml parser should look like.


Martin Vidner

8 Apr

08:46

On Fri, Apr 03, 2020 at 05:21:21PM +0200, josef Reidinger wrote:

...

On Fri, 3 Apr 2020 16:05:04 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
This is not about a "YaST parser" for "XML". What YaST parses and writes is a specific subset of XML, let's call it YaST-XML:

1) It has a 1 to 1 correspondence* to YCP/Ruby data types (maps, lists, booleans, symbols, integers, strings)

2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set namespace if it is not defined in XML? What puzzle me the most about that format (not parser) now is not that xml is badly readable, but that it is very hard and error prone to write/modify it. And mandatory namespace is unnecessary from my POV. Can we simple assume this namespace if none is defined?

The quick answer is no. Just like with other programming languages, for throwaway scripts working without namespaces is fine but as soon as you start building anything bigger or longer lasting, namespaces are needed to resolve conflicts and organize things. If you insist, I can look deeper but now I estimate that allowing no namespace would make things harder rather than easier.

...

...
3) It uses config:type attributes for (1) where xmlns:config="http://www.suse.com/1.0/configns" is a different namespace (WTF).

Same here. Do we need for our own subset of xml namespace for one attribute?

This is a different case :) A fun half-hour reading the standard* has revealed that attributes do not inherit the namespaces of their elements etc etc, so we can fix a common bug by allowing type="boolean" alongside config:type="boolean".

*) https://www.w3.org/TR/REC-xml-names/#scoping-defaulting
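A sketch of what accepting both spellings could look like, again with REXML. The helper `declared_type` and the `other` namespace URI are hypothetical; the config namespace URI is the one quoted earlier in the thread. The helper inspects the qualified attribute names itself, because an unprefixed attribute is in no namespace even when its element has a default namespace.

```ruby
require "rexml/document"

CONFIG_NS = "http://www.suse.com/1.0/configns"

# Hypothetical helper: return the declared type from either an unprefixed
# "type" attribute or a "type" attribute whose prefix is bound to CONFIG_NS.
def declared_type(element)
  element.attributes.each_attribute do |attr|
    qname = attr.expanded_name
    if qname == "type"
      # Unprefixed: not in the element's default namespace, but accepted here.
      return attr.value
    elsif qname.end_with?(":type")
      prefix = qname.split(":", 2).first
      return attr.value if element.namespace(prefix) == CONFIG_NS
    end
  end
  nil
end

doc = REXML::Document.new(<<~XML)
  <profile xmlns="http://www.suse.com/1.0/yast2ns"
           xmlns:config="http://www.suse.com/1.0/configns"
           xmlns:other="http://example.com/other">
    <a config:type="boolean">true</a>
    <b type="boolean">true</b>
    <c other:type="boolean">true</c>
    <d>true</d>
  </profile>
XML

types = doc.root.elements.to_a.map { |e| [e.name, declared_type(e)] }.to_h
```

In this sketch `a` and `b` both yield `"boolean"`, while `c` (a `type` attribute in an unrelated namespace) and `d` (no attribute) yield `nil`.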

...

...
4) Arrays are tagged "listitem" in the generic case but we have a long list of specific tags for specific arrays.

Yes, I already face it and it is defined programatically ( so you need modify code if you add new list ). Also used only only for writting. Reading does not care about name.

...
*: there are corner cases, like having trouble distinguishing a missing value from an empty value

One exception to YaST-XML is the one-click installer which uses a non-YCP XML schema.

Question is should we use yast xml parser for non yast/ycp xml schema? I think that e.g. for scc we use generic xml parser as yast one does not bring any value and maybe it is also true in this case.

For YaST-XML, make a library layer, but for other XML simply use a regular XML parser.

...

...
...
So my question is what we would like to have better? One thing for sure that hit us often is optional schema validation ( as some XML is prevalidated like control files for products of roles, but autoyast is user generated/written ).

Yes, validation is good.

...
Also some nicer error reporting would be nice because current XMLError method is almost never used (and yes, you should read nicer as using exception that can/have to be catched otherwise it report popup with internal error and not cause some strange error later ).

Better error handling is also good.

...
Do you think that it makes sense at all to have own module as ruby, perl and also python, for whose we currently have bindings, all have own good ( good as better then our ) parser. So does it makes sense to have own XML parser beside backward compatibility and for new stuff as already seen on some places just use rexml or nokogiri that e.g. already have support for relax ng validation[1]? Or do we have some functionality that we would like to have on top of standard parsers?

As explained at the top, we must have a special library because we have a special kind of XML.

But do we need that special kind of XML? Why we cannot use common XML? or something that supports types like Yaml or json.

It is common XML. It's not like we are using curly braces {} instead of angle brackets <>.

Maybe the best term for it is a language binding, yast-xml-bindings ;-) It is an intermediate layer: it does not have meaning like a specific schema, but it maps XML concepts to YCP and Ruby concepts.

If you "used common XML" you would either have 3 different XML parsers for each part of YaST, or you would end up with YaST-XML-NG, just like the old one, only different. (@)

About YAML or JSON, I don't get your point.

...

...
...
Only thing that current parser have on top of generic xml parsers is understanding of type attribute that do automatic type conversion so `<a type="boolean>true</a>` is returned as `true` and not `"true"`. But this magic

It is not magic. Calling things magic will make people avoid understanding them which is bad.

It is magic for people that work with common XML.

"It is magic" means "I don't understand it". Ask and I will explain. MAGIC! IS! FORBIDDEN!

...

XML is whole just structured strings. Structured into elements with names, attributes and values.

See my point (@) above.

...

...
...
is also source of some bugs as e.g. hash does not have this type attribute and result is that `<a><key>b</key>c</a>` is returned as `"c"` and not hash, which cause many recent failures we get with typos in autoyast profiles.

Let's have test cases for these to ensure that the schemas can distinguish them and the error reports are helpful.

I think source of this is that we use typed xml, but omit types for string and hash and just guessing it. As usually we stop in middle of road.

Let's write the existing rules:

No type attribute:
  Contains just elements (and whitespace) => Hash
  Contains no elements => String
  Contains elements AND string (<a><key>b</key>c</a>) => WTF

We should make "WTF" an explicit error in YaST-XML, not silently converting it to a string as we do now(?)
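The three rules above, with the mixed case turned into an explicit error, could be sketched like this with REXML. The error class `MixedContentError` and the helper `untyped_value` are hypothetical names, and the sketch covers only elements without a type attribute.

```ruby
require "rexml/document"

# Hypothetical error class for the "elements AND string" case.
class MixedContentError < StandardError; end

# Apply the rules for an element that has no type attribute.
def untyped_value(element)
  has_elements = element.has_elements?
  # Whitespace-only text between child elements does not count as a string.
  has_text = element.texts.any? { |t| !t.value.strip.empty? }

  if has_elements && has_text
    raise MixedContentError, "<#{element.name}> mixes child elements and text"
  elsif has_elements
    element.elements.to_a.map { |c| [c.name, untyped_value(c)] }.to_h
  else
    element.text.to_s
  end
end

as_hash   = untyped_value(REXML::Document.new("<a>\n  <key>b</key>\n</a>").root)
as_string = untyped_value(REXML::Document.new("<a>c</a>").root)
```

Calling `untyped_value` on the root of `<a><key>b</key>c</a>` raises `MixedContentError` in this sketch instead of returning `"c"`.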

...

...
...
And as bonus we do not specify this types in schema, so during validation if you omit type it is still valid xml, but it crashes in code as it expect different type.

We must use the correct terms:

WELL-FORMED XML means, roughly, syntactically correct disregarding the DTD or schema

VALID XML means, obeying the DTD or schema (in addition to being well formed)

For example, any XML parser can check for well-formedness otherwise it is not worth being called a XML parser. We do not get bugs about malformed profiles, people are competent enough not to use them.

The bug-reported profiles are invalid, either in the sense of not obeying the autoyast schema, or even violating some of the common properties of YaST-XML.

My problem here is that types are mandatory in Yast-XML, but with exception of string and hash, but relax ng schema we are using does not require them.

AFAIK the schema does require the types. See /usr/share/YaST2/schema/autoyast/rng/common.rng

Do you have a test case?

...

Also what I do not like is that we specify types in multiple places. It is kind of in schema ( where it validate value, but not check that attribute is set ), we have it in XML itself and sometimes in code.

I don't understand this, please give an example.

...

So from what you write, I understand that it makes sense to have specific xml parser for subset of XML we use. My question is if we should try to improve somehow that subset? Like

...

not having types in namespace

agree

...

not need to define global namespace

disagree

...

and maybe if we use validation trying to get type from schema if it is not defined.

I don't understand this part.

-- Martin Vidner, YaST Team http://en.opensuse.org/User:Mvidner

josef Reidinger

11:33

On Wed, 8 Apr 2020 10:46:17 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...

On Fri, Apr 03, 2020 at 05:21:21PM +0200, josef Reidinger wrote:

...
On Fri, 3 Apr 2020 16:05:04 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
This is not about a "YaST parser" for "XML". What YaST parses and writes is a specific subset of XML, let's call it YaST-XML:

1) It has a 1 to 1 correspondence* to YCP/Ruby data types (maps, lists, booleans, symbols, integers, strings)

2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set namespace if it is not defined in XML? What puzzle me the most about that format (not parser) now is not that xml is badly readable, but that it is very hard and error prone to write/modify it. And mandatory namespace is unnecessary from my POV. Can we simple assume this namespace if none is defined?

The quick answer is no. Just like with other programming languages, for throwaway scripts working without namespaces is fine but as soon as you start building anything bigger or longer lasting, namespaces are needed to resolve conflicts and organize things.

To me it looks like overengineering. Do we plan to have multiple namespaces (like element namespaces to distinguish sub-elements) or to combine multiple XML files? Right now we have XML with a simple structure, and ignoring anything that does not have a namespace is just another obstacle for me. Especially since you need to define the namespace when you write a file. This is visible in the test suite, as currently

`Yast::XML.XMLToYCPString(Yast::XML.YCPToXMLString("test", {"test" => 15}))`

returns `{}`. So to make it work, you have to call in advance

`Yast::XML.xmlCreateDoc("test", "rootElement" => "test", "namespace" => "http://www.suse.com/1.0/yast2ns", "typeNamespace" => "http://www.suse.com/1.0/configns") # skipping many unrelated options`

and then it properly returns the input parameters. So we have a generic serializer and then a parser that works only on a limited subset, which I find quite confusing and hard to work with.

...

If you insist, I can look deeper but now I estimate that allowing no namespace would make things harder rather than easier.

I see many XML files that do not have a namespace and work OK, as long as you do not mix different types of XML (e.g. try some XML files in /usr/share: wicked, gdb, libvirt, gimp; they do not use namespaces).

...

...
...
3) It uses config:type attributes for (1) where xmlns:config="http://www.suse.com/1.0/configns" is a different namespace (WTF).

Same here. Do we need for our own subset of xml namespace for one attribute?

This is a different case :) A fun half-hour reading the standard* has revealed that attributes do not inherit the namespaces of their elements etc etc, so we can fix a common bug by allowing type="boolean" alongside config:type="boolean".

*) https://www.w3.org/TR/REC-xml-names/#scoping-defaulting

nokogiri has a nice call, remove_namespaces!. Then it works in both cases.

...

...
...
4) Arrays are tagged "listitem" in the generic case but we have a long list of specific tags for specific arrays.

Yes, I already face it and it is defined programatically ( so you need modify code if you add new list ). Also used only only for writting. Reading does not care about name.

...
*: there are corner cases, like having trouble distinguishing a missing value from an empty value

One exception to YaST-XML is the one-click installer which uses a non-YCP XML schema.

Question is should we use yast xml parser for non yast/ycp xml schema? I think that e.g. for scc we use generic xml parser as yast one does not bring any value and maybe it is also true in this case.

For YaST-XML, make a library layer, but for other XML simply use a regular XML parser.

Yes, that also makes sense to me.

...

...
...
...
So my question is what we would like to have better? One thing for sure that hit us often is optional schema validation ( as some XML is prevalidated like control files for products of roles, but autoyast is user generated/written ).

Yes, validation is good.

...
Also some nicer error reporting would be nice because current XMLError method is almost never used (and yes, you should read nicer as using exception that can/have to be catched otherwise it report popup with internal error and not cause some strange error later ).

Better error handling is also good.

...
Do you think that it makes sense at all to have own module as ruby, perl and also python, for whose we currently have bindings, all have own good ( good as better then our ) parser. So does it makes sense to have own XML parser beside backward compatibility and for new stuff as already seen on some places just use rexml or nokogiri that e.g. already have support for relax ng validation[1]? Or do we have some functionality that we would like to have on top of standard parsers?

As explained at the top, we must have a special library because we have a special kind of XML.

But do we need that special kind of XML? Why we cannot use common XML? or something that supports types like Yaml or json.

It is common XML. It's not like we are using curly braces {} instead of angle brackets <>.

Maybe the best term for it is a language binding, yast-xml-bindings ;-) It is an intermediate layer: it does not have meaning like a specific schema, but it maps XML concepts to YCP and Ruby concepts.

If you "used common XML" you would either have 3 different XML parsers for each part of YaST, or you would end up with YaST-XML-NG, just like the old one, only different. (@)

Well, the plan is to keep the XML module (in yast2), which is used for YaST-XML as a serializer/deserializer for objects used in YaST, maybe just YCP types (currently it handles just a subset of YCP types), and just use nokogiri as the backend instead of our own XML agent.

...

About YAML or JSON, I don't get your point.

My point is that YAML and JSON, as opposed to XML, have built-in support for basic data types like arrays, hashes or integers. XML has just strings, nothing more built in. So every XML serializer uses a different kind of type mapping.
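The difference can be seen with Ruby's standard JSON library and REXML side by side. The sample data is invented for this illustration; the XML part deliberately applies no type mapping at all.

```ruby
require "json"
require "rexml/document"

# JSON carries the types in its syntax, so the parser returns typed values.
from_json = JSON.parse('{"timeout": 15, "confirm": true, "packages": ["vim"]}')

# Plain XML carries only text, so without a mapping layer everything is a String.
doc = REXML::Document.new(
  "<profile><timeout>15</timeout><confirm>true</confirm></profile>"
)
from_xml = doc.root.elements.to_a.map { |e| [e.name, e.text] }.to_h
```

`from_json["timeout"]` is an Integer and `from_json["confirm"]` is a boolean, while `from_xml["timeout"]` and `from_xml["confirm"]` are the strings `"15"` and `"true"`.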

...

...
...
...
Only thing that current parser have on top of generic xml parsers is understanding of type attribute that do automatic type conversion so `<a type="boolean>true</a>` is returned as `true` and not `"true"`. But this magic

It is not magic. Calling things magic will make people avoid understanding them which is bad.

It is magic for people that work with common XML.

"It is magic" means "I don't understand it". Ask and I will explain.

MAGIC! IS! FORBIDDEN!

...
XML is whole just structured strings. Structured into elements with names, attributes and values.

See my point (@) above.

...
...
...
is also source of some bugs as e.g. hash does not have this type attribute and result is that `<a><key>b</key>c</a>` is returned as `"c"` and not hash, which cause many recent failures we get with typos in autoyast profiles.

Let's have test cases for these to ensure that the schemas can distinguish them and the error reports are helpful.

I think source of this is that we use typed xml, but omit types for string and hash and just guessing it. As usually we stop in middle of road.

Let's write the existing rules:

No type attribute: Contains just elements (and whitespace) => Hash Contains no elements => String Contains elements AND string (<a><key>b</key>c</a>) => WTF

We should make "WTF" an explicit error in YaST-XML, not silently converting it to a string as we do now(?)

Well, I am currently trying to be backward compatible even with buggy behavior, but I think we can change this, since such profiles do not work anyway. So the question is how to make it an explicit error. For now the XML module returns nil and sets XMLError. Do you think we should do it the same way? Or use an exception for this kind of error? Existing code probably won't expect it.
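One possible way to offer both styles is sketched below: a strict method that raises, plus a backward-compatible wrapper that returns nil and records the message. This is only an illustration of the trade-off. The module `SketchXML`, its methods and `last_error` are hypothetical names and are not the Yast::XML API; REXML stands in for the real backend.

```ruby
require "rexml/document"

# Hypothetical module, not Yast::XML.
module SketchXML
  class ParseError < StandardError; end

  @last_error = nil

  class << self
    # Message of the last failed non-raising call, or nil.
    attr_reader :last_error

    # Strict variant: raises ParseError so problems cannot go unnoticed.
    def parse!(xml)
      doc = REXML::Document.new(xml)
      raise ParseError, "no root element" if doc.root.nil?

      doc.root
    rescue REXML::ParseException => e
      raise ParseError, e.message.lines.first.to_s.strip
    end

    # Compatible variant: returns nil on failure and stores the message,
    # similar in spirit to returning nil and setting an error string.
    def parse(xml)
      @last_error = nil
      parse!(xml)
    rescue ParseError => e
      @last_error = e.message
      nil
    end
  end
end

ok     = SketchXML.parse("<a/>")
failed = SketchXML.parse("<a><b></a>")
error  = SketchXML.last_error
```

Existing callers could keep using the nil-returning variant, while new code calls the raising one; whether that split is worth the extra API surface is exactly the open question in the message.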

...

...
...
...
And as bonus we do not specify this types in schema, so during validation if you omit type it is still valid xml, but it crashes in code as it expect different type.

We must use the correct terms:

WELL-FORMED XML means, roughly, syntactically correct disregarding the DTD or schema

VALID XML means, obeying the DTD or schema (in addition to being well formed)

For example, any XML parser can check for well-formedness otherwise it is not worth being called a XML parser. We do not get bugs about malformed profiles, people are competent enough not to use them.

The bug-reported profiles are invalid, either in the sense of not obeying the autoyast schema, or even violating some of the common properties of YaST-XML.

My problem here is that types are mandatory in Yast-XML, but with exception of string and hash, but relax ng schema we are using does not require them.

AFAIK the schema does require the types. See /usr/share/YaST2/schema/autoyast/rng/common.rng Do you have a test case?

Ah, I was just judging from the number of bugs, but it looks like the usual problem is that the XML is not validated at all. So it looks like the schema does contain it.

...

...
Also what I do not like is that we specify types in multiple places. It is kind of in schema ( where it validate value, but not check that attribute is set ), we have it in XML itself and sometimes in code.

I don't understand this, please give an example.

Let's consider a variable "a". You have its type in the XML as config:type="integer", then you need to specify that type also in the RNG as INTEGER, and you often also have its type somewhere in the code (documentation, an explicit to_integer or Ruby's to_i). So you often repeat the same information about the type, and we cannot infer it anyway. But this is probably just a minor annoyance.

...

...
So from what you write, I understand that it makes sense to have a specific XML parser for the subset of XML we use. My question is whether we should try to improve that subset somehow. Like

...
not having types in namespace

agree

...
not needing to define a global namespace

disagree

...
and maybe, if we use validation, trying to get the type from the schema if it is not defined.

I don't understand this part.

Well, as long as we start using built-in validation and the required type attribute works there as discussed above, it is probably not needed (I just hope that a failed validation returns a reasonable error message from nokogiri; I need to test it).

...

Josef -- To unsubscribe, e-mail: yast-devel+unsubscribe@opensuse.org To contact the owner, e-mail: yast-devel+owner@opensuse.org

Stasiek Michalski

11:40

On Wed, Apr 8, 2020 at 13:33, josef Reidinger <jreidinger@suse.cz> wrote:

...

On Wed, 8 Apr 2020 10:46:17 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
If you insist, I can look deeper but now I estimate that allowing no namespace would make things harder rather than easier.

I see many XML files that do not have a namespace and work OK, as long as you do not mix different types of XML (e.g. try some XML files in /usr/share: wicked, gdb, libvirt, gimp, they do not use namespaces).

Some parsers are very strict about namespaces for more specialized uses of XML, like SVG. As long as the filenames end with .xml, the parsers will usually load them like they would an HTML file, with very lenient parsing (the web created a good environment for non-conformant XML ;)

LCP [Stasiek] https://lcp.world

Ancor Gonzalez Sosa

11:52

On 2020-04-08 13:33, josef Reidinger wrote:

...

On Wed, 8 Apr 2020 10:46:17 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
On Fri, Apr 03, 2020 at 05:21:21PM +0200, josef Reidinger wrote:

...
On Fri, 3 Apr 2020 16:05:04 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set the namespace if it is not defined in the XML? What puzzles me most about that format (not the parser) is not that the XML is hard to read, but that it is very hard and error prone to write or modify. And a mandatory namespace is unnecessary from my POV. Can we simply assume this namespace if none is defined?

The quick answer is no. Just like with other programming languages, for throwaway scripts working without namespaces is fine but as soon as you start building anything bigger or longer lasting, namespaces are needed to resolve conflicts and organize things.

For me it looks like overengineering. Do we plan to have multiple namespaces

Just a warning note from someone who doesn't know the topic so well. The sentence "that is overengineering, we are never going to do X" always triggers my spider-sense.[*]

As much as we abuse the term, it's in fact really hard to overengineer something. Having a piece of software that is not prepared for the future/present (because someone decided that X would never happen) is a WAY more common problem. And we usually cause it by trying to avoid the almost-mythological overengineering.

Cheers.

[*] Spider-sense: see experience.

-- Ancor González Sosa YaST Team at SUSE Linux GmbH

josef Reidinger

12:35

On Wed, 8 Apr 2020 13:52:44 +0200 Ancor Gonzalez Sosa <ancor@suse.de> wrote:

...

On 2020-04-08 13:33, josef Reidinger wrote:

...
On Wed, 8 Apr 2020 10:46:17 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
On Fri, Apr 03, 2020 at 05:21:21PM +0200, josef Reidinger wrote:

...
On Fri, 3 Apr 2020 16:05:04 +0200 Martin Vidner <mvidner@suse.cz> wrote:

...
2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set the namespace if it is not defined in the XML? What puzzles me most about that format (not the parser) is not that the XML is hard to read, but that it is very hard and error prone to write or modify. And a mandatory namespace is unnecessary from my POV. Can we simply assume this namespace if none is defined?

The quick answer is no. Just like with other programming languages, for throwaway scripts working without namespaces is fine but as soon as you start building anything bigger or longer lasting, namespaces are needed to resolve conflicts and organize things.

For me it looks like overengineering. Do we plan to have multiple namespaces

Just a warning note from someone who doesn't know the topic so well. The sentence "that is overengineering, we are never going to do X" always triggers my spider-sense.[*]

As much as we abuse the term, it's in fact really hard to overengineer something. Having a piece of software that is not prepared for the future/present (because someone decided that X would never happen) is a WAY more common problem. And we usually cause it by trying to avoid the almost-mythological overengineering.

Well, my goal is to use a reasonable default, as we currently have only one namespace. So if none is defined, our parser will use our namespace. And if we need more namespaces in the future, that should not be a problem: we use the default one for documents without namespaces and respect the others. From a UX point of view (and here the user is the user of the XML, so also autoyast users), I think it is better to allow omitting the namespace when there is just one, and simply assume it. So I agree with you about not closing things for extension, I just think that this is not the case here (unless I am overlooking something).

Josef
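The proposed default could be sketched roughly as follows, using REXML as a stand-in parser. The helper `with_default_namespace` is hypothetical; the namespace URI is the one quoted earlier in the thread, and a document that declares any other namespace is left untouched.

```ruby
require "rexml/document"

YAST_NS = "http://www.suse.com/1.0/yast2ns"

# Hypothetical helper, not part of Yast::XML: assume the YaST namespace
# only when the root element declares no default namespace at all.
def with_default_namespace(xml, default_ns = YAST_NS)
  doc = REXML::Document.new(xml)
  root = doc.root
  root.add_namespace(default_ns) if root.namespace.empty?
  doc
end

bare = with_default_namespace("<profile><users/></profile>")
other = with_default_namespace('<profile xmlns="urn:example:other"><users/></profile>')

puts bare.root.namespace              # the assumed YaST namespace
puts bare.root.elements[1].namespace  # children inherit it
puts other.root.namespace             # an explicit namespace is respected
```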

...

Cheers.

[*] Spider-sense: see experience.


Ladislav Slezak

9 Apr

14:52

On 08. 04. 20 at 13:33, josef Reidinger wrote:

...

For me it looks like overengineering. Do we plan to have multiple namespaces (like element namespaces to distinguish sub-elements) or to combine multiple XML files? Now we have XML with a simple structure, and ignoring anything that does not have a namespace is just another obstacle for me. Especially when you need to define the namespace when you write a file (visible in the test suite, as currently `Yast::XML.XMLToYCPString(Yast::XML.YCPToXMLString("test", {"test" => 15}))` returns `{}`).

[...]

...

I see many XML files that do not have a namespace and work OK, as long as you do not mix different types of XML (e.g. try some XML files in /usr/share: wicked, gdb, libvirt, gimp, they do not use namespaces).

Disclaimer: I'm not an XML expert so take my opinions with a grain of salt...

Namespaces are useful if you need to merge several different XML parts into one document. You can have customer.xml and order.xml containing the same tag, let's say <id>. If you merge them with namespaces you can still easily distinguish between <customer:id> and <order:id> and avoid ambiguity.

A perfect example is an XSL transformation, which describes how to transform one XML document into another XML document. And it's written, surprisingly, also in XML. With namespaces you can easily avoid conflicts between the literal XML input/output data and the XSL metadata describing the transformation itself.

But the question is: do we need something like this for YaST? Is there any use case for that? I have never seen a need for anything like that...

For me personally the YaST namespaces just make things more difficult. E.g. if you need to define an XSL transformation for a control.xml, you need to be careful to correctly match the input namespaces in tags and make sure no extra namespace is printed in the output. See e.g. https://github.com/yast/skelcd-control-SLES4SAP/blob/master/package/installa...

So I think I could live without the namespaces, but as we are not very familiar with XML, my suggestion is to ask a real XML expert. Do you know one? Do we have one?

-- Ladislav Slezák YaST Developer SUSE LINUX, s.r.o. Corso IIa Křižíkova 148/34 18600 Praha 8

Ladislav Slezak

13:49

On 03. 04. 20 at 16:05, Martin Vidner wrote: [...]

...

VALID XML means, obeying the DTD or schema (in addition to being well formed)

For example, any XML parser can check for well-formedness, otherwise it is not worth being called an XML parser. We do not get bugs about malformed profiles; people are competent enough not to use them.

Well, quite recently I got a bug where AutoYaST crashed and it turned out that the manually edited XML contained a typo, an extra "<" character. So yes, this might happen, especially when you edit the XML manually.

-- Ladislav Slezák YaST Developer SUSE LINUX, s.r.o. Corso IIa Křižíkova 148/34 18600 Praha 8

participants (5)

Ancor Gonzalez Sosa
josef Reidinger
Ladislav Slezak
Martin Vidner
Stasiek Michalski