- Threads sorted by date: w3c-semantic-web 201007
Hi All,
I know this is a known problem, but I have been bitten by the fact that there are legal RDF documents which I can't query using the SPARQL query language. Perhaps this should be looked at in any future revision of RDF or SPARQL.
The issue arises because turtle doesn't forbid the use of certain characters, for example the backtick " ` " (%60), whereas SPARQL does forbid it. This means that I can write legal turtle and import it into my triplestore, but I won't ever be able to query that data via SPARQL.
For example, the following turtle is legal:
<http://example.com/mylamefoafdocument`uri> a foaf:Document .
<http://example.com/mylamefoafdocument`uri> foaf:primaryTopic <http://example.com/mylamefoafdocument`uri#me> .
But I can't write the following SPARQL query:
SELECT * WHERE { <http://example.com/mylamefoafdocument`uri> ?p ?o }
I thought this was due to the fact that the RDF spec [1] was written before the RFC which defined URIs [2], but I can't find a link to an RDF spec which predates 1998.
My question is: why does SPARQL forbid the use of certain URIs, when the beauty of the web (from my POV) is that it kinda works even if you use invalid URIs for documents? For example, the following web document is real and it works, but I can't use it as a URI in parts of the RDF world.
http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php
I bring this up, because one of our applications needs to make statements about the above document, and I am now slightly tempted to encode such information as a literal value, something along the lines of :
_:bnode0 a foo:WebDocument .
_:bnode0 foo:hasAddress "http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php" .
Which is kinda rubbish from my POV.
My confusion lies in the following points:
1) What is a URI reference, and what is the relationship between a URI reference and a URI as per [2], see :
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#ref-uris
2) My example of the backtick issue shows the incompatibility of turtle and sparql. From my reading RDFXML only points to section 2.1 of RFC2396 which doesn't forbid the backtick. Is it legal to use a backtick in an RDFXML URI?
Mischa
[1] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ (i know there was a 1999 one too)
[2] http://www.ietf.org/rfc/rfc2396.txt
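The mismatch described above can be checked mechanically. The following is a minimal Python sketch, assuming that SPARQL's IRI_REF production allows any character except < > " { } | ^ ` and the range U+0000 to U+0020 between the angle brackets. The function names are made up for illustration, and this is only the character-level check, not full IRI validation.

```python
import re

# Assumed character class for the body of a SPARQL IRI_REF:
#   '<' ([^<>"{}|^`]-[#x00-#x20])* '>'
# i.e. anything except < > " { } | ^ ` and the characters U+0000..U+0020.
_IRI_REF_BODY = re.compile(r'[^<>"{}|^`\x00-\x20]*')


def is_sparql_iri_ref_body(text: str) -> bool:
    """Return True if text could appear between '<' and '>' in a SPARQL query."""
    return _IRI_REF_BODY.fullmatch(text) is not None


def offending_characters(text: str) -> list:
    """List the distinct characters the pattern rejects, in order of appearance."""
    seen = []
    for ch in text:
        if not is_sparql_iri_ref_body(ch) and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    uri = "http://example.com/mylamefoafdocument`uri"
    print(is_sparql_iri_ref_body(uri))   # the backtick is rejected
    print(offending_characters(uri))
```

Running a check like this over data before loading it would at least flag the URIs that cannot later be written inside a SPARQL query.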
RDF core was working in parallel with the IRI [1] work. URIRef [2] (as I understand it) was trying to anticipate what IRIs would be. URIRef and IRI are pretty close (but see below), and I think the general recommendation is that you should read 'URIRef' as 'IRI'.
SPARQL syntax is defined in terms of IRIs, although I'm not sure the syntax is identical (it uses ([^<>"{}|^`]-[#x00-#x20])*), but it seems close enough.
Looking at the IRI spec ` is not permitted, however URIRef does allow it. 'Pretty close', but not close enough. Turtle, it seems, is in the URIRef camp.
It also doesn't seem to be permitted in URIs, [3] which makes URIRef feel like it's outside the mainstream.
Personally I would follow IRI and fix turtle. Why should RDF have its own URL/URI/IRI-ish syntax?
As for "http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php", that does work when encoded.
Disclaimer: I may have got some or all of this wrong. Do not trust my assertions regarding the RFCs.
Damian
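[1] <http://www.ietf.org/rfc/rfc3987.txt>
[2] <http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference>
[3] <http://www.ietf.org/rfc/rfc2396.txt>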
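The "works when encoded" remark can be illustrated with a small Python sketch. It uses the standard library's urllib.parse.quote; the helper name escape_for_iri and the choice of characters to leave alone are assumptions made for this example, not something any of the specs prescribe.

```python
from urllib.parse import quote

# Characters quote() should leave untouched: the RFC 3986 reserved characters,
# plus '%' so that existing escapes such as %20 are not encoded a second time.
_KEEP = ":/?#[]@!$&'()*+,;=%"


def escape_for_iri(url: str) -> str:
    """Percent-encode characters such as the backtick; keep the URL structure."""
    return quote(url, safe=_KEEP)


if __name__ == "__main__":
    url = ("http://washington-press-release.com/41/Study%20Addresses%20"
           "`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php")
    print(escape_for_iri(url))
```

One caveat: RDF compares these identifiers character by character, so the escaped form names a different node from the unescaped one. The encoding would have to be applied consistently on import and in queries.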
Hello,
On 29 Jul 2010, at 13:51, Damian Steer wrote:
> Looking at the IRI spec ` is not permitted, however URIRef does allow it. 'Pretty close', but not close enough. Turtle, it seems, is in the URIRef camp.
Yeah I follow and thanks for the clarification, but as far as I can tell rdfxml (which is a rec) is in the URIRef space too - please correct me if I am wrong. Which means that I still have the same problem of not being able to use sparql to query for URIs which I can import using rdfxml.
> It also doesn't seem to be permitted in URIs, [3] which makes URIRef feel like it's outside the mainstream.
Agreed.
> Personally I would follow IRI and fix turtle. Why should RDF have its own URL/URI/IRI-ish syntax?
Do you think that the same logic should be applied to rdfxml too? Otherwise there will be things you can write in turtle and not in rdfxml which you can subsequently sparql, which simply doesn't feel right to me.
I wonder if I should contact the current sparql working group, as they are currently active, and see how they respond. I think it is unfortunate that you can write valid rdf which can't be queried in sparql.
> As for "http://washington-press-release.com/41/Study%20Addresses%20`Cross-Selling`%20Within%20the%20Dating,%20Adult%20Dating%20Arena.php", that does work when encoded.
Yup, I am aware I could just encode the ` as %60.
Thanks Damian,
Mischa
On Thu, Jul 29, 2010 at 10:05 AM, Mischa Tuffield
Well it's reasonably well known that it's possible to write N3 that
can't be encoded in RDF/XML, and that doesn't seem to have caused
great stress until now. Personally, I *much* prefer N3, so it doesn't
bother me what RDF/XML can't do. :-)
As for being queried in SPARQL, that's a relative concept. Yes, you
can't match it directly, as you've pointed out, but it can still be
returned in results (unless an implementation specifically tries to
put the data into an internal IRI and a validation error occurs, but
that's implementation specific). It's always possible to bind it to a
variable and return the data. Alternatively, if you really did want to
search for it, you could bind to a variable, and FILTER on its string
representation. Yes, it will be slow, but my point is that the
language isn't *completely* deficient (complain to Steve if it is).
;-)
Regards,
Paul Gearon
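The FILTER workaround described above can be sketched as a small Python helper that builds the query text. The function names are hypothetical, the isIRI() guard is an optional extra rather than part of the suggestion, and the escaping covers only the common cases for a double-quoted SPARQL string.

```python
def sparql_string_literal(value: str) -> str:
    """Quote value as a SPARQL double-quoted string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def query_by_string_form(uri: str) -> str:
    """Build a query that finds a subject by its string form.

    The URI never appears between '<' and '>', so characters such as the
    backtick do not have to satisfy the IRI syntax of the query language.
    """
    return ("SELECT ?s ?p ?o WHERE { ?s ?p ?o . "
            "FILTER (isIRI(?s) && str(?s) = " + sparql_string_literal(uri) + ") }")


if __name__ == "__main__":
    print(query_by_string_form("http://example.com/mylamefoafdocument`uri"))
```

As noted in the message, a store that does not optimise this pattern will scan every triple, so it is a workaround rather than a replacement for a direct match.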
Oh yes, s/URIRef/IRI/ everywhere possible. For reference, [1] provides the rationale for the original decision not to do this substitution.
Damian
[1] <http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0031.html>
Sure, I too much prefer turtle to rdf/xml, way easier ... But I guess I am a slave to the tools I use, and as a result n3 isn't an option for me.
I understand that I can query the RDF using sparql and I can bind it to a variable to get the URI returned. I stumbled upon this because I was doing a "insert data {graph { ,
Thanks for the link, being an undergrad at that point in time, I didn't know what RDF was. I am guessing the key part of that email you linked is the bit which states:
> RESOLVED (prop bwm, second gk, 0 agin, jjc abst)
> We continue to use the term "RDF URI reference" [although
> we note that
> the definition currently aligns with that of an absolute IRI ref.]
> ...
Which I don't think is true at the moment, but I may be wrong.
Thanks for the link,
Mischa
On Thu, Jul 29, 2010 at 12:38 PM, Mischa Tuffield
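On Thu, Jul 29, 2010 at 1:13 PM, Paul Gearon wrote:
> In a hack similar to the one I mentioned with FILTER, but you can
> always say:
>
> insert { graph<http://example.com/graph> {
> ?u foo:Property "something" } }
> { { select IRI("http://example.com/mylamefoafdocument`uri") as ?u {} } }
>
> But then I realized that this uses a non-standard constructor for
> IRIs! I should raise this as a possible function for SPARQL 1.1.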
I just realized that this *is* valid SPARQL 1.1. The documentation for
IRI() isn't defined everywhere yet (it has its own section, but
doesn't yet appear in the tables).
BTW, I'm not saying that this is the solution. (All those curly braces
give me the shivers). But it is *a* solution. :-)
Regards,
Paul Gearon
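On 29/07/2010 6:22 PM, Paul Gearon wrote:
> { { select IRI("http://example.com/mylamefoafdocument`uri") as ?u {} } }
but it still isn't a legal IRI.
There are two levels here:
The syntax, that says:
IRI-REF ::= '<' ([^<>"{}|^`]-[#x00-#x20])* '>'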
but also the syntax rules in the URI RFC (now RFC 3986) including any
scheme-specific rules.
Last time, IIRC DAWG decided not to copy over the full grammar for IRIs,
but to put in a more general but smaller pattern.
For example, "[" "]" are only legal as delimiters for IPv6 addresses in
the authority part.
Andy
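The two levels can be illustrated with a rough Python sketch. The first function applies the small grammar pattern. The second is a deliberately crude stand-in for just one RFC 3986 rule, the one about square brackets, and is nowhere near a full RFC 3986 validator. Both function names are invented for this example.

```python
import re

# Level 1: the small pattern from the query grammar.
_SPARQL_IRI_BODY = re.compile(r'[^<>"{}|^`\x00-\x20]*')


def passes_grammar_level(iri: str) -> bool:
    return _SPARQL_IRI_BODY.fullmatch(iri) is not None


def brackets_only_in_authority(iri: str) -> bool:
    """Level 2 (one rule only): '[' and ']' may appear only in the authority part."""
    scheme_sep = iri.find("://")
    if scheme_sep == -1:
        # No authority component at all, so no brackets are allowed anywhere.
        return "[" not in iri and "]" not in iri
    rest = iri[scheme_sep + 3:]
    # The authority ends at the first '/', '?' or '#'.
    end = len(rest)
    for sep in "/?#":
        pos = rest.find(sep)
        if pos != -1:
            end = min(end, pos)
    scheme, after_authority = iri[:scheme_sep], rest[end:]
    return not any(c in "[]" for c in scheme + after_authority)


if __name__ == "__main__":
    for iri in ("http://[2001:db8::1]/path", "http://example.com/a[1]"):
        print(iri, passes_grammar_level(iri), brackets_only_in_authority(iri))
```

Here "http://example.com/a[1]" passes the grammar pattern but fails the bracket rule, which is the kind of gap the smaller pattern leaves open.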
This is dependent on the definition of the function "IRI()", and it seems odd to me, based on the function name, that it would generate an illegal IRI.
:)
Am guessing I should have just emailed the SPARQL WG people, but I thought I would also flag this matter with the RDF 2.0 folks.
Cheers,
Mischa
There is a function in "XQuery 1.0 and XPath 2.0 Functions and
Operators" that might help:
fn:encode-for-uri(string) -> string
http://www.w3.org/TR/xpath-functions/#func-encode-for-uri
although it will encode "/" and ":" as well, as it's intended to produce
a string that can be used in a URI path segment, not to make a string safe
to use as an IRI.
The SPARQL-WG is discussing defining a core set of functions to be part
of spec (so any implementation can be expected to provide them): e.g.
http://www.w3.org/2009/sparql/wiki/Feature:FunctionLibrary#XQuery-1.0-and-XPath-2.0-Functions-and-Operators
(disclosure: I produced that list - by simply looking through the F&O
functions that covered datatypes required by SPARQL and made sense to
SPARQL)
Either adding a <make-safe-for-absolute-iri>(string)->string or making IRI do minimal encoding necessary are possible.
It's an RDF-maintenance question. (not 2.0 please!)
IMO SPARQL should follow, not lead, the decision for the data. The
original IRI decision in DAWG was made taking forward the reasoning that
the RDF-Core group had used, just made up-to-date because by then the
updated URI and the IRI RFCs had come out.
Andy
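The difference between encode-for-uri style escaping and "minimal encoding" can be seen in a short Python sketch. Here quote(text, safe="") is used as an approximation of fn:encode-for-uri, and minimally_encode is a hypothetical helper that escapes only the characters the IRI_REF character class excludes. Neither function is defined by SPARQL.

```python
from urllib.parse import quote


def encode_for_uri_like(text: str) -> str:
    """Approximate fn:encode-for-uri: escape everything except unreserved characters."""
    return quote(text, safe="")


# Characters assumed to be excluded by the IRI_REF character class:
# < > " { } | ^ ` and U+0000..U+0020 (all of them are ASCII).
_SPARQL_EXCLUDED = set('<>"{}|^`') | {chr(c) for c in range(0x21)}


def minimally_encode(text: str) -> str:
    """Escape only the excluded characters and leave the rest of the string alone."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in _SPARQL_EXCLUDED else ch
        for ch in text
    )


if __name__ == "__main__":
    uri = "http://example.com/my`uri"
    print(encode_for_uri_like(uri))  # also escapes ':' and '/'
    print(minimally_encode(uri))     # only the backtick changes
```

The first result is only usable as a path segment, while the second stays recognisable as the original address.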
On 29 Jul 2010, at 19:58, Andy Seaborne wrote:
> Either adding a <make-safe-for-absolute-iri>(string)->string or making IRI do minimal encoding necessary are possible.
Makes sense Andy, for now I am just going to have to put more smarts at the application level. If I do all of my data import via sparql update I should be able to ensure that all the data in the triplestore abides by the sparql spec, which is seemingly more tightly spec'ed.
> It's an RDF-maintenance question. (not 2.0 please!)
Um, wasn't meant to be controversial there, please ignore my naïvety. But from a developer's POV, the thought of being able to import data which can't always be sparql'ed seems odd to me.
> IMO SPARQL should follow, not lead, the decision for the data. The original IRI decision in DAWG was made taking forward the reasoning that the RDF-Core group had used, just made up-to-date because by then the updated URI and the IRI RFCs had come out.
As someone not involved in either the RDF or the SPARQL work, point taken.
Thanks for the pointers,
Mischa
Hello Paul,
I intended to write that any query processor should be able to optimize
{ ?s ?p ?o .
FILTER (isIRI(?o) && str(?o) = """fake://`backquoted`""") }
into the internal equivalent of
?s ?p <fake://`backquoted`>
but I suddenly found that my own optimizer misses this rewriting in
some cases. Oops! I overlooked this because our clients tend to write
{ ?s ?p ?o .
FILTER (isIRI(?o) && ?o = iri("""fake://`backquoted`""")) }
which is optimized correctly.
So thank you for a reason to fix this and for an extra reason for an iri()
built-in in SPARQL 1.1.
Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
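As a toy illustration of the relationship between the two FILTER forms above, the Python sketch below rewrites the str() comparison into the iri() comparison with a regular expression. It works on query text only, handles just this one textual shape, and says nothing about how Virtuoso or any other engine performs the optimisation internally. The function name is made up.

```python
import re

# Matches: isIRI(?v) && str(?v) = <string literal>, with the same variable twice.
# The literal may be triple-quoted or a plain double-quoted string.
_STR_EQUALITY = re.compile(
    r'isIRI\(\s*(\?\w+)\s*\)\s*&&\s*str\(\s*\1\s*\)\s*=\s*'
    r'(""".*?"""|"(?:[^"\\]|\\.)*")',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_str_equality(filter_text: str) -> str:
    """Turn 'str(?v) = "x"' guarded by isIRI(?v) into '?v = iri("x")'."""
    return _STR_EQUALITY.sub(r'isIRI(\1) && \1 = iri(\2)', filter_text)


if __name__ == "__main__":
    before = 'FILTER (isIRI(?o) && str(?o) = """fake://`backquoted`""")'
    print(rewrite_str_equality(before))
```

A real optimiser would do this on the parsed algebra rather than on text, so this should be read only as a picture of the rewrite.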
Late to this party, I have very little sympathy with Mischa's issue.
First I would draw attention to the small print in RDF Concepts ...
[[
*Note:* this section anticipates an RFC on Internationalized Resource
Identifiers. Implementations may issue warnings concerning the use of
RDF URI References that do not conform with [IRI draft] or its successors.
]]
we knew there may be changes - like the space issue, and this small
print was intended to (somewhat naughtily) include changes made
elsewhere in the future in the 2004 document.
If I have understood Mischa correctly, the problem is that it is
possible to enter illegal IRIs into a triple store in some fashion (e.g.
turtle), and then stuff doesn't work. Surprise, surprise: garbage in,
garbage out.
Solution: use a triple store that validates its input and rejects
garbage; tackle the problem at source.
If the turtle spec permits illegal IRIs then that is a bug with the spec.
If a turtle implementation allows illegal IRIs then that may be a
feature, but one that needs to be used with care.
Dogmatically
Jeremy
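The "validate at the source" approach could look roughly like the Python sketch below. It is only an outline: the triples are plain tuples of IRI strings, the check is the loose SPARQL character class rather than full validation against the IRI spec, and load_triples, check_iri and InvalidIRIError are hypothetical names. Strict mode rejects bad input, while lenient mode only warns, in the spirit of the "may issue warnings" note quoted above.

```python
import re
import warnings

# Loose stand-in for IRI validation: the SPARQL IRI_REF character class.
_IRI_OK = re.compile(r'[^<>"{}|^`\x00-\x20]*')


class InvalidIRIError(ValueError):
    """Raised in strict mode when a term is not acceptable as an IRI."""


def check_iri(iri: str, strict: bool = True) -> bool:
    if _IRI_OK.fullmatch(iri):
        return True
    if strict:
        raise InvalidIRIError("not accepted as an IRI: %r" % iri)
    warnings.warn("term does not conform to the IRI syntax: %r" % iri)
    return False


def load_triples(triples, strict=True):
    """Return the triples as a list, checking every term on the way in."""
    store = []
    for triple in triples:
        for term in triple:
            check_iri(term, strict)  # raises in strict mode, warns otherwise
        store.append(triple)
    return store
```

With strict=True the backtick example from the start of the thread would be refused at load time instead of becoming unqueryable data later.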
> we knew there may be changes - like the space issue, and this small
> print was intended to (somewhat naughtily) include changes made
> elsewhere in the future in the 2004 document.
I must have missed the subtle message by not reading in between the lines there.
> If I have understood Mischa correctly, the problem is that it is
> possible to enter illegal IRIs into a triple store in some fashion
> (e.g. turtle), and then stuff doesn't work. Surprise, surprise:
> garbage in, garbage out.
I think blaming turtle is harsh. As far as my reading of the spec goes, I can make use of URIs (for example with a ` inside) in a valid RDF/XML document (as well as in turtle), which I can then import into a triplestore. If I tried to import the same triples into my triplestore using an "INSERT DATA" sparql update call, I would get an error back. I take it this is due to the fact that SPARQL and RDF/XML (as this is the only RDF rec I am familiar with par - as I am not that well versed in RDFa) have different notions of what their URIs can be.
> Solution: use a triple store that validates its input and rejects
> garbage; tackle the problem at source.
The triplestore I use correctly validates legal RDF (as per the turtle and RDF/XML specs, by allowing URIRefs), and also correctly validates SPARQL as per spec (by only allowing IRIs).
> If the turtle spec permits illegal IRIs then that is a bug with the
> spec.
Well, this logic makes me think that there is a bug in the RDF/XML spec too then?
> If a turtle implementation allows illegal IRIs then that may be a
> feature, but one that needs to be used with care.
And yes agreed, my whole point is that given the current use of IRIs and URIRefs in the SPARQL and RDF/XML specs respectively, one needs to be very careful when developing software.
Mischa
Yes - as I said naughty of us.
The IRI people hadn't finished their work, and we were not going to wait
for them, but logically IRI is foundational and RDF is the next layer up.
There is a long tradition, which I do not like, of not validating URLs
but doing the best one can. When, as in SemWeb, the IRIs are the key
identifiers, validating them as much as possible is my comfort zone. I
believe I am in a minority position here.
The RDF/XML spec is clearly in error in that it depends on this half-way
house concept RDF URI Reference.
The turtle draft
http://www.w3.org/TeamSubmission/turtle/#relativeURI
in my view needs polishing in this area. It normatively refers to URI
and IRI specs, but doesn't make use of them in the text, except for a
reference to the base URI mechanism of the URI spec.
The grammar given for URIs (ucharacter*) is way too liberal, probably
resulting in impossible-to-resolve issues with ill-formed relative URIs.
My view is that a triple store should validate IRIs against IRI spec.
Future looking advice - conform with IRI spec.
Late to this party, I
have very little sympathy with Mischa's issue.
First I would draw attention to the small print in RDF
Concepts ...
[[
Note: this section anticipates an RFC on
Internationalized Resource Identifiers. Implementations
may issue warnings concerning the use of RDF URI
References that do not conform with [IRI
draft] or its successors.
]]
we knew there may be changes - like the space issue, and
this small print was intended to (somewhat naughtily)
include changes made elsewhere in the future in the 2004
document.
I must have missed the subtle message by not reading in
between the lines there.
Yes - as I said naughty of us.
The IRI people hadn't finished their work, and we were not going to
wait for them, but logically IRI is foundational and RDF is the next
layer up.
If I have understood Mischa correctly, the problem is that
it is possible to enter illegal IRIs into a triple store
in some fashion (e.g. turtle), and then stuff doesn't work.
Surprise, surprise: garbage in, garbage out.
I think blaming turtle is harsh; as far as my reading of
the spec goes, I can make use of URIs (for example, with
a ` inside) in a valid RDF/XML document (as well as in
turtle),
There is a long tradition, which I do not like, of not validating
URLs but doing the best one can. When, as in SemWeb, the IRIs are
the key identifiers, validating them as much as possible is my
comfort zone. I believe I am in a minority position here.
The RDF/XML spec is clearly in error in that it depends on this
half-way house concept RDF URI Reference.
The turtle draft
http://www.w3.org/TeamSubmission/turtle/#relativeURI
in my view needs polishing in this area. It normatively refers to
URI and IRI specs, but doesn't make use of them in the text, except
for a reference to the base URI mechanism of the URI spec.
The grammar given for URIs (ucharacter*) is way too liberal,
resulting in probably impossible-to-resolve issues with ill-formed
relative URIs.
which I can then import into a triplestore. If I tried to
import the same triples into my triplestore using an "INSERT
DATA" SPARQL update call, I would get an error back. I take
it this is due to the fact that SPARQL and RDF/XML (the
only RDF rec I am familiar with, as I am not that
well versed in RDFa) have different notions of what their
URIs can be.
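One possible workaround for this mismatch, sketched here in Python purely as an assumption and not as anything proposed in the thread, is to percent-encode the characters SPARQL's IRIREF production rejects before the data is loaded. The function name and example address are hypothetical. Note that RDF compares URI references character by character, so the escaped form is a different identifier from the original and must be used consistently everywhere.

```python
import re

# Characters excluded by SPARQL's IRIREF production:
#   < > " { } | ^ ` \  and U+0000..U+0020. All are ASCII, so each one
# percent-encodes to a single %XX triplet.
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def escape_for_sparql(iri):
    """Percent-encode every character that SPARQL would reject in an IRI."""
    return _FORBIDDEN.sub(lambda m: "%{:02X}".format(ord(m.group())), iri)


if __name__ == "__main__":
    # Hypothetical address containing backticks, as in the case discussed.
    print(escape_for_sparql("http://example.org/a`b`c"))
```

Strings that contain none of the excluded characters pass through unchanged, so the function can be applied to every incoming address.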
Solution: use a
triple store that validates its input and rejects garbage;
tackle the problem at source.
The triplestore I use correctly validates legal RDF (as
per the turtle and RDF/XML specs, by allowing URIRefs), and also
correctly validates SPARQL as per spec (by only allowing
IRIs).
My view is that a triple store should validate IRIs against IRI
spec.
If the turtle spec
permits illegal IRIs then that is a bug with the spec.
Well, this logic makes me think that there is a bug in
the RDF/XML spec too then?
If a turtle
implementation allows illegal IRIs then that may be a
feature, but one that needs to be used with care.
And yes, agreed: my whole point is that given the
current use of IRIs and URIRefs in the SPARQL and RDF/XML
specs respectively, one needs to be very careful when
developing software.
Future looking advice - conform with IRI spec.
Dogmatically
Jeremy
Mischa