svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse

Author: brane
Date: Fri Nov 9 11:54:21 2018
New Revision: 1846237
URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
More tests for peg revision parsing.

[...]
Yes indeed ... our target parsing is so incredibly reliable that
all its contents. I think this may be too much of a good thing.

The documented escaping rule for CLI consumers is to append an "@" after
every filename. By this rule, a CLI consumer that _wants_ to remove E/
would run "svn rm E/@". That command should parse as { target = 'E/',
peg = '' } regardless of whether the directory "E" contains a child
named "@".

I don't understand what change you're proposing.

Branko Čibej

2018-11-10 14:12:48 UTC

Author: brane
Date: Fri Nov 9 11:54:21 2018
New Revision: 1846237
URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
More tests for peg revision parsing.

[...]
Yes indeed ... our target parsing is so incredibly reliable that
all its contents. I think this may be too much of a good thing.

every filename. By this rule, a CLI consumer that _wants_ to remove E/
peg = '' } regardless of whether the directory "E" contains a child
I don't understand what change you're proposing.

Consider this:

$ svn rm E@
D E
$ svn rm @
svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
$ svn rm E/@
D E

would it not be more consistent that trying to remove 'E/@' would raise
the same error as trying to remove '@'? Because otherwise the behaviour
of the client depends on whether you're removing things from the current
directory or not. This becomes especially error-prone in the case when
E/@ exists:

$ svn rm E/@
D E
D E/@

I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

These usages were not ambiguous and possibly error-prone until we
invented the peg-revision syntax. At the time we failed to strictly
define the semantics , so maybe it's time we did that now. For example,
given the path

***@bar/baz

the parser will treat 'bar/baz' as the peg-revision specifier, even
though it's obvious that's not what the user had in mind. If we changed
the parser to only look for peg revisions on the basename of the path,
then at least the only restriction we'll have is that directory
separators can't be part of the peg revision specifier — which should be
an acceptable restriction, since currently we only support known
keywords, revision numbers or dates, none of which contain forward- or
backslashes.

Obviously that would only be a parser change, peg revisions would still
apply to whole paths for command semantics.

I'm aware that doing this would change the meaning of some forms of
commands, but for now I'm fairly sure we'd only produce new errors, not
silently behave differently. From the current set of tests, it seems to
follow that this inconsistency only arises with targets whose basename
starts with '@' or '.@' (and probably '..@'), the latter because '.'
(and '..') are removed by canonicalization.

-- Brane

Daniel Shahaf

2018-11-11 06:29:30 UTC

Author: brane
Date: Fri Nov 9 11:54:21 2018
New Revision: 1846237
URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
More tests for peg revision parsing.

[...]
Yes indeed ... our target parsing is so incredibly reliable that
all its contents. I think this may be too much of a good thing.

every filename. By this rule, a CLI consumer that _wants_ to remove E/
peg = '' } regardless of whether the directory "E" contains a child
I don't understand what change you're proposing.

D E
D E

The argument parsing is documented to work as follows: remove the last
"@" and everything after it — that's the peg rev. What remains is the
target dirent or URL. Thus, ultimately the reason for the different
errors is that "E/" is a valid OS-level path specification but "" is not
(in the sense that «stat("E/")» works and «stat("")» errors).

In other words, your examples are simply usage errors, but it so happens
that one of them is a syntax error and one of them is not. (It would
have been good if both usage errors had been syntax errors, but we're
12 years too late to get that right to begin with. C'est la vie.)

The correct syntaxes for removing "./@" and "E/@", of course, are
«svn rm E/@@» and «svn rm @@» respectively.

I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:

1. Because that would be backwards incompatible.

2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

IIRC «svn up @» used to be the same as «svn up ./», so I think we
shouldn't make it mean "update the file './@'" because that would be
a silent incompatibility for some users. I don't remember what the last
minor line with the non-error meaning was, though.

Post by Branko Äibej
Because otherwise the behaviour of the client depends on whether you're
removing things from the current directory or not.

To be more precise, the question is whether one spells the command

Post by Branko Äibej
This becomes especially error-prone in the case when
D E
I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

I see that there's an argument that we should require some option here,
à la --force-log, but frankly, I don't think files literally called "@"
or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
about.

Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

Post by Branko Äibej
These usages were not ambiguous and possibly error-prone until we
invented the peg-revision syntax. At the time we failed to strictly
define the semantics , so maybe it's time we did that now. For example,
given the path
the parser will treat 'bar/baz' as the peg-revision specifier, even
though it's obvious that's not what the user had in mind.
If we changed the parser to only look for peg revisions on the basename of
the path, then at least the only restriction we'll have is that directory
separators can't be part of the peg revision specifier — which should be an
acceptable restriction, since currently we only support known keywords,
revision numbers or dates, none of which contain forward- or backslashes.
Obviously that would only be a parser change, peg revisions would still
apply to whole paths for command semantics.

So you're proposing changing the parsing rule from "the last @" to
"the last @ that isn't followed by any os.path.sep"? That's forward
compatible if we constrain ourselves never to use slash or backslash in
a pegrev specifier, and backwards compatible in that every non-erroneous
usage will continue to work, so it's something we can consider.

Post by Branko Äibej
I'm aware that doing this would change the meaning of some forms of
commands, but for now I'm fairly sure we'd only produce new errors, not
silently behave differently. From the current set of tests, it seems to
follow that this inconsistency only arises with targets whose basename
(and '..') are removed by canonicalization.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else? (The former wouldn't seem to
address all the concerns you described.)

Cheers,

Daniel

P.S. Other languages also have examples of usage errors that aren't
syntax errors. For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.

Branko Čibej

2018-11-11 06:47:18 UTC

Author: brane
Date: Fri Nov 9 11:54:21 2018
New Revision: 1846237
URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
More tests for peg revision parsing.

[...]
Yes indeed ... our target parsing is so incredibly reliable that
all its contents. I think this may be too much of a good thing.

every filename. By this rule, a CLI consumer that _wants_ to remove E/
peg = '' } regardless of whether the directory "E" contains a child
I don't understand what change you're proposing.

D E
D E

The argument parsing is documented to work as follows: remove the last
target dirent or URL. Thus, ultimately the reason for the different
errors is that "E/" is a valid OS-level path specification but "" is not
(in the sense that «stat("E/")» works and «stat("")» errors).
In other words, your examples are simply usage errors, but it so happens
that one of them is a syntax error and one of them is not. (It would
have been good if both usage errors had been syntax errors, but we're
12 years too late to get that right to begin with. C'est la vie.)

Life can also be fixed to suck less.
I didn't say it should. I said it should error out just like 'svn rm @'
does.

Post by Daniel Shahaf
1. Because that would be backwards incompatible.

Obviouisly.

Post by Daniel Shahaf
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

I do think that this rule was not sufficiently thought out.

Post by Daniel Shahaf
a silent incompatibility for some users.

Again, I didn't say it should. I said it should be an error because

Post by Daniel Shahaf
I don't remember what the last
minor line with the non-error meaning was, though.

Post by Branko Äibej
Because otherwise the behaviour of the client depends on whether you're
removing things from the current directory or not.

To be more precise, the question is whether one spells the command

or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature
canonicalization is a bad thing.

Post by Branko Äibej
This becomes especially error-prone in the case when
D E
I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

I see that there's an argument that we should require some option here,
about.

Perhaps not, but then there's SVN-4530, so I'd say they're common enough.

Post by Daniel Shahaf
Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

Yes, it's called 'svn rm E'.

compatible if we constrain ourselves never to use slash or backslash in
a pegrev specifier, and backwards compatible in that every non-erroneous
usage will continue to work, so it's something we can consider.

I haven't tested that yet.
This won't work currently, but you're right, it _could_ work with the
proposed change. So my assumption that we'd only add more errors is wrong.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else? (The former wouldn't seem to
address all the concerns you described.)
Cheers,
Daniel
P.S. Other languages also have examples of usage errors that aren't
syntax errors. For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.

I really don't see how this comparison is valid. C operator syntax and
precedence is precisely defined. Our usage of @ in paths may be
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

-- Brane

Branko Čibej

2018-11-11 06:52:55 UTC

Post by Branko Äibej
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

A really educational example of this is svn_cl__copy. It goes out of its
way to ignore a specific kind of peg-revision parsing error. I'm having
a really hard time trying to convince myself that can be the right thing
to do. If one particular command has to try to fix parsing errors in
order to work correctly, then I'd say that the parser is just wrong.

-- Brane

Daniel Shahaf

2018-11-11 09:54:37 UTC

Post by Branko Äibej
does.

It wasn't clear to me what you were proposing.

It still isn't, actually. Have I overlooked a commit or email where you
spelled out what the new algorithm would be? None of the emails in this
thread spells out an algorithm.

Post by Daniel Shahaf
Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

Yes, it's called 'svn rm E'.

That won't work in the general case:

% /bin/mkdir ***@bar ***@bar@ ***@bar/@
% svn add .
% svn ci
% <how do I 'svn rm' ***@bar now?>

Secondly, the incumbent escaping syntax is independent of the subcommand,
filename, and tree contents: to run «svn verb» on the file called
${foo} on disk, one runs «svn verb -- "${foo}@"», no exceptions.
I think this property should be preserved by the new escaping rules.
(Not necessarily this specific syntax, but the
independence/no-exceptions aspect.)

Post by Daniel Shahaf
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

I do think that this rule was not sufficiently thought out.

Post by Daniel Shahaf
a silent incompatibility for some users.

Again, I didn't say it should. I said it should be an error because

As a point of fact, it is not ambiguous. "./@" means {path_or_url="./",
peg=""}.

Terminology aside, the problem you have identified here is that if
a user has a file called "@" and forgets to escape it, he doesn't get
a syntax error. That we can address.

Post by Daniel Shahaf
I don't remember what the last
minor line with the non-error meaning was, though.

Post by Branko Äibej
Because otherwise the behaviour of the client depends on whether you're
removing things from the current directory or not.

To be more precise, the question is whether one spells the command

canonicalization is a bad thing.

What change, if any, do you propose?

Post by Branko Äibej
This becomes especially error-prone in the case when
D E
I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

I see that there's an argument that we should require some option here,
about.

Perhaps not, but then there's SVN-4530, so I'd say they're common enough.

Fair enough.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else? (The former wouldn't seem to
address all the concerns you described.)
Cheers,
Daniel
P.S. Other languages also have examples of usage errors that aren't
syntax errors. For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.

I really don't see how this comparison is valid. C operator syntax and
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

The relevance of the example is that it's another case in which a usage
error (programmer forgot to paste «a*a» after the opening parenthesis)
is not a syntax error, in relation to your observation that using «svn
rm E/@» to remove "E/@" isn't a syntax error.

Branko Čibej

2018-11-11 11:02:14 UTC

Post by Branko Äibej
does.

It wasn't clear to me what you were proposing.
It still isn't, actually. Have I overlooked a commit or email where you
spelled out what the new algorithm would be? None of the emails in this
thread spells out an algorithm.

That's because I don't know it yet ... since I can't describe precisely
what is wrong with the way we currently do things, but I do have the
feeling that we're doing something wrong. I was hoping this discussion
would clarify things.

Post by Daniel Shahaf
Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

Yes, it's called 'svn rm E'.

% svn add .
% svn ci
Secondly, the incumbent escaping syntax is independent of the subcommand,
filename, and tree contents: to run «svn verb» on the file called
I think this property should be preserved by the new escaping rules.
(Not necessarily this specific syntax, but the
independence/no-exceptions aspect.)

Agreed.

Given your example above, this is what works now:

* to remove ***@bar: 'svn rm ***@bar@' or 'svn rm ***@bar/@'
* to remove ***@bar@: 'svn rm ***@bar@@' or 'svn rm ***@bar@/@'
* to remove ***@bar/@: 'svn rm ***@bar/@@' or 'svn rm ***@bar/@/@'

I /think/ I'm objecting to the fact that those extra directory
separators in the second variants have no effect ... but I'm not yet
sure what to do about it.

Post by Daniel Shahaf
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

I do think that this rule was not sufficiently thought out.

Post by Daniel Shahaf
a silent incompatibility for some users.

Again, I didn't say it should. I said it should be an error because

peg=""}.
Terminology aside, the problem you have identified here is that if
a syntax error. That we can address.

Post by Daniel Shahaf
I don't remember what the last
minor line with the non-error meaning was, though.

Post by Branko Äibej
Because otherwise the behaviour of the client depends on whether you're
removing things from the current directory or not.

To be more precise, the question is whether one spells the command

canonicalization is a bad thing.

What change, if any, do you propose?

As I said, I'm still trying to work this out. For example, one of the
things that's been driving me up the wall is that when the user writes
'foo/***@bar', the error message says a peg revision isn't allowed at
'***@bar', regardless of whether 'foo/***@bar' exists. Yes, the syntax is
wrong, the user should have typed 'foo/***@bar@' instead, but surely we
can be smart enough to notice that instead of emitting an error about
something the user almost certainly didn't have in mind?

Post by Branko Äibej
This becomes especially error-prone in the case when
D E
I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

I see that there's an argument that we should require some option here,
about.

Perhaps not, but then there's SVN-4530, so I'd say they're common enough.

Fair enough.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else? (The former wouldn't seem to
address all the concerns you described.)
Cheers,
Daniel
P.S. Other languages also have examples of usage errors that aren't
syntax errors. For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.

I really don't see how this comparison is valid. C operator syntax and
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

Ack.

-- Brane

Daniel Shahaf

2018-11-11 11:52:26 UTC

Post by Daniel Shahaf
It wasn't clear to me what you were proposing.
It still isn't, actually. Have I overlooked a commit or email where you
spelled out what the new algorithm would be? None of the emails in this
thread spells out an algorithm.

Makes sense.