Discussion:
Failed fs-test 65 (Was: Re: 1.9.5 up for signing/testing)
(too old to reply)
Evgeny Kotkov
2016-11-23 12:31:30 UTC
Permalink
[[[
svn_tests: E200006: Expected error SVN_ERR_SQLITE_BUSY but got
SVN_ERR_SQLITE_ERROR
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200044: SQLite transaction rollback failed
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
FAIL: fs-test 65: test commit with locked rep-cache
]]]
Hi Branko,

Is this failure reproducible? Does it happen if you run just fs-test#65?

I tried to witness the failure on my Windows and Linux machines with
SQLite 3.8.10.2, but the test passes for me (the relevant sqlite-test#2
also works fine).


Regards,
Evgeny Kotkov
Branko Čibej
2016-11-23 17:37:24 UTC
Permalink
Post by Evgeny Kotkov
[[[
svn_tests: E200006: Expected error SVN_ERR_SQLITE_BUSY but got
SVN_ERR_SQLITE_ERROR
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200044: SQLite transaction rollback failed
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
FAIL: fs-test 65: test commit with locked rep-cache
]]]
Hi Branko,
Is this failure reproducible? Does it happen if you run just fs-test#65?
I tried to witness the failure on my Windows and Linux machines with
SQLite 3.8.10.2, but the test passes for me (the relevant sqlite-test#2
also works fine).
It is reproducible whether run as a single test, all of fs-test or the
whole test suite; but only with SQLite 3.8.10.2. I suspect it is a test
bug, but haven't followed up; could be due to a bug in SQLite itself,
e.g., that the older version returns the wrong error code.

The only potential problem as far as I can see is that 3.8.10.2 is the
version of SQLite shipped with OSX — hence, the version that most
binaries will be linking with.

-- Brane
Daniel Shahaf
2016-11-23 17:52:43 UTC
Permalink
Post by Branko Čibej
Post by Evgeny Kotkov
[[[
svn_tests: E200006: Expected error SVN_ERR_SQLITE_BUSY but got
SVN_ERR_SQLITE_ERROR
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200044: SQLite transaction rollback failed
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
FAIL: fs-test 65: test commit with locked rep-cache
]]]
Hi Branko,
Is this failure reproducible? Does it happen if you run just fs-test#65?
I tried to witness the failure on my Windows and Linux machines with
SQLite 3.8.10.2, but the test passes for me (the relevant sqlite-test#2
also works fine).
It is reproducible whether run as a single test, all of fs-test or the
whole test suite; but only with SQLite 3.8.10.2. I suspect it is a test
bug, but haven't followed up; could be due to a bug in SQLite itself,
e.g., that the older version returns the wrong error code.
The only potential problem as far as I can see is that 3.8.10.2 is the
version of SQLite shipped with OSX — hence, the version that most
binaries will be linking with.
Does the test pass if you patch it to expect SVN_ERR_SQLITE_ERROR? Once
the shared lock is released, do subsequent commits operate normally
(finish timely and update rep-cache.db)?

I.e., does this failure mode have any impact beyond the wrong
apr_status_t value being returned by svn_fs_commit_txn()?

Cheers,

Daniel
Branko Čibej
2016-11-23 18:08:33 UTC
Permalink
Post by Daniel Shahaf
Post by Branko Čibej
Post by Evgeny Kotkov
[[[
svn_tests: E200006: Expected error SVN_ERR_SQLITE_BUSY but got
SVN_ERR_SQLITE_ERROR
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200030: sqlite[S10]: disk I/O error
svn_tests: E200044: SQLite transaction rollback failed
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
svn_tests: E200030: sqlite[S1]: cannot rollback - no transaction is active
FAIL: fs-test 65: test commit with locked rep-cache
]]]
Hi Branko,
Is this failure reproducible? Does it happen if you run just fs-test#65?
I tried to witness the failure on my Windows and Linux machines with
SQLite 3.8.10.2, but the test passes for me (the relevant sqlite-test#2
also works fine).
It is reproducible whether run as a single test, all of fs-test or the
whole test suite; but only with SQLite 3.8.10.2. I suspect it is a test
bug, but haven't followed up; could be due to a bug in SQLite itself,
e.g., that the older version returns the wrong error code.
The only potential problem as far as I can see is that 3.8.10.2 is the
version of SQLite shipped with OSX — hence, the version that most
binaries will be linking with.
Does the test pass if you patch it to expect SVN_ERR_SQLITE_ERROR? Once
the shared lock is released, do subsequent commits operate normally
(finish timely and update rep-cache.db)?
I.e., does this failure mode have any impact beyond the wrong
apr_status_t value being returned by svn_fs_commit_txn()?
Yes, it passes if I change that one line in the test.

I just double-checked our rep-cache and SQLite wrapper code and I don't
see any way that SQLite could've returned SQLITE_BUSY and we'd return
SVN_ERR_SQLITE_ERROR instead.

-- Brane
Evgeny Kotkov
2016-11-23 23:05:12 UTC
Permalink
Post by Branko Čibej
It is reproducible whether run as a single test, all of fs-test or the
whole test suite; but only with SQLite 3.8.10.2. I suspect it is a test
bug, but haven't followed up; could be due to a bug in SQLite itself,
e.g., that the older version returns the wrong error code.
The only potential problem as far as I can see is that 3.8.10.2 is the
version of SQLite shipped with OSX — hence, the version that most
binaries will be linking with.
Well, it gets interesting.

I can reproduce the failure under OS X 10.11 with bundled SQLite 3.8.10.2.
However, this failure is specific to SQLite that is shipped with OS X.
Official versions (including 3.8.10.2 from [1]) return SQLITE_BUSY,
as promised in the documentation, and the test passes.

`dtrace` shows that the version shipped with OS X is heavily patched.
There's a different syscall pattern: guarded_open_np/guarded_pwrite_np/
guarded_close_np instead of open/write/close, pread instead of lseek+read,
different fcntl calls, etc. The sqlite3.h headers are also a bit different.

I would guess that SQLite version shipped with OS X has a custom VFS
implementation, which doesn't keep some of the promises of the official
version, or contains a bug that results in an I/O error instead of expected
SQLITE_BUSY error.

Could you please confirm this by running the test with the official 3.8.10.2
from [1]?

[1] https://sqlite.org/2015/sqlite-amalgamation-3081002.zip


Thanks,
Evgeny Kotkov
Branko Čibej
2016-11-24 09:15:27 UTC
Permalink
Post by Evgeny Kotkov
Post by Branko Čibej
It is reproducible whether run as a single test, all of fs-test or the
whole test suite; but only with SQLite 3.8.10.2. I suspect it is a test
bug, but haven't followed up; could be due to a bug in SQLite itself,
e.g., that the older version returns the wrong error code.
The only potential problem as far as I can see is that 3.8.10.2 is the
version of SQLite shipped with OSX — hence, the version that most
binaries will be linking with.
Well, it gets interesting.
I can reproduce the failure under OS X 10.11 with bundled SQLite 3.8.10.2.
However, this failure is specific to SQLite that is shipped with OS X.
Official versions (including 3.8.10.2 from [1]) return SQLITE_BUSY,
as promised in the documentation, and the test passes.
`dtrace` shows that the version shipped with OS X is heavily patched.
There's a different syscall pattern: guarded_open_np/guarded_pwrite_np/
guarded_close_np instead of open/write/close, pread instead of lseek+read,
different fcntl calls, etc. The sqlite3.h headers are also a bit different.
I would guess that SQLite version shipped with OS X has a custom VFS
implementation, which doesn't keep some of the promises of the official
version, or contains a bug that results in an I/O error instead of expected
SQLITE_BUSY error.
Could you please confirm this by running the test with the official 3.8.10.2
from [1]?
[1] https://sqlite.org/2015/sqlite-amalgamation-3081002.zip
Indeed, fs-test passes with the stock SQLite. Running all the tests now
just to make sure there's no quirk elsewhere.

To be quite candid, I'm not surprised ... this wouldn't be the first
time that Apple messed up its patches of perfectly good upstream
software. :(


The question is: what do we do about it? Complaining to Apple isn't
likely to help. We could add a special case in the testcase for that
version of SQLite on OSX, just to keep the test output in the green. But
... that seems like just a bit overdone.

-- Brane
Julian Foad
2016-11-24 09:34:46 UTC
Permalink
Post by Branko Čibej
To be quite candid, I'm not surprised ... this wouldn't be the first
time that Apple messed up its patches of perfectly good upstream
software. :(
The question is: what do we do about it? Complaining to Apple isn't
likely to help. We could add a special case in the testcase for that
version of SQLite on OSX, just to keep the test output in the green. But
... that seems like just a bit overdone.
Just a drive-by thought: Should we report this to SQLite? Especially if
by distilling the Subversion test we could write a SQLite self-test. I
recall the SQLite team is big on thorough regression testing and so
would likely want to know about this.

(They then might have some influence with the Apple folks, or maybe not,
but that's for them to deal with. It won't change our situation in the
short time with regard to this particular version.)

- Julian
Evgeny Kotkov
2016-11-24 10:36:06 UTC
Permalink
Just a drive-by thought: Should we report this to SQLite? Especially if by
distilling the Subversion test we could write a SQLite self-test. I recall
the SQLite team is big on thorough regression testing and so would likely
want to know about this.
(They then might have some influence with the Apple folks, or maybe not, but
that's for them to deal with. It won't change our situation in the short
time with regard to this particular version.)
I will take care of reporting this to SQLite authors.

This is quite an issue, as this bug caused by a custom patch makes SQLite
(a _derivative_ distributed under the name of SQLite, to be precise) unusable
in a typical reader-writer case. In the described case, the vanilla version
would have blocked in a retry loop allowing for the reader to finish its work
and notifying the API user via xBusyHandler(), but the patch causes it to
fail immediately, and this prevents concurrency.
The question is: what do we do about it? Complaining to Apple isn't
likely to help. We could add a special case in the testcase for that
version of SQLite on OSX, just to keep the test output in the green. But
... that seems like just a bit overdone.
To my mind, the test is of less concern, compared to the issue itself.

I would prefer to keep the test catching this bug for now, unless that
causes severe issues for maintainers. If that happens, we could add a
workaround.


Regards,
Evgeny Kotkov
Evgeny Kotkov
2016-11-24 11:06:24 UTC
Permalink
Post by Evgeny Kotkov
Post by Branko Čibej
The question is: what do we do about it? Complaining to Apple isn't
likely to help. We could add a special case in the testcase for that
version of SQLite on OSX, just to keep the test output in the green. But
... that seems like just a bit overdone.
To my mind, the test is of less concern, compared to the issue itself.
I would prefer to keep the test catching this bug for now, unless that
causes severe issues for maintainers. If that happens, we could add a
workaround.
Actually, let me rephrase myself.

I think that we _should not_ change the test, because it does what it's
supposed to do (catch bugs). In this case, maintainers would be better
using official sqlite-amalgamation, until the bug in the customized SQLite
library shipped with OS X is fixed. The failing test is a proper indication
of this bug.


Regards,
Evgeny Kotkov
Branko Čibej
2016-11-24 12:23:46 UTC
Permalink
Post by Evgeny Kotkov
Post by Evgeny Kotkov
Post by Branko Čibej
The question is: what do we do about it? Complaining to Apple isn't
likely to help. We could add a special case in the testcase for that
version of SQLite on OSX, just to keep the test output in the green. But
... that seems like just a bit overdone.
To my mind, the test is of less concern, compared to the issue itself.
I would prefer to keep the test catching this bug for now, unless that
causes severe issues for maintainers. If that happens, we could add a
workaround.
Actually, let me rephrase myself.
I think that we _should not_ change the test, because it does what it's
supposed to do (catch bugs). In this case, maintainers would be better
using official sqlite-amalgamation, until the bug in the customized SQLite
library shipped with OS X is fixed. The failing test is a proper indication
of this bug.
Ack. It's going to be painful if I ever update the OSX build slave, but
you're right.

In the meantime, I've verified that, Homebrew's build does not use the
stock SQLite on the Mac but Homebrew's own version, which does not have
this problem. I don't know about Fink and other installers.

-- Brane

Continue reading on narkive:
Loading...