Tag Archives: cdot

SQLite: Can Using Unicode Collations be Improved? Attempt 2

In my quest to improve using Unicode collations in SQLite, I figured I needed to accomplish two things first. First, persist the locale for the DB between closing and opening it again. Second, change the default collation from BINARY (ASCII) to the a collator for the locale.

Here is the logic that I used for my implementation:
1. try to load the ICU collation for the given locale or use the system’s default locale
2. check if any tables other than the “locale table” exist. If they exist, exit
3. update the “locale table” with the name of the locale. (This allows reloading the collation when opening the DB)
4. using the ICU collation,  create an sqlite3 collation and make it the default collation for the DB

This is some of the source code for implementing the logic above:

int setDBDefaultLocale(sqlite3 * db, const char * const zLocale)
  int rc;
  sqlite3_stmt* stmt;
  UErrorCode status = U_ZERO_ERROR;
  UCollator *pUCollator= ucol_open(zLocale ? zLocale : uloc_getDefault(), &status);

    return SQLITE_ERROR;
  // check if any tables other than the "locale table" have been created.
  // if they exist, just quit, can not change default locale if a another table has already been created
  if ((rc = sqlite3_prepare_v2(db, "select * from sqlite_master where name != 'locale_table' and type = 'table'", 76, &stmt, 0)) != SQLITE_OK ||
      (rc = sqlite3_step(stmt)) != SQLITE_ROW)

      return rc;

  if (sqlite3_column_int(stmt, 0))

    return SQLITE_MISUSE;
  // creates a locale table if one doesn't exist and updates it with the value of zLocale
  updateLocaleTable(zLocale ? zLocale : uloc_getDefault());
  // create the collation
  if ((rc = sqlite3_create_collation_v2(db, "LOCALE", SQLITE_UTF16,
  (void *)pUCollator, icuCollationColl, icuCollationDel)) != SQLITE_OK)

    return rc;

  // set the default collating sequence to the locale collation
  db->pDfltColl = sqlite3FindCollSeq(db, SQLITE_UTF16, "LOCALE", 0);

  return db->pDfltColl ? SQLITE_OK : SQLITE_ERROR;

With the implementation above, I was able to accomplish the first task i.e. persist the locale between connections. But when I tried to change the locale, it would throw an unexpected exception. After much digging, I figured out that the call to sqlite3FindCollseq was returning a buggy collating sequence. There was a mismatch between the DB encoding and the collating sequence encoding. That was solved by querying the encoding of the db object and passing that into the function call:

  db->pDfltColl = sqlite3FindCollSeq(db, ENC(db), "LOCALE", 0);

But as I thought about it, because this implementation always creates a table, this breaks SQLite’s functionality that depends on having an empty database. (e.g. setting the database encoding, pragma encoding = ‘UTF-16’)

As such, I am now investigating other ways to persist the locale. I am thinking of actually appending the locale to the header of the database file. One of my colleagues suggested persisting the locale outside of the database, using either another DB instance or a file. I am now pursuing the file header option. So far I have found out that I may need to modify the Pager (pager.c) implementation, to add more space to store the locale and the Btree (btree.c) implementation, specifically the sqlite3BtreeOpen method to both read in the locale. From what I found out so far, seems like nothing has to change to persist the locale to the disk.

So now the fun begins to implement the above. Here goes…

Tagged , , , , , , ,

SQLite Adapter for NexJ Express – Overcoming the limitations of SQLite part 2

This is a continuation of my previous post: SQLite Adapter for NexJ Express – Overcoming the limitations of SQLite part 1.

I made mention of  the error codes SQLITE_LOCKED and SQLITE_BUSY the last time I blogged. These errors show up when new connections try to either read or write to an SQLite database instance that already has a connection with an active write operation.  Question 5 from the SQLite FAQ provides a quick description of the overall problem. The gist of it is that when interacting with an SQLite database using multiple concurrent connections, it only takes one writer connection to lock the database file and lock out all the rest. NexJ Express makes use of threading on the persistence layer and as such ran into this issue.

For several weeks we tried various solutions ranging from setting statement query time outs, creating an sqlite3_busy_handler to creating a singleton connection for all threads. The final solution was found after reading about transactions in chapter 5 of “The Definitive Guide to SQLite, Second Edition”. The chapter explains the transaction life cycle and the transition between the different lock states. Write operations cycle through the following states

Of note are PENDING states; PENDING locks are gateway locks.
Before a writer dumps to the disk, it needs to get an EXCLUSIVE lock. From RESERVED state when the writer gets PENDING(2), it affects the database in two ways:
1. It will hold onto the PENDING(2) lock no matter what
2. In PENDING(2) lock, no other connection can go from UNLOCKED to SHARED. This creates a deadlock  when a previous writer has committed a transaction and it tries to get a SHARED lock. The PENDING(2) lock prevents that previous writer from entering PENDING(1) hence a deadlock.

Because of the second effect noted above, all the previous trial solutions failed. The application would retry infinitely but not get an EXCLUSIVE lock. The book suggested the use of the BEGIN IMMEDIATE command instead of the BEGIN command to start a write transaction. This prevents the deadlock situation noted in 2 above. Here is an extract from The Definitive Guide to SQLite to give more detail:

Since you know you want to write to the database, then you need to start by issuing begin IMMEDIATE. If you get a SQLITE_BUSY, then at least you know what state you’re in. You know you can safely keep trying without holding up another connection. And once you finally do succeed, you know what state you are in then as well—RESERVED. Now you can use brute force if you have to because you are the one in the right. If you start with a begin exclusive, on the other hand, then you are assured that you won’t have any busy conditions to deal with at all. Just remember that in this case you are doing your work in EXCLUSIVE, which is not as good for concurrency as doing the work in RESERVED"

As such, the code for setAutoCommit in the Conn class was changed to use “begin immediate“. The application then needed to keep retrying if it received an SQLITE_LOCKED or SQLITE_BUSY errors. A busy handler was created that returns 1 (i.e. try again). This wasn’t enough since the application still received the SQLITE_LOCKED and SQLITE_BUSY exceptions even when trying to set auto commit to false. Seems it has to do with SQLite internal detecting a deadlock and just throwing an exception. Changes needed to be made to the org.sqlite.PrepStmt and org.sqlite.Conn classes. The methods: execute, executeBatch, executeQuery, executeUpdate, setAutoCommit, rollback and commit, were wrapped in try-catch blocks  to keep retrying if either an SQLITE_LOCKED or SQLITE_BUSY exception were thrown. The source code for this is in nexj.core.persistence.sql.SQLiteDataSourceWrapper.

With that solution implemented, the server now knows how to handle a locked database file. Using begin immediate has some side effects though and as such some unit tests in the JDBC driver fail.

The source code for the SQLite adapter is shared on bitbucket.org/cwdesautels/nexj-express-sqlite-adapter in the ‘sqlite’ branch. Check out the SQLite* classes in the package nexj.core.persistence.sql.
The source code to build a customized Xerial SQLite JDBC driver and SQLite shell can be found at bitbucket.org/gbatumbya/nexj-express-sqlite. (This repo use subrepos to track changes to gbatumbya/sqlite and gbatumbya/sqlite-jdbc)

Tagged , , , , , , ,