Tag Archives: SQLIte3

SQLite: Can Using Unicode Collations be Improved? Attempt 2

In my quest to improve using Unicode collations in SQLite, I figured I needed to accomplish two things first. First, persist the locale for the DB between closing and opening it again. Second, change the default collation from BINARY (ASCII) to the a collator for the locale.

Here is the logic that I used for my implementation:
1. try to load the ICU collation for the given locale or use the system’s default locale
2. check if any tables other than the “locale table” exist. If they exist, exit
3. update the “locale table” with the name of the locale. (This allows reloading the collation when opening the DB)
4. using the ICU collation,  create an sqlite3 collation and make it the default collation for the DB

This is some of the source code for implementing the logic above:

int setDBDefaultLocale(sqlite3 * db, const char * const zLocale)
{
  int rc;
  sqlite3_stmt* stmt;
  UErrorCode status = U_ZERO_ERROR;
  UCollator *pUCollator= ucol_open(zLocale ? zLocale : uloc_getDefault(), &status);

  if(!U_SUCCESS(status))
  {
    return SQLITE_ERROR;
  }
  // check if any tables other than the "locale table" have been created.
  // if they exist, just quit, can not change default locale if a another table has already been created
  if ((rc = sqlite3_prepare_v2(db, "select * from sqlite_master where name != 'locale_table' and type = 'table'", 76, &stmt, 0)) != SQLITE_OK ||
      (rc = sqlite3_step(stmt)) != SQLITE_ROW)
  {
     sqlite3_finalize(stmt);

      return rc;
  }

  if (sqlite3_column_int(stmt, 0))
  {
    sqlite3_finalize(stmt);
    localeErrMsg(vdbe);

    return SQLITE_MISUSE;
  }
  // creates a locale table if one doesn't exist and updates it with the value of zLocale
  updateLocaleTable(zLocale ? zLocale : uloc_getDefault());
  // create the collation
  if ((rc = sqlite3_create_collation_v2(db, "LOCALE", SQLITE_UTF16,
  (void *)pUCollator, icuCollationColl, icuCollationDel)) != SQLITE_OK)
  {
    ucol_close(pUCollator);

    return rc;
  }

  // set the default collating sequence to the locale collation
  db->pDfltColl = sqlite3FindCollSeq(db, SQLITE_UTF16, "LOCALE", 0);

  return db->pDfltColl ? SQLITE_OK : SQLITE_ERROR;
}

With the implementation above, I was able to accomplish the first task i.e. persist the locale between connections. But when I tried to change the locale, it would throw an unexpected exception. After much digging, I figured out that the call to sqlite3FindCollseq was returning a buggy collating sequence. There was a mismatch between the DB encoding and the collating sequence encoding. That was solved by querying the encoding of the db object and passing that into the function call:

...
  db->pDfltColl = sqlite3FindCollSeq(db, ENC(db), "LOCALE", 0);
...

But as I thought about it, because this implementation always creates a table, this breaks SQLite’s functionality that depends on having an empty database. (e.g. setting the database encoding, pragma encoding = ‘UTF-16’)

As such, I am now investigating other ways to persist the locale. I am thinking of actually appending the locale to the header of the database file. One of my colleagues suggested persisting the locale outside of the database, using either another DB instance or a file. I am now pursuing the file header option. So far I have found out that I may need to modify the Pager (pager.c) implementation, to add more space to store the locale and the Btree (btree.c) implementation, specifically the sqlite3BtreeOpen method to both read in the locale. From what I found out so far, seems like nothing has to change to persist the locale to the disk.

So now the fun begins to implement the above. Here goes…

Advertisements
Tagged , , , , , , ,

SQLite Adapter for NexJ Express – Overcoming the limitations of SQLite part 1

For the last two semesters I have been working on an SQLite persistence adapter for NexJ Model Server Express.  Today it is finally ready for the first code review.

Getting to this stage has taken much longer than I first anticipated. Turns out that SQLite’s “liteness” was turning into a blocker and at one point I even thought we would abandon the project altogether. I should have known this though. The SQLite website says to “Think of SQLite not as a replacement for Oracle but as a replacement for fopen()” and provides a list of situations where other RDBMS work better. In developing this adapter, we were actually pitting SQLite up against the likes of Oracle, PostgreSQL, MySQL, SQL server and other enterprise SQL servers.

First to be tackled was SQLite’s implementation of SQL92 or rather lack thereof; granted though, the documentation for supported features/syntax is very good. We needed to come up with work-arounds to do things that are taken for granted when developing for an enterprise level RDBMSs. For example it didn’t support altering a column’s definition or table constraint, dropping a column or constraint and there is no procedures. Most of the work-arounds required renaming the old table, creating the new table and copying data from the old to the new and then finally deleting the old table.
There was also the case where it was required to convert columns containing hex string values into BLOBs. SQLite provides the x'ABC123' syntax for this, but it is limited. It does not take columns as input. For this we developed an extension function binary(). It pretty much does what the x'' does only that it is exposed as a function call.
It was also required to provide a ranking of results for FTS queries. Again an extension function rank() was developed. The rank for a returned row is calculated as: number of  local hits / sum of global hits. The source of the extension functions is can be found here sqliteNexjExtension.c.

Next up was selecting a JDBC driver. SQLite does not provide an offical JDBC driver but their wiki provides a list of available drivers.  The decision was taken to go with Xerial SQLite JDBC Driver (a fork of  SQLiteJDBC).  We found this driver to be the most up-to-date  and its easy of use of native SQLite libraries was a big plus. The driver did not provide implementations of XADatasource and XAConnection. The team at NexJ Systems ended up providing us with implementations (nexj.core.persistence.sql.pseudoxa) that wraps DataSource and Connection and exposes them as XAResource‘s. (But just last week, I came across SQLiteConnectionPoolDataSource which I provides XA implementations for the SQLite JDBC driver. We may later on switch to using this.)
We found the driver to be buggy with setting: an encoding on a SQLiteDataSource and query time-out on a statement. Turns out that calling SQLiteDataSource.setEncoding("UTF8") causes an exception. The driver tries to execute the statement, pragma encoding = UTF-8. Note the hypen (-), SQLite expects that to be quoted. The javadoc on Statement.setQueryTimeout(int seconds) reads “Sets the number of seconds the driver will wait for a Statement object to execute to the given number of seconds“. Internal though, the driver calls sqlite3_busy_timeout which “sets a busy handler that sleeps for a specified amount of time when a table is locked”. This ends up having the effect that if a database is locked when first executed, the statement will sleep for the given number of seconds then try a second time instead of trying several times until the given number of seconds have past. The solution for this was to use a TimerTask (nexj.core.util.sql.StatementCancelationTask) that was already implemented.

Next week I will have a post about how the dreaded SQLITE_LOCKED and SQLITE_BUSY were overcome so that SQLite could be used in an JavaEE application server. The source code for the adapter is shared on bitbucket.org/cwdesautels/nexj-express-sqlite-adapter in the ‘sqlite’ branch. Check out the SQLite* classes in the package nexj.core.persistence.sql.

Tagged , , , ,