Issue Details (XML | Word | Printable)

Key: NUCDBFO-37
Type: Improvement Improvement
Status: Reopened Reopened
Priority: Major Major
Assignee: Unassigned
Reporter: Costantino Cerbo
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
DataNucleus Store DB4O

Make persisting an object that has a primary key (annotation @PrimaryKey) faster.

Created: 20/May/09 12:33 AM   Updated: 27/Jan/10 03:39 PM
Component/s: Persistence
Affects Version/s: 1.0.1
Fix Version/s: None

File Attachments: 1. Zip Archive id_is_exception_test.zip (2 kB)
2. Zip Archive newpatch.zip (3 kB)
3. Zip Archive patch.zip (17 kB)

Environment: linuy fedora 10, datanucleus-db40-1.1.0

Datastore: DB4O


 Description  « Hide
When the datastore is db4o, using a primary key (annotation @PrimaryKey) makes the performance of persisting an object very slow.
This happens because before persisting a retrieve is done, see logs below:

00:45:49,806 (main) DEBUG [DataNucleus.Connection] - Connection found in the pool : [org.datanucleus.store.db4o.ConnectionFactoryImpl$ManagedConnectionImpl@18c5e67, /home/c.cerbo/workspaces_eclipse/text-analysis/trunk/Repository/resource/eu/kostia/repository/initrepo/en.db]
00:45:49,806 (main) DEBUG [DataNucleus.Datastore.Retrieve] - Object "eu.kostia.repository.tokenizer.Abbreviation@181efb9" (id="2078978489099167307") being retrieved from DB4O
00:45:49,808 (main) DEBUG [DataNucleus.Datastore.Retrieve] - Execution Time = 2 ms
00:45:49,808 (main) DEBUG [DataNucleus.Connection] - Connection found in the pool : [org.datanucleus.store.db4o.ConnectionFactoryImpl$ManagedConnectionImpl@18c5e67, /home/c.cerbo/workspaces_eclipse/text-analysis/trunk/Repository/resource/eu/kostia/repository/initrepo/en.db]
00:45:49,808 (main) DEBUG [DataNucleus.Datastore.Persist] - Object "eu.kostia.repository.tokenizer.Abbreviation@181efb9" being inserted into DB4O with all reachable objects
00:45:49,809 (main) DEBUG [DataNucleus.Datastore.Persist] - Execution Time = 1 ms
00:45:49,809 (main) DEBUG [DataNucleus.Datastore] - Object "eu.kostia.repository.tokenizer.Abbreviation@181efb9" (id="2078978489099167307) persisted to DB4O

As the table size increases, retrieving takes of course always more time and persisting a very simple object may take even 100 ms!

Using an IdentityType DATASTORE and no Primary Key is the workaround to improve the performance (a lot! in my easy test case more than 300 times!)

This issues aims to reduce the performance gap between persisting object with and without primary key.

Sort Order: Ascending order - Click to sort in descending order
Andy Jefferson added a comment - 20/May/09 08:03 AM
DB4O does not enforce uniqueness of identity in the datastore (for application identity) so a check is essential.

Costantino Cerbo added a comment - 20/May/09 01:33 PM
That's right: there is also a big discussion about this theme in the db4o forum (http://developer.db4o.com/forums/thread/47309.aspx)

Anyway the check for uniqueness "manually" done by DataNucleus costs a lot of time, and I don't know why... maybe the Primary Key Field was not yet indexed.

Maybe there is potential to optimise this query: could you tell me where in the code this check happens?

Time allowing, I could investigate if it's possible to improve the performance and release a patch.

Costantino Cerbo added a comment - 20/May/09 01:42 PM
This is another interesting link discussing the behaviour of db4o:
http://tracker.db4o.com/browse/COR-161

Andy Jefferson added a comment - 20/May/09 01:49 PM
org.datanucleus.store.db4o.DB4OPersistenceHandler "locateObject", uses QBE

Costantino Cerbo added a comment - 20/May/09 02:04 PM
It seems that SODA queries are faster that QBE, see doc/tutorial/docs/Query.html#outline31 in the db4o tutorial (v. 7.9).
I'll try with SODA...

Another question: how do I figure witch attribute is set as PK (with annotation @PrimaryKey or XML configuration)?

Andy Jefferson added a comment - 20/May/09 02:36 PM
"cmd" has a getPKMemberPositions() and some other methods. A PK can have multiple attributes. "cmd" is the place to look for all such config info

Costantino Cerbo added a comment - 20/May/09 08:53 PM
Before trying to optimise, I've set my benchmark persisting 5000 simple object with only two fields: the id and a name.
With DATASTORE Identity it took an my system only 4467 ms, while with APPLICATION Identity the time was 25142 ms: almost 6 time more!

I've tryed to use a SODA query instead of a QBE, getting member names and values by using the methods on AbstractClassMetaData but no sensible improvement occurs. I'm further investigating other solutions...

Anyway substituting:
if (objectSet.size() > 0)
                {
                    isStored = true;
                }
with:
if (objectSet.hasNext() > 0)
                {
                    isStored = true;
                }
should be better, because the method size() may load the whole object tree, requiring more time.

Costantino Cerbo added a comment - 21/May/09 12:07 AM
No way to further optimise the query.
I'm now investigation a new solution: enforcing uniqueness of identity, using the UniqueFieldValueConstraint provided by db4o (see http://developer.db4o.com/blogs/product_news/archive/2007/03/20/unique-fields-constraints.aspx).
Enabling this constraint, we don't need the query anymore.

I set this constraint in DB4OManager#supportClassInDB4O because in this method are already available all information about PK and the db4o configuration.
Unfortunately it has no effect, because this constraint has to be set in the configuration before the ObjectContainer is opened, that means in org.datanucleus.store.db4o.ConnectionFactoryImpl.getNewConfiguration()

Now the problem is: in getNewConfiguration() I've neither the persistable class nor the identity field names, that the UniqueFieldValueConstraint need.
How do I get them there in the more reasonable way?

Other possibility:
do you know, if there is a way to activate changes in the configuration without reopen the db4o ObjectContainer?

Costantino Cerbo added a comment - 24/May/09 11:55 PM - edited
Enabling indexing in DB4OManager#supportClassInDB4O, when an instance of ObjectContainer already exists, do not take any effect on the current session but only the next time you open it (see http://developer.db4o.com/Resources/view.aspx/Reference/Tuning/Indexing, 2nd Paragraph).

Almost all the configuration options of db4o must be provided before opening the ObjectContainer: that means that it's very important to access the ClassMetaData in ConnectionFactoryImpl#getNewConfiguration()

To do so, I'm using the following code:
{code}
protected Configuration getNewConfiguration() {
                AbstractStoreManager sm = (AbstractStoreManager) omfContext.getStoreManager();
MetaDataManager mdm = sm.getMetaDataManager();
Collection<String> cmd = mdm.getClassesWithMetaData();
for (String className : cmd) {
ClassLoaderResolver clr = null;
AbstractClassMetaData metadata = mdm.getMetaDataForClass(className, clr);
AbstractMemberMetaData[] fmds = metadata.getManagedMembers();
for (AbstractMemberMetaData member : fmds) {
//TODO check is PK
}
}
...
{code}
Question: How do I get the ClassLoaderResolver?

Costantino Cerbo added a comment - 25/May/09 12:30 AM - edited
I got the ClassLoaderResolver as follows:
   ClassLoaderResolver clr = omfContext.getClassLoaderResolver(getClass().getClassLoader());
I hope it's right.

With the UniqueFieldValueConstraint the uniqueness of identity is preserved and the performance are much better: 5454 vs. more than 25000 ms with the previous method to persist 5000 simple objects.

 In case of duplicate IDs, the following exception occurs on the commit of the transaction:

Exception in thread "main" com.db4o.constraints.UniqueFieldValueConstraintViolationException: class: org.datanucleus.store.db4o.SimpleObject field: id
at com.db4o.constraints.UniqueFieldValueConstraint$1.ensureSingleOccurence(UniqueFieldValueConstraint.java:60)
at com.db4o.constraints.UniqueFieldValueConstraint$1.onEvent(UniqueFieldValueConstraint.java:91)
at com.db4o.internal.events.Event4Impl.onEvent(Event4Impl.java:76)
at com.db4o.internal.events.Event4Impl.trigger(Event4Impl.java:70)
at com.db4o.internal.events.EventPlatform$5.run(EventPlatform.java:55)
at com.db4o.internal.events.EventPlatform$8.run(EventPlatform.java:90)
at com.db4o.foundation.DynamicVariable.with(DynamicVariable.java:47)
at com.db4o.internal.events.EventPlatform.trigger(EventPlatform.java:88)
at com.db4o.internal.events.EventPlatform.triggerCommitEvent(EventPlatform.java:53)
at com.db4o.internal.events.EventRegistryImpl.commitOnStarted(EventRegistryImpl.java:107)
at com.db4o.internal.LocalTransaction.dispatchCommittingCallback(LocalTransaction.java:116)
at com.db4o.internal.LocalTransaction.commit(LocalTransaction.java:91)
at com.db4o.internal.LocalTransaction.commit(LocalTransaction.java:85)
at com.db4o.internal.LocalObjectContainer.commitTransaction(LocalObjectContainer.java:701)
at com.db4o.internal.LocalObjectContainer.close2(LocalObjectContainer.java:70)
at com.db4o.internal.ObjectContainerBase.close1(ObjectContainerBase.java:343)
at com.db4o.internal.ObjectContainerBase.close(ObjectContainerBase.java:325)
at org.datanucleus.store.db4o.ConnectionFactoryImpl.closeObjectContainerForObjectManager(ConnectionFactoryImpl.java:176)
at org.datanucleus.store.db4o.DB4OManager$1.objectManagerPreClose(DB4OManager.java:158)
at org.datanucleus.ObjectManagerImpl.close(ObjectManagerImpl.java:796)
at org.datanucleus.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:270)
at eu.kostia.commons.jdo.PersistenceManagerUtil.commitAndClose(PersistenceManagerUtil.java:30)
at org.datanucleus.store.db4o.SimpleObjectDAO.testPersist(SimpleObjectDAO.java:46)
at org.datanucleus.store.db4o.SimpleObjectDAO.main(SimpleObjectDAO.java:55)

I'll release the patch this week...

Andy Jefferson added a comment - 25/May/09 07:24 AM
Some comments :-
1. To comply with JDO spec, at the point of insert of the new object it needs to throw the exception (JDODatastoreException). We can't just throw DB4O specific exceptions.

2. Sometimes the metadata for a class is not known at the point of opening the ObjectManager, hence DB4O's requirement of having all of the info up front before opening the ObjectContainer is not workable.
e.g user calls pm.makePersistent() on a class and this class hasn't been met before, so DataNucleus goes and finds the metadata at that point. It was too late for your up-front registering of classes for unique fields.

Costantino Cerbo added a comment - 25/May/09 09:56 AM
The first point is no problem, I will throw a JDODatastoreException.

About the second one, that makes the things a lot more complicated! Is it possible to load all metadata when DataNucleus is started? or it would be too bad for the performance?
Meanwhile I've posted a question in the db4o forum (http://developer.db4o.com/forums/thread/55397.aspx) to ask if there is a way to add a UniqueFieldValueConstraint when the ObjectContainer is already open.

Do you see any other solution to this problem?


Andy Jefferson added a comment - 26/May/09 06:57 AM
Load all metadata at startup? Some people may use "persistence-unit" and this will load metadata at startup. Many people won't do that. DN will not be changed to load all in that situation since it makes no sense; lazy loading of metadata is a very useful feature; and there are many situations where a user wants to persist a subset of their model classes. Obviously you could just add your code to the getNewConfiguration() but you would still have to check in insertObject for the classes that weren't registered with the UniqueFieldValue, and if not yet set then use the current check on object duplication.

Costantino Cerbo added a comment - 26/May/09 10:28 AM
I use "persistence-unit", therefore I hadn't yet experience the problem... ;-)

Anyway your suggestion is a good alternative, but how do I check with db4o what contraints are set on a class and its fields?
II post also a question on db4o forum: http://developer.db4o.com/forums/permalink/55420/55420/ShowThread.aspx#55420

Costantino Cerbo added a comment - 26/May/09 03:29 PM
It seems, it's not possible to check what constraints are set on a class and its fields (see the above db4o forum).

Maybe we can verify if a "persistence-unit" is used and only in this case use the "new" way.
What do you think about? I know, this isn't the cleanest solution, but anyway I think it's a valuable workaroung, because the performance become much better (6x).

To verify if a "persistence-unit" is used, one can just check if the property "javax.jdo.option.PersistenceUnitName" is set, something like:
     omfContext.getPersistenceConfiguration().getStringProperty("javax.jdo.option.PersistenceUnitName") != null
isnt'it?

Andy Jefferson added a comment - 27/May/09 02:11 PM
The PMF prop "javax.jdo.option.PersistenceUnitName" may not be the best to check, more like "datanucleus.PersistenceUnitName" since that is used internally. Apart from that, fine.

Costantino Cerbo added a comment - 28/May/09 10:13 AM
Ok, I'll attach the patch to this issue on the weekend.

Costantino Cerbo added a comment - 30/May/09 01:24 AM
Posted patch (created with "diff -u") with the discussed changes.

Andy Jefferson added a comment - 30/May/09 08:38 PM
Thx for the patch. Comments :

1. can you just provide a patch with the things actually changed. This has changes in formatting, line lengths etc hence seeing what is actually changed is hard to analyse. My guess is that half of the patches would disappear if you retain the same formatting as the original. e.g any change at all in DB4OPersistenceHandler ?

2. You move everything to do with registering a class for use with DB4O into the ConnectionFactory so it now assumes that the class is already known about (or that's how I read it from trying to work out the change). It previously allowed for classes being registered on-the-fly when they were discovered.

Costantino Cerbo added a comment - 31/May/09 12:14 AM
Not at all. Answers:

1. you're right. I've attached a new patch where the formatting remains as the original.

2. I've undo this change. I've moved the method supportClassInDB4O(.) from DB4OManager to ConnectionFactoryImpl because I knew that most of the configuration settings in db4o should be set before opening an ObjectContainer/ObjectServer. But now after having better read the javadoc of com.db4o.config.Configuration I've experienced that generateVersionNumbers, cascadeOnUpdate, cascadeOnDelete and indexed may be applied to an open object container. Unfortunately is not the case for UniqueFieldValueConstraint. This constraint is applied in getNewConfiguration()

Other info:
I've found three new small problems in DB4OPersistenceHandler#findObject(.): they are marked a comment with my name.

Besides I wanted to run the tests but I could not, because I get this error in my workspace: "Build path contains duplicate entry: 'core' for project 'test.framework'" after checking out using the File for Subeclipse (as explained in http://www.datanucleus.org/development/building_eclipse.html).
I've also asked for help in the forum.

Andy Jefferson added a comment - 31/May/09 10:57 AM
Around line 585 of DB4OPersistenceHandler findObject() makes no sense. How can an "id" be an exception? An id for DB4O can only be DatastoreUniqueOID. It can't be anything else. If you have an exception there then that is not the place to put a check for it.
[Also you should not be referring to JDOExceptions in datastore code; use NucleusException].

The test suite works for me in Eclipse (3.4, on Linux). Do a clean all projects; eclipse building is just crap sometimes. There is no duplicate entry using SVN code.

Andy Jefferson added a comment - 31/May/09 11:17 AM
Parts of patch not affected by the above comments are applied to SVN trunk. Corrected formatting to match DN code conventions as per http://www.datanucleus.org/development/coding_standards.html and Eclipse conventions in http://datanucleus.svn.sourceforge.net/viewvc/datanucleus/code-conventions-eclipse.xml?revision=6402&view=markup

Also updated "local.repository" so that com.db4o.constraints is available for use

Costantino Cerbo added a comment - 31/May/09 11:28 AM
Try with this testcase (id_is_exception_test.zip) and set a brekpoint on:
  if (id instanceof DatastoreUniqueOID)
in
  DB4OPersistenceHandler#findObject(ObjectManager, Object).

It makes no sense, but in this case "id" is an instance of javax.jdo.JDOFatalInternalException with this error message:
  The key value passed to construct a SingleFieldIdentity of type "javax.jdo.identity.LongIdentity" for class "org.datanucleus.test.Person" is of an incorrect type ("java.lang.String") - should be "Long".

Andy Jefferson added a comment - 31/May/09 11:37 AM
Just post the stack trace that called findObject with an Exception and where the Exception was generated (its stack trace). That is where the situation is caused.

Andy Jefferson added a comment - 31/May/09 11:38 AM
Oh, and since it has no bearing on whether to check for existence of app id objects, then it is a separate issue

Costantino Cerbo added a comment - 31/May/09 11:41 AM
Here it is (stacktrace of id_is_exception_test.zip):
Exception in thread "main" java.lang.NullPointerException
at org.datanucleus.store.db4o.DB4OPersistenceHandler.findObject(DB4OPersistenceHandler.java:603)
at org.datanucleus.ObjectManagerImpl.findObject(ObjectManagerImpl.java:2330)
at org.datanucleus.jdo.JDOPersistenceManager.getObjectById(JDOPersistenceManager.java:1675)
at org.datanucleus.jdo.JDOPersistenceManager.getObjectById(JDOPersistenceManager.java:1771)
at org.datanucleus.test.Main.main(Main.java:45)

At line 603:
   if (cmd.getIdentityType() == IdentityType.APPLICATION)
cmd is null because a line above "className" was null due to "id" being an instance of JDOFatalInternalException.

Andy Jefferson added a comment - 31/May/09 11:48 AM
And the actual Exception though (and *its* stack trace), cos that says when it is created. The only thing that should happen with a user putting in crap is StupidUserException being thrown back, and that is for *all* datastores, not just DB4O. Hence no code in DB4O plugin gets changed for it

Costantino Cerbo added a comment - 31/May/09 12:13 PM
Ok, I'll open a a separate issue for this problem (when an user put an ID of a wrong type)

Costantino Cerbo added a comment - 31/May/09 12:28 PM
Done: the improvement for the exception handling is discussed in NUCDBFO-39.

Costantino Cerbo added a comment - 01/Jun/09 10:11 PM
Thanks for having applied my patch. As for me this issue may be closed.

Andy Jefferson added a comment - 09/Jun/09 02:46 PM
Moved comments from NUCDBFO-38 since this is the issue that introduces the problem.

================================================

You want to persist a class with a collection, for example:
class Person {
    ...
    private Collection<Address> address = new LinkedHashSet<Address>();
    ...
}

The class Address use a getter in a method called by LinkedHashSet (in this case hashCode):
class Address {
    ...
    public int hashCode() {
        return getId().intValue();
    }
    ...
}

When you persist an instance of Person, whose Collection of Addresses had correctly set IDs, the following exception is thrown:
java.lang.NullPointerException
at org.datanucleus.store.db4o.Address.hashCode(Address.java:56)
at java.util.HashMap.put(HashMap.java:372)
at java.util.HashSet.add(HashSet.java:200)
at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
at java.util.LinkedHashSet.<init>(LinkedHashSet.java:152)
at org.datanucleus.sco.simple.LinkedHashSet.initialise(LinkedHashSet.java:81)

The cause for this exception seems to be, that the lazy loading of the attributes doesn't happen when the method Address#getId() is invoked in Address#hashCode().

I've found the cause! It's a bug in db4o, not DataNucleus and exactly in the UniqueFieldValueConstraint that we introduced with NUCDBFO-37.


The solution of this issue is related to http://tracker.db4o.com/browse/COR-1670

================================================

Andy Jefferson added a comment - 15/Jun/09 10:45 AM
Moved to 1.1.3 since incomplete. Code in locateObject commented out so cases don't have problems with lazy loading

Andy Jefferson added a comment - 18/Jul/09 03:25 PM
Note that the store.db4o plugin is now branched for 1.1. SVN trunk is for 2.0, so if you have patches etc please provide for which branch you need it on

Andy Jefferson added a comment - 27/Jan/10 03:39 PM
No updates will be applied to 1.1 branch