![]() |
![]() |
|
[
Permalink
| « Hide
]
Andy Jefferson added a comment - 20/May/09 08:03 AM
DB4O does not enforce uniqueness of identity in the datastore (for application identity) so a check is essential.
That's right: there is also a big discussion about this theme in the db4o forum (http://developer.db4o.com/forums/thread/47309.aspx)
Anyway the check for uniqueness "manually" done by DataNucleus costs a lot of time, and I don't know why... maybe the Primary Key Field was not yet indexed. Maybe there is potential to optimise this query: could you tell me where in the code this check happens? Time allowing, I could investigate if it's possible to improve the performance and release a patch. This is another interesting link discussing the behaviour of db4o:
http://tracker.db4o.com/browse/COR-161 org.datanucleus.store.db4o.DB4OPersistenceHandler "locateObject", uses QBE
It seems that SODA queries are faster that QBE, see doc/tutorial/docs/Query.html#outline31 in the db4o tutorial (v. 7.9).
I'll try with SODA... Another question: how do I figure witch attribute is set as PK (with annotation @PrimaryKey or XML configuration)? "cmd" has a getPKMemberPositions() and some other methods. A PK can have multiple attributes. "cmd" is the place to look for all such config info
Before trying to optimise, I've set my benchmark persisting 5000 simple object with only two fields: the id and a name.
With DATASTORE Identity it took an my system only 4467 ms, while with APPLICATION Identity the time was 25142 ms: almost 6 time more! I've tryed to use a SODA query instead of a QBE, getting member names and values by using the methods on AbstractClassMetaData but no sensible improvement occurs. I'm further investigating other solutions... Anyway substituting: if (objectSet.size() > 0) { isStored = true; } with: if (objectSet.hasNext() > 0) { isStored = true; } should be better, because the method size() may load the whole object tree, requiring more time. No way to further optimise the query.
I'm now investigation a new solution: enforcing uniqueness of identity, using the UniqueFieldValueConstraint provided by db4o (see http://developer.db4o.com/blogs/product_news/archive/2007/03/20/unique-fields-constraints.aspx). Enabling this constraint, we don't need the query anymore. I set this constraint in DB4OManager#supportClassInDB4O because in this method are already available all information about PK and the db4o configuration. Unfortunately it has no effect, because this constraint has to be set in the configuration before the ObjectContainer is opened, that means in org.datanucleus.store.db4o.ConnectionFactoryImpl.getNewConfiguration() Now the problem is: in getNewConfiguration() I've neither the persistable class nor the identity field names, that the UniqueFieldValueConstraint need. How do I get them there in the more reasonable way? Other possibility: do you know, if there is a way to activate changes in the configuration without reopen the db4o ObjectContainer? Enabling indexing in DB4OManager#supportClassInDB4O, when an instance of ObjectContainer already exists, do not take any effect on the current session but only the next time you open it (see http://developer.db4o.com/Resources/view.aspx/Reference/Tuning/Indexing, 2nd Paragraph).
Almost all the configuration options of db4o must be provided before opening the ObjectContainer: that means that it's very important to access the ClassMetaData in ConnectionFactoryImpl#getNewConfiguration() To do so, I'm using the following code: {code} protected Configuration getNewConfiguration() { AbstractStoreManager sm = (AbstractStoreManager) omfContext.getStoreManager(); MetaDataManager mdm = sm.getMetaDataManager(); Collection<String> cmd = mdm.getClassesWithMetaData(); for (String className : cmd) { ClassLoaderResolver clr = null; AbstractClassMetaData metadata = mdm.getMetaDataForClass(className, clr); AbstractMemberMetaData[] fmds = metadata.getManagedMembers(); for (AbstractMemberMetaData member : fmds) { //TODO check is PK } } ... {code} Question: How do I get the ClassLoaderResolver? I got the ClassLoaderResolver as follows:
ClassLoaderResolver clr = omfContext.getClassLoaderResolver(getClass().getClassLoader()); I hope it's right. With the UniqueFieldValueConstraint the uniqueness of identity is preserved and the performance are much better: 5454 vs. more than 25000 ms with the previous method to persist 5000 simple objects. In case of duplicate IDs, the following exception occurs on the commit of the transaction: Exception in thread "main" com.db4o.constraints.UniqueFieldValueConstraintViolationException: class: org.datanucleus.store.db4o.SimpleObject field: id at com.db4o.constraints.UniqueFieldValueConstraint$1.ensureSingleOccurence(UniqueFieldValueConstraint.java:60) at com.db4o.constraints.UniqueFieldValueConstraint$1.onEvent(UniqueFieldValueConstraint.java:91) at com.db4o.internal.events.Event4Impl.onEvent(Event4Impl.java:76) at com.db4o.internal.events.Event4Impl.trigger(Event4Impl.java:70) at com.db4o.internal.events.EventPlatform$5.run(EventPlatform.java:55) at com.db4o.internal.events.EventPlatform$8.run(EventPlatform.java:90) at com.db4o.foundation.DynamicVariable.with(DynamicVariable.java:47) at com.db4o.internal.events.EventPlatform.trigger(EventPlatform.java:88) at com.db4o.internal.events.EventPlatform.triggerCommitEvent(EventPlatform.java:53) at com.db4o.internal.events.EventRegistryImpl.commitOnStarted(EventRegistryImpl.java:107) at com.db4o.internal.LocalTransaction.dispatchCommittingCallback(LocalTransaction.java:116) at com.db4o.internal.LocalTransaction.commit(LocalTransaction.java:91) at com.db4o.internal.LocalTransaction.commit(LocalTransaction.java:85) at com.db4o.internal.LocalObjectContainer.commitTransaction(LocalObjectContainer.java:701) at com.db4o.internal.LocalObjectContainer.close2(LocalObjectContainer.java:70) at com.db4o.internal.ObjectContainerBase.close1(ObjectContainerBase.java:343) at com.db4o.internal.ObjectContainerBase.close(ObjectContainerBase.java:325) at org.datanucleus.store.db4o.ConnectionFactoryImpl.closeObjectContainerForObjectManager(ConnectionFactoryImpl.java:176) at org.datanucleus.store.db4o.DB4OManager$1.objectManagerPreClose(DB4OManager.java:158) at org.datanucleus.ObjectManagerImpl.close(ObjectManagerImpl.java:796) at org.datanucleus.jdo.JDOPersistenceManager.close(JDOPersistenceManager.java:270) at eu.kostia.commons.jdo.PersistenceManagerUtil.commitAndClose(PersistenceManagerUtil.java:30) at org.datanucleus.store.db4o.SimpleObjectDAO.testPersist(SimpleObjectDAO.java:46) at org.datanucleus.store.db4o.SimpleObjectDAO.main(SimpleObjectDAO.java:55) I'll release the patch this week... Some comments :-
1. To comply with JDO spec, at the point of insert of the new object it needs to throw the exception (JDODatastoreException). We can't just throw DB4O specific exceptions. 2. Sometimes the metadata for a class is not known at the point of opening the ObjectManager, hence DB4O's requirement of having all of the info up front before opening the ObjectContainer is not workable. e.g user calls pm.makePersistent() on a class and this class hasn't been met before, so DataNucleus goes and finds the metadata at that point. It was too late for your up-front registering of classes for unique fields. The first point is no problem, I will throw a JDODatastoreException.
About the second one, that makes the things a lot more complicated! Is it possible to load all metadata when DataNucleus is started? or it would be too bad for the performance? Meanwhile I've posted a question in the db4o forum (http://developer.db4o.com/forums/thread/55397.aspx) to ask if there is a way to add a UniqueFieldValueConstraint when the ObjectContainer is already open. Do you see any other solution to this problem? Load all metadata at startup? Some people may use "persistence-unit" and this will load metadata at startup. Many people won't do that. DN will not be changed to load all in that situation since it makes no sense; lazy loading of metadata is a very useful feature; and there are many situations where a user wants to persist a subset of their model classes. Obviously you could just add your code to the getNewConfiguration() but you would still have to check in insertObject for the classes that weren't registered with the UniqueFieldValue, and if not yet set then use the current check on object duplication.
I use "persistence-unit", therefore I hadn't yet experience the problem... ;-)
Anyway your suggestion is a good alternative, but how do I check with db4o what contraints are set on a class and its fields? II post also a question on db4o forum: http://developer.db4o.com/forums/permalink/55420/55420/ShowThread.aspx#55420 It seems, it's not possible to check what constraints are set on a class and its fields (see the above db4o forum).
Maybe we can verify if a "persistence-unit" is used and only in this case use the "new" way. What do you think about? I know, this isn't the cleanest solution, but anyway I think it's a valuable workaroung, because the performance become much better (6x). To verify if a "persistence-unit" is used, one can just check if the property "javax.jdo.option.PersistenceUnitName" is set, something like: omfContext.getPersistenceConfiguration().getStringProperty("javax.jdo.option.PersistenceUnitName") != null isnt'it? The PMF prop "javax.jdo.option.PersistenceUnitName" may not be the best to check, more like "datanucleus.PersistenceUnitName" since that is used internally. Apart from that, fine.
Ok, I'll attach the patch to this issue on the weekend.
Posted patch (created with "diff -u") with the discussed changes.
Thx for the patch. Comments :
1. can you just provide a patch with the things actually changed. This has changes in formatting, line lengths etc hence seeing what is actually changed is hard to analyse. My guess is that half of the patches would disappear if you retain the same formatting as the original. e.g any change at all in DB4OPersistenceHandler ? 2. You move everything to do with registering a class for use with DB4O into the ConnectionFactory so it now assumes that the class is already known about (or that's how I read it from trying to work out the change). It previously allowed for classes being registered on-the-fly when they were discovered. Not at all. Answers:
1. you're right. I've attached a new patch where the formatting remains as the original. 2. I've undo this change. I've moved the method supportClassInDB4O(.) from DB4OManager to ConnectionFactoryImpl because I knew that most of the configuration settings in db4o should be set before opening an ObjectContainer/ObjectServer. But now after having better read the javadoc of com.db4o.config.Configuration I've experienced that generateVersionNumbers, cascadeOnUpdate, cascadeOnDelete and indexed may be applied to an open object container. Unfortunately is not the case for UniqueFieldValueConstraint. This constraint is applied in getNewConfiguration() Other info: I've found three new small problems in DB4OPersistenceHandler#findObject(.): they are marked a comment with my name. Besides I wanted to run the tests but I could not, because I get this error in my workspace: "Build path contains duplicate entry: 'core' for project 'test.framework'" after checking out using the File for Subeclipse (as explained in http://www.datanucleus.org/development/building_eclipse.html). I've also asked for help in the forum. Around line 585 of DB4OPersistenceHandler findObject() makes no sense. How can an "id" be an exception? An id for DB4O can only be DatastoreUniqueOID. It can't be anything else. If you have an exception there then that is not the place to put a check for it.
[Also you should not be referring to JDOExceptions in datastore code; use NucleusException]. The test suite works for me in Eclipse (3.4, on Linux). Do a clean all projects; eclipse building is just crap sometimes. There is no duplicate entry using SVN code. Parts of patch not affected by the above comments are applied to SVN trunk. Corrected formatting to match DN code conventions as per http://www.datanucleus.org/development/coding_standards.html and Eclipse conventions in http://datanucleus.svn.sourceforge.net/viewvc/datanucleus/code-conventions-eclipse.xml?revision=6402&view=markup
Also updated "local.repository" so that com.db4o.constraints is available for use Try with this testcase (id_is_exception_test.zip) and set a brekpoint on:
if (id instanceof DatastoreUniqueOID) in DB4OPersistenceHandler#findObject(ObjectManager, Object). It makes no sense, but in this case "id" is an instance of javax.jdo.JDOFatalInternalException with this error message: The key value passed to construct a SingleFieldIdentity of type "javax.jdo.identity.LongIdentity" for class "org.datanucleus.test.Person" is of an incorrect type ("java.lang.String") - should be "Long". Just post the stack trace that called findObject with an Exception and where the Exception was generated (its stack trace). That is where the situation is caused.
Oh, and since it has no bearing on whether to check for existence of app id objects, then it is a separate issue
Here it is (stacktrace of id_is_exception_test.zip):
Exception in thread "main" java.lang.NullPointerException at org.datanucleus.store.db4o.DB4OPersistenceHandler.findObject(DB4OPersistenceHandler.java:603) at org.datanucleus.ObjectManagerImpl.findObject(ObjectManagerImpl.java:2330) at org.datanucleus.jdo.JDOPersistenceManager.getObjectById(JDOPersistenceManager.java:1675) at org.datanucleus.jdo.JDOPersistenceManager.getObjectById(JDOPersistenceManager.java:1771) at org.datanucleus.test.Main.main(Main.java:45) At line 603: if (cmd.getIdentityType() == IdentityType.APPLICATION) cmd is null because a line above "className" was null due to "id" being an instance of JDOFatalInternalException. And the actual Exception though (and *its* stack trace), cos that says when it is created. The only thing that should happen with a user putting in crap is StupidUserException being thrown back, and that is for *all* datastores, not just DB4O. Hence no code in DB4O plugin gets changed for it
Ok, I'll open a a separate issue for this problem (when an user put an ID of a wrong type)
Done: the improvement for the exception handling is discussed in
Thanks for having applied my patch. As for me this issue may be closed.
Moved comments from
================================================ You want to persist a class with a collection, for example: class Person { ... private Collection<Address> address = new LinkedHashSet<Address>(); ... } The class Address use a getter in a method called by LinkedHashSet (in this case hashCode): class Address { ... public int hashCode() { return getId().intValue(); } ... } When you persist an instance of Person, whose Collection of Addresses had correctly set IDs, the following exception is thrown: java.lang.NullPointerException at org.datanucleus.store.db4o.Address.hashCode(Address.java:56) at java.util.HashMap.put(HashMap.java:372) at java.util.HashSet.add(HashSet.java:200) at java.util.AbstractCollection.addAll(AbstractCollection.java:305) at java.util.LinkedHashSet.<init>(LinkedHashSet.java:152) at org.datanucleus.sco.simple.LinkedHashSet.initialise(LinkedHashSet.java:81) The cause for this exception seems to be, that the lazy loading of the attributes doesn't happen when the method Address#getId() is invoked in Address#hashCode(). I've found the cause! It's a bug in db4o, not DataNucleus and exactly in the UniqueFieldValueConstraint that we introduced with NUCDBFO-37. The solution of this issue is related to http://tracker.db4o.com/browse/COR-1670 ================================================ Moved to 1.1.3 since incomplete. Code in locateObject commented out so cases don't have problems with lazy loading
Note that the store.db4o plugin is now branched for 1.1. SVN trunk is for 2.0, so if you have patches etc please provide for which branch you need it on
No updates will be applied to 1.1 branch
|
|||||||||||||||||||||||||||||||||||||||