Talk:RDF2Go

From semanticweb.org.edu

Note: We get too much spam here. Please use our mailing list for feedback.


Open/Close

issue: In a bigger application, ModelSets get passed around. One service, the "Manager", manages the ModelSet; many other services access and use it. Underneath is a Jena or Sesame model, kept hidden by the Manager. Each "Service" then calls something like Manager.getModelSet() and works happily on the model.

  • all services share the same modelset instance
  • how does a service know whether the model is open? Should the services open/close it?

At the moment, we are doomed, because what happens is this:

// there is only one instance of modelset, wrapping the database
Manager.getModelSet() {
  if (modelset == null) modelset = ModelFactory......doyourthing;
  return modelset;
}
// Services get it and do things:
Service.doThing() {
  ms = Manager.getModelSet();
  boolean wasOpen = ms.isOpen();
  if (!wasOpen)
    ms.open();
  try {
    ms.........dothethingthatclientsusuallydo
    ms....do another thing that takes 20 seconds
    ms.... is the modelset still open or did somebody else close it? who knows....
  } finally {
    if (!wasOpen)
      ms.close();
  }
}

Even trickier is this:

myModel = Manager.getModelSet().getModel(x);
// hm, was the modelset open?
// is myModel open now? don't know, let's open it
myModel.open();

// What if somebody else already asked for x, and the ModelSet buffered the Model instance for x and returned the same instance twice? Or what if I ask again....

myModel = Manager.getModelSet().getModel(x); // what do I get now? is it open or closed?
myModel.isOpen(); // .... pretty undecidable now.
myModel.open();
myModel.add()....
myModel.close(); // ok, did the first model now also close?

etc etc.

As you can easily see, this happy opening and closing of the modelset can cause havoc if two clients use the same modelset and run in parallel. But they don't even have to run in parallel; I can imagine this is tricky in any environment.

o/c: Models Left Open

  • idea: establish a best-practice recommendation: a modelset that others may want to access later is opened once and then left open. A Model or ModelSet passed to you as a method parameter is assumed to be open already. The object that creates a ModelSet is responsible for opening and closing it. Calling close/open to flush bulk operations is ok but not needed, as bulk operations are implemented in update(Diff).
    • idea: calling ModelSet.getModel(x) will return an open model, if the modelset was already open. (it should fail anyway if the modelset was not open).
    • idea: Closing the ModelSet will automatically close all models returned by getModel().
      • pro: probably quite handy when shutting down an application
      • issue: this may cause deadlocks and dangling connections on shutdown (a graceful shutdown is then not possible), because of the Sesame2 connection policy.
    • pro: rdf2go should stay simple
    • pro: that wouldn't change too much code

The code would change to this:

// there is only one instance of modelset, wrapping the database
Manager.getModelSet() {
 if (modelset==null)  { 
   modelset = ModelFactory......doyourthing;
   modelset.open();
 }
 return modelset;
}
Manager.shutdown() {
 if (modelset != null) {
   modelset.close(); // also closes any opened models
   modelset = null;
 }
}
// Services get it and do things:
Service.doThing() {
 ms =  Manager.getModelSet();
 ms.........dothethingthatclientsusuallydo
 ms....do another thing that takes 20 seconds
 ms.... the modelset stays open...
}

o/c: create independent ModelSet instances

  • idea: create a new ModelSet every time a client calls Manager.getModelSet(), so that every service runs on a different modelset.
    • con: that will render the ModelSetListener useless (it cannot listen to updates) and create many modelsets.
  • idea: create some kind of "session" from a ModelSet, as Sesame2 and JDBC do:
ModelSetSession s = ModelSet.open();
s.doyourthing.
s.close();
    • con: that would destroy the simplicity and benefit of RDF2Go.
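
For comparison, here is a rough sketch of what the session idea could look like in Java. SharedStore and Session are made-up names, not RDF2Go classes, and statements are plain strings to keep it short. The assumption is that the owner keeps the store itself alive, and each caller only opens and closes its own session, so one caller closing can never pull the rug out from under another.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared store; stands in for a ModelSet kept alive by its owner. */
final class SharedStore {
    private final List<String> statements = new ArrayList<>();
    private int activeSessions = 0;

    /** Hands out a per-caller session instead of exposing open()/close() on the store. */
    synchronized Session openSession() {
        activeSessions++;
        return new Session(this);
    }

    synchronized int activeSessions() { return activeSessions; }
    synchronized int size() { return statements.size(); }

    private synchronized void add(String statement) { statements.add(statement); }
    private synchronized void release() { activeSessions--; }

    /** Hypothetical session type; closing it never affects other callers. */
    static final class Session implements AutoCloseable {
        private final SharedStore store;
        private boolean closed = false;

        private Session(SharedStore store) { this.store = store; }

        void addStatement(String statement) {
            if (closed) throw new IllegalStateException("session is closed");
            store.add(statement);
        }

        @Override
        public void close() {
            if (!closed) { // idempotent: a second close() is a no-op
                closed = true;
                store.release();
            }
        }
    }
}
```

With try-with-resources, a caller would write try (SharedStore.Session s = store.openSession()) { s.addStatement(...); } and the session is closed automatically. It still adds one more concept to the API, which is exactly the "con" above.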

o/c: reference counting

  • idea: use reference counting, and establish a policy on how to handle open/close calls.
    • pro: (leo) everyone can then call open/close with try..finally clauses
    • con: (leo) adds complexity and LinesOfCode, everybody *should* then call open/close
    • explanation:

Basically, each open call does a +1 on a counter and each close call does a -1. A model is actually opened when the counter goes from 0 to 1 and actually closed when the counter goes from 1 to 0. A usage policy and good documentation should do the rest.

This approach is used in the Mac OS X Cocoa framework, though it concerns memory allocation.

The policy is described here: http://developer.apple.com/documentation/Cocoa/Conceptual/MemoryMgmt/Tasks/MemoryManagementRules.html

Then they have these special AutoreleasePools to handle cases like factory methods, where you cannot call "release" on the created object before returning (otherwise it would be destroyed), but cannot call it after returning either, because you lose the reference to it.
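
A minimal sketch of the counting idea in Java, as a wrapper around whatever really gets opened and closed. Openable, CountingResource and RefCountedOpenable are made-up names for illustration and not part of RDF2Go; the sketch also assumes that every caller pairs each open() with exactly one close().

```java
/** Hypothetical stand-in for the open/close part of a Model or ModelSet. */
interface Openable {
    void open();
    void close();
    boolean isOpen();
}

/** Hypothetical fake resource that records how often it is really opened and closed. */
final class CountingResource implements Openable {
    int opens = 0;
    int closes = 0;
    private boolean open = false;

    @Override public void open() { open = true; opens++; }
    @Override public void close() { open = false; closes++; }
    @Override public boolean isOpen() { return open; }
}

/** Forwards only the first open() and the last close() to the wrapped resource. */
final class RefCountedOpenable implements Openable {
    private final Openable delegate;
    private int count = 0; // open() calls not yet matched by a close()

    RefCountedOpenable(Openable delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void open() {
        if (count == 0) {
            delegate.open(); // counter goes 0 -> 1: really open
        }
        count++;
    }

    @Override
    public synchronized void close() {
        if (count == 0) {
            throw new IllegalStateException("close() without matching open()");
        }
        count--;
        if (count == 0) {
            delegate.close(); // counter goes 1 -> 0: really close
        }
    }

    @Override
    public synchronized boolean isOpen() {
        return count > 0;
    }
}
```

Each service can then wrap its work in open() / try / finally / close() without caring who else holds the resource. The catch is the one leo names above: a single forgotten close() keeps the resource open forever, so everybody really has to follow the policy.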

  • idea: abandon auto-commit. Changes are only stored on "close()"
    • con: will cost some simplicity
    • con: leads to migration pains.
    • con: (leo) makes everything more complicated and is a guaranteed source for bugs

Open/Close in Sesame

SailConnection.close: "Any updates that haven't been committed yet will be rolled back."

"Care should be taken to properly close SailConnections as they might block concurrent queries and/or updates on the Sail while active, depending on the Sail-implementation that is being used."

No relevant issues for "open" filed in http://www.openrdf.org/issues/secure/IssueNavigator.jspa?

Quite a few issues filed for "close": http://www.openrdf.org/issues/secure/IssueNavigator.jspa?mode=hide&requestId=10030

modelset.close() semantics

issue: closing a modelset - will it close models returned by ModelSet.getModel() ?

Model wrapping ModelSet when created by ModelSet

Axel Rauschmayer: In general, I would like a Model to be just a "curried" wrapper for a ModelSet (currently it seems to be the other way around: a ModelSet wraps several Models). - Compare: There are resource centric Java APIs where wrapper objects hold a resource and provide the same API as the triple store (minus the "subject" part of each method signature). - Similarly, a Model could just be a triple wrapper for the quad API "ModelSet".

Idea: Make Model a wrapper for ModelSet when created by ModelSet.open(). Implement all methods of AbstractModelSet on the find(cspo), add(cspo), remove(cspo) methods of AbstractModelSet and NOT on the methods of AbstractModel.

  • pro: No more opening and closing.
  • pro: A more solid core API (ModelSet).
  • pro: I have use cases where I need Models to encapsulate *sets* of contexts (and not just a single context). This change makes it possible, because one would never list the models of a ModelSet, just its context URIs, and then instantiate a Model with 1 or more context URIs (varargs...). The Sesame quad API does this very well: contexts are always a vararg; zero context URIs mean "look everywhere", 1 or more context URIs mean "look at all these contexts".

  • con: all the subclasses of AbstractModelSet would have to be rewritten (bad)
  • con: all the subclasses of AbstractModel would have to be rewritten (bad)
  • Idea: (Leo) make this change when we move to RDF2Go 5.0. Then, the adapters have to be changed anyway.
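
To make the "curried wrapper" idea concrete, here is a small sketch in Java. Quad, QuadStore and ModelView are invented names that stand in for the quad API and the triple view; nodes are plain strings, null is a wildcard, and sending writes to the first context is just an assumption of this sketch, not something decided above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical quad: a context plus subject, predicate and object. */
final class Quad {
    final String context, subject, predicate, object;

    Quad(String context, String subject, String predicate, String object) {
        this.context = context;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical minimal quad store standing in for ModelSet; null is a wildcard. */
final class QuadStore {
    private final List<Quad> quads = new ArrayList<>();

    void add(String c, String s, String p, String o) {
        quads.add(new Quad(c, s, p, o));
    }

    /** Zero contexts mean "look everywhere"; one or more restrict the search. */
    List<Quad> find(String s, String p, String o, String... contexts) {
        List<String> wanted = Arrays.asList(contexts);
        List<Quad> hits = new ArrayList<>();
        for (Quad q : quads) {
            boolean contextOk = wanted.isEmpty() || wanted.contains(q.context);
            if (contextOk && matches(s, q.subject) && matches(p, q.predicate)
                    && matches(o, q.object)) {
                hits.add(q);
            }
        }
        return hits;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}

/** Triple view: the quad API with the context arguments fixed ("curried") up front. */
final class ModelView {
    private final QuadStore store;
    private final String[] contexts;

    ModelView(QuadStore store, String... contexts) {
        if (contexts.length == 0) {
            throw new IllegalArgumentException("a view needs at least one context");
        }
        this.store = store;
        this.contexts = contexts.clone();
    }

    /** Writes go to the first context; that choice is an assumption of this sketch. */
    void addStatement(String s, String p, String o) {
        store.add(contexts[0], s, p, o);
    }

    List<Quad> findStatements(String s, String p, String o) {
        return store.find(s, p, o, contexts);
    }
}
```

The view holds no state of its own apart from its context list, so there is nothing to open or close on it. That is where the "no more opening and closing" pro comes from.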

Lucene integration

Could the NEPOMUK LuceneSAIL be refactored to be a LuceneRDF2GoLayer? Currently triple stores do not support full text queries with e.g. partial string matches.

Leo: this is a thought that could lead into dangerous waters. LuceneSail is based on the optimized transaction layer of Sesame and the Sesame query parser. RDF2Go knows nothing of transactions, nor does it have query parsers. Pushing the LuceneSail to the RDF2Go layer would bloat RDF2Go into a full RDF API (implementing a SPARQL parser and transactions). That would put RDF2Go on the same level as Jena or Sesame, which would cause friction and could break the community. RDF2Go should adapt existing frameworks, not reprogram them. Also, the implementation would be very slow if done in RDF2Go.

Leo: a better idea would be to standardize the virtual properties. Any Sesame layer needs the same properties (match, query, score, snippet), and these can be shared between Jena and Sesame. Then any implementation (Jena | Sesame) will give the same results.

Issue: Load a model from a URL?

  • con: this would force all implementers to have HTTP inside their implementation. Instead, a util class could provide this
  • pro: Leo: what's the deal? URL.openConnection() - this is plain stupid Java, I see no HTTP there. It just works.

Can this deal with following redirects? Broken mime-types? Retries? Encoding issues? Proxy configuration issues?

What about a util class like WebUtil.readFromURL( String url, Model m ), which loads the content into m? And maybe also WebUtil.readFromURL( String url ), which returns a Model with the content?

leo: works for me
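
A rough sketch of the two-argument variant in Java. The WebUtil name and signature follow the proposal above; StreamLoadable is a made-up stand-in for whatever stream-reading method a Model offers, and the timeout values are arbitrary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

/** Hypothetical stand-in for the part of Model that can parse RDF from a stream. */
interface StreamLoadable {
    void readFrom(InputStream in) throws IOException;
}

/** Sketch of the proposed utility class; not an existing RDF2Go class. */
final class WebUtil {
    private WebUtil() {}

    /** Opens the URL and hands its content stream to the target. */
    static void readFromURL(String url, StreamLoadable target) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setConnectTimeout(10_000); // fail instead of hanging forever
        connection.setReadTimeout(30_000);
        try (InputStream in = connection.getInputStream()) {
            target.readFrom(in); // stream is closed even if parsing fails
        }
    }
}
```

This only covers the happy path. It does nothing about broken MIME types, retries, encodings or proxy configuration, so the questions raised above stay open; keeping the code in a util class at least keeps those concerns out of the adapters.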

Issue: Auto-boxing of basic types as Literals would be nice

m.addStatement(person,age,12);
m.addStatement(person,age,23.2);
m.addStatement(person,age,true);

Max: what do you want to have here? Support for all primitive Java types? Could be added.

Gunnar: Yes - although it's just sugar on top...

Axel: How about static factory methods for literals? This would avoid the combinatorial explosion and be almost as pretty.

m.addStatement(person,age,typedLiteral(12));
m.addStatement(person,age,typedLiteral(23.2));
m.addStatement(person,age,typedLiteral(true));
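
A sketch of what such factory methods could look like in Java. TypedLiteral and Literals are hypothetical names, not RDF2Go classes, and the mapping from Java types to XML Schema datatypes (int to xsd:int, double to xsd:double, boolean to xsd:boolean) is one plausible choice rather than a decided one.

```java
/** Hypothetical typed literal: a lexical form plus an XML Schema datatype URI. */
final class TypedLiteral {
    final String lexicalForm;
    final String datatypeUri;

    TypedLiteral(String lexicalForm, String datatypeUri) {
        this.lexicalForm = lexicalForm;
        this.datatypeUri = datatypeUri;
    }
}

/** Hypothetical factory class; meant to be used via a static import. */
final class Literals {
    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    private Literals() {}

    static TypedLiteral typedLiteral(int value) {
        return new TypedLiteral(Integer.toString(value), XSD + "int");
    }

    static TypedLiteral typedLiteral(long value) {
        return new TypedLiteral(Long.toString(value), XSD + "long");
    }

    static TypedLiteral typedLiteral(double value) {
        return new TypedLiteral(Double.toString(value), XSD + "double");
    }

    static TypedLiteral typedLiteral(boolean value) {
        return new TypedLiteral(Boolean.toString(value), XSD + "boolean");
    }
}
```

The overloads live in one place, so addStatement needs only a single extra signature taking a literal, instead of one overload per primitive type in every add/remove/find variant.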