Talk:Built in Types

From semanticweb.org

Electron configuration

I'm not sure about having electron configuration as a built-in type. There are a lot of different special types like this. I can see two possibilities: either use several attributes of simple datatypes like string, character and integer, or use a user-defined composite datatype. Here I'll outline a possible syntax for these two alternatives.

Alternative 1: If a simple datatype is used, the electron configuration attribute would have to be split up in several attributes to allow for flexible search.

A carbon atom has several atomic orbitals which specify its electron configuration. A possible syntax might be something like

[[has many::atomic orbitals::{{n=1}{l=s}{electrons=2}, {n=2}{l=s}{electrons=2}, {n=2}{l=p}{electrons=2}}]]

A way to make it recognize what this means would be to define two different types of attributes: normal attributes and instance attributes. The article called "atomic orbital" could then define normal attributes, which would be valid for all atomic orbitals, and also instance attributes, which would be declared but not used in the atomic orbital article itself. Instead they would be used to describe specific instances of atomic orbitals in articles about atoms. The instance attributes could be declared at the bottom of the atomic orbital article. Example syntax:

[[Attribute:n:=int]][[Attribute:l:=char]][[Attribute:electrons:=int]]

This could mean that other articles could then declare an instance of these attributes using the syntax

[[has one::atomic orbital::{n=value}{l=value}{electrons=value}]]
if it had one atomic orbital

or

[[has many::atomic orbital::{n=value}{l=value}{electrons=value},{n=value}{l=value}{electrons=value},
{n=value}{l=value}{electrons=value}]] 

if it has many.

Alternative 2: If user-defined composite datatypes are used instead, you could define a single electron configuration attribute which would then have all the atomic orbitals built in. The syntax for defining the electron configuration of a particular element could then be [[Electron Configuration:={n=3}{l=d}{electrons=6}, {n=4}{l=s}{electrons=2}]].

The electron configuration attribute would then need to have a special electron configuration datatype. This might be defined as

 
[[datatype:name:(element1=datatype, element2=datatype, element3=datatype...)]]

where datatype could be for example string, integer, character, complex number etc.

An example: Datatype definition

[[datatype:Colordatatype:(color1=string,color2=string)]]

Attribute definition

[[datatype::Type:Colordatatype]]

Use of this attribute in an article about an object with two colors

[[Colorattribute:={color1=red}{color2=blue}]]

So to summarize with how this might be used for electron configuration:

Datatype definition

[[datatype:Electronconfiguration:(n=int, l=char, electrons=int)]]

Attribute definition:

[[datatype::Type:Electronconfiguration]]

Use in an article about an atom

[[Electron Configuration:={n=3}{l=d}{electrons=6}, {n=4}{l=s}{electrons=2}]]

The electron configuration datatype was specified as a triple representing an atomic orbital and in this example there are two atomic orbitals separated by a comma.
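To make the proposed value syntax concrete, here is a minimal Python sketch of how such a composite value might be split into typed tuples. The function names and the schema dictionary are assumptions made for illustration; nothing like this exists in the wiki software, and the comma-based split would break on values that themselves contain commas.

```python
import re

# Hypothetical parser for the proposed composite syntax (illustration only).
GROUP = re.compile(r"\{([^{}=]+)=([^{}]*)\}")

def parse_composite(text):
    """Split '{n=3}{l=d}{electrons=6}, {n=4}...' into one dict per tuple."""
    tuples = []
    for chunk in text.split(","):
        fields = {name.strip(): value.strip() for name, value in GROUP.findall(chunk)}
        if fields:
            tuples.append(fields)
    return tuples

def apply_types(tuples, schema):
    """Convert the raw strings using a schema that maps field names to converters."""
    return [{name: schema[name](value) for name, value in t.items()} for t in tuples]

# Assumed stand-in for the Electronconfiguration datatype definition.
ELECTRON_CONFIGURATION = {"n": int, "l": str, "electrons": int}

raw = "{n=3}{l=d}{electrons=6}, {n=4}{l=s}{electrons=2}"
orbitals = apply_types(parse_composite(raw), ELECTRON_CONFIGURATION)
print(orbitals)
```

Each atomic orbital ends up as one dictionary, so a query such as "all atoms with a 3d orbital" could filter on individual fields.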

Some notes though, say you wanted to define the datatype vector using this, it might be something like

[[datatype:Vector:(x=float)]]

The syntax should also allow unnamed elements like {value},{value},{value} instead of {name1=value},{name2=value}..., because if this were used in, say, an attribute Weirdvector of datatype Vector:

[[Weirdvector:={1},{5},{3},{7},{8},{9}]]

would make more sense than

[[Weirdvector:={x=1},{x=5},{x=3},{x=7},{x=8},{x=9}]]

Now if you have managed to read through all this, feel free to comment :) Fuelbottle 23:21, 18 November 2005 (CET)

If I can sum this up, you argue for having composite datatypes, and you provide various syntactic variants for how one could employ them in articles and how users could even declare new composites based on existing types. I mostly agree with this, though one can still discuss the syntax. The problem is not parsing composites in the current system (splitting a given value along the { } is not hard to do). The problem is that the current storage architecture is not really ready to cope with composites. The reason is given in the article. In short: composites in RDF are modelled by creating small trees (article->composite_root, composite_root->value1, composite_root->value2, ...), but this causes problems when clearing the database of the data that was given in one article, since one now has to delete some data triples composite_root->... that do not refer directly to the article. So we would first have to support composites within the database in an efficient way (which will probably be needed for various applications anyway).
I would not write "has one" and "has many". If you have more than one, just give two annotations: [[atomic orbital:={n=1}{l=s}{electrons=2}]] [[atomic orbital:={n=2}{l=s}{electrons=2}]]. No extra syntax here. If you need an order of orbitals, then you have to make various attributes ("atomic orbital 1", "atomic orbital 2", ...). In general, I would prefer to give many semantic statements instead of packing everything into a single one. So one could also write [[hasDataField:=n:integer]], [[hasDataField:=l:integer]] in the article of an attribute, to state that the attribute has at least these two data fields (note that hasDataField in this case is just a special attribute of attributes, and "identifier:type" is a special format of the type of this attribute, so we do not introduce too much new syntax). I would also use "," to separate values ([[atomic orbital:= n=2, l=s, electrons=2]]), since it looks less technical to me.
It might be better to move such discussions to our mailing lists (see our Sourceforge page), since talk edits are easily overlooked in the wiki. --Markus Krötzsch 20:15, 19 November 2005 (CET)
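The deletion problem with composites can be illustrated with a toy triple store. This is only a sketch: the set-of-tuples store, the "_:b1" node name and both function names are made up for the example and do not reflect the actual storage layer.

```python
# A toy triple store: a set of (subject, predicate, object) tuples.
# "_:b1" stands in for the composite_root node; all names are illustrative.
triples = {
    ("Carbon", "atomic orbital", "_:b1"),
    ("_:b1", "n", 2),
    ("_:b1", "l", "p"),
    ("_:b1", "electrons", 2),
    ("Carbon", "symbol", "C"),
    ("Oxygen", "symbol", "O"),
}

def clear_naive(store, article):
    """Delete only triples whose subject is the article itself."""
    return {t for t in store if t[0] != article}

def clear_with_composites(store, article):
    """Also delete triples hanging off composite roots reachable from the article."""
    doomed = {article}
    frontier = [article]
    while frontier:
        node = frontier.pop()
        for s, p, o in store:
            is_root = isinstance(o, str) and o.startswith("_:")
            if s == node and is_root and o not in doomed:
                doomed.add(o)
                frontier.append(o)
    return {t for t in store if t[0] not in doomed}

# The naive approach leaves the composite_root->value triples behind.
orphans = [t for t in clear_naive(triples, "Carbon") if t[0] == "_:b1"]
print(len(orphans))

remaining = clear_with_composites(triples, "Carbon")
print(remaining)
```

The second function has to walk the store to find the composite roots before deleting, which is the extra cost an efficient composite-aware database design would need to avoid.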

Feedback from User:Max_Völkel

The following Discussion was taken out of the Article.

User:Max_Völkel: I am not sure I understood that correctly, so I will just state our view on this. The data model we have looks a bit like this:

  • page (London)
    • relation type (Relation:located_in)
      • target page (England)
    • attribute type (Attribute:surface_area)
      • value (3453)
      • optional: unit (square miles)

For attribute types, we must link them to a type (Type:area). For each type, we provide a hardcoded mapping from the strings a user can enter ("345345 sqm", "24243 sqkm", "3242 meters^2", ...) to a predefined standard unit, in this case probably "square kilometers". We thus have to handle

  • different names for the same units (sqm, square miles)
  • different units for the same quantity (miles^2, km^2).

--Max Völkel 20:10, 18 November 2005 (CET)
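As a rough illustration of such a hardcoded mapping, the following Python sketch normalizes a few area spellings to square kilometers. The alias table and function name are assumptions for the example, not the project's actual code, and ambiguous abbreviations such as "sqm" are deliberately left out.

```python
import re

# Assumed alias table for a hypothetical Type:area.
# Each factor converts the given unit to square kilometers.
AREA_TO_SQKM = {
    "sqkm": 1.0, "km^2": 1.0, "square kilometers": 1.0,
    "meters^2": 1e-6, "m^2": 1e-6, "square meters": 1e-6,
    "square miles": 2.589988110336, "miles^2": 2.589988110336,
}

VALUE_UNIT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(.*?)\s*$")

def normalize_area(text):
    """Return (value_in_sqkm, 'sqkm'), or None when the input is not understood."""
    match = VALUE_UNIT.match(text)
    if not match:
        return None
    number, unit = match.groups()
    factor = AREA_TO_SQKM.get(unit.lower())
    if factor is None:
        return None  # the caller can fall back to storing the raw string
    return float(number) * factor, "sqkm"

print(normalize_area("24243 sqkm"))
print(normalize_area("3453 square miles"))
print(normalize_area("12 furlongs"))
```

Both cases listed above are handled by the same table: several names can share one factor, and different units of the same quantity simply carry different factors.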

What I'm talking about is the case when a user has not defined the Type of an attribute (Attribute:has area, typeof, Type:area). In that likely case it should be possible to parse at least the common types anyway. Even though Wikipedians have a high level of understanding of and interest in how things work, usage should be as easy as possible.
btw: I would use "square meters" not "square kilometers" - the only exception for using the kilo-prefix is the mass.
MovGP0 02:55, 19 November 2005 (CET)

Strings

User:Max Völkel: the syntax you use completely forgets about the relationship between the page and the data value. This would lead to a statement (London, String, "Big Ben"). This is not very useful for machines. Fall-back to strings if the units are not understood or the attribute is unknown is of course the right way to go.

I'm thinking in that case of
[[Attribute:features]]
in the Article San Diego. Such an Attribute containing "many beaches" is just a string and only meaningful to humans. If you would instead use
[[has attraction:=beach|beaches]]
it gets a meaning, because this links the Attribute directly to the Article Beach. But then it would rather be a Relation and not an Attribute:
[[has attraction::beach|beaches]]
So far as I've seen, Attributes are not meant to be really meaningful to machines; they are just values you can search for by defining limits.
MovGP0 03:32, 19 November 2005 (CET)
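The difference between the two annotation forms can be shown with a small parser sketch. The regular expression and function name below are illustrative assumptions and not the wiki's real parser; the sketch only distinguishes "::" (Relation, pointing at an article) from ":=" (Attribute, holding a data value).

```python
import re

# Hypothetical pattern for the two annotation forms discussed above.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)(::|:=)([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def parse_annotations(wikitext):
    """Return (kind, name, value, label) for each [[name::target]] or [[name:=value]]."""
    results = []
    for name, op, value, label in ANNOTATION.findall(wikitext):
        kind = "relation" if op == "::" else "attribute"
        value = value.strip()
        results.append((kind, name.strip(), value, label or value))
    return results

text = "San Diego [[has attraction::beach|beaches]] and [[features:=many beaches]]."
for row in parse_annotations(text):
    print(row)
```

Only the relation form yields a target that can be resolved to another article; the attribute form yields a plain value.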

'Features' is still something other than, let's say, 'problems', so a user right now could look for all cities that have 'beach' included in a string with the relation 'features'. I must admit, 'features' is quite a dumb attribute. --86.42.0.170 13:12, 20 November 2005 (CET)

Numbers

User:Max_Völkel: Again, we want to know that (London, has surface area, 1234 sqm) and not just (London, Number, 123). We really need the relation :-)

(Electron, has Spin, 0.5); the datatype is a simple float. I recommend using it because a float needs less space in the database and less processing power (because all calculations are numeric) than the Dimension type I've described below. MovGP0 02:42, 19 November 2005 (CET)

Dimension

Feedback: Ok, although this looks like a nice feature, it will not be so easy to implement. Also, the result is then stored in a special database whose structure is optimized for handling Dimensions, and this is not trivial. Would you like to contribute to this? Are you familiar with RDF? Have you read our paper? Is this the beginning of a great collaboration? :-) --Max Völkel 20:14, 18 November 2005 (CET)

I'm familiar with RDF and SQL, but not with Java, PHP or RDF database applications; just C#. Also, I won't have time for programming in my free time once my holidays end next week.
MovGP0 02:37, 19 November 2005 (CET)

end of move

I'm sure we can map this data table to RDF, but an SQL database is far more efficient in terms of speed and resources. Personally I would use the RDF store just for the relations and simple types like integers, floats, or strings. Instead of storing complex datatypes like "Dimension" in RDF as expression trees, you can store just a unique ID which maps to a data row in the SQL database. Complex Dimensions which do not fit into the database layout I've described can be kept as strings or, where possible, as expression trees. But expression trees are very hard to search, even when they are simplified to a common syntax (so there is no "x/y" but only the general form "x*y^(-1)").
MovGP0 03:23, 19 November 2005 (CET)

OWL Schema for Dimension Type

This is the OWL-schema for the Database I've described above.

<?xml version="1.0"?>
<!DOCTYPE owl [
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY owl "http://www.w3.org/2002/07/owl#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY DimensionType "www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#">
]>
<rdf:RDF
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:owl="http://www.w3.org/2002/07/owl#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:DimensionType="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#"
 xml:base="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType"
>
<owl:Ontology rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType">
<rdfs:label>Dimension Type</rdfs:label>
<owl:versionInfo>0.1</owl:versionInfo>
</owl:Ontology>
<owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#float">
</owl:Class>
<owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
</owl:Class>
<owl:Class rdf:about="http://www.w3.org/2002/07/owl#Thing">
</owl:Class>
<owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
  <rdfs:subClassOf>
    <owl:Class rdf:about="http://www.w3.org/2002/07/owl#Thing">
    </owl:Class>
  </rdfs:subClassOf>
</owl:Class>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#A">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the ampere part</rdfs:comment>
  <rdfs:label>Ampere</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#K">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:label>Kelvin</rdfs:label>
  <rdfs:comment>An integer storing the exponent of the kelvin part</rdfs:comment>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#cd">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the candela part</rdfs:comment>
  <rdfs:label>Candela</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#complex">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#float">
    </owl:Class>
  </rdfs:range>
  <rdfs:label>The complex part</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#kg">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the kilogram part</rdfs:comment>
  <rdfs:label>Kilogram</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#m">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the meter part</rdfs:comment>
  <rdfs:label>Meter</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#mol">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the mole part</rdfs:comment>
  <rdfs:label>Mole</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#real">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#float">
    </owl:Class>
  </rdfs:range>
  <rdfs:label>The real Part</rdfs:label>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#s">
  <rdfs:domain>
    <owl:Class rdf:about="www.onotowiki.org/www.onotowiki.org/www.onotowiki.org/DimensionType#Dimension">
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class rdf:about="http://www.w3.org/2001/XMLSchema#integer">
    </owl:Class>
  </rdfs:range>
  <rdfs:comment>An integer storing the exponent of the second part</rdfs:comment>
  <rdfs:label>Second</rdfs:label>
</owl:ObjectProperty>
</rdf:RDF>

Well, I'll also need to look at other schemes representing mathematical expression trees and structures.
MovGP0 19:07, 19 November 2005 (CET)
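The fields of the schema above (one integer exponent per SI base unit plus a real and a complex part) can be mirrored in a small Python class. This is a sketch under assumptions: the class layout, the `unit` helper and the use of Python's built-in complex type for the two value fields are illustrative choices, not part of the proposal.

```python
from dataclasses import dataclass

# Order of the exponent slots, following the property names in the schema.
BASE_UNITS = ("A", "K", "cd", "kg", "m", "mol", "s")

@dataclass(frozen=True)
class Dimension:
    value: complex                  # covers the 'real' and 'complex' fields
    exponents: tuple = (0,) * 7     # one integer exponent per entry of BASE_UNITS

    def __mul__(self, other):
        # Multiplying quantities multiplies values and adds exponents.
        summed = tuple(a + b for a, b in zip(self.exponents, other.exponents))
        return Dimension(self.value * other.value, summed)

    def __pow__(self, n):
        return Dimension(self.value ** n, tuple(a * n for a in self.exponents))

    def __truediv__(self, other):
        # x / y is expressed as x * y^(-1), the general form mentioned earlier.
        return self * other ** -1

def unit(symbol, value=1.0):
    """Hypothetical helper: a quantity with exponent 1 for one base unit."""
    exps = tuple(1 if s == symbol else 0 for s in BASE_UNITS)
    return Dimension(complex(value), exps)

speed = unit("m", 100.0) / unit("s", 20.0)
print(speed.value.real, dict(zip(BASE_UNITS, speed.exponents)))
```

Because every quantity reduces to a fixed-length row of numbers, it fits a single SQL table row and can be searched by exponent pattern (here m^1 s^-1) without walking an expression tree.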

Unit Conversion in the CVS

I've taken a quick look at the file "SMW_Datatype.php". Even though I don't speak PHP, I can guess the meaning.

The code looks like:

function SMWConvertGeographicLength($value, $unit) {
	$result=Array();
	// Input: convert the unit to the main unit
	switch ( $unit ) {
		case '': case 'm': case 'meters': case 'metres':
			$mainval=$value; 
			$result['UNIT']='m'; 
			break;
		case 'km': case 'kilometers': case 'kilometres':
			$mainval=$value*1000; 
			$result['UNIT']='km'; 
			break;
		case 'mi': case 'ml': case 'miles': case 'mile':
			$mainval=$value*1609.344; 
			$result['UNIT']='miles'; 
			break;
		default: //unsupported unit
			$result['ERROR']='Warning: Unit "'.$unit.'" is not supported for this attribute.';
			return $result;
	}

This switch/case structure does not appeal to me. Instead I would recommend having an XML configuration file containing:

<conversions>
  <unit name="km" factor="1000" targetunit="m" />
  <unit name="kilometers" factor="1000" targetunit="m" />
  <unit name="kilometres" factor="1000" targetunit="m" />
  <unit name="mile" factor="1609.344" targetunit="m" />
  <unit name="mi" factor="1609.344" targetunit="m" />
  <!-- "ml" is commented out because it collides with milliliter [ml]; "mi" already covers miles -->
  <!--<unit name="ml" factor="1609.344" targetunit="m" />-->
  <unit name="miles" factor="1609.344" targetunit="m" />
  <!-- the offset is added after scaling: K = °C * 1 + 273.15 -->
  <unit name="°C" factor="1" offset="273.15" targetunit="K" />
</conversions>

Then we can do something like the following (C#-like) pseudocode:

// we assume here that value has already been handled and evaluated by a numeric parser
public void SubstituteUnits(ref float value, ref string units, XmlFile file)
{
   foreach (var unit in file.conversions)
   {
      if (units.Contains(unit.name))
      {
         // replace the unit name with the corresponding SI unit
         // (strings are immutable in C#, so the result must be assigned back)
         units = units.Replace(unit.name, unit.targetunit);

         // scale the value relative to the base SI unit
         value *= unit.factor;

         // apply the additive offset, if the unit defines one
         if (unit.offset != null)
            value += unit.offset;
      }
   }
}

Note: that code doesn't handle exponents and the like, but I guess the meaning is clear.

MovGP0 21:48, 19 November 2005 (CET)
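For comparison, here is a runnable Python sketch of the same table-driven idea. The embedded configuration, the function names and the exact-match lookup are assumptions for illustration. The offset is added after scaling, so the °C entry uses +273.15 to reach kelvin.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration in the spirit of the XML file proposed above.
CONFIG = """
<conversions>
  <unit name="km" factor="1000" targetunit="m" />
  <unit name="mi" factor="1609.344" targetunit="m" />
  <unit name="°C" factor="1" offset="273.15" targetunit="K" />
</conversions>
"""

def load_conversions(xml_text):
    """Build a lookup table: unit name -> (factor, offset, target unit)."""
    table = {}
    for node in ET.fromstring(xml_text.strip()).iter("unit"):
        table[node.get("name")] = (
            float(node.get("factor")),
            float(node.get("offset", "0")),  # a missing offset means no shift
            node.get("targetunit"),
        )
    return table

def to_main_unit(value, unit, table):
    """Look the unit up by exact name; unknown units are returned unchanged."""
    if unit not in table:
        return value, unit
    factor, offset, target = table[unit]
    return value * factor + offset, target

table = load_conversions(CONFIG)
print(to_main_unit(2, "km", table))
print(to_main_unit(25, "°C", table))
print(to_main_unit(3, "parsec", table))
```

Looking units up by exact name, rather than by substring, avoids "m" accidentally matching inside "km" or "mi", at the cost of not handling compound units.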

We got close to this for simple linear units in SMW 0.5, see Help:Custom units. If a Type:Length page just has a set of [[Converts to:=1 m, meters, metres]] [[Converts to:=100 cm, centimeters]] ... special properties, then SMW can infer that "foo"'s main unit is 'm' and attribute values can also be given in centimeters. It's not XML but it's in Wiki pages, which is a big win (thanks Markus). What's missing is your proposed offset support, so e.g. Type:Temperature still has to be written in PHP. -- Skierpage 01:52, 14 September 2006 (CEST)

I would suggest that any code developed here should try to be object-oriented. After all, we are already dealing with classes, objects, relations and attributes, and conversions are functions, i.e. methods. Using this formalism would map 1-1 and allow special cases like exponents etc. to be dealt with more easily. You could also import or even export the XML from the object as needed. For example:

class Unit
{
  //attributes
  String name;
  Float factor;
  String targetUnit;

  //methods 
  void substitute(); //details would be similar to the code above, but now it "belongs" to a proper class
  void importXML(); //updates the values of the attributes
  void exportXML(); //from attribute values to XML file
};

class MetricUnit : Unit //inheritance
{
  //etc
};

No matter which language is used, most modern ones now have OO capability, including PHP, Perl, JavaScript etc., so it should not be too difficult. A lot of the relations such as is-a and part-of would also be implicit in the notation, i.e. MetricUnit is a Unit.

coordinates?

The page says: Types can be ... "global coordinates" (where one uses specific syntax to denote two values of longitude and latitude).

I don't see an attribute or type for this in Category:Attribute. I see someone added Type:Box from Dublin Core, which seems pretty complex for existing coordinates, e.g. the latitude/longitude of http://en.wikipedia.org/wiki/Berlin.

There is already Wiki syntax {{coor}} and possibly <geo> for coordinates, see http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Geographical_coordinates. Can this be reused? --Skierpage 06:29, 24 January 2006 (CET)

There are more such projects which would do better to wait for a good semantic database. But there are also existing tables in the database which might be better stored semantically, like the redirect and backlink tables.
In your case it seems to be mostly about writing a fitting datatype handler; the rest is just a change in the template.
MovGP0 21:40, 9 May 2006 (CEST)

This should include support for coordinates on other planets, e.g. Mars. Many geological features on Mars have their coordinates stated in their Wikipedia articles. Also, the existence of Google Mars makes this a practical application.
